Overlapping Domain Decomposition Non-overlapping Domain Decomposition Schur Complements
6.2. Non-overlapping Domain Decomposition
Discretization of the original problem with numbering of the unknowns relative to the partitioning given by Ω1 and Ω2 leads to a linear system with a matrix in dissection form.
Parallel Numerics, WT 2013/2014 6 Domain Decomposition
page 8 of 18
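Non-overlapping DD (cont. 2)
Poisson equation on domain Ω
$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega$$
is equivalent to
$$\begin{aligned}
-\Delta u_1 &= f && \text{in } \Omega_1, \\
u_1 &= 0 && \text{on } \partial\Omega_1 \setminus \Gamma, \\
u_1 &= u_2 && \text{on } \Gamma, \\
\frac{\partial u_1}{\partial n_1} &= -\frac{\partial u_2}{\partial n_2} && \text{on } \Gamma, \\
-\Delta u_2 &= f && \text{in } \Omega_2, \\
u_2 &= 0 && \text{on } \partial\Omega_2 \setminus \Gamma.
\end{aligned}$$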
Non-overlapping DD (cont. 3)
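In matrix-vector notation, $Au = f$ can be written as
$$A = \begin{pmatrix} A^{(1)}_{I,I} & 0 & A^{(1)}_{I,\Gamma} \\ 0 & A^{(2)}_{I,I} & A^{(2)}_{I,\Gamma} \\ A^{(1)}_{\Gamma,I} & A^{(2)}_{\Gamma,I} & A_{\Gamma,\Gamma} \end{pmatrix}, \quad
u = \begin{pmatrix} u^{(1)}_I \\ u^{(2)}_I \\ u_\Gamma \end{pmatrix}, \quad
f = \begin{pmatrix} f^{(1)}_I \\ f^{(2)}_I \\ f_\Gamma \end{pmatrix}, \tag{1}$$
where the degrees of freedom are partitioned into those internal to Ω1, those internal to Ω2, and those of the interior of Γ.
The next slide formulates problem (1) in a more general notation.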
Non-overlapping DD (cont. 4)
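• $A_3$ is the so-called interface matrix:
$$f = Au = \begin{pmatrix} A_1 & 0 & F_1 \\ 0 & A_2 & F_2 \\ G_1 & G_2 & A_3 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix}$$
Better: reduce the original problem to two partial subproblems and one interface Schur complement system.
• We can solve $Au = f$ iteratively with PCG and the preconditioner
$$\begin{pmatrix} A_1^{-1} & 0 & 0 \\ 0 & A_2^{-1} & 0 \\ 0 & 0 & M \end{pmatrix}$$
Here, for $M$ we can use the identity or an approximate inverse of the Schur complement.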
Non-overlapping DD (cont. 5)
Leads to 16 block matrices on the diagonal, $A_1, \dots, A_{16}$, and the Schur complement $S$:
$$S = A_{17} - G_1 A_1^{-1} F_1 - \dots - G_{16} A_{16}^{-1} F_{16}$$
• Solve the small problems in parallel, e.g. with multigrid.
• Overlapping is easy to parallelize, but converges slowly.
• Non-overlapping is harder to parallelize, but offers more influence on the convergence through S.
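6.3. Schur Complements
Write the matrix $A$ from (1) in block factorized form $A = LR$ with
$$L = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ A^{(1)}_{\Gamma,I} \big(A^{(1)}_{I,I}\big)^{-1} & A^{(2)}_{\Gamma,I} \big(A^{(2)}_{I,I}\big)^{-1} & I \end{pmatrix}, \quad
R = \begin{pmatrix} A^{(1)}_{I,I} & 0 & A^{(1)}_{I,\Gamma} \\ 0 & A^{(2)}_{I,I} & A^{(2)}_{I,\Gamma} \\ 0 & 0 & S \end{pmatrix}$$
leading to the resulting linear system
$$\begin{pmatrix} A^{(1)}_{I,I} & 0 & A^{(1)}_{I,\Gamma} \\ 0 & A^{(2)}_{I,I} & A^{(2)}_{I,\Gamma} \\ 0 & 0 & S \end{pmatrix} u = \begin{pmatrix} f^{(1)}_I \\ f^{(2)}_I \\ b_\Gamma \end{pmatrix},$$
where
$$S = A_{\Gamma,\Gamma} - A^{(1)}_{\Gamma,I} \big(A^{(1)}_{I,I}\big)^{-1} A^{(1)}_{I,\Gamma} - A^{(2)}_{\Gamma,I} \big(A^{(2)}_{I,I}\big)^{-1} A^{(2)}_{I,\Gamma}$$
is the Schur complement relative to the unknowns on Γ.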
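Schur Complements (cont.)
We can transform $Au = f$ into $LRu = f$, resp. $Ru = L^{-1}f = b$, with
$$L^{-1} = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ A^{(1)}_{\Gamma,I} \big(A^{(1)}_{I,I}\big)^{-1} & A^{(2)}_{\Gamma,I} \big(A^{(2)}_{I,I}\big)^{-1} & I \end{pmatrix}^{-1}
= \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ -A^{(1)}_{\Gamma,I} \big(A^{(1)}_{I,I}\big)^{-1} & -A^{(2)}_{\Gamma,I} \big(A^{(2)}_{I,I}\big)^{-1} & I \end{pmatrix}$$
$$\begin{pmatrix} A^{(1)}_{I,I} & 0 & A^{(1)}_{I,\Gamma} \\ 0 & A^{(2)}_{I,I} & A^{(2)}_{I,\Gamma} \\ 0 & 0 & S \end{pmatrix} u = \begin{pmatrix} f^{(1)}_I \\ f^{(2)}_I \\ b_\Gamma \end{pmatrix}$$
$$b_\Gamma = f_\Gamma - A^{(1)}_{\Gamma,I} \big(A^{(1)}_{I,I}\big)^{-1} f^{(1)}_I - A^{(2)}_{\Gamma,I} \big(A^{(2)}_{I,I}\big)^{-1} f^{(2)}_I$$
Once $u_\Gamma$ is found, the internal components can be found by
$$u^{(i)}_I = \big(A^{(i)}_{I,I}\big)^{-1} \big(f^{(i)}_I - A^{(i)}_{I,\Gamma} u_\Gamma\big).$$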
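Direct Derivation of the Schur Complement
$$\begin{pmatrix} A_1 & 0 & F_1 \\ 0 & A_2 & F_2 \\ G_1 & G_2 & A_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
$$\begin{aligned}
\Rightarrow\ & A_1 x_1 + F_1 x_3 = b_1 \\
& A_2 x_2 + F_2 x_3 = b_2 \\
& G_1 x_1 + G_2 x_2 + A_3 x_3 = b_3 \\
\Rightarrow\ & x_1 = A_1^{-1} b_1 - A_1^{-1} F_1 x_3 \quad \text{and} \quad x_2 = A_2^{-1} b_2 - A_2^{-1} F_2 x_3 \\
\Rightarrow\ & (G_1 A_1^{-1} b_1 - G_1 A_1^{-1} F_1 x_3) + (G_2 A_2^{-1} b_2 - G_2 A_2^{-1} F_2 x_3) + A_3 x_3 = b_3 \\
\Rightarrow\ & (A_3 - G_1 A_1^{-1} F_1 - G_2 A_2^{-1} F_2)\, x_3 = b_3 - G_1 A_1^{-1} b_1 - G_2 A_2^{-1} b_2 \\
\Rightarrow\ & S x_3 = \tilde b_3 \quad \text{with } \tilde b_3 := b_3 - G_1 A_1^{-1} b_1 - G_2 A_2^{-1} b_2
\end{aligned}$$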
(Parallel) Algorithm to solve Ax = b based on Schur Complement
1. Compute $S$ by using inv(A1) and inv(A2)
2. Solve $S x_3 = \tilde b_3$, where $\tilde b_3 = b_3 - G_1 A_1^{-1} b_1 - G_2 A_2^{-1} b_2$
3. Compute $x_1$ and $x_2$ by using inv(A1) and inv(A2)
• The explicit computation of S can be avoided by solving the linear system in S iteratively, e.g. with Jacobi, PCG, ...
• Then we need only the action of S: in every iteration step we have to compute S · (intermediate vector).
• To achieve fast convergence, a preconditioner (an approximation) for S has to be used!
• Precondition the Schur complement, e.g. with MSPAI.
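The three steps above can be sketched in NumPy for small dense blocks. This is only an illustration under assumptions: the function names `schur_solve` and `schur_matvec` are hypothetical, dense `np.linalg.solve` stands in for whatever subdomain solver is actually used, and the blocks follow the notation $A_1, A_2, A_3, F_i, G_i$ from the slides.

```python
import numpy as np

def schur_solve(A1, A2, A3, F1, F2, G1, G2, b1, b2, b3):
    """Solve the 3x3 block system via the interface Schur complement."""
    # The two subdomain solves are independent and could run in parallel.
    Y1 = np.linalg.solve(A1, np.column_stack([F1, b1]))  # A1^{-1} [F1, b1]
    Y2 = np.linalg.solve(A2, np.column_stack([F2, b2]))  # A2^{-1} [F2, b2]
    # Step 1: explicit Schur complement and reduced right-hand side.
    S = A3 - G1 @ Y1[:, :-1] - G2 @ Y2[:, :-1]
    rhs = b3 - G1 @ Y1[:, -1] - G2 @ Y2[:, -1]
    # Step 2: interface system.
    x3 = np.linalg.solve(S, rhs)
    # Step 3: back-substitution, x_i = A_i^{-1} b_i - A_i^{-1} F_i x3.
    x1 = Y1[:, -1] - Y1[:, :-1] @ x3
    x2 = Y2[:, -1] - Y2[:, :-1] @ x3
    return x1, x2, x3

def schur_matvec(A1, A2, A3, F1, F2, G1, G2, w):
    """Apply S to a vector without forming S (for iterative solvers)."""
    return (A3 @ w
            - G1 @ np.linalg.solve(A1, F1 @ w)
            - G2 @ np.linalg.solve(A2, F2 @ w))
```

`schur_matvec` shows the product needed in each step of an iterative solver for the interface system: two subdomain solves and a few block multiplications, with S itself never stored.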
Recursive Form of Non-overlapping DD
↔ Nested (recursive) dissection:
Domain Decomposition: Outlook
• Approach can be generalized to
  – non-conforming discretizations (mortar methods/Lagrange multipliers/FETI)
  – time-dependent systems
  – ...
• Literature:
  – A. Toselli, O. Widlund: Domain Decomposition Methods - Algorithms and Theory, Springer, 2004
  – A. Quarteroni, A. Valli: Domain Decomposition Methods for Partial Differential Equations, Oxford Science Publications, 1999
Multigrid
Starting point: Solve a partial differential equation, e.g.
$$-u_{xx} - u_{yy} = f(x, y)$$
with boundary conditions, e.g. Dirichlet BC.
Discretization leads to a system of linear equations, resp. a matrix A.
A is sparse, (structured,) and ill-conditioned. We are looking for an O(n) solver. Direct solvers cost O(n^2) or O(n log(n)); PCG also costs more than O(n) because the matrix is ill-conditioned.
Idea
(1) Project the fine discretization onto a coarser (smaller) problem
(2) Solve the coarse (smaller) matrix problem
(3) Project the coarse solution back onto the fine grid
Problems:
- only possible for smooth vectors without highly oscillatory components
- the back-projection introduces (highly oscillatory) errors that have to be removed, too.
Vector of unknowns:
Observation:
For typical PDE matrices, highly oscillatory vectors belong to the subspace spanned by the eigenvectors for large eigenvalues and are removed e.g. by the stationary Gauss-Seidel iteration: $\|(I - M^{-1}A)v\|$ is small for eigenvectors $v$ belonging to large eigenvalues!
To solve Ax = f:
(1) Apply a few Gauss-Seidel smoothing steps → $x_a$
    Residual equation: $A(x_a + x) = f \Rightarrow Ax = f - Ax_a = r$
(2) Project (restrict) r, resp. A, onto the coarse grid by mean values → $r_c$, $A_c$
    Solve $A_c x_c = r_c$
(3) Project (prolongate) $x_c$ back onto the fine grid by interpolation: $x_c \to x$
    New approximate solution: $x_a + x$
(4) Improve the approximate solution by Gauss-Seidel steps
Repeat until convergence.
Apply the scheme recursively as well for solving the coarse equations.
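Steps (1) to (4) can be sketched as a two-grid cycle for the 1D model problem $-u'' = f$ with homogeneous Dirichlet boundary conditions. The choices below are assumptions made for illustration and are not prescribed by the slides: linear interpolation for prolongation, its scaled transpose (full weighting) for restriction, the Galerkin product $R A P$ as coarse matrix, and a direct coarse solve instead of recursion. All function names are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Finite-difference matrix of -u'' on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def prolongation(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1              # fine index of coarse point j
        P[i - 1, j] = 0.5
        P[i, j] = 1.0
        P[i + 1, j] = 0.5
    return P

def gauss_seidel(A, x, f, steps):
    """Stationary Gauss-Seidel: (D + L) x_new = f - U x_old."""
    L = np.tril(A)
    U = A - L
    for _ in range(steps):
        x = np.linalg.solve(L, f - U @ x)
    return x

def two_grid(A, f, x, P, nu=2):
    R = 0.5 * P.T                        # restriction by weighted mean values
    Ac = R @ A @ P                       # coarse matrix (Galerkin projection)
    x = gauss_seidel(A, x, f, nu)        # (1) pre-smoothing
    r = f - A @ x                        #     residual
    xc = np.linalg.solve(Ac, R @ r)      # (2) coarse solve
    x = x + P @ xc                       # (3) prolongate and correct
    return gauss_seidel(A, x, f, nu)     # (4) post-smoothing
```

Replacing the direct coarse solve by a recursive call of the same routine on `Ac` would give a V-cycle.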
V-Cycle
[Figure: V-cycle. Starting with smoothing on the fine grid, four restriction steps (each followed by smoothing) lead down to the direct solution of the coarse residual equation; four prolongation steps (each followed by smoothing) lead back up to the fine grid.]
Repeat until convergence
Parallel Aspects
[Figure: domain partitioned into subdomains Ω1, Ω2, Ω3, Ω4.]
Each processor $p_r$ holds the data for domain $\Omega_r$ and does the projections and smoothing for its components. It needs data from the neighboring processors. Load balancing!
Ghost layers
Jacobi smoothing:
(1) Each process performs the Jacobi iteration (independently)
(2) Send messages to update the ghost layers
(3) Use communication/computation overlap for interior computations and exchange of boundary data
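Steps (1) and (2) can be illustrated with a serial simulation of two "processes" that each own half of a 1D grid plus one ghost cell. This is a sketch under assumptions: the 1D model problem $-u'' = f$, the function names, and the use of plain array copies in place of real message passing are all illustrative choices. The overlap of communication and computation from step (3) is not modeled.

```python
import numpy as np

def jacobi_global(u, f, h, steps):
    """Reference Jacobi iteration on the whole grid (boundary values fixed)."""
    u = u.copy()
    for _ in range(steps):
        new = u.copy()
        new[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
        u = new
    return u

def jacobi_two_subdomains(u, f, h, steps):
    """Same iteration, split into two subdomains with one ghost cell each."""
    m = len(u) // 2
    left = u[:m + 1].copy()     # owns global 0..m-1, ghost cell = global m
    right = u[m - 1:].copy()    # ghost cell = global m-1, owns global m..end
    fl, fr = f[:m + 1], f[m - 1:]
    for _ in range(steps):
        # (1) independent local Jacobi updates
        ln, rn = left.copy(), right.copy()
        ln[1:-1] = 0.5 * (left[:-2] + left[2:] + h * h * fl[1:-1])
        rn[1:-1] = 0.5 * (right[:-2] + right[2:] + h * h * fr[1:-1])
        left, right = ln, rn
        # (2) "messages": refresh each ghost cell from the neighbor's owned value
        left[-1] = right[1]
        right[0] = left[-2]
    return np.concatenate([left[:-1], right[1:]])
```

With one ghost cell per side and one exchange per step, the partitioned iteration reproduces the global Jacobi iteration exactly.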
Restriction/Prolongation
Problems with work load: On coarse grids there is less computation, but more communication!
Efficiency behaves like 1/log(p), where p is the number of processors.
Modifications
Agglomeration: Coarse grid partitions are no longer aligned with the finer grid partitions (in order to avoid inefficiency on the coarsest grids). → More communication in grid transfer, less in the coarse grid solve.
Larger ghost layers can reduce the communication.
Reduce the number of V-cycle steps by using more powerful projections and smoothers → less data transfer.
Additive Multigrid
Consider the different levels at the same time, compute the related corrections in parallel, and sum up the corrections.
Hybrid: conjugate gradient iteration with MG as preconditioner → fewer V-cycles.
$$x := x + \alpha \sum_{l=0}^{L} A_l^{-1} P_l (f - Ax)$$
$$x = x_0 + x_1 + \dots + x_L$$
8. Computing Eigenvalues in Parallel
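8.1 Introduction
$x$ is an eigenvector with eigenvalue $\lambda$ iff: $Ax = \lambda x, \ x \ne 0$.
If A is spd, there exists an orthogonal basis of eigenvectors: $A u_i = \lambda_i u_i$, $i = 1, 2, \dots, n$, i.e.
$$A = U \Lambda U^T \quad \text{or} \quad U^T A U = \Lambda, \qquad U = (u_1, \dots, u_n), \quad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$$
In general, U may be complex unitary and Λ an upper triangular complex matrix (Schur decomposition).
Allowed operations that do not change the eigenvalues: $Q A Q^H$ with unitary Q (the eigenvectors transform as $x \mapsto Qx$).
The eigenvalues of A are the zeros of the characteristic polynomial $P_n(\lambda) = \det(A - \lambda I)$: $P_n(\lambda_i) = 0$ for all eigenvalues $\lambda_i$, $i = 1, 2, \dots, n$. Moreover, $P_n(A) = 0$.
Rayleigh quotient (A Hermitian): $\lambda_{\min}(A) \le r(x) = \dfrac{x^H A x}{x^H x} \le \lambda_{\max}(A)$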
8.2 Parallel Methods for computing all eigenvalues:
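Jacobi Method
$$\mathrm{off}(A)^2 := \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} a_{ij}^2 = \|A\|_F^2 - \sum_{i=1}^{n} a_{ii}^2$$
Describes the magnitude of the nondiagonal part of A (should tend to 0).
We use Givens rotations to eliminate $a_{pq}$ for an index pair p, q (with $A = A^T$):
$$J(p, q, \theta) := \begin{pmatrix} 1 & & & & \\ & c & \cdots & s & \\ & \vdots & \ddots & \vdots & \\ & -s & \cdots & c & \\ & & & & 1 \end{pmatrix}, \qquad c = \cos(\theta), \ s = \sin(\theta),$$
i.e. the identity matrix except for the entries $J_{pp} = J_{qq} = c$, $J_{pq} = s$, $J_{qp} = -s$ in rows and columns p and q.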
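Jacobi Method
Effect of the application of J on the nondiagonal entries off(A). Consider the (p,q) part of $J^T A J = B$:
$$\begin{pmatrix} b_{pp} & b_{pq} \\ b_{qp} & b_{qq} \end{pmatrix} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}^T \begin{pmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{pmatrix} \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \overset{!}{=} \begin{pmatrix} b_{pp} & 0 \\ 0 & b_{qq} \end{pmatrix}$$
$$b_{pq} = b_{qp} = a_{pq}(c^2 - s^2) + (a_{pp} - a_{qq})\, c s \overset{!}{=} 0 \ \Rightarrow\ \theta,\ \cos(\theta),\ \sin(\theta)$$
The 2×2 rotation preserves the Frobenius norm of the (p,q) block, and $b_{pq} = 0$, hence
$$a_{pp}^2 + a_{qq}^2 + 2a_{pq}^2 = b_{pp}^2 + b_{qq}^2$$
J is orthogonal ⇒ the Frobenius norms of A and B are the same:
$$\begin{aligned}
\mathrm{off}(B)^2 &= \|B\|_F^2 - \sum_i b_{ii}^2 = \|A\|_F^2 - \Big(\sum_{i \ne p,q} a_{ii}^2 + b_{pp}^2 + b_{qq}^2\Big) \\
&= \|A\|_F^2 - \sum_{i \ne p,q} a_{ii}^2 - (a_{pp}^2 + a_{qq}^2 + 2a_{pq}^2) \\
&= \|A\|_F^2 - \sum_i a_{ii}^2 - 2a_{pq}^2 = \mathrm{off}(A)^2 - 2a_{pq}^2 < \mathrm{off}(A)^2
\end{aligned}$$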
Elimination Sequence
Choose p and q such that $|a_{pq}|$ is very large (maximum). Then by $J^T A J$, $\mathrm{off}(A)^2$ is reduced by $2a_{pq}^2$. Repeat this transformation for the next choice of p and q: A → diagonal.
Different strategies for choosing a sequence of (p,q):
• Maximum $|a_{pq}|$: optimal, but sequential and expensive!
• Cyclic by row: first use $a_{11}$ to eliminate the first row: (p,q) = (1,2), (1,3), …, (1,n); then $a_{22}$ for the second row: (p,q) = (2,3), …, (2,n); then $a_{33}$, …, $a_{n-1,n-1}$. Repeat. Again sequential!
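The cyclic-by-row strategy can be sketched as follows. This is a minimal illustration, not an efficient implementation: the function names are hypothetical, the rotation matrix is formed in full for readability (a practical code would update only rows and columns p and q), and the rotation angle is taken from the condition $a_{pq}(c^2 - s^2) + (a_{pp} - a_{qq})cs = 0$ using the smaller root for $t = s/c$, which is a common but assumed choice.

```python
import numpy as np

def off(A):
    """Frobenius norm of the off-diagonal part of A."""
    return np.sqrt(np.sum((A - np.diag(np.diag(A))) ** 2))

def rotation(A, p, q):
    """Return c, s with a_pq (c^2 - s^2) + (a_pp - a_qq) c s = 0."""
    if A[p, q] == 0.0:
        return 1.0, 0.0
    tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
    # t = s/c solves t^2 + 2 tau t - 1 = 0; take the root of smaller magnitude.
    t = np.sign(tau) / (abs(tau) + np.hypot(1.0, tau)) if tau != 0.0 else 1.0
    c = 1.0 / np.hypot(1.0, t)
    return c, t * c

def jacobi_eigenvalues(A, tol=1e-12, max_sweeps=50):
    """Cyclic-by-row Jacobi method for a real symmetric matrix."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(max_sweeps):
        if off(A) <= tol * np.linalg.norm(A):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                c, s = rotation(A, p, q)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J      # annihilates a_pq and a_qp
    return np.sort(np.diag(A))
```

Each rotation reduces `off(A)**2` by $2a_{pq}^2$, so the loop terminates once the off-diagonal part is negligible relative to $\|A\|_F$.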
Jacobi Method in Parallel
Choose the sequence (p,q) such that it allows strong parallelism:
First sweep: (p,q) = (1,2), (3,4), (5,6), (7,8) (in parallel)
Second sweep: (p,q) = (1,4), (2,6), (3,8), (5,7)
Third sweep: (p,q) = (1,6), (4,8), (2,7), (3,5)
Fourth sweep: (p,q) = (1,8), (6,7), (4,5), (2,3)
...
[Figure: the indices 1, …, 8 arranged in two rows; the pairs are read off column by column, and the next pairing is obtained by keeping index 1 fixed and rotating the other indices by one position.]
Find a sequence of partitionings of (1, …, n) into pairs such that all indices appear with the same frequency.
n−1 different positions define (n−1)n/2 deleted entries = all subdiagonal entries.
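The pairings listed above can be generated by a round-robin rotation: keep index 1 fixed and shift the remaining indices by one position per step. The sketch below reproduces the four pairings given for n = 8; the function name `round_robin_pairs` is hypothetical, and n is assumed to be even.

```python
def round_robin_pairs(n):
    """Yield n-1 sets of n/2 disjoint index pairs (1-based), n even."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    top = list(range(1, n, 2))       # 1, 3, 5, ...
    bot = list(range(2, n + 1, 2))   # 2, 4, 6, ...
    for _ in range(n - 1):
        # Pairs are read off column by column; they are mutually disjoint,
        # so the corresponding rotations can be applied in parallel.
        yield [tuple(sorted(pair)) for pair in zip(top, bot)]
        # Keep top[0] fixed and rotate all other positions by one.
        top, bot = [top[0], bot[0]] + top[1:-1], bot[1:] + [top[-1]]
```

For n = 8 this yields seven pairings of four pairs each, which together cover all 28 index pairs exactly once.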
Parallel Transformation
[Figure: sketch of the entries modified by the transformation. $J^T A$ combines pairs of rows of A; $(J^T A) J$ combines the corresponding pairs of columns.]
Multiplications with JT , resp. J can be done in parallel.
[Figure: example for n = 4 with the three pairings (1,2),(3,4); (1,4),(2,3); (1,3),(2,4). Each sweep creates zeros at four off-diagonal positions.]
Twelve zeros after three sweeps = twelve nondiagonal entries.
Repeat until convergence to diagonal matrix.
8.3 Divide & Conquer for tridiagonal A
A divide and conquer approach for computing eigenvalues of a symmetric tridiagonal matrix T.
$$T = \begin{pmatrix} a_1 & b_1 & & \\ b_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{n-1} \\ & & b_{n-1} & a_n \end{pmatrix}$$
Idea: Split T in two tridiagonal matrices T1 and T2. Compute eigenvalues of T1 and T2. Recover the original eigenvalues of T as perturbations. Repeat recursively.
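Splitting of T
$$v := (0, \dots, 0, 1, \theta, 0, \dots, 0)^T \quad \text{(entries 1 and } \theta \text{ at positions } m \text{ and } m+1\text{)}$$
Set $\tilde T := T - \rho v v^T$.
Aim: Generate zeros at the sub/superdiagonal entries in the middle of T:
$$\tilde T(m{:}m{+}1,\, m{:}m{+}1) = \begin{pmatrix} a_m & b_m \\ b_m & a_{m+1} \end{pmatrix} - \rho \begin{pmatrix} 1 & \theta \\ \theta & \theta^2 \end{pmatrix} = \begin{pmatrix} a_m - \rho & b_m - \rho\theta \\ b_m - \rho\theta & a_{m+1} - \rho\theta^2 \end{pmatrix} \overset{!}{=} \begin{pmatrix} * & 0 \\ 0 & * \end{pmatrix}$$
$$\Rightarrow\ \rho\theta = b_m, \qquad T = \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix} + \rho v v^T$$
T is a rank-1 perturbation of the block diagonal matrix diag(T1, T2).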
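Relation between T and T1, T2
$$T_1 = U_1 \Lambda_1 U_1^T, \quad T_2 = U_2 \Lambda_2 U_2^T \quad \overset{?}{\Rightarrow} \quad T = U \Lambda U^T$$
Assume that we know the eigenvalues and eigenvectors of T1 and T2. How can we get the eigenpairs of T?
Note that T is a rank-1 perturbation of diag(T1, T2). Recover the original eigenvalues as perturbations of the eigenvalues of T1 and T2.
$$\begin{aligned}
T &= \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix} + \rho v v^T
= \begin{pmatrix} U_1 \Lambda_1 U_1^T & 0 \\ 0 & U_2 \Lambda_2 U_2^T \end{pmatrix} + \rho v v^T \\
&= \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix} \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix}^T + \rho v v^T \\
&= \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix} \left[ \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} + \rho \tilde v \tilde v^T \right] \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix}^T,
\qquad \tilde v := \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix}^T v
\end{aligned}$$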
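Computing the eigenvector
Hence, we need to compute the eigenvalues of a matrix of the form "diagonal + rank-1": $D + \rho \tilde v \tilde v^T$.
Let $\lambda_i$ and $u_i$ be an eigenpair of $D + \rho \tilde v \tilde v^T$. Then it holds
$$(D + \rho \tilde v \tilde v^T)\, u_i = \lambda_i u_i \iff (D - \lambda_i I)\, u_i + \rho \tilde v\, (\tilde v^T u_i) = 0$$
$$\Rightarrow\ u_i = \mathrm{const} \cdot (D - \lambda_i I)^{-1} \tilde v$$
Hence, if we know $\lambda_i$, then we directly get the eigenvector $u_i$.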
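Eigenvalues as Zeros
Furthermore, we get the equation
$$\begin{aligned}
& \tilde v^T (D - \lambda_i I)^{-1} \cdot \big[ (D - \lambda_i I)\, u_i + \rho \tilde v\, (\tilde v^T u_i) \big] = 0 \\
\Rightarrow\ & \tilde v^T u_i + \rho\, \tilde v^T (D - \lambda_i I)^{-1} \tilde v \cdot (\tilde v^T u_i) = 0 \\
\Rightarrow\ & f(\lambda) := 1 + \rho\, \tilde v^T (D - \lambda I)^{-1} \tilde v = 1 + \rho \left( \frac{\tilde v_1^2}{d_1 - \lambda} + \dots + \frac{\tilde v_n^2}{d_n - \lambda} \right) = 0 \quad \text{(for } \tilde v^T u_i \ne 0\text{)}
\end{aligned}$$
Use Newton's method to determine the zeros of the function $f(\lambda)$. These zeros are the eigenvalues of $D + \rho \tilde v \tilde v^T$ and therefore also of T. Repeat recursively for T1 and T2.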
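The zeros of $f(\lambda)$ and the eigenvector formula $u_i = \mathrm{const} \cdot (D - \lambda_i I)^{-1} \tilde v$ can be tried out with the following sketch. It makes several assumptions that are not in the slides: $\rho > 0$, strictly increasing $d_i$, and all $\tilde v_i \ne 0$, so that exactly one zero lies in each interval $(d_i, d_{i+1})$ and one in $(d_n, d_n + \rho \|\tilde v\|^2]$. It also swaps Newton's method for plain bisection, which is slower but needs no safeguarding near the poles. Function names are hypothetical.

```python
import numpy as np

def secular_roots(d, v, rho, tol=1e-14):
    """Zeros of f(lam) = 1 + rho * sum(v_i^2 / (d_i - lam)) by bisection."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)

    def f(lam):
        return 1.0 + rho * np.sum(v**2 / (d - lam))

    # Right interval ends: the next pole, or d_n + rho * ||v||^2 for the last zero.
    upper = np.append(d[1:], d[-1] + rho * (v @ v))
    roots = []
    for lo, hi in zip(d, upper):
        # f increases from -inf to a non-negative value on (lo, hi].
        while hi - lo > tol * max(1.0, abs(hi)):
            mid = 0.5 * (lo + hi)
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return np.array(roots)

def rank_one_eigenvectors(d, v, lam):
    """Column i is the normalized vector (D - lam_i I)^{-1} v."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    U = v[:, None] / (d[:, None] - lam[None, :])
    return U / np.linalg.norm(U, axis=0)
```

Since every zero is bracketed by its own interval, the n bisections are independent of each other and could be carried out in parallel.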
Zeros and poles of f(λ):
[Figure: graph of f(λ) with the poles d1, …, d5 and the zeros λ1, …, λ4 marked on the λ-axis.]
8.4 Algorithms for computing a few eigenpairs:
Vector iteration: computes the eigenvector belonging to the eigenvalue with maximum absolute value:
$$x^{(k)} = \frac{A^k x^{(0)}}{\|A^k x^{(0)}\|} \to v$$
Easy to parallelize (only Ax), but slow convergence! Only λmax!
Inverse iteration: Apply the vector iteration to the shifted problem $(A - \sigma I)^{-1}$ to compute the eigenvector belonging to the eigenvalue nearest to σ. Expensive! Ill-conditioned linear system!
Subspace Iteration: Apply the same idea to a set of vectors U(0) = (x(0), …, x(m)). Consider the eigenvalues of $U^{(k)H} A U^{(k)}$ and then replace $U^{(k)}$ by $A U^{(k)}$.
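Vector iteration and subspace iteration can be sketched as follows for a real symmetric matrix. Two details are assumptions added for numerical stability and are not stated in the slides: the iterate is renormalized in every step rather than forming $A^k x^{(0)}$ explicitly, and the block $A U^{(k)}$ is re-orthonormalized by a QR factorization. Function names are hypothetical.

```python
import numpy as np

def power_iteration(A, x0, steps=500):
    """Vector iteration; returns the Rayleigh quotient and the final vector."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        x = A @ x                    # the only operation involving A
        x = x / np.linalg.norm(x)
    return x @ A @ x, x

def subspace_iteration(A, U0, steps=500):
    """Iterate on a block of vectors, then solve the small projected problem."""
    U, _ = np.linalg.qr(U0)
    for _ in range(steps):
        U, _ = np.linalg.qr(A @ U)   # replace U by A U, kept orthonormal
    theta, W = np.linalg.eigh(U.T @ A @ U)   # eigenvalues of U^H A U
    return theta, U @ W
```

The returned `theta` approximates the eigenvalues of largest absolute value; the convergence speed depends on the ratios between neighboring eigenvalue magnitudes.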