We continue our informal study of domain decomposition methods (DDMs).

We will describe non-overlaping DDMs in terms of differential operators as opposed to basic linear algebra operations. This will later allow us to easily describe more sophisticated parallel algorithms such as the Dirichlet-Neuman and Neuman-Neuman algorithms.

In order to go one step further in that study it is necessary to introduce some PDE related theory and lingo. However I will try to keep it light and will just introduce what is necessary to build an intuition.

2D Poisson approximation as the Frankenstein monster

Last time we crafted an approximation, using the finite element method, of a function \(\varphi\) solution of the boundary problem

Refer to the first post regarding the choice of \(\Omega\) and the finite element space.

After delegating the assembly of the stiffness matrix and the load vector to the FreeFem++ software, we implemented the Schur complement method as a combination of unknown reordering and block Gaussian elimination applied to a global linear system.

In this part, I will describe how we can take into account the domain decompostion from the very begining, that is, in the formulation of the boundary problem itself.

Creating a monster

Let \(\Omega_{1}\) and \(\Omega_{2}\) the two non-overlaping subdomains introduced in the previous post and let \(\Gamma := \partial\Omega_{1} \cap \partial\Omega_{2}\), the boundary between \(\Omega_{1}\) and \(\Omega_{2}\). Let \(\varphi_{1}\) (respectively \(\varphi_{2}\)) a (local) solution of \(\eqref{eq:globalPDE}\) restricted to \(\Omega_{1}\) (respectively \(\Omega_{2}\)):

Question

Is solving \(\eqref{eq:subdomainsPDE}\) solving \(\eqref{eq:globalPDE}\) ?

The answer is no. Here are a couple of examples to give you an intuition of what could go wrong:

Conclusion

Clearly, we need to pay more attention to what’s going on around \(\Gamma\).

Trace, flux and transmission conditions

So for now, our stuff looks like this

Frankenstein

We need to add extra conditions to \(\eqref{eq:subdomainsPDE}\) in order to make the global solution more regular and eventualy satisfy \(\eqref{eq:globalPDE}\).

Trace

At the very least, the traces of the local solutions \(\varphi_{1}\) and \(\varphi_{2}\) on \(\Gamma\) should be equal in order to make the global solution continous:

Flux

Then we must ensure that the global solution is regular enough around the interface, that is, differentiable on \(\Gamma\). In order to do so we will consider the normal derivatives along \(\Gamma\) of the local solutions \(\varphi_{1}\) and \(\varphi_{2}\), also know as flux, and balance them as illustrated in the following sketch:

Flux transmission

This can be concisely written as:

Just remember that the directions of \(n_{1}\) and \(n_{2}\) depend on the point of \(\Gamma\) under consideration.

\(\eqref{eq:transmissionOfTrace}\) and \(\eqref{eq:transmissionOfFlux}\) are known as the transmission conditions.

Putting everything together we end up with a new boundary problem:

Question

Is solving \(\eqref{eq:subdomainsPDEWithTransmition}\) solving \(\eqref{eq:globalPDE}\) ?

This time, under certain assumptions regarding \(\Gamma\), the answer is yes but we’re not going to talk about that here.

Conclusion

We crafted a problem taking into account a 2 domains decomposition by introducing the transmission conditions: one condition related to the trace on \(\Gamma\), another condition related to the flux going through \(\Gamma\). Now, we’re left with a choice: either solve first for the trace or for the flux. But, before looking into that I would like to take the time to describe the main steps to translate this problem into linear algebra.

FEM for dummies

In this part I will try to explain as concisely as possible how to apply the FEM to \(\eqref{eq:globalPDE}\). But please keep in mind that it is oversimplified.

Weak formulation and discretization

So far, we never discussed which functional space the solution of \(\eqref{eq:globalPDE}\) should be chosen from. So let \(H\) a finite-dimensional vector space of functions defined on \(\Omega\) vanishing on \(\partial\Omega\). Let \(N:=dim(H)\) and \((e_{i})_{1 \leq i \leq N}\) a basis of \(H\). I don’t want to discuss the gory functional analysis details here so just consider that I chose \(H\) so that everything I’m about to write works just fine.

We will search \(\varphi\) in \(H\). In order to do so, we are going to design a “test” that \(\varphi\) must pass in order to be a solution of \(\eqref{eq:globalPDE}\). In order to design that test we choose a function \(\psi\) in \(H\). \(\psi\) is called a test function and we say that \(\varphi\) is tested against \(\psi\). These are the key steps:

Weak formulation

  1. Start with what must be satisfied:
  2. Multiply by a test function \(\psi\):
  3. Sum over the domain of definition:

  4. Apply the Green-Ostrogradsky theorem (which is a generalisation of the integration by parts we were taught back in highschool) to the sinistral side of the equation. Since \(\psi\) vanishes along \(\partial\Omega\), the line integral over \(\partial\Omega\) vanishes too (we take into account the boundary conditions):

  5. Let \(a: (\psi,\varphi) \mapsto \int_\Omega \nabla\psi\nabla\varphi dxdy\) and \(b: \psi \mapsto -\int_\Omega \psi dxdy\). It is important to remark that \(a\) (respectively \(b\)) is a bilinear (respectively linear) form. We can concisely re-write the previous equation:

Discretization

  1. Since we chose \(\psi\) in a finite-dimensional vector space, and since \(a\) and \(b\) are linear with respect to \(\psi\). Instead of checking that the previous equation holds for any \(\psi\), we just need to check that it holds for each basis vector \((e_{i})_{1 \leq i \leq N}\):

  2. Again, since we chose \(\varphi\) in a finite-dimensional vector space, we can uniquely decompose \(\varphi\) as a linear combination with respect to the basis \((e_{i})_{1 \leq i \leq N}\): \(\varphi\ = \sum\limits_{i=1}^{N} \varphi_{i} e_{i}\). Since \(a\) is linear with respect to \(\varphi\), it comes:

  3. Let \(A = (a(e_{i}, e_{j}))_{1 \leq i,j \leq N}\), \(x = (\varphi_{i})_{1 \leq i \leq N}\) and \(b = (b(e_{j}))_{1 \leq j \leq N}\). We can rewrite the previous equation as a product between a matrix and a vector :

So here we are, we started looking for \(\varphi \in H\) and we end up looking for \(x \in \mathbf{R}^{N}\). All we have to do now is to compute \(A\) and \(b\), and then solve for \(x\) (note that we could just compute the effect of multiplying a vector by \(A\) if we were to use an iterative solver as explained in the previous post).

Computation of the discrete operators

The next step is to compute

and but in order to do so, we must describe what are \(H\) and \((e_{i})_{1 \leq i \leq N}\).

So let \(H\) be the space of continuous piecewise linear functions over the triangulation of \(\Omega\) (introduced in the previous post) and vanishing on \(\partial \Omega\). The dimension of \(H\), \(N\), is equal to the number of vertices in the interior of \(\Omega\).

Introducing the tent functions

So what is a basis of \(H\) ? Let \((q_{i})_{1 \leq i \leq N}\) the vertices of the triangulation. Then, let’s choose \((e_{i})_{1 \leq i \leq N}\) to be the tent functions defined as follow:

and here is what a tent function looks like:

Any function in \(H\) can be uniquely decompossed as a linear combination of tent functions.

We got everything we need to compute \(A_{i,j}\) and \(b_{i}\). We just need to walk the triangles.

Computing \(b\)

Let’s start with the easy one. Given a triangle \(T_{i} = (q_{i_{1}},q_{i_{2}},q_{i_{3}})\) what are the contributions to \(b\) ? We got 3 contributions: one to \(b_{i_{1}}\) from \(e_{i_{1}}\), one to \(b_{i_{2}}\) from \(e_{i_{2}}\) and one to \(b_{i_{3}}\) from \(e_{i_{3}}\). And the contributions are equal to:

Now using this handy quadrature formula which is exact for affine function:

where \(|T_{i}|\) is the area of \(T_{i}\),

If \(k^- = 1 + ((k -2) \mod 3)\) and \(k^+ = 1 + (k \mod 3)\), we end up with:

Done ! Next !

Computing \(A\)

First of all, let’s remark that when \(q_{i}\) and \(q_{j}\) never belong to a same triangle then the supports of \(e_{i}\) and \(e_{j}\) are disjoint and therefore \(A_{i,j}\) is zero. Otherwise, given a triangle \(T_{i} = (q_{i_{1}},q_{i_{2}},q_{i_{3}})\), \(e_{i_{1}}\restriction_{T_{i}}\), \(e_{i_{2}}\restriction_{T_{i}}\) and \(e_{i_{3}}\restriction_{T_{i}}\) being affine, their gradiants, \(\nabla e_{i_{1}}\restriction_{T_{i}}\), \(\nabla e_{i_{2}}\restriction_{T_{i}}\) and \(\nabla e_{i_{3}}\restriction_{T_{i}}\) are constants. Therefore:

So we’re left with computing \(\nabla e_{i_{j}}\restriction_{T_{i}}, 1 \leq j \leq 3\). We will do that geometricaly on a sketch: Gradiant explanation

We end up with:

which is easy to compute.

Implementation

We got everything now. Here is a baby implementation of the method in Javascript/WebGL.

Sub-assembling

If we apply the same procedure, restricted to each subdomain and if we use the same unknown reordering techniques introduced in the previous post, we can assemble two linear (sub-)systems:

where

  • \(A^{(i)}_{II}\) is the block coupling the interior of domain \(i\) to itself;
  • \(A^{(i)}_{\Gamma\Gamma}\) is the block coupling the interface to itself BUT only considering the triangles of domain \(i\);
  • \(A^{(i)}_{\Gamma I}\) is the block coupling the interface to the interior of domain \(i\);
  • \(A^{(i)}_{I\Gamma}\) is the block coupling the interior of domain \(i\) to the interface

Now, given these two linear systems, we can try to write a discrete analog of \(\eqref{eq:subdomainsPDEWithTransmition}\).

A discrete analog of \(\eqref{eq:subdomainsPDE}\) is:

But we know that is not enough, we need to take care of the transmission conditions as well.

Discrete transmition conditions

Trace

Transmission of the trace, \(\eqref{eq:transmissionOfTrace}\), is simply written as:

Flux

Transmission of the flux, \(\eqref{eq:transmissionOfFlux}\), is trickier.

How do we evaluate \(\frac{\partial\varphi_i}{\partial n_i}, i=1,2\) ?

In order to do so we have to step back a little bit and re-write \(\eqref{eq:weakFormulation}\) for a subdomain:

So what we have is an expression of \( \psi \mapsto \oint_{\Gamma } \frac{\partial \varphi_i}{\partial n_i}\psi d\sigma \) the dual of \(\frac{\partial\varphi_i}{\partial n_i}\) in the weak topology of \(H_i\). Since I said we will skip all the functional analysis details, just consider that “the flux transmission condition works the same in the weak topology”.

Particularizing \(\eqref{eq:weakNeuman}\) only for \( \psi = e_{j}\) with \(q_{j}\) along \(\Gamma\) (otherwise the line integral is zero), a discrete analog of the “weak” flux is:

and then the discrete analog of the “weak” flux trasmition condition is:

Putting everything together we end up with the discrete analog of \(\eqref{eq:subdomainsPDEWithTransmition}\):

Conclusion

In that part I briefly described how to apply the FEM and to express discrete analog of the transmission conditions. In the next one, we will show how solving \(\eqref{eq:discreteSubdomainsPDEWithTransmition}\) for the trace first is equivalent to what was done in the previous post.

Trace first

Remember what we did last time after massaging the global linear system ?

We first solved

where \(\Sigma = A_{\Gamma\Gamma} - A^{(1)}_{\Gamma I}(A^{(1)}_{II})^{-1}A^{(1)}_{I\Gamma} - A^{(2)}_{\Gamma I}(A^{(2)}_{II})^{-1}A^{(2)}_{I\Gamma}\) and \(\tilde{b} = b_{\Gamma} - A^{(1)}_{\Gamma I}(A^{(1)}_{II})^{-1}b^{(1)}_{I} - A^{(2)}_{\Gamma I}(A^{(2)}_{II})^{-1}b^{(2)}_{I}\)

and then solved

Let’s see why it’s equivalent to solve \(\eqref{eq:discreteSubdomainsPDEWithTransmition}\) for the trace first.

Previous post revisited

The idea is to craft a “trace only” equation.

  1. From the first equation in \(\eqref{eq:discreteSubdomainsPDEWithTransmition}\) comes

  2. Injecting that expression of \(x^{(i)}_{I}\) in the flux transmission equation (the third equation in \(\eqref{eq:discreteSubdomainsPDEWithTransmition}\)) leads to

  3. Because of the trace transmission equation (the second equation in \(\eqref{eq:discreteSubdomainsPDEWithTransmition}\)) this is equivalent to

That’s it ! Same !

Implementation & Conclusion

See the previous post for the details :-)

Flux first

Neumann problem

This time we need to craft a “flux only” equation.

  1. Assemble two linear systems from \(\eqref{eq:weakNeuman}\), that is, taking into account the flux, as a Neumann boundary condition:

  2. Perform the same block Gaussian elimination we did in the previous post, in order to find an expression of \(x^{(i)}_{\Gamma}\):

    concisely written as

    where \(\Sigma^{(i)} := A^{(i)}_{\Gamma\Gamma} - A^{(i)}_{\Gamma I}(A^{(i)}_{II})^{-1}A^{(i)}_{I\Gamma}\) and \(\tilde{b}^{(i)} := b^{(i)}_{\Gamma} - A^{(i)}_{\Gamma I}(A^{(i)}_{II})^{-1}b^{(i)}_{I}\)

  3. Inject into the trace transmission condition equation:

  4. Apply the flux transmission condition:

    concisely written as

    where \(\tilde{\Sigma} := (\Sigma^{(1)})^{-1} + (\Sigma^{(2)})^{-1}\) and \(\tilde{\tilde{b}} := (\Sigma^{(2)})^{-1} \tilde{b}^{(2)} - (\Sigma^{(1)})^{-1}\tilde{b}^{(1)}\)

That’s it ! Here is the “flux only” equation. All we have to do now is solve for \(\lambda\), inject in \(\eqref{eq:discreteNeumann}\) and solve for each subdomain.

Implementation

Here is my implementation of the “flux first” method with sub-assembling:

  1 function schurComplementMethod2()
  2 
  3    [vertices, triangles, border, domains] = readMesh("poisson2D.mesh");
  4    interface = readInterface("interface2.vids");
  5 
  6    % Assemble local Dirichlet problems 
  7    % => could be done in parallel
  8    [A1, b1, vids1] = subAssemble(vertices, triangles, border, domains, interface, 1);
  9    [A2, b2, vids2] = subAssemble(vertices, triangles, border, domains, interface, 2);
 10 
 11    nbInterior1 = length(b1) - length(interface);
 12    nbInterior2 = length(b2) - length(interface);
 13 
 14    % LU-factorize the interior of the subdomains, we're going to reuse this everywhere 
 15    % => could be done in parallel
 16    [L1, U1, p1, tmp] = lu(A1(1:nbInterior1, 1:nbInterior1), 'vector');
 17    q1(tmp) = 1:length(tmp);
 18    [L2, U2, p2, tmp] = lu(A2(1:nbInterior2, 1:nbInterior2), 'vector');
 19    q2(tmp) = 1:length(tmp);
 20 
 21    % In order to solve for the flux first, we need to compute the corresponding second member
 22    % => could be done in parallel
 23    bt1 = computeBTildI(A1, b1, nbInterior1, L1, U1, p1, q1);
 24    bt2 = computeBTildI(A2, b2, nbInterior2, L2, U2, p2, q2);
 25 
 26    epsilon = 1.e-30;
 27    maxIter = 600
 28 
 29    % Each pcg contributing to the second member could be done in parallel
 30    btt = pcg(@(x) multiplyByLocalSchurComplement(A2, nbInterior2, L2, U2, p2, q2, x), ...
 31             bt2, epsilon, maxIter) ...
 32          - pcg(@(x) multiplyByLocalSchurComplement(A1, nbInterior1, L1, U1, p1, q1, x), ...
 33             bt1, epsilon, maxIter);
 34 
 35    % Solve for the flux => each nested pcg contribution could be done in parallel
 36    flux = pcg(@(x) pcg(@(y) multiplyByLocalSchurComplement(A1, nbInterior1, L1, U1, p1, q1, y), ...
 37             x, epsilon, maxIter) ...
 38          + pcg(@(y) multiplyByLocalSchurComplement(A2, nbInterior2, L2, U2, p2, q2, y), ...
 39             x, epsilon, maxIter), btt, epsilon, maxIter);
 40 
 41    % Add the computed Neumann data to the load vectors
 42    b1(nbInterior1 + 1 : end) += flux;
 43    b2(nbInterior2 + 1 : end) -= flux;
 44    % and solve the corresponding problems using a direct method
 45    % => could be done in parallel
 46    solution = zeros(length(vertices), 1);
 47    solution(vids1) = A1 \ b1;
 48    solution(vids2) = A2 \ b2;
 49 
 50    writeSolution("poisson2D.sol", solution);
 51 
 52 endfunction
 53 
 54 function [A, b, dVertexIds] = subAssemble(vertices, triangles, border, domains, interface, domainIdx)
 55    % Keep the triangles belonging to the subdomain
 56    dTriangles = triangles(find(domains == domainIdx),:);
 57    dVertexIds = unique(dTriangles(:));
 58 
 59    % Put the interface unknowns at the end, using the numbering we computed in the last post
 60    dVertexIds = [setdiff(dVertexIds, interface); interface];
 61 
 62    % Switch domain triangles to local vertices numbering
 63    % Should use a hashtable here but their is none in octave ...
 64    globalToLocal = zeros(length(vertices), 1);
 65    globalToLocal(dVertexIds) = 1:length(dVertexIds);
 66    dTriangles(:) = globalToLocal(dTriangles(:));
 67 
 68    % Assemble the Dirichlet problem
 69    [A, b] = assemble(vertices(dVertexIds, :), dTriangles, border(dVertexIds));
 70 endfunction
 71 
 72 function [A, b] = assemble(vertices, triangles, border)
 73    nvertices = length(vertices);
 74    ntriangles = length(triangles);
 75 
 76    iis = [];
 77    jjs = [];
 78    vs = [];
 79 
 80    b = zeros(nvertices, 1);
 81    for tid=1:ntriangles
 82       q = zeros(3,2);
 83       q(1,:) = vertices(triangles(tid, 1), :) - vertices(triangles(tid, 2), :);
 84       q(2,:) = vertices(triangles(tid, 2), :) - vertices(triangles(tid, 3), :);
 85       q(3,:) = vertices(triangles(tid, 3), :) - vertices(triangles(tid, 1), :);
 86       area = 0.5 * det(q([1,2], :));
 87       for i=1:3
 88          ii = triangles(tid,i);
 89          if !border(ii)
 90             for j=1:3
 91                jj = triangles(tid,j);
 92                if !border(jj)
 93                   hi = q(mod(i, 3) + 1, :);
 94                   hj = q(mod(j, 3) + 1, :);
 95                   
 96                   v = (hi * hj') / (4 * area);
 97 
 98                   iis = [iis ii];
 99                   jjs = [jjs jj];
100                   vs = [vs v];
101                end
102             end
103             b(ii) += -area / 3;
104          end
105       end
106    end
107    A = sparse(iis, jjs, vs, nvertices, nvertices, "sum");
108 endfunction
109 
110 function [bti] = computeBTildI(A, b, nbInterior, L, U, p, q)
111    bti = b(nbInterior + 1 : end) ...
112       - A(nbInterior + 1 : end, 1:nbInterior) * (U \ (L \ b(1:nbInterior)(p)))(q);
113 endfunction
114 
115 function [res] = multiplyByLocalSchurComplement(A, nbInterior, L, U, p, q, x)
116    res = A(nbInterior + 1 : end, nbInterior + 1 : end) * x ...
117       - A(nbInterior + 1 : end, 1:nbInterior) * (U \ (L \ (A(1:nbInterior, nbInterior + 1 : end) * x)(p)))(q);
118 endfunction
119 
120 function [vertices, triangles, border, domains] = readMesh(fileName)
121 
122    fid = fopen (fileName, "r");
123 
124    % Vertices
125    fgoto(fid, "Vertices");
126    
127    [nvertices] = fscanf(fid, "%d", "C");
128    vertices = zeros(nvertices, 2);
129    for vid=1:nvertices
130       [vertices(vid, 1), vertices(vid, 2)] = fscanf(fid, "%f %f %d\n", "C");
131    end
132 
133    % Edges
134    border = zeros(nvertices, 1);
135    fgoto(fid, "Edges");
136    [nedges] = fscanf(fid, "%d", "C");
137    for eid=1:nedges
138       [v1, v2] = fscanf(fid, "%d %d %d\n", "C");
139       border(v1) = 1;
140       border(v2) = 1;
141    end
142 
143    % Elements
144    fgoto(fid, "Triangles");
145    [ntriangles] = fscanf(fid, "%d", "C");
146    triangles = zeros(ntriangles, 3);
147    domains = zeros(ntriangles, 1);
148    for tid=1:ntriangles
149       [triangles(tid, 1), triangles(tid, 2), triangles(tid, 3), domains(tid)] = fscanf(fid, "%d %d %d %d\n", "C");
150    end
151 
152    fclose(fid);
153 endfunction
154 
155 function [interface] = readInterface(fileName)
156    fid = fopen (fileName, "r");
157    interface = [];
158    while (l = fgetl(fid)) != -1
159       [vid] = sscanf(l, "%d\n", "C");
160       interface = [interface; vid];
161    end
162    fclose(fid);
163 end
164 
165 function [] = fgoto(fid, tag)
166    while !strcmp(fgetl(fid),tag)
167    end
168 endfunction
169 
170 function [] = writeSolution(fileName, solution)
171    fid = fopen (fileName, "w");
172       fprintf(fid, "MeshVersionFormatted 1\n\nDimension 2\n\nSolAtVertices\n%d\n1 1\n\n", length(solution));
173       for i=1:length(solution)
174          fprintf(fid, "%e\n", solution(i));
175       end
176       fclose(fid);
177 endfunction

Conclusion

Solving for the flux first is a bit more involved and less efficient since we have to deal with the inverse of the local Schur complement. I believe that in practice this method is never used. Nevertheless introducing the flux and the Neumann problems is a necessary step to understand the Dirichlet-Neuman and Neuman-Neuman algorithms. Someday I will try to post about that, but it will be later.

Conclusion

The next time, I would like to experiment with something more ambitious: I will try to solve the mild-slope equation in the Lyttelton port using more than two domains decompostion (trace first). I will try to use fancy stuff like MPI and GPU linear algebra libraries. Stay tuned !


View Larger Map



blog comments powered by Disqus

Published

18 December 2013

Category

Numerical methods

Tags