Page 1

Challenges in Fully Generating Multigrid Solvers for the Simulation of non-Newtonian Fluids

Sebastian Kuckuk

FAU Erlangen-Nürnberg

18.01.2016

HiStencils 2016, Prague, Czech Republic

Page 2

Outline

Page 3

Outline

● Scope and Motivation

● Project ExaStencils

● From Poisson to Fluid Dynamics

● 1st Challenge – Code Generation

● 2nd Challenge – Optimization

● 3rd Challenge – Parallelization

● Preliminary Results

● Conclusion and Outlook

Page 4

Scope and Motivation

Page 5

PDEs

● Goal: solve a partial differential equation approximately by solving a discretized form

[Figures: Optical Flow (LIC Visualization); Particles in Flow]

$\Delta u = f$ in $\Omega$, $u = g$ on $\partial \Omega$

$A_h u_h = f_h$
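To make the discretization step concrete, a minimal sketch (plain Scala for illustration, not ExaStencils output; assumes a 2D five-point finite-difference discretization with uniform mesh width h and Dirichlet values stored in u itself):

object DiscretePoisson {
  // Apply the discrete operator A_h to u on all interior points.
  def applyA(u: Array[Array[Double]], h: Double): Array[Array[Double]] = {
    val n = u.length
    val au = Array.ofDim[Double](n, n)
    for (i <- 1 until n - 1; j <- 1 until n - 1)
      au(i)(j) = (u(i - 1)(j) + u(i + 1)(j) + u(i)(j - 1) + u(i)(j + 1)
        - 4.0 * u(i)(j)) / (h * h)
    au
  }
}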

Page 6

Multigrid

[Two-grid diagram: smoothing of high-frequency errors; coarsened representation of low-frequency errors]

Page 7

Multigrid V-Cycle

[V-cycle diagram: smoothing, restriction, coarse grid solving, prolongation & correction]
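The control flow the diagram encodes, as a minimal 1D sketch (plain Scala, illustrative only; assumes $-u'' = f$ with homogeneous Dirichlet boundaries, grid sizes of $2^k + 1$, a damped Gauss-Seidel smoother, full-weighting restriction, and linear interpolation):

object VCycleSketch {
  // Damped Gauss-Seidel smoother for -u'' = f on a uniform 1D grid.
  def smooth(u: Array[Double], f: Array[Double], h: Double, steps: Int): Unit =
    for (_ <- 0 until steps; i <- 1 until u.length - 1)
      u(i) += 0.8 * (0.5 * (u(i - 1) + u(i + 1) + h * h * f(i)) - u(i))

  def residual(u: Array[Double], f: Array[Double], h: Double): Array[Double] = {
    val r = new Array[Double](u.length)
    for (i <- 1 until u.length - 1)
      r(i) = f(i) - (2.0 * u(i) - u(i - 1) - u(i + 1)) / (h * h)
    r
  }

  def vCycle(u: Array[Double], f: Array[Double], h: Double): Unit = {
    if (u.length <= 3) { smooth(u, f, h, 50); return } // coarse grid "solve"
    smooth(u, f, h, 2)                                 // pre-smoothing
    val r = residual(u, f, h)
    val nc = (u.length + 1) / 2
    val rc = Array.tabulate(nc)(i =>                   // full-weighting restriction
      if (i == 0 || i == nc - 1) 0.0
      else 0.25 * (r(2 * i - 1) + 2.0 * r(2 * i) + r(2 * i + 1)))
    val ec = new Array[Double](nc)
    vCycle(ec, rc, 2.0 * h)                            // recurse on the error equation
    for (i <- 1 until u.length - 1)                    // linear prolongation & correction
      u(i) += (if (i % 2 == 0) ec(i / 2) else 0.5 * (ec(i / 2) + ec(i / 2 + 1)))
    smooth(u, f, h, 2)                                 // post-smoothing
  }
}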

Page 8

Computational Grids

● Regular grids

● Block-structured grids → deferred for now


Page 9

Main Focus: HPC

● Highly optimized and highly scalable geometric multigrid solvers

● ‘Traditional’ supercomputers like JUQUEEN and SuperMUC

● Alternative architectures like Piz Daint and Mont Blanc


Page 10

Project ExaStencils

Page 11

Project ExaStencils

• Sebastian Kuckuk
• Harald Köstler
• Ulrich Rüde
• Christian Schmitt
• Frank Hannig
• Jürgen Teich
• Hannah Rittich
• Matthias Bolten
• Alexander Grebhahn
• Sven Apel
• Stefan Kronawitter
• Armin Größlinger
• Christian Lengauer

Page 12

Project ExaStencils

● Multi-layered DSL for the specification of
  ● PDEs
  ● discretizations
  ● (MG) solver components
  ● parallel behavior and application specifics
● Fully automatic code generation fueled by our Scala framework
● Automatic application of low-level optimizations
● Powerful prediction of performance characteristics based on
  ● Local Fourier Analysis (LFA) and
  ● Software Product Line (SPL) technology

Page 13

From Poisson to Fluid Dynamics

Page 14

Overall Goal

● Simulation of non-isothermal / non-Newtonian fluid flows
● Suspensions of particles or macromolecules
  ● e.g. pastes, gels, foams, drilling fluids, food products, blood, etc.
● Important in the mining, chemical and food industries as well as in medical applications

https://www.youtube.com/watch?v=G1Op_1yG6lQ

Page 15

Overall Goal cont'd

Goal: fully generate a replica of the source code for "Parallel finite volume method simulation of three-dimensional fluid flow and convective heat transfer for viscoplastic non-Newtonian fluids", D. A. Vasco, N. O. Moraga and G. Haase

● FORTRAN90 code
● Finite volume discretization
● Staggered grid
● SIMPLE algorithm
● TDMA solvers
● OpenMP parallelization

Page 16

Governing Equations (w/o BC)

Poisson:

$\Delta u = f$

NNF:

$-\nabla^T (H \nabla v) + D \, (v^T \cdot \nabla) \, v + \nabla p + D \, (0, \theta, 0)^T = 0$

$\nabla^T v = 0$

$-\nabla^T (\nabla \theta) + G \, (v^T \cdot \nabla) \, \theta = 0$

with $G$, $D$ dependent on physical properties and the temperature-dependent density, and the viscosity $H = \Gamma(v)$ dependent on physical properties and computed w.r.t. various models

Page 17

The SIMPLE Algorithm

● Semi-Implicit Method for Pressure Linked Equations

● Concept:

1. Solve for each of the three velocity components
2. Solve for the pressure correction
3. Apply the pressure correction
4. Update properties

Page 18

The SIMPLE Algorithm

● Temperature can be added as a separate step

1. Solve for each of the three velocity components
2. Solve for the pressure correction
3. Apply the pressure correction
4. Solve for temperature
5. Update properties
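A structural sketch of one such outer iteration (plain Scala with illustrative stub names, not ExaSlang; the stubs only mark where the individual solves happen):

object SimpleSketch {
  class FlowState // placeholder for velocities, pressure, temperature, properties

  def solveMomentum(s: FlowState, component: String): Unit = () // one linear solve per component
  def solvePressureCorrection(s: FlowState): Unit = ()          // Poisson-type solve for p'
  def applyPressureCorrection(s: FlowState): Unit = ()          // correct p and velocities
  def solveTemperature(s: FlowState): Unit = ()                 // energy equation (added step)
  def updateProperties(s: FlowState): Unit = ()                 // density, viscosity H = Γ(v), ...

  // One SIMPLE iteration, including the separate temperature step.
  def simpleIteration(s: FlowState): Unit = {
    for (c <- Seq("u", "v", "w")) solveMomentum(s, c)
    solvePressureCorrection(s)
    applyPressureCorrection(s)
    solveTemperature(s)
    updateProperties(s)
  }
}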

Page 19

Staggered grid

● Finite volume discretization on a staggered grid

[Grid figure legend:]
● values associated with the x-staggered grid, e.g. U
● values associated with the y-staggered grid, e.g. V
● values associated with the cell centers, e.g. p and θ
● control volumes associated with cell-centered values
● control volumes associated with x-staggered values
● control volumes associated with y-staggered values
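In terms of storage, staggering mainly changes the field extents; a minimal 2D sketch (plain Scala; the exact extents are an assumption, following the usual MAC-style staggering, and ghost layers are omitted):

object StaggeredLayout {
  val (nx, ny) = (8, 8)                       // number of pressure cells
  val p = Array.ofDim[Double](nx, ny)         // cell centers: p and θ
  val u = Array.ofDim[Double](nx + 1, ny)     // x-staggered: on vertical cell faces
  val v = Array.ofDim[Double](nx, ny + 1)     // y-staggered: on horizontal cell faces
}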

Page 20

Differences to Poisson

● From model problem to application

Poisson: $\Delta u = f$

NNF: $-\nabla^T (H \nabla v) + D \, (v^T \cdot \nabla) \, v + \nabla p + D \, (0, \theta, 0)^T = 0$, $\nabla^T v = 0$, $-\nabla^T (\nabla \theta) + G \, (v^T \cdot \nabla) \, \theta = 0$

Poisson vs. NNF:

● Linear vs. non-linear
● Scalar PDE (one unknown) vs. 3 velocity components, pressure and temperature with varying localizations
● Simple boundary conditions vs. mixed boundary conditions
● Finite differences vs. finite volumes
● One grid vs. staggered grids
● Uniform spacing vs. local refinement

Page 21

1st Challenge – Code Generation

Page 22

Extensions

Necessary preparations:

● Interface from/to FORTRAN
  ● data and functions
  ● allows replacing single portions of the code, one step at a time
● Data layouts for staggered grids
  ● requires, apart from the language extension itself,
    ● adapted field layouts
    ● new communication routines
    ● specialized boundary conditions
    ● (refined coarsening and interpolation stencils)

Page 23

Porting code

● Straightforward for simple kernels

FORTRAN90 original:

subroutine advance_fields ()
  !$omp parallel do &
  !$omp private(i,j,k) &
  !$omp firstprivate(l1,m1,n1) &
  !$omp shared(rho,rho0) &
  !$omp schedule(static) default(none)
  do k=1,n1
    do j=1,m1
      do i=1,l1
        rho0(i,j,k)=rho(i,j,k)
      end do
    end do
  end do
  !$omp end parallel do
end subroutine advance_fields

DSL counterpart:

Function AdvanceFields@finest () : Unit {
  loop over rho@current {
    rho[next]@current = rho[active]@current
  }
  advance rho@current
}

Page 24

Porting code

● But what about more complicated code? How do we get from this …

! if not at the boundary
fl  = xcvi(i)   * v(i,jp,k)  * (fy(jp)*rho(i,jp,k)  + fym(jp)*rho(i,j,k))
flm = xcvip(im) * v(im,jp,k) * (fy(jp)*rho(im,jp,k) + fym(jp)*rho(im,j,k))
flownu = zcv(k) * (fl+flm)
gm  = xcvi(i)   * vis(i,j,k)  * vis(i,jp,k)  / (ycv(j)*vis(i,jp,k)  + ycv(jp)*vis(i,j,k)  + 1.e-30)
gmm = xcvip(im) * vis(im,j,k) * vis(im,jp,k) / (ycv(j)*vis(im,jp,k) + ycv(jp)*vis(im,j,k) + 1.e-30)
diff = 2. * zcv(k) * (gm+gmm)
call diflow(flownu,diff,acof)
adc = acof + max(0.,flownu)
anu(i,j,k) = adc - flownu

Page 25

Porting code

● … to this?

flownu@current = integrateOverXStaggeredNorthFace (
  v[active]@current * rho[active]@current )

Val diffnu : Real = integrateOverXStaggeredNorthFace (
    evalAtXStaggeredNorthFace ( vis@current, "harmonicMean" ) )
  / vf_stagCVWidth_y@current@[0, 1, 0]

AuStencil@current:[0, 1, 0] = -1.0 * ( calc_diflow ( flownu@current, diffnu )
  + max ( 0.0, flownu@current ) - flownu@current )

Page 26

Extended boundary conditions

Function ApplyBC_u@finest ( ) : Unit {
  loop over u@current only duplicate [ 1, 0] on boundary {
    u@current = 0.0
  }
  loop over u@current only duplicate [-1, 0] on boundary {
    u@current = 0.0
  }
  loop over u@current only duplicate [ 0, 1] on boundary {
    u@current = wall_velocity
  }
  loop over u@current only duplicate [ 0, -1] on boundary {
    u@current = -1.0 * wall_velocity
  }
}

Field Solution < global, DefNodeLayout, ApplyBC_u >@finest

Page 27

Virtual fields

● Rework and extension of the way geometric information is used
● Many virtual fields (positions of (staggered) nodes / faces / cells, interpolation and integration parameters, etc.)
● Tradeoff between on-the-fly calculation and explicit storage

// previous access through functions doesn't allow offset access
rhs@current = sin ( nodePosition_x@current ( ) )

// newly introduced virtual fields allow virtually identical behavior ...
rhs@current = sin ( vf_nodePosition_x@current )

// ... and in addition allow offset accesses
Val dif : Real = (
    vf_nodePosition_x@current@[ 1, 0, 0]
  - vf_nodePosition_x@current@[ 0, 0, 0] )

Page 28

Specialized functions

● Specialized evaluation and integration functions

// evaluate with respect to cells of the grid
evalAtSouthFace ( rho[active]@current )

// integrate expressions across faces of grid cells
integrateOverEastFace (
  u[active]@current * rho[active]@current )

// integrate expressions across faces of cells of the staggered grid
integrateOverXStaggeredEastFace (
  u[active]@current * rho[active]@current )

// integrate expressions using specialized interpolation schemes
integrateOverZStaggeredEastFace (
  evalAtZStaggeredEastFace ( vis@current, "harmonicMean" ) )
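For reference, a plausible reading of what the "harmonicMean" evaluation computes at a face (a plain Scala sketch of distance-weighted harmonic interpolation, mirroring the vis*vis / (ycv*vis + ycv*vis + 1.e-30) pattern in the Fortran kernel above; the exact scaling is an assumption):

object FaceEval {
  // Distance-weighted harmonic mean of a cell-centered quantity at the face
  // between two cells of widths hL and hR; the small epsilon guards against
  // division by zero, as in the reference code.
  def harmonicMean(visL: Double, visR: Double, hL: Double, hR: Double): Double =
    (hL + hR) * visL * visR / (hL * visR + hR * visL + 1.0e-30)
}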

Page 29

2nd Challenge – Optimization

Page 30

Optimization of Solver Components

● Very similar to what we usually do (when using SIMPLE)
● 7-point stencils with variable coefficients
● ‘Standard’ options for low-level optimization are viable (a blocking sketch follows below):
  ● address pre-calculation
  ● (temporal and spatial) blocking
  ● vectorization
  ● polyhedral loop transformations
  ● etc.
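To illustrate the blocking option from the list above: a minimal sketch of a spatially blocked 7-point stencil sweep (plain Scala, illustrative only; the generator applies such loop transformations automatically):

object BlockedSweep {
  // One sweep over an n³ grid stored in flat arrays; the j and k loops are
  // tiled with block size bs to improve cache reuse.
  def sweep(dst: Array[Double], src: Array[Double], n: Int, bs: Int): Unit = {
    def idx(i: Int, j: Int, k: Int) = (i * n + j) * n + k
    for (jb <- 1 until n - 1 by bs; kb <- 1 until n - 1 by bs; // blocked outer loops
         i  <- 1 until n - 1;
         j  <- jb until math.min(jb + bs, n - 1);
         k  <- kb until math.min(kb + bs, n - 1))
      dst(idx(i, j, k)) = (src(idx(i - 1, j, k)) + src(idx(i + 1, j, k))
        + src(idx(i, j - 1, k)) + src(idx(i, j + 1, k))
        + src(idx(i, j, k - 1)) + src(idx(i, j, k + 1))) / 6.0
  }
}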

Page 31

Performance distribution

● Rough breakdown of time spent

● Preliminary test with 64³ cells, without applied optimizations

[Pie charts: share of time spent in LSE update, solve, and property update, broken down for velocity u, velocity v, velocity w, pressure correction, and temperature]

Page 32

Optimization of LSE Updates

● Has to be redone many times → costly
● No temporal blocking
● Reuse of some subexpressions is possible due to symmetry
● Vectorization is challenging (but possible)

Page 33

Other Open Points

● Choice of discretization specifics / grid spacing / numerical components is crucial
● Representation of the LSEs on coarser levels
● Optimization of the solver is considerably harder when handling all components at once (see the sketch below):
  ● multiple unknowns are combined in vectors
  ● stencil coefficients become matrices (e.g. 8x8)
  ● inverting the center coefficient becomes a matrix inversion
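A minimal sketch of that last point (plain Scala): with a k×k matrix as center coefficient, the pointwise division in a smoother turns into a small dense solve, here plain Gaussian elimination without pivoting (illustrative, not the generator's actual kernel):

object CenterBlockSolve {
  // Solve a * x = b in place for one small dense center block
  // (a is k×k, b has length k); forward elimination, then back substitution.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val k = b.length
    for (col <- 0 until k; row <- col + 1 until k) {
      val f = a(row)(col) / a(col)(col)
      for (c <- col until k) a(row)(c) -= f * a(col)(c)
      b(row) -= f * b(col)
    }
    val x = new Array[Double](k)
    for (row <- k - 1 to 0 by -1) {
      var s = b(row)
      for (c <- row + 1 until k) s -= a(row)(c) * x(c)
      x(row) = s / a(row)(row)
    }
    x
  }
}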

Page 34

3rd Challenge – Parallelization

Page 35

Parallelization

● Shared memory parallelism comes for free
  ● OpenMP code is emitted by our generator without further effort
● Distributed memory parallelism is also manageable
  ● communication patterns and routines are available
  ● points of communication have to be annotated (to be automated in the future)
● More difficult: domain partitioning of non-uniform grids
  => the approach from the reference work is strictly serial

Page 36

Domain Partitioning

● Easy for regular domains (see the sketch below)

● Each domain consists of one or more blocks
● Each block consists of one or more fragments
● Each fragment consists of several data points / cells
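A minimal sketch of such a regular partitioning (plain Scala with illustrative names; one fragment per block, which is the simplest configuration):

object Partition {
  // Cell range owned by one fragment; x1 and y1 are exclusive upper bounds.
  case class Fragment(x0: Int, x1: Int, y0: Int, y1: Int)

  // Split a domain of nx × ny cells into bx × by blocks of (nearly) equal size.
  def partition(nx: Int, ny: Int, bx: Int, by: Int): Seq[Fragment] =
    for (i <- 0 until bx; j <- 0 until by)
      yield Fragment(i * nx / bx, (i + 1) * nx / bx,
                     j * ny / by, (j + 1) * ny / by)
}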

Page 37

Parallelization

● Communication statements are added automatically when transforming Layer 3 to Layer 4, where they may be reviewed or adapted

/* communicates all applicable layers */
communicate Solution@current

/* communicates only ghost layers */
communicate ghost of Solution[active]@current

/* communicates duplicate and first two ghost layers */
communicate dup, ghost[0, 1] of Solution[active]@current

/* asynchronous communicate */
begin communicate Residual@current
// ...
finish communicating Residual@current

Page 38

Preliminary results

Page 39

Preliminary results

● Initial version without the non-Newtonian model
● Comparison is difficult due to the different exit criteria, e.g.
  ● Solving the single components
    ● Reference: fixed to 1 solve step (TDMA)
    ● Our implementation: at least one step (RBGS or GMG)
  ● SIMPLE
    ● Reference: a certain percentage of all cells (including boundaries) doesn't change beyond a fixed threshold
    ● Our implementation: convergence for all components is achieved after updating the LSE but before starting the solve routine

$\|r\| \le \alpha \, (1 + \beta \, \|b\|), \qquad \|r_{i+1} - r_i\| < \alpha$
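A small sketch of these two criteria as reconstructed above (plain Scala; the choice of norm and how the criteria are combined are assumptions):

object Convergence {
  // ||r|| <= alpha * (1 + beta * ||b||): residual small relative to the RHS;
  // ||r_{i+1} - r_i|| < alpha: residual change has stagnated.
  def converged(rNorm: Double, bNorm: Double, rNormPrev: Double,
                alpha: Double, beta: Double): Boolean =
    rNorm <= alpha * (1.0 + beta * bNorm) ||
    math.abs(rNorm - rNormPrev) < alpha
}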

Page 40

Preliminary results

● One socket of our local cluster (Intel Xeon E7-4830)

● 16 OpenMP threads

● 100 timesteps

[Plot: execution time per timestep [s], log scale from 0.01 to 1000, over problem sizes 16³, 32³, 64³ and 128³; series: Reference, RBGS, Multigrid]

Page 41

Conclusion and Outlook

Page 42

Conclusion

● Generating solvers for simulating non-Newtonian fluids: 95 %
● Optimization: 30 %
● Parallelization: 75 %

Page 43

Outlook

● Enhanced performance optimizations

● Comparison with analytical models

● Study of parallel performance characteristics

● DSL extensions for relevant concepts

● More complex numerical components
● (Multi-) GPU support

Page 44

Thank you for your attention!

Questions?