Automatic Code Generation for Simulating Non-Newtonian Fluid Flows with ExaStencils
Sebastian Kuckuk, Harald Köstler
FAU Erlangen-Nürnberg
15.04.2016
SIAM PP, Paris, France
Outline
● Scope and Motivation
● Project ExaStencils
● Application – CFD
● Results
● Conclusion and Outlook
Scope and Motivation
Solving PDEs
● Goal: Solve a partial differential equation approximately by solving a discretized form
Figures: Optical Flow (LIC Visualization); Particles in Flow
$\Delta u = f$ in $\Omega$, $u = g$ on $\partial\Omega$
$A u_h = f_h$
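As an illustration of the step from the continuous problem to the discretized form $A u_h = f_h$, here is a minimal sketch for the 1D analogue $u'' = f$ on $(0, 1)$ with Dirichlet values, using second-order finite differences. The function names, the 1D setting and the direct tridiagonal solve are assumptions made for this example and are not taken from the talk.

```python
def assemble_poisson_1d(n, f, g_left, g_right):
    """Assemble A u_h = f_h for u'' = f on (0, 1) with n interior points."""
    h = 1.0 / (n + 1)
    lower = [1.0 / h**2] * n   # sub-diagonal (lower[0] is unused)
    diag = [-2.0 / h**2] * n   # main diagonal
    upper = [1.0 / h**2] * n   # super-diagonal (upper[-1] is unused)
    rhs = [f((i + 1) * h) for i in range(n)]
    # Known boundary values move to the right-hand side.
    rhs[0] -= g_left / h**2
    rhs[-1] -= g_right / h**2
    return lower, diag, upper, rhs


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: forward elimination followed by back substitution."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u
```

With `f = 2`, `g_left = 0` and `g_right = 1` the exact solution is $u = x^2$, which this stencil reproduces up to rounding.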
Restriction – Computational Grids
● Regular
● Can be non-uniform
● Can be patch-based
Restriction – Algorithm
● (Geometric) Multigrid
Smoothing of high-frequency errors
Coarsened representation of low-frequency errors
Multigrid V-Cycle
Figure: V-cycle diagram with the stages Smoothing, Restriction, Coarse Grid Solving, and Prolongation & Correction
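The V-cycle stages can be sketched for the 1D model problem $u'' = f$ with homogeneous Dirichlet boundaries. This is a minimal illustration, not the solver generated by ExaStencils: the weighted Jacobi smoother, full-weighting restriction, linear interpolation and all function names are assumptions chosen for brevity.

```python
import math


def residual(u, f, h):
    """r = f - A u for the 1D stencil (1, -2, 1) / h^2; boundary entries stay 0."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2
    return r


def smooth(u, f, h, steps, omega=2.0 / 3.0):
    """Weighted Jacobi: damps the high-frequency error components."""
    for _ in range(steps):
        r = residual(u, f, h)
        # The diagonal entry of A is -2 / h^2.
        u = [u[i] + omega * r[i] / (-2.0 / h**2) for i in range(len(u))]
    return u


def restrict(r):
    """Full weighting from the fine grid to the next coarser grid."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolongate(ec, n_fine_points):
    """Linear interpolation from the coarse grid to the fine grid."""
    e = [0.0] * n_fine_points
    for j in range(len(ec) - 1):
        e[2 * j] = ec[j]
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    e[-1] = ec[-1]
    return e


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; the number of intervals must be a power of two."""
    if len(u) - 1 == 2:
        # Coarse grid solving: a single interior unknown, solved exactly.
        u = list(u)
        u[1] = (f[1] - (u[0] + u[2]) / h**2) / (-2.0 / h**2)
        return u
    u = smooth(u, f, h, pre)                                # smoothing
    rc = restrict(residual(u, f, h))                        # restriction
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)   # coarser levels
    e = prolongate(ec, len(u))                              # prolongation
    u = [ui + ei for ui, ei in zip(u, e)]                   # correction
    return smooth(u, f, h, post)                            # smoothing


# Usage: u'' = f with exact solution sin(pi x) on 64 intervals.
n = 64
h = 1.0 / n
xs = [i * h for i in range(n + 1)]
f = [-(math.pi**2) * math.sin(math.pi * x) for x in xs]
u = [0.0] * (n + 1)
initial_residual = max(abs(v) for v in residual(u, f, h))
for _ in range(10):
    u = v_cycle(u, f, h)
final_residual = max(abs(v) for v in residual(u, f, h))
```

The smoother handles the oscillatory part of the error, while the recursive call treats the smooth part on a grid where it is cheaper to resolve.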
Main Focus: HPC
● Highly optimized and highly scalable geometric multigrid solvers
● ‘Traditional’ supercomputers like JUQUEEN and SuperMUC
● Alternative architectures like Piz Daint and Mont Blanc
A lot of variance…
● Composition of optimal solvers requires choosing
● MG components: Cycle Type? Which smoother(s)? Which coarse grid solver? Which inter-grid operators?
● MG parameters: Relaxation? Number of smoothing steps? Other component-dependent parameters?
● Applied Optimizations: Vectorization? (Software) Prefetching? Tiling? Temporal Blocking? Loop transformations?
● …
● But the choice relies heavily on various influences
● Hardware: CPU, GPU or both? Number of nodes, sockets and cores? Cache characteristics? Network characteristics?
● Software: MPI, OpenMP or both? CUDA or OpenCL? Which versions?
● Problem description: Which PDE? Which boundary conditions?
● Discretization: Finite Differences, Finite Elements or Finite Volumes?
● Domain: Uniform or block-structured? How to partition?
● …
What about SW frameworks?
● Frequent questions
● Will it run optimally on my chosen hardware?
● Will it support new architectures?
● Do I have to invest a lot of time to understand the framework?
● How quickly and easily can I express my ideas and models?
● What happens when I change the target hardware?
● Poses serious challenges for developers of frameworks and applications alike
● One possible alternative is given by domain-specific languages (DSLs) in conjunction with code generation techniques
Project ExaStencils
• Sebastian Kuckuk
• Harald Köstler
• Ulrich Rüde
• Christian Schmitt
• Frank Hannig
• Jürgen Teich
• Hannah Rittich
• Lisa Claus
• Matthias Bolten
• Alexander Grebhahn
• Sven Apel
• Stefan Kronawitter
• Armin Größlinger
• Christian Lengauer
ExaSlang
● We propose a multilayered, external DSL
Workflow
● Transition between different layers should be automatic
● Review and extension by users is possible
● Highly modular transformations implement functionality
Features
● With our transformation-based approach we can
● Add MPI/OMP parallelization on demand
● Add GPU support on demand
● Determine coarse performance estimation models
● Perform automatic vectorization, loop transformations, address precalculation, common subexpression elimination, etc.
● Efficiently generate many variants forming the basis for auto-tuning and advanced performance prediction models
● Vastly improve user experience
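To make the idea of a transformation-based generator concrete, the following toy sketch applies one transformation (adding an OpenMP pragma on demand) to a tiny loop tree and then prints C-like code. The node classes, the transformation and the printer are hypothetical and far simpler than the actual ExaStencils framework; they only illustrate the principle of rewriting an intermediate representation before emitting code.

```python
from dataclasses import dataclass, field


@dataclass
class Assign:
    target: str
    value: str


@dataclass
class Loop:
    var: str
    begin: str
    end: str
    body: list
    pragmas: list = field(default_factory=list)


def add_omp_parallelization(node, depth=0):
    """Hypothetical transformation: annotate only the outermost loops."""
    if isinstance(node, Loop):
        if depth == 0:
            node.pragmas.append("#pragma omp parallel for")
        for child in node.body:
            add_omp_parallelization(child, depth + 1)
    return node


def emit_c(node, indent=0):
    """Pretty-print the loop tree as C-like source lines."""
    pad = "  " * indent
    if isinstance(node, Assign):
        return [f"{pad}{node.target} = {node.value};"]
    lines = [pad + pragma for pragma in node.pragmas]
    lines.append(
        f"{pad}for (int {node.var} = {node.begin}; "
        f"{node.var} < {node.end}; ++{node.var}) {{"
    )
    for child in node.body:
        lines.extend(emit_c(child, indent + 1))
    lines.append(pad + "}")
    return lines


# A copy kernel expressed as a loop tree, transformed and then printed.
kernel = Loop("j", "0", "m", [
    Loop("i", "0", "l", [Assign("rho0[j][i]", "rho[j][i]")]),
])
generated = "\n".join(emit_c(add_omp_parallelization(kernel)))
```

Because the pragma is added by a separate pass, the same kernel description can be emitted with or without parallelization.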
Application – CFD
Overall Goal
● Simulation of non-isothermal/non-Newtonian fluid flows
● Suspensions of particles or macromolecules
● E.g. pastes, gels, foams, drilling fluids, food products, blood, etc.
● Importance in mining, chemical and food industry as well as medical applications
https://www.youtube.com/watch?v=G1Op_1yG6lQ
The SIMPLE Algorithm
● Semi-Implicit Method for Pressure Linked Equations
● Concept:
Solve for velocity components
Solve for pressure correction
Apply pressure correction
Update properties
The SIMPLE Algorithm
● Temperature can be added as a separate step
Solve for velocity components
Solve for pressure correction
Apply pressure correction
Solve for temperature
Update properties
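The sequence of steps can be sketched as an outer iteration. This is only a structural sketch: the `solvers` dictionary, its keys, the three velocity component names and the placement of the convergence check are assumptions for illustration, and the numerical work of each step is left to the supplied callables.

```python
def simple_timestep(state, solvers, max_outer=50, with_temperature=True):
    """Run SIMPLE-style outer iterations for one timestep.

    `solvers` maps step names to callables that take and return `state`.
    `solvers["converged"]` returns True once the stopping criteria hold.
    All names are hypothetical placeholders.
    """
    for outer in range(1, max_outer + 1):
        # Solve for the velocity components one after another.
        for component in ("u", "v", "w"):
            state = solvers["velocity"](state, component)
        state = solvers["pressure_correction"](state)
        state = solvers["apply_correction"](state)
        # Temperature is an optional, separate step.
        if with_temperature:
            state = solvers["temperature"](state)
        state = solvers["update_properties"](state)
        if solvers["converged"](state):
            return state, outer
    return state, max_outer
```

Passing `with_temperature=False` gives the basic four-step loop without the temperature solve.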
Staggered grid
● Finite volume discretization on a staggered grid
Figure legend:
values associated with the x-staggered grid, e.g. U
values associated with the y-staggered grid, e.g. V
values associated with the cell centers, e.g. p and θ
control volumes associated with cell-centered values
control volumes associated with x-staggered values
control volumes associated with y-staggered values
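A minimal 2D sketch of the staggered layout, assuming a uniform grid with `nx` by `ny` cells: cell-centered values have one entry per cell, x-staggered values sit on the `nx + 1` faces in x, and y-staggered values on the `ny + 1` faces in y. The function names and the nested-list storage are illustrative choices only.

```python
def make_staggered_fields(nx, ny):
    """Allocate cell-centered p, x-staggered u and y-staggered v."""
    p = [[0.0] * ny for _ in range(nx)]         # nx     x ny
    u = [[0.0] * ny for _ in range(nx + 1)]     # nx + 1 x ny
    v = [[0.0] * (ny + 1) for _ in range(nx)]   # nx     x ny + 1
    return p, u, v


def divergence(u, v, dx, dy):
    """Discrete divergence per cell-centered control volume.

    Each cell uses the velocities stored on its own faces, so no
    interpolation is needed.
    """
    nx, ny = len(v), len(u[0])
    return [
        [(u[i + 1][j] - u[i][j]) / dx + (v[i][j + 1] - v[i][j]) / dy
         for j in range(ny)]
        for i in range(nx)
    ]
```

For the field u = x, v = -y, sampled at the face positions, the discrete divergence vanishes in every cell.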
Porting code
● Straightforward for simple kernels
subroutine advance_fields ()
  ! copy rho into rho0 at every grid point
  !$omp parallel do &
  !$omp private(i,j,k) &
  !$omp firstprivate(l1,m1,n1) &
  !$omp shared(rho,rho0) &
  !$omp schedule(static) default(none)
  do k=1,n1
    do j=1,m1
      do i=1,l1
        rho0(i,j,k)=rho(i,j,k)
      end do
    end do
  end do
  !$omp end parallel do
end subroutine advance_fields
Function AdvanceFields@finest () : Unit {
  loop over rho@current {
    rho[next]@current = rho[active]@current
  }
  advance rho@current
}
Porting code
● Rather than specifying how the calculations are done…
! if not at the boundary
fl = xcvi(i) * v(i,jp,k) * (fy(jp)*rho(i,jp,k) + fym(jp)*rho(i,j,k))
flm = xcvip(im) * v(im,jp,k) * (fy(jp)*rho(im,jp,k) + fym(jp)*rho(im,j,k))
flownu = zcv(k) * (fl+flm)
gm = xcvi(i) * vis(i,j,k) * vis(i,jp,k) / (ycv(j)*vis(i,jp,k) + ycv(jp)*vis(i,j,k) + 1.e-30)
gmm = xcvip(im) * vis(im,j,k) * vis(im,jp,k) / (ycv(j)*vis(im,jp,k) + &
      ycv(jp)*vis(im,j,k) + 1.e-30)
diff = 2. * zcv(k) * (gm+gmm)
call diflow(flownu,diff,acof)
adc = acof + max(0.,flownu)
anu(i,j,k) = adc - flownu
Porting code
● … we’d like to specify what we want to calculate
Val flownu : Real = integrateOverXStaggeredNorthFace (
    v[active]@current * rho[active]@current )
Val diffnu : Real = integrateOverXStaggeredNorthFace (
    evalAtXStaggeredNorthFace ( vis@current, "harmonicMean" ) )
    / vf_stagControlVolumeWidth_y@current@[0, 1, 0]
AuStencil@current:[0, 1, 0] = -1.0 * ( calc_diflow ( flownu, diffnu )
    - min ( 0.0, flownu ) )
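The correspondence between the two formulations can be checked numerically. The sketch below assumes that `ycv` holds control volume widths in y and that the terms `gm` and `gmm` each cover one part of the face area; under these assumptions the Fortran expression equals face area times a width-weighted harmonic mean viscosity divided by the distance between cell centers. It also checks that `acof + max(0, flownu) - flownu` equals `acof - min(0, flownu)`, the form used in the ExaSlang version. The Python function names are hypothetical, and the internals of `diflow` are not modeled.

```python
def face_conductance(vis_s, vis_n, ycv_s, ycv_n, face_area):
    """Diffusive conductance as written in the Fortran snippet (one gm term)."""
    eps = 1.0e-30  # guards against division by zero, as in the Fortran code
    return 2.0 * face_area * vis_s * vis_n / (ycv_s * vis_n + ycv_n * vis_s + eps)


def harmonic_face_viscosity(vis_s, vis_n, ycv_s, ycv_n):
    """Width-weighted harmonic mean of the two adjacent cell viscosities."""
    return (ycv_s + ycv_n) / (ycv_s / vis_s + ycv_n / vis_n)


def north_coefficient_fortran(acof, flownu):
    """adc = acof + max(0, flownu); anu = adc - flownu."""
    return acof + max(0.0, flownu) - flownu


def north_coefficient_declarative(acof, flownu):
    """Equivalent form used in the DSL version: acof - min(0, flownu)."""
    return acof - min(0.0, flownu)


# Example values (made up for illustration).
vis_s, vis_n, ycv_s, ycv_n, area = 1.0, 3.0, 0.2, 0.4, 0.5
center_distance = 0.5 * (ycv_s + ycv_n)
explicit = face_conductance(vis_s, vis_n, ycv_s, ycv_n, area)
declarative = area * harmonic_face_viscosity(vis_s, vis_n, ycv_s, ycv_n) / center_distance
```

Both `explicit` and `declarative` evaluate to the same conductance, which is why the declarative version can name the interpolation ("harmonicMean") instead of spelling it out.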
Results
● Natural convection problem
● With and without non-Newtonian model
● Convergence criteria for
● solving the single components: At least one V-cycle and
● SIMPLE: Convergence for all components is achieved after updating the linear system of equations (LSE) but before starting the solve routine
● 10,000 timesteps
● One socket on Emmy (Intel Xeon E5-2660v2) with 20 OpenMP threads
$r \le \alpha \, (1 + \beta \, b)$
$r_{i+1} - r_i < \alpha$
Figure: Temperature distribution along slice (z at 50% of the box depth) for the Newtonian and non-Newtonian case
Figure: Avg. execution time per timestep [ms] (logarithmic axis from 0.01 to 1000) over problem size (16, 32, 64, 128) for the Newtonian and non-Newtonian case, split into Update quantities, Compile LSEs, Solve and Total
Conclusion and Outlook
Conclusion
● ExaStencils approach
● Multi-layered, external DSL
● Transformation-based code generation framework
● Automatic optimizations and tuning
● CFD application
Outlook
● Enhanced performance optimizations
● Comparison with analytical models
● Study of parallel performance characteristics
● DSL extensions for relevant concepts
● More complex numeric components
● (Multi-) GPU support
Thank you for your attention!
Questions?