Virtual Process Engineering - NVIDIA, on-demand.gputechconf.com/gtc/2012/presentations/S...
Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge, Institute of Process Engineering, Chinese Academy of Sciences
GTC 2012, May 16, San Jose
Background
Multi-scale approach to process engineering

[Figure: methods mapped onto length scales (Angstrom, nm, μm, mm, m, km, Mm) and time scales (fs, ps, μs, ms, s, hour, year), from molecule, cluster, particle and particle-cluster up to reactor, plant and environment: MD, MC/DPD, DEM, TFM, PBM, Agent and flow-sheet models. Brute-force simulation across all scales is too costly, and no mature theory spans them.]

Li, Ge, Wang, Yang, 2010, Particuology 8:634-639
Multi-scale simulation

[Figure: routes to realtime simulation — coarser grids (meso vs micro: ~10×), more parallel threads (~10×), and driving the simulation closer to steady state (2-5×).]

Consistency: Physics, Algorithm, Architecture
- Macro: long-range correlation, continuum → CPU (multi-core)
- Micro: local interaction, particles → GPU (many-core)
- Meso: parameter exchanges between the two
Mole-8.5 (2010)

Rpeak SP: 2.26 Petaflops
Rpeak DP: 1.13 Petaflops
Linpack: 496.5 Tflops (21st, Top500)
Mflops/Watt: 963.7 (9th, Green500)
Memory: 17.2 TB (RAM), 6.6 TB (VRAM)
Storage: 76 TB (Nastron) + 320 TB (HP)
Data Comm.: Mellanox QDR InfiniBand
Inst. Comm.: H3C Gigabit Ethernet
Occupied area: 150 m² (with internal cooling)
Weight: 12.6 t (with internal cooling)
Max Power: 600 kW + 200 kW (cooling)
System: CentOS 5.4, PBS
Monitor: Ganglia, GPU monitor
Languages: C, C++, CUDA, OpenCL

Photo by Xianfeng He, 2010
Mole-8.5: system architecture

[Diagram: three-level switched fabric with GPU (G) and CPU (C) links. L1: head switches H1-4; L2: display-array switches V01-18 (6×3 = 18); L3: node switches A001-360 (9×4×10 = 360).]
Node layout of Mole-8.5

[Diagram: each node is a Tyan S7015 board with 2× Intel E5520/70 CPUs (CPU0/CPU1, each with 3× DDR3 memory channels), two Tylersburg-36D I/O hubs, four PEX8647 PCIe switches fanning out to 6× C2050 (Fermi) GPUs, plus QDR InfiniBand, HD and fans.]

Bottleneck: Device Memory → PCIe → InfiniBand
Effect of the CPU-GPU ratio on performance

[Charts: performance of 2C+1G, 2C+2G and 2C+6G node configurations on Linpack, LBM2048², DEM30K, DEM80K and DEM500K, for medium-size and large-size applications. The best price/performance configuration depends on problem size.]
Simulation of gas-solid flow

Typical gas-solid flow in chemical engineering:
- High concentration: ~5-40% v/v
- High density ratio: ~1000
- High heterogeneity: ~0 vs. ~close packing

Sim.: Xu et al., 2010; Exp.: Liu et al., 2010
Solid Phase: Discrete Particle Methods

- Micro-scale: fluctuating, conservative (MD, DSMC, LGA, PPM, …)
- Meso-scale: fluctuating, dissipative (DPD, FPM, DSPH, LBM, …)
- Macro-scale: smooth, dissipative (SPH, MPS, DEM, MaPPM, …)
Virtue of particle methods: Locality & Additivity

Pair-wise kernel approximations of the differential operators ($W(r)$: smoothing kernel, $D$: spatial dimension, $\mathbf{r}_{ai}$: vector from particle $i$ to particle $a$):

$$\nabla f\big|_a = D \sum_i \frac{m_i}{\rho_i}\, f_i\, \frac{\mathbf{r}_{ai}}{r_{ai}^2}\, W(r_{ai})$$

$$\nabla^2 f\big|_a = 2D \sum_i \frac{m_i}{\rho_i}\, \left(f_a - f_i\right)\, \frac{W(r_{ai})}{r_{ai}^2}$$

Ge & Li, Chin. Sci. Bull., 46:1503, 2001; Ge & Li, Powder Tech., 137:99, 2003
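As a minimal sketch of these pair-wise operators, here is an illustrative Python version (the Lucy kernel and the particle setup are my own choices, not from the talk):

```python
import numpy as np

def lucy_kernel(r, h):
    """Lucy (1977) smoothing kernel in 2D; an illustrative choice of W(r)."""
    q = r / h
    if q >= 1.0:
        return 0.0
    return (5.0 / (np.pi * h ** 2)) * (1.0 + 3.0 * q) * (1.0 - q) ** 3

def sph_gradient(a, pos, f, m, rho, h, D=2):
    """Pair-wise estimate of grad(f) at particle a (form on the slide)."""
    grad = np.zeros(D)
    for i in range(len(pos)):
        if i == a:
            continue
        r_ai = pos[a] - pos[i]          # vector from particle i to particle a
        r = np.linalg.norm(r_ai)
        if r == 0.0 or r >= h:
            continue                    # locality: only near neighbors contribute
        grad += D * (m[i] / rho[i]) * f[i] * (r_ai / r ** 2) * lucy_kernel(r, h)
    return grad

def sph_laplacian(a, pos, f, m, rho, h, D=2):
    """Pair-wise estimate of the Laplacian of f at particle a."""
    lap = 0.0
    for i in range(len(pos)):
        if i == a:
            continue
        r = np.linalg.norm(pos[a] - pos[i])
        if r == 0.0 or r >= h:
            continue
        lap += 2 * D * (m[i] / rho[i]) * (f[a] - f[i]) * lucy_kernel(r, h) / r ** 2
    return lap
```

Each contribution involves only one neighbor pair (locality) and the totals are plain sums (additivity), which is why such operators map naturally onto many-core hardware.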
General platform for discrete simulation

Particle methods supported: MD, PPM, DPD, SPH, DEM, MaPPM, …

Computation and communication layer:
- Data structure: particle, potential; link cell + neighbor list
- Organizer: space decomposition, dynamic load balance
- Communicator: communication scheme (MPI)
- Assistant, algorithm and boundary modules (STL, Loki)

Preprocess:
- CAD drawing conversion (AUTOCAD), boundary disposal
- Particle generation, uniform domain, uniform load
- Data partition, configuration

Ge et al., Chin. Sci. Bull., 47:1172, 2002; Tang, 2005, doctoral thesis
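A minimal sketch of the "link cell + neighbor list" idea named above, assuming a simple non-periodic box (function names are hypothetical):

```python
import itertools
from collections import defaultdict

def build_cells(positions, cutoff):
    """Hash each particle into a cubic cell with edge length = cutoff."""
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(int(c // cutoff) for c in p)].append(idx)
    return cells

def neighbor_pairs(positions, cutoff):
    """All pairs closer than cutoff, searching only the 27 surrounding cells
    of each particle instead of all N^2 pairs."""
    cells = build_cells(positions, cutoff)
    pairs = set()
    for cell, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in
                                 zip(positions[i], positions[j]))
                        if d2 < cutoff ** 2:
                            pairs.add((i, j))
    return pairs
```

Because the cell edge equals the interaction cutoff, any interacting pair is guaranteed to lie in the same or an adjacent cell, so the search cost scales linearly with particle number.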
Rotating drum: 9.6M solids, 270 GPUs, 13.5 m × 1.5 m, realtime (now)

Xu et al., 2011, Particuology 9:446-50

10^7 psg (particle-steps/sec/GPU) in 3D, targeting 10^8 psg with 3D DLB (dynamic load balance)
Rotating drum simulation with rotational friction

Speed evaluation:

                        Particle number   PSG
Single card             12450             4.611e+7
Six cards on one node   149400            2.306e+7

Accuracy evaluation:

              Angle of repose   Coordination number
Literature *  31 deg.           4.2
This work     30 deg.           3.95

* Yang et al., 2003, Powder Tech. 130:138
Gas Phase: Implicit vs Explicit

Implicit (PISO, SIMPLE, …): grid > particle
- better stability, longer time step
- global dependence, poor parallelism
- domain decomposition?

Explicit (MAC, PIC, LBM, …): grid < particle
- local dependence, excellent parallelism
- finer time step and smaller grid size
- more suitable for structured grids
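To illustrate why explicit schemes parallelize so well, here is a toy 1-D explicit diffusion update (my own illustration, not from the talk): each cell reads only its immediate neighbors from the previous step, so all cells can be updated concurrently, whereas an implicit scheme couples all unknowns through one global linear system.

```python
import numpy as np

def explicit_step(u, alpha=0.1):
    """One explicit diffusion update u_i += alpha*(u_{i+1} - 2*u_i + u_{i-1}).
    Each cell reads only local, old data: an ideal fit for many-core hardware.
    Boundary cells are held fixed for simplicity."""
    un = u.copy()
    un[1:-1] = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return un
```

The price, as the slide notes, is a finer time step: the explicit update is only stable for sufficiently small alpha.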
1st approach: Direct Numerical Simulation
Fine Gas Grid + Individual Solids

Outline of the approach:

                        Gas phase                                Solid phase
Physical model          continuum (Boltzmann)                    particles (Newton)
Numerical method        Lattice Boltzmann (fine grid)            ODE integration
Software algorithm      regular, explicit & local lattice ops    list & arithmetic operations
Hardware architecture   linked many-core                         shared-memory multi-core
Gas-solid coupling

Immersed moving boundary condition (Noble D R, Torczynski J R, Int. J. Mod. Phys. C, 1998, 9:1189-1201), a modified LBM evolution weighted by the local solid fraction $\varepsilon_s$:

$$f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\, t+\Delta t) = f_i(\mathbf{x},t) - \frac{\Delta t}{\tau}\big(1-\beta(\varepsilon_s,\tau)\big)\left[f_i(\mathbf{x},t)-f_i^{eq}(\mathbf{x},t)\right] + \beta(\varepsilon_s,\tau)\,\Omega_i^s$$

$$\beta(\varepsilon_s,\tau) = \frac{\varepsilon_s\,(\tau/\Delta t - 0.5)}{(1-\varepsilon_s) + (\tau/\Delta t - 0.5)}$$

$$\Omega_i^s = f_{-i}(\mathbf{x},t) - f_i(\mathbf{x},t) + f_i^{eq}(\rho,\mathbf{V}_s) - f_{-i}^{eq}(\rho,\mathbf{U})$$

Force of fluid on particle and fluid-induced torque (sums over the covered lattice nodes $n$ and the eight moving directions $i$; $\mathbf{x}_c$: particle center):

$$\mathbf{F}_f = C h \sum_n \beta_n \left(\sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i\right)$$

$$\mathbf{T}_f = C h \sum_n \beta_n\, (\mathbf{x}_n - \mathbf{x}_c) \times \left(\sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i\right)$$
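A hedged D2Q9 sketch of these formulas (the lattice constants and equilibrium are the standard BGK choices; variable names are mine):

```python
import numpy as np

# Standard D2Q9 lattice: rest direction + 8 moving directions
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
W = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
OPP = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])   # index of the opposite direction -e_i

def feq(rho, u):
    """Standard second-order BGK equilibrium distribution."""
    eu = E @ u
    return W * rho * (1.0 + 3.0 * eu + 4.5 * eu ** 2 - 1.5 * (u @ u))

def beta(eps_s, tau, dt=1.0):
    """Solid-fraction weighting beta(eps_s, tau) from the slide."""
    x = tau / dt - 0.5
    return eps_s * x / ((1.0 - eps_s) + x)

def omega_s(f, rho, u_fluid, v_solid):
    """Solid collision operator Omega_i^s: bounce-back of the distribution
    combined with relaxation toward the local solid velocity V_s."""
    return f[OPP] - f + feq(rho, v_solid) - feq(rho, u_fluid)[OPP]
```

Note that beta interpolates between pure fluid (eps_s = 0 gives beta = 0) and pure solid (eps_s = 1 gives beta = 1), and that the directions of Omega_i^s sum to zero, so the modified collision conserves mass; the momentum it exchanges, accumulated over the covered nodes, yields the force and torque above.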
Drafting-Kissing-Tumbling: benchmark for DNS in LBM

[Snapshots: initial position → drafting → kissing → tumbling of two settling particles.]

Wang et al., 2010, Particuology 8(4):379-382
Direct Numerical Simulation (DNS) with Lattice Boltzmann Method (LBM)

1M solid particles & 1G fluid particles @ 576 GPUs
- display resolution: 1920×480
- image resolution: 5898×1476
- computational resolution: 61440×15360

Xiong et al., 2012, Chem. Eng. Sci., 67:422-430

100K solid particles in 3D (Xiong et al., 2010)

ICT-IPE visualization team at the display wall, Sept. 18, 2010, photo by Xiaowei Wang
Necessity for large-scale simulation

[Plot: beyond a scale-independent region, intrinsic constitutive laws can be extracted.]
New constitutive laws for continuum models

[Plots: drag force evolution and slip velocity evolution.]

Ma et al., 2007, Chem. Eng. Sci., 61:6878; Xiong et al., 2012, Chem. Eng. Sci., 67:422-430
Performance of GS-DNS LBM

Domain size     Steps/second   Steps/second      Speedup   MLUPS      MLUPS
(W×H×L)         (Fermi GPU)    (Intel E5520)*              (double)   (single)
128×256×128     33.3           1.056             31.5      209.7      413.5
128×128×128     60.4           2.043             29.6      190.0      375.2
64×128×64       237.5          8.167             29.1      186.7      366.7
64×64×64        458.6          16.44             27.9      180.3      362.9
32×64×32        1784.1         65.71             27.1      175.2      355.4

Classical D3Q19 BGK model; higher efficiency expected with MRT and LES.

* Compared with serial execution on one core of the CPU. Wang et al., 2010
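The speedup column is simply the ratio of GPU to CPU update rates; a quick consistency check on a few rows of the table (a trivial sketch, values copied from above):

```python
def speedup(gpu_steps_per_s, cpu_steps_per_s):
    """GPU-over-CPU speedup from raw steps-per-second figures."""
    return gpu_steps_per_s / cpu_steps_per_s
```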
2nd approach: Discrete Particle Simulation
Coarse Gas Grid + Individual Solids

Outline of the approach:

                        Gas phase                   Solid phase
Physical model          continuum (N-S)             particles (Newton)
Numerical method        PDE solver (SIMPLE)         ODE integration
Software algorithm      sparse matrix operations    list & arithmetic operations
Hardware architecture   shared-memory multi-core    linked many-core
Flow distribution: traditional approach

"Bi-Linear" interpolation

Needs for a sub-grid-scale flow distribution method

Gradient-based flow distribution

Xu, Ge & Li, 2007, Chem. Eng. Sci. 62:2302
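For reference, the "bi-linear" interpolation named as the traditional approach, in a minimal Python form (unit cell, hypothetical corner values):

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Bilinear interpolation inside a unit grid cell.
    f00..f11: values at corners (0,0), (1,0), (0,1), (1,1); 0 <= x, y <= 1."""
    return (f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
            + f01 * (1 - x) * y + f11 * x * y)
```

Such cell-wise interpolation cannot resolve flow structure finer than the gas grid, which is what motivates a sub-grid-scale, gradient-based distribution.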
800K solids in a lab-scale fluidized bed

particle diameter: 80 μm; bed diameter: 100 mm; bed height: 600 mm
1 CPU (for gas) + 12 GPUs (for solids)
3×10^6 psg, targeting 10^7 psg with DLB

Xu et al., 2011; Chen et al., 2011
Towards exaflops & realtime

Molecular dynamics simulation of swine flu (H1N1, ~100 nm):
300M atoms/radicals, 0.77 ns/day, 10 ns, 1728 GPUs

Xu et al., 2011, Chin. Sci. Bull. 56(20):2114-8
Whole-system simulation using 7168 GPUs & 86106 CPU cores

Performance on Tianhe-1A: lab-prepared silicon nanowires (D ~ nm, L ~ mm); simplified computational model 52 nm × 54 nm × 0.78 mm, 110.1 billion atoms.
- GPU: regular bulk, fixed neighbors, 1.87 Pflops SP
- CPU: irregular surface, flexible neighbors, 165 Tflops DP
- Sustained: 1.13 P SP + 92 T DP

Applications: material properties & their scale effect, effect of defects & dopants, …

Hou, Xu, Ge et al., 2012, Int. J. HPC, in revision
Multi-scale simulation of fluidization

Computation time vs. physical time, from reactor level down to diffusion & reaction:

                         Global        Local         Local      Details    Particle   Diffusion
                         distribution  distribution  evolution  evolution  evolution  & reaction
Comp./Phys. (now)        3 s           5 s           300        2000       2000       1 ns/day
Comp./Phys. (expected)   1 s           2 s           <50        <200       <1000      >1 ns/h
Software-hardware co-design

- Physics: long range ↔ short range, internal ↔ interfacial
- Model: CFD (DNS), MD reaction-diffusion, MD clusters, QM (DFT)
- Numerical method: linear algebra, discrete element
- Hardware: general-purpose multi-cores, special many-cores
Challenges and opportunities

- Scalability: how to organize one billion cores? Structural similarity: multi-scale for hardware/software, model and physics.
- Reliability: how long can we run the whole system? Reasonable redundancy: chip-node-system-software-application.
- Affordability: exascale or expenscale? Energy efficiency: more concurrency, less idle current; special hardware for general software.
Prospect: realtime simulation, on-line optimization

[Diagram: industrial production ↔ virtual process engineering → new process?]

First demonstration under construction

Ge et al., 2011, Chem. Eng. Sci. 66:4426-58