
Virtual Process Engineering: Realtime Simulation of Multiphase Systems

Wei Ge
Institute of Process Engineering, Chinese Academy of Sciences

GTC 2012, May 16, San Jose

Background

Multi-scale approach to process engineering

(Diagram: length scales from Angstrom to Mm and time scales from fs to years, covering molecule, cluster, particle, particle-cluster, reactor, plant, and environment; methods MD, MC/DPD, DEM, PBM, TFM, agent-based models, and flow sheets each cover part of this range.)

Resolving everything at the finest scales is too costly, yet no mature theory describes the coarser scales directly.

Li, Ge, Wang, Yang, 2010, Particuology 8:634-639

Multi-scale simulation

(Diagram: mapping of the micro, meso, and macro scales onto hardware, with indicative gains of roughly 10x and 2-5x at different levels.)

Micro: local interactions among particles, mapped to many-core GPUs with more parallel threads
Macro: long-range correlation treated as a continuum, mapped to multi-core CPUs with coarser grids, closer to steady state
Meso: parameter exchanges and switching between the two levels

Consistency: physics, algorithm, architecture

Mole-8.5 (2010)

Rpeak SP: 2.26 Petaflops
Rpeak DP: 1.13 Petaflops
Linpack: 496.5 Tflops (21st, Top500)
Mflops/Watt: 963.7 (9th, Green500)
Memory: 17.2 TB (RAM), 6.6 TB (VRAM)
Storage: 76 TB (Nastron) + 320 TB (HP)
Data comm.: Mellanox QDR InfiniBand
Inst. comm.: H3C Gigabit Ethernet
Occupied area: 150 m2 (with internal cooling)
Weight: 12.6 t (with internal cooling)
Max power: 600 kW + 200 kW (cooling)
System: CentOS 5.4, PBS
Monitor: Ganglia, GPU monitor
Languages: C, C++, CUDA, OpenCL

Photo by Xianfeng He, 2010

Mole-8.5: system architecture

(Diagram: three-level InfiniBand switch fabric: L1 head switches H1-H4, L2 switches V01-V18 also feeding the display array, and L3 switches A001-A360 connecting the compute nodes, 9 × 4 × 10 = 360.)

Node layout of Mole-8.5

(Diagram: Tyan S7015 board with 2× Xeon E5520/5570 CPUs, each with three DDR3 memory channels and a Tylersburg-36D IOH; 6× Tesla C2050 (Fermi) GPUs attached through PEX8647 PCIe switches; QDR InfiniBand HCA, HD, memory, and fans.)

Bottlenecks: device memory, PCIe, and InfiniBand.

Effect of the CPU-GPU ratio on performance

(Charts: performance and performance/price ratio of node configurations 2C+1G, 2C+2G, and 2C+6G on Linpack, LBM 2048², and DEM with 30K, 80K, and 500K particles; the best performance/price configuration differs between medium-size and large-size applications.)

Simulation of gas-solid flow

Typical gas-solid flow in chemical engineering:
High concentration: ~5-40% v/v
High density ratio: ~1000
High heterogeneity: ~0 vs. ~close packing

Sim.: Xu et al., 2010; Exp.: Liu et al., 2010

Solid phase: discrete particle methods

Micro-scale: fluctuating, conservative (MD, DSMC, LGA, PPM, …)
Meso-scale: fluctuating, dissipative (DPD, FPM, DSPH, LBM, …)
Macro-scale: smooth, dissipative (SPH, MPS, DEM, MaPPM, …)

Virtue of particle methods: locality & additivity

Field derivatives at a particle $a$ are approximated by kernel-weighted sums over its neighbors $i$, where $W(r)$ is the smoothing kernel, $D$ the spatial dimension, $f_{ai} = f_a - f_i$ and $\mathbf{r}_{ai} = \mathbf{r}_a - \mathbf{r}_i$:

$$\nabla f\big|_a = D \sum_i \frac{m_i}{\rho_i}\, f_{ai}\, \frac{\mathbf{r}_{ai}}{|\mathbf{r}_{ai}|^2}\, W_{ai}, \qquad \nabla^2 f\big|_a = 2D \sum_i \frac{m_i}{\rho_i}\, \frac{f_{ai}}{|\mathbf{r}_{ai}|^2}\, W_{ai}$$

Ge & Li, Chin. Sci. Bull., 46:1503, 2001; Ge & Li, Powder Tech., 137:99, 2003
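A minimal CUDA sketch of such a kernel-weighted sum, with one thread per particle accumulating contributions from a precomputed CSR-style neighbor list; the array layout, the nbrStart/nbrIdx names, and the smoothing kernel below are illustrative assumptions of this sketch, not the code referenced above.

```cuda
#include <cuda_runtime.h>

// Hypothetical cubic-spline-like smoothing kernel W(r, h); a placeholder only,
// not the kernel used in the cited work.
__device__ float kernelW(float r, float h)
{
    float q = r / h;
    if (q >= 2.0f) return 0.0f;
    float sigma = 1.0f / (3.14159265f * h * h * h);   // rough 3D normalization
    return (q < 1.0f)
        ? sigma * (1.0f - 1.5f * q * q + 0.75f * q * q * q)
        : sigma * 0.25f * (2.0f - q) * (2.0f - q) * (2.0f - q);
}

// One thread per particle a: the gradient of a scalar field f is estimated as a
// kernel-weighted sum over the listed neighbors i (locality); each contribution
// is simply added, so the loop parallelizes trivially (additivity).
__global__ void gradientEstimate(int n, const float3 *pos, const float *f,
                                 const float *mass, const float *rho,
                                 const int *nbrStart, const int *nbrIdx,
                                 float h, int dim, float3 *gradF)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= n) return;

    float3 g = make_float3(0.0f, 0.0f, 0.0f);
    for (int k = nbrStart[a]; k < nbrStart[a + 1]; ++k) {
        int i = nbrIdx[k];
        float3 rai = make_float3(pos[a].x - pos[i].x,
                                 pos[a].y - pos[i].y,
                                 pos[a].z - pos[i].z);
        float r2 = rai.x * rai.x + rai.y * rai.y + rai.z * rai.z;
        if (r2 < 1e-12f) continue;
        float r   = sqrtf(r2);
        float fai = f[a] - f[i];
        float c   = dim * (mass[i] / rho[i]) * fai * kernelW(r, h) / r2;
        g.x += c * rai.x;  g.y += c * rai.y;  g.z += c * rai.z;
    }
    gradF[a] = g;
}
```

Because every term involves only particle a and one neighbor, the same structure carries over to the forces and other operators of the methods listed above, which is why they map well onto many-core hardware.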

General platform for discrete simulation

One infrastructure serves many discrete methods (MD, PPM, DPD, SPH, DEM, MaPPM, …):
Data structure: particle and potential data, link cell + neighbor list
Computation and communication: space decomposition, dynamic load balance, communication scheme (Organizer, Communicator, and Assistant modules)
Preprocessing: CAD drawing conversion (AUTOCAD), boundary disposal, particle generation, uniform domain and uniform load, data partition, configuration
Built on MPI, STL, and Loki.

Ge et al., Chin. Sci. Bull., 47:1172, 2002; Tang, 2005, doctoral thesis
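As a concrete illustration of the "link cell + neighbor list" layer, here is a hedged CUDA sketch of uniform-grid binning with a Thrust sort; the function and parameter names are hypothetical, and the real platform's data structures, MPI-level space decomposition, and dynamic load balance are not shown.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Map each particle to a flattened cell index on a uniform grid (the link-cell
// structure); cells/origin/cellSize are hypothetical parameters.
__global__ void computeCellIndex(int n, const float3 *pos, int3 cells,
                                 float3 origin, float cellSize,
                                 int *cellOfParticle, int *particleId)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n) return;
    int cx = min(cells.x - 1, max(0, (int)((pos[p].x - origin.x) / cellSize)));
    int cy = min(cells.y - 1, max(0, (int)((pos[p].y - origin.y) / cellSize)));
    int cz = min(cells.z - 1, max(0, (int)((pos[p].z - origin.z) / cellSize)));
    cellOfParticle[p] = (cz * cells.y + cy) * cells.x + cx;
    particleId[p] = p;
}

// After sorting particles by cell index, record where each cell's particles
// start and end in the sorted order.
__global__ void findCellRanges(int n, const int *sortedCell,
                               int *cellStart, int *cellEnd)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    int c = sortedCell[k];
    if (k == 0 || sortedCell[k - 1] != c) cellStart[c] = k;
    if (k == n - 1 || sortedCell[k + 1] != c) cellEnd[c] = k + 1;
}

// Host driver: bin, sort, and build cell ranges.  A neighbor search then only
// scans the 27 cells around each particle instead of all N particles.
void buildLinkCells(int n, const float3 *d_pos, int3 cells, float3 origin,
                    float cellSize, int *d_cellOfParticle, int *d_particleId,
                    int *d_cellStart, int *d_cellEnd)
{
    int numCells = cells.x * cells.y * cells.z;
    cudaMemset(d_cellStart, 0, numCells * sizeof(int));   // empty cells -> range [0, 0)
    cudaMemset(d_cellEnd,   0, numCells * sizeof(int));

    int threads = 256, blocks = (n + threads - 1) / threads;
    computeCellIndex<<<blocks, threads>>>(n, d_pos, cells, origin, cellSize,
                                          d_cellOfParticle, d_particleId);
    thrust::device_ptr<int> key(d_cellOfParticle), val(d_particleId);
    thrust::sort_by_key(key, key + n, val);
    findCellRanges<<<blocks, threads>>>(n, d_cellOfParticle, d_cellStart, d_cellEnd);
}
```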

Rotating drum: 9.6M solids, 270 GPUs, 13.5 m × 1.5 m, realtime (now)

Xu et al., 2011, Particuology 9:446-50

10^7 psg (particle-steps/sec/GPU) in 3D, targeting 10^8 psg with 3D dynamic load balancing (DLB)

Rotating drum simulation with rotational friction

Speed evaluation (particle number, PSG):
Single card: 12,450 particles, 4.611e+7 PSG
Six cards on one node: 149,400 particles, 2.306e+7 PSG

Accuracy evaluation:
Angle of repose: 31 deg. (literature*) vs. 30 deg. (this work)
Coordination number: 4.2 (literature*) vs. 3.95 (this work)

* Yang et al., 2003, Powder Tech. 130:138
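For orientation, a CUDA device-function sketch of a generic linear spring-dashpot contact with Coulomb-limited sliding friction and a simple rolling-resistance torque; this is an illustrative stand-in, not necessarily the rotational-friction model of Yang et al. (2003), and all coefficients are placeholders.

```cuda
#include <cuda_runtime.h>

__device__ float3 cross3(float3 a, float3 b) {
    return make_float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
}
__device__ float  dot3(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
__device__ float3 add3(float3 a, float3 b) { return make_float3(a.x+b.x, a.y+b.y, a.z+b.z); }
__device__ float3 sub3(float3 a, float3 b) { return make_float3(a.x-b.x, a.y-b.y, a.z-b.z); }
__device__ float3 scale3(float3 a, float s){ return make_float3(a.x*s, a.y*s, a.z*s); }

// Illustrative contact parameters; values and the model itself are placeholders.
struct ContactParams {
    float kn, cn;    // normal stiffness and damping
    float kt, mu;    // tangential coefficient and sliding friction limit
    float mur;       // rolling-resistance coefficient
};

// Force and torque on particle a from an overlapping particle b
// (linear spring-dashpot normal force, Coulomb-limited tangential force,
//  plus a simple rolling-resistance torque; tangential spring history omitted).
__device__ void pairContact(ContactParams p,
                            float3 xa, float3 xb, float3 va, float3 vb,
                            float3 wa, float3 wb, float ra, float rb,
                            float3 *force, float3 *torque)
{
    *force  = make_float3(0.f, 0.f, 0.f);
    *torque = make_float3(0.f, 0.f, 0.f);

    float3 d = sub3(xb, xa);
    float dist = sqrtf(dot3(d, d));
    float overlap = ra + rb - dist;
    if (overlap <= 0.f || dist < 1e-12f) return;           // no contact

    float3 n = scale3(d, 1.f / dist);                      // unit normal a -> b
    // relative velocity of the contact point (b relative to a), including spin
    float3 vc = sub3(add3(vb, cross3(wb, scale3(n, -rb))),
                     add3(va, cross3(wa, scale3(n,  ra))));
    float  vn = dot3(vc, n);
    float3 vt = sub3(vc, scale3(n, vn));                   // tangential slip velocity

    // normal force on a: spring repulsion plus dashpot opposing approach
    float fn = p.kn * overlap - p.cn * vn;
    float3 f = scale3(n, -fn);

    // tangential force, capped by Coulomb friction, applied at the contact point
    float vtmag = sqrtf(dot3(vt, vt));
    if (vtmag > 1e-12f) {
        float  ft   = fminf(p.kt * vtmag, p.mu * fabsf(fn));
        float3 tdir = scale3(vt, -1.f / vtmag);
        f = add3(f, scale3(tdir, ft));
        *torque = cross3(scale3(n, ra), scale3(tdir, ft));
    }

    // rolling resistance: a torque opposing the relative angular velocity
    float3 wrel = sub3(wa, wb);
    float  wmag = sqrtf(dot3(wrel, wrel));
    if (wmag > 1e-12f)
        *torque = add3(*torque, scale3(wrel, -p.mur * fabsf(fn) * ra / wmag));

    *force = f;
}
```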

Gas phase: implicit vs. explicit

Implicit (PISO, SIMPLE, …): grid > particle; better stability, longer time step; global dependence, poor parallelism; domain decomposition?

Explicit (MAC, PIC, LBM, …): grid < particle; local dependence, excellent parallelism; finer time step and smaller grid size; more suitable for structured grids

1st approach: Direct Numerical Simulation (DNS)
Fine gas grid + individual solids

Outline of the approach (simulated system → physical model → numerical method → software algorithm → hardware architecture):
Gas phase: continuum (Boltzmann) → lattice Boltzmann on a fine grid → regular, explicit & local lattice operations → linked many-core GPUs
Solid phase: particles (Newton) → ODE integration → list & arithmetic operations → shared-memory multi-core CPUs

Gas-solid coupling

Immersed moving boundary condition (Noble D R, Torczynski J R, Int. J. Mod. Phys. C, 1998, 9:1189-1201):

$$f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t) = f_i(\mathbf{x},t) - \frac{\Delta t}{\tau}\bigl[1-\beta(\varepsilon_s,\tau)\bigr]\bigl[f_i(\mathbf{x},t)-f_i^{eq}(\mathbf{x},t)\bigr] + \beta(\varepsilon_s,\tau)\,\Omega_i^s$$

$$\beta(\varepsilon_s,\tau) = \frac{\varepsilon_s\,(\tau/\Delta t - 0.5)}{(1-\varepsilon_s) + (\tau/\Delta t - 0.5)}$$

$$\Omega_i^s = f_{-i}(\mathbf{x},t) - f_i(\mathbf{x},t) + f_i^{eq}(\rho,\mathbf{V}_s) - f_{-i}^{eq}(\rho,\mathbf{U})$$

Force of fluid on particle: $\mathbf{F} = C_h \sum_n \Bigl(\beta_s^n \sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i\Bigr)$

Fluid-induced torque: $\mathbf{T} = C_h \sum_n \Bigl[(\mathbf{x}_n-\mathbf{x}_c) \times \beta_s^n \sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i\Bigr]$
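A CUDA sketch of how the Noble-Torczynski terms above might be applied at a single lattice node, written for D2Q9 to keep it short (the DNS runs here use D3Q19); the feq form, the lattice constants, and the caller that supplies the solid fraction, solid velocity, and the prefactor C_h are assumptions of this sketch, not the authors' code.

```cuda
#include <cuda_runtime.h>

// D2Q9 lattice constants (the DNS runs here use D3Q19; D2Q9 keeps the sketch short).
__constant__ float wq[9]  = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36};
__constant__ int   ex[9]  = {0, 1, 0, -1,  0, 1, -1, -1,  1};
__constant__ int   ey[9]  = {0, 0, 1,  0, -1, 1,  1, -1, -1};
__constant__ int   opp[9] = {0, 3, 4,  1,  2, 7,  8,  5,  6};

__device__ float feq(int i, float rho, float ux, float uy)
{
    float eu = ex[i] * ux + ey[i] * uy;
    float u2 = ux * ux + uy * uy;
    return wq[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * u2);
}

// Collision at one lattice node partially covered by a solid particle.
// epsS: solid volume fraction of the node, (vsx, vsy): solid velocity there,
// tau: relaxation time in lattice units (dt = 1).  The momentum exchange is
// accumulated into (fx, fy); the prefactor C_h and the sum over all covered
// nodes n are applied by the caller, as in the force formula above.
__device__ void collideIMB(float f[9], float tau, float epsS,
                           float vsx, float vsy, float *fx, float *fy)
{
    // macroscopic density and velocity at the node
    float rho = 0.f, ux = 0.f, uy = 0.f;
    for (int i = 0; i < 9; ++i) { rho += f[i]; ux += f[i] * ex[i]; uy += f[i] * ey[i]; }
    ux /= rho;  uy /= rho;

    // weighting function beta(eps_s, tau)
    float beta = epsS * (tau - 0.5f) / ((1.f - epsS) + (tau - 0.5f));

    float fpost[9];
    for (int i = 0; i < 9; ++i) {
        // bounce-back-based solid collision operator Omega_i^s
        float omegaS = f[opp[i]] - f[i] + feq(i, rho, vsx, vsy) - feq(opp[i], rho, ux, uy);
        // blend of the BGK relaxation and the solid operator
        fpost[i] = f[i] - (1.f - beta) * (f[i] - feq(i, rho, ux, uy)) / tau + beta * omegaS;
        // contribution to the hydrodynamic force on the particle
        *fx += beta * omegaS * ex[i];
        *fy += beta * omegaS * ey[i];
    }
    for (int i = 0; i < 9; ++i) f[i] = fpost[i];
}
```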

Drafting-Kissing-Tumbling (DKT): a benchmark for DNS with LBM

(Snapshots: initial position, drafting, kissing, tumbling.)

Wang et al., 2010, Particuology 8(4):379-382

Direct Numerical Simulation (DNS) with the Lattice Boltzmann Method (LBM):
1M solid particles & 1G fluid particles @ 576 GPUs
Display resolution: 1920×480; image resolution: 5898×1476; computational resolution: 61440×15360
Xiong et al., 2012, Chem. Eng. Sci., 67:422-430

100K solid particles in 3D
ICT-IPE visualization team in front of the display wall, Sept. 18, 2010, photo by Xiaowei Wang
Xiong et al., 2010

Scale-independent region
Intrinsic constitutive laws
Necessity for large-scale simulation

New constitutive laws for continuum models
(Figures: drag force evolution; slip velocity evolution.)

Ma et al., 2007, Chem. Eng. Sci., 61:6878
Xiong et al., 2012, Chem. Eng. Sci., 67:422-430

Performance of GS-DNS LBM (classical D3Q19 BGK model; higher efficiency expected with MRT and LES)

Domain size (W×H×L) | Steps/s (Fermi GPU) | Steps/s (Intel E5520)* | MLUPS (DP) | MLUPS (SP) | Speedup
128×256×128 | 33.3   | 1.056 | 209.7  | 413.5 | 31.5
128×128×128 | 60.4   | 2.043 | 190.0  | 375.2 | 29.6
64×128×64   | 237.5  | 8.167 | 186.75 | 366.7 | 29.1
64×64×64    | 458.6  | 16.44 | 180.3  | 362.9 | 27.9
32×64×32    | 1784.1 | 65.71 | 175.2  | 355.4 | 27.1

* Compared with serial execution on one core of the CPU. Wang et al., 2010
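For reference, a plain CUDA kernel for the classical D3Q19 BGK collision step mentioned in the note above, using a structure-of-arrays layout; this is a textbook version, not the optimized GS-DNS code whose performance is tabulated.

```cuda
#include <cuda_runtime.h>

// D3Q19 lattice: 1 rest, 6 axis, and 12 edge-diagonal directions with their weights.
__constant__ int   cx19[19] = {0, 1,-1, 0, 0, 0, 0, 1,-1, 1,-1, 1,-1, 1,-1, 0, 0, 0, 0};
__constant__ int   cy19[19] = {0, 0, 0, 1,-1, 0, 0, 1, 1,-1,-1, 0, 0, 0, 0, 1,-1, 1,-1};
__constant__ int   cz19[19] = {0, 0, 0, 0, 0, 1,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1,-1,-1};
__constant__ float w19[19]  = {1.f/3,
                               1.f/18, 1.f/18, 1.f/18, 1.f/18, 1.f/18, 1.f/18,
                               1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36,
                               1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36};

// BGK collision for one time step.  Structure-of-arrays layout: f[i * nNodes + node]
// holds direction i of a node, which keeps global-memory accesses coalesced on the GPU.
// Streaming and boundary handling are separate steps and omitted here.
__global__ void collideBGK_D3Q19(float *f, int nNodes, float invTau)
{
    int node = blockIdx.x * blockDim.x + threadIdx.x;
    if (node >= nNodes) return;

    float fi[19], rho = 0.f, ux = 0.f, uy = 0.f, uz = 0.f;
    for (int i = 0; i < 19; ++i) {
        fi[i] = f[i * nNodes + node];
        rho += fi[i];
        ux  += fi[i] * cx19[i];
        uy  += fi[i] * cy19[i];
        uz  += fi[i] * cz19[i];
    }
    ux /= rho;  uy /= rho;  uz /= rho;
    float u2 = ux * ux + uy * uy + uz * uz;

    for (int i = 0; i < 19; ++i) {
        float eu  = cx19[i] * ux + cy19[i] * uy + cz19[i] * uz;
        float fEq = w19[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * u2);
        f[i * nNodes + node] = fi[i] - invTau * (fi[i] - fEq);
    }
}
```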

2nd approach: Discrete Particle Simulation (DPS)
Coarse gas grid + individual solids

Outline of the approach (simulated system → physical model → numerical method → software algorithm → hardware architecture):
Gas phase: continuum (N-S) → PDE solver (SIMPLE) → sparse matrix operations → shared-memory multi-core CPUs
Solid phase: particles (Newton) → ODE integration → list & arithmetic operations → linked many-core GPUs
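A host-side CUDA/C++ sketch of how the CPU-GPU task split in this outline might be orchestrated: the implicit gas solve stays on the CPU cores while the explicit particle update runs on a GPU stream, with void fraction and drag exchanged each step. All types and solver entry points here are hypothetical stubs; in a real code the gas field would be copied to the device before the kernel launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical containers for the two phases; real codes would hold the coarse-grid
// gas field on the host and the particle arrays in GPU memory.
struct GasField   { /* pressure, velocity, void fraction on the coarse grid */ };
struct SolidState { /* particle positions, velocities, forces on the GPU    */ };

// Placeholder: one implicit SIMPLE-type sweep for the gas phase on the CPU cores,
// using the void fraction and drag computed by the solids in the previous step.
void solveGasOnCPU(GasField *gas, const float *voidFracOld, const float *dragOld,
                   float dt) { /* sparse-matrix work on the multi-core CPU */ }

// Placeholder: launch the explicit DEM kernels for the solid phase on a GPU stream,
// reading a device copy of the previous gas field and writing fresh coupling terms.
void particleStepOnGPU(SolidState *solids, const GasField *gas, float dt,
                       float *voidFracNew, float *dragNew,
                       cudaStream_t stream) { /* neighbor search + ODE integration */ }

// One coupled step: GPU and CPU work concurrently on different phases, then the
// double-buffered coupling arrays are swapped for the next step.
void coupledStep(GasField *gas, SolidState *solids,
                 float **voidFracOld, float **voidFracNew,
                 float **dragOld, float **dragNew,
                 float dt, cudaStream_t stream)
{
    // solids advance asynchronously on the GPU using the current gas field
    particleStepOnGPU(solids, gas, dt, *voidFracNew, *dragNew, stream);

    // meanwhile the gas phase is solved on the CPU with last step's coupling terms
    solveGasOnCPU(gas, *voidFracOld, *dragOld, dt);

    // wait for the GPU, then swap old/new coupling buffers for the next step
    cudaStreamSynchronize(stream);
    float *t1 = *voidFracOld; *voidFracOld = *voidFracNew; *voidFracNew = t1;
    float *t2 = *dragOld;     *dragOld     = *dragNew;     *dragNew     = t2;
}
```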

Flow distribution: traditional approach is "bi-linear" interpolation

The need for a sub-grid-scale flow distribution method

Gradient-based flow distribution
Xu, Ge & Li, 2007, Chem. Eng. Sci. 62:2302
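For context, a CUDA sketch of the traditional bilinear interpolation step that the gradient-based sub-grid distribution replaces; the cell-centered 2D layout and all names are assumptions of this sketch.

```cuda
#include <cuda_runtime.h>

// Traditional coupling step: bilinearly interpolate the coarse-grid gas velocity
// to a particle position before evaluating drag.  u is a cell-centered 2D field of
// size nx*ny with spacing dx, dy.
__device__ float2 gasVelocityAtParticle(const float2 *u, int nx, int ny,
                                        float dx, float dy, float px, float py)
{
    // locate the four cell centers surrounding the particle
    float gx = px / dx - 0.5f, gy = py / dy - 0.5f;
    int i = min(nx - 2, max(0, (int)floorf(gx)));
    int j = min(ny - 2, max(0, (int)floorf(gy)));
    float sx = fminf(fmaxf(gx - i, 0.f), 1.f);             // fractional offsets
    float sy = fminf(fmaxf(gy - j, 0.f), 1.f);

    float2 u00 = u[j * nx + i],       u10 = u[j * nx + i + 1];
    float2 u01 = u[(j + 1) * nx + i], u11 = u[(j + 1) * nx + i + 1];

    float2 r;
    r.x = (1 - sx) * ((1 - sy) * u00.x + sy * u01.x) + sx * ((1 - sy) * u10.x + sy * u11.x);
    r.y = (1 - sx) * ((1 - sy) * u00.y + sy * u01.y) + sx * ((1 - sy) * u10.y + sy * u11.y);
    return r;
}
```

The gradient-based method of Xu, Ge & Li (2007) distributes the gas flow below the grid scale, which this plain bilinear stencil cannot resolve.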

800K solids in a lab-scale fluidized bed

Particle diameter: 80 microns; bed diameter: 100 mm; bed height: 600 mm
1 CPU (for gas) + 12 GPUs (for solids)
3×10^6 psg, targeting 10^7 psg with DLB

Xu et al., 2011

(Comparison figures: Xu et al., 2011; Chen et al., 2011.)

Towards exaflops & realtime

Molecular dynamics simulation of swine flu (H1N1) virus, ~100 nm:
300M atoms/radicals, 0.77 ns/day, 10 ns, 1728 GPUs
Xu et al., 2011, Chin. Sci. Bull. 56(20):2114-8

Performance on Tianhe-1A: whole-system simulation using 7168 GPUs & 86106 CPU cores
Lab-prepared silicon nanowires: D ~ nm, L ~ mm
92 Tflops DP + 1.13 Pflops SP
Simplified computational model: 52 nm × 54 nm × 0.78 mm, 110.1 billion atoms
GPU: regular bulk, fixed neighbors, 1.87 Pflops SP
CPU: irregular surface, flexible neighbors, 165 Tflops DP
Applications: material properties & their scale effect, effect of defects & dopants, …
Hou, Xu, Ge et al., 2012, Int. J. HPC, in revision

Multi-scale simulation of fluidization

Reactor scales: global distribution, local distribution, local evolution, details evolution, particle evolution, diffusion & reaction
Comp./Phys. now: 3 s, 5 s, 300, 2000, 2000, 1 ns/day
Comp./Phys. expected: 1 s, 2 s, <50, <200, <1000, >1 ns/h

Software-hardware co-design

(Diagram: physics → model → numerical method → hardware across the scales.)
Physics: long-range, short-range, internal, interfacial
Model: CFD (DNS), MD for clusters, MD for reaction-diffusion, QM (DFT)
Numerical method: linear algebra, discrete element, linear algebra
Hardware: general-purpose multi-cores, special many-cores

Challenges and opportunities

Scalability: how to organize one billion cores? Structural similarity: multi-scale for hardware/software, model, and physics.
Reliability: how long can we run the whole system? Reasonable redundancy: chip-node-system-software-application.
Affordability: exascale or "expenscale"? Energy efficiency: more concurrency, less idle current; special hardware for general software.

Prospect: realtime simulation and on-line optimization, linking industrial production, virtual process engineering, and new processes

First demonstration under construction

Ge et al., 2011, Chem. Eng. Sci. 66:4426-58

This work is supported by: NSFC, MOF, MOST, CAS, SinoPec, PetroChina, BaoSteel, YeTech, BHPb, Alstom, NVIDIA, GE, Unilever, Shell, Total, …

emms.mpcs.cn, www.multiscalesci.org

GroupEx2010, Shanxi, photo by Xianfeng He

Thanks!