
  • Virtual Process Engineering - Realtime Simulation of Multiphase Systems

    Wei Ge, Institute of Process Engineering, Chinese Academy of Sciences

    GTC 2012, May 16, San Jose

    Background

  • Multi-scale approach to process engineering

    [Figure: length scales from Angstrom to Mm and time scales from fs to years, spanning the
    molecule, cluster, particle, particle-cluster, reactor, plant and environment levels.
    Methods mapped onto these scales: MD, MC / DPD, DEM, TFM, PBM, Agent, flow sheet.
    Annotations: too costly; no matured theory.]

    Li, Ge, Wang, Yang, 2010, Particuology 8:634-639

    Multi-scale simulation

    [Figure: micro / meso / macro hierarchy; noted speedup factors: meso-scale models ~10x,
    closer to steady state 2-5x, coarser grids vs. more parallel threads 10^?x.]

  • Consistency: Physics, Algorithm, Architecture

    Macro - long-range correlation (continuum) -> CPU (multi-core)

    Micro - local interaction (particles) -> GPU (many-core)

    Meso - parameter exchanges and switching between the two

    Mole-8.5 (2010)

    Rpeak SP:        2.26 Petaflops
    Rpeak DP:        1.13 Petaflops
    Linpack:         496.5 Tflops (21st, Top500)
    Mflops/Watt:     963.7 (9th, Green500)
    Memory:          17.2 TB (RAM), 6.6 TB (VRAM)
    Storage:         76 TB (Nastron) + 320 TB (HP)
    Data comm.:      Mellanox QDR InfiniBand
    Inst. comm.:     H3C Gigabit Ethernet
    Occupied area:   150 m2 (with internal cooling)
    Weight:          12.6 t (with internal cooling)
    Max power:       600 kW + 200 kW (cooling)
    System:          CentOS 5.4, PBS
    Monitor:         Ganglia, GPU monitor
    Languages:       C, C++, CUDA, OpenCL

    Photo by Xianfeng He, 2010

  • Mole-8.5: system architecture

    L1 head nodes (H1-4): H1, H2: 2 GPUs + 2 CPUs each; H3-4: 2 CPUs each

    L2 display-array nodes (V01-18): 2 GPUs + 2 CPUs each, 6 x 3 = 18

    L3 compute nodes (A001-360): 9 x 4 x 10 = 360

    Node layout of Mole-8.5

    [Diagram: Tyan S7015 board with 2x Xeon E5520/70 (CPU0, CPU1), each with 3x DDR3 memory;
    two Tylersburg-36D IOHs, each feeding PEX8647 PCIe switches that host three of the node's
    six C2050 (Fermi) GPUs; QDR InfiniBand adapter, HD, memory, fans.]

    Bottleneck: device memory -> PCIe -> InfiniBand

  • Effect of the CPU-GPU ratio on performance

    [Charts: performance and best P/P ratio for node configurations 2C+1G, 2C+2G and 2C+6G,
    for medium-size and large-size applications: Linpack, DEM 30K, DEM 80K, DEM 500K,
    LBM 2048^2.]

    Simulation of gas-solid flow

  • Typical Gas-Solid Flow in Chemical Engineering

    High concentration: ~5-40% v/v

    High density ratio: ~1000

    High heterogeneity: ~0 vs. ~close packing

    Sim.: Xu et al., 2010; Exp.: Liu et al., 2010

    Solid Phase: Discrete Particle Methods

    Micro-scale (fluctuating, conservative): MD, DSMC, LGA, PPM, ...

    Meso-scale (fluctuating, dissipative): DPD, FPM, DSPH, LBM, ...

    Macro-scale (smooth, dissipative): SPH, MPS, DEM, MaPPM, ...

  • Virtue of particle methods

    Field derivatives are approximated by kernel-weighted sums over the neighbors i of
    particle a, with weight function W(r):

    \nabla f\,\big|_a = \sum_i \frac{m_i}{\rho_i}\, f_i\, \nabla_a W_{ai}

    \nabla^2 f\,\big|_a = \sum_i \frac{m_i}{\rho_i}\, f_i\, \nabla_a^2 W_{ai}

    Locality & additivity

    Ge & Li, Chin. Sci. Bull., 46:1503, 2001; Ge & Li, Powder Tech., 137:99, 2003
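    To make the locality and additivity point concrete, here is a minimal CUDA sketch (not code
    from the talk) that evaluates the kernel-weighted gradient sum above with one thread per
    particle over a precomputed neighbor list; the Lucy-type kernel, cutoff and array layout are
    illustrative assumptions.

```cuda
// Minimal CUDA sketch (illustrative, not the talk's actual code): per-particle
// evaluation of  grad f|_a = sum_i (m_i / rho_i) f_i grad_a W_ai
// using a precomputed neighbor list. Kernel choice (Lucy-type) and data layout
// are assumptions for the example.
#include <cuda_runtime.h>

struct Float3 { float x, y, z; };

// Gradient of a Lucy-type kernel with smoothing length h, up to normalization.
__device__ Float3 gradW(Float3 rab, float h)
{
    float r = sqrtf(rab.x * rab.x + rab.y * rab.y + rab.z * rab.z);
    Float3 g = {0.f, 0.f, 0.f};
    if (r > 1e-7f && r < h) {
        float q = r / h;
        // dW/dr for W ~ (1 + 3q)(1 - q)^3  ->  -12 q (1 - q)^2 / h
        float dwdr = -12.f * q * (1.f - q) * (1.f - q) / h;
        float s = dwdr / r;           // chain rule: grad W = (dW/dr) * rab / r
        g.x = s * rab.x; g.y = s * rab.y; g.z = s * rab.z;
    }
    return g;
}

// One thread per particle a: every interaction is local (cutoff h) and additive,
// so the sum parallelizes trivially over many-core hardware.
__global__ void gradientSum(const Float3* pos, const float* f, const float* m,
                            const float* rho, const int* nbrStart,
                            const int* nbrList, int nParticles, float h,
                            Float3* gradF)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= nParticles) return;

    Float3 acc = {0.f, 0.f, 0.f};
    for (int k = nbrStart[a]; k < nbrStart[a + 1]; ++k) {
        int i = nbrList[k];
        Float3 rab = {pos[a].x - pos[i].x, pos[a].y - pos[i].y, pos[a].z - pos[i].z};
        Float3 g = gradW(rab, h);
        float c = m[i] / rho[i] * f[i];
        acc.x += c * g.x; acc.y += c * g.y; acc.z += c * g.z;
    }
    gradF[a] = acc;
}
```

    Because every term involves only particles inside the kernel support and the terms simply
    add, the same loop body works unchanged on one core or on thousands of GPU threads.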

    General platform for discrete simulation

    Methods supported: MD, PPM, DPD, SPH, DEM, MaPPM, ...

    Computation and communication modules: Particle, Potential, Data Structure
    (link cell + neighbor list), Algorithm, Boundary, Organizer (space decomposition,
    dynamic load balance), Communicator (communication scheme), Assistant

    Preprocess modules: CAD drawing conversion (AUTOCAD), boundary disposal, particle
    generation, uniform-domain / uniform-load data partition, configuration

    Built on: MPI, STL, Loki

    Ge et al., Chin. Sci. Bull., 47:1172, 2002; Tang, 2005, doctoral thesis
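    As a hedged illustration of the "link cell + neighbor list" building block named above
    (not the platform's actual code), the following CUDA kernel bins particles into uniform
    cells; the cell size, domain origin and array names are assumptions.

```cuda
// Illustrative CUDA sketch of the link-cell idea (uniform binning) that underlies
// neighbor-list construction; names and layout are assumptions, not the platform's API.
#include <cuda_runtime.h>

struct Float3 { float x, y, z; };
struct Int3   { int x, y, z; };

// One thread per particle: compute the flat index of the cell containing it.
// The cell ids can then be sorted (e.g. with thrust::sort_by_key) so that particles
// in the same cell become contiguous, which makes neighbor search local and regular.
__global__ void computeCellIndex(const Float3* pos, int nParticles,
                                 Float3 origin, float cellSize, Int3 gridDims,
                                 int* cellIndex, int* particleIndex)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nParticles) return;

    int cx = (int)floorf((pos[p].x - origin.x) / cellSize);
    int cy = (int)floorf((pos[p].y - origin.y) / cellSize);
    int cz = (int)floorf((pos[p].z - origin.z) / cellSize);

    // Clamp to the grid so particles slightly outside the domain stay valid.
    cx = min(max(cx, 0), gridDims.x - 1);
    cy = min(max(cy, 0), gridDims.y - 1);
    cz = min(max(cz, 0), gridDims.z - 1);

    cellIndex[p]     = (cz * gridDims.y + cy) * gridDims.x + cx;
    particleIndex[p] = p;   // permutation to carry along when sorting by cell id
}
```

    Sorting particles by cell index and scanning the sorted keys gives per-cell start/end
    offsets, so each particle only needs to examine the 27 surrounding cells when its
    neighbor list is built.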

  • Rotating drum: 9.6M solids, 270 GPUs, drum 13.5 x 1.5 m, realtime (now)

    Xu et al., 2011, Particuology 9:446-450

    10^7 psg (particle-steps/sec/GPU) in 3D, targeting 10^8 psg with 3D DLB

    Rotating drum simulation with rotational friction*

    Speed evaluation:

        Configuration            Particle number    PSG
        Single card              12,450             4.611e+7
        Six cards on one node    149,400            2.306e+7

    Accuracy evaluation:

                       Angle of repose    Coordination number
        Literature*    31 deg.            4.2
        This work      30 deg.            3.95

    * Yang et al., 2003, Powder Tech. 130:138
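    For orientation, here is a minimal CUDA sketch of the kind of per-particle, per-step work
    that the psg metric counts: a linear spring-dashpot normal contact force over a neighbor
    list. The actual model with rotational friction (Yang et al., 2003) is not reproduced;
    constants and names are illustrative.

```cuda
// Illustrative DEM contact-force kernel (linear spring-dashpot, normal direction only);
// a sketch of the per-particle work behind the "particle-steps per second per GPU" metric,
// not the actual production code. Tangential and rotational friction are omitted.
#include <cuda_runtime.h>

struct Float3 { float x, y, z; };

__global__ void contactForces(const Float3* pos, const Float3* vel, const float* radius,
                              const int* nbrStart, const int* nbrList, int nParticles,
                              float kn, float gamma_n, Float3* force)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= nParticles) return;

    Float3 f = {0.f, 0.f, 0.f};
    for (int k = nbrStart[a]; k < nbrStart[a + 1]; ++k) {
        int b = nbrList[k];
        Float3 r = {pos[a].x - pos[b].x, pos[a].y - pos[b].y, pos[a].z - pos[b].z};
        float dist = sqrtf(r.x * r.x + r.y * r.y + r.z * r.z);
        float overlap = radius[a] + radius[b] - dist;
        if (overlap > 0.f && dist > 1e-7f) {
            // Unit normal from b to a and normal relative velocity.
            float inv = 1.f / dist;
            Float3 n = {r.x * inv, r.y * inv, r.z * inv};
            Float3 dv = {vel[a].x - vel[b].x, vel[a].y - vel[b].y, vel[a].z - vel[b].z};
            float vn = dv.x * n.x + dv.y * n.y + dv.z * n.z;
            // Linear spring-dashpot: repulsion proportional to overlap, damping to vn.
            float fn = kn * overlap - gamma_n * vn;
            f.x += fn * n.x; f.y += fn * n.y; f.z += fn * n.z;
        }
    }
    force[a] = f;
}
```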

  • Gas Phase: Implicit vs Explicit

    Implicit (PISO, SIMPLE, ...): grid > particle; better stability, longer time step;
    global dependence, poor parallelism - domain decomposition?

    Explicit (MAC, PIC, LBM, ...): grid ...

  • Outline of the Approach

    Gas phase:   continuum (Boltzmann) -> Lattice Boltzmann (fine grid) ->
                 regular, explicit & local lattice operations -> linked many-core (GPU)

    Solid phase: particles (Newton) -> ODE integration ->
                 list & arithmetic operations -> shared-memory multi-core (CPU)

    (Simulated system -> physical model -> numerical method -> software algorithm ->
     hardware architecture)

    Gas-solid coupling

    Modified lattice Boltzmann equation with the immersed moving boundary term:

    f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\, t+\Delta t) = f_i(\mathbf{x},t)
      - \frac{\Delta t}{\tau}\,[1-\beta(\varepsilon_s,\tau)]\,[f_i(\mathbf{x},t)-f_i^{eq}(\mathbf{x},t)]
      + \beta(\varepsilon_s,\tau)\,\Omega_i^s

    Weighting function of the local solid fraction \varepsilon_s:

    \beta(\varepsilon_s,\tau) = \frac{\varepsilon_s\,(\tau/\Delta t - 0.5)}{(1-\varepsilon_s) + (\tau/\Delta t - 0.5)}

    Immersed moving boundary condition:

    \Omega_i^s = f_{-i}(\mathbf{x},t) - f_i(\mathbf{x},t) + f_i^{eq}(\rho,\mathbf{V}_s) - f_{-i}^{eq}(\rho,\mathbf{U})

    Force of fluid on a particle:

    \mathbf{F}_{sf} = C_h \sum_n \beta_n \Big( \sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i \Big)

    Fluid-induced torque:

    \mathbf{T}_{sf} = C_h \sum_n \beta_n\, (\mathbf{x}_n - \mathbf{x}_c) \times \Big( \sum_{i=1}^{8} \Omega_i^s\, \mathbf{e}_i \Big)

    Noble D R, Torczynski J R, Int. J. Mod. Phys. C, 1998, 9:1189-1208
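    As a hedged sketch (not the talk's production code) of how this coupling maps onto one GPU
    thread per lattice cell, the fragment below applies the beta-weighted collision on a D2Q9
    lattice and records the per-cell momentum exchange; the data layout, equilibrium function
    and field names are assumptions.

```cuda
// Hedged CUDA sketch of the Noble-Torczynski coupling step for a D2Q9 lattice:
// one thread per cell applies the beta-weighted collision and records the momentum
// exchange used later for the particle force/torque sums. Data layout, the equilibrium
// function and field names are illustrative assumptions, not the production code.
#include <cuda_runtime.h>

__constant__ float ex[9] = {0, 1, 0, -1,  0, 1, -1, -1,  1};
__constant__ float ey[9] = {0, 0, 1,  0, -1, 1,  1, -1, -1};
__constant__ float w [9] = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                            1.f/36, 1.f/36, 1.f/36, 1.f/36};
__constant__ int  opp[9] = {0, 3, 4, 1, 2, 7, 8, 5, 6};

__device__ float feq(int i, float rho, float ux, float uy)
{
    float eu = ex[i] * ux + ey[i] * uy;
    float u2 = ux * ux + uy * uy;
    return w[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * u2);
}

// f: distributions stored as f[i * nCells + cell]; epsS: solid fraction per cell;
// (vsx, vsy): local solid velocity; (ux, uy, rho): fluid moments; (fx, fy): per-cell
// momentum exchange, to be summed later over the cells covered by each particle.
// tau is the relaxation time in lattice units (dt = 1).
__global__ void collideNobleTorczynski(float* f, const float* rho, const float* ux,
                                       const float* uy, const float* epsS,
                                       const float* vsx, const float* vsy,
                                       float tau, int nCells, float* fx, float* fy)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;

    float es   = epsS[c];
    float beta = es * (tau - 0.5f) / ((1.f - es) + (tau - 0.5f));  // weighting function
    float r = rho[c], u = ux[c], v = uy[c], us = vsx[c], vs = vsy[c];

    float sumX = 0.f, sumY = 0.f;
    float fold[9];
    for (int i = 0; i < 9; ++i) fold[i] = f[i * nCells + c];

    for (int i = 0; i < 9; ++i) {
        // Immersed-moving-boundary operator (bounce-back toward the solid velocity).
        float omegaS = fold[opp[i]] - fold[i] + feq(i, r, us, vs) - feq(opp[i], r, u, v);
        float bgk    = -(fold[i] - feq(i, r, u, v)) / tau;          // standard BGK part
        f[i * nCells + c] = fold[i] + (1.f - beta) * bgk + beta * omegaS;
        sumX += omegaS * ex[i];
        sumY += omegaS * ey[i];
    }
    // Momentum exchange of this cell with the solid phase (up to the constant C_h).
    fx[c] = beta * sumX;
    fy[c] = beta * sumY;
}
```

    The per-cell momentum-exchange arrays are then summed over the cells covered by each
    particle (and scaled by C_h) to obtain the force and torque given above.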

  • Drafting-Kissing-Tumbling (DKT): benchmark for DNS with LBM

    Initial position -> drafting -> kissing -> tumbling

    Wang et al., 2010, Particuology 8(4):379-382

    Direct Numerical Simulation (DNS) with the Lattice Boltzmann Method (LBM):
    1M solid particles & 1G fluid particles @ 576 GPUs

    Display resolution: 1920 x 480; image resolution: 5898 x 1476;
    computational resolution: 61440 x 15360

    Xiong et al., 2012, Chem. Eng. Sci. 67:422-430

  • 100K solid particles in 3D

    ICT-IPE visualization team against the display wall, Sept. 18, 2010, photo by Xiaowei Wang

    Xiong et al., 2010

    Necessity for large-scale simulation: a scale-independent region is needed to extract
    intrinsic constitutive laws

  • New constitutive laws for continuum models

    [Figures: drag force evolution; slip velocity evolution]

    Ma et al., 2007, Chem. Eng. Sci. 61:6878; Xiong et al., 2012, Chem. Eng. Sci. 67:422-430

    Performance of GS-DNS LBM (classical D3Q19 BGK model; higher efficiency expected with
    MRT and LES)

    Domain size (W x H x L)   Steps/s (Fermi GPU)   Steps/s (Intel E5520)*   Speedup   MLUPS (DP)   MLUPS (SP)
    128 x 256 x 128           33.3                  1.056                    31.5      209.7        413.5
    128 x 128 x 128           60.4                  2.043                    29.6      190.0        375.2
    64 x 128 x 64             237.5                 8.167                    29.1      186.7        366.7
    64 x 64 x 64              458.6                 16.444                   27.9      180.3        362.9
    32 x 64 x 32              1784.1                65.71                    27.1      175.2        355.4

    * Compared with serial execution on one core of the CPU. Wang et al., 2010
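    The table refers to the classical D3Q19 BGK model; as a rough, hedged illustration of the
    per-cell work behind the "lattice updates per second" figures, here is a minimal CUDA
    collision step for D3Q19. The velocity-set ordering, structure-of-arrays layout and names
    are assumptions, and streaming, forcing and the solid coupling are omitted.

```cuda
// Hedged sketch of a D3Q19 BGK collision step (one thread per lattice cell), to
// illustrate the "lattice updates per second" being measured; the layout
// (structure of arrays, f[i * nCells + cell]) and names are assumptions.
#include <cuda_runtime.h>

__constant__ int   cx[19] = {0, 1,-1, 0, 0, 0, 0, 1,-1, 1,-1, 1,-1, 1,-1, 0, 0, 0, 0};
__constant__ int   cy[19] = {0, 0, 0, 1,-1, 0, 0, 1,-1,-1, 1, 0, 0, 0, 0, 1,-1, 1,-1};
__constant__ int   cz[19] = {0, 0, 0, 0, 0, 1,-1, 0, 0, 0, 0, 1,-1,-1, 1, 1,-1,-1, 1};
__constant__ float w19[19] = {1.f/3,
                              1.f/18, 1.f/18, 1.f/18, 1.f/18, 1.f/18, 1.f/18,
                              1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36,
                              1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36, 1.f/36};

__global__ void bgkCollide(float* f, int nCells, float tau)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;

    // Moments: density and velocity of this cell.
    float fi[19], rho = 0.f, ux = 0.f, uy = 0.f, uz = 0.f;
    for (int i = 0; i < 19; ++i) {
        fi[i] = f[i * nCells + c];
        rho += fi[i];
        ux  += fi[i] * cx[i];
        uy  += fi[i] * cy[i];
        uz  += fi[i] * cz[i];
    }
    ux /= rho; uy /= rho; uz /= rho;
    float u2 = ux * ux + uy * uy + uz * uz;

    // BGK relaxation toward the second-order equilibrium distribution.
    for (int i = 0; i < 19; ++i) {
        float eu  = cx[i] * ux + cy[i] * uy + cz[i] * uz;
        float feq = w19[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * u2);
        f[i * nCells + c] = fi[i] - (fi[i] - feq) / tau;
    }
}
```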

  • 2nd approach: Discrete Particle Simulation - coarse gas grid + individual solids

    Outline of the Approach

    Gas phase:   continuum (N-S) -> PDE solver (SIMPLE) ->
                 sparse matrix operations -> shared-memory multi-core (CPU)

    Solid phase: particles (Newton) -> ODE integration ->
                 list & arithmetic operations -> linked many-core (GPU)

    (Simulated system -> physical model -> numerical method -> software algorithm ->
     hardware architecture)

  • Flow distribution: traditional approach

    "Bi-linear" interpolation

    Need for a sub-grid-scale flow distribution method

    Gradient-based flow distribution

    Xu, Ge & Li, 2007, Chem. Eng. Sci. 62:2302
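    For reference, a minimal CUDA sketch of the "traditional" bilinear interpolation being
    contrasted here: a cell-centered gas-grid quantity is interpolated to a particle position
    from the four surrounding cell centers. This is not the gradient-based method of
    Xu, Ge & Li (2007); grid layout and names are assumptions.

```cuda
// Hedged sketch of the "traditional" bilinear interpolation of a gas-grid field
// (e.g. gas velocity) to a particle position in 2D; the gradient-based sub-grid
// distribution of Xu, Ge & Li (2007) is not reproduced here. Names are illustrative.
#include <cuda_runtime.h>

// field: cell-centered values on an nx x ny grid with spacing dx, dy and origin at (0,0).
__device__ float bilinearAt(const float* field, int nx, int ny,
                            float dx, float dy, float px, float py)
{
    // Index of the lower-left cell center surrounding the particle.
    float gx = px / dx - 0.5f;
    float gy = py / dy - 0.5f;
    int i = (int)floorf(gx);
    int j = (int)floorf(gy);
    // Clamp so the four stencil points stay inside the grid.
    i = min(max(i, 0), nx - 2);
    j = min(max(j, 0), ny - 2);
    float tx = gx - i;
    float ty = gy - j;

    float f00 = field[j * nx + i];
    float f10 = field[j * nx + i + 1];
    float f01 = field[(j + 1) * nx + i];
    float f11 = field[(j + 1) * nx + i + 1];

    // Weighted average of the four surrounding cell-center values.
    return (1.f - tx) * (1.f - ty) * f00 + tx * (1.f - ty) * f10
         + (1.f - tx) * ty * f01 + tx * ty * f11;
}
```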

  • 800K solids in a lab-scale fluidized bed

    Particle diameter: 80 microns; bed diameter: 100 mm; bed height: 600 mm

    1 CPU (for gas) + 12 GPUs (for solids); 3x10^6 psg, targeting 10^7 psg with DLB

    Xu et al., 2011; Chen et al., 2011

  • Towards exaflops & realtime

    Molecular dynamics simulation of swine flu (H1N1), ~100 nm

    300M atoms/radicals, 0.77 ns/day, 10 ns, 1728 GPUs

    Xu et al., 2011, Chin. Sci. Bull. 56(20):2114-2118

  • Whole-system simulation using 7168 GPUs & 86016 CPU cores

    Lab-prepared silicon nanowires: D ~ nm, L ~ mm

    Simplified computational model: 52 nm x 54 nm x 0.78 mm, 110.1 billion atoms

    GPU: regular bulk, fixed neighbors, 1.87 Pflops SP
    CPU: irregular surface, flexible neighbors, 165 Tflops DP

    Performance on Tianhe-1A: 92T DP + 1.13P SP

    Applications: material properties & their scale effect, effect of defects & dopants, ...

    Hou, Xu, Ge et al., 2012, Int. J. HPC, in revision

    [Table: computational vs. physical time - now: 3 s, 5 s, 300, 2000, 2000, 1 ns/day;
    expected: 1 s, 2 s]

  • Software-hardware co-design

    [Diagram, layers from physics to hardware:
     Physics: long range / short range, internal / interfacial
     Model: CFD (DNS), MD reaction-diffusion, MD clusters, QM (DFT)
     Numerical method: linear algebra, discrete element
     Hardware: general-purpose multi-cores, special many-cores]

    Challenges and opportunities

    Scalability: how to organize one billion cores?
    Structural similarity: multi-scale for hardware/software, model and physics

    Reliability: how long can we run the whole system?
    Reasonable redundancy: chip-node-system-software-application

    Affordability: exascale or "expenscale"?
    Energy efficiency: more concurrency, less idle current; special hardware for general software

  • Prospect: virtual process engineering

    Realtime simulation and on-line optimization of industrial production and new processes

    First demonstration under construction

    Ge et al., 2011, Chem. Eng. Sci. 66:4426-4458

  • Ge et al., 2011, Chem. Eng. Sci. 66:4426-4458

    This work is supported by: NSFC, MOF, MOST, CAS, SinoPec, PetroChina, BaoSteel, YeTech,
    BHPB, Alstom, NVIDIA, GE, Unilever, Shell, Total, ...

    emms.mpcs.cn, www.multiscalesci.org

    GroupEx2010, Shanxi, photo by Xianfeng He

    Thanks!