WaLBerla::Core: An Overview

WaLBerla::Core: An Overview C. Feichtinger Chair for System Simulation, University of Erlangen-Nuremberg, Erlangen, Germany RRZE Seminar 22.6.10 C. Feichtinger Core 0.2


Page 1: WaLBerla::Core: An Overview

WaLBerla::Core:

An Overview

C. Feichtinger

Chair for System Simulation, University of Erlangen-Nuremberg, Erlangen, Germany

RRZE Seminar

22.6.10

C. Feichtinger Core 0.2

Page 2: WaLBerla::Core: An Overview

Core 0.2: Implementation details

Outline

Introduction to LBM

Introduction to WaLBerla

Software Design: Sweeps, Functionality Management, Patches and Blocks, Parallelization

GPU Performance Study

Implementation Details


Page 3: WaLBerla::Core: An Overview

Lattice Boltzmann Method

Brief Introduction

Mesoscopic method for CFD simulations

Equivalent to a finite difference Navier-Stokes scheme

Two major steps: Stream step and collision step

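$$ f_\alpha(x_i + e_{\alpha,i}\,\delta t,\; t + \delta t) - f_\alpha(x_i, t) = -\frac{\delta t}{\tau} \left[ f_\alpha(x_i, t) - f_\alpha^{(eq)}\big(\rho(x_i, t),\, u_i(x_i, t)\big) \right] $$

$$ \rho = \sum_{\alpha=0}^{18} f_\alpha, \qquad \rho u_i = \sum_{\alpha=0}^{18} e_{\alpha,i} \cdot f_\alpha $$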
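The collision step and the two moment sums can be sketched for a single cell as follows. This is a minimal illustration, not WaLBerla code: it assumes the D3Q19 velocity set (suggested by the sums over α = 0..18), the standard D3Q19 weights and second-order equilibrium distribution (the slide does not spell out f^(eq)), and δt = 1. The names `computeMoments`, `equilibrium`, and `collide` are hypothetical. The stream step, which moves each f_α to the neighboring cell in direction e_α, is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t Q = 19;  // alpha = 0..18

// D3Q19 lattice velocities e_alpha (assumed ordering: rest, 6 axis, 12 diagonal).
constexpr int E[Q][3] = {
    { 0, 0, 0},
    { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
    { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
    { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
    { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}};

// Standard D3Q19 weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal).
constexpr double weight(std::size_t a) {
    return a == 0 ? 1.0 / 3.0 : (a < 7 ? 1.0 / 18.0 : 1.0 / 36.0);
}

struct Moments {
    double rho;   // density
    double u[3];  // velocity
};

// rho = sum_alpha f_alpha,  rho * u_i = sum_alpha e_{alpha,i} * f_alpha
Moments computeMoments(const std::array<double, Q>& f) {
    Moments m = {0.0, {0.0, 0.0, 0.0}};
    for (std::size_t a = 0; a < Q; ++a) {
        m.rho += f[a];
        for (int i = 0; i < 3; ++i) m.u[i] += E[a][i] * f[a];
    }
    for (int i = 0; i < 3; ++i) m.u[i] /= m.rho;
    return m;
}

// Assumed standard second-order equilibrium distribution.
double equilibrium(std::size_t a, const Moments& m) {
    double eu = 0.0, uu = 0.0;
    for (int i = 0; i < 3; ++i) {
        eu += E[a][i] * m.u[i];
        uu += m.u[i] * m.u[i];
    }
    return weight(a) * m.rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

// Collision step with delta t = 1: f_alpha -= (f_alpha - f_alpha^(eq)) / tau
void collide(std::array<double, Q>& f, double tau) {
    assert(tau > 0.5 && "tau must exceed 0.5 for a positive viscosity");
    const Moments m = computeMoments(f);
    for (std::size_t a = 0; a < Q; ++a) f[a] -= (f[a] - equilibrium(a, m)) / tau;
}
```

With this choice of equilibrium, the collision leaves density and momentum of the cell unchanged, so computing the moments before and after `collide` gives a quick sanity check.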


Page 4: WaLBerla::Core: An Overview

The Software Framework WaLBerla

Widely Applicable Lattice Boltzmann Solver from Erlangen

Massively Parallel LB Framework

Designed to: support a wide range of functionalities required by CFD applications; minimize the integration effort of new functionality


Page 5: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

New Design Objectives

Library

Organization of functionality

Heterogeneous computing

Dynamic load balancing

Grid refinement

New data structures

Optimized dynamic simulations


Page 6: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Sweeps: A Kernel Management Concept

[Figure: Sweep Concept. Inside the time loop, the diagram shows Sweep Chain I (Sweep I, Sweep II, Sweep III) and Sweep Chain II (Sweep I, Sweep II). A single Sweep is drawn with Preprocessing and Post-processing steps (Communication, Timing, Visualization) and with the labels Block Sweep and Global Sweep. Legend: iteration, execution order, dependency.]
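A minimal sketch of how a time loop could execute sweeps in a fixed order, each with preprocessing and post-processing hooks. The `Sweep` struct and `runTimeLoop` function are hypothetical names for illustration; the sketch models one flat list of sweeps and does not attempt to reproduce sweep chains, block versus global sweeps, or the actual WaLBerla interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sweep description: a kernel plus optional hooks.
struct Sweep {
    std::string name;
    std::function<void()> pre;   // e.g. communication; may be empty
    std::function<void()> work;  // the kernel itself; required
    std::function<void()> post;  // e.g. timing or visualization; may be empty
};

// Runs every sweep once per time step, in the order given.
void runTimeLoop(const std::vector<Sweep>& sweeps, unsigned timeSteps) {
    for (unsigned t = 0; t < timeSteps; ++t) {
        for (const Sweep& s : sweeps) {
            if (s.pre) s.pre();
            assert(s.work && "every sweep needs a kernel");
            s.work();
            if (s.post) s.post();
        }
    }
}
```

Keeping the hooks separate from the kernel means communication or output can be attached to a sweep without touching the kernel code.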


Page 7: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Functionality Management

UID   Name                     Granularity   Example
fs    Functionality Selector   Simulation    Gravity on/off
hs    Hardware Selector        Process       CPU and/or GPU
bs    Block Selector           Block         LBM

Examples

useFunction(LBMSweep_CPU, fsNoFeat, hsCPU, bsPureLBM);

useFunction(LBMSweep_GPU, fsNoFeat, hsGPU, bsPureLBM);

useFunction(LBMSweep_Grav, fsGravity, hsCPU, bsPureLBM);

useFunction(LBMSweep_FreeSurf_Grav, fsGravity, hsCPU, bsFreeSurface);
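One way to picture the selection mechanism behind these calls is a lookup table keyed by the three UIDs. The sketch below is an assumption-laden illustration, not the WaLBerla implementation: the enums `FS`, `HS`, `BS`, the class `FunctionRegistry`, and its `select` method are hypothetical, and kernels are reduced to callables that return their own name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical selector UIDs mirroring the examples above.
enum class FS { NoFeat, Gravity };        // functionality selector
enum class HS { CPU, GPU };               // hardware selector
enum class BS { PureLBM, FreeSurface };   // block selector

using Kernel = std::function<std::string()>;  // stand-in for a real sweep kernel

class FunctionRegistry {
public:
    // Registers a kernel for one (fs, hs, bs) combination.
    void useFunction(Kernel k, FS fs, HS hs, BS bs) {
        assert(static_cast<bool>(k) && "kernel must be callable");
        table_[std::make_tuple(fs, hs, bs)] = std::move(k);
    }

    // Returns the kernel registered for the combination, or throws.
    const Kernel& select(FS fs, HS hs, BS bs) const {
        auto it = table_.find(std::make_tuple(fs, hs, bs));
        if (it == table_.end())
            throw std::out_of_range("no kernel registered for this selector combination");
        return it->second;
    }

private:
    std::map<std::tuple<FS, HS, BS>, Kernel> table_;
};
```

With such a table, the same simulation setup can pick a CPU or GPU kernel per process and a different kernel per block type without any branching in the time loop.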


Page 8: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Patch Data Structure


Page 9: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Patch Data Structures


Page 10: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

MPI Parallelization


Page 11: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Data: B = all Blocks allocated on the process

for block ∈ B do
    // Go over all neighboring Blocks
    for nBlock ∈ N do
        if nBlock.isAllocated then    // nBlock lies on the current process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                insert(nBlock.data, sendData, Direction To nBlock, fs, hs, bs);
            end
        else                          // nBlock lies on a different process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                sendData.addHeader();
                insert(sendBuffer[nBlock.rank], sendData, fs, hs, bs);
            end
        end
    end
end

Algorithm 1: Data Extraction
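The control flow of Algorithm 1 can be sketched in C++ on a deliberately reduced model: a 1D chain of blocks with one ghost value per side and a single data field. All types and names here (`Block`, `Message`, `extractData`, `rankOfBlock`) are hypothetical, and the selectors fs, hs, bs are omitted. Local neighbors are written to directly, while data for remote neighbors is tagged with a header and appended to a per-rank send buffer.

```cpp
#include <cassert>
#include <map>
#include <vector>

enum Direction { Left = 0, Right = 1 };

// Hypothetical 1D block: interior cells plus one ghost value per side.
struct Block {
    std::vector<double> interior;
    double ghost[2] = {0.0, 0.0};
    int neighbor[2] = {-1, -1};  // neighbor block ids, -1 = no neighbor
};

// Header (target block, direction) plus payload for remote neighbors.
struct Message {
    int targetBlock;
    Direction dir;
    double value;
};

// Boundary value of the block that faces direction d.
double extract(const Block& b, Direction d) {
    assert(!b.interior.empty());
    return d == Left ? b.interior.front() : b.interior.back();
}

// Data sent in direction d arrives in the opposite ghost cell of the neighbor.
void insert(Block& nBlock, double value, Direction d) {
    nBlock.ghost[d == Left ? Right : Left] = value;
}

void extractData(std::map<int, Block>& localBlocks,
                 const std::map<int, int>& rankOfBlock,
                 std::map<int, std::vector<Message>>& sendBuffer) {
    for (auto& entry : localBlocks) {
        Block& block = entry.second;
        for (int d = 0; d < 2; ++d) {
            const Direction dir = static_cast<Direction>(d);
            const int nId = block.neighbor[d];
            if (nId < 0) continue;
            auto local = localBlocks.find(nId);
            if (local != localBlocks.end()) {
                // Neighbor lies on the current process: copy directly.
                insert(local->second, extract(block, dir), dir);
            } else {
                // Neighbor lies on a different process: buffer for sending.
                sendBuffer[rankOfBlock.at(nId)].push_back(
                    Message{nId, dir, extract(block, dir)});
            }
        }
    }
}
```

In a real run the send buffers would afterwards be handed to MPI; that step is outside this sketch.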


Page 12: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Multi-GPU Implementation


Page 13: WaLBerla::Core: An Overview

Software Design of WaLBerla::Core

Heterogeneous Multi-GPU Implementation


Page 14: WaLBerla::Core: An Overview

LBM Performance Study

Multi-GPU Performance


Page 15: WaLBerla::Core: An Overview

LBM Performance Study

Single-GPU Performance


Page 16: WaLBerla::Core: An Overview

LBM Performance Study

Multi-GPU Performance


Page 17: WaLBerla::Core: An Overview

LBM Performance Study

Multi-GPU Performance


Page 18: WaLBerla::Core: An Overview

LBM Performance Study

Heterogeneous Multi-GPU Performance

Blocks             Nodes   Processes                MFLUPS
GPU: 1             1       2 x GPU                  476
GPU: 1             30      60 x GPU                 14480
GPU: 22, CPU: 1    1       2 x GPU + 6 x CPU        459
GPU: 22, CPU: 1    30      60 x GPU + 180 x CPU     13267
GPU: 22, CPU: 1    60      60 x GPU + 420 x CPU     15684
GPU: 22, CPU: 1    90      60 x GPU + 660 x CPU     17846


Page 19: WaLBerla::Core: An Overview

Logging

Modifications to the Logger class (core/src/Logging.h)

Log levels: no log, log info, log progress, log progress detail

Logging macros:

Always on: LOG_ERROR, LOG_WARNING, LOG_ASSERT, LOG_RESULT

Input file activated: LOG_INFO, LOG_PROGRESS

Makefile activated: LOG_PROGRESS_DETAIL

Logging sections: LOG_INFO_SEC(){ ... LOG_INFO(); }

The Logger creates a file only if logging output has to be written

ROOT_PROCESS sections


Page 20: WaLBerla::Core: An Overview

Timing

PerfLogger and WallTimeLogger

Can be wrapped around any function

Start/End measuring with begin() and end()

PerfLogger also provides trigger()

WallTimeLogger provides min/avg/max times in parallel simulations

Examples

PerfLogger logLoop("Timeloop PerfLogger");

logLoop.begin();

logLoop.trigger();

logLoop.end();

PerfLogger logPDF("PDFSweep PerfLogger");

wrapFunction(pdfSweep,logPDF);
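The begin()/end() and wrapping pattern can be illustrated with a small stand-in timer built on `std::chrono`. This is only a sketch under stated assumptions: `WallTimer` and this free function `wrapFunction` are hypothetical and do not reproduce the PerfLogger or WallTimeLogger interfaces; in particular, trigger() and the parallel min/avg/max reduction are not modeled because the slides do not describe their semantics.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wall-clock timer with begin()/end() measurement sections.
class WallTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit WallTimer(std::string name) : name_(std::move(name)) {}

    void begin() {
        start_ = Clock::now();
        running_ = true;
    }

    void end() {
        assert(running_ && "end() called without begin()");
        total_ += std::chrono::duration<double>(Clock::now() - start_).count();
        ++calls_;
        running_ = false;
    }

    const std::string& name() const { return name_; }
    double totalSeconds() const { return total_; }
    unsigned calls() const { return calls_; }

private:
    std::string name_;
    Clock::time_point start_;
    double total_ = 0.0;
    unsigned calls_ = 0;
    bool running_ = false;
};

// Returns a callable that times every invocation of fn with the given timer.
// The timer must outlive the returned callable.
std::function<void()> wrapFunction(std::function<void()> fn, WallTimer& timer) {
    return [fn, &timer]() {
        timer.begin();
        fn();
        timer.end();
    };
}
```

Wrapping keeps the measured function free of timing code: the caller swaps the original callable for the wrapped one and reads the accumulated time from the timer afterwards.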


Page 21: WaLBerla::Core: An Overview

Timing

PerfLogger and WallTimeLogger

[RESULT ]------(0.341 sec) -----------------------------------------------------------------
Final TimeLoop PerfLogger : 0 MFLUPS, 8.54671 MLUPS
Time: 0.137654
-----------------------------------------------------------------

[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Walltime of Communication :
Min: 0.000949621 sec, Max: 0.000949621 sec, Avg: 0.000949621 sec
-----------------------------------------------------------------

[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Final PDF Logger : 0 MFLUPS, 8.65434 MLUPS
Time: 0.135942
-----------------------------------------------------------------


Page 22: WaLBerla::Core: An Overview

Memory Management

Class for grid-based data: bd::Field<Type T, Uint CellSize>.
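To make the two template parameters concrete, here is a minimal sketch of a grid container that stores CellSize values of type T per cell in one contiguous array. It is an assumption about what such a class might look like, not the actual bd::Field: the constructor arguments, the accessor `operator()`, and the memory layout (all values of a cell stored next to each other) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid container: nx * ny * nz cells, CellSize values per cell.
template <typename T, std::size_t CellSize>
class Field {
public:
    Field(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz * CellSize, T()) {}

    // Access value f of cell (x, y, z).
    T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) {
        return data_[index(x, y, z, f)];
    }
    const T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
        return data_[index(x, y, z, f)];
    }

    std::size_t size() const { return data_.size(); }

private:
    // Linear index; the CellSize values of one cell are contiguous in memory.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
        assert(x < nx_ && y < ny_ && z < nz_ && f < CellSize);
        return ((z * ny_ + y) * nx_ + x) * CellSize + f;
    }

    std::size_t nx_, ny_, nz_;
    std::vector<T> data_;
};
```

For example, `Field<double, 19>` would hold the 19 distribution functions of a D3Q19 lattice cell, while `Field<double, 1>` could hold a scalar such as density.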
