WaLBerla::Core: An Overview

Post on 19-Mar-2022



C. Feichtinger

Chair for System Simulation, University of Erlangen-Nuremberg, Erlangen, Germany

RRZE Seminar

22.6.10

C. Feichtinger Core 0.2

Core 0.2: Implementation details

Outline

Introduction to LBM

Introduction to WaLBerla

Software Design: Sweeps, Functionality Management, Patches and Blocks, Parallelization

GPU Performance Study

Implementation Details


Lattice Boltzmann Method

Brief Introduction

Mesoscopic method for CFD simulations

Equivalent to a finite difference scheme for the Navier-Stokes equations

Two major steps: stream step and collision step

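$$f_\alpha(x_i + e_{\alpha,i}\,\delta t,\; t + \delta t) - f_\alpha(x_i, t) = -\frac{\delta t}{\tau}\Big[f_\alpha(x_i, t) - f_\alpha^{(eq)}\big(\rho(x_i, t), u_i(x_i, t)\big)\Big]$$

$$\rho u_i = \sum_{\alpha=0}^{18} e_{\alpha,i}\, f_\alpha, \qquad \rho = \sum_{\alpha=0}^{18} f_\alpha$$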
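The following is a minimal C++ sketch of the collision step for one D3Q19 cell (α = 0, …, 18, as in the sums above), with δt = 1. It assumes the common second-order equilibrium f_α^(eq) = w_α ρ (1 + 3 e_α·u + 4.5 (e_α·u)² − 1.5 u·u) in lattice units, which the slide does not state. The direction ordering, the weights table and the function names are illustrative and are not WaLBerla code. The streaming step is only indicated in a comment.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// D3Q19 velocity set: rest, 6 face neighbours, 12 edge neighbours.
// The ordering is illustrative only.
constexpr int Q = 19;
constexpr int E[Q][3] = {
    {0, 0, 0},
    {1, 0, 0},  {-1, 0, 0},  {0, 1, 0},  {0, -1, 0}, {0, 0, 1},  {0, 0, -1},
    {1, 1, 0},  {-1, -1, 0}, {1, -1, 0}, {-1, 1, 0},
    {1, 0, 1},  {-1, 0, -1}, {1, 0, -1}, {-1, 0, 1},
    {0, 1, 1},  {0, -1, -1}, {0, 1, -1}, {0, -1, 1}};
constexpr double W[Q] = {
    1.0 / 3,
    1.0 / 18, 1.0 / 18, 1.0 / 18, 1.0 / 18, 1.0 / 18, 1.0 / 18,
    1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36,
    1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

using Cell = std::array<double, Q>;

// rho = sum_a f_a,  rho * u_i = sum_a e_{a,i} f_a
void moments(const Cell& f, double& rho, double u[3]) {
  rho = 0.0;
  u[0] = u[1] = u[2] = 0.0;
  for (int a = 0; a < Q; ++a) {
    rho += f[a];
    for (int i = 0; i < 3; ++i) u[i] += E[a][i] * f[a];
  }
  for (int i = 0; i < 3; ++i) u[i] /= rho;
}

// Assumed second-order equilibrium in lattice units (c_s^2 = 1/3).
double feq(int a, double rho, const double u[3]) {
  const double eu = E[a][0] * u[0] + E[a][1] * u[1] + E[a][2] * u[2];
  const double uu = u[0] * u[0] + u[1] * u[1] + u[2] * u[2];
  return W[a] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

// BGK collision with delta t = 1: f_a <- f_a - (f_a - f_a^(eq)) / tau.
// Streaming would then move each post-collision f_a to the cell x + e_a.
void collide(Cell& f, double tau) {
  assert(tau > 0.5);  // tau <= 1/2 gives zero or negative viscosity
  double rho, u[3];
  moments(f, rho, u);
  for (int a = 0; a < Q; ++a) f[a] -= (f[a] - feq(a, rho, u)) / tau;
}
```

For D3Q19, Σ_α w_α e_{α,i} e_{α,j} = δ_ij/3. As a result, this equilibrium carries the same density and momentum as f, so the collision conserves both. A quick check is to compare the moments before and after collide().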


The Software Framework WaLBerla

Widely Applicable Lattice Boltzmann Solver from Erlangen

Massively Parallel LB Framework

Designed to support a wide range of functionalities required by CFD applications and to minimize the integration effort for new functionality


Software Design of WaLBerla::Core

New Design Objectives

Library

Organization of functionality

Heterogeneous computing

Dynamic load balancing

Grid refinement

New data structures

Optimized dynamic simulations


Software Design of WaLBerla::Core

Sweeps: A Kernel Management Concept

Figure: Sweep concept (time loop). Sweep Chain I: Sweep I, Sweep II, Sweep III. Sweep Chain II: Sweep I, Sweep II. Sweep elements: pre-processing, post-processing, communication, timing, visualization. Sweep types: block sweep, global sweep. Legend: iteration, execution order, dependency.
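A sweep chain can be read as an ordered list of kernels that the time loop executes every step. The C++ sketch below illustrates this idea. The names Sweep, SweepChain, Block and runTimeLoop are hypothetical and are not WaLBerla's API. A block sweep runs its kernel once per block, while a global sweep runs once per step. Each sweep has optional pre- and post-processing hooks where communication, timing or visualization could be attached. Dependencies are modeled only through execution order.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical block: only an id, enough to show per-block execution.
struct Block {
  int id;
};

// Hypothetical sweep: pre-processing, a block or global kernel, post-processing.
struct Sweep {
  std::string name;
  bool perBlock = true;                                     // block sweep vs. global sweep
  std::function<void()> pre = [] {};                        // e.g. communication, start timer
  std::function<void(Block&)> blockKernel = [](Block&) {};  // runs once per block
  std::function<void()> globalKernel = [] {};               // runs once per time step
  std::function<void()> post = [] {};                       // e.g. stop timer, visualization
};

// Sweeps in a chain execute in list order, which encodes their dependencies.
using SweepChain = std::vector<Sweep>;

void runTimeLoop(std::vector<SweepChain>& chains, std::vector<Block>& blocks, int steps) {
  for (int t = 0; t < steps; ++t) {
    for (SweepChain& chain : chains) {
      for (Sweep& sweep : chain) {
        sweep.pre();
        if (sweep.perBlock) {
          for (Block& b : blocks) sweep.blockKernel(b);
        } else {
          sweep.globalKernel();
        }
        sweep.post();
      }
    }
  }
}
```

Because the kernels are plain callables, the same chain structure can hold a CPU or a GPU variant of a sweep without changing the time loop.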


Software Design of WaLBerla::Core

Functionality Management

UID  Name                     Granularity  Example
fs   Functionality Selector   Simulation   Gravity on/off
hs   Hardware Selector        Process      CPU and/or GPU
bs   Block Selector           Block        LBM

Examples

useFunction(LBMSweep_CPU, fsNoFeat, hsCPU, bsPureLBM);

useFunction(LBMSweep_GPU, fsNoFeat, hsGPU, bsPureLBM);

useFunction(LBMSweep_Grav, fsGravity, hsCPU, bsPureLBM);

useFunction(LBMSweep_FreeSurf_Grav, fsGravity, hsCPU, bsFreeSurface);
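The selector triple (fs, hs, bs) acts as a key under which a kernel is registered. The following C++ sketch shows one way such a registry could work. FunctionRegistry, select() and the enum values are hypothetical stand-ins for the useFunction mechanism above. The exact-match lookup is a simplification, since the hardware selector on the slide also allows "CPU and/or GPU".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical selector values, modeled on fsNoFeat/fsGravity, hsCPU/hsGPU, bsPureLBM/bsFreeSurface.
enum class FS { NoFeat, Gravity };       // functionality selector: per simulation
enum class HS { CPU, GPU };              // hardware selector: per process
enum class BS { PureLBM, FreeSurface };  // block selector: per block

// A kernel here just reports its name so that the selection is observable.
using Kernel = std::function<std::string()>;

class FunctionRegistry {
 public:
  // Register a kernel for exactly one (fs, hs, bs) combination.
  void useFunction(Kernel k, FS fs, HS hs, BS bs) {
    table_[std::make_tuple(fs, hs, bs)] = std::move(k);
  }

  // Look up the kernel for a block; throws if nothing was registered.
  const Kernel& select(FS fs, HS hs, BS bs) const {
    auto it = table_.find(std::make_tuple(fs, hs, bs));
    if (it == table_.end()) throw std::out_of_range("no kernel for this selector combination");
    return it->second;
  }

 private:
  std::map<std::tuple<FS, HS, BS>, Kernel> table_;
};
```

In this design, each process resolves its hardware selector once, and each block then picks its kernel through its own block selector.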


Software Design of WaLBerla::Core

Patch Data Structure


Software Design of WaLBerla::Core

Patch Data Structures


Software Design of WaLBerla::Core

MPI Parallelization


Software Design of WaLBerla::Core

Data: B = all blocks allocated on the process
for block ∈ B do
    // Go over all neighboring blocks N of block
    for nBlock ∈ N do
        if nBlock.isAllocated then    // nBlock lies on the current process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                insert(nBlock.data, sendData, Direction To nBlock, fs, hs, bs);
            end
        else                          // nBlock lies on a different process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                sendData.addHeader();
                insert(sendBuffer[nBlock.rank], sendData, fs, hs, bs);
            end
        end
    end
end

Algorithm 1: Data Extraction
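The two branches of Algorithm 1 differ only in where the extracted data goes: directly into the neighbor block, or into a per-rank send buffer together with a header. The C++ sketch below shows that decision in a hypothetical 1D setting. Several parts are assumptions made for illustration: the block layout, the single ghost value per side, the Message header, and the choice to fill the neighbor's opposite ghost cell. The function names extract and extractAll are also illustrative. The actual MPI send is left out.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <vector>

// Hypothetical 1D setup: each block has interior cells and one ghost value per side.
enum Dir { Left = 0, Right = 1 };

struct Block {
  std::vector<double> cells;     // interior cells (assumed non-empty)
  double ghost[2] = {0.0, 0.0};  // ghost[Left], ghost[Right]
  int neighbor[2] = {-1, -1};    // global id of the neighbor per side, -1 = none
};

// "Header" (target block and side) plus payload for a remote neighbor.
struct Message {
  int targetBlock;
  int targetSide;
  double value;
};

// Boundary value of block b facing direction d.
double extract(const Block& b, Dir d) { return d == Left ? b.cells.front() : b.cells.back(); }

// blocks: blocks allocated on this process, keyed by global id.
// owner:  global block id -> rank of the process that holds it.
// Returns the send buffers: one message list per remote rank.
std::map<int, std::vector<Message>> extractAll(std::map<int, Block>& blocks,
                                               const std::map<int, int>& owner) {
  std::map<int, std::vector<Message>> sendBuffer;
  for (auto& entry : blocks) {
    Block& block = entry.second;
    for (Dir d : {Left, Right}) {
      const int n = block.neighbor[d];
      if (n < 0) continue;                          // domain boundary
      const double value = extract(block, d);
      const int side = (d == Left) ? Right : Left;  // neighbor's ghost cell facing this block
      auto local = blocks.find(n);
      if (local != blocks.end()) {
        local->second.ghost[side] = value;          // neighbor on this process: copy directly
      } else {
        sendBuffer[owner.at(n)].push_back(Message{n, side, value});  // remote: pack with header
      }
    }
  }
  return sendBuffer;
}
```

Grouping messages by rank means that each process needs only one send per neighboring rank, regardless of how many blocks it shares with that rank.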


Software Design of WaLBerla::Core

Multi-GPU Implementation


Software Design of WaLBerla::Core

Heterogeneous Multi-GPU Implementation


LBM Performance Study

Multi-GPU Performance


LBM Performance Study

Single-GPU Performance


LBM Performance Study

Multi-GPU Performance


LBM Performance Study

Multi-GPU Performance


LBM Performance Study

Heterogeneous Multi-GPU Performance

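Blocks            Nodes  Processes              MFLUPS
GPU: 1            1      2 x GPU                476
GPU: 1            30     60 x GPU               14480
GPU: 22, CPU: 1   1      2 x GPU + 6 x CPU      459
GPU: 22, CPU: 1   30     60 x GPU + 180 x CPU   13267
GPU: 22, CPU: 1   60     60 x GPU + 420 x CPU   15684
GPU: 22, CPU: 1   90     60 x GPU + 660 x CPU   17846

(MFLUPS: million fluid lattice updates per second)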
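MFLUPS is the number of fluid cells times the number of time steps, divided by the runtime in seconds and by 10^6. The C++ helpers below use hypothetical names. They compute this figure and the ratio of per-node throughput between two runs. Applied to the first two table rows, 14480 MFLUPS on 30 nodes against 476 MFLUPS on one node gives (14480/30)/476 ≈ 1.01.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Million fluid lattice updates per second.
double mflups(std::uint64_t fluidCells, std::uint64_t timeSteps, double seconds) {
  assert(seconds > 0.0);
  return static_cast<double>(fluidCells) * static_cast<double>(timeSteps) / seconds / 1e6;
}

// Per-node throughput of a run relative to a baseline run (1.0 = same rate per node).
double perNodeRatio(double baseMflups, int baseNodes, double runMflups, int nodes) {
  return (runMflups / nodes) / (baseMflups / baseNodes);
}
```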


Logging

Modifications to the Logger class (core/src/Logging.h)

Log levels: no log, log info, log progress, log progress detail

Logging macros:
  Always on: LOG_ERROR, LOG_WARNING, LOG_ASSERT, LOG_RESULT
  Activated via input file: LOG_INFO, LOG_PROGRESS
  Activated via Makefile: LOG_PROGRESS_DETAIL

Logging sections: LOG_INFO_SEC() { ... LOG_INFO(); }

The logger only creates a file if logging output has to be written

ROOT_PROCESS sections
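The following C++ sketch shows how level-gated macros of this kind can be built. Only the macro names follow the slide, and only a subset of the always-on macros is shown. The rest is hypothetical: the LogLevel enum, the global g_logLevel (standing in for the level read from the input file), the ENABLE_PROGRESS_DETAIL define (standing in for the Makefile switch), and the string stream that replaces the log file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical log levels, ordered as on the slide.
enum class LogLevel { NoLog = 0, Info = 1, Progress = 2, ProgressDetail = 3 };

LogLevel g_logLevel = LogLevel::NoLog;  // would be set from the input file
std::ostringstream g_log;               // stands in for the log file

// Always on, independent of the level.
#define LOG_ERROR(msg) (g_log << "[ERROR   ] " << msg << '\n')
#define LOG_RESULT(msg) (g_log << "[RESULT  ] " << msg << '\n')

// Enabled at run time through the level.
#define LOG_INFO(msg) \
  do { if (g_logLevel >= LogLevel::Info) g_log << "[INFO    ] " << msg << '\n'; } while (0)
#define LOG_PROGRESS(msg) \
  do { if (g_logLevel >= LogLevel::Progress) g_log << "[PROGRESS] " << msg << '\n'; } while (0)

// Enabled at compile time, e.g. with -DENABLE_PROGRESS_DETAIL set in the Makefile.
#ifdef ENABLE_PROGRESS_DETAIL
#define LOG_PROGRESS_DETAIL(msg) \
  do { if (g_logLevel >= LogLevel::ProgressDetail) g_log << "[DETAIL  ] " << msg << '\n'; } while (0)
#else
#define LOG_PROGRESS_DETAIL(msg) do { } while (0)
#endif

// Section whose body (e.g. expensive diagnostics) only runs if INFO output is enabled.
#define LOG_INFO_SEC() if (g_logLevel >= LogLevel::Info)
```

When the compile-time switch is off, LOG_PROGRESS_DETAIL expands to an empty statement. Its message expression is then never evaluated.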


Timing

PerfLogger and WallTimeLogger

Can be wrapped around any function

Start/end measurement with begin() and end()

PerfLogger also provides trigger()

WallTimeLogger provides min/avg/max times in parallel simulations

Examples

PerfLogger logLoop("Timeloop PerfLogger");
logLoop.begin();
logLoop.trigger();
logLoop.end();

PerfLogger logPDF("PDFSweep PerfLogger");
wrapFunction(pdfSweep, logPDF);
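The begin()/end()/trigger() pattern can be sketched in a few lines of standard C++. SimplePerfLogger and wrapTimed are hypothetical stand-ins for PerfLogger and wrapFunction. In particular, treating trigger() as "one more time step" and deriving MLUPS from it are assumptions about how the real class counts updates.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical PerfLogger stand-in: accumulates wall time over begin()/end()
// pairs and counts trigger() calls (assumed: one call per time step).
class SimplePerfLogger {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SimplePerfLogger(std::string name) : name_(std::move(name)) {}

  void begin() { start_ = Clock::now(); }
  void end() { seconds_ += std::chrono::duration<double>(Clock::now() - start_).count(); }
  void trigger() { ++triggers_; }

  double seconds() const { return seconds_; }
  std::size_t triggers() const { return triggers_; }
  const std::string& name() const { return name_; }

  // Million lattice updates per second, given the cells updated per trigger.
  double mlups(std::size_t cellsPerTrigger) const {
    if (seconds_ <= 0.0) return 0.0;
    return static_cast<double>(cellsPerTrigger) * static_cast<double>(triggers_) / seconds_ / 1e6;
  }

 private:
  std::string name_;
  Clock::time_point start_{};
  double seconds_ = 0.0;
  std::size_t triggers_ = 0;
};

// Hypothetical wrapper: returns a callable that times and counts every call of f.
std::function<void()> wrapTimed(std::function<void()> f, SimplePerfLogger& log) {
  return [f = std::move(f), &log]() {
    log.begin();
    f();
    log.end();
    log.trigger();
  };
}
```

Timing the whole time loop then amounts to calling begin() before the loop, trigger() once per step and end() after it, which mirrors the logLoop example.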


Timing

PerfLogger and WallTimeLogger

[RESULT ]------(0.341 sec) -----------------------------------------------------------------
Final TimeLoop PerfLogger : 0 MFLUPS, 8.54671 MLUPS
Time: 0.137654
-----------------------------------------------------------------

[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Walltime of Communication :
Min: 0.000949621 sec, Max: 0.000949621 sec, Avg: 0.000949621 sec
-----------------------------------------------------------------

[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Final PDF Logger : 0 MFLUPS, 8.65434 MLUPS
Time: 0.135942
-----------------------------------------------------------------


Memory Management

Class for grid-based data: bd::Field<Type T, Uint CellSize>.
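A grid field templated on the value type and the number of values per cell can be sketched as a thin wrapper around a contiguous array. The class below is hypothetical: it is sketch::Field, not bd::Field. Its memory layout is also an assumption, since the slide does not specify one. Here the CellSize values of a cell are stored next to each other, and x runs fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical grid field: CellSize values of type T per cell of an xs*ys*zs grid.
template <typename T, std::size_t CellSize>
class Field {
 public:
  Field(std::size_t xs, std::size_t ys, std::size_t zs, const T& init = T())
      : xs_(xs), ys_(ys), zs_(zs), data_(xs * ys * zs * CellSize, init) {}

  // Access value f of cell (x, y, z).
  T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) {
    return data_[index(x, y, z, f)];
  }
  const T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
    return data_[index(x, y, z, f)];
  }

  std::size_t xSize() const { return xs_; }
  std::size_t ySize() const { return ys_; }
  std::size_t zSize() const { return zs_; }
  std::size_t size() const { return data_.size(); }  // total number of stored values

 private:
  // Array-of-structures layout: the values of one cell are contiguous, then x, y, z.
  std::size_t index(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
    assert(x < xs_ && y < ys_ && z < zs_ && f < CellSize);
    return ((z * ys_ + y) * xs_ + x) * CellSize + f;
  }

  std::size_t xs_, ys_, zs_;
  std::vector<T> data_;
};

}  // namespace sketch
```

A D3Q19 PDF field would use CellSize = 19. A structure-of-arrays layout would instead place the cell index innermost, which is a common choice for GPU kernels.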
