openMP loop parallelization

Loop parallelization Albert DeFusco December 13, 2012

Transcript of openMP loop parallelization

Page 1: openMP loop parallelization

Loop parallelization

Albert DeFusco

December 13, 2012

Page 2: openMP loop parallelization

Compute π by Integration

I By definition

π = 4 arctan(1) (1)

I Many formulas to compute π have centered around approximating arctan(1) by series

I arctan(1) is computed by integration

π = ∫₀¹ 4/(1 + x²) dx (2)

Page 3: openMP loop parallelization

Compute π by Integration

[Plot: approximation of 4/(1 + x²) on the interval [0, 1] with a coarse grid of sample points]

6 points : π ≈ 3.1124

Page 4: openMP loop parallelization

Compute π by Integration

[Plot: approximation of 4/(1 + x²) on the interval [0, 1] with a finer grid of sample points]

41 points : π ≈ 3.1380; 10⁹ points : π ≈ 3.14159265

Page 5: openMP loop parallelization

Compute π in C++ and Fortran 90

#include <iostream>
using namespace std;

double arc(double x);
const long int num_steps = 1000000000;
double step;

int main()
{
  double pi, sum = 0.0;
  step = 1.0 / (double)num_steps;

  for (int i = 0; i <= num_steps; i++)
  {
    double x = (i + 0.5) * step;
    sum += arc(x);
  }

  pi = sum * step;

  cout.precision(10);
  cout << "pi is probably "
       << fixed << pi << endl;

  return 0;
}

double arc(double x)
{
  double y = 4.0 / (1 + x * x);
  return y;
}

program integratePi

  implicit none
  integer(kind=8) :: num_steps, i
  real(kind=8)    :: sum, step, pi, x

  num_steps = 1000000000
  sum  = 0.d0
  step = 1.d0 / num_steps

  do i = 1, num_steps
     x = (i + 0.5d0) * step
     sum = sum + arc(x)
  enddo

  pi = sum * step

  write(6, &
    '("pi is probably ",f12.10)') &
    pi

contains

  function arc(x)
    implicit none
    real(kind=8) :: arc, x
    arc = 4.d0 / (1.d0 + x*x)
  end function arc

end program integratePi

Page 6: openMP loop parallelization

Compute π

>cp -r /home/sam/training/openmp/basic .

>cd pi

>module load gcc/4.5

#Compile in C++

>CXX pi.cpp -o pi

#Compile in Fortran 90

>FC pi.f90 -o pi

>qsub pi-serial.job

>cat pi.out

pi is probably 3.1415926556

real 0m25.604s

user 0m25.594s

sys 0m0.000s

Page 7: openMP loop parallelization

Variable Scopes

I Scope is the lexical context which determines the lifetime of a variable

I In C++, variables declared in the following regions are "locally" scoped
  I Regions in curly braces
  I Loops
  I Subroutines

I Fortran variables are valid within an entire subroutine and contained subroutines and functions

I Global variables exist in both languages
  I Variables declared as public in classes or outside of main in C++
  I Common blocks or modules in Fortran

Page 8: openMP loop parallelization

Variable Scopes

I lexical scope
  I The physical text where a variable is valid

I dynamical scope
  I The runtime scope of a variable

I Scopes will be extremely important to OpenMP
  I Several clauses exist to control the scoping behavior

Page 9: openMP loop parallelization

Variable Scopes

#include <iostream>
using namespace std;

double arc(double x);
const long int num_steps = 1000000000;
double step;

int main()
{
  double pi, sum = 0.0;
  step = 1.0 / (double)num_steps;

  for (int i = 0; i <= num_steps; i++)
  {
    double x = (i + 0.5) * step;
    sum += arc(x);
  }

  pi = sum * step;

  cout.precision(10);
  cout << "pi is probably "
       << fixed << pi << endl;

  return 0;
}

double arc(double x)
{
  double y = 4.0 / (1 + x * x);
  return y;
}

I num_steps and step are global

I pi, step and sum are valid for all of main

I i and x are valid only within the for loop

Page 10: openMP loop parallelization

Variable Scopes

program integratePi

  implicit none
  integer(kind=8) :: num_steps, i
  real(kind=8)    :: sum, step, pi, x

  num_steps = 1000000000
  sum  = 0.d0
  step = 1.d0 / num_steps

  do i = 1, num_steps
     x = (i + 0.5d0) * step
     sum = sum + arc(x)
  enddo

  pi = sum * step

  write(6, &
    '("pi is probably ",f12.10)') &
    pi

contains

  function arc(x)
    implicit none
    real(kind=8) :: arc, x
    arc = 4.d0 / (1.d0 + x*x)
  end function arc

end program integratePi

I All variables declared in program are available to arc

I Variables in function arc are private to arc

Page 11: openMP loop parallelization

Loop parallelization

I The omp parallel for region
  I Reads the loop bounds and divides the work
  I i becomes private to each thread in Fortran and C++
  I What other scopes do we expect?

#pragma omp parallel for
for (int i = 0; i <= num_steps; i++)
{
  double x = (i + 0.5) * step;
  sum += arc(x);
}

The locally scoped variable x becomes private to each thread by default

!$omp parallel do
do i = 1, num_steps
   x = (i + 0.5d0) * step
   sum = sum + arc(x)
enddo
!$omp end parallel do

All variables except i are shared by default

Page 12: openMP loop parallelization

parallel clauses

I Scoping clauses are restricted to the lexical extent

I private (list)
  I Independent variables are created for each thread

I shared (list)
  I Can be used for reading and writing by multiple threads

I reduction (operator:variable)
  I A private copy is made for each thread and the reduction operator is applied at the end of the parallel region
  I Operators: +, *, -, &, |, ^, &&, ||

I default (private|shared|none)
  I By default all variables are shared
  I Locally scoped variables in C++ are private

Page 13: openMP loop parallelization

Compute π in C++ and Fortran 90 in parallel

#pragma omp parallel for reduction(+:sum)
for (int i = 0; i <= num_steps; i++)
{
  double x = (i + 0.5) * step;
  sum += arc(x);
}

!$omp parallel do reduction(+:sum) &
!$omp private(x)
do i = 1, num_steps
   x = (i + 0.5d0) * step
   sum = sum + arc(x)
enddo
!$omp end parallel do

Page 14: openMP loop parallelization

parallel for region

>#edit and compile

>qsub pi-parallel.job

>cat pi.out

pi is probably 3.1415926556

real 0m6.458s

user 0m25.601s

sys 0m0.005s

Page 15: openMP loop parallelization

Loop parallelization

I Loops that can be parallelized will have the following
  I All assignments are made to arrays
  I Each element is assigned by at most one iteration
  I No iteration reads elements assigned by another iteration
  I Blocks must have one entrance and one exit

Page 16: openMP loop parallelization

Loop parallelization

I Loops with data dependence cannot be parallelized
  I a[i] = x*a[i-1]
  I Data dependence exists through the dynamical extent of a parallel region

I Loops with an undefined number of iterations cannot be parallelized
  I while(!converged)

I Race conditions
  I sum=sum+i
  I Many can be cured with the reduction clause

I Use control statements and divergent iterations carefully
  I goto must be within the block
  I break, continue must be within the block
  I stop and exit are allowed

Page 17: openMP loop parallelization

Loop nesting

I In nested loops, both loops cannot be parallel
  I The inner loop would not be computed in parallel since all of the threads are already used in the outer loop

for (int y = 0; y < 25; ++y)
{
  for (int x = 0; x < 80; ++x)
  {
    a[y] += b[x];
  }
}

I In this example only the outer loop can be parallelized due to data dependence of a[y]

Page 18: openMP loop parallelization

Exercises

Page 19: openMP loop parallelization

Loop parallelization

I Copy /home/sam/training/openmp/basic/
I Parallelize the loops in laplace2d, matrixvector

Page 21: openMP loop parallelization

Jacobi Iterations¹

I Iteratively converges to the correct value (e.g. temperature), by computing new values at each point from the average of neighboring points.

Ak+1(i, j) = [Ak(i − 1, j) + Ak(i + 1, j) + Ak(i, j − 1) + Ak(i, j + 1)] / 4

¹ Adapted from a recent PSC workshop

Page 22: openMP loop parallelization

Loop parallelization

I Parallelize the loops in matrixvector

>cd matrixvector

>#edit mxm.c

>module purge

>module load gcc/4.5

>CXX mxm.c -fopenmp -o matrixvector

>qsub matrixvector.job

>#inspect omp.out