Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf ·...

39
Recent Advances in Dense Matrix Computations for Two-Sided Reduction Algorithms Azzam Haidar 1 Hatem Ltaief 2 Piotr Luszczek 1 Jack Dongarra 1 1 Innovative Computing Laboratory 2 KAUST Supercomputing Laboratory London, UK, Jun 28-30, 2012 Haidar, Ltaief, Luszczek, Dongarra () λ and X PMAA 2012 1 / 39

Transcript of Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf ·...

Page 1: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Recent Advances in Dense MatrixComputations for Two-Sided Reduction

Algorithms

Azzam Haidar1 Hatem Ltaief2 Piotr Łuszczek1

Jack Dongarra1

1Innovative Computing Laboratory

2KAUST Supercomputing Laboratory

London, UK, Jun 28-30, 2012

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 1 / 39

Page 2: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 2 / 39

Page 3: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

MORSE

Matrices Over Runtime Systems at Exascale

=⇒ Mission statement: "Design dense and sparse linear algebramethods that achieve the fastest possible time to an accuratesolution on large-scale Hybrid systems".

=⇒ Separation of concerns:Algorithmic challenges to exploit the hardware capabilitiesat most.Runtime challenges due to the ever growing hardwarecomplexity.

=⇒ MORSE Algorithms: MORSE-Dense (PLASMA, MAGMA, tilealgorithms), MORSE-Sparse (PaStiX, MaPHyS), MORSE-Stencil,MORSE-FMM

=⇒ MORSE Runtimes: QUARK, StarPU, DAGuEHaidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 3 / 39

Page 4: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

MORSE

Matrices Over Runtime Systems at Exascale

=⇒ Mission statement: "Design dense and sparse linear algebramethods that achieve the fastest possible time to an accuratesolution on large-scale Hybrid systems".

=⇒ Separation of concerns:Algorithmic challenges to exploit the hardware capabilitiesat most.Runtime challenges due to the ever growing hardwarecomplexity.

=⇒ MORSE Algorithms: MORSE-Dense (PLASMA, MAGMA, tilealgorithms), MORSE-Sparse (PaStiX, MaPHyS), MORSE-Stencil,MORSE-FMM

=⇒ MORSE Runtimes: QUARK, StarPU, DAGuEHaidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 4 / 39

Page 5: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

MORSE

Matrices Over Runtime Systems at Exascale

=⇒ Mission statement: "Design dense and sparse linear algebramethods that achieve the fastest possible time to an accuratesolution on large-scale Hybrid systems".

=⇒ Separation of concerns:Algorithmic challenges to exploit the hardware capabilitiesat most.Runtime challenges due to the ever growing hardwarecomplexity.

=⇒ MORSE Algorithms: MORSE-Dense (PLASMA, MAGMA, tilealgorithms), MORSE-Sparse (PaStiX, MaPHyS), MORSE-Stencil,MORSE-FMM

=⇒ MORSE Runtimes: QUARK, StarPU, DAGuEHaidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 5 / 39

Page 6: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

MORSE

Matrices Over Runtime Systems at Exascale

=⇒ Mission statement: "Design dense and sparse linear algebramethods that achieve the fastest possible time to an accuratesolution on large-scale Hybrid systems".

=⇒ Separation of concerns:Algorithmic challenges to exploit the hardware capabilitiesat most.Runtime challenges due to the ever growing hardwarecomplexity.

=⇒ MORSE Algorithms: MORSE-Dense (PLASMA, MAGMA, tilealgorithms), MORSE-Sparse (PaStiX, MaPHyS), MORSE-Stencil,MORSE-FMM

=⇒ MORSE Runtimes: QUARK, StarPU, DAGuEHaidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 6 / 39

Page 7: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

MORSE

Matrices Over Runtime Systems at Exascale

=⇒ Mission statement: "Design dense and sparse linear algebramethods that achieve the fastest possible time to an accuratesolution on large-scale Hybrid systems".

=⇒ Separation of concerns:Algorithmic challenges to exploit the hardware capabilitiesat most.Runtime challenges due to the ever growing hardwarecomplexity.

=⇒ MORSE Algorithms: MORSE-Dense (PLASMA, MAGMA, tilealgorithms), MORSE-Sparse (PaStiX, MaPHyS), MORSE-Stencil,MORSE-FMM

=⇒ MORSE Runtimes: QUARK, StarPU, DAGuEHaidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 7 / 39

Page 8: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Motivation

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 8 / 39

Page 9: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Motivation Time Breakdowns

Time Breakdown for TRD and Just Eig-Values (QR)

0

20

40

60

80

100

120

200 400 600 800 1000 1200 1400 1600 1800 2000

Fra

ctio

n o

f to

tal tim

e [

%]

Matrix size

ReductionImplicit QL/QR Iteration

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 9 / 39

Page 10: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Motivation Time Breakdowns

Time Breakdown for TRD and Just Eig-Vals (D&C)

0

20

40

60

80

100

120

200 400 600 800 1000 1200 1400 1600 1800 2000

Fra

ctio

n o

f to

tal tim

e [

%]

Matrix size

ReductionDivide and Conquer Iteration

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 10 / 39

Page 11: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Motivation Time Breakdowns

Time Breakdown: TRD, Eig-Values, and Eig-Vectors

0

20

40

60

80

100

200 400 600 800 1000 1200 1400 1600 1800 2000

Fra

ctio

n o

f to

tal tim

e [

%]

Matrix size

ReductionDivide and Conquer Iteration

Divide and Conquer Backtransformation

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 11 / 39

Page 12: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 12 / 39

Page 13: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Block Algorithms

Block Algorithms

Panel-Update SequenceTransformations are blocked/accumulated within the Panel(Level 2 BLAS)Transformations applied at once on the trailing submatrix(Level 3 BLAS)Parallelism hidden inside the BLASFork-join Model

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 13 / 39

Page 14: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Block Algorithms

Block Algorithms

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 14 / 39

Page 15: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Block Algorithms

Block Algorithms

0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

5

10

Gflo

p/s

Matrix Sizes0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

0

5

x 107

TLB

Mis

ses

Figure: Performance evaluation and TLB miss analysis of the one-stageLAPACK TRD algorithm with optimized Intel MKL BLAS, on a dual-socketquad-core Intel Xeon (8 cores total).

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 15 / 39

Page 16: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Block Algorithms

Block Algorithms

Panel computation involves the entire trailing submatrixPerformance are impeded by memory-bound nature of thepanelReductions achieved through one-stage approach2-sided reductions (TRD, BRD, HRD) more challenging than1-sided factorizations (QR, LU, Cholesky)

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 16 / 39

Page 17: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Tile Algorithms

Tile Algorithms

Parallelism is brought to the foreTile data layout translationMay require the redesign of linear algebra algorithmsRemove unnecessary synchronization points betweenPanel-Update sequencesDAG execution where nodes represent tasks and edgesdefine dependencies between themDynamic runtime system environment

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 17 / 39

Page 18: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Block and Tile Algorithms Tile Algorithms

Tile Algorithms

Figure: Translation from LAPACK Layout (column-major) to Tile DataLayout

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 18 / 39

Page 19: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 19 / 39

Page 20: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage I: Band Reduction

Stage I: Band Reduction

Tile algorithm running on top of tile data layoutRely on high performant compute-intensive kernelsComposed by successive calls to Level 3 BLAS operationsDerived from QR factorization kernelsHandle cautiously the symmetric structure of the matrix

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 20 / 39

Page 21: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage II: Bulge Chasing

Stage II: Bulge Chasing

Further reduce the band tridiagonal matrix to the finaltridiagonal formAlgorithm proceeds by column-wise annihilationEach column annihilation (or sweep) creates a bulge, whichneeds to be chased down to the bottom right corner of thematrixIf N is the matrix size, (N-2) sweeps are required to achievethe tridiagonal structure.Rely on Level 2 BLAS kernelsHighly memory-bound operations: the whole matrix needsto be traversed to annihilate a single column.

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 21 / 39

Page 22: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage II: Bulge Chasing

Stage II: Bulge Chasing

dtsqr2(0) dtslq3(0) dtsqr3(0) dtslq3(0) dtsqr3(0) dtslq3(0) dtsqr3(0)

dtslq3(0) dtsqr2(1) dtslq3(1) dtsqr3(1) dtslq3(1) dtsqr3(1) dtslq3(1)

dtsqr3(1) dtslq3(1) dtsqr2(2) dtslq3(2) dtsqr3(2) dtslq3(2) dtsqr3(2)

dtslq3(2) dlarfx(2) dtsqr2(3) dtslq3(3) dtsqr3(3) dtslq3(3) dtsqr3(3)

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 22 / 39

Page 23: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage II: Bulge Chasing

Stage II: Bulge Chasing ZigZag

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 23 / 39

Page 24: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage II: Bulge Chasing

Runtime Translation from Column-major to Tile: DTL

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 24 / 39

Page 25: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Two-Stage Approach Stage II: Bulge Chasing

Stage II: Bulge Chasing

Column-major algorithm running on top of column-majordata layoutData layout mismatch between both stagesNeed an abstraction layer to reconcile both stage layouts.

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 25 / 39

Page 26: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Data Translation Layer

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 26 / 39

Page 27: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Data Translation Layer

Data Translation Layer

Pro:Allows writing algorithm using column-major layout

Especially important for bulge chasing which has “shift bycolumn” data access

Tracks and relays dependences automatically to theruntime scheduler.

Con:Still requires the user to minimize data access area for eachkernel call.

Minimizing dependences minimizes scheduling overhead andenables more parallelism.

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 27 / 39

Page 28: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Tuning the Tile Size

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 28 / 39

Page 29: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Tuning the Tile Size

More Detailed Look at Tile Size (NB) Influence

0

10

20

30

40

50

60

0 20 40 60 80 100 120 140 160 180

Tota

l ru

nn

ing

tim

e [

sec

on

ds]

Tile size (NB)

1st Stage2nd Stage

Both Stages Combined

N=5040

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 29 / 39

Page 30: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Dynamic Scheduling: QUARK

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 30 / 39

Page 31: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Dynamic Scheduling: QUARK

Dynamic Scheduling: QUARK

Conceptually similar to out-of-order processor scheduling

because it has:Dynamic runtime DAG schedulerOut-of-order execution flow of tasksTask scheduling as soon as dependencies are satisfiedOverlapping of operations from both stages

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 31 / 39

Page 32: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Dynamic Scheduling: QUARK

Dynamic Scheduling: QUARK

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 32 / 39

Page 33: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Performance Results

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 33 / 39

Page 34: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Performance Results

TRD Performance Results

2k 3k 4k 6k 8k 10k 12k 14k 16k 18k 20k 22k 24k0

10

20

30

40

50

60

70

80

90

100

110

120

130

140

Matrix size

Gflo

p/s

PLASMA DSYTRDPLASMA DSYTRD w/o groupingMKL SBR DSYRDBSBR toolkit DSYRDDMKL DSYTRDMKL Scalapack PDSYTRDLPK reference DSYTRD

50X

12X

A. Haidar, H. Ltaief and J. Dongarra (SC’11)Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 34 / 39

Page 35: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Performance Results

Speedup: TRD + Eigenvalue

2k3k4k 6k 8k 10k 12k 14k 16k 18k 20k 22k 24k0

1

2

3

4

5

6

7

8

9

10DSYEV eigenvalue only 48 threads

Matrix size

Tim

e(M

KL

)/T

ime

(PL

AS

MA

)

MKL_DSYEV/PLASMA_DSYEV

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 35 / 39

Page 36: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Performance Results

Speedup: TRD + Eigenvalue + Eigenvector

2k 3k 4k 6k 8k 10k 12k 14k 16k 18k 20k 22k 24k0.6

0.8

1

1.2

1.4

1.6

1.8

2DSYEV full−eigenvectors 48 threads

Matrix size

Tim

e(M

KL

)/T

ime(P

LA

SM

A)

MKL_DSYEV/PLASMA_DSYEV

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 36 / 39

Page 37: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Future Work

Outline

1 MotivationTime Breakdowns

2 Block and Tile AlgorithmsBlock AlgorithmsTile Algorithms

3 Two-Stage ApproachStage I: Band ReductionStage II: Bulge Chasing

4 Data Translation Layer5 Tuning the Tile Size6 Dynamic Scheduling: QUARK

7 Performance Results8 Future Work

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 37 / 39

Page 38: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Future Work

Future Work

Evaluating the performance in the case where only aneigenvalue subset is needed.Extension to the Hessenberg reduction (matrix sign function)Algorithm favorable for running on heterogeneoushardwares (e.g., StarPU w/ multiple GPUs)Implementation on Distributed Environment (DAGuE)

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 38 / 39

Page 39: Recent Advances in Dense Matrix Computations for Two-Sided ...luszczek/pubs/pmaa12twosided.pdf · Matrices Over Runtime Systems at Exascale =) Mission statement: "Design dense and

Thank you!

Thank you!

شكرا !

Haidar, Ltaief, Łuszczek, Dongarra () λ and X PMAA 2012 39 / 39