Opportunities for ELPA to Accelerate the Solution of the ...

40
Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners:

Transcript of Opportunities for ELPA to Accelerate the Solution of the ...

Page 1: Opportunities for ELPA to Accelerate the Solution of the ...

Opportunities for ELPA to Accelerate theSolution of the Bethe-Salpeter Eigenvalue

ProblemPeter Benner, Andreas Marek, Carolin Penke

August 16, 2018

ELSI Workshop 2018

Partners:

Page 2: Opportunities for ELPA to Accelerate the Solution of the ...

The Problem

The Bethe-Salpeter Eigenvalue Problem

Find eigenvalues, right and left eigenvectors for

HBSx = λx with

HBS =

[A B−B −A

]=

[A B−BH −AT

],

A = AH , B = BT ∈ Cn×n are dense.

Comes up in quantum chemical simulations.

n is proportional to n2e where ne is the number of electrons in the

system.

→ n can become very large (> 50, 000)

→ Parallel and scalable algorithms running on supercomputers necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

Page 3: Opportunities for ELPA to Accelerate the Solution of the ...

The Problem

The Bethe-Salpeter Eigenvalue Problem

Find eigenvalues, right and left eigenvectors for

HBSx = λx with

HBS =

[A B−B −A

]=

[A B−BH −AT

],

A = AH , B = BT ∈ Cn×n are dense.

Comes up in quantum chemical simulations.

n is proportional to n2e where ne is the number of electrons in the

system.

→ n can become very large (> 50, 000)

→ Parallel and scalable algorithms running on supercomputers necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

Page 4: Opportunities for ELPA to Accelerate the Solution of the ...

The Problem

The Bethe-Salpeter Eigenvalue Problem

Find eigenvalues, right and left eigenvectors for

HBSx = λx with

HBS =

[A B−B −A

]=

[A B−BH −AT

],

A = AH , B = BT ∈ Cn×n are dense.

Comes up in quantum chemical simulations.

n is proportional to n2e where ne is the number of electrons in the

system.

→ n can become very large (> 50, 000)

→ Parallel and scalable algorithms running on supercomputers necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

Page 5: Opportunities for ELPA to Accelerate the Solution of the ...

Applications in Quantum Chemistry

Eigenvalues λi , right eigenvectors xi and left eigenvectors yi are used tocompute

Spectral Density

φ(ω) =1

2n

2n∑j=1

δ(ω − λj),

Optical Absorption Spectrum

ε+(ω) =n∑

j=1

(dHr xj)(yHj dl)

yHj xjδ(ω − λj).

→ We are interested in all (or most) eigenpairs.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 3/25

Page 6: Opportunities for ELPA to Accelerate the Solution of the ...

Applications in Quantum Chemistry

Eigenvalues λi , right eigenvectors xi and left eigenvectors yi are used tocompute

Spectral Density

φ(ω) =1

2n

2n∑j=1

δ(ω − λj),

Optical Absorption Spectrum

ε+(ω) =n∑

j=1

(dHr xj)(yHj dl)

yHj xjδ(ω − λj).

→ We are interested in all (or most) eigenpairs.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 3/25

Page 7: Opportunities for ELPA to Accelerate the Solution of the ...

Properties of (definite) Bethe-Salpeter EVP

Due to the special structure eigenvalues appear in quadruples(λ,−λ, λ,−λ).

[Benner, Faßbender, Yang ’14/’18]

R

iR

HBS is called definite if

[In 00 −In

]HBS =

[A BB A

]> 0, holds in many

physical settings. Here eigenvalues come in real pairs and it holds

Theorem [Shao, da Jornada, Yang, Deslippe, Louie ’15]

There exist X1,X2 ∈ Cn×n, and λ1, . . . , λn ∈ R+, Λ+ = diag{λ1, . . . , λn}, s.t.

HBSX = X

[Λ+

−Λ+

], Y HHBS =

[Λ+

−Λ+

]Y H , Y HX = I2n

where X =

[X1 X2

X2 X1

], Y =

[X1 −X2

−X2 X1

].

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 4/25

Page 8: Opportunities for ELPA to Accelerate the Solution of the ...

Properties of (definite) Bethe-Salpeter EVP

Due to the special structure eigenvalues appear in quadruples(λ,−λ, λ,−λ).

[Benner, Faßbender, Yang ’14/’18]

R

iR

HBS is called definite if

[In 00 −In

]HBS =

[A BB A

]> 0, holds in many

physical settings. Here eigenvalues come in real pairs and it holds

Theorem [Shao, da Jornada, Yang, Deslippe, Louie ’15]

There exist X1,X2 ∈ Cn×n, and λ1, . . . , λn ∈ R+, Λ+ = diag{λ1, . . . , λn}, s.t.

HBSX = X

[Λ+

−Λ+

], Y HHBS =

[Λ+

−Λ+

]Y H , Y HX = I2n

where X =

[X1 X2

X2 X1

], Y =

[X1 −X2

−X2 X1

].

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 4/25

Page 9: Opportunities for ELPA to Accelerate the Solution of the ...

A Structure-Preserving Method

[Shao, da Jornada, Yang, Deslippe, Louie ’15]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

A direct method is implemented in BSEPACK1. The structure-preservingacquisition of all eigenpairs in high precision relies on a connection toHamiltonian matrices.

Theorem

Let Q = 1√2

[In −iInIn iIn

], then

QH

[A B−B −A

]Q = i

[Im(A + B) −Re(A− B)Re(A + B) Im(A− B)

]=: iH,

where H is real Hamiltonian, i.e. JH = (JH)T with J =

[0 I−I 0

].

1https://sites.google.com/a/lbl.gov/bsepack/

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 5/25

Page 10: Opportunities for ELPA to Accelerate the Solution of the ...

The resulting algorithm

Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem

Require: A = AH ,B = BT ∈ Cn×n, s.t.

[In 00 −In

]HBS =

[A BB A

]> 0

Ensure: X1,X2 ∈ Cn×n and Λ+ = diag{λ1, . . . , λn} satisfying H

[X1

X2

]=

[X1

X2

]Λ+.

1: Construct M =

[Re(A + B) Im(A− B)−Im(A + B) Re(A− B)

]2: Compute the Cholesky factorization M = LLT

3: Construct W = LT[

0 In−In 0

]L

4: Compute the spectral decomposition −iW =[Z+ Z−

]diag{Λ+,−Λ+}

[Z+ Z−

]H.

5: Set

[X1

X2

]=

[In 00 −In

]QLZ+Λ

−1/2+

Main workload: Solve a strictly imaginary Hermitian eigenvalueproblem.

→ Imanginary part is skew-symmetric.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

Page 11: Opportunities for ELPA to Accelerate the Solution of the ...

The resulting algorithm

Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem

Require: A = AH ,B = BT ∈ Cn×n, s.t.

[In 00 −In

]HBS =

[A BB A

]> 0

Ensure: X1,X2 ∈ Cn×n and Λ+ = diag{λ1, . . . , λn} satisfying H

[X1

X2

]=

[X1

X2

]Λ+.

1: Construct M =

[Re(A + B) Im(A− B)−Im(A + B) Re(A− B)

]2: Compute the Cholesky factorization M = LLT

3: Construct W = LT[

0 In−In 0

]L

4: Compute the spectral decomposition −iW =[Z+ Z−

]diag{Λ+,−Λ+}

[Z+ Z−

]H.

5: Set

[X1

X2

]=

[In 00 −In

]QLZ+Λ

−1/2+

Main workload: Solve a strictly imaginary Hermitian eigenvalueproblem.

→ Imanginary part is skew-symmetric.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

Page 12: Opportunities for ELPA to Accelerate the Solution of the ...

The resulting algorithm

Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem

Require: A = AH ,B = BT ∈ Cn×n, s.t.

[In 00 −In

]HBS =

[A BB A

]> 0

Ensure: X1,X2 ∈ Cn×n and Λ+ = diag{λ1, . . . , λn} satisfying H

[X1

X2

]=

[X1

X2

]Λ+.

1: Construct M =

[Re(A + B) Im(A− B)−Im(A + B) Re(A− B)

]2: Compute the Cholesky factorization M = LLT

3: Construct W = LT[

0 In−In 0

]L

4: Compute the spectral decomposition −iW =[Z+ Z−

]diag{Λ+,−Λ+}

[Z+ Z−

]H.

5: Set

[X1

X2

]=

[In 00 −In

]QLZ+Λ

−1/2+

Main workload: Solve a strictly imaginary Hermitian eigenvalueproblem.

→ Imanginary part is skew-symmetric.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

Page 13: Opportunities for ELPA to Accelerate the Solution of the ...

Implementation

W is skew-symmetric

⇒ can be reduced to tridiagonal form (e.g via Householder transformations):

UWUT = T =

0 α1

−α1 0. . .

. . .. . . α2n−2

−α2n−2 0

Can be transformed to symmetric form using D = diag{1, i , i2, . . . , i2n}

−iDH

0 α1

−α1 0. . .

. . .. . . α2n−2

−α2n−2 0

D =

0 α1

α1 0. . .

. . .. . . α2n−2

α2n−2 0

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 7/25

Page 14: Opportunities for ELPA to Accelerate the Solution of the ...

Implementation

W is skew-symmetric

⇒ can be reduced to tridiagonal form (e.g via Householder transformations):

UWUT = T =

0 α1

−α1 0. . .

. . .. . . α2n−2

−α2n−2 0

Can be transformed to symmetric form using D = diag{1, i , i2, . . . , i2n}

−iDH

0 α1

−α1 0. . .

. . .. . . α2n−2

−α2n−2 0

D =

0 α1

α1 0. . .

. . .. . . α2n−2

α2n−2 0

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 7/25

Page 15: Opportunities for ELPA to Accelerate the Solution of the ...

Implementation

Reduction to tridiagonal form by editing ScaLAPACK’s routine fortridiagonal reduction of symmetric matrices:

PDSYTRD → PDSSTRD

Solve symmetric tridiagonal EVP via

1. Bisection method (PDSTEBZ)2. Inverse Iteration (PDSTEIN).

Back transformation of eigenvectors:[X1

X2

]=

[In 00 −In

]QLUDV+Λ

−1/2+ ,

where all matrices but D = diag{1, i , i2 . . . , i2n} are real.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

Page 16: Opportunities for ELPA to Accelerate the Solution of the ...

Implementation

Reduction to tridiagonal form by editing ScaLAPACK’s routine fortridiagonal reduction of symmetric matrices:

PDSYTRD → PDSSTRD

Solve symmetric tridiagonal EVP via

1. Bisection method (PDSTEBZ)2. Inverse Iteration (PDSTEIN).

Back transformation of eigenvectors:[X1

X2

]=

[In 00 −In

]QLUDV+Λ

−1/2+ ,

where all matrices but D = diag{1, i , i2 . . . , i2n} are real.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

Page 17: Opportunities for ELPA to Accelerate the Solution of the ...

Implementation

Reduction to tridiagonal form by editing ScaLAPACK’s routine fortridiagonal reduction of symmetric matrices:

PDSYTRD → PDSSTRD

Solve symmetric tridiagonal EVP via

1. Bisection method (PDSTEBZ)2. Inverse Iteration (PDSTEIN).

Back transformation of eigenvectors:[X1

X2

]=

[In 00 −In

]QLUDV+Λ

−1/2+ ,

where all matrices but D = diag{1, i , i2 . . . , i2n} are real.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

Page 18: Opportunities for ELPA to Accelerate the Solution of the ...

The ELPA Library

BSEPACK is just proof-of-concept, not performance optimized.

→ Room for improvement by using better libraries!

The ELPA Project2

Eigenvalue SoLvers for Petaflop-ApplicationsThe publicly available ELPA library provides highly efficient and highlyscalable direct eigensolvers for symmetric matrices. Though especiallydesigned for use for PetaFlop/s applications solving large problem sizes onmassively parallel supercomputers, ELPA eigensolvers have proven to bealso very efficient for smaller matrices.

Mainly developed at Max Planck Computing and Data Facility(MPCDF) in Garching.

Original application area: electronic structure calculations.

2https://elpa.mpcdf.mpg.de/

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 9/25

Page 19: Opportunities for ELPA to Accelerate the Solution of the ...

Tridiagonalisation in ELPA

Figure: ELPA employs a two-step tridiagonalisation for symmetric or Hermitianmatrices. [Marek et al. ’14]

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 10/25

Page 20: Opportunities for ELPA to Accelerate the Solution of the ...

ELPA vs. ScaLAPACK

BLAS lvl. 3 routines for redution to banded form.

→ More data locality, less communication, highly efficient GEMM routines!

Carefully crafted communication patterns in reduction to tridiagonalform and eigenvector back transformation.

Solution of Tridiagonal Systems: Divide-and-Conquer instead ofBisection Method and Inverse Iteration.

OpenMP and GPU support.

Better Performance and Scalability!

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 11/25

Page 21: Opportunities for ELPA to Accelerate the Solution of the ...

Using ELPA to enhance BSEPACK

Two promising points of attack for ELPA:

1. Diagonalize complex Hermitian matrix −iW = ZΛZH using ELPA

+ Main portion of the workload could benefit from performance andscalability of ELPA.

− Uses complex arithmetic, while BSEPACK mainly work on real data.

2. Extend the ELPA algorithm for skew-symmetric matrices, just asScaLAPACK was extended in BSEPACK.

+ Easy from mathematical point of view− Major revision of ELPA software stack necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 12/25

Page 22: Opportunities for ELPA to Accelerate the Solution of the ...

Using ELPA to enhance BSEPACK

Two promising points of attack for ELPA:

1. Diagonalize complex Hermitian matrix −iW = ZΛZH using ELPA

+ Main portion of the workload could benefit from performance andscalability of ELPA.

− Uses complex arithmetic, while BSEPACK mainly work on real data.

2. Extend the ELPA algorithm for skew-symmetric matrices, just asScaLAPACK was extended in BSEPACK.

+ Easy from mathematical point of view− Major revision of ELPA software stack necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 12/25

Page 23: Opportunities for ELPA to Accelerate the Solution of the ...

Numerical Test Setup

System DRACO

Located at Max Planck Computingand Data Facility

HPC Extension to HPC SystemHYDRA

880 nodes, Intel ’Haswell’ XeonE5-2698, 32 cores @ 2.3 GHz

128 GB main memory per node

Interconnect: fast InfiniBand FDR14network

http://www.mpcdf.mpg.de/services/computing/draco/about-the-system

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 13/25

Page 24: Opportunities for ELPA to Accelerate the Solution of the ...

A ∈ Cn×n and B ∈ Cn×n where initialized with random complexvalues.

Diagonal of A positive and scaled up, s.t.

[A BB A

]is positive definite

(Gershgorin Circle Theorem).

To work on a matrix on a distributed memory machine, it is dividedinto many subblocks of a certain block sizes nb × nb.

Numerical Tests:

1. Test impact of block size on performance for n = 25, 000.

2. Test strong scalability for medium sized matrices n = 25, 000.

3. Compare runtimes for larger matrices up to n = 75, 000.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 14/25

Page 25: Opportunities for ELPA to Accelerate the Solution of the ...

A ∈ Cn×n and B ∈ Cn×n where initialized with random complexvalues.

Diagonal of A positive and scaled up, s.t.

[A BB A

]is positive definite

(Gershgorin Circle Theorem).

To work on a matrix on a distributed memory machine, it is dividedinto many subblocks of a certain block sizes nb × nb.

Numerical Tests:

1. Test impact of block size on performance for n = 25, 000.

2. Test strong scalability for medium sized matrices n = 25, 000.

3. Compare runtimes for larger matrices up to n = 75, 000.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 14/25

Page 26: Opportunities for ELPA to Accelerate the Solution of the ...

Block Size in BSEPACK and BSEPACK+ELPA

0 50 100 150 200 250 3000

1,000

2,000

3,000

4,000

5,000

Block Size nb

Ru

nti

mes

ins

BSEPACK

BSEPACK+ELPA

Figure: Runtimes for different block sizes at n = 25, 000.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 15/25

Page 27: Opportunities for ELPA to Accelerate the Solution of the ...

Scalability of BSEPACK and BSEPACK+ELPA

0 5 10 15 20 25 30 350

2,000

4,000

6,000

Number of Nodes

Ru

nti

me

ins

BSEPACK, nb = 64

BSEPACK nb = 256

BSEPACK + ELPA, nb = 64

Figure: Strong Scalability for medium sized matrix (n = 25, 000) in BSEPACK andBSEPACK enhanced with ELPA. 32 threads were used per node.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 16/25

Page 28: Opportunities for ELPA to Accelerate the Solution of the ...

Scalability of BSEPACK and BSEPACK+ELPA

21 22 23 24 25

102

103

104

Number of Nodes

Ru

nti

me

ins

BSEPACK

Tridiagonalization (ScaLAPACK)

BSEPACK + ELPA

Hermitian Solve (ELPA)

Perfect Scaling

Figure: Strong Scalability for medium sized matrix (n = 25, 000) in BSEPACK andBSEPACK enhanced with ELPA. 32 threads were used per node.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 17/25

Page 29: Opportunities for ELPA to Accelerate the Solution of the ...

Runtimes for large matrices

1 2 3 4 5 6 7 8

·104

0

2,000

4,000

6,000

Size of submatrices A,B ∈ Cn×n

Ru

nti

me

ins

BSEPACK, nb = 64

BSEPACK, nb = 256

BSEPACK + ELPA,nb = 64

Figure: Runtimes of BSEPACK and BSEPACK+ELPA for large matrices of size2n × 2n.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 18/25

Page 30: Opportunities for ELPA to Accelerate the Solution of the ...

Runtimes for large matrices

10,000 15,000 25,000 40,000 65,000

102

103

104

Size of submatrices A,B ∈ Cn×n

Ru

nti

me

ins

BSEPACK nb = 64

BSEPACK nb = 256

BSEPACK + ELPA

Exponent 2.79

Exponent 2.40

Exponent 2.64

Figure: Runtimes of BSEPACK and BSEPACK+ELPA for large matrices of size2n × 2n.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 19/25

Page 31: Opportunities for ELPA to Accelerate the Solution of the ...

Using ELPA to enhance BSEPACK

Two promising points of attack for ELPA:

1. Diagonalize complex Hermitian matrix −iW = ZΛZH using ELPA

+ Main portion of the workload could benefit from performance andscalability of ELPA.

− Uses complex arithmetic, while BSEPACK mainly work on real data.

2. Extend the ELPA algorithm for skew-symmetric matrices, just asScaLAPACK was extended in BSEPACK.

+ Easy from mathematical point of view− Major revision of ELPA software stack necessary.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 20/25

Page 32: Opportunities for ELPA to Accelerate the Solution of the ...

ELPA for Skew-Symmetric MatricesStays the same:

Computation of Householder vectors.

Application to off-diagonal blocks.

Tridiagonal solve.

Most of the back transformation of eigenvectors.

All communication and synchronizations.

Change wherever symmetry is implicitly assumed:

Updates on blocks including the diagonal:

A− (I − τvvT )A(I − τvvT ) = A− v (τvTA− 0.5τ2vTAvvT )︸ ︷︷ ︸uT1

− (Aτv − 0.5τ2vvTAv)︸ ︷︷ ︸u2

vT

A = AT ⇒ u2 = u1, A = −AT ⇒ u2 = −u1

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 21/25

Page 33: Opportunities for ELPA to Accelerate the Solution of the ...

ELPA for Skew-Symmetric MatricesStays the same:

Computation of Householder vectors.

Application to off-diagonal blocks.

Tridiagonal solve.

Most of the back transformation of eigenvectors.

All communication and synchronizations.

Change wherever symmetry is implicitly assumed:

Updates on blocks including the diagonal:

A− (I − τvvT )A(I − τvvT ) = A− v (τvTA− 0.5τ2vTAvvT )︸ ︷︷ ︸uT1

− (Aτv − 0.5τ2vvTAv)︸ ︷︷ ︸u2

vT

A = AT ⇒ u2 = u1, A = −AT ⇒ u2 = −u1

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 21/25

Page 34: Opportunities for ELPA to Accelerate the Solution of the ...

ELPA for Skew-Symmetric Matrices

Use skew-symmetric BLAS routines provided by BSEPACK.

DSYR2 → DSSR2

DSYMV → DSSMV

Multiplication with D = diag{1, i , i2, . . . , i2n} in back transformation.

Preliminary results from a smaller compute server.

Bruno

2 Intel ’Haswell’ Xeon E5-2640v3, 8 cores each, @2.6GHz

32 GB main memory per CPU

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 22/25

Page 35: Opportunities for ELPA to Accelerate the Solution of the ...

ELPA for Skew-Symmetric Matrices

Use skew-symmetric BLAS routines provided by BSEPACK.

DSYR2 → DSSR2

DSYMV → DSSMV

Multiplication with D = diag{1, i , i2, . . . , i2n} in back transformation.

Preliminary results from a smaller compute server.

Bruno

2 Intel ’Haswell’ Xeon E5-2640v3, 8 cores each, @2.6GHz

32 GB main memory per CPU

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 22/25

Page 36: Opportunities for ELPA to Accelerate the Solution of the ...

Runtimes for large matrices

0 0.2 0.4 0.6 0.8 1 1.2 1.4

·104

0

500

1,000

1,500

2,000

2,500

Size of submatrices A,B ∈ Cn×n

Ru

nti

me

ins

BSEPACK, nb = 64

BSEPACK + complex ELPA, nb = 64

BSEPACK + skew-symmetric ELPA, nb = 64

Figure: Preliminary results matrices of size 2n × 2n running on 16 cores.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 23/25

Page 37: Opportunities for ELPA to Accelerate the Solution of the ...

Easy further improvements:

Rank-2 Update becomes easier for skew-symmetric matrices:

A− (I − τvvT )A(I − τvvT ) = A− v(τvTA− 0.5τ2 vTAv︸ ︷︷ ︸=0

vT )

− (Aτv − 0.5τ2v vTAv︸ ︷︷ ︸=0

)vT

Eigenvector resulting from tridiagonal solve have to be multiplied bycomplex diagonal D before back transformation.

→ Instead of applying Householder transformation directly, we applythem to identity matrix and then multiply.

Easier to implement but more FLOPs.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 24/25

Page 38: Opportunities for ELPA to Accelerate the Solution of the ...

Future Work

ELPA for skew-symmetric matrices on production level (includingOpenMP and GPU support)

HPC implementations of algorithms for Hamiltonian matrices.

Structure preserving Divide-and-Conquer scheme based on the matrixdisk function / doubling algorithm [Bai, Demmel, Gu ’97] andpermuted Lagrangian Graph Bases [Mehrmann, Poloni ’12].

Thank you for your attention!

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25

Page 39: Opportunities for ELPA to Accelerate the Solution of the ...

The ELPA library can be found athttps://elpa.mpcdf.mpg.de/

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25

Page 40: Opportunities for ELPA to Accelerate the Solution of the ...

M. Shao, F. H. da Jornada, C. Yang, J. Deslippe, and S. G. Louie.Structure preserving parallel algorithms for solving the Bethe-Salpeter eigenvalue problem.Linear Algebra and its Applications, 488(Supplement C):148 – 167, 2016.

P. Benner, H. Faßbender, and C. Yang.Some remarks on the complex J-symmetric eigenproblem.Preprint MPIMD/15-12, Max Planck Institute Magdeburg, July 2015.Available from http://www.mpi-magdeburg.mpg.de/preprints/.

T. Auckenthaler, V. Blum, H.-J. Bungartz, T. Huckle, R. Johanni, L. Krmer, B. Lang, H. Lederer, and P.R. Willems.Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations.Parallel Computing, 37(12):783 – 794, 2011.6th International Workshop on Parallel Matrix Algorithms and Applications (PMAA’10).

Z. Bai, J. Demmel, and M. Gu.An Inverse Free Parallel Spectral Divide and Conquer Algorithm for Nonsymmetric Eigenproblems.76(3):279–308, 1997.

Peter Benner, Heike Fabender, and Chao Yang.Some remarks on the complex J-symmetric eigenproblem.Linear Algebra and its Applications, 544:407 – 442, 2018.

V. Mehrmann and F. Poloni.Doubling algorithms with permuted Lagrangian graph bases.SIAM J. Matrix Anal. Appl., 33(3):780–805, 2012.

[email protected] C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25