

Intception: Automatic generation of code for the evaluation of molecular integrals

J.C. Womack and F.R. Manby

Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol, BS8 1TS

Molecular integrals

The need to evaluate integrals over electronic coordinates is a common feature of methods which approximately solve the Schrödinger equation for molecules:

$$E = \frac{\langle\Psi|\hat{H}|\Psi\rangle}{\langle\Psi|\Psi\rangle}$$

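In these electronic structure methods, the molecular wavefunction, |Ψ〉, is typically expressed in a basis of one-electron functions, leading to integrals over one- and two-electron coordinates, e.g.

$$\langle i|\hat{f}|j\rangle = \int d\mathbf{r}\, \psi_i^*(\mathbf{r})\, \hat{f}\, \psi_j(\mathbf{r})$$

$$(ia|jb) = \int d\mathbf{r}_1\, d\mathbf{r}_2\, \psi_i^*(\mathbf{r}_1)\, \psi_j^*(\mathbf{r}_2)\, r_{12}^{-1}\, \psi_a(\mathbf{r}_1)\, \psi_b(\mathbf{r}_2)$$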

Evaluation of molecular integrals is computationally intensive; in some methods it is the most expensive step in a calculation. The computational implementation of electronic structure methods therefore requires the development of efficient molecular integral evaluation code.

Evaluating molecular integrals

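For molecular calculations, contracted Gaussian-type orbitals (GTOs) are often employed, constructed from fixed linear combinations of primitive Gaussian functions, e.g.

$$|\mathbf{a}) \equiv \phi_a(\mathbf{r}; \mathbf{A}, \mathbf{a}) = \sum_m^{K} d_{ma}\, g(\mathbf{r}; \zeta_m, \mathbf{A}, \mathbf{a})$$

$$|\mathbf{a}_m) \equiv g(\mathbf{r}; \zeta_m, \mathbf{A}, \mathbf{a}) = (x - A_x)^{a_x} (y - A_y)^{a_y} (z - A_z)^{a_z} \exp(-\zeta_m |\mathbf{r} - \mathbf{A}|^2)$$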

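with centre $\mathbf{A}$, angular momentum vector $\mathbf{a}$, total angular momentum $l_a = a_x + a_y + a_z$, and per-primitive exponent $\zeta_m$. Integrals over contracted GTOs are obtained by contraction (and spherical transformation) of integrals over primitive Cartesian Gaussians, e.g. for 2-index overlap:

$$(\mathbf{a}|\mathbf{b}) = \sum_m^{K_a} \sum_n^{K_b} d_{ma}\, d_{nb}\, (\mathbf{a}_m|\mathbf{b}_n), \qquad (\mathbf{a}_m|\mathbf{b}_n) = \int d\mathbf{r}\, g(\mathbf{r}; \zeta_m, \mathbf{A}, \mathbf{a})\, g(\mathbf{r}; \zeta_n, \mathbf{B}, \mathbf{b})$$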
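The contraction step can be sketched in a few lines of Python. This is a minimal illustration, not Intception output: it is restricted to s-type (l = 0) functions, uses unnormalised primitives, and assumes the standard Gaussian product definitions ζ = ζ_m + ζ_n and ξ = ζ_m ζ_n / ζ for the closed-form primitive overlap. The function names are hypothetical.

```python
import math

def primitive_s_overlap(zeta_m, zeta_n, A, B):
    """Overlap of two unnormalised s-type primitive Gaussians.

    Assumes zeta = zeta_m + zeta_n and xi = zeta_m * zeta_n / zeta
    (standard Gaussian product theorem; not spelled out in the poster).
    """
    zeta = zeta_m + zeta_n
    xi = zeta_m * zeta_n / zeta
    r_ab2 = sum((a - b) ** 2 for a, b in zip(A, B))
    return (math.pi / zeta) ** 1.5 * math.exp(-xi * r_ab2)

def contracted_s_overlap(exps_a, coefs_a, A, exps_b, coefs_b, B):
    """Double sum over primitives weighted by contraction coefficients."""
    total = 0.0
    for zeta_m, d_ma in zip(exps_a, coefs_a):
        for zeta_n, d_nb in zip(exps_b, coefs_b):
            total += d_ma * d_nb * primitive_s_overlap(zeta_m, zeta_n, A, B)
    return total
```

The cost of the double loop grows as K_a × K_b per contracted integral, which is one reason the ordering of contraction relative to the recurrence steps matters.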

To obtain primitive integrals with higher angular momentum, l, it is generally only necessary to explicitly evaluate the l = 0 case, e.g.

$$(\mathbf{0}_A|\mathbf{0}_B) = (\pi/\zeta)^{3/2} \exp(-\xi |\mathbf{A} - \mathbf{B}|^2)$$

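Integrals over primitive Cartesian Gaussians with higher l can then be obtained using (vertical) recurrence relations (VRRs) [1], e.g.

$$(\mathbf{a} + \mathbf{1}_i|\mathbf{b}) = PA_i\, (\mathbf{a}|\mathbf{b}) + \frac{a_i}{2\zeta} (\mathbf{a} - \mathbf{1}_i|\mathbf{b}) + \frac{b_i}{2\zeta} (\mathbf{a}|\mathbf{b} - \mathbf{1}_i)$$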
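As an illustration of how such a recurrence is applied, the following Python sketch builds primitive overlap integrals from the l = 0 case. It relies on assumptions not stated on the poster: the primitive overlap factorises into x, y and z components, ζ = ζ_a + ζ_b, ξ = ζ_a ζ_b / ζ, P = (ζ_a A + ζ_b B) / ζ, PA_i = P_i − A_i, and an analogous relation with PB_i = P_i − B_i is used to raise the index on the second centre. Function names are hypothetical.

```python
import math

def overlap_1d_table(la, lb, zeta_a, zeta_b, Ax, Bx):
    """Table S[i][j] of 1D overlap factors for 0 <= i <= la, 0 <= j <= lb."""
    zeta = zeta_a + zeta_b
    xi = zeta_a * zeta_b / zeta
    Px = (zeta_a * Ax + zeta_b * Bx) / zeta
    PA, PB = Px - Ax, Px - Bx
    S = [[0.0] * (lb + 1) for _ in range(la + 1)]
    # Base (l = 0) case: one Cartesian factor of (pi/zeta)^(3/2) exp(-xi |A-B|^2).
    S[0][0] = math.sqrt(math.pi / zeta) * math.exp(-xi * (Ax - Bx) ** 2)
    # Raise the first index with the second index fixed at zero.
    for i in range(la):
        S[i + 1][0] = PA * S[i][0]
        if i > 0:
            S[i + 1][0] += i / (2.0 * zeta) * S[i - 1][0]
    # Raise the second index for every value of the first index.
    for j in range(lb):
        for i in range(la + 1):
            value = PB * S[i][j]
            if i > 0:
                value += i / (2.0 * zeta) * S[i - 1][j]
            if j > 0:
                value += j / (2.0 * zeta) * S[i][j - 1]
            S[i][j + 1] = value
    return S

def primitive_overlap(a, b, zeta_a, zeta_b, A, B):
    """Primitive Cartesian overlap (a|b) as a product of three 1D factors."""
    result = 1.0
    for k in range(3):
        table = overlap_1d_table(a[k], b[k], zeta_a, zeta_b, A[k], B[k])
        result *= table[a[k]][b[k]]
    return result
```

Only the base case requires an exponential; every higher angular momentum value is reached with multiplications and additions.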

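The horizontal recurrence relation (HRR) can be used to shift angular momentum between centres and can be applied to contracted integrals [2].

$$(\mathbf{a}(\mathbf{b} + \mathbf{1}_i)|\mathbf{c}) = ((\mathbf{a} + \mathbf{1}_i)\mathbf{b}|\mathbf{c}) + AB_i\, (\mathbf{ab}|\mathbf{c})$$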
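The HRR carries over to contracted integrals because AB_i = A_i − B_i does not depend on the primitive exponents. The sketch below shows this for a simplified case that is an assumption of this example, not the poster's 3-index form: 1D, 2-index overlap integrals with the analogous relation (a|b + 1) = (a + 1|b) + AB (a|b). Primitive (a|0) values are generated by a VRR, contracted, and only then shifted with the HRR. Function names are hypothetical.

```python
import math

def primitive_a0(lmax, zeta_a, zeta_b, Ax, Bx):
    """1D primitive overlaps (a|0) for a = 0..lmax, built by a VRR."""
    zeta = zeta_a + zeta_b
    xi = zeta_a * zeta_b / zeta
    PA = (zeta_a * Ax + zeta_b * Bx) / zeta - Ax
    vals = [math.sqrt(math.pi / zeta) * math.exp(-xi * (Ax - Bx) ** 2)]
    for a in range(lmax):
        nxt = PA * vals[a]
        if a > 0:
            nxt += a / (2.0 * zeta) * vals[a - 1]
        vals.append(nxt)
    return vals

def contracted_ab_via_hrr(la, lb, exps_a, coefs_a, Ax, exps_b, coefs_b, Bx):
    """Contracted 1D overlap (la|lb): VRR on primitives, contract, then HRR."""
    lmax = la + lb
    # Contract the primitive (a|0) classes first.
    column = [0.0] * (lmax + 1)
    for zeta_a, d_a in zip(exps_a, coefs_a):
        for zeta_b, d_b in zip(exps_b, coefs_b):
            prim = primitive_a0(lmax, zeta_a, zeta_b, Ax, Bx)
            for a in range(lmax + 1):
                column[a] += d_a * d_b * prim[a]
    # HRR on contracted integrals: (a|b+1) = (a+1|b) + AB (a|b).
    AB = Ax - Bx
    for _ in range(lb):
        column = [column[a + 1] + AB * column[a] for a in range(len(column) - 1)]
    return column[la]
```

Applying the HRR after contraction means the shift is done once per contracted integral rather than once per primitive pair.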

VRRs, HRRs and contractions may be combined to create multiple integral evaluation schemes:

Scheme 1 (VRR only): (0|0) primitive, l_a = l_b = 0 → VRR → (a|0) primitive, l_b = 0 → VRR → (a|b) primitive → Contract → (a|b) contracted

Scheme 2 (VRR and HRR): (0|0) primitive, l_a = l_b = 0 → VRR → (a|0) primitive, l_b = 0 → Contract → (a|0) contracted, l_b = 0 → HRR → (a|b) contracted

Electron repulsion integrals (ERIs)

ERIs are the most computationally expensive integral required in many electronic structure methods (e.g. HF, DFT, MP2, CC), so efficient software implementation is vital.

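$$(\mathbf{ab}|r_{12}^{-1}|\mathbf{c}) = \int d\mathbf{r}_1\, d\mathbf{r}_2\, g_a(\mathbf{r}_1)\, g_b(\mathbf{r}_1)\, r_{12}^{-1}\, g_c(\mathbf{r}_2)$$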

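To enable use of Cartesian RRs, an integral transform, $r_{12}^{-1} = 2\pi^{-1/2} \int_0^\infty du\, g(\mathbf{r}_2; u^2, \mathbf{r}_1, \mathbf{0})$, is used and an auxiliary index, m, introduced:

$$((\mathbf{a} + \mathbf{1}_i)\mathbf{b}|\mathbf{c})^{(m)} = \mathrm{VRR}\left\{ (\mathbf{ab}|\mathbf{c})^{(m)},\ (\mathbf{ab}|\mathbf{c})^{(m+1)},\ \ldots \right\}$$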

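When m = 0, the true ERIs are obtained, i.e. $(\mathbf{ab}|\mathbf{c})^{(0)} \equiv (\mathbf{ab}|r_{12}^{-1}|\mathbf{c})$ [1]. Additionally, the l = 0 case is complicated by the need to evaluate the Boys function [3]:

$$(\mathbf{0}_A \mathbf{0}_B|\mathbf{0}_C)^{(m)} = f(\zeta_a, \zeta_b, \zeta_c, \mathbf{A}, \mathbf{B}, \mathbf{C})\, F_m(T), \qquad F_m(T) = \int_0^1 dt\, t^{2m} \exp(-T t^2)$$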
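One simple way to evaluate the Boys function, shown here only as a sketch, is the convergent series F_m(T) = exp(−T) Σ_k (2T)^k / [(2m+1)(2m+3)…(2m+2k+1)]. This is an assumption of the example rather than the method used by Intception or Molpro; production codes typically combine tabulation, recursion and asymptotic formulas. The function name and the tolerance parameters are hypothetical.

```python
import math

def boys(m, T, tol=1e-16, max_terms=500):
    """Boys function F_m(T) from its all-positive power series.

    Suitable for small to moderate T; the number of terms needed grows with T.
    """
    term = 1.0 / (2 * m + 1)  # k = 0 term
    total = term
    for k in range(1, max_terms):
        term *= 2.0 * T / (2 * m + 2 * k + 1)
        total += term
        if term < tol * total:
            break
    return math.exp(-T) * total
```

For m = 0 the result can be checked against the closed form F_0(T) = (1/2) sqrt(π/T) erf(sqrt(T)), and neighbouring orders satisfy (2m + 1) F_m(T) = 2T F_{m+1}(T) + exp(−T), which follows from integration by parts.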

Intception

Intception is designed to automatically generate molecular integral evaluation code, addressing the following issues:

Difficult and time-consuming development process
● Efficient algorithms may be specific to integral types.
● Very efficient code may not be easy to read or debug.
● Discourages development of methods requiring new integral classes.

A shifting software/hardware environment
● Over the lifetime of scientific software, the operating environment of the software may change significantly.
● Efficient algorithms are specific to software/hardware environment.
● Fully utilising new software/hardware (e.g. GPGPU) requires modification or rewriting of existing code.

[Figure: Cray 2 supercomputer (1980s); x86 workstation (1990-2000s); Nvidia GPGPU (2000s-)]

References and Acknowledgements

[1] Obara, S. & Saika, A. J. Chem. Phys. 84, 3963–3974 (1986).
[2] Head-Gordon, M. & Pople, J. A. J. Chem. Phys. 89, 5777–5786 (1988).
[3] Helgaker, T., Jørgensen, P. & Olsen, J. Molecular Electronic-Structure Theory (Wiley, 2000), pp. 365–368.
[4] The Python programming language, version 3.x. https://www.python.org/.
[5] ISO/IEC. Programming languages - C (ISO/IEC 9899:1999(E)) (1999).
[6] MOLPRO, H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Schütz, and others, see http://www.molpro.net.
[7] Optimisation of code in collaboration with MEng student Tom Rumsey.
[8] Dunning Jr., T. H. J. Chem. Phys. 90, 1007–1023 (1989).
[9] Wilson, A. K., Woon, D. E., Peterson, K. A. & Dunning Jr., T. H. J. Chem. Phys. 110, 7667–7676 (1999).

Image credits: Cray 2 image by NASA [Public domain], via Wikimedia Commons; x86 workstation image by Vernon Chan [CC-BY-2.0], via Wikimedia Commons; Nvidia GPU image by Flickr user GBPublic PR [CC BY-NC-SA 2.0], via Flickr. All images modified to add text descriptions.

Intception

Input script (Python) → Intception → Source code (C)

Input script: integral classes defined in DSL:
● Integral indexes
● Base expression
● Recurrence relations

Intception:
● Process DSL input
● Construct optimized algorithm
● Output integral evaluation code

Source code:
● Can be compiled into a library
● Can be interfaced with existing packages

Key features
● Input is written using a domain-specific language (DSL), built using Python [4].
● Users define integrals using RRs and base (l = 0) expression.
● Source code generated by customising an “algorithm template”.
● Output is in C, allowing wide compatibility with other software packages (C99 standard) [5].

Domain-specific language (DSL)

The DSL encapsulates the abstract mathematical problem of molecular integral evaluation using Python classes to represent relevant objects.

[Figure: expression tree built from dsl_binop nodes (op_pow, op_mul, each with left and right children), dsl_unop nodes (op_exp, op_neg, each with a single arg), dsl_scalar leaves (pi, o_o_xp, xaxb_o_xp, RAB2) and the constant 1.5. A second panel shows the product ga * gb of two dsl_cartesian_gaussian objects represented as a dsl_binop with dsl_op op_mul.]

The DSL modifies the default behaviour of Python operators, allowing expressions composed of DSL objects to be parsed and manipulated directly within the Python language. Mathematical expressions are represented as trees of binary and unary operations on DSL objects, which may easily be analysed and manipulated in order to generate source code.
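The operator-overloading idea can be illustrated with a small, self-contained Python sketch. The classes below (Expr, Scalar, Const, BinOp, UnOp) are hypothetical stand-ins and not Intception's actual dsl_binop, dsl_unop or dsl_scalar classes; the scalar names are taken from the figure labels, and the example expression is one plausible reading of the nodes in that figure.

```python
class Expr:
    """Base class: Python operators build tree nodes instead of computing values."""
    def __mul__(self, other):
        return BinOp("*", self, wrap(other))
    def __rmul__(self, other):
        return BinOp("*", wrap(other), self)
    def __pow__(self, other):
        return BinOp("pow", self, wrap(other))
    def __neg__(self):
        return UnOp("neg", self)

class Scalar(Expr):
    def __init__(self, name):
        self.name = name
    def to_c(self):
        return self.name

class Const(Expr):
    def __init__(self, value):
        self.value = float(value)
    def to_c(self):
        return repr(self.value)

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def to_c(self):
        left, right = self.left.to_c(), self.right.to_c()
        if self.op == "pow":
            return f"pow({left}, {right})"
        return f"({left} {self.op} {right})"

class UnOp(Expr):
    def __init__(self, op, arg):
        self.op, self.arg = op, arg
    def to_c(self):
        if self.op == "exp":
            return f"exp({self.arg.to_c()})"
        return f"(-{self.arg.to_c()})"

def wrap(value):
    """Promote plain Python numbers to constant leaves."""
    return value if isinstance(value, Expr) else Const(value)

def exp(arg):
    return UnOp("exp", wrap(arg))

# Hypothetical l = 0 overlap expression written with ordinary Python syntax.
pi, o_o_xp = Scalar("pi"), Scalar("o_o_xp")
xaxb_o_xp, RAB2 = Scalar("xaxb_o_xp"), Scalar("RAB2")
base = (pi * o_o_xp) ** 1.5 * exp(-(xaxb_o_xp * RAB2))
c_source = base.to_c()
```

Here `c_source` holds a C expression string generated by walking the tree, which is the same pattern (build a tree with overloaded operators, then traverse it to emit code) that the text describes.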

Integral evaluation algorithm template

[Figure: algorithm template. Stages: base function, VRR sequence, contraction sequence, HRR sequence, copy to output array. The routes shown start from the primitive base integrals (00|0)^(m) and end at contracted (ab|c), some passing through the intermediate class (a0|c).]

Source code is generated by customising a general template for molecular integral evaluation. The template is customised based on a user-provided description of the integral type (e.g. RRs, expression for l = 0 case) and specification of which algorithm components to use (e.g. VRR, HRR, contraction), written in the DSL.
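A toy version of template customisation might look like the following Python sketch. Everything here is hypothetical: the function names, the stage identifiers and the emitted C skeleton are illustrative only, and the fixed stage ordering (base function, VRR, contraction, HRR, copy to output) is an assumption based on the stages named in the template figure.

```python
def build_route(use_vrr=True, use_hrr=False, contract=True):
    """Return the ordered list of template stages for the chosen components."""
    stages = ["base_function"]
    if use_vrr:
        stages.append("vrr_sequence")
    if contract:
        stages.append("contraction_sequence")
    if use_hrr:
        stages.append("hrr_sequence")
    stages.append("copy_to_output")
    return stages

def render_c_skeleton(name, stages):
    """Emit a C function skeleton with one placeholder comment per stage."""
    lines = [f"void {name}(double *out)", "{"]
    for stage in stages:
        lines.append(f"    /* {stage} */")
    lines.append("}")
    return "\n".join(lines)

vrr_only = render_c_skeleton("eri_3idx_vrr", build_route(use_hrr=False))
vrr_hrr = render_c_skeleton("eri_3idx_vrr_hrr", build_route(use_hrr=True))
```

Generating both variants from the same description is what makes it cheap to compare different routes through the template.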

Results

Generated molecular integral evaluation code is numerically accurate when compared to existing integral code [6]. It is possible to rapidly generate and test routines using different routes through the algorithm template. At present, generated code is less efficient than the existing code in Molpro, though we anticipate significant improvements with further optimisation [7].

[Figure: two plots of time taken / s against repetitions (0 to 1000). Left panel: H2O, time axis 0.0 to 4.0 s. Right panel: Kr, time axis 0.0 to 18.0 s. Each panel compares Molpro built-in, Generated (VRR-only) and Generated (VRR and HRR).]

Plots of serial code execution time to evaluate all contracted, spherically transformed 3-index Coulomb integrals $(\mathbf{ab}|r_{12}^{-1}|\mathbf{c})$, using a cc-pVDZ basis set [8, 9] for all atoms. Each repetition is a single loop iteration over a call to the integral evaluation routine.

                                H2O                                      Kr
Type           Algorithm        Av. mag.  Max diff.      RMSD            Av. mag.  Max diff.      RMSD
(a|b)          VRR-only         10^-1     3.33 × 10^-16  4.68 × 10^-17   10^-1     2.64 × 10^-16  2.73 × 10^-17
(a|b)          VRR and HRR      10^-1     3.55 × 10^-16  6.62 × 10^-17   10^-1     2.22 × 10^-16  1.95 × 10^-17
(a|b|c)        VRR-only         10^-2     4.44 × 10^-16  1.05 × 10^-17   10^-2     1.78 × 10^-15  3.88 × 10^-17
(a|b|c)        VRR and HRR      10^-2     4.44 × 10^-16  1.41 × 10^-17   10^-2     3.55 × 10^-15  4.20 × 10^-17
(a|r_12^-1|b)  VRR-only         1         4.26 × 10^-14  5.68 × 10^-15   1         1.42 × 10^-14  1.35 × 10^-15
(ab|r_12^-1|c) VRR-only         10^-1     2.35 × 10^-14  1.01 × 10^-15   10^-2     3.55 × 10^-15  1.05 × 10^-16
(ab|r_12^-1|c) VRR and HRR      10^-1     2.35 × 10^-14  1.03 × 10^-15   10^-2     3.55 × 10^-15  9.76 × 10^-17

Numerical comparison of integrals evaluated using Molpro’s [6] built-in routines and generated routines for several types of contracted, spherically transformed integrals, using a cc-pVDZ basis set [8, 9]. Maximum difference is the largest per-integral difference for an integral type, while the root mean square deviation is calculated for the entire block of integrals of each type. The average (mean) magnitude for all integrals computed of each type is reported to provide context.
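The three statistics in the table can be computed as in the following Python sketch. It assumes the two integral blocks are available as flat sequences of equal length and that the average magnitude is taken over the reference values; the poster does not say which set was averaged, and the function name is hypothetical.

```python
import math

def compare_integrals(reference, generated):
    """Return (average magnitude, maximum difference, RMSD) for two blocks."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("blocks must be non-empty and of equal length")
    diffs = [abs(g - r) for r, g in zip(reference, generated)]
    max_diff = max(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Assumption: mean magnitude of the reference integrals.
    avg_mag = sum(abs(r) for r in reference) / len(reference)
    return avg_mag, max_diff, rmsd
```

Reporting the average magnitude alongside the absolute differences shows how many significant figures agree: a maximum difference of order 10^-16 on integrals of order 10^-1 is close to double-precision round-off.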

March 2015 Research funded by EPSRC