Intception: Automatic generation of code for the evaluation of molecular integrals
James C. Womack
University of Bristol
11th February 2015
Southampton, 11th February 2015 1/28
Molecular integrals
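$$E = \frac{\langle\Psi|H|\Psi\rangle}{\langle\Psi|\Psi\rangle}$$
$$E^{(2)} = \frac{1}{4}\sum_{ijab} \frac{|\langle ij|ab\rangle - \langle ij|ba\rangle|^2}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$$
$$F_{\alpha\beta} = \langle\alpha|h|\beta\rangle + \sum_{\delta\gamma} P_{\delta\gamma}\left[(\alpha\beta|\gamma\delta) - \frac{1}{2}(\alpha\delta|\gamma\beta)\right]$$
$$\langle ij|ab\rangle \equiv (ia|jb) = \int d\mathbf{r}_1\, d\mathbf{r}_2\, \psi_i^*(1)\,\psi_j^*(2)\, r_{12}^{-1}\, \psi_a(1)\,\psi_b(2)$$
$$|i\rangle \equiv \psi_i(\mathbf{r}) = \sum_{\alpha} C_{i\alpha}\,\phi_\alpha(\mathbf{r})$$
$$|\alpha\rangle \equiv \phi_\alpha(\mathbf{r}; \mathbf{A}, \mathbf{a}) = \sum_{m}^{K} d_{m\alpha}\, g(\mathbf{r}; \zeta_{m\alpha}, \mathbf{A}, \mathbf{a})$$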
Molecular integrals: GTOs
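$$|\mathbf{a}) \equiv g(\mathbf{r}; \zeta_a, \mathbf{A}, \mathbf{a}) = (x - A_x)^{a_x} (y - A_y)^{a_y} (z - A_z)^{a_z} \exp(-\zeta_a |\mathbf{r} - \mathbf{A}|^2) = \prod_{i=x,y,z} (r_i - A_i)^{a_i} \exp(-\zeta_a (r_i - A_i)^2)$$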
[Figure: φ(x) against x (0.0 to 4.0) for an STO with ζ = 1.0 and its STO-1G, STO-2G and STO-3G approximations.]
$\mathbf{A} = (A_x, A_y, A_z)$
$\mathbf{a} = (a_x, a_y, a_z)$
$l_a = a_x + a_y + a_z$
s: $l = 0$: (0, 0, 0)
p: $l = 1$: (1, 0, 0), (0, 1, 0), (0, 0, 1)
d: $l = 2$: (2, 0, 0), (1, 1, 0), . . .
Plot adapted from Szabo, A. & Ostlund, N. S. Modern Quantum Chemistry (Dover, 1996), pp.153–159.
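The shell structure above can be illustrated with a short Python sketch. It enumerates the Cartesian components $(a_x, a_y, a_z)$ for a given $l$ and evaluates an unnormalized primitive Gaussian at a point. The function names and the component ordering are assumptions made for this example, not part of Intception.

```python
import math

def cartesian_components(l):
    """List all (ax, ay, az) with ax + ay + az == l (ax descending, then ay)."""
    return [(ax, ay, l - ax - ay)
            for ax in range(l, -1, -1)
            for ay in range(l - ax, -1, -1)]

def primitive_gto(r, zeta, A, a):
    """Unnormalized primitive Cartesian Gaussian g(r; zeta, A, a)."""
    value = 1.0
    for ri, Ai, ai in zip(r, A, a):
        # One factor (r_i - A_i)^a_i exp(-zeta (r_i - A_i)^2) per direction.
        value *= (ri - Ai) ** ai * math.exp(-zeta * (ri - Ai) ** 2)
    return value

p_shell = cartesian_components(1)   # the three p components
d_shell = cartesian_components(2)   # the six Cartesian d components
```

A shell of angular momentum $l$ has $(l+1)(l+2)/2$ Cartesian components, which is why the d shell has six components here rather than five.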
Evaluating molecular integrals
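$$(\mathbf{a}|\mathbf{b}) = \int d\mathbf{r}\, g(\mathbf{r}; \zeta_a, \mathbf{A}, \mathbf{a})\, g(\mathbf{r}; \zeta_b, \mathbf{B}, \mathbf{b})$$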
[Diagram: $(\mathbf{0}|\mathbf{0})$ primitive ($l = 0$) → $(\mathbf{a}|\mathbf{b})$ primitive → $(\mathbf{a}|\mathbf{b})$ contracted (AO) → $(i|a)$ MO]
Recurrence relations: Obara-Saika scheme
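$$(\mathbf{0}_A|\mathbf{0}_B) = (\pi/\zeta)^{3/2} \exp(-\xi |\mathbf{A} - \mathbf{B}|^2)$$
$$\frac{\partial}{\partial A_i} |\mathbf{a}) = 2\zeta_a |\mathbf{a} + \mathbf{1}_i) - a_i |\mathbf{a} - \mathbf{1}_i)$$
$$(\mathbf{a} + \mathbf{1}_i|\mathbf{b}) = PA_i (\mathbf{a}|\mathbf{b}) + \frac{a_i}{2\zeta} (\mathbf{a} - \mathbf{1}_i|\mathbf{b}) + \frac{b_i}{2\zeta} (\mathbf{a}|\mathbf{b} - \mathbf{1}_i)$$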
Obara, S. & Saika, A. J. Chem. Phys. 84, 3963 (1986).
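As a minimal sketch of how the overlap recurrence can be applied, the Python below evaluates one Cartesian direction recursively and multiplies the three directions together. It assumes the standard Obara-Saika definitions, which are not spelled out on the slide: $\zeta = \zeta_a + \zeta_b$, $\xi = \zeta_a\zeta_b/\zeta$, $\mathbf{P} = (\zeta_a\mathbf{A} + \zeta_b\mathbf{B})/\zeta$ and $PA_i = P_i - A_i$. The function names are hypothetical, and the recursive, memoized form is chosen for clarity rather than speed.

```python
import math
from functools import lru_cache

def overlap_1d(a, b, za, zb, A, B):
    """One Cartesian factor of the primitive overlap (a|b) via the OS recurrence."""
    zeta = za + zb
    xi = za * zb / zeta
    P = (za * A + zb * B) / zeta
    PA, PB = P - A, P - B

    @lru_cache(maxsize=None)
    def s(i, j):
        if i < 0 or j < 0:
            return 0.0
        if i == 0 and j == 0:
            # One-dimensional factor of (0_A|0_B).
            return math.sqrt(math.pi / zeta) * math.exp(-xi * (A - B) ** 2)
        if i > 0:
            # Raise the first index: (i|j) from (i-1|j), (i-2|j), (i-1|j-1).
            return PA * s(i - 1, j) + ((i - 1) * s(i - 2, j) + j * s(i - 1, j - 1)) / (2 * zeta)
        # i == 0: raise the second index with the analogous relation using PB.
        return PB * s(i, j - 1) + (i * s(i - 1, j - 1) + (j - 1) * s(i, j - 2)) / (2 * zeta)

    return s(a, b)

def overlap_3d(a, b, za, zb, A, B):
    """Primitive overlap (a|b) as a product of x, y and z factors."""
    result = 1.0
    for ai, bi, Ai, Bi in zip(a, b, A, B):
        result *= overlap_1d(ai, bi, za, zb, Ai, Bi)
    return result
```

For two s functions, `overlap_3d` reduces to the closed form $(\pi/\zeta)^{3/2} \exp(-\xi |\mathbf{A} - \mathbf{B}|^2)$ given above.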
Evaluating molecular integrals: “ERIs”
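$$(\mathbf{ab}|r_{12}^{-1}|\mathbf{c}) = \int d\mathbf{r}_1\, d\mathbf{r}_2\, g_a(\mathbf{r}_1)\, g_b(\mathbf{r}_1)\, r_{12}^{-1}\, g_c(\mathbf{r}_2)$$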
Auxiliary indexes
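$$((\mathbf{a} + \mathbf{1}_i)\mathbf{b}|\mathbf{c})^{(m)} = \mathrm{VRR}\left\{(\mathbf{ab}|\mathbf{c})^{(m)}, (\mathbf{ab}|\mathbf{c})^{(m+1)}, \ldots\right\}$$
$$(\mathbf{ab}|\mathbf{c})^{(0)} \equiv (\mathbf{ab}|r_{12}^{-1}|\mathbf{c})$$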
The Boys function
$$(\mathbf{0}_A \mathbf{0}_B|\mathbf{0}_C)^{(m)} = f(\zeta_a, \zeta_b, \zeta_c, \mathbf{A}, \mathbf{B}, \mathbf{C})\, F_m(T)$$
$$F_m(T) = \int_0^1 dt\, t^{2m} \exp(-T t^2)$$
Obara, S. & Saika, A. J. Chem. Phys. 84, 3963 (1986).
Helgaker, T., Jørgensen, P. & Olsen, J. Molecular Electronic-Structure Theory (Wiley, 2000), pp.365–368.
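One common way to evaluate $F_m(T)$ for all auxiliary indexes at once is sketched below in Python. It sums a convergent series for the highest order and then applies the downward recursion $F_{m-1}(T) = (2T F_m(T) + e^{-T})/(2m - 1)$. This is an illustrative choice suited to small and moderate $T$, not necessarily the scheme used by Intception or Molpro, and the function name is hypothetical.

```python
import math

def boys(m_max, T, tol=1e-16):
    """Return [F_0(T), ..., F_m_max(T)] using a series plus downward recursion."""
    # Series: F_m(T) = exp(-T) * sum_k (2T)^k / ((2m+1)(2m+3)...(2m+2k+1)).
    term = 1.0 / (2 * m_max + 1)
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= 2.0 * T / (2 * m_max + 2 * k + 1)
        total += term
    exp_t = math.exp(-T)
    F = [0.0] * (m_max + 1)
    F[m_max] = exp_t * total
    # Downward recursion is numerically stable (all terms are positive).
    for m in range(m_max, 0, -1):
        F[m - 1] = (2.0 * T * F[m] + exp_t) / (2 * m - 1)
    return F
```

Two limits give quick checks: $F_m(0) = 1/(2m+1)$, and $F_0(T) = \frac{1}{2}\sqrt{\pi/T}\,\mathrm{erf}(\sqrt{T})$ for $T > 0$.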
Evaluating molecular integrals
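[Diagram: $(\mathbf{0}|\mathbf{0})$ primitive ($l_a = l_b = 0$) → VRR → $(\mathbf{a}|\mathbf{0})$ primitive ($l_b = 0$) → Contract → $(\mathbf{a}|\mathbf{0})$ contracted ($l_b = 0$) → HRR → $(\mathbf{a}|\mathbf{b})$ contracted]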
Contracted functions (AOs)
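$$\phi_\alpha(\mathbf{r}; \mathbf{A}, \mathbf{a}) = \sum_{m}^{K} d_{m\alpha}\, g(\mathbf{r}; \zeta_{m\alpha}, \mathbf{A}, \mathbf{a})$$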
Horizontal recurrence relations (HRRs)
$$(\mathbf{a}(\mathbf{b} + \mathbf{1}_i)|\mathbf{c}) = ((\mathbf{a} + \mathbf{1}_i)\mathbf{b}|\mathbf{c}) + AB_i (\mathbf{ab}|\mathbf{c})$$
... can be applied to contracted integrals.
Head-Gordon, M. & Pople, J. A. J. Chem. Phys. 89, 5777–5786 (1988).
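The reason the HRR can be applied after contraction is that $AB_i = A_i - B_i$ does not depend on the exponents. The Python sketch below checks this numerically for a two-index, one-dimensional analogue, $(a|b+1) = (a+1|b) + AB\,(a|b)$, which is assumed here for brevity. The source integrals come from a crude quadrature purely so that the example is self-contained, and the contraction coefficients and exponents are hypothetical.

```python
import math

def quad(f, lo=-10.0, hi=10.0, n=4001):
    """Crude equally spaced quadrature; adequate for smooth, decaying integrands."""
    h = (hi - lo) / (n - 1)
    return h * sum(f(lo + k * h) for k in range(n))

def primitive_1d(a, b, za, zb, A, B):
    """1D primitive overlap (a|b) by direct quadrature (reference only)."""
    return quad(lambda x: (x - A) ** a * (x - B) ** b
                * math.exp(-za * (x - A) ** 2 - zb * (x - B) ** 2))

def contracted_1d(a, b, bra, ket, A, B):
    """Contracted (a|b): primitives weighted by contraction coefficients."""
    return sum(da * db * primitive_1d(a, b, za, zb, A, B)
               for da, za in bra for db, zb in ket)

def hrr(source, la, lb, AB):
    """Build (la|lb) from source[a] = (a|0), a = 0..la+lb."""
    row = list(source)
    for _ in range(lb):
        # One HRR step raises b by one for every remaining a.
        row = [row[a + 1] + AB * row[a] for a in range(len(row) - 1)]
    return row[la]

# Hypothetical two-term contractions: (coefficient, exponent) pairs.
bra = [(0.6, 0.9), (0.4, 2.1)]
ket = [(0.7, 0.5), (0.3, 1.7)]
A, B, la, lb = 0.3, -0.5, 2, 1

source = [contracted_1d(a, 0, bra, ket, A, B) for a in range(la + lb + 1)]
via_hrr = hrr(source, la, lb, A - B)              # contract first, then HRR
direct = contracted_1d(la, lb, bra, ket, A, B)    # reference value
```

Here `via_hrr` and `direct` should agree to rounding error, since the HRR is an exact algebraic identity for each primitive and therefore for any linear combination of primitives sharing the same centres.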
Womack, J. C. & Manby, F. R. J. Chem. Phys. 140, 044118 (2014).
DF3-MP2-F12/3*A(FIX) theory
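$$v_{ij,kl} = \sum_m \langle ijm | r_{12}^{-1} f_{23} | mlk \rangle$$
$$a^{(1)}_{kl,mn} = \sum_i \langle kli | [f_{12}, t_1] f_{23} | inm \rangle$$
$$a^{(2)}_{kl,mn} = \sum_i \langle kli | [f_{12}, t_2] f_{23} | inm \rangle$$
Label: definition
J-F: $(a|r_{12}^{-1}|b|f_{23}|c)$, $(ab|r_{12}^{-1}|c|f_{23}|d)$, $(a|r_{12}^{-1}|bc|f_{23}|d)$, $(a|r_{12}^{-1}|b|f_{23}|cd)$
F-F: $(a|f_{12}|b|f_{23}|c)$, $(ab|f_{12}|c|f_{23}|d)$, $(a|f_{12}|bc|f_{23}|d)$, $(a|f_{12}|b|f_{23}|cd)$
F-FX: $(a|f_{12}|b|\{\nabla_2^2 f_{23}\}|c)$, $(ab|f_{12}|c|\{\nabla_2^2 f_{23}\}|d)$, $(a|f_{12}|bc|\{\nabla_2^2 f_{23}\}|d)$, $(a|f_{12}|b|\{\nabla_2^2 f_{23}\}|cd)$
F-FD: (a|f12|b|f2λ3|c), (ab|f12|c|f2λ3|d), (a|f12|bc|f2λ3|d), (a|f12|b|f2λ3|cd)
FT1-F: $(ab|[t_1, f_{12}]|c|f_{23}|d)$
FT2-F: $(a|[t_2, f_{12}]|bc|f_{23}|d)$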
Womack, J. C. & Manby, F. R. J. Chem. Phys. 140, 044118 (2014).
Intception
Purpose
Automatic generation of source code for the evaluation of molecular integrals in quantum chemistry
Motivation
Integral evaluation code is time consuming to write.
The software and hardware landscape is constantly shifting.
[Images: Cray 2 supercomputer; x86 workstation; Nvidia Tesla GPGPU]
x86 workstation image by Vernon Chan [CC-BY-2.0], via Wikimedia Commons
Multi-core CPU
Memory per core is large
Several sophisticated cores
Multiple different threads
Large on-die caches
OpenMP, MPI
GPGPU
Memory per core is small
Many stream processors
Many similar threads
Data transfer bottlenecks
CUDA, OpenCL
New hardware and software revisions may also introduce changes to behaviour that require code modification!
Intel Core i7 image by Eric Gaba, Wikimedia Commons user Sting [CC BY-SA 3.0], via Wikimedia Commons
NVIDIA K80 GPU image by Flickr user GBPublic [CC BY-NC-SA 2.0], via Flickr
Automatic code generation in quantum chemistry
Not a new idea!
There are many projects using some form of code generation for:
Derivation and simplification of equations.
Implementation of electronic structure methods.
Optimized method implementation.
A few examples
Tensor Contraction Engine (NWChem): Hirata, S. J. Phys. Chem. A 107, 9887–9897 (2003).
(Local) Integrated Tensor Framework (Molpro): Shamasundar, K. R., Knizia, G. & Werner, H.-J. J. Chem. Phys. 135, 054101 (2011).
Kats, D. & Manby, F. R. J. Chem. Phys. 138, 144101 (2013).
Libint: Valeev group, http://www.valeevgroup.chem.vt.edu/software.html
Libcint: Sun, arXiv:1412.0649 [physics] (2014); https://github.com/sunqm/libcint
Inception of Intception
[Diagram: Input → (parse input) → Code generator → (translate) → Source code]
Writing a general input parser is difficult!
Avoid this by building on top of the Python language.
In Python, (nearly) “everything is an object”.
A domain-specific language (DSL) can be created using Python objects with overloaded operator methods.
Implementation of Intception
Input script (Python) → Intception → Source code (C)
Input script: integral classes defined in DSL:
● Integral indexes
● Base expression
● Recurrence relations
Intception:
● Process DSL input
● Construct optimized algorithm
● Output integral evaluation code
Source code:
● Can be compiled into a library
● Can be interfaced with existing packages
Intception is written entirely in Python 3.
The input is a Python script, written using DSL objects.
C is a lingua franca, allowing wide compatibility with other software packages (we use the C99 standard).
Implementation of Intception: DSL
[Diagram: the expression ga * gb, where ga and gb are dsl_cartesian_gaussian objects, produces a dsl_binop object that holds ga, gb and the dsl_op op_mul.]
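A minimal sketch of the operator-overloading idea, in Python. The class names below are hypothetical stand-ins and are not Intception's actual classes. Only the mechanism is illustrated: `__mul__` returns a new object recording the operation instead of computing a value.

```python
class Node:
    """Base class: multiplying two nodes records the operation in a tree."""
    def __mul__(self, other):
        return BinOp("op_mul", self, other)

class CartesianGaussian(Node):
    """Hypothetical stand-in for a DSL object representing an integral index."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"CartesianGaussian({self.name!r})"

class BinOp(Node):
    """A binary operation node holding the operator label and both operands."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        return f"BinOp({self.op}, {self.left!r}, {self.right!r})"

ga = CartesianGaussian("ga")
gb = CartesianGaussian("gb")
product = ga * gb   # no arithmetic happens; a BinOp node is built instead
```

Because `BinOp` is itself a `Node`, expressions nest naturally: `(ga * gb) * ga` gives a tree two levels deep, which a code generator can later walk.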
Implementation of Intception: DSL
[Diagram: expression tree built from DSL objects.]
dsl_binop op_mul
  left: dsl_binop op_pow
    left: dsl_binop op_mul
      left: dsl_scalar pi
      right: dsl_scalar o_o_xp
    right: 1.5
  right: dsl_unop op_exp
    arg: dsl_binop op_mul
      left: dsl_unop op_neg
        arg: dsl_scalar xaxb_o_xp
      right: dsl_scalar RAB2
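To show how a tree of this kind can be translated into C, the Python sketch below walks a small tree recursively and emits a C expression string. The node classes, the `to_c` function and the exact tree shape are assumptions for illustration: the shape is one plausible reading of the node names on this slide, chosen to match $(\pi/\zeta)^{3/2} \exp(-\xi |\mathbf{A} - \mathbf{B}|^2)$, and is not taken from Intception's source.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Scalar:
    name: str            # emitted as a C variable name

@dataclass(frozen=True)
class UnOp:
    op: str              # "op_neg" or "op_exp"
    arg: "Node"

@dataclass(frozen=True)
class BinOp:
    op: str              # "op_mul" or "op_pow"
    left: "Node"
    right: "Node"

Node = Union[Scalar, UnOp, BinOp, float]

def to_c(node):
    """Recursively translate an expression tree into a C expression string."""
    if isinstance(node, float):
        return repr(node)
    if isinstance(node, Scalar):
        return node.name
    if isinstance(node, UnOp) and node.op == "op_neg":
        return f"(-{to_c(node.arg)})"
    if isinstance(node, UnOp) and node.op == "op_exp":
        return f"exp({to_c(node.arg)})"
    if isinstance(node, BinOp) and node.op == "op_mul":
        return f"({to_c(node.left)} * {to_c(node.right)})"
    if isinstance(node, BinOp) and node.op == "op_pow":
        return f"pow({to_c(node.left)}, {to_c(node.right)})"
    raise ValueError(f"unsupported node: {node!r}")

tree = BinOp("op_mul",
             BinOp("op_pow", BinOp("op_mul", Scalar("pi"), Scalar("o_o_xp")), 1.5),
             UnOp("op_exp", BinOp("op_mul", UnOp("op_neg", Scalar("xaxb_o_xp")),
                                  Scalar("RAB2"))))
c_expr = to_c(tree)
```

The sketch parenthesizes every product rather than tracking operator precedence, which keeps the translation trivially correct at the cost of redundant brackets in the emitted C.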
Implementation of Intception: Modularity
[Diagram: flow of objects from user input, through the DSL generator, to source code.]
User input:
● dsl_cartesian_gaussian ga (dsl_scalar xa, dsl_position A, angmom)
● dsl_cartesian_gaussian gb (dsl_scalar xb, dsl_position B, angmom)
● dsl_rr vrra, in DSL expression: dsl_scalar o_o_2xp, dsl_position PA, etc.
● dsl_rr vrrb, in DSL expression: dsl_scalar o_o_2xp, dsl_position PB, etc.
● base, in DSL expression: dsl_scalar xaxb_o_xp, dsl_scalar RAB2, etc.
● dsl_integral “overlap_2idx”
DSL generator:
● src_dsl_integral: dsl_integral “overlap_2idx” plus source code specific data (work array, counters, array indexing, etc.)
● further processing
● output: integral evaluation source code
Implementation of Intception: Input
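[Annotated input script; the listing is not preserved. Labelled regions:]
● Definition of RRs
● Creation of dsl_integral object
● Expression for $(\mathbf{0}|\mathbf{0})$
● Creation of generator object
● Source code output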
Implementation of Intception: Output
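[Annotated generated source code; the listing is not preserved. Labelled regions: base function, VRR sequence, contraction sequence, HRR sequence, copy to output array.]
[Diagram: integral classes at each stage: primitive $(\mathbf{00}|\mathbf{0})^{(m)}$, primitive and contracted $(\mathbf{a0}|\mathbf{c})$, and $(\mathbf{ab}|\mathbf{c})$.]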
Current status
Currently implemented
Serial code.
Integrals over primitive Cartesian Gaussians.
VRR-only and VRR+HRR algorithms.
Boys function / auxiliary indexes.
Work in progress
Contraction and spherical transformation.
Optimized serial code [1].
Parallel implementation [1].
1. Collaboration with Tom Rumsey (MEng project).
Testing and preliminary results
Testing
Testing of generated code for a range of 2- and 3-index integral classes against Molpro’s built-in primitive integral routines:
Numerical accuracy of generated code verified against Molpro.
Serial execution times typically within 2× Molpro.
Preliminary data example: $(\mathbf{a}|r_{12}^{-1}|\mathbf{b})$
Comparison of generated and Molpro primitive Coulomb integral evaluation subroutines:
All combinations of exponents and angular momentum values from uncontracted cc-pVTZ (C) basis set.
Representative range of function centres (±2.65 Å).
Max difference: $1.02 \times 10^{-12}$  Max RMSD: $4.74 \times 10^{-13}$
Werner, H.-J. et al. MOLPRO, development version (http://www.molpro.net)
Preliminary data example: $(\mathbf{a}|r_{12}^{-1}|\mathbf{b})$
[Figure: time taken / s (0.0 to 5.0) against repetitions (0 to 100000) for the Molpro and generated routines.]
Run details: boys.chm.bris.ac.uk; uncontracted C cc-pVTZ basis set; Intception master branch 20150209; Molpro master branch 20150209; no external BLAS, serial execution.
Is Intception a compiler?
Source code → Compiler → Object code
Source code:
● Abstract problem implicit
● Algorithm explicit
● Target specified by language choice and compiler options
Compiler:
● Intermediate representation
● Analysis
● Optimization
● Translation
Object code:
● Optimized for target
● Algorithm defined in source code

Input script (Python) → Intception → Source code (C)
Input script:
● Abstract problem explicit
● Algorithm parts specified
● Target specified in script
Intception:
● Analysis
● Optimization
● Assemble algorithm parts using template
● Translation
Source code:
● Optimized for target
● Constructed using algorithm template

High level → Lower level
Is Intception a compiler?
[Diagram: on an axis from high level to lower level, Intception (via input script) takes the abstract problem to source code, and the compiler takes source code to object code.]
Challenges of automatic code generation
Formal definition of implementation details
Hand-coding: one-off, specific implementation.
Code generation: formal, general implementation.
Example: dynamic memory allocation
For a given integral class $(abcd\cdots)$ how much memory is required? Consider:
Number of indexes.
Type of indexes (primitive, contracted).
Evaluation algorithm (VRR, HRR).
... generate routine for array sizing based on runtime information.
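A rough Python sketch of what such a sizing routine has to compute for a two-index class. It relies on two counting facts: a shell of angular momentum $l$ has $(l+1)(l+2)/2$ Cartesian components, and all shells up to $l$ together have $(l+1)(l+2)(l+3)/6$. The algorithm labels and the counting rules (an upper bound on VRR-stage intermediates per primitive pair, ignoring reuse and HRR workspace) are simplifying assumptions made here, not Intception's actual sizing logic.

```python
def n_cart(l):
    """Number of Cartesian components in one shell of angular momentum l."""
    return (l + 1) * (l + 2) // 2

def n_cart_upto(l):
    """Total number of Cartesian components in shells 0, 1, ..., l."""
    return (l + 1) * (l + 2) * (l + 3) // 6

def vrr_work_size(la, lb, algorithm):
    """Assumed upper bound on VRR intermediates for a two-index class (a|b)."""
    if algorithm == "vrr_only":
        # VRR on both indexes: keep every (a'|b') with a' <= la, b' <= lb.
        return n_cart_upto(la) * n_cart_upto(lb)
    if algorithm == "vrr_hrr":
        # VRR on one index only: keep (a'|0) for a' <= la + lb, then apply HRR.
        return n_cart_upto(la + lb)
    raise ValueError(f"unknown algorithm: {algorithm!r}")

# The angular momenta are only known at runtime, so a generated routine
# would evaluate expressions like these from its arguments.
pp_vrr_only = vrr_work_size(1, 1, "vrr_only")
pp_vrr_hrr = vrr_work_size(1, 1, "vrr_hrr")
```

Under these assumptions a (p|p) block needs 16 slots with VRR on both indexes and 10 with VRR followed by HRR. Auxiliary indexes would multiply the count further, with one layer per value of $m$.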
Challenges of automatic code generation
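A block of $(p|p)$ integrals ($l_a = l_b = 1$):
[Diagram: three panels on axes a and b showing the intermediates used for 2×VRR, VRR+HRR, and 2×VRR with auxiliary index (layers m = 0 and m = 1).]
$$(\mathbf{a} + \mathbf{1}_i|\mathbf{b}) = \mathrm{VRR}\{(\mathbf{a}|\mathbf{b}), (\mathbf{a} - \mathbf{1}_i|\mathbf{b}), (\mathbf{a}|\mathbf{b} - \mathbf{1}_i)\}$$
$$(\mathbf{a}|\mathbf{b} + \mathbf{1}_i) = \mathrm{HRR}\{(\mathbf{a}|\mathbf{b}), (\mathbf{a} + \mathbf{1}_i|\mathbf{b})\}$$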
The “competition”
Similar projects
Libint [1]: DSL, C++ output, Obara-Saika type RRs, emphasises pre-generated optimized libraries.
Libcint [2]: C output, Dupuis-Rys-King (Gaussian quadrature) method.
Why Intception?
Simple DSL as user input.
Focus on generality, new integral classes.
Designed for modularity and extensibility.
... software ecosystem diversity is also important!
1. Valeev group, http://www.valeevgroup.chem.vt.edu/software.html
2. Sun, arXiv:1412.0649 [physics] (2014); https://github.com/sunqm/libcint
The future
Near future
Contraction and spherical transformation.
Optimized parallel code output (multi-core CPU, GPGPU) [1].
Robust testing and benchmarking of generated code in quantum chemical calculations.
Public release in the near future (licensing TBC).
Long term possibilities
Multi-component integral support (e.g. multipole moment).
Other basis function types (e.g. plane waves).
Automatic generation of wrapper code.
Simple API for generated library.
1. Collaboration with Tom Rumsey (MEng project).
Acknowledgements
I would like to thank the following people and organizations:
Prof Fred Manby
Tom Rumsey
Centre for Computational Chemistry
EPSRC
University of Bristol
Appendix: (a|b) data
[Figure: time taken / s (0.0 to 3.5) against repetitions (0 to 100000) for the Molpro and generated routines.]
Max difference: $7.11 \times 10^{-15}$  Max RMSD: $4.29 \times 10^{-15}$