Challenges and Advances in Large-scale DFT Calculations on GPUs using TeraChem

  • date post

    18-Jan-2015
  • Category

    Science

description

Recent advances in reformulating electronic structure algorithms for stream processors such as graphics processing units (GPUs) have made DFT calculations on systems comprising up to O(10^3) atoms feasible. Simulations on such systems that previously required half a week on traditional processors can now be completed in only half an hour. Listen to Professor Heather Kulik, Massachusetts Institute of Technology, as she discusses how she leverages these GPU-accelerated quantum chemistry methods in the code TeraChem to investigate large-scale quantum mechanical features in applications ranging from protein structure to mechanochemical depolymerization. In each case, large-scale and rapid evaluation of electronic structure properties is critical for unearthing previously poorly understood properties and mechanistic features of these systems. Professor Kulik also discusses outstanding challenges in the use of Gaussian localized-basis-set codes on GPUs, in particular limitations in basis set size, and how she circumvents these limits on computational efficiency with systematic, physics-based corrections for basis set incompleteness error.

Transcript of Challenges and Advances in Large-scale DFT Calculations on GPUs using TeraChem

  • 1. Heather J. Kulik, Assistant Professor, ChemE, MIT. April 22, 2014

2. Energy, health, catalysis. Grand challenge: how do we harness and control energy to make useful products? Computation allows us to understand known processes and to predict and design new pathways.

3. The DFT reformulation. Many-body Schrödinger equation: $\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi(\mathbf{r}) = E\,\psi(\mathbf{r})$. Diagram labels: external potential $V_{\text{ext}}$; $N$; $E$; many-body wavefunction; the real spatial density from single-particle orbitals.

4. A zoo of XC functionals. Pieces of our DF: kinetic energy; Coulomb repulsion; exchange-correlation (XC) functional.

5. [Chart: accuracy versus number of atoms (1, 10, 100, 1,000, 10,000).] Scaling by method: Classical, $N \log N$; Empirical, $N^2$; Semi-empirical, $N^3$; DFT, $N^3$; Correlated, $N^5$ to $N^7$; Exact, $N!$. Annotations: chemical accuracy; relative rates; may find TS; also, sampling. Before versus TeraChem: DFT at $O(N^{1.8})$.

6. Then (mid-2000s): Beowulf clusters, DFT on a handful of atoms (three to ~100). Now: GPU clusters, DFT or better on three thousand atoms! TeraChem: see http://petachem.com

7. Novel architecture and GPU-optimized algorithms. [Chart: time (s), log scale from 1 to 1,000,000, versus number of atoms (110, 168, 350, 900) for CPU and GPU; speedup labels 183x, 62x, 33x, 13x.] I.S. Ufimtsev and T.J. Martinez, J. Chem. Theory Comput. 5, 1004 (2009).

8. Reordering 2e integrals by type: (SS|SS), (SS|SP), (SS|PP), ..., (DD|DD). Coulomb repulsion: $(\mu\nu|\lambda\sigma) = \iint \phi_\mu(\mathbf{r}_1)\,\phi_\nu(\mathbf{r}_1)\,\frac{1}{|\mathbf{r}_2 - \mathbf{r}_1|}\,\phi_\lambda(\mathbf{r}_2)\,\phi_\sigma(\mathbf{r}_2)\,d\mathbf{r}_1\,d\mathbf{r}_2$. I.S. Ufimtsev and T.J. Martinez, J. Chem. Theory Comput. 4, 222 (2008).

9. Reordering 2e integrals by size: $|(\mu\nu|\lambda\sigma)| \le (\mu\nu|\mu\nu)^{1/2}\,(\lambda\sigma|\lambda\sigma)^{1/2}$. Only need high-accuracy double precision (DP) for the largest integrals; single precision (SP) for the rest. I.S. Ufimtsev and T.J. Martinez, J. Chem. Theory Comput. 5, 1004 (2009).

10. System size and complexity; getting the necessary physics; unsystematic errors. Energetics (self-interaction); charge transfer; bond rearrangement; non-adiabatic processes; relativistic effects, dispersion, and so on; heterogeneity; conditions.

11. Studying proteins with quantum mechanics. Mechanochemical depolymerization. Enzyme catalysis with a methyltransferase.

12.
Why we are interested: force fields (MM) are usually used for proteins, BUT limitations remain: 1) charge transfer, 2) bond rearrangement, 3) polarization. All are key for catalysis in enzymes! Open question: can QM do well in the cases for which force fields are optimized, that is, prototypical structures? QM, MM, QM/MM?

13. Our protein test set selection method. Filters: protein only; less than 30% similarity; no ligands or modified residues; 1 entity/chain; 5-35 aa; |q| ≤ 2. Set sizes through the funnel: 70k, 15k, 6.7k, 4.7k, 413, 58. H.J. Kulik, N. Luehr, I.S. Ufimtsev, and T.J. Martinez, JPCB 116, 12501 (2012).

14. [Chart: amino acid frequency (0 to 100), human genome versus total PDB, for residues G A V L I M F W P S T C Y N Q D E K R H, grouped as non-polar, polar, charged.] Good correspondence to total PDB in primary structure? Yes! Underestimates: His, Cys. Overestimates: Gly/Ala, Trp.

15. Secondary structure [chart: helix, sheet, none; non-polar, polar, charged]. Sort of. We sample some helical and beta-sheet secondary structure motifs, but our small peptides have a much higher abundance of loop or disordered regions than is common in globular (i.e., large, natural) proteins.

16. 58 proteins from the Protein Data Bank, 5-35 residues in length, charge -2 ≤ q ≤ +2: 1AQG, 1CEK, 1EMZ, 1J4M, 1LB0, 1LB7, 1LBJ, 1LCX, 1LVQ, 1LVR, 1LVZ, 1MZI, 1O53, 1ODP, 1PJD, 1QLO, 1RIJ, 1T2Y, 1UAO, 1V46, 1Y03, 1Y49, 1YJP, 1YT6, 2AP7, 2CEH, 2CSA, 2E4E, 2EVQ, 2FBU, 2FXY, 2FXZ, 2I9M, 2JOF, 2JTA, 2JXF, 2K58, 2K59, 2KJM, 2KNP, 2KUX, 2KVX, 2NX6, 2NX7, 2OL9, 2ONW, 2OQ9, 2PJV, 2PV6, 2RLJ, 2RMW, 2RPS, 3E4H, 3FTK, 3FTR, 3FVA, 3NJW, 3NVG. Normally treated with force fields; now we can characterize whole proteins with DFT.

17. RHF and the B3LYP, ωPBEh, and BLYP functionals. STO-3G, 3-21G, and 6-31G localized basis sets. Gas phase, PCM, and MM water-solvated results. Structures optimized with 1) the Amber ff03 force field in AMBER or 2) DFT/RHF in TeraChem.

18. RHF, BLYP, B3LYP, ωPBEh
[Chart: percentage of calculations (0% to 100%) that converged versus those with convergence problems, per method; annotation: "This talk".]

19. Expt. C.M. Isborn, N. Luehr, I.S. Ufimtsev, and T.J. Martinez, JCTC 8, 5092 (2012).

20. [Charts: MM versus QM RMSD (0.0 to 1.6); MM, QM, and Expt. for clashes, poor rotamers, and favorable Ramachandran.]

21. [Charts: MM, QM, and Expt. for poor rotamers and favorable Ramachandran.]

22. [Charts: MM, QM, and Expt. for RMSD, poor rotamers, clashes, and favorable Ramachandran.] Clashes > 0.4 Å; also from unexpected connectivity.

23. [Charts: MM, QM, and Expt. for poor rotamers and favorable Ramachandran.] Ramachandran.

24. Beyond minimal basis set needed to describe sidechains and secondary structure with RHF:

    Method        Cα-RMSD   Clash/1000   Poor Rot   Good Rama
    AMBER         0.61      3            9%         80%
    RHF/STO-3G    0.70      40           15%        75%
    RHF/3-21G     0.68      14           11%        86%
    RHF/6-31G     0.72      8            9%         86%
    Experiment    --        9            19%        80%

    Reasoning: RMSD measures Cα positioning only, so it shows no significant basis-set dependence; the other metrics are more sensitive to the treatment of O, N, S, etc.

25. More significant basis set dependence with ωPBEh than with RHF:

    Method         Cα-RMSD   Clash/1000   Poor Rot   Good Rama
    AMBER          0.61      3            9%         80%
    ωPBEh/MINI     0.71      75           24%        69%
    ωPBEh/STO-3G   0.77      72           19%        71%
    ωPBEh/3-21G    0.69      21           15%        81%
    ωPBEh/6-31G    0.63      9            13%        85%
    Experiment     --        9            19%        80%

26. Inclusion of Grimme's D3 empirical dispersion does not change the outcome:

    Method        Cα-RMSD   Clash/1000   Poor Rot   Good Rama
    AMBER         0.61      3            9%         80%
    RHF/MINI      0.69      45           18%        80%
    RHF/MINI-D    0.67      44           18%        78%
    Experiment    --        9            19%        80%

    Reasoning: the peptides are too small, with not enough tertiary structure for dispersion to matter.

27. Best described by MM: prototypical structures. PDB ID: 1ODP. Best described by QM: less ordered structures. PDB ID: 3FTR. PDB ID: 1RIJ. PDB ID: 2RPS. PDB ID: 2I9M. QM, MM, QM/MM?

28.
Many possible definitions. One which covers key descriptors of disorder: $\text{Disorder} = \frac{1}{2}\,\frac{N_{\text{res}}^{\text{unassigned ss}}}{N_{\text{res}}} + \frac{1}{4}\,\frac{N_{\text{SS}}^{\text{int}}}{N_{\text{SS}}} + \frac{1}{4}\,\frac{N_{\text{res}}^{\text{atypical}}}{N_{\text{res}}}$. The three terms count residues with no/disordered secondary structure type, interruptions in secondary structure, and unhealthy residues.

29. $\text{RelativeHealth} = \frac{\text{Health}_{\text{MM}} - \text{Health}_{\text{QM}}}{\text{Health}_{\text{Expt}}}$. MolProbity scores (clashing > 0.4 Å, rotamers, Ramachandran) compared for each protein: a negative value means MM is better, a positive value means QM is better. Chen et al., Acta Crystallogr. D (2010).

30. H.J. Kulik, N. Luehr, I.S. Ufimtsev, and T.J. Martinez, JPCB 116, 12501 (2012).

31. (no text)

32. Selected set of 20 worst-offender proteins from the original 58. Some of the clashing problem is fixed with neutralized termini but not with solvation. ZF = neutralized N and C termini. MMH2O = solvated in MM water.

33. Sidechain positioning is greatly improved, but protons are still transferring. ZF = neutralized N and C termini. MMH2O = solvated in MM water.

34. Studying proteins with quantum mechanics. Mechanochemical depolymerization. Enzyme catalysis with a methyltransferase.

35. OPA: o-phthalaldehyde. PPA: poly-o-phthalaldehyde. [Scheme labels: capping; hydrolysis of end caps.] Uncapped: Tc = -50 °C. Capped: Tc > 100 °C. Previously: remove the end cap with a chemical reaction or light, leading to depolymerization. Will mechanical bond scission in the middle cause depolymerization?

36. PPA90, PPA26. Polymers above MWmin undergo mechanical bond scission: 26 kDa < PPA MWmin < 90 kDa. Experimental conditions: dissolved in THF; low entanglement, ~1 mg/mL; NaOH to prevent acidic degradation; under argon at -15 °C; pulsed ultrasound, 0.5 s on/1.0 s off, 8.7 W/cm2; gel filtration to identify product MWs.

37. [Figure panels A and B; labels - and +.]

38. M.T. Ong, J. Leiding, H. Tao, A.M. Virshup, and T.J. Martinez, JACS (2009).
[Excerpt from the cited paper, cropped at the edges:] "where $N_{\text{attach}}$ is the number of APs (two in the following) and $\mathbf{n}_i$ is a unit vector directed from the ith AP to its corresponding PP: $\mathbf{n}_i = \frac{\mathbf{r}_i^{\text{fix}} - \mathbf{r}_i}{|\mathbf{r}_i^{\text{fix}} - \mathbf{r}_i|}$ (2). The positions of the APs and PPs are denoted as $\mathbf{r}_i$ and $\mathbf{r}_i^{\text{fix}}$, respectively. The total force is then given as the vector sum of the ab initio internal forces and the external force: $\mathbf{F}_{\text{total}} = \mathbf{F}_{\text{ab initio}} + \mathbf{F}_{\text{ext}}$ (3). Here, we choose idealized fixed pulling points which are consistent with forces that would act on the CB molecule embedded [...]ernal forces and cis-pulling. Superpositions of the reactant, transition state, and product geometries under a range of external forces are shown below (color scheme matches the one used in plotting the MEPs)." [Diagram labels: molecule; fixed pulling point (PP); attachment point (AP); $F_i$.] $\mathbf{F}_{\text{ext}} = \sum_i^{\text{AP}} F_i\,\frac{\mathbf{r}_i^{\text{PP}} - \mathbf{r}_i^{\text{AP}}}{|\mathbf{r}_i^{\text{PP}} - \mathbf{r}_i^{\text{AP}}|}$

39. C.E. Diesendruck, G.I. Peterson, H.J. Kulik, J.A. Kaitz, B.D. Mar, P.A. May, S.R. White, T.J. Martinez, A.J. Boydston, and J.S. Moore, Nature Chemistry (in press 2014). Dimer: 41 atoms. Trimer: 57 atoms. Tetramer: 73 atoms. UB3LYP/6-31G calculations. 2 ps with 0.25 fs timestep at 300 K. Wigner initial conditions. Calculations on the tetramer take 1-3 days: 8000 steps, 10s-6000s/timestep.

40. B3LYP

41. Frame 7, Frame 12, Frame 44, Frame 45. HOMO, LUMO, mechanism. [Reaction scheme.] Occ: 2.00, 0.00, 0.00, 0.00, 2.00, 2.00, 1.00, 1.00.

42. (no text)

43. Studying proteins with quantum mechanics. Mechanochemical depolymerization. Enzyme catalysis with a methyltransferase.

44. Cyclophilin A: non-local and dynamic. Chymotrypsin: local and static. ? J.S. Fraser et al., Nature (2009). J. Fastrez and A.R. Fersht, Biochemistry (1973).

45. [Structure labels: SAM, catechol, Mg2+, Y68, E6, W38, W143, K144.] Human soluble form, 221 residues, ~3400 atoms. 1) Remote residues influence catalysis.
2) Methyl transfer is ubiquitous. 3) Enzyme in humans (all tissues). 4) V108M polymorph is a key indicator of mental function. 5) Target for antipsychotics and Parkinson's.

46. QM/MM models [labels: 1, 7, 10]:

    Model             # Ats.   Time (s)
    Reactants         63       7
    Key res., Rct.    631      193
    Key res., Rct.    995      554
    Whole protein     3419     6233

    [Chart: Ea (kcal/mol, 10 to 16), QM charge (-4 to 1), and covalent cuts (0 to 32) versus number of QM residues (0 to 60).] J. Zhang and J.P. Klinman, JACS (2011). HJK, J. Zhang, J.P. Klinman, and T.J. Martinez (in preparation 2014).

47. +K144; +Y68; +E6; +W38, W143

48. Chromophore excitation energies in PYP. C.M. Isborn, N. Luehr, I.S. Ufimtsev, and T.J. Martinez, JCTC 8, 5092 (2012).

49. GPUs help us to apply DFT to larger and more varied systems. TeraChem has been designed from the ground up to exploit scaling over GPU cores. Our first results suggest practical DFT can fail in unusual ways. Big systems are hard to study! However, there's a wide-open frontier of work that can be done once DFT on a thousand atoms is routine.

50. Lots of fun stuff ahead: http://hjklol.mit.edu. Acknowledgements. Funding: Burroughs Wellcome Fund. My group at MIT: Tim Ioannidis, John La (UROP), Dr. Niladri Patra, Natasha Seelam, Lisi Xie. Todd J. Martinez and the Martinez Group at Stanford: Prof. Christine Isborn (UC Merced), Fang Liu, Brendan Mar, Dr. Lee-Ping Wang. Judith Klinman and the Klinman Group at Berkeley: Jianyu Zhang.

51. TEST DRIVE K40 GPU - WORLD'S FASTEST GPU. Upload and run your own codes by remotely accessing a cluster. "The GPU Test Drive is awesome! We were able to benchmark, gain valuable insight and significant performance improvement. A very big thank you for the opportunity." Richard Heyns, CEO of brytlyt, UK. www.nvidia.com/GPUTestDrive

52.
UPCOMING GTC EXPRESS WEBINARS. April 23: CUDA 6 Features Overview. May 1: CUDA 6: Unified Memory. May 7: CUDA 6: Drop-in Performance Optimized Libraries. May 13: An Overview of AMBER 14 - Creating the World's Fastest Molecular Dynamics Software Package. May 14: CUDA 6: Performance Overview. June 3: The Next Steps for [email protected]. www.gputechconf.com/gtcexpress

53. NVIDIA GLOBAL IMPACT AWARD. $150,000 annual award. Categories include: disease research, automotive safety, weather prediction. Submission deadline: Dec. 12, 2014. Winner announced at GTC 2015. Recognizing groundbreaking work with GPUs in tackling key social and humanitarian problems. impact.nvidia.com
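
Slide 5 contrasts conventional cubic-scaling DFT with the roughly O(N^1.8) scaling quoted for TeraChem. A minimal arithmetic sketch of what those exponents imply when a system grows tenfold is given below. The function name `cost_ratio` is hypothetical, and the calculation ignores prefactors, so it says nothing about absolute timings.

```python
def cost_ratio(n_small, n_large, exponent):
    """Relative cost of growing from n_small to n_large atoms,
    assuming cost is proportional to N**exponent (prefactors ignored)."""
    return (n_large / n_small) ** exponent


# Growing a system from 100 to 1,000 atoms under the two scalings on slide 5.
cubic = cost_ratio(100, 1000, 3.0)         # conventional DFT, N^3
sub_quadratic = cost_ratio(100, 1000, 1.8)  # quoted TeraChem scaling, N^1.8

print(f"N^3   cost grows by a factor of {cubic:.0f}")
print(f"N^1.8 cost grows by a factor of {sub_quadratic:.1f}")
```

Under these assumptions the tenfold larger system costs about a thousand times more with cubic scaling but only about sixty times more at N^1.8.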
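
Slides 8 and 9 describe reordering two-electron integrals by size so that only the largest need double precision. The sketch below illustrates that idea with the Schwarz bound, in which the product of the square roots of two diagonal integrals bounds the magnitude of the full integral. It is not TeraChem's implementation: the function names, the two thresholds, and the diagonal integral values are all hypothetical and chosen only for illustration.

```python
import math


def schwarz_bound(diag_bra, diag_ket):
    """Upper bound on |(mu nu|lambda sigma)| from the diagonal
    integrals (mu nu|mu nu) and (lambda sigma|lambda sigma)."""
    return math.sqrt(diag_bra) * math.sqrt(diag_ket)


def route_by_precision(pair_diagonals, dp_threshold, skip_threshold=0.0):
    """Bound every unique bra/ket pair, drop negligible ones, tag the rest
    as DP or SP, and return them sorted with the largest bound first."""
    labels = sorted(pair_diagonals)
    quartets = []
    for i, bra in enumerate(labels):
        for ket in labels[i:]:
            bound = schwarz_bound(pair_diagonals[bra], pair_diagonals[ket])
            if bound < skip_threshold:
                continue  # too small to matter at all
            precision = "DP" if bound >= dp_threshold else "SP"
            quartets.append((bra, ket, bound, precision))
    quartets.sort(key=lambda q: q[2], reverse=True)
    return quartets


# Hypothetical diagonal integrals for three pair types (illustrative numbers).
pairs = {"ss": 1.0, "sp": 1e-4, "pp": 1e-8}
routed = route_by_precision(pairs, dp_threshold=1e-3, skip_threshold=1e-7)
for bra, ket, bound, precision in routed:
    print(f"({bra}|{ket})  bound={bound:.1e}  {precision}")
```

Sorting by the bound keeps integrals of similar magnitude together, which is the property that lets one batch run in double precision and the remainder in single precision.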
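
Slides 28 and 29 define a disorder score and a relative health measure. The sketch below evaluates both under two stated assumptions: the relative health is read as the MM score minus the QM score, divided by the experimental score (consistent with the slide's sign convention when lower scores are better), and a protein with no secondary structure elements contributes zero to the interruption term. The function names and example counts are hypothetical.

```python
def disorder_score(n_res, n_unassigned_ss, n_ss, n_ss_interrupted, n_atypical):
    """Weighted disorder measure from slide 28:
    1/2 * unassigned residues / residues
    + 1/4 * interrupted secondary structure elements / elements
    + 1/4 * atypical (unhealthy) residues / residues."""
    if n_res <= 0:
        raise ValueError("n_res must be positive")
    # Assumption: no secondary structure elements means no interruptions.
    interruption_term = n_ss_interrupted / n_ss if n_ss > 0 else 0.0
    return (0.5 * n_unassigned_ss / n_res
            + 0.25 * interruption_term
            + 0.25 * n_atypical / n_res)


def relative_health(health_mm, health_qm, health_expt):
    """Assumed reading of slide 29: negative means MM is better,
    positive means QM is better (lower score = healthier structure)."""
    return (health_mm - health_qm) / health_expt


# Hypothetical 20-residue peptide: 10 unassigned residues, 2 secondary
# structure elements of which 1 is interrupted, 4 atypical residues.
score = disorder_score(20, 10, 2, 1, 4)
rel = relative_health(health_mm=1.5, health_qm=2.0, health_expt=2.5)
print(f"disorder = {score:.3f}, relative health = {rel:.2f}")
```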
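
Slide 38 gives the external pulling force as a sum of magnitudes times unit vectors from each attachment point (AP) to its fixed pulling point (PP), added to the ab initio forces. A minimal sketch follows, assuming each term of the sum acts on its own AP atom. The function names, coordinates, force magnitudes, and units are hypothetical and purely illustrative.

```python
import math


def unit_vector(origin, target):
    """Unit vector n_i pointing from origin (the AP) to target (the PP)."""
    delta = [t - o for o, t in zip(origin, target)]
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0.0:
        raise ValueError("attachment point and pulling point coincide")
    return [c / norm for c in delta]


def add_external_forces(ab_initio_forces, positions, pulls):
    """Return per-atom F_total = F_ab_initio + F_ext.

    pulls is a list of (atom_index, pulling_point, magnitude) tuples;
    atom_index identifies the AP atom that the force F_i * n_i acts on.
    """
    total = [list(f) for f in ab_initio_forces]
    for atom_index, pulling_point, magnitude in pulls:
        n_i = unit_vector(positions[atom_index], pulling_point)
        for k in range(3):
            total[atom_index][k] += magnitude * n_i[k]
    return total


# Two attachment points pulled apart along x (arbitrary units).
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ab_initio = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)]
pulls = [(0, (-10.0, 0.0, 0.0), 2.0), (1, (10.0, 0.0, 0.0), 2.0)]
total = add_external_forces(ab_initio, positions, pulls)
print(total)
```

Because the pulling points are fixed in space, the direction of each external force changes as the attachment atoms move, so the unit vectors would be recomputed at every dynamics step.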
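
Slide 39 quotes 2 ps of dynamics at a 0.25 fs timestep, 8000 steps, and 1 to 3 days of wall time for the tetramer. A short arithmetic check of those figures is sketched below. The function names are hypothetical, and the per-step averages are derived only from the quoted wall times, not from any measured data.

```python
SECONDS_PER_DAY = 86400.0


def md_step_count(total_time_fs, timestep_fs):
    """Number of integration steps needed to cover total_time_fs."""
    return round(total_time_fs / timestep_fs)


def mean_seconds_per_step(wall_days, n_steps):
    """Average wall-clock seconds per step for a run of wall_days."""
    return wall_days * SECONDS_PER_DAY / n_steps


steps = md_step_count(2000.0, 0.25)           # 2 ps = 2000 fs
fastest = mean_seconds_per_step(1.0, steps)   # one-day run
slowest = mean_seconds_per_step(3.0, steps)   # three-day run
print(f"{steps} steps, {fastest:.1f} to {slowest:.1f} s per step on average")
```

The step count reproduces the 8000 steps on the slide, and the quoted wall times correspond to an average of roughly 11 to 32 seconds per step.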