Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins
Scott R. McAllister, Christodoulos A. Floudas
Princeton University
• Department of Chemical Engineering
• Program of Applied and Computational Mathematics
• Department of Operations Research and Financial Engineering
• Center for Quantitative Biology
Outline
• Protein structure prediction overview
• Predicting α-helical contacts
  - Probability development
  - Model
  - Results
• Predicting α-helical contacts in α/β proteins
  - Distance bounding
  - Model
  - Results
• Structure prediction of α-helical proteins
  - Framework
  - Results
Protein Structure Prediction
Problem
• Given an amino acid sequence, identify the three-dimensional protein structure
Approaches
• Homology modeling
• Fold recognition/threading
• Fragment assembly
• First Principles - Optimization
  - Statistical potentials
  - Physics-based potentials
[Figure: amino acid sequence …TLQAETDQLEDEKSALQ… with its three-dimensional structure unknown (?)]
Floudas, et al. Chemical Engineering Science. 2006, 61:966-988.
Floudas. AIChE Journal. 2005, 51:1872-1884.
Floudas. Biotechnology & Bioengineering. 2007.
ASTRO-FOLD
Derivation of Restraints (Reduced Search Space)
- Dihedral angle restrictions
- Cα distance constraints
Helix Prediction (Free Energy Calculations)
- Detailed atomistic modeling
- Simulations of local interactions
Overall 3D Structure Prediction (Global Optimization and Molecular Dynamics)
- Structural data from previous stages
- Prediction via novel solution approach
Loop Structure Prediction (Novel Clustering Methodology)
- Dihedral angle sampling
- Discard conformers by clustering
β-sheet Prediction (Combinatorial Optimization, MILP)
- Novel hydrophobic modeling
- Predict list of optimal topologies
Klepeis, JL and Floudas, CA. Biophys J. (2003)
Overview
Problem
• Topology prediction of globular α-helical proteins
Approach
• Thesis: Topology is based on certain inter-helical hydrophobic-to-hydrophobic contacts
• Create a dataset of helical proteins
• Develop inter-helical contact probabilities
• Apply two novel mixed-integer optimization models (MILP)
  - Level 1 - PRIMARY contacts
  - Level 2 - WHEEL contacts
McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.
Dataset Selection
Protein Sources
• 229 from the PDBSelect25 database [1]
• 62 from the CATH database [2]
• 20 from Zhang et al. [3]
• 7 from Huang et al. [4]
Restrictions
• No β-sheets, at least 2 α-helices
• No highly similar sequences
Dataset
• 318 proteins in the database set
[1] Hobohm, U. and C. Sander. Prot Sci 3 (1994) 522.
[2] Orengo, C.A. et al. Structure 5 (1997) 1093.
[3] Zhang, C. et al. PNAS 99 (2002) 3581.
[4] Huang, E.S. et al. J Mol Biol 290 (1999) 267.
Probability Development
Contact Types
• PRIMARY contact: minimum-distance hydrophobic contact between 4.0 Å and 10.0 Å
• WHEEL contact: only WHEEL position hydrophobic contacts between 4.0 Å and 12.0 Å
• Classified as parallel or antiparallel contacts
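The two contact definitions can be illustrated with a minimal Python sketch. It assumes that each hydrophobic residue is represented by a single 3D point and that a PRIMARY contact is the closest hydrophobic pair between two helices; the slides state neither detail. The function names and the coordinates are hypothetical, and only the 4.0-10.0 Å and 4.0-12.0 Å ranges come from the slide.

```python
import math

# Distance ranges in Å, taken from the slide.
PRIMARY_RANGE = (4.0, 10.0)
WHEEL_RANGE = (4.0, 12.0)


def primary_contact(helix_a, helix_b):
    """Return (res_a, res_b, distance) for the closest hydrophobic pair
    between two helices if it lies in the PRIMARY range, else None.

    helix_a / helix_b map a residue label to an (x, y, z) point.
    Using one point per residue is an assumption of this sketch.
    """
    closest = min(
        ((ra, rb, math.dist(pa, pb))
         for ra, pa in helix_a.items()
         for rb, pb in helix_b.items()),
        key=lambda pair: pair[2],
    )
    low, high = PRIMARY_RANGE
    return closest if low <= closest[2] <= high else None


def in_wheel_range(distance):
    """True if a candidate WHEEL-position pair is within the WHEEL range."""
    low, high = WHEEL_RANGE
    return low <= distance <= high


# Hypothetical coordinates, chosen only to exercise the functions.
HELIX_1 = {"25L": (0.0, 0.0, 0.0), "28L": (0.0, 0.0, 4.5)}
HELIX_2 = {"49L": (6.0, 0.0, 0.0), "45L": (6.0, 0.0, 6.0)}
CONTACT = primary_contact(HELIX_1, HELIX_2)
```

Here `CONTACT` holds the closest pair and its distance. The parallel/antiparallel label would additionally need the helix directions, which this sketch does not model.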
Model Overview
Formulation: Maximize inter-helical residue-residue contact probabilities
• Binary variable $y^{hA}_{m,n}$ indicates antiparallel helical contact
• Binary variable $w^{m,n}_{i,j}$ indicates residue contact
Goal: Produce a rank-ordered list of the most likely helical contacts
• Contacts used to restrict the conformational space explored during protein tertiary structure prediction
Pairwise Model Objective
Level 1 Objective
• Maximize the probability of pairwise residue-residue contacts
Pairwise Model Constraints
Level 1 Constraints
• At most one contact per position
• Helix-helix interaction direction
• Linking interaction variables
Pairwise Model Constraints
Level 1 Constraints
• Restrict the number of contacts between a given helix pair (MAX_CONTACT)
• Vary the number of helix-helix interactions (SUBTRACT)
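To make the Level 1 idea concrete, the following toy sketch selects the set of residue-residue contacts with the largest summed probability, subject to two of the constraints listed above: at most one contact per position, and at most MAX_CONTACT contacts per helix pair. It is not the authors' MILP. Brute-force enumeration stands in for a MILP solver, the candidate contacts and their probabilities are hypothetical, and the direction, kink, SUBTRACT, and topology constraints are left out.

```python
from itertools import combinations

# Hypothetical candidates: (helix m, residue i, helix n, residue j, probability).
CANDIDATES = [
    (1, 25, 2, 49, 0.30),
    (1, 28, 2, 45, 0.25),
    (1, 28, 2, 49, 0.20),
    (2, 45, 3, 85, 0.35),
    (2, 49, 3, 81, 0.15),
]


def feasible(selection, max_contact):
    """Check the two modeled constraints for a tuple of candidate contacts."""
    used_positions = set()
    contacts_per_pair = {}
    for m, i, n, j, _ in selection:
        # At most one contact per residue position.
        if i in used_positions or j in used_positions:
            return False
        used_positions.update((i, j))
        # At most max_contact contacts for each helix pair (m, n).
        contacts_per_pair[(m, n)] = contacts_per_pair.get((m, n), 0) + 1
        if contacts_per_pair[(m, n)] > max_contact:
            return False
    return True


def best_selection(candidates, max_contact):
    """Enumerate all subsets and keep the feasible one with the highest score."""
    best, best_score = (), 0.0
    for size in range(1, len(candidates) + 1):
        for selection in combinations(candidates, size):
            if feasible(selection, max_contact):
                score = sum(contact[4] for contact in selection)
                if score > best_score:
                    best, best_score = selection, score
    return best, best_score


BEST, BEST_SCORE = best_selection(CANDIDATES, max_contact=1)
```

Enumeration is only practical for a handful of candidates. A real instance would hand the same binary variables and constraints to a MILP solver, and a rank-ordered list could be produced by excluding each solution found and solving again.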
Pairwise Model Constraints
Level 1 Constraints
• Allow for and limit helical kinks
Pairwise Model Constraints
Level 1 Constraints
• Consistent numbering
[Figure: residues k, i, j, and l]
Pairwise Model Constraints
• Feasible topologies
[Figure: helices m, n, and p]
Pairwise Model Objective
Level 2 Objective
• Maximize the sum of predicted wheel probabilities
Pairwise Model Constraints
Level 2 Constraints
• Require at most one wheel contact for a specified primary contact
Level 2 Aim
• Distinguish between equally likely Level 1 predictions
• Increase the total number of contact predictions
Results – 2-3 helix bundles
[Figures: PDB:1mbh in PyMol; PDB:1nre in PyMol]
Results – 1nre Contact Predictions
PRIMARY Contact   PRIMARY Distance (Å)   WHEEL Contact   WHEEL Distance (Å)   Helix-Helix Interaction
25L-49L           6.0                    28L-45L         9.1                  1-2 A
28L-83V           12.7                   -               -                    1-3 P
45L-85L           9.3                    49L-81L         8.1                  2-3 A
51I-77L           9.3                    -               -                    2-3 A
(A = antiparallel, P = parallel)
subtract 0, max_contact 2
Results – 1hta Contact Predictions
PRIMARY Contact   PRIMARY Distance (Å)   Helix-Helix Interaction
5I-28L            9.1                    1-2 A
46L-62L           8.4                    2-3 A
(A = antiparallel)
subtract 0, max_contact 1
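The summary metric quoted later in the talk is an average contact distance. A minimal sketch of that calculation over the PRIMARY distances tabulated for 1nre and 1hta follows. Whether the reported metric averages exactly this set of contacts (for example, with or without WHEEL contacts) is an assumption; the function name is hypothetical.

```python
from statistics import mean

# PRIMARY contact distances in Å, copied from the two results tables.
PRIMARY_1NRE = {"25L-49L": 6.0, "28L-83V": 12.7, "45L-85L": 9.3, "51I-77L": 9.3}
PRIMARY_1HTA = {"5I-28L": 9.1, "46L-62L": 8.4}


def average_contact_distance(contacts):
    """Mean distance (Å) over the predicted contacts of one topology."""
    return mean(contacts.values())


AVG_1NRE = average_contact_distance(PRIMARY_1NRE)  # roughly 9.3 Å
AVG_1HTA = average_contact_distance(PRIMARY_1HTA)  # roughly 8.75 Å
```

Under this reading, both predictions sit below the 11.0 Å threshold used in the summary slides, even though one 1nre contact (28L-83V) is individually longer.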
Results – Contact Prediction Summary
Summary
• Thesis: Topology of α-helical globular proteins is based on inter-helical hydrophobic-to-hydrophobic contacts
• Validated on α-helical globular proteins
Overview
Problem
• α-helical topology prediction of globular α/β proteins
Approach
• Predict/determine the secondary structure and β-sheet topology
• Establish bounds on inter-residue distances
• Apply a novel optimization model (MILP) to maximize the hydrophobicity of inter-helical interactions
McAllister and Floudas. 2008, In preparation.
Helix Prediction (Free Energy Calculations)
- Detailed atomistic modeling
- Simulations of local interactions
β-sheet Prediction (Combinatorial Optimization)
- Novel hydrophobic modeling
- Predict list of optimal topologies
Establishing Distance Bounds
Approach
• Use secondary structure location and β-sheet topology
• Develop local and non-local bounds (PDBSelect25)
• Local bounds based on residue separation and secondary structure
Establishing Distance Bounds
Non-local
• Extended β-contacts
• “Cross” β-contacts
Tightening Distance Bounds
• Use of triangle inequality relationships
• Model is iteratively applied to determine the tightest distance bounds
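As an illustration of how triangle inequalities tighten bounds, the sketch below applies classic distance-geometry bound smoothing to a small matrix of lower and upper bounds. This is a swapped-in technique and not the iterative optimization model mentioned on the slide; the function name and the three-residue example are hypothetical.

```python
def tighten_bounds(lower, upper):
    """Tighten symmetric n x n distance bounds with triangle inequalities.

    Upper bounds: d_ij <= u_ik + u_kj (shortest-path closure).
    Lower bounds: d_ij >= l_ik - u_kj (repeated until nothing changes).
    Returns new (lower, upper) matrices; the inputs are not modified.
    """
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]

    # Floyd-Warshall style pass for the upper bounds.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = up[i][k] + up[k][j]
                if via_k < up[i][j]:
                    up[i][j] = via_k

    # Raise lower bounds using the tightened upper bounds.
    changed = True
    while changed:
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    candidate = lo[i][k] - up[k][j]
                    if i != j and candidate > lo[i][j]:
                        lo[i][j] = lo[j][i] = candidate
                        changed = True
    return lo, up


# Hypothetical bounds (Å) for three residues; pair (0, 2) starts out loose.
LOWER = [[0.0, 5.0, 0.0], [5.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
UPPER = [[0.0, 6.0, 20.0], [6.0, 0.0, 2.0], [20.0, 2.0, 0.0]]
TIGHT_LOWER, TIGHT_UPPER = tighten_bounds(LOWER, UPPER)
```

In this example the loose bounds on pair (0, 2) shrink from [0, 20] Å to [3, 8] Å, because residue 1 is at most 6 Å from residue 0, at least 5 Å from it, and at most 2 Å from residue 2.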
Objective Function
• Maximize the number of hydrophobic interactions between α-helices and their hydrophobicity
• α values are weight factors
• Objective terms: Number, Hydrophobicity
• Hydrophobicity from the PRIFT scale*
*Cornette et al. J Mol Biol. 1987, 195:659-85.
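The slide's equation did not survive in this transcript. One plausible form of a weighted two-term objective is sketched below, purely as an assumption. Here $w^{m,n}_{i,j}$ is taken to be a binary variable for a contact between residue $i$ of helix $m$ and residue $j$ of helix $n$, $H_i$ is the PRIFT hydrophobicity of residue $i$, and $\alpha_1$, $\alpha_2$ are the weight factors. Combining the two residues as $H_i + H_j$ is a choice made for this sketch, not something the slide states.

```latex
\max \;\;
\underbrace{\alpha_{1} \sum_{m<n} \sum_{i \in m} \sum_{j \in n} w^{m,n}_{i,j}}_{\text{Number}}
\;+\;
\underbrace{\alpha_{2} \sum_{m<n} \sum_{i \in m} \sum_{j \in n} \left( H_{i} + H_{j} \right) w^{m,n}_{i,j}}_{\text{Hydrophobicity}}
```

With $\alpha_2 = 0$ the model would only count contacts. Raising $\alpha_2$ shifts the preference toward fewer but more hydrophobic pairs.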
Constraints
Residue contact constraints
• Residue i forms at most one contact with a residue in helix n
• Residue i forms at most two contacts
• Additional constraints limiting the size of allowed helix kinks
  - Similar to constraints for α-helical topology prediction of α-helical proteins
Constraints
Residue contact constraints
• Disallow (i,i+2), (i,i+5), and (i,i+6) residue pairs from both having contacts with helix n
  - These residues exist on opposite faces of a helix
[Figure: helix positions i, i+2, i+5, and i+6]
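The choice of offsets 2, 5, and 6 can be checked with a short calculation, assuming an ideal α-helix with 3.6 residues per turn (100° of rotation per residue). The function name and the 120° cutoff are hypothetical choices for this sketch; the slide only states that these residues lie on opposite faces.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal α-helix: 360° / 3.6 residues per turn


def angular_separation(offset):
    """Angle (degrees, 0-180) between residues i and i+offset
    when viewed down the axis of an ideal α-helix."""
    angle = (offset * DEGREES_PER_RESIDUE) % 360.0
    return min(angle, 360.0 - angle)


# Offsets whose side chains point at least 120° apart around the helix axis.
OPPOSITE_FACE_OFFSETS = [
    offset for offset in range(1, 8) if angular_separation(offset) >= 120.0
]
```

Under the ideal-helix assumption, offsets 3, 4, and 7 fall within 60° of residue i and so share its face, while offsets 2, 5, and 6 are separated by 120° or more. That matches the pairs the constraint excludes.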
Constraints
Helix contact constraints
• Maximum of 2 helix-helix contacts for a helix
• Only 1 helix-helix interaction direction
• Ensure feasible topologies
  - Similar to constraints for α-helical topology prediction of α-helical proteins
Constraints
Relating residue contacts to helix contacts
• Ensure consistent numbering
[Figure: residues k, i, j, and l]
Constraints
Relating distances to residue contacts
• If residue pair (i,j) forms an inter-helical contact, then d_ij falls within the contact distance
• If residue pair (i,j) does not form an inter-helical contact, then d_ij falls beyond the contact upper bound
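A standard way to express these two implications in a MILP is a pair of big-M constraints. The form below is a sketch under assumptions, since the slide's own equations are not in the transcript: $w_{ij}$ is a binary contact variable, $d^{L}_{c}$ and $d^{U}_{c}$ are the lower and upper ends of the contact distance range, and $M$ is a sufficiently large constant, for instance the largest upper bound on any $d_{ij}$.

```latex
d_{ij} \;\le\; d^{U}_{c} + M \left( 1 - w_{ij} \right),
\qquad
d_{ij} \;\ge\; d^{L}_{c}\, w_{ij} + d^{U}_{c} \left( 1 - w_{ij} \right)
```

Setting $w_{ij} = 1$ gives $d^{L}_{c} \le d_{ij} \le d^{U}_{c}$, and setting $w_{ij} = 0$ gives $d_{ij} \ge d^{U}_{c}$ with the upper constraint relaxed.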
Constraints
Distance constraints
• Satisfies initial bounds
• Satisfies triangle inequality constraints
Constraints
Distance constraints
• Restrict distances based on right angle interaction assumption
  - If residue pair (i,k) is an inter-helical contact, line segment (i,k) is perpendicular to line segment (i,j)
  - Relationship is used to bound the distance d_jk
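Under the right-angle assumption, the Pythagorean relation d_jk² = d_ij² + d_ik² turns bounds on d_ij and d_ik into bounds on d_jk. A minimal sketch follows; the function name and the example values are hypothetical, and how the relation is encoded inside the authors' linear model is not shown in the slides.

```python
import math


def right_angle_bounds(ij_bounds, ik_bounds):
    """Bounds on d_jk when segment (i,k) is perpendicular to segment (i,j).

    Each argument is a (lower, upper) pair in Å. Because
    d_jk = sqrt(d_ij**2 + d_ik**2) increases with both distances, the
    bounds on d_jk come from combining the two lower bounds and the
    two upper bounds.
    """
    (l_ij, u_ij), (l_ik, u_ik) = ij_bounds, ik_bounds
    return math.hypot(l_ij, l_ik), math.hypot(u_ij, u_ik)


# Hypothetical example: d_ij fixed at 6.0 Å (same helix), contact d_ik in 4.0-10.0 Å.
D_JK_LOWER, D_JK_UPPER = right_angle_bounds((6.0, 6.0), (4.0, 10.0))
```

For this example the bound on d_jk is roughly 7.2 Å to 11.7 Å.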
Results
[Figures: 1dcjA; 1o2fB]
Results – 1bm8
Results - Summary
• Best average contact distance for 11 of 12 proteins in the test set was less than 11.0 Å
Results – CASP7 T350
• Prediction with the optimal topology is shown
ASTRO-FOLD for α-helical Bundles
Overall 3D Structure Prediction (Global Optimization and Molecular Dynamics)
- Structural data from previous stages
- Prediction via novel solution approach
Derivation of Restraints (Reduced Search Space)
- Dihedral angle restrictions
- Cα distance constraints
Helix Prediction (Free Energy Calculations)
- Detailed atomistic modeling
- Simulations of local interactions
Loop Structure Prediction (Novel Clustering Methodology)
- Dihedral angle sampling
- Discard conformers by clustering
Interhelical Contacts (MILP Optimization Model)
- Maximize common residue pairs
- Rank-order list of topologies
McAllister, Floudas. Proceedings, BIOMAT 2005.
Derivation of Restraints
Dihedral angle restraints
• For residues with α-helix or β-sheet classification
• For loop residues using the best identified conformer from loop modeling efforts
Distance restraints
• Helical hydrogen bond network (i,i+4)
• α-helical topology predictions
• β-sheet topology predictions
Klepeis, JL and Floudas, CA. Journal of Global Optimization. (2003)
Constrained optimization
Problem definition
• Atomistic level force field (ECEPP/3)
• Distance constraints
Tertiary Structure Prediction
Hybrid global optimization approach
• αBB deterministic global optimization
• Conformational Space Annealing (CSA)
Modifications/Enhancements
• Improved initial point selection using a torsion angle dynamics based annealing procedure from CYANA
• Inclusion of a rotamer optimization stage for quick energetic improvements
• Streamlined parallel implementation
αBB Global Optimization
• Based on a branch-and-bound framework
• Upper bound on the global solution is obtained by solving the full nonconvex problem to local optimality
• Lower bound is determined by solving a valid convex underestimation of the original problem
• Convergence is obtained by successive subdivision of the region at each level in the branch-and-bound tree
• Guaranteed ε-convergence for C² NLPs
Adjiman, CS, et al. Computers and Chemical Engineering. (1998a,b)
Floudas, CA and co-workers, 1995-2007
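The branch-and-bound steps above can be illustrated with a one-dimensional toy sketch. It uses the αBB underestimator f(x) + α(x_L - x)(x_U - x), which is convex on the interval whenever α ≥ max(0, -½ min f''). Everything else is an assumption made for illustration: the test function, the hand-derived α, the ternary search used to minimize the convex underestimator, and the upper bound, which is obtained by simply evaluating f at the underestimator's minimizer rather than by a local nonconvex solve.

```python
import math


def f(x):
    """Hypothetical nonconvex test function with several local minima."""
    return math.sin(3.0 * x) + 0.1 * x * x


# f''(x) = -9 sin(3x) + 0.2 >= -8.8, so any alpha >= 4.4 convexifies.
ALPHA = 4.5


def underestimator(x, lo, hi):
    """αBB convex underestimator of f on [lo, hi]; equals f at lo and hi."""
    return f(x) + ALPHA * (lo - x) * (hi - x)


def minimize_convex(g, lo, hi, iterations=100):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    x = 0.5 * (lo + hi)
    return x, g(x)


def alpha_bb(lo, hi, eps=1e-4):
    """Return (x, f(x)) within eps of the global minimum of f on [lo, hi]."""
    best_x, best_val = lo, f(lo)
    regions = [(lo, hi)]
    while regions:
        a, b = regions.pop()
        # Lower bound: minimum of the convex underestimator on this region.
        x, lower = minimize_convex(lambda t: underestimator(t, a, b), a, b)
        # Upper bound: f at that point (stand-in for a local solve).
        if f(x) < best_val:
            best_x, best_val = x, f(x)
        # Fathom the region unless it could still improve the incumbent.
        if lower < best_val - eps:
            mid = 0.5 * (a + b)
            regions.extend([(a, mid), (mid, b)])
    return best_x, best_val


BEST_X, BEST_VAL = alpha_bb(-3.0, 3.0)
```

The gap between f and its underestimator is at most α(x_U - x_L)²/4, so it shrinks quadratically as regions are bisected. That is what lets the lower and upper bounds meet and the search stop.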
Conformational Space Annealing
• Induce variations
  - Mutations
  - Crossovers
• Subject to local energy minimization
• Anneal through the gradual reduction of space
Lee, JH, et al. Journal of Computational Chemistry. (1997)
Scheraga and co-workers, 1997-2007.
Rotamer Side Chain Optimization
• Side chain packing is crucial to the stability and specificity of the native state
• Rotamer optimization is a quick way to alleviate steric clashes
• Better starting point for constrained nonlinear minimization
Torsion Angle Dynamics
• Why? Difficult to identify low energy feasible points
• Fast evaluation of steric based force field
• Unconstrained formulation with penalty functions
• Implemented by solving equations of motion as preprocessing for each constrained minimization
Guntert, P, et al. Journal of Molecular Biology. (1997)
Klepeis, JL, et al. Journal of Computational Chemistry. (1999)
Klepeis, JL and Floudas, CA. Computers and Chemical Engineering. (2000)
Hybrid Global Optimization Algorithm
• All secondary nodes begin performing αBB iterations
• Once the CSA bank is full, CSA takes control of a subset of secondary nodes
[Figure: processor layout with labels αBB Control, CSA Control, Idle Work, Primary processor, Secondary processors]
αBB control
• Maintains list of lower bounding subregions
• Tracks overall upper and lower bounds
• Defines branching directions
• Sends and receives work to and from αBB work nodes
CSA control
• Maintains CSA bank
• Maintains queue of αBB minima for bank increases
• Handles bank updates
• Sends and receives work to and from CSA work nodes
Idle work
• Performs shear movements and perturbations on CSA structures
• Only executed during idle time of primary processor
αBB work
• Torsion angle dynamics
• Rotamer optimization
• Minimization of lower bounding function
• Minimization of upper bounding function
CSA work
• Rotamer optimization
• Minimization of CSA trial conformation
McAllister and Floudas. 2007, Submitted for publication.
Results – Tertiary Structure Prediction
PDB: 1nre
• Lowest energy predicted structure of 1nre (color) versus native 1nre (gray): Energy -1395.48, RMSD 6.63
• Lowest RMSD predicted structure of 1nre (color) versus native 1nre (gray): Energy -1340.45, RMSD 3.52
Results – Tertiary Structure Prediction
PDB: 1hta
• Lowest energy predicted structure of 1hta (color) versus native 1hta (gray): Energy -941.02, RMSD 6.70
• Lowest RMSD predicted structure of 1hta (color) versus native 1hta (gray): Energy -915.57, RMSD 2.58
Results – Blind Tertiary Structure Prediction (Collaboration with Michael Hecht)
S836
• Lowest energy predicted structure of S836 (color) versus native S836 (gray): Energy -1740.11, RMSD 2.84
• Lowest RMSD predicted structure of S836 (color) versus native S836 (gray): Energy -1697.88, RMSD 2.39
Conclusions
• Two novel mixed-integer linear programming models were developed for α-helical topology prediction in α-helical proteins
  - PRIMARY and WHEEL contacts
  - For all 26 test α-helical proteins, best average contact distance predictions fell well below 11.0 Å
• A novel mixed-integer linear programming model was also developed for α-helical topology prediction in α/β proteins
  - For 11 of 12 test α/β proteins, best average contact distance predictions fell below 11.0 Å
• Topology predictions were useful for restraining the tertiary structures during global optimization and obtaining a near-native prediction in a blind study
Acknowledgements
Funding sources
• National Institutes of Health (R01 GM52032)
• US EPA (GAD R 832721-010)*
*Disclaimer: This work has not been reviewed by and does not represent the opinions of the funding agency.
Questions