
Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins

Scott R. McAllister, Christodoulos A. Floudas

Princeton University

• Department of Chemical Engineering
• Program of Applied and Computational Mathematics
• Department of Operations Research and Financial Engineering
• Center for Quantitative Biology

Outline
• Protein structure prediction overview
• Predicting α-helical contacts
  - Probability development
  - Model
  - Results
• Predicting α-helical contacts in α/β proteins
  - Distance bounding
  - Model
  - Results
• Structure prediction of α-helical proteins
  - Framework
  - Results

Protein Structure Prediction

Problem
Given an amino acid sequence, identify the three-dimensional protein structure.

Approaches
• Homology modeling
• Fold recognition/threading
• Fragment assembly
• First principles (optimization)
  - Statistical potentials
  - Physics-based potentials

(Figure: the sequence fragment …TLQAETDQLEDEKSALQ… with its unknown three-dimensional structures marked "?")

Floudas, et al. Chemical Engineering Science. 2006, 61:966-988.
Floudas. AIChE Journal. 2005, 51:1872-1884.
Floudas. Biotechnology & Bioengineering. 2007.

ASTRO-FOLD
• Derivation of Restraints: dihedral angle restrictions; Cα distance constraints (Reduced Search Space)
• Helix Prediction: detailed atomistic modeling; simulations of local interactions (Free Energy Calculations)
• Overall 3D Structure Prediction: structural data from previous stages; prediction via a novel solution approach (Global Optimization and Molecular Dynamics)
• Loop Structure Prediction: dihedral angle sampling; discard conformers by clustering (Novel Clustering Methodology)
• β-sheet Prediction: novel hydrophobic modeling; predict a list of optimal topologies (Combinatorial Optimization, MILP)

Klepeis, JL and Floudas, CA. Biophys J. (2003)


Overview

Problem
Topology prediction of globular α-helical proteins

Approach
Thesis: topology is based on certain inter-helical hydrophobic-to-hydrophobic contacts.
• Create a dataset of helical proteins
• Develop inter-helical contact probabilities
• Apply two novel mixed-integer linear programming (MILP) models
  - Level 1: PRIMARY contacts
  - Level 2: WHEEL contacts

McAllister, Mickus, Klepeis, Floudas. Proteins. 2006, 65:930-952.

Dataset Selection

Protein sources
• 229 from the PDBSelect25 database [1]
• 62 from the CATH database [2]
• 20 from Zhang et al. [3]
• 7 from Huang et al. [4]

Restrictions
• No β-sheets; at least 2 α-helices
• No highly similar sequences

Dataset
• 318 proteins in the database set

[1] Hobohm, U. and Sander, C. Prot Sci 3 (1994) 522.
[2] Orengo, C.A. et al. Structure 5 (1997) 1093.
[3] Zhang, C. et al. PNAS 99 (2002) 3581.
[4] Huang, E.S. et al. J Mol Biol 290 (1999) 267.


Probability Development

Contact types
• PRIMARY contact: minimum-distance hydrophobic contact between 4.0 Å and 10.0 Å
• WHEEL contact: hydrophobic contact between WHEEL positions only, between 4.0 Å and 12.0 Å
• Contacts are classified as parallel or antiparallel

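The contact definitions above can be made concrete with a small labeling routine. This is a hedged sketch, not the authors' code: it assumes "minimum distance" means the smallest atom-atom distance between two residues, that the caller has already confirmed both residues are hydrophobic and on different helices, and that parallel versus antiparallel is decided by the sign of the dot product of the two N-to-C helix axis vectors. The names `contact_types` and `orientation` are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]

# Distance windows taken from the contact definitions (Å)
PRIMARY_WINDOW = (4.0, 10.0)
WHEEL_WINDOW = (4.0, 12.0)


def min_distance(atoms_a: Iterable[Point], atoms_b: Iterable[Point]) -> float:
    """Smallest distance between any atom of residue a and any atom of residue b."""
    atoms_b = list(atoms_b)  # iterated once per atom of residue a
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def contact_types(atoms_a, atoms_b, both_wheel_positions: bool) -> Set[str]:
    """Label an inter-helical pair of hydrophobic residues (assumed pre-filtered)."""
    d = min_distance(atoms_a, atoms_b)
    labels = set()
    if PRIMARY_WINDOW[0] <= d <= PRIMARY_WINDOW[1]:
        labels.add("PRIMARY")
    if both_wheel_positions and WHEEL_WINDOW[0] <= d <= WHEEL_WINDOW[1]:
        labels.add("WHEEL")
    return labels


def orientation(axis_a: Point, axis_b: Point) -> str:
    """Assumed criterion: helix axes pointing the same way are parallel."""
    dot = sum(x * y for x, y in zip(axis_a, axis_b))
    return "parallel" if dot > 0 else "antiparallel"
```

A pair at 11 Å between two WHEEL positions would be labeled WHEEL only, since it falls outside the 10.0 Å PRIMARY window but inside the 12.0 Å WHEEL window.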

Model Overview

Formulation
• Maximize inter-helical residue-residue contact probabilities
• Binary variable $yh^{A}_{n,m}$ indicates an antiparallel contact between helices n and m
• Binary variable $w^{n,m}_{i,j}$ indicates a contact between residues i and j on helices n and m

Goal
• Produce a rank-ordered list of the most likely helical contacts
• The contacts are used to restrict the conformational space explored during protein tertiary structure prediction


Pairwise Model Objective

Level 1 Objective
• Maximize the probability of pairwise residue-residue contacts

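One way a Level 1 objective of this kind could be written is sketched below. It is an illustration under stated assumptions, not the published formulation: $p^{P}_{ij}$ and $p^{A}_{ij}$ are taken to be the parallel and antiparallel contact probabilities for residue $i$ on helix $H_n$ and residue $j$ on helix $H_m$, $w^{P}_{ij}$ and $w^{A}_{ij}$ are hypothetical direction-specific binary contact variables, and $yh^{A}_{n,m}$ is a binary antiparallel indicator for the helix pair.

```latex
\max \sum_{n<m} \; \sum_{i \in H_n} \sum_{j \in H_m}
  \left( p^{P}_{ij}\, w^{P}_{ij} + p^{A}_{ij}\, w^{A}_{ij} \right)
\quad \text{s.t.} \quad
w^{A}_{ij} \le yh^{A}_{n,m}, \qquad
w^{P}_{ij} \le 1 - yh^{A}_{n,m}, \qquad
w^{P}_{ij},\, w^{A}_{ij},\, yh^{A}_{n,m} \in \{0,1\}
```

The two linking inequalities force every selected residue contact between helices n and m to agree with a single interaction direction, while keeping the model linear.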

Pairwise Model Constraints

Level 1 Constraints
• At most one contact per position
• Helix-helix interaction direction
• Linking interaction variables


Pairwise Model Constraints

Level 1 Constraints
• Restrict the number of contacts between a given helix pair (MAX_CONTACT)
• Vary the number of helix-helix interactions (SUBTRACT)

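The Level 1 constraints can be illustrated on a toy instance. As a hedged sketch only, the code replaces the MILP solver with exhaustive enumeration (viable for a handful of candidate contacts), then sorts feasible contact sets by total probability to mimic a rank-ordered list. The `Candidate` fields and the reading of "linking interaction variables" as "all residue contacts between a helix pair share that pair's direction" are assumptions; the SUBTRACT and kink constraints are omitted.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    i: int           # residue number on helix n
    j: int           # residue number on helix m
    n: int           # first helix
    m: int           # second helix
    direction: str   # "P" (parallel) or "A" (antiparallel)
    prob: float      # contact probability, assumed precomputed


def feasible(chosen: Tuple[Candidate, ...], max_contact: int) -> bool:
    used = set()
    directions = {}
    per_pair = defaultdict(int)
    for c in chosen:
        # At most one contact per residue position
        if c.i in used or c.j in used:
            return False
        used.update((c.i, c.j))
        pair = (c.n, c.m)
        # One interaction direction per helix pair
        if directions.setdefault(pair, c.direction) != c.direction:
            return False
        # MAX_CONTACT: cap on residue contacts between a given helix pair
        per_pair[pair] += 1
        if per_pair[pair] > max_contact:
            return False
    return True


def rank_solutions(cands: List[Candidate], max_contact: int, top: int = 3):
    """Enumerate every feasible contact set; return the `top` highest-scoring ones."""
    scored = []
    for k in range(1, len(cands) + 1):
        for chosen in combinations(cands, k):
            if feasible(chosen, max_contact):
                scored.append((sum(c.prob for c in chosen), chosen))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top]
```

With MAX_CONTACT = 1 the same candidates typically yield a different top solution than with MAX_CONTACT = 2, which is why the parameter is reported alongside each prediction.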

Pairwise Model Constraints

Level 1 Constraints
• Allow for and limit helical kinks


Pairwise Model Constraints

Level 1 Constraints
• Consistent numbering


(Figure: residue positions i, j, k, and l on two interacting helices)
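One plausible reading of "consistent numbering" is that contacts between two helices must not cross: if residues i < k on one helix contact residues j and l on the other, a parallel pair needs j < l and an antiparallel pair needs j > l. The check below is a hedged sketch of that reading only; the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Contact = Tuple[int, int]  # (residue on helix n, residue on helix m)


def consistently_numbered(contacts: List[Contact], direction: str) -> bool:
    """Return True if contacts between one helix pair do not cross.

    direction: "P" for parallel, "A" for antiparallel (assumed encoding).
    """
    for (i, j), (k, l) in combinations(sorted(contacts), 2):
        if i == k:
            continue  # same residue on helix n; left to the one-contact-per-position rule
        if direction == "P" and not j < l:
            return False
        if direction == "A" and not j > l:
            return False
    return True
```

For example, the antiparallel 1nre contacts 25L-49L and 28L-45L between helices 1 and 2 pass this check (25 < 28 while 49 > 45), whereas the same pair labeled parallel would not.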

Pairwise Model Constraints

Feasible topologies
(Figure: helices m, n, and p)

Pairwise Model Objective

Level 2 Objective
• Maximize the sum of predicted WHEEL probabilities


Pairwise Model Constraints

Level 2 Constraints
• Require at most one WHEEL contact for a specified PRIMARY contact

Level 2 Aim
• Distinguish between equally likely Level 1 predictions
• Increase the total number of contact predictions

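If the only Level 2 constraint is "at most one WHEEL contact per PRIMARY contact", and each WHEEL candidate belongs to exactly one PRIMARY contact, the Level 2 problem decomposes: picking the most probable candidate in each group is optimal. The sketch below assumes exactly that simplified setting (the full model may couple groups through further constraints) and also shows the stated aim of using the Level 2 score to break ties between equally likely Level 1 solutions. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

WheelCandidate = Tuple[str, float]  # (WHEEL contact label, predicted probability)


def choose_wheel(groups: Dict[str, List[WheelCandidate]]) -> Dict[str, Optional[WheelCandidate]]:
    """Pick at most one WHEEL contact per PRIMARY contact, maximizing total probability."""
    choice: Dict[str, Optional[WheelCandidate]] = {}
    for primary, cands in groups.items():
        positive = [c for c in cands if c[1] > 0]
        choice[primary] = max(positive, key=lambda c: c[1]) if positive else None
    return choice


def level2_score(choice: Dict[str, Optional[WheelCandidate]]) -> float:
    return sum(c[1] for c in choice.values() if c is not None)


def rank_with_level2(solutions: List[Tuple[float, float, str]]) -> List[Tuple[float, float, str]]:
    """Sort (level1_score, level2_score, label) so Level 2 breaks Level 1 ties."""
    return sorted(solutions, key=lambda s: (s[0], s[1]), reverse=True)
```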

Results – 2- and 3-helix bundles

(Figures: PDB 1mbh and PDB 1nre rendered in PyMOL)


Results – 1nre Contact Predictions

PRIMARY contact | PRIMARY distance (Å) | WHEEL contact | WHEEL distance (Å) | Helix-helix interaction
25L-49L | 6.0 | 28L-45L | 9.1 | 1-2 A
28L-83V | 12.7 | - | - | 1-3 P
45L-85L | 9.3 | 49L-81L | 8.1 | 2-3 A
51I-77L | 9.3 | - | - | 2-3 A

Parameters: SUBTRACT = 0, MAX_CONTACT = 2 (A = antiparallel, P = parallel)
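The summary metric used later ("best average contact distance") can be illustrated on the 1nre predictions. This sketch assumes the metric is the arithmetic mean of the native-structure distances of the predicted contacts; whether WHEEL contacts are included is not stated, so both variants are computed. The dictionaries simply restate the 1nre contact table.

```python
from statistics import mean

# 1nre predicted contacts and their native distances (Å), from the 1nre table
primary_1nre = {
    "25L-49L": 6.0,
    "28L-83V": 12.7,
    "45L-85L": 9.3,
    "51I-77L": 9.3,
}
wheel_1nre = {"28L-45L": 9.1, "49L-81L": 8.1}

avg_primary = mean(primary_1nre.values())
avg_all = mean([*primary_1nre.values(), *wheel_1nre.values()])

# Compare against the 11.0 Å threshold used in the result summaries
print(f"average PRIMARY distance: {avg_primary:.2f} Å, below 11.0 Å: {avg_primary < 11.0}")
print(f"average over all predicted contacts: {avg_all:.2f} Å")
```

Under either reading the 1nre average stays below 11.0 Å, even though one individual contact (28L-83V) lies beyond the PRIMARY window.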

Results – 1hta Contact Predictions

PRIMARY contact | PRIMARY distance (Å) | Helix-helix interaction
5I-28L | 9.1 | 1-2 A
46L-62L | 8.4 | 2-3 A

Parameters: SUBTRACT = 0, MAX_CONTACT = 1 (A = antiparallel)

Results – Contact Prediction Summary


Summary
• Thesis: the topology of α-helical globular proteins is based on inter-helical hydrophobic-to-hydrophobic contacts
• Validated on α-helical globular proteins


Overview

Problem
α-helical topology prediction of globular α/β proteins

Approach
• Predict/determine the secondary structure and β-sheet topology
• Establish bounds on inter-residue distances
• Apply a novel optimization model (MILP) to maximize the hydrophobicity of inter-helical interactions

McAllister and Floudas. 2008, In preparation.

• Helix Prediction: detailed atomistic modeling; simulations of local interactions (Free Energy Calculations)
• β-sheet Prediction: novel hydrophobic modeling; predict a list of optimal topologies (Combinatorial Optimization)

Establishing Distance Bounds

Approach
• Use the secondary structure location and β-sheet topology
• Develop local and non-local bounds (PDBSelect25)
• Local bounds are based on residue separation and secondary structure

Establishing Distance Bounds

Non-local bounds
• Extended β-contacts
• “Cross” β-contacts

Tightening Distance Bounds
• Use triangle inequality relationships
• The model is applied iteratively to determine the tightest distance bounds
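Triangle-inequality tightening can be sketched with classic bound smoothing. Note this is a swapped-in, purely algebraic technique: the slides tighten bounds by iteratively applying the optimization model, which this code does not do. For every triple (i, k, j) it lowers upper bounds via $u_{ij} \le u_{ik} + u_{kj}$ and raises lower bounds via $l_{ij} \ge \max(l_{ik} - u_{kj},\, l_{kj} - u_{ik})$, sweeping until nothing changes. The function name and matrix layout are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def smooth_bounds(lower: Matrix, upper: Matrix, max_sweeps: int = 100) -> Tuple[Matrix, Matrix]:
    """Tighten symmetric lower/upper distance bounds with the triangle inequality.

    Diagonal entries are expected to be 0. Raises ValueError if the
    tightened bounds become inconsistent (some lower bound > upper bound).
    """
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]
    for _ in range(max_sweeps):
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    # Upper bound: the path i -> k -> j cannot be shorter than i -> j
                    via_k = up[i][k] + up[k][j]
                    if via_k < up[i][j]:
                        up[i][j] = via_k
                        changed = True
                    # Lower bound: |d_ik - d_kj| <= d_ij
                    gap = max(lo[i][k] - up[k][j], lo[k][j] - up[i][k])
                    if gap > lo[i][j]:
                        lo[i][j] = gap
                        changed = True
        if not changed:
            break
    for i in range(n):
        for j in range(n):
            if lo[i][j] > up[i][j]:
                raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return lo, up
```

For example, with $u_{01} = 3$ and $u_{12} = 4$, an initial $u_{02} = 10$ is tightened to 7.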

Objective Function
• Maximize the number of hydrophobic interactions between α-helices and the hydrophobicity of those interactions
• The objective has a number term and a hydrophobicity term (PRIFT scale*); the α values are weight factors

*Cornette et al. J Mol Biol. 1987, 195:659-85.
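A weighted two-term objective of this kind can be sketched as follows. The additive form (α₁ times the number of contacts plus α₂ times the summed per-residue hydrophobicity of the contacting residues) is an assumption, and the hydrophobicity scale is passed in by the caller: no PRIFT values are reproduced here, and the names `topology_score`, `alpha_number`, and `alpha_hydro` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def topology_score(contacts: Iterable[Tuple[str, str]],
                   hydrophobicity: Dict[str, float],
                   alpha_number: float,
                   alpha_hydro: float) -> float:
    """Weighted score of a set of inter-helical residue contacts.

    contacts: pairs of one-letter residue codes in contact.
    hydrophobicity: per-residue scale supplied by the caller (e.g. PRIFT values).
    """
    contacts = list(contacts)
    number_term = len(contacts)
    hydro_term = sum(hydrophobicity[a] + hydrophobicity[b] for a, b in contacts)
    return alpha_number * number_term + alpha_hydro * hydro_term
```

Setting `alpha_hydro` to zero reduces the score to a plain contact count, which makes the role of the weight factors easy to see.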

Constraints

Residue contact constraints
• Residue i forms at most one contact with a residue in helix n
• Residue i forms at most two contacts
• Additional constraints limit the size of allowed helix kinks
• Similar to the constraints for α-helical topology prediction of α-helical proteins

Constraints

Residue contact constraints
• Disallow (i,i+2), (i,i+5), and (i,i+6) residue pairs from both having contacts with helix n
• These residues lie on opposite faces of a helix

(Figure: positions i, i+2, i+5, and i+6 on a helix)

Constraints

Helix contact constraints
• Maximum of 2 helix-helix contacts for a helix
• Only 1 helix-helix interaction direction
• Ensure feasible topologies
• Similar to the constraints for α-helical topology prediction of α-helical proteins
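The residue and helix contact rules above can be checked on a candidate solution. The sketch uses the standard α-helix geometry of 100° of rotation per residue to show why offsets 2, 5, and 6 are excluded: they sit 160°, 140°, and 120° around the helical wheel, while offsets 1, 3, 4, and 7 sit within 100°. The input encodings, `(residue, partner_helix)` pairs and `(n, m, direction)` triples, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple

DEG_PER_RESIDUE = 100.0      # rotation per residue in an α-helix (3.6 residues/turn)
OPPOSITE_OFFSETS = (2, 5, 6)


def wheel_separation(offset: int) -> float:
    """Angular separation (degrees) of residues i and i+offset on the helical wheel."""
    theta = (offset * DEG_PER_RESIDUE) % 360.0
    return min(theta, 360.0 - theta)


def residue_constraints_ok(contacts: List[Tuple[int, int]], max_per_residue: int = 2) -> bool:
    """contacts: (residue, partner helix) pairs."""
    helices_of = defaultdict(list)
    residues_on = defaultdict(set)
    for r, h in contacts:
        helices_of[r].append(h)
        residues_on[h].add(r)
    for helices in helices_of.values():
        if len(helices) > max_per_residue:        # at most two contacts per residue
            return False
        if len(set(helices)) != len(helices):     # at most one contact with a given helix
            return False
    for residues in residues_on.values():
        for r in residues:
            if any(r + off in residues for off in OPPOSITE_OFFSETS):
                return False                       # opposite faces cannot both touch helix n
    return True


def helix_constraints_ok(helix_contacts: List[Tuple[int, int, str]], max_partners: int = 2) -> bool:
    """helix_contacts: (n, m, direction) triples."""
    directions = {}
    partners = defaultdict(set)
    for n, m, d in helix_contacts:
        key = (min(n, m), max(n, m))
        if directions.setdefault(key, d) != d:     # only one interaction direction per pair
            return False
        partners[n].add(m)
        partners[m].add(n)
    return all(len(p) <= max_partners for p in partners.values())
```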

Constraints

Relating residue contacts to helix contacts
• Ensure consistent numbering

(Figure: residue positions i, j, k, and l on two interacting helices)

Constraints

Relating distances to residue contacts
• If residue pair (i,j) forms an inter-helical contact, then $d_{ij}$ falls within the contact distance range
• If residue pair (i,j) does not form an inter-helical contact, then $d_{ij}$ falls beyond the contact upper bound
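These two implications can be written as linear big-M style inequalities. The sketch assumes a binary contact variable $w_{ij}$, a contact window $[d^{L}_{c}, d^{U}_{c}]$, prior bounds $l_{ij} \le d_{ij} \le u_{ij}$ with $u_{ij} \ge d^{U}_{c}$, and a small $\epsilon > 0$ to express "beyond"; the exact published form may differ.

```latex
d^{L}_{c}\, w_{ij} + (d^{U}_{c} + \epsilon)(1 - w_{ij})
  \;\le\; d_{ij} \;\le\;
d^{U}_{c} + (u_{ij} - d^{U}_{c})(1 - w_{ij}),
\qquad l_{ij} \le d_{ij} \le u_{ij}, \qquad w_{ij} \in \{0,1\}
```

With $w_{ij} = 1$ this reduces to $d^{L}_{c} \le d_{ij} \le d^{U}_{c}$; with $w_{ij} = 0$ it reduces to $d^{U}_{c} + \epsilon \le d_{ij} \le u_{ij}$.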

Constraints

Distance constraints
• Satisfy the initial bounds
• Satisfy the triangle inequality constraints

Constraints

Distance constraints
• Restrict distances based on a right-angle interaction assumption
• If residue pair (i,k) is an inter-helical contact, line segment (i,k) is perpendicular to line segment (i,j)
• This relationship is used to bound the distance $d_{jk}$
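Under the right-angle assumption the angle at residue i is 90°, so $d_{jk}^2 = d_{ij}^2 + d_{ik}^2$. Because this is increasing in both distances, lower bounds on $d_{ij}$ and $d_{ik}$ give a lower bound on $d_{jk}$ and upper bounds give an upper bound. The helper names below are hypothetical; this is a sketch of the geometry, not the model's actual constraint form.

```python
import math
from typing import Tuple


def right_angle_bounds(l_ij: float, u_ij: float, l_ik: float, u_ik: float) -> Tuple[float, float]:
    """Bounds on d_jk when segment (i,k) is perpendicular to segment (i,j)."""
    return math.hypot(l_ij, l_ik), math.hypot(u_ij, u_ik)


def tighten_jk(l_jk: float, u_jk: float,
               l_ij: float, u_ij: float, l_ik: float, u_ik: float) -> Tuple[float, float]:
    """Intersect existing bounds on d_jk with the right-angle bounds."""
    lo, hi = right_angle_bounds(l_ij, u_ij, l_ik, u_ik)
    new_lo, new_hi = max(l_jk, lo), min(u_jk, hi)
    if new_lo > new_hi:
        raise ValueError("right-angle bounds conflict with existing bounds")
    return new_lo, new_hi
```

For instance, $d_{ij} \in [3, 6]$ and $d_{ik} \in [4, 8]$ confine $d_{jk}$ to $[5, 10]$.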

Results – 1dcjA and 1o2fB

Results – 1bm8

Results – Summary

The best average contact distance was less than 11.0 Å for 11 of the 12 proteins in the test set.

Results – CASP7 T350

The prediction with the optimal topology is shown.


ASTRO-FOLD for α-helical Bundles
• Overall 3D Structure Prediction: structural data from previous stages; prediction via a novel solution approach (Global Optimization and Molecular Dynamics)
• Derivation of Restraints: dihedral angle restrictions; Cα distance constraints (Reduced Search Space)
• Helix Prediction: detailed atomistic modeling; simulations of local interactions (Free Energy Calculations)
• Loop Structure Prediction: dihedral angle sampling; discard conformers by clustering (Novel Clustering Methodology)
• Interhelical Contacts: maximize common residue pairs; rank-order list of topologies (MILP Optimization Model)

McAllister, Floudas. Proceedings, BIOMAT 2005.

Derivation of Restraints

Dihedral angle restraints
• For residues with α-helix or β-sheet classification
• For loop residues, using the best identified conformer from loop modeling efforts

Distance restraints
• Helical hydrogen bond network (i,i+4)
• α-helical topology predictions
• β-sheet topology predictions

Klepeis, JL and Floudas, CA. Journal of Global Optimization. (2003)

Constrained Optimization

Problem definition
• Atomistic-level force field (ECEPP/3)
• Distance constraints

Tertiary Structure Prediction

Hybrid global optimization approach
• αBB deterministic global optimization
• Conformational Space Annealing (CSA)

Modifications/Enhancements
• Improved initial point selection using a torsion angle dynamics based annealing procedure from CYANA
• Inclusion of a rotamer optimization stage for quick energetic improvements
• Streamlined parallel implementation

αBB Global Optimization
• Based on a branch-and-bound framework
• The upper bound on the global solution is obtained by solving the full nonconvex problem to local optimality
• The lower bound is determined by solving a valid convex underestimation of the original problem
• Convergence is obtained by successive subdivision of the region at each level in the branch-and-bound tree
• Guaranteed ε-convergence for C² NLPs

Adjiman, CS, et al. Computers and Chemical Engineering. (1998a,b)
Floudas, CA and co-workers, 1995-2007
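The αBB bounding scheme can be illustrated in one dimension. On a box $[a, b]$ the underestimator $L(x) = f(x) + \alpha (a - x)(b - x)$ equals $f$ at the box ends, lies below $f$ inside, and is convex when $\alpha \ge -\tfrac{1}{2} \min f''$. The sketch below is a toy, not the ASTRO-FOLD implementation: the test function $f(x) = x^4 - 3x^2 + x$ is an assumption (here $f'' = 12x^2 - 6 \ge -6$, so $\alpha = 3$), ternary search stands in for the convex lower-bounding NLP, and projected gradient descent stands in for the local upper-bounding solve.

```python
import heapq
from typing import Tuple


def f(x: float) -> float:
    """Toy nonconvex objective (an assumption for illustration)."""
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


ALPHA = 3.0  # f''(x) >= -6 everywhere, so alpha = 3 makes L convex on any box


def underestimator(x: float, lo: float, hi: float) -> float:
    return f(x) + ALPHA * (lo - x) * (hi - x)


def convex_min(lo: float, hi: float, iters: int = 100) -> Tuple[float, float]:
    """Lower bound: ternary search on the convex underestimator over [lo, hi]."""
    a, b = lo, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if underestimator(m1, lo, hi) < underestimator(m2, lo, hi):
            b = m2
        else:
            a = m1
    x = (a + b) / 2
    return x, underestimator(x, lo, hi)


def local_min(x: float, lo: float, hi: float, step: float = 0.01, iters: int = 1000) -> float:
    """Upper bound: projected gradient descent on the original nonconvex f."""
    for _ in range(iters):
        x = min(hi, max(lo, x - step * df(x)))
    return x


def alpha_bb(lo: float, hi: float, eps: float = 1e-6) -> Tuple[float, float]:
    best_x = local_min(lo, lo, hi)
    best_f = f(best_x)
    x0, lb0 = convex_min(lo, hi)
    heap = [(lb0, lo, hi, x0)]  # boxes ordered by lower bound
    while heap:
        lb, a, b, x = heapq.heappop(heap)
        if lb > best_f - eps:
            break  # no remaining box can improve the incumbent by more than eps
        xu = local_min(x, a, b)
        if f(xu) < best_f:
            best_x, best_f = xu, f(xu)
        mid = (a + b) / 2  # branch: bisect the box
        for c, d in ((a, mid), (mid, b)):
            cx, clb = convex_min(c, d)
            if clb < best_f - eps:
                heapq.heappush(heap, (clb, c, d, cx))
    return best_x, best_f


x_star, f_star = alpha_bb(-2.0, 2.0)
```

Shrinking boxes shrink the maximum underestimation gap $\alpha (b - a)^2 / 4$, which is what drives the ε-convergence stated above.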

Conformational Space Annealing
• Induce variations
  - Mutations
  - Crossovers
• Subject to local energy minimization
• Anneal through the gradual reduction of the search space

Lee, JH, et al. Journal of Computational Chemistry. (1997)
Scheraga and co-workers, 1997-2007.
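The CSA cycle can be sketched on a toy problem. Everything here is a simplified stand-in chosen for illustration: a 2D Rastrigin function replaces the ECEPP/3 energy, plain gradient descent replaces the local minimizer, and the bank size, perturbation sizes, and 0.8 cutoff-shrink factor are arbitrary. It keeps the core CSA rules: a bank of minimized conformations, mutation and crossover trials, replacement of the nearest bank member when a trial falls within the distance cutoff, replacement of the highest-energy member otherwise, and a gradually shrinking cutoff.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def energy(x: Vec) -> float:
    """Toy rugged landscape (Rastrigin), standing in for a force field."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def gradient(x: Vec) -> Vec:
    return [2 * v + 20 * math.pi * math.sin(2 * math.pi * v) for v in x]


def minimize(x: Vec, step: float = 0.001, iters: int = 300) -> Vec:
    """Local energy minimization by plain gradient descent."""
    x = list(x)
    for _ in range(iters):
        x = [v - step * g for v, g in zip(x, gradient(x))]
    return x


def csa(n_bank: int = 15, dim: int = 2, n_rounds: int = 15,
        trials_per_round: int = 15, seed: int = 0) -> Tuple[Vec, float]:
    rng = random.Random(seed)
    # Bank of locally minimized random conformations
    bank = [minimize([rng.uniform(-5, 5) for _ in range(dim)]) for _ in range(n_bank)]
    # Initial cutoff: half the mean pairwise distance within the bank
    dists = [math.dist(a, b) for i, a in enumerate(bank) for b in bank[i + 1:]]
    d_cut = 0.5 * sum(dists) / len(dists)
    for _ in range(n_rounds):
        for _ in range(trials_per_round):
            parent = rng.choice(bank)
            if rng.random() < 0.5:
                # Mutation: perturb one coordinate of the parent
                trial = list(parent)
                trial[rng.randrange(dim)] += rng.gauss(0.0, 1.0)
            else:
                # Crossover: mix coordinates of the parent and another bank member
                other = rng.choice(bank)
                trial = [a if rng.random() < 0.5 else b for a, b in zip(parent, other)]
            trial = minimize(trial)
            e_trial = energy(trial)
            nearest = min(range(n_bank), key=lambda k: math.dist(trial, bank[k]))
            if math.dist(trial, bank[nearest]) < d_cut:
                # Similar to an existing member: replace it only if lower in energy
                if e_trial < energy(bank[nearest]):
                    bank[nearest] = trial
            else:
                # A new region: replace the highest-energy member if lower in energy
                worst = max(range(n_bank), key=lambda k: energy(bank[k]))
                if e_trial < energy(bank[worst]):
                    bank[worst] = trial
        d_cut *= 0.8  # annealing: gradually shrink the cutoff
    best = min(bank, key=energy)
    return best, energy(best)
```

A large early cutoff keeps the bank diverse; as it shrinks, trials increasingly compete with their nearest neighbors, focusing the search on low-energy regions.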

Rotamer Side Chain Optimization
• Side chain packing is crucial to the stability and specificity of the native state
• Rotamer optimization is a quick way to alleviate steric clashes
• It provides a better starting point for constrained nonlinear minimization

Torsion Angle Dynamics
• Why? Low-energy feasible points are difficult to identify
• Fast evaluation of a steric-based force field
• Unconstrained formulation with penalty functions
• Implemented by solving the equations of motion as preprocessing for each constrained minimization

Güntert, P, et al. Journal of Molecular Biology. (1997)
Klepeis, JL, et al. Journal of Computational Chemistry. (1999)
Klepeis, JL and Floudas, CA. Computers and Chemical Engineering. (2000)

Hybrid Global Optimization Algorithm
• All secondary nodes begin performing αBB iterations
• Once the CSA bank is full, CSA takes control of a subset of the secondary nodes

(Figure: the primary processor runs αBB control, CSA control, and idle work; the secondary processors run αBB work and CSA work)

αBB control
• Maintains the list of lower bounding subregions
• Tracks the overall upper and lower bounds
• Defines branching directions
• Sends and receives work to and from αBB work nodes

CSA control
• Maintains the CSA bank
• Maintains a queue of αBB minima for bank increases
• Handles bank updates
• Sends and receives work to and from CSA work nodes

Idle work
• Performs shear movements and perturbations on CSA structures
• Only executed during idle time of the primary processor

αBB work
• Torsion angle dynamics
• Rotamer optimization
• Minimization of the lower bounding function
• Minimization of the upper bounding function

CSA work
• Rotamer optimization
• Minimization of the CSA trial conformation

McAllister and Floudas. 2007, Submitted for publication.

Results – Tertiary Structure Prediction

PDB: 1nre
• Lowest energy predicted structure of 1nre (color) versus native 1nre (gray): energy -1395.48, RMSD 6.63 Å
• Lowest RMSD predicted structure of 1nre (color) versus native 1nre (gray): energy -1340.45, RMSD 3.52 Å

Results – Tertiary Structure Prediction

PDB: 1hta
• Lowest energy predicted structure of 1hta (color) versus native 1hta (gray): energy -941.02, RMSD 6.70 Å
• Lowest RMSD predicted structure of 1hta (color) versus native 1hta (gray): energy -915.57, RMSD 2.58 Å

Results – Blind Tertiary Structure Prediction (Collaboration with Michael Hecht)

S836
• Lowest energy predicted structure of s836 (color) versus native s836 (gray): energy -1740.11, RMSD 2.84 Å
• Lowest RMSD predicted structure of s836 (color) versus native s836 (gray): energy -1697.88, RMSD 2.39 Å

Conclusions
• Two novel mixed-integer linear programming models were developed for α-helical topology prediction in α-helical proteins
  - PRIMARY and WHEEL contacts
  - For all 26 test α-helical proteins, the best average contact distance predictions fell well below 11.0 Å
• A novel mixed-integer linear programming model was also developed for α-helical topology prediction in α/β proteins
  - For 11 of 12 test α/β proteins, the best average contact distance predictions fell below 11.0 Å
• Topology predictions were useful for restraining the tertiary structures during global optimization and for obtaining near-native predictions in a blind study

Acknowledgements

Funding sources
• National Institutes of Health (R01 GM52032)
• US EPA (GAD R 832721-010)*

*Disclaimer: This work has not been reviewed by and does not represent the opinions of the funding agency.

Questions