Protein Tertiary Structure Prediction
Date posted: 18-Jan-2016
Structural Bioinformatics

The Different Levels of Protein Structure
Primary: the linear amino acid sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded polypeptide chain.
How can we view the protein structure?
Download the coordinates of the structure from the PDB: http://www.rcsb.org/pdb/
Launch a 3D viewer program. For example, we will use PyMOL, which can be downloaded freely from the PyMOL homepage: http://pymol.sourceforge.net/
Load the coordinates into the viewer.
PyMOL example
Launch PyMOL
Open file 1aqb (PDB coordinate file)
Display sequence
Hide everything
Show main chain / hide main chain
Show cartoon
Color by ss
Color red
Color green, resi 1:40
Help http://pymol.sourceforge.net/newman/user/toc.html
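The PyMOL steps listed above can also be scripted through PyMOL's Python interface. The sketch below is only an illustration, not the presenter's script. It assumes a local file named 1aqb.pdb, and the helper name `style_structure`, the object name `rbp`, and the particular colors used for "color by ss" are hypothetical choices. The function receives the `cmd` module as an argument, so it can be dry-run with a stand-in object when PyMOL is not installed.

```python
def style_structure(cmd, pdb_path="1aqb.pdb", obj="rbp"):
    """Replay the viewer steps through PyMOL's cmd interface (a sketch)."""
    main_chain = f"{obj} and name N+CA+C+O"   # backbone atoms only

    cmd.load(pdb_path, obj)          # open the PDB coordinate file
    cmd.set("seq_view", 1)           # display the sequence
    cmd.hide("everything", obj)      # hide everything
    cmd.show("lines", main_chain)    # show main chain
    cmd.hide("lines", main_chain)    # hide main chain
    cmd.show("cartoon", obj)         # show cartoon

    # Color by secondary structure: helices, strands, everything else.
    cmd.color("red", f"{obj} and ss h")
    cmd.color("yellow", f"{obj} and ss s")
    cmd.color("green", f"{obj} and ss l+''")

    cmd.color("red", obj)                       # color red
    cmd.color("green", f"{obj} and resi 1-40")  # color green, residues 1 to 40
```

Inside a PyMOL session this would be called as `from pymol import cmd` followed by `style_structure(cmd)`.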
Predicting 3D Structure
An outstandingly difficult problem
Comparative modeling (homology): based on sequence homology
Fold recognition (threading): based on structural homology
Comparative Modeling
Similar sequences suggest similar structures
Sequence and structure alignments of two retinol-binding proteins
Structure Alignments
The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal root mean square deviation (RMSD) between the structures.
The RMSD of two aligned structures indicates their divergence from one another. Low RMSD values mean similar structures.
There are many different algorithms for structural alignment.
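As a rough illustration of the RMSD value reported by a structural alignment, the sketch below computes it for two equally long lists of coordinates that are assumed to be already paired and already superposed (for example, matched Cα atoms). It does not do the superposition or the residue pairing, which is the hard part handled by the alignment algorithm. The function name `rmsd` and the input layout are assumptions made for this example.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

Identical structures give 0, and the value grows as the paired atoms move apart.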
Dali (Distance mAtrix aLIgnment)
DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate residue-residue distance matrices, which are then compared.
See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.
SALIGN http://salilab.org/DBALI/?page=tools
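To make the notion of a distance matrix concrete, the sketch below builds the residue-residue distance matrix for a single protein from one representative point per residue (for example, its Cα atom). This covers only the first step. It is not DALI's implementation, and the comparison of the two matrices is not shown. The function name `distance_matrix` is an assumption for this example.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise distances between residues.

    coords holds one (x, y, z) tuple per residue, e.g. the C-alpha position.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance
            matrix[i][j] = d
            matrix[j][i] = d                     # the matrix is symmetric
    return matrix
```

Because the matrix depends only on distances within one molecule, it does not change when the structure is rotated or translated, which is what makes such matrices convenient to compare.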
Fold classification based on structure-structure alignment of proteins (FSSP)
FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing > 25% amino acid identity.
http://www.ebi.ac.uk/dali/fssp
Comparative Modeling
Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database.
Similar sequences suggest similar structures
Comparative Modeling
Accuracy of the comparative model is related to the sequence identity on which it is based:
>50% sequence identity = high accuracy
30%-50% sequence identity = 90% modeled
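Sequence identity is computed from a pairwise alignment. The sketch below shows one common convention: identical positions divided by the number of aligned (gap-free) columns. Other conventions, such as dividing by the length of the shorter sequence, give different numbers, and the slides do not say which one is meant. The function name `percent_identity` is an assumption.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                      # skip columns containing a gap
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned
```

For example, `percent_identity("ACDE-", "ACEEF")` considers four gap-free columns, three of which are identical.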
Homology Threshold for Different Alignment Lengths
A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L. The threshold values t(L) are derived from the PDB.

Alignment length (L)    Homology threshold (t, % identity)
10                      79.6
12                      71.9
14                      65.9
16                      61.2
18                      57.2
20                      53.9
22                      51.2
24                      48.7
26                      46.6
28                      44.7
30                      43.0
35                      39.4
40                      36.6
45                      34.2
50                      32.3
55                      30.6
60                      29.1
65                      27.8
70                      26.7
80                      24.8
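The tabulated thresholds can be applied with a simple lookup. The sketch below is a minimal illustration. For lengths between tabulated values it uses the threshold of the nearest shorter tabulated length, which is the stricter choice, and for lengths above 80 it reuses the last value. Both rules are assumptions of this example, not something stated in the slides, and the function names are hypothetical.

```python
from bisect import bisect_right

# (alignment length L, homology threshold t in % identity), from the chart data
THRESHOLDS = [
    (10, 79.6), (12, 71.9), (14, 65.9), (16, 61.2), (18, 57.2),
    (20, 53.9), (22, 51.2), (24, 48.7), (26, 46.6), (28, 44.7),
    (30, 43.0), (35, 39.4), (40, 36.6), (45, 34.2), (50, 32.3),
    (55, 30.6), (60, 29.1), (65, 27.8), (70, 26.7), (80, 24.8),
]
_LENGTHS = [length for length, _ in THRESHOLDS]

def homology_threshold(length):
    """Threshold t for the largest tabulated L that does not exceed length."""
    if length < _LENGTHS[0]:
        raise ValueError("no threshold tabulated for alignments shorter than 10")
    index = bisect_right(_LENGTHS, length) - 1
    return THRESHOLDS[index][1]

def implies_structural_homology(length, identity_percent):
    """True if the identity reaches the threshold for this alignment length."""
    return identity_percent >= homology_threshold(length)
```

With this table, 35% identity over 50 aligned residues passes the test, while the same identity over 20 residues does not.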
Comparative Modeling
Similarity is particularly high in the core
Alpha helices and beta sheets are preserved
Even near-identical sequences vary in loops
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding known structure(s) related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
Fold Recognition
Protein Folds
A combination of secondary structural units
Forms the basic level of classification
Each protein family belongs to a fold
Different sequences can share similar folds

Protein Folds: sequential and spatial arrangement of secondary structures
Examples: Hemoglobin, TIM
Similar folds usually mean similar function
Example: homeodomain transcription factors
The same fold can have multiple functions
Rossmann: 12 functions
TIM barrel: 31 functions
Fold classification: SCOP (Structural Classification Of Proteins)
Class: all alpha, all beta, alpha/beta, alpha+beta
Fold
Superfamily
Family
Retinol Binding Protein
Fold Recognition
Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.
They search for folds that are compatible with a particular sequence.
These methods "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.
Basic steps in Fold Recognition:
Compare the query sequence against a library of all known protein folds (a finite number)
Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence fits best
There are different ways to evaluate sequence-structure fit

Example query sequence: MAHFPGFGQSLLFGYPVYVFGD...
Folds in the library: 1) ... 56) ... n)
Fit scores: -10 ... -123 ... 20.5
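The idea of scoring one sequence against every fold in a library and keeping the best fit can be sketched with a toy example. Everything here is an assumption made for illustration. Each fold is reduced to a string of per-position environments (`B` for buried, `E` for exposed), the score rewards hydrophobic residues in buried positions and polar residues in exposed ones, lower scores mean a better fit, and gaps are not allowed. Real fold recognition programs use far richer scoring (profiles, pair potentials, gapped alignment), and the names `fit_score` and `best_fold` are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")   # toy residue classification (assumption)

def fit_score(sequence, environments):
    """Toy sequence-structure fit: -1 per compatible position, +1 otherwise."""
    if len(sequence) != len(environments):
        raise ValueError("this ungapped toy needs equal lengths")
    score = 0
    for residue, environment in zip(sequence, environments):
        buried = environment == "B"
        hydrophobic = residue in HYDROPHOBIC
        # Compatible: hydrophobic and buried, or polar and exposed.
        score += -1 if buried == hydrophobic else 1
    return score

def best_fold(sequence, library):
    """Score the sequence against every fold; return (best name, all scores)."""
    scores = {name: fit_score(sequence, envs) for name, envs in library.items()}
    best = min(scores, key=scores.get)   # lowest score = best fit
    return best, scores
```

For example, `best_fold("AVKD", {"fold_1": "BBEE", "fold_56": "EEBB"})` prefers `fold_1`, where the two hydrophobic residues land in buried positions.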
Programs for fold recognition
TOPITS (Rost 1995)
GenTHREADER (Jones 1999)
SAM-T02 (UCSC HMM)
3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
Ab Initio Modeling
Compute molecular structure from the laws of physics and chemistry alone
Theoretically the ideal solution
Practically nearly impossible
Why?
Exceptionally complex calculations
Incomplete understanding of the biophysics
Ab Initio Methods
Rosetta (Baker lab, Seattle)
Undertaker (Karplus, UCSC)
CASP - Critical Assessment of Structure Prediction
A competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally.
Current state:
Ab initio: the worst performer, but greatly improved in recent years.
Comparative modeling: performs very well when homologous sequences with known structures exist.
Fold recognition: performs well.
What's Next
Predicting function from structure
Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures
Example: ATP binding domain of protein MJ0577. Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)
~300 unique folds in PDB
Currently ~800 unique folds
Estimated ~1000-3000 unique folds in structure space
Structural Genomics expectations:
~5 proteins to characterize the sequence space corresponding to 1 fold
~10000-15000 new structures expected
As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved
Approaches for predicting function from structure
ConSurf: mapping evolutionary conservation onto the protein structure http://consurf.tau.ac.il/
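To give a feel for what a per-residue conservation score is, the sketch below scores each column of a multiple sequence alignment by Shannon entropy. This is a deliberately simple stand-in and not the method ConSurf uses, whose scoring is more sophisticated and is not reproduced here. The normalization to the range 0 to 1 and the function name `column_conservation` are assumptions. Scores like these could then be mapped onto the residues of a structure for coloring.

```python
import math
from collections import Counter

def column_conservation(alignment, gap="-"):
    """One score per alignment column: 1.0 = fully conserved, lower = variable."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    max_entropy = math.log2(20)          # 20 amino acid types
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]   # ignore gaps
        if not residues:
            scores.append(0.0)           # all-gap column: no information
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

For the toy alignment `["ACD", "ACE", "AGF"]`, the first column is fully conserved and scores 1.0, while the second and third columns score progressively lower.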
Approaches for predicting function from structure
PFplus: identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/
SHARP2: identifying potential protein-protein interaction patches on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2
***