Protein Tertiary Structure Prediction

  • date post

    18-Jan-2016

Protein Tertiary Structure Prediction. Structural Bioinformatics. The different levels of protein structure. Primary: linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.

Transcript of Protein Tertiary Structure Prediction

  • Protein Tertiary Structure Prediction

    Structural Bioinformatics

  • The Different Levels of Protein Structure

    Primary: linear amino acid sequence.

    Secondary: α-helices, β-sheets and loops.

    Tertiary: the 3D shape of the fully folded polypeptide chain.

  • How can we view the protein structure?

    Download the coordinates of the structure from the PDB: http://www.rcsb.org/pdb/

    Launch a 3D viewer program. For example, we will use PyMOL, which can be downloaded freely from the PyMOL homepage: http://pymol.sourceforge.net/

    Load the coordinates into the viewer.
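
A coordinate file can also be read programmatically before it is opened in a viewer. The following is a minimal sketch, assuming the fixed-column PDB text format; the function names `fetch_pdb` and `parse_ca_atoms` are illustrative, and the download URL pattern is an assumption that does not come from the slides.

```python
import urllib.request
from typing import List, Tuple

# (chain ID, residue number, residue name, (x, y, z))
Atom = Tuple[str, int, str, Tuple[float, float, float]]


def fetch_pdb(pdb_id: str) -> str:
    """Download a PDB entry as text (assumed URL pattern, needs network access)."""
    url = f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_ca_atoms(pdb_text: str) -> List[Atom]:
    """Return the C-alpha atoms found in the ATOM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: the atom name sits in columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()   # columns 18-20
            chain = line[21]                 # column 22
            res_num = int(line[22:26])       # columns 23-26
            xyz = (
                float(line[30:38]),          # x, columns 31-38
                float(line[38:46]),          # y, columns 39-46
                float(line[46:54]),          # z, columns 47-54
            )
            atoms.append((chain, res_num, res_name, xyz))
    return atoms
```

For example, `parse_ca_atoms(fetch_pdb("1aqb"))` would give one entry per residue of the chain, which is the same trace a viewer draws in cartoon or ribbon mode.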

  • PyMOL example

    Launch PyMOL

    Open file 1aqb (PDB coordinate file)

    Display sequence

    Hide everything

    Show main chain / hide main chain

    Show cartoon

    Color by ss

    Color red

    Color green, resi 1:40

    Help: http://pymol.sourceforge.net/newman/user/toc.html

  • Predicting 3D Structure

    An outstandingly difficult problem.

    Comparative modeling (homology): based on sequence homology.

    Fold recognition (threading): based on structural homology.

  • Comparative Modeling

    Similar sequences suggest similar structures.

  • Sequence and structure alignments of two retinol-binding proteins

  • Structure Alignments

    The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal Root Mean Square Distance (RMSD) between the structures.

    The RMSD of two aligned structures indicates their divergence from one another. Low RMSD values mean similar structures.

    There are many different algorithms for structural alignment.
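
The RMSD itself is simple to compute once the residues have been paired and the structures superposed. This is a minimal sketch under that assumption: it does not search for the optimal superposition, which is the hard part that structural alignment programs solve. The function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square distance between paired, already-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the two equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Identical structures give 0, and the value grows as equivalent atoms drift apart. The result is in the same units as the coordinates (ångströms in PDB files).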

  • Dali (Distance mAtrix aLIgnment)

    DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues.

    See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.

    SALIGN: http://salilab.org/DBALI/?page=tools
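
To illustrate what a residue distance matrix is, the sketch below builds one from a list of coordinates (for instance one C-alpha atom per residue). It shows only the matrix construction, not DALI's comparison of two such matrices; the function name `distance_matrix` is illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between residues of one protein."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

A useful property of this representation is that it does not change when the protein is rotated or translated, so two structures can be compared without superposing them first.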

  • Fold classification based on structure-structure alignment of proteins (FSSP)

    FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing > 25% amino acid identity.

    http://www.ebi.ac.uk/dali/fssp

  • Comparative Modeling

    Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database.

    Similar sequence suggests similar structure.

  • Comparative Modeling

    Accuracy of the comparative model is related to the sequence identity on which it is based:

    >50% sequence identity = high accuracy

    30%-50% sequence identity = 90% modeled
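
Sequence identity is read off an alignment. The sketch below is one way to compute it, assuming two already-aligned sequences of equal length with `-` as the gap character. Several definitions of percent identity are in use; this one divides by the number of columns where both sequences have a residue, which is a choice made here and not something the slides specify.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

For a template search, a value above 50 would fall in the high-accuracy range listed above.
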
  • Homology Threshold for Different Alignment Lengths

    A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L. The threshold values t(L) are derived from the PDB.

    Alignment length (L)    Homology threshold t (% sequence identity)
    10                      79.6
    12                      71.9
    14                      65.9
    16                      61.2
    18                      57.2
    20                      53.9
    22                      51.2
    24                      48.7
    26                      46.6
    28                      44.7
    30                      43.0
    35                      39.4
    40                      36.6
    45                      34.2
    50                      32.3
    55                      30.6
    60                      29.1
    65                      27.8
    70                      26.7
    80                      24.8
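
The threshold rule can be applied as a table lookup. The sketch below uses the tabulated t(L) values from this slide; linear interpolation between tabulated lengths is an assumption made here for convenience, and the function names are illustrative.

```python
from bisect import bisect_left

# (alignment length L, homology threshold t in % identity), as tabulated on the slide.
THRESHOLDS = [
    (10, 79.6), (12, 71.9), (14, 65.9), (16, 61.2), (18, 57.2),
    (20, 53.9), (22, 51.2), (24, 48.7), (26, 46.6), (28, 44.7),
    (30, 43.0), (35, 39.4), (40, 36.6), (45, 34.2), (50, 32.3),
    (55, 30.6), (60, 29.1), (65, 27.8), (70, 26.7), (80, 24.8),
]
LENGTHS = [length for length, _ in THRESHOLDS]


def homology_threshold(length: float) -> float:
    """Threshold t(L); interpolated linearly between tabulated lengths (assumption)."""
    if length < LENGTHS[0] or length > LENGTHS[-1]:
        raise ValueError("length is outside the tabulated range 10-80")
    i = bisect_left(LENGTHS, length)
    if LENGTHS[i] == length:
        return THRESHOLDS[i][1]
    (l0, t0), (l1, t1) = THRESHOLDS[i - 1], THRESHOLDS[i]
    return t0 + (t1 - t0) * (length - l0) / (l1 - l0)


def implies_structural_homology(identity_percent: float, length: float) -> bool:
    """True if the identity over a region of this length reaches the threshold."""
    return identity_percent >= homology_threshold(length)
```

With these values, 30% identity over 80 aligned residues passes the test, while the same 30% over only 40 residues does not, because short alignments need a much higher identity to be meaningful.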

  • Comparative Modeling

    Similarity is particularly high in the core.

    Alpha helices and beta sheets are preserved.

    Even near-identical sequences vary in loops.

  • Comparative Modeling Methods

    MODELLER (Sali, Rockefeller/UCSF)

    SCWRL (Dunbrack, UCSF)

    SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html

  • Comparative Modeling

    Modeling of a sequence based on known structures consists of four major steps:

    1. Finding known structure(s) related to the sequence to be modeled (the template), using sequence comparison methods such as PSI-BLAST

    2. Aligning the sequence with the templates

    3. Building a model

    4. Assessing the model
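
Step 2 can be illustrated with a basic global alignment. This is a minimal sketch of Needleman-Wunsch dynamic programming with a linear gap penalty; the scoring values are illustrative assumptions, and it is not the alignment method used by any of the tools named above, which rely on substitution matrices and sequence profiles.

```python
from typing import Tuple


def global_align(seq_a: str, seq_b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning seq_a[i-1] with seq_b[j-1].
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

The gapped strings it returns pair each residue of the target with a residue of the template (or a gap), which is the information a model-building step needs to copy coordinates across.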

  • Fold Recognition

  • Protein Folds

    A combination of secondary structural units

    Forms the basic level of classification

    Each protein family belongs to a fold

    Different sequences can share similar folds

  • Protein Folds: sequential and spatial arrangement of secondary structures

    Examples shown: hemoglobin and TIM

  • Similar folds usually mean similar function

    Homeodomain

    Transcription factors

  • The same fold can have multiple functions

    Rossmann: 12 functions

    TIM barrel: 31 functions

  • Fold classification: SCOP (Structural Classification Of Proteins)

    Class: all alpha, all beta, alpha/beta, alpha+beta

    Fold

    Superfamily

    Family

  • Retinol Binding Protein

  • Fold Recognition

    Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.

    They search for folds that are compatible with a particular sequence.

    These methods turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.

  • Basic steps in Fold Recognition:

    Compare the sequence against a library of all known protein folds (a finite number).

    Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP...

    Goal: find the folding template that the sequence fits best.

    There are different ways to evaluate sequence-structure fit.

  • The query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each fold in the library:

    Fold 1: score -10

    Fold 56: score -123

    Fold n: score 20.5

    There are different ways to evaluate sequence-structure fit.
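
The idea of scoring one sequence against every fold in a library can be sketched with a toy example. Everything here is hypothetical and chosen only for illustration: each fold is reduced to a string of buried (`B`) or exposed (`E`) positions, the hydrophobic residue set is a rough assumption, the fit is ungapped, and lower scores are treated as better, in the manner of an energy. Real programs use far richer scoring functions.

```python
from typing import Dict, Tuple

# Rough, assumed set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def fit_energy(sequence: str, environments: str) -> float:
    """Toy energy: -1 when a residue suits its position, +1 when it does not."""
    if len(sequence) != len(environments):
        raise ValueError("this toy score needs an ungapped, equal-length fit")
    energy = 0.0
    for residue, env in zip(sequence, environments):
        hydrophobic = residue in HYDROPHOBIC
        buried = env == "B"
        # Hydrophobic residues suit buried positions; polar ones suit exposed positions.
        energy += -1.0 if hydrophobic == buried else 1.0
    return energy


def best_fold(sequence: str, library: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Score the sequence against every fold and return the lowest-energy one."""
    scores = {name: fit_energy(sequence, envs) for name, envs in library.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `best_fold("LLKKLL", {"fold_a": "BBEEBB", "fold_b": "EEBBEE"})` picks `fold_a`, because its buried positions line up with the hydrophobic residues of the query.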

  • Programs for fold recognition

    TOPITS (Rost 1995)

    GenTHREADER (Jones 1999)

    SAM-T02 (UCSC HMM)

    3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

  • Ab Initio Modeling

    Compute molecular structure from the laws of physics and chemistry alone.

    Theoretically: the ideal solution

    Practically: nearly impossible

    Why?

    Exceptionally complex calculations

    Incomplete understanding of the biophysics

  • Ab Initio Methods

    Rosetta (Baker lab, Seattle)

    Undertaker (Karplus, UCSC)

  • CASP - Critical Assessment of Structure Prediction

    A competition among different groups to predict the 3D structure of proteins that are about to be solved experimentally.

    Current state:

    Ab initio: the worst, but greatly improved in recent years.

    Modeling: performs very well when homologous sequences with known structures exist.

    Fold recognition: performs well.

  • What's Next?

    Predicting function from structure

  • Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures

    ATP binding domain of protein MJ0577

    Zarembinski, et al., Proc. Natl. Acad. Sci. USA, 95:15189 (1998)

  • ~300 unique folds in PDB

    Currently ~800 unique folds

  • An estimated ~1000-3000 unique folds in structure space

  • Structural Genomics expectations

    ~5 proteins to characterize the sequence space corresponding to 1 fold

    ~10000-15000 new structures expected

  • As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved

  • Approaches for predicting function from structure

    ConSurf: mapping evolutionary conservation onto the protein structure http://consurf.tau.ac.il/

  • Approaches for predicting function from structure

    PFplus: identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/

  • Approaches for predicting function from structure

    SHARP2: identifying positive electrostatic patches on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2

    ***