Protein Tertiary Structure Prediction

Protein Tertiary Structure Prediction

Structural Bioinformatics

Primary: amino acid linear sequence.

Secondary: -helices, β-sheets and loops.

Tertiary: the 3D shape of the fully folded polypeptide chain

The Different levels of Protein Structure

How can we view the protein structure ?

• Download the coordinates of the structure from the PDB http://www.rcsb.org/pdb/

• Launch a 3D viewer program For example we will use the program Pymol The program can be downloaded freely from the Pymol homepage http://pymol.sourceforge.net/

• Upload the coordinates to the viewer

http://www.rcsb.org/pdb/

http://pymol.sourceforge.net/

Pymol example• Launch Pymol• Open file “1aqb” (PDB coordinate file)• Display sequence• Hide everything• Show main chain / hide main chain• Show cartoon • Color by ss• Color red• Color green, resi 1:40

Help http://pymol.sourceforge.net/newman/user/toc.html

https://webmail.technion.ac.il/horde/util/go.php?url=http%3A%2F%2Fpymol.sourceforge.net%2Fnewman%2Fuser%2Ftoc.html&Horde=436ecc9e2bce8cd6e7f2bb82ecc90c8a

Predicting 3D Structure

– Comparative modeling (homology)

Based on structural homology

– Fold recognition (threading)

Outstanding difficult problem

Based on sequence homology

Comparative ModelingSimilar sequences suggests similar structure

Sequence and Structure alignments of two Retinol Binding Protein

Structure Alignments

The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal Root Mean Square Distance (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another.

Low values of RMSD mean similar structures

There are many different algorithms for structural Alignment.

Dali (Distance mAtrix aLIgnment)

DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues.

See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.

SALIGN http://salilab.org/DBALI/?page=tools

Fold classification based on structure-structurealignment of proteins (FSSP)

Page 293

FSSP is based on a comprehensive comparison ofPDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing > 25% amino acid identity.

http://www.ebi.ac.uk/dali/fssp

Comparative Modeling

Comparative structure predictionproduces an all atom model of asequence, based on its alignment to oneor more related protein structures in thedatabase

Similar sequence suggests similar structure

Comparative Modeling• Accuracy of the comparative model is

related to the sequence identity on which it is based

>50% sequence identity = high accuracy

30%-50% sequence identity= 90% modeled

<30% sequence identity =low accuracy (many errors)

Homology Threshold for Different Alignment Lengths

0

10

20

30

40

50

60

70

80

90

0 20 40 60 80 100

Alignment length (L)

Homology Threshold (t)

A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L.

The threshold values t(L) are derived from PDB

Comparative Modeling

• Similarity particularly high in core– Alpha helices and beta sheets preserved– Even near-identical sequences vary in loops

Comparative Modeling Methods

MODELLER (Sali –Rockefeller/UCSF)

SCWRL (Dunbrack- UCSF )

SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html

Comparative ModelingModeling of a sequence based on known structuresConsist of four major steps :1. Finding a known structure(s) related to the sequence

to be modeled (template), using sequence comparison methods such as PSI-BLAST

2. Aligning sequence with the templates

3. Building a model

4. Assessing the model

Fold Recognition

Protein Folds

• A combination of secondary structural units– Forms basic level of classification

• Each protein family belongs to a fold

• Different sequences can share similar folds

Hemoglobin TIM

Protein Folds: sequential and spatial arrangement of secondary structures

Protein Folds




Similar folds usually mean similar function

Homeodomain Transcriptionfactors

Protein Folds




The same fold can have multiple functions

Rossmann

TIM barrel

12 functions

31 functions

Fold classification:

•Class:All alphaAll betaAlpha/betaAlpha+beta

•Fold•Superfamily•Family

SCOP Structure Classification Of Proteins

Retinol Binding Protein

Fold Recognition

• Methods of protein fold recognition attempt to detect similarities between protein 3D structure that have no significant sequence similarity.

• Search for folds that are compatible with a particular sequence.

• "the turn the protein folding problem on it's head” rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence

Basic steps in Fold Recognition :

Compare sequence against a Library of all known Protein Folds (finite number)

Query sequenceQuery sequence

MTYGFRIPLNCERWGHKLSTVILKRP...

Goal: find to what folding template the sequence fits best

There are different ways to evaluate sequence-structure fit

MAHFPGFGQSLLFGYPVYVFGD...

Potential fold

...

1) ... 56) ... n)

...

-10 ... -123 ... 20.5

There are different ways to evaluate sequence-structure fit

Programs for fold recognition

• TOPITS (Rost 1995)

• GenTHREADER (Jones 1999)

• SAMT02 (UCSC HMM)

• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

Ab Initio Modeling

• Compute molecular structure from laws of physics and chemistry alone Theoretically Ideal solution

Practically nearly impossible

WHY ?– Exceptionally complex calculations– Biophysics understanding incomplete

Ab Initio Methods

• Rosetta (Bakers lab, Seattle)

• Undertaker (Karplus, UCSC)

CASP - Critical Assessment of Structure Prediction

• Competition among different groups for resolving the 3D structure of proteins that are about to be solved experimentally.

• Current state -– ab-initio - the worst, but greatly improved in the last

years. – Modeling - performs very well when homologous

sequences with known structures exist.– Fold recognition - performs well.

What’s Next

Predicting function from structure

Structural Genomics : a large scale structure determination project designed to cover all

representative protein structures

Zarembinski, et al., Proc.Nat.Acad.Sci.USA, 99:15189 (1998)

ATP binding domain of protein MJ0577

~300unique folds

in PDBCurrently

~800 unique folds

~1000- 3000unique folds

in “structure space”

Estimated

Structure Genomics expectations

~ 5 proteins to characterize thesequence space

corresponding to 1 fold

~10000-15000new structures

expected

As a result of the Structure Genomic initiative many structures of proteins with unknown function will be solved

Wanted !Automated methods to predict function from the protein structures resulting from the structural genomic project.

Approaches for predicting function from structure

ConSurf - Mapping the evolution conservation on the protein structure http://consurf.tau.ac.il/


PHPlus – Identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/


SHARP2 – Identifying positive electrostatic patches on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2

Protein Tertiary Structure Prediction

Documents

Transcript of Protein Tertiary Structure Prediction