Secondary Structure Prediction of proteins

Posted on 16-Jan-2017



Secondary Structure Prediction of Proteins

Protein Sequence + Structure

VIJAY

INTRODUCTION

Primary structure (amino acid sequence)

Secondary structure (α-helix, β-sheet)

Tertiary structure (three-dimensional structure formed by the assembly of secondary structures)

Quaternary structure (structure formed by more than one polypeptide chain)

Secondary Structure

Defined as the local conformation of the protein backbone

Primary structure → (folding) → secondary structure

α-helix and β-sheet

Secondary Structure

Regular secondary structure (α-helices, β-sheets)

Irregular secondary structure (tight turns, random coils, bulges)

α-helix

•A common conformation.

•Spiral (coiled) structure.

•Tightly packed, coiled polypeptide backbone, with the side chains extending outward.

•Forms spontaneously.

•Stabilized by hydrogen bonds between the amide hydrogens and carbonyl oxygens of peptide bonds.

•R-groups lie on the exterior of the helix and point away from its axis, roughly perpendicular to it.

•One complete turn of the helix contains 3.6 amino acid residues and spans 0.54 nm (the pitch).

e.g. the keratins: entirely α-helical

Myoglobin: about 80% helical

•Glycine, proline, bulky amino acids and charged amino acids tend to disrupt the helix.
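The helix figures above imply a little arithmetic. If one turn of 3.6 residues spans 0.54 nm, then each residue rises 0.54 / 3.6 = 0.15 nm along the axis and rotates 360° / 3.6 = 100° around it. Here is a minimal Python sketch of that calculation. The function names are illustrative only.

```python
PITCH_NM = 0.54          # rise per complete turn, from the text
RESIDUES_PER_TURN = 3.6  # residues per complete turn, from the text


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (rise per residue in nm, total length in nm) of an ideal α-helix."""
    rise = PITCH_NM / RESIDUES_PER_TURN  # about 0.15 nm per residue
    return rise, rise * n_residues


def rotation_per_residue_deg() -> float:
    """Angle between successive residues around the helix axis (about 100 degrees)."""
    return 360.0 / RESIDUES_PER_TURN
```

For example, an 18-residue helix (5 turns) comes out at about 2.7 nm long.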

β-sheet

•β-sheets are composed of two or more stretches (strands) of the polypeptide chain, each typically 5-10 amino acids long.

•The folding and alignment of these stretches of polypeptide backbone alongside one another to form β-sheets is stabilized by hydrogen bonds between the amide hydrogens and carbonyl oxygens of adjacent strands.

•The peptide backbone of the β-sheet is highly extended.

•R-groups of adjacent residues point in opposite directions.

•β-sheets are either parallel or antiparallel.

Figure: β-sheet (parallel and antiparallel)

What is secondary structure prediction?

The first step in predicting the structure of a protein.

A technique for determining the secondary structure of a given polypeptide by locating its coils, α-helices and β-strands.

Given a protein sequence (primary structure):

GHWIATRGQLIREAYEDYRHFSSECPFIP

predict its secondary structure content (C = coil, H = α-helix, E = β-strand):

CEEEEECHHHHHHHHHHHCCCHHCCCCCC
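Once a 3-state string like the one above is available, you can read the "secondary structure content" straight off it. The sketch below does two things: it splits a prediction string into its segments, and it computes the percentage of residues in each state. The function names are illustrative and do not come from any prediction tool.

```python
from collections import Counter
from itertools import groupby


def ss_segments(ss: str) -> list[tuple[str, int, int]]:
    """Split a 3-state string into runs of (state, first, last), 1-based inclusive."""
    runs, pos = [], 1
    for state, group in groupby(ss):
        length = sum(1 for _ in group)
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


def ss_content(ss: str) -> dict[str, float]:
    """Percentage of residues in each state (C = coil, H = α-helix, E = β-strand)."""
    counts = Counter(ss)
    return {state: 100.0 * counts[state] / len(ss) for state in "CHE"}
```

Calling `ss_segments` on the prediction string above lists each strand, helix and coil with its start and end positions.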

Why secondary structure prediction?

o As a step toward tertiary structure prediction

o Protein function prediction

o Protein classification

o Predicting structural changes

o Detection and alignment of remote homologues

o Detection of transmembrane regions, solvent-accessible residues and other important features of molecules

o Detection of hydrophobic and hydrophilic regions

Prediction methods

o Statistical methods: Chou-Fasman method, GOR I-IV

o Nearest neighbors: NNSSP, SSPAL

o Neural networks: PHD, PSIPRED, JPred

o Support vector machines (SVM)

o Hidden Markov models (HMM)

Chou-Fasman algorithm

Chou and Fasman, 1978

The method assigns a set of prediction values to each amino acid residue in the polypeptide, then applies an algorithm to these conformational parameters and positional frequencies.

The conformational parameter for each amino acid is calculated from the relative frequency of each of the 20 amino acids in each type of structure, in proteins of known structure.

From these parameters, coils (C), α-helices (H) and β-strands (E) are determined.

The conformational parameter is also called the preference parameter.

• A table of prediction values (preference parameters) has already been calculated and standardized. It gives a value for each of the 20 amino acids in α-helix, β-sheet and turn.

• To obtain the prediction value of amino acid i for structure s, divide the frequency of i within that structure by the frequency of i among all residues in the proteins: P_s(i) = f_s(i) / f(i). A value above 1 means the residue occurs in that structure more often than average.

• The resulting parameters P(α), P(β) and P(turn) range from roughly 0.5 to 1.5 for the 20 amino acids.
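Here is a sketch of how such a table could be derived. It assumes the training data are pairs of a sequence and an H/E/C string of the same length. The function name and data layout are assumptions, and this is not the original Chou-Fasman dataset or code. It computes P_s(i) = f_s(i) / f(i), where f_s(i) is the fraction of residues in state s that are amino acid i, and f(i) is the fraction of all residues that are amino acid i.

```python
from collections import Counter


def preference_parameters(sequences, structures, states="HEC"):
    """Return {state: {amino_acid: P_s(i)}} with P_s(i) = f_s(i) / f(i)."""
    overall = Counter()                       # residue counts over all states
    by_state = {s: Counter() for s in states}  # residue counts per state
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings differ in length")
        for aa, s in zip(seq, ss):
            if s not in by_state:
                raise ValueError(f"unknown state {s!r}")
            overall[aa] += 1
            by_state[s][aa] += 1
    n_all = sum(overall.values())
    params = {}
    for s, counts in by_state.items():
        n_s = sum(counts.values())
        if n_s == 0:
            continue  # no residues observed in this state, so no parameters
        params[s] = {aa: (counts[aa] / n_s) / (overall[aa] / n_all) for aa in overall}
    return params
```

A real table would be estimated from many proteins of known structure. The toy input in the check below is made up.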

A window is scanned along the sequence to find short stretches of amino acids that have a high probability of forming one type of structure.

When 4 out of 6 contiguous amino acids have P(α) > 1.03, the stretch is a potential α-helix.

When 3 out of 5 contiguous amino acids have P(β) > 1.03, the stretch is a potential β-sheet.
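The window scan can be sketched as a function that reports every window containing enough residues above the threshold. The preference table `p` is passed in by the caller, and the function names are hypothetical. No published Chou-Fasman values are included.

```python
def nucleation_sites(seq: str, p: dict[str, float], window: int, min_hits: int,
                     threshold: float = 1.03) -> list[int]:
    """Return 0-based start positions of every `window`-residue stretch in which
    at least `min_hits` residues have p[aa] > threshold."""
    sites = []
    for start in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[start:start + window] if p[aa] > threshold)
        if hits >= min_hits:
            sites.append(start)
    return sites


def helix_nuclei(seq: str, p_alpha: dict[str, float], threshold: float = 1.03) -> list[int]:
    """Helix nucleus: 4 of 6 contiguous residues above the threshold."""
    return nucleation_sites(seq, p_alpha, window=6, min_hits=4, threshold=threshold)


def sheet_nuclei(seq: str, p_beta: dict[str, float], threshold: float = 1.03) -> list[int]:
    """Sheet nucleus: 3 of 5 contiguous residues above the threshold."""
    return nucleation_sites(seq, p_beta, window=5, min_hits=3, threshold=threshold)
```

The default threshold of 1.03 follows the rule stated here. The algorithm steps below use 1.00, so pass `threshold=1.00` to follow those instead.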

RULES

ALGORITHM

o Assign the preference parameters to each residue in the peptide.

o Scan with a window and identify regions where 4 out of 6 contiguous residues have P(α-helix) > 1.00.

o Continue scanning in both directions until you reach 4 contiguous residues with an average P(α-helix) < 1.00. This marks the end of the helix.

o If the segment is longer than 5 residues and its average P(α-helix) > average P(β-sheet), predict the whole segment as α-helix.

o Repeat for other segments to identify all α-helices.

o Identify regions where 3 out of 5 contiguous residues have P(β-sheet) > 1.00. Such a region is a potential β-sheet.

o Continue scanning in both directions until you reach 4 contiguous residues with an average P(β-sheet) < 1.00. This marks the end of the β-sheet.

o If the segment's average P(β-sheet) > 1.05 and average P(β-sheet) > average P(α-helix), predict the whole segment as β-pleated sheet.

o If regions overlap, assign the overlap to α-helix if its average P(α-helix) > average P(β-sheet), or to β-sheet if average P(α-helix) < average P(β-sheet).
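The remaining rules can be sketched as small functions over preference tables passed in by the caller: extending a nucleus, accepting a helix or sheet, and resolving overlaps. Indices are 0-based with exclusive ends. Two details are assumptions of this sketch, because published implementations differ on them: the exact boundary handling when extending, and sending ties in an overlap to helix.

```python
def avg(seq: str, p: dict[str, float], start: int, end: int) -> float:
    """Average preference parameter over seq[start:end]."""
    return sum(p[aa] for aa in seq[start:end]) / (end - start)


def extend_segment(seq: str, p: dict[str, float], start: int, end: int,
                   stop: float = 1.00) -> tuple[int, int]:
    """Grow a nucleus seq[start:end] one residue at a time in both directions.

    A side stops at the chain end, or when the 4 contiguous residues that would
    include the next residue average below `stop`. Assumes the nucleus has at
    least 4 residues (true for 6- and 5-residue nuclei).
    """
    while end < len(seq) and avg(seq, p, end - 3, end + 1) >= stop:
        end += 1
    while start > 0 and avg(seq, p, start - 1, start + 3) >= stop:
        start -= 1
    return start, end


def accept_helix(seq, p_alpha, p_beta, start, end) -> bool:
    """Longer than 5 residues and average P(alpha) > average P(beta)."""
    return end - start > 5 and avg(seq, p_alpha, start, end) > avg(seq, p_beta, start, end)


def accept_sheet(seq, p_alpha, p_beta, start, end) -> bool:
    """Average P(beta) > 1.05 and average P(beta) > average P(alpha)."""
    b = avg(seq, p_beta, start, end)
    return b > 1.05 and b > avg(seq, p_alpha, start, end)


def resolve_overlap(seq, p_alpha, p_beta, start, end) -> str:
    """Label an overlapping region by the larger average; a tie goes to 'H' (assumption)."""
    return "H" if avg(seq, p_alpha, start, end) >= avg(seq, p_beta, start, end) else "E"
```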

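To identify turns, compute the following for each residue number j:

p(t) = f(j) f(j+1) f(j+2) f(j+3)

Here f(j), f(j+1), f(j+2) and f(j+3) are the frequencies with which the residues at positions j to j+3 occur at the first, second, third and fourth positions of a turn. A turn is predicted when three conditions hold:

o p(t) > 0.000075

o the average P(turn) over the four residues is > 1.00

o the average P(turn) exceeds both the average P(α-helix) and the average P(β-sheet)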
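Here is a sketch of the turn test for the tetrapeptide starting at 0-based index j. It assumes four position-specific frequency tables `f1`..`f4` and the P(turn), P(α) and P(β) tables are supplied by the caller. These are hypothetical inputs, and no published values are included. The cutoff of 0.000075 and the averaging conditions follow the commonly cited Chou-Fasman turn rule.

```python
def turn_probability(seq: str, j: int, f1, f2, f3, f4) -> float:
    """p(t) = f(j) f(j+1) f(j+2) f(j+3), using one frequency table per turn position."""
    return f1[seq[j]] * f2[seq[j + 1]] * f3[seq[j + 2]] * f4[seq[j + 3]]


def is_turn(seq: str, j: int, f_tables, p_turn, p_alpha, p_beta,
            cutoff: float = 7.5e-5) -> bool:
    """Predict a turn at seq[j:j+4] if p(t) > cutoff, the average P(turn) > 1.00,
    and the average P(turn) beats both the average P(alpha) and the average P(beta)."""
    window = seq[j:j + 4]

    def mean(p) -> float:
        return sum(p[aa] for aa in window) / 4

    t = mean(p_turn)
    return (turn_probability(seq, j, *f_tables) > cutoff
            and t > 1.00 and t > mean(p_alpha) and t > mean(p_beta))
```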

Results

Accuracy: about 50-60%

Helix formers: alanine, glutamine, leucine, methionine

Helix breakers: proline, glycine

β-sheet formers: isoleucine, valine, tyrosine

β-sheet breakers: proline, asparagine, glutamine

Turns contain: proline (30%), serine (14%), lysine, asparagine (10%)

glycine (19%), aspartic acid (18%), serine (13%), tyrosine (11%)
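The source does not say how these accuracy figures are measured. A common measure is Q3: the percentage of residues whose three-state label (H, E or C) is predicted correctly. Here is a minimal sketch, assuming the predicted and observed strings have equal length. The function name is illustrative.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label matches the observed structure."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```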

http://www.accelrys.com/product/gcg-wisconsin-package/program-list.html

Output of Chou-Fasman

GOR METHOD

• GOR (Garnier, Osguthorpe, Robson), 1978

• The Chou-Fasman method assumes that each amino acid individually influences the secondary structure of the sequence.

• GOR assumes that the amino acids flanking the central amino acid also influence its secondary structure.

• Consider a peptide: a central amino acid and its flanking (side) amino acids.

• It assumes that amino acids up to 8 residues away on either side influence the secondary structure of the central residue.

• 4th version (GOR IV): about 64% accurate

ALGORITHM

•It uses a sliding window of 17 amino acids: the central residue plus 8 on each side.

•The sequence of the flanking amino acids is used to predict the secondary structure of the central residue.

•Works better for helices than for sheets, because β-sheets depend on hydrogen bonds between residues that are far apart in the sequence.

•Only 36.5% accurate for β-sheets.

•Input: any amino acid sequence.

•Output: the predicted secondary structure.
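The window idea can be sketched as a GOR I-style score:

o For each residue j and each state s, add up information values I(s; residue at offset m) for the offsets m = -8 ... +8.

o Pick the state with the largest total.

The nested dictionary `info[state][offset][amino_acid]` is a hypothetical input. Real GOR versions estimate these values from proteins of known structure. GOR IV also adds pair terms, which this sketch leaves out.

```python
def gor_predict(seq: str, info: dict, half_window: int = 8, states: str = "HEC") -> str:
    """For each residue, sum the information values contributed by residues at
    offsets -half_window..+half_window and keep the best-scoring state.
    Missing table entries count as 0; ties go to the earlier state in `states`."""
    prediction = []
    for j in range(len(seq)):
        totals = {}
        for s in states:
            total = 0.0
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(seq):  # window positions past the chain ends are skipped
                    total += info.get(s, {}).get(m, {}).get(seq[k], 0.0)
            totals[s] = total
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)
```

With `half_window=8` the window covers 17 residues, matching the description above.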

NEAREST NEIGHBOUR METHOD

o Based on the premise that short homologous sequences of amino acids have the same secondary structure.

o Predicts the secondary structure of the central residue of a segment from the known structures of neighbouring homologous segments.

o Using a structural database, find segments of known secondary structure that may be homologous to the target sequence.

o Naturally evolved proteins with 35% or more identical amino acids generally have the same secondary structure.

o Find the sequences that best match the target sequence.

o Tools: scoring matrices, multiple sequence alignment (MSA).
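Here is a minimal nearest-neighbour sketch with three assumptions:

o The database is a tiny hypothetical list of (sequence, secondary structure) pairs.

o Similarity is the number of identical residues, standing in for a real scoring matrix such as the singleton matrix below.

o The chain ends are padded with '-'.

Each target window is compared with every database window of the same length, and the central-residue states of the k best matches vote.

```python
from collections import Counter


def nn_predict(target: str, database: list[tuple[str, str]],
               half_window: int = 3, k: int = 3) -> str:
    """Predict a 3-state string for `target` by k-nearest-neighbour voting."""
    width = 2 * half_window + 1
    # Library of (window sequence, state of its central residue) from known structures.
    library = [
        (seq[c - half_window:c + half_window + 1], ss[c])
        for seq, ss in database
        for c in range(half_window, len(seq) - half_window)
    ]
    if not library:
        raise ValueError("database sequences are shorter than the window")
    padded = "-" * half_window + target + "-" * half_window
    prediction = []
    for j in range(len(target)):
        window = padded[j:j + width]
        # Rank database windows by the number of identical residues.
        ranked = sorted(library,
                        key=lambda item: sum(a == b for a, b in zip(window, item[0])),
                        reverse=True)
        votes = Counter(state for _, state in ranked[:k])
        prediction.append(votes.most_common(1)[0][0])
    return "".join(prediction)
```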

“Singleton” score matrix

Columns: H = helix, S = sheet, L = loop; bur = buried, int = intermediate, exp = exposed

Res    H-bur   H-int   H-exp   S-bur   S-int   S-exp   L-bur   L-int   L-exp
ALA   -0.578  -0.119  -0.160   0.010   0.583   0.921   0.023   0.218   0.368
ARG    0.997  -0.507  -0.488   1.267  -0.345  -0.580   0.930  -0.005  -0.032
ASN    0.819   0.090  -0.007   0.844   0.221   0.046   0.030  -0.322  -0.487
ASP    1.050   0.172  -0.426   1.145   0.322   0.061   0.308  -0.224  -0.541
CYS   -0.360   0.333   1.831  -0.671   0.003   1.216  -0.690  -0.225   1.216
GLN    1.047  -0.294  -0.939   1.452   0.139  -0.555   1.326   0.486  -0.244
GLU    0.670  -0.313  -0.721   0.999   0.031  -0.494   0.845   0.248  -0.144
GLY    0.414   0.932   0.969   0.177   0.565   0.989  -0.562  -0.299  -0.601
HIS    0.479  -0.223   0.136   0.306  -0.343  -0.014   0.019  -0.285   0.051
ILE   -0.551   0.087   1.248  -0.875  -0.182   0.500  -0.166   0.384   1.336
LEU   -0.744  -0.218   0.940  -0.411   0.179   0.900  -0.205   0.169   1.217
LYS    1.863  -0.045  -0.865   2.109  -0.017  -0.901   1.925   0.474  -0.498
MET   -0.641  -0.183   0.779  -0.269   0.197   0.658  -0.228   0.113   0.714
PHE   -0.491   0.057   1.364  -0.649  -0.200   0.776  -0.375  -0.001   1.251
PRO    1.090   0.705   0.236   1.249   0.695   0.145  -0.412  -0.491  -0.641
SER    0.350   0.260  -0.020   0.303   0.058  -0.075  -0.173  -0.210  -0.228
THR    0.291   0.215   0.304   0.156  -0.382  -0.584  -0.012  -0.103  -0.125
TRP   -0.379  -0.363   1.178  -0.270  -0.477   0.682  -0.220  -0.099   1.267
TYR   -0.111  -0.292   0.942  -0.267  -0.691   0.292  -0.015  -0.176   0.946
VAL   -0.374   0.236   1.144  -0.912  -0.334   0.089  -0.030   0.309   0.998

Neural Network Method

•Prediction uses information learned from databases of proteins of known structure.

•The network learns the relationship between the linear sequence and the 3D structure of the polypeptide.

Figure: Neural network. Input signals, weighted by J1 to J4, are summed and turned into zero or one.

Figure: Feed-forward multilayer network (input layer, hidden layer and output layer of neurons).

Figure: Neural network training (enter sequences → compare prediction to reality → adjust weights).
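The training loop in the figure (enter sequences, compare prediction to reality, adjust weights) can be illustrated with the simplest case: a single threshold unit trained with the perceptron rule. This is a swapped-in simplification, not the multilayer networks used by tools such as PHD or PSIPRED. The function names are illustrative, and the toy OR data in the check is made up.

```python
def step(x: float) -> float:
    """Threshold unit: the summed input is turned into zero or one."""
    return 1.0 if x > 0 else 0.0


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(examples, n_inputs: int, epochs: int = 20, lr: float = 0.1):
    """Perceptron rule over (inputs, target) pairs; returns (weights, bias)."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(w, b, x)  # compare prediction to reality
            if error:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]  # adjust weights
                b += lr * error
    return w, b
```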

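Simple neural network with hidden layer

out_i = f\Big( \sum_j J^{(2)}_{ij} \, f\Big( \sum_k J^{(1)}_{jk} \, in_k \Big) \Big)

where:

o in_k are the inputs

o J^{(1)}_{jk} are the weights from input k to hidden unit j

o J^{(2)}_{ij} are the weights from hidden unit j to output i

o f is the activation function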
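The formula out_i = f(Σ_j J2_ij f(Σ_k J1_jk in_k)) can be evaluated directly. This sketch stores the weights as nested lists: `J1[j][k]` runs from input k to hidden unit j, and `J2[i][j]` from hidden unit j to output i. Both the layout and the function names are assumptions. By default f is a logistic sigmoid. Passing a step function instead gives the "summed and turned into zero or one" unit described earlier.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: list[float], J1: list[list[float]], J2: list[list[float]],
            f=sigmoid) -> list[float]:
    """Two-layer feed-forward pass: out_i = f(sum_j J2[i][j] * f(sum_k J1[j][k] * in_k))."""
    hidden = [f(sum(w * x for w, x in zip(row, inputs))) for row in J1]
    return [f(sum(w * h for w, h in zip(row, hidden))) for row in J2]
```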

Figure: Neural network for secondary structure.

Input units: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y and '.'

Output units: H, E, L

Example input window: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)
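Here is a sketch of the input encoding the figure suggests. Each position of a 13-residue window is turned into a one-hot vector of 21 units, one per amino acid plus '.', giving 13 × 21 = 273 inputs. Two parts are assumptions of this sketch: treating '.' as the padding symbol for positions beyond the ends of the chain, and the names used.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.' (assumed padding symbol)


def encode_window(seq: str, center: int, half_window: int = 6) -> list[int]:
    """One-hot encode seq[center - half_window : center + half_window + 1].

    Positions outside the chain are encoded as '.'; symbols not in ALPHABET
    raise ValueError (from str.index).
    """
    vec = []
    for k in range(center - half_window, center + half_window + 1):
        symbol = seq[k] if 0 <= k < len(seq) else "."
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(symbol)] = 1
        vec.extend(one_hot)
    return vec
```

The resulting vector would feed the input layer. The three output units (H, E, L) then score the central residue.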

Summary

Introduction

What is secondary structure prediction

Why

Chou-Fasman method

GOR I-IV

Nearest neighbors

Neural network

Suggested reading:

Chapter 15 in “Current Topics in Computational Molecular Biology”, edited by Tao Jiang, Ying Xu and Michael Zhang. MIT Press, 2002.

Bioinformatics, by Cynthia Gibas and Per Jambeck

Bioinformatics, by S. C. Rastogi

Bioinformatics, by Andreas

Optional reading: Review by Burkhard Rost:

http://cubic.bioc.columbia.edu/papers/2003_rev_dekker/paper.html

Reference