Secondary Structure Prediction of proteins

Secondary Structure Prediction Of Protein Protein Sequence + Structure VIJAY
  • Date posted: 16-Jan-2017
Transcript of Secondary Structure Prediction of proteins

  • Secondary Structure Prediction of Protein

    Protein Sequence + Structure

    VIJAY

  • INTRODUCTION

    Primary structure: the amino acid sequence

    Secondary structure: α-helix, β-sheet

    Tertiary structure: three-dimensional structure formed by assembly of secondary structures

    Quaternary structure: structure formed by more than one polypeptide chain

  • Secondary Structure

    Defined as the local conformation of protein backbone

    Primary structure → folding → secondary structure (α-helix and β-sheet)

    Secondary structure is of two kinds:

    Regular secondary structure (α-helices, β-sheets)

    Irregular secondary structure (tight turns, random coils, bulges)

  • α-helix

    A common conformation.

    A spiral structure: a tightly packed, coiled polypeptide backbone with the side chains extending outward.

    Forms spontaneously.

    Stabilized by H-bonding between amide hydrogens and carbonyl oxygens of peptide bonds.

    R-groups lie on the exterior of the helix and perpendicular to its axis.

    One complete turn of the helix contains 3.6 aminoacyl residues and spans a distance of 0.54 nm.

    e.g. the keratins are entirely α-helical; myoglobin is 80% helical.

  • Glycine and proline, bulky amino acids, and charged amino acids favor disruption of the helix.

  • β-sheet

    β-sheets are composed of 2 or more different regions of stretches of at least 5-10 amino acids.

    The folding and alignment of stretches of the polypeptide backbone alongside one another to form β-sheets is stabilized by H-bonding between amide hydrogens and carbonyl oxygens.

    The peptide backbone of the sheet is highly extended.

    R groups of adjacent residues point in opposite directions.

    β-sheets are either parallel or antiparallel.

  • β-sheet (parallel, antiparallel)

  • What is secondary structure prediction?

    The first step in the prediction of protein structure.

    A technique for determining the secondary structure of a given polypeptide by locating the coils, alpha helices, and beta strands along the polypeptide.

    Given a protein sequence (primary structure):

    GHWIATRGQLIREAYEDYRHFSSECPFIP

    Predict its secondary structure content (C = coil, H = alpha helix, E = beta strand):

    CEEEEECHHHHHHHHHHHCCCHHCCCCCC
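
    The secondary structure content of a predicted string can be tallied directly. The following is a minimal sketch, assuming the three-letter state alphabet used on this slide (C, H, E); the function name `structure_content` is illustrative and does not come from any prediction tool.

```python
from collections import Counter

def structure_content(states):
    """Return the fraction of residues assigned to each state (C, H, E)."""
    counts = Counter(states)
    return {s: counts.get(s, 0) / len(states) for s in "CHE"}

sequence   = "GHWIATRGQLIREAYEDYRHFSSECPFIP"
prediction = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"

# A prediction assigns exactly one state to each residue.
assert len(sequence) == len(prediction)

content = structure_content(prediction)
for state, fraction in content.items():
    print(f"{state}: {fraction:.1%}")
```

    For the example above this counts 11 coil, 13 helix, and 5 strand residues out of 29.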

  • Why secondary structure prediction?

    o Secondary structure → tertiary structure prediction

    o Protein function prediction

    o Protein classification

    o Predicting structural change

    o Detection and alignment of remote homology between proteins

    o Detecting transmembrane regions, solvent-accessible residues, and other important features of molecules

    o Detection of hydrophobic and hydrophilic regions

  • Prediction methods

    o Statistical methods

        o Chou-Fasman method, GOR I-IV

    o Nearest neighbors

        o NNSSP, SSPAL

    o Neural networks

        o PHD, PSIPRED, JPred

    o Support vector machine (SVM)

    o Hidden Markov model (HMM)

  • Chou-Fasman algorithm

    Proposed by Chou and Fasman in 1978.

    It assigns a set of prediction values to each amino acid residue in the polypeptide and applies an algorithm to the conformational parameters and positional frequencies.

    The conformational parameter (also called the preference parameter) for each amino acid is calculated from the relative frequency of each of the 20 amino acids in proteins.

    From these values the states C = coil, H = alpha helix, and E = beta strand are determined.

  • A table of prediction values (preference parameters) for each of the 20 amino acids in alpha helix, beta sheet, and turn has already been calculated and standardised.

    To obtain the prediction value, the frequency of amino acid (i) in a structure is divided by the frequency of all residues in the protein that occur in that structure (s): i/s.

    The resulting structural parameters p(alpha), p(beta), and p(turn) vary from about 0.5 to 1.5 across the 20 amino acids.
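
    The ratio i/s can be written out as a small calculation. This is a sketch under stated assumptions: the function name `propensity` and all four counts are hypothetical and chosen only for illustration; they are not taken from a real structural database.

```python
def propensity(n_res_in_struct, n_res_total, n_all_in_struct, n_all_total):
    """Preference parameter for one amino acid in one structure type.

    i = fraction of this amino acid found in the structure
    s = fraction of all residues found in the structure
    """
    i = n_res_in_struct / n_res_total
    s = n_all_in_struct / n_all_total
    return i / s

# Hypothetical counts: 120 of 200 alanines are helical,
# and 4000 of 10000 residues overall are helical.
p_alpha_ala = propensity(120, 200, 4000, 10000)
print(round(p_alpha_ala, 2))
```

    A value above 1 means the amino acid occurs in that structure more often than an average residue does; a value below 1 means it occurs less often.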

  • RULES

    A window is scanned along the sequence to find a short stretch of amino acids with a high probability of forming one type of structure.

    Alpha helix: 4 out of 6 amino acids have a high probability (> 1.03).

    Beta sheet: 3 out of 5 amino acids have a probability > 1.03.
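
    The window rules can be sketched as a single scan. The two-letter table `toy_p_alpha` below is hypothetical and exists only to exercise the rule; a real run would use the standardised 20-residue table. The function name `nucleation_sites` is also illustrative.

```python
def nucleation_sites(seq, table, window, min_hits, threshold):
    """Start indices of windows in which at least `min_hits` residues
    have a preference parameter above `threshold`."""
    sites = []
    for start in range(len(seq) - window + 1):
        hits = sum(table[aa] > threshold for aa in seq[start:start + window])
        if hits >= min_hits:
            sites.append(start)
    return sites

# Hypothetical values, not real Chou-Fasman parameters.
toy_p_alpha = {"A": 1.4, "G": 0.5}
seq = "GGGAAAAAAGGG"

# Helix rule: 4 out of 6 residues above the threshold.
helix_sites = nucleation_sites(seq, toy_p_alpha, window=6, min_hits=4, threshold=1.03)
print(helix_sites)
```

    The same function covers the beta rule by passing `window=5, min_hits=3` together with a p(beta) table.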

  • ALGORITHM

    o Note the preference parameters for the 20 amino acids in the peptide.

    o Scan with a window and identify regions where 4 out of 6 contiguous residues have p(alpha helix) > 1.00.

    o Continue scanning in both directions until a set of 4 contiguous residues has an average p(alpha helix) < 1.00. If the segment's average p(alpha helix) > p(beta sheet), the segment is predicted as completely alpha helix.

    o Scan the remaining segments in the same way to identify further alpha helices.

  • Identify regions where 3 out of 5 amino acids have p(beta sheet) > 1.00; such a region is predicted as beta sheet.

    Continue scanning in both directions until 4 residues have an average p(beta sheet) < 1.00. If the segment's average p(beta sheet) > 1.05 and p(beta sheet) > p(alpha helix), then consider the complete segment a β-pleated sheet.

  • If any regions overlap, consider the overlap alpha helix if average p(alpha helix) > p(beta sheet), or beta sheet if average p(alpha helix) < p(beta sheet).
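
    The extension and overlap steps can be sketched as follows. This is one possible reading of the procedure, not a reference implementation: the region grows one residue at a time while the 4-residue window at its edge still averages at or above a cutoff (assumed here to be 1.00), and an overlap is labelled by the larger average parameter. Both toy tables are hypothetical.

```python
def mean(table, segment):
    """Average preference parameter over a stretch of residues."""
    return sum(table[aa] for aa in segment) / len(segment)

def extend(seq, table, start, end, cutoff=1.00, probe=4):
    """Grow the half-open region [start, end) in both directions while the
    `probe`-residue window at the growing edge averages >= cutoff."""
    while end < len(seq) and mean(table, seq[max(0, end - probe + 1):end + 1]) >= cutoff:
        end += 1
    while start > 0 and mean(table, seq[start - 1:start - 1 + probe]) >= cutoff:
        start -= 1
    return start, end

def resolve_overlap(seq, p_alpha, p_beta, start, end):
    """Label a region claimed by both helix and sheet: 'H' if the average
    p(alpha) is larger, otherwise 'E' (ties fall to 'E' in this sketch)."""
    segment = seq[start:end]
    return "H" if mean(p_alpha, segment) > mean(p_beta, segment) else "E"

# Hypothetical two-letter tables, for illustration only.
toy_p_alpha = {"A": 1.4, "G": 0.5}
toy_p_beta = {"A": 0.8, "G": 0.7}
seq = "GGGAAAAAAGGG"

region = extend(seq, toy_p_alpha, 3, 9)   # start from the nucleus at 3..8
label = resolve_overlap(seq, toy_p_alpha, toy_p_beta, *region)
print(region, label)
```

    With this edge rule the region absorbs one weak residue on each side, because the window that contains it still averages above the cutoff.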

  • Result

    Accuracy: ~50% to ~60%

    Helix formers: alanine, glutamine, leucine, methionine

    Helix breakers: proline and glycine

    Beta sheet formers: isoleucine, valine, tyrosine

    Beta sheet breakers: proline, asparagine, glutamine

    Turns contain proline (30%), serine (14%), lysine, asparagine (10%)

    Glycine (19%), aspartic acid (18%), serine (13%), tyrosine (11%)

    http://www.accelrys.com/product/gcg-wisconsin-package/program-list.html

  • Output of Chou-Fasman

  • GOR METHOD

    GOR (Garnier, Osguthorpe, Robson), 1978

    The Chou-Fasman method assumes that each amino acid individually influences the secondary structure of the sequence.

    GOR assumes that the amino acids flanking the central amino acid also influence its secondary structure.

    Consider a peptide with a central amino acid and flanking (side) amino acids.

    It assumes that amino acids up to 8 residues on either side influence the secondary structure of the central residue.

    The 4th version is 64% accurate.

  • ALGORITHM

    It uses a sliding window of 17 amino acids.

    The flanking amino acid sequence and alignment are used to predict the secondary structure of the central residue.

    Better for helices than for sheets, because beta sheets have more inter-sequence hydrogen bonding.

    36.5% accurate for beta sheets.

    Input: any amino acid sequence.

    Output: the predicted secondary structure.
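
    The 17-residue sliding window can be sketched as below. It assumes a scoring function `info(state, offset, residue)` that says how strongly a residue at a given offset from the centre favours a state. The real GOR tables are derived from known structures and are not reproduced here; `toy_info` is a hypothetical stand-in.

```python
def gor_predict(seq, info, half_window=8, states="HEC"):
    """Give each residue the state with the highest score summed over a
    window of 2 * half_window + 1 residues centred on it."""
    prediction = []
    for centre in range(len(seq)):
        scores = {}
        for state in states:
            total = 0.0
            for offset in range(-half_window, half_window + 1):
                pos = centre + offset
                if 0 <= pos < len(seq):   # skip positions past either end
                    total += info(state, offset, seq[pos])
            scores[state] = total
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)

def toy_info(state, offset, residue):
    """Hypothetical scores: A favours helix, V favours strand."""
    if state == "H":
        return 1.0 if residue == "A" else 0.0
    if state == "E":
        return 1.0 if residue == "V" else 0.0
    return 0.4   # coil gets a small constant score

predicted = gor_predict("A" * 10 + "V" * 10, toy_info)
print(predicted)
```

    Because every position in the window contributes, the state of a residue near the A/V boundary is decided by which residue type dominates its neighbourhood, not by the residue itself.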

  • NEAREST NEIGHBOUR METHOD

    o Based on the idea that short homologous sequences of amino acids have the same secondary structure.

    o It predicts the secondary structure of a central segment from neighbouring homologous sequences.

    o Using a structural database, find sequences of known secondary structure that may be homologous to the target sequence.

    o Naturally evolved proteins with 35% identical amino acid sequences will have the same secondary structure.

    o Find sequences that may match the target sequence.

    o Scoring matrix, multiple sequence alignment (MSA)
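
    The nearest-neighbour idea can be sketched with a plain identity count in place of a scoring matrix. Everything here is an assumption made for illustration: the five database entries in `toy_db` are invented, and methods such as NNSSP use far richer scoring than counting identical positions.

```python
from collections import Counter

def identity(a, b):
    """Number of positions at which two equal-length segments agree."""
    return sum(x == y for x, y in zip(a, b))

def nearest_neighbour_state(window, database, k=3):
    """Predict the state of the window's central residue by majority vote
    among the k database segments most similar to the window."""
    ranked = sorted(database, key=lambda entry: identity(window, entry[0]), reverse=True)
    centre = len(window) // 2
    votes = Counter(states[centre] for _, states in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented (segment, known states) pairs; not from a real structure database.
toy_db = [
    ("AELLK", "HHHHH"),
    ("AEALK", "HHHHH"),
    ("VTVYV", "EEEEE"),
    ("VKVTV", "EEEEE"),
    ("GPDGS", "CCCCC"),
]

print(nearest_neighbour_state("AELMK", toy_db))
print(nearest_neighbour_state("VTVTV", toy_db))
```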

  • Singleton score matrix

                      Helix                      Sheet                      Loop
              Buried    Inter  Exposed   Buried    Inter  Exposed   Buried    Inter  Exposed
    ALA   -0.578   -0.119   -0.160    0.010    0.583    0.921    0.023    0.218    0.368
    ARG    0.997   -0.507   -0.488    1.267   -0.345   -0.580    0.930   -0.005   -0.032
    ASN    0.819    0.090   -0.007    0.844    0.221    0.046    0.030   -0.322   -0.487
    ASP    1.050    0.172   -0.426    1.145    0.322    0.061    0.308   -0.224   -0.541
    CYS   -0.360    0.333    1.831   -0.671    0.003    1.216   -0.690   -0.225    1.216
    GLN    1.047   -0.294   -0.939    1.452    0.139   -0.555    1.326    0.486   -0.244
    GLU    0.670   -0.313   -0.721    0.999    0.031   -0.494    0.845    0.248   -0.144
    GLY    0.414    0.932    0.969    0.177    0.565    0.989   -0.562   -0.299   -0.601
    HIS    0.479   -0.223    0.136    0.306   -0.343   -0.014    0.019   -0.285    0.051
    ILE   -0.551    0.087    1.248   -0.875   -0.182    0.500   -0.166    0.384    1.336
    LEU   -0.744   -0.218    0.940   -0.411    0.179    0.900   -0.205    0.169    1.217
    LYS    1.863   -0.045   -0.865    2.109   -0.017   -0.901    1.925    0.474   -0.498
    MET   -0.641   -0.183    0.779   -0.269    0.197    0.658   -0.228    0.113    0.714
    PHE   -0.491    0.057    1.364   -0.649   -0.200    0.776   -0.375   -0.001    1.251
    PRO    1.090    0.705    0.236    1.249    0.695    0.145   -0.412   -0.491   -0.641
    SER    0.350    0.260   -0.020    0.303    0.058   -0.075   -0.173   -0.210   -0.228
    THR    0.291    0.215    0.304    0.156   -0.382   -0.584   -0.012   -0.103   -0.125
    TRP   -0.379   -0.363    1.178   -0.270   -0.477    0.682   -0.220   -0.099    1.267
    TYR   -0.111   -0.292    0.942   -0.267   -0.691    0.292   -0.015   -0.176    0.946
    VAL   -0.374    0.236    1.144   -0.912   -0.334    0.089   -0.030    0.309    0.998

  • Neural Network Method

    Prediction is done by utilizing the information in different databases.

    Linear sequence → 3D structure of the polypeptide

  • Neural network

    Neurons: input signals are weighted (J1, J2, J3, J4), summed, and turned into zero or one.

    Feed-forward multilayer network: input layer → hidden layer → output layer
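
    A single neuron of this kind can be sketched in a few lines. The weight values and the threshold of 0 are hypothetical, chosen only to show the summing and the step to zero or one.

```python
def neuron(inputs, weights, threshold=0.0):
    """Weighted sum of the input signals, turned into 0 or 1 by a step."""
    total = sum(j * s for j, s in zip(weights, inputs))
    return 1 if total > threshold else 0

weights = [0.5, -1.0, 0.25, 0.75]   # J1..J4, hypothetical values

fires = neuron([1, 0, 1, 1], weights)    # 0.5 + 0.25 + 0.75 = 1.5
silent = neuron([0, 1, 0, 0], weights)   # -1.0
print(fires, silent)
```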

  • Neural network training

    Enter sequences

    Compare prediction to reality

    Adjust weights
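
    The training cycle can be sketched with the perceptron learning rule on a single threshold unit. The slides do not say which update rule is used, so this rule is a simple stand-in, and the four training samples (the logical OR function) are a toy problem, not sequence data.

```python
def predict(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train(samples, epochs=20, rate=0.1):
    """Perceptron rule: nudge each weight by rate * error * input."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in samples:                        # enter an example
            error = target - predict(weights, bias, inputs)   # compare to reality
            weights = [w + rate * error * x                   # adjust weights
                       for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Toy, linearly separable problem (logical OR).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

    When a prediction is already correct the error is zero and nothing changes, so the weights settle once every sample is classified correctly.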

  • Simple Neural Network With Hidden Layer

    $$\mathrm{out}_i = f\Big(\sum_j J^{2}_{ij}\, f\Big(\sum_k J^{1}_{jk}\, \mathrm{in}_k\Big)\Big)$$
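
    A forward pass through one hidden layer can be sketched as two applications of the same layer function. The sigmoid is assumed here as the function f, and the weight matrices `j1` and `j2` hold hypothetical values; the slide does not specify either.

```python
import math

def sigmoid(x):
    """Smooth squashing function, assumed here as f."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    """Apply one layer: each row of `weights` feeds one unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(j1, j2, inputs):
    hidden = layer(j1, inputs)   # f(sum over k of J1[j][k] * in[k])
    return layer(j2, hidden)     # f(sum over j of J2[i][j] * hidden[j])

# Hypothetical weights: 3 inputs -> 2 hidden units -> 3 outputs.
j1 = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
j2 = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.3]]

out = forward(j1, j2, [1.0, 0.0, 1.0])
print(out)
```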

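  • Neural network for secondary structure

    Input units, one per symbol: A C D E F G H I K L M N P Q R S T V W Y .

    Output units: H, E, L

    Example input window, with the observed state of each residue: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)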
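
    One common way to present such a window to a network is one-hot encoding: each residue becomes 21 units, exactly one of which is set. The sketch below assumes that encoding, and treats the '.' symbol as a spacer for positions past the ends of the chain; the slide does not state either detail.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."   # 20 amino acids plus '.' spacer

def encode_window(window):
    """Concatenate one 21-unit one-hot vector per residue in the window."""
    vector = []
    for symbol in window:
        units = [0] * len(ALPHABET)
        units[ALPHABET.index(symbol)] = 1
        vector.extend(units)
    return vector

window = "DRQGFVPAAYVKK"          # the 13 residues from the example
inputs = encode_window(window)
print(len(inputs), sum(inputs))
```

    A 13-residue window gives 13 × 21 = 273 input units, of which exactly 13 are switched on.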

  • Summary

    Introduction

    What is secondary structure prediction

    Why

    Chou-Fasman method

    GOR I-IV

    Nearest neighbors

    Neural network

  • Suggested reading:

    Chapter 15 in Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press. 2002.

    Bioinformatics by Cynthia and Per Jambeck

    Bioinformatics by S. C. Rastogi

    Bioinformatics By Andreas

    Optional reading: Review by Burkhard Rost:

    http://cubic.bioc.columbia.edu/papers/2003_rev_dekker/paper.html

    Reference

    http://cubic.bioc.columbia.edu/papers/2003_rev_dekker/paper.html