CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein...

30
CS612 - Algorithms in Bioinformatics Protein Structure February 5, 2019

Transcript of CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein...

Page 1: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

CS612 - Algorithms in Bioinformatics

Protein Structure

February 5, 2019

Page 2: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Introduction to Protein Structure

A protein is a linear chain of organic molecular building blockscalled amino acids.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 3: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Introduction to Protein Structure

Amine (NH+3 ), carboxyl (COO−),

C-α and the hydrogen attached toit are called backbone.

All amino acids have the samebackbone.

R (residue) can be anything...

It is called the side chain, and isdifferent for different amino acids.

In nature there are 20 amino acids.Amine(NH+

3 )

6

R

?

H

?

C-α

6

Carboxyl(COO−)

6

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 4: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

D vs. L Amino Acids

C COO−N+H3

R

H

Mirror

CCOO− +NH3

R

H

The dashed lines are pointing away from you, and the boldlines are pointing towards you.

Amino acids in nature are L (the left side of the image).

D Amino acids (right side) can be synthesized, but they nearlynever exist in nature.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 5: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Amino Acid Names

Name 3-letter 1-letter Side chain Name 3-letter 1-letter Side chain

Glycine GLY G Glutamine GLN Q

Alanine ALA A Asparagine ASN N

Phenylalanine PHE F Lysine LYS K

Leucine LEU L Arginine ARG R

Isoleucine ILE I Serine SER S

Valine VAL V Threonine THR T

Proline PRO P Tyrosine TYR Y

Methionine MET M Glutamic acid GLU E

Tryptophan TRP W Aspartic acid ASP D

Histidine HIS H Cysteine CYS C

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 6: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Hydrophobic Amino Acids

Name 3-letter 1-letter Side chain Name 3-letter 1-letter Side chain

Glycine GLY G Glutamine GLN Q

Alanine ALA A Asparagine ASN N

Phenylalanine PHE F Lysine LYS K

Leucine LEU L Arginine ARG R

Isoleucine ILE I Serine SER S

Valine VAL V Threonine THR T

Proline PRO P Tyrosine TYR Y

Methionine MET M Glutamic acid GLU E

Tryptophan TRP W Aspartic acid ASP D

Histidine HIS H Cysteine CYS C

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 7: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Polar Amino Acids

Name 3-letter 1-letter Side chain Name 3-letter 1-letter Side chain

Glycine GLY G Glutamine GLN Q

Alanine ALA A Asparagine ASN N

Phenylalanine PHE F Lysine LYS K

Leucine LEU L Arginine ARG R

Isoleucine ILE I Serine SER S

Valine VAL V Threonine THR T

Proline PRO P Tyrosine TYR Y

Methionine MET M Glutamic acid GLU E

Tryptophan TRP W Aspartic acid ASP D

Histidine HIS H Cysteine CYS C

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 8: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Charged Amino Acids

Name 3-letter 1-letter Side chain Name 3-letter 1-letter Side chain

Glycine GLY G Glutamine GLN Q

Alanine ALA A Asparagine ASN N

Phenylalanine PHE F Lysine LYS K

Leucine LEU L Arginine ARG R

Isoleucine ILE I Serine SER S

Valine VAL V Threonine THR T

Proline PRO P Tyrosine TYR Y

Methionine MET M Glutamic acid GLU E

Tryptophan TRP W Aspartic acid ASP D

Histidine HIS H Cysteine CYS C

Positive charge

Negative charge

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 9: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Aromatic Amino Acids

Name 3-letter 1-letter Side chain Name 3-letter 1-letter Side chain

Glycine GLY G Glutamine GLN Q

Alanine ALA A Asparagine ASN N

Phenylalanine PHE F Lysine LYS K

Leucine LEU L Arginine ARG R

Isoleucine ILE I Serine SER S

Valine VAL V Threonine THR T

Proline PRO P Tyrosine TYR Y

Methionine MET M Glutamic acid GLU E

Tryptophan TRP W Aspartic acid ASP D

Histidine HIS H Cysteine CYS C

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 10: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Formation of Proteins

During the translation of a gene into a protein, the protein isformed by the sequential joining of amino acids end-to-end toform a long chain-like molecule (polymer).

A polymer of amino acids is often referred to as a polypeptide.

The genome is capable of coding for 20 different amino acidswhose chemical properties. depend on the composition oftheir side chains (”R”).

Thus, to a first approximation, a protein is a sequence ofthese amino acids.

This sequence is called the primary structure of the protein.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 11: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Formation of Proteins

?

Peptide bond

Amino acid 1 Amino acid 2

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 12: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Polymerization of Amino Acids

N-terminus C-terminus

Peptide bond

�����

Peptide – amino acids polymerized in chain

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 13: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Interactions Between Atoms

���*

Covalent bond

Planar angle������

Dihedral angle

?

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 14: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Dihedral Angles

Π1

Π2

θ

A

B C

D

A

B C

D

A D

+ clockwise– counterclockwise

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 15: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Dihedral Angles

(a) 0 degrees (b) 180 degrees

(c) 90 degrees (d) -90 degrees

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 16: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Backbone Dihedral Angles

N C

N ′R

phi(φ) psi(ψ)

omega(ω)

φ is defined by Ci−1,N,C-α,C – rotation about the N - C-α axis.

It is undefined for the first amino acid.

ψ is defined by Ni ,C-α,C,Ni+1 – rotation about the C-α - C axis.

It is undefined for the last amino acid.

ω is always 180 degrees.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 17: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein Secondary Structures

In order to work properly, aprotein must fold to form aspecific three-dimensionalshape called nativeconformation/structure.

Secondary structure refers tofolding in a small part of theprotein that forms acharacteristic shape.

The most commonsecondary structure elementsare α-helices and β-sheets.

Left: α helix. Right: β sheet

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 18: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Secondary Structure Elements

Repeating values of φand ψ along thepeptide chain resultin regular structures.

These structures arestabilized byinteractions alongatoms on the chain.

Left: α helix. Right: β sheet

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 19: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

α Helices

α helices are characterized by repeating values of φ around -57 andψ around -47.

Ramachandran plot: A scatterplot showing the φ and ψ values ofamino acids. Areas in green are ”allowed” (energetically favorable).

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 20: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

β Strands

β strands are characterized by repeating values of φ around -110 –-140 and ψ around 110 – 135.

These relatively extended strands interact to form β sheets.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 21: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Parallel and Anti-Parallel β Sheets

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 22: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Coils and Loops

The sections of the peptidechain that link the α-helicesand β-sheets are referred toas turns and loops

Other secondarysubstructure classificationsexist, but are rarely seen inpractice

Sub-units that do not fitinto any other classificationare known as random coils

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 23: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein 3-D Structure

The 3-dimensional fold of aprotein is called a tertiarystructure.

Many proteins consist ofmore than one polypeptidefolded together.

The spatial relationshipbetween these separatepolypeptide chains is calledthe quaternary structure.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 24: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein Folds – Mainly α

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 25: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein Folds – Mainly β

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 26: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein Folds – α/β

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 27: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Protein Structure Determination

How does a protein know its fold?

It is believed that all the information is encoded in its primarystructure (amino acid sequence).

Yet, no algorithm exists as of today to successfully predict thisstructure – the protein folding problem.

There are experimental methods to determine proteinstructure.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 28: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

X-ray Crystallography

Crystallize the protein.

Pass an X-ray to create adiffraction pattern.

Reconstruct atomic modelfrom electron density map.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 29: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

NMR (Nuclear Magnetic Resonance) Spectroscopy

Magnetic field is applied to asolution containing the protein.

Atomic nuclei are aligned by thefield.

When unaligned, the nuclei giveoff a typical signal.

Inter-atomic distances can beinferred.

The 3-D structure can bemodeled.

Done in solutions – atoms arefree to move.

Usually produces an ensembleof structures.

Nurit Haspel CS612 - Algorithms in Bioinformatics

Page 30: CS612 - Algorithms in Bioinformaticsnurith/cs612/proteinStructure.pdf · Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called

Cryo-Electron Microscopy (Cryo-EM

Samples are frozen tocryo-temperatures (of liquidnitrogen)

EM is used to obtain animage.

Multiple 2D images are usedto construct a 3D image.

Especially useful for verylarge macromolecules (entireviruses, ribosomes...)

Started at 1nm resolution,nowadays approaching 2-3A.

http://blogs.sciencemag.org/

Nurit Haspel CS612 - Algorithms in Bioinformatics