Topics in bioinformatics CS697

of 32 /32
Topics in bioinformatics CS697 Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations

Embed Size (px)


Topics in bioinformatics CS697. Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations. Rotation in 3D - matrices. Rotations in 3D – Euler angles. * Rotate the XYZ-system about the Z-axis by α. The X-axis now lies on the line of nodes. - PowerPoint PPT Presentation

Transcript of Topics in bioinformatics CS697

Page 1: Topics in bioinformatics CS697

Topics in bioinformaticsCS697

Spring 2011Class 12 – Mar-22-2011

Molecular distance measurementsMolecular transformations

Page 2: Topics in bioinformatics CS697

Rotation in 3D - matrices

Page 3: Topics in bioinformatics CS697

Rotations in 3D – Euler angles

* Rotate the XYZ-system about the Z-axis by α. The X-axis now lies on the line of nodes.

* Rotate the XYZ-system again about the now rotated X-axis by β. The Z-axis is now in its final orientation, and the x-axis remains on the line of nodes.

* Rotate the XYZ-system a third time about the new Z-axis by γ.

Page 4: Topics in bioinformatics CS697


Page 5: Topics in bioinformatics CS697


Extension of complex numbers.

h=a+bi+cj+dk, a,b,c,d real numbers.

i,j,k : imaginary components s.t.:


ij=k, jk=i, ki=j

ij = -ji, jk = -kj, ki = -ik

Page 6: Topics in bioinformatics CS697

Unit quaternions

• Unit quaternions :

• Group under quaternion multiplication.

• can be mapped to:

Page 7: Topics in bioinformatics CS697

Quaternions as axis-angle rotation

Page 8: Topics in bioinformatics CS697

Some resources

Page 9: Topics in bioinformatics CS697

Molecular distance measurements

Page 10: Topics in bioinformatics CS697

Measuring protein structure similarity

• Visual comparison• Dihedral angle comparison• Distance matrix• RMSD (root mean square distance)

Given two “shapes” or structures A and B, we are interested in defining a distance, or similarity measure between A and B.

Is the resulting distance (similarity measure) D a metric?

D(A,B) ≤ D(A,C) + D(C,B)

Page 11: Topics in bioinformatics CS697

Comparing dihedral angles

Torsion angles () are:- local by nature- invariant upon rotation and translation of the molecule- compact (O(n) angles for a protein of n residues)

Add 1 degreeTo all


Page 12: Topics in bioinformatics CS697

Internal distance matrix







5.91 2 3 4

1 0 3.8 6.0 8.1

2 3.8 0 3.8 5.9

3 6.0 3.8 0 3.8

4 8.1 5.9 3.8 0

Page 13: Topics in bioinformatics CS697

Internal distance matrix (2)

• Advantages- invariant with respect to rotation and translation- can be used to compare proteins

• Disadvantages- the distance matrix is O(n2) for a protein with n

residues- comparing distance matrix is a hard problem- insensitive to chirality

Page 14: Topics in bioinformatics CS697

Quality Assessment through RMSD

▆ RMSD: Root Mean Squared Deviation

• one of the simplest measures to quantify how different two protein conformations really are

• advantage: simple to compute by representing conformations as 3N vectors (N atoms)

• limitations: 1) limited to conformations of the same protein chain

2) atom-atom correspondence needed on different-length chains

3) not very descriptive if changes are localized

▆ RMSD: Average atomic distance

• given two conformations of a chain of N atoms

• represent the conformations as two 3N vectors x and y

• RMSD(x,y) is the euclidean distance between x and y, averaged over the N atoms

▆ lRMSD: least RMSD

• same conformation rigidly transformed in space (translated or rotated) should give an RMSD of 0

• before computing RMSD(x,y) one needs to remove changes due to rigid-body transformations

Page 15: Topics in bioinformatics CS697

Protein Structure Superposition

A rigid-body transformation T is a combination of a translation t and a rotation R: T(x) = Rx+t

The quantity to be minimized is:

Where a and b are the two point sets.



∥Rai−bi +t∥2

Page 16: Topics in bioinformatics CS697

The translation part

∂ ε∂t

=2∑i= 1


Rai−bi +t =0E is minimum with respectto t when:

Then: t=−R ∑i=1


a i ∑i=1



If both data sets A and B have been centered on 0, then t = 0 !

Step 1: Translate point sets A and B such that their centroids coincide at the origin of the framework

Page 17: Topics in bioinformatics CS697

The rotation part

Let A and B be the centroids of A' and B', and A and B the matrices containing the coordinates of the points of A and B centered on 0:





μ B=1N



b i

A= [ a1− μ A a2− μ A . . . aN− μ A ]B= [b1−μ B b2− μ B .. . b N− μ B ]

Build covariance matrix: C=ABT

x = 3x3


Page 18: Topics in bioinformatics CS697

The rotation part

U and V are orthogonal matrices, and D is a diagonal matrix containing the singular values.U, V and D are 3x3 matrices


Compute SVD (Singular Value Decomposition) of C:

Define S by:

S= { I if det C 0diag {1,1,−1 } otherwise }



Page 19: Topics in bioinformatics CS697

2. Build covariance matrix:

The algorithm



S= { I if det C 0diag {1,1,−1 } otherwise }

1. Center the two point sets A and B

3. Compute SVD (Singular ValueDecomposition) of C:

5. Compute rotation matrix

4. Define S:


O(N) in time!

6. Compute lRMSD:

lRMSD= ∑i=1


a' i2∑



b' i2−2∑



d i s i


Page 20: Topics in bioinformatics CS697

Some reading material

Page 21: Topics in bioinformatics CS697

lRMSD has Shortcomings

▆ lRMSD cannot capture localized changes: if a small perturbation occurs in a part of the structure, e.g. rotation of a hinge connecting two domains, lRMSD will report a large value

▆Main reason: lRMSD does not know how to attribute changes to specific atoms of the chain

▆ lRMSD distributes change equally (through the averaging) to all atoms in a protein chain

▆Measuring conformational similarity is an active research area

Page 22: Topics in bioinformatics CS697

Other Quality Assessment: Shape Similarity

▆ Sometimes assessment of cavities on the surface of a protein is more important than description of the rest of the structure, especially when the goal is prediction of a binding site rather than of the entire structure (which can be thought of as a scaffold)

▆ Methods that assess surface area, solvent accessible surface area, that compute volumes, and detect cavities on proteins are very important in the context of binding and docking

Model each atom as a vdw sphere, the union of which gives the molecular surface

Not all molecular surface is accessible to solvent. Rolling a solvent ball over the vdw spheres traces out the solvent accessible surface area (SASA) . SASA is important to quantitatively determine interactions of the protein

Page 23: Topics in bioinformatics CS697

Solvent-accessible Solvent Area (SASA)▆ Computational geometry methods that use Delaunay triangulations and alpha shapes assess SASA and

other geometric descriptors of molecular surfaces, volumes, and cavities

▆ We will come back to this topic in the context of molecular docking – further reading about shape computing at

SASA for a 1.4 Å ball SASA for a 1.5 Å ball. Increasing the radius reduces the SASA due to more cavities that a bulkier ball cannot penetrate

Page 24: Topics in bioinformatics CS697

Ultrafast Shape Recognition (USR)

Drug design – Screening a number of potential compounds.

Find a set of molecules which closely resemble a lead molecule from a HUGE database.

Shape similarity may indicate similar binding properties and similar activity.

Page 25: Topics in bioinformatics CS697


Efficient global comparison of molecular shapes.

The molecules are represented as feature vectors, representing the relative positions of the atoms.

Does not require alignment of the molecules.

Suitable for large database search.

Page 26: Topics in bioinformatics CS697

Feature vector representation

The shape of a molecule is uniquely determined by the relative positions of the atoms.

… Which are determined by the inter-atomic distances.

The set of distances can be constrained due to forces that hold the atoms together.

Page 27: Topics in bioinformatics CS697

Strategic feature points

The molecule is described as 4 sets of atomic distance distributions from feature points:Center of mass - ctd

Point closest to ctd - cst

Point farthest from ctd – fct

Point farthest from fct – ftf

The moments of the distributions are calculated and stored as a feature vector.

Estimate of the size, compactness and symmetry of the molecule.

Page 28: Topics in bioinformatics CS697

Feature vector for a molecule

Page 29: Topics in bioinformatics CS697

Similarity score

Page 30: Topics in bioinformatics CS697

Similarity score

Page 31: Topics in bioinformatics CS697

Advantages and disadvantages

Extremely fast due to calculation of only 4N distances and distributions.

Very sensitive to small changes in the molecule shape.

Does not directly account for chemical interactions and atom types.

Page 32: Topics in bioinformatics CS697

Quality Assessment through LGA

▆ Local-Global-Alignment (LGA) introduced by Adam Zemla in 2003 is being used as a more accurate similarity assessment than lRMSD in CASP

▆ LGA generates many different local superpositions to find regions where two conformations are similar: combines longest continuous segment (LCS) and global distance test (GDT) to find local and global similarities

▆ LCS superimposes the longest segments that fit under a selected RMSD cutoff

▆ GDT complements evaluations made with LCS searching for the largest (not necessary continuous) set of ‘equivalent’ residues that deviate by no more than a specified distance cutoff

▆ Further reading: Zemla A., "LGA - a Method for Finding 3D Similarities in Protein Structures", Nucleic Acids Research, 2003, Vol. 31, No. 13, pp. 3370-3374