CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING
by
Anita Karen Nivedha
(Under the direction of Robert J. Woods)
ABSTRACT
Carbohydrates play a pivotal role in various life processes including energy metabolism,
storage, immune recognition, transportation, signaling and biosynthesis. In these roles,
they often interact with other integral components of the living system such as proteins
and lipids. An understanding of how these molecules interact can further our knowledge
of crucial biological processes, and begins with the knowledge of the three-dimensional
structures of these complexes. However, owing to challenges involved in crystallizing
oligosaccharide structures, theoretical modeling methods such as molecular docking are
often used to predict how oligosaccharides interact with protein receptors. Docking
programs, however, rely on generalized scoring functions, which often produce unnatural
oligosaccharide conformations during docking. In this thesis, we present two approaches
to improve protein-carbohydrate docking by accounting for carbohydrate-specific intra- and
intermolecular interaction energies that existing docking methodologies do not currently
address. In the first approach, we developed a set of
Carbohydrate Intrinsic (CHI) energy functions in order to account for intramolecular
energies of carbohydrate ligands primarily determined by the conformations of glycosidic
torsion angles connecting individual saccharides. This work resulted in the development
of Vina-Carb (incorporation of the CHI energy functions within the scoring function of
AutoDock Vina), which significantly improved the conformations of oligosaccharide
binding mode predictions. In the second approach, we developed a scoring function by
fitting a mathematical model to data from the literature describing the energy contributed by
CH/π interactions. This energy function was used to score the crucial interactions
between CH groups lining the carbohydrate ring and the π electron densities in aromatic
amino acids of interacting proteins. Employing the CH/π interaction energy function to
rescore docked protein-carbohydrate complexes improved the rankings of accurate pose
predictions made by both AutoDock Vina and Vina-Carb. The scoring functions
developed and used in this work are transferable and can therefore be used with other
docking programs and also in the refinement of experimental carbohydrate structures.
INDEX WORDS: AutoDock, AutoDock Vina, Molecular Docking, Protein-Carbohydrate
Docking, Docking Scoring Functions, Internal Energies, Carbohydrate, Carbohydrate
Intrinsic Energy Functions, CHI Energy Functions, Vina-Carb, Antibody, Antigen,
Lectin, Enzyme, Carbohydrate Binding Module, CH/π Interactions
B. Tech., Vellore Institute of Technology University, India, 2008
A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
Committee: James H. Prestegard
The University of Georgia
DEDICATION
I would like to dedicate this work to my beloved parents, Jenetta and Joshwa.
ACKNOWLEDGEMENTS
Firstly, I would like to acknowledge and extend my gratitude to my major
Professor, Dr. Robert J. Woods for his support, encouragement, guidance and for giving
me the wonderful opportunity to be a part of the Woods’ Group Family. I would like to
thank my PhD Advisory Committee, Dr. James H. Prestegard, Dr. Liming Cai and Dr.
Donald L. Evans for their valuable advice, insight and suggestions over the years as my
dissertation took shape. I would like to thank colleagues who were directly involved in
my research, Dr. B. Lachele Foley, Dr. Matthew B. Tessier, Dr. Spandana Makeneni and
David F. Thieker. It has been a great learning experience and a pleasure collaborating and
working with each one of you.
I would like to acknowledge the support of my peers in the Woods’ group: Dr.
Arunima Singh, Amika Sood, Dr. Jodi Hadden, Mark Baine, Dr. Xiaocong Wang, Dr.
Keigo Ito, Dr. Oliver Grant, Huimin Hu, Dr. Valerie Murphy, Dr. Mari DeMarco, Mia Ji,
Dr. Elisa Fadda, Dr. Joanne Martin and Dr. Hannah Smith. Matt, thank you for helping
me when I was a newbie in the group, and amongst other things, for teaching me to do
docking, which constitutes a major portion of my dissertation today. Arunima, Amika,
Spandana and Jodi, thank you for being with me through the ups and downs in Graduate
School. Keigo, thank you for helping me with all my QM questions and for your tips on
scientific writing. Mark, thank you for being a huge support during my time in the group
and for all of your efforts in keeping everything around the lab in order.
I am thankful to God for being my Provider and for all of His blessings at every stage
of my life as a graduate student. I would like to acknowledge the unconditional love,
support and encouragement given by Mama and Papa. Thank you for being my greatest
cheerleaders. I would like to extend my heartfelt thanks to Amy, Ashley, Niranjana,
Madison, Jagadish, Cookieday, Adwoa, Ken, Femi, Anna, Ebenezer, Adeline, Savior
Karnik and Manikins, for being there for me, for believing in me, cheering me on and
supporting me throughout Graduate School. I could not have done it without your solid
support.
I would certainly not be where I am if not for all of the wonderful people who have
sown into my life and my career. For them, I am forever grateful.
LIST OF TABLES
Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape
RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the
crystallographic ligands.
Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and
CHI-cutoff.
Table 5.2 PRMSDmin(5) produced by ADV and VC1|2 for the 12 test systems with ligands
containing 1,6-linkages.
Table 5.3 Comparison between ADV and VC1|2 for the apo proteins Test Set.
Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2
before and after rescoring as a function of the CH/π interaction energy
coefficients. The systems are divided into different groups based on the number of
detected CH/π interactions.
LIST OF FIGURES
Figure 2.1. An illustration of the conversion from the chain and ring form of glucose.
Figure 2.2 A representation of two chair conformations of Glucose, namely, 4C1 and 1C4.
Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a
galactopyranose (Galp) unit. The D in the name refers to the configuration of the
stereocenter farthest from the carbonyl carbon (C5 in these hexoses), which matches that of
D-glyceraldehyde; it does not indicate the direction in which the molecule rotates
plane-polarized light.
Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose
and mannose are C2 epimers.
Figure 3.1 a.) Rigid Docking b.) Flexible Ligand Docking
Figure 3.2 The workflow within the AutoDock Vina algorithm.
Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the
grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green
dot represents the center of the grid box (0,0,11). (b) Aligned orientation of an
antibody antigen-binding fragment (Fab), with respect to the internal reference
axes. The region in red + pink represents the VH domain (CDRs (red) and
framework regions (pink) of the heavy chain) of the antibody, while the region in
blue represents the VL domain (CDRs (dark blue) and framework regions (cyan)
of the light chain). The X-axis for the alignment was defined by a vector passing
through the CoM of the variable light chain (VL domain, which contains the light
chain CDRs and framework sequences), and the CoM of the variable heavy chain
(VH domain). The Z-axis was defined as a vector normal to the X-axis, and
passing through the CoM of the entire variable region, or variable fragment (Fv).
The antibody was then translated so that the CoM of the CDRs was placed at the
origin. The Y-axis was defined as a vector perpendicular to the XZ-plane, and
passing through the origin. The docking grid box was aligned to the internal
coordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as
to optimally encompass the CDR loops, while also permitting adequate volume
for the movement of the ligand during docking. Such a definition enabled the
docking grid box to be consistently aligned with respect to the CDRs.
Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and
SRMSD, respectively, of a representative docked pose with respect to its crystal
ligand. (a) The PRMSD is the RMSD between the ring atoms of a representative
docked structure (white) and the corresponding crystal structure (black). (b) The
SRMSD is the RMSD value obtained after the docked structure (white) is
superimposed on the crystal structure (black).
Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected
linkages, as indicated by the dashed rectangle. Data are presented, in order, for
AD3 (black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing
the experimentally determined values is highlighted with a light blue outline. The
bin containing the structure with the lowest docked energy is indicated as follows:
AD3, yellow; AD4.2, orange; ADV, green.
Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of
CHI energy functions. The models depicting 1,2-linkages can be used to model
1,4-linkages due to symmetry about the O5 atom.
Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for
models (see Figure 4.4) whose linkages have similar local geometries.
Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion
angle distributions of carbohydrates from experimental co-crystal structures
(histograms).
Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between
SRMSD and docked energies after rescoring, for each of the three docking
programs. Points before rescoring are shown in dark grey and points after
rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked
energy plots of only the overall lowest PRMSD structure for each of the six
antibody systems before (dark grey) and after (light grey) rescoring. The black
rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0
kcal/mol.
Figure 4.8 Graphs showing the distribution of conformations produced by AD3,
AD4.2 and ADV plotted onto the corresponding CHI energy curves for
each of the representative linkage combinations; the curves are offset from each
other by 6 kcal/mol.
Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2
and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the
lowest energy poses for all six systems from all three docking programs, before
(dark grey) and after (light grey) rescoring.
Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to
the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after
inclusion of the CHI energy (white) compared to the crystal ligand (black);
PRMSD = 0.6 Å.
Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).
(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared
to the crystal ligand (black); PRMSD = 5.5Å. (b) Lowest energy pose from ADV
for 1MFD after rescoring (white) compared to the crystal ligand (black); PRMSD
= 1.0Å. (c) and (d) show the 1MFD antibody in transparent surface representation
along with the oxygen atom belonging to the water molecule from the
crystallographic co-complex, WAT 601; in (c) the crystal ligand from 1MFD is
shown in CPK representation, and in (d) the lowest energy pose from ADV for
1MFD before rescoring (in CPK representation) showing the Gal residue
replacing Abe within the binding pocket is shown. (e) The Gal residue from the
ligand in 1MFD (in van der Waals representation) after being superimposed onto
the Abe residue from the ligand in 1MFA is shown within the 1MFA binding site.
A cross-section of the 1MFA antibody is represented as a transparent surface with
potential steric clashes visible between the Gal residue and the antibody. (f) Same
as (e) but with the 1MFA antibody represented as an opaque surface thus more
clearly depicting potential steric clashes between the O-3 and O-6 groups of the
Gal residue and the interior of the binding pocket.
Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose
before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9Å.
(b) Lowest energy pose after rescoring (white) compared to the crystal ligand
(black); PRMSD = 10.7 Å.
Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed
line) and 5 (dotted line) to the original VCΦ|β curve. b.) The effect of applying a
CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2).
Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various
CHI-energy coefficients of VC: a.) SRMSDavg amongst the 5 top-ranked poses b.)
PRMSDmin(5).
Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curve to
the distribution of glycosidic linkages in carbohydrate crystal structures in the
PDB. The bottom X-axis and left Y-axis correspond to the histogram which
depicts the distribution of PDB structures, while the top X-axis and right Y-axis
correspond to the CHI-energy curves.
Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test
systems containing one or more 1,6-linkages overlaid against the reference crystal
structure ω angles (red dots) and the corresponding CHI energy curve.
Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue).
b.) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). c.)
The Φ torsion angles of α-sugars from the docked poses of the 3C6S ligand from
both ADV (yellow triangles) and VC1|2 (green squares) plotted on to the CHI
curve. The torsion angles corresponding to the reference are plotted as blue
circles.
Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD)
is depicted in complex with a tetrasaccharide ligand. All amino acids farther than
5 Å away from the ligand are colored grey. Those residues within 5 Å are colored
orange if they are cyclic and red if acyclic.
Figure 5.7 a.) Models representing the PRMSDmin(5) produced by docking 1MFC with
ADV (yellow) and VC at CHI1|2 (green). The primary difference between docked
models is a rhamnose ring that is flipped approximately 180 degrees, highlighted
by the orange arrows. b.) Ligands from two crystal structures, 1MFB (blue) and
1MFC (cyan), also differ by the orientation of the RAM 524 ring.
Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the lowest-ranked
pose with PRMSD ≤ 2 Å, produced by ADV and VC1|2 from docking
oligosaccharide ligands onto apo protein structures.
Figure 5.9 a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX,
2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from
PDB ID: 2EQD. Amino acids reported to be involved in substrate binding (N45,
R47, W64, W71, W327, W331, E359, and W392) are colored orange or red,
depending on whether the residue is aromatic or not. 146 The catalytic residue
(Q186) is colored yellow. All other amino acids are grey. The active site has been
separated into a (-) and (+) site. The circled values represent the position of each
residue relative to the glycosidic linkage that is cleaved during catalysis. The
ligands exclusive to the (-) side of the active site are depicted by varying shades
of purple. The octasaccharide that extends across both the (-) and (+) site (2EQD)
is colored blue. Each carbohydrate ring is colored according to whether the CHI
energy penalty is applied to the surrounding Φ/Ψ values. Rings are either green or
red depending on whether VC is or is not applied, respectively. b) A
representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2.
c) The glycosidic linkages of the octasaccharide that extends across the active site
(2EQD) are labeled according to the penalty received by the CHI energy curve.
Penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the (-1)
residue since it is neither a 4C1 nor a 1C4 chair, so the ring is colored red and the
penalties are unlisted.
Figure 6.1 The replacement of the aromatic group in (A) by an aliphatic group in
(B) in the study by Water et al. 154 in an interaction with a tetraacetylglucose
molecule led to a decrease in the interaction energy of the system.
Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic
amino acids, namely, a Tryptophan and a Tyrosine in the binding pocket of an
antibody Fab fragment. (PDB ID: 1MFE) 33
and Phenylalanine.
Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to
describe the interaction between a CH-group and an aromatic moiety.
Figure 6.5 Detection of CH/π interactions. a.) An average position of the coordinates of
the atoms C2, O5 and O1 is determined. In order to find the vector C1H1, the
negative of the vector between point C1 and the average of atom positions C2, O5
and O1 (computed in (a.)) is determined. b.) The distance between the centroid of
the aromatic ring and the plane of the carbohydrate ring delineated by atoms O5,
C2, C3 and C5 is determined, dcenters (≤ 7 Å). c.) The carbon atoms in the
carbohydrate ring are projected onto the aromatic ring plane and the distances
between each of these projections and the centroid of the aromatic ring are
determined, dcp (≤ 2.5 Å). Shown in green are the CH bond vectors pointing
towards the aromatic ring (scored), and shown in red are the CH bond vectors
pointing away from the aromatic ring (not scored).
Figure 6.6 The effect of applying the CH/π interaction to the top-ranked pose produced
by VC1|2 before and after rescoring. Shown in green is the crystal ligand, in white
is the top-ranked pose before rescoring (PRMSD = 5.6 Å) and in blue is the
top-ranked pose after rescoring (PRMSD = 0.9 Å).
Figure 6.7 Model Systems used by Ringer et al. to quantify CH/π interactions using
quantum mechanical calculations.
Figure 6.8 a.) The individual interaction energy curves for the models (as described in
Figure 6.7) used by Ringer et al. 155, alongside the average of the individual
curves. b.) The average curve (a) shown alongside the mathematical model used
in the current study.
3. Computational Methods/Molecular Docking
4. The Importance of Ligand Conformational Energies in Carbohydrate Docking:
Sorting the Wheat from the Chaff
Abstract
Introduction
Abstract
Introduction
6. The Consideration of CH/π Interactions in Carbohydrate-Protein Docking
Introduction
This dissertation is divided into the following sections:
1. The comparison of docking programs for carbohydrate docking and the
development of Carbohydrate Intrinsic (CHI) Energy Functions, which describe
the rotational preferences of oligosaccharides about the glycosidic linkage.
2. The development and evaluation of Vina-Carb, formed by incorporating the CHI
energy functions within the scoring function of AutoDock Vina, and its comparison
with the original program, AutoDock Vina.
3. The development of a CH/π interaction energy term to score CH/π interactions in
protein-carbohydrate complexes and the application of the function to docked
protein-carbohydrate complexes.
The above topics, along with a literature review of background information and the
computational methods applied in each case are presented in the following manner:
CHAPTER 2: CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND
STRUCTURE
Chapter 2 discusses the structure and biological significance of carbohydrates and of
protein-carbohydrate interactions.
CHAPTER 3: MOLECULAR DOCKING
Chapter 3 discusses the theory behind molecular docking, a computational method used to
predict intermolecular interactions. It further discusses the challenges associated with
carbohydrate ligands, and specifically describes the AutoDock Vina docking algorithm.
Additionally, this chapter introduces the research described in the following chapters.
CHAPTER 4: IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN
CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF
Chapter 4 is an original research study in which the performances of several versions of
the popular docking program AutoDock are compared using a set of antibody-carbohydrate
complexes. A set of Carbohydrate Intrinsic (CHI) energy functions is developed to
describe the conformational preferences of the glycosidic linkages in oligosaccharides.
The CHI energy functions are then employed to rescore the docked poses. The results
from this study were published as a journal article.
A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem.
2014, 35, 526–539.
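Chapter 4 judges docked poses with the two RMSD measures defined in Figure 4.2: PRMSD, taken between matched ring atoms of the docked and crystal ligands as they sit in the binding site, and SRMSD, taken after the docked ligand has been superimposed on the crystal ligand. The minimal Python/NumPy sketch below illustrates the difference. It assumes the coordinate arrays are already matched atom-for-atom and uses the Kabsch algorithm for the superposition; that choice, the function names and the toy coordinates are illustrative assumptions, not the code used in the study.

```python
import numpy as np


def prmsd(docked: np.ndarray, crystal: np.ndarray) -> float:
    """Positional RMSD (Å) between matched atoms, with no superposition."""
    diff = docked - crystal
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def srmsd(docked: np.ndarray, crystal: np.ndarray) -> float:
    """Shape RMSD (Å): RMSD after optimal rigid superposition (Kabsch algorithm)."""
    p = docked - docked.mean(axis=0)   # center both structures on their centroids
    q = crystal - crystal.mean(axis=0)
    h = p.T @ q                        # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation taking p onto q
    return prmsd(p @ rot.T, q)


# Hypothetical ring-atom coordinates (Å); the docked copy is shifted 3 Å along x.
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
docked = crystal + np.array([3.0, 0.0, 0.0])
print(prmsd(docked, crystal))  # 3.0: the placement error is counted
print(srmsd(docked, crystal))  # close to 0: the shapes are identical
```

Because PRMSD involves no fitting, it penalizes a pose that has the right shape in the wrong place, whereas SRMSD isolates conformational error.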
CARBOHYDRATE DOCKING
Chapter 5 describes original research in which the CHI energy functions were
incorporated within AutoDock Vina’s scoring function, leading to the development of
Vina-Carb. The performances of Vina-Carb and AutoDock Vina were evaluated using a
set of protein-carbohydrate complexes consisting of antibodies, lectins, carbohydrate
binding modules and enzymes. This work has been accepted for publication.
A. K. Nivedha, D. F. Thieker, R. J. Woods, J. Chem. Theory Comput. 2015
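The general idea behind Vina-Carb, adding an intramolecular glycosidic-torsion term to the intermolecular docking score, can be sketched as follows. This is a hypothetical illustration, not the Vina-Carb implementation: `torsion_penalty` is a placeholder single-well cosine rather than one of the published CHI energy functions, only Φ angles are penalized, and `chi_coefficient` merely mimics the role of the CHI-coefficient weight listed in Table 5.1. As in AutoDock Vina, lower (more negative) scores are treated as better.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    dock_score: float   # score reported by the docking program (kcal/mol)
    phis: list[float]   # Φ torsion of each glycosidic linkage (degrees)


def torsion_penalty(phi_deg: float, phi0_deg: float = -60.0, k: float = 1.5) -> float:
    """Hypothetical one-well penalty (kcal/mol): 0 at phi0, 2*k at phi0 + 180.
    A placeholder only; it is NOT one of the published CHI energy functions."""
    return k * (1.0 - math.cos(math.radians(phi_deg - phi0_deg)))


def rescore(pose: Pose, chi_coefficient: float = 1.0) -> float:
    """Docking score plus a weighted sum of per-linkage torsional penalties."""
    return pose.dock_score + chi_coefficient * sum(torsion_penalty(p) for p in pose.phis)


poses = [
    Pose("A", -8.0, [120.0, -150.0]),  # better raw score, strained linkages
    Pose("B", -7.5, [-60.0, -70.0]),   # slightly worse raw score, relaxed linkages
]
print([p.name for p in sorted(poses, key=lambda p: p.dock_score)])  # ['A', 'B']
print([p.name for p in sorted(poses, key=rescore)])                  # ['B', 'A']
```

In this toy case the pose with the more favorable raw score carries strained linkages, so the added penalty reverses the ranking, which is the kind of correction a torsional term is meant to provide.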
CHAPTER 6: THE CONSIDERATION OF CH/π INTERACTIONS IN CARBOHYDRATE-PROTEIN DOCKING
Chapter 6 describes original research in which a mathematical model for scoring CH/π
interactions in protein-carbohydrate complexes was developed from data available in the
literature and then used to rescore docking results from AutoDock Vina and Vina-Carb
for a test set of lectin-carbohydrate complexes.
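Figure 6.4 describes the CH/π model as a Lennard-Jones potential, and Figure 6.5 outlines geometric criteria for deciding which CH groups are scored. A minimal Python sketch of this kind of scoring follows, under several assumptions: the 12-6 form E(r) = ε[(r_min/r)^12 - 2(r_min/r)^6]; placeholder values ε = 1.0 kcal/mol and r_min = 3.8 Å (not the fitted parameters); r taken as the distance from the carbon to the aromatic ring centroid; and only two of the detection tests (the 2.5 Å in-plane offset and the C-H direction check), with the 7 Å ring-to-ring test omitted. The function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lj_energy(r: float, epsilon: float = 1.0, r_min: float = 3.8) -> float:
    """12-6 Lennard-Jones form with well depth epsilon (kcal/mol) at r = r_min (Å).
    Both parameter values are hypothetical placeholders, not fitted values."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def ch_pi_energy(c: Vec, h: Vec, centroid: Vec, normal: Vec, max_dcp: float = 2.5) -> float:
    """Score one C-H group against one aromatic ring; 0.0 if the geometry test fails."""
    length = math.sqrt(dot(normal, normal))
    n = (normal[0] / length, normal[1] / length, normal[2] / length)
    v = sub(c, centroid)
    height = dot(v, n)
    # Offset of the carbon's projection onto the aromatic plane from the ring centroid.
    in_plane = (v[0] - height * n[0], v[1] - height * n[1], v[2] - height * n[2])
    if math.sqrt(dot(in_plane, in_plane)) > max_dcp:
        return 0.0
    # Only C-H bonds pointing toward the ring are scored.
    if dot(sub(h, c), sub(centroid, c)) <= 0.0:
        return 0.0
    return lj_energy(math.sqrt(dot(v, v)))


ring_center, ring_normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(ch_pi_energy((0.0, 0.0, 3.8), (0.0, 0.0, 2.7), ring_center, ring_normal))  # about -1.0
print(ch_pi_energy((0.0, 0.0, 3.8), (0.0, 0.0, 4.9), ring_center, ring_normal))  # 0.0, C-H points away
```

Scoring each qualifying C-H group separately and summing the results keeps the term additive, so it can be applied as a rescoring step after any docking run.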
CHAPTER 7: CONCLUSIONS AND FUTURE DIRECTIONS
Chapter 7 summarizes the main conclusions from the preceding chapters and discusses
future directions.
Carbohydrates play central roles in energy metabolism and biological recognition, and serve
as structural components in living organisms. 1-3
4-6
signaling. 7,8
They may exist either as freestanding entities or covalently linked to
macromolecules such as proteins (glycoproteins) and lipids (glycolipids). These
glycoconjugates are frequently found on outer cell surfaces, where they are conveniently
positioned to modulate interactions between various components of the living system by
mediating cell-cell and cell-molecule interactions. 9 When oligosaccharides are organized in
the form of glycoconjugates, the sheer size of the attached oligosaccharides influences the
interactions of the glycoconjugates with other molecules. For example, N-glycosylation
and O-glycosylation are common post-translational modifications which occur in
proteins. 10
, which protect the protein from degradation and in intracellular
trafficking and secretion. 2 Aberrant glycosylation is often a hallmark of diseases such as
rheumatoid arthritis 15-19
and cancer. 20-23
Many carbohydrate-based host-pathogen interactions are currently known. 24
Surface polysaccharides are the most common structures found on the outer surfaces of
bacterial cells. 25,26
lipopolysaccharides, lipooligosaccharides or capsular polysaccharides. 27
The conjugation
of a polysaccharide to a carrier protein has resulted in the production of commercially
available vaccines, such as those against Haemophilus influenzae 28 and Streptococcus
pneumoniae. 29 Many bacterial and viral pathogens bind to host tissue via interactions
with carbohydrates on the surfaces of host cells. Antibodies contain glycans as part of
their structure and some antibodies are reactive against sugars found on cell surfaces of
bacteria such as Shigella and Salmonella. 30-35
Of the four major classes of macromolecules found in living organisms, namely,
nucleic acids, proteins, carbohydrates and lipids, carbohydrates are the most structurally
diverse. 36
They are primarily defined as polyhydroxyaldehydes or polyhydroxyketones,
and in their simplest form exist as monosaccharides, which combine with each other via
glycosidic linkages to form oligosaccharides. Monosaccharides can exist in both
open-chain and ring forms. When the carbonyl group (C=O) of the open-chain form is at one
end of the chain, forming an aldehyde, the monosaccharide is called an aldose, whereas if the
carbonyl group is in the middle of the chain, forming a ketone, it is called a ketose. The ring
form of a monosaccharide, which is the preferred form in aqueous solution and in
oligosaccharides, forms when, in a pyranose such as glucose, the hydroxyl oxygen on C5 (O5)
bonds to the carbonyl carbon (C1) and transfers its hydrogen to the carbonyl oxygen,
which becomes a hydroxyl group. This creates a new chiral center at C1, the anomeric
center. The oxygen at C1 (O1) can be either axial or equatorial with respect to the
carbohydrate ring. The electronegative O1 atom tends to adopt the axial orientation because
of stereoelectronic effects, rather than the less hindered equatorial orientation that would
be expected on steric grounds alone. This is known as the anomeric, or more accurately,
the endo-anomeric effect.
Figure 2.1. An illustration of the conversion from the chain and ring form of glucose.
Monosaccharides that form a five-membered ring are called furanoses, and those
that form a six-membered ring are called pyranoses. Like cyclohexane, six-membered
monosaccharide rings most often adopt one of two chair conformations, 1C4 and 4C1,
where the letter C stands for ‘chair’, the number before the C (a superscript in standard
notation) identifies the ring atom above the reference plane, and the number after it (a
subscript) identifies the ring atom below it. The reference plane is
formed by the atoms C2, C3, C5 and O5. (Figure 2.2)
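The chair assignment described above can be computed from atomic coordinates. The Python sketch below is illustrative only: it takes the reference plane through C2, C3 and C5 (O5 lies close to this plane in a chair), treats "above" as the side from which the ring numbering C1, C2, ..., O5 appears clockwise (the usual convention, assumed here), and reports 4C1 when C4 lies above and C1 below the plane, or 1C4 for the reverse. The 0.1 Å tolerance, the helper names and the idealized test ring are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def classify_chair(ring: dict[str, Vec], tol: float = 0.1) -> str:
    """Return '4C1', '1C4' or 'other' for a pyranose ring given atom coordinates (Å)."""
    c2, c3, c5 = ring["C2"], ring["C3"], ring["C5"]
    # Normal of the plane through C2, C3, C5, oriented so that the ring numbering
    # runs clockwise when viewed from its tip ("above").
    n = cross(sub(c5, c2), sub(c3, c2))
    length = math.sqrt(dot(n, n))
    n = (n[0] / length, n[1] / length, n[2] / length)
    d1 = dot(sub(ring["C1"], c2), n)  # signed distance of C1 from the plane
    d4 = dot(sub(ring["C4"], c2), n)  # signed distance of C4 from the plane
    if d4 > tol and d1 < -tol:
        return "4C1"
    if d1 > tol and d4 < -tol:
        return "1C4"
    return "other"


def ideal_chair(radius: float = 1.45, pucker: float = 0.25) -> dict[str, Vec]:
    """Hypothetical idealized ring: atoms alternate -/+ pucker along z, and the
    numbering runs clockwise when viewed from +z."""
    ring = {}
    for k, name in enumerate(["C1", "C2", "C3", "C4", "C5", "O5"]):
        angle = -math.radians(60.0 * k)
        z = -pucker if k % 2 == 0 else pucker  # C1, C3, C5 down; C2, C4, O5 up
        ring[name] = (radius * math.cos(angle), radius * math.sin(angle), z)
    return ring


chair = ideal_chair()
inverted = {name: (x, y, -z) for name, (x, y, z) in chair.items()}  # invert the puckering
print(classify_chair(chair))     # 4C1
print(classify_chair(inverted))  # 1C4
```

Measuring only C1 and C4 against the plane is enough to separate the two chairs; boats, skews and flattened rings fall through to "other".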
Figure 2.2 A representation of two chair conformations of Glucose, namely, 4C1 and 1C4.
The individual units constituting proteins and nucleic acids are generally
connected in a linear fashion by a single type of linkage, namely, the amide linkage
between amino acids in proteins and the 3’ to 5’ phosphodiester bonds in nucleic acids. 37
Oligosaccharides, however, can be linear or branched, and each monosaccharide unit can
be linked to another via glycosidic linkages of different types, depending on the
stereochemistry (α or β) of the C1 atom of the non-reducing sugar and on which atom of
the reducing sugar it is linked to. A disaccharide is formed when two monosaccharides
combine via a condensation reaction, which releases a water molecule and forms a
glycosidic bond. The formation of a glycosidic linkage leaves a reducing sugar at one end
and a non-reducing sugar at the other.
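Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a
galactopyranose (Galp) unit. The D in the name denotes the configuration of the
stereocenter farthest from the carbonyl carbon (C5 in these hexoses), which matches that
of D-glyceraldehyde; it does not by itself indicate the direction in which the sugar rotates
plane-polarized light.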
Different kinds of sugars exist in nature, and most saccharides differ mainly in the
orientation of their hydroxyl groups with respect to the plane of the carbohydrate ring,
which results in significant differences in their physical and chemical properties. Glucose
and mannose are C2 epimers, while glucose and galactose are C4 epimers (Figure 2.4).
These hexoses share the molecular formula C6H12O6. The configurations of these
aldohexoses were established by the German chemist Emil Fischer in the late 19th
century. 38
Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose
and mannose are C2 epimers.
The three-dimensional structures of carbohydrates are greatly influenced by the
conformations of the glycosidic linkages connecting individual monosaccharide units.
The lone pairs of electrons on the ring oxygen (O5) have a significant effect on the
conformational stability and orientation of the glycosidic linkage. 39,40 Owing to the
anomeric effect, the electronegative substituent at C1 tends to adopt the axial rather than
the equatorial orientation, contrary to expectations based on sterics alone. 41-46
Previous analyses of glycosidic linkage preferences show that carbohydrates
predominantly adopt a single rotamer about both the Φ and Ψ torsion angles. The
preferred range of values is broader for the Ψ angle than for the Φ angle. Some proteins
are also known to distort carbohydrate ring shapes, and consequently the glycosidic
linkages, upon binding. However, a survey of protein-carbohydrate crystal complexes in
the PDB, in which the oligosaccharide is bound to enzymes as well as to other proteins
such as lectins and antibodies, revealed that distortion of glycosidic linkages by
carbohydrate-binding partners is rare. 47,48
Carbohydrate-Protein Complexes
Proteins that bind to carbohydrates have a great diversity of binding site
topologies and functions, and include enzymes, lectins, antibodies and periplasmic
receptors. 49 Complex formation is driven primarily by hydrogen bonding, van der Waals
contacts, and hydrophobic interactions. 50 Whereas hydrogen bonding contributes to
specificity 51 by virtue of the directionality of the hydroxyl groups, the latter two
contribute to affinity through non-specific interactions. 52 Being highly polar molecules,
sugars are highly solvated in aqueous solution. The hydroxyl groups in a sugar molecule
are involved in cooperative hydrogen bonds, bidentate hydrogen bonds and hydrogen
bonding networks. 53 Each hydroxyl group in a saccharide can take part in hydrogen
bonds in two ways: as a donor of one hydrogen bond and as an acceptor of two, through
its sp3 lone pairs. When the sugar hydroxyl group is a donor, the hydrogen bonds formed
are shorter and stronger than those formed when it is an acceptor. 54
In cooperative hydrogen bonds, the sugar hydroxyl group acts as both a donor and an
acceptor. A bidentate hydrogen bond is formed when two adjacent hydroxyl groups in a
⁴C₁ sugar each interact with a different atom of the same planar polar side chain.
Together, cooperative and bidentate hydrogen bonds create networks of hydrogen bonds
between the sugar and the interacting amino acids, and when these planar polar residues
in turn hydrogen bond with nearby polar residues, a more elaborate network results. The
hydrogen bonds formed in this way are strong enough to stabilize the complex yet weak
enough to accommodate ligand dynamics. Amino acids with planar polar side chains
capable of forming all three kinds of hydrogen bonds, such as Glu, Gln, Asp, Asn, Arg
and His, are abundant in sugar binding sites. 51
Van der Waals interactions make a significant contribution to protein-carbohydrate
complex formation, as do other interactions such as the stacking of hydrophobic patches
of carbohydrate rings against aromatic amino acids lining the binding site. An analysis of
protein-carbohydrate complexes in the PDB revealed that carbohydrate binding sites are
enriched in the aromatic amino acids tryptophan, tyrosine, phenylalanine and histidine
relative to the rest of the protein. 55-57 Aromatic amino acids in the sugar binding site
also contribute to specificity, admitting or excluding particular sugar epimers through a
combination of steric hindrance and a favorable or unfavorable polar environment. 58
A wealth of information can be gained from an understanding of the structure and
dynamics of protein-carbohydrate interactions; however, carbohydrates are extremely
flexible molecules, 59 which makes protein-carbohydrate complexes particularly
challenging to crystallize. As a result, computational methods such as molecular docking
and molecular dynamics simulations can be employed to gain insight into the physical and
biochemical properties of carbohydrate molecules, both free in solution and in complex
with proteins. The knowledge thus gained has various applications, including gene
therapy and the design of carbohydrate-based biotherapeutic agents.
A detailed understanding of the three-dimensional structure, and subsequently the
function, of carbohydrates is vital to increasing our understanding of crucial biological
processes. However, obtaining experimental 3D structures of carbohydrates is
challenging, 60 and as a result, theoretical modeling methods can be employed to help
elucidate the relationship between the structure and function of oligosaccharides.
Molecular docking and molecular dynamics simulations are key computational
approaches used in the study of carbohydrate molecules. In this chapter we will focus on
molecular docking methodologies, specifically in relation to oligosaccharide ligands.
Molecular docking predicts the binding orientation and affinity of a small molecule
(ligand) with respect to a larger molecule (macromolecule). The region around the
predicted ligand binding site on the macromolecule is specified using a grid box. The two
main steps in docking are searching and scoring: the search algorithm explores the
available conformational space for favorable binding modes of the ligand with respect to
the macromolecule, while the docking scoring function evaluates each pose generated by
the algorithm. During docking, a compromise has to be made between speed and
thoroughness in sampling the available conformational space. The program typically
produces several models at the end of a docking run, which are then ranked by their
calculated binding affinities.
There are different approaches to docking, such as rigid docking and flexible docking
(Figure 3.1). When all torsion angles are frozen during docking, it is termed rigid
docking; during flexible docking, some if not all of these torsion angles are allowed to
vary. If significant conformational change occurs in the protein, the ligand, or both upon
complex formation, rigid docking is inadequate to model the binding event. In such cases,
flexible docking, which allows for induced fit during complex formation, should be the
method of choice. The user can set the computational complexity of a docking run by
adjusting the flexibility of the ligand and the macromolecule. Proteins can often be
docked rigidly, because comparisons of experimental protein-ligand complexes with their
unbound counterparts have revealed that, in most cases, only a few side chains in the
active site change conformation.
A docking scoring function is better suited to assessing protein-ligand complementarity
than to calculating binding affinity, since even ligands that do not bind can be docked and
assigned a binding affinity score. Nevertheless, docking has proved to be an indispensable
computational tool for obtaining a 3D starting structure of a bound protein-ligand
complex when one cannot be obtained experimentally. It also allows multiple small
molecules to be assessed against a single protein target and their binding affinities
compared. Protein-ligand complementarity is a prerequisite for binding to occur, but
cannot be used as the sole criterion for evaluation.
Docking scoring functions evaluate how well the predicted binding pose of a
ligand complements the protein binding site, and can be empirical or knowledge-based.
Empirical scoring functions assume that binding affinities can be evaluated as a sum of
independent interaction energy terms, in most cases a weighted sum of electrostatic,
hydrogen bonding, hydrophobic and repulsion terms. The coefficients of the individual
terms are derived by fitting to experimentally determined Ki values of protein-ligand
complexes with solved crystal structures. In general, these scoring functions depend
strongly on ligand size: the larger the docked ligand, the more favorable the calculated
binding affinity. Knowledge-based scoring functions are derived from a statistical analysis
of experimentally determined protein-ligand complexes, on the assumption that contacts
occurring at a statistically significant rate are favorable and that contacts occurring rarely
are unfavorable.
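The two families of scoring functions can be illustrated with a minimal sketch. The empirical score is a weighted sum of per-pose interaction terms whose weights are fitted by least squares to binding free energies obtained from Ki values (ΔG = RT ln Ki); the knowledge-based score uses the inverse-Boltzmann form −kT ln(observed/reference), a common choice that we assume here. The number of terms, the weights, and the kT value of about 0.593 kcal/mol (298 K) are illustrative assumptions, not the parameters of any particular docking program.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def empirical_score(terms, weights):
    """Weighted sum of per-pose interaction terms (one row per pose,
    one column per term, e.g. electrostatics, H-bonding, hydrophobic, repulsion)."""
    return np.asarray(terms, dtype=float) @ np.asarray(weights, dtype=float)


def fit_weights(terms, ki_molar, temperature=298.15):
    """Least-squares fit of the term weights to binding free energies
    derived from measured Ki values via dG = RT ln(Ki)."""
    dg = R_KCAL * temperature * np.log(np.asarray(ki_molar, dtype=float))
    weights, *_ = np.linalg.lstsq(np.asarray(terms, dtype=float), dg, rcond=None)
    return weights


def knowledge_based_score(observed, reference, kT=0.593):
    """Inverse-Boltzmann potential -kT ln(observed/reference): contacts seen
    more often than in the reference state score as favorable (negative)."""
    ratio = np.asarray(observed, dtype=float) / np.asarray(reference, dtype=float)
    return -kT * np.log(ratio)


# Synthetic check: simulate Ki values from made-up weights, then refit them
rng = np.random.default_rng(0)
terms = rng.normal(size=(20, 4))            # 20 complexes x 4 interaction terms
true_w = np.array([-0.5, -1.2, -0.3, 0.8])  # hypothetical weights
ki = np.exp(empirical_score(terms, true_w) / (R_KCAL * 298.15))
print(np.round(fit_weights(terms, ki), 3))
```

Because every term enters additively, a larger ligand that simply makes more contacts accumulates a more favorable sum, which is one way to see the size dependence noted above.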
Several parameters affect the performance of the docking scoring function,
including the physical and chemical properties of input molecules, the preparation of the
input and the individual terms of the docking scoring function. Docking scoring functions
are usually developed for the purpose of high-throughput virtual screening of relatively
small, rigid, drug-like molecules. In this thesis, we will study the performance of such
docking methodologies with respect to carbohydrate ligands, which are larger, more
flexible molecules ranging from a disaccharide to a dodecasaccharide connected by
1,x-linkages (x = 2, 3, 4 or 6). Applying these generalized docking scoring functions to
carbohydrate docking usually leads to undesirable deviations of the carbohydrate ligands
from their natural conformations. It may therefore be useful to customize docking scoring
functions specifically for carbohydrate ligands.
The glycosidic torsion angles connecting individual monosaccharide units have a
major influence on the overall conformation of an oligosaccharide ligand. Although these
linkages are generally flexible, their flexibility spans a limited range of preferred torsion
angles, which has been identified from a survey of carbohydrate crystal structures in the
PDB. 48 All protein-carbohydrate complexes found in the PDB were included in this
survey, which covered carbohydrates interacting both covalently and non-covalently with
proteins such as lectins, antibodies, enzymes and carbohydrate binding modules. Previous
efforts to incorporate the conformational preferences of carbohydrates into molecular
docking include re-calibrating an existing docking scoring function to model
carbohydrate properties, adding interaction energy terms that are crucial to
protein-carbohydrate binding, and adding a carbohydrate conformational energy score to
an existing docking scoring function.
In this thesis, the performances of a few docking programs are evaluated and
compared using a set of antibody-carbohydrate complexes with solved X-ray crystal
structures from the PDB. A standardized docking protocol for docking oligosaccharide
ligands onto antibodies has also been described. A set of energy functions which
calculate the conformational energies of carbohydrates has been derived using quantum
mechanical methods. These carbohydrate internal energy functions, known as
Carbohydrate Intrinsic (CHI) energy functions score a disaccharide molecule based on
the orientations of the glycosidic torsion angles. The CHI energies were then added to
docked energies, producing a significant improvement in the ranking of accurate binding
poses. Finally, the CHI energy functions were incorporated into the scoring function of
the docking program AutoDock Vina, leading to the development of Vina-Carb. The
performance of Vina-Carb was evaluated against a set of 72 protein-carbohydrate
complexes with solved crystallographic structures from the PDB, and compared to that
of the original docking program without the CHI energy functions, AutoDock Vina.
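Re-ranking docked poses with CHI energies can be sketched as follows. The text states only that CHI energies were added to the docked energies, so the sketch assumes an unweighted sum of the φ and ψ penalties over every glycosidic linkage in a pose. The linkage labels, the pose tuples, and the quadratic penalty wells are hypothetical placeholders standing in for the real CHI curves.

```python
from typing import Callable, Dict, List, Tuple

Penalty = Callable[[float], float]   # torsion angle (degrees) -> kcal/mol
Linkage = Tuple[str, float, float]   # (linkage label, phi, psi) in degrees


def total_chi_penalty(linkages: List[Linkage],
                      penalties: Dict[str, Tuple[Penalty, Penalty]]) -> float:
    """Sum the phi and psi penalties over every glycosidic linkage in a pose."""
    total = 0.0
    for label, phi, psi in linkages:
        phi_penalty, psi_penalty = penalties[label]
        total += phi_penalty(phi) + psi_penalty(psi)
    return total


def rerank(poses: List[Tuple[str, float, List[Linkage]]],
           penalties: Dict[str, Tuple[Penalty, Penalty]]) -> List[Tuple[str, float]]:
    """Return (pose name, docked energy + CHI penalty), lowest (best) first."""
    rescored = [(name, energy + total_chi_penalty(linkages, penalties))
                for name, energy, linkages in poses]
    return sorted(rescored, key=lambda item: item[1])


# Hypothetical quadratic wells standing in for real CHI energy curves
toy = {"beta1-4": (lambda phi: 0.002 * (phi + 75.0) ** 2,
                   lambda psi: 0.002 * (psi - 120.0) ** 2)}
poses = [("A", -8.0, [("beta1-4", -35.0, 120.0)]),   # better docked score, distorted phi
         ("B", -7.5, [("beta1-4", -75.0, 120.0)])]   # slightly worse score, relaxed linkage
print(rerank(poses, toy))
```

In this toy case the distorted pose A loses its lead once its linkage penalty is counted, which is the kind of ranking change the CHI correction is intended to produce.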
For each AutoDock Vina docking job, multiple runs are started from random
conformations. The number of individual runs is determined by the exhaustiveness
parameter, which can be set by the user. Each run consists of a series of sequential steps,
whose number is determined heuristically from the number of flexible bonds in the system
under study. Each step consists of three stages: a random perturbation of the system, a
local optimization using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, and a
selection stage in which the step is either accepted or rejected. Each local optimization
involves numerous evaluations of the docking scoring function, and its termination is
decided by convergence and other criteria. Each run can produce multiple promising
results, which are stored and finally merged, clustered and sorted to produce the final set
of docked poses (Figure 3.2).
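The run/step structure described above can be sketched as a toy Monte Carlo search with local optimization. Several assumptions separate this from AutoDock Vina itself: a fixed number of steps replaces Vina's heuristic, plain gradient descent stands in for BFGS, a Metropolis test is used for the accept/reject stage, clustering is a simple distance cutoff, and a two-dimensional toy energy with many local minima replaces a scoring function over ligand position, orientation and torsions. All parameter values are illustrative.

```python
import numpy as np


def toy_energy(x):
    """Stand-in for a docking score: many local minima, global minimum at x = 0."""
    return float(np.sum(0.1 * x ** 2 - np.cos(2.0 * x)))


def gradient(f, x, h=1e-5):
    """Central finite-difference gradient."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g


def local_opt(f, x, lr=0.05, iters=100):
    """Plain gradient descent, used here in place of Vina's BFGS optimizer."""
    for _ in range(iters):
        x = x - lr * gradient(f, x)
    return x


def mc_run(f, x0, n_steps, rng, step_size=1.0, kT=1.0):
    """One run: perturb, locally optimize, then accept or reject (Metropolis).
    Every accepted state is kept as a candidate result."""
    x = local_opt(f, x0)
    e = f(x)
    kept = [(e, x.copy())]
    for _ in range(n_steps):
        trial = local_opt(f, x + rng.normal(scale=step_size, size=x.shape))
        e_trial = f(trial)
        if e_trial < e or rng.random() < np.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            kept.append((e, x.copy()))
    return kept


def dock(f, dim, exhaustiveness=8, n_steps=50, seed=0, tol=0.5):
    """Independent runs from random starts, then merge, cluster and sort."""
    rng = np.random.default_rng(seed)
    merged = []
    for _ in range(exhaustiveness):
        merged.extend(mc_run(f, rng.uniform(-5.0, 5.0, size=dim), n_steps, rng))
    clusters = []
    for e, x in sorted(merged, key=lambda item: item[0]):
        # keep a result only if it lies farther than tol from every kept result
        if all(np.linalg.norm(x - y) > tol for _, y in clusters):
            clusters.append((e, x))
    return clusters


poses = dock(toy_energy, dim=2)
print(round(poses[0][0], 3), np.round(poses[0][1], 3))
```

Because each run starts from a different random point, raising `exhaustiveness` increases the chance that at least one run reaches the global minimum, at a proportional cost in scoring function evaluations.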
A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem.
2014, 35, 526–539. Reprinted here with the permission of the publisher.
Abstract
Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced
accuracy because they are unable to incorporate ligand-specific conformational energies.
Here, we develop internal energy functions, Carbohydrate Intrinsic (CHI), to account for
the rotational preferences of the glycosidic torsion angles in carbohydrates. The relative
energies predicted by the CHI energy functions mirror the conformational distributions of
glycosidic linkages determined from a survey of oligosaccharide-protein complexes in
the Protein Data Bank. Addition of CHI energies to the standard docking scores in
AutoDock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides
docked to a set of anti-carbohydrate antibodies. The CHI energy functions are also
independent of the docking algorithm and, with minor modifications, may be incorporated
into both theoretical modeling methods and experimental NMR or X-ray structure
refinement programs.
metabolism, gene expression, cell-cell communication, growth, development, and
immune response. 9 In vivo, complex carbohydrates (glycans) are found on cell surfaces
as glycoconjugates (glycoproteins/glycolipids) or polysaccharides, mediating biological
function through their direct interaction with proteins, such as receptors (lectins),
enzymes, and antibodies. Cancer is marked by aberrant glycosylation, which can serve as
a disease-related marker or as a target for therapeutic intervention. 22,61-63 Conversely,
endogenous cell-surface glycans are frequently exploited by infectious agents, as in the
A physical understanding of
to block such interactions, 67-70
such as antibodies which target specific glycans. 71,72
A better understanding of the immune system’s response to carbohydrate-based vaccines
73-76 facilitates the prediction and rationalization 71 of hazardous or misleading
cross-reactivities between antibodies against disease-related carbohydrates and
endogenous glycans. 77,78
experimental methods such as X-ray crystallography and NMR spectroscopy include
production and purification of the protein, isolation or synthesis of the glycan, and
co-crystallization of the complex. 60
Therefore, there is a long-standing interest in applying
theoretical modeling methods (automated docking) to aid in the characterization of the
3D structure of carbohydrate-protein complexes. 71,79-84
However, these methods also
have limitations. Automated docking faces the triple challenge of accurately predicting 1)
the ligand orientation in the binding site (pose); 2) the ligand conformation in the binding
site (shape); and 3) the relative affinity of the optimal pose (interaction energy). Ligand
internal energies are modeled only approximately within docking algorithms, mainly by
considering energies associated with internal steric repulsion. Such an approximation
inherently degrades the accuracy of docking predictions, since various ligand classes have
specific conformational properties. The glycosidic torsion angles between individual
monosaccharides forming glycans are crucial in defining their 3D structure and
dynamics. The accurate prediction of oligosaccharide conformations requires the
anomeric, and gauche effects. 85
Their omission frequently leads to the incorrect
prediction of docked oligosaccharide conformations. 86-88
Docking programs treat interaction energy terms as empirically-adjustable components,
which may be tuned for a particular ligand class, such as carbohydrates. 89
Inclusion of carbohydrate conformational energies in the docking energy function would
likely require reoptimization of the empirical weighting, resulting in a non-transferable,
carbohydrate-specific implementation of the algorithm. Alternatively, we wished to
develop a carbohydrate-specific conformational energy function that predicts
oligosaccharide energies independently of the docking algorithm, and that could
potentially also be employed to evaluate the conformational energies of experimentally
determined oligosaccharide structures. We focused on modeling conformational
properties intrinsic to glycosidic linkages between pyranoses, with the criterion that the
method should also be generalizable to other carbohydrate ring forms, such as furanoses,
and to other linkages, such as 1-6, 2-3 and 2-6. Tetrahydropyran and related analogs have
long been employed as representative carbohydrates in quantum mechanical calculations
for this purpose, 90-97 on the assumption that any additional effects on the
conformational properties, for example from hydrogen bonding, are superimposed on the
intrinsic properties of the linkages between pyran rings. Quantum mechanical
calculations were performed on a set of glycosidically linked tetrahydropyrans
representing all two-bond linkages between pyranoses. The rotational energy profiles for
these linkages were used to derive the desired carbohydrate intrinsic (CHI) energy
functions. Given a 3D oligosaccharide structure, the CHI energy functions may be
employed to estimate the energy arising from any distortion of the glycosidic linkages
relative to their lowest-energy conformations.
Because of the important roles of anti-carbohydrate antibodies in therapeutic and
diagnostic applications, and the challenges associated with experimentally defining their
3D structures, they have been the subject of numerous automated docking studies. 98-104
We chose six crystallographically-determined antibody-carbohydrate complexes to
evaluate the ability of CHI energy functions to improve predicted rankings of the docked
poses. These systems were selected based on the diversity of the antibody binding site
topologies (canyon, valley, crater) 105 and on variation in the size of the carbohydrate
ligands (tri- to pentasaccharides, including linear and branched sequences).
Methods
Docking was performed using AutoDock 3.0.5 (AD3), 106 4.2 (AD4.2) 107 and Vina 1.1.2
(ADV). 108
Details of the reference systems, including PDB IDs, ligand sequences and
biological origin are presented in Table 4.1. In each case, the protein chain containing the
ligand with the lowest average B-factor was selected for docking. The carbohydrate
ligands in systems 1UZ8, 1S3K and 1M7I were built using the Carbohydrate Builder on
GLYCAM-Web (www.glycam.org). 109
sugar residues abequose and 2-deoxy-rhamnose. Oligosaccharides containing these
deoxy residues were assembled using the tLEaP 110
module from the AMBER package
employing GLYCAM06i force field parameters and PREP residue structure files,
available for download at www.glycam.org (S4.11). The antibody structures were
All protein and ligand files were prepared for docking using AutoDock Tools 1.5.4
(ADT). 107 The choice of partial charge model was based on the method used to calibrate
the scoring function of each docking program: Kollman charges 112 were added to the
protein for docking with AD3, while Gasteiger charges 113 were used to prepare proteins
for docking with AD4.2 and ADV; in each case, Gasteiger charges were assigned to the
ligands. AutoDock distributes any non-zero residual net charge across the macromolecule.
Hydrogen atoms were added to the protein using ADT, whereas GLYCAM hydrogens were
retained in the ligands. A standard grid box (dimensions: 26.25 × 26.25 × 37.50 Å) was
employed for all runs, centered relative to the complementarity determining regions
(CDRs) of the antibody (Figure 4.1a). Before docking, the ligand was translated to the
center of mass (CoM) of the CDRs but maintained in the default GLYCAM orientation
and conformation. VMD 109 was used for
Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape
RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the
crystallographic ligands.
Columns: PDB ID:Chain ID. Residue symbol key: Mannose (Man), Galactose (Gal),
Fucose (Fuc), 2-Deoxy Rhamnose, Abequose (Abe), N-Acetyl Glucosamine (GlcNAc),
Rhamnose (Rha), Aglycon (OME/OH).
c SRMSD as defined in the section “Shape, and pose, RMSD values”.
d 1MFA and 1MFD contain the trisaccharide antigen from Salmonella serotype B. In 1MFD,
the trisaccharide is bound to a Fab antibody fragment, while in 1MFA it is bound to a
single-chain Fv (scFv) fragment of the antibody. Although the antigen-binding sites of the
Fab and scFv fragments are essentially the same and bind the same trisaccharide antigen,
in the scFv complex a water molecule has become inserted into an internal hydrogen bond
within the trisaccharide, perturbing the trisaccharide conformation.
In all ligands, the hydroxyl groups and glycosidic torsion angles were defined as
being flexible, while the C5-C6 bonds were restrained at the orientation present in the
reference crystal structures. The protein was held rigid. In AD3 and AD4.2, 100 runs of
the Lamarckian Genetic Algorithm were employed, with 800,000 energy evaluations per
run and a population size of 200. The translation step size was 2 Å, while the quaternion
and dihedral step sizes were each 50°. The ADV source code was modified to increase the
total number of output structures from 20 to 100 (Supplementary Material, S4.1). The
maximum energy difference between the best and worst binding modes was set at
10 kcal/mol, and the exhaustiveness value was 8. The complete set of
docking parameters used is given in S4.2, S4.3 and S4.4.
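The ADV settings above can be expressed as a Vina configuration file. The sketch below builds that text from the values reported here (box center (0, 0, 11) in the aligned antibody frame of Figure 4.1, box size, exhaustiveness 8, energy range 10 kcal/mol, 100 output modes). The keys are standard AutoDock Vina configuration options; the file names are hypothetical, and assigning the 37.50 Å edge to the Z-axis is our assumption based on the box being offset along Z. As stated above, obtaining 100 output structures required a modified ADV source, so stock Vina may not honor `num_modes = 100`. The complete parameter sets are in S4.2 to S4.4 and are not reproduced here.

```python
def vina_config(receptor, ligand, out,
                center=(0.0, 0.0, 11.0), size=(26.25, 26.25, 37.50),
                exhaustiveness=8, num_modes=100, energy_range=10):
    """Build the text of an AutoDock Vina configuration file.

    center is in the aligned antibody frame (Angstrom); size is (x, y, z) in
    Angstrom, with the long edge assumed to lie along Z.
    """
    lines = [f"receptor = {receptor}", f"ligand = {ligand}", f"out = {out}"]
    lines += [f"center_{axis} = {value}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value}" for axis, value in zip("xyz", size)]
    lines += [f"exhaustiveness = {exhaustiveness}",
              f"num_modes = {num_modes}",
              f"energy_range = {energy_range}"]
    return "\n".join(lines) + "\n"


# Hypothetical file names
print(vina_config("antibody.pdbqt", "ligand.pdbqt", "docked.pdbqt"))
```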
Antibody and docking grid box alignment
Consistent grid box placement on the CDRs was achieved by positioning the box
relative to three points defined by specific CoMs within the CDRs. The CDRs were
and Chothia 117
numbering schemes. To ensure consistent orientation of the antibody surface relative to
the box grid points, the protein coordinates were transformed with respect to a set of
internal coordinate axes, as shown in Figure 4.1. This protocol removes issues arising
from the grid being box-shaped rather than spherical, which could otherwise result in
different regions of each antibody being included within the grid.
Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the
grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green dot
represents the center of the grid box (0,0,11). (b) Aligned orientation of an antibody
antigen-binding fragment (Fab), with respect to the internal reference axes. The region in
red + pink represents the VH domain (CDRs (red) and framework regions (pink) of the
heavy chain) of the antibody, while the region in blue represents the VL domain (CDRs
(dark blue) and framework regions (cyan) of the light chain). The X-axis for the
alignment was defined by a vector passing through the CoM of the variable light chain
(VL domain, which contains the light chain CDRs and framework sequences) and the
CoM of the variable heavy chain (VH domain). The Z-axis was defined as a vector
normal to the X-axis and passing through the CoM of the entire variable region, or
variable fragment (Fv). The antibody was then translated so that the CoM of the CDRs
was placed at the origin. The Y-axis was defined as a vector perpendicular to the
XZ-plane and passing through the origin. The docking grid box was aligned to the internal
coordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as to
optimally encompass the CDR loops, while also permitting adequate volume for the
movement of the ligand during docking. Such a definition enabled the docking grid box
to be consistently aligned with respect to the CDRs.
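The alignment can be sketched with NumPy, given the four centers of mass. This is our interpretation of the text, not the original script: X points from the VL CoM toward the VH CoM; Z is taken as the component, orthogonal to X, of the vector from the Fv CoM to the CDR CoM, so that +Z points out of the binding face (the sign convention is an assumption); Y completes a right-handed frame. Coordinates are then translated so that the CDR CoM sits at the origin and rotated into this frame, placing the grid box center at (0, 0, 11).

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def antibody_frame(com_vl, com_vh, com_fv, com_cdr):
    """Rows are the internal X, Y and Z axes (an interpretation of the text)."""
    x = unit(com_vh - com_vl)          # X: along the VL -> VH CoM vector
    d = com_cdr - com_fv               # assumed to point from the Fv CoM towards the CDRs
    z = unit(d - np.dot(d, x) * x)     # Z: component of d normal to X
    y = np.cross(z, x)                 # Y: perpendicular to the XZ-plane, right-handed
    return np.vstack([x, y, z])


def align_to_frame(coords, com_vl, com_vh, com_fv, com_cdr):
    """Translate the CDR CoM to the origin and rotate into the internal frame."""
    frame = antibody_frame(com_vl, com_vh, com_fv, com_cdr)
    return (np.asarray(coords, dtype=float) - com_cdr) @ frame.T


GRID_CENTER = np.array([0.0, 0.0, 11.0])   # box center, 11 Å along +Z from the CDR CoM
```

Because the frame is built from the antibody's own centers of mass, rotating or translating the input structure leaves the aligned coordinates unchanged, which is what makes the grid placement consistent across antibodies.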
Quantum mechanical calculations
Quantum mechanical calculations were performed using Gaussian09. 118
Structures were optimized at the HF/6-31++G(2d,2p) level of theory, and single-point
energies were calculated at the B3LYP/6-31++G(2d,2p) level, consistent with the
approach used in the GLYCAM force field development. 94 Rotational energy profiles were
computed at 15° increments, with all other coordinates fully relaxed.
Shape, and pose, RMSD values
Pose RMSD (PRMSD) values were obtained by calculating the RMSD between
the ring atoms of the crystal ligand maintained in its native co-crystallised position, and
the corresponding ring atoms in the docked ligand maintained in its docked position
(Figure 4.2a). A pose with a PRMSD ≤ 2 Å was considered to have been successfully
docked. Shape RMSD (SRMSD) values were obtained by first superimposing the crystal
and docked ligands and then calculating the RMSD between their respective ring atoms
(Figure 4.2b). The SRMSD quantifies the dissimilarity in the 3D conformations of the
docked and crystal ligands, irrespective of their relative positions on the protein surface.
Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and
SRMSD, respectively, of a representative docked pose with respect to its crystal ligand.
(a) The PRMSD is the RMSD between the ring atoms of a representative docked
structure (white) and the corresponding crystal structure (black). (b) The SRMSD is the
RMSD value obtained after the docked structure (white) is superimposed on the crystal
structure (black).
Assessment of current docking methodologies
The six ligands extracted from their co-crystal structures could successfully be
docked back rigidly into the same structure of the protein (results not shown); this is an
outcome observed previously in studies of carbohydrate-protein docking. 103,119
Although this re-docking test is a necessary check of any docking method, it is not
sufficient, since both molecules in a co-crystallized complex are already in the correct
conformation for binding and do not require induced fit to occur during docking.
Figure 4.2 annotations: Pose RMSD = 5.5 Å; Shape RMSD = 1.1 Å.
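The two RMSD measures defined in the previous section differ only in whether the docked ligand is superimposed first. The sketch below assumes both inputs list the same ring atoms in the same order; for the superposition it uses the Kabsch algorithm, a standard choice that the text does not name.

```python
import numpy as np


def pose_rmsd(crystal, docked):
    """PRMSD: RMSD over matched ring atoms, both ligands left in place."""
    diff = np.asarray(crystal, dtype=float) - np.asarray(docked, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def shape_rmsd(crystal, docked):
    """SRMSD: RMSD after optimally superimposing docked onto crystal (Kabsch)."""
    q = np.asarray(crystal, dtype=float)
    p = np.asarray(docked, dtype=float)
    q_c = q - q.mean(axis=0)
    p_c = p - p.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0   # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return pose_rmsd(q_c, p_c @ rot.T)
```

A docked ligand that has the crystal shape but sits elsewhere on the antibody surface therefore has a large PRMSD and a near-zero SRMSD, as in the example of Figure 4.2.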
Independently-generated oligosaccharide 3D structures were employed as ligands
to test the performance of the docking methodologies in predicting bound conformations
of unknown carbohydrate-protein complexes. These starting structures were generated
using GLYCAM, which is known to produce low-energy conformations of carbohydrates;
the structures generated were found to be essentially equivalent to the same ligands in the
co-crystal structures, as indicated by their SRMSDs (Table 4.1) and by a comparison of
their glycosidic torsion angles (S4.5). The average SRMSD between the crystallographic
ligands and the theoretical structures was 0.53 Å. The preliminary SRMSD analysis also
showed that the ligand in each antibody complex adopted a low-energy conformation,
similar to that expected for the free ligand.
A second requirement for a general docking protocol is to permit the ligands a
reasonable level of freedom by allowing their glycosidic torsion angles and hydroxyl
groups complete flexibility. This approach enables comparisons to be made between
structures of the experimental and theoretical ligands, facilitating an assessment of the
impact of induced fit in the ligand on the outcome from docking analysis.
After docking, the φ (O5’-C1’-Ox-Cx) and ψ (C1’-Ox-Cx-C(x−1)) glycosidic torsion
angles of the docked poses (Figure 4.4-I) were measured and compared to the torsion
angles of the corresponding linkages in the experimental co-crystal structure and in the
initial GLYCAM theoretical structure. The analysis indicated that the torsion angle values
of the docked poses frequently deviated considerably from both the crystal and GLYCAM
reference values (S4.5). Five examples of this analysis are highlighted in Figure 4.3.
Figure 4.3a presents an instance in which all three docking programs identified the
lowest-energy pose correctly (that is, with the glycosidic angles falling within 30° of the
corresponding torsion angles in the crystal structure). Figures 4.3b, c, and d present cases
in which only one of the docking programs identified the correct pose, and Figure 4.3e
shows an example in which all three programs failed to produce the correct torsion
angles. All of the methods were able to generate some conformations within 30° of the
crystallographic φ and ψ values; however, these were often not the poses with the best
docking energy. Thus, in a routine application of docking, they would not be identified as
the most likely (highest-ranked) pose. Overall, each algorithm generated a very broad
range of torsion angles (and therefore 3D shapes), suggesting an opportunity to employ a
conformational energy function as an additional filter to identify unlikely conformations
in the docking data.
Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected
linkages, as indicated by the dashed rectangle. Data are presented, in order, for AD3
(black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing the
experimentally-determined values is highlighted with a light blue outline. The bin
containing the structure with the lowest docked energy is indicated as follows: AD3,
yellow; AD4.2, orange; ADV, green.
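The torsion analysis behind Figure 4.3 can be sketched in a few functions. `dihedral` computes a torsion from four atom positions (for φ: O5’, C1’, Ox, Cx; for ψ: C1’, Ox, Cx, C(x−1)) using the IUPAC sign convention; `angle_diff` handles the ±180° wrap-around; the helper checks whether the lowest-energy pose falls within 30° of the crystal value. The 30° histogram bin width is our assumption; the caption does not state the bin width used.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    v = (p0 - p1) - np.dot(p0 - p1, b1) * b1   # p0 projected onto the plane normal to the bond
    w = (p3 - p2) - np.dot(p3 - p2, b1) * b1   # p3 projected likewise
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def lowest_energy_pose_correct(energies, angles, reference, tol=30.0):
    """Is the torsion of the best-scored pose within tol of the crystal value?"""
    best = int(np.argmin(energies))
    return angle_diff(angles[best], reference) <= tol


def histogram(angles, width=30.0):
    """Counts per bin from -180 to 180 degrees."""
    edges = np.arange(-180.0, 180.0 + width, width)
    counts, _ = np.histogram(angles, bins=edges)
    return counts
```

Using `angle_diff` rather than a plain subtraction matters here: torsions of 170° and −170° are only 20° apart, and a naive comparison would wrongly reject such a pose.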
Development and validation of the CHI energy functions
Quantum mechanical conformational energies for a variety of model
disaccharides were obtained by employing tetrahydropyran (THP) as the minimal model
of a carbohydrate ring. Two THP molecules were used to model each glycosidic linkage
(1-2, 1-3 and 1-4) between pyranoses in the ⁴C₁ and ¹C₄ conformations. Given that there
are two anomeric configurations (α and β), and two hydroxyl configurations (axial (ax)
and equatorial (eq)), associated with each linkage, the development of each CHI energy
function required the analysis of the glycosidic rotational energies of at least four
structures per linkage. For example, the different models used in modeling the 1-3
linkage are presented in Figure 4.4.
Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of
CHI energy functions. The models depicting 1,2-linkages can be used to model 1,4-
linkages due to symmetry about the O5 atom.
Individual rotational energy profiles were determined for both the φ (O5’-C1’-Ox-
Cx) and ψ (C1’-Ox-Cx-Cx-1) glycosidic torsion angles of the various disaccharide models
(Figure 4.5). A similar approach has been employed by A. D. French to examine the
properties of various disaccharides and disaccharide analogs. 96,98,112,120
Models with similar local symmetries gave rise to similar torsional energy profiles and
were grouped together, and an average energy curve was then obtained for each group.
On this basis, two average energy curves were computed for the φ angle: one for all
models with an α-linkage (Figure 4.5a), and the other for all models with a β-linkage
(Figure 4.5b). Similarly, two average curves were computed for the ψ angle, based on
division of the linkages into the following two groups: 1-2ax, 1-4ax, 1-3eq (Figure 4.5c);
and 1-2eq, 1-4eq, 1-3ax (Figure 4.5d).
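The grouping above amounts to a lookup from linkage type to averaged curve. The Python sketch below shows one way to measure φ and ψ from atomic coordinates and to pick the curve that applies. The curve labels (`phi_alpha`, `phi_beta`, `psi_c`, `psi_d`) and the function names are hypothetical, chosen here only to mirror panels (a) to (d) of Figure 4.5; "hydroxyl" is assumed to mean the axial or equatorial orientation at the aglycone linkage position.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180].

    phi: p0..p3 = O5', C1', Ox, Cx
    psi: p0..p3 = C1', Ox, Cx, C(x-1)
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# psi groups of Figure 4.5c; the remaining (position, orientation) pairs use 4.5d
_PSI_GROUP_C = {(2, "ax"), (4, "ax"), (3, "eq")}


def chi_curve_for(angle: str, anomer: str, position: int, hydroxyl: str) -> str:
    """Return a hypothetical label for the averaged CHI curve that applies."""
    if position not in (2, 3, 4) or hydroxyl not in ("ax", "eq"):
        raise ValueError("only 1-2, 1-3, 1-4 linkages with ax/eq hydroxyls are modelled")
    if angle == "phi":
        # phi curves depend only on the anomeric configuration (Fig. 4.5a/b)
        return {"alpha": "phi_alpha", "beta": "phi_beta"}[anomer]
    if angle == "psi":
        return "psi_c" if (position, hydroxyl) in _PSI_GROUP_C else "psi_d"
    raise ValueError("angle must be 'phi' or 'psi'")
```

Because the grouping depends only on the anomeric configuration (for φ) or on linkage position plus hydroxyl orientation (for ψ), a small lookup is sufficient; no per-disaccharide curve is needed.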
Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for
models (see Figure 4.4) whose linkages have similar local geometries.
The CHI energy functions (S4.6) were generated by fitting Gaussian expansions
(Eqn 4.1) to the average energy values for each of the curves in Figure 4.5, using the
default fitting routine in Gnuplot ver. 4.0¹¹³:

$$E(x) = \sum_{i=1}^{N} a_i \exp\left[-\frac{(x - c_i)^2}{2 b_i^2}\right] \qquad \text{(Eqn 4.1)}$$

where N is the number of individual Gaussian functions used for each CHI energy
equation, x is the glycosidic torsion angle (φ or ψ), and a_i, b_i, and c_i are the
magnitude, width, and mid-point of each Gaussian, respectively. All curves (S4.7) were
adjusted to a minimum value of 0 kcal/mol and may therefore be considered
conformational energy penalty functions. To apply the energy curves shown in
Figure 4.5 to linkages containing L-sugars, it is simply necessary to employ the mirror
image of the relevant energy curve.
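A CHI-style penalty function can be evaluated directly from Eqn 4.1. The Python sketch below uses the symbol roles stated in the text (a_i magnitude, b_i width, c_i mid-point), shifts the curve so that its minimum over 0° to 360° is 0 kcal/mol, and mirrors the angle for L-sugars. The coefficients in the example are made-up placeholders, not the fitted values in S4.6, and the assumption that the curves are defined on the [0°, 360°) interval is ours. The Gnuplot fit itself is not reproduced.

```python
import math
from typing import Sequence

# One Gaussian term of Eqn 4.1: (a_i magnitude, b_i width, c_i mid-point)
Term = tuple[float, float, float]


def gaussian_expansion(x: float, terms: Sequence[Term]) -> float:
    """Raw sum of Gaussians: sum_i a_i * exp(-(x - c_i)^2 / (2 b_i^2))."""
    return sum(a * math.exp(-((x - c) ** 2) / (2.0 * b * b)) for a, b, c in terms)


class ChiPenalty:
    """Gaussian-expansion curve shifted so that its minimum is 0 kcal/mol."""

    def __init__(self, terms: Sequence[Term], grid_step: float = 0.1):
        self.terms = list(terms)
        n = int(round(360.0 / grid_step))
        # Locate the minimum numerically on a fine grid over [0, 360)
        self.offset = min(gaussian_expansion(i * grid_step, self.terms) for i in range(n))

    def energy(self, angle: float, l_sugar: bool = False) -> float:
        x = angle % 360.0            # assumption: curves are defined on [0, 360)
        if l_sugar:
            x = (-x) % 360.0         # mirror image: E_L(x) = E_D(-x)
        # max() guards against the grid missing the true minimum by a tiny amount
        return max(0.0, gaussian_expansion(x, self.terms) - self.offset)


if __name__ == "__main__":
    # Placeholder coefficients for illustration only (NOT the published CHI parameters)
    curve = ChiPenalty([(4.0, 25.0, 120.0), (2.0, 40.0, 250.0)])
    print(curve.energy(120.0), curve.energy(60.0, l_sugar=True))
```

Locating the minimum on a grid, rather than analytically, keeps the sketch independent of how many Gaussian terms a given curve uses.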
The experimental distribution of glycosidic angles in carbohydrate-protein crystal
structures in the PDB provides an independent metric for comparison with the predicted
CHI energies. Glycosidic torsion angle data for over 13,000 glycosidic linkages were
extracted using the GlyTorsion web-tool¹²¹ (S4.8), binned, and plotted against the
corresponding CHI energy curves (Figure 4.6). The comparison leads to the important
conclusion that the majority of proteins that recognize oligosaccharides select low energy
(solution-like) conformations of the glycosidic linkage. This has considerable importance
for carbohydrate docking, as it supports the view that biasing selection toward low energy
linkage conformations should enhance the likelihood of correct pose prediction.
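The binning step can be sketched as follows, assuming torsion angles are reported in degrees (for example in the −180° to 180° range) and mapped onto the 0° to 360° axis used for the CHI curves. The 10° bin width and the function name are illustrative choices, not taken from the original analysis.

```python
from typing import Sequence


def torsion_histogram(angles: Sequence[float], bin_width: float = 10.0):
    """Bin torsion angles (degrees) on [0, 360); return (bin_centres, fractions)."""
    n_bins = int(round(360.0 / bin_width))
    counts = [0] * n_bins
    for angle in angles:
        idx = int((angle % 360.0) // bin_width)
        counts[min(idx, n_bins - 1)] += 1   # guard: tiny negatives can wrap to 360.0
    total = len(angles) or 1               # avoid dividing by zero on empty input
    centres = [(i + 0.5) * bin_width for i in range(n_bins)]
    return centres, [c / total for c in counts]
```

Returning fractions rather than raw counts makes populations from linkage types with very different numbers of observations easier to compare against the same CHI energy axis.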
Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion
angle distributions of carbohydrates from experimental co-crystal structures (histograms).
Refinement of the docking results using the CHI energy functions
The performance of each docking algorithm can be assessed by plotting the
conformational difference between each docked ligand and the ligand in the co-complex
(SRMSD) against the predicted interaction energy. Ideally, poses with correct ligand
shapes should have lower interaction energies than poses with incorrect shapes. Plots of
interaction energy versus SRMSD were generated for AD3, AD4.2 and ADV (Figure 4.7),
and the coefficient of determination (R²) was computed by linear regression. In each
case, only weak linear relationships between ligand shape and
interaction energy were observed (R² ≤ 0.19), and in the case of ADV a slight negative
slope was observed. Following rescoring of the docked poses, by adding the CHI energy
of each glycosidic angle to the docked energy of the structure, a clear enhancement of the
R² values was observed across all three programs (0.60 ≤ R² ≤ 0.68).
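The rescoring step and the regression statistic can be sketched as below. The sketch assumes each pose carries its docked energy and the CHI penalty of every φ and ψ angle (in kcal/mol), and computes R² as the squared Pearson correlation, which equals the coefficient of determination for an ordinary least-squares line with an intercept. The `Pose` fields and the toy numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Pose:
    docked_energy: float          # kcal/mol, as reported by AD3 / AD4.2 / ADV
    chi_energies: list[float]     # CHI penalty (kcal/mol) for each phi and psi angle
    srmsd: float                  # Å, ligand shape RMSD to the co-crystal ligand


def rescored_energy(pose: Pose) -> float:
    """Docked energy plus the CHI energy of every glycosidic torsion."""
    return pose.docked_energy + sum(pose.chi_energies)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination for a least-squares line y = m*x + k."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    poses = [Pose(-8.0, [0.1, 0.2], 0.5), Pose(-8.2, [1.5, 2.0], 1.5),
             Pose(-7.9, [3.0, 2.5], 2.5)]
    s = [p.srmsd for p in poses]
    print("before:", r_squared(s, [p.docked_energy for p in poses]))
    print("after: ", r_squared(s, [rescored_energy(p) for p in poses]))
```

In this toy data the docked energies are nearly flat with respect to SRMSD, while the CHI penalties grow with SRMSD, so the rescored energies correlate much more strongly with ligand shape.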
It should be reiterated here that none of the three docking algorithms includes internal
rotational energies (torsion terms); at best, they account for ligand internal energies in a
general steric sense. In the case of glycosidic linkages, this internal energy was found to
be less than approximately 0.2 kcal/mol. Thus, while some double counting of internal
energy is introduced by adding the CHI energy directly to the total docking energy, it
does not result in a significant error.
Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between
SRMSD and docked energies after rescoring, for each of the three docking programs.
Points before rescoring are shown in dark grey and points after rescoring are shown in
light grey. Shown in the insets are SRMSD vs. docked energy plots of only the overall
lowest PRMSD structure for each of the six antibody systems before (dark grey) and after
(light grey) rescoring. The black rectangles in all insets enclose plot areas with SRMSD ≤
1 Å and energies ≤ 0 kcal/mol.
Prior to inclusion of the CHI energies, all poses from AD3 and ADV, and a majority of
those from AD4.2, were predicted to have favorable (negative) interaction energies, a
consequence of the nearly horizontal slope of the SRMSD-versus-interaction-energy
plots. Addition of the CHI energies led to positive slopes and frequently to unfavorable
(positive) interaction energies for high-energy ligand conformations. Therefore, an
intuitive interaction energy cut-off of 0 kcal/mol could be defined as a convenient filter
for eliminating the most unlikely structures.
For all six antibody complexes, the poses that are most similar to the co-crystal
(lowest PRMSD poses) also have CHI-adjusted interaction energies ≤ 0 kcal/mol, the
single exception being the AD4.2 results for 1M7I (Figure 4.7b). All 100 docked poses
of that pentasaccharide received positive rescored interaction energies, reflecting the
sub-optimal quality of the conformations produced by AD4.2 for this system. In this
case, the pose closest to the co-complex had a PRMSD of 3.4 Å and a CHI-corrected
interaction energy of 14.7 kcal/mol; rescoring cannot correct for the absence of a correct
pose. Thus, adding the CHI energy to the docked energy scores provides a cutoff
(0 kcal/mol) below which all poses may be considered possible binders.
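Applying the 0 kcal/mol cutoff and reranking reduces to a filter followed by a sort on the rescored energy. In this sketch each pose is a hypothetical record holding its docked energy and the summed CHI energy of its glycosidic angles; the names are illustrative only.

```python
from typing import NamedTuple


class ScoredPose(NamedTuple):
    name: str
    docked_energy: float   # kcal/mol, from the docking program
    chi_total: float       # kcal/mol, sum of CHI penalties over all phi/psi angles

    @property
    def rescored(self) -> float:
        return self.docked_energy + self.chi_total


def filter_and_rerank(poses: list[ScoredPose], cutoff: float = 0.0) -> list[ScoredPose]:
    """Keep poses with rescored energy <= cutoff, most favorable (most negative) first."""
    kept = [p for p in poses if p.rescored <= cutoff]
    return sorted(kept, key=lambda p: p.rescored)
```

A pose whose rescored energy is exactly 0 kcal/mol is kept, matching the "≤ 0 kcal/mol" criterion used above.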
Figure 4.8 presents the φ and ψ torsion angles for the docked poses from all six
antibody-carbohydrate systems, overlaid onto the corresponding CHI energy curves.
These plots clearly indicate that the docking algorithms sample a disproportionately large
number of high-energy ligand conformations, particularly AD4.2 and ADV. Several
low-energy regions, particularly for the ψ angles, are also poorly represented. In
quantitative terms, more than 45% of the AD3 poses contain ligands with at least one
bond in a high-energy conformation (CHI energy > 2 kcal/mol); the corresponding
figures for AD4.2 and ADV are 73% and 77%, respectively.
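The percentages above count poses containing at least one glycosidic bond above 2 kcal/mol. A minimal sketch of that count, assuming each pose is supplied as a list of per-bond CHI energies; whether a bond's energy is taken from φ and ψ separately or as their sum is left to the caller, and the function name is hypothetical.

```python
from typing import Sequence


def fraction_high_energy(poses: Sequence[Sequence[float]], threshold: float = 2.0) -> float:
    """Fraction of poses in which any CHI energy exceeds `threshold` (kcal/mol)."""
    if not poses:
        return 0.0
    flagged = sum(1 for chi in poses if any(e > threshold for e in chi))
    return flagged / len(poses)
```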
Figure 4.8 Graphs showing the distribution of conformations produced by AD3 ( ),
AD4.2 ( ) and ADV ( ) plotted onto the corresponding CHI energy curves for each of
the representative linkage combinations; the curves are offset from each other by 6
kcal/mol.
Pose ranking after including the CHI energy
In 9 of the 18 cases (6 antibodies × 3 docking algorithms), the top-ranked pose
remained the same before and after inclusion of the CHI energies (Figure 4.9), with an
average SRMSD of 0.3 Å. That the ranking of these poses did not change is unsurprising,
given that inclusion of the CHI energy function does not greatly alter the interaction
energy if the ligand is already in a low-energy conformation. However, in 7 of the 9
remaining cases, the SRMSD of the top-ranked pose improved by an average of 0.8 Å
after rescoring and reranking.
Prior to rescoring, poses with PRMSDs ≤ 1 Å were obtained from the 100 docking runs
in 17 of the 18 cases; however, they were not necessarily the lowest energy poses,
highlighting the challenge of recognizing a correctly docked pose amongst all poses
produced by a docking run. The impact of the CHI energy on the ability of docking both
to produce a correctly docked pose and to rank it as the lowest energy structure is shown
in terms of PRMSDs in Figure 4.9b. In several instances in which the lowest energy pose
produced by the docking program was incorrect (PRMSD > 2 Å), reranking after
including the CHI energy led to lowest energy structures having both PRMSD and
SRMSD < 1 Å.
Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2
and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the lowest
energy poses for all six systems from all three docking programs, before (dark grey) and
after (light grey) rescoring.
The impact of rescoring on the conformations (SRMSDs) and orientations (PRMSDs) of
the top-ranked poses is illustrated below with several examples. Docking of the
tetrasaccharide ligand onto the 1S3K antibody using AD3
(Figure 4.10) and docking of the trisaccharide ligand onto the 1M7D antibody using
AD4.2 yielded top-ranked poses with PRMSDs > 5 Å. These two structures had high CHI
energies of 7.0 kcal/mol and 11.6 kcal/mol, respectively. The lowest energy poses after
reranking had PRMSDs of 0.6 Å (1S3K/AD3) and 0.5 Å (1M7D/AD4.2), with lower CHI
energies of 1.0 kcal/mol and 0.9 kcal/mol, respectively.
Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to
the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after inclusion of the
CHI energy (white) compared to the crystal ligand (black); PRMSD = 0.6 Å.
Prior to rescoring, the lowest energy structures obtained for 1MFD from all three
programs had PRMSDs > 5 Å, with CHI energies > 4 kcal/mol for the poses from AD4.2
and ADV, and 1.3 kcal/mol for the pose from AD3 (Figure 4.11a, S4.9). After rescoring,
the lowest energy pose from AD3 remained unchanged, whereas the corresponding pose
from AD4.2 was replaced by a pose with a lower CHI energy score; however, the newly
top-ranked pose still had a high PRMSD. Even though rescoring did not result in
correctly docked lowest energy poses in either of these cases, it improved the ranking of
the lowest PRMSD structures (PRMSDs < 1 Å) from 18th to 9th in AD3, and from 13th to
2nd in AD4.2 (S4.10). It should also be noted that the second lowest energy pose in AD3
(PRMSD = 1 Å) retained its ranking after rescoring. In contrast, the relatively high CHI
energy score of the lowest energy pose from ADV led to this pose being replaced, after
rescoring, by a correctly docked structure with a lower CHI energy score (Figure 4.11b,
S4.9, S4.10).
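The rank changes quoted above (for example, 18th to 9th in AD3) track where the lowest-PRMSD pose sits in the energy-sorted list before and after rescoring. A minimal sketch of that bookkeeping, with hypothetical field names and toy values:

```python
from typing import Callable, NamedTuple, Sequence


class DockedPose(NamedTuple):
    docked_energy: float   # kcal/mol
    chi_total: float       # kcal/mol, summed CHI penalty
    prmsd: float           # Å, positional RMSD to the crystal ligand


def rank_of_best_pose(poses: Sequence[DockedPose],
                      energy: Callable[[DockedPose], float]) -> int:
    """1-based energy rank of the pose with the lowest PRMSD."""
    # Work with indices so that poses with identical fields are not confused
    order = sorted(range(len(poses)), key=lambda i: energy(poses[i]))
    best = min(range(len(poses)), key=lambda i: poses[i].prmsd)
    return order.index(best) + 1


if __name__ == "__main__":
    poses = [DockedPose(-10.0, 6.0, 5.5), DockedPose(-9.5, 0.4, 1.0),
             DockedPose(-9.8, 3.0, 4.0)]
    before = rank_of_best_pose(poses, lambda p: p.docked_energy)
    after = rank_of_best_pose(poses, lambda p: p.docked_energy + p.chi_total)
    print(before, after)
```

In the toy data, the best pose (PRMSD 1.0 Å) moves from third to first once its low CHI penalty is weighed against the higher penalties of the other two poses.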
The ligand in 1MFD is a branched trisaccharide composed of mannose (Man),
galactose (Gal), and abequose (Abe). Abe is an analog of Gal (3,6-dideoxyGal) and is
the anchoring residue for the trisaccharide in the crystal structure³² (Figure 4.11c). An
examination of the docking results indicated that all three docking programs consistently
generated better scores for poses in which the Gal residue replaces Abe in the binding site
(Figure 4.11d), with little increase in the SRMSD for the incorrect pose. That is, the
trisaccharide can fit equally well into the binding site in either of two orientations,
related by an effective 180° flip. The theoretical preference for Gal in the binding site
appears to be a consequence of its ability to make additional hydrogen bonds with the
protein relative to the more hydrophobic Abe. This observation suggests that the balance
between contributions from hydrogen bonding versus hydrophobic interactions is
imperfect in these docking algorithms. In addition, the 1MFD crystal structure reveals
the presence of a water molecule within the binding pocket, mediating hydrogen bond
interactions between the antibody and the ligand’s Abe residue. Given that explicit
waters are not generally included in docking studies, the algorithms may be
compensating for their absence by placing the more polar Gal inside the binding pocket.
This conclusion is supported by the observation that one of the hydroxyl groups of the
Gal residue (O-4) occupies a position in close proximity to this water molecule (PDB
residue name: WAT 601) originally found in the crystal complex (Figure 4.11d).
The flipping of the carbohydrate ligand observed in 1MFD was not observed for its scFv
counterpart (1MFA); instead, all three lowest energy poses (AD3, AD4.2, ADV) for
1MFA had orientations similar to that of the crystal ligand (PRMSDs < 2 Å). Since the
ligands docked to both antibodies are identical, we can infer that the two binding sites
are not identical (Table 1). To better understand the difference between the two binding
pockets, their volumes were calculated using Fpocket;¹²² the volume of the 1MFA
binding pocket was calculated to be 423.01 Å³, while that of 1MFD was 582.51 Å³. The
1MFD binding pocket, being approximately 160 Å³ larger, can accommodate the flipped
orientation of the Gal residue, whereas the smaller 1MFA binding pocket is less
accommodating of this ligand orientation owing to possible steric clashes. This potential
steric clash was confirmed by superimposing the coordinates of the Gal residue onto
those of Abe in 1MFA (Figure 4.11e, f).
Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).
(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared to the
crystal ligand (black); PRMSD = 5.5Å. (b) Lowest energy pose from ADV for 1MFD
after rescoring (white) compared to the crystal ligand (black); PRMSD = 1.0Å. (c) and
(d) show the 1MFD antibody in transparent surface representation along with the oxygen
atom belonging to the water molecule from the crystallographic co-complex, WAT 601;
in (c) the crystal ligand from 1MFD is shown in CPK representation, and (d) shows the
lowest energy pose from ADV for 1MFD before rescoring (in CPK representation), with
the Gal residue replacing Abe within the binding pocket. (e) The Gal residue from the
ligand in 1MFD (in van der Waals representation), superimposed onto the Abe residue
from the ligand in 1MFA, shown within the 1MFA binding site. A cross-section of the
1MFA antibody is represented as a transparent surface
with potential steric clashes visible between the Gal residue and the antibody. (f) Same as
(e) but with the 1MFA antibody represented as an opaque surface thus more clearly
depicting potential steric clashes between the O-3 and O-6 groups of the Gal residue and
the interior of the binding pocket.
The known challenge associated with docking large, flexible molecules using
AD4.2¹⁰⁸,¹²³ was encountered with the linear pentasaccharide ligand in 1M7I. None of
the 100 poses was correctly docked (all PRMSDs > 2 Å); the lowest energy pose had a
PRMSD of 3.9 Å and a CHI energy of 18.5 kcal/mol (Figure 4.12a). After rescoring, the
lowest energy pose had a considerably improved CHI energy score of 4.3 kcal/mol;
however, it still had a high PRMSD of 10.7 Å (Figure 4.12b). It has been suggested that
the number of rotatable bonds be limited to a maximum of 10 when employing AD4.2.¹²³
The ligand in 1M7I has nearly double that number at 19, making this quite a challenging
system to dock using AD4.2. In AD3, although only 4 of the 100 output poses were
correctly docked, they occupied the top 4 ranks both before and after rescoring. In ADV,
7 of the 100 output poses were correctly docked, of which 5 were amongst the 8
top-ranked poses both before and after rescoring. Although both AD3 and ADV appear to
have had difficulty in finding the correct pose for the pentasaccharide, whenever such
poses were found, both programs scored them favorably. As these poses also had low
SRMSD values, they were identified as the lowest energy poses after rescoring.
Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose
before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9Å. (b)
Lowest energy pose after rescoring (white) compared to the crystal ligand (black);
PRMSD = 10.7Å.
Conclusions
This study addresses a major challenge encountered in flexible carbohydrate docking by
developing intrinsic energy terms for carbohydrates that quantify the relative energy of
their glycosidic torsion angles. In 7 of the 18 cases (6 systems × AD3/AD4.2/ADV), the
lowest energy poses generated by the docking programs had PRMSDs > 2 Å. After
rescoring using the CHI energy functions, the PRMSDs improved in 4 of these 7 cases,
with correctly docked poses (PRMSDs ≤ 2 Å) replacing incorrect poses, increasing the
total count of correctly docked lowest energy poses to 15 out of 18. Rescoring also led to
lowest energy poses with SRMSDs ≤ 1 Å in 16 out of 18 cases, and SRMSDs ≤ 1.5 Å in
the two remaining cases. Among the three docking programs employed in this study,
ADV was the most successful in producing and appropriately ranking the correct ligand
pose, with a success rate of 83% (5 of 6 systems) before rescoring and 100% after
rescoring. Including the CHI energy term when rescoring docked poses enabled the
filtering of poses based on their conformations, increasing the chances of finding the
correct pose amongst all output poses generated.
In most docking applications, locating the correctly docked pose amongst the
numerous output poses largely depends on the ranking of these poses based on their
energy scores. The CHI energy functions may in principle be used in the assessment of
carbohydrate structures obtained from any theoretical or experimental method. By
favoring energetically reasonable ligand conformations, the CHI energies substantially
improve the pose ranking for structures obtained from docking algorithms, making the
rescored energy a better predictor of the quality of the docked pose. This improvement
was observed across all three programs, indicating that the CHI energy functions may be
employed independently of the scoring functions. The CHI energy functions could also
be incorporated directly within docking programs as a component of the scoring function,
although that might require a reoptimization of the scoring functions. Application to
crystallographic data leads to the conclusion that proteins primarily recognize low-energy
conformations of carbohydrates. This final observation has considerable relevance to the
design of carbohydrate-based inhibitors and vaccines.
Individual Author Contributions
Anita K. Nivedha: Authored portions of the paper and prepared figures for the paper;
designed docking protocols and the antibody alignment algorithm; performed the
dockings; developed the CHI energy functions and applied the functions to docking
results; provided tools for analysis, analyzed and interpreted the data.
Spandana Makeneni: Authored portions of the paper; co-designed docking protocols and
the antibody alignment algorithm; performed binding site volume calculations and
provided tools for the analysis of data.
B. Lachele Foley: Contributed to the design of the antibody alignment algorithm and the
development of the CHI energy functions.
Matthew B. Tessier: Contributed to the design of preliminary docking protocols,
provided PREP files for the non-standard sugar residues, and scripts for the collection of
quantum mechanical data.
Robert J. Woods: Authored the paper; conceived and designed the experiment, and
contributed to the analysis and interpretation of data.
CARBOHYDRATE DOCKING
_____________________________
A. K. Nivedha, D. F. Thieker, R. J. Woods. Accepted by J. Chem. Theory Comput.
Reprinted here with permission of publisher.
Abstract
Docking programs are primarily designed to dock rigid, drug-like fragments onto
macromolecules, and frequently encounter issues predicting more flexible carbohydrate
molecules. The primary source of flexibility within a carbohydrate is the glycosidic
linkage. Previous efforts have developed Carbohydrate Intrinsic (CHI) energy functions
that reflect glycosidic torsion angle preferences. The following work represents the