CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND

CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING

by

Anita Karen Nivedha

(Under the direction of Robert J. Woods)

ABSTRACT

Carbohydrates play a pivotal role in various life processes including energy metabolism,

storage, immune recognition, transportation, signaling and biosynthesis. In these roles,

they often interact with other integral components of the living system such as proteins

and lipids. An understanding of how these molecules interact can further our knowledge

of crucial biological processes, and begins with the knowledge of the three-dimensional

structures of these complexes. However, owing to challenges involved in crystallizing

oligosaccharide structures, theoretical modeling methods such as molecular docking are

often used to predict how oligosaccharides interact with protein receptors. Docking

programs, however, rely on generalized scoring functions that often produce unnatural

oligosaccharide conformations. In this thesis, we present two approaches

to improve protein-carbohydrate docking by accounting for specific intra- and

intermolecular interaction energies relating to carbohydrates, which are not currently

dealt with by existing docking methodologies. In the first approach, we developed a set of

Carbohydrate Intrinsic (CHI) energy functions in order to account for intramolecular

energies of carbohydrate ligands primarily determined by the conformations of glycosidic

torsion angles connecting individual saccharides. This work resulted in the development

of Vina-Carb (incorporation of the CHI energy functions within the scoring function of

AutoDock Vina), which significantly improved the conformations of oligosaccharide

binding mode predictions. In the second approach, we developed a scoring function by

fitting a mathematical model to data from literature describing the energy contributed by

CH/π interactions. This energy function was used to score the crucial interactions

between CH groups lining the carbohydrate ring and the π electron densities in aromatic

amino acids of interacting proteins. Employing the CH/π interaction energy function to

rescore docked protein-carbohydrate complexes improved the rankings of accurate pose

predictions made by both AutoDock Vina and Vina-Carb. The scoring functions

developed and used in this work are transferable and can therefore be used with other

docking programs and also in the refinement of experimental carbohydrate structures.

INDEX WORDS: AutoDock, AutoDock Vina, Molecular Docking, Protein-Carbohydrate

Docking, Docking Scoring Functions, Internal Energies, Carbohydrate, Carbohydrate

Intrinsic Energy Functions, CHI Energy Functions, Vina-Carb, Antibody, Antigen,

Lectin, Enzyme, Carbohydrate Binding Module, CH/π Interactions



by

Anita Karen Nivedha

B. Tech., Vellore Institute of Technology University, India, 2008

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA

2015

© 2015

Anita Karen Nivedha

All Rights Reserved



Major Professor: Robert J. Woods

Committee: James H. Prestegard

Liming Cai

Donald Evans

Electronic Version Approved:

Suzanne Barbour

Dean of the Graduate School

The University of Georgia

December 2015

DEDICATION

I would like to dedicate this work to my beloved parents, Jenetta and Joshwa.

ACKNOWLEDGEMENTS

Firstly, I would like to acknowledge and extend my gratitude to my major

professor, Dr. Robert J. Woods, for his support, encouragement and guidance, and for giving

me the wonderful opportunity to be a part of the Woods’ Group Family. I would like to

thank my PhD Advisory Committee, Dr. James H. Prestegard, Dr. Liming Cai and Dr.

Donald L. Evans for their valuable advice, insight and suggestions over the years as my

dissertation took shape. I would like to thank colleagues who were directly involved in

my research, Dr. B. Lachele Foley, Dr. Matthew B. Tessier, Dr. Spandana Makeneni and

David F. Thieker. It has been a great learning experience and a pleasure collaborating and

working with each one of you.

I would like to acknowledge the support of my peers in the Woods’ group: Dr.

Arunima Singh, Amika Sood, Dr. Jodi Hadden, Mark Baine, Dr. Xiaocong Wang, Dr.

Keigo Ito, Dr. Oliver Grant, Huimin Hu, Dr. Valerie Murphy, Dr. Mari DeMarco, Mia Ji,

Dr. Elisa Fadda, Dr. Joanne Martin and Dr. Hannah Smith. Matt, thank you for helping

me when I was a newbie in the group, and amongst other things, for teaching me to do

docking, which constitutes a major portion of my dissertation today. Arunima, Amika,

Spandana and Jodi, thank you for being with me through the ups and downs in Graduate

School. Keigo, thank you for helping me with all my QM questions and for your tips on

scientific writing. Mark, thank you for being a huge support during my time in the group

and for all of your efforts in keeping everything around the lab in order.

I am thankful to God for being my Provider and for all of His blessings at every stage

of my life as a graduate student. I would like to acknowledge the unconditional love,

support and encouragement given by Mama and Papa. Thank you for being my greatest

cheerleaders. I would like to extend my heartfelt thanks to Amy, Ashley, Niranjana,

Madison, Jagadish, Cookieday, Adwoa, Ken, Femi, Anna, Ebenezer, Adeline, Savior

Karnik and Manikins, for being there for me, for believing in me, cheering me on and

supporting me throughout Graduate School. I could not have done it without your solid

support.

I would certainly not be where I am if not for all of the wonderful people who have

sown into my life and my career. For them, I am forever grateful.

LIST OF TABLES

Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape
RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the
crystallographic ligands.

Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and
CHI-cutoff.

Table 5.2 PRMSDmin(5) produced by ADV and VC1|2 for the 12 test systems with ligands
containing 1,6-linkages.

Table 5.3 Comparison between ADV and VC1|2 for the apo protein test set.

Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2
before and after rescoring as a function of the CH/π interaction energy
coefficients. The systems are divided into different groups based on the number of
detected CH/π interactions.

LIST OF FIGURES

Figure 2.1 An illustration of the conversion between the chain and ring forms of glucose.

Figure 2.2 A representation of two chair conformations of glucose, namely, ⁴C₁ and ¹C₄.

Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a
galactopyranose (Galp) unit. The D in the name denotes the absolute configuration at
the stereocenter farthest from the anomeric carbon (C5 in hexopyranoses); it does not
indicate the direction in which the molecule rotates plane-polarized light.

Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose
and mannose are C2 epimers.

Figure 3.1 a.) Rigid Docking b.) Flexible Ligand Docking

Figure 3.2 The workflow within the AutoDock Vina algorithm.

Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the
grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green
dot represents the center of the grid box (0,0,11). (b) Aligned orientation of an
antibody antigen-binding fragment (Fab), with respect to the internal reference
axes. The region in red + pink represents the VH domain (CDRs (red) and
framework regions (pink) of the heavy chain) of the antibody, while the region in
blue represents the VL domain (CDRs (dark blue) and framework regions (cyan)
of the light chain). The X-axis for the alignment was defined by a vector passing
through the CoM of the variable light chain (VL domain, which contains the light
chain CDRs and framework sequences), and the CoM of the variable heavy chain
(VH domain). The Z-axis was defined as a vector normal to the X-axis, and
passing through the CoM of the entire variable region, or variable fragment (Fv).
The antibody was then translated so that the CoM of the CDRs was placed at the
origin. The Y-axis was defined as a vector perpendicular to the XZ-plane, and
passing through the origin. The docking grid box was aligned to the internal
coordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as
to optimally encompass the CDR loops, while also permitting adequate volume
for the movement of the ligand during docking. Such a definition enabled the
docking grid box to be consistently aligned with respect to the CDRs.

Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and
SRMSD, respectively, of a representative docked pose with respect to its crystal
ligand. (a) The PRMSD is the RMSD between the ring atoms of a representative
docked structure (white) and the corresponding crystal structure (black). (b) The
SRMSD is the RMSD value obtained after the docked structure (white) is
superimposed on the crystal structure (black).

Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected
linkages, as indicated by the dashed rectangle. Data are presented, in order, for
AD3 (black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing
the experimentally-determined values is highlighted with a light blue outline. The
bin containing the structure with the lowest docked energy is indicated as follows:
AD3, yellow; AD4.2, orange; ADV, green.

Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of
CHI energy functions. The models depicting 1,2-linkages can be used to model
1,4-linkages due to symmetry about the O5 atom.

Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for
models (see Figure 4.4) whose linkages have similar local geometries.

Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion
angle distributions of carbohydrates from experimental co-crystal structures
(histograms).

Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between
SRMSD and docked energies after rescoring, for each of the three docking
programs. Points before rescoring are shown in dark grey and points after
rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked
energy plots of only the overall lowest PRMSD structure for each of the six
antibody systems before (dark grey) and after (light grey) rescoring. The black
rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0
kcal/mol.

Figure 4.8 Graphs showing the distribution of conformations produced by AD3,
AD4.2 and ADV plotted onto the corresponding CHI energy curves for
each of the representative linkage combinations; the curves are offset from each
other by 6 kcal/mol.

Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2
and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the
lowest energy poses for all six systems from all three docking programs, before
(dark grey) and after (light grey) rescoring.

Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to
the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after
inclusion of the CHI energy (white) compared to the crystal ligand (black);
PRMSD = 0.6 Å.

Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).
(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared
to the crystal ligand (black); PRMSD = 5.5 Å. (b) Lowest energy pose from ADV
for 1MFD after rescoring (white) compared to the crystal ligand (black); PRMSD
= 1.0 Å. (c) and (d) show the 1MFD antibody in transparent surface representation
along with the oxygen atom belonging to the water molecule from the
crystallographic co-complex, WAT 601; in (c) the crystal ligand from 1MFD is
shown in CPK representation, and in (d) the lowest energy pose from ADV for
1MFD before rescoring (in CPK representation) showing the Gal residue
replacing Abe within the binding pocket is shown. (e) The Gal residue from the
ligand in 1MFD (in van der Waals representation) after being superimposed onto
the Abe residue from the ligand in 1MFA is shown within the 1MFA binding site.
A cross-section of the 1MFA antibody is represented as a transparent surface with
potential steric clashes visible between the Gal residue and the antibody. (f) Same
as (e) but with the 1MFA antibody represented as an opaque surface thus more
clearly depicting potential steric clashes between the O-3 and O-6 groups of the
Gal residue and the interior of the binding pocket.

Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose
before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9 Å.
(b) Lowest energy pose after rescoring (white) compared to the crystal ligand
(black); PRMSD = 10.7 Å.

Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed
line) and 5 (dotted line) to the original VCΦ|β curve. b.) The effect of applying a
CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2).

Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various
CHI-energy coefficients of VC a.) SRMSDavg amongst the 5 top-ranked poses b.)
PRMSDmin(5).

Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curve to
the distribution of glycosidic linkages in carbohydrate crystal structures in the
PDB. The bottom X-axis and left Y-axis correspond to the histogram which
depicts the distribution of PDB structures, while the top X-axis and right Y-axis
correspond to the CHI-energy curves.

Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test
systems containing one or more 1,6-linkages overlaid against the reference crystal
structure ω angles (red dots) and the corresponding CHI energy curve.

Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue).
b.) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). c.)
The Φ torsion angles of α-sugars from the docked poses of the 3C6S ligand from
both ADV (yellow triangles) and VC1|2 (green squares) plotted on to the CHI
curve. The torsion angles corresponding to the reference are plotted as blue
circles.

Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD)
is depicted in complex with a tetrasaccharide ligand. All amino acids further than
5 Å away from the ligand are colored grey. Those residues within 5 Å are colored
orange if they are cyclic and red if acyclic.

Figure 5.7 a.) Models representing the PRMSDmin(5) produced by docking 1MFC with
ADV (yellow) and VC at CHI1|2 (green). The primary difference between docked
models is a rhamnose ring that is flipped approximately 180 degrees, highlighted
by the orange arrows. b.) Ligands from two crystal structures, 1MFB (blue) and
1MFC (cyan), also differ by the orientation of the RAM 524 ring.

Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the lowest-ranked
pose with PRMSD ≤ 2 Å, produced by ADV and VC1|2 from docking
oligosaccharide ligands onto apo protein structures.

Figure 5.9 a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX,
2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from
PDB ID: 2EQD. Amino acids reported to be involved in substrate binding (N45,
R47, W64, W71, W327, W331, E359, and W392) are colored orange or red,
depending on whether the residue is aromatic or not.146 The catalytic residue
(Q186) is colored yellow. All other amino acids are grey. The active site has been
separated into a (-) and (+) site. The circled values represent the position of each
residue relative to the glycosidic linkage that is cleaved during catalysis. The
ligands exclusive to the (-) side of the active site are depicted by varying shades
of purple. The octasaccharide that extends across both the (-) and (+) site (2EQD)
is colored blue. Each carbohydrate ring is colored according to whether the CHI
energy penalty is applied to the surrounding Φ/Ψ values. Rings are either green or
red depending on whether VC is or is not applied, respectively. b) A
representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2.
c) The glycosidic linkages of the octasaccharide that extends across the active site
(2EQD) are labeled according to the penalty received by the CHI energy curve.
Penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the
(-1) residue since it is neither a ⁴C₁ nor ¹C₄ chair, so the ring is colored red and the
penalties are unlisted.

Figure 6.1 The replacement of the aromatic group in (A) by the aliphatic group in
(B) in the study by Waters et al.154 in an interaction with a tetraacetylglucose
molecule led to a decrease in the interaction energy of the system.

Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic
amino acids, namely, a tryptophan and a tyrosine in the binding pocket of an
antibody Fab fragment (PDB ID: 1MFE).33

Figure 6.3 Representation of CH/π interactions between β-D-glucopyranose (βDGlcp)
and phenylalanine.

Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to
describe the interaction between a CH-group and an aromatic moiety.

Figure 6.5 Detection of CH/π interactions a.) An average position of the coordinates of
the atoms C2, O5 and O1 is determined. In order to find the vector C1H1, the
negative of the vector between points C1 and the average of atom positions C2, O5
and O1 (computed in (a.)) is determined. b.) The distance between the centroid of
the aromatic ring and the plane of the carbohydrate ring delineated by atoms O5,
C2, C3 and C5 is determined, dcenters (≤ 7 Å). c.) The carbon atoms in the
carbohydrate ring are projected onto the aromatic ring plane and the distances
between each of these projections and the centroid of the aromatic ring are
determined, dcp (≤ 2.5 Å). Shown in green are the CH bond vectors pointing
towards the aromatic ring (scored), and shown in red are the CH bond vectors
pointing away from the aromatic ring (not scored).

Figure 6.6 The effect of applying the CH/π interaction to the top-ranked pose produced
by VC1|2 before and after rescoring. Shown in green is the crystal ligand, in white
is the top-ranked pose before rescoring (PRMSD = 5.6 Å) and in blue is the
top-ranked pose after rescoring (PRMSD = 0.9 Å).

Figure 6.7 Model systems used by Ringer et al. to quantify CH/π interactions using
quantum mechanical calculations.

Figure 6.8 a.) The individual interaction energy curves for the models (as described in
Figure 6.7) used by Ringer et al.,155 alongside the average of the individual
curves. b.) The average curve (a) shown alongside the mathematical model used
in the current study.

CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

1. Introduction

2. Carbohydrates: Biological Significance and Structure

3. Computational Methods/Molecular Docking

4. The Importance of Ligand Conformational Energies in Carbohydrate Docking: Sorting the Wheat from the Chaff

Abstract

Introduction

Methods

Results and Discussion

Conclusions

Individual Author Contributions

5. Vina-Carb: Improving Glycosidic Angles During Carbohydrate Docking

Abstract

Introduction

Methods

Results & Discussion

Conclusions

Individual Author Contributions

6. The Consideration of CH/π Interactions in Carbohydrate-Protein Docking

Introduction

Methods

Results and Discussion

Conclusions

Future Directions

7. CONCLUSIONS

8. REFERENCES

9. Appendix

Supplementary Information Chapter 4



1. INTRODUCTION

This dissertation can be sub-divided into the following sections:

1. The comparison of docking programs for carbohydrate docking and the

development of Carbohydrate Intrinsic (CHI) Energy Functions, which describe

the rotational preferences of oligosaccharides about the glycosidic linkage.

2. The development and evaluation of Vina-Carb, formed by incorporating the CHI

energy functions within the scoring function of AutoDock Vina, and comparison

to the original program, AutoDock Vina.

3. The development of a CH/π interaction energy term to score CH/π interactions in

protein-carbohydrate complexes and the application of the function to docked

protein-carbohydrate complexes.

The above topics, along with a literature review of background information and the

computational methods applied in each case, are presented as follows:

CHAPTER 2: CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND

STRUCTURE

Chapter 2 discusses the structure and biological significance of carbohydrates and

protein-carbohydrate interactions.

CHAPTER 3: MOLECULAR DOCKING

Chapter 3 discusses the theory behind molecular docking, a computational method used to
predict intermolecular interactions. It further discusses the challenges associated with
carbohydrate ligands, and specifically describes the AutoDock Vina docking algorithm.
Additionally, this chapter introduces the research described in the following
chapters.

CHAPTER 4: IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN

CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF

Chapter 4 is an original research study in which the performances of various versions of
the popular docking program AutoDock are compared using a set of antibody-carbohydrate
complexes. A set of Carbohydrate Intrinsic (CHI) energy functions is developed to
describe the conformational preferences of the glycosidic linkages constituting
oligosaccharides. The CHI energy functions are then employed to rescore the docked
poses. The results from this study were published as a journal article:

A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem.
2014, 35, 526–539.
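The rescoring idea summarized for Chapter 4 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intramolecular (CHI) energy of each glycosidic linkage is added to the docked energy and that poses are then re-ranked by the sum; the class name, function names and all numerical values below are hypothetical and are not taken from the published study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DockedPose:
    """One docked pose; every value here is an illustrative placeholder."""
    label: str
    docked_energy: float  # kcal/mol, as reported by the docking program
    linkage_energies: List[float] = field(default_factory=list)  # kcal/mol per linkage


def rescored_energy(pose: DockedPose) -> float:
    # Assumption: the intramolecular term is simply added to the docked energy.
    return pose.docked_energy + sum(pose.linkage_energies)


def rerank(poses: List[DockedPose]) -> List[DockedPose]:
    # Lower (more negative) rescored energy ranks first.
    return sorted(poses, key=rescored_energy)


poses = [
    DockedPose("pose_A", -7.5, [4.0, 3.0]),    # favourable docked energy, strained linkages
    DockedPose("pose_B", -7.0, [0.25, 0.25]),  # slightly weaker docked energy, relaxed linkages
]
print([p.label for p in rerank(poses)])
```

In this toy case the pose with strained linkages drops below the relaxed pose once the linkage energies are included, which is the qualitative behaviour that rescoring is meant to produce.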

CHAPTER 5: VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING

CARBOHYDRATE DOCKING

Chapter 5 describes original research in which the CHI energy functions were

incorporated within AutoDock Vina’s scoring function, leading to the development of

Vina-Carb. The performances of Vina-Carb and AutoDock Vina were evaluated using a

set of protein-carbohydrate complexes consisting of antibodies, lectins, carbohydrate

binding modules and enzymes. This work has been accepted for publication.

A. K. Nivedha, D. F. Thieker, R. J. Woods, J. Chem. Theory Comput. 2015

CHAPTER 6: THE CONSIDERATION OF CH/π INTERACTIONS IN

CARBOHYDRATE-PROTEIN DOCKING

Chapter 6 describes original research in which, utilizing available literature, a

mathematical model to score CH/π interactions in protein-carbohydrate complexes has

been developed and employed in rescoring docking results from AutoDock Vina and

Vina-Carb, for a test set consisting of lectin-carbohydrate complexes.
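As a rough illustration of this kind of rescoring, the sketch below uses a 12-6 Lennard-Jones form (the list of figures names a Lennard-Jones potential as the mathematical model) and adds a weighted sum of CH/π energies to a docked energy. The well depth, the optimal distance, the coefficient, the choice of distance measure and the function names are all assumptions made for the example, not the fitted parameters or code of this work.

```python
from typing import Iterable


def lennard_jones(r: float, well_depth: float, r_min: float) -> float:
    """12-6 Lennard-Jones energy with its minimum (-well_depth) at r = r_min."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


def rescore_with_ch_pi(
    docked_energy: float,
    ch_pi_distances: Iterable[float],
    coefficient: float = 1.0,   # hypothetical weight on the CH/pi term
    well_depth: float = 1.0,    # hypothetical, kcal/mol
    r_min: float = 3.7,         # hypothetical, in Angstrom
) -> float:
    # Assumption: each detected CH/pi contact contributes one Lennard-Jones term,
    # and the weighted sum is added to the docked energy.
    ch_pi = sum(lennard_jones(r, well_depth, r_min) for r in ch_pi_distances)
    return docked_energy + coefficient * ch_pi


# Two contacts at the assumed optimal distance, weighted by 0.5.
print(rescore_with_ch_pi(-5.0, [3.7, 3.7], coefficient=0.5))
```

With this functional form, contacts near the optimal distance lower the score of a pose, while contacts that are too close are penalized by the repulsive term.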

CHAPTER 7: CONCLUSIONS AND FUTURE DIRECTIONS

Chapter 7 summarizes the main conclusions from the preceding chapters and discusses

future directions.

2. CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND STRUCTURE

Carbohydrates play a central role in energy metabolism, biological recognition
and as structural components in living organisms.1-6 Carbohydrate-binding proteins
are required for transportation, degradation, biosynthesis, storage, antigen-binding and
signaling.7,8 Carbohydrates may exist as freestanding entities or covalently linked to
macromolecules such as proteins (glycoproteins) and lipids (glycolipids). These
glycoconjugates are frequently found attached to the outer cell surfaces, where they are
conveniently positioned to modulate interactions between various components of the
living system by mediating cell-cell and cell-molecule interactions.9 When
oligosaccharides are organized in the form of glycoconjugates, the mere size of the
attached oligosaccharides influences the interactions of the glycoconjugates with other
molecules. For example, N-glycosylation and O-glycosylation are common
post-translational modifications of proteins,10-14 which protect the protein from
degradation and assist in intracellular trafficking and secretion.2 Aberrant
glycosylation is often a hallmark of diseases such as rheumatoid arthritis15-19
and cancer.20-23

Many carbohydrate-based host-pathogen interactions are currently known.24
Surface polysaccharides are the most common structures found on the outer surfaces of
bacterial cells.25,26 In Gram-negative bacteria, carbohydrates are found constituting the
lipopolysaccharides, lipooligosaccharides or capsular polysaccharides.27 The conjugation
of a polysaccharide to a carrier protein has resulted in the production of commercially
available vaccines such as those against Haemophilus influenzae28 and Streptococcus
pneumoniae.29 Many bacterial and viral pathogens bind to host tissue via interactions
with carbohydrates on the surfaces of the host cell. Antibodies contain glycans as part of
their structure and some antibodies are reactive against sugars found on cell surfaces of
bacteria such as Shigella and Salmonella.30-35

Of the four major classes of macromolecules found in living organisms, namely,

nucleic acids, proteins, carbohydrates and lipids, carbohydrates are the most structurally

diverse. 36

They are primarily defined as polyhydroxyaldehydes or polyhydroxyketones,

and in their simplest form exist as monosaccharides, which combine with each other via

glycosidic linkages forming oligosaccharides. Monosaccharides can exist in both the

open chain and ring forms. When the chain-form of the monosaccharide has a carbonyl

group (C=O) on one end which forms an aldehyde, it is called an aldose, whereas if this

carbonyl group is in the middle forming a ketone, it is referred to as a ketose. The ring

form of a monosaccharide, which is the preferred form in aqueous solutions and in

oligosaccharides, is formed when the oxygen on C5, i.e., O5 links with the carbon

comprising the carbonyl group (C1), transferring its hydrogen to the carbonyl oxygen

forming a hydroxyl group. This forms a chiral anomeric center at C1. The oxygen at C1

(O1) can be either axial or equatorial with respect to the carbohydrate ring. This

electronegative O1 atom prefers to adopt the axial orientation due to

stereoelectronic effects, instead of the less hindered equatorial orientation which would

be expected based on steric effects alone. This is known as

the anomeric, or more accurately, the endo-anomeric effect.

Figure 2.1 An illustration of the interconversion between the chain and ring forms of glucose.

Monosaccharides forming a five-membered ring are called furanoses and those

which form a six-membered ring are called pyranoses. Similar to cyclohexanes, six-membered

monosaccharides exist most often in one of two isomeric chair conformations,

which are specified as 1C4 and 4C1, where the letter C stands for ‘chair’ and the leading and

trailing numbers indicate the carbon atoms above and below, respectively, the reference plane

of the chair conformation formed by the atoms C2, C3, C5 and O5 (Figure 2.2).

[Figure 2.1 labels: chain form of glucose; anomeric carbon; α-glucopyranose; β-glucopyranose]

Figure 2.2 A representation of the two chair conformations of glucose, namely, 4C1 and

1C4.

The individual units constituting proteins and nucleic acids are generally

connected in a linear fashion by a single type of linkage, namely, the amide linkage

between amino acids in proteins and the 3’ to 5’ phosphodiester bonds in nucleic acids. 37

Oligosaccharides, however, can be linear or branched, and each monosaccharide unit can

be linked to another via a glycosidic linkage which can be of different types depending on

the stereochemistry of the C1 atom on the non-reducing sugar and that of the linking atom

on the reducing sugar. A disaccharide is formed when two monosaccharides combine via

a condensation reaction, resulting in the release of a water molecule and the formation of

a glycosidic bond. The formation of a glycosidic linkage results in the formation of a

reducing sugar on one end and a non-reducing sugar on the other.

Figure 2.3 Formation of a 1-3 glycosidic linkage between a glucopyranose (Glcp) unit and a

galactopyranose (Galp) unit. The D in the name denotes the configuration at the stereocenter

farthest from the anomeric carbon (C5 in hexoses), not the direction in which the molecule

rotates plane-polarized light.

Different kinds of sugars exist in nature and the main difference between most

saccharides is in the orientation of the hydroxyl groups with respect to the plane of the

carbohydrate ring, resulting in significant differences in the physical and chemical

properties of the sugars. Glucose and mannose are C2-epimers while glucose and

galactose are C4-epimers. (Figure 2.4) These hexoses have the molecular formula

C6H12O6. The stereoisomers for these aldohexoses were identified by the German chemist

Emil Fischer in the late 19th century. 38

Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose

and mannose are C2 epimers.

[Figure 2.3 labels: βDGlcp; βDGalp; βDGlc1-3βDGal; H2O]

[Figure 2.4 labels: βDGalp; βDGlcp; βDManp]

The three-dimensional structures of carbohydrates are greatly influenced by the

conformations of the glycosidic linkages connecting individual monosaccharide units.

The lone pair of electrons on the O5 atom of the sugar ring has a significant effect on the

conformational stability and orientation of the glycosidic linkage. 39,40

The anomeric

effect is observed in saccharides, due to which the electronegative substituent at the C1

position tends to adopt the axial orientation rather than the equatorial orientation in

contrast with expectations based solely on sterics. 41-46

From previous work analyzing the preferences of glycosidic bonds, it is clear that

carbohydrates strongly prefer a single rotamer for both the Φ and Ψ torsion angles. The preferred

range of values is broader for the Ψ angle than for the Φ angle. It

is also known that some proteins distort the carbohydrate ring shapes, and consequently

the glycosidic linkages, upon binding. However, a survey of the PDB for protein-carbohydrate

crystal complexes in which the oligosaccharide is bound to enzymes or to other

proteins such as lectins and antibodies revealed that the distortion of glycosidic linkages by

binding partners of carbohydrates is a rare occurrence. 47,48

Carbohydrate-Protein Complexes

Proteins that bind to carbohydrates have a great diversity of binding site

topologies and functions, and include enzymes, lectins, antibodies and periplasmic

receptors. 49

Complex formation is driven primarily by hydrogen bonding, van der Waals

contacts, and hydrophobic interactions. 50

Whereas the first contributes to specificity 51

by virtue of the directionality of the hydroxyl groups, the latter two contribute to

affinity through non-specific interactions. 52

Being highly polar molecules, sugars are

highly solvated in an aqueous solution. The hydroxyl groups in a sugar molecule are

involved in cooperative hydrogen bonds, bidentate hydrogen bonds and hydrogen

bonding networks. 53

Each hydroxyl group in a saccharide can engage in two kinds of

hydrogen bonds: as a donor of one hydrogen bond and as an acceptor of two through its sp3

lone pairs. When the sugar hydroxyl group is a donor, the hydrogen bonds formed are

shorter and stronger than those formed when the sugar hydroxyl group is an acceptor. 54

In

cooperative hydrogen bonds, the hydroxyl group in the sugar acts as both a donor and

acceptor of hydrogen bonds. A bidentate hydrogen bond is formed when two adjacent

hydroxyl groups in a 4C1 sugar each interact with a different atom of the same planar polar

side chain. The presence of both cooperative and bidentate hydrogen bonds leads

to the creation of networks of hydrogen bonds between the sugars and interacting amino

acids. When these planar polar residues in turn hydrogen bond with nearby polar residues,

a more elaborate hydrogen bond network is formed. The hydrogen bonds

formed as a result are strong enough to stabilize the complex but are also weak enough to

accommodate ligand dynamics. Amino acids with polar planar side-chain groups, capable

of forming all three kinds of hydrogen bonds, such as Glu, Gln, Asp, Asn, Arg and His,

are abundant in the binding sites of sugars. 51

Van der Waals interactions make a significant contribution to protein-

carbohydrate complex-formation, in addition to contributions from other interactions

such as the stacking of the hydrophobic patches of carbohydrate rings against aromatic

amino acids lining the binding site. An analysis of protein-carbohydrate complexes in the

PDB has revealed that carbohydrate binding sites have a higher propensity for aromatic

amino acids namely, tryptophan, tyrosine, phenylalanine and histidine compared to the

rest of the protein. 55-57

The presence of aromatic amino acids in the sugar binding site

also contributes to specificity by allowing or disallowing particular sugar epimers

through the combination of steric hindrance and a favorable or unfavorable polar

environment. 58

A wealth of information can be gained from an understanding of the structure and

dynamics of protein-carbohydrate interactions. However, carbohydrates are extremely

flexible molecules, 59

making protein-carbohydrate complexes particularly challenging to

crystallize. As a result, computational methods such as molecular docking and molecular

dynamics simulations can be employed to gain insight into the physical and biochemical

properties of carbohydrate molecules, both free in solution and in complex with proteins.

The knowledge thus gained has various applications including gene therapy and the

design of carbohydrate-based biotherapeutic agents.

3. COMPUTATIONAL METHODS/MOLECULAR DOCKING

A detailed understanding of the three-dimensional structure and subsequently the

function of carbohydrates is vital in increasing our understanding of crucial biological

processes. However, obtaining experimental 3D structures of carbohydrates is a

challenge, 60

and as a result, theoretical modeling methods can be employed to aid in

understanding the relationship between the structure and function of oligosaccharides.

Molecular docking and molecular dynamics simulations are key computational

approaches used in the study of carbohydrate molecules. In this chapter we will focus on

molecular docking methodologies, specifically in relation to oligosaccharide ligands.

Molecular docking predicts the binding orientation and affinity of a small molecule

(ligand), with respect to a larger molecule (macromolecule). The area around the

predicted ligand binding site on the macromolecule is specified using a gridbox. The two

main steps in docking are searching and scoring. The search algorithm searches the

available conformational space for favorable binding modes of the ligand with respect to

the macromolecule, while the docking scoring function evaluates each pose generated by

the algorithm. During docking, a compromise between speed and effectiveness in

sampling the conformational space available has to be made. The program typically

produces several models at the end of a docking run, which are then ranked based on

calculated binding affinities.

There are different approaches to docking, such as rigid docking and flexible docking

(Figure 3.1). When all torsion angles are frozen during docking, it is termed rigid

docking. During flexible docking, some if not all of these parameters are allowed to vary.

If upon complex-formation significant conformational change occurs in either the protein

or ligand or in both molecules, rigid docking is inadequate to model such a binding event.

In such cases, flexible docking should be the method of choice, which allows for induced

fit during complex formation. The level of computational complexity allowed during a

docking run can be set by the user, by adjusting the level of flexibility of the ligand and

macromolecule. Proteins can often be docked rigidly because a comparison of experimental

protein-ligand complexes to their unbound counterparts has revealed that in most cases,

only a few side-chains in the active site of the protein change conformation.

[Figure 3.1 panel labels: Macromolecule; Ligand; Gridbox; Docked Complexes (1, 2, ..., n) Ranked according to Binding Affinities]

Figure 3.1 a.) Rigid Docking b.) Flexible Ligand Docking

A scoring function is better suited to assessing protein-ligand

complementarity than to calculating binding affinity, as even non-binding ligands can

be docked and given a binding affinity score using molecular docking. However, docking

has proved to be an indispensable computational tool which helps in obtaining a 3D

starting structure for a bound protein-ligand complex, which could not be obtained

experimentally. It also helps to assess the binding of multiple small molecules against a

single protein target and compare binding affinities. Protein-ligand complementarity is a

prerequisite for binding to occur, but cannot be used as the sole criterion for evaluation.

Docking scoring functions evaluate how well the predicted binding pose of a

ligand complements the protein binding site, and can be empirical or knowledge-based

scoring functions. Empirical scoring functions operate on the assumption that binding

affinities can be evaluated by the summation of independent interaction energy terms,

which in most cases is a weighted sum of electrostatics, hydrogen bonding, hydrophobic

interaction and repulsion terms. The coefficients for the individual terms of the scoring

function are derived by fitting to experimentally determined Ki values of protein-ligand

complexes with solved crystal structures. In general, these scoring functions suffer from a

significant dependence on ligand size, i.e., the larger the docked ligand, the

better the calculated binding affinity. Knowledge-based scoring functions are derived

by performing a statistical analysis of experimentally-determined protein-ligand

complexes based on the assumption that if certain contacts occur at a statistically

significant rate, they must be favorable, and vice versa.
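
The weighted-sum form of an empirical scoring function, and its dependence on ligand size, can be illustrated with a minimal Python sketch. The term values and weights below are hypothetical and chosen only for illustration; they do not correspond to the parameters of any particular docking program.

```python
from dataclasses import dataclass


@dataclass
class InteractionTerms:
    """Per-pose interaction terms (arbitrary units, hypothetical values)."""
    electrostatics: float
    hydrogen_bonding: float
    hydrophobic: float
    repulsion: float


# Hypothetical weights; real programs fit these to experimental Ki data.
WEIGHTS = {
    "electrostatics": 0.1,
    "hydrogen_bonding": 0.5,
    "hydrophobic": 0.3,
    "repulsion": 1.0,
}


def empirical_score(terms: InteractionTerms) -> float:
    """Weighted sum of independent terms; more negative is more favorable."""
    return (
        WEIGHTS["electrostatics"] * terms.electrostatics
        + WEIGHTS["hydrogen_bonding"] * terms.hydrogen_bonding
        + WEIGHTS["hydrophobic"] * terms.hydrophobic
        + WEIGHTS["repulsion"] * terms.repulsion
    )


small = InteractionTerms(-1.0, -4.0, -2.0, 0.5)
# A ligand twice the size, making twice as many contacts of each kind.
large = InteractionTerms(-2.0, -8.0, -4.0, 1.0)

print(empirical_score(small), empirical_score(large))
```

Because every term scales with the number of contacts, the larger ligand receives the more favorable score regardless of whether it is the better binder, which is the size dependence noted above.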

Several parameters affect the performance of the docking scoring function,

including the physical and chemical properties of input molecules, the preparation of the

input and the individual terms of the docking scoring function. Docking scoring functions

are usually developed for the purpose of high-throughput virtual screening of relatively

small, rigid, drug-like molecules. In this thesis, we will study the performance of such

docking methodologies with respect to carbohydrate ligands, which are larger, more

flexible molecules ranging from a disaccharide to a dodecasaccharide connected by 1,x-

linkages (x = 2, 3, 4 or 6). Applying these generalized docking scoring functions to

carbohydrate docking usually leads to an unfavorable deviation of the carbohydrate

ligands from their natural conformations. It may be useful to customize docking scoring

functions to specifically dock carbohydrate ligands.

The glycosidic torsion angles connecting individual monosaccharide units have a

major influence on the overall conformation of an oligosaccharide ligand. Although these

linkages are generally flexible, this flexibility spans a limited range of preferred torsion

angles, which has been identified from a survey of carbohydrate crystal structures in the

PDB. 48

All protein-carbohydrate complexes found in the PDB were included in this

survey which consisted of carbohydrates both covalently and non-covalently interacting

with proteins such as lectins, antibodies, enzymes, carbohydrate binding modules, etc. In

the past, efforts have been made to model the conformational preferences of

carbohydrates into molecular docking; the approaches used include a re-calibration of an

existing docking scoring function to model carbohydrate properties, the inclusion of

additional interaction energy terms in the scoring function which are crucial to protein-

carbohydrate binding and the inclusion of a carbohydrate conformational energy score to

an existing docking scoring function.

In this thesis, the performances of a few docking programs are evaluated and

compared using a set of antibody-carbohydrate complexes with solved X-ray crystal

structures from the PDB. A standardized docking protocol for docking oligosaccharide

ligands onto antibodies has also been described. A set of energy functions which

calculate the conformational energies of carbohydrates has been derived using quantum

mechanical methods. These carbohydrate internal energy functions, known as

Carbohydrate Intrinsic (CHI) energy functions, score a disaccharide molecule based on

the orientations of the glycosidic torsion angles. The CHI energies were then added to

docked energies, showing a significant improvement in the ranking of accurate binding

poses. Finally, the CHI energy functions were incorporated into the scoring function of

AutoDock Vina, leading to the development of Vina-Carb. The

performance of Vina-Carb was evaluated against a set of 72 protein-carbohydrate

complexes with solved crystallographic structures from the PDB, and compared to the

performance of the original docking program without the CHI energy functions,

AutoDock Vina.
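
The re-ranking step described above, in which CHI energies are added to the docked energies, can be sketched as follows. The pose names, docking scores and CHI energies are hypothetical values chosen for illustration, and a simple unweighted sum is assumed.

```python
def rerank_with_chi(poses):
    """Sort poses by docking score plus CHI energy (both in kcal/mol).

    `poses` is a list of (name, docking_score, chi_energy) tuples, where
    chi_energy >= 0 is the penalty for distorted glycosidic linkages.
    """
    return sorted(poses, key=lambda p: p[1] + p[2])


# Hypothetical values for three docked poses.
poses = [
    ("pose_A", -7.9, 6.0),  # best docking score, but strained linkages
    ("pose_B", -7.2, 0.4),
    ("pose_C", -6.5, 0.1),
]

reranked = [name for name, _, _ in rerank_with_chi(poses)]
print(reranked)
```

In this toy example the pose with the best raw docking score drops to last place once the cost of its distorted linkages is included.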

For each AutoDock Vina docking job, multiple runs are started from random

conformations. The number of individual runs is determined by the exhaustiveness

parameter, which can be set by the user. Each run consists of a set of sequential steps,

the number of which is determined heuristically based on the number of flexible bonds in the system

under study. Each step consists of 3 stages, namely a random perturbation of the system,

followed by a local optimization using the Broyden-Fletcher-Goldfarb-Shanno algorithm

and a selection step in which the step is either accepted or not. Each local optimization

involves numerous evaluations of the docking scoring function; the number of evaluations is decided based on

convergence and other criteria. Each run can produce multiple promising results, which

are stored, and finally merged, clustered and sorted to produce the final result of docked

poses. (Figure 3.2)
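
The run and step structure outlined above can be sketched in Python as follows. This is a toy illustration, not the AutoDock Vina implementation: the one-dimensional score function is invented, a crude gradient descent stands in for the BFGS local optimizer, a Metropolis-style rule is assumed for the selection stage, and the number of steps is fixed rather than chosen heuristically. Merging is reduced to sorting; refinement and clustering are omitted.

```python
import math
import random


def score(x):
    """Toy one-dimensional 'scoring function' with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_optimize(x, step=0.01, iters=200, h=1e-5):
    """Crude gradient descent, standing in for the BFGS local optimizer."""
    for _ in range(iters):
        grad = (score(x + h) - score(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def run(rng, n_steps=50, temperature=1.0):
    """One run: repeated perturbation, local optimization and selection."""
    x = local_optimize(rng.uniform(-5.0, 5.0))  # random starting conformation
    best = x
    for _ in range(n_steps):
        candidate = local_optimize(x + rng.uniform(-2.0, 2.0))
        delta = score(candidate) - score(x)
        # Metropolis-style selection (an assumption about the acceptance rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        if score(x) < score(best):
            best = x
    return best


def dock(exhaustiveness=8, seed=0):
    """Several independent runs; results are merged and sorted by score."""
    rng = random.Random(seed)
    results = [run(rng) for _ in range(exhaustiveness)]
    return sorted(results, key=score)


ranked = dock()
print(len(ranked), score(ranked[0]))
```

Here `exhaustiveness` plays the role described in the text, setting the number of independent runs whose results are pooled before the final ranking.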

Figure 3.2 The workflow within the AutoDock Vina algorithm.

[Figure 3.2 labels: AutoDock Vina: Run R1, Run R2, ..., Run RN. Each Run, Ri: Step S1, Step S2, ..., Step SN. Each Step, Si: Random Perturbation; Local Optimization (BFGS), with Evaluations of Scoring Function; Selection. Merged. Refined. Clustered. Sorted. Final Result]

4. THE IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN

CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF

_____________________________

A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem.

2014, 35, 526–539. Reprinted here with the permission of the publisher.

Abstract

Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced

accuracy because they are unable to incorporate ligand-specific conformational energies.

Here, we develop internal energy functions, Carbohydrate Intrinsic (CHI), to account for

the rotational preferences of the glycosidic torsion angles in carbohydrates. The relative

energies predicted by the CHI energy functions mirror the conformational distributions of

glycosidic linkages determined from a survey of oligosaccharide-protein complexes in

the Protein Data Bank. Addition of CHI energies to the standard docking scores in

AutoDock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides

docked to a set of anti-carbohydrate antibodies. The CHI energy functions are also

independent of docking algorithm, and with minor modifications, may be incorporated

into both theoretical modeling methods, and experimental NMR or X-ray structure

refinement programs.

Introduction

Protein-carbohydrate interactions are crucial in numerous aspects of biology, including

metabolism, gene expression, cell-cell communication, growth, development, and

immune response. 9 In vivo, complex carbohydrates (glycans) are found on cell surfaces

as glycoconjugates (glycoproteins/glycolipids) or polysaccharides, mediating biological

function by their direct interaction with proteins, such as receptors (lectins), enzymes,

and antibodies. Cancer is marked by aberrant glycosylation which can serve as a

disease-related marker, or as a target for therapeutic intervention. 22,61-63

Conversely, endogenous

cell-surface glycans are frequently exploited by infectious agents, as in the

hemagglutinin-mediated adhesion of influenza A virus. 64-66

A physical understanding of

carbohydrate-protein interactions aids in the development of therapeutic agents designed

to block such interactions, 67-70

such as antibodies which target specific glycans. 71,72

A

better understanding of the immune system’s response to carbohydrate-based vaccines 73-76

facilitates the prediction and rationalization 71 of hazardous or misleading

cross-reactivities between antibodies against disease-related carbohydrates and endogenous

glycans. 77,78

The challenges involved in obtaining co-complexed carbohydrate-protein structures using

experimental methods such as X-ray crystallography and NMR spectroscopy include

production and purification of the protein, isolation or synthesis of the glycan, and

co-crystallization of the complex. 60

Therefore, there is a long-standing interest in applying

theoretical modeling methods (automated docking) to aid in the characterization of the

3D structure of carbohydrate-protein complexes. 71,79-84

However, these methods also

have limitations. Automated docking faces the triple challenge of accurately predicting 1)

the ligand orientation in the binding site (pose); 2) the ligand conformation in the binding

site (shape); and 3) the relative affinity of the optimal pose (interaction energy). Ligand

internal energies are only approximately modeled within docking algorithms by mainly

considering energies associated with internal steric repulsion. Such an approximation

inherently degrades the accuracy of docking predictions as various ligand classes have

specific conformational properties. The glycosidic torsion angles between individual

monosaccharides forming glycans are crucial in defining their 3D structure and

dynamics. The accurate prediction of oligosaccharide conformations requires the

additional consideration of stereo-electronic properties responsible for the anomeric,

exo-anomeric, and gauche effects. 85

Their omission frequently leads to the incorrect

prediction of docked oligosaccharide conformations. 86-88

Docking programs treat interaction energy terms as empirically-adjustable components,

which may be tuned for a particular ligand class, such as carbohydrates. 89

Inclusion of

carbohydrate conformational energies in the docking energy function would likely

require reoptimization of the empirical weighting resulting in a non-transferable

carbohydrate-specific implementation of the algorithm. Instead, we sought to

develop a carbohydrate-specific conformational energy function which predicts

oligosaccharide energies independent of docking algorithm, and could potentially also be

employed to evaluate the conformational energies of experimentally-determined

oligosaccharide structures. We focused on modeling conformational properties intrinsic

to glycosidic linkages between pyranoses, with the criterion that the method should also

be generalizable to other carbohydrate ring forms, such as furanoses, as well as to other

linkages, such as 1-6, 2-3, 2-6, etc. Tetrahydropyran, and related analogs, have long been

employed as representative carbohydrates in quantum mechanical calculations for this

purpose. 90-97

The underlying assumption is that any additional effects on the conformational

properties, for example from hydrogen bonding, overlay the intrinsic properties of the

linkages between pyran rings. Quantum mechanical calculations were employed on a set

of glycosidically-linked tetrahydropyrans representing all two-bond linkages between

pyranoses. The rotational energy profiles for these linkages were used to derive the

desired carbohydrate intrinsic (CHI) energy functions. Given a 3D oligosaccharide

structure, the CHI energy functions may be employed to estimate the energy arising from

any distortion of the glycosidic linkages, relative to their lowest energy conformations.
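
How such an energy estimate might be evaluated for a given structure can be sketched as follows. The functional form of the CHI energy functions is not reproduced here; the sketch instead assumes, purely for illustration, that each rotational energy profile is available as a table of energies at regular angular increments and is linearly interpolated. The function names, the torsion labels and the cosine-shaped example profile are all hypothetical.

```python
import math


def chi_energy(angle_deg, profile, increment=15.0):
    """Linearly interpolate a periodic torsion energy profile.

    `profile` lists energies (kcal/mol) at 0, increment, 2*increment, ...
    degrees and must cover one full 360-degree rotation. The result is
    reported relative to the lowest energy in the profile.
    """
    n = len(profile)
    if abs(n * increment - 360.0) > 1e-9:
        raise ValueError("profile must span exactly 360 degrees")
    position = (angle_deg % 360.0) / increment
    i = int(position) % n
    frac = position - int(position)
    e_lo, e_hi = profile[i], profile[(i + 1) % n]
    return e_lo + frac * (e_hi - e_lo) - min(profile)


def oligosaccharide_chi_energy(torsions, profiles):
    """Sum the per-torsion energies over all glycosidic torsion angles."""
    return sum(chi_energy(angle, profiles[name]) for name, angle in torsions)


# Hypothetical single-well profile with its minimum at 60 degrees.
example_profile = [
    2.0 * (1.0 - math.cos(math.radians(a - 60.0))) for a in range(0, 360, 15)
]
profiles = {"phi": example_profile, "psi": example_profile}

total = oligosaccharide_chi_energy([("phi", 60.0), ("psi", 180.0)], profiles)
print(round(total, 3))
```

A linkage at its preferred angle contributes nothing to the total, so the sum measures only the energetic cost of distortion, in line with the definition given above.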

Because of the important roles of anti-carbohydrate antibodies in therapeutic and

diagnostic applications, and the challenges associated with experimentally defining their

3D structures, they have been the subject of numerous automated docking studies. 98-104

We chose six crystallographically-determined antibody-carbohydrate complexes to

evaluate the ability of CHI energy functions to improve predicted rankings of the docked

poses. These systems were selected based on the diversity of the antibody binding site

topologies (canyon, valley, crater), 105

and size variations of the carbohydrate ligands (tri-

to pentasaccharides, including linear and branched sequences).

Methods

System selection and docking protocol

Docking was performed using AutoDock 3.0.5 (AD3), 106

4.2 (AD4.2) 107

and Vina 1.1.2

(ADV). 108

Details of the reference systems, including PDB IDs, ligand sequences and

biological origin are presented in Table 4.1. In each case, the protein chain containing the

ligand with the lowest average B-factor was selected for docking. The carbohydrate

ligands in systems 1UZ8, 1S3K and 1M7I were built using the Carbohydrate Builder on

GLYCAM-Web (www.glycam.org). 109

The remaining ligands contain the non-standard

sugar residues abequose and 2-deoxy-rhamnose. Oligosaccharides containing these

deoxy residues were assembled using the tLEaP 110

module from the AMBER package

employing GLYCAM06i force field parameters and PREP residue structure files,

available for download at www.glycam.org (S4.11). The antibody structures were

obtained from the PDB (www.rcsb.org). 111

All protein and ligand files were prepared for

docking using AutoDock Tools 1.5.4 (ADT). 107

The choice of partial charge was based

on the method used to calibrate the scoring functions of the individual docking programs;

Kollman charges 112

were added to the protein for docking with AD3, while Gasteiger

charges 113

were used to prepare proteins for docking with AD4.2 and ADV, and in each

case Gasteiger charges were assigned to the ligands. AutoDock distributes any non-zero

residual net charge across the macromolecule. Hydrogen atoms were added to the protein

using ADT, whereas GLYCAM hydrogens were retained in the ligands. A standard grid

box (dimensions: 26.25 × 26.25 × 37.50 Å) was employed for all runs, centered relative to

the complementarity determining regions (CDRs) of the antibody (Figure 4.1a). Before

docking, the ligand was translated to the center of mass (CoM) of the CDRs but

maintained in the default GLYCAM orientation and conformation. VMD 109

was used for

molecular visualization and image-rendering.

Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape

RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the

crystallographic ligands.

PDB ID      Chain ID  Resolution(a)  Ligand                                               Average B-factor(b)  SRMSD(a,c)  Biological origin
1MFA 69,d   L/H       1.7            DAbepα1-3[DGalpα1-2]DManpα-OMe                       25.1                 0.6         Mus musculus
1MFD 70,d   L/H       2.1            DAbepα1-3[DGalpα1-2]DManpα-OMe                       30.1                 0.5         Mus musculus
1UZ8 71     A/B       1.8            DGalpβ1-4[LFucpα1-3]DGlcpNAcβ-OMe                    41.8                 0.3         Mus musculus
1M7D 72     A/B       2.3            LRhapα1-3(2-deoxy)LRhapα1-3DGlcpNAcβ-OMe             39.8                 0.3         Mus musculus
1S3K 73     L/H       1.9            LFucpα1-2DGalpβ1-4[LFucpα1-3]DGlcpNAcα-OH            26.6                 0.4         Homo sapiens, Mus musculus
1M7I 72     A/B       2.5            LRhapα1-2LRhapα1-3LRhapα1-3DGlcpNAcβ1-2LRhapα-OMe    35.4                 1.1         Mus musculus

[Graphic representation of the ligand: column not reproduced. Symbol legend: Mannose (Man); Galactose (Gal); Fucose (Fuc); 2-Deoxy Rhamnose; Abequose (Abe); N-Acetyl Glucosamine (GlcNAc); Rhamnose (Rha); Aglycon (OMe/OH)]

(a) In Å.

(b) In Å2.

(c) SRMSD is defined in the section "Shape, and pose, RMSD values".

(d) 1MFA and 1MFD contain the trisaccharide antigen from Salmonella serotype B. In 1MFD, the

trisaccharide is bound to a Fab antibody fragment, while in 1MFA the trisaccharide is

bound to a single-chain Fv fragment of the antibody. Although the antigen-binding sites in

the Fab and scFv fragments are essentially the same, and are bound to the same

trisaccharide antigen, in the Fv complex a water molecule has become inserted into an

internal hydrogen bond within the trisaccharide, leading to a perturbation of the

trisaccharide conformation.

In all ligands, the hydroxyl groups and glycosidic torsion angles were defined as

being flexible, while the C5-C6 bonds were restrained at the orientation present in the

reference crystal structures. The protein was maintained rigid. In AD3 and AD4.2, 100

runs of the Lamarckian Genetic Algorithm were employed, with 800,000 energy

evaluations per run, and a population size of 200. The translation step size was 2Å, while

the quaternion and dihedral step sizes were each 50°. The ADV source code was

modified to increase the total number of output structures from 20 to 100 (Supplementary

Material, S4.1). The maximum energy difference between the best and worst binding

modes was set at 10 kcal/mol while the exhaustiveness value was 8. The complete set of

docking parameters used is given in S4.2, S4.3 and S4.4.

Antibody and docking grid box alignment

Consistent grid box placement on the CDRs was achieved by positioning the box

relative to three points defined by specific CoMs within the CDRs. The CDRs were

identified using the AbM definition, 114,115 based on both the Kabat 116 and Chothia 117

numbering schemes. To ensure consistent orientation of the antibody surface relative to

the box grid points, the protein coordinates were transformed with respect to a set of

internal coordinate axes, as shown in Figure 4.1. This protocol removes any issues arising

from the fact that the grid is a rectangular box and not a sphere, which can otherwise result in different

regions of each antibody being included within the grid.

Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the

grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green dot

represents the center of the grid box (0,0,11). (b) Aligned orientation of an antibody

antigen-binding fragment (Fab), with respect to the internal reference axes. The region in

red + pink represents the VH domain (CDRs (red) and framework regions (pink) of the

heavy chain) of the antibody, while the region in blue represents the VL domain (CDRs

(dark blue) and framework regions (cyan) of the light chain). The X-axis for the

alignment was defined by a vector passing through the CoM of the variable light chain

(VL domain, which contains the light chain CDRs and framework sequences), and the

CoM of the variable heavy chain (VH domain). The Z-axis was defined as a vector

normal to the X-axis, and passing through the CoM of the entire variable region, or

variable fragment (Fv). The antibody was then translated so that the CoM of the CDRs

was placed at the origin. The Y-axis was defined as a vector perpendicular to the XZ-

plane, and passing through the origin. The docking grid box was aligned to the internal

co-ordinate axes with its center offset from the origin by 11Å along the Z-axis, so as to

optimally encompass the CDR loops, while also permitting adequate volume for the

movement of the ligand during docking. Such a definition enabled the docking grid box

to be consistently aligned with respect to the CDRs.
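
The construction of the internal reference frame can be sketched in Python as follows. The function names and the example CoM coordinates are hypothetical. The text fixes the Z-axis only as normal to the X-axis and passing through the CoM of the Fv; the sketch additionally assumes that it points from the Fv CoM toward the CoM of the CDRs.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def internal_axes(com_vl, com_vh, com_fv, com_cdr):
    """Return orthonormal (X, Y, Z) axes for the antibody alignment."""
    x_axis = unit(sub(com_vh, com_vl))  # through the VL and VH CoMs
    v = sub(com_cdr, com_fv)  # assumed direction: Fv CoM toward the CDRs
    # Remove the component along X so that Z is normal to X.
    z_axis = unit(sub(v, tuple(dot(v, x_axis) * c for c in x_axis)))
    y_axis = cross(z_axis, x_axis)  # perpendicular to the XZ-plane
    return x_axis, y_axis, z_axis


def to_internal_frame(point, axes, com_cdr):
    """Translate the CDR CoM to the origin, then project onto the axes."""
    shifted = sub(point, com_cdr)
    return tuple(dot(shifted, axis) for axis in axes)


# Grid box center: offset from the origin by 11 Å along the Z-axis.
GRID_CENTER = (0.0, 0.0, 11.0)

# Hypothetical CoM coordinates (Å) in the original PDB frame.
com_vl, com_vh = (2.0, 1.0, 0.0), (2.0, 11.0, 0.0)
com_fv, com_cdr = (2.0, 6.0, 1.0), (5.0, 7.0, 1.0)

axes = internal_axes(com_vl, com_vh, com_fv, com_cdr)
print(to_internal_frame(com_vh, axes, com_cdr))
```

Applying `to_internal_frame` to every atom places the CoM of the CDRs at the origin, so the same grid box center can be reused for each antibody.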

Quantum mechanical calculations

Quantum mechanical calculations were performed using Gaussian09. 118

Structures were optimized at the HF/6-31++G(2d,2p) level of theory, and single-point

energies calculated at the B3LYP/6-31++G(2d,2p) level, consistent with the approach

used in the GLYCAM force field development. 94

Rotational energy profiles were

computed at 15° increments, allowing complete relaxation of other coordinates.

Shape, and pose, RMSD values

Pose RMSD (PRMSD) values were obtained by calculating the RMSD between

the ring atoms of the crystal ligand maintained in its native co-crystallised position, and

the corresponding ring atoms in the docked ligand maintained in its docked position

(Figure 4.2a). A pose with a PRMSD ≤ 2Å was considered to have been successfully

docked. Shape RMSD (SRMSD) values were obtained by first superimposing the crystal

and docked ligands followed by calculating the RMSD between their respective ring

atoms (Figure 4.2b). The SRMSD is a quantification of the dissimilarity in the 3D

conformations of the docked and crystal ligands, irrespective of their relative positions on

the protein surface.
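
The PRMSD calculation and the 2 Å success criterion can be sketched as follows, assuming the ring atoms of the two ligands are supplied in the same order. The function names and coordinates are hypothetical. The SRMSD would use the same `rmsd` function after the docked ligand has first been superimposed on the crystal ligand; that superposition step is not implemented here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


def is_successful_pose(crystal_ring_atoms, docked_ring_atoms, cutoff=2.0):
    """Apply the PRMSD <= 2 Å criterion, with no prior superposition."""
    return rmsd(crystal_ring_atoms, docked_ring_atoms) <= cutoff


# Hypothetical ring-atom coordinates (Å).
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near = [(x + 1.0, y, z) for x, y, z in crystal]  # shifted by 1 Å
far = [(x + 3.0, y, z) for x, y, z in crystal]  # shifted by 3 Å

print(rmsd(crystal, near), is_successful_pose(crystal, far))
```

A rigid translation leaves the ligand shape unchanged, so both shifted copies would have an SRMSD of zero after superposition even though only the first meets the PRMSD criterion.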

Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and

SRMSD, respectively, of a representative docked pose with respect to its crystal ligand.

(a) The PRMSD is the RMSD between the ring atoms of a representative docked

structure (white) and the corresponding crystal structure (black). (b) The SRMSD is the

RMSD value obtained after the docked structure (white) is superimposed on the crystal

structure (black).

Results and Discussion

Assessment of current docking methodologies

The six ligands extracted from their co-crystal structures could successfully be docked back rigidly into the same structure of the protein (results not shown); this is an outcome observed previously in studies of carbohydrate-protein docking.¹⁰³,¹¹⁹ Although necessary, this redocking test is not a sufficient validation of any docking method, since both molecules in a co-crystallized complex are already in the correct conformation for binding, and no induced fit is required during docking.

[Figure 4.2 panel labels: (a) Pose RMSD = 5.5Å; (b) Shape RMSD = 1.1Å]

Independently-generated oligosaccharide 3D structures were employed as ligands

to test the performance of the docking methodologies in predicting bound conformations

of unknown carbohydrate-protein complexes. These starting structures were generated

using GLYCAM, known to produce low-energy conformations of carbohydrates; the

structures generated were found to be essentially equivalent to the same ligands found in

the co-crystal structures, as indicated by their SRMSDs (Table 1), and by a comparison of

their glycosidic torsion angles (S4.5). The average SRMSD between the crystallographic

ligands and theoretical structures was 0.53Å. The preliminary SRMSD analysis also

showed that the ligand in each antibody complex adopted a low energy conformation,

similar to that expected for the free ligand.

A second requirement for a general docking protocol is to permit the ligands a

reasonable level of freedom by allowing their glycosidic torsion angles and hydroxyl

groups complete flexibility. This approach enables comparisons to be made between

structures of the experimental and theoretical ligands, facilitating an assessment of the

impact of induced fit in the ligand on the outcome from docking analysis.

After docking, the φ (O5′-C1′-Oₓ-Cₓ) and ψ (C1′-Oₓ-Cₓ-Cₓ₋₁) glycosidic torsion angles of the docked poses (Figure 4.4-I) were measured, and compared to the torsion angles of corresponding linkages in the experimental co-crystal structure, and in the initial GLYCAM theoretical structure. The analysis indicated that the distribution of the torsion angle values amongst the docked poses frequently deviated considerably from both the crystal and GLYCAM reference values (S4.5). Five examples of this analysis are highlighted in Figure 4.3. Presented in Figure 4.3a is an instance in which all three docking programs identified the lowest energy pose correctly (that is, with the glycosidic angles falling within 30° of the corresponding torsion angles in the crystal structure). Presented in Figure 4.3b, c, and d are cases in which only one of the docking programs identified the correct pose, and finally an example is shown in which all three programs failed to produce the correct torsion angles (Figure 4.3e). All of the methods were able to generate some conformations that were within 30° of the crystallographic φ and ψ values; however, these were often not the poses that had the best docking energy. Thus, in a routine application of docking, they would not be identified as the most likely (highest-ranked) pose. Overall, a very broad range of torsion angles (and therefore 3D shapes) was generated by each algorithm, indicating a potential opportunity to employ a conformational energy function as an additional filter to identify unlikely conformations in the docking data.
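The 30° criterion used above requires a torsion angle measurement and a comparison that respects the periodicity of angles (350° and 10° differ by 20°, not 340°). The following is a minimal sketch; the function names are illustrative and the four points passed to `dihedral_deg` are assumed to be the atoms of φ or ψ in the order given in the text.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, mapped onto [0, 360)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def within_tolerance(phi, psi, phi_ref, psi_ref, tol=30.0):
    """True if both torsions lie within `tol` degrees of the reference."""
    return (angular_difference(phi, phi_ref) <= tol
            and angular_difference(psi, psi_ref) <= tol)
```

Mapping the angle onto 0° to 360° matches the axis range used in the figures of this chapter.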

[Figure 4.3 plot residue: histogram panels (a) to (e), with PDB labels 1UZ8, 1MFD and 1MFA (panels a to c) and 1S3K and 1M7I (panels d and e); y-axis, percentage of structures; x-axes, φ and ψ in 30° bins. Experimental angle labels shown in the panels: 76.1, 277.3, 260.6, 220.6, 71.5 and 224.9 (panels a to c); 282.2, 256.6, 269.8 and 53.4 (panels d and e).]

Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected

linkages, as indicated by the dashed rectangle. Data are presented, in order, for AD3

(black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing the

experimentally-determined values is highlighted with a light blue outline. The bin

containing the structure with the lowest docked energy is indicated as follows: AD3,

yellow; AD4.2, orange; ADV, green.

Development and validation of the CHI energy functions

Quantum mechanical conformational energies for a variety of model

disaccharides were obtained by employing tetrahydropyran (THP) as the minimal model

of a carbohydrate ring. Two THP molecules were used to model each glycosidic linkage

(1-2, 1-3 and 1-4) between pyranoses in the ⁴C₁ and ¹C₄ configurations. Given that there

are two anomeric configurations (α and β), and two hydroxyl configurations (axial (ax)

and equatorial (eq)), associated with each linkage, the development of each CHI energy

function required the analysis of the glycosidic rotational energies of at least four

structures per linkage. For example, the different models used in modeling the 1-3

linkage are presented in Figure 4.4.

Figure 4.4 Representation of the 8 model disaccharides pertinent to the development of

CHI energy functions. The models depicting 1,2-linkages can be used to model 1,4-

linkages due to symmetry about the O5 atom.

Individual rotational energy profiles were determined for both the φ (O5′-C1′-Oₓ-Cₓ) and ψ (C1′-Oₓ-Cₓ-Cₓ₋₁) glycosidic torsion angles of the various disaccharide models (Figure 4.5). A similar approach has been employed by A. D. French to examine the properties of various disaccharides and disaccharide analogs.⁹⁶,⁹⁸,¹¹²,¹²⁰ Models with similar local symmetries gave rise to similar torsional energy profiles and were grouped together. Average energy curves were then obtained for each group. Based on similar energy profiles, two average energy curves for the φ angle were computed: one for all models with an α-linkage (Figure 4.5a), and the other for all models with a β-linkage (Figure 4.5b). Similarly, two average curves for the ψ angle were computed, based on division of the linkages into the following two groups: 1-2ax, 1-4ax, 1-3eq (Figure 4.5c); and 1-2eq, 1-4eq, 1-3ax (Figure 4.5d).
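The grouping described above amounts to a small lookup: the φ curve depends only on the anomeric configuration, and the ψ curve depends on the linkage position together with the axial or equatorial orientation. A minimal sketch follows; the dictionary and function names are illustrative, and the curve labels simply reuse the panel names of Figure 4.5.

```python
# phi curve: chosen by anomeric configuration only.
PHI_CURVE = {"alpha": "Figure 4.5a", "beta": "Figure 4.5b"}

# psi curve: chosen by linkage position and ax/eq orientation.
PSI_CURVE = {
    ("1-2", "ax"): "Figure 4.5c",
    ("1-4", "ax"): "Figure 4.5c",
    ("1-3", "eq"): "Figure 4.5c",
    ("1-2", "eq"): "Figure 4.5d",
    ("1-4", "eq"): "Figure 4.5d",
    ("1-3", "ax"): "Figure 4.5d",
}

def curves_for_linkage(anomer, linkage, orientation):
    """Return the (phi curve, psi curve) pair that applies to one linkage."""
    return PHI_CURVE[anomer], PSI_CURVE[(linkage, orientation)]
```

For example, `curves_for_linkage("beta", "1-3", "eq")` selects the β curve for φ and the first ψ group.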

[Figure 4.4 image residue: model disaccharides I to VIII, with the φ and ψ torsions marked and each linkage position labeled axial (ax) or equatorial (eq).]

Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for

models (see Figure 4.4) whose linkages have similar local geometries.

The CHI energy functions (S4.6) were generated by fitting Gaussian expansions (Eqn 4.1) to the average energy values for each of the curves in Figure 4.5 using the default fitting routine in Gnuplot ver. 4.0:¹¹³

\[ f(x) = \sum_{i=1}^{N} a_i \, e^{-\frac{(x - b_i)^2}{c_i}} \]    (Eqn 4.1)

where N is the number of individual Gaussian functions used for each CHI energy equation, x refers to the glycosidic torsion angle (φ or ψ), and aᵢ, bᵢ, and cᵢ refer to the magnitude, mid-point, and width of each Gaussian, respectively. All curves (S4.7) were adjusted to a minimum value of 0 kcal/mol, and may therefore be considered conformational energy penalty functions. To apply the energy curves shown in Figure 4.5 to linkages containing L-sugars, it is simply necessary to employ the mirror images of the relevant energy curves.
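Evaluating a Gaussian expansion of this form, shifting it to a minimum of 0 kcal/mol, and mirroring it for L-sugars can be sketched as below. The coefficients in `TOY_TERMS` are invented for illustration and are not the fitted CHI parameters (those are given in S4.6). The mirror image is assumed here to mean reflecting the angle, x to 360° minus x.

```python
import numpy as np

# Hypothetical (a_i, b_i, c_i) triples: magnitude, mid-point, width term.
TOY_TERMS = [(5.0, 180.0, 2000.0), (3.0, 0.0, 1500.0), (3.0, 360.0, 1500.0)]

# Angles at which the curve minimum is located, in 1 degree steps.
GRID = np.arange(0.0, 360.0, 1.0)

def gaussian_sum(x, terms):
    """Sum of a_i * exp(-(x - b_i)^2 / c_i) over all terms."""
    x = np.asarray(x, dtype=float)
    return sum(a * np.exp(-((x - b) ** 2) / c) for a, b, c in terms)

def chi_penalty(x, terms):
    """Gaussian expansion shifted so that its minimum on GRID is zero."""
    return gaussian_sum(x, terms) - gaussian_sum(GRID, terms).min()

def chi_penalty_l_sugar(x, terms):
    """Penalty for an L-sugar linkage: the curve mirrored about 0/360."""
    mirrored = (360.0 - np.asarray(x, dtype=float)) % 360.0
    return chi_penalty(mirrored, terms)
```

Because the curve is shifted by its own minimum, every returned value is a non-negative penalty relative to the most favorable torsion angle.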

The experimental distribution of glycosidic angles in carbohydrate-protein crystal

structures in the PDB provides an independent metric for comparison with the predicted

CHI energies. Glycosidic torsion angle data for over 13,000 glycosidic linkages were

extracted using the GlyTorsion web-tool 121

(S4.8), binned, and plotted against the

corresponding CHI energy curves (Figure 4.6). The comparison leads to the important

conclusion that the majority of proteins that recognize oligosaccharides select low energy

(solution-like) conformations of the glycosidic linkage. This has considerable importance

for carbohydrate docking, as it supports the view that biasing selection toward low energy

linkage conformations should enhance the likelihood of correct pose prediction.

Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion

angle distributions of carbohydrates from experimental co-crystal structures (histograms).

[Figure 4.6 plot residue: panels (a) to (d); x-axes, φ or ψ [deg] from 0 to 360 in 5° bins; vertical axes, ΔE [kcal/mol] and percentage of structures. Model groups labeled in the panels: VII, VIII, VI, V; III, IV, I, II; VII, III, V, II; VIII, IV, VI, I.]

Refinement of the docking results using the CHI energy functions

An assessment of the performance of each of the docking algorithms can be made by plotting the difference between the conformations of the ligands, relative to that in the co-complex (SRMSDs), against the predicted interaction energies. Ideally, poses with correct ligand shapes should have lower interaction energies than seen for incorrect shapes. Plots of interaction energy versus SRMSD were generated for AD3, AD4.2 and ADV (Figure 4.7), and the coefficient of determination (R²) computed by linear regression. In each case, only weak linear relationships between ligand shape and interaction energy were observed (R² ≤ 0.19), and in the case of ADV a slight negative slope was observed. Following rescoring of the docked poses by addition of the CHI energy from each glycosidic angle to the docked energy of the structure, a clear enhancement of the R² values was observed across all three programs (0.60 ≤ R² ≤ 0.68). It should be reiterated here that none of the three docking algorithms includes internal rotational energies (torsion terms), and at best they account for ligand internal energies in a general steric sense. In the case of glycosidic linkages, this internal energy was found to be less than approximately 0.2 kcal/mol. Thus, while some double counting of internal energy is introduced by adding the CHI energy directly to the total docking energy, it does not result in a significant error.
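The rescoring step and the R² calculation can be sketched as follows. The numbers are made-up toy values chosen only to show the mechanics (a nearly flat energy-versus-SRMSD trend before rescoring and a clear positive trend afterwards); they are not data from this study.

```python
import numpy as np

def rescore(docked_energy, chi_energies):
    """Docked energy plus the CHI energy of every glycosidic torsion."""
    return docked_energy + sum(chi_energies)

def r_squared(x, y):
    """Coefficient of determination of a straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    ss_res = (residual ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

# Toy poses (hypothetical): SRMSD in Angstrom, energies in kcal/mol.
srmsd = [0.3, 0.8, 1.5, 2.4, 3.1]
docked = [-6.1, -6.4, -5.9, -6.3, -6.0]
chi = [[0.2, 0.4], [1.0, 0.9], [3.5, 2.0], [6.0, 5.5], [9.0, 8.0]]

rescored = [rescore(e, c) for e, c in zip(docked, chi)]
print("R^2 before:", r_squared(srmsd, docked))
print("R^2 after: ", r_squared(srmsd, rescored))
```

In this toy set the poses with distorted linkages carry large CHI energies, so adding them separates good and poor ligand shapes along the energy axis.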

Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between SRMSD and docked energies after rescoring, for each of the three docking programs. Points before rescoring are shown in dark grey and points after rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked energy plots of only the overall lowest PRMSD structure for each of the six antibody systems before (dark grey) and after (light grey) rescoring. The black rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0 kcal/mol.

[Figure 4.7 plot residue: x-axis, SRMSD [Å]; y-axis, energy [kcal/mol]. R² labels in the panels: AD3, 0.09 and 0.60; AD4.2, 0.19 and 0.68; ADV, 0.12 and 0.66.]

Prior to inclusion of the CHI energies, all poses from AD3 and ADV and a

majority of those from AD4.2 were predicted to have favorable (negative) interaction

energies; a result of the nearly horizontal slope of the SRMSD-versus-interaction-energy

curves. Addition of the CHI energies led to positive slopes and frequently unfavorable

interaction energies (positive) for high-energy ligand conformations. Therefore, an

intuitive interaction energy cut-off of 0 kcal/mol could be defined as a convenient filter

for eliminating the most unlikely structures.

For all six antibody complexes, the poses that are most similar to the co-crystal

(lowest PRMSD poses) also have CHI-adjusted interaction energies ≤ 0 kcal/mol, with

the single exception being the AD4.2 results for 1M7I (Figure 4.7b). All 100 docked

poses of that pentasaccharide received positive rescored interaction energies, reflecting

the sub-optimal quality of the conformations produced by AD4.2 for this system. In this

case, the pose closest to the co-complex displayed a PRMSD of 3.4Å and a CHI-corrected interaction energy of 14.7 kcal/mol; rescoring cannot correct for the absence of a correct pose. Thus, the addition of the CHI energy to the docked energy scores provides a cutoff (0 kcal/mol), below which all poses may be considered possible binders.

Presented in Figure 4.8 are the φ and ψ torsion angles for the docked poses from all six antibody-carbohydrate systems, overlaid onto the corresponding CHI energy curves. They provide a clear indication that the docking algorithms sample a disproportionately large number of high-energy ligand conformations, particularly evident for AD4.2 and ADV. Several low energy regions, particularly for the ψ angles, are also not well-represented. In quantitative terms, more than 45% of the AD3 poses contain ligands with at least one bond in a high energy conformation (CHI energy > 2 kcal/mol); the corresponding numbers for AD4.2 and ADV are 73% and 77%, respectively.
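The percentage quoted above counts a pose as soon as any one of its glycosidic torsions exceeds the 2 kcal/mol threshold. A minimal sketch of that count, assuming each pose is represented by a list of per-torsion CHI energies (the example values are invented):

```python
def has_high_energy_linkage(chi_energies, threshold=2.0):
    """True if any torsion in the pose exceeds the CHI energy threshold."""
    return any(e > threshold for e in chi_energies)

def percent_high_energy(poses, threshold=2.0):
    """Percentage of poses with at least one high-energy torsion."""
    flagged = sum(has_high_energy_linkage(p, threshold) for p in poses)
    return 100.0 * flagged / len(poses)

# Four toy poses, each with two torsion CHI energies (kcal/mol).
toy_poses = [[0.5, 1.0], [2.5, 0.1], [0.0, 0.0], [3.0, 4.0]]
print(percent_high_energy(toy_poses))
```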

Figure 4.8 Graphs showing the distribution of conformations produced by AD3 ( ),

AD4.2 ( ) and ADV ( ) plotted onto the corresponding CHI energy curves for each of

the representative linkage combinations; the curves are offset from each other by 6

kcal/mol.

[Figure 4.8 plot residue: panels of ΔE [kcal/mol] versus φ [deg] or ψ [deg], from 0 to 360; model groups labeled in the panels: VII, VIII, VI, V; III, IV, I, II; VII, III, V, II; VIII, IV, VI, I.]

Pose ranking after including the CHI energy:

In 9 of the 18 cases (6 antibodies × 3 docking algorithms), the top-ranked pose remained the same before and after inclusion of the CHI energies (Figure 4.9), with an average SRMSD of 0.3Å. That the ranking of these poses did not change is unsurprising, given that inclusion of the CHI energy function does not greatly alter the interaction energy if the ligand is already in a low-energy conformation. However, in 7 of the 9 remaining cases, the SRMSD of the top-ranked pose improved by an average of 0.8Å after rescoring and reranking.

Prior to rescoring, from the 100 docking runs, poses with PRMSDs ≤ 1Å were obtained in 17 out of the 18 cases; however, they were not necessarily the lowest energy poses, highlighting the challenge in recognizing a correctly docked pose amongst all poses produced by a docking run. The impact of the CHI energy on the ability of docking to both produce a correctly docked pose and rank it as the lowest energy structure is indicated in terms of PRMSDs in Figure 4.9b. In several instances in which the lowest energy pose produced by the docking program was incorrect (PRMSD > 2Å), reranking after including the CHI energy led to lowest energy structures having both PRMSD and SRMSD < 1Å.
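Reranking changes the top pose only when the CHI energy difference between two poses outweighs their difference in docked energy. The sketch below illustrates this with two invented poses; the `Pose` class, its field names and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    name: str
    docked_energy: float  # kcal/mol, from the docking program
    chi_energy: float     # kcal/mol, summed over glycosidic torsions
    prmsd: float          # Angstrom, relative to the crystal ligand

    @property
    def rescored_energy(self):
        return self.docked_energy + self.chi_energy

def top_ranked(poses, rescored=False):
    """Lowest energy pose, by docked or by CHI-rescored energy."""
    if rescored:
        return min(poses, key=lambda p: p.rescored_energy)
    return min(poses, key=lambda p: p.docked_energy)

poses = [
    Pose("A", docked_energy=-7.2, chi_energy=7.0, prmsd=5.6),  # distorted linkages
    Pose("B", docked_energy=-6.8, chi_energy=1.0, prmsd=0.6),  # low-energy linkages
]
print(top_ranked(poses).name, top_ranked(poses, rescored=True).name)
```

Pose A wins on docked energy alone, while pose B becomes top-ranked once its much smaller CHI energy is taken into account.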

Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2 and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the lowest energy poses for all six systems from all three docking programs, before (dark grey) and after (light grey) rescoring.

[Figure 4.9 plot residue: bar charts of (a) SRMSD and (b) PRMSD of the lowest energy pose [Å] for AD3, AD4.2 and ADV, across the systems 1MFA, 1MFD, 1S3K, 1UZ8, 1M7D and 1M7I.]

The impact of rescoring on the conformations (SRMSDs) and orientations (PRMSDs) of the top-ranked poses is presented for several examples in the following section. Docking of the tetrasaccharide ligand onto the 1S3K antibody using AD3 (Figure 4.10), and docking of the trisaccharide ligand onto the 1M7D antibody using AD4.2, yielded top-ranked poses with PRMSDs > 5Å. Both these structures obtained high CHI energy scores of 7.0 kcal/mol and 11.6 kcal/mol, respectively. The lowest energy poses after reranking had PRMSDs of 0.6Å (1S3K/AD3) and 0.5Å (1M7D/AD4.2), with lower CHI energies of 1.0 kcal/mol and 0.9 kcal/mol, respectively.

Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to

the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after inclusion of the

CHI energy (white) compared to the crystal ligand (black); PRMSD = 0.6 Å.

Prior to rescoring, lowest energy structures obtained for 1MFD from all three programs had PRMSDs > 5Å, with CHI energies > 4 kcal/mol for the poses from AD4.2 and ADV, and 1.3 kcal/mol for the pose from AD3 (Figure 4.11a, S4.9). After rescoring, the lowest energy pose from AD3 remained unchanged, whereas the corresponding pose from AD4.2 was replaced by a pose with a lower CHI energy score; however, the newly top-ranked pose still had a high PRMSD. Even though rescoring did not result in correctly docked lowest energy poses in either of these cases, it improved the overall ranking of the lowest PRMSD structures (PRMSDs < 1Å) from 18 to 9 in AD3, and 13 to 2 in AD4.2 (S4.10). It should also be noted that the second lowest energy pose in AD3 (PRMSD = 1Å) remained unchanged in ranking after rescoring. In contrast, the relatively high CHI energy score of the lowest energy pose from ADV contributed to this pose being replaced by a correctly docked structure, with a lower CHI energy score, after rescoring (Figure 4.11b, S4.9, S4.10).

The ligand in 1MFD is a branched trisaccharide composed of mannose (Man), galactose (Gal), and abequose (Abe). Abe is an analog of Gal (3,6-dideoxyGal), and the anchoring residue for the trisaccharide in the crystal structure³² (Figure 4.11c). An examination of the docking results indicated that all three docking programs consistently

generated better scores for poses in which the Gal residue replaces Abe in the binding site

(Figure 4.11d), with little increase in the SRMSD for the incorrect pose. That is, the

trisaccharide can fit equally well into the binding site in the two possible orientations

effectively flipped by 180°. The theoretical preference for Gal in the binding site appears

to be a consequence of its ability to make additional hydrogen bonds with the protein

relative to the more hydrophobic Abe. This observation suggests that the balance

between contributions from hydrogen bonding versus hydrophobic interactions is

imperfect in these docking algorithms. In addition, the 1MFD crystal structure reveals

the presence of a water molecule within the binding pocket, mediating hydrogen bond

interactions between the antibody and the ligand’s Abe residue. Given that explicit

waters are not generally included in docking studies, the algorithms may be

compensating for their absence by placing the more polar Gal inside the binding pocket.

This conclusion is supported by the observation that one of the hydroxyl groups of the

Gal residue (O-4) occupies a position in close proximity to this water molecule (PDB

residue name: WAT 601) originally found in the crystal complex (Figure 4.11d).

The flipping of the carbohydrate ligand that was observed in 1MFD was not observed in the case of its scFv counterpart (1MFA); instead, all three lowest energy poses (AD3; AD4.2; ADV) for 1MFA had orientations similar to that of the crystal ligand (PRMSDs < 2 Å). Since the ligands being docked to both antibodies are identical, we can infer that the two binding sites are not identical (Table 1). To facilitate a better understanding of the difference between the two binding pockets, their volumes were calculated using Fpocket;¹²² the volume of the 1MFA binding pocket was calculated to be 423.01Å³, while that of 1MFD was 582.51Å³. The 1MFD binding pocket, being approximately 160Å³ larger, is able to accommodate the flipped orientation of the Gal residue, whereas the smaller 1MFA binding pocket is not as accommodating of this ligand orientation, due to possible steric clashes. This potential steric clash was confirmed by superimposing the coordinates of the Gal residue onto those of Abe in 1MFA (Figure 4.11e, f).

[Figure 4.11 image residue: panels (a) to (f); panel annotations PRMSD = 5.5Å (a) and PRMSD = 1.0Å (b); atom and residue labels GAL, ABE, O4, O3, O6 and WAT 601:O.]

Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA).

(a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared to the

crystal ligand (black); PRMSD = 5.5Å. (b) Lowest energy pose from ADV for 1MFD

after rescoring (white) compared to the crystal ligand (black); PRMSD = 1.0Å. (c) and

(d) show the 1MFD antibody in transparent surface representation along with the oxygen

atom belonging to the water molecule from the crystallographic co-complex, WAT 601;

in (c) the crystal ligand from 1MFD is shown in CPK representation, and in (d) the

lowest energy pose from ADV for 1MFD before rescoring (in CPK representation)

showing the Gal residue replacing Abe within the binding pocket is shown. (e) The Gal

residue from the ligand in 1MFD (in van der Waals representation) after being

superimposed onto the Abe residue from the ligand in 1MFA is shown within the 1MFA

binding site. A cross-section of the 1MFA antibody is represented as a transparent surface

with potential steric clashes visible between the Gal residue and the antibody. (f) Same as

(e) but with the 1MFA antibody represented as an opaque surface thus more clearly

depicting potential steric clashes between the O-3 and O-6 groups of the Gal residue and

the interior of the binding pocket.

The known challenge associated with docking large, flexible molecules using AD4.2¹⁰⁸,¹²³ was encountered with the linear pentasaccharide ligand in 1M7I. None of the 100 poses were correctly docked (all PRMSDs > 2Å); the lowest energy pose had a PRMSD of 3.9Å and a CHI energy of 18.5 kcal/mol (Figure 4.12a). After rescoring, the lowest energy pose had a considerably improved CHI energy score of 4.3 kcal/mol; however, it still had a high PRMSD (Figure 4.12b). It has been suggested that the maximum number of rotatable bonds be limited to 10 when employing AD4.2.¹²³ The ligand in 1M7I has nearly double that number at 19, making this quite a challenging system to dock using AD4.2. In AD3, although only 4 of the 100 output poses were correctly docked, they occupied the top 4 ranks, before and after rescoring. In ADV, 7 of the 100 output poses were correctly docked, of which 5 were amongst the 8 top-ranked poses, before and after rescoring. Although both AD3 and ADV seem to have had difficulty in finding the correct pose for the pentasaccharide, whenever such a pose was found, both programs scored it favorably. As these poses also had low SRMSD values, they were identified as lowest energy poses after rescoring.

Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose

before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9Å. (b)

Lowest energy pose after rescoring (white) compared to the crystal ligand (black);

PRMSD = 10.7Å.


Conclusions

A solution to a major challenge encountered in flexible carbohydrate docking has been presented in this study by the development of intrinsic energy terms for carbohydrates, which quantify the relative energy of their glycosidic torsion angles. In 7 of the 18 cases (6 systems × AD3/AD4.2/ADV), the lowest energy poses generated by the docking programs had PRMSDs > 2Å; however, after rescoring using the CHI energy functions, the PRMSDs in 4 of the 7 cases improved, with correctly docked poses (PRMSDs ≤ 2Å) replacing incorrect poses, and increasing the total count of correctly docked lowest energy poses to 15 out of 18. Rescoring also led to lowest energy poses

that had SRMSDs ≤ 1Å in 16 out of 18 cases, and SRMSDs ≤ 1.5Å in the two remaining

cases. Among the three docking programs employed in this study, ADV was most

successful in producing and appropriately ranking the correct ligand pose, with a success

rate of 83% before rescoring, and 100% after rescoring. Inclusion of the CHI energy term

in rescoring docked poses enabled the filtering of poses based on their conformations,

increasing the chances of finding the correct pose amongst all output poses generated.
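The counts quoted in this paragraph can be checked with a short piece of arithmetic. The per-program fractions for ADV are inferred from the stated percentages and the six systems docked; they are an assumption, since they are not given explicitly as counts.

```latex
\begin{align*}
\text{correct lowest energy poses before rescoring} &= 18 - 7 = 11 \quad (11/18 \approx 61\%) \\
\text{correct lowest energy poses after rescoring}  &= 11 + 4 = 15 \quad (15/18 \approx 83\%) \\
\text{ADV before rescoring (inferred)} &= 5/6 \approx 83\% \\
\text{ADV after rescoring (inferred)}  &= 6/6 = 100\%
\end{align*}
```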

In most docking applications, locating the correctly docked pose amongst the

numerous output poses largely depends on the ranking of these poses based on their

energy scores. The CHI energy functions may in principle be used in the assessment of

carbohydrate structures obtained from any theoretical or experimental method. By

favoring energetically reasonable ligand conformations, the CHI energies significantly

improve the pose ranking for structures obtained from docking algorithms, making the

rescored energy a better predictor of the quality of the docked pose. This improvement

was observed across all three programs indicating that the CHI energy functions may be

employed independently of the scoring functions. The CHI energy functions could also

be incorporated directly within docking programs as a component of the scoring function,

although that might require a reoptimization of the scoring functions. Application to

crystallographic data leads to the conclusion that proteins primarily recognize low-energy

conformations of carbohydrates. This final observation has considerable relevance to the

design of carbohydrate-based inhibitors and vaccines.

Individual Author Contributions

Anita K. Nivedha: Authored portions of the paper and prepared figures for the paper;

designed docking protocols and the antibody alignment algorithm; performed the

dockings; developed the CHI energy functions and applied the functions to docking

results; provided tools for analysis, analyzed and interpreted the data.

Spandana Makeneni: Authored portions of the paper; co-designed docking protocols and

the antibody alignment algorithm; performed binding site volume calculations and

provided tools for the analysis of data.

B. Lachele Foley: Contributed to the design of the antibody alignment algorithm and the

development of the CHI energy functions.

Matthew B. Tessier: Contributed to the design of preliminary docking protocols,

provided PREP files for the non-standard sugar residues, and scripts for the collection of

quantum mechanical data.

Robert J. Woods: Authored the paper; conceived and designed the experiment, and

contributed to the analysis and interpretation of data.

5. VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING

CARBOHYDRATE DOCKING

_____________________________

A. K. Nivedha, D. F. Thieker, R. J. Woods. Accepted by J. Chem. Theory Comput.

Reprinted here with permission of publisher.

Abstract

Docking programs are primarily designed to dock rigid, drug-like fragments onto

macromolecules, and frequently encounter issues predicting more flexible carbohydrate

molecules. The primary source of flexibility within a carbohydrate is the glycosidic

linkage. Previous efforts have developed Carbohydrate Intrinsic (CHI) energy functions

that reflect glycosidic torsion angle preferences. The following work represents the

incorporation of the CHI-energy functions into the AutoDock Vina (ADV) scoring

function, subsequently termed Vina-Carb (VC). Carbohydrate models generated by VC

are penalized according to the CHI-energy profiles. Two new, user-adjustable parameters

have been introduced; namely, a CHI-energy weight term (chi_coeff) that affects the

magnitude of the CHI-energy penalty, and a CHI-cutoff term (chi_cutoff) that negates

CHI-energy penalties lower than the specified value. A dataset consisting of 76 protein-carbohydrate complexes and 29 apoprotein structures was used in the development of VC, including antibodies, lectins and carbohydrate binding modules. Accounting for the intramolecular energies of carbohydrate ligands produced docked models that better reflected the natural configuration on the protein surface. VC produced accurate structures ranked within the top five models for 68% of the systems tested, compared to a success rate of 49% for ADV. Finally, a single enzyme system was

employed in order to demonstrate the potential application of VC to proteins which

distort glycosidic linkages of carbohydrate ligands upon binding. VC represents a

significant step towards accurately predicting protein-carbohydrate interactions. In

addition, the approach we present is generalizable to any other class of ligands that

populate multiple well-defined conformational states.
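The roles of the two user-adjustable parameters can be illustrated with a short sketch. The abstract states only what each parameter does; the functional form below (a penalty of zero under the cutoff and a simple linear scaling otherwise), the function name and the default values are assumptions made for illustration and may differ from the Vina-Carb implementation.

```python
def chi_penalty_term(chi_energy, chi_coeff=1.0, chi_cutoff=0.0):
    """Weighted CHI penalty for one glycosidic torsion (hypothetical form).

    chi_energy -- CHI energy of the torsion, in kcal/mol
    chi_coeff  -- weight applied to the CHI energy
    chi_cutoff -- CHI energies below this value contribute no penalty
    """
    if chi_energy < chi_cutoff:
        return 0.0
    return chi_coeff * chi_energy

# A mildly strained torsion is ignored; a strongly strained one is penalized.
print(chi_penalty_term(1.0, chi_coeff=2.0, chi_cutoff=2.0))
print(chi_penalty_term(3.0, chi_coeff=2.0, chi_cutoff=2.0))
```

Under these assumptions, raising chi_cutoff tolerates more distortion of the glycosidic linkage, while raising chi_coeff penalizes distortion more heavily.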

Introduction

Carbohydrates represent one of the four major classes of organic macromolecules, and are involved in a range of processes that are critical for proper cellular function.⁹ Structural characterization of glycans and their binding partners (e.g. antibodies, lectins, carbohydrate binding modules and enzymes) has advanced our understanding of the molecular recognition process; however, obtaining three-dimensional structures of these interactions is particularly challenging due to the inherent flexibility of glycans.¹²⁴,¹²⁵ This flexibility stems from either two or three freely rotatable bonds constituting the glycosidic linkages.¹²⁶ In contrast, rotation about the peptide backbone is restricted by the partial double-bond character of amide linkages.¹²⁷ As a result of the increased molecular motion present within carbohydrates, the majority of glycan-binding partners are not resolved in complex with their substrate.⁵⁹ Theoretical methods offer an alternative means for studying intermolecular glycan interactions that can complement experimental results.⁹⁴,⁹⁵,¹²⁸

Molecular docking is one such method that aims to predict various modes of non-covalent
interaction between a macromolecule and a ligand, ranking the results based on
binding energies.129 In general, docking energy functions are a summation of the energy
contributions from various non-bonded interactions in protein-ligand complexes such as
electrostatics, van der Waals, hydrogen bonding, and hydrophobic interactions.129,130
These semi-empirical scoring functions are generalized for small molecule ligands with
limited flexibility and often produce unnatural glycosidic angles when docking
carbohydrates.48 This distortion is especially pronounced for large oligosaccharides,
which have more degrees of freedom.108


Previous studies have customized docking scoring functions for carbohydrates by either
re-calibrating existing terms 89 or including additional energy terms that model
specific protein-carbohydrate interactions.131 For example, the SLICK scoring function 131
within BALLDock 132 includes an energy term for CH/π stacking interactions, and was
calibrated using a set of carbohydrate-lectin complexes. In contrast, the previously
reported CHI-energy functions 48 assign relative energies to the torsion angles of the
glycosidic linkages. The CHI-energy functions were derived quantum mechanically
based on the torsional energy profiles of several tetrahydropyran-based disaccharide
models. Although the functions were developed using unbound carbohydrate models, the
distribution of glycosidic torsion angles in protein-carbohydrate complexes obtained from
the Protein Data Bank (PDB) corresponds with the CHI-energy profiles.48 This
conformational similarity between bound and unbound carbohydrates suggested that the
CHI-energy functions would perform well within a docking program. The CHI-energy
functions are transferable between scoring functions, and could also be employed in the
evaluation and refinement of carbohydrate conformations obtained using experimental
methods.

Vina-Carb (VC) represents the incorporation of the CHI-energy functions 48 into
the AutoDock Vina 1.1.2 (ADV) scoring function.108 The CHI-energy is calculated for
each carbohydrate pose generated by VC, and added to the respective intermolecular
interaction energy. Energetically unfavorable carbohydrate conformations generated by
the program are penalized, and often rejected, within the Metropolis subroutine. The user
can control how the CHI-energy penalty is applied in VC by adjusting the values of two
input variables: a CHI-energy coefficient term (chi_coeff) and an energy cutoff value
(chi_cutoff). Changing the CHI-energy coefficient term affects the relative magnitude of
the CHI-energy penalty compared to other energy terms within the ADV scoring
function. The CHI-energy cutoff variable prevents penalization of poses whose
conformations deviate from the ideal due to induced fit: for glycosidic torsion angles
that would receive an energetic penalty less than the CHI-energy cutoff value, the
penalty is reduced to zero. Here we expand the previous set of CHI-energy functions to
include the ω-angle associated with glycosidic linkages to the O6 atom.

Unlike BALLDock/SLICK, which was calibrated on a set of lectin-sugar

complexes, the optimum settings for Vina-Carb were determined using a set of 72

carbohydrate ligands crystallized with antibodies, lectins or carbohydrate binding

modules from the PDB. Ligands within the development set range from a disaccharide to

an undecasaccharide in length. A test set consisting of apo-proteins of receptors from the

development set was used to examine and compare the optimized settings of VC with the

original ADV. Finally, an application of VC to an enzyme system is demonstrated.

Methods

File Preparation

Antibody, lectin and carbohydrate binding module (CBM) complexes containing carbohydrate
ligands were collected from the Protein Data Bank (PDB) and employed as the Development
Set for VC. Details about the test systems used are provided in S5.1. When duplicate protein
chains were present in the PDB file, the chain whose ligand atoms had the lowest average B
value was used for docking. The apo-protein structures were employed as a Test Set, and the
average B value of the individual protein chains was used to select between duplicate
chains. The antibodies were aligned to the Z-axis based on their CDR regions, as described
previously.48 The protein and ligand co-ordinates were formatted for docking with AutoDock
Tools (ADT) 107 using the protocol described previously.48 Each docking event consists of a
rigid macromolecule and a flexible ligand. Unless otherwise noted, all of the rotatable bonds
within the ligand were flexible except for carbon-carbon and carbon-nitrogen bonds.

Docking Parameters

The dimensions and centers of the grid boxes are described in the SI. The

maximum number of binding modes was limited to 20, and the energy range set at 10

kcal/mol. Two user-adjustable parameters have been added to the scoring function in VC:
1) chi_coeff, a weighting term for the CHI energies that scales the strength of the
energetic penalty applied to the glycosidic torsion angles within the ligand (Figure 5.1a),
and 2) chi_cutoff, a parameter that introduces a flat-bottom potential by neutralizing the
penalty assigned by the CHI energy curves to those glycosidic torsion angles which would
receive a penalty less than the cutoff value (Figure 5.1b). For example, employing a
chi_coeff of 2 is denoted VC2, and employing both a chi_coeff of 2 and a chi_cutoff of 4
is denoted VC2|4.
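The way the two parameters reshape a CHI penalty can be sketched in a few lines of Python. This is an illustration only, not the VC source code: the function name chi_penalty is hypothetical, and the sketch assumes that the coefficient is applied before the comparison with the cutoff and that penalties at or above the cutoff are passed through unshifted. The text does not spell out either detail.

```python
def chi_penalty(raw_energy, chi_coeff=1.0, chi_cutoff=2.0):
    """Penalty (kcal/mol) for one glycosidic torsion angle.

    raw_energy is the value read from the relevant CHI-energy curve.
    Assumption: scale first, then zero out anything below the cutoff
    (flat-bottom potential); larger penalties are left unshifted.
    """
    scaled = chi_coeff * raw_energy
    if scaled < chi_cutoff:
        return 0.0
    return scaled


# Hypothetical raw CHI energies for three torsion angles of one ligand pose.
raw_energies = [0.4, 1.5, 5.0]
for coeff, cutoff in [(1, 2), (2, 1), (2, 4)]:
    penalties = [chi_penalty(e, coeff, cutoff) for e in raw_energies]
    print(f"VC{coeff}|{cutoff}:", penalties)
```

Under these assumptions a raw energy of 1.5 kcal/mol is ignored at VC1|2 but penalized at VC2|1, which is the kind of trade-off the two parameters are meant to expose.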


Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed
line) and 5 (dotted line) to the original CHIΦ|β curve. b.) The effect of applying a
CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2). Both panels plot ΔE [kcal/mol]
against ϕ [deg].

Analysis

The results of each ADV docking experiment vary with the random seed used by the
stochastic search algorithm. In order to account for this variation, the
results from multiple independent docking experiments were averaged for each system
tested. Unless otherwise stated, each Root Mean Square Deviation (RMSD) provided in
this article represents the average result of 10 docking events. This method of analysis
aims to eliminate spurious results and allows for a more accurate comparison between
ADV and VC. To increase comparability, the 10 random seeds generated for the
10 ADV docking experiments were explicitly specified for the 10 corresponding VC
docking events.

Docking accuracy is determined through two types of RMSDs; namely, pose and

shape RMSD. Both RMSDs compare the location of the docked ligand's ring atoms (C1,


C2, C3, C4, C5, and O5) to that of the crystal structure's equivalent atoms. A pose RMSD

(PRMSD) represents the deviation of the docked model from the location of the reference

structure in space. In this manner, the PRMSD represents the accuracy of docking the

ligand to the receptor. In contrast, the shape RMSD (SRMSD) uses least squares fitting to

compare the docked model to the reference structure irrespective of their locations in

space. The SRMSD represents the deviation of the docked model’s shape from that of the

reference structure. The rmsd and match functions within Chimera 133 were used to
calculate the PRMSD and SRMSD values. The PRMSDmin(5) and PRMSDmin(20)
represent the minimum PRMSD from the top 5 ranked and top 20 models, respectively,
averaged across the 10 docking events. The SRMSDavg was calculated by averaging the
SRMSD values for each of the 20 models from the 10 docking experiments. The standard
deviation values were calculated as the standard deviation of a sample.
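The aggregate metrics defined above can be illustrated with a short Python sketch. It assumes the per-model RMSD values have already been computed (for example with Chimera) and are stored in rank order for each docking run. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def prmsd_min(ranked_prmsd, top_n):
    """Lowest pose RMSD among the top_n ranked models of one docking run."""
    return min(ranked_prmsd[:top_n])


def prmsd_min_summary(runs, top_n):
    """Mean and sample standard deviation of PRMSDmin(top_n) across runs."""
    minima = [prmsd_min(run, top_n) for run in runs]
    return mean(minima), stdev(minima)  # stdev needs at least two runs


def srmsd_avg(runs):
    """Average shape RMSD over every model from every docking run."""
    return mean(value for run in runs for value in run)


# Two made-up docking runs with three ranked models each (values in Å).
example_runs = [[3.0, 1.0, 5.0], [2.0, 4.0, 0.5]]
print(prmsd_min_summary(example_runs, top_n=2))
print(srmsd_avg(example_runs))
```

statistics.stdev uses the n - 1 denominator, which matches the "standard deviation of a sample" stated in the text.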

Images of the molecules were prepared using the Visual Molecular Dynamics
(VMD) program.134 The ligands are colored according to the source of the file. Crystal
structures are colored blue, and output from ADV and VC is colored yellow and green,
respectively. Additionally, each carbohydrate ring is colored according to whether the
CHI energy penalty is applied to the surrounding Φ/Ψ values. The 1C4 and 4C1 chair
conformations are colored green, and other conformations that would be skipped by VC
are colored red. Ring conformations have been determined according to the
Cremer-Pople definition.135


CHI Energy Integration

Parsing the Ligand: The atom names for carbohydrate residues within the ligand file must

follow established atom naming to be identified by the CHI energy scoring function of

VC. While the carbohydrate ligand file is parsed within parse_pdbqt.cpp, information

about the atoms and residues of the ligand is stored within the data structure ligand_info.

Relevant glycosidic linkages, namely (1,2), (1,3), (1,4) and (1,6) linkages are detected.

Since the CHI energy functions were originally developed for chair conformations of

oligosaccharide rings, it is necessary to determine the conformations of the residues

comprising the input oligosaccharide ligand before the application of the energy

functions.

Determination of Ligand Carbohydrate Ring Conformation: The ring conformations are
identified based on a modified version of the Best-Fit-Four-Membered-Plane (BFMP)
method.136 Selections made about the appropriate CHI energy functions to be used for
each linkage are stored in the data structures glyco_info and ligand_glyco_info.
According to the BFMP method, a carbohydrate ring must fit three criteria in order to be
classified as a 1C4 or 4C1 sugar; namely, the internally defined 2d5, 4d1, and 6d3 or
5d2, 1d4, and 3d6 conformations, respectively. When the program encounters carbohydrate
conformations for which the CHI energy functions are not applicable, it simply ignores
the associated linkages. In certain protein-carbohydrate systems the sugar rings are only
slightly distorted from the standard 4C1 and 1C4 conformations and still merit application
of CHI energy penalties. To accommodate such minor conformational distortions of the
carbohydrate ring, in the current implementation of the BFMP method, a saccharide is
classified as a 1C4 or a 4C1 sugar if any 2 of the 3 criteria can be identified for the ring.
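The relaxed "2 of 3" rule can be sketched as follows. The sketch assumes that an upstream BFMP step has already produced the set of descriptors detected for a ring; the descriptor strings are the plain-text forms used above, and the names CHAIR_CRITERIA and classify_chair are hypothetical rather than identifiers from the VC source.

```python
# Descriptor sets for each chair form, as listed in the text.
CHAIR_CRITERIA = {
    "1C4": {"2d5", "4d1", "6d3"},
    "4C1": {"5d2", "1d4", "3d6"},
}


def classify_chair(detected, min_matches=2):
    """Return '1C4', '4C1', or None for a set of detected BFMP descriptors.

    min_matches=3 corresponds to the strict BFMP rule; min_matches=2 is the
    relaxed rule that tolerates slightly distorted chairs.
    """
    detected = set(detected)
    for chair, criteria in CHAIR_CRITERIA.items():
        if len(criteria & detected) >= min_matches:
            return chair
    return None  # CHI penalties would be skipped for linkages to this ring


print(classify_chair({"5d2", "1d4"}))                 # relaxed rule
print(classify_chair({"5d2", "1d4"}, min_matches=3))  # strict rule
```

A ring that returns None here corresponds to the case in which the program ignores the associated linkages.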


Scoring Individual Ligand Poses: Each docking run consists of a certain number of steps,

determined heuristically. Each step is characterized by a random perturbation and a local

optimization, which is followed by an evaluation of the generated pose. The random
perturbation is performed by either translating or rotating the ligand, or by adjusting any
of the flexible torsion angles. A new function, eval_chi, has been introduced within
model.cpp in order to calculate the CHI energy penalty for each ligand pose. This

function uses data from ligand_glyco_info to calculate the CHI energy penalty for every

oligosaccharide pose generated. The CHI energy penalty calculated for each glycosidic

torsion angle within eval_chi is modified according to two user-adjustable parameters

(chi_coeff and chi_cutoff). The total CHI energy of a given oligosaccharide is the

summation of the CHI energies for each glycosidic torsion angle comprising the model,

which is combined with the interaction energy natively calculated by ADV within the

function eval_deriv. This composite energy is implemented within the metropolis_accept

function in monte_carlo.cpp to calculate the acceptance probability of each ligand pose.

A ligand pose with unfavorable glycosidic torsion angles would be penalized by the

application of CHI energies, thereby increasing its probability of rejection within the

function.
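The scoring step described above can be sketched with a generic Metropolis criterion. This is a minimal illustration, not the code in monte_carlo.cpp: the function names mirror the description but the signatures, the temperature value and the example energies are assumptions made for the sketch.

```python
import math
import random


def total_energy(intermolecular, chi_penalties):
    """Composite score: native interaction energy plus summed CHI penalties."""
    return intermolecular + sum(chi_penalties)


def metropolis_accept(old_energy, new_energy, temperature, rng=random):
    """Standard Metropolis test: always accept downhill moves, accept uphill
    moves with probability exp(-(new - old) / temperature)."""
    if new_energy < old_energy:
        return True
    probability = math.exp((old_energy - new_energy) / temperature)
    return rng.random() < probability


# Hypothetical pose: good contacts but one strained glycosidic linkage.
current = total_energy(-6.0, [0.0, 0.0])
candidate = total_energy(-6.5, [0.0, 4.0])
print(current, candidate)
print(metropolis_accept(current, candidate, temperature=1.2))
```

Because the CHI penalty raises the composite energy of the candidate, its acceptance probability falls even though its intermolecular score is slightly better, which is the behavior described in the text.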

Log file: A VC log file (VC_log.txt) is written out for each execution of the
program and contains information about the glycosidic linkages identified by the program
and details about whether CHI energy penalties were applied to each linkage.


Results & Discussion

Implementation of the CHI energy function aims to improve docking accuracy by

correcting the shape of the carbohydrate ligand. In order to determine whether correcting

the ligand shape would be sufficient to produce an accurate model for a complex, each of
the crystal structures was initially subjected to a unique docking procedure in which the

glycosidic linkages of the ligand were restrained to the angles that were present in the

crystal structure. Of the 87 crystal structures selected for evaluation, 11 failed this initial

positive control. Failure during this step suggests that alternative modifications to the

ADV scoring function would be necessary to produce accurate models for these 11

complexes; therefore, optimization of VC continued with the remaining 76 structures.

Optimization of the CHI-Energy Coefficient

Incorporation of the CHI-energy term into the ADV scoring function immediately
produced output carbohydrate conformations comparable to X-ray crystal structures
(ADV vs. VC1 in Figure 5.2a). However, since the CHI-energy term was developed
independently of the ADV scoring function, it may be disproportionate in magnitude.
Therefore, a range of CHI-energy coefficients (1, 2, 3, 4, 5, 10, and 50) was examined.
The effect of varying the CHI-coefficient for a set of 14 antibody-carbohydrate systems is
reported in Figure 5.2. Each CHI-coefficient value led to poses with improved ligand
conformations (lower SRMSDavg(20) values) than those produced with ADV. The
CHI-coefficient imposes a higher penalty for torsions outside of the local minima of the CHI
energy curves, thereby attenuating the production of incorrect oligosaccharide
conformations during docking. Increasing the magnitude of the CHI-energy contribution
generally led to a corresponding decrease in the SRMSDavg(20). This trend was
particularly noticeable for systems containing more than 5 carbohydrate residues, due to
the increasing number of glycosidic linkages that were affected (Figure 5.2a).
Interestingly, the largest CHI-coefficient (VC50) increased the SRMSDavg(20) for ligands
containing fewer than 4 carbohydrate residues. This result is most likely due to an induced
fit that occurred upon ligand binding, which caused the glycosidic linkages of the
crystallized ligand to deviate from the theoretical minima toward which VC50 heavily
biases the ligand.

Notably, the accuracy of the pose (Figure 5.2b) diminished as the CHI
contribution became increasingly large (i.e. VC10 and VC50), despite producing ligand
conformations similar to the reference structure (Figure 5.2a). This suggests a problem
associated with pose identification. To demonstrate this, the lowest energy model
generated from flexibly docking the 3C6S 35 ligand using VC50 (SRMSD = 1.13 Å;
PRMSD = 23.8 Å) was rigidly re-docked. Results from ten docking experiments
consistently produced an accurate model with a PRMSDmin(5) of 1.98 Å. Rigidly
re-docking the ligand allowed the docking scoring function to segregate poses solely based
on intermolecular interactions between the protein and ligand. However, during flexible
docking, the harsh penalty applied by VC50 eliminated any model that deviated from the
minima of the energy curve. Since very few of the generated models met this criterion,
only those models that were unaffected by the CHI-energy penalty remained, including
those positioned incorrectly. The intramolecular forces imparted by a high CHI-energy
penalty appear to outweigh contributions from intermolecular interactions between the
protein and ligand.


The effect of over-weighting the CHI contribution suggests that a fine balance
between inter- and intramolecular interactions is required to successfully dock
carbohydrate ligands. As a result, lower coefficients of the CHI-energy function (less
than 4) produced more accurate models by enabling the generation of favorable
glycosidic torsion angles without overshadowing the intermolecular forces involved in
ligand binding. The performance of ADV and VC is comparable for systems containing
di-, tri-, tetra- and pentasaccharide ligands; however, VC outperforms ADV for
larger oligosaccharide ligands. For example, the improvements in PRMSDmin(5) from ADV
to VC1 for 1MFB 34, 3BZ4 35, and 3C6S were 1.1, 2.0, and 2.27 Å, respectively. Using
VC1 and VC2 produced acceptable PRMSDmin(5) poses for 13 out of 14 systems. As a
result, only CHI coefficients of 1 or 2 were considered for subsequent experiments.


Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various CHI-

energy coefficients of VC a.) SRMSDavg amongst the 5 top-ranked poses b.)

PRMSDmin(5).

Optimization of the CHI-Energy Cutoff

The CHI-energy functions were originally developed by modeling the rotational
properties of disaccharide analogs in vacuo. The minima of the CHI-energy curves
generally corresponded to oligosaccharide structures determined
crystallographically;48 however, oligosaccharides often undergo
conformational changes resulting from induced fit, which may cause glycosidic linkages
to deviate from idealized low energy values. Rather than defining the well bottom in
terms of a range of allowable torsion angles, the limits are defined by a CHI-energy range.

The chi_cutoff term negates the penalty associated with glycosidic linkage conformations
surrounding the absolute energy minima in the CHI-energy curves. Use of a flat-bottomed
CHI-energy potential allows induced fit to occur with no internal energy penalty. Within
this region, the pose is scored solely on the basis of the intermolecular
interactions dictated by the native ADV scoring function.

[Figure 5.2 plot area. Panel a: SRMSDavg(20) [Å] grouped by ligand size (di-, tri-, tetra-, penta-, hepta-, deca- and undecasaccharide). Panel b: PRMSDmin(5) [Å] for each antibody system. Series in both panels: ADV, VC1, VC2, VC3, VC4, VC5, VC10, VC50. Systems in panel b (antibody, PDB ID): 2G12, 1OP3; 291-2G3-A, 1UZ8; SYA/J6, 1M7D; scFv SE155-4, 1MFA; Fab SE155-4, 1MFD; SE155-4, 1MFE; HU3S193, 1S3K; BR96, 1CLY; BR96, 1CLZ; SE155-4, 1MFC; SYA/J6, 1M7I; SE155-4, 1MFB; F22-4, 3BZ4; F22-4, 3C6S.]

To identify the optimal setting that permits an acceptable range of glycosidic
angles, the CHI-energy cutoff was evaluated at integer values from 1 to 5 kcal/mol (Table
S5.2). Optimal results were obtained for each CHI-coefficient (VC1 and VC2) using
CHI-cutoff values of either 1 or 2 kcal/mol (VC1|1, VC1|2, VC2|1 and VC2|2). These four settings
of VC identified acceptable binding modes ranked within the top 20 poses for each of the
14 antibody systems, and ranked within the top 5 poses for 13 of the 14 antibodies. In
order to examine the applicability of VC to protein-carbohydrate complexes other than
antibody systems, as well as to further optimize the VC parameters, the study was
extended to 62 additional carbohydrate-protein complexes, including carbohydrate
binding modules (CBMs), lectins, and enzymes. The best performance was attained using
a CHI-coefficient of 1 and a CHI-cutoff of 2 (VC1|2), which generated an acceptable pose
amongst the top 5 models for 75% of the systems, compared to a 56% success rate for
ADV (Table 5.1). Although each of the 76 systems passed a positive control in which the
reference structure was successfully docked with rigid glycosidic linkages, VC1|2 was
unable to identify an acceptable pose for 25% of these systems. Challenges which may
have prevented VC from identifying correct models are discussed in the following
section.


Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curve to

the distribution of glycosidic linkages in carbohydrate crystal structures in the PDB. The

bottom X-axis and left Y-axis correspond to the histogram which depicts the distribution

of PDB structures, while the top X-axis and right Y-axis correspond to the CHI-energy

curves.

Similar to the analysis performed by Nivedha et al.,48 the carbohydrate crystal
structures in the PDB were surveyed using the GlyTorsion tool from
www.glycosciences.de 137 in order to calculate the percentage of glycosidic linkages
exempted from penalization as a consequence of applying VC1|1, VC1|2, VC2|1, and VC2|2.
At VC1|2, the CHI energy penalty for 87% of glycosidic linkages in the PDB was nullified
(Figure 5.3), compared to values of 77%, 62% and 76% for VC1|1, VC2|1 and VC2|2,
respectively. Therefore, using VC1|2 allowed for the maximum flexibility of glycosidic


linkages without penalization by the CHI-energy functions. Although VC1|2 was selected
as the default, the alternatives (VC1|1, VC2|1 and VC2|2) were nearly as effective in binding
mode prediction (Table 5.1); therefore, the CHI-cutoff and CHI-coefficient parameters
remain user-adjustable.

Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and
CHI-cutoff.

                                 Success Rate* [%]
System types   No. of Systems    ADV    VC1|1   VC1|2   VC2|1   VC2|2
Antibodies     14                79     100     93      100     100
Lectins        42                55     64      71      67      67
CBMs           20                35     50      60      55      45
Totals         76                56     71      75      74      71

*Success Rate is defined as producing an accurate binding mode (PRMSDmin(5) < 2 Å).

Performance with ligands containing 1,6-linkages

In total there are 12 systems, consisting of lectins and CBMs, with ligands containing one
or more 1,6 glycosidic linkages. The success rates for these systems (producing an
accurate pose prediction amongst the top-5 poses) for ADV and VC1|2 were 25% and
42%, respectively (Table 5.2).


Table 5.2 PRMSDmin(5) [Å] produced by ADV and VC1|2 for the 12 test systems with ligands
containing 1,6-linkages.

PDB ID   1jpc  1k9i  1tei  1zhs  2vco  4gk9  2vuz  2yfz  1oh4  2ypj  2j73  2i74
ADV      5.0   4.7   1.1   1.4   3.5   4.3   3.4   3.3   3.6   5.2   0.7   3.6
VC1|2    5.7   5.0   0.6   1.6   1.7   1.2   7.6   1.8   3.0   9.9   2.7   4.0
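As a worked check, the success rates quoted for these 12 systems can be recomputed from the Table 5.2 values using the PRMSDmin(5) < 2 Å criterion. The values below are copied from the table; the function name success_rate is introduced here for illustration only.

```python
# PRMSDmin(5) values in Å from Table 5.2, in the order of the PDB ID row.
ADV = [5.0, 4.7, 1.1, 1.4, 3.5, 4.3, 3.4, 3.3, 3.6, 5.2, 0.7, 3.6]
VC1_2 = [5.7, 5.0, 0.6, 1.6, 1.7, 1.2, 7.6, 1.8, 3.0, 9.9, 2.7, 4.0]


def success_rate(values, threshold=2.0):
    """Percentage of systems whose PRMSDmin(5) is below the threshold."""
    hits = sum(1 for value in values if value < threshold)
    return 100.0 * hits / len(values)


print(round(success_rate(ADV)))    # 3 of 12 systems -> 25
print(round(success_rate(VC1_2)))  # 5 of 12 systems -> 42
```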

The performances of VC1|2 and ADV were further compared for the above systems by
binning the values of the ω angle of all the docked poses from both programs into 10°
bins. The histogram thus obtained was compared to the corresponding CHI-energy
curves. For example, the data pertaining to sugars with 1,6 linkages in which the O4 atom
is equatorially attached to the reducing sugar are shown in Figure 5.4. The distribution of
ω angles produced by VC1|2 can be divided into three regions centered around 60°,
180° and 300°, which is in agreement with the low-energy regions of the corresponding
CHI-energy curve. Additionally, the ω angles corresponding to the reference crystal
structures also fall within the range of the two lowest energy wells of the CHI energy
curve. In contrast, the ω angles produced by ADV are more evenly
distributed across the 0° to 360° range. The challenges faced by the docking programs
in docking the test set used in this study are outlined below.


Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test
systems containing one or more 1,6-linkages overlaid against the reference crystal
structure ω angles (red dots) and the corresponding CHI energy curve.
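The 10° binning used for the ω-angle histogram can be sketched as follows. The sketch assumes the ω angles have already been measured for each docked pose and wraps them onto the 0° to 360° range before binning; the function name bin_angles and the sample angles are hypothetical.

```python
from collections import Counter


def bin_angles(angles_deg, width=10):
    """Percentage of angles falling in each bin, keyed by the bin's lower edge."""
    counts = Counter()
    for angle in angles_deg:
        wrapped = angle % 360.0                    # map e.g. -60 onto 300
        start = int(wrapped // width) * width % 360
        counts[start] += 1
    total = len(angles_deg)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


# Made-up ω angles (degrees) for four docked poses.
print(bin_angles([61.0, 65.0, -60.0, 179.9]))
```

Comparing such percentages bin by bin with the CHI-energy curve is what reveals whether docked torsions cluster in the low-energy wells.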

Docking Challenges

Both ADV and VC encountered recurring difficulties while docking ligands in the

development set. These challenges resulted from issues inherent to the docking program,

as well as ambiguities in atomic placement within the crystal structures that were used as

a reference.

Excessive Carbohydrate-Protein Interactions

Obtaining a docked oligosaccharide in which part of the ligand extends away
from the protein is particularly difficult for automated docking algorithms.138,139 Docking
predicts complexes using a scoring function that maximizes favorable intermolecular


interactions. This approach promotes models that contain many residues interacting with
the protein. For example, both ADV and VC1|2 fail to identify an acceptable pose
amongst the 5 top-ranked poses when docking the tetrasaccharide ligand to the lectin
binding domain of lectinolysin (PDB ID: 4GWI 140). Only one residue of the ligand
completely interacts with the protein surface in the crystal structure; however, the models
produced during docking are unable to reproduce this orientation. Although VC1|2
produced poses similar to the crystal ligand (PRMSD = 2.2 Å), they were ranked lower
than the other models which interact with the protein surface in their entirety. One
approach to surmount this problem would be to dock only the component of the
oligosaccharide that is in direct contact with the protein. Such a minimal binding
determinant may be inferred from experimental binding data, such as glycan array
screening.141,142 VC improves the likelihood that the non-interacting segment will remain
distal from the protein surface by penalizing unlikely glycosidic torsion angles. As an
example, docking results produced by ADV and VC1|2 for the largest oligosaccharide in
this test set (PDB ID: 3C6S) are displayed in Figure 5.5. While those residues that
interact with the protein are correctly predicted in both instances, the model produced by
VC1|2 better represents the solvent-exposed residues. Glycosidic torsion angles obtained
from the reference structure have been plotted on the CHIΦ|α energy curve
alongside those of the 20 models produced by either ADV or VC1|2 (Figure 5.5c).
Approximately half of the ADV torsions exceeded the 2 kcal/mol cutoff, some of which
would receive CHI energy penalties greater than 8 kcal/mol. In contrast, none of these
torsion angles produced by VC were penalized by the CHI energy function for exceeding
the 2 kcal/mol cutoff.


Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue).

b.) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). c.) The Φ

torsion angles of α-sugars from the docked poses of the 3C6S ligand from both ADV

(yellow triangles) and VC1|2 (green squares) plotted on to the CHI curve. The torsion

angles corresponding to the reference are plotted as blue circles.

[Figure 5.5 image area. Panels a and b: docked poses. Panel c: CHI energy [kcal/mol] plotted against the CHIΦ|α torsion angle [°].]

Aromatic Stacking

The importance of aromatic residues within the binding site has been
demonstrated by the corresponding decrease in affinity upon their substitution with other
amino acids;143 however, aromatic stacking interactions are currently omitted from
consideration in most docking scoring functions. As a result, docking algorithms can
encounter difficulties when predicting binding modes of ligands that stack against
aromatic amino acids. As an example, the carbohydrate ligand in 4AFD 144 stacks against
four tryptophan residues (Trp 55, 60, 99 and 108) in the binding groove of the
corresponding CBM. Neither ADV nor VC accurately predicts the binding mode,
obtaining high PRMSDmin(5) values of 8.9 Å and 5.4 Å, respectively (Figure 5.6). In these
situations, consideration of aromatic stacking interactions within the docking scoring
function would be expected to improve the results. Previously, efforts have been made to
incorporate CH/π stacking effects during carbohydrate docking.132,145


Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD)

is depicted in complex with a tetrasaccharide ligand. All amino acids further than 5 Å

away from the ligand are colored grey. Those residues within 5 Å are colored orange if

they are cyclic and red if acyclic.

Low-Resolution Experimental Data

Docking the tetrasaccharide ligand to the Se155-4 antibody (PDB ID: 1MFC 34)
appeared more challenging for VC than ADV (Figure 5.7a); however, the results were
comparable for the other three ligands that have been crystallized with this antibody
(PDB ID: 1MFA 31, 1MFD 32, and 1MFE 33). These three systems contain the same
trisaccharide ligand, but differ from the tetrasaccharide by a rhamnose (Rha) residue.
This extra residue is responsible for the difference in PRMSDmin(5) values between VC1|2
and ADV (Figure 5.7a). While the positions of three of the four residues in the individual
structures closely align with one another, the pyranose ring of Rha-524 in the model
produced by VC is flipped approximately 180° around the glycosidic ψ-angle, compared
to the model produced by ADV. In the reported crystal structure for this complex,34
residue Rha-524 was described as “disordered,” and was placed in both the expected 94
and the flipped orientation in structures 1MFB and 1MFC, respectively. The ADV
orientation more closely aligns with the “flipped” ligand from 1MFC, giving rise to a low
PRMSDmin(5) relative to VC, which predicts the normal conformation 94 to be preferred.
While it is expected that complexation with the protein will distort the conformation of a
bound oligosaccharide, the preponderance of crystallographic data (Figure 5.7b) indicates
that large distortions, such as the flip of the glycosidic ψ-angle in 1MFC, are rare. Thus,
there is a clear role for the CHI-energy functions to aid in crystal structure refinement
and/or curation by identifying such distorted glycosidic linkages as high energy.

Figure 5.7 a.) Models representing the PRMSDmin(5) produced by docking 1MFC with
ADV (yellow) and VC1|2 (green). The primary difference between docked models
is a rhamnose ring that is flipped approximately 180 degrees, highlighted by the orange
arrows. b.) Ligands from two crystal structures, 1MFB (blue) and 1MFC (cyan), also
differ by the orientation of the RAM 524 ring.

An Assessment of ADV and VC using a Test Set of Apo Proteins

Cognate docking is useful for determining the ability of the docking algorithm to

correctly place the ligand when the binding site is already preordered to receive the

ligand; however, if the ultimate goal of docking is to successfully predict protein-ligand

interactions in the absence of a pre-configured binding site, it is necessary to assess the

performance on apo proteins. Apo protein crystal structures were available for a subset of

systems from the cognate development set, and were employed as test cases to compare

the performance of ADV and VC1|2. The average difference in amino acid positions

between the apo and corresponding cognate proteins for residues within 5Å of the ligand

was 0.77Å. ADV correctly predicted the binding modes in 35% of the systems, whereas

VC1|2 succeeded in 55% of the systems. If the top-20 poses were considered, instead of

only the top-5, the success rates for ADV and VC1|2 increased to 55% and 83%,
respectively (Table 5.3). VC1|2 also improved the rankings of these acceptable pose predictions

(Figure 5.8). In a given docking run, if there are multiple poses with PRMSD ≤ 2Å, the

pose with a higher rank is considered an acceptable pose.
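The success criterion and the rank of the acceptable pose can be illustrated with a short sketch. It assumes that "higher rank" means the pose closest to rank 1; the function names and the PRMSD values are hypothetical, and only the 2Å threshold and the top-5/top-20 windows are taken from the text.

```python
def prmsd_min(prmsds_by_rank, top_n):
    """Lowest PRMSD among the top_n poses (list is ordered by docking rank)."""
    return min(prmsds_by_rank[:top_n])


def acceptable_rank(prmsds_by_rank, threshold=2.0):
    """1-based rank of the best-ranked pose with PRMSD <= threshold, else None."""
    for rank, value in enumerate(prmsds_by_rank, start=1):
        if value <= threshold:
            return rank
    return None


# Hypothetical PRMSD values [Å] for one docking run, ordered by docking rank.
poses = [6.1, 4.8, 3.9, 5.2, 2.7, 1.4, 3.3, 0.9]

success_top5 = prmsd_min(poses, 5) < 2.0    # PRMSDmin(5) criterion
success_top20 = prmsd_min(poses, 20) < 2.0  # PRMSDmin(20) criterion
rank_acc = acceptable_rank(poses)
```

In this hypothetical run, the system would count as a failure under PRMSDmin(5) but as a success under PRMSDmin(20).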

Table 5.3 Comparison between ADV and VC1|2 for the apo proteins Test Set.

                                    Success Rate* [%]
                                PRMSDmin(5)      PRMSDmin(20)
System types   No. of Systems   ADV    VC1|2     ADV    VC1|2
Antibodies            7          71     86        71     100
Lectins              10          50     50        70      90
CBMs                 12           0     42        33      67
Totals               29          35     55        55      83

*Success Rate is defined as finding an accurate binding mode (PRMSDmin < 2Å).

Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the lowest-ranked
pose with PRMSD ≤ 2Å, produced by ADV and VC1|2 from docking oligosaccharide
ligands onto apo protein structures.

[Pie charts, one each for VC1|2 and ADV. Legend: Rankacc < 5; 6 < Rankacc < 10;
Unacceptable. Slice labels as extracted: VC1|2: 64%, 22%, 14%; ADV: 53%, 10%, 37%.]

Evaluation of Docking to an Enzyme System using ADV and Optimized VC

Enzyme active sites often distort monosaccharide ring shapes during catalysis,

which makes docking to this class of proteins particularly challenging. As the CHI-

energy functions were developed for use with low energy ring conformations, they would

not necessarily be applicable to the distorted glycans found in enzyme complexes, and

hence VC is unlikely to offer considerable improvement over ADV when applied to

carbohydrate-processing enzymes. An exception to this general statement is for segments

of the oligosaccharides extending beyond the active site, in which case the CHI-functions

in VC should provide some enhanced accuracy. A single example of docking to a

retaining glycoside hydrolase is presented here in order to demonstrate the potential

application of VC to enzymes. Kitago et al. produced a series of crystal structures of the

WT cellulase 44A (Cel44A), and a catalytic knockout, in combination with cellulosic

fragments.146 Of the five structures produced, four of the ligands were bound to the (-)
site (relative to the catalytic nucleophile), while only one contained a ligand that spanned
the entire active site (PDB ID: 2EQD)146 (Figure 5.9a). In that work, a reaction
mechanism was proposed in which initial substrate binding enhanced activity through an
assortment of interactions with the carbohydrate in the (-) site, while a dearth of
interactions in the (+) site promoted product release.146

VC successfully produced a model of the complex for the four ligands in the (-)

site, but failed to correctly position the largest ligand that crosses the (+) site (Figure

5.9b). Although ADV failed to generate a correct model for the ligands bound to the (-)

site, it outperformed VC when docking the ligand that extends across the active site. This

result is unsurprising considering the high torsional penalties that would be applied by


VC to some of the glycosidic linkages within the crystal structure (Figure 5.9c). Although

VC would not penalize the glycosidic linkage of the (-1) residue due to the non-chair ring

conformations, there are other uncommon torsion values in the distal regions of the

ligand. For example, the Φ linkage between residues (+1) and (+2) of the reference

structure would receive a penalty of 8 kcal/mol by the CHI energy function, effectively

precluding selection of such a model by VC.

Figure 5.9 a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX,
2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from PDB ID:
2EQD. Amino acids reported to be involved in substrate binding (N45, R47, W64, W71,
W327, W331, E359, and W392) are colored orange or red, depending on whether the
residue is aromatic or not.146 The catalytic residue (Q186) is colored yellow. All other
amino acids are grey. The active site has been separated into a (-) and (+) site. The circled
values represent the position of each residue relative to the glycosidic linkage that is
cleaved during catalysis. The ligands exclusive to the (-) side of the active site are
depicted by varying shades of purple. The octasaccharide that extends across both the (-)
and (+) site (2EQD) is colored blue. Each carbohydrate ring is colored according to
whether the CHI energy penalty is applied to the surrounding Φ/Ψ values. Rings are
either green or red depending on whether VC is or is not applied, respectively. b) A
representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2. c) The
glycosidic linkages of the octasaccharide that extends across the active site (2EQD) are
labeled according to the penalty received by the CHI energy curve. Penalties greater than
2 kcal/mol are highlighted in red. VC is not applied to the (-1) residue since it is neither a
4C1 nor 1C4 chair, so the ring is colored red and the penalties are unlisted.

[Figure 5.9 graphics. Subsite labels: -3, -2, -1, +1, +2, +3, +4, +5. Panel b: bar chart of
PRMSDmin [Å] (axis 0 to 14) for 2E0P, 2EO7, 2EEX, 2EJ1 and 2EQD, with ADV and VC1|2
series for PRMSDmin(5) and PRMSDmin(20). Panel c: linkage penalty labels as extracted:
0.1 / 0.1, --- / 0.7, 1.3 / 0.3, 0.1 / 0.5, 3.2 / ---, 8.0 / 0.1, 5.2 / 3.5.]

Conclusions

The CHI energy functions were incorporated into ADV in order to improve

carbohydrate docking results. Docking performance was evaluated with 72 antibody,

lectin, or CBM systems. Although various CHI-energy coefficients were evaluated, the

original energy profiles (chi_coeff = 1) produced accurate models with the highest

frequency. Although exocyclic groups have been omitted from consideration during the

modeling of the CHI energy curves by the use of tetrahydropyran molecules, the

remaining interaction energy terms within the ADV scoring function account for the

interactions of the molecule arising from the presence of these exocyclic groups. An


additional term that allows a range of glycosidic torsion angles to remain unpenalized has

been implemented to enhance docking performance (chi_cutoff = 2). Although these

settings have been selected as default values, the variables remain user-adjustable. VC1|2

produced accurate docked models for more systems than ADV when docking to either

holo- or apo-protein receptors; however, ADV outperformed VC in a few cases where the

reference ligands contained high-energy glycosidic linkages according to the CHI energy

curves. This result suggests that accurately predicting warped glycosidic linkages, such as

those found within the active site of an enzyme, would be difficult for VC. Although VC

was not designed for enzymes, results from docking to a cellulase demonstrate the

potential application of VC towards accurately predicting enzyme-glycan interactions.
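The role of the two user-adjustable settings can be illustrated with a minimal sketch. It assumes that a linkage whose CHI energy is at or below chi_cutoff receives no penalty and that any larger energy is scaled by chi_coeff; the exact functional form used inside VC is not restated here, and the function name and the example energies are hypothetical.

```python
def chi_penalty(chi_energy, chi_coeff=1.0, chi_cutoff=2.0):
    """Penalty [kcal/mol] for one glycosidic torsion (assumed form, see lead-in)."""
    if chi_energy <= chi_cutoff:
        return 0.0  # torsion lies within the unpenalized range
    return chi_coeff * chi_energy


# Illustrative CHI energies [kcal/mol] for four torsion angles of one ligand.
linkage_energies = [0.1, 0.7, 1.3, 8.0]
total_penalty = sum(chi_penalty(e) for e in linkage_energies)
```

Under these assumptions only the 8.0 kcal/mol torsion contributes, which mirrors how a single strained linkage can dominate the intramolecular penalty of an otherwise relaxed ligand.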

There were a few commonalities within the systems that neither ADV nor VC

could accurately reproduce. Ligands that partially extend into solution were difficult to

reproduce due to the lack of intermolecular interactions. For these ligands, better results

may be produced by docking only those parts of the ligand which are expected to interact

with the protein. A few other systems were identified which may benefit from a term that

accounts for aromatic stacking. Finally, a few low-resolution crystal structures were

identified which contained ambiguous coordinates for the reference ligands, indicating a

potential role for the CHI energy functions as a validation technique for crystallographic

models.

VC is currently applicable to the most common saccharide moieties and linkages,

such as chair conformations and 1,x-linkages. Additional residues, such as sialic acid,

may be incorporated into VC once the CHI-energy functions become available.

The source code for VC is freely available at
http://glycam.org/publication-materials/vina-carb.

Individual Author Contributions

Anita K. Nivedha: Authored portions of the paper; coded CHI energy functions within

AutoDock Vina; co-designed docking protocols and analysis methodologies; provided

tools for the analysis of data and made images for the paper.

David F. Thieker: Authored portions of the paper; co-designed docking protocols and

analysis methodologies; provided tools for the analysis of data and made images for the

paper.

Robert J. Woods: Authored the paper; conceived and designed the experiment, and

contributed to the analysis and interpretation of data.


6. THE CONSIDERATION OF CH/π INTERACTIONS IN CARBOHYDRATE-PROTEIN DOCKING

Introduction

CH/π interactions occur between -CH groups and the π-electron density in
aromatic molecules. These interactions were first postulated by Tamres in 1952,147 who
noted that dissolving benzene in chloroform was an exothermic process. This result was
followed up by extensive NMR and IR studies,148,149 which showed that this type of
non-covalent interaction is qualitatively similar to hydrogen bonds. CH/π interactions have
been described as interactions between a weak acid (C-H donor) and a weak base
(π-acceptor), and are stable in both polar and non-polar solvents.150 Individually, these
bonds are relatively weak, with each interaction contributing 0.5-1.0 kcal/mol to the
overall stabilization energy of the complex,150,151 but the cumulative effect of multiple
CH/π interactions has a pronounced influence on stability.

It has also been proposed that the strength of the CH/π interaction originates
primarily from charge transfer,152 and that dispersive forces play a major role in
these interactions.153 The hydrophobic effect also contributes favorably to this interaction
when water is the solvent. However, it is not the major contributing factor,
as shown in the study by Waters et al.,154 in which the replacement of an aromatic moiety
by a more hydrophobic aliphatic group led to a decrease in the interaction energy of the
system under study. Specifically, replacing a phenylalanine with a synthetic analog, in
which the phenyl ring was replaced by a cyclohexane ring, weakened the interaction
energy of the system with an acetylated monosaccharide from -0.5 kcal/mol to
-0.1 kcal/mol. This result showed that the hydrophobic effect was not the
major contributing factor to the interaction energy when there was a potential to form
CH/π interactions. Additionally, CH/π interactions may occur even in vacuum,155,156
whereas hydrophobicity stems from a molecule’s interaction with water.

Figure 6.1 The replacement of the aromatic group in (A) by the aliphatic group in
(B) in the study by Waters et al.154 in an interaction with a tetraacetylglucose molecule led
to a decrease in the interaction energy of the system.

Multiple surveys of the Protein Data Bank (PDB) have been performed to
investigate the presence of CH/π interactions in protein crystal structures, and the sheer
number of these interactions reveals their importance in protein structure stability and
function.50,157 For example, in a 2001 study, a survey of the PDB was conducted on a set of
1154 non-redundant protein structures to detect CH/π interactions, and the authors
detected 31,087 individual interactions which satisfied their selection criterion.151 They
discovered that nearly three-fourths of the Tryptophan residues, half of all Tyrosine and
Phenylalanine residues and one-fourth of all Histidine residues were involved as
acceptors in CH/π interactions. In addition to their contribution to the stabilization of
protein structures,151 CH/π interactions are also found in complexes of proteins
with ligands or cofactors, nucleotides, carbohydrates or peptides.151,158,159 They are
particularly common in carbohydrate-binding proteins, and affect binding affinity and
conformation. For example, human lysozyme is an endoglycosidase which binds to the
β-1,4-linked homopolymer of N-acetylglucosamine (GlcNAc), the main cell wall
component in fungi. The enzyme has several aromatic amino acids in its binding pocket
that are crucial for ligand recognition. An alteration of these aromatic residues using
site-specific mutagenesis affected the affinity and the catalytic efficiency of the enzyme.158

Protein-carbohydrate interactions are at the heart of several life processes
including fertilization, embryogenesis, tissue maturation and tumor metastasis.160 The
affinities associated with this class of molecular recognition phenomenon are often
strengthened by multivalency,161,162 as well as by interactions between polar or charged
groups (hydrogen bonds, salt bridges), van der Waals contacts, and interactions between
aromatic amino acids and -CH groups in carbohydrate residues (CH/π interactions)55
(Figure 6.3). These CH/π interactions cause carbohydrate rings to stack roughly parallel
or perpendicular to aromatic amino acids.51,163 They have been observed in most
protein-carbohydrate complexes, including enzymes and receptors, and more specifically
in lectins, plant toxins, antibodies and transport proteins.56,164 Antibodies can also be
raised against carbohydrate antigens, and can therefore interact with sugars
intermolecularly25 via stacking interactions.

Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic

amino acids, namely, a Tryptophan and a Tyrosine in the binding pocket of an antibody

Fab fragment. (PDB ID: 1MFE)33

Any pyranoside has two distinctive faces that can interact with an aromatic
residue. Experimental and theoretical studies show that the presence of
several axially oriented CH bonds facing the aromatic ring is favored, while interactions
with axially oriented OH bonds are disfavored.160 In a typical carbohydrate CH/π
interaction, the hydrogen atoms in two or three CH groups on the hydrophobic face of a
monosaccharide overlap with the π-electron density in an aromatic amino acid (Fig.
3.1b). It has also been shown experimentally that the elimination of aromatic residues
within these binding sites leads to a decrease in the affinity of the protein-carbohydrate
interaction,165 and that replacing one aromatic residue by another can modulate the
properties of the interaction. It was also found that as the size of the
interacting amino acid ring increased, there was a corresponding increase in affinity. At
the same time, the addition of electron-withdrawing groups, such as fluorine, to the ring
led to a decrease in affinity.147,158,166,167

Figure 6.3 Representation of CH/π interactions between β-D-Glucopyranose (βDGlcp)

and Phenylalanine.

In the present work, we obtained a CH/π interaction energy function using
knowledge from previous experiments about the nature of the interaction between the two
groups and their contribution to the overall interaction energy of the complex.154 We
examined the use of the resulting CH/π function in improving the ranking of theoretical
interaction energies for a test set of 60 lectin-carbohydrate systems that consisted of
complexes in which CH/π interactions visibly contributed to binding, those which had
fewer than four CH/π interactions in the binding site, and also systems in which these
interactions were absent. The theoretical structures were generated by automated
docking employing AutoDock Vina108 and Vina-Carb.48 The CH/π function was applied
after docking to assess its ability to improve the ranking of the theoretical poses, relative
to the known crystal structures for the 60 systems.

Methods

CH/π interaction energy function

The CH/π interaction energy curve between a CH model and an aromatic ring moiety can
be described using a Lennard-Jones potential with a minimum of approximately -0.5
kcal/mol, corresponding to the known contribution of an individual CH/π interaction.154

The equation used to model these interactions is shown in Figure 6.4.
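As a numerical check on this description, the following sketch evaluates the curve using the parameters reported with Figure 6.4 (4ε = 1.84, σ = 3.26). It assumes the standard 12-6 Lennard-Jones form with 4ε as the prefactor and units of kcal/mol and Å; the function and variable names are illustrative only.

```python
FOUR_EPSILON = 1.84  # prefactor 4*epsilon from Figure 6.4 (assumed kcal/mol)
SIGMA = 3.26         # sigma from Figure 6.4 (assumed Å)


def ch_pi_energy(x):
    """Lennard-Jones energy for a carbon-to-ring-centroid distance x [Å]."""
    ratio6 = (SIGMA / x) ** 6
    return FOUR_EPSILON * (ratio6 ** 2 - ratio6)


# For a 12-6 potential the minimum lies at 2**(1/6) * sigma with depth -epsilon.
x_min = 2 ** (1 / 6) * SIGMA
well_depth = ch_pi_energy(x_min)
```

With these parameters the minimum falls near 3.66 Å with a depth of about -0.46 kcal/mol, in line with the approximate -0.5 kcal/mol contribution quoted above.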


Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to

describe the interaction between a CH-group and an aromatic moiety.

Evaluation of Results

The RMSD of the docked ligand pose was computed relative to that in the crystal
structure (PRMSD) for the carbohydrate ring atoms. Previously, we have reported that
PRMSD values are a convenient quantitative measure of the quality of a theoretical
carbohydrate pose.48,168 For re-scoring of the docking energies, the cumulative CH/π
interaction energy score for each pose was combined with the docked energy obtained
from ADV and VC, and the new energies were used to re-rank the docked models. The
rank of the model with

the lowest PRMSD (PRMSDmin) was calculated before and after rescoring. 48
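The re-ranking step can be sketched as follows. The sketch assumes that the combination is a simple sum of the docked energy and the coefficient-weighted CH/π score; the pose records, energies and PRMSD values are hypothetical and serve only to show how the rank of the PRMSDmin pose is compared before and after rescoring.

```python
def rescore(poses, coeff=0.7):
    """Return poses sorted by docked energy plus weighted CH/π score."""
    rescored = []
    for pose in poses:
        total = pose["docked_energy"] + coeff * pose["ch_pi_energy"]
        rescored.append({**pose, "rescored_energy": total})
    return sorted(rescored, key=lambda p: p["rescored_energy"])


def rank_of_prmsd_min(poses, energy_key):
    """1-based rank, by the given energy, of the pose with the lowest PRMSD."""
    ordered = sorted(poses, key=lambda p: p[energy_key])
    best = min(poses, key=lambda p: p["prmsd"])
    return ordered.index(best) + 1


# Hypothetical docked poses (energies in kcal/mol, PRMSD in Å).
poses = [
    {"name": "A", "docked_energy": -7.2, "ch_pi_energy": 0.0, "prmsd": 5.1},
    {"name": "B", "docked_energy": -6.9, "ch_pi_energy": -1.4, "prmsd": 1.0},
    {"name": "C", "docked_energy": -6.5, "ch_pi_energy": -0.4, "prmsd": 3.1},
]

rank_before = rank_of_prmsd_min(poses, "docked_energy")
rescored = rescore(poses)
rank_after = rank_of_prmsd_min(rescored, "rescored_energy")
```

In this hypothetical case the accurate pose moves from second to first place once its stacking contacts are credited.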

[Figure 6.4 plot: Energy [kcal/mol] (axis -1 to 2) versus Distance, x [Å] (axis 3 to 6)
for the curve labeled "Model", where 4ε = 1.84 and σ = 3.26.]

Test systems

The test set consisted of 60 lectin-carbohydrate crystal structures extracted from
the PDB. Details about the systems are provided in Supplementary Information (SI). In
the case of PDB files with multimers of the complex, the monomer with the lowest

average B-factor for the carbohydrate ligand was selected. The systems were prepared for
docking using AutoDockTools.107 The docking grid box was centered on the binding site
of the protein; the (x, y, z) co-ordinates of the grid box center are provided in S6.3.
Docking was performed ten times, and the model with the lowest PRMSD was determined
each time. The average of these ten lowest PRMSD values was taken as the PRMSDmin in
each case, as described in the work done by Nivedha et al.168 The requested number of
output models was set to 20 for each of the 10 independent docking runs. Following
docking, the CH/π interaction energy scoring

function was applied to each docked model. The algorithm to perform this post-docking

application of the CH/π function is described in detail in the following section.
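The averaging of the per-run minima can be written compactly. This is a minimal sketch with hypothetical PRMSD values and fewer runs and poses than the 10 runs of 20 models used in this work; the function name is illustrative.

```python
from statistics import mean


def average_prmsd_min(runs):
    """Average, over docking runs, of the lowest PRMSD found in each run."""
    return mean(min(run) for run in runs)


# Hypothetical PRMSD values [Å]: one inner list per independent docking run.
runs = [
    [2.0, 1.0, 4.5],
    [3.0, 0.5, 2.2],
]
prmsd_min = average_prmsd_min(runs)
```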

Automatic Detection of CH/π interactions

The program reads in protein and carbohydrate structure files (PDB format) and
calculates the equation of the plane, ax + by + c, for each pyranose ring using the
co-ordinates of the ring atoms. For the 5 carbon atoms in the ring, the positions of the
attached hydrogen atoms are calculated as shown in 48. Using the H and C atomic
co-ordinates, CH vectors are generated, and the center of the plane of the carbohydrate ring

demarcated by atoms O5, C2, C3 and C5 is determined.
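The hydrogen placement can be sketched from the construction given in the caption of Figure 6.5a: the C-H direction is taken as the negative of the vector from the carbon atom to the average position of its three heavy-atom neighbors. The 1.09 Å bond length and the function names below are assumptions made for illustration and are not taken from the program itself.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))


def ch_direction(carbon, neighbors):
    """Unit vector pointing from the carbon away from its neighbors' average."""
    avg = centroid(neighbors)
    vec = tuple(c - a for c, a in zip(carbon, avg))
    length = math.sqrt(sum(v * v for v in vec))
    return tuple(v / length for v in vec)


def place_hydrogen(carbon, neighbors, bond_length=1.09):
    """Hydrogen position along the C-H direction (bond length is assumed)."""
    direction = ch_direction(carbon, neighbors)
    return tuple(c + bond_length * d for c, d in zip(carbon, direction))


# Idealized example: C1 at the origin with C2, O5 and O1 below the xy plane.
c1 = (0.0, 0.0, 0.0)
c1_neighbors = [(1.0, 0.0, -0.5), (-0.5, 0.866, -0.5), (-0.5, -0.866, -0.5)]
h1 = place_hydrogen(c1, c1_neighbors)
```

In this idealized geometry the hydrogen lands on the +z axis, opposite the three substituents.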

The program detects all Tyrosine, Tryptophan and Phenylalanine residues
according to their residue name in the PDB file, and stores the coordinates for all the
atoms comprising the aromatic rings. For each aromatic ring, the centroid is calculated
(one for each ring, therefore a total of two in the case of Tryptophan) and the distances
between each aromatic center and all centers of the carbohydrate ring planes are
determined (dcenters) (Figure 6.5b). If any of the dcenters distances calculated is found to be
less than 7Å, distances between the projections of the pyranose ring carbon atoms and the
centroids of the aromatic rings are calculated (dcp) (Figure 6.5c). For each dcp distance
calculated, if the value is less than 2.5Å and if the CxHx bond is pointing
towards the aromatic ring, an aromatic CH/π interaction energy score is calculated for
that interaction using the distance between the carbon atom and the centroid of the
aromatic ring as input (dCπ). Summation of all CH/π interaction energy scores for the
entire carbohydrate molecule gives the total CH/π interaction energy score for that pose.
The performance of various CH/π interaction energy score coefficients was examined,
namely, 0.3, 0.5, 0.7, 1.0, 1.5 and 2.0.
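The detection steps described above can be sketched for a single pyranose ring and a single aromatic ring. This is not the program's source: the helper names are hypothetical, the aromatic plane normal is taken from three ring atoms (assuming a planar ring), the "pointing towards" test is assumed to be a positive dot product between the C-H vector and the carbon-to-centroid vector, and the Lennard-Jones parameters are those given with Figure 6.4 (assumed kcal/mol and Å).

```python
import math

FOUR_EPSILON = 1.84  # prefactor from Figure 6.4 (assumed kcal/mol)
SIGMA = 3.26         # from Figure 6.4 (assumed Å)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centroid(points):
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))


def lennard_jones(x):
    ratio6 = (SIGMA / x) ** 6
    return FOUR_EPSILON * (ratio6 * ratio6 - ratio6)


def ch_pi_score(sugar_center, ch_bonds, aromatic_ring):
    """Sum CH/π scores for one pyranose ring against one aromatic ring.

    ch_bonds is a list of (carbon_xyz, hydrogen_xyz) pairs; aromatic_ring is
    a list of ring-atom coordinates, assumed planar.
    """
    ring_center = centroid(aromatic_ring)
    if norm(sub(sugar_center, ring_center)) >= 7.0:   # dcenters filter
        return 0.0
    normal = cross(sub(aromatic_ring[1], aromatic_ring[0]),
                   sub(aromatic_ring[2], aromatic_ring[0]))
    length = norm(normal)
    normal = tuple(c / length for c in normal)
    total = 0.0
    for carbon, hydrogen in ch_bonds:
        to_center = sub(ring_center, carbon)
        height = dot(to_center, normal)
        # In-plane offset between the projected carbon and the ring centroid.
        in_plane = sub(to_center, tuple(height * c for c in normal))
        d_cp = norm(in_plane)
        points_at_ring = dot(sub(hydrogen, carbon), to_center) > 0.0
        if d_cp < 2.5 and points_at_ring:
            total += lennard_jones(norm(to_center))  # scored with dCπ
    return total


# Idealized benzene-like ring in the xy plane and two C-H bonds above it.
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
toward = ((0.0, 0.0, 3.66), (0.0, 0.0, 2.57))  # H points at the ring
away = ((0.0, 0.0, 3.66), (0.0, 0.0, 4.75))    # H points away from the ring
score = ch_pi_score((0.0, 0.0, 4.0), [toward, away], ring)
```

Only the bond pointing at the ring is scored in this example; the coefficient examined in this work would then multiply the summed score during rescoring.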

Figure 6.5 Detection of CH/π interactions. a.) An average position of the co-ordinates of
the atoms C2, O5 and O1 is determined. In order to find the vector C1H1, the negative of
the vector between point C1 and the average of atom positions C2, O5 and O1
is determined. b.) The distance between the centroid of the aromatic ring and the center
of the plane of the carbohydrate ring delineated by atoms O5, C2, C3 and C5 is determined,
dcenters (≤ 7Å). c.) The carbon atoms in the carbohydrate ring are projected onto the
aromatic ring plane and the distances between each of these projections and the centroid
of the aromatic ring are determined, dcp (≤ 2.5Å). Shown in green are the CH bond vectors
pointing towards the aromatic ring (scored), and shown in red are the CH bond vectors
pointing away from the aromatic ring (not scored).

Docking protocol

The protein and ligand files were prepared using AutoDockTools (version 1.5.4).107
All C-O bonds in the carbohydrate ligands were allowed to rotate. According
to the protocol used in our earlier work,168 the docking was performed 10 times using
AutoDock Vina and Vina-Carb. The random seeds for each of the 10 docking runs
were explicitly defined in order to increase comparability between results.

Results and Discussion

The systems in the test set were divided based on the number of detected CH/π
interactions (n) (Table 6.1). Following Boisbouvier’s work,169 the distances between all
the ring carbon atoms of each carbohydrate ligand in the test set and the centroids of all
aromatic rings in the interacting protein (dCπ) were first calculated. For each
ligand, a CH/π interaction was considered to be present if the dCπ distance was ≤ 4.3Å.

Both programs had a greater success rate at making accurate binding mode predictions of

complexes with a greater number of intermolecular CH/π interactions in their binding

pockets. This result could be indicative of the crucial role that these types of interactions

play in determining the binding specificity of the carbohydrate ligands to their respective

receptors.
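The grouping criterion can be sketched as a simple distance count. The sketch assumes that each ring carbon is counted at most once, even if it lies within 4.3Å of more than one aromatic centroid; the text does not specify this detail, and the function name and coordinates are hypothetical.

```python
import math


def count_ch_pi(ring_carbons, aromatic_centroids, cutoff=4.3):
    """Number of ring carbons within cutoff [Å] of any aromatic centroid."""
    count = 0
    for carbon in ring_carbons:
        if any(math.dist(carbon, c) <= cutoff for c in aromatic_centroids):
            count += 1
    return count


# Hypothetical coordinates [Å]: three ring carbons above one aromatic centroid.
ring_carbons = [(0.0, 0.0, 3.8), (0.0, 0.0, 4.2), (0.0, 0.0, 5.0)]
aromatic_centroids = [(0.0, 0.0, 0.0)]
n_interactions = count_ch_pi(ring_carbons, aromatic_centroids)
```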

Amongst the systems for which the programs succeeded in accurately predicting
the ligand binding modes, the ranking of the accurate PRMSDmin poses improved after
the addition of the CH/π interaction energy score, especially in cases with a greater
number of CH/π interactions (n≥2). In the case of systems with n≤1, the addition of the
CH/π interaction energy term worsened the ranking of the PRMSDmin pose (Table 6.1).

Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2
before and after rescoring as a function of the CH/π interaction energy coefficients. The
systems are divided into different groups based on the number of detected CH/π
interactions.

VC1|2

Number of     PRMSDmin   Rank      Rank after addition of CH/π energies
CHs (n)       [Å]        before    (CH/π coefficient)
                                   0.3     0.5     0.7     1       1.5     2
0 (n=8)       0.96       3.88      4.24    4.45    4.90    5.30    5.75    6.09
1 (n=3)       0.50       1.33      1.37    1.60    1.77    2.83    3.80    4.63
2 (n=2)       0.98       2.95      2.45    2.35    2.35    2.10    2.05    2.15
3 (n=4)       1.13       2.98      2.80    2.68    2.60    2.53    2.58    2.60
4 (n=8)       0.95       2.32      2.02    2.00    1.91    1.91    1.92    2.01
5 (n=7)       1.30       2.83      2.14    1.84    1.69    1.60    1.59    1.51
6 (n=1)       0.87       2.00      2.00    2.00    1.00    1.00    1.00    1.00
7 (n=1)       1.13       1.00      1.00    1.00    1.00    1.00    1.00    1.00
9 (n=2)       0.54       5.55      1.00    1.00    1.05    1.15    1.25    1.40
10 (n=2)      0.72       7.85      1.40    1.20    1.00    1.20    1.00    1.00
12 (n=1)      1.00       7.10      3.20    2.20    1.90    2.00    2.10    2.40
16 (n=1)      0.42       1.00      1.00    1.00    1.00    1.00    1.00    1.00
total, n=40   0.87       3.40      2.05    1.94    1.85    1.97    2.09    2.23

ADV

Number of     PRMSDmin   Rank      Rank after
CHs (n)       [Å]        before    (CH/π coefficient = 0.7)
0 (n=5)       1.09       4.04      4.58
1 (n=3)       0.78       1.00      2.00
2 (n=1)       0.62       1.00      1.00
3 (n=4)       1.12       2.25      2.10
4 (n=8)       0.86       2.54      1.88
5 (n=7)       1.31       4.23      2.90
6 (n=1)       0.79       2.00      1.00
7 (n=1)       1.26       3.30      6.30
9 (n=2)       0.63       4.75      3.80
10 (n=2)      0.95       5.00      1.00
12 (n=0)      -          -         -
16 (n=1)      0.42       1.00      1.00
total, n=35   0.89       2.83      2.51

Amongst the various CH/π interaction energy coefficients tested, coefficient
values ≥ 0.7 resulted in the most improvement of pose ranking. Using higher values of
the coefficient with systems with a lower number of CH/π interactions caused the
ranking of the PRMSDmin pose to decline. Therefore, based on the data obtained, a
coefficient value of 0.7 was chosen as the optimal value to rescore docked carbohydrate
poses using the CH/π interaction energy function.

In the case of systems 1VEO and 1ITC, the application of the CH/π interaction
energy scores improved the ranking of the accurate PRMSDmin poses by 7.6
and 10 places, respectively. For example, in the case of VC1|2, the PRMSD of the
top-ranked pose before rescoring was 5.6Å, whereas after rescoring, the PRMSD of the
top-ranked pose became 0.9Å (Figure 6.6).


Figure 6.6 The effect of applying the CH/π interaction to the top-ranked pose produced

by VC1|2 before and after rescoring. Shown in green is the crystal ligand, in white is the

top-ranked pose before rescoring (PRMSD = 5.6Å) and in blue is the top-ranked pose

after rescoring (PRMSD = 0.9Å).

Conclusions

The incorporation of the CH/π interaction energy term improved rankings of

accurate PRMSDmin pose predictions produced by both ADV and VC1|2. A CH/π

interaction energy coefficient of 0.7 produced optimal results for the test set considered.

In at least 40% of the total test systems, both docking programs were unable to produce

accurate binding mode predictions. The inclusion of the CH/π interaction energy function

within the VC scoring function can be expected to improve binding mode prediction, by


favorably scoring any such interaction between every docked pose generated by the

algorithm and the protein receptor. This would in turn decrease the probability of

rejection of such poses during the selection stage of the algorithm, before the final results

are assembled. Additionally, an appropriate CH/π interaction coefficient value should

also be included and its optimum value determined.

The algorithm for the detection of CH/π interactions can be further improved, for

instance, by considering the angle of the CH vectors with respect to the normal to the

aromatic ring plane. The test set can also be expanded to increase diversity, both with

respect to receptor and ligand types, and also with respect to systems with or without

intermolecular CH/π interactions. The consideration of pivotal CH/π interactions in

protein-carbohydrate complexes, and accounting for the energies that these non-covalent

interactions contribute to protein-carbohydrate binding can improve our binding mode

predictions, and help us better understand the factors influencing biological recognition.

Future Directions

The CH/π interaction energy function presented in this study is a first-order

approximation of an energy curve to model the interaction between an aliphatic CH-

group and an aromatic ring. The model can be further improved by using data from

available literature studying these interactions. In the 2006 study by Ringer et al.,155 the
authors performed QM calculations to estimate the contribution of CH/π interactions to
the total interaction energy in model systems using Symmetry Adapted Perturbation
Theory (SAPT)170 analysis. The authors performed computations on model systems
consisting of methane, as a model for aliphatic side-chains, and benzene, phenol or indole
as aromatic components of phenylalanine, tyrosine and tryptophan. The model systems
used are shown in Figure 6.7.

Figure 6.7 Model Systems used by Ringer et al. to quantify CH/π interactions using

quantum mechanical calculations

They obtained potential energy curves by varying distances between the methane

molecule and the aromatic moieties in each model complex. We observed that the

reported energies were remarkably similar in terms of maximum interaction energy and

shape of the interaction potential, and have developed a generic CH/π function by

averaging the QM data and fitting a Lennard-Jones potential to the average values
(Figure 6.8). This new energy function can be used to score CH/π interactions.

$$V(x) = 4\varepsilon \left[ \left( \frac{\sigma}{x} \right)^{12} - \left( \frac{\sigma}{x} \right)^{6} \right]$$  [6.1]

where x is the distance between the carbon atom and the aromatic ring centroid.
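A fit of this kind can be sketched without specialized libraries, because the 12-6 form is linear in the two coefficients A = 4εσ^12 and B = 4εσ^6. The sketch below is not the fitting procedure used in this work: it solves the ordinary least-squares normal equations directly, the function name is hypothetical, and the data are synthetic points generated from the parameters reported with Figure 6.4 rather than the QM energies of Ringer et al.

```python
def fit_lennard_jones(distances, energies):
    """Least-squares fit of V(x) = A/x**12 - B/x**6; returns (4*epsilon, sigma)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x, v in zip(distances, energies):
        f1 = x ** -12        # basis function multiplying A
        f2 = -(x ** -6)      # basis function multiplying B
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        b1 += f1 * v
        b2 += f2 * v
    det = s11 * s22 - s12 * s12
    a = (b1 * s22 - b2 * s12) / det
    b = (s11 * b2 - s12 * b1) / det
    sigma = (a / b) ** (1 / 6)   # since A / B = sigma**6
    four_eps = b * b / a         # since B**2 / A = 4*epsilon
    return four_eps, sigma


# Synthetic data generated from 4*epsilon = 1.84 and sigma = 3.26 (not QM data).
distances = [3.2 + 0.2 * i for i in range(15)]
energies = [1.84 * ((3.26 / x) ** 12 - (3.26 / x) ** 6) for x in distances]
four_eps_fit, sigma_fit = fit_lennard_jones(distances, energies)
```

Because the synthetic points lie exactly on the model curve, the fit recovers the generating parameters; with averaged QM energies the same routine would return the best-fitting pair instead.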

Figure 6.8 a.) The individual interaction energy curves for the models (as described in
Figure 6.7) used by Ringer et al.,155 alongside the average of the individual curves. b.)
The average curve from (a) shown alongside the mathematical model used in the current study.

[Plots: both panels show Energy [kcal/mol] (axis -2 to 5) versus Distance [Å] (axis 0 to 7).
Panel a legend: Model 6.4a, Model 6.4b, Model 6.4c, Model 6.4d, Average Curve. Panel b
legend: Average Curve, Model.]

7. CONCLUSIONS

In Chapter 4, the performances of three docking programs, namely AutoDock 3.0.5,

AutoDock 4.2 and AutoDock Vina were compared and AutoDock Vina had the most

success in accurately predicting binding modes of the carbohydrate ligands. A set of six

antibody-carbohydrate systems were used in this study. An algorithm for aligning the

antibody structures to the co-ordinate axes prior to docking based on the complementarity

determining regions was developed in order to increase comparability and reproducibility

of the results, in addition to being useful in an automated docking pipeline to be

implemented in GlycamWeb (www.glycam.com). A set of disaccharide models were

used to develop the Carbohydrate Intrinsic (CHI) energy functions, which score

oligosaccharide structures based on the conformations of their glycosidic linkages.

Application of the CHI energy functions resulted in an improvement of the rankings of

the accurate pose predictions. A survey of the PDB for carbohydrate crystal structures,

consisting of carbohydrates linked either covalently or non-covalently to various

receptors including lectins, antibodies, enzymes and carbohydrate binding modules,

revealed that the glycosidic torsion preferences of these structures were similar despite
being bound to different kinds of substrates. A majority of the glycosidic torsion angles

fall into the same energy well, for each CHI energy curve. These energy functions can

therefore also aid in the refinement of experimental oligosaccharide structures.

The research presented in chapter 5 described the incorporation of the CHI energy

functions within AutoDock Vina’s scoring function, leading to the development of
Vina-Carb. The performances of Vina-Carb and the original AutoDock Vina were evaluated and

compared against a set of protein-carbohydrate systems consisting of lectins, antibodies,


carbohydrate binding modules and enzymes. Vina-Carb significantly improved the

conformations of the docked oligosaccharide poses. The integration of the CHI energy

functions within the program led to the penalization of unfavorable glycosidic torsion

angles, increasing the appearance of poses with energetically favorable glycosidic

linkages in the output. The improvements effected in the conformation of the

carbohydrate ligand automatically improved the chances of VC making accurate binding

mode predictions. The source code of Vina-Carb ver. 1.0 is available for download at:

http://glycam.org/publication-materials/vina-carb. The suite of CHI energy functions

could be further expanded to include 2,x linkages, and other standard sugar

conformations as needed.

In chapter 6, the role of CH/π interactions in binding specificity and affinity in

protein-carbohydrate complexes has been outlined. Previously available quantum

mechanical data describing the interaction between models of CH groups and aromatic

amino acids was used to obtain mathematical models describing the CH/π interaction

energy in such complexes. This CH/π interaction energy function, when applied to lectin-

carbohydrate docked complexes with significant CH/π contacts in the binding pocket,

improved the rankings of accurate binding mode predictions. This function can be

incorporated within Vina-Carb’s scoring function so that the presence of CH/π

interactions is favored during docking, which could in turn further improve

oligosaccharide binding mode predictions.


8. REFERENCES

(1) Drickamer, K.; Taylor, M. E. Biology of Animal Lectins. Annu. Rev. Cell

Biol. 1993, 9, 237-264.

(2) Varki, A. Biological Roles of Oligosaccharides: All of the Theories are

Correct. Glycobiology 1993, 3, 97-130.

(3) Haltiwanger, R. S.; Lowe, J. B. Role of glycosylation in development.

Annu Rev Biochem 2004, 73, 491-537.

(4) Cobb, B. A.; Kasper, D. L. Coming of age: carbohydrates and immunity.

European Journal of Immunology 2005, 35, 352-356.

(5) Beuvery, E. C.; Vanrossum, F.; Nagel, J. Comparison of the

Induction of Immunoglobulin-M and Immunoglobulin-G

Antibodies in Mice with Purified Pneumococcal Type-3 and

Meningococcal Group-C Polysaccharides and Their Protein

Conjugates. Infection and Immunity 1982, 37, 15-22.

(6) Brown, G. D.; Gordon, S. Immune recognition: A new receptor for β-

glucans. Nature 2001, 413, 36-37.

(7) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A. Glycobiology. Ann. Rev.

Biochem. 1988, 57, 785-838.

(8) Feizi, T. Carbohydrate differentiation antigens: probable ligands for cell

adhesion molecules. Trends in Biochemical Sciences 1991, 16, 84-86.

(9) Varki, A.; Cummings, R.; Esko, J.; Freeze, H.; Hart, G.; Marth, J.:

Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: New York, 1999.

(10) Roth, Z.; Yehezkel, G.; Khalaila, I. Identification and quantification of

protein glycosylation. International Journal of Carbohydrate Chemistry 2012, 2012.

(11) Chou, C.-F.; Smith, A. J.; Omary, M. Characterization and dynamics of O-

linked glycosylation of human cytokeratin 8 and 18. Journal of Biological Chemistry

1992, 267, 3901-3906.

(12) Jackson, S. P.; Tjian, R. O-Glycosylation of Eukaryotic Transcription

Factors: Implications for Mechanisms of Transcriptional Regulation. Cell 1988, 55, 125-

133.

(13) Gerken, T. A.; Butenhof, K. J.; Shogren, R. Effects of Glycosylation on

the Conformation and Dynamics of O-Linked Glycoproteins: Carbon-13 NMR Studies of

Ovine Submaxillary Mucin. Biochem. 1989, 28, 5536-5543.

(14) Wittwer, A. J.; Howard, S. C.; Carr, L. S.; Harakas, N. K.; Feder, J.;

Parekh, R. B.; Rudd, P. M.; Dwek, R. A.; Rademacher, T. W. Effects of N-Glycosylation

on in Vitro Activity of Bowes Melanoma and Human Colon Fibroblast Derived Tissue

Plasminogen Activator. Biochem. 1989, 28, 7662-7669.

(15) Saso, L.; Silvestrini, B.; Guglielmotti, A.; Lahita, R.; Cheng, C. Y.

Abnormal Glycosylation of Alpha(2)-Macroglobulin, a Non-

Acute-Phase Protein, in Patients with Autoimmune-Diseases.

Inflammation 1993, 17, 465-479.

(16) Rook, G. A. W.; Steele, J.; Brealey, R.; Whyte, A.; Isenberg, D.; Sumar,

N.; Nelson, L.; Bodman, K. B.; Young, A.; Roitt, I. M.; Hutchison, F.; Williams, P.;

Scragg, I.; Edge, C. J.; Arkwright, P.; Ashford, D.; Wormald, M.; Rudd, P.; Redman, C.;

Dwek, R. A.; Rademacher, T. W. Changes in IgG Glycoform Levels may be Relevant to

Remission of Arthritis During Pregnancy.

(17) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A.; Isenberg, D.; Rook, G.;

Axford, J. S.; Roitt, I. The Role of IgG Glycoforms in the Pathogenesis of Rheumatoid

Arthritis. Springer Semin. Immunopathol. 1988, 10, 231-249.

(18) Renaudineau, Y.; Saraux, A.; Dueymes, M.; Le Goff, P.; Youinou, P.

Importance of IgG Glycosylation in Rheumatoid Arthritis. Rev. Rhum. 1998, 65, 429-

433.

(19) Watson, M.; Rudd, P.; Bland, M.; Dwek, R.; Axford, J. S. Sugar Printing

Rheumatic Diseases: A Potential Method for Disease Differentiation Using

Immunoglobulin G Oligosaccharides. Arth Rheum 1999, 42, 1682-1690.

(20) Brockhausen, I.: Glycodynamics of mucin biosynthesis in gastrointestinal

tumor cells. In Glycobiology and Medicine; Axford, J. S., Ed.; Advances in Experimental

Medicine and Biology, 2003; Vol. 535; pp 163-188.

(21) Porowska, H.; Paszkiewicz-Gadek, A.; Anchim, T.; Wolczynski, S.;

Gindzienski, A. Inhibition of the O-glycan elongation limits MUC1 incorporation to cell

membrane of human endometrial carcinoma cells. International Journal of Molecular

Medicine 2004, 13, 459-464.

(22) Hakomori, S. I. Aberrant Glycosylation in Tumors and Tumor-Associated

Carbohydrate Antigens. Advances in Cancer Research 1989, 52, 257-331.

(23) Dennis, J. W.; Granovsky, M.; Warren, C. E. Glycoprotein glycosylation

and cancer progression. Biochimica et Biophysica Acta 1999, 1473, 21 - 34.

(24) Paulson, J. C.; Blixt, O.; Collins, B. E. Sweet Spots in Functional

Glycomics. Nat Chem Biol 2006, 2, 238-248.

(25) Murase, T.; Zheng, R. B.; Joe, M.; Bai, Y.; Marcus, S. L.; Lowary, T. L.;

Ng, K. K. S. Structural Insights into Antibody Recognition of Mycobacterial

Polysaccharides. Journal of Molecular Biology 2009, 392, 381-392.

(26) Kotra, L. P.; Golemi, D.; Amro, N. A. Dynamics of the

Lipopolysaccharide Assembly on the Surface of Escherichia coli. J. Am. Chem. Soc.

1999, 121, 8707-8711.

(27) Park, B. S.; Song, D. H.; Kim, H. M.; Choi, B.-S.; Lee, H.; Lee, J.-O. The

Structural Basis of Lipopolysaccharide Recognition by the TLR4–MD-2 Complex.

Nature 2009, 458, 1191-1195.

(28) Kelly, D. F.; Moxon, E. R.; Pollard, A. J. Haemophilus influenzae type b

conjugate vaccines. Immunology 2004, 113, 163-174.

(29) Darkes, M. J. M.; Plosker, G. L. Pneumococcal conjugate vaccine

(Prevnar; PNCRM7): a review of its use in the prevention of Streptococcus pneumoniae

infection. Paediatric drugs 2002, 4, 609-630.

(30) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Johnson, M. A.; Pinto, B.

M.; Bundle, D. R.; Quiocho, F. A. Molecular Recognition of Oligosaccharide Epitopes

by a Monoclonal Fab Specific for Shigella Flexneri Y Lipopolysaccharide: X-ray

Structures and Thermodynamics. Biochemistry 2002, 41, 13575-13586.

(31) Zdanov, A.; Li, Y.; Bundle, D. R.; Deng, S.-J.; MacKenzie, C. R.; Narang,

S. A.; Young, N. M.; Cygler, M. Structure of a Single-Chain Antibody Variable Domain

(Fv) Fragment Complexed with a Carbohydrate Antigen at 1.7-Å Resolution. Proc. Natl.

Acad. Sci. USA 1994, 91, 6423-6427.

(32) Bundle, D. R.; Baumann, H.; Brisson, J.-R.; Gagné, S. M.; Zdanov, A.;

Cygler, M. Solution Structure of a Trisaccharide-Antibody Complex: Comparison of

NMR Measurements with a Crystal Structure. Biochemistry 1994, 33, 5183-5192.

(33) Cygler, M.; Rose, D. R.; Bundle, D. R. Recognition of a Cell-Surface

Oligosaccharide of Pathogenic Salmonella by an Antibody Fab Fragment. Science 1991,

253, 442-445.

(34) Cygler, M.; Wu, S.; Zdanov, A.; Bundle, D. R.; Rose, D. R. Recognition

of a carbohydrate antigenic determinant of Salmonella by an antibody. Biochem Soc

Trans 1993, 21, 437-441.

(35) Vulliez-Le Normand, B.; Saul, F. A.; Phalipon, A.; Bélot, F.; Guerreiro,

C.; Mulard, L. A.; Bentley, G. A. Structures of synthetic O-antigen fragments from

serotype 2a Shigella flexneri in complex with a protective monoclonal antibody.

Proceedings of the National Academy of Sciences of the United States of America 2008,

105, 9976-9981.

(36) Roseman, S. Reflections on glycobiology. Journal of Biological

Chemistry 2001, 276, 41527-41542.

(37) Dwek, R. A. Glycobiology: Toward Understanding the Function of

Sugars. Chem Rev 1996, 96, 683-720.

(38) Fischer, E. Ueber die Configuration des Traubenzuckers und seiner

Isomeren. II. Berichte der deutschen chemischen Gesellschaft 1891, 24, 2683-2687.

(39) Juaristi, E.; Cuevas, G.: The anomeric effect; CRC press, 1994.

(40) Juaristi, E.; Cuevas, G. Recent Studies of the Anomeric Effect.

Tetrahedron 1992, 48, 5019-5087.

(41) Tvaroska, I.; Carver, J. P. The Anomeric, Reverse Anomeric and Exo-

Anomeric Effects in C-, N-, and S- Glycosyl Compounds. Manuscript.

(42) Anomeric Effect. Origin and Consequences; Szarek, W. A.; Horton, D.,

Eds.; American Chemical Society: Washington, D.C., 1979; Vol. 87, pp 132.

(43) Tvaroska, I.; Kozar, T. The Conformational Properties of the Glycosidic

Linkage. Carbohydr. Res. 1981, 90, 173-185.

(44) Kirby, A. J.: The Anomeric Effect and Related Stereoelectronic Effects at

Oxygen; Springer-Verlag: New York, 1983.

(45) Fuchs, B.; Schleifer, L.; Tartakovsky, E. Probing the Anomeric Effect:

The Structural Criterion. Nouveau Journal de Chimie 1984, 8, 275-278.

(46) Tvaroska, I.; Bleha, T.: Anomeric and Exo-Anomeric Effects in

Carbohydrate Chemistry. In Adv. Carbohydr. Chem. Biochem.; Tipson, R. S., Derek, H.,

Eds.; Academic Press: New York, 1989; Vol. 47; pp 45-123.

(47) Agirre, J.; Davies, G.; Wilson, K.; Cowtan, K. Carbohydrate anomalies in

the PDB. Nature chemical biology 2015, 11, 303-303.

(48) Nivedha, A. K.; Makeneni, S.; Foley, B. L.; Tessier, M. B.; Woods, R. J.

Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat

from the chaff. J Comput Chem 2013.

(49) Bourne, Y.; van Tilbeurgh, H.; Cambillau, C. Protein-Carbohydrate

Interactions. Curr. Opin. Struct. Biol. 1993, 3, 681-686.

(50) Vyas, N. K. Atomic Features of Protein-Carbohydrate Interactions. Curr.

Opin. Struct. Biol. 1991, 1, 732-740.

(51) Quiocho, F. A. Carbohydrate-Binding Proteins: Tertiary Structures and

Protein-Sugar Interactions. Ann. Rev. Biochem. 1986, 55, 287-315.

(52) Munske, G. R.; Krakauer, H.; Magnuson, J. A. Calorimetric study of

carbohydrate binding to concanavalin A. Archives of biochemistry and biophysics 1984,

233, 582-587.

(53) Bundle, D. R.; Young, N. M. Carbohydrate-protein Interactions in

Antibodies and Lectins. Curr. Opin. Struct. Biol. 1992, 2, 666-673.

(54) Quiocho, F. A.; Vyas, N. K. Novel Stereospecificity of the L-Arabinose-

Binding Protein. Nature 1984, 310, 381-386.

(55) Kozmon, S.; Matuska, R.; Spiwok, V.; Koca, J. Dispersion Interactions

of Carbohydrates with Condensate Aromatic Moieties: Theoretical Study on the CH–π

Interaction Additive Properties. Phys. Chem. Chem. Phys. 2011, 13, 14215–14222.

(56) Elgavish, S.; Shaanan, B. Lectin-Carbohydrate Interactions: Different

Folds, Common Recognition Principles. Trends Biochem. Sci. 1997, 22, 462-467.

(57) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Sugar and signal-transducer

binding sites of the Escherichia coli galactose chemoreceptor protein. Science 1988, 242,

1290-1295.

(58) Quiocho, F. A. Protein-carbohydrate interactions: basic molecular

features. Pure and Applied Chemistry 1989, 61, 1293-1306.

(59) DeMarco, M. L.; Woods, R. J. Structural Glycobiology: A Game of

Snakes and Ladders. Glycobiology 2008, 18, 426-440.

(60) Woods, R. J.; Tessier, M. B. Computational Glycoscience: Characterizing

the Spatial and Temporal Properties of Glycans and Glycan—Protein Complexes. Curr.

Opin. Struct. Biol. 2010, 20, 575-583.

(61) Ghazarian, H.; Idoni, B.; Oppenheimer, S. B. A Glycobiology Review:

Carbohydrates, Lectins and Implications in Cancer Therapeutics. Acta Histochem. 2011,

113, 236-247.

(62) Hakomori, S. Tumor-associated carbohydrate antigens. Annu Rev Immunol

1984, 2, 103-126.

(63) Fukuda, M. Possible roles of tumor-associated carbohydrate antigens.

Cancer Research 1996, 56, 2237-2244.

(64) Eisen, M. B.; Sabesan, S.; Skehel, J. J.; Wiley, D. C. Binding of the

Influenza A Virus to Cell-Surface Receptors: Structures of Five Hemagglutinin–

Sialyloligosaccharide Complexes Determined by X-Ray Crystallography. Virology 1997,

232, 19-31.

(65) Suzuki, Y.; Nagao, Y.; Kato, H.; Matsumoto, M.; Nerome, K.; Nakajima,

K.; Nobusawa, E. Human influenza A virus hemagglutinin distinguishes

sialyloligosaccharides in membrane-associated gangliosides as its receptor which

mediates the adsorption and fusion processes of virus infection. Specificity for

oligosaccharides and sialic acids and the sequence to which sialic acid is attached.

Journal of Biological Chemistry 1986, 261, 17057-17061.

(66) Wiley, D. C.; Skehel, J. J. The structure and function of the hemagglutinin

membrane glycoprotein of influenza virus. Annual review of biochemistry 1987, 56, 365-

394.

(67) Magnani, J. L.; Ernst, B. From Carbohydrate Leads to Glycomimetic

Drugs. Nature Reviews Drug Discovery 2009, 8, 661-677.

(68) Dreitlein, W. B.; Maratos, J.; Brocavich, J. Zanamivir and oseltamivir:

Two new options for the treatment and prevention of influenza. Clinical Therapeutics

2001, 23, 327-355.

(69) Moscona, A. Neuraminidase Inhibitors for Influenza. N Engl J Med 2005,

353, 1363-1373.

(70) Kevin, H. M.: Galectins and Disease Implication for Targeted

Therapeutics. In American Chemical Society, 2012; pp 61-77.

(71) Tessier, M. B.; Grant, O. C.; Heimburg-Molinaro, J.; Smith, D.; Jadey, S.;

Gulick, A. M.; Glushka, J.; Deutscher, S. L.; Rittenhouse-Olson, K.; Woods, R. J.

Computational Screening of the Human TF-Glycome Provides a Structural Definition for

the Specificity of Anti-Tumor Antibody JAA-F11. PLoS One 2013, 8, e54874.

(72) Woods, R.; Yongye, A.: Computational Techniques Applied to Defining

Carbohydrate Antigenicity. In Anticarbohydrate Antibodies; Kosma, P., Müller-Loennies,

S., Eds.; Springer Vienna, 2012; pp 361-383.

(73) Kadirvelraj, R.; Gonzalez-Outeriño, J.; Foley, B. L.; Beckham, M. L.;

Jennings, H. J.; Foote, S.; Ford, M. G.; Woods, R. J. Understanding the Bacterial

Polysaccharide Antigenicity of Streptococcus agalactiae versus Streptococcus

pneumoniae. PNAS 2006, 103, 8149-8154.

(74) Yongye, A. B.; Gonzales Outeriño, J.; Glushka, J.; Schultheis, V.; Woods,

R. J. The Conformational Properties of Methyl α-(2,8)-di/trisialosides and Their N-acyl

Analogs: Implications for Anti-Neisseria meningitidis B Vaccine Design. Biochemistry

2008, 47, 12493–12514.

(75) Calarese, D. A.; Scanlan, C. N.; Zwick, M. B.; Deechongkit, S.; Mimura,

Y.; Kunert, R.; Zhu, P.; Wormald, M. R.; Stanfield, R. L.; Roux, K. H.; Kelly, J. W.;

Rudd, P. M.; Dwek, R. A.; Katinger, H.; Burton, D. R.; Wilson, I. A. Antibody Domain

Exchange Is an Immunological Solution to Carbohydrate Cluster Recognition. Science

2003, 300, 2065-2071.

(76) Dyekjær, J. D.; Woods, R. J.: Predicting the Three-Dimensional Structures

of Anti-Carbohydrate Antibodies: Combining Comparative Modeling and MD

Simulations. In NMR Spectroscopy and Computer Modeling of Carbohydrates. Recent

Advances. ; Vliegenthart, J. F. G., Woods, R. J., Eds.; ACS Symposium Series 930;

American Chemical Society: Washington, 2006; Vol. 930; pp 203-219.

(77) Gildersleeve, J.; Roach, T. A.; Li, Z.; Gildersleeve, J. C. Supplier-

Dependent Antiglycan Monoclonal Antibody Specificities: Comment On "High-

Throughput Carbohydrate Microarray Profiling of 27 Antibodies Demonstrates

Widespread Specificity Problems". Glycobiology 2008, 18, 746-756.

(78) Pincus, S. H.; Moran, E.; Maresh, G.; Jennings, H. J.; Pritchard, D. G.;

Egan, M. L.; Blixt, O. Fine specificity and cross-reactions of monoclonal antibodies to

group B streptococcal capsular polysaccharide type III. Vaccine 2012, 30, 4849-4858.

(79) Cooke, R. M.; Hale, R. S.; Lister, S. G.; Shah, G.; Weir, M. P. The

Conformation of the Sialyl Lewis X Ligand Changes upon Binding to E-Selectin.

Biochemistry 1994, 33, 10591-10596.

(80) Mahmoudian, M. The cannabinoid receptor: computer-aided molecular

modeling and docking of ligand. Journal of Molecular Graphics and Modelling 1997, 15,

149-153.

(81) Laederach, A.; Dowd, M. K.; Coutinho, P. M.; Reilly, P. J. Automated

Docking of Maltose, 2-Deoxymaltose, and Maltotetraose into the Soybean β-Amylase

Active Site. Proteins: Structure, Function and Genetics 1999, 37, 166-175.

(82) Goodsell, D. S.; Morris, G. M.; Olson, A. J. Automated docking of

flexible ligands: Applications of autodock. Journal of Molecular Recognition 1996, 9, 1-

5.

(83) Sotriffer, C. A.; Flader, W.; Winger, R. H.; Rode, B. M.; Liedl, K. R.;

Varga, J. M. Automated Docking of Ligands to Antibodies: Methods and Applications.

Methods, Companion to Methods in Enzymol. 2000, 20, 280-291.

(84) Jorgensen, W. L. The many roles of computation in drug discovery.

Science 2004, 303, 1813-1818.

(85) Foley, B. L.; Tessier, M. B.; Woods, R. J. Carbohydrate Force Fields.

WIREs Computational Molecular Science 2011, 1-69.

(86) Laederach, A.; Reilly, P. J. Modeling Protein Recognition of

Carbohydrates. Proteins: Struct. Funct. Genet. 2005, 60, 591-597.

(87) Sapay, N.; Nurisso, A.; Imberty, A.: Simulation of Carbohydrates, from

Molecular Docking to Dynamics in Water. In Biomolecular Simulations; Monticelli, L.,

Salonen, E., Eds.; Methods in Molecular Biology; Humana Press, 2013; Vol. 924; pp

469-483.

(88) Bras, N. F.; Fernandes, P. A.; Ramos, M. J. Docking and molecular

dynamics studies on the stereoselectivity in the enzymatic synthesis of carbohydrates.

Theor. Chem. Acc. 2009, 122, 283-296.

(89) Laederach, A.; Reilly, P. J. Specific Empirical Free Energy Function for

Automated Docking of Carbohydrates to Proteins. J. Comput. Chem. 2003, 24, 1748-

1757.

(90) Hwang, M.-J.; Ni, X.; Waldman, M.; Ewig, C. S.; Hagler, A. T.

Derivation of Class II Force Fields. VI. Carbohydrate Compounds and Anomeric

Effects. Biopolymers 1998, 45, 435-468.

(91) Woods, R. J.; Edge, C. J.; Wormald, M. R.; Dwek, R. A.: GLYCAM_93:

A Generalized Parameter Set for Molecular Dynamics Simulations of Glycoproteins and

Oligosaccharides. Application to the Structure and Dynamics of a Disaccharide Related

to Oligomannose. In Complex Carbohydrates in Drug Research; Bock, K., Clausen, H.,

Krogsgaard-Larsen, P., Kofod, H., Eds.; Munksgaard: Copenhagen, Denmark, 1993; Vol.

36; pp 15-36.

(92) Weldon, A. J.; Tschumper, G. S. Intrinsic Conformational Preferences of

and an Anomeric-Like Effect in 1-Substituted Silacyclohexanes. Int. J. Quantum Chem.

2007, 107, 2261-2265.

(93) Woodcock, H. L.; Moran, D.; Pastor, R. W.; MacKerell, A. D.; Brooks, B.

R. Ab initio modeling of glycosyl torsions and anomeric effects in a model carbohydrate:

2-Ethoxy tetrahydropyran. Biophysical Journal 2007, 93, 1-10.

(94) Kirschner, K. N.; Yongye, A. B.; Tschampel, S. M.; González-Outeiriño,

J.; Daniels, C. R.; Foley, B. L.; Woods, R. J. GLYCAM06: A Generalizable

Biomolecular Force Field. Carbohydrates. J. Comput. Chem. 2008, 29, 622–655.

(95) Guvench, O.; Mallajosyula, S. S.; Raman, E. P.; Hatcher, E.;

Vanommeslaeghe, K.; Foster, T. J.; Jamison, F. W.; MacKerell, A. D. CHARMM

Additive All-Atom Force Field for Carbohydrate Derivatives and Its Utility in

Polysaccharide and Carbohydrate–Protein Modeling. J. Chem. Theory Comput. 2011, 7,

3162-3180.

(96) French, A. D.; Kelterer, A.-M.; Johnson, G. P.; Dowd, M. K.; Cramer, C.

J. HF/6-31G* Energy Surfaces for Disaccharide Analogs. J. Comput. Chem. 2001, 22, 65.

(97) French, A. D.; Dowd, M. K. Exploration of Disaccharide Conformations

by Molecular Mechanics. J. Mol. Struct. (Theochem) 1993, 286, 183-201.

(98) Talavera, A.; Eriksson, A.; Ökvist, M.; López-Requena, A.; Fernández,

Y.; Pérez, R.; Moreno, E.; Krengel, U. Crystal Structure of an Anti-ganglioside Antibody,

and Modeling of the Functional Mimicry of its NeuGc-GM3 Antigen by an Anti-

idiotypic Antibody. Molec. Immun. 2009, 46, 3466-3475.

(99) Paula, S.; Monson, N.; Ball, W. J., Jr. Molecular Modeling of Cardiac

Glycoside Binding by the Human Sequence Monoclonal Antibody 1B3. Proteins 2005,

60, 382-391.

(100) Blaszczyk-Thurin, M.; Murali, R.; Westerink, M. A. J.; Steplewski, Z.;

Sung Co, M.; Kieber-Emmons, T. Molecular recognition of the Lewis Y antigen by

monoclonal antibodies. Protein Engineering 1996, 9, 447-459.

(101) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Bundle, D. R.; Pinto, B. M.;

Quiocho, F. A. Structural Basis of Peptide-Carbohydrate Mimicry in an Antibody-

Combining Site. Proc. Natl. Acad. Sci. USA 2003, 100, 15023-15028.

(102) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Ramsland, P. A.; Yuriev,

E. Peptide Inhibitors of Xenoreactive Antibodies Mimic the Interaction Profile of the

Native Carbohydrate Antigens. Pep. Sci. 2011, 96, 193-206.

(103) Agostino, M.; Jene, C.; Boyle, T.; Ramsland, P. A.; Yuriev, E. Molecular

Docking of Carbohydrate Ligands to Antibodies: Structural Validation against Crystal

Structures. J. Chem. Inf. Model. 2009, 49, 2749-2760.

(104) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Yuriev, E.; Ramsland, P.

A. In Silico Analysis of Antibody-carbohydrate Interactions and its Application to

Xenoreactive Antibodies. Glycobiol. 2009, 47, 105-115.

(105) Lee, M.; Lloyd, P.; Zhang, X.; Schallhorn, J. M.; Sugimoto, K.; Leach, A.

G.; Sapiro, G.; Houk, K. N. Shapes of Antibody Binding Sites: Qualitative and

Quantitative Analyses Based on a Geomorphic Classification Scheme. J. Org. Chem.

2006, 71, 5082-5092.

(106) Huey, R.; Morris, G. M.; Olson, A. J.; Goodsell, D. S. A Semiempirical

Free Energy Force Field with Charge-Based Desolvation. J. Comput. Chem. 2007, 28,

1145-1152.

(107) Morris, G. M.; Huey, R.; Lindstrom, W.; Sanner, M. F.; Belew, R. K.;

Goodsell, D. S.; Olson, A. J. Autodock4 and AutoDockTools4: Automated Docking with

Selective Receptor Flexibility. J. Comput. Chem. 2009, 30, 2785-2791.

(108) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and

Accuracy of Docking with a New Scoring Function, Efficient Optimization and

Multithreading. J. Comput. Chem. 2010, 31, 455-461.

(109) GLYCAM Web. http://www.glycam.org.

(110) Zhang, W. H., T; Schafmeister, C; Ross, W. S., Case, D. A. AmberTools

Version 1.0. 2008.

(111) Bernstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F.; Brice, M.

D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. Protein Data Bank -

Computer-Based Archival File for Macromolecular Structures. J. Mol. Biol. 1977, 112,

535-542.

(112) French, A. D.; Johnson, G. P.; Cramer, C. J.; Csonka, G. I.

Conformational analysis of cellobiose by electronic structure theories. Carbohydrate

research 2012, 350, 68-76.

(113) Williams, T.; Kelley, C. gnuplot 4.6 (2013). URL

http://www.gnuplot.info/documentation.html.


(114) Martin, A.; Cheetham, J. C.; Rees, A. R. Modeling antibody hypervariable

loops: a combined algorithm. Proceedings of the National Academy of Sciences 1989, 86,

9268-9272.

(115) Martin, A. C. R.; Cheetham, J. C.; Rees, A. R. Molecular Modeling of

Antibody Combining Sites. Methods Enzymol. 1991, 203, 121-153.

(116) Wu, T. T.; Kabat, E. An analysis of the sequences of the variable regions

of Bence Jones proteins and myeloma light chains and their implications for antibody

complementarity. The Journal of experimental medicine 1970, 132, 211-250.

(117) Chothia, C.; Lesk, A. M. Canonical Structures for the Hypervariable

Regions of Immunoglobulins. J. Mol. Biol. 1987, 196, 901-917.

(118) Frisch, M.; Trucks, G.; Schlegel, H. B.; Scuseria, G.; Robb, M.;

Cheeseman, J.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. Gaussian 09,

Revision A. 02, Gaussian. Inc., Wallingford, CT 2009, 200.

(119) Hill, A. D.; Reilly, P. J. A Gibbs free energy correlation for automated

docking of carbohydrates. Journal of computational chemistry 2008, 29, 1131-1141.

(120) French, A. D.; Tran, V. H.; Pérez, S.: Conformational Analysis of a

Disaccharide (Cellobiose) with the Molecular Mechanics Program (MM2). In Computer

Modeling of Carbohydrate Molecules; French, A. D., Brady, J. W., Eds.; American

Chemical Society: Washington, DC, 1990; Vol. Symposium Series 430; pp 191-212.

(121) Lütteke, T.; von der Lieth, C.-W. Data Mining the PDB for Glyco-

Related Data. Methods in Molecular Biology, Glycomics: Methods and Protocols 2009,

534, 293-310.

(122) Le Guilloux, V.; Schmidtke, P.; Tuffery, P. Fpocket: An Open Source

Platform For Ligand Pocket Detection. BMC Bioinform. 2009, 10, 168.

(123) Chang, M. W.; Ayeni, C.; Breuer, S.; Torbett, B. E. Virtual Screening for

HIV Protease Inhibitors: A Comparison of AutoDock 4 and Vina. PLoS ONE 2010, 5, 9.

(124) Bohne, A.; Lang, E.; von der Lieth, C.-W. W3-SWEET: Carbohydrate

Modeling By Internet. Journal of Molecular Modeling 1998, 4, 33-43.

(125) Fadda, E.; Woods, R. J. Molecular Simulations of Carbohydrates and

Protein–carbohydrate Interactions: Motivation, Issues and Prospects. Drug Discov. Today

2010, 15, 596-609.

(126) Imberty, A. Oligosaccharide Structures: Theory Versus Experiment. Curr.

Opin. Struct. Biol. 1997, 7, 617-623.

(127) Pauling, L.: The Nature of the Chemical Bond; Cornell University Press:

Ithaca, NY, 1960; Vol. 3.

(128) Damm, W.; Frontera, A.; Tirado-Rives, J.; Jorgensen, W. L. OPLS All-

Atom Force Field for Carbohydrates. J. Comput. Chem. 1997, 18, 1955-1970.

(129) Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Principles of Docking:

An Overview of Search Algorithms and a Guide to Scoring Functions. Proteins:

Structure, Function and Genetics 2002, 47, 409-443.

(130) Schulz-Gasch, T.; Stahl, M. Scoring functions for protein–ligand

interactions: a critical perspective. Drug Discovery Today: Technologies 2004, 1, 231-

239.

(131) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK − Scoring and

Energy Functions for Protein−Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,

1635–1642.

(132) Kerzmann, A.; Fuhrmann, J.; Kohlbacher, O.; Neumann, D.

BALLDock/SLICK: A new method for protein-carbohydrate docking. Journal of

Chemical Information and Modeling 2008, 48, 1616-1625.

(133) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt,

D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for

Exploratory Research and Analysis. J. Comp. Chem. 2004, 25, 1605-1612.

(134) Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular

Dynamics. J. Molec. Graphics 1996, 14, 33-38.

(135) Cremer, D.; Pople, J. A. A General Definition of Ring Puckering

Coordinates. J. Am. Chem. Soc. 1975, 97, 1354-1358.

(136) Makeneni, S.; Foley, B. L.; Woods, R. J. BFMP: A Method for

Discretizing and Visualizing Pyranose Conformations. Journal of chemical information

and modeling 2014, 54, 2744-2750.

(137) Lütteke, T.; Bohne-Lang, A.; Loss, A.; Goetz, T.; Frank, M.; von der

Lieth, C.-W. GLYCOSCIENCES. de: an Internet portal to support glycomics and

glycobiology research. Glycobiology 2006, 16, 71R-81R.

(138) Sternberg, M. J. E.: Protein Structure Prediction: A Practical Approach,

1996.

(139) Schwede, T.: Computational Structural Biology: Methods and

Applications, 2008.

(140) Lawrence, S.; Feil, S.; Holien, J.; Kuiper, M.; Doughty, L.; Dolezal, O.;

Mulhern, T.; Tweten, R.; Parker, M. Manipulating the Lewis antigen specificity of the

cholesterol-dependent cytolysin lectinolysin. Frontiers in Immunology 2012, 3.

(141) Oyelaran, O.; Gildersleeve, J. C. Glycan Arrays: Recent Advances and

Future Challenges. Current Opinion in Chemical Biology 2009, 13, 406-413.

(142) Taylor, M. E.; Drickamer, K. Structural Insights into what Glycan Arrays

tell us About how Glycan-binding Proteins Interact with their Ligands. Glycobiology

2009, 19, 1155–1162.

(143) Muraki, M.; Morikawa, M.; Jigami, Y.; Tanaka, H. The roles of conserved

aromatic amino-acid residues in the active site of human lysozyme: a site-specific

mutagenesis study. Biochimica et Biophysica Acta (BBA) - Protein Structure and

Molecular Enzymology 1987, 916, 66-75.

(144) Luís, A. S.; Venditto, I.; Temple, M. J.; Rogowski, A.; Baslé, A.; Xue, J.;

Knox, J. P.; Prates, J. A.; Ferreira, L. M.; Fontes, C. M. Understanding how noncatalytic

carbohydrate binding modules can display specificity for xyloglucan. Journal of

Biological Chemistry 2013, 288, 4799-4809.

(145) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK - Scoring and

Energy Functions for Protein-Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,

1635-1642.

(146) Kitago, Y.; Karita, S.; Watanabe, N.; Kamiya, M.; Aizawa, T.; Sakka, K.;

Tanaka, I. Crystal structure of Cel44A, a glycoside hydrolase family 44 endoglucanase

from Clostridium thermocellum. The Journal of biological chemistry 2007, 282, 35703-

35711.

(147) Tamres, M. Aromatic Compounds as Donor Molecules in Hydrogen

Bonding. Journal of the American Chemical Society 1952, 74, 3375-3378.

(148) Reeves, L.; Schneider, W. Nuclear magnetic resonance measurements of

complexes of chloroform with aromatic molecules and olefins. Canadian Journal of

Chemistry 1957, 35, 251-261.

(149) Pimentel, G. C.; McClellan, A. L.: The Hydrogen Bond; W. H. Freeman

and Company: New York, 1960.

(150) Nishio, M. The CH/π hydrogen bond: Implication in chemistry. Journal of

Molecular Structure 2012, 1018, 2-7.

(151) Brandl, M.; Weiss, M. S.; Jabs, A.; Sühnel, J.; Hilgenfeld, R. C–H⋯π-

interactions in proteins. Journal of Molecular Biology 2001, 307, 357-377.

(152) Kitaura, K.; Morokuma, K. A new energy decomposition scheme for

molecular interactions within the Hartree‐Fock approximation. International Journal of

Quantum Chemistry 1976, 10, 325-340.

(153) Tsuzuki, S.; Honda, K.; Uchimaru, T.; Mikami, M.; Tanabe, K. The

Magnitude of the CH/π Interaction between Benzene and Some Model Hydrocarbons.

Journal of the American Chemical Society 2000, 122, 3746-3753.

(154) Laughrey, Z. R.; Kiehna, S. E.; Riemen, A. J.; Waters, M. L.

Carbohydrate−π Interactions: What Are They Worth? J. Am. Chem. Soc. 2008, 130,

14625–14633.

(155) Ringer, A. L.; Figgs, M. S.; Sinnokrot, M. O.; Sherrill, C. D. Aliphatic

C−H/π Interactions: Methane−Benzene, Methane−Phenol, and Methane−Indole

Complexes. The Journal of Physical Chemistry A 2006, 110, 10822-10828.

(156) Mohamed, M. N. A.; Watts, H. D.; Guo, J.; Catchmark, J. M.; Kubicki, J.

D. MP2, density functional theory, and molecular mechanical calculations of C–H···π

and hydrogen bond interactions in a cellulose-binding module–cellulose model system.

Carbohydrate Research 2010, 345, 1741-1751.

(157) Asensio, J. L.; Ardá, A.; Cañada, F. J.; Jiménez-Barbero, J. Carbohydrate–

Aromatic Interactions. Acc. Chem. Res. 2012, 46, 946-954.

(158) Muraki, M. The Importance of CH/π Interactions to the Function of

Carbohydrate Binding Proteins. Protein and Peptide Letters 2002, 9, 195-209.

(159) Krapp, S.; Mimura, Y.; Jefferis, R.; Huber, R.; Sondermann, P. Structural

Analysis of Human IgG-Fc Glycoforms Reveals a Correlation Between Glycosylation

and Structural Integrity. J. Mol. Biol. 2003, 325, 979-989.

(160) Fernández-Alonso, M. d. C.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas,

G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the

Carbohydrate−Aromatic Interactions. Journal of the American Chemical Society 2005,

127, 7379-7386.

(161) Thobhani, S.; Ember, B.; Siriwardena, A.; Boons, G.-J. Multivalency and

the Mode of Action of Bacterial Sialidases. J. Am. Chem. Soc. 2002, A-B.

(162) Kitov, P. I.; Bundle, D. R. On the nature of the multivalency effect: a

thermodynamic model. Journal of the American Chemical Society 2003, 125, 16271-

16284.

(163) Neumann, D.; Kohlbacher, O. In Tilte2009.

(164) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Comparison of the Periplasmic

Receptors for L-Arabinose, D-Glucose/D-Galactose, and D-Ribose. J. Biol. Chem. 1991,

266, 5226-5237.

(165) Vardakou, M.; Flint, J.; Christakopoulos, P.; Lewis, R. J.; Gilbert, H. J.;

Murray, J. W. A family 10 Thermoascus aurantiacus xylanase utilizes arabinose

decorations of xylan as significant substrate specificity determinants. J Mol Biol 2005,

352, 1060-1067.

(166) Chávez, M. I.; Andreu, C.; Paloma, V.; Aboitiz, N.; Freire, F.; Groves, P.;

Asensio, J. L.; Asensio, G.; Muraki, M.; Cañada, F. J.; Jiménez-Barbero, J. On the

Importance of Carbohydrate-Aromatic Interactions for the Molecular Recognition of

Oligosaccharides by Proteins: NMR Studies of the Structure and Binding Affinity of

AcAMP2-like Peptides with Non-Natural Naphthyl and Fluoroaromatic Residues. Chem.

Eur. J. 2005, 11, 7060-7074.

(167) Wojciechowski, M.; Lesyng, B. Generalized Born Model: Analysis,

Refinement, and Applications to Proteins.

(168) Nivedha, A. K.; Thieker, D. F.; Woods, R. J. Vina-Carb: Improving

Glycosidic Angles During Carbohydrate Docking. J. Chem. Theory Comput. 2015,

(Accepted).

(169) Plevin, M. J.; Bryce, D. L.; Boisbouvier, J. Direct detection of CH/π

interactions in proteins. Nat Chem 2010, 2, 466-471.

(170) Jeziorski, B.; Moszynski, R.; Szalewicz, K. Perturbation theory approach

to intermolecular potential energy surfaces of van der Waals complexes. Chem Rev 1994,

94, 1887-1930.


9. APPENDIX

Supplementary Information Chapter 4

S4.1. Editing Autodock Vina’s source code to give 100 docked poses

ADV’s source code was downloaded, and the variable par.mc.num_saved_mins in

main_procedure (in the file main.cpp) was set to 100, and the program was re-compiled.

S4.2. AD3 parameters

a) Docking parameters

outlev 1 # diagnostic output level

seed pid time # seeds for random generator

types COH # atom type names

move 1MFA_ligand.pdbq # small molecule

about 0.003 0.036 -0.094 # small molecule center

tran0 random # initial coordinates/A or random

quat0 random # initial quaternion

ndihe 15 # number of active torsions

dihe0 random # initial dihedrals (relative) or random

tstep 2.0 # translation step/A

qstep 50.0 # quaternion step/deg

dstep 50.0 # torsion step/deg

torsdof 7 0.3113 # torsional degrees of freedom and coefficient

intnbp_r_eps 4.00 0.0222750 12 6 # C-C lj

intnbp_r_eps 3.60 0.0257202 12 6 # C-O lj

intnbp_r_eps 3.00 0.0081378 12 6 # C-H lj

intnbp_r_eps 3.20 0.0297000 12 6 # O-O lj

intnbp_r_eps 2.60 0.0093852 12 6 # O-H lj

intnbp_r_eps 2.00 0.0029700 12 6 # H-H lj

rmstol 2.0 # cluster_tolerance/A

extnrg 1000.0 # external grid energy

e0max 0.0 10000 # max initial energy; max number of retries

ga_pop_size 200 # number of individuals in population

ga_num_evals 800000 # maximum number of energy evaluations

ga_num_generations 27000 # maximum number of generations

ga_elitism 1 # number of top individuals to survive to next generation

ga_mutation_rate 0.02 # rate of gene mutation

ga_crossover_rate 0.8 # rate of crossover

ga_window_size 10 #

ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution

ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution

set_ga # set the above parameters for GA or LGA

sw_max_its 300 # iterations of Solis & Wets local search


sw_max_succ 4 # consecutive successes before changing rho

sw_max_fail 4 # consecutive failures before changing rho

sw_rho 1.0 # size of local search space to sample

sw_lb_rho 0.01 # lower bound on rho

ls_search_freq 0.06 # probability of performing local search on individual

set_sw1 # set the above Solis & Wets parameters

ga_run 100 # do this many hybrid GA-LS runs

analysis # perform a ranked cluster analysis

b) Grid parameters

npts 70 70 100 # num.grid points in xyz

spacing 0.375 # spacing(A)

gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto

smooth 0.5 # store minimum energy w/in rad(A)

dielectric -0.1465 # <0, AD4 distance-dep.diel;>0, constant

S4.3. AD4.2 parameters

a) Docking parameters

autodock_parameter_version 4.2 # used by autodock to validate parameter set

outlev 1 # diagnostic output level

intelec # calculate internal electrostatics

seed pid time # seeds for random generator

ligand_types C HD OA # atoms types in ligand

move 1MFD_ligand.pdbqt # small molecule

about 0.045 -0.063 -0.041 # small molecule center

tran0 random # initial coordinates/A or random

axisangle0 random # initial orientation

dihe0 random # initial dihedrals (relative) or random

tstep 2.0 # translation step/A

qstep 50.0 # quaternion step/deg

dstep 50.0 # torsion step/deg

torsdof 15 # torsional degrees of freedom

rmstol 2.0 # cluster_tolerance/A

extnrg 1000.0 # external grid energy

e0max 0.0 10000 # max initial energy; max number of retries

ga_pop_size 200 # number of individuals in population

ga_num_evals 800000 # maximum number of energy evaluations

ga_num_generations 27000 # maximum number of generations

ga_elitism 1 # number of top individuals to survive to next generation

ga_mutation_rate 0.02 # rate of gene mutation

ga_crossover_rate 0.8 # rate of crossover

ga_window_size 10 #

ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution

ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution

set_ga # set the above parameters for GA or LGA

sw_max_its 300 # iterations of Solis & Wets local search

sw_max_succ 4 # consecutive successes before changing rho

sw_max_fail 4 # consecutive failures before changing rho

sw_rho 1.0 # size of local search space to sample

sw_lb_rho 0.01 # lower bound on rho

ls_search_freq 0.06 # probability of performing local search on individual

set_psw1 # set the above pseudo-Solis & Wets parameters

unbound_model bound # state of unbound ligand

ga_run 100 # do this many hybrid GA-LS runs

write_all # write all conformations in a cluster

analysis # perform a ranked cluster analysis


b) Grid parameters

npts 70 70 100 # num.grid points in xyz

spacing 0.375 # spacing(A)

gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto

smooth 0.5 # store minimum energy w/in rad(A)

dielectric -0.1146 # <0, distance-dep.diel;>0, constant

S4.4. ADV Docking parameters

center_x = 0

center_y = 0

center_z = 11

size_x = 26.25

size_y = 26.25

size_z = 37.5

energy_range = 10

num_modes = 100

cpu = 8
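
The ADV search box above covers the same volume as the AD3/AD4.2 grid: each box edge equals the number of grid intervals (npts) multiplied by the grid spacing. The following is a minimal Python sketch of that conversion; the function name autodock_grid_to_vina_box is hypothetical and is not part of AutoDock or AutoDock Vina.

```python
def autodock_grid_to_vina_box(npts, spacing, center):
    """Convert AutoGrid-style settings (npts, spacing, gridcenter)
    into AutoDock Vina box settings (center_* and size_* in Angstrom).

    Hypothetical helper for illustration only.
    """
    # Each box edge is the number of grid intervals times the spacing.
    size = tuple(n * spacing for n in npts)
    keys = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")
    return dict(zip(keys, tuple(center) + size))


# Grid settings taken from S4.2 b) and S4.3 b).
box = autodock_grid_to_vina_box((70, 70, 100), 0.375, (0.0, 0.0, 11.0))
for key, value in box.items():
    print(f"{key} = {value:g}")
```

With npts 70 70 100 and a spacing of 0.375 Å, the edges come out as 26.25, 26.25 and 37.5 Å, which are the size_x, size_y and size_z values listed in S4.4.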

S4.5. Comparison of the glycosidic torsion angles in the crystal carbohydrate

ligands to those in the Glycam ligands and their corresponding CHI energy

scores.

System  Torsion  Experimental           Glycam                 Disaccharide unit
        angle    Angle     CHI energy   Angle     CHI energy

1MFA    φ          71.5    0.0            60.0    0.4          DAbepα1-3DManpα-OMe
                   77.4    0.1            60.0    0.4          DGalpα1-2DManpα-OMe
        ψ        -135.1    0.1          -120.2    0.3          DAbepα1-3DManpα-OMe
                  -94.7    0.0          -118.1    0.3          DGalpα1-2DManpα-OMe

1MFD    φ          76.1    0.0            60.0    0.4          DAbepα1-3DManpα-OMe
                  103.7    1.4            60.0    0.4          DGalpα1-2DManpα-OMe
        ψ        -139.4    0.1          -120.2    0.3          DAbepα1-3DManpα-OMe
                 -151.6    0.2          -118.1    0.3          DGalpα1-2DManpα-OMe

1UZ8    φ         -66.7    0.0           -65.4    0.0          DGalpβ1-4DGlcpNAcβ-OMe
                  -82.7    0.2           -65.5    0.2          LFucpα1-3DGlcpNAcβ-OMe
        ψ         128.0    0.3           125.1    0.3          DGalpβ1-4DGlcpNAcβ-OMe
                  -99.4    0.1          -101.1    0.1          LFucpα1-3DGlcpNAcβ-OMe

1M7D    φ         -78.3    0.1           -60.0    0.4          (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe
                  -63.2    0.3           -60.0    0.4          LRhapα1-3DGlcpNAcβ-OMe
        ψ        -114.1    0.3          -120.2    0.3          (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe
                  111.2    0.2           120.1    0.3          LRhapα1-3DGlcpNAcβ-OMe

1S3K    φ         -77.8    0.1           -66.9    0.1          LFucpα1-3DGlcpNAcα-OH
                  -77.7    0.3           -66.4    0.0          DGalpβ1-4DGlcpNAcα-OH
                  -78.1    0.1           -69.4    0.1          LFucpα1-2DGalpβ
        ψ        -103.4    0.1           -99.4    0.1          LFucpα1-3DGlcpNAcα-OH
                  139.3    0.3           127.8    0.3          DGalpβ1-4DGlcpNAcα-OH
                  140.4    0.3           125.5    0.3          LFucpα1-2DGalpβ

1M7I    φ         -90.2    1.0           -62.4    0.1          LRhapα1-3DGlcpNAcβ
                 -115.3    2.3           -69.5    0.1          LRhapα1-2LRhapα
                  -81.7    0.1           -69.0    0.1          LRhapα1-3LRhapα
                  -63.5    0.3           -65.9    0.2          LRhapα1-2LRhapα
        ψ          53.4    0.9           112.4    0.3          LRhapα1-2LRhapα
                  -90.9    0.0          -114.8    0.3          LRhapα1-2LRhapα
                  121.0    0.3           114.7    0.3          LRhapα1-3LRhapα
                  149.3    0.1           112.3    0.3          LRhapα1-2LRhapα

Torsion angles are given in degrees; CHI energy scores are given in kcal/mol.


S4.6. CHI Energy Functions

Equation for the φ angle in α-linkages:

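$$
\begin{aligned}
E(\varphi) ={}& 2.977\,\exp\!\left(-\frac{(\varphi + 1.9949\times10^{2})^{2}}{6.7781\times10^{2}}\right)
 + 1.0225\times10^{2}\,\exp\!\left(-\frac{(\varphi - 1.706\times10^{2})^{2}}{1.6968\times10^{3}}\right) \\
&+ 1.0745\times10^{1}\,\exp\!\left(-\frac{(\varphi + 1.0531\times10^{2})^{2}}{4.7246\times10^{3}}\right)
 + 3.6735\,\exp\!\left(-\frac{(\varphi - 6.2012)^{2}}{1.3477\times10^{3}}\right) \\
&+ 2.061\,\exp\!\left(-\frac{(\varphi - 9.1655\times10^{1})^{2}}{1.5\times10^{3}}\right)
 + 6.1939\,\exp\!\left(-\frac{(\varphi + 2.2979\times10^{1})^{2}}{2.1223\times10^{3}}\right) \\
&- 2.1115\,\exp\!\left(-\frac{(\varphi - 8.3602\times10^{1})^{2}}{1.2541\times10^{3}}\right)
 - 9.8001\times10^{1}\,\exp\!\left(-\frac{(\varphi - 1.7001\times10^{2})^{2}}{1.5987\times10^{3}}\right)
\end{aligned}
$$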

Equation for the φ angle in β-linkages:

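$$
\begin{aligned}
E(\varphi) ={}& 4.5054\times10^{2}\,\exp\!\left(-\frac{(\varphi + 3.3077\times10^{2})^{2}}{4.4498\times10^{3}}\right)
 + 2.3712\times10^{1}\,\exp\!\left(-\frac{(\varphi - 3.0463\times10^{2})^{2}}{8.3752\times10^{3}}\right) \\
&+ 5.9353\,\exp\!\left(-\frac{(\varphi + 1.5208\times10^{2})^{2}}{6.0498\times10^{3}}\right)
 + 2.2467\times10^{1}\,\exp\!\left(-\frac{(\varphi + 2.3516\times10^{1})^{2}}{6.0690\times10^{2}}\right) \\
&+ 1.0036\times10^{1}\,\exp\!\left(-\frac{(\varphi - 1.2096\times10^{2})^{2}}{4.038\times10^{3}}\right)
 - 1.8141\times10^{1}\,\exp\!\left(-\frac{(\varphi + 2.4268\times10^{1})^{2}}{5.4305\times10^{2}}\right) \\
&+ 5.8823\,\exp\!\left(-\frac{(\varphi - 1.9632\times10^{1})^{2}}{8.9793\times10^{2}}\right)
 - 2.1283
\end{aligned}
$$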

Equation for the ψ angle in 1-2ax, 1-4ax and 1-3eq linkages:


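$$
\begin{aligned}
E(\psi) ={}& 4.6237\,\exp\!\left(-\frac{(\psi - 5.0456)^{2}}{5.0058\times10^{3}}\right)
 + 4.6139\,\exp\!\left(-\frac{(\psi - 3.6249\times10^{2})^{2}}{2.0906\times10^{3}}\right) \\
&+ 4.9419\,\exp\!\left(-\frac{(\psi - 1.212\times10^{2})^{2}}{2.0938\times10^{3}}\right)
 + 4.029\times10^{-1}\,\exp\!\left(-\frac{(\psi - 2.4143\times10^{2})^{2}}{4.5683\times10^{2}}\right) \\
&+ 7.9888\times10^{-1}\,\exp\!\left(-\frac{(\psi - 6.8425\times10^{1})^{2}}{6.7881\times10^{2}}\right)
 + 2.2299\times10^{-1}\,\exp\!\left(-\frac{(\psi - 1.9293\times10^{2})^{2}}{3.4725\times10^{2}}\right) \\
&- 1.2565\times10^{-1}
\end{aligned}
$$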

Equation for the ψ angle in 1-2eq, 1-4eq and 1-3ax linkages:

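$$
\begin{aligned}
E(\psi) ={}& 4.4681\,\exp\!\left(-\frac{(\psi - 10^{-30})^{2}}{1.2796\times10^{3}}\right)
 + 4.382\,\exp\!\left(-\frac{(\psi - 3.5777\times10^{2})^{2}}{6.0501\times10^{3}}\right) \\
&+ 2.8495\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4664\times10^{2})^{2}}{1.5518\times10^{3}}\right)
 + 4.7613\,\exp\!\left(-\frac{(\psi - 2.2068\times10^{2})^{2}}{5.8929\times10^{3}}\right) \\
&- 1.692\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4737\times10^{2})^{2}}{1.7425\times10^{3}}\right)
 - 1.1844\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4606\times10^{2})^{2}}{1.3598\times10^{3}}\right) \\
&+ 1.0220
\end{aligned}
$$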
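
Each CHI energy function above is a sum of Gaussian terms plus a constant offset. The sketch below evaluates one of them in Python: the ψ function for 1-2ax, 1-4ax and 1-3eq linkages, with coefficients transcribed from the equation in this section. The names gaussian_sum and chi_psi_ax are hypothetical, and wrapping the input angle into the 0-360° range is an assumption based on the axis range of the plots in S4.7. This is not the Vina-Carb implementation.

```python
import math

# (amplitude, center [deg], denominator [deg^2]) for each Gaussian term,
# transcribed from the psi equation for 1-2ax, 1-4ax and 1-3eq linkages.
PSI_AX_TERMS = [
    (4.6237, 5.0456, 5.0058e3),
    (4.6139, 3.6249e2, 2.0906e3),
    (4.9419, 1.212e2, 2.0938e3),
    (4.029e-1, 2.4143e2, 4.5683e2),
    (7.9888e-1, 6.8425e1, 6.7881e2),
    (2.2299e-1, 1.9293e2, 3.4725e2),
]
PSI_AX_OFFSET = -1.2565e-1


def gaussian_sum(angle_deg, terms, offset=0.0):
    """Return offset + sum of a * exp(-(angle - c)^2 / w) over all terms."""
    return offset + sum(
        a * math.exp(-((angle_deg - c) ** 2) / w) for a, c, w in terms
    )


def chi_psi_ax(psi_deg):
    """CHI energy (kcal/mol) for psi in 1-2ax, 1-4ax and 1-3eq linkages.

    Assumption: the angle is wrapped into [0, 360) before evaluation.
    """
    return gaussian_sum(psi_deg % 360.0, PSI_AX_TERMS, PSI_AX_OFFSET)


if __name__ == "__main__":
    for psi in (-120.2, 60.0, 180.0):
        print(f"psi = {psi:7.1f} deg -> E = {chi_psi_ax(psi):.2f} kcal/mol")
```

Under these assumptions, ψ = -120.2° evaluates to roughly 0.3 kcal/mol, in line with the value listed in S4.5 for the Glycam-built DAbepα1-3DManpα-OMe linkage. The other three functions can be evaluated the same way by swapping in their term lists and offsets.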


S4.7. Plots showing agreement between the quantum mechanical data points

(black dots) and CHI energy curves (grey lines)

Root mean squared deviations (RMSDs) were calculated between the quantum

mechanical data points and corresponding data points on the CHI energy curve.

[Four plots of ΔE [kcal/mol] against the torsion angle [deg] over the range 0 to 360: two φ plots (RMSD: 0.16 and RMSD: 0.02) followed by two ψ plots (RMSD: 0.02 and RMSD: 0.04).]

S4.8. GlyTorsion analysis

The various searches performed using the web-tool are tabulated below. S1 refers to the

non-reducing sugar residue, while S2 refers to the reducing sugar residue.

GlyTorsion searches performed for Figure 4.6 (main text):

Figure 8a                             Figure 8b
S. No.  S1      linkage  S2           S. No.  S1      linkage  S2
1       a-D-*   1-2      *-Manp*      1       b-D-*   1-2      *-Manp*
2       a-D-*   1-2      *-Galp*      2       b-D-*   1-2      *-Galp*
3       a-D-*   1-2      *-Glcp*      3       b-D-*   1-2      *-Glcp*
4       a-L-*   1-2      *-Manp*      4       b-L-*   1-2      *-Manp*
5       a-L-*   1-2      *-Galp*      5       b-L-*   1-2      *-Galp*
6       a-L-*   1-2      *-Glcp*      6       b-L-*   1-2      *-Glcp*
7       a-D-*   1-4      *-Manp*      7       b-D-*   1-4      *-Manp*
8       a-D-*   1-4      *-Galp*      8       b-D-*   1-4      *-Galp*
9       a-D-*   1-4      *-Glcp*      9       b-D-*   1-4      *-Glcp*
10      a-L-*   1-4      *-Manp*      10      b-L-*   1-4      *-Manp*
11      a-L-*   1-4      *-Galp*      11      b-L-*   1-4      *-Galp*
12      a-L-*   1-4      *-Glcp*      12      b-L-*   1-4      *-Glcp*
13      a-D-*   1-3      *-Manp*      13      b-D-*   1-3      *-Manp*
14      a-D-*   1-3      *-Galp*      14      b-D-*   1-3      *-Galp*
15      a-D-*   1-3      *-Glcp*      15      b-D-*   1-3      *-Glcp*
16      a-L-*   1-3      *-Manp*      16      b-L-*   1-3      *-Manp*
17      a-L-*   1-3      *-Galp*      17      b-L-*   1-3      *-Galp*
18      a-L-*   1-3      *-Glcp*      18      b-L-*   1-3      *-Glcp*

Figure 8c Figure 8d

S. No. S1 linkage S2 S. No. S1 linkage S2

1 *-Manp* 1-2 *-D-Manp* 1 *-Manp* 1-2 *-D-Glcp*

2 *-Galp* 1-2 *-D-Manp* 2 *-Galp* 1-2 *-D-Glcp*

3 *-Glcp* 1-2 *-D-Manp* 3 *-Glcp* 1-2 *-D-Glcp*

4 *-Manp* 1-2 *-L-Manp* 4 *-Manp* 1-2 *-L-Glcp*

5 *-Galp* 1-2 *-L-Manp* 5 *-Galp* 1-2 *-L-Glcp*

6 *-Glcp* 1-2 *-L-Manp* 6 *-Glcp* 1-2 *-L-Glcp*

7 *-Manp* 1-3 *-D-Glcp* 7 *-Manp* 1-2 *-D-Galp*

8 *-Galp* 1-3 *-D-Glcp* 8 *-Galp* 1-2 *-D-Galp*

9 *-Glcp* 1-3 *-D-Glcp* 9 *-Glcp* 1-2 *-D-Galp*

10 *-Manp* 1-3 *-L-Glcp* 10 *-Manp* 1-2 *-L-Galp*

11 *-Galp* 1-3 *-L-Glcp* 11 *-Galp* 1-2 *-L-Galp*

12 *-Glcp* 1-3 *-L-Glcp* 12 *-Glcp* 1-2 *-L-Galp*

13 *-Manp* 1-3 *-D-Manp* 13 *-Manp* 1-4 *-D-Glcp*

14 *-Galp* 1-3 *-D-Manp* 14 *-Galp* 1-4 *-D-Glcp*

15 *-Glcp* 1-3 *-D-Manp* 15 *-Glcp* 1-4 *-D-Glcp*

16 *-Manp* 1-3 *-L-Manp* 16 *-Manp* 1-4 *-L-Glcp*

17 *-Galp* 1-3 *-L-Manp* 17 *-Galp* 1-4 *-L-Glcp*

18 *-Glcp* 1-3 *-L-Manp* 18 *-Glcp* 1-4 *-L-Glcp*

19 *-Manp* 1-3 *-D-Galp* 19 *-Manp* 1-4 *-D-Galp*

20 *-Galp* 1-3 *-D-Galp* 20 *-Galp* 1-4 *-D-Galp*

21 *-Glcp* 1-3 *-D-Galp* 21 *-Glcp* 1-4 *-D-Galp*

22 *-Manp* 1-3 *-L-Galp* 22 *-Manp* 1-4 *-L-Galp*

23 *-Galp* 1-3 *-L-Galp* 23 *-Galp* 1-4 *-L-Galp*

24 *-Glcp* 1-3 *-L-Galp* 24 *-Glcp* 1-4 *-L-Galp*

25 *-Manp* 1-4 *-D-Manp*

26 *-Galp* 1-4 *-D-Manp*

27 *-Glcp* 1-4 *-D-Manp*

28 *-Manp* 1-4 *-L-Manp*

29 *-Galp* 1-4 *-L-Manp*


30 *-Glcp* 1-4 *-L-Manp*

In GlyTorsion, the ψ torsion angle is defined with respect to the Cx+1 atom of the

reducing sugar, whereas the CHI energy functions apply only to ψ torsion angles defined

with respect to the Cx-1 atom. The ψ values from the web-tool were therefore converted

to the Cx-1 definition by adding or subtracting 120°, depending on the D/L configuration

of the reducing sugar.
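
The conversion described above amounts to a fixed ±120° shift followed by wrapping the result back into a single period. A minimal Python sketch is given below. The function name shift_psi is hypothetical, the choice of the [-180, 180) output range is an assumption, and because the text does not state which sign belongs to which configuration, the sign is left as an explicit argument rather than derived from D/L.

```python
def wrap_angle(angle_deg):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def shift_psi(psi_deg, sign):
    """Shift a psi value by +120 or -120 degrees and wrap the result.

    sign must be +1 or -1; which one applies to a D or an L reducing
    sugar is not specified here and has to be supplied by the caller.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return wrap_angle(psi_deg + sign * 120.0)


if __name__ == "__main__":
    print(shift_psi(100.0, +1))   # 220 wraps to -140
    print(shift_psi(-100.0, -1))  # -220 wraps to 140
```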

S4.9. CHI energy score in kcal/mol of top-ranked poses, before and after

rescoring

System   AD3                    AD4.2                  ADV
         Before      After      Before      After      Before      After
         rescoring   rescoring  rescoring   rescoring  rescoring   rescoring
1MFA     1.31        1.31       1.48        1.48       2.84        1.73
1MFD     1.29        1.29       4.42        1.84       4.21        2.51
1UZ8     1.29        0.29       0.65        0.65       0.76        0.76
1M7D     3.14        1.19       11.60       0.93       1.11        1.11
1S3K     7.02        0.99       1.26        1.26       1.67        1.67
1M7I     7.72        2.33       18.45       4.31       10.20       2.31

S4.10. Rank of lowest PRMSD structure, before and after rescoring

System   AD3                    AD4.2                  ADV
         Before      After      Before      After      Before      After
         rescoring   rescoring  rescoring   rescoring  rescoring   rescoring
1MFA     55          67         10          14         1           1
1MFD     18          9          13          2          2           1
1UZ8     4           3          4           2          1           1
1M7D     10          4          7           2          1           1
1S3K     2           1          5           3          1           1
1M7I     1           3          31          50         1           1

S4.11. tLEaP input files to assemble ligands with deoxy sugars

a) Assembling the 1MFA/1MFD ligand

# ----- leaprc for loading the Glycam_04 force field

addPdbResMap {{ 0 "OLS" "NOLS" } { 1 "OLS" "COLS" } { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }

{ 0 "OLP" "NOLP" } { 1 "OLP" "COLP" } { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }}

# load atom type hybridizations

addAtomTypes {{ "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" } { "H" "H"

"sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" } { "HC" "H" "sp3" } { "HO" "H" "sp3" } {

"HW" "H" "sp3" } { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" } { "O" "O"

"sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" } {"OY" "O" "sp3" } { "S" "S" "sp3" }}

# load the main parameter set

parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat

glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat

# load all prep files for polysaccharides

loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep

# load lib files

loadOff solvents.lib

loadOff ions94.lib

amber_seq = sequence { OME ZMA }

set amber_seq tail amber_seq.2.O3

amber_seq=sequence { amber_seq 0AE }

set amber_seq tail amber_seq.2.O2

amber_seq=sequence { amber_seq 0LA }

impose amber_seq {3 2} { {H1 C1 O3 C3 -60.0} }

impose amber_seq {3 2} { {C1 O3 C3 H3 0.0} }


impose amber_seq {4 2} { {H1 C1 O2 C2 -60.0} }


charge amber_seq

savepdb amber_seq salmonella_glycam.pdb

b) Assembling the 1M7D ligand

# ----- leaprc for loading the Glycam_04 force field

addPdbResMap {{ 0 "OLS" "NOLS" } { 1 "OLS" "COLS" } { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }

{ 0 "OLP" "NOLP" } { 1 "OLP" "COLP" } { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }}

# load atom type hybridizations

addAtomTypes {{ "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" } { "H" "H"

"sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" } { "HC" "H" "sp3" } { "HO" "H" "sp3" } {

"HW" "H" "sp3" } { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" } { "O" "O"

"sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" } { "OY" "O" "sp3" } { "S" "S" "sp3" }}

# load the main parameter set

parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat

glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat

# load all prep files for polysaccharides

loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep

amber_seq = sequence { OME 3YB 2DR 0hA }

impose amber_seq {3 2} { {H1 C1 O3 C3 60.0} }


impose amber_seq {4 3} { {H1 C1 O3 C3 60.0} }


charge amber_seq

savepdb amber_seq 1M7D_glycam.pdb



S5.1. A comparison of PRMSDmin(5) poses obtained using ADV, VC1

and VC2 at all 5 CHI-cutoff values (1 to 5) is shown in (a). The corresponding

standard deviation values are given in (b).

a.)

Ab (in column order): 2G12, 291-2G3-A, SYA/J6, scFv SE155-4, Fab SE155-4, SE155-4, HU3S193, BR96, BR96, SE155-4, SYA/J6, SE155-4, F22-4, F22-4

PDB     1OP3   1UZ8   1M7D   1MFA   1MFD   1MFE   1S3K   1CLY   1CLZ   1MFC   1M7I   1MFB   3BZ4   3C6S
ADV     0.27   0.19   0.26   1.11   0.52   0.59   0.30   0.45   0.68   1.01   1.31   3.1    3.5    6.9
VC1|0   0.254  0.18   0.281  0.947  0.39   0.986  0.293  0.54   0.43   1.76   0.748  1.1    1.1    2.8
VC1|1   0.33   0.19   0.26   1.01   0.52   0.59   0.30   0.45   0.66   1.78   0.71   1.5    1.3    1.5
VC1|2   0.21   0.18   0.26   1.00   0.52   0.59   0.30   0.45   0.68   1.72   0.54   1.4    1.7    2.4
VC1|3   0.33   0.18   0.26   1.12   0.52   0.59   0.30   0.45   0.68   1.70   0.54   1.3    2.7    5.8
VC1|4   0.21   0.19   0.26   1.23   0.52   0.59   0.30   0.45   0.68   1.04   0.69   1.8    3.2    5.7
VC1|5   0.39   0.19   0.26   1.10   0.52   0.59   0.29   0.44   0.68   1.01   0.72   2.3    3.7    5.7
VC2|0   0.32   0.18   0.51   0.96   0.37   0.89   0.19   0.65   0.59   1.88   0.80   1.3    1.2    3.6
VC2|1   0.21   0.19   0.26   0.96   0.52   0.59   0.30   0.45   0.65   1.80   0.64   1.5    1.3    1.3
VC2|2   0.21   0.19   0.26   1.12   0.52   0.59   0.30   0.45   0.68   1.72   0.54   1.4    1.6    1.9
VC2|3   0.21   0.19   0.26   1.15   0.52   0.59   0.30   0.44   0.68   1.76   0.54   1.5    2.7    7.0
VC2|4   0.33   0.19   0.26   1.16   0.52   0.59   0.30   0.46   0.69   0.97   0.76   1.9    3.3    5.8
VC2|5   0.21   0.19   0.26   1.18   0.52   0.59   0.30   0.45   0.68   1.00   0.73   2.4    3.8    5.6

Those highlighted in green are less than 2.0 Å

b.)

Ab (in column order): 2G12, 291-2G3-A, SYA/J6, scFv SE155-4, Fab SE155-4, SE155-4, HU3S193, BR96, BR96, SE155-4, SYA/J6, SE155-4, F22-4, F22-4

PDB     1OP3   1UZ8   1M7D   1MFA   1MFD   1MFE   1S3K   1CLY   1CLZ   1MFC   1M7I   1MFB   3BZ4   3C6S
ADV     0.2    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.0    0.1    0.1    0.5    1.2
VC1|0   0.0    0.0    0.0    0.1    0.0    0.3    0.0    0.0    0.0    0.2    0.1    0.1    0.2    0.9
VC1|1   0.2    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.1    0.1    0.1    0.7
VC1|2   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.5    0.6    1.5
VC1|3   0.2    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.3    0.0    0.4    0.9    0.3
VC1|4   0.0    0.0    0.0    0.1    0.0    0.0    0.0    0.0    0.0    0.4    0.3    0.4    0.2    0.6
VC1|5   0.3    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.0    0.3    0.5    0.2    0.1
VC2|0   0.0    0.0    0.4    0.0    0.0    0.3    0.0    0.1    0.1    0.1    0.0    0.3    0.1    1.0
VC2|1   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.1    0.1    0.1    0.1    0.1    0.0
VC2|2   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.4    0.5    1.3
VC2|3   0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.3    0.9    2.3
VC2|4   0.2    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.4    0.4    0.4    0.4    0.7
VC2|5   0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.0    0.3    0.5    0.1    0.3


S5.2. A list of all protein-carbohydrate crystal structures employed in the

study. The carbohydrate sequences have been obtained using the pdbcare tool

in glycosciences.de.

S.No. PDB ID LINUCS

1 1hql a-D-Galp-(1-3)-b-D-Galp-(1-1)-methyl

2 1lte b-D-Galp-(1-4)-b-D-Glcp

3 1niv a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl

4 1qos a-D-GlcpNAc-(1-4)-b-D-GlcpNAc

5 1slt b-D-Galp-(1-4)-a-D-GlcpNAc

6 2aai b-D-Galp-(1-4)-b-D-Glcp

7 2ovu a-D-Manp-(1-2)-a-D-Manp-(1-1)-methyl

8 2pel b-D-Galp-(1-4)-a-D-Glcp

b-D-Galp-(1-4)-b-D-Glcp

9 3o0x a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp

10 4g1r a-D-Manp-(1-2)-a-D-Manp

11 4g1s a-D-Manp-(1-2)-a-D-Manp

12 1jpc a-D-3,6-deoxy-Manp

a-D-Manp-(1-6)+

|

a-D-Manp

|

a-D-Manp-(1-3)+

13 1qot a-L-Fucp-(1-2)-b-D-Galp

a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-Glcp

14 1sl6 b-D-Galp-(1-4)+

|

a-D-GlcpNAc

|


a-L-Fucp-(1-3)+

15 2auy b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl

16 2bos a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp-(1-1)-butyl

a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

a-D-Galp-(1-4)-b-D-Galp

17 2e6v a-D-Manp-(1-2)-a-D-Manp-(1-3)-b-D-Manp

18 2eal a-D-GalpNAc-(1-3)-b-D-GalpNAc-(1-3)-b-D-Galp

19 2g7c a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

20 2vxj a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-Glcp

a-D-Galp-(1-3)-b-D-Galp-(1-4)-a-D-Glcp

a-D-Galp-(1-4)-D-1-deoxy-Galp

21 3ef2 a-D-Galp-(1-3)+

|

b-D-Galp

|

a-L-Fucp-(1-2)+

a-D-Galp-(1-3)+

|

a-D-Galp

|

a-L-Fucp-(1-2)+

22 1gsl a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-methyl

|

a-L-Fucp-(1-3)+

23 1j8r b-D-GalpNAc-(1-3)-a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

24 1led a-L-Fucp-(1-4)+

|


|

a-L-Fucp-(1-2)-b-D-Galp-(1-3)+


25 1ulf a-D-GalpNAc-(1-3)+

|


|

a-L-Fucp-(1-2)+

26 1w8f b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

27 2zhk b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

28 3lek a-L-Fucp-(1-4)+

|

a-D-GlcpNAc

|

a-L-Fucp-(1-2)-b-D-Galp-(1-3)+

29 3o0w a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp

30 3wg3 a-D-GalpNAc-(1-3)+

|

b-D-Galp-(1-4)-b-D-GlcpNAc

|

a-L-Fucp-(1-2)+


a-L-Fucp-(1-3)+

38 2vco a-D-Manp-(1-6)+

|


|

a-D-Manp-(1-3)+

39 4gk9 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)+

| |

a-D-Manp-(1-3)+ b-D-Manp

|

a-D-Manp-(1-3)+

40 2wt2 b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

41 2zhm b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

42 2vuz b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+

|


|


43 2ygm a-D-Galp-(1-3)+

|

b-D-Galp-(1-4)-b-D-GlcpNAc

|

a-L-Fucp-(1-2)+

44 1j84 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

45 2i74 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)-a-D-Manp


|

a-D-Manp-(1-3)+

46 2j1t a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc

|

a-L-Fucp-(1-3)+

47 2j1u a-D-GalpNAc-(1-3)+

|


|

a-L-Fucp-(1-2)+

48 2j72 a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

49 2j73 a-D-Glcp-(1-6)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

50 3ach b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

51 1gu3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

52 1of4 b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

53 1uxx b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp

54 3aci b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

55 2y6l b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp

56 2yfz b-D-Galp-(1-2)-a-D-Xylp-(1-6)+

|

b-D-Glcp-(1-4)-b-D-Glcp

|

b-D-Glcp-(1-4)+

57 2zex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

58 1gwl b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

59 1gwm b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp

60 1oh3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp

61 1oh4 a-D-Galp-(1-6)+


|

a-D-Galp-(1-6)+ b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

| |

b-D-Manp-(1-4)+

|

b-D-Manp-(1-4)+

62 2ypj a-D-Xylp-(1-6)+

|

a-D-Xylp-(1-6)+ b-D-Glcp-(1-4)-b-D-Glcp

| |

b-D-Glcp-(1-4)+

|

a-D-Xylp-(1-6)-b-D-Glcp-(1-4)+

63 2eo7 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Manp

64 2eex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

65 2ej1 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

66 1mfa a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-1)-methyl

|

a-D-Galp-(1-2)+

67 1mfd a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-1)-methyl

|

a-D-Galp-(1-2)+

68 1mfc a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-4)-a-L-Rhap

|

a-D-Galp-(1-2)+

69 1mfb a-D-3-deoxy-Fucp-(1-3)+


|

a-D-Manp-(1-4)-a-L-Rhap

|

a-D-Galp-(1-2)-a-D-Manp-(1-4)-a-L-Rhap-(1-3)-a-D-Galp-(1-2)+

70 1mfe a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp

|

a-D-Galp-(1-2)+

71 1s3k a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

a-D-GlcpNAc

|

a-L-Fucp-(1-3)+

72 1uz8 b-D-Galp-(1-4)+

|


|

a-L-Fucp-(1-3)+

73 1m7d a-L-Rhap-(1-3)-a-L-2,6-deoxy-Glcp-(1-3)-b-D-GlcpNAc-(1-1)-methyl

74 1m7i a-L-Rhap-(1-2)-a-L-Rhap-(1-3)-a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-1)-methyl

75 3bz4 a-D-Glcp-(1-4)+

|

a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-1)-methyl

| |

a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

|

a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

76 3c6s a-D-Glcp-(1-4)+

|

a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-L-1-deoxy-Rhap

| |

a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

|

a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

77 1cly a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-<C10O2>

|

a-L-Fucp-(1-3)+

78 1clz a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-<C10O2>

|

a-L-Fucp-(1-3)+

79 1op3 a-D-Manp-(1-2)-a-D-Manp

80 2eqd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

Systems which failed the positive control test.

1 1hlc b-D-Galp-(1-4)-b-D-Glcp

2 2dur a-D-Manp-(1-2)-a-D-Manp

3 3a0e a-D-Manp-(1-3)-a-D-Manp

4 2zx4 a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

5 1sl4 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)-a-D-Manp

|

a-D-Manp-(1-3)+

6 2zl6 a-L-Fucp-(1-2)-b-D-Galp-(1-3)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-Glcp

7 1sl5 b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-3)-b-D-Galp


|

a-L-Fucp-(1-3)+

8 4afd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

9 2j1v a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-GlcpNAc

10 2jcq b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc

11 2yg0 a-D-Galp-(1-6)-b-D-Manp



S6.1. PDB IDs of Lectin-Carbohydrate Systems used to test the CH/π

interaction energy function.

VC1|2

Accurate Predictions Made?

Yes No

1led 2auy 2gvy 1cxf

1veo 2bos 1eo5 1qos

3azs 2ovu 1hlc 2gou

2aai 1lte 2vuz 4gwi

1itc 4g1r 2zl6 2zx4

1gsl 1tei

1sl6

2pel 4g1s 1k9i

1gwl 2g7c 2vxj

1pj9 3o0w 1jpc

1hql 3o0x 1ulf

3pfz 1niv 1w8f

1mxd 2o2l 1sl4

3wg3 2dur 1vbp

1j8r 3lek 1sl5

2wdb 1zhs 3a0e

4gk9 2e6v

3ef2

1uh3

2zhk

1qot

2vco

2wt2

2zhm

3zwe

ADV

1veo 1cxf 1hlc 2gou

1led 1lte 2gvy 1niv

3azs 1qos 2vuz 1ulf

2aai 1tei 2pel 3lek

1gsl 1zhs 2vco 4gwi

1pj9 2auy 1eo5 2vxj

3zwe 2bos 4gk9 2zx4

1mxd 2dur

1jpc

3wg3 2g7c 2e6v

1hql 2o2l 1k9i


3pfz 2ovu 1sl6

1gwl 3o0w 1w8f

1itc 3o0x 1sl5

1qot 4g1r 1sl4

1uh3 4g1s 1vbp

2wdb

3a0e

3ef2

2zl6

2wt2

2zhk

1j8r

2zhm

S6.2. CHI energy functions to score the ω glycosidic torsion angle in

1,6-linkages.

Using the GlyTorsion tool available at www.glycosciences.de, the distribution of ω

glycosidic torsion angles for 1,6-linkages was collected.

[Histogram: Distribution of Structures [%] (0 to 35) against ω [°] (160 to -180 in steps of 20), with two series: Equatorial O4 and Axial O4. Data source: http://www.glycosciences.de/]

Figure S6.2: The distribution of ω glycosidic angles from carbohydrate crystal structures

with 1,6-linkages divided into two sets based on the position of attachment

(equatorial/axial) of the O4 atom to the reducing sugar.

The dataset was divided into two sets according to whether the O4 atom of the

reducing sugar is attached axially or equatorially to the ring. Three parabolas, one for

each of the three possible rotamers, were joined to form the CHI energy equation for

these linkages. The relative energies of the minima of the three parabolas were

determined from the crystal structure data using the Boltzmann distribution for two

states. The equations thus obtained are as follows:

$$E = k\,(x - \theta)^{2} + b$$

where k = 0.0025.

When O4 is equatorially attached to the plane of the carbohydrate ring:

if x ∈ [0, 120], θ = 60, b = 0.21;

if x ∈ (120, 240], θ = 180, b = 1.39;

if x ∈ (240, 360], θ = 300, b = 0.

When O4 is axially attached to the plane of the carbohydrate ring:

if x ∈ [0, 120], θ = 60, b = 0;

if x ∈ (120, 240], θ = 180, b = 0.3;

if x ∈ (240, 360], θ = 300, b = 1.0.
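
The piecewise definition above can be sketched in Python as follows. The function names chi_omega and relative_energy are hypothetical, wrapping of angles outside 0-360° is an assumption, and the temperature used in the two-state Boltzmann helper (298.15 K) is also an assumption, since the text does not state the value used. This is an illustration of the equations as written, not the code used in this work.

```python
import math

K = 0.0025  # curvature of each parabola, from the text

# (theta, b) for the intervals [0, 120], (120, 240], (240, 360]
OMEGA_PARAMS = {
    "equatorial": ((60.0, 0.21), (180.0, 1.39), (300.0, 0.0)),
    "axial": ((60.0, 0.0), (180.0, 0.3), (300.0, 1.0)),
}

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def chi_omega(omega_deg, o4_position):
    """CHI energy for the omega angle of a 1,6-linkage.

    o4_position is "equatorial" or "axial". Angles outside [0, 360]
    are wrapped into [0, 360) (an assumption).
    """
    x = omega_deg if 0.0 <= omega_deg <= 360.0 else omega_deg % 360.0
    if x <= 120.0:
        index = 0
    elif x <= 240.0:
        index = 1
    else:
        index = 2
    theta, b = OMEGA_PARAMS[o4_position][index]
    return K * (x - theta) ** 2 + b


def relative_energy(pop_state, pop_reference, temperature_k=298.15):
    """Two-state Boltzmann relation: dE = -RT ln(p_state / p_reference)."""
    return -R_KCAL * temperature_k * math.log(pop_state / pop_reference)


if __name__ == "__main__":
    for omega in (60.0, 180.0, 300.0):
        print(omega, chi_omega(omega, "equatorial"), chi_omega(omega, "axial"))
```

At the three rotamer minima (60°, 180° and 300°) the function returns the offsets b directly. The helper relative_energy shows how an offset could be derived from the population of a rotamer relative to the most populated one.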


S6.3. Gridbox Centers of Test Systems

1gsl 1veo 2g7c 2vuz 1tei 3azs 2zx4 4gk9

center_x 30.779 31.309 26.162 -10.236 37.595 25.555 39.534 5.942

center_y 14.986 13.767 14.95 -34.789 4.058 -0.181 20.789 2.899

center_z 32.023 40.95 4.176 3.215 -43.079 7.932 46.443 53.234

1jpc 2gou 2o2l 2wt2 1ulf 1eo5 3a0e 3pfz

center_x 56.063 8.405 15.19 22.86 60.265 85.067 20.259 -22.802

center_y 45.229 -8.714 -55.302 6.174 3.52 60.767 -37.806 30.906

center_z 25.17 22.111 -15.243 12.353 5.723 44.647 3.05 -19.125

1k9i 2gvy 2ovu 3ef2 1vbp 1j8r 3lek 1cxf

center_x 8.948 28.687 25.16 2.304 109.791 15.851 4.052 44.796

center_y 46.79 23.377 20.968 64.669 108.52 12.655 4.27 89.74

center_z 44.049 -19.395 21.031 3.633 127.115 62.124 19.329 47.379

1niv 1gwl 2vxj 3wg3 1w8f 1qos 3o0w 1hlc

center_x 113.529 8.285 16.514 54.63 -2.792 -15.015 -18.2 11.686

center_y 52.413 -10.664 -30.606 -40.764 4.137 43.955 -18.233 32.302

center_z 138.995 27.185 115.835 -3.111 34.364 31.99 14.748 84.653

1sl4 1pj9 2zhk 3zwe 2aai 1qot 3o0x 1hql

center_x 41.291 41.326 7.879 -5.218 31.715 77.957 31.2 23.809

center_y 28.944 85.533 42.544 11.166 41.265 8.421 11.371 8.536

center_z 16.404 46.799 7.094 -1.876 11.239 29.285 -16.511 12.031

1sl5 1uh3 2zhm 4g1r 2bos 1zhs 4gwi 1led

center_x 29.388 37.554 -5.21 -1.592 14.64 13.899 -3.557 30.923

center_y -7.683 25.783 -5.945 -7.611 3.347 18.066 -3.798 14.723

center_z -4.434 49.609 14.265 11.645 56.284 -12.694 -20.05 31.526

1sl6 2wdb 2zl6 4g1s 2dur 2pel 1itc 1lte

center_x 127.995 40.041 25.33 10.35 114.805 56.016 16.854 18.882

center_y 48.794 21.235 40.009 -12.03 16.861 27.792 -18.317 -2.108

center_z 38.953 14.007 -18.489 7.428 48.688 64.361 -12.045 36.227

2e6v 2vco 1mxd 2auy

center_x 36.602 51.993 24.973 3.166

center_y 9.703 2.279 31.414 43.303

center_z 39.651 -20.137 1.73 25.044
