
This journal is © The Royal Society of Chemistry 2013 Chemical Society Reviews, 2013, 0, 00–00 | 1

Simulation of NMR observables of carbohydrates

(FULL VERSION of Recent Advances in Computational Predictions of NMR Parameters for Structure Elucidation of Carbohydrates: Methods and Limitations, DOI: 10.1039/b000000x)

Filip V. Toukach, Valentine P. Ananikov*

Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow, 119991, Russia. Fax: +7 499 135 5328;

E-mail: [email protected]

All living systems are composed of four fundamental classes of macromolecules: nucleic acids, proteins, lipids and carbohydrates (glycans). Glycans play a unique role in joining three principal hierarchical levels of the living world: 1) the molecular level (recognition of pathogenic agents and vaccines by the immune system; metabolic pathways involving saccharides that provide cells with energy; and energy accumulation via photosynthesis); 2) the nanoscale level (cell membrane mechanics; structural support of biomolecules; and glycosylation of macromolecules); 3) the microscale and macroscale levels (polymeric materials such as cellulose, starch, glycogen and biomass). NMR spectroscopy is the most powerful research approach for gaining insight into the solution structure and function of carbohydrates at all hierarchical levels, from monosaccharides to oligo- and polysaccharides. Recent progress in computational procedures has opened new opportunities to reveal the structural information contained in the NMR spectra of saccharides and to advance our understanding of the corresponding biochemical processes. The ability to predict molecular geometry and NMR parameters is crucial for the elucidation of carbohydrate structures. In the present paper, we review the major NMR spectrum simulation techniques with regard to chemical shift, coupling constant, relaxation rate and nuclear Overhauser effect prediction, as applied to the three levels of glycomics. Outstanding developments in the related fields of genomics and proteomics have clearly shown that it is the advancement of research tools (automated spectrum analysis, structure elucidation, synthesis, sequencing and amplification) that drives progress on the grand challenges of modern science. The combination of NMR spectroscopy with computational analysis of the structural information encoded in NMR spectra paves the way to automated elucidation of carbohydrate structures.

Contents

1. Introduction
2. Computation of the NMR parameters of carbohydrates
3. Empirical methods of NMR parameter prediction
3.1. Database approach
3.2. Usage of neural networks
3.3. Regression-based methods
3.4. CHARGE approach
3.5. Incremental approach at the residual level
4. Models and methods for carbohydrate 3D structural studies
4.1. Molecular mechanics and molecular dynamics
4.2. Semi-empirical methods
4.3. Ab initio and density functional modeling
4.4. Hybrid QM/MM, QM/QM and ONIOM approaches
4.5. Interaction with solvent
5. Computation of NMR chemical shifts
5.1. Monosaccharides and derivatives
5.2. Oligosaccharides and polysaccharides
6. Computation of NMR coupling constants
6.1. Intra-residue coupling constants
6.2. Inter-residue coupling constants
7. Computation of NMR relaxation rates
8. Computation of other NMR parameters
9. Conclusions
10. Abbreviations
11. Acknowledgements
12. References

1. Introduction

Glycochemical and glycobiological research has recently grown tremendously and has rapidly developed into one of the leading forces in modern science. Novel synthetic approaches and the rational design of carbohydrates and glycoconjugates have revealed new opportunities in drug and vaccine discovery.1-5 Detailed insight has been gained into the key role of carbohydrates in biological recognition, the development of diseases and the control of the immune response.6-11 Many new carbohydrate drugs are now licensed or in clinical testing.2-4, 6-13 Glyco-nanomaterials are promising building blocks for applications such as biosensors or multivalent scaffolds for drug delivery.14 With such outstanding progress in recent decades, a new era has emerged in the medicinal and pharmaceutical applications of carbohydrates.

The role of oligo- and polysaccharides and their conjugates in cellular biology can hardly be overestimated.15-19 Carbohydrate functions in living organisms range from energy storage and maintenance of cell shape to providing the immunological uniqueness of microorganisms. The high structural diversity of saccharide residues and their linkages allows carbohydrate-containing molecules to present a huge number of signals to their surroundings, making them well suited for the control of molecular recognition in living cells20 and heavily involved in signal transduction21 and in multiple biosynthetic pathways.22 Carbohydrate microarrays and other analytical techniques for probing glycan-related processes in cells have been developed.23-25


Fig. 1. Representative 1H NMR spectra in D2O: (A) a cyclic pentapeptide showing individual signals over a wide range of chemical shifts, from 0 to 11 ppm;26 (B) a regular polymer with a pentasaccharide repeating unit showing signals in two narrow regions, 1.0–2.5 ppm and 3.0–6.0 ppm, including strong overlap in the range 3.5–4.5 ppm.27 (Reproduced with permission, © Elsevier Ltd., 2005.)

Cellulose and chitin are the two most abundant natural polymers on Earth, and their industrial utilization is a question of primary importance within the widely accepted concept of sustainability.

A diverse range of industrial applications benefits from applying procedures developed in carbohydrate chemistry to biomass processing. Carbohydrates account for up to 75% of the world's renewable biomass.28-30 The development of practically useful and efficient procedures for converting cellulose into platform chemicals and biofuels has been identified as one of the central research challenges of the coming century.31-34 Estimates suggest that up to 30% of transportation fuel demand could be met by cellulosic biomass.31-35 However, despite the massive development of fascinating applications, carbohydrates remain the least structurally characterized of the major classes of biological molecules. Carbohydrates are very difficult to crystallize, and in most cases single crystals of sufficient quality for X-ray analysis cannot be obtained36, 37. Moreover, even for the minority of successful crystallizations, X-ray crystallography has been reported to give poorly resolved structures of glycan moieties36-39. The limited applicability of X-ray structure determination to carbohydrates is in sharp contrast to proteins, for which crystallization and X-ray structure elucidation have become standard research tools40-42.

Mass spectrometry is a useful technique for carbohydrates; however, it is not sufficient as a standalone structural tool, since the crucial issue of carbohydrate stereochemistry cannot be resolved by routinely available methods43. Unlike many high-throughput analytical methods, NMR spectroscopy is tolerant of incomplete reference data and thus plays a key role in the primary structural elucidation of new natural glycans44. Besides its ubiquitous use in structural studies of carbohydrates, it provides significant insight into the mechanisms of their biological action38, 43, 45, 46. In fact, NMR spectroscopy has provided most of the experimental data on the solution structure of carbohydrates, complex equilibria and interconversions of sugar units, monitoring of chemical reactions involving carbohydrates, characterization of carbohydrate binding to other bioactive molecules and other processes of biological relevance38, 43, 45-48. It has been recognized as a valuable tool for the quality control and characterization of carbohydrate-containing drugs49. NMR-based approaches have been incorporated into the World Health Organization recommendations on the production and quality control of glycoconjugate vaccines50.

An important advantage of NMR spectroscopy is that it determines three-dimensional structure directly in solution (in water and organic solvents), where processes of biological and chemical relevance occur. To achieve this goal, several experimental methods have been developed for measuring the key NMR parameters: chemical shifts, coupling constants, NOE data and relaxation rates. Highly sensitive and powerful 1D and 2D NMR experiments have been developed and optimized for measurements on carbohydrates15, 36, 43, 51. Rapid progress in NMR hardware and the development of new NMR experiments have made structural elucidation routinely available in everyday practice in chemical and biological research laboratories. This impressive development has clearly identified the current challenge in the structural study of carbohydrates: further insight into this fascinating area of research is limited by difficulties in interpreting NMR parameters rather than by recording NMR spectra. Indeed, proving correct signal assignment and understanding the relationship between measured NMR parameters and molecular structure is still a tedious task, especially for such a chemically diverse class of compounds as carbohydrates.

Despite the wide structural diversity of carbohydrates, most of their NMR studies are limited to 1H and 13C nuclei, in contrast to proteins (1H, 13C and 15N) and nucleic acids (1H, 13C, 15N and 31P). Isotope labeling, routinely used in protein NMR spectroscopy to enhance automated structure analysis, is of only limited applicability to carbohydrates52-56. Although the building blocks of carbohydrates are more diverse than the structural units of nucleic acids and proteins, their NMR chemical shifts lie in a much narrower region (Fig. 1). Thus, assignment and interpretation of NMR spectra remain a challenge in modern structural glycoscience. Proper interpretation of NMR parameters requires theoretical analysis. In particular, to correlate time-averaged experimental NMR data with the primary and secondary structure, the NMR parameters can be computed by molecular modeling48. Modeling of carbohydrate structure and molecular properties has benefited from a variety of computational methods57. In the present review we discuss recent progress in the development of computational approaches for modeling the NMR parameters of carbohydrates. The review covers a set of topics important for structure elucidation: i) theoretical calculation and analysis of 1H, 13C, 15N, 17O and 31P chemical shifts;


Fig. 2. Selected monosaccharides used in this review, shown in pyranose form, and their IUPAC abbreviations. Monosaccharides that typically exist in equilibrium are shown in both forms (A). Various forms of monosaccharides, exemplified by D-glucose (IUPAC abbreviations in red); numbers denote carbon atom numbering (B). Schematic representation of some 4C1 and 1C4 chair hydroxyl and hydroxymethyl rotamers of β-D-glucose. The idealized torsions are denoted g+, t and g- for gauche clockwise (60°), anti (180°) and gauche counterclockwise (-60°), respectively. The idealized O5-C5-C6-O6 dihedral angles of the hydroxymethyl group are denoted by capital letters: G+, T and G-. g++ and g-- denote torsions far from the idealized values for the 1C4 chair conformer58 (C, reproduced with permission, © Elsevier Science B.V., 1996).


Fig. 3. Representative parts of 2D 1H,13C HSQC (A) and homonuclear NOESY (B) spectra of a sulfated trisaccharide recorded in D2O at 500 MHz59. Despite strong signal overlap in the 1D spectra, the cross-peaks are clearly resolved in the 2D spectra. Reproduced with permission, © Elsevier Ltd., 2005.

ii) computation of chemical shielding tensors and chemical shift surfaces; iii) prediction of H-H, P-H, C-H and C-C coupling constants essential for structural studies; iv) modeling of relaxation parameters; v) prediction of nuclear Overhauser effects and other NMR parameters. We discuss the scope and limitations of available theoretical approaches, including ab initio, density functional, semiempirical, molecular mechanics, molecular dynamics, empirical and hybrid calculations, in relation to NMR structural analysis. Combining modern theoretical predictions of NMR properties with experimental data reveals key information about molecular structure. One of the major goals of theoretical NMR calculations is faithful reproduction and, ultimately, prediction of experimental data. The present review focuses on the prediction and analysis of NMR parameters of carbohydrates and their derivatives using empirical methods and expert systems60, 61, as well as calculations at the quantum-chemical level61-65.
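The rotamer notation defined in the Fig. 2 caption (g+, t and g- for idealized torsions of 60°, 180° and -60°; G+, T and G- for the O5-C5-C6-O6 hydroxymethyl torsion) can be made concrete with a short sketch that computes a dihedral angle from four atomic positions, for example taken from a molecular dynamics snapshot, and assigns it to the nearest idealized rotamer. This is a minimal illustration under our own assumptions: the function names are hypothetical, and the 30° tolerance used to flag a torsion as "near ideal" is an arbitrary choice, not a value from the cited work.

```python
import math

# Idealized torsions (degrees) and labels from the Fig. 2 caption.
# For the O5-C5-C6-O6 hydroxymethyl torsion the capital labels are used.
IDEAL_TORSIONS = {"g+": 60.0, "t": 180.0, "g-": -60.0}


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees (range -180..180, IUPAC sign convention)."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (wrap-around safe)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_torsion(angle, hydroxymethyl=False, tolerance=30.0):
    """Return (label, deviation from ideal, is_near_ideal) for an angle in degrees.
    The tolerance is an illustrative assumption."""
    label, ideal = min(IDEAL_TORSIONS.items(),
                       key=lambda item: angular_difference(angle, item[1]))
    deviation = angular_difference(angle, ideal)
    if hydroxymethyl:
        label = label.upper()  # g+ -> G+, t -> T, g- -> G-
    return label, deviation, deviation <= tolerance


# A +60 degree torsion built from idealized coordinates:
p3 = (math.cos(math.radians(60)), math.sin(math.radians(60)), 1.0)
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), p3)
print(classify_torsion(angle))                      # label "g+", near ideal
print(classify_torsion(-65.0, hydroxymethyl=True))  # ('G-', 5.0, True)
```

The nearest-ideal assignment is deliberately simple; torsions far from every idealized value (such as those labelled g++ or g-- in the caption) are only flagged, not given a separate label.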

2. Computation of the NMR parameters of carbohydrates

The increasing demand for NMR structural analysis of carbohydrates has driven the development of a wide variety of computational approaches to predict and analyze chemical shifts, spin-spin coupling constants, relaxation rates and other parameters.

The first class of approaches comprises empirical methods, which describe molecules on the basis of atom or residue connectivity. These methods do not require a thorough evaluation of atomic coordinates, apart from rough stereochemistry. A series of easy-to-use and computationally efficient tools based on empirical methods has been developed, and these tools are now routinely used in everyday research practice. A concise overview of the empirical methods is given in section 3. Straightforward “first principles” modeling of NMR parameters with ab initio and density functional methods requires calculation of the molecular structure followed by derivation of the NMR data. The models and methods important for NMR structural studies of carbohydrates are summarized in section 4. The detailed discussion of applying computational methods to structure elucidation of carbohydrates is divided into: a) calculation of NMR chemical shifts (section 5); b) calculation of spin-spin coupling constants (section 6); c) prediction of other NMR parameters (sections 7 and 8). Despite the rapid development of ab initio and density functional methods, facilitated by the increasing performance of computational hardware, it should be pointed out that empirical predictions are still widely used. A rough estimate of NMR prediction quality using standard “out-of-the-box” protocols shows that empirical methods provide good accuracy and are very fast.

As discussed below, several options are available to improve the performance of ab initio and density functional predictions of the NMR parameters of carbohydrates. However, these options are mostly described in specialized theoretical articles and are not widely known to researchers working on experimental data analysis and structure elucidation. Exchange of knowledge between these fields is an important goal of the present review.

Typical carbohydrate building blocks and characteristic geometrical features are shown in Fig. 2 (for the list of abbreviations see section 10). On the one hand, the diversity of building blocks and the variety of available inter-residue connections generate a huge number of possible carbohydrate structures. On the other hand, this abundant structural information usually cannot be deduced from 1D 1H and 13C NMR spectra because of ambiguous assignment pathways and strong signal overlap. Nowadays, multidimensional spectroscopy allows determination of the NMR parameters and reliable structure elucidation of carbohydrates43, 49, 51, 66. Indeed, adding even one spectral dimension (2D NMR spectroscopy) yields clearly resolved signals (Fig. 3) compared with 1D spectra (Fig. 1B). The goal of computationally predicting and analyzing 2D NMR spectra therefore highlights the most important challenge in the field: it requires accurate calculation of both 1H and 13C NMR chemical shifts and coupling constants.
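Why a second dimension helps can be shown with a toy calculation: two signals that coincide within the 1H resolution of a 1D spectrum remain resolved in an HSQC spectrum if their attached carbons differ. The sketch below counts unresolved signal pairs in a 1D (1H only) and a 2D (1H,13C) representation. The shift values and both resolution thresholds (0.02 ppm for 1H, 0.5 ppm for 13C) are invented for illustration and do not correspond to any compound discussed here.

```python
from itertools import combinations

# Invented (1H, 13C) shift pairs in ppm for six CH groups of an imaginary
# oligosaccharide; chosen only to illustrate overlap, not taken from any spectrum.
SIGNALS = {
    "A2": (3.52, 72.1),
    "A3": (3.53, 76.4),
    "B2": (3.80, 70.9),
    "B4": (3.81, 78.2),
    "C3": (4.03, 73.7),
    "C5": (4.02, 73.6),
}

H_RESOLUTION = 0.02  # ppm, assumed 1H resolution limit
C_RESOLUTION = 0.5   # ppm, assumed 13C resolution limit


def unresolved_pairs(signals, use_carbon):
    """Pairs of signals that coincide in 1D (1H only) or in 2D (1H and 13C)."""
    pairs = []
    for (n1, (h1, c1)), (n2, (h2, c2)) in combinations(sorted(signals.items()), 2):
        same_h = abs(h1 - h2) <= H_RESOLUTION
        same_c = abs(c1 - c2) <= C_RESOLUTION
        if same_h and (not use_carbon or same_c):
            pairs.append((n1, n2))
    return pairs


print("1D 1H:", unresolved_pairs(SIGNALS, use_carbon=False))
print("2D HSQC:", unresolved_pairs(SIGNALS, use_carbon=True))
```

In this toy set three pairs coincide in the 1D representation, and only one of them (C3/C5, whose carbons are also nearly degenerate) remains unresolved in 2D.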

3. Empirical methods of NMR parameter prediction

Since 1975, a number of chemical shift collections dedicated solely to carbohydrates have evolved43, encouraging many groups to develop algorithms that use this information for computational prediction of the NMR spectra of carbohydrates. Most of the chemical shift databases provided a signal search tool, making NMR data easily interpretable in terms of structure. The simplest class of empirical methods requires only a small reference database (so-called base values) and a multitude of additive rules and increments parameterized for every class of compounds. This approach has developed into a number of initiatives discussed in the present section of the review. As a representative example, in 1H NMR a mean deviation of 0.2–0.3 ppm was observed for the prediction of 90% of all CHx-group chemical shifts in nonpolar solvents, and in 13C NMR >95% of the chemical shifts were predicted by CHARGE with a mean deviation of 3.8 ppm67. Empirical methods, as well as neural networks, enable the fastest, fully automated calculations, generating up to 10,000 chemical shifts per second on a desktop computer with an accuracy of 1.6–1.8 ppm68. Programs based on statistical processing of reference chemical shift databases provide similar or better accuracy at a slower but still acceptable speed69. Each structural fragment is assigned a descriptor that captures its major structural features. When the database is queried with this descriptor, similar structures are identified, and the predicted value is a weighted average of the experimental NMR data for these structures.

However, the predictions are limited solely to the structural information deposited in the database. As a result, empirical methods have only limited application in the elucidation of secondary structure, as they are unable to predict non-averaged properties of molecules in a particular conformation or under conditions different from those represented in the database. Another known drawback is the inability to account for varying spectral recording conditions: the use of different solvents was reported to strongly increase the deviations and reduce prediction accuracy67. Despite these limitations, simple algorithms and very fast calculations with reasonable accuracy in basic cases account for the ubiquitous use of empirical methods in modern NMR structural analysis of carbohydrates. Incremental empirical or neural network methods of chemical shift prediction can be used successfully at the stage of selecting structural hypotheses, which are later verified by time-consuming molecular geometry optimization and ab initio calculation of chemical shifts70.
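The additive scheme described above (a base value plus structure-dependent increments) and its use for screening structural hypotheses can be sketched as follows. Everything here is hypothetical: the base values, feature names and increment magnitudes are placeholders chosen for illustration, not parameters of CHARGE or of any published increment set, and ranking candidates by mean absolute deviation is only one plausible scoring choice.

```python
from statistics import mean

# Placeholder base values (ppm) for the ring carbons of a hypothetical parent residue.
# Illustrative numbers only, not a published parameter set.
BASE_13C = {"C1": 97.0, "C2": 75.0, "C3": 77.0, "C4": 71.0, "C5": 77.0, "C6": 62.0}

# Placeholder increments (ppm), keyed by (hypothetical structural feature, affected carbon).
INCREMENTS = {
    ("O4-substituted", "C4"): 8.0,
    ("O4-substituted", "C3"): -1.5,
    ("O4-substituted", "C5"): -1.2,
    ("O3-substituted", "C3"): 8.5,
    ("O3-substituted", "C2"): -1.0,
    ("O3-substituted", "C4"): -1.0,
}


def predict_shifts(features):
    """Additive scheme: base value plus the sum of all increments that apply."""
    shifts = dict(BASE_13C)
    for (feature, carbon), delta in INCREMENTS.items():
        if feature in features:
            shifts[carbon] += delta
    return shifts


def rank_hypotheses(hypotheses, observed):
    """Score each structural hypothesis by the mean absolute deviation (ppm)
    between predicted and observed shifts; lowest score first."""
    scored = []
    for name, features in hypotheses.items():
        predicted = predict_shifts(features)
        mad = mean(abs(predicted[c] - observed[c]) for c in observed)
        scored.append((round(mad, 3), name))
    return sorted(scored)


# Usage with a made-up "observed" set (not experimental data):
observed = {"C1": 97.2, "C2": 74.8, "C3": 75.9, "C4": 78.6, "C5": 75.5, "C6": 61.8}
hypotheses = {"4-linked": ["O4-substituted"], "3-linked": ["O3-substituted"]}
for score, name in rank_hypotheses(hypotheses, observed):
    print(f"{name}: MAD = {score} ppm")
```

Because each prediction is a handful of additions, many candidate structures can be screened this way before the best-scoring ones are passed on to the far more expensive quantum-chemical verification mentioned above.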

Below we provide a brief review of empirical techniques useful for research and educational purposes in the field.

3.1. Database approach

Historically, the first database approach to chemical shift prediction was described by Bremser71 and was called hierarchical organization of spherical environments (HOSE). Since then it has been improved and remains the most popular structure description algorithm in database-oriented NMR predictors. In particular, this algorithm has been used as one of the approaches in the ACDLabs ACD/NMR and Modgraph Consultants Ltd. NMR prediction software72, 73. The HOSE algorithm starts at the atom whose chemical shift is to be predicted, expands one bond away from it (“1st sphere”) and searches for this environment in the reference database. If the search is successful, it moves two bonds away (“2nd sphere”) and searches again, and so on, until either the fragment is not found in the database or the molecule boundary is reached. The HOSE approach gives good results for structures whose fragments are well represented in the reference collection. As a rule of thumb, if the analyzed atoms can be predicted using three or more spheres, the prediction is considered reliable. In modern implementations, HOSE is extended to treat stereochemistry (3D HOSE) by assigning higher weight to the database entries that describe the same stereochemistry as the fragments under analysis72.
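The sphere-by-sphere search can be illustrated with a deliberately simplified sketch. Here a "HOSE-like" code is just the centre element followed by the sorted element symbols found 1, 2, ... bonds away; real HOSE codes also encode bond orders, ring closures and branching, which are omitted. The prediction is the weighted mean of the reference shifts at the deepest sphere still found in the database, and the stereochemistry weighting mimics the 3D HOSE idea with a hypothetical weight factor of 3. All function names, the toy fragment and its shifts are illustrative assumptions, not the ACD/NMR or Modgraph implementations.

```python
from collections import defaultdict

STEREO_WEIGHT = 3.0  # hypothetical extra weight for entries with the same stereo tag


def sphere_code(atoms, bonds, center, max_spheres):
    """Simplified HOSE-like code: the centre element, then for each sphere the sorted
    element symbols of atoms 1, 2, ... bonds away (breadth-first search)."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    code = [atoms[center]]
    visited, frontier = {center}, {center}
    for _ in range(max_spheres):
        shell = {nb for atom in frontier for nb in neighbours[atom]} - visited
        if not shell:
            break  # molecule boundary reached
        code.append("".join(sorted(atoms[a] for a in shell)))
        visited |= shell
        frontier = shell
    return tuple(code)


def build_database(reference, max_spheres=4):
    """reference: iterable of (atoms, bonds, {atom_id: (shift, stereo_tag)}).
    Each atom is stored under its code truncated at 1, 2, ... spheres."""
    db = defaultdict(list)
    for atoms, bonds, shifts in reference:
        for atom_id, (shift, stereo) in shifts.items():
            code = sphere_code(atoms, bonds, atom_id, max_spheres)
            for level in range(1, len(code)):
                db[code[:level + 1]].append((shift, stereo))
    return db


def predict_shift(db, atoms, bonds, center, stereo=None, max_spheres=4):
    """Expand sphere by sphere while the environment is still in the database.
    Returns (weighted mean shift, spheres matched), or (None, 0) if nothing matches."""
    code = sphere_code(atoms, bonds, center, max_spheres)
    matches, depth = None, 0
    for level in range(1, len(code)):
        found = db.get(code[:level + 1])  # .get does not create empty entries
        if not found:
            break
        matches, depth = found, level
    if matches is None:
        return None, 0
    weights = [STEREO_WEIGHT if stereo is not None and tag == stereo else 1.0
               for _, tag in matches]
    value = sum(w * s for w, (s, _) in zip(weights, matches)) / sum(weights)
    return value, depth


# Toy fragment C1(-O2)-C3-C4-O5 stored twice with invented shifts and stereo tags.
atoms = {1: "C", 2: "O", 3: "C", 4: "C", 5: "O"}
bonds = [(1, 2), (1, 3), (3, 4), (4, 5)]
db = build_database([(atoms, bonds, {1: (100.0, "a")}),
                     (atoms, bonds, {1: (104.0, "b")})])
print(predict_shift(db, atoms, bonds, 1))              # (102.0, 3)
print(predict_shift(db, atoms, bonds, 1, stereo="b"))  # (103.0, 3)
```

Returning the number of matched spheres alongside the value makes it easy to apply the rule of thumb quoted above and treat predictions based on fewer than three spheres as less reliable.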

As implemented in the ACDLabs 8.0 NMR predictor, this approach gave a standard error of 0.22 ppm per 1H resonance (tested on 54,608 organic molecules) and 2.33 ppm per 13C resonance (tested on 68,129 organic molecules). In these tests, 62% of the predicted 1H chemical shifts deviated by less than 0.1 ppm, and 64% of the predicted 13C chemical shifts by less than 1 ppm, from the experimental values74. Newer versions of the ACD/NMR predictors use a combined approach, in which the results of the HOSE and neural network algorithms (discussed in the next section) are compared to retrieve the best-fitting value.
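Accuracy figures of the kind quoted above (a deviation per resonance and the fraction of shifts within a threshold) can be computed from paired predicted and experimental values as sketched below. We assume here that "standard error" corresponds to the root-mean-square deviation; published benchmarks may define it differently, and the numbers in the usage example are invented.

```python
import math


def accuracy_stats(predicted, experimental, threshold):
    """Return mean absolute deviation, RMS deviation and the fraction of shifts
    whose absolute error is below `threshold` (all in ppm)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - e) for p, e in zip(predicted, experimental)]
    mad = sum(errors) / len(errors)
    rmsd = math.sqrt(sum(err * err for err in errors) / len(errors))
    within = sum(1 for err in errors if err < threshold) / len(errors)
    return mad, rmsd, within


# Invented example values (ppm), not benchmark data; 0.1 ppm threshold as for 1H above.
pred_1h = [3.52, 4.10, 5.21, 1.25]
exp_1h = [3.50, 4.30, 5.20, 1.22]
print(accuracy_stats(pred_1h, exp_1h, threshold=0.1))
```

For 13C data the same function would be called with a 1 ppm threshold, matching the benchmark criteria mentioned in the text.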

Table 1 summarizes the reference databases for several nuclei relevant to carbohydrate chemistry75. More details about the ACD/NMR predictor are available in a review by Elyashberg and coworkers60.

Table 1. ACD/NMR reference databases available for NMR spectrum prediction (data for version 11)75.

Nucleus   Number of structures   Number of chemical shifts
1H        210 000                1.7 million
13C       191 900                2.5 million
15N       9 287                  21 782
31P       27 578                 34 020

Another family of computational products using the HOSE approach includes the Modgraph-based general-purpose 13C and heteronucleus NMR predictors73 (implemented and tested in a number of software packages, such as MestreLabs NMRPredict76 and PerkinElmer ChemBioOffice77). Currently, Modgraph uses a HOSE code algorithm capable of analyzing up to five spheres and a database of 193,352 highly verified 13C records abstracted from the literature by Robien and coworkers78. This database is a further development of a product reported earlier79. Additionally, 185,517 13C and 86,480 heteronucleus records from Chemical Concepts are available as an option. For each atom, Modgraph automatically selects the better 13C NMR prediction from the HOSE and neural network methods (see section 3.2 for the latter): the higher the number of HOSE spheres reached for an atom, the more emphasis is given to the HOSE code prediction. A target mean error of 0.18 ppm per resonance was reported after evaluation of ca. 90,000 structures, with the stereochemistry of the molecules taken into account. Several other 13C and heteronucleus chemical shift databases have been reported: CSEARCH80, Chemical Concepts SpecSurf/SpecInfo81, WINDAT82 and the freely accessible NMRshiftDB83. Some of these projects were continuously developed and transformed into dedicated computational tools for empirical spectrum prediction.
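The text states only that Modgraph gives more emphasis to the HOSE prediction the more spheres are matched; the exact rule is not given. The sketch below shows one hypothetical way to implement such a per-atom choice: a weight that grows linearly with the number of matched spheres and switches fully to the HOSE value at five spheres. The function name and the weighting formula are our assumptions, not Modgraph's actual algorithm.

```python
def combine_predictions(hose_shift, spheres_matched, nn_shift, max_spheres=5):
    """Blend HOSE-code and neural-network predictions for one atom.

    Hypothetical rule: the weight of the HOSE value grows linearly with the
    number of matched spheres, from 0 (pure neural network) to max_spheres
    (pure HOSE)."""
    if hose_shift is None or spheres_matched <= 0:
        return nn_shift
    w = min(spheres_matched, max_spheres) / max_spheres
    return w * hose_shift + (1.0 - w) * nn_shift


# Invented per-atom values: (HOSE shift or None, spheres matched, NN shift), in ppm.
per_atom = [(72.4, 5, 71.0), (78.9, 2, 80.5), (None, 0, 101.3)]
for hose, depth, nn in per_atom:
    print(round(combine_predictions(hose, depth, nn), 2))
```

A linear ramp is the simplest monotonic choice; a step function (use HOSE only from three spheres upward, echoing the rule of thumb in section 3.1) would be an equally plausible alternative.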

An alternative approach to encoding stereochemical information in HOSE-based predictors was developed by Satoh and coworkers84. This encoding scheme, called CAST (Canonical representation of stereochemistry), includes different descriptors at the planar, conformational and configurational levels for each atom. Although no use of CAST for carbohydrates has been reported, chemical shift predictions for the linear triol part of 20-hydroxyecdysone deviated from the experimental spectrum by less than 0.5 ppm per resonance on average85. Kelleher and Simpson carried out 1H and 13C NMR predictions in the form of an HSQC spectrum for a 2D model of humic acid and compared it with the HSQC spectra of soil samples, including the amylopectin carbohydrate moiety86. The predictions were based on HOSE code matches and incremental algorithms implemented in ACDLabs Spec Manager 9.06. Although this approach had been used to produce accurate predictions for non-carbohydrate soil components87, there was generally poor correlation between the experimental signals and those simulated for the proposed structural model.
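Comparing a predicted HSQC spectrum with an experimental one, as in the study above, requires matching each predicted (1H, 13C) cross-peak to an observed one. The procedure used in that study is not described here; the sketch below assumes a simple nearest-peak assignment in which 13C deviations are scaled by a factor of 0.1 to make them comparable with 1H deviations (an arbitrary choice), and reports the fraction of predicted peaks falling within a tolerance. All peak values are hypothetical.

```python
import math

C_SCALE = 0.1    # assumed weight of 13C deviations relative to 1H deviations
TOLERANCE = 0.1  # assumed matching tolerance, in scaled ppm


def peak_distance(pred, obs):
    """Combined distance between two (1H, 13C) cross-peaks, 13C scaled by C_SCALE."""
    return math.hypot(pred[0] - obs[0], (pred[1] - obs[1]) * C_SCALE)


def match_hsqc(predicted, observed):
    """Assign each predicted cross-peak to the nearest observed one.

    Returns the list of (predicted, observed, distance) and the fraction of
    predicted peaks lying within TOLERANCE of an observed peak."""
    matches = []
    for pred in predicted:
        nearest = min(observed, key=lambda obs: peak_distance(pred, obs))
        matches.append((pred, nearest, peak_distance(pred, nearest)))
    hit_rate = sum(1 for _, _, d in matches if d <= TOLERANCE) / len(matches)
    return matches, hit_rate


# Invented cross-peaks (1H ppm, 13C ppm) for illustration only.
predicted = [(5.40, 100.5), (3.60, 72.0), (1.20, 20.0)]
observed = [(5.38, 100.2), (3.62, 72.4), (4.10, 60.0)]
matches, hit_rate = match_hsqc(predicted, observed)
for pred, obs, dist in matches:
    print(pred, "->", obs, f"{dist:.3f}")
print(f"fraction matched: {hit_rate:.2f}")
```

Nearest-peak matching lets several predicted peaks map onto the same observed peak; a stricter comparison would use a one-to-one assignment, at the cost of more code.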

3.2. Usage of neural networks

A neural network is a mathematical construct for modeling non-linear dependencies between input descriptors and output values88-90. It consists of artificial neurons organized in a number of layers, where each neuron is a function that transforms its input into an output value. The first layer (“input layer”) gathers numerical atomic descriptors, and no calculations are performed in it. The input layer is fed with structural parameters that are converted to numbers using HOSE codes, increments or other structure description schemes. In chemical shift prediction, the last layer (“output layer”) contains a single neuron that produces the predicted chemical shift. The output of each neuron in the hidden layers in between serves as input to the neurons of the next layer. Each connection between neurons has its own weight parameter, and the overall output depends non-linearly on the input.
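The layered structure just described can be written down in a few lines. The sketch below is a forward pass only: a vector of numerical atom descriptors enters the input layer unchanged, one hidden layer applies a weighted sum followed by a tanh non-linearity, and a single output neuron returns the predicted shift as a weighted sum of the hidden outputs. The layer sizes, the tanh activation, the linear output neuron and the random weights are assumptions for illustration; until its weights are fitted to reference data, such a network returns arbitrary numbers.

```python
import math
import random


class ShiftNetwork:
    """Input layer -> one hidden layer (tanh) -> single linear output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # One weight per input-to-hidden connection, plus one bias per hidden neuron.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        # One weight per hidden-to-output connection, plus the output bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def hidden(self, descriptors):
        """Hidden-layer outputs; the input layer simply passes the descriptors on."""
        return [math.tanh(sum(w * x for w, x in zip(row, descriptors)) + b)
                for row, b in zip(self.w_hidden, self.b_hidden)]

    def predict(self, descriptors):
        """The single output neuron: weighted sum of the hidden outputs."""
        h = self.hidden(descriptors)
        return sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out


net = ShiftNetwork(n_inputs=4, n_hidden=3)
descriptors = [1.0, 0.0, 2.0, 1.0]  # hypothetical numeric atom descriptors
print(net.predict(descriptors))     # untrained, so the value is arbitrary
```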

Prediction approaches based on neural networks benefit from self-learning and from the ability to model properties of compounds without an understanding of the underlying phenomena, which is especially valuable for the non-linear relationships typical of instrumental analytical chemistry91. To be used for NMR data prediction, a neural network must first be trained against a database of known chemical shifts in order to optimize the weights of the neuron connections88, 90, 91. Radomski and coworkers showed the ability of neural networks to recognize and process spectra with a low signal-to-noise ratio, which could hardly be analyzed by regular visual inspection92. Since then, a number of applications of neural networks to the prediction of NMR chemical shifts, especially 13C, have been reported for general organic compounds91, 93 and for certain biomolecular classes, including proteins94.
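Training, i.e. optimizing the connection weights against a set of known shifts, can be sketched as full-batch gradient descent with backpropagation on the mean squared error. The sketch below is self-contained and rests on assumptions not stated in the text: a single tanh hidden layer, the learning rate, the number of epochs, random initialization, and a tiny synthetic data set in place of a real shift database. Practical predictors train on hundreds of thousands of examples and usually rescale the shifts before training.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]   # (descriptor vector, target shift)


def forward(x, W1, b1, w2, b2):
    """Return hidden activations and network output for one descriptor vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return h, sum(w * hj for w, hj in zip(w2, h)) + b2


def mse(data: Sequence[Example], params) -> float:
    W1, b1, w2, b2 = params
    return sum((forward(x, W1, b1, w2, b2)[1] - t) ** 2 for x, t in data) / len(data)


def train(data: Sequence[Example], n_hidden=4, lr=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(data)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gw2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, w2, b2)
            dy = 2.0 * (y - t) / n                      # d(MSE)/d(output)
            gb2 += dy
            for j in range(n_hidden):
                gw2[j] += dy * h[j]
                dz = dy * w2[j] * (1.0 - h[j] ** 2)     # back-propagate through tanh
                gb1[j] += dz
                for i in range(n_in):
                    gW1[j][i] += dz * x[i]
        # Gradient-descent step on all weights and biases.
        for j in range(n_hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
    return W1, b1, w2, b2


# Synthetic, already-scaled toy data (not real chemical shifts).
data = [([a, b], 0.5 * a - 0.3 * b + 0.2) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
before = mse(data, train(data, epochs=0))
after = mse(data, train(data))
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Calling `train` with `epochs=0` returns the initial weights for the same seed, which makes the before/after comparison meaningful.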

Gerbst and coworkers demonstrated that an ART1-type neural network is capable of identifying the class of fucoidan polysaccharides from their characteristic 13C NMR signals. However, the quality of structural analysis was satisfactory only if the training set of the neural network contained exactly the residues present in the molecule to be identified95. A combination of the fragmental approach and a neural network is implemented in various computational tools. In particular, the ModGraph 13C NMR predictor includes a neural network algorithm that assists prediction for molecules that are not well represented in the HOSE reference database. Testing of this neural network on 345,000 reference spectra gave an average deviation between experimental and calculated chemical shifts below 2 ppm96. Purtuc and coworkers designed a neural network that makes extensive use of stereochemical information in 13C NMR chemical shift prediction without the need for 3D atomic coordinates97. The data used for training and evaluation of the network were selected from the CSEARCH database of ca. 230,000 13C NMR spectra (ca. 2,700,000 chemical shifts). A typical training set consisted of 400,000 examples selected at random to reduce resource consumption during network optimization79. Le Bret reported a neural network trained on 8,342 13C NMR chemical shifts described by 314 topological and chemical descriptors related to the atom itself and its nearest neighborhood. The average deviation of 4.5 ppm was claimed to be independent of the size and complexity of the molecule. However, only routine molecular types and molecules with fewer than 64 carbons were considered98.

Meiler and coworkers constructed a three-layer neural network that considered 28 atom types and two summarizing parameters in each of six spheres. The best results (standard deviation 2.1 ppm for ca. 15,000 test atoms) were achieved with 5 to 20 hidden neurons99. Later, this network was combined with a genetic algorithm to elucidate structures of up to 20 carbons100, and was improved by the introduction of an extended hybrid numerical description of the carbon atom environment. A genetic algorithm is an iterative search heuristic from the family of evolutionary algorithms, in which generations of candidate solutions undergo inheritance, mutation, selection and crossover101. The standard deviation for an independent test data set of ca. 42,500 carbons was reported as 2.4 ppm102. The neural network designed by Smurnyy and coworkers recognized 32 atom types and double bond stereochemistry (as a separate sphere), and its output was additionally corrected with a rule-based algorithm that used increments shared by two or more substituents (“cross-increments”). The network was trained on a database of 190,000 structures and about two million chemical shifts, and validated on a database of 8,500 structures and ~118,000 chemical shifts. Since it is difficult to design a single network that covers the whole range of 13C NMR chemical shifts, the reference database was split into subdatabases according to the nature of the central atom103. This neural network was used as one of the algorithms implemented in ACD Labs/NMR68.

Smurnyy and coworkers compared the neural network and least-squares linear regression approaches in terms of prediction quality and performance, and optimized several parameters (number of subdatabases, number of structural descriptors, network parameters, etc.). As a result, they obtained an average error of 1.5 ppm for 13C and 0.2 ppm for 1H NMR chemical shifts, and suggested that further improvement depends much more on the choice of structural, and especially stereochemical, descriptors and on the quality of the training databases than on the regression method. Linear regression and neural networks produced results of similar accuracy; however, linear regression was 2-3 times faster68.

3.3. Regression-based methods

Linking structural descriptors and chemical shifts (especially for carbons) by a mathematical relationship and obtaining the weight factors has been a challenging task for several decades. In 1987 McIntyre and Small developed a methodology for simulating the 13C NMR spectra of monosaccharides. Using experimental data from the literature and their own spectra of 35 pyranoses and methyl pyranosides, the authors constructed models that related observed chemical shifts to 2-6 numerical parameters encoding aspects of the chemical environment of the carbon atom (functions of distances, van der Waals energies, etc.). These parameters encoded the effects of multiple oxygen atoms surrounding the carbon atom. They were derived from atomic coordinates optimized by MM2 calculations of both chair conformations of every monosaccharide. The authors applied multiple linear regression analysis to construct chemical shift models independently for five carbon types in pyranose residues. The models were tested on 15 pyranoses and methyl pyranosides not included in the reference set. The standard prediction error ranged from 0.43 to 0.85 ppm, depending on the atom type. This pioneering study104 encouraged further development of regression-based methods in the computational analysis of NMR structural data of carbohydrates.
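The multiple linear regression step used in such studies fits a model δ = b0 + b1·d1 + ... + bk·dk to known shifts by ordinary least squares. A minimal pure-Python sketch follows, solving the normal equations by Gauss-Jordan elimination. It is not McIntyre and Small's code; the descriptor values are hypothetical placeholders, and in practice a numerical library routine such as `numpy.linalg.lstsq` would be preferable for numerical stability.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: return [b0, b1, ..., bk] for y ~ b0 + sum(bi * xi)."""
    A = [[1.0, *row] for row in X]            # design matrix with an intercept column
    p, m = len(A[0]), len(A)
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(p)] for i in range(p)]
    v = [sum(A[r][i] * y[r] for r in range(m)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
                v[r] -= f * v[col]
    return [v[i] / M[i][i] for i in range(p)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Apply a fitted linear model to one descriptor vector."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


# Hypothetical descriptors (two per carbon) and shifts generated from 3 + 2*d1 - d2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [3, 5, 2, 4, 6]
coef = fit_mlr(X, y)
print(coef, predict(coef, [3, 2]))
```

With more descriptors than the data can support, the normal equations become ill-conditioned, which is one reason the cited studies restricted each model to a small number of descriptors.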

It was shown that a chemical shift can be represented as a function of variables describing characteristic molecular features. Within every proposed mathematical model, an experimental database can be used to calculate the regression parameters and to test the prediction. Least-squares regression techniques, neural networks or the HOSE approach were used to formulate the additivity rules for NMR parameter prediction by the incremental method at the atomic level68. In contrast to other incremental schemes, such a combination potentially requires a smaller number of examples from which the necessary rules can be established and then applied to a broader range of chemical structures.

A general-purpose atom-based regression scheme derived by the least-squares method has recently been designed by Blinov and coworkers. Compared to the neural network approach, linear regression provided ultra-fast calculation (ca. 10,000 13C NMR chemical shifts per second on a desktop computer) with an average deviation of 1.85 ppm105. Within this scheme, every atom surrounding the atom under consideration is characterized by 9 parameters (element, hybridization state, valence, etc.). The concept of “atom pairs” was introduced to complement the single-atom increments and to add more descriptors to the structure encoding105. Mitchell and Jurs developed linear regression models to obtain 13C NMR chemical shifts of monosaccharides from a number of atom-based structural descriptors106. These descriptors included topological, geometric and electronic information about carbon atoms in a conformation obtained by energy minimization with the MM2 force field. The training data set included 55 pyranoses and 56 furanoses. As a result of multiple linear regression analysis, an eleven-descriptor model was designed for pyranoses and an eight-descriptor model for furanoses. When these models were submitted to neural networks, the results improved, with final RMS deviations of 1.03 ppm for pyranoses and pyranosides and 1.58 ppm for furanoses and furanosides106.

A similar approach was used by Clouser and Jurs for the prediction of 13C NMR chemical shifts of 17 ribonucleosides107. The atoms to be predicted were divided into two subsets, one for those inside the ribofuranose ring and the other for those contained in nucleosides. Multiple linear regression allowed building a four-descriptor model (three topological descriptors and one geometrical) for the former subset and an eleven-descriptor model (four electronic, three topological and two geometrical descriptors) for the latter. Submission of the derived models to a three-layer fully connected neural network made it possible to reach an accuracy of 0.39 ppm for the first subset and 0.98 ppm for the second one. The former value does not differ much from the regression model output, as there are too few input descriptors to make use of the non-linearity of a neural network. For the second subset, the use of neural networks significantly improved the prediction accuracy compared to the regression model output107.

3.4. CHARGE approach

CHARGE is a semi-empirical incremental scheme based on electronic, steric and other effects parameterized for a variety of functional groups108, 109. CHARGE algorithms do not include geometry optimization and must be given a predetermined molecular geometry to process. CHARGE is implemented as part of the ModGraph 1H NMR chemical shift predictor73 (available in MestreLabs MestreNova and Cambridge ChemBioOffice). This predictor starts by generating all 3D conformers from the primary structure, then performs the CHARGE prediction for each conformer, and finally produces a weighted average spectrum. The prediction includes the substituent chemical shift approach, which is a general-purpose additive incremental scheme utilizing 3D structures. This approach is an extension of the Proton Shift program developed earlier110.
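The final step of this workflow, combining per-conformer predictions into one spectrum, might look as follows. The text does not specify the weighting scheme, so Boltzmann weighting by relative conformer energy is an assumption here, as are the function names, the temperature and the atom labels. The per-conformer shifts would come from CHARGE, which is not reproduced.

```python
import math
from typing import Dict, List, Sequence

R_KJ = 8.314462618e-3   # gas constant in kJ mol^-1 K^-1


def boltzmann_weights(rel_energies_kj: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann populations from relative conformer energies (kJ/mol)."""
    e_min = min(rel_energies_kj)
    factors = [math.exp(-(e - e_min) / (R_KJ * temperature)) for e in rel_energies_kj]
    total = sum(factors)
    return [f / total for f in factors]


def averaged_shifts(conformer_shifts: Sequence[Dict[str, float]],
                    rel_energies_kj: Sequence[float],
                    temperature: float = 298.15) -> Dict[str, float]:
    """Population-weighted average of per-conformer shifts (ppm) for every labelled nucleus."""
    weights = boltzmann_weights(rel_energies_kj, temperature)
    return {nucleus: sum(w * shifts[nucleus] for w, shifts in zip(weights, conformer_shifts))
            for nucleus in conformer_shifts[0]}


# Two hypothetical conformers; the second lies 2 kJ/mol above the first.
print(averaged_shifts([{"H1": 4.50, "H2": 3.30}, {"H1": 4.70, "H2": 3.50}], [0.0, 2.0]))
```

Subtracting the lowest energy before exponentiating keeps the factors in a safe numerical range without changing the normalized weights.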


The CHARGE approach combines short-range and long-range substituent effects. The short-range effects are reflected in the calculation of the partial atomic charge of the atom under consideration, based upon the electronegativity and polarizability of atoms in close proximity and on the dihedral angles. The calculated α-, β- and γ-effects produce a partial charge q on the given atom, which is converted to the charge-derived chemical shift using the equation δcharge = 160.84 × q − 6.68. The effects of more distant atoms on the 1H NMR chemical shifts are represented as a sum of steric, electric field, anisotropic, π-electron and ring current contributions. CHARGE is considered less fundamental but faster and more convenient in routine use than ab initio calculations. No dedicated parameterization for carbohydrates has been reported; however, parameterization of CHARGE for polyhydric alcohols, including inositol, provided acceptable agreement with the experimental data109. Escalante-Sanchez and Pereda-Miranda used this approach to simulate oligosaccharide 1H NMR spectra and to find proper parameters for the 1st- and 2nd-order analysis of the experimental 1D NMR data111. The scope of the study included batatin I, batatin II and two ester-type dimers of acylated plant pentasaccharides. The experimental NMR spectroscopic values registered for batatinoside I were used as a starting point for the NMR simulation of batatins I and II. Spectroscopic simulation carried out in Mestre-C was used to reproduce the registered 1H NMR data and thus permitted a correct assignment of the chemical shifts and coupling constants of all superimposed protons in batatins I and II111.
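As a worked example of the charge-to-shift conversion quoted above, the short-range term can be coded directly from δcharge = 160.84 × q − 6.68. How the long-range contributions are combined with it is an assumption in this sketch (a plain sum); the function names are hypothetical, and computing q itself from electronegativity, polarizability and dihedral angles is not shown.

```python
from typing import Iterable


def charge_derived_shift(q: float) -> float:
    """Short-range CHARGE term from the text: delta_charge = 160.84 * q - 6.68 (ppm)."""
    return 160.84 * q - 6.68


def proton_shift(q: float, long_range_terms: Iterable[float] = ()) -> float:
    """ASSUMPTION: steric, electric-field, anisotropic, pi-electron and ring-current
    contributions (ppm) are simply added to the charge-derived term."""
    return charge_derived_shift(q) + sum(long_range_terms)


# Illustrative input only: a partial charge of 0.05 with no long-range corrections.
print(round(proton_shift(0.05), 3))   # 160.84 * 0.05 - 6.68 = 8.042 - 6.68 = 1.362
```

The steep slope of the linear relation means that a change of only 0.01 in q shifts the predicted resonance by about 1.6 ppm, which illustrates why the partial charges must be computed carefully.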

3.5. Incremental approach at the residue level

The general-purpose computational tools discussed above, based on incremental and neural network approaches, do not provide the accuracy sufficient for the 13C NMR “fingerprint” of natural glycans. In contrast to the fragmental approach at the atomic level, algorithms that partition structures at the level of residues have been much better parameterized for carbohydrates. The latter approach implies application of substitution effects to the spectra of monosaccharides or other small structural fragments. The substitution effects reflect the chemical shift changes caused by attachment of certain structural units at a known position of a monosaccharide. The more structural features of the substituents are taken into consideration, the better the accuracy of the spectrum simulation. Thus, the accuracy of computational chemical shift prediction depends significantly on the completeness of the spectroscopic databases for a given class of monosaccharides.

Toukach and Shashkov implemented the incremental 13C NMR prediction scheme developed earlier112 in the computational tool BIOPSEL, capable of predicting 13C NMR chemical shifts of regular glycopolymers in aqueous solution113. The incremental approach was used to elucidate polymeric glycan structures based on 13C NMR data only. An empirical database of chemical shifts of mono-, di- and trisaccharide fragments was compiled from a retrospective literature analysis and applied in the calculations. A substitution effect database derived from published spectra of di- and trisaccharides was used to calculate the chemical shifts of unknown structural entities.
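The residue-level incremental scheme can be pictured as a table lookup: start from the shifts of the free monosaccharide and add the substitution (glycosylation) effect for every attached unit. The sketch below is an assumption-laden illustration, not BIOPSEL code; the dictionary layout, the residue and substitution labels, and all numbers are hypothetical placeholders rather than database values.

```python
from typing import Dict, Iterable, Tuple

Shifts = Dict[str, float]                 # atom label -> 13C chemical shift (ppm)
EffectKey = Tuple[str, str]               # (residue, substitution pattern)


def predict_residue(residue: str,
                    substitutions: Iterable[str],
                    mono_db: Dict[str, Shifts],
                    effect_db: Dict[EffectKey, Shifts]) -> Shifts:
    """Monosaccharide shifts plus the sum of substitution effects for each attached unit."""
    predicted = dict(mono_db[residue])    # copy, so the reference database is not modified
    for substitution in substitutions:
        for atom, delta in effect_db[(residue, substitution)].items():
            predicted[atom] += delta      # KeyError flags an effect on an unknown atom
    return predicted


# Placeholder numbers for illustration only (not real chemical shifts).
mono_db = {"ResA": {"C1": 100.0, "C2": 70.0, "C3": 75.0}}
effect_db = {("ResA", "3-O-substituted"): {"C2": -1.0, "C3": 8.0}}
print(predict_residue("ResA", ["3-O-substituted"], mono_db, effect_db))
```

Keying the effects by both the residue and the substitution pattern is what allows such a scheme to improve as the underlying database grows, in line with the dependence on database completeness noted above.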

Rigorous verification of BIOPSEL predictions was carried out on repeating units of Proteus bacterial polysaccharides113. The published experimental structures were found among the five highest ranked predicted structures in 80% of cases, and of these the correct structure was ranked highest in 60% of cases. The simulated spectra showed an average deviation from the experimental data in the range from 0.13 to 0.45 ppm. Recently, the chemical shift prediction module of this software became part of the Bacterial Carbohydrate Structure Database114, received a web interface115, and was extended to predict 13C NMR chemical shifts and glycosylation effects for oligomeric or polymeric glycans, including those containing rare monosaccharides.

Fig. 4. Conformation of a tetrasaccharide repeating unit of Shigella dysenteriae type 2 O-antigen predicted by MM3(1996) with the use of genetic algorithms120. Reproduced with permission, © Elsevier Ltd., 2005.

Widmalm and coworkers designed a web interface116 to the CASPER program for structure elucidation of oligo- and polysaccharides using 13C and 1H NMR data, including chemical shift correlation experiments117. They provided a scheme for structural elucidation of polysaccharides based solely on NMR data118. The algorithm of CASPER, which uses an incremental approach to the calculation of 13C and 1H NMR chemical shifts, was developed earlier119. Three data categories are utilized in the simulation of NMR spectra: chemical shifts of monosaccharides, glycosylation shifts in disaccharides, and correction sets, i.e. the differences between the chemical shifts observed for spatially strained trisaccharide models and those calculated by the additive approach119. The interface and the underlying program have been extensively tested using published data and proved able to simulate 13C NMR spectra for >200 structures with an average error of about 0.3 ppm/resonance. When applied to the repeating units of Escherichia coli bacterial polysaccharides, the published structures were found among the five highest ranked predicted structures in 75% of cases. The average deviation between calculated and experimental chemical shifts was 0.54 ppm and 0.06 ppm for 13C and 1H nuclei, respectively. Oligosaccharide 13C spectra were calculated with an average error of 0.23 ppm/resonance, and the correct structure was ranked first or second in all cases examined121.
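Both BIOPSEL and CASPER rank candidate structures by how well their simulated spectra match the experimental one. A minimal sketch of such a ranking, together with the "found among the k highest ranked structures" check used in the validation figures, is shown below. The scoring rule (mean absolute deviation after pairing signals in sorted order, as one might do for an unassigned 13C spectrum) and all names are assumptions for illustration; the actual programs may match and score signals differently.

```python
from typing import Dict, List, Sequence, Tuple


def mean_abs_deviation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute deviation (ppm) after pairing signals in sorted order.

    ASSUMPTION: sorted pairing is a simple choice for unassigned spectra;
    the real programs may match signals differently."""
    p, o = sorted(predicted), sorted(observed)
    if len(p) != len(o):
        raise ValueError("spectra must contain the same number of signals")
    return sum(abs(a - b) for a, b in zip(p, o)) / len(p)


def rank_candidates(candidates: Dict[str, Sequence[float]],
                    observed: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (structure name, deviation) pairs, best match first."""
    scored = [(name, mean_abs_deviation(shifts, observed)) for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


def in_top_k(ranking: List[Tuple[str, float]], true_name: str, k: int = 5) -> bool:
    """True if the known structure is among the k best-ranked candidates."""
    return true_name in [name for name, _ in ranking[:k]]


# Placeholder spectra (ppm) for three hypothetical candidate structures.
observed = [10.0, 20.0, 30.0]
candidates = {"A": [10.5, 19.0, 30.0], "B": [30.0, 20.0, 10.0], "C": [12.0, 25.0, 31.0]}
ranking = rank_candidates(candidates, observed)
print(ranking, in_top_k(ranking, "A", k=2))
```

Counting how often the published structure satisfies `in_top_k` over a test collection gives success rates of the kind quoted above (e.g. "among the five highest ranked in 80% of cases").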


4. Models and methods for carbohydrate 3D structural studies

In the present section we discuss only those theoretical models and methods that were coupled with NMR structure analysis. Other computational approaches used in studies of carbohydrates have been reviewed elsewhere122-125.

4.1. Molecular mechanics and molecular dynamics

Molecular mechanics (MM) uses classical Newtonian mechanics to model molecular systems and calculates the potential energy using sets of atomistic parameters (force fields) derived from small model compounds. The basics of this method are described in a monograph by Burkert and Allinger126. Several MM force fields, such as CHARMM and GLYCAM, have been specially optimized for carbohydrates44. A multitude of force field parameterizations have been studied in order to account for the flexible nature of carbohydrates. Of the general-purpose force fields, MM3 has been one of the most popular for the optimization of oligosaccharide structures. An example of atomic coordinates produced by MM3 energy calculations and genetic algorithms is shown in Fig. 4. The search using the GLYCAL software was performed in the conformational space of the torsion angles of glycosidic bonds and exocyclic groups. Genetic algorithms use operators such as mutation and crossover to generate offspring from a random population of conformations evaluated by MM3 energy, and terminate after a fixed number of generations or when no further improvement is obtained. This approach significantly expands the conformational space that can be explored at reasonable computational cost120. A brief guide to the MM force fields used for carbohydrate calculations is given in Table 2, and their usage statistics are shown in Fig. 5. A more complete list of force fields ever used for carbohydrates is provided in a review by Gerbst and coworkers124. Useful classical force fields applicable to the geometry optimization of carbohydrates were reviewed by Imberty and Perez36.

Fig. 5. Relative usage of modern carbohydrate force fields (based on citation index during 2005-2010)123. Reproduced with permission, © Elsevier Ltd., 2010.
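The genetic-algorithm search over glycosidic and exocyclic torsion angles described above for GLYCAL can be sketched as follows. This is an illustration under stated assumptions, not GLYCAL itself: a hypothetical toy torsional energy stands in for the MM3 energy, and the population size, truncation selection, one-point crossover, Gaussian mutation and the two stopping rules (a fixed number of generations, or a run of generations without improvement) are illustrative choices.

```python
import math
import random
from typing import Callable, List, Sequence

Conformation = List[float]          # torsion angles in degrees


def toy_energy(angles: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for an MM3 energy: threefold torsional terms, minima at +-60 and 180 deg."""
    return sum(1.0 + math.cos(math.radians(3.0 * a)) for a in angles)


def wrap(angle: float) -> float:
    """Map an angle onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def genetic_search(energy: Callable[[Sequence[float]], float], n_torsions: int,
                   pop_size: int = 30, generations: int = 200, mutation_rate: float = 0.3,
                   mutation_sd: float = 20.0, patience: int = 30, seed: int = 1) -> Conformation:
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_torsions)] for _ in range(pop_size)]
    best = min(population, key=energy)
    stale = 0
    for _ in range(generations):
        # Selection: the lower-energy half survives unchanged (inheritance).
        population.sort(key=energy)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_torsions) if n_torsions > 1 else 0
            child = mother[:cut] + father[cut:]                    # one-point crossover
            child = [wrap(t + rng.gauss(0.0, mutation_sd)) if rng.random() < mutation_rate else t
                     for t in child]                                # Gaussian mutation
            children.append(child)
        population = parents + children
        candidate = min(population, key=energy)
        if energy(candidate) < energy(best) - 1e-12:
            best, stale = candidate, 0
        else:
            stale += 1
            if stale >= patience:                                   # no further improvement
                break
    return best


best = genetic_search(toy_energy, n_torsions=3)
print([round(a, 1) for a in best], round(toy_energy(best), 4))
```

Because the better half of each generation is carried over unchanged, the best energy found can never get worse from one generation to the next; replacing `toy_energy` with a call to a real force-field program would be the expensive part in practice.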

Energy minimization procedures based on molecular mechanics and molecular dynamics are widely implemented in dedicated (Wavefunction Inc. Spartan127, Schrödinger MacroModel128, 129, MOSCITO130, 131, COSMOS132, 133 and others) and general-purpose (Gaussian Inc. Gaussian134, 135, GAMESS136-138, Hypercube Inc. HyperChem139, 140 and others) software. Molecular dynamics (MD) is a form of computer simulation in which particles are allowed to interact for a period of time under approximations of known physics, giving a view of their motion. MM and MD usually share the same classical force fields, but unlike MM, MD may also be based on quantum chemical levels of theory. However, an MD simulation that achieves convergence of the rotamer populations of the exocyclic C-C torsions in the presence of solvent requires a longer timescale than is affordable at reasonable computational cost44. A more detailed view of MD methods is presented in a review by Adcock and McCammon141. The MD simulation technique is a good way to study the inherent flexibility of a molecule, since all degrees of freedom are explored simultaneously, although barrier crossing may still require very long simulations. The ensemble of MD-generated conformations may subsequently be used for the prediction of parameters for which only limited quantum mechanical experience exists. MD is of particular importance for analyzing and predicting NOEs in the NMR spectra of carbohydrates142-144. Replica-exchange molecular dynamics (REMD)145 employs a set of simulations at different temperatures that frequently exchange configurations, allowing a one-dimensional random walk in temperature and potential energy space. The use of REMD for conformational studies of carbohydrates has recently been reviewed146.
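The replica-exchange step mentioned above is commonly decided by a Metropolis criterion: configurations at neighbouring temperatures Ti and Tj with potential energies Ei and Ej are swapped with probability min{1, exp[(βi − βj)(Ei − Ej)]}, where β = 1/(kBT). The sketch below implements only this exchange step; the MD propagation between exchanges, the energy units (kcal/mol) and all function names are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence

K_B = 0.0019872041   # Boltzmann constant in kcal mol^-1 K^-1


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for swapping configurations between temperatures t_i and t_j."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies: Sequence[float], temperatures: Sequence[float],
                  rng: random.Random, start: int = 0) -> List[int]:
    """Try swaps between neighbouring temperature slots (start, start+1), (start+2, start+3), ...

    energies[k] is the potential energy of the configuration currently simulated at
    temperatures[k] (sorted ascending). Returns, for each slot, the index of the
    configuration it holds after the attempts."""
    slots = list(range(len(energies)))
    e = list(energies)
    for k in range(start, len(e) - 1, 2):
        if rng.random() < exchange_probability(e[k], e[k + 1], temperatures[k], temperatures[k + 1]):
            e[k], e[k + 1] = e[k + 1], e[k]
            slots[k], slots[k + 1] = slots[k + 1], slots[k]
    return slots


# Hypothetical energies (kcal/mol) of four replicas at 300-420 K.
print(attempt_swaps([-120.0, -125.0, -118.0, -110.0], [300.0, 340.0, 380.0, 420.0], random.Random(0)))
```

Alternating `start` between 0 and 1 on successive exchange attempts lets every neighbouring pair be tried, which is what produces the random walk of each configuration along the temperature ladder.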

4.2. Semi-empirical methods

Semi-empirical methods use sets of parameters derived from experimental data in order to simplify the approximate solution of the Schrödinger equation. Therefore, relatively low computational resources are required, and the calculations can be practically applied to large molecules147 or used to obtain a starting point for subsequent ab initio calculations. Most semi-empirical methods are known to perform poorly for molecules with hydrogen bonding, for transition structures, and for molecules containing atoms for which they are poorly parameterized147. Among the semi-empirical methods employed in 3D structure elucidation of carbohydrates were AM1, PM3 and MNDO148, 149. Some studies used AM1 for geometry optimization with subsequent DFT calculations of shielding in oligosaccharides150-152. Later publications often involved the PM5 and PM6 methods153 applied to carbohydrates and glycoconjugates154. Bond polarization theory (BPT) is a semi-empirical approach, designed by Sternberg and coworkers in 1988155, which linearly correlates atomic charges and chemical shifts to bond polarization energies. It was applied to the calculation of 13C NMR chemical shift tensors with an accuracy comparable to that of the ab initio methods available in 1997156, and gave rise to a number of developments such as the COSMOS force field132. This force field allowed calculation of solid state chemical shifts at reasonable computational cost, compared with DFT methods under periodic boundary conditions, which demanded much higher computational power157. Later, Sternberg and coworkers used the COSMOS force field in combination with 13C solid state chemical shift target functions to investigate the structure of cellulose I and II158. The parameters of the linear polarization model for BPT were determined from a least-squares fit to atomic charges in small molecules obtained by ab initio calculations with the 6-31G(d,p) basis set. The average deviation between calculated and experimental data, derived from reported chemical shifts, was 0.47 ppm, 0.89 ppm and 0.67 ppm for cellulose-II, cellulose-Iα and cellulose-Iβ, respectively.

Table 2. MM force fields reported for calculations of carbohydrates. Each entry gives the name (with versions, where listed), a description, the implementation(s) (see footnote a) and the reference(s).

MM3 (MM3(1992), MM3(1996), MM3(2000)). Second-generation molecular mechanics force field for C, H, O and N atoms, extensively used for carbohydrates. The MM3 force field takes into account stretching, bending, stretch-bending, torsional and dipolar contributions and van der Waals interactions. It accounts for the anomeric and exo-anomeric effects and has some provisions for estimating hydrogen bonding159. Implementation: GAUSSIAN134, 135, PCModel160, Tinker161. Ref.: 162.

MM+ (MM+(91)). A variant of MM2 combining the functional form of MM2(77) and the parameterization of MM2(91) with a number of extensions. Implementation: HyperChem139, 140. Ref.: 163.

CHARMM (CHARMM22, CHARMM27). Chemistry at Harvard macromolecular mechanics, a family of classical force fields for the calculation of macromolecules using molecular dynamics, and an associated software package. CHARMM22, originally designed for proteins, was parameterized for an explicit water model. CHARMM27 was reported to be suitable for sugars within nucleic acids. Implementation: CHARMM164, GROMACS165, 166, Tinker161. Ref.: 164, 165, 167, 168; 169 (review).

(unnamed). All-atom additive empirical force field consistent with CHARMM and parameterized for the hexopyranose monosaccharides and the linkages between them. Implementation: CHARMM. Ref.: 170, 171.

(unnamed). Parameterization of the additive all-atom CHARMM force field for acyclic polyalcohols, acyclic carbohydrates and inositol. Implementation: CHARMM. Ref.: 172.

PARM22/SU01. CHARMM22 modified for pyranosidic carbohydrates. Implementation: CHARMM. Ref.: 173.

HSEA. Hard sphere approach with consideration of the exo-anomeric effect. It was shown to be able to predict the 3D structure and conformation of large oligosaccharides. Implementation: GESA, GEGOP. Ref.: 174.

CHEAT95. Extended atom force field for hydrated oligosaccharides, a modification of CHARMM22 with a special atom type to account for hydrogen bonding. Implementation: CHARMM. Ref.: 175.

HGFB. A revised CHARMM-type molecular mechanics potential energy function specially developed for use in the dynamical simulation of simple carbohydrates in aqueous solution. The force field was shown to represent the vibrational spectrum and ring pucker of pyranoses. Implementation: CHARMM (?). Ref.: 176.

PHLB. Molecular dynamics force field aimed at correcting the unrealistic flexibility of the HGFB carbohydrate model. Specific dihedral angle terms are parameterized to reproduce experimental vibrational frequency data and small-molecule ab initio dihedral angle rotational energy profiles. Implementation: CHARMM. Ref.: 177.

GLYCAM (GLYCAM_93, GLYCAM2000, GLYCAM06). This generalizable biomolecular force field was initially designed to add carbohydrate functionality to AMBER. Later this dependence was removed, as well as all general or default parameters, and explicit water was accounted for. Implementation: AMBER178, 179. Ref.: 180, 181.

GROMOS. This classical general-purpose force field, associated with an MD simulation software package for the study of biomolecules (A-version), has been developed for application to aqueous or apolar solutions of proteins, nucleotides and sugars. A gas phase version (B-version) for the simulation of isolated molecules is also available. Implementation: GROMOS182-184, GROMACS. Ref.: 185, 186.

45A4. This parameter set, based on GROMOS, was developed for the explicit solvent simulation of hexopyranose-based carbohydrates. Implementation: GROMOS (?). Ref.: 187.

OPLS-AA. Originally designed as optimized potentials for liquid simulations (all-atom), it was later extended to carbohydrates and parameterized to reproduce ab initio energies of 4C1 pyranoses with explicit water. Implementation: MOE, Tinker, Towhee. Ref.: 188.

COSMOS-NMR. Hybrid QM/MM force field that uses localized bond orbitals with the fast BPT formalism for semi-empirical calculation of atomic charges and NMR parameters. It was adapted to a variety of compounds, including macromolecules, and optimized for NMR-based structure elucidation. Explicit quantum-mechanical calculation of electrostatic properties is utilized. Implementation: COSMOS132, 133. Ref.: 132.

CSFF. A development of the PHLB and HGFB carbohydrate force fields optimized for carbohydrate solutions and having improved hydroxymethyl rotations. Implementation: CHARMM. Ref.: 189.

AMBER. A functional form from which a family of classical explicit-solvent force fields for molecular dynamics of biomolecules is derived (GAFF, GLYCAM). Implementation: MacroModel128, 129, AMBER178, 179, other. Ref.: 190; 191 (review).

Amber-H. Derived from AMBER for conformational analysis of oligosaccharides. Implementation: Insight II192. Ref.: 193.

BIO+. A force field based on CHARMM22 and CHARMM27. Implementation: HyperChem139, 140.

a Only implementations cited in carbohydrate studies are listed; other implementations used for carbohydrates implicitly are not covered here.


Witter and coworkers investigated the spectrum assignment for 13C-enriched bacterial cellulose Iα194. The crystal structure was refined using 13C NMR chemical shifts as target functions, giving an RMS difference of 0.37 Å (heavy atoms only) from the structure determined by neutron diffraction. Starting from coordinates derived from neutron scattering, MD simulations yielded four ensembles containing 800 structures. These four models were geometrically optimized with the given isotropic NMR chemical shift constraints under crystallographic boundary conditions. 13C NMR chemical shift tensors were simulated for each model (using BPT with coordinate-dependent charges) and compared with the experimental chemical shift anisotropy information obtained from 2D iso-aniso RAI spectra acquired at a magic angle spinning rate of 10 kHz. The calculations based on the COSMOS force field yielded isotropic chemical shifts with an average deviation of 0.59 ppm per resonance.
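The RMS difference quoted above can be computed as the square root of the mean squared distance between paired heavy atoms. The sketch assumes both structures are already expressed in the same (crystallographic) frame, so no superposition step is applied; the atom-record format and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def heavy_atoms(atoms: Sequence[Tuple[str, Coord]]) -> List[Coord]:
    """Keep coordinates of non-hydrogen atoms (H and D excluded) from (element, (x, y, z)) records."""
    return [xyz for element, xyz in atoms if element.upper() not in ("H", "D")]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMS difference (same units as input, e.g. angstrom) between paired atoms, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))


# Hypothetical three-atom fragment in two models (angstrom).
model = [("C", (0.0, 0.0, 0.0)), ("O", (1.4, 0.0, 0.0)), ("H", (0.0, 1.0, 0.0))]
reference = [("C", (0.0, 0.0, 0.3)), ("O", (1.4, 0.0, -0.3)), ("H", (0.0, 1.5, 0.0))]
print(rmsd(heavy_atoms(model), heavy_atoms(reference)))
```

Restricting the comparison to heavy atoms mirrors the quoted value and avoids penalizing hydrogen positions, which are less precisely defined in many structure determinations.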

4.3. Ab initio and density functional modeling

A quantum chemistry modeling approach implies a combination of a theoretical method (level of theory) with a basis set. Each unique pairing of method and basis set represents a certain approximation to the Schrödinger equation. Results for different systems may only be compared when they have been predicted with the same model147. The more electron correlation a level of theory accounts for and the larger the basis set, the more accurate, but also the more computationally expensive, the calculation. Hybrid functionals define the exchange functional as a linear combination of Hartree-Fock, local and gradient-corrected exchange terms. The hybrid functionals most widely reported in structural studies of carbohydrates are Becke's three-parameter formulations (B3LYP195, 196 and B3PW91195, 197) and their modifications. A detailed description of functionals and basis sets is beyond the scope of this review and can be found elsewhere147. During recent decades, density functional theory (DFT) has gained increasing popularity in computations of various biomolecular systems. Good accuracy at a reasonable computational cost is the main advantage of DFT calculations. Detailed descriptions of various functionals, as well as of the scope and applications of DFT calculations, have been published198-204. Time-dependent DFT has been reported in the context of describing the interaction of electromagnetic fields with matter205, 206. To predict the NMR properties of molecules, QM calculations are carried out in two stages: 1) geometry optimization to obtain the three-dimensional structure; and 2) calculation of NMR parameters for that geometry. Very often different levels of theory are applied at these stages, and in most cases the calculation of NMR parameters (stage 2) requires a more sophisticated level than geometry optimization (stage 1). Choosing a proper combination of theory levels is an important question, discussed in more detail below (sections 4.3 and 4.4).
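For orientation, the hybrid construction described above takes the following widely quoted form for B3LYP, in which three empirical coefficients weight the Hartree-Fock exchange, Becke's 1988 gradient correction to exchange (B88) and the LYP gradient-corrected correlation relative to the local (LDA/VWN) terms. The VWN variant used differs between program packages, so this should be read as the generic textbook expression rather than as a description of any particular implementation. B3PW91 follows the same scheme with PW91 gradient-corrected correlation in place of LYP.

```latex
E_{xc}^{\mathrm{B3LYP}} = E_x^{\mathrm{LDA}}
  + a_0 \left( E_x^{\mathrm{HF}} - E_x^{\mathrm{LDA}} \right)
  + a_x \left( E_x^{\mathrm{B88}} - E_x^{\mathrm{LDA}} \right)
  + E_c^{\mathrm{VWN}}
  + a_c \left( E_c^{\mathrm{LYP}} - E_c^{\mathrm{VWN}} \right),
\qquad a_0 = 0.20,\; a_x = 0.72,\; a_c = 0.81
```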

Several computational approaches have been developed for the prediction of magnetic properties and NMR parameters. The gauge-independent atomic orbital (GIAO) method for NMR shieldings, proposed by Ditchfield in 1974207, assigns each atomic orbital its own local gauge origin, placed at the orbital center, which defines the vector potential of the external magnetic field. Incorporating such DFT features as accurate non-local exchange-correlation functionals, together with larger basis sets, into GIAO calculations led to a significant improvement in the quality of calculated shielding tensors208.
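GIAO and related methods yield a shielding tensor for each nucleus, whereas experiments report chemical shifts. A minimal sketch of the usual conversion: take the isotropic shielding as one third of the tensor trace and subtract it from the isotropic shielding of a reference compound (for 1H and 13C typically TMS) computed at the same level of theory, δ ≈ σref − σiso. The exact relation, δ = (σref − σ)/(1 − σref), differs negligibly for ppm-scale shieldings. The tensor and reference values below are placeholders, not results of a real calculation.

```python
from typing import Sequence


def isotropic_shielding(tensor: Sequence[Sequence[float]]) -> float:
    """sigma_iso = (sigma_xx + sigma_yy + sigma_zz) / 3, i.e. one third of the tensor trace (ppm)."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def chemical_shift(sigma_iso: float, sigma_ref: float) -> float:
    """delta = sigma_ref - sigma_iso (ppm); sigma_ref must come from the same level of theory."""
    return sigma_ref - sigma_iso


# Placeholder 3x3 shielding tensor (ppm) and reference shielding, not computed values.
tensor = [[100.0, 2.0, 0.0],
          [1.0, 110.0, 0.0],
          [0.0, 0.0, 120.0]]
sigma_iso = isotropic_shielding(tensor)
print(sigma_iso, chemical_shift(sigma_iso, sigma_ref=185.0))
```

Only the diagonal elements enter the isotropic value; the anisotropic part of the tensor is what solid-state chemical shift anisotropy measurements probe.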

Attempts to improve the efficiency of magnetic property calculations have been undertaken by applying the gauge factors to localized molecular orbitals instead of to every atomic orbital. These attempts were formalized in the individual gauge localized orbital (IGLO) method209 and the localized orbital/local origin (LORG) method210. The performance of IGLO was first studied on small organic molecules211, and later the method was combined with DFT calculations212. A few studies reported the use of GIPAW for solid state chemical shift prediction in carbohydrates213. GIPAW is a theory of all-electron magnetic response within the pseudopotential approximation, based on an extension of Blöchl’s PAW approach. A valuable feature of GIPAW is that it is valid under both finite and periodic boundary conditions214. The density functionals most commonly used in GIPAW studies have been PBE215 and KT3216. The latter is a semi-empirical exchange-correlation functional specially designed for the calculation of shielding tensors of nuclei in organic molecules and reported to outperform hybrid functionals for molecules forming hydrogen bonds217.

A comparison of GIAO, IGLO and LORG calculations showed that GIAO is more efficient in terms of the required basis set and provides more accurate results218. GIAO internally extends the basis set with higher angular momentum orbitals, which are necessary for the correct description of the perturbed system. In contrast, all atomic orbitals participating in a localized molecular orbital share the same gauge factor. Compared to the localized methods, GIAO is less sensitive to the quality of the employed basis set, and thus provides faster convergence of the calculated chemical shielding and does not require polarization functions to achieve the same level of accuracy218.

Nowadays, the main drawback of GIAO compared with the localized methods, namely its lower computational performance, has been largely compensated by the development of computer hardware. Modern desktop computers are now powerful enough to predict the NMR properties of small and medium-sized molecular systems with reasonable accuracy. As a result, GIAO calculations at the DFT level, in use since the early 1990s219, are often applied to predict the NMR properties of organic and biomolecular systems.

An important issue for reliable computational prediction of NMR parameters is the selection of a proper theory level for geometry optimization. The NMR shielding tensor is a property that can be computed in a single-point calculation on a fixed geometry. HF/6-31G(d) on a geometry optimized at B3LYP/6-31G(d) was cited as the minimal model for predicting NMR parameters220. Because of hydrogen bonding, a basis set that properly describes the energies of carbohydrates should include diffuse functions; B3LYP/6-311++G(2d,2p) was reported as the minimal level for an accurate description of aldo- and ketohexoses in both furanose and pyranose forms221.
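Two-stage protocols of this kind are conventionally abbreviated as "NMR level//geometry level", so the minimal model cited above reads HF/6-31G(d)//B3LYP/6-31G(d). The Python sketch below, whose class and function names are hypothetical, only splits such a string into its two stages; it does not run any calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelOfTheory:
    method: str  # e.g. "B3LYP"
    basis: str   # e.g. "6-31G(d)"

    def __str__(self) -> str:
        return f"{self.method}/{self.basis}"


@dataclass(frozen=True)
class TwoStageProtocol:
    geometry: LevelOfTheory  # stage 1: geometry optimization
    nmr: LevelOfTheory       # stage 2: single-point shielding calculation

    def __str__(self) -> str:
        # Conventional notation: the level after '//' is the one used for the geometry.
        return f"{self.nmr}//{self.geometry}"


def parse_level(text: str) -> LevelOfTheory:
    method, basis = text.strip().split("/", 1)
    return LevelOfTheory(method.strip(), basis.strip())


def parse_protocol(text: str) -> TwoStageProtocol:
    """Parse 'NMR//geometry'; a string without '//' means one level for both stages."""
    if "//" in text:
        nmr_part, geometry_part = text.split("//", 1)
    else:
        nmr_part = geometry_part = text
    return TwoStageProtocol(geometry=parse_level(geometry_part), nmr=parse_level(nmr_part))


protocol = parse_protocol("HF/6-31G(d)//B3LYP/6-31G(d)")
print(protocol.nmr, "on a geometry from", protocol.geometry)
```

Keeping the two levels as separate fields makes explicit that stage 2 usually uses the more demanding level while stage 1 fixes the structure.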

Reducing the scaling of QM calculations to lower powers of molecular size has been a long-standing challenge. Linear scaling has been achieved for geometry222 and energy (DFT) calculations223. Within a method for the calculation of NMR chemical shieldings introduced by Ochsenfeld and coworkers224, the cubic increase of computational effort with molecular size is reduced to a linear one. This allowed the treatment of large molecules (>1000 atoms, with no need for molecular symmetry) at the HF and DFT levels. According to a survey of approaches to chemical shift tensor (CST) calculation by Sefzik and coworkers225, in most cases none of the DFT functionals performed better than HF in calculating the chemical shielding tensor components of eight solid-state 1-methylpyranosides, erythritol and sucrose, although the absolute values were close to experiment. The cc-pVDZ and cc-pVTZ basis sets were used. A number of other methods for predicting chemical shifts by quantum-mechanical calculations were summarized by Gregor and Mauri227. General topics related to the calculation of magnetic properties and NMR parameters are well reviewed in the scientific literature61, 64, 228, 229. The main scope of the present review is computational NMR studies of carbohydrates and their limitations.

Fig. 6. The structure (A) and 13C NMR chemical shift surface for the anomeric carbon at the glycosidic bond (B) of α-D-Glcp-(1-4)-α-D-Glcp disaccharide in water obtained using ONIOM(DFT:HF) method226. Reproduced with permission, © Elsevier Ltd., 2009.
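The practical meaning of reducing cubic to linear scaling, as in the shielding method of Ochsenfeld and coworkers, can be illustrated with simple arithmetic. The sketch assumes an idealized power-law cost model, t ∝ N^p, and a hypothetical reference timing; real linear-scaling methods carry larger prefactors and only pay off beyond a crossover system size, which this sketch ignores.

```python
def extrapolated_cost(t_ref: float, n_ref: int, n_new: int, exponent: float) -> float:
    """Idealized cost model t ∝ N**exponent, extrapolated from one reference timing."""
    if n_ref <= 0 or n_new <= 0:
        raise ValueError("system sizes must be positive")
    return t_ref * (n_new / n_ref) ** exponent


# Hypothetical reference: 1 h for a 100-atom molecule, extrapolated to 1000 atoms.
for label, p in (("cubic", 3), ("linear", 1)):
    hours = extrapolated_cost(1.0, 100, 1000, p)
    print(f"{label:6s} scaling, 1000 atoms: {hours:g} h")
```

A tenfold larger system thus costs about a thousand times more under cubic scaling but only ten times more under ideal linear scaling.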

4.4. Hybrid QM/MM, QM/QM and ONIOM approaches

Recent development of hybrid theoretical approaches has made it possible to divide large molecular systems into several subsystems (layers) and to treat them at different levels230-233. In these hybrid calculations, the most important and relatively small part of the molecule (the higher layer) is treated at a more accurate quantum mechanical level, whereas the other parts of the molecule are treated at less computationally demanding levels, such as MM or low-level QM. Molecules or molecular systems are usually partitioned into two (high and low) or three (high, medium and low) subsystems. In the two-layer approach, the resulting hybrid methods are denoted QM/MM, QM/QM, ONIOM(QM:MM) or ONIOM(QM:QM)234. In the three-layer approach, the system of interest can be described as ONIOM(QM:QM:MM) with several combinations of theory levels for the different layers. Hybrid approaches significantly speed up the calculation and overcome the size limitation of computational studies. In the best case, a hybrid approach combines the accuracy of high-level QM calculations with the speed of relatively fast low-level methods (MM, etc.). The scope and limitations of hybrid approaches for studying organic and biomolecular systems have been reviewed in the literature230-233, including descriptions of the available computational tools235. ONIOM(DFT:MM) and ONIOM(DFT:HF) calculations have shown excellent performance in structure optimization and energy calculations, particularly for deriving chemical shift surfaces of glycosidic bond carbons (example in Fig. 6, discussed below)226.

Two general strategies are explored in modern carbohydrate studies involving hybrid calculations. The first strategy uses hybrid calculations only at the geometry optimization step, followed by derivation of the NMR properties with regular methods, treating the whole molecule at the same level (usually the highest QM level achievable with the available computational resources). This approach benefits mainly from the performance gain at the structure optimization stage. It is a reasonable and very useful combination, since geometry optimization is often much more time-consuming than GIAO calculation of chemical shifts236. The second strategy exploits the hybrid approach both in geometry optimization and in the calculation of magnetic properties. Morokuma and coworkers demonstrated the efficiency of the hybrid approach for calculating NMR chemical shifts using the two-layer ONIOM scheme237. In these calculations, the small (model) system containing the atoms of interest was described at a higher level of theory, and the rest of the molecule at a lower level. The resulting shieldings were expressed as:

$$\sigma_{\mathrm{iso}}[\mathrm{ONIOM}] = \sigma_{\mathrm{iso}}(\text{high level, model}) + \sigma_{\mathrm{iso}}(\text{low level, whole molecule}) - \sigma_{\mathrm{iso}}(\text{low level, model})$$

A general recommendation for partitioning is that a minimal model system for NMR property calculations should include the nucleus for which high accuracy is needed and its closest heavy-atom neighbors237.

The use of a combined QM/MM method to validate the geometrical model of the complex of E-selectin with sialyl Lewis X was reported by Ishida238. A combined modeling approach was proposed to identify complex sugar-chain conformations on the reduced free energy surface. The free energy profile was evaluated by classical MD simulation followed by ab initio QM/MM energy corrections. Flexible carbohydrate structures were mapped onto the reduced QM/MM 2D free energy surface, and the details of the molecular interactions between each monosaccharide component and the amino acid residues of the carbohydrate-recognition domain were identified. Using a computational procedure for chemical shielding tensor evaluation239, calculations were performed for large molecules including a carbohydrate ligand. The validity of the model was confirmed by evaluating 1H NMR chemical shifts with ab initio QM/MM-GIAO computations at HF/6-31G*. Twenty QM/MM-refined geometries sampled from the minimum free energy region of the free energy surface were used, and the averaged theoretical data were compared with the experimental NMR spectrum238. Although most proton chemical shifts were reasonably assigned by QM/MM-GIAO averaging, some resonances were shifted upfield by 0.2-0.3 ppm relative to experiment. Most of these deviations were observed for monosaccharide units that were exposed to the solvent-accessible region and had relatively high flexibility. The study confirmed the excellent potential of the hybrid approach for studying carbohydrates, and also pointed out the need for a more accurate treatment of solvent effects.

Fig. 7. Two conformations of the IdoA2S residue of a heparin disaccharide in water: 1C4 (A) and 2S0 (B). The GlcN6S residue is in the 4C1 form. Violet dots represent sodium ions. Only a part of water molecules is shown for clarity240. Reproduced with permission, © American Chemical Society, 2011.
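The two-layer ONIOM shielding expression of Morokuma and coworkers given above is a simple extrapolation: the low-level difference between the whole molecule and the model system supplies the environment correction that is added to the high-level model value. A minimal sketch, with a hypothetical function name and hypothetical shielding values rather than literature data:

```python
def oniom_shielding(sigma_high_model: float,
                    sigma_low_whole: float,
                    sigma_low_model: float) -> float:
    """Two-layer ONIOM extrapolation of an isotropic shielding (ppm):
    sigma(high, model) + sigma(low, whole molecule) - sigma(low, model)."""
    return sigma_high_model + sigma_low_whole - sigma_low_model


# Hypothetical shieldings (ppm) for one nucleus; not literature values.
sigma = oniom_shielding(sigma_high_model=95.0, sigma_low_whole=101.5, sigma_low_model=99.0)
print(f"ONIOM shielding: {sigma:.1f} ppm")  # 95.0 + 101.5 - 99.0 = 97.5
```

A useful sanity check is that when the high and low levels give the same model-system shielding, the result reduces to the low-level whole-molecule value.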

4.5. Interaction with solvent

The ability of carbohydrates, especially polysaccharides, to adopt a wide range of dynamic conformations in solution has been recognized as the central factor for many of their biological functions; thus, interaction with the solvent cannot be neglected. Not only the NMR properties but also the geometry should be simulated with the solvent effects taken into account. The multitude of hydroxyl groups in carbohydrates leads to a noticeable contribution of solute-solvent interactions and introduces visible differences between solution and X-ray structures36. The structure of carbohydrates in solution is strongly influenced by the solvent, which in most cases is water. In classical simulations, water is often represented by a three-site (TIP3P), four-site or five-site water model241. In the TIP3P model as implemented in CHARMM, each atom of a water molecule is represented by a point charge and a Lennard-Jones potential energy term, and the algorithm does not allow the geometry of the water molecule to change during the simulation. Simple water models predominate in MD studies owing to faster calculation and better correspondence with existing force fields242. In contrast to rigid and non-polar molecules, carbohydrates exhibit strong and specific solute-solvent interactions due to hydrogen bonding, and they have conformational degrees of freedom whose distribution may depend on the solvent. Because of these factors, the full dynamics of carbohydrate molecules in solution is a challenging topic243. A common approach to describing the dynamics is to run an MD simulation of the solute surrounded by solvent molecules and then extract snapshots from the trajectory. The NMR properties are then calculated as averages over these molecular clusters244. However, an MD simulation that achieves convergence of the rotamer populations of the exocyclic C-C torsions with the solvent taken into account requires a timescale of more than 100 ns245, which exceeds what is usually considered a reasonable computational cost44.

Fig. 8. 13C NMR chemical shift surfaces for two transglycosidic carbons of α-(1-4)-linked D-Glcp disaccharides, as a function of the glycosidic bond dihedrals151. Reproduced with permission, © Elsevier Ltd., 2005.
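Convergence of the exocyclic rotamer populations can be monitored directly from the sampled torsions. The sketch below assumes that the hydroxymethyl torsion ω is defined as O5-C5-C6-O6, with the common gg/gt/tg labels at approximately -60°, +60° and 180°; it assigns each frame to the nearest staggered rotamer and compares the populations from the two halves of the trajectory as a crude convergence check. The trajectory values are hypothetical.

```python
from collections import Counter

# Assumed convention: omega = O5-C5-C6-O6 dihedral (degrees).
STAGGERED = {"gg": -60.0, "gt": 60.0, "tg": 180.0}


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify(omega: float) -> str:
    """Assign a frame to the nearest staggered rotamer."""
    return min(STAGGERED, key=lambda name: angular_distance(omega, STAGGERED[name]))


def populations(omegas: list[float]) -> dict[str, float]:
    counts = Counter(classify(w) for w in omegas)
    return {name: counts[name] / len(omegas) for name in STAGGERED}


def half_split_drift(omegas: list[float]) -> float:
    """Largest population change between the first and second half of the run."""
    half = len(omegas) // 2
    first, second = populations(omegas[:half]), populations(omegas[half:])
    return max(abs(first[k] - second[k]) for k in STAGGERED)


trajectory = [-55.0, 65.0, 62.0, 175.0, -170.0, 58.0, 70.0, -65.0]  # hypothetical frames
print(populations(trajectory), half_split_drift(trajectory))
```

A small half-split drift is necessary but not sufficient for convergence: slow gg/gt/tg exchange can keep both halves in the same basin, which is why timescales beyond 100 ns are needed in practice.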

A number of solvation models have emerged for quantum chemical calculations at the HF and DFT levels. To improve computational performance, the polarizable continuum model (PCM) represents the solvent as a continuum rather than as individual molecules246. Several modifications of the continuum model, differing in how the electrostatic response of the solvent is treated, led to the development of the DPCM (solvent treated as a dielectric) and CPCM (solvent treated as a conductor) models247. The performance of continuum models in various solvents and their influence on the geometry optimization of solute molecules have been addressed248, 249. Marenich and coworkers presented several solvent-independent continuum solvation models, including models based on the quantum mechanical charge density of the solute and parameterized for various organic compounds250, 251. Among them, SM8 was claimed to be the most accurate continuum solvation model for predicting the free energies of solvation of molecular solutes252.

The conductor-like screening model (COSMO) treats the solvent as a conducting continuum located outside the molecular cavity. The shape of the cavity depends on the particular implementation and is usually constructed from the van der Waals radii of the atoms of the modeled compound. In contrast to PCM, COSMO derives the solvent polarization from the electric charge distribution of the solute. It is more accurate for solvents with higher permittivity, such as water, which are better approximated by a conductor253. Bagno and coworkers tested the QM prediction of the NMR parameters of glucose in water for snapshots taken from an MD simulation of the target molecule in a water sphere extending up to 5.5 Å. Applying COSMO at the last step of the DFT processing did not noticeably affect the accuracy of the chemical shift calculations254.

An explicit solvent model is physically appropriate for charged molecules with strong solute-solvent interactions255. For example, explicit inclusion of water molecules and counterions allowed a comparative study of conformational, solvent, and counterion effects on coupling constants in a heparin unit (Fig. 7)240. Explicit inclusion of water in HF GIAO calculations of a large molecular system within a linear-scaling method has also been described224. Hybrid implicit/explicit solvation, in which the solute is explicitly hydrated by a layer or sphere of water molecules while the bulk solvent is modeled as a continuum, was investigated by Lee and coworkers256. The ONIOM-PCM approach provides a good opportunity to investigate hybrid solvation models257.

For further details of particular solvation methods and their scope and limitations, please refer to the dedicated publications242, 258.
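The conductor approximation behind COSMO can be quantified. In the original COSMO formulation, the screening charges of an ideal conductor are scaled by f(ε) = (ε − 1)/(ε + 0.5), whereas CPCM uses (ε − 1)/ε; both factors approach 1 as the permittivity grows, which is why the conductor picture works best for high-permittivity solvents such as water. The sketch evaluates the factor for approximate permittivities of water (about 78.4) and chloroform (about 4.8); the function name is hypothetical.

```python
def screening_factor(epsilon: float, x: float = 0.5) -> float:
    """Scaling of ideal-conductor screening charges, f = (eps - 1) / (eps + x).

    x = 0.5 corresponds to the original COSMO choice, x = 0 to CPCM.
    """
    if epsilon < 1.0:
        raise ValueError("relative permittivity must be >= 1")
    return (epsilon - 1.0) / (epsilon + x)


for solvent, eps in (("water", 78.4), ("chloroform", 4.8)):
    print(f"{solvent:10s} COSMO f = {screening_factor(eps):.3f}  "
          f"CPCM f = {screening_factor(eps, x=0.0):.3f}")
```

For water both factors are within about 2% of the ideal-conductor value of 1, whereas for a low-permittivity solvent the correction is substantial.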

5. Computation of NMR chemical shifts

Chemical shifts have been recognized as characteristic indicators of the primary and regular secondary structure of carbohydrates. This section summarizes recent applications of semi-empirical and quantum chemical computations to the prediction of NMR shielding parameters in glycans and their derivatives. Techniques used to calculate chemical shift tensors in general organic chemistry have been reviewed elsewhere259. It should be noted that the direct output of chemical shift calculations (e.g. GIAO) is an anisotropic chemical shielding tensor, which can be converted to the isotropic chemical shielding observed in liquids:

$$\sigma_{\mathrm{iso}} = \frac{\sigma_{11} + \sigma_{22} + \sigma_{33}}{3}$$

where σ11, σ22 and σ33 are the principal components of the magnetic shielding tensor, expressed along three orthogonal axes of the molecule. The chemical shift is expressed as the difference between the shielding of a reference compound (normally TMS, computed at the same level of theory as the target molecule) and the calculated shielding:

$$\delta = \sigma_{\mathrm{ref}} - \sigma_{\mathrm{iso}}$$
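The two relations above can be applied directly to program output. Because the trace of a tensor is invariant under rotation, the isotropic shielding can be taken from the diagonal of the Cartesian tensor without first finding the principal axes. A minimal sketch with hypothetical function names and a hypothetical 13C tensor and TMS reference value:

```python
def isotropic_shielding(tensor: list[list[float]]) -> float:
    """sigma_iso = trace / 3; the trace is rotation-invariant, so the Cartesian
    diagonal gives the same result as the principal components."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def chemical_shift(sigma_ref: float, sigma_iso: float) -> float:
    """delta = sigma_ref - sigma_iso; the reference (e.g. TMS) must be computed
    at the same level of theory as the target molecule."""
    return sigma_ref - sigma_iso


# Hypothetical 13C shielding tensor (ppm) of one carbon; the off-diagonal
# elements do not enter the isotropic value.
tensor = [[100.0, 4.0, -2.0],
          [3.5, 110.0, 1.0],
          [-2.5, 0.5, 120.0]]
sigma_tms = 184.0  # hypothetical TMS 13C shielding at the same level
sigma = isotropic_shielding(tensor)
print(sigma, chemical_shift(sigma_tms, sigma))
```

Computing the reference at the same level lets systematic errors of the method partly cancel in the difference.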

The conversion of the shielding tensor to the isotropic chemical shift is often implemented in programs that interface with quantum chemical software packages. The term chemical shift surface (CSS, example in Fig. 8, discussed below) denotes the dependence of the chemical shifts of atoms in close proximity to the glycosidic bond on its φ and ψ torsion angles.
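A CSS is usually tabulated on a regular φ/ψ grid, so shifts for arbitrary torsions (for example, the frames of an MD trajectory) have to be interpolated. The sketch below assumes a periodic grid starting at -180° with a uniform step and uses bilinear interpolation; the function name and grid values are hypothetical placeholders, not data from Fig. 8.

```python
import math


def css_shift(grid: list[list[float]], step: float, phi: float, psi: float) -> float:
    """Bilinear interpolation on a periodic phi/psi grid.

    grid[i][j] holds the shift at phi = -180 + i*step, psi = -180 + j*step (degrees).
    """
    n_phi, n_psi = len(grid), len(grid[0])
    if not (math.isclose(n_phi * step, 360.0) and math.isclose(n_psi * step, 360.0)):
        raise ValueError("grid must cover 360 degrees in both dimensions")
    x = ((phi + 180.0) % 360.0) / step
    y = ((psi + 180.0) % 360.0) / step
    i0, j0 = int(math.floor(x)) % n_phi, int(math.floor(y)) % n_psi
    i1, j1 = (i0 + 1) % n_phi, (j0 + 1) % n_psi  # wrap around at +180 degrees
    tx, ty = x - math.floor(x), y - math.floor(y)
    return ((1 - tx) * (1 - ty) * grid[i0][j0] + tx * (1 - ty) * grid[i1][j0]
            + (1 - tx) * ty * grid[i0][j1] + tx * ty * grid[i1][j1])


# Hypothetical 4 x 4 grid (90 degree step) of 13C shifts in ppm.
grid = [[100.0 + i + 10.0 * j for j in range(4)] for i in range(4)]
print(css_shift(grid, 90.0, -135.0, -180.0))
```

The periodic wrap matters because glycosidic torsions are angles: a point at φ = 135° must be interpolated between the last grid column and the first one at -180°.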

5.1. Monosaccharides and derivatives

The following analysis of the available literature data and the corresponding discussion are ordered by increasing complexity of the studied system. The current section covers the results obtained for monosaccharides and their derivatives containing a single sugar ring, for which the basic properties and the relationship between NMR data and molecular structure can be revealed (Table 3).

Table 3. GIAO prediction of chemical shifts in monosaccharides, their derivatives and conjugates

Object (molecule) | Parameter(a): nuclei | Geometry | Shielding | Software | Application | Ref.
α-D-Glcp, β-D-Glcp (population-weighted conformers in aqueous solution) | CS: 1H, 13C | B3LYP/6-31G(d,p); solvation energies: B3LYP/6-311++G(2d,2p) | B3LYP/pcJ | Gaussian 03 134, 135 | analysis of the experimental data | 260
β-D-Glcp (five conformers) | CS: 1H, 13C, 17O | MP2/cc-pVDZ | ONIOM [MP2 : HF/6-311++G(2d,2p)] | Gaussian 98, TURBOMOLE 261, 262 | validation of ONIOM and guidelines for the selection of ONIOM model systems | 263
α-D-Glcp, β-D-Glcp | CS: 1H, 13C, 17O | MM+ | B3PW91/6-31+G(d), RHF | Gaussian 94, HyperChem 4.5 139, 140 | validation of a DFT GIAO calculation on an MM+ geometry | 264
β-D-Glcp | CS: 13C (solid state) | X-ray | B3LYP/6-311+G(2d,p) (GIAO-CHF procedure) | Gaussian 03 | theoretical investigation of the effects of conformation and hydrogen bonding on 13C isotropic chemical shifts | 265
α-D-GlcpNH3+-1,4Me2 (chitosan monomer model) | CST: 1H, 15N, 17O (solid state) | X-ray; B3LYP/6-31++G(d,p) for protons only | B3LYP/6-311++G(d,p), B3LYP/6-31++G(d,p) | Gaussian 98 | investigation of hydrogen-bonding effects on the CS tensors | 266
α-D-GlcpN (chitosan monomer) | CST: 1H, 15N, 17O (solid state) | X-ray; B3LYP/6-31++G(d,p) for hydrogens only | B3LYP/6-311++G(d,p), B3LYP/6-31++G(d,p) | Gaussian 98 | investigation of hydrogen-bonding effects on the CS tensors of an anhydrous crystalline structure | 267
α-D-Glcp (gas phase) | CS: 1H, 13C | B3LYP/6-31G(d,p), B3LYP/6-31+G(d,p) | B3LYP/cc-pVTZ; B3LYP/aug-cc-pVTZ | Gaussian 03 (QM calculations); MOSCITO 130, 131 (MD simulations) | investigation of solvent effects and comparison of calculation methods | 254
α-D-Glcp (in aqueous solution) | CS: 1H, 13C | B3LYP/6-31G(d,p); MD (OPLS-AA-type) | glucose: B3LYP/cc-pVTZ; BP86/TZ2P; water: TZP.1s; B3LYP/6-31G** | (as above) | (as above) | 254
(PhO)2-P-6)-α-D-Glcp-1OMe, PhO-P-6)-α-D-Glcp-1OMe, PhO-P-6)[L-Gly(1-3)]-α-D-Glcp-1OMe | CS: 13C, 1H, 31P | B3LYP/6-31G(d) | B3LYP/6-31G(d) | Gaussian 03 | clarifying the structural details of the synthesized esterified methyl α-D-glucopyranoside derivatives | 268
β-D-Fucp × toluene, β-D-Glcp × 3-methylindole, β-D-Glcp × p-hydroxytoluene | CS: 1H | DFT-D BLYP/TZV2D | BLYP/TZV2D | Gaussian 03 (modified) | investigation of carbohydrate–protein recognition on models | 269
α-D-Lyxp-OMe, α-D-Lyxp-OMe × (H2O)1-3 | 13C (solid state, shielding constants) | X-ray data + PM3 | B3LYP/6-31G(d) | Gaussian 98 (QM calculations), HyperChem 5.02 (geometry) | confirmation of the 13C CP/MAS NMR and crystal structure analysis data; studies of hydrogen-bonding effects | 270
α-D-Galp | CS: 1H, 13C, 17O (solid state) | PBE/planewave, KT3/planewave, Vanderbilt's "ultrasoft" pseudopotentials | GIPAW PBE/planewave, KT3/planewave | CASTEP 271-273 | distinguishing hydrogen-bonding network patterns by 1H chemical shift analysis; comparison of PBE and KT3 | 274
(PhB) β-D-Ribp2,4H-2, (PhB)2 β-D-ArapH-4, (PhB)2 α-D-XylfH-4, (PhB) α-D-Lyxf2,3H-2 (phenylboronic esters) | 13C (shielding constants) | B3LYP/6-31+G(2d,p), PCM | PBE1PBE/6-311++G(2d,p), PCM | Gaussian 03 | validation of a QM method as a tool for 13C NMR chemical shift prediction | 275
α-D-Lyxf, α-D-Lyxp 1C4, α-D-Lyxp 4C1, α-D-Glcp 4C1, α-D-Glcf | CS: 13C, 1H (α-D-Glcp) | BP86/TZVP, B3LYP/TZVP, MP2/TZVP, AM1 | BP86/TZVP, B3LYP/TZVP, MP2/TZVP, HF SCF/TZVP | TURBOMOLE (QM calculations), HyperChem (semi-empirical calculations) | comparison and validation of theory levels and solvent models; study of the C6-O6 torsion effect on 13C NMR chemical shifts | 276
D-Manp-1OMe, D-Galp-1OMe, D-Glcp-1OMe, D-Xylp-1OMe, D-Frup-1OMe, L-Sorp-1OMe, L-Rhap-1OMe, erythritol (statistical study) | CST: 13C (solid state) | neutron diffraction data | RHF, HFB, HFS, BLYP, B3LYP, B3P86, BVWN, SVWN, MPW1PW91 /cc-pVDZ, /cc-pVTZ | Gaussian 03 | comparison of DFT and HF functionals | 225
P-3:5)β-D-Ribf-1U (cUMP in aqueous solution) | CS: 1H, 13C | B3LYP/6-31G(d,p) | B3LYP/cc-pVTZ | Gaussian 03 | testing the prediction method and selecting the appropriate solvent model | 277
β-D-2-deoxy-Ribf-A, β-D-2-deoxy-Ribf-G, β-D-2-deoxy-Ribf-C, β-D-2-deoxy-Ribf-T | CST: 13C (C1'), 15N (sugar-linked nitrogen) | B3LYP/6-31G(d,p) | B3LYP/(9s,5p,1d/5s,1p)[6s,4p,1d/3s,1p] for C, N and O; B3LYP/(5s,1p)[3s,1p] for H (IGLO II) | Gaussian 03 | studying the dependence of the N1/N9 and C1' chemical shielding tensors on the C1'-N torsion angle and sugar pucker | 278
β-D-Glcp, α-D-Glcp, β-D-Glcp-2,3,6Ac, α-D-Glcp-2,3,6Ac | CS: 13C (C1) (solid state) | B3LYP/6-311+G(d,p) | B3LYP/6-311+G(d,p) | Gaussian 03 | studying the molecular environment in the chiral cavities of commercial polysaccharide-based sorbents (CDMPC, ADMPC, ASMBC) | 279
β-D-Xylp-OMe | CS: 13C (all); CST: 13C (all), O1, O5, H1 | BP86/TZVP; MM2 | PW91/IGLO-III | deMon-KS, deMon-NMR 280-282, MacroModel 128, 129 V5.0 (MM calculations) | calculation of the chemical shielding dependence on the dihedral angle between C1 and the methyl group | 283
α-D-Xylp-OMe | CS: 13C (all); CST: 13C (all) | PW91/TZVP; MM | PW91/B-III | | | 284
R-(1-4)-3,6-anhydro-α-D-Galp-OMe, R-(1-4)-3,6-anhydro-D-Gal-ol (R = 3,4-dideoxy-β-D-erythro-hexopyranose) | CS: 1H, 13C | MM3; B3LYP/6-31+G** (selected conformers) | GIAO | modified MM3 (1992-2000) 162, 285 (MM calculations), Gaussian 98W (QM calculations) | study of signal displacement upon transition from the pyranose to the open form of anhGal | 286

a Notations: CS – chemical shift; CST – chemical shift tensor.

Roslund and coworkers performed a complete assignment of the NMR spectra of α- and β-D-glucopyranose by iterative spectral fitting with the PERCHit software. To support the experimental data, they calculated the 1H (non-hydroxyl protons) and 13C NMR chemical shifts of glucose at the B3LYP/pcJ-2 (NMR) and B3LYP/6-31G(d,p) (geometry) levels of theory. The authors selected the three most stable conformers of α-D-glucose and the five most stable conformers of β-D-glucose in aqueous solution. They obtained the relative stabilities of the conformers (ΔG° plus solvation energies) and estimated their populations assuming a Boltzmann distribution. The correlation between the population-weighted averages of the calculated 1H NMR chemical shifts and the corresponding experimental values was surprisingly good (linear correlation 0.976-0.977; MAD 0.11 ppm for α-D-Glcp and 0.07 ppm for β-D-Glcp). The correlation coefficient of calculated vs. experimental 13C NMR chemical shifts was also good (0.994-0.995); however, the calculated spectrum was systematically shifted ~10 ppm downfield260. The coupling constants of glucose were also predicted (see details in section 6.1).

Rickard and coworkers263 compared the 13C, 1H, and 17O NMR chemical shifts obtained by HF-GIAO, MP2-GIAO, and ONIOM(MP2-GIAO:HF-GIAO) for the five most stable conformers of β-D-Glcp, and provided sample model systems for use in post-HF chemical shift predictions of larger carbohydrates. Six small model systems containing 6 or 7 heavy atoms were cut out of the whole molecule of each conformer without changing its geometry, and the severed bonds were saturated with hydrogens.

The results from HF-GIAO and MP2-GIAO differed dramatically, especially for the anomeric carbon and the ring oxygen. ONIOM(MP2-GIAO:HF-GIAO) with a three-carbon model system yielded chemical shieldings in good agreement with those from whole-molecule MP2-GIAO calculations, except for the ring oxygen. The maximal discrepancies for the 4C1 conformers were 2 ppm for 13C (C5), 0.09 ppm for 1H (hydroxyl group at C4) and 2.21 ppm for hydroxyl 17O (hydroxymethyl group). The maximal discrepancies for the 1C4 conformers were 1.15 ppm for 13C and 0.29 ppm for 1H (anomeric hydroxyl group).
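Agreement metrics such as the maximal discrepancy quoted here, and the MAD and correlation coefficients reported by Roslund and coworkers above, are straightforward to compute. The sketch below (hypothetical function name and data) also returns the mean signed deviation and a least-squares line, which separate a systematic offset from random scatter; the squared Pearson coefficient r² corresponds to the R² values quoted in this section.

```python
import math
from statistics import mean


def agreement(calc: list[float], exp: list[float]) -> dict[str, float]:
    """Common calculated-vs-experimental metrics for chemical shifts (ppm)."""
    if len(calc) != len(exp) or len(calc) < 2:
        raise ValueError("need two equally long lists with at least two values")
    dev = [c - e for c, e in zip(calc, exp)]
    mc, me = mean(calc), mean(exp)
    sxy = sum((c - mc) * (e - me) for c, e in zip(calc, exp))
    sxx = sum((c - mc) ** 2 for c in calc)
    syy = sum((e - me) ** 2 for e in exp)
    slope = sxy / sxx  # least-squares fit: exp = slope * calc + intercept
    return {
        "MAD": mean([abs(d) for d in dev]),  # mean absolute deviation
        "MSD": mean(dev),                    # mean signed deviation (systematic offset)
        "max_dev": max(dev, key=abs),        # largest deviation, sign kept
        "r": sxy / math.sqrt(sxx * syy),     # Pearson correlation coefficient
        "slope": slope,
        "intercept": me - slope * mc,
    }


# Hypothetical 13C data with a uniform +10 ppm offset.
exp_shifts = [60.0, 70.0, 75.0, 95.0]
calc_shifts = [70.0, 80.0, 85.0, 105.0]
print(agreement(calc_shifts, exp_shifts))
```

The example shows why a near-perfect correlation can coexist with a large MAD: a uniform offset leaves r at 1 but shifts every signal, and the fitted intercept can be used to remove it.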

The results for the ring oxygen in the 4C1 conformers indicated that a small model system, in which the severed bonds are only one bond away from the atom under calculation, was insufficient to model the shielding of this atom. In contrast, the 1C4 conformers showed good agreement for the ring oxygen chemical shift but poor agreement for hydroxyl oxygens that formed hydrogen bonds to non-neighboring centers. To resolve these issues, the authors used a 9-atom model system and decreased the discrepancy to 1.40 ppm (4C1 ring oxygen) and to less than 0.5 ppm (1C4 hydroxyl oxygens). The authors concluded that, for accurate prediction of oxygen chemical shifts, a model system should preserve the outgoing hydrogen bonds. The best correlation between experimental and calculated 13C NMR chemical shifts was achieved for the 4C1 G+ and 4C1 G- conformers (see Fig. 2), as expected from the predominance of these two forms in aqueous solution287. Both the MP2 and ONIOM(MP2-GIAO:HF-GIAO) levels were found to represent proton and carbon chemical shifts well, whereas the gas-phase predictions of the oxygen chemical shifts correlated much more poorly with the experimental solution data263. These calculations confirmed the earlier work by Kupka et al., in which DFT GIAO demonstrated better convergence than the RHF method when applied to the 1H, 13C and 17O chemical shifts of glucopyranose and its 1-C-methyl and 1-O-methyl derivatives264.

Hydrogen bonds play an important role in shaping polysaccharide molecules, and their characterization can reveal biological properties of polysaccharides266, such as the recognition of carbohydrate antigens by host antibodies. The nature of hydrogen bonds depends strongly on electrostatic interactions, and the chemical shielding tensors of magnetic nuclei have been shown to be highly sensitive to hydrogen-bonding effects. High-level QM calculations were essential for interpreting the experimentally observed isotropic chemical shifts. It has been suggested that analysis of the hydrogen-bonding network with optimized proton positions, followed by 1H chemical shift prediction, can not only confirm but also reveal a carbohydrate structure274.

Suzuki and coworkers conducted a theoretical investigation of the effects of conformation and hydrogen bonding on the solid-state isotropic 13C NMR chemical shifts of β-D-glucose and its oligomers. The absolute values of the chemical shifts predicted for a β-D-Glcp molecule extracted from the X-ray structure without further optimization deviated systematically from the experimental CP/MAS 13C NMR data, but the relative resonance positions were in reasonable agreement. The experimental linear relationship288 between the C6 chemical shift and the C6-O6 torsion angle in the three predominant conformations of the hydroxymethyl group was reproduced computationally, as were the corresponding dependencies for C4 and C5. To examine the effect of intramolecular hydrogen bonding on the 13C NMR chemical shifts of D-glucose (in the gt conformation, see Fig. 2), the authors calculated the chemical shifts of the ring carbons as a function of the torsion angle around the C3-O3 bond. C2 and C4 showed a strong dependence, which was explained by a γ-gauche effect produced by the hydroxyl hydrogen atom. In contrast to the well-known γC-gauche effect of approx. -5 ppm289, the γH-gauche effect increased the 13C NMR chemical shift by +3 to +5 ppm unless reduced by the formation of intramolecular hydrogen bonds265. The effects of various possible hydrogen bonds (including those involving the hydroxymethyl group) on the chemical shifts of all carbons in β-D-Glcp were analyzed.

Khodaei and coworkers conducted a DFT study of the solid-state NMR parameters of the crystalline chitosan/HI type I salt and showed the effects of hydrogen bonding on the CS tensors266. They calculated the CS tensors of 17O, 15N, 13C and 1H nuclei for two model systems: the monomer (non-hydrogen-bonded α-D-GlcpNH3+-1,4Me2) and the target molecule in a cluster. Both models were created from the X-ray coordinates, followed by optimization of the proton positions at B3LYP/6-31++G(d,p). Esrafili et al. studied hydrogen-bonding effects on the 17O, 15N, 13C and 1H CS tensors of crystalline anhydrous chitosan by comparing a chitosan hexameric cluster with the corresponding gas-phase monomer (α-D-GlcpN)267. Both studies were devoted mainly to cluster models, and the corresponding details are given in the next section.
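Population weighting of conformer shifts, as used by Roslund and coworkers (Boltzmann populations from relative free energies) and in the population-weighted MAD reported by Bagno and coworkers below, can be sketched as follows. The function names, free energies and shifts are hypothetical; energies are assumed to be in kJ/mol and T = 298.15 K is an assumed default.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def boltzmann_weights(rel_free_energies: list[float],
                      temperature: float = 298.15) -> list[float]:
    """Populations from relative free energies (kJ/mol); the minimum is
    subtracted first to avoid overflow in the exponentials."""
    g_min = min(rel_free_energies)
    factors = [math.exp(-(g - g_min) / (R_KJ * temperature)) for g in rel_free_energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_shifts(shifts_per_conformer: list[list[float]],
                    weights: list[float]) -> list[float]:
    """Population-weighted average shift of each nucleus over all conformers."""
    n_nuclei = len(shifts_per_conformer[0])
    return [sum(w * shifts[k] for w, shifts in zip(weights, shifts_per_conformer))
            for k in range(n_nuclei)]


# Hypothetical: three conformers, two nuclei (ppm).
weights = boltzmann_weights([0.0, 2.0, 5.0])
print(weights, weighted_shifts([[5.22, 3.53], [5.10, 3.60], [5.30, 3.40]], weights))
```

Because the weights depend exponentially on the free energies, an error of a few kJ/mol in the relative conformer stabilities can change the averaged shifts as much as the choice of NMR method does.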

Bagno and coworkers applied several computational protocols combining DFT and MD simulations to the prediction of the alkyl 1H and 13C NMR chemical shifts of α-D-glucose in water254. For gas-phase calculations, geometry optimizations were carried out at the B3LYP/6-31G** level. The B3LYP/6-31+G** level was also used to test the effect of adding diffuse functions, which have been reported to be very important for carbohydrates290. The NMR parameters were calculated with the cc-pVTZ or aug-cc-pVTZ basis sets. The population-weighted MAD over the three conformers amounted to 7.1 ppm (13C) and 0.14 ppm (1H). Despite the satisfactory MAD on the absolute scale, the accuracy was insufficient to assign all signals. Although a good correlation was observed (R2=0.994), the 13C NMR chemical shifts were systematically overestimated. A comparison of the calculated and experimental spectra showed that both the flexibility of the glucose molecule and the strong effect exerted by water should be taken into account. In solution-phase calculations, the structure can hardly be simulated by DFT calculations on the solute embedded in a small cluster of solvent molecules. The bias introduced by size effects and by the flat potential energy surface is further complicated by a flexible solute such as glucose. To separate structural and solvent effects, glucose shieldings were calculated at the B3LYP/cc-pVTZ level as averages over 50-100 MD snapshots (until convergence was reached), using a series of protocols, each emphasizing either the solute or the solvent.
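Averaging shieldings over MD snapshots "until convergence" requires an explicit stopping rule. The sketch below uses one simple, assumed criterion (the running mean changing by less than a tolerance over the last few snapshots); the function name, tolerance, window and data are hypothetical, and the criterion actually used by Bagno and coworkers is not specified here. Consecutive MD frames are correlated, so such a test can signal convergence prematurely.

```python
import random


def average_until_converged(values: list[float], tol: float = 0.05,
                            window: int = 10, min_snapshots: int = 20):
    """Running mean over MD snapshots; stop once it has changed by less than
    `tol` (ppm) over the last `window` snapshots.

    Returns (mean, snapshots_used, converged).
    """
    if not values:
        raise ValueError("no snapshots")
    total = 0.0
    running = []
    for n, value in enumerate(values, start=1):
        total += value
        running.append(total / n)
        if n >= max(min_snapshots, window + 1) and abs(running[-1] - running[-1 - window]) < tol:
            return running[-1], n, True
    return running[-1], len(values), False


random.seed(1)
snapshots = [62.0 + random.gauss(0.0, 1.5) for _ in range(100)]  # hypothetical shieldings (ppm)
print(average_until_converged(snapshots))
```

Returning the number of snapshots used, together with a convergence flag, makes it visible when a trajectory was simply too short.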

In protocol a, the authors used the glucose geometry from a modified OPLS-AA force field calculation without explicit water molecules, but included the solvent effects using PCM. Protocol b differed from protocol a by reoptimization of Glc at B3LYP/6-31G** prior to the NMR calculation. These two protocols allowed sampling of the conformations of the glucose hydroxyl groups and their rotameric distribution and included the solvent reaction field, but did not include specific solvent effects. Protocols c-f employed the geometry of glucose obtained from MD simulations. In protocol c, the authors included the water molecules surrounding glucose up to 5.5 Å from the glucose center of mass. Water molecules were modeled by TIP3P point charges and combined with glucose using ONIOM. In protocol d, glucose was treated at BP86/TZ2P and water at BP86/TZP.1s. Protocol e used B3LYP/6-31G** for water and a PCM solvent; protocol f was the same as d plus a COSMO solvent. The only protocol with DFT optimization of glucose, namely protocol b, gave the best correlation for both 1H (R2=0.987) and 13C (R2=0.997) NMR chemical shifts and also produced the lowest MAD for 13C (1.12 ppm). From these data the authors concluded that the most important factor affecting the accuracy of the computed 1H NMR chemical shifts was the solute geometry, while the solvent effect could be reasonably described by self-consistent reaction field models. As judged by the comparison of protocol performance, the glucose geometry could not be accurately modeled by MD simulations alone. Surprisingly, there was no need to include explicit water molecules in the shielding constant calculation254. The 13C NMR chemical shifts exhibited only a minor dependence on the solvent.

Chelmeka and coworkers carried out DFT GIAO calculations of 1H, 13C and 31P NMR chemical shifts at the B3LYP/6-31G* level
for three synthesized methyl α-D-glucopyranoside 6-phosphate derivatives esterified by phenyloxy groups and glycine. Based on the comparison of the calculated and experimental data, the authors concluded that the target compound bearing an amino group and a phosphate occurs in the neutral form rather than as a zwitterion. They also selected one of two stereoisomers differing in the absolute configuration of the phosphorus atom. The details of the comparison, deviation values and correlation factors were not reported268.

Electronic structure calculations were performed on complexes of β-D-fucose and β-D-glucose with toluene, p-hydroxytoluene and 3-methylindole in order to model carbohydrate–protein recognition269. The three aromatic molecules were used as analogues of phenylalanine, tyrosine and tryptophan, respectively. The work focused mainly on vibrational frequency and energy predictions using a DFT model with an added empirical atom-atom dispersion term with r⁻⁶ distance dependence. The authors validated this combined model, known as DFT-D291, against MP2/aug-cc-pVTZ calculations. They showed that the difference between DFT-D and high-level ab initio results was less than 1 kcal/mol for galactose-benzene and fucose-toluene complexes. Proton chemical shifts for the DFT-D geometries were calculated using DFT GIAO at the BLYP/TZV2D level and confirmed the observation that they are strongly perturbed by complexation with aromatic groups292. The reproduction of the experimental chemical shifts was poor; however, the calculated values relative to those in the free sugars corresponded to the vibrational frequencies of the CH protons269.

Paradowska et al. used GIAO DFT calculations to confirm the results of 13C CP/MAS NMR and crystal structure analysis of a series of methyl pentopyranosides. The authors used the GIAO CPHF approach at the B3LYP/6-31G* level to study the hydrogen bonding effect on α-D-Lyxp-OMe surrounded by water molecules forming mono- to tri-hydrates at C2, C3 and C4. The starting geometry of D-Lyxp-OMe was taken from the X-ray data and optimized with the semi-empirical PM3 method. The calculated 13C shielding constants correlated with the experimental chemical shifts with R2 = 0.993 (isolated molecule) and R2 = 0.992..0.997 (hydrates)270. The authors could not separate the effects produced by different hydrogen bonds. However, they confirmed the increase in stability with each additional hydrogen bond observed earlier for α-D-lyxofuranoside293.

NMR observables of methyl D-xylopyranosides were predicted by DFT for geometries optimized with a fixed dihedral angle φ of the C1-OMe bond283, 284. Comparison of the calculated chemical shifts with the experimental data for the α- and β-anomers, both in the solid state and in solution, allowed the authors to point out the basic dependencies of the chemical shifts on this dihedral angle. The derived dependence of 1J and 3J on the C1-O1 torsion angle was proposed as a conformational probe.

Reichvilser and coworkers studied four aldopentoses to test their suitability as linear linkers for the formation of covalent organic boronic ester networks. As judged by the X-ray structures of the reaction products with phenylboronic acid, arabinose and xylose formed diesters, while lyxose and ribose formed 2,3- and 2,4-monoesters, respectively. 13C NMR shielding constants were calculated by DFT GIAO at the PBE1PBE/6-311++G(2d,p) level of theory in order to prove the applicability of QM methods to the prediction of 13C NMR chemical shifts of these and similar compounds. The structures were optimized at B3LYP/6-31+G(2d,p) with tight convergence criteria and an ultra-fine integration grid, and were confirmed as minima by frequency analyses. PCM was used to model solvation in DMSO during both geometry and NMR calculations. The authors achieved a linear correlation between the experimental chemical shifts and the predicted shielding constants (the correlation factor and the numerical values of the shielding constants were not provided)275.

Taubert and coworkers studied the 13C NMR chemical shifts of α-D-lyxofuranose, α-D-lyxopyranose (1C4 and 4C1), α-D-glucopyranose (4C1) and α-D-glucofuranose at ab initio and DFT levels of theory using the TZVP basis set276. Test calculations showed B3LYP/TZVP and BP86/TZVP to be cost-efficient levels of theory for calculating the NMR chemical shifts of monosaccharides. Geometries and NMR parameters were checked against ab initio HF SCF and MP2 predictions and X-ray data. Basis set convergence was checked on tetramethylsilane using a variety of basis sets, including large ones. Molecular structures and chemical shifts calculated at the B3LYP/TZVP level were similar to those obtained at the MP2 level (-0.6..+0.6 ppm for pyranoses and +0.4..+4.0 ppm for furanoses). The MAD of the calculated 13C NMR chemical shifts (both at B3LYP and MP2, without solvent effects) from the measured values was 5.0-5.7 ppm, and 7.2 ppm at BP86. The authors pointed out that a better shielding reference, such as methanol, decreased the largest deviation to 4 ppm. Subsequently adding an empirical constant shift (-1..-2 ppm) to the calculated values improved the agreement further. A dedicated investigation of α-D-Glcf at fixed values of the C5–C6–O6–H dihedral angle showed that torsional movement of C6 introduced a chemical shift correction of up to ±2 ppm (for all carbon atoms) at the angles populated at equilibrium. The authors also tested four explicit solvent models for α-D-Glcf: either a shell of 116 water molecules or a shell of the 11 water molecules hydrogen-bonded to the solute. Each shell was used either allowing or forbidding relaxation of the whole system after water addition. None of the models reproduced the experimental 13C NMR chemical shifts with acceptable accuracy.
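The referencing scheme discussed by Taubert and coworkers can be written as δ = σref − σ + δref + c. Here σref is the shielding of a secondary reference such as methanol, computed at the same level of theory. δref is that reference's experimental shift on the TMS scale, and c is an optional empirical constant. The sketch below assumes this common form. The numbers in the usage line are hypothetical placeholders, and the offset is merely chosen within the −1 to −2 ppm range quoted above.

```python
def shielding_to_shift(sigma, sigma_ref, delta_ref=0.0, offset=0.0):
    """Convert an absolute shielding (ppm) into a chemical shift (ppm).

    sigma_ref: shielding of the reference compound at the same level of theory.
    delta_ref: experimental shift of that reference on the TMS scale
               (0.0 when TMS itself is the computed reference).
    offset:    optional empirical constant added to every calculated shift.
    """
    return sigma_ref - sigma + delta_ref + offset


def shifts_from_shieldings(shieldings, sigma_ref, delta_ref=0.0, offset=0.0):
    """Apply the same referencing to a {atom_label: shielding} mapping."""
    return {atom: shielding_to_shift(s, sigma_ref, delta_ref, offset)
            for atom, s in shieldings.items()}


# Hypothetical numbers for illustration only
example = shifts_from_shieldings({"C1": 90.0, "C6": 130.0},
                                 sigma_ref=140.0, delta_ref=50.0, offset=-1.5)
```

Referencing to a compound chemically closer to the analyte (an alcohol rather than TMS) tends to cancel systematic errors of the method, which is the rationale behind the methanol choice.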

The COSMO solvent model253 provided better results but still left a -9 ppm deviation for C6, which made the model inappropriate. 1H NMR chemical shifts were predicted for α-D-Glcp (4C1). The systematic deviation of ca. 3 ppm for the hydroxyl protons was attributed to hydrogen bonding, whereas solvent effects on the 1H NMR chemical shifts of the aliphatic protons were small (less than 0.4 ppm, except 1.3 ppm for the anomeric proton)276.

Bagno and coworkers presented an experimental and quantum chemical NMR study of the cyclic mononucleotide uridine monophosphate (cUMP) in water277. They calculated 1H and 13C NMR chemical shifts and 1H–1H, 13C–1H, 31P–13C and 31P–1H coupling constants using DFT. The NMR parameters and the conformer distribution were calculated at the B3LYP/cc-pVTZ level. The solvent reaction field was included using the PCM model for NMR only (protocol b), for both geometry and NMR (protocol c),
or for neither (protocol a). cUMP has only two conformational degrees of freedom: the hydroxyl group on C2′ and the C6–N1–C1′–C2′ dihedral angle between the ribose residue and the nucleobase. Therefore, conformers were searched by scanning the potential energy surface rather than by a full MD simulation. After optimization at B3LYP/6-31G(d,p), the 24 obtained structures converged to three minima that were almost isoenergetic in aqueous solution. Protocol c gave a good correlation with the experimental chemical shifts (R2=0.986 for 1H; R2=0.996 for 13C) and placed the signals in the correct order. Comparison of the data from different protocols showed that solvent effects were essential for the calculation of the NMR properties but not important for the geometry optimization, as observed earlier for D-glucose in water254. This study confirmed that the 1H and 13C spectra of polar, flexible molecules in aqueous solution can be predicted with the same accuracy as those of less complex systems. No explicit inclusion of water molecules was needed to achieve this accuracy, but the use of PCM was necessary277.
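When several conformers are populated, calculated shieldings are usually combined as a Boltzmann-weighted average before they are compared with experiment. The three nearly isoenergetic cUMP minima are one such case. Below is a minimal sketch that assumes relative free energies in kJ/mol at 298.15 K. The conformer energies and shieldings in the example are placeholders, not values from the cited study.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def boltzmann_weights(rel_energies_kj, temperature=298.15):
    """Boltzmann populations from relative (free) energies in kJ/mol."""
    e_min = min(rel_energies_kj)
    factors = [math.exp(-(e - e_min) / (R_KJ * temperature))
               for e in rel_energies_kj]
    z = sum(factors)  # partition function over the sampled conformers
    return [f / z for f in factors]


def weighted_shielding(shieldings_by_conformer, rel_energies_kj, temperature=298.15):
    """Population-weighted shielding (ppm) of one nucleus over conformers."""
    weights = boltzmann_weights(rel_energies_kj, temperature)
    return sum(w * s for w, s in zip(weights, shieldings_by_conformer))


# Hypothetical example: three nearly isoenergetic conformers
populations = boltzmann_weights([0.0, 0.3, 0.5])
sigma_c1 = weighted_shielding([95.2, 96.0, 94.8], [0.0, 0.3, 0.5])
```

Subtracting the minimum energy before exponentiation keeps the exponentials well scaled. It does not change the normalized weights.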

Sychrovsky and coworkers applied QM methods to investigate the dependence of the N1/N9 and C1′ CS tensors on the glycosidic torsion angle and sugar pucker in the four standard 2′-deoxynucleosides (dAde, dGua, dCyt, dThy). The study aimed at predicting the cross-correlated relaxation rates between the shielding tensor of the sugar-linked nitrogen of a nucleobase and the C1′-H1′ dipole-dipole interaction278 (see section 7). All geometrical parameters were gradient-optimized at the B3LYP/6-31G(d,p) level, except the C1′-N torsion angle, which was fixed at different values. The shielding tensors were calculated using nucleus-specific IGLO II basis sets and exhibited a significant conformational dependence on the C1′-N dihedral angle and the sugar pucker. No numerical values of the CST components or chemical shifts were provided, except for the dependence of the isotropic 15N chemical shift on the glycosidic torsion angle for the C2′-endo and C3′-endo sugar puckers of each deoxynucleoside.
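Studies of CS tensors, such as the one above and several cited in section 5.2, reduce three principal components to an isotropic value and shape parameters. The Haeberlen convention is one common way to do this and is sketched below. Conventions differ between papers (for example, span and skew in the Herzfeld-Berger convention), so this is one illustrative choice, not necessarily the convention used in the cited works.

```python
def haeberlen(principal_components):
    """Isotropic value, anisotropy and asymmetry of a shielding tensor.

    principal_components: the three principal values (ppm) of the symmetric
    part of the tensor. Haeberlen ordering:
        |s_zz - s_iso| >= |s_xx - s_iso| >= |s_yy - s_iso|
    Returns (s_iso, anisotropy, asymmetry) with
        anisotropy = s_zz - (s_xx + s_yy) / 2
        asymmetry  = (s_yy - s_xx) / (s_zz - s_iso)
    """
    if len(principal_components) != 3:
        raise ValueError("expected three principal components")
    s_iso = sum(principal_components) / 3.0
    # Order by distance from the isotropic value, largest first
    s_zz, s_xx, s_yy = sorted(principal_components,
                              key=lambda s: abs(s - s_iso), reverse=True)
    reduced = s_zz - s_iso
    anisotropy = s_zz - 0.5 * (s_xx + s_yy)
    # An isotropic tensor has no defined asymmetry; report 0.0 by convention here
    asymmetry = (s_yy - s_xx) / reduced if reduced != 0 else 0.0
    return s_iso, anisotropy, asymmetry
```

The isotropic value returned here is a shielding. It becomes a chemical shift only after referencing.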

5.2. Oligosaccharides and polysaccharides

Combining monosaccharides into the more complex and diverse structures of oligo- and polysaccharides leads to various structural changes that are reflected in the NMR observables. Therefore, a discussion of computational modeling and NMR structural studies of complex glycans and their derivatives (Table 4) is essential.


Table 4. Prediction of chemical shifts in oligo- and polysaccharides.

| Object (molecule) | Parameter (a): nuclei | Calculation method: geometry | Calculation method: shielding | Software | Application | Ref. |
|---|---|---|---|---|---|---|
| β-D-Glcp-(1-4)-β-D-Glcp (from 4-fold helical ASMBC); α-D-Glcp-(1-4)-α-D-Glcp (from 3-fold helical CDMPC) | CS: 13C (C1) (solid state) | B3LYP/6-311+G(d,p) | B3LYP/6-311+G(d,p) | Gaussian 03134, 135 | studying the molecular environment in the chiral cavities of commercial polysaccharide-based sorbents (CDMPC, ADMPC, ASMBC) | 279 |
| β-D-Glcp-(1-1)-β-D-Glcp; α-D-Glcp-(1-1)-α-D-Glcp; α/β-D-Glcp-(1-2)-D-Glcp; α/β-D-Glcp-(1-3)-D-Glcp; α/β-D-Glcp-(1-4)-D-Glcp; α/β-D-GlcpNAc-(1-3)-L-Thr,1-NHMe,2Ac; α/β-D-GlcpNAc-(1-3)-L-Ser,1-NHMe,2Ac | CSS(φ,ψ): 13C (C1) | AM1 | HF/3-21G, HF/6-311G** | Tripos Sybyl294 6.5 (model building), Spartan127 5.0.1 (semi-empirical calculations), Gaussian 98 (QM calculations), Wolfram Mathematica295 3.0 (trigonometric fit) | studying the dependence of the anomeric carbon chemical shift on the glycosidic bond dihedral angles in oligosaccharide and glycopeptide model compounds | 152 |
| α-D-Glcp-(1-1)-α-D-Glcp; β-D-Glcp-(1-4)-β-D-Glcp; [-4)-β-D-Glcp-(1-]4; [-4)-β-D-Glcp-(1-]6; -4)-β-D-Glcp-(1-; -4)-α-D-Glcp-(1- | CSS(φ,ψ): 13C (glycosidic bond carbons) | AM1 (monosaccharides constrained in 4C1) | HF/3-21G, HF/6-311G** | Sybyl 6.5 (model building), Spartan 5.0.1 (semi-empirical calculations), Gaussian 98 (QM calculations), Mathematica 3.0 (trigonometric fit) | determination of a 3D structure | 151 |
| D-Glcp-α-(1→4)-D-Glcp (α-, β-, γ-, ε- and ι-cyclodextrins) | CSS(φ,ψ): 13C (C1) | X-ray, AM1 | HF/3-21G, 6-311G** | Sybyl 6.8 (model building), Spartan 5.0.1 (semi-empirical calculations), Gaussian 98 (QM calculations), AMBER 6.0 178, 179 (MD simulations), Mathematica 3.0 (trigonometric fit) | testing the prediction methodology and computing the anomeric carbon chemical shifts in cyclodextrins | 150 |
| D-Glcp-α-(1→4)-D-Glcp (α-, β-, γ-, ε- and ι-cyclodextrins) | CSS(φ,ψ): 13C (C1) | HF/6-31G*; ONIOM (B3LYP/6-31G*:HF/6-31G*) | B3LYP/6-31G*; ONIOM (B3LYP/6-31G*:HF/6-31G*) | Gaussian 03, Mathematica 3.0 (trigonometric fit) | computation of the anomeric carbon chemical shifts in cyclodextrins | 226 |
| β-D-Glcp-(1-4)-β-D-Glcp (cellobiose) | CS: 13C | X-ray | B3LYP/6-311+G(2d,p) (GIAO-CHF procedure) | Gaussian 03 | theoretical investigation of the effects of conformation and hydrogen bonding on 13C isotropic chemical shifts | 265 |
| β-D-Glcp-(1-4)-β-D-Glcp (cellobiose) | CS: 13C (C1-C4) | MD (GROMOS) | HF/6-31G(d) | Gaussian 09 (QM calculations), GROMACS166 3.3 (MD simulations) | modelling the conformational space of amorphous cellulose | 296 |
| -4)-β-D-Glcp-(1- (Iα and Iβ cellulose) | CS: 13C (solid state) | GIPAW PBE/planewave | mPW1PW91/6-31G(d) | VASP297 5.4 (geometry), Gaussian 09 (NMR) | conformational studies of cellulose | 298 |
| -4)-β-D-Glcp-(1- (Iα and Iβ cellulose) | CS: 1H, 13C, 17O (solid state) | B3LYP/6-31+G* (hydrogens only) | B3LYP/6-31+G*, B3LYP/6-31++G** | Gaussian 03 | investigation of the differences in crystalline structure and hydrogen bond pattern between Iα and Iβ cellulose | 299 |
| α-D-Glcp-(1-4)-α-D-Glcp; α-D-Glcp-(1-4)-β-D-Glcp | CS: 1H, 13C (solid state) | PBE/planewave | GIPAW PBE/planewave | CASTEP271-273 (geometry optimization), PARATEC300, 301 (QM calculations) | investigation of weak hydrogen bonding | 302 |
| α-D-Glcp-(1-2)-β-D-Fruf (sucrose) | CST: 13C (solid state) | neutron diffraction data | RHF, HFB, HFS, BLYP, B3LYP, B3P86, BVWN, SVWN, MPW1PW91 with cc-pVDZ and cc-pVTZ | Gaussian 03 | comparison of DFT and HF functionals | 225 |
| α-D-Glcp-(1-1)-α-D-Glcp (α,α-D-trehalose); β-D-Galp-(1-4)-β-D-Glcp; α-D-Glcp-(1-2)-β-D-Fruf (sucrose) | CSS(φ,ψ): 13C (C1) (amorphous state) | MM (BIO85, CHARMM27, AMBER) at fixed φ and ψ | TNDO, B3LYP/6-31+G(d,2p), B3LYP/3-21+G**, B3PW91/3-21+G** | HyperChem 139, 140 (geometry), Gaussian 03 (chemical shifts) | exploration of the local structure of sugars in the glassy state | 303 |
| complex of E-selectin with sialyl Lewis X (b) | CS: 1H | QM/MM (see section 4.4) | QM/MM-GIAO (HF/6-31G*) | own QM/MM program based on the HONDO package304 | validation of the geometrical modeling | 238 |
| α-D-Glcp-(1-2)-β-D-Fruf (sucrose); α-D-Glcp-(1-1)-α-D-Glcp (α,α-D-trehalose); α-D-Glcp-(1-4)-α-D-Glcp (D-maltose) | CS, CST: 13C (solid state) | X-ray and neutron diffraction data, PBE/planewave, Vanderbilt "ultrasoft" pseudopotentials | GIPAW PBE/planewave, Troullier-Martins norm-conserving pseudopotentials | CASTEP271-273 | comparison of calculations with the chemical shift anisotropy amplification data | 305 |
| α-D-GlcpNH3+/I− (chitosan salt cluster) | CST: 1H, 13C, 15N, 17O (solid state) | X-ray; B3LYP/6-31++G(d,p) for hydrogens only | LD: B3LYP/6-311++G(d,p), B3LYP/6-31++G(d,p), 6-31G (other atoms), LANL2DZ (iodine ions) | Gaussian 98 | investigation of the hydrogen bonding effects on the CS tensors | 266 |
| α-D-GlcpN (chitosan cluster) | CST: 1H, 13C, 15N, 17O (solid state) | X-ray; B3LYP/6-31++G(d,p) for hydrogens only | B3LYP/6-311++G(d,p), B3LYP/6-31++G(d,p) | Gaussian 98 | investigation of the hydrogen bonding effects on the CS tensors of the anhydrous crystalline structure | 267 |
| β-L-Fucp-(1-4)-α-D-Galp-OMe; β-L-Fucp-(1-4)-α-D-Glcp-OMe; β-L-Fucp-(1-3)-α-D-Glcp-OMe | CS: 1H (hydroxyl protons) | B3LYP/6-31G(d) | HF/6-311++G(2d,2p), B3LYP/6-311++G(2d,2p) | MM3 (geometry), Gaussian 98 (QM calculations) | studying the effect of hydration on the chemical shift of hydroxyl protons | 306 |

(a) Notations: CS – chemical shift; CST – chemical shift tensor; CSS – chemical shift surface; LD – locally dense basis set.

(b) Sialyl Lewis X is α-Neup5Ac-(2-3)-β-D-Galp-(1-4)-β-D-GlcpNAc-(3-1)-α-L-Fucp.

13C NMR chemical shifts of the anomeric carbons in oligo- and polysaccharides can be used as conformational probes owing to their periodic dependence on the glycosidic bond dihedral angles307.

Kasat and coworkers used GIAO calculations at the DFT level to predict the 13C NMR chemical shifts of the anomeric carbons in a carbohydrate backbone, as well as of carbons in non-carbohydrate side chains. They did this while studying the molecular environment in the chiral cavities of polysaccharide-based sorbents: cellulose tris(3,5-dimethylphenylcarbamate) (CDMPC), amylose tris(3,5-dimethylphenylcarbamate) (ADMPC) and amylose tris[(S)-α-methylbenzylcarbamate] (ASMBC)279. The authors computed the anomeric carbon 13C NMR chemical shielding in the monomers of cellulose, amylose, amylose acetate and cellulose acetate (summarized in Table 3). They also computed it in dimers extracted from an ASMBC octamer with a 4-fold helix and a CDMPC nonamer with a 3-fold helix optimized by DFT methods. The geometry of the octa- and nonamers was constructed from the X-ray data using the linked-atom least-squares method308. The simulations showed that helicity strongly affects the C1 chemical shift, and they clarified the effects of the side chains on the polymer conformation. They also supported the hypothesis of a 3-fold helical conformation of CDMPC and a 4-fold one of ADMPC, which is important for explaining the enantioseparation of racemates on these sorbents by differences in their higher-order structure309. The authors concluded that the strengths of the hydrogen bonds of the C=O and NH groups in the chiral cavities of these polysaccharide-based polymers are significantly different, which may be a major factor affecting the selectivity towards chiral solutes279.

Swalina and coworkers used GIAO calculations to study the dependence of the anomeric carbon chemical shift on the glycosidic bond ⟨φ,ψ⟩ dihedral angles in oligosaccharide and glycopeptide model compounds152. They computed full chemical shift surfaces (CSSs) versus φ and ψ for D-Glcp-D-Glcp disaccharides with (1→1), (1→2), (1→3) and (1→4) linkages in both α- and β-configurations. φ and ψ were fixed in 20° steps, and the geometries were optimized using the AM1 semi-empirical Hamiltonian. To simulate observed chemical shifts, the CSSs were corrected by adding +7.1 ppm, a factor derived from a comparison of TMS and dioxane as references. After Boltzmann averaging of the CSS to account for the conformer distribution, the predicted chemical shifts exhibited an RMS deviation of 1.4 ppm from the experimental data. The authors derived an empirical equation of the form δC1 = f(φ,ψ) by fitting the raw ab initio data to trigonometric series expansions, following Le and coworkers310, and implemented it as a Perl script. For series of 91 and 325 terms, the RMS deviation between the raw and fitted chemical shift values was 0.56 ppm and 0.31 ppm, respectively. To reduce the computational cost, the CSSs were calculated using the 3-21G basis set and scaled against reference 6-311G** calculations. To obtain the scaling factor, duplicate GIAO 13C calculations using the 3-21G and 6-311G** basis sets were performed on AM1-optimized models of eight disaccharides (96 carbons). The 13C NMR chemical shifts predicted with both basis sets were then correlated (R2=0.992), and the resulting linear relationship was used to scale the 3-21G results. To test the approach, 13C CSSs were also calculated using a locally dense basis set (6-311G** for the anomeric carbons and their nearest neighbors; 3-21G for the remaining atoms311) or, in particular cases, the full 6-311G** basis set. The RMS deviation between the scaled CSS and the test CSS obtained with the locally dense or large basis set was less than 1 ppm for disaccharides. Similar surfaces were also obtained for the GlcNAc-Thr and GlcNAc-Ser model glycopeptides in α- and β-configurations.
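The basis-set scaling step described above amounts to a least-squares line fitted to duplicate small- and large-basis calculations. That line is then applied to every point of the small-basis surface. A minimal sketch in plain Python follows. The paired values in the example are invented for illustration and are not the 96 carbons of the original study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired values")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    a = sxy / sxx
    b = my - a * mx
    # Squared Pearson correlation between the two basis-set results
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


def scale_surface(surface_small, a, b):
    """Apply the linear map to a CSS stored as {(phi, psi): shift_ppm}."""
    return {key: a * value + b for key, value in surface_small.items()}


# Hypothetical paired 13C shifts (ppm): small basis (x) vs large basis (y)
small = [98.0, 102.5, 75.1, 80.3, 62.0]
large = [101.0, 105.9, 77.2, 82.6, 63.5]
a, b, r2 = fit_line(small, large)
css_scaled = scale_surface({(-60, 120): 99.4, (60, 180): 100.2}, a, b)
```

The design choice is to pay for the large basis only on a calibration subset. This works only if the small-basis errors are close to linear, which is what the reported R2 of 0.992 indicates.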

Selecting any of three different conformations of the peptide moiety (freely relaxed, extended or α-helical) had virtually no effect on the CSSs. In contrast to the threonine derivative, the serine derivative showed two CS maxima on the CSS. The authors explained this by sterically induced polarization of the electron density around the anomeric carbon caused by the methyl group of threonine152.

The above methodology was later used in a number of studies. In particular, Sergeev and Moyna used 13C CSSs to determine the spatial structure of glucose oligosaccharides in the solid state from the experimental 13C NMR data of the glycosidic bond carbons (Fig. 8)151. The level of theory, basis set and scaling procedure used to derive the CSSs were the same as those reported for the model glycopeptides152. To take the experimental chemical shifts of the glycosidic bond carbons into account during molecular modeling, the potential energy function of the MMFF94 force field was augmented with an NMR pseudopotential energy term. This term included a function derived from the CSS using a 91-term trigonometric fit and a constant chosen to give the term a weight compatible with the force field energy. The authors validated the method on α-(1→1)- and β-(1→4)-linked oligosaccharides by reproducing the three-dimensional structures obtained from X-ray studies. The RMS deviations of heavy-atom positions were 0.14 Å, 0.12 Å and 0.25 Å for trehalose, cellobiose and cellotetraose, respectively151. In contrast, the lowest-energy conformer of cellotetraose predicted in vacuo by MMFF94 without NMR constraints was significantly different. Further, the authors determined the spatial structure of cellohexaose and generated structural models for cellulose II and amylose V6, using hexasaccharides as models and CSSs obtained from disaccharides. These studies supported the φ and ψ estimates reported earlier312, 313.

O'Brien and Moyna tested the same method on cyclo-oligomaltoses (α-, β-, γ-, ε- and ι-cyclodextrin) and on α-cyclodextrin inclusion complexes with 1,4-disubstituted benzenes in the solid state and in solution. They used the same approach and the same basis sets as Swalina and coworkers to generate the anomeric carbon CSSs and to derive the empirical formula for the chemical shift. For the solid-state structures, the D-Glcp-α-(1→4)-D-Glcp glycosidic bond dihedral angles were taken directly from the X-ray data. The calculated solid-state 13C NMR chemical shifts of the anomeric carbons of all residues in α-, β- and γ-cyclodextrin overestimated the observed CP/MAS data by ca. 0.8 ppm (cyclodextrins) and 0.4-0.6 ppm (inclusion complexes). Calculations on ε- and ι-cyclodextrins also predicted an average chemical shift within +0.8 ppm of the solution data. They also allowed the band-flipped residues to be characterized by the abnormal upfield shift of the anomeric carbon signal.
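The trigonometric-series fit δC1 = f(φ,ψ) was used by Swalina and coworkers and reused by Sergeev, Moyna and O'Brien. It can be sketched as a linear least-squares problem on a 20° grid. This sketch assumes a full product basis of cosine and sine terms up to a chosen order, which gives (2·order+1)² terms. The exact 91- and 325-term expansions of Le and coworkers are not reproduced here, and the surface in the example is synthetic. The harmonic `nmr_penalty` is a hypothetical stand-in for an NMR pseudopotential term; the source does not give its functional form.

```python
import numpy as np


def design_matrix(phi_deg, psi_deg, order):
    """Product basis {1, cos k*phi, sin k*phi} x {1, cos k*psi, sin k*psi}, k = 1..order."""
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    psi = np.radians(np.asarray(psi_deg, dtype=float))

    def basis_1d(angle):
        cols = [np.ones_like(angle)]
        for k in range(1, order + 1):
            cols.append(np.cos(k * angle))
            cols.append(np.sin(k * angle))
        return cols

    return np.column_stack([f * g for f in basis_1d(phi) for g in basis_1d(psi)])


def fit_css(phi_deg, psi_deg, shifts, order=2):
    """Least-squares coefficients of the truncated trigonometric series."""
    A = design_matrix(phi_deg, psi_deg, order)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(shifts, dtype=float), rcond=None)
    return coeffs


def eval_css(coeffs, phi_deg, psi_deg, order=2):
    """Evaluate the fitted surface at arbitrary (phi, psi) in degrees."""
    return design_matrix(phi_deg, psi_deg, order) @ coeffs


def nmr_penalty(coeffs, phi_deg, psi_deg, delta_exp, k=1.0, order=2):
    """Hypothetical harmonic pseudo-energy k*(delta_calc - delta_exp)**2 for one linkage."""
    delta_calc = float(eval_css(coeffs, [phi_deg], [psi_deg], order)[0])
    return k * (delta_calc - delta_exp) ** 2


# Synthetic surface on a 20-degree grid (not real GIAO data)
grid = np.arange(-180, 180, 20)
PHI, PSI = np.meshgrid(grid, grid, indexing="ij")
phi_flat, psi_flat = PHI.ravel(), PSI.ravel()
true_shift = (100.0 + 3.0 * np.cos(np.radians(phi_flat))
              - 1.5 * np.sin(2 * np.radians(psi_flat)))
coeffs = fit_css(phi_flat, psi_flat, true_shift, order=2)
rms = np.sqrt(np.mean((eval_css(coeffs, phi_flat, psi_flat) - true_shift) ** 2))
```

Because the fit is linear in its coefficients, raising the order only adds columns to the design matrix. The price is more parameters to determine from the same grid of calculated shieldings.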

O'Brien and Moyna also derived 13C NMR chemical shifts by averaging back-calculated 13C shift trajectories from a series of 5 ns MD simulations of α-, β- and γ-cyclodextrin with explicit TIP3P water molecules filling a 10 Å octahedral buffer. Application of the empirical formula obtained from the solid-state CSS calculations gave excellent agreement with the solution 13C NMR data (MAD 0.36 ppm)150.

Lefort and coworkers studied the local structure and conformational disorder of selected disaccharides in the amorphous state by comparing CPMAS data with 13C NMR chemical shift surfaces of C1 calculated by GIAO for MM-optimized geometries. They provided a numerical procedure to treat discontinuities in the CPMAS spectrum and demonstrated that force field geometry optimization did not critically compromise the accuracy of the results303.
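The trajectory averaging used by O'Brien and Moyna applies a fitted CSS formula to the glycosidic dihedral angles of every MD frame and averages the result. The sketch below illustrates only that bookkeeping. `toy_css` is a hypothetical stand-in for a fitted surface, and the frames are invented, not taken from a real trajectory.

```python
import math
from statistics import mean


def toy_css(phi_deg, psi_deg):
    """Hypothetical stand-in for a fitted C1 shift surface (ppm); not real data."""
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    return 100.0 + 2.0 * math.cos(phi) - 1.0 * math.cos(psi)


def trajectory_average(frames, css):
    """Mean back-calculated shift over MD frames given as (phi, psi) pairs in degrees."""
    return mean(css(phi, psi) for phi, psi in frames)


def residue_averages(trajectories, css):
    """Average shift per residue; trajectories maps residue label -> list of frames."""
    return {res: trajectory_average(frames, css) for res, frames in trajectories.items()}


# Hypothetical usage: two residues, three frames each
frames = {
    "Glc1": [(0, 0), (180, 180), (90, 90)],
    "Glc2": [(-60, 120), (-70, 110), (-65, 115)],
}
averages = residue_averages(frames, toy_css)
```

Averaging shifts rather than averaging angles is the essential point. The shift surface is non-linear, so the shift at the mean (φ,ψ) generally differs from the mean shift over the trajectory.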

Tafazzoli and Ghiasi studied the anomeric carbon chemical shifts of α-, β- and γ-cyclodextrins in solution using the two-layer ONIOM method237. The higher level of theory (B3LYP/6-31G*) included all atoms in the pyranose rings, and the lower one (HF/6-31G*) included all other atoms. The PCM model was employed to describe the solvent effects. The 13C NMR chemical shift surfaces for C1 in the D-Glcp-α-(1→4)-D-Glcp fragment in the gas phase and in solution were calculated with the GIAO B3LYP/6-31G* method. They were compared with the ONIOM (B3LYP/6-31G*:HF/6-31G*) results obtained for a disaccharide model. The empirical equation relating the isotropic 13C shifts to the glycosidic bond φ and ψ dihedral angles was derived using a trigonometric expansion. The calculated average chemical shift in solution deviated from the experimental data by -0.4..+0.8 ppm. The C1 chemical shifts of residues 1 and 2 in α-cyclodextrin were predicted with accuracies of 0.6 ppm and 0.5 ppm, respectively226.

The conformations of cellulose fragments have been explored and probed against experimental NMR observables in a number of publications151, 265, 296, 303. Kubicki and colleagues achieved an RMS error of less than 3 ppm for 13C chemical shift simulation with GIPAW under periodic boundary conditions for the tg/NetA conformations of cellulose Iα and Iβ298. Esrafili and coworkers obtained a MAD of less than 7% in DFT 13C calculations of cellulose spectra and showed that 13C chemical shifts could serve as a probe for differentiating between the Iα and Iβ structures299. Suzuki and coworkers applied DFT calculations to reproduce the experimental dependences of 13C NMR chemical shifts on conformation. They studied β-D-Glcp (see details in the previous section), cellobiose, and cellobiose units of native cellulose capped with hydrogen atoms. The geometry was extracted from the X-ray structure without further optimization. D-Cellobiose and the cellobiose units revealed appreciable dependences of the predicted C1′ and C4 chemical shifts on the torsion angles of the (1→4)-β-glycosidic linkage. In the region of the crystalline conformational minimum, the C1′ chemical shift was found to depend mainly on φ, whereas that of C4 depended on both φ and ψ265. The authors explained the calculated chemical shifts in disaccharide units on the basis of γH-gauche effects and their reduction by intra-residue hydrogen bonding. In contrast, inter-residue hydrogen bonding had almost no effect on the 13C NMR chemical shifts.
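Agreement between calculated and experimental shifts is reported in this section in several ways: as MAD, RMS deviation, relative (percentage) MAD, or the R2 of a linear correlation. The helper below computes these side by side. It is a generic sketch with invented input values. The percentage definition (mean of |δcalc − δexp|/|δexp|) is an assumption, because the cited works may define it differently.

```python
import math


def error_statistics(calc, exp):
    """Common agreement measures between calculated and experimental shifts (ppm)."""
    if len(calc) != len(exp) or not calc:
        raise ValueError("need equally long, non-empty lists")
    n = len(calc)
    dev = [c - e for c, e in zip(calc, exp)]
    mad = sum(abs(d) for d in dev) / n
    rms = math.sqrt(sum(d * d for d in dev) / n)
    max_dev = max(dev, key=abs)  # signed deviation with the largest magnitude
    # Relative deviation in percent; assumes no experimental value is zero
    mad_pct = 100.0 * sum(abs(d) / abs(e) for d, e in zip(dev, exp)) / n
    # Squared Pearson correlation, as reported for calc-vs-exp plots
    mc, me = sum(calc) / n, sum(exp) / n
    scc = sum((c - mc) ** 2 for c in calc)
    see = sum((e - me) ** 2 for e in exp)
    sce = sum((c - mc) * (e - me) for c, e in zip(calc, exp))
    r2 = sce * sce / (scc * see) if scc > 0 and see > 0 else float("nan")
    return {"MAD": mad, "RMS": rms, "max": max_dev, "MAD_%": mad_pct, "R2": r2}


# Invented values for illustration
stats = error_statistics([101.0, 75.0, 62.0], [100.0, 76.0, 61.0])
```

A high R2 does not imply a small MAD. A constant offset leaves R2 unchanged, which is why the glucose calculations of Bagno and coworkers could show R2=0.994 while systematically overestimating the 13C shifts.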

Khodaei and coworkers used a molecule in a cluster as a model system for the chitosan/HI salt and calculated the hydrogen bonding effects on the CSTs of 17O, 15N, 13C and 1H nuclei (see details in the previous section). The locally dense basis set method314 was used to speed up the calculation. The target molecule and the neighboring nuclei directly involved in its hydrogen bonding were treated with the 6-311++G(d,p) and 6-31++G(d,p) basis sets, whereas the other nuclei were treated with the 6-31G and LANL2DZ (iodine ions) basis sets. The authors observed that the theoretical B3LYP/6-311++G(d,p) isotropic 13C NMR chemical shifts overestimated the experimental values (MAD 5.8 ppm, least-squares linear fit with R2=0.97). The chemical shifts obtained with B3LYP/6-31++G(d,p) underestimated them (MAD 5.0 ppm, least-squares linear fit with R2=0.96). They regarded the results obtained with the 6-311++G(d,p) basis set as more reliable than those obtained with the 6-31++G(d,p) one. The difference in isotropic shielding between the monomer and the target molecule in the cluster was analyzed with respect to O6H…O, O6H…I, NH…O and NH…I hydrogen bonding. The authors found a 40 ppm increase in the predicted O6 chemical shift due to intermolecular hydrogen bonding in the cluster, as compared to the monomer, while the differences at the other oxygen sites were less dramatic. NH hydrogen bonds reduced the predicted isotropic 15N chemical shift by 18.39 ppm266.

Esrafili and coworkers investigated hydrogen-bonding effects on the 17O, 15N, 13C and 1H CS tensors of anhydrous chitosan as compared to its monomeric unit (α-D-GlcpN) in the gas phase. The DFT calculations were performed at B3LYP/6-311++G(d,p) and B3LYP/6-31++G(d,p) for the X-ray geometry with protons reoptimized at B3LYP/6-31++G(d,p). The authors explained the deviations in the 17O, 15N and 1H CST components and anisotropy by the formation of hydrogen bonds, primarily O3H…O and NH…O. A good correlation between the predicted and experimental isotropic 13C NMR chemical shifts (R2=0.985) indicated that hydrogen bonding effects in chitosan are sufficiently described when the neighboring chains are represented by monomeric units only. As followed from the QM calculations, intra- and intermolecular hydrogen bonding played an essential role in determining the relative orientation of the oxygen and nitrogen CST principal components in the molecular frame267.

Bekiroglu and coworkers performed comparative HF and DFT calculations on the disaccharides β-L-Fucp-(1→4)-α-D-Galp-OMe, β-L-Fucp-(1→4)-α-D-Glcp-OMe and β-L-Fucp-(1→3)-α-D-Glcp-OMe. They calculated the chemical shift difference (∆δ) between the hydroxyl protons in the disaccharide and in the corresponding monosaccharide methyl glycoside. The lowest-energy MM3 geometries were taken as starting conformers for a full DFT optimization. The ∆δ values obtained from HF and DFT calculations were similar, although the HF calculations gave systematically more upfield values than the DFT calculations. The calculations in vacuo showed that one or two OH protons in each disaccharide are strongly deshielded (∆δ>0); these are the protons that form hydrogen bonds to the neighboring ring oxygens. In contrast, the experimental NMR data indicated shielding of these protons (∆δ<0)315. This discrepancy was attributed to solvent effects. These effects were confirmed by monitoring the chemical shift of the hydroxyl proton of methanol in water and in other solvents that model the acetal groups of the disaccharides. Intramolecular hydrogen bonding reduces the hydration of a particular hydroxyl proton and consequently shifts its signal upfield306.

Fig. 9. Dependence of 3JC1-2OH on the glycosidic torsion angle ω and the C1/HO2 dihedral angle θ, calculated by DFT for methyl α- and β-D-glucopyranoside mimics (A and B, respectively) and methyl α- and β-D-mannopyranoside mimics (C and D, respectively), all having deoxy functions at C3, C4 and C6316. Reproduced with permission, © Elsevier Ltd., 2009.

Yates and coworkers studied the anomeric forms of maltose by 1H-13C MAS-J-HMQC solid-state NMR spectroscopy. They used chemical shift calculations to assign the 1H NMR spectrum. Further calculations showed that the difference in the calculated 1H NMR chemical shift between the crystal and an isolated molecule with the same geometry was a quantitative measure of weak intermolecular C-H⋅⋅⋅O hydrogen bonding302. Geometry optimizations were performed with the DFT code CASTEP271-273. CASTEP uses a planewave basis set to expand the charge density and electronic wave functions, and pseudopotentials to represent the core electrons. The PBE exchange-correlation functional215 and "ultrasoft" pseudopotentials with a maximum planewave cutoff of 30 Ry were used. The NMR chemical shifts were computed with the PARATEC300, 301 code, which employs the GIPAW method214, based on DFT and the plane-wave pseudopotential approach301. The calculations used the PBE exchange-correlation functional, a plane-wave basis set with a maximum energy of 80 Ry and Troullier-Martins317 norm-conserving pseudopotentials. The MAD of the calculated isotropic 13C NMR chemical shifts from the experimental values was 1.0 ppm for the α-anomer and 0.9 ppm for the β-anomer. The largest deviations were observed for the anomeric (up to +3.0 ppm) and C6 (up to -1.4 ppm) carbons.

6. Computation of NMR coupling constants

Empirical relationships between the molecular geometry of saccharides (such as torsion angles) and spin-spin couplings have historically been applied widely and successfully to NMR structure elucidation, especially to identify the monomeric composition of oligo- and polysaccharides43, 318, 319. However, despite their broad scope and several useful applications, such approaches are prone to fail for compounds that differ from those used to calibrate the empirical equations. Especially difficult are cases of intermolecular aggregation, interaction with polar solvents, and atypical functional groups. Computational modeling of spin-spin coupling and of its dependence on distinct structural parameters (such as bond angles and dihedral angles) is undoubtedly one of the most demanding areas of research. Scalar couplings are averaged linearly among conformers in solution, which makes their interpretation in terms of a conformationally flexible molecular structure more straightforward320. This averaging allows an easy connection with MD simulations and provides an NMR description of structural flexibility in carbohydrates.
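Because scalar couplings average linearly, a population-weighted mean over conformers is all that is needed once each conformer's coupling is known. The sketch below assumes Boltzmann populations computed from relative conformer energies in kJ/mol at 298.15 K; the function names and all input numbers are hypothetical and only illustrate the arithmetic.

```python
import math

# Gas constant in kJ/(mol K); conformer energies are assumed to be in kJ/mol.
R_KJ_PER_MOL_K = 0.008314462618


def boltzmann_populations(rel_energies_kj, temperature_k=298.15):
    """Normalized Boltzmann populations for a list of conformer energies."""
    e_min = min(rel_energies_kj)
    weights = [math.exp(-(e - e_min) / (R_KJ_PER_MOL_K * temperature_k))
               for e in rel_energies_kj]
    total = sum(weights)
    return [w / total for w in weights]


def population_averaged_coupling(couplings_hz, populations):
    """Linear average <J> = sum_i p_i * J_i over conformers."""
    if len(couplings_hz) != len(populations):
        raise ValueError("one coupling per conformer is required")
    return sum(j * p for j, p in zip(couplings_hz, populations))


# Hypothetical input: three conformers with made-up relative energies (kJ/mol)
# and made-up computed couplings (Hz); not data from the studies reviewed here.
energies = [0.0, 0.8, 5.0]
couplings = [1.5, 10.5, 5.0]
pops = boltzmann_populations(energies)
print([round(p, 3) for p in pops])
print(round(population_averaged_coupling(couplings, pops), 2))
```

Because the average is linear in the populations, errors in the relative energies translate directly into errors in the averaged coupling.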

A widely used non-relativistic approach to the simulation of nuclear spin-spin coupling originates from the well-known Ramsey equations321. The indirect scalar nuclear spin-spin coupling constant is composed of four terms: Fermi contact (FC), diamagnetic spin-orbit (DSO), paramagnetic spin-orbit (PSO), and spin-dipole (SD). The FC contribution is often dominant322, especially in saturated carbon-hydrogen systems. The computational cost can therefore be reduced significantly by calculating the dominant FC term in a larger basis set and the computationally expensive but smaller remaining contributions in a smaller basis set323. Quantum-chemical approaches to the calculation of indirect spin-spin coupling constants have recently been reviewed in detail63.

A comparison of the computational cost of wave-function-based and density-based approaches to spin couplings shows that the latter are faster. DFT has been recognized as a good tool for accurate prediction of coupling constants in medium-sized and large molecules. In coupling constant calculations, DFT has been shown to give reasonable potential energy surfaces for aldo- and ketohexoses and to reduce the basis set superposition error in hydrogen-bonded systems such as monosaccharides324. Improved accuracy in spin-spin couplings was obtained from DFT calculations at DFT-optimized geometries rather than at experimental or higher-level geometries325. The level of theory chosen for the geometry thus affects the accuracy of coupling calculations: as tested on methyl α-xylopyranoside, MM3 geometries gave better results for 1JCH and 2JHH couplings, while DFT geometries produced slightly better results for 3JHH couplings284. In contrast to chemical shifts, quantitative prediction of coupling constants is known to suffer from calculated-versus-experimental correlations whose slope and intercept are far from the ideal values (slope = 1, intercept = 0). This is usually attributed to insufficient accuracy in the calculation of the Fermi contact term326. An accurate description of the electron density at the nuclei, which is needed to calculate this term, often requires a specially designed basis set327.

The dependence of simulated coupling constants on molecular geometry is often expressed as a Karplus equation ($^3J = C_0 + C_1\cos\phi + C_2\cos 2\phi$ or $^3J = C_3 + C_4\cos\phi + C_5\cos^2\phi$)328, which relates a vicinal coupling constant to the torsion angle about the central bond of the coupled fragment. Typical values of C3, C4 and C5, obtained by averaging over different H-C-C-H fragments of heparin fragments, are 0.2, -0.6, and 9.6 Hz, respectively329. Karplus equations have now been derived for virtually every combination of nuclei and coupling pathway occurring in carbohydrates. Besides vicinal couplings, these combinations include atoms that are not connected by three bonds forming a dihedral angle but whose coupling depends on the torsion angles of substituents either inside or outside the coupling pathway. For example, 3JC1-2OH coupling constant surfaces were obtained for methyl α- and β-D-glucopyranoside mimics with deoxy functions at C3, C4 and C6 (Fig. 9). These surfaces show a minimal dependence on the glycosidic torsion angle and a strong dependence on the C1/HO2 dihedral angle316.
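The cos²-form Karplus equation can be evaluated directly to see how a vicinal coupling varies with the torsion angle. This minimal sketch assumes that the heparin H-C-C-H averages quoted above (0.2, -0.6 and 9.6 Hz) map onto the constant, cos φ and cos²φ terms in that order. The function name `karplus_3j` is hypothetical.

```python
import math


def karplus_3j(phi_deg, c3=0.2, c4=-0.6, c5=9.6):
    """3J = C3 + C4*cos(phi) + C5*cos(phi)**2, in Hz.

    Defaults are the heparin H-C-C-H averages quoted in the text, assuming
    they map onto the constant, cos and cos^2 terms in that order.
    """
    c = math.cos(math.radians(phi_deg))
    return c3 + c4 * c + c5 * c * c


for phi in (0, 60, 90, 120, 180):
    print(f"phi = {phi:3d} deg -> 3J = {karplus_3j(phi):5.2f} Hz")
```

With these constants the curve gives about 9.2 Hz at 0°, 2.3 Hz at 60°, 0.2 Hz at 90° and 10.4 Hz at 180°. This reproduces the familiar pattern of large antiperiplanar couplings and small gauche couplings.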

6.1. Intra-residue coupling constants

Analysis of NMR coupling constants is a canonical tool for the structural characterization of carbohydrates. The brief review in this section shows the outstanding potential of computational studies to provide valuable insight into the structural and electronic origins of experimentally measured values. Computations of intra-residue coupling constants (within a single monosaccharide residue) are discussed in this section and summarized in Table 5.

Table 5. Prediction of coupling constants in monosaccharides.

Object (molecule) | Coupling constant (a) (e.g. 3JH-N-C-H) | Calculation method: geometry | Calculation method: coupling | Calculation method: software | Application | Ref.
β-D-Xylp-OMe | 1,2,3JCH, 3JHH | BP86/TZVP, MM2 | PW91/IGLO-III | deMon-KS, deMon-NMR280-282, MacroModel128, 129 5.0 (MM calculations) | calculation of coupling constant dependence on the dihedral angle between C1 and the methyl group | 283
α-D-Xylp-OMe | 1,2,3JCH, 3JHH | PW91/TZVP, MM | PW91/B-III | | | 284
1C4, 2S0, 4C1 α-L-IdopA2S-OMe, Na+; (4C1, 2S0), (4C1, 1C4) α-D-GlcpNS6S-(1→4)-α-L-IdopA2S-OMe (heparin disaccharide) | 3JH,H | B3LYP/6-31++G** | B3LYP/6-31++G** | JAGUAR 3.5 (geometry), Gaussian 03 (couplings) | derivation of a Karplus equation | 329
2S0 α-L-IdopA2S | 3JH,H | B3LYP/6-31+G* | B3LYP/6-31+G* | | |
[-4)-β-D-GlcpNS6S-(1-4)-α-L-IdopA2S-(1-]3 (heparin hexasaccharide) | 3JH,H (in all IdopA rings) | MD (GLYCAM03) | Altona and Haasnoot formalism | AMBER 5.1, AMBER 6.0178, 179 | investigation of the conformational flexibility of IdopA rings | 330
1C4 α-L-IdopA2S(1→4)-α-D-GlcpN6S-1OMe, 2S0 α-L-IdopA2S(1→4)-α-D-GlcpN6S-1OMe (heparin disaccharide) | 3JHH, 1JCH, 3JCH (in rings) | B3LYP/6-311++G**, M05-2X/6-311++G** (with explicit solvent, ONIOM) | B3LYP | Gaussian 03, Gaussian 09134, 135 | study of coupling constant variations upon counterion and solvent effects | 240
α-L-IdopA2S | 3JH,H (all in ring) | MD (GROMOS96, GLYCAM06), then HF/6-31G(d) | Altona and Haasnoot formalism | Gaussian 03 (QM calculations), GROMACS166 3.3, AMBER 9.0 (MD simulations) | comparison of the predictive power of two force fields | 331
α-D-Glcp-(1→1)-α-D-Glcp | 3JH5,C1, 3JHH (all vicinal) | MD (CHARMM carbohydrate force field) | Karplus-type equation with the Haasnoot-Altona parameterization | CHARMM164 | establishing a comprehensive understanding of the hydration pattern of trehalose | 332
α-D-Glcp, β-D-Glcp | 3JH,H (except with OH) | B3LYP/6-31G(d,p) | B3LYP/pcJ-2 | Gaussian 03 | support of the experimental data | 260
2HOMe-THP (model) | 2JH5H6, 2JC5,H6, 2JC6,H5, 3JC4,H6 | HF, B3LYP/6-311++G(d,p) | FF-DPT (Fermi-contact), B3LYP/[5s2p1d/3s1d] | Gaussian 03 | derivation of the Karplus equations | 333
β-D-Glcp-1OMe, β-D-Glcp-1SMe, β-D-Glcp-1Ethyl | 3JH1C1OC, 3JH1C1SC, 3JH1C1CC | | | | |
β-D-Glcp-1OMe, β-D-Glcp-1SMe, β-D-Glcp-1Ethyl | 3JH1C1OC, 3JH1C1SC, 3JH1C1CC, in aqueous and methanol solutions | HF, B3LYP/6-311++G(d,p) | FF-DPT (Fermi-contact), B3LYP/6-311++G(d,p) | Gaussian 03 | derivation of the Karplus equations | 334
2-hydroxymethyl-THP and other models | 2JH6H6', 3JH5,H6, 1JC5,H5, 1JC6,H6 | B3LYP/6-31G(d) | B3LYP/[5s2p1d/3s1d] | Gaussian 94 | derivation of the Karplus equations | 335
β-D-Glcp-(1-3)-4H-pyran-4-one, β-D-4-deoxy-XylHexp-(1-3)-4H-pyran-4-one (erigeroside and its model) | 2JC5H6, 2JC6H5, 3JC4H6, 2JH6R–H6S, 3JH5H6 | HF, B3LYP/6-311++G(d,p) | FF-DPT (Fermi-contact), B3LYP/6-311++G(d,p) | Gaussian 03 | derivation of the Karplus equations and validation of the DFT methodology | 336
β-D-Glcp-(1-3)-4H-pyran-4-one (erigeroside) | 1JC1H1 | | | | |
2-deoxy-β-D-eryPenf | 1JCH, 2JCH, 3JCH, 1JCC, 2JC3C5, 3JC1C5, 3JC2C5 | B3LYP/6-31G(d) | FF-DPT (Fermi-contact), B3LYP/[5s2p1d/2s] | Gaussian 94 | validation of DFT methodology | 320
2-deoxy-β-D-eryPenf | 1-3JCH, 1-3JCC | HF/6-31G(d), B3LYP/6-31G(d) | FF-DPT (Fermi-contact), B3LYP/[5s2p1d/2s] | Gaussian 94 | investigation of the effect of hydroxymethyl conformation on the conformational energies and structure | 337
2-deoxy-β-D-eryPenf-1NH2, 2-deoxy-β-D-eryPenf-1NH3+ | 1-3JCH, 1JCC, 3JCC | HF/6-31G(d), B3LYP/6-31G(d) | FF-DPT (Fermi-contact), B3LYP/[5s2p1d/2s] | Gaussian 94 | investigation of the effect of the amino group on molecular properties | 338
P-3:5)β-D-Ribf-1U (cUMP in aqueous solution) | JHH, JCH, JPH, JPC | B3LYP/6-31G(d,p) | B3LYP/cc-pVTZ | Gaussian 03 | testing the prediction method and selection of the appropriate solvent model | 277
α- and β-L-Eryf-1OMe-2,3-epoxy | JH,H (all in ring) | MP2, DFT B3LYP/6-311++G(d,p) | coupled perturbed DFT; B3LYP/4-31G, 6-31G(d,p), 6-311G(d,p), 6-311++G(d,p), aug-cc-pVDZ, IGLO II, IGLO III | Gaussian (geometry optimization), Cologne 99339 (coupling calculation) | interpretation of the 1H-1H coupling constants of synthesized compounds and comparison of prediction methods | 340
α- and β-L-Eryf-1OMe 2,3-epicyclic derivatives with S, NH, NR | | B3LYP/6-311++G(d,p) | coupled perturbed DFT; B3LYP/IGLO II | | |
α-D-GlcpNAc | 3JH-N-C-H | B3LYP/6-31G(d,p); with explicit solvent: MD snapshot optimized in MMX force field | SD, PSO, DSO terms: B3LYP/IGLO-III; FC term: B3LYP/HIIIsu3 | Gaussian 03D (QM calculations), CHARMM (MD calculations) | study of the dependence of 3JH-N-C-H coupling on conformation, dynamics and solvent; derivation of the Karplus curve | 341
β-D-GlcpNAc, α-GalpNAc | | | B3LYP/HIIIsu3 (FC term only) | | |

(a) See Fig. 2 for atom numbering.

Gandhi and Mancera probed two MD force fields in unconstrained molecular dynamics simulations of the ring conformational flexibility of 2-O-sulfo-α-L-iduronic acid (IdoA2S) in aqueous solution331. The authors reported that the GROMOS96186 force field with the SPC/E water model successfully predicted the dominant skew-boat to chair conformational transition of IdoA2S in water. In contrast, GLYCAM06180 (augmented with non-bonded parameters for sulfates and sulfamates) with the TIP3P water model sampled transitional conformations between the boat and chair forms. Simulations with GROMOS96 exhibited no pseudorotational equilibrium fluctuations and hence no interconversion between the boat and twist-boat ring conformers. Simulation of proton NMR coupling constants showed that, unlike GLYCAM06, the GROMOS96 force field predicted a 2S0 (skew-boat) to 1C4 (chair) conformer ratio (17:83) in better agreement with experiment. GLYCAM93 had reproduced experimental couplings well330, whereas GLYCAM06 lacks an explicit definition of the anomeric carbon, which was considered a reason for its poorer predictive power. Because the 2S0–1C4 transition was observed after 81 ps, the 2S0 coupling constants were averaged over the initial 81 snapshots of the GROMOS96 simulations, while the 1C4 coupling constants were averaged over 419 random snapshots from the remaining 419 ps. In both studies330, 331, the averaged 3JH,H coupling constants were calculated from the MD data using the Altona and Haasnoot formalism319, taking the conformer ratio into account, and compared with the reported experimental values.
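The averaging scheme described above can be sketched as follows. Couplings are first averaged separately over the snapshots of each ring form and then weighted by the conformer ratio. In this minimal sketch, the per-snapshot values are hypothetical stand-ins for Altona-Haasnoot couplings computed from each MD frame. `fraction_from_coupling` is an added illustration of inverting the same linear two-state model; it is not a step reported by the authors, and all function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of per-snapshot couplings."""
    return sum(values) / len(values)


def two_state_coupling(j_a, j_b, fraction_a):
    """Population-weighted coupling for a two-conformer (A/B) equilibrium."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return fraction_a * j_a + (1.0 - fraction_a) * j_b


def fraction_from_coupling(j_obs, j_a, j_b):
    """Invert the two-state model: population of A that reproduces j_obs."""
    if j_a == j_b:
        raise ValueError("this coupling does not distinguish the two forms")
    return (j_obs - j_b) / (j_a - j_b)


# Hypothetical per-snapshot 3J values (Hz); in the studies above these would
# come from the Altona-Haasnoot equation applied to each MD snapshot.
j_skew_boat = mean([7.9, 8.3, 8.1])   # 2S0 snapshots
j_chair = mean([2.8, 3.2, 3.0])       # 1C4 snapshots
j_avg = two_state_coupling(j_skew_boat, j_chair, 0.17)  # 17:83 2S0:1C4
print(round(j_avg, 2))
print(round(fraction_from_coupling(j_avg, j_skew_boat, j_chair), 2))
```

The inversion is only meaningful for couplings that differ clearly between the two ring forms, which is why the sketch rejects identical reference values.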

A detailed look into the influence of counterions and solvent on the conformation and coupling constants of a heparin disaccharide was reported by Hricovini240. He performed B3LYP calculations on geometries obtained at the B3LYP and M05-2X342 levels of theory to compare the coupling constants of both conformers (2S0 IdoA2S + 4C1 GlcN6S and 1C4 IdoA2S + 4C1 GlcN6S) in explicit water and with Na+ and Ca2+ counterions. For the disaccharide with Na+ counterions, B3LYP predicted the geometry better than M05-2X. Comparison of the calculated averaged vicinal couplings with the experimental data indicated that the 1C4 conformation of IdoA2S (Ca2+ salt) was almost exclusively populated. In contrast to direct and transglycosidic C-H couplings, averaged proton-proton couplings were hardly affected by solvent effects. Engels and Perez calculated 3JH,H couplings between vicinal hydrogens in α,α-trehalose to study the dynamics of this disaccharide in water332. The authors also calculated interglycosidic couplings; details of the geometry optimization are given in section 6.2. Intra-residue homonuclear couplings were calculated using a Karplus-type equation with the Haasnoot-Altona parameterization319, which accounts for the dependence of the coupling not only on the dihedral angle but also on the electronegativity and orientation of the α- and β-substituents. Roslund and coworkers calculated coupling constants, as well as chemical shifts (see section 5.1), for α- and β-D-glucopyranose. All terms contributing to the J-couplings were computed at the B3LYP/pcJ-2 level on geometries optimized at B3LYP/6-31G(d,p). The authors estimated conformer populations by applying the Boltzmann distribution to the relative conformer energies (see the chemical shift prediction section above for details). The correlation coefficient between the population-weighted averages of the calculated couplings and the experimental values was 0.994-0.995 (MAD 0.49 Hz for α-D-Glcp and 0.62 Hz for β-D-Glcp), but certain individual deviations were considerable. Among vicinal couplings, the largest differences were observed for the 3JH5,H6 values (73%), indicating that the selected subsets of hydroxymethyl rotamers and their relative stabilities did not fully reflect the equilibrium of D-glucose in aqueous solution. For vicinal couplings within the pyranose ring, the deviations confirmed that the selected method reproduces the experimental data well260.
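Agreement between calculated and experimental couplings is usually summarized by a correlation coefficient and a MAD. The slope and intercept of the best-fit line reveal the systematic errors discussed at the start of this section. This minimal sketch computes all four quantities for paired values, regressing calculated on experimental couplings. The regression direction, the function name `agreement_stats` and the sample numbers are assumptions for illustration. The returned r is the Pearson coefficient, so r² corresponds to the R² values quoted in some studies.

```python
import math


def agreement_stats(calc, expt):
    """Compare calculated and experimental couplings (same units, same order).

    Returns (slope, intercept, r, mad) for the least-squares line
    calc = slope * expt + intercept; ideal values are 1, 0, 1 and 0.
    """
    n = len(calc)
    if n != len(expt) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(expt) / n
    mean_y = sum(calc) / n
    sxx = sum((x - mean_x) ** 2 for x in expt)
    syy = sum((y - mean_y) ** 2 for y in calc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expt, calc))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    mad = sum(abs(y - x) for x, y in zip(expt, calc)) / n
    return slope, intercept, r, mad


# Hypothetical vicinal couplings (Hz), for illustration only.
expt = [3.7, 9.6, 9.4, 9.9, 2.3]
calc = [4.1, 10.3, 9.9, 10.6, 2.9]
print([round(v, 3) for v in agreement_stats(calc, expt)])
```

A high r combined with a slope far from 1 or a non-zero intercept indicates a systematic error rather than random scatter.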

More J-couplings than conformationally sensitive NOEs are usually observable in oligosaccharides. The ability to predict coupling constants as a function of the conformation of a glycosidic bond or of an exocyclic group of a sugar residue may therefore become a useful tool in conformational studies. Tafazzoli and Ghiasi derived Karplus equations by least-squares parameterization (non-linear regression analysis) of simulated vicinal coupling constants as functions of the dihedral angles ω (C5-C6), θ (C6-O6) and ϕ (C1-X) in various glycosides of glucose and galactose333. These studies demonstrated the ability of DFT to predict J-couplings in aqueous solution. 2-Hydroxymethyltetrahydropyran was used as a carbohydrate model. The authors optimized its geometry with the hybrid HF-DFT adiabatic connection method B3LYP/6-311++G(d,p) without initial symmetry restrictions, and they accounted for solvent effects on the conformational equilibrium with PCM. Heteronuclear coupling constants involving the hydroxymethyl group were obtained by Fermi-contact FF-DPT calculations at the B3LYP level. These calculations used the [5s2p1d/3s1d] basis set, which had been designed for calculating J-couplings in the exocyclic group of the same carbohydrate model compound (2-hydroxymethyltetrahydropyran)335. Tafazzoli and Ghiasi used this model to simulate 2JH5H6, 2JCH and 1JCH coupling constants. Many factors affect the C-H bond lengths in the hydroxymethyl group, and direct 13C-1H coupling constants are almost inversely proportional to the bond length343, so Karplus equations for these couplings have large RMS errors. For each of the three stable C5-C6 rotamers, the dependence of 2JC5,H6 on the θ angle was derived. In contrast to 2JC5,H6, the 2JC6,H5 values were almost insensitive to θ because the C5-O5 torsion is fixed by the ring conformation. The 3JC4,H6 values calculated for the model compound were compared with those for its 4-hydroxy-substituted analogue. This comparison revealed no correlation between substitution at position 4 and the ω/θ angular dependence of 3JC4,H6. The Karplus equations for these couplings are given as eqs. 3-8 of the original publication333. The same authors also studied the dependence of 3JCXC1H1 on the ϕ angle in 1-substituted glycosides. Using the [5s2p1d/3s1d] basis set for β-D-Glcp derivatives (X = O, S, C)333, they obtained theoretical Karplus equations (e.g., for O-glycosides, $^3J_{\mathrm{COC1H1}} = 6.68\cos^2\phi + 0.89\cos\phi + 0.11$, RMS = 0.65 Hz). These equations resemble an empirical equation proposed previously by Tvaroska and coworkers for 1-thioglycosides344.

Tafazzoli and Ghiasi also applied B3LYP/6-311++G(d,p) calculations to the anomeric vicinal coupling constants of these compounds. The aim was to model couplings in various glucose and galactose derivatives (OMe-, SMe-, Et-, NHMe-, Cl- and F-glycosides) in PCM-modeled water and methanol334. Least-squares parameterization of the calculated series of coupling constants gave Karplus equations that differed only slightly, in the constant term. These equations were close to the Karplus equations derived from experiment315.

Stenutz and coworkers studied homo- and heteronuclear coupling constants involving the hydroxymethyl group of a carbohydrate model (2-hydroxymethyltetrahydropyran)335. Working with DFT-optimized geometries of each of the three C5-C6 rotamers, the authors designed an extended basis set, [5s2p1d/3s1d], as an improvement on the previously reported [5s2p1d/2s] set, aiming at a more accurate simulation of interproton spin couplings. Three Karplus equations were derived (for 2JH6S,H6R, 3JH5,H6R and 3JH5,H6S). They were compared with those obtained from experimental JHH values in 4,6-pyruvate derivatives of methyl glucosides and methyl galactosides. The largest deviation (0.5 Hz) was within the RMS error (0.3-0.9 Hz). The authors showed that the θ angle (C6-O6 torsion) affects 2JH6,H6 more significantly than the H-C-H bond angle does. For 1JCH, they found that fitting the calculated coupling constants to only the two torsion angles ω and θ yields relatively large RMS errors. This was presumably due to a solvent effect on the C-O torsional behavior, in agreement with the known solvent dependence of 1JCH in saccharides345.

Tafazzoli and coworkers performed DFT simulations of the anomeric center and the exocyclic group (in three staggered orientations) of two residues: the β-D-Glcp residue of erigeroside, and the 4-deoxy-β-D-xylo-hexopyranose residue of a model compound. These calculations supported a detailed NMR investigation of erigeroside from Satureja khuzistanica. The model differed from erigeroside by the absence of the hydroxyl group at position 4. This avoided intramolecular hydrogen bonding and allowed the effect of this hydroxyl group on couplings involving C4 to be studied. The authors calculated complete hypersurfaces for 1JC1H1, 2JC5H6, 2JC6H5, 3JC4H6, 2JH6R–H6S and 3JH5H6. They derived Karplus equations correlating all these couplings with the C5–C6 (ω), C6–O6 (θ) and C1-O1 (ϕ) torsion angles, with RMS deviations from 0.3 to 1.3 Hz.

These calculated J-couplings agreed with the experimental values, confirming a nearly quantitative DFT prediction of heteronuclear coupling constants in aqueous solution modeled by PCM336.

The performance of DFT combined with a specially designed basis set was demonstrated by Cloran and coworkers320 for 2-deoxy-β-D-erythro-pentofuranose, the sugar component of DNA. These studies showed that DFT can be used to calculate reliable JCH and JCC values in carbohydrates without scaling, within -6% and +10% of the experimental values, respectively. Computed molecular parameters and 1JCH spin-spin coupling constants for ten geometry-optimized envelope forms were compared with the scaled values reported from HF and MP2 calculations346. The authors concluded that DFT geometry optimization contributed substantially to the difference between the scaled couplings and the DFT-derived 1JCH values. Indirect JCH couplings showed a weaker (2JCH) or no (3JCH) dependence on the geometry optimization method. Computed JCH values were up to 10% larger in the DFT calculations than the corresponding scaled HF/MP2 values, but the coupling trends predicted by both methods were almost identical. 1JCC, 2JCC and 3JCC were computed as functions of ring conformation, and theory-level-dependent corrections were evaluated by comparison with HF calculations. With the accuracy achieved, these coupling constants can be used as conformational probes for monosaccharides. All indirect couplings in the optimized structures were determined by finite (Fermi-contact) field double perturbation theory with the [5s2p1d/2s] basis set. This basis set had previously been constructed to recover the Fermi contact contribution to 13C-13C coupling constants347. Later, these authors investigated the effect of the hydroxymethyl group conformation on the molecular properties of the same ten geometries of 2-deoxy-β-D-erythro-pentofuranose337. Carbon-involving spin-spin coupling constants were computed with the same methodology on DFT-reoptimized geometries of the gg rotamer typical of nucleic acids. The authors compared the coupling magnitudes in detail with those reported earlier320 for the gt rotamer in solution. 1JCH appeared to be most affected by C4-C5 bond rotation, presumably owing to the substantial changes in C-H bond length that accompany the rotation. The results for 2JCH, 3JCH, 2JCC and 3JCC confirmed earlier predictions of the dependence of these couplings on ring conformation.

Cloran and coworkers also investigated the effect of amino substitution at C1 on the molecular properties of the same compound, including coupling constants338. They compared B3LYP/[5s2p1d/2s] predictions for the protonated and unprotonated forms with those published for 2-deoxy-β-D-erythro-pentofuranose without an amino group320. These studies confirmed the suggestion that a different projection rule is required to predict 2JC2,H1 in nucleosides348. According to the reported findings, replacing O1 by nitrogen has only a minor effect on the magnitudes of 2JC1,H2 and 3JC1,H3. The same holds for 3JCCCH, 3JCOCH and 3JCOCC, regardless of the N-protonation state. In contrast, 2JCCH couplings are strongly modulated by substitution at the carbon bearing the coupled proton, whereas a much smaller effect is observed when the substitution occurs at the coupled carbon. Upon N-protonation, the direct couplings were predicted to increase by ca. 10 Hz (1JC1,H1) and to decrease by ca. 2-4 Hz (1JC1,C2), which makes them probes of the protonation state of aminosugars in solution338.

Bagno and coworkers calculated homo- and heteronuclear coupling constants, as well as chemical shifts, of the cyclic mononucleotide uridine monophosphate (cUMP)277. The computational method, including the conformational search, is outlined in section 5.1. The solvent was modeled with PCM in both the geometry and the coupling constant calculations, and the coupling calculations included all four Ramsey terms. The correlation between calculated and experimental values was good (JHH: R2 = 0.998, MAD = 0.90 Hz; JCH: R2 = 0.974, MAD = 13.4 Hz). However, the slope and intercept of the best-fit line were not ideal, especially for JHH couplings. Nevertheless, this common problem of coupling constant calculations (see the introduction to this section) did not prevent qualitative agreement.

Bour and coworkers interpreted 1H-1H indirect spin-spin coupling constants of synthetic erythrofuranose derivatives on the basis of ab initio modeling340. Epoxy, epithio, and epimino groups were attached to the sugars to limit their conformational flexibility. These restrictions improved the calculation performance and simplified the estimation of the dependence of spin coupling on molecular geometry. Fully relaxed geometries were optimized with HF, MP2, and DFT (B3LYP and BPW91) methods using the 4-31G, 6-31G(d,p), and 6-311++G(d,p) basis sets. The authors modeled benzene solution using COSMO248. Spin-spin couplings were calculated in vacuum with the coupled perturbed DFT approach349, using various basis sets including the NMR-optimized IGLO II and IGLO III209. The B3LYP/IGLO II computations included all four important magnetic terms in the Hamiltonian. Typically, the coupling constants within methyl 2,3-epoxy-L-erythrofuranoside depended only slightly (<15%) on the selected conformer and basis set (IGLO II vs. IGLO III), and they fitted the experimental data within 1 Hz. However, the short-range coupling constant 2J4R,4S decreased by 2.1 Hz with the larger basis set. The authors used a series of conformers optimized at MP2/6-311++G(d,p) to test the prediction of geminal and vicinal couplings with different basis sets containing from 88 to 376 basis functions. Except for 4-31G, the calculation accuracy was similar, and the agreement with experiment did not improve with increasing basis set size. This illustrates the complexity of modeling spin-spin couplings with all four Hamiltonian terms and confirms earlier findings of Helgaker and coworkers350.


Mobli and Almond used DFT methods to calculate coupling constants between HN and H2 in N-acetylated amino sugars and to derive Karplus equations for 3JH-N-C-H341. The ab initio calculations slightly overestimated the coupling constants. In contrast to an explicit solvent model explored by MD simulations, the implicit-solvent PCM method lowered the calculated values, bringing them closer to experiment. The authors attributed the poorer results with explicit solvent to highly dynamic interactions with water, which are difficult to capture in static DFT calculations. However, the models with explicit solvent were more conformationally realistic. The D-pyranose rings of the N-acetylated amino sugars were fixed in the 4C1 chair conformation and optimized at B3LYP/6-31G(d,p). For every amide group rotamer of α-D-GlcpNAc, the SD, PSO and DSO spin-spin coupling terms were calculated at B3LYP/IGLO-III (11s,7p,2d/6s,2p) [7s,6p,2d/4s,2p]209. The FC term was calculated with the larger HIIIsu3 basis set351, (14s,7p,2d/9s,2p) [14s,6p,2d/9s,2p]. Because the FC term was observed to dominate, β-D-GlcpNAc and α-GalpNAc were treated with the FC term only.
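In the Ramsey decomposition the four contributions are additive. A mixed-basis scheme like the one above therefore reduces to summing an FC term from the larger basis set with SD, PSO and DSO terms from the smaller one. This minimal sketch shows that bookkeeping together with one simple measure of FC dominance: the FC share of the summed absolute contributions. The class name `CouplingTerms`, the share measure and the term values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CouplingTerms:
    """Ramsey contributions (Hz) to one indirect spin-spin coupling."""
    fc: float   # Fermi contact, e.g. from the larger basis set
    sd: float   # spin-dipole, e.g. from the smaller basis set
    pso: float  # paramagnetic spin-orbit
    dso: float  # diamagnetic spin-orbit

    def total(self) -> float:
        """Composite coupling: FC plus the three remaining terms."""
        return self.fc + self.sd + self.pso + self.dso

    def fc_share(self) -> float:
        """Fraction of the summed absolute contributions carried by FC."""
        magnitudes = [abs(self.fc), abs(self.sd), abs(self.pso), abs(self.dso)]
        return abs(self.fc) / sum(magnitudes)


# Hypothetical term values (Hz) for a single 3J(HN,H2) coupling.
terms = CouplingTerms(fc=9.4, sd=0.05, pso=0.9, dso=-0.8)
print(round(terms.total(), 2), round(terms.fc_share(), 3))
# An FC-only treatment keeps just terms.fc and drops the remaining terms.
```

When the PSO and DSO terms largely cancel, as in this made-up example, the FC-only value lies close to the full sum. This is the situation that justifies the FC-only shortcut.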

The Karplus equations were derived by non-linear least-squares fitting (e.g., for the full calculation on α-D-GlcpNAc, $^3J_{\mathrm{HNCH}} = 9.81\cos^2(\theta+\phi) - 1.51\cos(\theta+\phi) + 0.62$). They showed trends similar to those reported previously for peptide amide groups, although the coupling constants were larger in magnitude. The authors showed that molecular dynamics must be taken into account to reproduce the experimental values of 3JH-N-C-H. The dynamical spread at the acetamido groups was obtained by integrating the Karplus curve and then analyzing the libration range of the group.
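Once the phase angle is fixed, a Karplus curve of the form A cos²x + B cos x + C is linear in A, B and C. Its coefficients can therefore be recovered by ordinary linear least squares. This is named here as a simpler substitute for the non-linear fitting used by the authors, not as their procedure. In this minimal sketch, the "computed" couplings are synthetic values generated from the α-D-GlcpNAc equation quoted above with x = θ + φ; real input would be DFT couplings at sampled angles. The function name `fit_karplus` is hypothetical.

```python
import math


def fit_karplus(angles_deg, j_values):
    """Least-squares fit of J = A*cos^2(x) + B*cos(x) + C; returns (A, B, C).

    The model is linear in A, B and C, so the 3x3 normal equations are
    solved directly by Gaussian elimination with partial pivoting.
    """
    rows = []
    for ang in angles_deg:
        c = math.cos(math.radians(ang))
        rows.append((c * c, c, 1.0))
    # Augmented normal-equation matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[k] for r in rows) for k in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, j_values))]
         for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][k] * coeffs[k] for k in range(i + 1, 3))
        coeffs[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coeffs)


# Synthetic "computed" couplings generated from the alpha-D-GlcpNAc equation
# quoted above (x = theta + phi); a real fit would use DFT values instead.
angles = list(range(0, 360, 15))
j_calc = [9.81 * math.cos(math.radians(a)) ** 2
          - 1.51 * math.cos(math.radians(a)) + 0.62 for a in angles]
print([round(v, 3) for v in fit_karplus(angles, j_calc)])
```

With noise-free input the fit returns the generating coefficients. With real DFT values, the residual of the fit corresponds to the RMS errors reported for such equations.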

6.2. Inter-residue coupling constants

The detailed analysis of the various effects on intra-residue coupling constants in the previous section opens the way to intriguing questions about the structure and bonding of more complex carbohydrates. Indeed, long-range coupling constants across glycosidic bonds serve as probes of oligo- and polysaccharide conformation. Karplus-type interpretation of these coupling constants (in addition to NOEs) provides spatial constraints on the glycosidic torsion angles352. Representative examples of predictions of inter-residue coupling constants in carbohydrates are summarized in Table 6.
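Turning a measured coupling into a torsion-angle constraint means inverting the Karplus curve. Because the curve is quadratic in cos φ and cos φ is even in φ, a single value generally corresponds to up to four torsion angles. This is one reason couplings are combined with NOEs. This minimal sketch solves the quadratic and returns every consistent angle in the 0-360° range. The function name and the 4.0 Hz "measured" value are hypothetical. The coefficients are those of the O-glycoside equation of Tafazzoli and coworkers quoted in section 6.1.

```python
import math


def dihedrals_for_coupling(j_obs, a, b, c):
    """All angles (deg, 0-360) at which a*cos^2(phi) + b*cos(phi) + c = j_obs."""
    if a == 0:
        raise ValueError("a must be non-zero for the quadratic form")
    disc = b * b - 4.0 * a * (c - j_obs)
    if disc < 0:
        return []
    roots = {(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)}
    angles = set()
    for x in roots:
        if -1.0 <= x <= 1.0:  # only physically valid cosine values
            phi = math.degrees(math.acos(x))
            angles.add(round(phi, 6))
            angles.add(round((360.0 - phi) % 360.0, 6))
    return sorted(angles)


# O-glycoside coefficients (a = 6.68, b = 0.89, c = 0.11) and a hypothetical
# measured value of 4.0 Hz.
print(dihedrals_for_coupling(4.0, 6.68, 0.89, 0.11))
```

For this input the sketch returns four angles, two on each side of 180°. A coupling above the maximum of the curve returns an empty list.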

Table 6. Prediction of coupling constants in oligo- and polysaccharides.

Object (molecule) | Coupling constant (e.g. 3JH-N-C-H) | Calculation method: geometry | Calculation method: coupling | Calculation method: software | Application | Ref.
α-D-Glcp-(1→1)-α-D-Glcp, α-D-Glcp-(1→4)-D-Glcp, β-D-Galp-(1→4)-D-Glcp, β-D-Glcp-(1→4)-D-Glcp, β-D-Glcp-(1→6)-D-Glcp, α-D-Galp-(1→6)-D-Glcp, α-D-Glcp-(1→3)-D-Glcp, β-D-Glcp-(1→3)-D-Glcp | 3JCH (inter-glycosidic) | s-MD (Amber-H force field) | Karplus-type correlation curve derived by Tvaroska and coworkers353 | Insight II molecular modeling program192 (v. 4.0.0), molecular mechanics/dynamics package (v. 2.9) | testing the suitability of the molecular modeling approach | 354
1C4 α-L-IdopA2S(1→4)-α-D-GlcpN6S-1OMe, 2S0 α-L-IdopA2S(1→4)-α-D-GlcpN6S-1OMe (heparin disaccharide) | 3JCH (inter-glycosidic) | B3LYP/6-311++G**, M05-2X/6-311++G** (with explicit solvent, ONIOM) | B3LYP | Gaussian 03, Gaussian 09134, 135 | study of coupling constant variations upon counterion and solvent effects | 240
α-D-Glcp-(1→1)-α-D-Glcp | 3JCH (inter-glycosidic) | MD (CHARMM carbohydrate force field) | Karplus-type correlation curve derived by Tvaroska and coworkers353 | CHARMM164 | establishing a comprehensive understanding of the hydration pattern of trehalose | 332
β(1-4)-linked disaccharide models | 2JCOC (inter-glycosidic) | B3LYP/6-31G(d) | FF-DPT (Fermi-contact), B3LYP/[5s2p1d,2s] | Gaussian 94 | study of the influence of structural factors on transglycosidic 2JCOC | 355
β-D-Ribf-1OMe-(3-P-5)-β-D-Ribf-1OMe (RNA backbone) in 16 "experimental" conformations | 2J, 3J, 4J between all 1H, 13C and 31P | MM/Amber, B3LYP/6-31G(d,p) (with explicit solvent) | coupled perturbed DFT, B3LYP/IGLO II and IGLO III | Amber178, 179 (geometry optimization), Gaussian 03 (NMR calculation) | interpretation of nucleic acid backbone conformation using coupling constants | 356
β-D-2-deoxy-Ribf-1(N-base), N-base = A, C, G, U, T | 3JC-H1', 3JC-H1', 1JC1'-H1' | B3LYP/6-31G(d) | DFT/FPT PW86/IGLO-III (FC term); SOS-DFPT (PSO and DSO terms) | Gaussian 98, deMon-NMR280-282 | study of the relationship between spin coupling and the glycosidic torsion angle | 357


Cheetham and coworkers calculated the interglycosidic heteronuclear coupling constants (3JH1,Cx and 3JC1,Hx) for a series of eight α- or β-linked glucosyl- and galactosyl-glucopyranoses. The authors utilized the Karplus relationship of Tvaroska and coworkers353 for the C-O-C-H torsion angle φ, $^{3}J = 5.7\cos^{2}\phi - 0.6\cos\phi + 0.5$, with $\cos^{2}\phi$ and $\cos\phi$ conformationally averaged over the s-MD trajectories. The aim of this calculation was to determine whether a molecular modeling approach could provide results similar to those obtained experimentally. The crystal conformation of each disaccharide was used as the starting geometry for MD simulations in the Amber-Homans force field193 with explicit inclusion of water. The authors showed that, except for the C1-O1-Cx-Hx dihedral angle in β(1-4)-linked disaccharides, their relatively simple modeling reproduced results close to the experimental values and to other modeling studies354.
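The trajectory averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes φ is the C-O-C-H torsion sampled once per saved MD frame, and the two-state trajectory is synthetic (hypothetical values chosen only to show the effect). Because the Tvaroska-type equation is linear in cos²φ and cos φ, averaging these two terms first and then applying the coefficients is identical to averaging J frame by frame, whereas evaluating J at the mean angle is not.

```python
import numpy as np

def karplus_tvaroska(phi_deg):
    """3J(C,H) in Hz for a C-O-C-H torsion angle phi (degrees), coefficients as quoted in the text."""
    phi = np.radians(phi_deg)
    return 5.7 * np.cos(phi) ** 2 - 0.6 * np.cos(phi) + 0.5

def trajectory_average_j(phi_traj_deg):
    """Average cos^2(phi) and cos(phi) over all frames, then apply the Karplus coefficients.

    The equation is linear in cos^2(phi) and cos(phi), so this equals the frame-averaged J."""
    phi = np.radians(np.asarray(phi_traj_deg, dtype=float))
    mean_cos2 = np.mean(np.cos(phi) ** 2)
    mean_cos = np.mean(np.cos(phi))
    return 5.7 * mean_cos2 - 0.6 * mean_cos + 0.5

# Hypothetical trajectory: 80% of frames near -40 deg, 20% near 170 deg, with Gaussian jitter
rng = np.random.default_rng(seed=0)
phi_traj = np.concatenate([rng.normal(-40.0, 15.0, 800), rng.normal(170.0, 15.0, 200)])

print(f"<J> over frames:     {trajectory_average_j(phi_traj):.2f} Hz")
print(f"J at the mean angle: {karplus_tvaroska(np.mean(phi_traj)):.2f} Hz")
```

For a bimodal torsion distribution such as this one, the two printed values differ noticeably, which is why the trigonometric terms, rather than the angle itself, are averaged.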

Engelsen and Pérez calculated interglycosidic heteronuclear coupling constants in α,α-trehalose within a study aimed at establishing a comprehensive understanding of the hydration pattern of this disaccharide and at comparing it with that of sucrose332. Starting from the X-ray geometry, the authors ran a 2.5 ns MD simulation in the CHARMM carbohydrate force field with explicit inclusion of 485 TIP3P water molecules. The heteronuclear coupling constant 3JH,C across the glycosidic linkage was calculated using the obtained adiabatic map and the equation for the C-O-C-H fragment parameterized by Tvaroska and coworkers353. The derived value of 2.3 Hz was much closer to experiment (2.5 or 3.3 Hz) than the 1.5 Hz obtained from the MD simulation in vacuum. In contrast, the accuracy of the intra-residue 3JH5,C1 calculation was not affected by inclusion of the solvent.
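Averaging over an adiabatic map instead of a trajectory replaces frame counting by Boltzmann weights. The sketch below is a minimal illustration under stated assumptions: relative energies in kcal/mol on a regular (φ, ψ) grid, every grid point equally probable a priori (no Jacobian correction), and a coupling that depends on φ only. The energy surface and the function names are hypothetical and serve only to exercise the averaging step.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def boltzmann_average(values, energies_kcal, temperature=298.15):
    """Boltzmann-weighted mean of `values` over map points with relative energies in kcal/mol.

    Every grid point is treated as equally probable a priori (regular grid, no Jacobian)."""
    e = np.asarray(energies_kcal, dtype=float)
    w = np.exp(-(e - e.min()) / (R_KCAL * temperature))
    return float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))

def karplus_coch(phi_deg):
    """Tvaroska-type 3J(C,H) for a C-O-C-H torsion, coefficients as quoted in the text."""
    phi = np.radians(phi_deg)
    return 5.7 * np.cos(phi) ** 2 - 0.6 * np.cos(phi) + 0.5

# Hypothetical 10-degree (phi, psi) grid with a made-up smooth energy surface (kcal/mol)
phi, psi = np.meshgrid(np.arange(-180.0, 180.0, 10.0), np.arange(-180.0, 180.0, 10.0), indexing="ij")
energy = (2.0 * (1 - np.cos(np.radians(phi + 50.0)))
          + 1.5 * (1 - np.cos(np.radians(psi - 20.0)))
          + 1.0 * (1 - np.cos(2 * np.radians(psi))))

j_map = karplus_coch(phi)  # here the coupling depends on phi only
print(f"Boltzmann-averaged 3J(C,H): {boltzmann_average(j_map, energy):.2f} Hz")
```

With a real map, the energies would come from the relaxed force-field scan, and points far above the global minimum contribute negligibly because of the exponential weighting.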

Since DFT calculations had been shown to yield JCC values within ∼10% of experiment without scaling320, 2JCOC values could be computed to within 0.2-0.3 Hz of the experimental couplings. This accuracy allowed Cloran and coworkers to study the influence of structural factors on trans-O-glycosidic 2JCOC using DFT calculations. Geometries were optimized at the B3LYP/6-31G* level. 13C-13C spin coupling constants were obtained by finite-field double perturbation theory calculations using a basis set previously constructed for similar systems347. Only the Fermi-contact (FC) term was computed, as it is the main component of JCC in saturated systems. The calculation supported the observation that JCOC depends mainly on the φ angle of the glycosidic bond. Widening of the C-O-C bond angle at the glycosidic oxygen made the JCOC coupling more negative.

Sychrovsky and coworkers calculated indirect heteronuclear coupling constants and related them to the backbone torsion angles and sugar pucker of nucleic acids. The authors used 16 known conformations of the nucleic acid backbone, including well-characterized double-helical forms (B-DNA, A-DNA, A-RNA), and applied them to a baseless dinucleoside phosphate as a molecular model. The initial models of the dinucleoside phosphates were constructed as reported for RNA fragments358 and for the most prevalent DNA conformations, BI and A359. These “experimental” geometries were relaxed by molecular mechanics with the backbone torsion angles restrained to keep each conformation close to its class-identifying state. After geometry relaxation, the nitrogenous bases were substituted by methyl groups, and both the 5’ and 3’ ends were terminated by hydroxyl groups.

The coupled perturbed DFT method349 at the B3LYP/IGLO II and IGLO III levels of theory209 was used to calculate 1H, 13C and 31P coupling constants including all four coupling terms. Explicit hydration, the PCM solvent model and their combination were compared; PCM hydration accounted for the dominant part of the calculated coupling difference between the in vacuo and hydrated models. The authors calculated all possible coupling constants across two, three or four bonds and correlated them with each of the seven torsion angles characterizing a βDRibf-1OMe(3-P-5)βDRibf-1OMe fragment, so that experimentally observed conformations of the nucleic acid backbone could be characterized by a specific set of J-couplings356. Three of the torsion angles in the nucleic acid backbone have a sharp population distribution. According to the authors, 3J couplings correlated with these “sharp” torsions are not properly described by the classical Karplus equation and should be parameterized with explicit consideration of other torsions, either as a multidimensional Karplus curve or as a curve parameterized with the neighboring torsions fixed.

Munzarova and Sklenar357 studied the correlation between the glycosidic torsion angle in deoxynucleosides (a torsion that is not part of the backbone) and the heteronuclear coupling constants of the anomeric protons. The backbone torsion angles were frozen at their B-DNA values. The authors derived a phase-shifted Karplus equation and parameterized it separately for every nucleoside, as purine and pyrimidine nucleosides exhibited different torsional dependences of the couplings.
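A phase-shifted Karplus equation of the general form J(θ) = A cos²(θ + δ) + B cos(θ + δ) + C can be parameterized from computed (θ, J) pairs. One simple approach, sketched below under the assumption that such pairs are already available (e.g. from DFT), is a grid search over δ: for each trial shift, A, B and C enter linearly and follow from ordinary least squares. This is an illustrative fitting strategy, not necessarily the one used in the cited work, and the synthetic coefficients are hypothetical. Note that (A, B, δ) and (A, −B, δ + 180°) describe the same curve, so only the fitted curve, not the sign of B, is unique.

```python
import numpy as np

def phase_shifted_karplus(theta_deg, a, b, c, delta_deg):
    """J(theta) = A cos^2(theta + delta) + B cos(theta + delta) + C, angles in degrees."""
    x = np.radians(np.asarray(theta_deg, dtype=float) + delta_deg)
    return a * np.cos(x) ** 2 + b * np.cos(x) + c

def fit_phase_shifted_karplus(theta_deg, j_hz, step_deg=0.5):
    """Grid search over delta; for each trial delta, A, B and C follow from linear least squares.

    Returns (A, B, C, delta_deg, residual_sum_of_squares) for the best trial shift."""
    theta = np.asarray(theta_deg, dtype=float)
    j = np.asarray(j_hz, dtype=float)
    best = None
    for delta in np.arange(-180.0, 180.0, step_deg):
        x = np.radians(theta + delta)
        design = np.column_stack([np.cos(x) ** 2, np.cos(x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, j, rcond=None)
        rss = float(np.sum((design @ coef - j) ** 2))
        if best is None or rss < best[4]:
            best = (float(coef[0]), float(coef[1]), float(coef[2]), float(delta), rss)
    return best

# Hypothetical "computed" couplings on a 15-degree grid (coefficients invented for illustration)
theta_grid = np.arange(-180.0, 180.0, 15.0)
j_synthetic = phase_shifted_karplus(theta_grid, 4.5, -1.0, 0.6, -60.0)

fit = fit_phase_shifted_karplus(theta_grid, j_synthetic)
print("A = %.3f, B = %.3f, C = %.3f Hz, delta = %.1f deg" % fit[:4])
```

Fitting each nucleoside separately, as described above, simply means calling the fit once per data set; with real DFT data the residual sum of squares is no longer zero and indicates how well a single phase-shifted curve describes the torsional dependence.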

7. Computation of NMR relaxation rates

Best and coworkers compared the results of molecular dynamics simulations with NMR relaxation data for maltose and isomaltose. The 13C longitudinal relaxation time (T1) is governed by the dipolar relaxation between a carbon and its directly attached proton(s). The relaxation parameters are functions of the spectral density, which is most frequently characterized experimentally using the model-free formalism. Generalized order parameters may be calculated directly from simulation; the equations linking these parameters were published elsewhere360. Both the vacuum and the solution simulations of maltose were started from the saddle point (0,0) between the two major energy wells on the adiabatic map. For isomaltose, the lowest minima on the adiabatic map were chosen as starting points. As a result, longitudinal relaxation times were obtained for each carbon in maltose and isomaltose (see Table 2 in the original publication).

Later, this group ran MD simulations with explicit water on minimal model compounds for the α(1-6) branch point of amylopectin: the trisaccharide panose and the tetrasaccharide 6²-α-D-glucosylmaltotriose (maltotriose (1-6)-glucosylated at its middle glucose residue)361. As a check of the simulation dynamics, longitudinal T1 relaxation times, which are sensitive to both the extent and the time scale of molecular motion, were calculated directly from the trajectories; for panose they agreed well with the experimental values, validating the simulation dynamics. The model-free formalism is most suitable for molecules whose intramolecular motions are much faster than overall molecular tumbling, an assumption that may fail for small oligosaccharides. In addition, the comparison between experiment and simulation is then made through indirectly fitted parameters. To resolve these issues, the authors adopted the approach they had used previously to calculate T1 relaxation times directly from MD simulation360. Relaxation times were simulated separately for the CH and CH2 carbon atoms in each panose ring (see Table 3 in the original publication). In both publications by Best and coworkers360, 361, all calculations were done using the PHLB force field specifically parameterized for carbohydrates177 and implemented in CHARMM. This parameter set addressed the excessive flexibility of the earlier force field (HGFB) and its incorrect preference of the primary alcohols for the tg rotamer. The TIP3P model was used for explicit representation of water.
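The model-free route mentioned above can be sketched as follows. The sketch assumes dipolar relaxation by directly bonded protons only (CSA neglected), isotropic overall tumbling, a fixed C-H distance of 1.09 Å and the Lipari-Szabo spectral density; these are simplifying assumptions, not the settings of Best and coworkers, who calculated T1 directly from the trajectories. The order parameter S² is computed from bond unit vectors that are assumed to be already superimposed on a molecular reference frame; the wobbling vectors and the τm and τe values are hypothetical.

```python
import numpy as np

MU0_OVER_4PI = 1e-7          # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_C = 6.728284e7         # rad s^-1 T^-1

def order_parameter(bond_vectors):
    """Generalized order parameter S^2 from an (N, 3) array of C-H bond vectors.

    Frames are assumed to be superimposed on a molecular reference frame beforehand,
    so that only internal motion remains; S^2 = 1.5 * sum_ij <u_i u_j>^2 - 0.5."""
    u = np.asarray(bond_vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    m = u.T @ u / len(u)                       # 3x3 matrix of <u_i u_j>
    return float(1.5 * np.sum(m ** 2) - 0.5)

def lipari_szabo_j(omega, s2, tau_m, tau_e):
    """Model-free spectral density (s rad^-1), including the 2/5 prefactor."""
    tau = 1.0 / (1.0 / tau_m + 1.0 / tau_e)
    return 0.4 * (s2 * tau_m / (1 + (omega * tau_m) ** 2)
                  + (1 - s2) * tau / (1 + (omega * tau) ** 2))

def carbon_t1(b0_tesla, s2, tau_m, tau_e, n_h=1, r_ch=1.09e-10):
    """Dipolar-only 13C T1 in seconds for a carbon carrying n_h protons (CSA neglected)."""
    w_h, w_c = GAMMA_H * b0_tesla, GAMMA_C * b0_tesla
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_C / r_ch ** 3

    def jw(w):
        return lipari_szabo_j(w, s2, tau_m, tau_e)

    r1 = n_h * d ** 2 / 4 * (jw(w_h - w_c) + 3 * jw(w_c) + 6 * jw(w_h + w_c))
    return 1.0 / r1

# Hypothetical bond vectors wobbling around z (stand-in for superimposed MD frames)
rng = np.random.default_rng(seed=1)
vectors = np.column_stack([0.2 * rng.normal(size=2000), 0.2 * rng.normal(size=2000), np.ones(2000)])
s2 = order_parameter(vectors)

# 11.74 T corresponds to a 500 MHz 1H frequency; tau_m and tau_e are hypothetical
for label, n_h in (("CH", 1), ("CH2", 2)):
    print(f"{label}: S^2 = {s2:.2f}, T1 = {carbon_t1(11.74, s2, 100e-12, 20e-12, n_h):.2f} s")
```

The CH2 carbon relaxes twice as fast as the CH carbon in this idealized model, which is one reason why CH and CH2 carbons are analyzed separately.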

The cross-correlated relaxation rates of double- and zero-quantum coherences have widespread applications in structural and conformational studies of biomolecules, including probing of the O-glycosidic linkage conformation in carbohydrates362. Sychrovsky and coworkers applied quantum chemical calculations to investigate the dependence of the N1/N9 and C1′ CS tensors on the glycosidic torsion angle and sugar pucker in 2′-deoxynucleosides (dAde, dGua, dCyt, dThy). They calculated cross-correlated relaxation rates using reduced equations published elsewhere363 and tested the applicability of Ravindranathan’s363 and Duchardt’s364 methods to deoxyribonucleic acids (DNAs). According to their results, these CS tensors exhibit a significant conformational dependence on the C1′-N torsion angle and sugar pucker, which should be taken into consideration when interpreting cross-correlated relaxation rates between the N1/N9 CS tensor and the C1′-H1′ dipole-dipole interaction in DNAs278. The geometry of all nucleosides was gradient-optimized at the B3LYP/6-31G(d,p) level; all geometrical parameters except the C1′-N torsion angle were freely optimized. The NMR shielding tensors were calculated by GIAO B3LYP with the IGLO II basis set209: (9s,5p,1d/5s,1p) [6s,4p,1d/3s,1p] for carbon, nitrogen and oxygen and (5s,1p) [3s,1p] for hydrogen. A summary of the CS tensor calculation is included in Table 3 (see above).

Longitudinal and transverse relaxivities are the inverses of the spin-lattice (T1) and spin-spin (T2) relaxation times, respectively. They contain information on both structure and molecular dynamics. Several attempts have been undertaken to predict relaxivities of hyaluronan oligomers. Calculations of 13C relaxivities based on the combination of MD simulation algorithms with diffusion theory and the mode-coupling approach (MCD) were performed for hyaluronan (HA)2 and (HA)4 oligomers in water365. CHARMM168 was used to generate the hyaluronan structure and to immerse it in a water box. The calculations for both oligomers agreed with the experimental data obtained by the inversion-recovery technique. For details of the mode-coupling diffusion approach, please refer to the original publication365.

Furlan and coworkers calculated the dynamic properties of hyaluronan (HA)4, taking into account both the hydrophobic effect in water solution and local hydrophilic effects due to hydrogen bonding with the solvent. Several configurational distributions and dynamical parameters related to nuclear magnetic relaxation, sensitive to both molecular structure and mobility, were calculated from replica-exchange Monte Carlo statistics at different temperatures. Diffusion theory was applied to the calculation of the longitudinal 13C relaxivities366. Having validated the method, the authors investigated the molecular structure and identified the critical length of a hyaluronan polymer. They followed the protocol established for the quantitative description of the size and shape of biopolymer chains, including the construction of chain models by Monte Carlo simulation on the basis of the conformational statistical weights of representative dimeric units. The OPLS-AA force field was used to generate the conformational energy landscape, and the mode-coupling diffusion (MCD) theory with the RM2-II basis set was applied to calculate the NMR relaxivities. The computed relaxivities were within 25% of the experimental data and in all cases exceeded the experimental values367. Parameterization of the OPLS-AA force field for carbohydrates had been reported earlier188.

8. Computation of other NMR parameters

Averaged characteristics of molecular systems, including NOEs, can be calculated within studies of the conformational equilibrium. Once a conformational map has been calculated, averaging is usually performed over energies under the assumption that conformer populations follow the Boltzmann distribution124. Comparison of the resulting predicted NOEs with experimental data is a widely used approach to the validation of MD simulations. Trajectories obtained from MD simulations in the CSFF force field with the TIP3P water model allowed analytical derivation of T1 and T2 relaxation times, cross-correlated relaxation rates and NOEs. The stochastic approach to processing MD simulation data was shown to be suitable for describing the diffusion dynamics of molecules with mainly torsional internal mobility, as demonstrated for γ-cyclodextrin368 and model tri- and pentasaccharides369.

MM3 was reported to be a good force field for producing conformational maps for NOE calculations370. Gerbst and coworkers confirmed the adequacy of molecular modeling of fucoidans by comparing the calculated NOEs of protons at a glycosidic bond with the experimental values. In the case of fucobiose, the absence of signal overlap allowed steady-state NOEs to be used370. Starting from MM3 conformational maps, the authors calculated NOEs using the iterative Noggle and Schirmer equation371 and averaged the results according to the Boltzmann distribution. Selecting conformational minima with energies within 10% of the global energy minimum gave satisfactory results: the average difference between experimental and calculated relative NOEs was 1.3% to 2.5%, depending on the sulfation of the disaccharide. In the case of linear sulfated fucotriosides, overlap of the anomeric proton signals prohibited the use of steady-state NOEs. Transient NOEs were calculated as 1/r6, r being an interatomic distance obtained from the optimized geometry, and were Boltzmann-averaged over the MM3 conformational maps calculated for every disaccharide fragment using the same methodology as for the fucobiosides. The resulting NOESY cross-peak volumes showed weak correlation with experiment, which encouraged the authors to model the reducing-end sulfate group as undissociated rather than as an anion372.

Casset and coworkers examined the potential energy hypersurface of sucrose by molecular mechanics calculations in the MM3(92) force field interfaced with two different conformational search algorithms: a systematic grid search and the CICADA procedure. The CICADA (Channels In Conformational space Analyzed by Driver Approach) method drives selected dihedral angles to explore low-energy regions while permitting full geometry relaxation. Using the grid-search approach, the relaxed adiabatic map of sucrose was calculated as a function of the glycosidic torsion angles, and three families of stable conformers were identified. The CICADA procedure found all minima and the low-energy interconversion pathways of sucrose, in agreement with those located by the grid search. Theoretical NOESY volumes were calculated by the full relaxation method from an ensemble-averaged relaxation matrix, as described earlier373. All accessible conformations derived from either the grid-search or the CICADA method were taken into account. Two sets of NOESY volumes were calculated using averaging schemes appropriate for slow and for fast internal motions. The agreement factors, i.e. the relative deviations between the calculated and experimental off-diagonal 400 MHz NOESY volumes, were 0.170 (grid search, fast motion), 0.175 (grid search, slow motion), 0.175 (CICADA, fast motion) and 0.163 (CICADA, slow motion). These values improved (0.118-0.139) when only intense NOE peaks were considered. The study demonstrated the ability of the CICADA method to reproduce the potential energy surface of a flexible molecule and therefore to simulate its NOEs159.

Landstrom and Widmalm carried out an all-atom MD simulation with explicit solvent of the branching region of the Aeromonas salmonicida O-specific polysaccharide, using the β-D-ManpNAc-(1→4)[α-D-Glcp-(1→3)]-α-L-Rhap-OMe trisaccharide as a model. The 1 µs MD simulations revealed a dynamic conformational process on the nanosecond time scale that had previously received little attention. The results emphasized the predictive power of MD simulations in studies of biomolecular systems and explained an unusual NOE as a consequence of conformational exchange143. The MD simulations employed PARM22/SU01, a CHARMM22-type force field modified for carbohydrates173, implemented in NAMD 2.6b1. Initial conditions were prepared by placing the model trisaccharide in a previously equilibrated cubic water box, followed by energy minimization and heating. The smooth particle-mesh Ewald method374 was used to calculate the full electrostatic interaction. The 500 MHz NOESY volumes for the H-2 (Rha)/H-2 (ManNAc) homonuclear interaction were simulated as a function of mixing time for two conformational states of this trisaccharide in MestreLabs Research Mspin375 using a full relaxation-matrix formalism376 and a molecular reorientation correlation time of 200 ps.
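The full relaxation-matrix calculation of NOESY volumes can be sketched for a rigid, isotropically tumbling molecule. This is a minimal illustration, not the Mspin implementation: it assumes 1H-1H dipolar relaxation only, the Solomon expressions for the auto- (ρ) and cross-relaxation (σ) rates, and no external leakage. The three-proton geometry is hypothetical, while τc = 200 ps and 500 MHz mirror the values quoted above. Volumes follow from V(t_mix) = exp(−R·t_mix), evaluated through the eigendecomposition of the symmetric matrix R.

```python
import numpy as np

MU0_OVER_4PI = 1e-7         # T m A^-1
HBAR = 1.054571817e-34      # J s
GAMMA_H = 2.6752218744e8    # rad s^-1 T^-1

def relaxation_matrix(coords_m, tau_c, omega_h):
    """1H-1H dipolar relaxation matrix (s^-1) for a rigid, isotropically tumbling molecule.

    Diagonal: sum of auto-relaxation rates rho_ij; off-diagonal: cross-relaxation rates sigma_ij
    (Solomon expressions with J(w) = tau_c / (1 + w^2 tau_c^2))."""
    coords = np.asarray(coords_m, dtype=float)
    n = len(coords)
    q = 0.1 * MU0_OVER_4PI ** 2 * HBAR ** 2 * GAMMA_H ** 4

    def spec(w):
        return tau_c / (1 + (w * tau_c) ** 2)

    rmat = np.zeros((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            r6 = np.sum((coords[i] - coords[k]) ** 2) ** 3
            sigma = q / r6 * (6 * spec(2 * omega_h) - spec(0.0))
            rho = q / r6 * (spec(0.0) + 3 * spec(omega_h) + 6 * spec(2 * omega_h))
            rmat[i, k] = rmat[k, i] = sigma
            rmat[i, i] += rho
            rmat[k, k] += rho
    return rmat

def noesy_volumes(rmat, t_mix):
    """Normalized NOESY volumes V = exp(-R t_mix) via the eigendecomposition of symmetric R."""
    w, v = np.linalg.eigh(rmat)
    return (v * np.exp(-w * t_mix)) @ v.T

# Hypothetical three-proton geometry (metres): H_b at 2.5 A and H_c at 3.5 A from H_a
coords = np.array([[0.0, 0.0, 0.0], [2.5e-10, 0.0, 0.0], [0.0, 3.5e-10, 0.0]])
R = relaxation_matrix(coords, tau_c=200e-12, omega_h=2 * np.pi * 500e6)

for t_mix in (0.05, 0.1, 0.2, 0.4, 0.8):
    v = noesy_volumes(R, t_mix)
    print(f"t_mix = {t_mix:.2f} s: |V_ab| = {abs(v[0, 1]):.4f}, |V_ac| = {abs(v[0, 2]):.4f}")
```

At τc = 200 ps and 500 MHz σ is positive, so the off-diagonal volumes have the opposite sign to the diagonal. For an ensemble, one common convention is to average R over conformers before exponentiation when internal motion is fast, and to average the volumes computed per conformer when it is slow, which corresponds to the two averaging schemes mentioned for sucrose.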

Blundell and coworkers reported the complete resolution and assignment of nuclei in hyaluronan oligosaccharides with seven different naturally occurring terminal rings. They simulated the non-first-order line shape of the H-2VII proton in HA6AN (structure 7 in the original publication) and used these data in the spectral assignment. The GAMMA software377 used for the simulation performed multiple iterative rounds with floating values of 3JH,H and Δδ(H-3,H-4) to find the best fit to the intensity data for the H-2VII proton. All five protons of the GlcA ring were used to model the spin system378.
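A spin-system simulation of this kind can be sketched by diagonalizing the Hamiltonian of a few coupled spin-1/2 nuclei (frequencies in Hz, rotating frame, isotropic J couplings) and computing transition intensities from the total raising operator F+. This is an independent numpy sketch, not GAMMA's API. Because the Hamiltonian conserves the total Mz, it is diagonalized block by block, which keeps every eigenstate at a definite Mz even when levels are accidentally degenerate. The example uses a hypothetical two-spin AB system, for which line positions and the "roof effect" are known in closed form; the same function applies unchanged to a five-spin GlcA ring (a 32 × 32 matrix).

```python
import numpy as np

def _spin_operators(n):
    """Ix, Iy, Iz and I+ for each of n spin-1/2 nuclei in the 2^n-dimensional product space."""
    single = {
        "x": 0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
        "y": 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
        "z": 0.5 * np.array([[1, 0], [0, -1]], dtype=complex),
        "+": np.array([[0, 1], [0, 0]], dtype=complex),
    }

    def embed(op, k):
        out = np.eye(1, dtype=complex)
        for site in range(n):
            out = np.kron(out, op if site == k else np.eye(2, dtype=complex))
        return out

    return {key: [embed(op, k) for k in range(n)] for key, op in single.items()}

def simulate_spectrum(nu_hz, j_hz, min_intensity=1e-6):
    """Sorted (frequency_Hz, intensity) pairs for coupled spin-1/2 nuclei (strong coupling included).

    nu_hz: offsets of the n spins; j_hz: symmetric n x n matrix of scalar couplings in Hz."""
    n = len(nu_hz)
    ops = _spin_operators(n)
    h = sum(nu_hz[k] * ops["z"][k] for k in range(n))
    for a in range(n):
        for b in range(a + 1, n):
            h = h + j_hz[a][b] * sum(ops[c][a] @ ops[c][b] for c in "xyz")
    # H commutes with total Iz, so diagonalize each Mz block separately
    mz = np.real(np.diag(sum(ops["z"])))
    energies = np.zeros(2 ** n)
    vecs = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for m in np.unique(np.round(mz, 6)):
        idx = np.where(np.isclose(mz, m))[0]
        e, v = np.linalg.eigh(h[np.ix_(idx, idx)])
        energies[idx] = e
        vecs[np.ix_(idx, idx)] = v
    f_plus = vecs.conj().T @ sum(ops["+"]) @ vecs   # total raising operator in the eigenbasis
    lines = []
    for a in range(2 ** n):
        for b in range(2 ** n):
            intensity = abs(f_plus[a, b]) ** 2
            if intensity > min_intensity:
                lines.append((float(energies[a] - energies[b]), float(intensity)))
    return sorted(lines)

def lorentzian_spectrum(lines, axis_hz, fwhm_hz=1.0):
    """Sum of Lorentzian lines with a common full width at half maximum."""
    hw = fwhm_hz / 2
    return sum(i * hw ** 2 / ((axis_hz - f) ** 2 + hw ** 2) for f, i in lines)

# Hypothetical strongly coupled AB pair: offsets 100 and 110 Hz, J = 8 Hz
ab_lines = simulate_spectrum([100.0, 110.0], [[0.0, 8.0], [8.0, 0.0]])
for freq, inten in ab_lines:
    print(f"{freq:8.3f} Hz  intensity {inten:.3f}")
spectrum = lorentzian_spectrum(ab_lines, np.linspace(85.0, 125.0, 4001), fwhm_hz=0.8)
```

In a fit like the one described above, selected offsets and couplings would be varied iteratively and the Lorentzian-broadened spectrum compared with the measured line shape; the fitting loop itself is not shown.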

9. Conclusions

Based on the literature data discussed in the present review, we compare the scope and performance of various computational approaches to predicting the NMR parameters of carbohydrates. For a representative view, the accuracy of 13C NMR chemical shift calculations by different methods is summarized for two simple monosaccharides, the α- and β-anomers of D-glucose (Table 7). On average, empirical schemes provide better accuracy, with RMS errors in the range of 0.07–2.51 ppm (in many cases less than 1 ppm). DFT and ab initio calculations showed varying performance, with RMS values from 1.75 to 9.81 ppm. The best accuracy was observed for calculations at the B3PW91/6-31+G(d)//MM+ level (RMS = 1.75 ppm)264 and at the PBE/TZ2p level (RMS = 2.43 ppm)236.

Methyl 6-O-(diphenylphospho)-α-D-glucopyranoside [(PhO)2P(O)-O-6)α-D-Glcp-1OMe], a sugar derivative with an uncommon substituent, can serve as a critical test. Chemical shift simulation for such a compound is a complicated task both for density functional theory (owing to its complex electronic structure) and for empirical schemes (owing to its poor representation in chemical shift databases). A summary of theoretical predictions by different methods reveals important trends (Table 8). DFT calculations at the B3LYP level provide reasonable accuracy for the carbohydrate moiety (RMS = 3.28 ppm) but an unacceptable error for the rest of the molecule (RMS = 5.89 ppm). Changing the density functional to PBE improves the description of the non-carbohydrate part of the molecule (RMS = 3.03 ppm) but gives poor results for the monosaccharide core (RMS = 9.37 ppm), suggesting that B3LYP is better adapted to conformationally flexible structures. An important point for theoretical calculations at the density functional and ab initio levels is the choice of reference. As summarized in Table 8, a noticeable difference in accuracy is observed upon changing the reference from benzene (RMS = 9.37 and 3.03 ppm) to ethylene glycol (RMS = 9.31 and 4.76 ppm). Despite the rare nature of the molecule, empirical methods gave good predictions, with an RMS deviation as small as 2.01 ppm using the carbohydrate-optimized BIOPSEL algorithm (Table 8). An obvious drawback of the specific empirical schemes is the limited range of structures they can analyze: BIOPSEL cannot predict chemical shifts of the aromatic part, while CASPER lacks an interface for phosphorus-bonded monosaccharides. Empirical algorithms with non-specific databases demonstrated good predictive potential for the chemical shifts of the carbohydrate moiety (RMS = 3.14 and 3.83 ppm) and excellent predictive potential for the rest of the molecule (RMS = 0.11 and 0.52 ppm). This difference in accuracy can be explained by the rigid structure of the benzene ring and the existence of only a single stereoisomer with the given atom connectivity, a structural fragment that is highly populated in chemical shift databases.

Sucrose disaccharide serves as a test of chemical shift prediction for molecules containing glycosidic bonds and residues poorly populated in chemical shift databases, such as fructofuranose. Fig. 10 illustrates the results of empirical and quantum-mechanical calculations. Only the GIAO calculation at the B3LYP/6-311++G(2d,2p) level in the COSMO water model (Fig. 10C) accurately predicted the chemical shifts of carbons adjacent to a glycosidic bond, while the empirical prediction was the only one to produce the correct order of signals. The accuracy of the quantum mechanical predictions was similar, except for the non-hydroxylated carbon of the fructose residue (F5). The statistical and performance data of these calculations are summarized in Table 9.

To conclude, recent outstanding progress in the development of theoretical methods and the rapid increase in hardware performance have greatly facilitated the application of computational tools to NMR-based structural studies of carbohydrates. Nowadays, prediction of the NMR parameters of mono- and small oligosaccharides benefits from formalized procedures and has become a routine task. Nevertheless, in spite of widespread areas of potential application, the limitations of computational approaches are still an important issue.


Table 7. Comparison of 13C NMR spectra of α- and β-glucopyranose calculated by various methods in solution or gas phase.

| Method (chemical shifts // geometry) | Anomer c | C1 a | C2 | C3 | C4 | C5 | C6 | RMS error b, ppm | Notes | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-empirical methods | | | | | | | | | | |
| B3LYP/pcJ//B3LYP/6-31G(d,p) | α | 104.74 | 82.14 | 83.96 | 79.55 | 81.89 | 70.10 | 9.81 | | 260 |
| | β | 108.59 | 85.01 | 86.00 | 79.23 | 86.58 | 70.18 | 9.68 | | 260 |
| BP86/TZVP//BP86/TZVP | α | 103.16 | 79.17 | 81.03 | 74.10 | 79.53 | 65.28 | 6.81 | for α-D-Glcp 4C1 | 276 |
| B3LYP/cc-pVTZ//B3LYP/6-31G(d,p) | α | 101.45 | 79.34 | 81.01 | 75.25 | 79.70 | 66.36 | 6.70 | gas phase, averaged over conformers | 254 |
| ONIOM [MP2 : HF/6-311++G(2d,2p)]//MP2/cc-pVDZ | β | 102.88 | 79.37 | 79.65 | 80.59 | 76.55 | 70.37 | 6.28 | for β-D-Glcp 4C1 T | 263 |
| HF SCF/TZVP//B3LYP/TZVP | α | 86.52 | 67.63 | 69.34 | 63.24 | 68.58 | 57.19 | 5.37 | for α-D-Glcp 4C1 | 276 |
| B3LYP/TZVP//B3LYP/TZVP | α | 100.06 | 77.88 | 79.77 | 72.60 | 77.91 | 64.23 | 5.12 | | |
| MP2/TZVP//B3LYP/TZVP | α | 99.95 | 77.70 | 79.51 | 72.15 | 77.32 | 64.09 | 4.87 | | |
| ONIOM [MP2 : HF/6-311++G(2d,2p)]//MP2/cc-pVDZ | β | 102.93 | 79.18 | 80.15 | 74.83 | 80.28 | 66.88 | 4.50 | for β-D-Glcp 4C1 G+ | 263 |
| MP2/TZVP//MP2/TZVP | α | 99.61 | 77.37 | 78.61 | 71.73 | 75.59 | 63.88 | 4.25 | for α-D-Glcp 4C1 | 276 |
| ONIOM [MP2 : HF/6-311++G(2d,2p)]//MP2/cc-pVDZ | β | 102.61 | 78.95 | 80.09 | 71.25 | 79.60 | 62.84 | 3.39 | for β-D-Glcp 4C1 G- | 263 |
| PBE/TZ2p | α | 94.47 | 73.45 | 77.42 | 73.23 | 75.46 | 63.25 | 2.43 | normalized against ethylene glycol | 236 |
| B3PW91/6-31+G(d)//MM+ | α | 91.6 | 70.6 | 72.3 | 72.1 | 70.4 | 63.9 | 1.75 | | 264 |
| | β | 94.8 | 74.6 | 74.6 | 72.4 | 74.6 | 63.5 | 1.83 | | 264 |
| Empirical methods | | | | | | | | | | |
| HOSE + neural net (ACDLabs 10) | α | 95.33 | 74.67 | 76.68 | 70.71 | 76.66 | 62.10 | 2.51 | required manual setting of geometry d | |
| | β | 95.33 | 74.67 | 76.68 | 70.71 | 76.66 | 62.10 | 0.65 | | |
| HOSE (MestreNova/ModGraph) | α | 95.54 | 74.73 | 74.52 | 70.41 | 76.24 | 62.11 | 2.15 | results for α- and β-glucose were the same | |
| | β | 95.54 | 74.73 | 74.52 | 70.41 | 76.24 | 62.11 | 1.09 | | |
| HOSE (ACDLabs 10) | α | 95.56 | 73.08 | 73.73 | 70.13 | 74.93 | 62.38 | 1.55 | required manual setting of geometry d | |
| | β | 95.56 | 73.08 | 73.73 | 70.13 | 74.93 | 62.38 | 1.78 | | |
| Incremental (BIOPSEL/BCSDB) | α | 93.3 | 72.7 | 74.0 | 70.9 | 73.2 | 61.9 | 0.42 | reported accuracy was “best” (mark 4 of 0..4) | 114 |
| | β | 97.1 | 75.4 | 77.0 | 70.9 | 77.2 | 62.1 | 0.30 | | 114 |
| Incremental (CASPER) | α | 92.99 | 72.47 | 73.78 | 70.71 | 72.37 | 61.84 | 0.08 | reported expected error was 0.00 | 117 |
| | β | 96.84 | 75.20 | 76.76 | 70.71 | 76.76 | 61.84 | 0.07 | | 117 |
| Experimental data | | | | | | | | | | |
| 30 °C, in water | α | 92.77 | 72.15 | 73.43 | 70.32 | 72.10 | 61.27 | | | 260 |
| | β | 96.59 | 74.81 | 76.43 | 70.27 | 76.61 | 61.42 | | | 260 |
| in water | α | 93.3 | 72.7 | 74.0 | 70.9 | 72.7 | 61.9 | | | 112 |
| | β | 97.1 | 75.4 | 77.0 | 70.9 | 77.2 | 62.1 | | | 112 |
| 70 °C, in water | α | 92.99 | 72.47 | 73.78 | 70.71 | 72.37 | 61.84 | | | 119 |
| | β | 96.84 | 75.20 | 76.76 | 70.71 | 76.76 | 61.84 | | | 119 |
| (conditions not given) | α | 92.9 | 72.5 | 73.8 | 70.6 | 72.3 | 61.6 | | | 318 |
| | β | 96.7 | 75.1 | 76.7 | 70.6 | 76.8 | 61.7 | | | 318 |
| 25 °C, in water | α | 92.48 | 71.87 | 73.15 | 70.04 | 71.82 | 60.99 | | | 254 |
| (conditions not given) | α | 93.3 | 73.1 | 74.4 | 71.2 | 72.9 | 62.4 | | | 379 |

a Chemical shifts (ppm) are listed in the order of atom enumeration (C1-C6).

b RMS error was calculated against the experimental spectrum averaged from the six listed experimental sources (four of which also report the β anomer).

c Anomeric configuration is given in the Anomer column.

d ACDLabs could not properly minimize the geometry starting from the template “Glc”. Prior to calculation, the starting geometry of α- and β-Glcp was set manually, followed by built-in MM2 minimization. Results for α- and β-glucose in water were the same.
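The RMS errors in Table 7 can be reproduced from the tabulated shifts by comparing each prediction with the carbon-by-carbon mean of the experimental sets (six sets for the α anomer, four for β, since only four of the listed sources report the β anomer). A short sketch of that arithmetic, with the shifts transcribed from the table; for CASPER it prints 0.08 ppm (α) and 0.07 ppm (β), matching the tabulated values.

```python
import numpy as np

# 13C shifts (ppm, C1..C6) transcribed from Table 7
EXPERIMENTAL_ALPHA = np.array([
    [92.77, 72.15, 73.43, 70.32, 72.10, 61.27],  # ref. 260
    [93.3, 72.7, 74.0, 70.9, 72.7, 61.9],        # ref. 112
    [92.99, 72.47, 73.78, 70.71, 72.37, 61.84],  # ref. 119
    [92.9, 72.5, 73.8, 70.6, 72.3, 61.6],        # ref. 318
    [92.48, 71.87, 73.15, 70.04, 71.82, 60.99],  # ref. 254
    [93.3, 73.1, 74.4, 71.2, 72.9, 62.4],        # ref. 379
])
EXPERIMENTAL_BETA = np.array([   # only four of the listed sources report the beta anomer
    [96.59, 74.81, 76.43, 70.27, 76.61, 61.42],  # ref. 260
    [97.1, 75.4, 77.0, 70.9, 77.2, 62.1],        # ref. 112
    [96.84, 75.20, 76.76, 70.71, 76.76, 61.84],  # ref. 119
    [96.7, 75.1, 76.7, 70.6, 76.8, 61.7],        # ref. 318
])

def rms_error(predicted, experimental_sets):
    """Root-mean-square deviation (ppm) of predicted shifts from the mean of the experimental sets."""
    reference = np.mean(experimental_sets, axis=0)
    diff = np.asarray(predicted, dtype=float) - reference
    return float(np.sqrt(np.mean(diff ** 2)))

casper_alpha = [92.99, 72.47, 73.78, 70.71, 72.37, 61.84]
casper_beta = [96.84, 75.20, 76.76, 70.71, 76.76, 61.84]
print(f"CASPER: {rms_error(casper_alpha, EXPERIMENTAL_ALPHA):.2f} ppm (alpha), "
      f"{rms_error(casper_beta, EXPERIMENTAL_BETA):.2f} ppm (beta)")
```

The same function applied to the B3PW91/6-31+G(d)//MM+ α row gives 1.75 ppm, so the averaging scheme is consistent across empirical and quantum chemical entries.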



Table 8. Prediction of 13C NMR chemical shifts (in ppm) for (PhO)2P(O)-O-6)α-D-Glcp-1OMe using different methods.

Method | Software | C1 | C2 | C3 | C4 | C5 | C6 | Me | i-Ph | o-Ph | m-Ph | p-Ph | RMS error b, Glc part | RMS error b, other atoms
PBE/TZ2p (geometry, NMR) 236 | PRIRODA a 380, normalized against benzene (default) | 106.32 | 76.67 | 79.97 | 74.38 | 74.60 | 50.63 | 54.61 | 157.72, 158.47 | 120.08, 119.18 | 129.31, 128.89 | 124.39, 122.96 | 9.37 | 3.03
PBE/TZ2p (geometry, NMR) 236 | PRIRODA a 380, normalized against ethylene glycol | 102.25 | 72.60 | 75.90 | 70.31 | 70.53 | 46.56 | 50.54 | 154.02 | 116.01, 115.11 | 125.24, 124.82 | 119.61 | 9.31 | 4.76
B3LYP/6-31G(d) (geometry, NMR) 268 | Gaussian 03 135 | 98.02 | 72.42 | 74.11 | 77.16 | 69.90 | 69.31 | 55.20 | 145.25 | 114.95 | 122.55 | 118.02 | 3.28 | 5.89
HOSE and its variations | Mestre Nova a 73, 76 | 103.31 | 72.66 | 74.67 | 70.79 | 75.08 | 66.89 | 55.81 | 151.31 | 120.53 | 130.07 | 126.15 | 3.83 | 0.52
HOSE and its variations | ACD/Labs a 68, 381 | 102.44 | 71.84 | 75.90 | 69.94 | 72.85 | 67.38 | 55.50 | 150.60 | 120.10 | 129.80 | 125.50 | 3.14 | 0.11
Incremental at residue level | BIOPSEL 113, 115 | 99.3 | 72.7 | 74.0 | 70.9 | 72.7 | 66.9 | no output | no output | no output | no output | no output | 2.01 | – (poor accuracy reported c)
Incremental at residue level | CASPER 116, 121 | no output (P-linked sugars are not supported) | | | | | | | | | | | – | –
Experiment in CDCl3 268 | | 95.51 | 71.88 | 73.92 | 69.63 | 70.29 | 68.22 | 55.27 | 150.45 | 120.12 | 129.84 | 125.52 | 0 | 0

a Starting geometry was obtained by MM2, then re-optimized in the specified software.
b RMS was calculated against experimental data in CDCl3; the "Glc part" column covers C1-C6.
c BIOPSEL rated the calculation accuracy as poor (1 on a 0-4 scale) for the saccharide part and as 0 on the same scale for the non-saccharide part.
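The "Glc part" RMS errors in Table 8 are root-mean-square deviations between predicted and experimental shifts of the six glucose ring carbons. The minimal sketch below recomputes them from the tabulated values. It assumes only C1-C6 are included, because that choice reproduces the published column. The function and variable names are illustrative.

```python
import math

# 13C shifts (ppm) of glucose C1-C6, copied from Table 8.
EXPERIMENT_CDCL3 = [95.51, 71.88, 73.92, 69.63, 70.29, 68.22]

PREDICTIONS = {
    "PRIRODA, benzene reference": [106.32, 76.67, 79.97, 74.38, 74.60, 50.63],
    "PRIRODA, ethylene glycol reference": [102.25, 72.60, 75.90, 70.31, 70.53, 46.56],
    "Gaussian 03, B3LYP/6-31G(d)": [98.02, 72.42, 74.11, 77.16, 69.90, 69.31],
    "Mestre Nova": [103.31, 72.66, 74.67, 70.79, 75.08, 66.89],
    "ACD/Labs": [102.44, 71.84, 75.90, 69.94, 72.85, 67.38],
    "BIOPSEL": [99.3, 72.7, 74.0, 70.9, 72.7, 66.9],
}


def rms_error(predicted, experimental):
    """Root-mean-square deviation (ppm) between paired chemical shifts."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


for method, shifts in PREDICTIONS.items():
    print(f"{method:36s} RMS(C1-C6) = {rms_error(shifts, EXPERIMENT_CDCL3):.2f} ppm")
```

Running it prints 9.37, 9.31, 3.28, 3.83, 3.14 and 2.01 ppm, in table order. Because every error is squared, a single badly predicted carbon can dominate the RMS value, for example C6 in the PRIRODA rows or C1 in the HOSE-based rows.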

Empirical schemes are easy to use and provide good accuracy for compounds with a widespread structural motif. Their performance can be poor, however, for molecules with atypical substituents (for which they were not parameterized) or with an unexpected secondary structure. Further extension of databases and algorithmic improvements are expected to broaden the applicability of empirical algorithms. In contrast to the empirical methods, DFT and ab initio calculations should be suitable, from first principles, for predicting the NMR parameters of any carbohydrate structure and substituent. A careful selection of the theory levels used for geometry optimization and NMR calculation allows reasonable accuracy to be achieved for small systems (monosaccharides). An outstanding advantage of DFT and ab initio calculations is their ability to predict NMR parameters other than chemical shifts. In particular, successful applications have been reported for predicting spin-spin coupling constants, relaxation rates, NOEs, chemical shifts in atypical solvents, and conformation-specific NMR parameters. Finally, we anticipate rapid progress in the use of DFT and ab initio calculations to predict the NMR parameters of carbohydrates, with a more direct impact on the everyday practical needs of experimental structure determination. At the same time, empirical correlations are still used for routine NMR predictions and for glycans built up of three or more sugar residues. It is still too early to expect DFT calculations to outperform empirical methods in all aspects of carbohydrate structure analysis.

Table 9. The performance of empirical and density functional prediction of the 13C NMR spectrum of sucrose. a

Parameters | BCSDB/BIOPSEL b 114 (empirical) | GIAO at PBE/TZ2p level c 236 | GIAO at B3LYP/6-311G(2d,p) level c | COSMO / GIAO at B3LYP/6-311++G(2d,2p) level d
RMS, ppm | 2.6 | 5.4 | 4.6 | 3.9
Linear correlation | 0.994 | 0.981 | 0.953 | 0.981
Calculation time e | <0.1 s | 29 min | 12.5 h | 67.8 h

a Reference experimental data are from the 13C NMR spectrum at 25°C in D2O.
b The chemical shift database used is dedicated to carbohydrates in aqueous solution.
c The spectrum was normalized against the chemical shift of ethylene glycol (63.4 ppm in CDCl3 379). The geometry was optimized with the same basis set.
d The calculations were carried out in the COSMO water model. The geometry was optimized with the same basis set. The spectrum was normalized against ethylene glycol in CDCl3 (63.4 ppm 379); normalization against ethylene glycol in D2O (67.3 ppm 379) gave an RMS error of 7.4 ppm.
e Performance data were obtained on a personal computer with an Intel Core 2 Quad processor running at 3.0 GHz.
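Footnotes c and d, and the two PRIRODA rows of Table 8, refer to normalizing a computed spectrum against a reference compound. We assume this means the usual secondary-referencing scheme δ = σ_ref - σ + δ_ref. Here σ_ref is the shielding of the reference carbon computed at the same level of theory, and δ_ref is its experimental shift. The exact procedure used by each program is not stated here. The sketch below encodes that formula and checks one of its consequences against Table 8: changing the reference moves every shift by the same constant. The function name is illustrative.

```python
def shielding_to_shift(sigma, sigma_ref, delta_ref):
    """Chemical shift (ppm) from a computed isotropic shielding (ppm).

    Assumed secondary-referencing scheme: delta = sigma_ref - sigma + delta_ref,
    where sigma_ref is the computed shielding of the reference carbon (same
    level of theory) and delta_ref is its experimental shift, e.g. 63.4 ppm
    for ethylene glycol in CDCl3 (Table 9, footnote c).
    """
    return sigma_ref - sigma + delta_ref


# Switching from reference A to reference B changes every shift by
# (sigma_ref_B - sigma_ref_A) + (delta_ref_B - delta_ref_A), which does not
# depend on the atom. PRIRODA shifts (C1-C6 and OMe) from Table 8:
BENZENE_REFERENCED = [106.32, 76.67, 79.97, 74.38, 74.60, 50.63, 54.61]
GLYCOL_REFERENCED = [102.25, 72.60, 75.90, 70.31, 70.53, 46.56, 50.54]

offsets = [round(b - g, 2) for b, g in zip(BENZENE_REFERENCED, GLYCOL_REFERENCED)]
print(offsets)  # the same offset for every carbon
```

Every listed offset is 4.07 ppm. A change of reference is therefore a rigid shift of the whole spectrum. It can change the RMS error, as the two normalizations in footnote d show, but it cannot change the order of signals or the linear correlation with experiment.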


Fig. 10. Deviation of the 13C chemical shifts of the sucrose disaccharide predicted by different methods from the experimental spectrum. Dashed lines represent correlations between signals. Black values (A) were predicted empirically using an incremental scheme. The experimental spectrum (B) was recorded at 25°C in D2O. Signal assignment is denoted by F (fructose residue) or G (glucose residue) and the carbon number. Red values (C) were calculated at the PBE/TZ2p level, and green values (D) were calculated at the B3LYP/6-311++G(2d,2p) level using the COSMO water model. See calculation details in Table 9.

10. Abbreviations

Force fields in italic. QM theory levels in bold. QM functionals and basis sets in bold-italic.

3D – three-dimensional
3D HOSE – enhanced HOSE that utilizes stereochemistry information
AMBER – assisted model building with energy refinement
B3LYP – Becke three-parameter with Lee-Yang-Parr
B3PW91 – Becke three-parameter with Perdew-Wang 91
BD, BD(T), BD(TQ) – Brueckner energies (including doubles, triples and quadruples)
BPT – bond polarization theory
CCS, CCSD, CCSDT – coupled cluster methods (including singles, doubles and triples)
CHARMM – chemistry at Harvard macromolecular mechanics
CHF – coupled Hartree-Fock
CI – configuration interaction
CICADA – channels in conformational space analyzed by driver approach
COSMO – conductor-like screening model
COSMOS – computer simulations of molecular structures
CPCM – conductor-like polarizable continuum model
CP/MAS – cross-polarization / magic angle spinning
CS – (isotropic) chemical shift
CSS – chemical shift surface


CST – chemical shielding tensor
CSFF – carbohydrate solution force field
CSGT – continuous set of gauge transformations
DFT – density functional theory
DFT-D – density functional theory with distance-dependent dispersive term
DPCM – dielectric polarizable continuum model
DSO – diamagnetic spin orbit (term)
FC – Fermi contact (term)
FF-DPT – finite-field double perturbation theory
GIAO – gauge-including atomic orbital
GIPAW – gauge-including projector augmented-wave
GLYCAM – glycan molecular mechanics force field
GROMOS – Groningen molecular simulation
HOSE – hierarchical organization of spherical environments
HF – Hartree-Fock
IGAIM – individual gauges for atoms in molecules
IGLO – individual gauge for localized orbitals
LANL2DZ – Los Alamos National Laboratory 2-double-zeta
LD – locally dense
MAD – mean absolute deviation
MCD – mode-coupling diffusion
MD – molecular dynamics
MM2, MM3 – molecular mechanics
MNDO – modified neglect of diatomic overlap
MPx – Møller–Plesset perturbation theory at order x
NMR – nuclear magnetic resonance
NOE – nuclear Overhauser effect
ONIOM – our own n-layer integrated molecular orbital and molecular mechanics approach
OPLS-AA – optimized potentials for liquid simulations - all-atom
PBE – Perdew-Burke-Ernzerhof
PCM – polarizable continuum model
PHLB – Palma-Himmel-Liang-Brady
PSO – paramagnetic spin orbit (term)
QCISD, QCISD(T), QCISD(TQ) – quadratic configuration interaction
RAI – recoupling of anisotropy information
RMS – root mean square
SCF – self-consistent field
SD – spin dipolar (term)
TIP3P – transferable intermolecular potential, 3 point

Compound names used in the review:
2HOMe-THP – 2-hydroxymethyltetrahydropyran
2-deoxy-eryPen – 2-deoxy-erythropentose
Ara – arabinose
DMSO – dimethylsulphoxide
Gal – galactose
GalNAc – 2-acetamido-2-deoxygalactose
Glc – glucose
GlcNAc – 2-acetamido-2-deoxyglucose
Ery – erythrose
Fuc – fucose
Ido – idose
IdopA2S – 2-sulpho-iduronic acid
Lyx – lyxose
Rib – ribose
TMS – tetramethylsilane
Xyl – xylose

11. Acknowledgements

This review was written as part of the development of the NMR prediction engine of the Carbohydrate Structure Database, which is funded by the Russian Foundation for Basic Research (grants 05-07-90099 and 04-12-00324). The authors thank Prof. Y.A. Knirel and Prof. A.S. Shashkov for critical reading.

12. References

1. N. Gaidzik, U. Westerlind and H. Kunz, Chem Soc Rev, 2013, 42, 4421-4442.
2. R. D. Astronomo and D. R. Burton, Nat Rev Drug Discov, 2010, 9, 308-324.
3. T. J. Boltje, T. Buskas and G. J. Boons, Nat Chem, 2009, 1, 611-622.
4. B. Ernst and J. L. Magnani, Nat Rev Drug Discov, 2009, 8, 661-677.
5. M. A. Johnson and D. R. Bundle, Chem Soc Rev, 2013, 42, 4327-4344.
6. A. W. Barb and J. H. Prestegard, Nat Chem Biol, 2011, 7, 147-153.
7. M. C. Gambetta, K. Oktaba and J. Müller, Science, 2009, 325, 93-96.
8. M. Molinari, Nat Chem Biol, 2007, 3, 313-320.
9. P. C. Pang, P. C. N. Chiu, C. L. Lee, L. Y. Chang, M. Panico, H. R. Morris, S. M. Haslam, K. H. Khoo, G. F. Clark, W. S. B. Yeung and A. Dell, Science, 2011, 333, 1761-1764.
10. G. A. Rabinovich and M. A. Toscano, Nat Rev Immun, 2009, 9, 338-352.
11. T. Yoshida-Moriguchi, L. Yu, S. H. Stalnaker, S. Davis, S. Kunz, M. Madson, M. B. A. Oldstone, H. Schachter, L. Wells and K. P. Campbell, Science, 2010, 327, 88-92.
12. O. Alper, Science, 2001, 291, 2338-2343.
13. Z. Shriver, S. Raguram and K. Sasisekharan, Nat Rev Drug Discov, 2004, 3, 863-873.
14. N. C. Reichardt, M. Martin-Lomas and S. Penades, Chem Soc Rev, 2013, 42, 4358-4376.
15. J. F. G. Vliegenthart and R. J. Woods, NMR spectroscopy and computer modeling of carbohydrates: recent advances, 2006.
16. R. A. Dwek, Chem Rev, 1996, 96, 683-720.
17. C. J. Jones and C. K. Larive, Nat Chem Biol, 2011, 7, 758-759.
18. D. Mohnen and M. L. Tierney, Science, 2011, 332, 1393-1394.
19. S. M. Velasquez, M. M. Ricardi, J. G. Dorosz, P. V. Fernandez, A. D. Nadra, L. Pol-Fachin, J. Egelund, S. Gille, J. Harholt, M. Ciancia, H. Verli, M. Pauly, A. Bacic, C. E. Olsen, P. Ulvskov, B. L. Petersen, C. Somerville, N. D. Iusem and J. M. Estevez, Science, 2011, 332, 1401-1403.
20. V. Wittmann and R. J. Pieters, Chem Soc Rev, 2013, 42, 4492-4503.
21. L. L. Kiessling and J. C. Grim, Chem Soc Rev, 2013, 42, 4476-4491.
22. C.-I. Lin, R. M. McCarty and H.-w. Liu, Chem Soc Rev, 2013, 42, 4377-4407.
23. S. Park, J. C. Gildersleeve, O. Blixt and I. Shin, Chem Soc Rev, 2013, 42, 4310-4326.


24. J. Hirabayashi, M. Yamada, A. Kuno and H. Tateno, Chem Soc Rev, 2013, 42, 4443-4458.
25. S. H. Rouhanifard, L. U. Nordstrom, T. Zheng and P. Wu, Chem Soc Rev, 2013, 42, 4284-4296.
26. USA Pat., 6346604, 2002.
27. N. A. Kocharova, O. G. Ovchinnikova, I. S. Bushmarinov, F. V. Toukach, A. Torzewska, A. S. Shashkov, Y. A. Knirel and A. Rozalski, Carbohydr Res, 2005, 340, 775-780.
28. A. Corma, S. Iborra and A. Velty, Chem Rev, 2007, 107, 2411-2502.
29. H. Röper, Starch, 2002, 54, 89-99.
30. L. D. Schmidt and P. J. Dauenhauer, Nature, 2007, 447, 914-915.
31. T. Werpy and G. Petersen, Top value added chemicals from biomass. Vol. I. Results of screening for potential candidates from sugars and synthesis gas, U.S. Department of Energy, Golden, CO, 2004.
32. T. Ståhlberg, W. Fu, J. M. Woodley and A. Riisager, ChemSusChem, 2011, 4, 451-458.
33. S. Van de Vyver, J. Geboers, P. A. Jacobs and B. F. Sels, ChemCatChem, 2011, 3, 82-94.
34. M. E. Zakrzewska, E. Bogel-Łukasik and R. Bogel-Łukasik, Chem Rev, 2011, 111, 397-417.
35. M. Chidambaram and A. T. Bell, Green Chem, 2010, 12, 1253-1262.
36. A. Imberty and S. Pérez, Chem Rev, 2000, 100, 4567-4588.
37. S. Pérez, C. Gauthier and A. Imberty, in Oligosaccharides in chemistry and biology: a comprehensive handbook, eds. B. Ernst, G. Hart and P. Sinay, Wiley-VCH, Weinheim, 2000, pp. 969-1001.
38. M. Hricovíni, Curr Med Chem, 2004, 11, 2565-2583.
39. R. Stenutz, The structure and conformation of saccharides determined by experiment and simulation, Stockholm University, 1997.
40. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne, Nucleic Acids Res, 2000, 28, 235-242.
41. N. E. Chayen, Prog Biophys Mol Biol, 2005, 88, 329-337.
42. D. Stock, O. Perisic and J. Löwe, Prog Biophys Mol Biol, 2005, 88, 311-327.
43. J. Duus, C. H. Gotfredsen and K. Bock, Chem Rev, 2000, 100, 4589-4614.
44. M. Frank and S. Schloissnig, Cell Mol Life Sci, 2010, 67, 2749-2772.
45. J. Jiménez-Barbero, M. D. Díaz and P. M. Nieto, Anticancer Agents Med Chem, 2008, 8, 52-63.
46. V. Roldós, F. J. Cañada and J. Jiménez-Barbero, ChemBioChem, 2011, 12, 990-1005.
47. F. Nicotra, L. Cipolla, B. La Ferla, C. Airoldi, C. Zona, A. Orsato, N. Shaikh and L. Russo, J Biotechnol, 2009, 144, 234-241.
48. T. R. Rudd, E. A. Yates and M. Hricovíni, Curr Med Chem, 2009, 16, 4750-4766.
49. C. Jones, J Pharm Biomed Anal, 2005, 38, 840-850.
50. World Health Organisation, Tech. Rep. Ser. 927, World Health Organisation, 2005.
51. W. A. Bubb, Concepts in Magn Reson A, 2003, 19A, 1-19.
52. H. S. Atreya and T. Szyperski, Methods Enzymol, 2005, 394, 78-108.
53. P. Guntert, Prog Nucl Magn Reson Spectrosc, 2003, 43, 105-125.
54. M. Kainosho, T. Torizawa, Y. Iwashita, T. Terauchi, A. M. Ono and P. Guntert, Nature, 2006, 440, 52-57.
55. D. Malmodin and M. Billeter, Prog Nucl Magn Reson Spectrosc, 2005, 46, 109-129.
56. R. C. Tyler, D. J. Aceti, C. A. Bingman, C. C. Cornilescu, B. G. Fox, R. O. Frederick, W. B. Jeon, M. S. Lee, C. S. Newman, F. C. Peterson, G. N. Phillips, Jr., M. N. Shahan, S. Singh, J. Song, H. K. Sreenath, E. M. Tyler, E. L. Ulrich, D. A. Vinarov, F. C. Vojtik, B. F. Volkman, R. L. Wrobel, Q. Zhao and J. L. Markley, Proteins, 2005, 59, 633-643.
57. T. Lütteke, Beilstein J Org Chem, 2012, 8, 915-929.
58. G. I. Csonka, K. Elias and I. G. Csizmadia, Chem Phys Lett, 1996, 257, 49-60.
59. M. I. Bilan, A. G. Grachev, A. S. Shashkov, N. E. Nifantiev and A. I. Usov, Carbohydr Res, 2006, 341, 238-245.
60. M. E. Elyashberg, A. J. Williams and G. E. Martin, Prog Nucl Magn Reson Spectrosc, 2008, 53, 1-104.
61. M. W. Lodewyk, M. R. Siebert and D. J. Tantillo, Chem Rev, 2011, 112, 1839-1862.
62. L. B. Casabianca and A. C. de Dios, J Chem Phys, 2008, 128, 052201.
63. T. Helgaker, M. Jaszuński and M. Pecul, Prog Nucl Magn Reson Spectrosc, 2008, 53, 249-268.
64. T. Helgaker, M. Jaszuński and K. Ruud, Chem Rev, 1999, 99, 293-352.
65. J. Vaara, Phys Chem Chem Phys, 2007, 9, 5399-5418.
66. J. F. G. Vliegenthart, in NMR spectroscopy and computer modeling of carbohydrates: recent advances, eds. J. F. G. Vliegenthart and R. J. Woods, 2006, vol. 930, pp. 1-19.
67. Upstream Solutions. NMR prediction, http://www.upstream.ch/products/nmr.html#Prediction%20Quality, Accessed 2013 May 15.
68. Y. D. Smurnyy, K. A. Blinov, T. S. Churanova, M. E. Elyashberg and A. J. Williams, J Chem Inf Model, 2008, 48, 128-134.
69. M. Elyashberg, K. Blinov, Y. D. Smurnyy, T. S. Churanova and A. Williams, Magn Reson Chem, 2010, 48, 219-229.
70. M. Elyashberg, K. Blinov and A. Williams, Magn Reson Chem, 2009, 47, 371-389.
71. W. Bremser, Anal Chim Acta, 1978, 103, 355-365.
72. R. R. Sasaki and B. A. Lefebvre, Burlington, VT, 2006.
73. Modgraph. NMRPredict, http://www.modgraph.co.uk/product_nmr.htm, Accessed 2013 May 15.
74. A. Williams, B. Lefebvre and R. Sasaki, Putting ACD/NMR predictors to the test, http://web.archive.org/web/20080902230404/http://www.acdlabs.com/products/spec_lab/predict_nmr/chemnmr/, Accessed 2012 Oct 20.
75. ACD/Labs. ACD/NMR databases, http://www.acdlabs.com/products/dbs/nmr_db/, Accessed 2012 Oct 20.
76. MestreLab Research. Mnova NMRPredict, http://mestrelab.com/software/mnova-nmrpredict-desktop/, Accessed 2013 May 15.
77. PerkinElmer. ChemBioOffice, http://www.cambridgesoft.com/Ensemble_for_Chemistry/ChemBioOffice/, Accessed 2013 May 15.


78. N. Haider and W. Robien, NMRPREDICT-Server, http://nmrpredict.orc.univie.ac.at/, Accessed 2013 May 15.
79. V. Schütz, V. Purtuc, S. Felsinger and W. Robien, Fresenius J Anal Chem, 1997, 359, 33-41.
80. Robien, Nachr Chem Tech Lab, 1998, 46, 74-77.
81. W. Bremser and M. Grzonka, Microchim Acta, 1991, 104, 1-6.
82. S. V. Trepalin, A. V. Yarkov, L. M. Dolmatova, N. S. Zefirov and S. A. E. Finch, J Chem Inf Comput Sci, 1995, 35, 405-411.
83. C. Steinbeck, S. Krause and S. Kuhn, J Chem Inf Comput Sci, 2003, 43, 1733-1739.
84. H. Satoh, H. Koshino, K. Funatsu and T. Nakata, J Chem Inf Comput Sci, 2001, 41, 1106-1112.
85. H. Satoh, H. Koshino, J. Uzawa and T. Nakata, Tetrahedron, 2003, 59, 4539-4547.
86. B. P. Kelleher and A. J. Simpson, Environ Sci Technol, 2006, 40, 4605-4611.
87. M. W. I. Schmidt and A. G. Noack, Global Biogeochem Cycles, 2000, 14, 777-793.
88. H. M. Cartwright, in Artificial neural networks: methods and applications, ed. D. J. Livingstone, 2008, vol. 458, pp. 1-13.
89. L. Terfloth and J. Gasteiger, Drug Discov Today, 2001, 6(15) Suppl., S102-S108.
90. J. Zou, Y. Han and S. S. So, in Artificial neural networks: methods and applications, ed. D. J. Livingstone, 2008, vol. 458, pp. 14-22.
91. M. Jalali-Heravi, in Artificial neural networks: methods and applications, ed. D. J. Livingstone, 2008, vol. 458, pp. 78-118.
92. J. P. Radomski, H. van Halbeek and B. Meyer, Nat Struct Biol, 1994, 1, 217-218.
93. J. Aires-de-Sousa, M. C. Hemmer and Gasteiger, Anal Chem, 2002, 74, 80-90.
94. Y. Shen and A. Bax, J Biomol NMR, 2010, 48, 13-22.
95. A. G. Gerbst, A. A. Grachev, N. E. Ustuzhanina, N. E. Nifantiev, A. A. Vyboichtchik, A. S. Shashkov and A. I. Usov, J Carbohyd Chem, 2010, 29, 92-102.
96. Modgraph. Neural Network Prediction, http://www.modgraph.co.uk/product_nmr_network.htm, Accessed 2013 May 15.
97. V. Purtuc, V. Schütz, S. Felsinger and W. Robien, Estimation of 13C-NMR chemical shift values using neural network technology, http://homepage.univie.ac.at/wolfgang.robien/wr_alpha.html, Accessed 2013 May 15.
98. C. Le Bret, SAR QSAR Env Res, 2000, 11, 211-234.
99. J. Meiler, R. Meusinger and M. Will, J Chem Inf Comput Sci, 2000, 40, 1169-1176.
100. J. Meiler and M. Will, J Chem Inf Comput Sci, 2001, 41, 1535-1546.
101. J. H. Holland, Adaptation in natural and artificial systems, MIT Press, Cambridge, MA, USA, 1992.
102. J. Meiler, W. Maier, M. Will and R. Meusinger, J Magn Reson, 2002, 157, 242-252.
103. Y. D. Smurnyy, K. A. Blinov and B. A. Lefebvre, Pacific Grove, CA, 2006.
104. M. K. McIntyre and G. W. Small, Anal Chem, 1987, 59, 1805-1811.
105. K. A. Blinov, Y. D. Smurnyy, T. S. Churanova, M. E. Elyashberg and A. J. Williams, Chemometrics and Intelligent Laboratory Systems, 2009, 97, 91-97.
106. B. E. Mitchell and P. C. Jurs, J Chem Inf Comput Sci, 1996, 36, 58-64.
107. D. L. Clouser and P. C. Jurs, J Chem Inf Comput Sci, 1996, 36, 168-172.
108. R. J. Abraham and M. Mobli, Spectrosc Eur, 2004, 4, 16-23.
109. R. J. Abraham, J. J. Byrne, L. Griffiths and R. Koniotou, Magn Reson Chem, 2005, 43, 611-624.
110. R. Bürgin Schaller, M. E. Munk and E. Pretsch, J Chem Inf Comput Sci, 1996, 36, 239-243.
111. E. Escalante-Sanchez and R. Pereda-Miranda, J Nat Prod, 2007, 70, 1029-1034.
112. G. M. Lipkind, A. S. Shashkov, Y. A. Knirel, E. V. Vinogradov and N. K. Kochetkov, Carbohydr Res, 1988, 175, 59-75.
113. F. V. Toukach and A. S. Shashkov, Carbohydr Res, 2001, 335, 101-114.
114. F. V. Toukach, J Chem Inf Model, 2011, 51, 159-170.
115. F. V. Toukach, Bacterial CSDB: 13C NMR prediction, http://csdb.glycoscience.ru/help/nmr.html, Accessed 2013 May 15.
116. G. Widmalm, Casper, http://www.casper.organ.su.se/casper/, Accessed 2013 May 15.
117. M. Lundborg and G. Widmalm, Anal Chem, 2011, 83, 1514-1517.
118. M. Lundborg, C. Fontana and G. Widmalm, Biomacromolecules, 2011, 12, 3851-3855.
119. P. E. Jansson, L. Kenne and G. Widmalm, J Chem Inf Comput Sci, 1991, 31, 508-516.
120. A. Nahmany, F. Strino, J. Rosen, G. J. Kemp and P. G. Nyholm, Carbohydr Res, 2005, 340, 1059-1064.
121. P. E. Jansson, R. Stenutz and G. Widmalm, Carbohydr Res, 2006, 341, 1003-1010.
122. J. D. Dyekjaer and K. Rasmussen, Mini Rev Med Chem, 2003, 3, 713-717.
123. E. Fadda and R. J. Woods, Drug Discov Today, 2010, 15, 596-609.
124. A. G. Gerbst, A. A. Grachev, A. S. Shashkov and N. E. Nifant'ev, Rus J Bioorg Chem, 2007, 33, 24-37.
125. R. J. Woods and M. B. Tessier, Curr Opin Struct Biol, 2010, 20, 575-583.
126. U. Burkert and N. Allinger, Molecular mechanics, American Chemical Society, Washington, DC, 1982.
127. Wavefunction, Inc. Spartan software, http://www.wavefun.com/products/spartan.html, Accessed 2013 May 15.
128. Schrödinger. MacroModel, http://www.schrodinger.com/products/14/11/, Accessed 2013 May 15.
129. F. Mohamadi, N. G. J. Richard, W. C. Guida, R. Liskamp, M. Lipton, C. Caufield, G. Chang, T. Hendrickson and W. C. Still, J Comput Chem, 1990, 11, 440-467.
130. D. Paschek and A. Geiger, MOSCITO 4 edn., Department of Physical Chemistry, University of Dortmund, Dortmund, Germany, 2002.
131. Moscito, 139.30.122.11/MOSCITO/, Accessed 2013 May 15.
132. M. Möllhoff and U. Sternberg, J Mol Mod, 2001, 7, 90-102.
133. U. Sternberg, F. T. Koch and P. Losso, COSMOS. Computer-Simulation von Molekül-Strukturen, http://www.cosmos-software.de/, Accessed 2013 May 15.


134. _, Gaussian Inc. Gaussian,

http://www.gaussian.com/g_prod/g09.htm, Accessed 2013 May 15.

135. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A.

Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N.

Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. 5

Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A.

Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J.

Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai,

M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, C. Adamo,

J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, 10

R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma,

G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S.

Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A.

D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A.

G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. 15

Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T.

Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M.

Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C.

Gonzalez and J. A. Pople, Gaussian Inc., Wallingford CT., Gaussian

03, Revision C.02. edn., 2004. 20

136. M. S. Gordon and M. W. Schmidt, in Theory and applications of the

computational chemistry: the first 40 years, eds. C. E. Dykstra, G.

Frenkling, K. S. Kim and G. Scuseria, Elsevier Science, Amsterdam,

2005, pp. 1167-1189.

137. M. Gorgon, Gamess, http://www.msg.ameslab.gov/GAMESS/, 25

Accessed 2013 May 15.

138. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S.

Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su,

T. L. Windus, M. Dupuis and J. A. Montgomery, J Comput Chem,

1993, 14, 1347-1363. 30

139. _, Hypercube, Inc. HyperChem,

http://www.hyper.com/Products/tabid/354/Default.aspx, Accessed

2013 May 15.

140. M. Froimowitz, Biotechniques, 1993, 14, 1010-1013.

141. S. A. Adcock and J. A. McCammon, Chem Rev, 2006, 106, 1589-35

1615.

142. L. M. Kroon-Batenburg, J. Kroon and B. R. Leeflang, Carbohydr

Res, 1993, 245, 21-42.

143. J. Landstrom and G. Widmalm, Carbohydr Res, 2010, 345, 330-333.

144. G. Widmalm, R. A. Byrd and W. Egan, Carbohydr Res, 1992, 229, 40

195-211.

145. Y. Sugita and Y. Okamoto, Chem Phys Lett, 2000, 329, 261-270.

146. S. Re, W. Nishima, N. Miyashita and Y. Sugita, Biophys Rev, 2012,

4, 179-187.

147. J. B. Foresman and A. E. Frisch, Exploring chemistry with electronic 45

structure methods, 2nd ed, Gaussian Inc., 1996.

148. T. J. Rutherford, J. Wilkie, C. Q. Vu, K. D. Schnackerz, M. K.

Jacobson and D. Gani, Nucleosides Nucleotides Nucleic Acids, 2001,

20, 1485-1495.

149. M. Rahal-Sekkal, N. Sekkal, D. C. Kleb and P. Bleckmann, J Comput 50

Chem, 2003, 24, 806-818.

150. E. P. O’Brien and G. Moyna, Carbohydr Res, 2004, 339, 87-96.

151. I. Sergeev and G. Moyna, Carbohydr Res, 2005, 340, 1165-1174.

152. C. W. Swalina, R. J. Zauhar, M. J. DeGrazia and G. Moyna, J Biomol

NMR, 2001, 21, 49-61. 55

153. J. Stewart, J Mol Model, 2004, 13, 1173-1213.

154. P. J. Madeira, N. M. Xavier, A. P. Rauter and M. H. Florêncio, J

Mass Spectrom, 2010, 45, 1167-1178.

155. U. Sternberg, J Mol Phys, 1988, 63, 249-267.

156. U. Sternberg and W. Priess, J Magn Reson, 1997, 125, 8-19. 60

157. D. Sebastiani, G. Goward, I. Schnell and M. Parrinello, Comput Phys

Commun, 2002, 147, 707-710.

158. U. Sternberg, F. T. Koch, W. Priess and R. Witter, Cellulose, 2003,

10, 189-199.

159. F. Casset, A. Imberty, C. Herve du Penhoat, J. Koca and S. Perez, J 65

Mol Struct, 1997, 395-396, 211-224.

160. _, Serena Software. PCModel, http://www.serenasoft.com/, Accessed

2013 May 15.

161. _, Tinker Molecular modelling, http://dasher.wustl.edu/ffe/, Accessed

2013 May 15. 70

162. N. L. Allinger, Y. H. Yuh and J. H. Lii, J Am Chem Soc, 1989, 1,

8551-8566.

163. A. Hocquet and M. Langgård, J Mol Model, 1998, 4, 94-112.

164. B. R. R. Brooks, C. L. L. Brooks, A. D. D. Mackerell, L. Nilsson, R.

J. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, 75

A. Caflisch, L. Caves, Q. Cui, A. R. R. Dinner, M. Feig, S. Fischer, J.

Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V.

Ovchinnikov, E. Paci, R. W. W. Pastor, C. B. B. Post, J. Z. Z. Pu, M.

Schaefer, B. Tidor, R. M. M. Venable, H. L. L. Woodcock, X. Wu,

W. Yang, D. M. M. York and M. Karplus, J Comput Chem, 2009, 30, 80

1545-1614.

165. B. Hess, C. Kutzner, D. Van Der Spoel and E. Lindahl, J Chem

Theory Comput, 2008, 4, 435-447.

166. D. Van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and

H. J. Berendsen, J Comput Chem, 2005, 26, 1701-1718. 85

167. B. R. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S.

Swaminathan and M. Karplus, J Comput Chem, 1983, 4, 187-217.

168. A. D. MacKerell, J. N. Banavali and N. Foloppe, Biopolymers, 2001,

56, 257-265.

169. A. D. MacKerell, Jr., B. Brooks, C. L. Brooks, III, L. Nilsson, B. 90

Roux, Y. Won and M. Karplus, in The encyclopedia of computational

chemistry, ed. P. V. R. Schleyer, John Wiley & Sons, Chichester,

1998, vol. 1, pp. 271-277.

170. O. Guvench, S. N. Greene, G. Kamath, J. W. Brady, R. M. Venable,

R. W. Pastor and A. D. Mackerell Jr, J Comput Chem, 2008, 29, 95

2543-2564.

171. O. Guvench, E. Hatcher, R. M. Venable, R. W. Pastor and A. D.

MacKerell, J Chem Theory Comput, 2009, 5, 2353-2370.

172. E. R. Hatcher, O. Guvench and A. D. MacKerell Jr, J Chem Theory

Comput, 2009, 5, 1315-1327. 100

173. R. Eklund and G. Widmalm, Carbohydr Res, 2003, 338, 393-398.

174. R. U. Lemieux, K. Bock, L. T. J. Delbaere, S. Koto and V. S. Rao,

Can J Chem, 1980, 58, 631-653.

175. M. L. C. E. Kouwijzer and P. D. J. Grootenhuis, J Phys Chem, 1995,

99, 13426-13436. 105

176. S. N. Ha, A. Giammona, M. Field and J. W. Brady, Carbohydr Res,

1988, 180, 207-211.

177. R. Palma, P. Zuccato, M. E. Himmel, G. Liang and J. W. Brady, in

Glycosyl hydrolases in biomass conversion, eds. M. E. Himmel, J. O.

Page 40: Simulation of NMR observables of carbohydrates - … · Simulation of NMR observables of carbohydrates ... plays a key role in primary structural elucidation of new ... and hydroxymethyl

40 | Chemical Society Reviews, 2013, 0, 00–00 This journal is © The Royal Society of Chemistry 2013

Baker and J. N. Saddler, American Chemical Society, Washington,

DC, 2001, vol. 769, pp. 112-130.

178. _, AMBER Home page, http://ambermd.org/, Accessed 2013 May 15.

179. D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M.

Merz, Jr., A. Onufriev, C. Simmerling, B. Wang and R. Woods, J 5

Comput Chem, 2005, 26, 1668-1688.

180. K. N. Kirschner, A. B. Yongye, S. M. Tschampel, J. González-

Outeiriño, C. R. Daniels, B. L. Foley and R. J. Woods, J Comput

Chem, 2008, 29, 622-655.

181. R. J. Woods, R. A. Dwek, C. J. Edge and B. Fraser-Reid, J Phys 10

Chem, 1995, 99, 3832-3846.

182. A. P. Eichenberger, J. R. Allison, J. Dolenc, D. P. Geerke, B. A. C.

Horta, K. Meier, C. Oostenbrink, N. Schmid, D. Steiner, D. Wang

and W. F. van Gunsteren, J Chem Theory Comput, 2011, 7, 3379-

3390. 15

183. A. P. E. Kunz, J. R. Allison, D. P. Geerke, B. A. C. Horta, P. H.

Hünenberger, S. Riniker, N. Schmid and W. F. van Gunsteren, J

Comput Chem, 2012, 33, 340-353.

184. N. Schmid, C. D. Christ, M. Christen, A. P. Eichenberger and W. F.

van Gunsteren, Comput Phys Commun, 2012, 183, 890-903. 20

185. M. Christen, P. H. Hünenberger, D. Bakowies, R. Baron, R. Bürgi, D.

P. Geerke, T. N. Heinz, M. A. Kastenholz, V. Kräutler, C.

Oostenbrink, C. Peter, D. Trzesniak and W. F. van Gunsteren, J

Comput Chem, 2005, 26, 1719-1751.

186. S. A. H. Spieser, J. Albert van Kuik, L. M. J. Kroon-Batenburg and J. 25

Kroon, Carbohydr Res, 1999, 322, 264-273.

187. R. D. Lins and P. H. Hünenberger, J Comput Chem, 2005, 26, 1400-

1412.

188. W. Damm, A. Frontera, J. Tirado-Rives and W. L. Jorgensen, J

Comput Chem, 1997, 18, 1955-1970. 30

189. M. Kuttel, J. W. Brady and K. J. Naidoo, J Comput Chem, 2002, 23, 1236-1243.

190. S. J. Weiner, P. A. Kollman, D. T. Nguyen and D. A. Case, J Comput Chem, 1986, 7, 230-252.

191. J. W. Ponder and D. A. Case, Adv Prot Chem, 2003, 66, 27-85.

192. _, Accelrys, Inc. InsightII, http://lms.chem.tamu.edu/insightII.html, Accessed 2013 May 15.

193. S. W. Homans, Biochemistry, 1990, 29, 9110-9118.

194. R. Witter, U. Sternberg, S. Hesse, T. Kondo, F. T. Koch and A. S. Ulrich, Macromolecules, 2006, 39, 6125-6132.

195. A. D. Becke, Phys Rev A, 1988, 38, 3098-3100.

196. C. Lee, W. Yang and R. G. Parr, Phys Rev B, 1988, 37, 785-789.

197. J. P. Perdew and Y. Wang, Phys Rev B, 1992, 45, 13244-13249.

198. F. Jensen, Introduction to computational chemistry, 2nd ed, John Wiley & Sons Ltd., 2007.

199. W. Koch and M. C. Holthausen, A chemist's guide to density functional theory, 2nd ed, John Wiley & Sons Ltd., 2001.

200. E. G. Lewars, Computational chemistry. Introduction to the theory and applications of molecular and quantum mechanics, 2nd ed, Springer Science+Business Media B.V., 2011.

201. M. Orio, D. A. Pantazis and F. Neese, Photosynth Res, 2009, 102, 443-453.

202. D. Sholl and J. A. Steckel, Density functional theory: a practical introduction, Wiley-Interscience, 2009.

203. Y. Zhao and D. G. Truhlar, Acc Chem Res, 2008, 41, 157-167.

204. Y. Zhao and D. G. Truhlar, Theor Chem Acc, 2008, 120, 215-241.

205. M. E. Casida and M. Huix-Rotllant, Annu Rev Phys Chem, 2012, 63, 287-323.

206. M. A. Marques and E. K. Gross, Annu Rev Phys Chem, 2004, 55, 427-455.

207. R. Ditchfield, Mol Phys, 1974, 27, 789-807.

208. G. Schreckenbach and T. Ziegler, J Phys Chem, 1995, 99, 606-611.

209. W. Kutzelnigg, U. Fleischer and M. Schindler, in NMR basic principles and progress, Springer Verlag, Berlin/Heidelberg, 1991, vol. 213, pp. 165-262.

210. A. E. Hansen and T. D. Bouman, J Chem Phys, 1985, 82, 5035-5047.

211. M. Schindler and W. Kutzelnigg, J Chem Phys, 1982, 76, 1919-1933.

212. V. G. Malkin, O. L. Malkina, M. E. Casida and D. R. Salahub, J Am Chem Soc, 1994, 116, 5898-5908.

213. C. Bonhomme, C. Gervais, F. Babonneau, C. Coelho, F. Pourpoint, T. Azaïs, S. E. Ashbrook, J. M. Griffin, J. R. Yates, F. Mauri and C. J. Pickard, Chem Rev, 2012, 112, 5733-5779.

214. C. J. Pickard and F. Mauri, Phys Rev B, 2001, 63, 245101.

215. J. P. Perdew, K. Burke and M. Ernzerhof, Phys Rev Lett, 1996, 77, 3865-3868.

216. T. W. Keal and D. J. Tozer, J Chem Phys, 2004, 121, 5654-5660.

217. J. Kongsted, K. Aidas, K. V. Mikkelsen and S. P. A. Sauer, J Chem Theory Comput, 2008, 4, 267-277.

218. K. Wolinski, J. F. Hinton and P. Pulay, J Am Chem Soc, 1990, 112, 8251-8260.

219. K. Friedrich, G. Seifert and G. Grossmann, Z Phys D, 1990, 17, 45-46.

220. J. R. Cheeseman, G. W. Trucks, T. A. Keith and M. J. Frisch, J Chem Phys, 1996, 104, 5497-5509.

221. J. H. Lii, B. Ma and N. L. Allinger, J Comput Chem, 1999, 20, 1593-1603.

222. C. Ochsenfeld, Chem Phys Lett, 2000, 327, 216-223.

223. G. E. Scuseria, J Phys Chem A, 1999, 103, 4782-4790.

224. C. Ochsenfeld, J. Kussmann and F. Koziol, Angew Chem Int Ed, 2004, 43, 4485-4489.

225. T. H. Sefzik, D. Turco, R. J. Iuliucci and J. C. Facelli, J Phys Chem A, 2005, 109, 1180-1187.

226. M. Tafazzoli and M. Ghiasi, Carbohydr Polym, 2009, 78, 10-15.

227. T. Gregor, F. Mauri and R. Car, J Chem Phys, 1999, 111, 1815-1822.

228. T. Helgaker, S. Coriani, P. Jørgensen, K. Kristensen, J. Olsen and K. Ruud, Chem Rev, 2012, 112, 543-631.

229. R. Abraham and M. Mobli, Modelling 1H NMR Spectra of Organic Compounds: Theory, Applications, and NMR Prediction Software, Wiley, NY, 2008.

230. H. Lin and D. G. Truhlar, Theor Chem Acc, 2007, 117, 185-199.

231. A. Lodola, C. J. Woods and A. J. Mulholland, Ann Rep Comput Chem, 2008, 4, 155-169.

232. H. M. Senn and W. Thiel, in Atomistic approaches in modern biology, ed. M. Reiher, Springer, Berlin, 2007, vol. 268, pp. 173-290.

233. T. Vreven and K. Morokuma, Ann Rep Comput Chem, 2006, 2, 35-51.

234. M. Svensson, S. Humbel, R. D. J. Froese, T. Matsubara, S. Sieber and K. Morokuma, J Phys Chem, 1996, 100, 19357-19363.

235. P. B. Karadakov, Annu Rep Prog Chem C, 2001, 97, 61-90.

236. P. A. Belyakov and V. P. Ananikov, Russ Chem Bull Int Ed, 2011, 60, 2626.

237. P. B. Karadakov and K. Morokuma, Chem Phys Lett, 2000, 317, 589-596.

238. T. Ishida, J Phys Chem B, 2010, 114, 3950-3964.

239. Q. Cui and M. Karplus, J Phys Chem B, 2000, 104, 3721-3743.

240. M. Hricovíni, J Phys Chem B, 2011, 115, 1503-1511.

241. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, J Chem Phys, 1983, 79, 926-935.

242. W. L. Jorgensen and J. Tirado-Rives, Proc Natl Acad Sci USA, 2005, 102, 6665-6670.

243. G. Barone, D. Duca, A. Silvestri, L. Gomez-Paloma, R. Riccio and G. Bifulco, Chem Eur J, 2002, 8, 3240-3245.

244. M. Pavone, G. Brancato, G. Morelli and V. Barone, ChemPhysChem, 2006, 7, 148-156.

245. J. González-Outeiriño, K. N. Kirschner, S. Thobhani and R. J. Woods, Can J Chem, 2006, 84, 569-579.

246. S. Miertus and J. Tomasi, Chem Phys, 1982, 65, 239-245.

247. M. Cossi, N. Rega, G. Scalmani and V. Barone, J Comput Chem, 2003, 24, 669-681.

248. V. Barone, M. Cossi and J. Tomasi, J Comput Chem, 1998, 19, 404-417.

249. B. Mennucci, J. Tomasi, R. Cammi, J. R. Cheeseman, M. J. Frisch, F. J. Devlin, S. Gabriel and P. J. Stephens, J Phys Chem A, 2002, 106, 6102-6113.

250. A. V. Marenich, C. J. Cramer and D. G. Truhlar, J Phys Chem B, 2009, 113, 6378-6396.

251. A. V. Marenich, R. M. Olson, C. P. Kelly, C. J. Cramer and D. G. Truhlar, J Chem Theory Comput, 2007, 3, 2011-2033.

252. C. J. Cramer and D. G. Truhlar, Acc Chem Res, 2008, 41, 760-768.

253. A. Klamt and G. Schüürmann, J Chem Soc Perkin Trans 2, 1993, 799-805.

254. A. Bagno, F. Rastrelli and G. Saielli, J Org Chem, 2007, 72, 7373-7381.

255. V. Sychrovský, B. Schneider, P. Hobza, L. Zídek and V. Sklenár, Phys Chem Chem Phys, 2003, 5, 734-739.

256. M. S. Lee, F. R. Salsbury and M. A. Olson, J Comput Chem, 2004, 25, 1967-1978.

257. R. I. Maurer and C. A. Reynolds, J Comput Chem, 2004, 25, 627-631.

258. J. Tomasi, B. Mennucci and R. Cammi, Chem Rev, 2005, 105, 2999-3094.

259. J. C. Facelli, Prog Nucl Magn Reson Spectrosc, 2011, 58, 176-201.

260. M. U. Roslund, P. Taehtinen, M. Niemitz and R. Sjoeholm, Carbohydr Res, 2008, 343, 101-112.

261. _, TURBOMOLE GmbH. Program Package for ab initio Electronic Structure Calculations, http://www.turbomole-gmbh.com/, Accessed 2013 May 15.

262. R. Ahlrichs, M. Bär, M. Häser, H. Horn and C. Kölmel, Chem Phys Lett, 1989, 162, 165-169.

263. G. A. Rickard, P. B. Karadakov, G. A. Webb and K. Morokuma, J Phys Chem A, 2003, 107, 292-300.

264. T. Kupka, G. Pasterna, P. Lodowski and W. Szeja, Magn Reson Chem, 1999, 37, 421-426.

265. S. Suzuki, F. Horii and H. Kurosu, J Mol Struct, 2009, 921, 219-226.

266. S. Khodaei, N. L. Hadipour and M. R. Kasaai, Carbohydr Res, 2007, 342, 2396-2403.

267. M. D. Esrafili, F. Elmi and N. L. Hadipour, J Phys Chem A, 2007, 111, 963-970.

268. E. Chelmecka, K. Pasterny, M. Gawlik-Jedrysiak, W. Szeja and R. Wrzalik, J Mol Struct, 2007, 834-836, 498-507.

269. R. K. Raju, A. Ramraj, M. Vincent, I. Hillier and N. Burton, Phys Chem Chem Phys, 2008, 10, 6500-6508.

270. K. Paradowska, T. Gubica, A. Temeriusz, M. K. Cyranski and I. Wawer, Carbohydr Res, 2008, 343, 2299-2307.

271. _, CASTEP Home page, http://www.castep.org/, Accessed 2013 May 15.

272. S. J. Clark, M. D. Segall, C. J. Pickard, P. J. Hasnip, M. J. Probert, K. Refson and M. C. Payne, Z Kristallogr, 2005, 220, 567-570.

273. M. D. Segall, P. J. D. Lindan, M. J. Probert, C. J. Pickard, P. J. Hasnip, S. J. Clark and M. C. Payne, J Phys Condens Matter, 2002, 14, 2717-2744.

274. M. Kibalchenko, D. Lee, L. Shao, M. C. Payne, J. J. Titman and J. R. Yates, Chem Phys Lett, 2010, 498, 270-276.

275. M. M. Reichvilser, C. Heinzl and P. Klufers, Carbohydr Res, 2010, 345, 498-502.

276. S. Taubert, H. Konschin and D. Sundholm, Phys Chem Chem Phys, 2005, 7, 2561-2569.

277. A. Bagno, F. Rastrelli and G. Saielli, Magn Reson Chem, 2008, 46, 518-534.

278. V. Sychrovsky, N. Muller, B. Schneider, V. Smrecki, V. Spirko, J. Sponer and L. Trantirek, J Am Chem Soc, 2005, 127, 14663-14667.

279. R. B. Kasat, N. H. Wang and E. I. Franses, Biomacromolecules, 2007, 8, 1676-1685.

280. _, deMon. A software package for density functional theory (DFT) calculations, http://www.demon-software.com/public_html/program.html, Accessed 2013 May 15.

281. D. R. Salahub, J. Weber, A. Goursot, A. M. Köster and A. Vela, in Theory and applications of computational chemistry: the first 40 years, eds. C. E. Dykstra, G. Frenking, K. S. Kim and G. Scuseria, Elsevier Science, Amsterdam, 2005, pp. 1079-1097.

282. A. St-Amant and D. R. Salahub, Chem Phys Lett, 1990, 169, 387-392.

283. M. Hricovíni, O. L. Malkina, L. Bizik, T. Nagy and V. G. Malkin, J Phys Chem A, 1997, 101, 1756-1762.

284. O. L. Malkina, M. Hricovíni, F. Bízik and V. G. Malkin, J Phys Chem A, 2001, 105, 9188-9195.

285. C. A. Stortz, J Comput Chem, 2005, 26, 471-483.

286. D. A. Navarro and C. A. Stortz, Carbohydr Res, 2008, 343, 2292-2298.

287. Y. Nishida, H. Ohrui and H. Meguro, Tetrahedron Lett, 1984, 25, 1575-1578.

288. F. Horii, A. Hirai and R. Kitamaru, Polymer Bull, 1983, 10, 357-361.

289. M. Barfield and S. H. Yamamura, J Am Chem Soc, 1990, 112, 4747-4758.

290. J. C. Corchado, M. L. Sánchez and M. A. Aguilar, J Am Chem Soc, 2004, 126, 7311-7319.

291. S. Grimme, J Comput Chem, 2004, 25, 1463-1473.

292. M. C. Fernandez-Alonso, F. J. Canada, J. Jimenez-Barbero and G. Cuevas, J Am Chem Soc, 2005, 127, 7379-7386.

293. A. G. Evdokimov, J. M. L. Martin and Kalb, J Phys Chem A, 2000, 104, 5291-5297.

294. _, Tripos. Sybyl, http://www.jprtechnologies.com.au/tripos/discovery-informatics/sybyl/, Accessed 2013 May 15.

295. _, Wolfram. Mathematica, http://www.wolfram.com/mathematica/, Accessed 2013 May 15.

296. T. Mori, E. Chikayama, Y. Tsuboi, N. Ishida, N. Shisa, Y. Noritake, S. Moriya and J. Kikuchi, Carbohydr Polym, 2012, 90, 1197-1203.

297. G. Kresse and J. Furthmüller, Phys Rev B, 1996, 54, 11169-11186.

298. J. Kubicki, M.-A. Mohamed and H. Watts, Cellulose, 2013, 20, 9-23.

299. M. D. Esrafili and H. Ahmadin, Carbohydr Res, 2012, 347, 99-106.

300. _, NERSC. PARATEC code, https://www.nersc.gov/users/software/applications/materials-science/paratec/, Accessed 2013 May 15.

301. B. G. Pfrommer, J. Demmel and H. Simon, J Comput Phys, 1999, 150, 287-298.

302. J. R. Yates, T. N. Pham, C. J. Pickard, F. Mauri, A. M. Amado, A. M. Gil and S. P. Brown, J Am Chem Soc, 2005, 127, 10216-10220.

303. R. Lefort, P. Bordat, A. Cesaro and M. Descamps, J Chem Phys, 2007, 126, 014510.

304. M. Dupuis, A. Marquez and E. R. Davidson, HONDO, Quantum Chemistry Program Exchange (QCPE), Indiana University, Bloomington, IN 47405.

305. L. Shao, J. R. Yates and J. J. Titman, J Phys Chem A, 2007, 111, 13126-13132.

306. S. Bekiroglu, A. Sandstrom, L. Kenne and C. Sandstrom, Org Biomol Chem, 2004, 2, 200-205.

307. M. C. Jarvis, Carbohydr Res, 1994, 259, 311-318.

308. P. J. C. Smith and S. Arnott, Acta Cryst A, 1978, 34, 3-11.

309. C. Yamamoto and Y. Okamoto, Bull Chem Soc Jpn, 2004, 77, 227-257.

310. H. Le, J. G. Pearson, A. C. de Dios and E. Oldfield, J Am Chem Soc, 1995, 117, 3800-3807.

311. D. B. Chesnut and K. D. Moore, J Comput Chem, 1989, 10, 648-659.

312. P. Langan, Y. Nishiyama and H. Chanzy, Biomacromolecules, 2001, 2, 410-416.

313. Y. Nishiyama, P. Langan and H. Chanzy, J Am Chem Soc, 2002, 124, 9074-9082.

314. D. B. Chesnut, B. E. Rusiloski, K. D. Moore and D. A. Egolf, J Comput Chem, 1993, 14, 1364-1375.

315. I. Ivarsson, C. Sandström, A. Sandström and L. Kenne, J Chem Soc Perkin Trans 2, 2000, 2147-2152.

316. B. Coxon, in Adv Carbohydr Chem Biochem, Elsevier, 2009, vol. 62, pp. 17-82.

317. N. Troullier and J. L. Martins, Phys Rev B, 1991, 43, 1993-2006.

318. K. Bock and H. Thøgersen, Annu Rep NMR Spectrosc, 1982, 13, 1-57.

319. C. A. G. Haasnoot, F. A. A. M. de Leeuw and C. Altona, Tetrahedron, 1980, 36, 2783-2792.

320. F. Cloran, I. Carmichael and A. S. Serianni, J Phys Chem A, 1999, 103, 3783-3795.

321. N. F. Ramsey, Phys Rev, 1953, 91, 303-307.

322. M. Pecul and J. Sadlej, in Computational chemistry: reviews of current trends, ed. J. Leszczynski, World Scientific, 2003, vol. 8, pp. 131-160.

323. W. Deng, J. R. Cheeseman and M. J. Frisch, J Chem Theory Comput, 2006, 2, 1028-1037.

324. B. Bose, S. Zhao, R. Stenutz, F. Cloran, P. Bondo, G. Bondo, B. Hertz, I. Carmichael and A. S. Serianni, J Am Chem Soc, 1998, 120, 11158-11173.

325. T. Helgaker, O. B. Lutnæs and M. Jaszuński, J Chem Theory Comput, 2007, 3, 86-94.

326. T. Helgaker and M. Pecul, in Calculation of NMR and EPR parameters: theory and applications, eds. M. Kaupp, M. Bühl and V. G. Malkin, Wiley-VCH, Weinheim, 2004, p. 101.

327. F. Jensen, J Chem Theory Comput, 2006, 2, 1360-1369.

328. M. Karplus, J Am Chem Soc, 1963, 85, 2870-2871.

329. M. Hricovíni and F. Bízik, Carbohydr Res, 2007, 342, 779-783.

330. J. Angulo, P. M. Nieto and M. Martín-Lomas, Chem Commun, 2003, 1512-1513.

331. N. S. Gandhi and R. L. Mancera, Carbohydr Res, 2010, 345, 689-695.

332. S. B. Engelsen and S. Perez, J Phys Chem B, 2000, 104, 9301-9311.

333. M. Tafazzoli and M. Ghiasi, Carbohydr Res, 2007, 342, 2086-2096.

334. M. Tafazzoli and M. Ghiasi, J Mol Struct, 2007, 814, 127-130.

335. R. Stenutz, I. Carmichael, G. Widmalm and A. S. Serianni, J Org Chem, 2002, 67, 949-958.

336. M. Tafazzoli, M. Ghiasi and M. Moridi, Spectrochimica Acta Part A, 2008, 70, 350-357.

337. F. Cloran, I. Carmichael and A. S. Serianni, J Am Chem Soc, 2001, 123, 4781-4791.

338. F. Cloran, Y. Zhu, J. Osborn, I. Carmichael and A. S. Serianni, J Am Chem Soc, 2000, 122, 6435-6448.

339. E. Kraka, J. Gräfenstein, J. Gauss, F. Reichel, L. Olsson, Z. Konkoli and D. Cremer, Program package COLOGNE 99, Göteborg University, Göteborg, Sweden, 1999.

340. P. Bour, I. Raich, J. Kaminsky, R. Hrabal, J. Cejka and V. Sychrovsky, J Phys Chem A, 2004, 108, 6365-6372.

341. M. Mobli and A. Almond, Org Biomol Chem, 2007, 5, 2243-2251.

342. Y. Zhao, N. E. Schultz and D. G. Truhlar, J Chem Theory Comput, 2006, 2, 364-382.

343. A. S. Serianni, J. Wu and I. Carmichael, J Am Chem Soc, 1995, 117, 8645-8650.

344. I. Tvaroška, K. Mazeau, M. Blanc-Muesser, S. Lavaitte, H. Driguez and F. R. Taravel, Carbohydr Res, 1992, 229, 225-231.

345. K. Bock and C. Pedersen, Carbohydr Res, 1979, 71, 319-321.

346. T. J. Church, I. Carmichael and A. S. Serianni, J Am Chem Soc, 1997, 119, 8946-8964.

347. I. Carmichael, J Phys Chem, 1993, 97, 1789-1792.

348. T. Bandyopadhyay, J. Wu, W. A. Stripe, I. Carmichael and A. S. Serianni, J Am Chem Soc, 1997, 119, 1737-1744.

349. V. Sychrovsky, J. Gräfenstein and D. Cremer, J Chem Phys, 2000, 113, 3530-3547.

350. T. Helgaker, M. Jaszuński, K. Ruud and A. Górska, Theor Chem Acc, 1998, 99, 175-182.

351. O. B. Lutnæs, T. A. Ruden and T. Helgaker, Magn Reson Chem, 2004, 42, S117-S127.

352. C. A. Bush, M. Martin-Pastor and A. Imberty, Annu Rev Biophys Biomol Struct, 1999, 28, 269-293.

353. I. Tvaroška, M. Hricovíni and E. Perakova, Carbohydr Res, 1989, 189, 359-362.

354. N. W. Cheetham, P. Dasgupta and G. E. Ball, Carbohydr Res, 2003, 338, 955-962.

355. F. Cloran, I. Carmichael and A. S. Serianni, J Am Chem Soc, 2000, 122, 396-397.

356. V. Sychrovsky, Z. Vokacova, J. Sponer, N. Spackova and B. Schneider, J Phys Chem B, 2006, 110, 22894-22902.

357. M. L. Munzarová and V. Sklenár, J Am Chem Soc, 2003, 125, 3649-3658.

358. B. Schneider, Z. Morávek and H. M. Berman, Nucleic Acids Res, 2004, 32, 1666-1677.

359. B. Schneider, S. Neidle and H. M. Berman, Biopolymers, 1997, 42, 113-124.

360. R. B. Best, G. E. Jackson and K. J. Naidoo, J Phys Chem B, 2001, 105, 4742-4751.

361. R. B. Best, G. E. Jackson and K. J. Naidoo, J Phys Chem B, 2002, 106, 5091-5098.

362. S. Ilin, C. Bosques, C. Turner and H. Schwalbe, Angew Chem Int Ed Engl, 2003, 42, 1394-1397.

363. S. Ravindranathan, C. H. Kim and G. Bodenhausen, J Biomol NMR, 2003, 27, 365-375.

364. E. Duchardt, C. Richter, O. Ohlenschlager, M. Gorlach, J. Wohnert and H. Schwalbe, J Am Chem Soc, 2004, 126, 1962-1970.

365. S. Letardi, G. La Penna, E. Chiessi, A. Perico and A. Cesàro, Macromolecules, 2002, 35, 286-300.

366. S. Furlan, G. La Penna, A. Perico and A. Cesaro, Macromolecules, 2004, 37, 6197-6209.

367. S. Furlan, G. La Penna, A. Perico and A. Cesaro, Carbohydr Res, 2005, 340, 959-970.

368. M. Zerbetto, D. Kotsyubynskyy, J. Kowalewski, G. Widmalm and A. Polimeno, J Phys Chem B, 2012, 116, 13159-13171.

369. D. Kotsyubynskyy, M. Zerbetto, M. Soltesova, O. Engström, R. Pendrill, J. Kowalewski, G. Widmalm and A. Polimeno, J Phys Chem B, 2012, 116, 14541-14555.

370. A. G. Gerbst, N. E. Ustuzhanina, A. A. Grachev, N. S. Zlotina, E. A. Khatuntseva, D. E. Tsvetkov, A. S. Shashkov, A. I. Usov and N. E. Nifantiev, J Carbohydr Chem, 2002, 21, 313-324.

371. R. E. Schirmer, J. H. Noggle, J. P. Davis and P. A. Hart, J Am Chem Soc, 1970, 92, 3266-3273.

372. A. G. Gerbst, N. E. Ustuzhanina, A. A. Grachev, E. A. Khatuntseva, D. E. Tsvetkov, A. S. Shashkov, A. I. Usov, M. E. Preobrazhenskaya, N. A. Ushakova and N. E. Nifantiev, J Carbohydr Chem, 2003, 22, 109-122.

373. D. A. Cumming and J. P. Carver, Biochemistry, 1987, 26, 6664-6676.

374. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. Pedersen, J Chem Phys, 1995, 103, 8577-8593.

375. _, MestreLab Research. Mspin, http://mestrelab.com/software/mspin/, Accessed 2013 May 15.

376. D. Neuhaus and M. P. Williamson, The nuclear Overhauser effect in structural and conformational analysis, VCH Publishers, New York, NY, 1989.

377. S. A. Smith, T. O. Levante, B. H. Meier and R. R. Ernst, J Magn Reson A, 1994, 106, 75-105.

378. C. D. Blundell, M. A. Reed and A. Almond, Carbohydr Res, 2006, 341, 2803-2815.

379. H. O. Kalinowski, S. Berger and S. Braun, Carbon-13 NMR spectroscopy, John Wiley & Sons Ltd., 1988.

380. D. N. Laikov and Y. A. Ustynyuk, Russ Chem Bull Int Ed, 2005, 54, 820-826.

381. _, ACD/Labs. ACD/NMR predictors, http://www.acdlabs.com/products/adh/nmr/nmr_pred/, Accessed 2013 May 15.

Philip Toukach (Ph.D. 2001, associate professor rank 2010) has been a Senior Scientist at the Zelinsky Institute of Organic Chemistry since 2005 and an Associate Professor at the Moscow Academy of Fine Chemical Technology since 2008. He was a Guest Scientist at the Borstel Biochemical Research Center (2005-2007) and the German Cancer Research Center (2008-2011), and was named International Scientist of the Year (2004). His major scientific interests are carbohydrate databases and NMR-based carbohydrate structure prediction. Further information can be found on his website http://toukach.ru/nmr.htm

Valentine Ananikov (Ph.D. 1999; Habilitation 2003) was appointed Professor and Laboratory Head at the Zelinsky Institute of Organic Chemistry (2005), was elected a Member of the Russian Academy of Sciences (2008), and became Professor at the Chemistry Department of Moscow State University (2012). He received the Russian State Prize for Outstanding Achievements in Science and Technology (2004), an Award of the Science Support Foundation (2005), a Medal of the Russian Academy of Sciences (2000) and the Balandin Prize (2010), and was a Liebig Lecturer of the German Chemical Society (2010). He serves on the International Advisory Boards of Advanced Synthesis & Catalysis, Organometallics and Chemistry – An Asian Journal.