[Interdisciplinary Applied Mathematics] Molecular Modeling and Simulation: An Interdisciplinary...

32
7 Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure Chapter 7 Notation SYMBOL DEFINITION α polynucleotide backbone torsion angle γ polynucleotide backbone torsion angle χ glycosyl (sugar/base) torsion angle Anything that DNA can do, RNA can do better. Susan Gottesman, Cold Spring Harbor symposium, 2006. 7.1 Introduction This chapter builds upon nucleic acid concepts introduced in the prior two chapters to include a description of alternative hydrogen bonding schemes in nucleic acids, non-canonical helical and hybrid structures, DNA mimics, over- stretched and understretched DNA, and RNA structure and folding, including secondary and tertiary-structure RNA modeling. Besides the excellent texts [139, 195, 1080, 1191], the reader is invited to ex- plore the wealth of related structural data on the nucleic acid database NDB T. Schlick, Molecular Modeling and Simulation: An Interdisciplinary Guide, 205 Interdisciplinary Applied Mathematics 21, DOI 10.1007/978-1-4419-6351-2 7, c Springer Science+Business Media, LLC 2010

Transcript of [Interdisciplinary Applied Mathematics] Molecular Modeling and Simulation: An Interdisciplinary...

7Topics in Nucleic Acids Structure:Noncanonical Helices and RNAStructure

Chapter 7 Notation

SYMBOL DEFINITION

α polynucleotide backbone torsion angleγ polynucleotide backbone torsion angleχ glycosyl (sugar/base) torsion angle

Anything that DNA can do, RNA can do better.

Susan Gottesman, Cold Spring Harbor symposium, 2006.

7.1 Introduction

This chapter builds upon nucleic acid concepts introduced in the prior twochapters to include a description of alternative hydrogen bonding schemes innucleic acids, non-canonical helical and hybrid structures, DNA mimics, over-stretched and understretched DNA, and RNA structure and folding, includingsecondary and tertiary-structure RNA modeling.

Besides the excellent texts [139, 195, 1080, 1191], the reader is invited to ex-plore the wealth of related structural data on the nucleic acid database NDB

T. Schlick, Molecular Modeling and Simulation: An Interdisciplinary Guide, 205Interdisciplinary Applied Mathematics 21, DOI 10.1007/978-1-4419-6351-2 7,c© Springer Science+Business Media, LLC 2010

206 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

(ndbserver.rutgers.edu/) and the protein data bank/research collaboratory forstructural bioinformatics (PDB/RCS) resource [128,1365], as well as informationconcerning RNA on the RNA Ontology Consortium (ROC) [735].

As in the previous chapter, we abbreviate base pair and base pairs as bp andbps, respectively.

7.2 Variations on a Theme

As evident from the previous chapter, nucleic acids are extremely versatile. Theclassic right-handed double helical structure described by Watson and Crick isan excellent textbook model. In Nature, however, polynucleotide helices havedeveloped an enormous repertoire of sequence-dependent structures, helicalarrangements, and folding motifs on both the bp and higher level of folding.

7.2.1 Hydrogen Bonding Patterns in Polynucleotides

Hydrogen-bonding variations are important for DNA’s adaptability to base mod-ifications (e.g., methylation, drug or carcinogen binding), bp mutations andmismatches, and various interactions between DNA and proteins and amongpolynucleotide helices.

Classic Watson-Crick (WC)

The classic WC hydrogen-bonding arrangement as shown in Figure 5.3 ofChapter 5 is a particularly beautiful arrangement. It can be appreciated by not-ing that the C1′ (pyrimidine) to C1′ (purine) distance is around 10.6 A in both theAT and GC pairs, in good preparation for helix stacking. The intrinsic energy ofhydrogen bonds — that is, relative to vacuum — is generally in the 3–7 kcal/molrange but only up to 3 kcal/mol in nucleic acids due to geometric constraints.1

Though much weaker than covalent bonds (80–100 kcal/mol) [1191, p. 12–14],the cumulative effect in the double helix is substantial due to the collective im-pact of hydrogen-bonding interactions resulting from cooperative effects [1018].Based on theoretical studies, the strength of the hydrogen-bond energy dependson base composition but is very similar for the two WC bps. The energy of bpstacking, 4–15 kcal/mol, substantially contributes to overall helix stability anddepends on the nucleotide sequence [169].

Yet, hydrogen-bonding patterns in polynucleotides are extremely versatile.There are several nitrogens and oxygens on the bases, allowing various donor andacceptor combinations involving different interface portions of the aromatic rings.

1Estimates of base-pairing energetics in solvated nucleic acids come mainly from theoretical cal-culations [1225]. This is because the resolution of macromolecular thermodynamic measurements intosubcontributions (hydrogen bonding, stacking, and electrostatic forces) remains a challenge, despitenumerous thermodynamic studies on nucleic acid systems [169, 449, 1007].

7.2. Variations on a Theme 207

To appreciate this versatility, we first examine the classic WC pattern in detail.For an AT pair, two hydrogen bonds form between the C4–N3 face of T

and the C6=N1 face of A (see Figure 5.3 of Chapter 5 and Figure 7.1 here).One hydrogen bond involves the sequence O4 (T)· · ·H–N6 (A), i.e., betweenthe thymine carbonyl oxygen O4 attached to ring atom C4 and the adenine N6atom, which is attached to the ring C6 atom. The other hydrogen bond is N3–H(T)· · ·N1 (A), between the thymine ring nitrogen N3 and the adenine N1.

For the GC pair, three hydrogen bonds form between the C4=N3–C2 face ofC and the C6–N1–C2 face of G: N4–H (C)· · ·O6 (G), N3 (C)· · ·H–N1 (G), andO2 (C)· · ·H–N2 (G). All hydrogen-bond lengths in both the AT and GC bps arevery similar, with individual lengths ranging from about 2.85 to 2.95 A betweenthe heavy atoms.

Reverse WC

Reverse WC base-pairing schemes can result when one nucleotide rotates 180o

with respect to its partner nucleotide, as shown in Figure 7.1. Thus, one hydrogenbond from the original WC scheme might be retained, but another can involvea different atom combination. This flip of one bp changes the position of theglycosyl linkage. Hence, an AT pair can more easily accommodate this flip due tonear symmetry about the N3–C6 axis of T. That is, since the ring face C4–N3–C2of T has carbonyl attachments at both C4 and C2, the hydrogen bonding face canbe changed from C4–N3 to C2–N3 (see Figure 7.1). This type of reverse WC basepairing exists in parallel DNA [1080].

Hoogsteen

Hoogsteen bps utilize a different part of the aromatic ring for hydrogen bonding.Specifically, the N7–C5–C6 face of A and G, where N7 is in the 5-memberedring and C6 is in the 6-membered ring of the purine, is used in the Hoogsteenarrangement (see Figure 7.1). This is instead of the C6=N1 face of A and G —both atoms of which are in the 6-membered ring of the fused-ring compound —in combination with the C4–N3 face of T and C — as in WC interactions.

The Hoogsteen arrangement also implies a change in the glycosyl torsion angleχ, from anti to syn, and a shortening of the C1′–C1′ distance, from about 10.5 Ain WC to about 8.5 A in Hoogsteen base pairs (e.g., [12]).

Hoogsteen base pairing helps stabilize structures of highly bent DNA (e.g.,TATA elements [971]), tRNAs, and DNA triplexes where two strands are heldby WC hydrogen bonds and where two strands are stabilized by Hoogsteen-typepairing (see Figure 7.1). They also appear occasionally in complexes of DNA withanti-cancer drugs. Recently, an unusual Hoogsteen A–T bp was discovered in ahigh-resolution complex of unbent DNA bound to 4 MATα2 homeodomains (seeFigure 7.2) [12, 13].

208 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

N

N

O4N

N

N

N

H

N6H

T A

N

N

O2

N

N

N

NN6H

H

N

N

O2 N

NN

N

N6

H

H

N

N

N

NN6

N

N

O4 H

3

45

62

Sugar

H

1

AT

1 2

3

45

6 T

Watson-Crick

6 5

432

1

Reverse Watson-Crick

789A

5

6

1 2

Hoogsteen

Reverse Hoogsteen

4

3

8

7

9

56

4 3

21

AT

5 4

3

21

6 7

98

4

516

2

3

H3C

H

O2

HH

Sugar

Sugar

H

H3C O4

H

Sugar

H

H

H3C

H

O2Sugar

Sugar

H

H

H

Sugar

H

H3C O4

H

H

Sugar

H

H

. . .

. . .

. . .

. . .

. . .

. . .

Triplex

N N

N

N

N6

N

N

H

879

5431

6

2

H3C

A

T

H

H

H

H

H

O22

4

6

5 3

1

N

N

O2H3C T

H

H

42

6 1

35

SugarO4

O4

Sugar

N

NN

N

O6

GH

NH

H

N

N

NN

O6 G

H

N

H

H

N

NN

N

O6

GH

NH

H

N

N

N N

O6G

H

N

H

H

Guanine Quartet

87

95

43

1

6

2

. . .

. . .

. . .. . .

Sugar

Sugar

Sugar

Sugar

16

3

5

24

78

9

1

6 35

2

4

78

9

16

3

5

2 4

78

9

1

635

2

4

78

9

. .

.

. . .

. .

.. . .

. . .

. . .

. . .

. . .

Sugar

WC

Hoog

Figure 7.1. Various hydrogen-bonding schemes: (left) classic Watson-Crick, reverseWatson-Crick, Hoogsteen, and reverse Hoogsteen in base pairs; (right) WC/Hoogsteenin a triplex (from an H-DNA structure, which forms in sequences that have stretches ofpurine followed by stretches of pyrimidines, PDB code 1B4Y) [1290], and patterns in aguanine-quartet quadruplex (or guanine tetraplex, PDB code 1JB7) [569].

Reverse Hoogsteen

Similarly, a reverse Hoogsteen arrangement involves the flipping of one nucleo-tide by 180o with respect to its partner, by analogy to reverse WC, as shown inFigure 7.1.

7.2. Variations on a Theme 209

F

D

EC

G42 A1

C21

T22T22

C GG CC GA TC GA TT AT AT AA TC GT AT AA TA TT AG CT AA TC GA

Hoogsteenbase pair

1

21

42

Figure 7.2. Unusual DNA/protein complex in which a Hoogsteen A–T base pair is foundin unbent DNA. The complex is a high-resolution crystal of four MATα2 homeodomainsbound to DNA [12, 13].

Mismatches and Wobbles

Other bp arrangements involve anti/syn bps between two purines (a mismatch),in which one base adopts an atypical syn glycosyl configuration. This type ofpairing can occur due to protonation of one base (favored at acidic conditions) orketo-enol tautomerizations. (See introduction to the term in Chapter 1, subsectionon the discovery of DNA structure, and Chapter 5, subsection on nitrogenousbases).

Such ionized bases reverse the polarity of the hydrogen-bondingsites and hencelead to various hydrogen-bonding schemes, such as between G+·C (Hoogsteen)or the ionized mispair A+·C (wobble). Wobble pairs are formed when one base

210 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

is shifted in the bp plane relative to its partner due to steric misalignment of thebases. Experimental evidence for bp wobbling has come from tRNA structuralanalysis.

Thus, various non-WC base-pairing schemes can occur when bases are chemi-cally modified, found in the rare enol and imino tautomeric forms, or mismatched,for example. The alternative arrangements for mispaired bases, in particular, canin turn lead to mutations upon DNA replication if not corrected by the elaboraterepair machinery of the cell.

Other Patterns

For further details regarding these and many other base-pairing schemes in nu-cleic acids, see [1191] and [737] for an early encyclopedic compendium ofthe “bewildering” variety of observed canonical and non-canonical nucleic acidarrangements, as compiled from RNA base-pairing interactions. This varietyled to many further works and a proposal for new nomenclature for RNAs[738] which stimulated a large international initiative termed the RNA Ontol-ogy Consortium (ROC) [735]. ROC aims to establish a standard vocabulary forstudying RNA by creating a common system to describe RNA sequences, 2D, and3D structures and by developing software tools that merge sequence and structuraldatabases for RNA for use by the general scientific community. See ROC website(http://roc.bgsu.edu/) for details.

In general, non WC bps are generally lower in stability because they typicallyincur a greater distortion in the sugar/phosphate backbone (assuming that the restof the DNA is in a canonical A, B, Z-type conformation). Though they are moredifficult to accommodate in general, many local alterations in sequence compo-sition, environmental factors, and conformational patterns favor these alternativebase-pairing arrangements.

7.2.2 Hybrid Helical/Nonhelical Forms

Alternative Helical Geometries

Numerous variations are noted in the helical structures of polynucleotides (see[139, 893, 1080], for example). For example, C-DNA, D-DNA, and T-DNAhelices have been defined with 9.3, 8.5, and 8.0 bps/turn, respectively. TheC-DNA motif forms in fibers of low humidity (57–66%), D-DNA is observedin poly(dA)·poly(dT) strands, and T-DNA has been noted in bacteriophages thatcontain modified C residues and glucose derivatives for the sugar attachments.

Within polynucleotide structures, local distortions can also produce hairpins(DNA and RNA strands that fold back on themselves), cruciforms (intrastrandhydrogen bonds between complementary bases, often leaving a few unpairedbases at the hairpin tip), inlcuding bulges and loops (unpaired extrusions)(Figure 7.4). Such motifs are especially important in stabilizing the folding ofsingle-stranded RNA.

7.2. Variations on a Theme 211

DNA Triplexes and Quadruplexes

Though Linus Pauling was ridiculed for his early suggestion of a triple helixstructure for DNA [975], we now recognize the existence or the possibility for for-mation of various DNA triplexes (involving WC and Hoogsteen bps) [421,1008].Other existing or possible forms are quadruplexes (stabilized by hydrogen bond-ing and cations as in guanine-quartets; see Figure 7.1) [961, 968, 998, 1160, 1183,1319, 1320, 1337]; parallel DNA (requiring reverse WC base pairing); DNA ana-logues; and hybrids of RNA, DNA, and other polymers such as oligonucleotidemimics (see below) [712].

Triplex forming oligonucleotides are promising pharmaceutical targets sincethey modulate gene activity in vivo through recognition of duplex DNA [905].Triplexes and other unusual oligonucleotides are good subjects for simulationtechniques since the factors that govern sequence-dependent properties can beexplored [784].

Triple-helical DNA gained wide attention in the mid-1980s when variousgroups demonstrated that homopyrimidine (polynucleotide chains containingpyrimidines, such as poly(dT) or poly(dCT)) and some purine-rich oligomers(such as poly(dA) or poly(dAG)) can form stable and sequence-specific triple-stranded complexes with corresponding sites on duplex DNA; the third strandassociates with the duplex via Hoogsteen pairing (see Figure 7.1) [421].Homopurine/homopyrimidine stretches in supercoiled DNA were also found toadopt an unusual structure which includes a triplex called H-DNA.

Four-stranded DNA structures called G-quadruplexes are believed to play im-portant functional roles in genome maintenance through their stabilization ofchromosome ends or telomeres. Likened to plastic caps firming shoe-lace ends(by Stu Borman, C&EN writer), telomeres are protein/DNA assemblies that con-fine chromosome ends. Such protection from the possible fusing of chromosomesinto one another helps maintain the integrity of the genome in living cells.

Telomere structure is of great biomedical importance since aging and cancercan be associated with malfunctioning telomeres. Thus, drugs targeted totelomeres can affect telomere function in DNA synthesis and transcription.

Telomeric DNA contains short guanine-rich repeats. At the very end of telom-eres, G-quadruplexes are found, elaborately folded stacks of G-quartets (alsoknown as GGGG tetrads) — the planar arrays of hydrogen-bonded guanines asshown in Figure 7.1.

Direct evidence that quadruplexes exist in human cells was provided by Hurleyand coworkers [1183], who reported a chair-shaped quadruplex in a purine-richstrand of DNA in a promoter region of the oncogene c-Myc activated in cancercells. Their work also highlights the potential role of quadruplex-targeted com-pounds as agents to combat disease, for example through control of oncogeneexpression to induce tumor shrinkage [601].

The architecture of G-quartets in G-quadruplexes remains an area of intensestudy. The planar G-quartets, as shown in Figure 7.1, can stack vertically withmonovalent ions (Na+ or K+) sandwiched in between the layers. Wang and

212 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

O

O

P OO

OO

Base O

O

P OS

OO

Base

O

P OO

O

O BaseBaseBase

DNAPhosphate changed(phosphorothioate)

Sugar changed(homo-DNA)

Linkage changed(3'-thioformacetal)

Backbone changed(PNA)

O

S

C

O

CO

Base

Base

O Base NH

N

NH

N

Base

Base

OO

O O

Figure 7.3. Various oligonucleotide analogues that involve modifications of the DNAphosphate unit, sugar, entire phosphodiester link, or the entire sugar/phosphate backbone.

Patel [1337] revealed in 1993 an NMR structure of a G-quadruplex in sodiumsolution. The sequence studied, with three repeats of TTAGGG, has a topol-ogy of G-quartets connected by loops (the TTA sequences that join each GGGseries to the next) at the top and bottom of the quadruplex rather than thesides (see original article for figures). An exciting 2002 crystal structure by theNeidle group [961] of the same single-stranded sequence motif (specifically,TTAGGG2 and TTAGGG4) in the presence of potassium ions reveals a radi-cally different arrangement, with propeller-like connections through the sides ofthe stack (double-chain reversal topology [968]) rather than top and bottom (orone diagonal and two edgewise loops [968]).

Since mammalian cells are likely to have quadruplexes coordinated by potas-sium ions (since K+ concentration exceeds that of Na+ by an order of magnitude),this propeller-form quadruplex structure is of great importance and has clinicalimplications as well [961]. The topologies of quadruplexes and their interactionswith other DNA, RNA, and proteins continue to define an active research fron-tier with significant potential for biomedical applications, for example in cancertherapy. See [1319, 1320] for recent computational studies.

DNA Mimics

A growing number of DNA analogues (or ‘DNA mimics’) is also appearing (seeFigure 7.3) [905]. Analogues can be sought in several ways (see Figure 7.3 andBox 7.1):

• by modifying the phosphate backbone segment (e.g., one P–O link changedto P–S or P–CH3);

• by modifying the sugar (e.g., 6-membered instead of 5-membered ring);

• by modifying the entire phosphodiester linkage of 4 atoms, O3′ throughC5′ (to increase DNA hybrid stability through charge neutrality); or

• by replacing the entire sugar/phosphate backbone, retaining just the basesas in PNA, peptide (or polyamide) nucleic acids.

7.2. Variations on a Theme 213

Such close analogues of oligonucleotides are designed to bind selectively to DNAbps through triple strands or duplexes. Various duplex DNA/RNA hybrids arekey elements in transcription and replication and thus are potential therapeuticagents in anti-sense techniques. Triplex structures are also potential agents forsequence-specific cutting of DNA and drugs (by attacking various disease targetsat the genetic level). Non-complementary or ‘anti-sense’ binding to a sequence ofcomplementary messenger RNA or single-stranded DNA can also be exploited inantisense technology to suppress gene expression.

To be suitable, DNA analogues must also be biologically stable (i.e., resistantto nuclease digestion) and have favorable cellular absorption properties. All issuesof geometry, flexibility, hydrophobicity, and triplex stability must be weighed indesigning DNA mimics with practical utility.

Box 7.1: PNA (Peptide Nucleic Acid)

PNA is an especially interesting oligonucleotide analogue that forms very stable com-plexes with double-stranded DNA [228, 350, 351, 905, 907, 1161]. In studies of evolution,PNA has been proposed as a candidate for the backbone of the first genetic materialthat predates the “RNA world”. While RNA is unstable and difficult to synthesize, thepolymeric constituent of PNA, N-(2-aminoethyl)glycine (or AEG), may spontaneouslypolymerize and is accessible in prebiotic syntheses. No firm evidence, however, existsregarding this hypothesis [662].

The sugar/phosphate backbone of oligonucleotides is replaced in PNA by a pseudopep-tide chain of AEG units with the N-acetic acids of the bases (N9 for purines and N1 forpyrimidines) linked via amide bonds (Figure 7.3). This substitution of the DNA backbonemakes the PNA backbone electrically neutral rather than negatively charged; though thesame number of chemical bonds as found in DNA and RNA exist in PNA (albeit differenttypes), the replacement of the sugar ring by a linear bond sequence introduces additionaltorsional flexibility.

Stable hybrids between PNA and complementary DNA or RNA oligonucleotides formin a sequence-selective manner. Binding of PNA to double-stranded DNA leads not onlyto triple helices but also to P-loops strand-displacement complexes [905]. Model building,molecular mechanics and dynamics calculations, as well as experimental studies haveshown how the replacement of DNA by PNA disrupts the canonical DNA triple helix, bydisplacing the hydrogen-bonded bases away from the global helical axis with respect totheir positions in B-DNA helices, and changes the groove structure substantially [1216].This makes PNA/DNA hybrids (involving both single and double helical DNA) A-likerather than an intermediate between A and B-DNA forms, as is typical for canonical DNAtriplexes. Helix stabilization in PNA/DNA hybrids stems from favorable base pairing,stacking, and (reduced) electrostatic interactions.

PNA molecules have already found practical applications as probes — an alternativeto conventional DNA probes — for the detection of bacteria like Salmonella enterica orStaphylococcus aureus; chemical advantages stem from PNA’s strong resistance to degra-dation by proteases and nucleases in the cell and its salt-independent hybridization kinetics.

214 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

Triplexes and DNA analogues like PNA also have potential as drugs in gene therapy [905].In particular, PNA is quite attractive for drug development, given its high triplex stability.However, given the A-like form of PNA/DNA hybrids, the design of stable B-like hybridsrequires a different chemical approach than that used to construct PNA.

7.2.3 Unusual Forms: Overstretched and Understretched DNA

Single-Molecule Manipulations

Recently, a new experimental achievement of single-molecule observation andmanipulation experiments using laser tweezers or single glass fibers [187, 1196]has been applied to DNA [682,1049], re-invigorating interest in DNA’s structuralversatility [120,182,188,250,990,1203,1233]. Such experiments can also be usedto determine DNA force constants, such as torsional rigidity [182]. It has beenfound that DNA subjected to previously unattainable forces of 10–150 picoNew-tons (pN) displays a highly cooperative, sharp, and reversible transition to a newDNA structure at around 70 pN. This new overstretched helical structure, termedthe S-DNA ladder, has been suggested to have a rise per residue of around 5.8 A(compare to values in Table 5.3 of Chapter 5) and a notable inclination (possiblyas large as 70o) [671], but structural details remain controversial. The transitionunder a force of extension is also affected by changes in the ionic strength of themedium, sequence, and addition of intercalators. Interestingly, DNA can sustainextensions to roughly twice its original length without severe distortions to thebase pairing.

Modeling studies soon followed these experiments to investigate global con-formational features of overstretched DNA duplexes [671, 711] and the localmorphological features associated with the deformation process [678] (seeBox 7.2).

Single-force spectroscopy has also been applied to study the mechanical prop-erties of nucleosome arrays of the chromatin fiber (see Section 6.5), to suggestchromatin organization and to examine the effect of linker histones on fibercompaction [682].

Similar experiments have also been applied to RNA [759, 771, 1452] andproteins [1453].

Biological Relevance and Other Applications

The extreme deformations of DNA are of great interest because flexible DNAmolecules undergo a variety of distortions in biological environments. Notableexamples occur during DNA recombination and cell division. It has been sug-gested, for example, that a helix to ladder transition in DNA near the chromosomalcentromere region may occur during cell division and thus play a regulatoryrole [671]. Intriguingly, the overstretched DNA in recombination filaments of5.1 A per bp is close to that associated with the phase transition observed insingle-molecular micro-manipulation experiments.

7.2. Variations on a Theme 215

It has also been suggested that longitudinal deformations of DNA might beassociated with many DNA/protein binding events, such as DNA binding to theTATA-box binding protein TBP (where DNA is locally compressed, strongly bent,and unwound) and DNA binding to nucleosomes [1050], where variable lengthsof DNA associated with nucleosome wrapping might accommodate sequence andionic variations [678].

More generally, single-molecule biochemistry experiments help investigate theenergetics and dynamics of folding and unfolding, the stability of various 2D and3D intermediates, conformational landscapes, and folding pathways.

Box 7.2: Modeling Overstretched DNA

Results of modeling studies of overstretched DNA — by energy minimization or molec-ular dynamics — are protocol dependent. For example, results and interpretations dependon which end of the DNA is pulled, what minimization method is used, how minimizationis implemented (e.g., how fine the helical rise increments are), what force field is em-ployed, and how solvent is treated.

The minimization work of Lavery and coworkers [711] suggested that the stretched con-formation may be a flat, unwound duplex or a narrow fiber with substantial bp inclination[711]. The molecular dynamics simulations of Konrad and Bolonick [671] reproduced thehelix to ladder transition and analyzed the geometric and energetic properties stabilizingthe S-ladder. The constrained minimization studies of Olson, Zhurkin and coworkers [678]produced a wealth of structural analyses for both compressed and stretched DNA duplexes(from 2 to 7 A per bp) of poly(dA)·poly(dT) and poly(dG)·poly(dC) homopolymers underhigh and low salt conditions. It was found that DNA can stretch to about double, andcompress to half, its length before the internal energy rises sharply. Energy profiles span-ning four families of right-handed structures revealed that DNA extension/compressiondeformations can be related to concerted changes in rise, twist, roll, and slide parameters.The lowest energy configurations correspond to canonical A and B-DNA. These modelsmay be relevant to nucleoprotein filaments between bacterial Rec-A-like proteins andoverstretched, undertwisted DNA [349].

Such fascinating single-molecule biochemistry manipulations have also shown that un-der much smaller forces (<3 pN), a new DNA phase is achieved, with about 2.6 bps/turnand thus about 75% more extended than B-DNA [21]. This new DNA conformation,termed P-DNA (for Pauling, see below), occurs at moderate positive supercoiling formolecules that cannot relieve the torsional stress via writhing (nonplanar bending). (Fornegative supercoiling, DNA denatures in this force range). Intriguingly, some modelingsuggests that such a structure is ‘inside-out’, with bases on the helical exterior and thesugar/phosphate backbone at the center, as Linus Pauling once suggested for the structureof the double helix based on a model with un-ionized bases [975]. Such a conformationaltransition arrives from major rotation of torsion angles (notably α and γ, toward the transconformation) and is compatible with both C2′-endo and C3′-endo sugar puckers. Theintrastrand P–P distances in P-DNA (around 7.5 A) are larger than the correspondingvalues in canonical A and B-DNA (5.8 and 6.6 A, respectively). Evidence shows that thisP-DNA conformation is found in the packed DNA inside some virus complexes, where

216 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

the DNA is constrained by the helical coat protein [289, 775]. Still, the interpretation ofpositively-supercoiled, overstretched DNA as an ‘inside-out’ model remains controversial.

Polymer statistical-mechanics calculations suggest that the experimental data involvedin these force-versus-extension measurements of DNA can be fit to the elastic theory ofthe entropic force required to extend a worm-like chain [190]. However, such studiesof entropic force are only relevant to small fluctuations about the B-DNA equilibriumconformation.

7.3 RNA Structure and Function

7.3.1 DNA’s Cousin Shines

While proteins are household words and DNA is an icon, in science as well asart (for the latter, see [1112] for an overview), their biomolecular cousin, RNA,was largely left behind until recently. Indeed, RNA’s starring role in the cell hasemerged with new discoveries concerning RNA’s vital regulatory roles, as in-troduced in Chapter 1. Namely, our appreciation for RNA has heightened withdiscoveries that RNA molecules are integral components of the cellular machin-ery for protein synthesis and transport, RNA editing, chromosome replication andregulation, catalysis, and many other functions (see Table 7.1 for some of RNA’sdiverse roles).

At a 2006 symposium organized by NCI at Cold Spring Harbor, SusanGottesman suggested on a presentation slide: “Anything that DNA can do, RNAcan do better”. At the same symposium, Gary Ruvkun announced 23 challengesfacing scientists concerning regulatory RNAs. James Watson simply summed theatmosphere: “This is a revolution”. Indeed, we are now discovering that bio-logical and synthetic RNAs prepared in the laboratory perform a broad rangeof functions and have numerous applications. This comes with the discovery notonly of regulatory RNAs but also of many synthetic RNAs developed from in vitroselection experiments that have significantly expanded our knowledge of RNA’srepertoire.

7.3.2 RNA Chains Fold Upon Themselves

Many of the base pairing variations described for DNA in the prior sectionsare common for RNA, the single-stranded polynucleotide chain, which can foldupon itself to form double-stranded segments interspersed with loops. Such pat-terns are governed by the primary sequence and can accommodate a variety ofhydrogen bonding patterns. The double stranded regions (stems) can be imper-fect with bulges, mismatched pairs, and unusual hydrogen bond schemes, asshown in Figure 7.4. The stem/loop structures themselves can fold further, seeking

7.3. RNA Structure and Function 217

Table 7.1. Some classes of non-coding RNA (ncRNA).

RNA Function

transfer RNA (tRNA) protein synthesisribosomal RNA (rRNA) protein synthesisSignal recognition particle (SRP) protein recognitionsmall nucleolar RNA (snoRNA) rRNA modificationmicro RNA (miRNA) translation regulationtransfer-messenger RNA (tmRNA) protein stability in ribosometelomerase RNA replicationguide RNA (gRNA) mRNA editingspliced leader RNA (SL RNA) mRNA trans-splicingsmall nuclear RNA (snRNA) RNA splicinghairpin, hammerhead, and HDV ribozymes self-cleavageGroup I intron self-splicingGroup II intron self-splicingRNase P pre-tRNA processing23S rRNA peptide bond formationG, A, glmS, TPP and other riboswitches gene regulation

favorable stacking, hydrogen bonding, and other interactions between distant part-ners. See [1113] for an introduction to RNA structure and associated biologicalproblems.

The multitude of RNA secondary structures and tertiary interactions are sta-bilized by various double-stranded regions, interior and terminal loops, bulges,K-turns, U-turns, S-turns, A-platforms, tetraloops, and other motifs (e.g., [102,545, 654, 736, 739, 740, 875]).

RNA pseudoknot motifs, which can be considered supersecondary structuralelements, are produced from a special intertwining of secondary structural ele-ments by Watson-Crick base pairing (see Figure 7.5) Namely, pseudoknots formwhen a consecutive single-stranded domain with segments a, a′, b, b′, c, c′ andd (a′, b′, and c′ are connectors) folds to form two hydrogen bonds: a with c, andb with d. This is a pseudoknot rather than a knot since the strands do not actu-ally pass through one another [1367]. See Figure 7.6 for examples of RNAs withpseudoknots.

7.3.3 RNA’s Diversity

The wonderful capacity of RNA to form complex, stable tertiary structures hasbeen exploited by evolution. RNA molecules are integral components of the cel-lular machinery for protein synthesis and transport, RNA editing, chromosome

218 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

Stem I

Stem II

Stem III

C17

Hammerheadribozyme

U AC GG A

A UC G

A GA U

C GG A

C G

G CG C

U A

A UGU

UG

GG

C

C

CC

CA

A

Stem I

Stem II

Stem III

5’3’

5’ 3’

17

Hairpin:SL3 element of Ψ site ofhuman immuno−deficiency virus type−1

G CG C

5’ 3’

G C

A U

U AA U

C G

C GG GG A

Cruciform:4−way branch junction

CU

U

CG

GGC

C

U UC

A G AA G C

U GC

GA

A

3’ 5’

3’5’

3’

5’3’

5’

A

A

BBulge loops & Hairpin:

hairpin ribozymeof tobacco ringspotsatellite virus

Bulgeloops

Hairpin

3’

U

GU U U UC

GA A AA C

C GU A

A AAG

GC A U

GU AC

C GU

C G

G CA U

UG

C GA UC G

AAAA

A

A

AA

G

GC

C

UU

UU

5’3’ 5’

UC

U

A UG CA U

A UC G

C GA U

G UU A

G UG CG A

A UG U

U AC G

U AC G

G U

A GA G

A A

C G

A UC G U

AUGGG

C C

AAA

UUGC

A

AAU

AU

AG

UC

GCU

AG

UA

U

AA

C GC G

U AG U

A

GC

C

AA

G UC G

A U

G CC G

C G

CGG

ACAUGGUCCUA

ACCA

U

U

AA

A

AAA

G

G

G GGG

GG

CC

CC

5’3’

Loops, Bulge Loops& Hairpins:

P4−P6 domain ofthe group I intron ofTetrahymenathermophila

Loops

TetraloopReceptor

Tetraloop P6b

P6a

P6

P4

P5

P5a

P5c

P5b

Figure 7.4. Various nucleotide-chain folding motifs from solved RNA systems: loops,bulge loops, hairpins, hairpin loops, cruciforms (four-way junctions, found in recombi-nation intermediates and synthetic designs), and the complex tertiary contacts found inRNA (illustrated on the hammerhead ribozyme), which compact the secondary-structureelements. Broken connections are used for non-WC bps, including mismatches. The PDBfiles for the structures (clockwise from left) are as follows: tetrahymena intron, 1GID; to-bacco ringspot ribozyme, 1B36; HIV-1 Ψ-stem loop 3, 1A1T; and hammerhead ribozyme,1MME. The tertiary interactions shown for the hammerhead drawing (ellipses or connect-ing line segments) are based on [1154]. The nucleotide C17 shown in space-filling form isthe strand cleavage site of the ribozyme.

7.3. RNA Structure and Function 219

Figure 7.5. RNA pseudoknots have an intertwined form of base pairing, which can beevident from a circular representation of base pairing.

replication and regulation, catalysis and many other functions2. Indeed, somescientists hypothesize that life was based on RNA (‘RNA World’) before theevolution of the modern nucleoprotein universe [454, 906]. RNA/DNA hybriddouble helices also occur in transcription of RNA onto DNA templates and inthe initiation of DNA replication by short RNA segments.

As mentioned in Chapter 1, a highlight of RNA’s functional diversity is itscatalytic capability, as discovered in the 1980s and recognized in the 1989 No-bel Prize in Chemistry to Thomas Cech and Sidney Altman. Such catalysis isperformed by RNA molecules termed ribozymes; hundreds of ribozyme typesare now known in a diverse range of organisms (e.g., [1142, 1208, 1251, 1275],and some have been designed in the laboratory by in vitro selection (e.g., [1228,1245]), as spare in composition as two base building-block units (rather than four)[1042].3 Many ribozymes make or break phosphodiester bonds in nucleic acidbackbones, but other biological and chemical functions are continuously beingdiscovered.

RNA’s conformational flexibility, modularity, and versatility [493] are impor-tant for recognition by, and interactions with, ligands (e.g., as in the RNA of thehuman immunodeficiency virus, HIV) and other molecules. These features areof practical importance for the design of therapeutic agents that exploit RNA’sfunctional sites as potential drug targets (e.g., [76,211,539,546,725,981,1268]).The construction of ligands that bind RNA and interfere with protein synthesis,transcription, or viral replication have potential as new therapeutic agents, suchas antibiotic/antiviral drugs [984]. These pharmaceuticals are urgent given theresurgence of previously known viral and bacterial diseases and the emergence ofresistant mutants of common antibiotic and antiviral drugs.

2For example, tRNA molecules carry amino acids and deposit them in correct order, mRNAstranslate hereditary information from DNA into protein, rRNA are involved in protein biosynthesis(within a complex of ribosomal RNA and numerous proteins), gRNAs edit RNA messages (by ‘guide’sequences), cRNAs play roles in catalysis and autocatalysis, and snRNAs are splicing agents (by‘small nuclear’ components of the mRNA splicing machinery known as the splicosome).

3The 83-nucleotide ribozyme composed only of two different building blocks — uracil and 2,6-diaminopurine — was shown to catalyze the ligation of two RNA molecules with a rate 36,000 timesfaster than the uncatalyzed reaction [1042]. The fact that RNA’s genetic code may be simpler thantoday’s four bases lends further support to the “RNA world” hypothesis.

220 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

PDo

P9

P7P3P8

P1P2

P3

P1.1

P

U1ARBD

J1/2

L3

J1.1/4

J4/2

P1

P2

P3

P1.1P

U1ARBD

J1/2

L3

J1.1/4

J4/2

C

A

AC

CC

G

UU U

A GC GA UC GC G

C GA AUGGGAC

ACCCUG

GUAGC

GGG

CC

G

GGCCC

U

AC

GGG

CC

GCCU

UU

CC

CGG

G

5’ 3’

P3P8 P7 P9.0

P9

A5’

3’G GA AAC C CU

P4 P6 Domain

GG GC CU UU AGGGGG AAA

CUCUUCUU

UGU

A

AA ACU GAC

AUAG CU G

AU

ACU

AAGU UA

GACCU

CUC

CG

A AA

GG

CG

GA

CUCAU

UUGG

GA

J8/7

J3/4 J6/7

P1

P1.1

P2

P2P1

P1.1

P1

P1

U1ARBD

U1ARBD

PDo

P8

P8

P3

P3

P7

P7

P9

P9

Figure 7.6. Ribozymes with pseudoknots. Top: Hepatitis Delta virus (HDV) ribozyme(PDB code 1DRZ) shows two pseudoknots, with pseudoknot basepairing pattern drawnin the circular diagram with basepair number along the perimeter: a main one formed byregions P1 (red) and P2 (orange-red), and a minor one formed by regions P1.1 (cyan) andP3 (green). A hairpin formed by the P4 helix (blue) and the U1A ribonucleoprotein bindingdomain (purple) is also present. Bottom: P3-P9 domain of the group I intron Tetrahymenathermophila ribozyme (PDB code 1GRZ) shows one pseudoknot formed by regions P3(red) and P7 (orange-red). The P4-P6 domain (yellow) folds independently. See Box 7.3for a discussion of some of those structures. The intertwined or pseudoknotted regions areclearly seen by the crossings in the circular diagrams.

These applications — together with design of novel RNAs and of RNA se-quences called aptamers that bind specific molecular targets (e.g., [166, 199,724]) — have important potential as regulators of gene expression, therapeuticagents, molecular switches, and molecular sensors4. See [546, 889, 1207, 1208,1275, 1381], for some examples.

4RNA sensors are RNA molecules that are turned on or off when they contact a target, like smallorganic molecules, and can trigger a chemical reaction.

7.3. RNA Structure and Function 221

Designing new RNAs based on this idea of aptamers has led to the excitingexperimental field of in vitro selection. This experimental technology involvesgenerating and screening large random-sequence libraries of nucleic acid mole-cules for a specific function, such as binding or catalysis. Numerous nucleic acidmolecules binding targets (aptamers) have been developed, as diverse as organicmolecules, antibiotics, proteins, and whole viruses. In addition, new classes ofRNA enzymes (ribozymes) have been discovered by this approach, with applica-tions in biomolecular engineering, such as in the design of allosteric ribozymesand aptamer-based biosensors, aptamers capable of inhibiting protein function,and therapeutic aptamers (e.g., inhibiting the TAR RNA element of HIV-1). Otheremerging applications of engineered RNAs include RNA nanotechnology, whereRNAs are assembled into functional arrays, and RNA synthetic biology, wheredesigned RNAs are used to control cellular functions (e.g., regulate gene ex-pression). See Subsection 7.5.2 on analysis of random nucleotide pools to betterunderstand the relationship between sequence and structure/function and devel-opment of a computational procedure for mimicking aspects of this selectionapproach in silico.

7.3.4 Non-Coding and Micro-RNAs

Such newly found roles for RNAs, especially concerning tiny RNAs that do notencode proteins (ncRNA for non-coding RNAs) but can influence gene action wonDNA’s cousin the venerable trophy of “Breakthrough of The Year” by Scienceeditors in 2002 (see the 20 December 2002 issue of Science, volume 298). Non-protein coding stretches of mRNAs range in size from only 20 nucleotides to over10000 nucleotides [1228]; they are required to control the translation from themRNA transcript into protein.

The 2002 award recognized a large group of papers that unraveled various fas-cinating features of small RNAs (affectionately termed nanoRNAs). Micro-RNAs(miRNAs), generally 21 to 25 nucleotides long, form a regulatory class ofncRNAs [200]. These RNAs control gene expression by repressing translationof target genes through, for example, binding to 3′ untranslated regions of themessenger-RNA targets.

Such small RNAs in animals, plants, and fungi collectively became associ-ated with the title RNAi (for RNA interference). The agents that initiate RNAiin a sequence-specific manner are double-stranded RNA segments that silencegene expression (siRNAs, for small interfering RNAs). For example, they mayseek out the messenger RNA and destroy it, or they may bind to chromatinand/or modify chromatin structure [383, 1004, 1432]. Though initially regardedas anomalies, work is revealing that such siRNAs regulate gene expression in avariety of organisms.

Such interference mechanisms by RNA silencing can provide an organism anatural defense against invading viruses and transposons (DNA segments that mi-grate within and across organisms and are associated with bacterial pathogenicity)[11]. Consequently, this natural protection is being exploited by scientists using

222 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

siRNAs to target viral genes that can inhibit the replication of HIV-1, polio, orother viruses (e.g., [757]). Moreover, RNA interference mechanisms are being in-vestigated by companies who apply them to discover the functions of genes byturning them off to determine the effect on the plant or the animal. For exam-ple, a landmark study on obesity employed RNA interference [626] to inactivateabout 85% of the roundworm’s predicted 19,757 genes that code for proteins ina single experiment [66]. More recent studies have used RNA interference to un-cover genes involved in regulating the immune response to pathogenic bacterialinfections [273] and genes regulating programmed cell death (or apoptosis) [229].

These fascinating discoveries regarding RNA’s interference with gene activityare associated with many epigenetic phenomena — changes in gene expression(inheritance of features) that do not involve alterations in the genome and persistacross at least one generation [31]. A different kind of epigenetic control was alsodiscovered by Breaker, Nudler, and coworkers [863, 864, 889, 926, 1275, 1381] insome bacterial messenger RNAs containing sequences that sense small moleculesdirectly to control translation of mRNA into protein. Namely, specific controlregions of mRNA can bind directly to metabolites associated with vitamin B syn-thesis and import, and induce a conformational change in RNA’s folding state; thismetabolite-triggered conformational change acts as part of the signal transductionpathway that senses vitamin level and controls enzyme production. Such RNAsthat control gene expression upon binding metabolites or other ligands have beentermed riboswitches.

A riboswitch can undergo local and global conformational changes upon bind-ing to its substrate. The secondary and tertiary structures of these natural aptamerscan be very diverse, and several structures have been solved [873]. Examples in-clude the type-I glmS ribozyme and purine and SAM-II riboswitches in which onebinding region induces local conformational changes upon binding. TPP, SAM-I,and the M-box magnesium are examples of type II riboswitches because of theirtwo regions which undergo conformational changes upon binding their metabo-lite, making possible global as well as local conformational rearrangements (seeFigure 7.7).

Such a switch of RNA conformation between two states in a ligand-dependentmanner (see Figure 7.7) also opens new avenues for thinking about RNA designin a variety of contexts (e.g., [166,199,242,599,1028,1245]). Like the Paracelsuschallenge for proteins (see Chapter 2), one can formulate a similar challenge forRNA design — describe minimal changes in the nucleotide sequence to triggera conformational rearrangement in the folding of a given RNA molecule — thatcan be approached by a combination of computational and experimental wizardry.Indeed, Science editor Jennifer Couzin [267] writes: “Having exposed RNAs’hidden talents, scientists now hope to put them to work”.

7.3.5 RNA at Atomic Resolution

Until fairly recently, less was known at atomic resolution on RNA structurein comparison to DNA, but this has changed as the ‘RNA era’ has begun.

7.3. RNA Structure and Function 223

Progress in RNA structure elucidation can be attributed to vast improvements incrystallization procedures (e.g., RNA structure determination through crystalliza-tion with a protein that would not interfere with the enzyme’s activity [391]), aswell as alternative approaches for studying RNAs such as high-resolution NMR,spectroscopy, crosslinking reactions, and phylogenetic data analysis. Our knowl-edge of RNA structures has also increased dramatically with recent solutions ofribosomes [85, 201, 206, 210, 910, 922, 1135, 1374, 1379, 1427], since ribosomescontain numerous tertiary motifs of RNAs and therefore provide a rich resourceof information on RNA structural elements and organization.

The clover-leaf structure of the tRNA molecule has been known for decades,and for a long time was the only well-characterized major structure of an RNAmolecule [1173]. Its structure whet our appetite for RNA appreciation by reveal-ing the long-distance tertiary interactions. By now, RNA folds characterized byX-ray crystallography include the tRNA, hammerhead ribozyme, Tetrahymenagroup I intron, hepatitis delta virus ribozyme, Group I intron from Azoarcus andTwort, Group II intron, various ribozymes (e.g., hairpin, GlmS, Diels-Alder), var-ious riboswitches (e.g., purine, M-box, and TPP), RNase P types A and B, signalrecognition particle, and various fragments such as kissing hairpin and sarcin/ricinmotifs. See some examples in Figure 7.7 and details of some solved catalyticRNAs in Box 7.3.

Box 7.3: RNAs at Atomic Resolution

The hammerhead [1005, 1282] and Hepatitis Delta Helper virus (HDV) [391] ri-bozymes, in the family of self-cleaving catalytic RNA, were solved in 1994 and 1998,respectively. The hammerhead’s Y, or wishbone-shaped, structure has three base-pairedstems resembling the head and handle of a carpenter’s hammer (see Figure 7.4). The RNAis unpaired in its U-turn core and stabilized by non-WC, non-wobble bps in the stems.Visualizing its structure led to further analysis of the mechanism of RNA self cleavagethrough trapping of intermediates in the ribozyme reaction pathway [331, 886]. While thehammerhead’s active site is open, that of HDV is hidden (see Figure 7.6), resembling thecatalytic sites of globular proteins, and contains two pseudoknots.

The crystal structure of one self-folding Tetrahymena thermophila Group I intron [488],was solved in 2004. Its secondary structure consists of nine regions (P1-P9), which foldinto two major domains: P3-P9 and P4-P6. Both domains are stabilized through extensiveinter-domain tertiary interactions. The P3-P9 domain is formed by a coaxial helix betweenP3 and P8, with a slight bend with respect to P7, while helix P9 is oriented perpendicularto P7 (see Figure 7.6). The P4-P6 domain has a helical tetraloop region connected toanother helical segment, the tetraloop receptor, by a large bend (≈150o) (see Figure 7.4).Thus, its active site is hidden and exemplifies the complexity of RNA structure, involvingcomplex intertwining of secondary structure elements, including pseudoknots. Indeed,both WC and non-canonical base-paired regions are interspersed with internal loops, andan adenosine-rich bulge region mediates the long-range tertiary contacts in the RNA,improving base stacking interactions. The adenosine platform motif emerges from thisstructure as an important architectural component of RNA that might have arisen early

224 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

Figure 7.7. Examples of two riboswitches: SAM-I and TPP riboswitch. In the presenceof its substrate, S-adenosynlmethionine (SAM), the SAM riboswitch is composed of afour-way junction forming a pseudoknot and a pair of coaxial helices P1–P4 and P2a–P3.The junction is stable due to hydrogen bonding between nucleotides (orange) in helices P1and P3 which bind to SAM. The bound state of the riboswitch was solved by crystallogra-phy [872]. In the absence of SAM, the 3′-side of helix P1 forms an anti-terminator element(AT), allowing the mRNA to be fully transcribed. The TPP riboswitch The thiamine py-rophosphate (TPP) riboswitch from Escherichia coli thiM mRNA also has two formsdepending on the presence of TPP. The bound-state form has been solved by X-ray crys-tallography [1166]. Without TPP, the 3′-side of helix P1 binds to an anti-ShineDalgarno(anti-SD) region, allowing the ribosome to access the SD element so that translation canstart as proposed by the Breaker group [1381].

in evolution. As found in other RNA structures, metal-phosphate coordination is alsoimportant in stabilizing tertiary contacts in this intron RNA domain. In addition to theGroup I intron of the Tetrahymena thermophila, the Group I intron of the Azoarcus [3] andTwort [466] species are also available.

7.4. Current Challenges in RNA Modeling 225

The complete Group II intron solved in 2008 [1266] is a self-splicing ribozyme thatcatalyze their own excision from precursor mRNAs and joins together flanking exons with-out the help of proteins. The secondary structure is characterized by six domains (I to VI)composed of step-loop structures connected by a common central core and stabilized bylong-range tertiary interactions. Domain I is the largest element and contains recognitionsequences responsible for the correct assemble of the intron in its active form. DomainsII and III enhance the catalytic efficiency of the splicing, while Domain IV contains theopen reading frame (ORF) for expression of a reverse-transcriptase enzyme. Domain V isthe most phylogenetically conserved element and contains a hairpin loop involved in catal-ysis. Domain VI contains an adenosine that interacts with the splice site during splicing.The Group II intron and the spliceosome share common structural and functional features,supporting the hypothesis of a common ancestor [1266].

7.4 Current Challenges in RNA Modeling

7.4.1 RNA Folding

Deducing the functional structure of RNA molecules from the primary sequencehas been called the RNA folding problem [172, 241, 771, 1261]. The challengeis to understand how the strong electrostatic repulsions between closely packedphosphates in RNA are alleviated. Indeed, the stability of compact RNA formsis strongly maintained through interactions with both monovalent and divalentcations and by pseudoknotting.

As mentioned in connection to the multitude of hydrogen bonding patterns forpolynucleotides in Section 7.2.1, the rich variety of possibilities in RNA, as re-viewed in [736, 739, 740], has led to proposal for new nomenclature for RNAsand the annotation of RNA structure through the international RNA OntologyConsortium, ROC (http://roc.bgsu.edu/) [735].

7.4.2 RNA Motifs

We now recognize at least seven major tertiary interaction motifs for RNA, asshown in Figure 7.8 — subdivided into three classes. These classes separate in-teractions between two double stranded helices, between a single strand and ahelix, and between two single stranded regions. A recent annotation study of 54representative high-resolution solved RNA structures showed the dominance ofA-minor motifs (37%), coaxial helices (32%), and ribose zippers (20%), whichtogether account for 89% of the total motifs (which number 613) [1403] (seeFigure 7.9). Correlations among motifs, such as a pseudoknot or coaxial helixwith A-minor, reveal patterns of higher organization and underscore RNA’s hier-archical structure (Figure 7.9). Analyses of RNA junctions of orders 4 through 10further suggested global patterns and subnetworks that organize RNA in complex

226 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

ways as well as a classification into distinct families (Figure 7.10) [690, 691].Such studies can help in the goal of RNA structure prediction by defining majormotifs in RNA and pinpointing specific sequence/structure relations.

7.4.3 RNA Structure Prediction

Predicting the secondary and tertiary folding of RNA is a difficult and ongoingenterprise [172, 225, 241, 277, 689, 1022, 1145, 1465]. Secondary structure ele-ments are easier to identify through free energy minimization combined withcomparative analysis (sequence alignment) using evolutionary and database rela-tionships [838,839,1460,1461]. The minimized objective function is empiricallyderived based on basic physiochemical laws. However, caution is warranted in the

S/S

tRNA D-loop:T-loop

Kissing hairpin

Pseudoknots

Self-complementary nucleotides

Two intertwiningregions forming

Complementary basepairs

D/T loop interaction

Ribose zipper

Antiparallelstem/loopinteraction

5'-CC-3' (Stem)3'-AA-5' (Loop)

A-minor motif

Clustering of adenines

G-C preferred

Tetraloop receptor

Tetraloop/internal loop5'- GAAA -3'5'-CC-UAAG-3'

S/HH/H

Coaxial helices

Junction,“pseudo-stem”

5'

3' 5'3'5'

3'

5'

3' 5'3'5'

3'

3'5'

5'

3' 5'3'5'

3'

3'5'

5'3'5'

3' 5'

3'

3'

5'

3'

5'3'

5'

Two hairpins

3' 5'

3'

5'

RNA Tertiary Motifs

Figure 7.8. Seven major motifs of tertiary interactions in RNA, involving two doublestranded helices (H/H), single strand and a helix (S/H), and two single stranded regions(S/S). Coaxial helices form when nucleotide bases from two separate helices stack andalign axes to form a pseudo-continuous coaxial helix. A-minor motifs originate from clus-tering of adenosine interactions, often present within other motifs such as coaxial helices.Tetraloop receptor is an interaction between a tetraloop (GNRA, where N = A,G,C,U andR = A,G) and an internal loop plus two GC bps. Ribose zippers form when two consecutiveresidues from one chain segment interact in an antiparallel fashion with two consecu-tive residues from another chain segment distant in sequence but close in space. Kissinghairpins form by base pairing between single-stranded residues of two hairpins with com-plementary sequences, and tRNA D-loop:T-loop are interactions between two conservedhairpins in tRNA. See glossary of [1403] for more details and references to original motifdefinitions.

7.4. Current Challenges in RNA Modeling 227

Coaxial helix32%

A-minor motif37%

Ribose zipper20%

Other11%

TPP riboswitch(2GDI)

5’3’

A-minor motif

Ribose Zipper

Loop-loop receptor

Coaxial helix

P1

P2

P3

P4

P5

Figure 7.9. The distribution of RNA tertiary motifs in a non-redundant set of 54 high--resolution crystal structures and annotated diagram of the TPP riboswitch (PDB 2GDI)showing correlated motifs [1403].

interpretations of such 2D structures since predictions are imperfect, especiallyfor more than 200 nucleotides. The new structures, however, provide opportuni-ties for learning what works, as well as what fails, in structure prediction. Thus,discriminating among the possible tertiary interactions to obtain the final foldedstate remains a challenge [277, 1115, 1261]. Still, findings concerning the foldingkinetics of Tetrahymena ribozyme [1149] have suggested that, as thermodynamicdata on tertiary structure interactions become available [1261], the RNA foldingproblem might be easier to solve than protein folding [101,1261]. Therefore, withadvances in RNA synthesis and structure determination [558] and the availabilityof thermodynamic data on tertiary interactions [1261], it is likely that our under-standing of RNA structure, RNA folding, and RNA’s role in enzyme evolutionwill dramatically increase in the coming decade. These developments are pro-pelling studies in RNA informatics (or ribonomics) [332] and RNA design (e.g.,[166, 199, 241, 242, 359, 599, 600, 689, 1115, 1142, 1245, 1245, 1251, 1281]).

Emerging themes in RNA structure include the importance of metal ions andloops for structural stability, hidden active sites in some ribozymes like cat-alytic sites of proteins, various groove binding motifs (e.g., adenines interactingwith minor-groove helices to stabilize tertiary contacts, as deduced from crys-tal structures [327, 911] and statistical analyses [1383]), architectural motifstailored for intermolecular interactions [546], hierarchical folding, fast estab-lishment of 2D elements, and extreme flexibility of the molecule as a whole[172, 277, 390, 493, 1261]. See recent review [1465].

Some key challenges concerning RNA include finding novel RNA genes,identifying the biological roles of these RNA genes, determining the structuralrepertoire of RNA, determining RNA tertiary folds from sequence, and designingnovel RNAs [1112]. Bioinformatics tools hold great promise in addressing thesechallenges.

228 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

P64P66

P65

P67

P61

P63

P62P64

Three-Way Junctions

P57

P59

P58

Family A23S rRNA (2J01)

P75

P79

P76

P28

P43

P29

P75

P79

P76

5'3'

5'

3'

5' 3'

P28

P43

P29

5' 3'

5'

3'5'

3'

Family B16S rRNA (2J00)

P57

P59

P585'

3' 5'

3'5'

3'

Family C23S rRNA (1S72)

Four-Way Junctions

Family H(RNase P type B 1NBS)

P3P3b

P3aP3c

P7

P9

P8

P10

P1

P3P2

P4

P51

P53

P52

P54

P11

P13

P12

P145' 3'

5'3'

5' 3'

P3P3b

P3a

P3c

5'3'

P1

P3P2

P4

Family cH(HCV IRES domain 1KH6)

Family cL(tRNA-Phe 1EHZ)

P29

P41

P30

P42

P27

P29

P28

P31

P75

P79

P76

P79

3'

5'3'

5'

5'

3'

Family cK(23S rRNA 1S72)

Family π(Ribonuclease P 1U9S)

5'

3'

5' 3'

P11

P13P12

P14

5' 3'

5' 3'

5'3'

5'3'

P61

P63

P62P64

5'3'

Family cW(23S rRNA 2J01)

Family ψ(23S rRNA 2J01)

5'

3'

5' 3'

P64 P66

P65

P67

3' 5'

3' 5'

Family cX(16S rRNA 2J00)

5'3'

5'

3'5'

3'

P29

P41

P30

P42

5'

3'

5' 3'

P

P

P27

P29

P28 P31

5'

3'5'

3'

Family X(16S rRNA 2J00)

P7

P9

P8

5'3'

3'5'

3' 5'

P10

5' 3'

Base pair interactions

Watson-Crick edge

Hoogsteen edge

Sugar edge

Base involved in tertiary interaction

Cis Trans

Hydrogen bond interactionAU cis Watson-Crick base pairGC cis Watson-Crick base pair

P Helix-packing interaction

RI

RII

Bifurcated base pair

Ribo-base interaction type I

Ribo-base interaction type II

B

Figure 7.10. Classification of RNA junctions into families according to coaxial stackingproperties, perpendicular helical configurations, and flexible helical arms. Lines inside thehelices represent the canonical WC bps GC, AU, and the GU wobble bp. The networksymbology follows the Leontis Westhof notation [738].

7.5. Application of Graph Theory to Studies of RNA Structureand Function 229

P1P2

P26

P47

P25

P72

P73

P94

P98

P99

P36

P37P39 P40

P38 P41

P45

P4

P4A

P11

P12P13

P14

P80P74

P75

P81

P82

P88

P2

P3 P15P15.1

P5

P15.2

P3

P2

P4

P5

P1

P2

P8

P3

P7.1

P10

P14

P16

P15

P21P22

Five-Way JunctionsGroup I Azoarcus intron (1U6B)

23S rRNA(1S72)

tRNA(2BTE)

5'3'

P3

P5

P1

P2

P4

P5

P7

P6P13

P14

P7

P9

P8

P10

P8

P27

P29

P28

P30

P315'3'

5' 3'

5'

3'

5'

3'

P2

P3

P8

P7.1

P10

5' 3'

5' 3'

5' 3'

5'3'

3'

5'

5'3'

P14

P15

P16

P21P22

5'3'

16S rRNA(2AVY)

5'

3'

3'

5'

5'

3'

B

P5

P6P7

P14

P13

Group II Intron (3EOH) 5' 3'

3' 5'

Pii

PAPB

PC

PD

23S rRNA (1S72)

3'5'

3'

5'

P

RI

P

RI

P30

P27

P28

P31P29

3' 5'

Ribonuclease P(2A64)

23S rRNA(2AW4)

23S rRNA(2J01)

5'

3'C

C A C C C G

CGGGUG C

A G

CU

CC G

G

A

AC G

G

ACUGGG C

CCAG

GA U

CU

CC

CU

GGGGA

G C C

GGC

A

C

CG G A

AG

AC

U

AA

GUA

AG A

A

5'

5'

5'

5'3'

3'

3'

3'

G

A

P4

P4A

P11

P12

P13

P14

GGUAAUC G

AUUACC

CUGCGGCCGC

GGCCGUAG

G

AC

A A A G

CU

CUCGC G

CGAGA

CCCG

GGGC

A

CCUUCCC

GG

GA

AG

GAGA

UA

GA

U

AC

AG

GC

GAGCCUGU

5' 3'

3'5'

5'3'

5'3'

A A

G

A

G C

C GC G

U AA U

UG

AA

A

AG

P2

P3

P15

P15.1P5

P15.2

5'3'

5' 3'

5' 3'

C

ACCU A

GGU

ACUAUG

UAGU

GAC G

UC

AGUGAUC

A U G

CAUAUGAUUC

AA

CG

GA

CGGUGGU G

CCAUCG

UU

5' 3'

5'

5'

5'

5'

3'

3'

3'

3'

U

U

C

G

UAGC

P

UC

C

U

C

CGGAG

GA

A

A

AU

P80P74

P75

P81

P82

P88

5'3'5'

3'

23S rRNA(2AW4)

5'CUC

C

CCGA

UA

GG

GG

A

G

GC

U

AUG

UA

G

CUCGUGA

U

C

CGGG

C

A

C

G

GCG

GGU

U

C

CG

UC

G

UG

G

G

GC

C

C

CC

GC

GC

GG

CA

GC

GA

CG

C

GC

GU

U

UU

G

GC

AC

G

A

A A

G

G

AA

A

A

AG

G

C

5'

5'

3'

A

G

P

RII

P36

P37

P39

P40

P38

P41

P45

5'

5'

3'

5'

5'

3'

23S rRNA(2AW4)

GGUUAAGC

G C U U A A C C

GCGUACACGGUGGAUG

C U G A A A C C G U G U A C G U

GU

GG

CU

GC

U

U

UGU

A

U

A

ACCUA

UGGG

UCAGCG

CUGAC

AAG

UA

A

A

AUU

U

AAG

UA

A

A

AACG

UG

A A

UG

C G C U G

CAGUG

C C C G C G G C

GCCGUGGG

G G A G A A C

GUUCUCC

GACCCU

AG

GG

UC

GA

CU

UG

GA

AG

GCCG

G C U

U

A

CCG

U

CU

G A

A

A

U

C

A

G

A

C

A

A

G

CA

5'

5'

5'

5'

5'

5'

5'

3'

3'

3'

3'

3'

3'

3'3'

5'

GU

P1

P2

P26

P47

P25

P72

P73 P94

P98

5'3'

5'3'

P99

Six-Way Junctions

Seven-Way Junctions

Ten-Way Junctions

Classification of RNA junctions (continued).

7.5 Application of Graph Theory to Studies of RNAStructure and Function

7.5.1 Graph Theory

The application of graph theory to describe secondary structure motifs of RNA,as pioneered by Waterman in 1978 and extended by others [116, 438, 708, 1350],

230 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

junction

bulge

loop

stem

Tree Dual

edge vertex

vertex edge

edge

edge

edge

edge

edge

edge

vertex

vertex

junction

bulge

loop

stem vertex

vertex

vertex

vertex

stem 2 bp

1bp

2bbulge

loop

We consider:

Translating RNA motifs tograph theory objects

B0G

C

G

A

C

C

G

A

G C

C

A

G

C

G A

A

AG

U

U

G

G

G

A

G

U

C

G

C5'

10

20

3'

S1

S2

B1

B2Leadzyme Tree Graphs Dual Graph

B2

B1

S1

S2

G

C

G

G

A

UU

UA

GCUCAG

U

U

GG G

A

G A G CG

C

C

A

G

AC

U

GAA

G

A

U

C

U

G

G A GG

U

C

CU

GU

G U UC

GAU

C

CA

CA

GA

A

U

U

C

G

CA

CC

A

5'

3'

10

2030

40

50

6070

B3

B1

B2

B0

B4

S1

S2

S3

S4

B2

B1B3 B4

S2

S1

S3

S4

B2

B1B3S1

S2S3

S4

B0

B4

B2

B1

B0

S1

S2

tRNA

>

>

>

Figure 7.11. Graphic Representations of RNA Secondary Structures shown for two RNAs,using Tree and Dual Graphs, using the definitions at right [388, 438].

holds promise in the area of RNA structure analysis. Graph theory is suitable forRNA because this field of mathematics is widely used for analyzing networksand enumerating structural possibilities, including chemical structures, geneticand biochemical networks, monetary transactions, terrorist organizations, socialnetworks, and the Internet [33,900]; see also the 24 July 24 2009 issue of Science,volume 325, devoted to networks.

Graph theory can be used to represent RNA secondary structures as two typesof graphical objects: trees and dual graphs according to specified rules for defin-ing RNA stems, bulges, junctions, and loops (Fig. 7.11). All possible RNA 2Dstructure motifs, as simplified by this approach, can then be cataloged by thegraph vertex number, as well as ranked by topological complexity according tothe second-smallest eigenvalue of the Laplacian matrix of the graph [388, 438].

7.5.2 RNA-As-Graphs (RAG) Resource

One such RNA topology resource, RAG (RNA-As-Graphs) (http://monod.biomath.nyu.edu/rna), is being used to classify/analyze topological character-istics of existing RNAs [438, 645] (see Fig 7.12 for graph enumeration segmentsfrom RAG) and to design and predict novel RNA motifs [643–646]. See also someRNA reviews commenting on graph theory applications to RNA [542, 736].

Specifically, RAG and the associated graph theory framework for RNA [388,435] have been used to classify and catalog RNA motifs [438, 645], esti-mate the size of RNA’s structural/functional repertoire [645], detect structural

7.5. Application of Graph Theory to Studies of RNA Structureand Function 231

and functional similarity among existing RNAs [965], identify RNA motifs ofantibiotic-binding aptamers (found synthetically) in genomes [702–704], analyzethe structural diversity of random pools used for in vitro selection of RNAs [450],simulate aspects of the process of in vitro selection in silico [643, 644, 646],and analyze RNA thermodynamics landscapes to better understand riboswitchmechanisms to ultimately enhance their design [1028].

Other applications of RAG in the community which include graph theory ex-tensions involve classification and prediction of ncRNAs [503,633,801,901,902,1181], various RNA structure analyses [50,78,148,170,533,534,538,982,1041],and various applications of graph theory [82, 416, 470, 471, 538, 761, 1324, 1431].Examples of graph extensions include labeled dual graphs [633] and directedtree graphs [503]. Our application of spectral theory to catalog RNA graphshas also been extended to other biological and physical systems [82, 416, 470,471, 761, 1324, 1431]. See Kim’s thesis for a more detailed descriptions on theseapplications [642].

RNA Structure Enumeration

Cataloging based on graph theory enumeration suggests that the RNA structureuniverse is dominated (more than 90%) by pseudoknots, in agreement with avail-able data [645], as also discussed in [78,170,982,1041]. Significantly, the existingRNA classes represent only a small subset of possible 2D RNA motifs as enumer-ated by graph index; some of these motifs may be natural while others may bepossible to generate in the laboratory. Still, others may not exist.

RNA-Like Motifs

The usage of clustering techniques to separate graphs that are ‘RNA-like’ fromthose that do not resemble natural RNAs also led to predictions of many newRNA-like motifs, including ten specific examples of sequences that might leadto novel-like RNA topologies [645], as shown in Figure 7.13. Some of thesemotifs predicted in 2004 have since been solved: C1 in mammalian CPEB3 ri-bozyme [1084], C2 in a purine riboswitch [819], C3 in the tymovirus/PomovirustRNA-like 3′ UTR element [840], C4 in the tombusvirus 3′ UTR region IV [1463],and C7 in the flavivirus DB element [235]. Significantly, the predicted and actualsequences have between 45 to 51% homology.

Graph theory tools are also natural for comparing RNA structures to find ex-isting RNA motifs within large RNAs based on graph isomorphisms [965]. Thisidea was applied to identify topological similarities among existing RNA classesand to define motifs of RNA within larger RNA topologies for major RNA classes(e.g., tRNA, tmRNA, hepatitis delta virus RNA, 5S, 16S, 23S rRNAs).

Furthermore, the representation of RNAs as graphs led to identification of RNAmotifs in genomes. Since natural aptamers exist in many bacterial genomes andother organisms, it appeared likely that natural counterparts of synthetic motifsexist in vivo. This led to development of an efficient search tool for identifyingsmall RNA motifs in genomes by exploiting many artificial motifs derived from

232 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

Figure 7.12. Segments from RAG showing some enumeration of graphs for tree and dualsegments of low V numbers. The graphs are coded according to motifs found in nature(red), motifs not yet found that are RNA-like (blue), as determined by clustering analysis,and remaining motifs (black) [645]. See RAG website for details and updates.

RNA in vitro selection experiments [702–704]. The search for antibiotic-bindingaptamers produced 37 candidate sequences from bacterial and archaeal genomes.

RNA Design

Finally, graph theory tools can also advance the design of novel RNAs bymimicking the process of in vitro selection in silico. As a first step, understand-ing the structural diversity of random pools was important for improving in vitrotechnology. It is simple to generate random pools computationally, but graphtheory allowed rapid analysis of the resulting 2D motifs using 2D folding al-gorithms [450]. By characterizing the distribution of secondary structure motifsin computer-generated random pools using sets of possible RNA tree structuresfrom graph theory [450], we showed that random pools do not have a uniformdistribution of possible topologies and instead favor simple topological motifs.This was expected from experimental observations that typically yield simple mo-tifs. Further studies also found that the proportion of multiply branched structuresincreases with sequence length, in agreement with other studies.

7.5. Application of Graph Theory to Studies of RNA Structureand Function 233

single strand RNA(NDB:PR0055)

bulged hairpin(Rfam:CopA)

DsrA RNA(Rfam:DsrA)

single strand RNA(NDB:PR0055)

bulged hairpin(Rfam:CopA)

single strand RNA(NDB:PR0055)

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

DsrA RNA(Rfam:DsrA)

bulged hairpin(Rfam:CopA)

single strand RNA(NDB:PR0037)

DsrA RNA(Rfam:DsrA)

Flavivirus DB element(Rfam:RF00525)

Purine Riboswitch(Rfam: RF00167)

Tymovirus tRNA-like3' UTR element(Rfam: RF00233)

Candidates Discovered After 2004RNA Secondary Structure WithNatural Submotif

Graph RepresentationWith Natural Submotif

Tombuvirus 3' UTR region IV(Rfam: RF00176)

Mammalian CPEB3 Ribozyme(Rfam: RF00622)

Figure 7.13. Ten candidates predicted to have RNA-like topologies, as determined by clus-tering analysis, represented as dual graphs; the red submotif occurs in natural RNAs [645].Of those ten, five RNAs were discovered since the published predictions, as shown in thethird column.

234 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

4x4 mixing matrix M

RNA sequences

ACGU

ACGU

ACGU

ACGU

ACGUAGGGACCGUUUUAAUAGGGACUACUUUCCCUAGGGUACUAGUACUGUAGUACUUACUCAGUUUUGGGGUAAAAUAAUGAACUGACGACUGACCCUACGUACCAUGAUUUACGAUAC

~ 1015

M M M M

M M M M

M M M M

M M M M

AA

CA

GA

UA

AC

CC

GC

UC

AG

CG

GG

UG

AU

CU

GU

UU

Aptamers

Task

1. Specifying sequence mutations

Chemical probing with functional assays

2. Generating nucleotide pools

3. Characterizing structure and function of sequences

4 vials containing A, C, G, and U

2D folding of sequences,analysis of graph motifs

InitialStructure

Candidate aptamersequences

Experiment Theory

Figure 7.14. Experimental RNA pool synthesis and an in silico approach to mimic the pro-cess [643,646]. Pool generation can be simulated using the mixing matrix, which specifiesnucleotide mixtures in synthesis port or mutation rates for all nucleotide bases and can bedefined to mimic specific biological situations [646]. The resulting sequences are “folded”into 2D structures using existing algorithms and analyzed further to screen and filter thecandidates.

Mimicking the experimental process in silico can be done on the basis ofa “mixing matrix” framework (see Fig. 7.14), which led to development ofRAGPOOLS, a publicly-available web server for pool design [643]. Such mixingmatrices specify the mixing ratios of nucleotides in synthesis ports (see Fig. 7.14).When applied to a starting sequence, each mixing matrix will generate a pool ofspecific sequences via mutations of the given sequence. Thus, different designstrategies (based on covariance mutations like AU to CG or conversion of AU toCG base pairs) produce different sequence pools, and the main idea is to designthese pools to yield the desired structure and/or function.

Using this new tool and other computational and experimental resources,very large pools of nucleotides (up to 1014) can be generated, screened, andfiltered according to various 2D-structure similarity and flanking sequence

7.5. Application of Graph Theory to Studies of RNA Structureand Function 235

analyses [644]. Such computational and theoretical yields agree for simple RNAmotifs. For real aptamer targets, the in silico procedure overestimates the yieldsfound experimentally, as expected, because experimental yields represent lowerbounds and the screening does not yet involve 3D structural aspects.

Advances on all these fronts are ongoing.

236 7. Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure

2 3 4 5 6 7 8 9−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

50 100 150 200 250 300 3500

0.5

1

1.5

2

2.5

3

50 100 150 200 250 300 3500

1

2

3

4

0 1 2 3 4 50

50

100

150

200

Morse

Quartic (special)

Cubic...........

Quartic

Harmonic

r (bond length) [A]

Bon

d en

ergy

[kca

l/mol

]

2 fold3 foldsum

.........

τ (dihedral angle) [deg.]

Dih

edra

l an

gle

ene

rgy

[kca

l/mol

]

O3'

C3'

C2'

O2'

O

P

O5'

C5'

Cou

lom

bic

ener

gy [k

cal/m

ol]

LJ E

nerg

y [k

cal/m

ol]

C...C

O...H

110 115 120 125 1300

0.2

0.4

0.6

0.8

1

1.2

1.4

θ (bond angle) [deg.]

Bon

d A

ngle

ene

rgy

[kca

l/mol

]

Harmonic Trig.

τ (dihedral angle) [deg.]

0 0.1 0.2 0.3 0.4 0.5−5000

0

5000

r (distance) [A] r (distance) [A]

C...C

O...H

Dih

edra

l an

gle

ene

rgy

[kca

l/mol

]