TITLE

Sequence and Conformational Preferences at Termini of α-helices in Membrane

Proteins: Role of the Helix Environment

SHORT TITLE

Helix environment dictates sequence and structural preferences at termini of α-

helices.

KEYWORDS: Membrane protein folding, Membrane protein modeling, Protein design,

Structural Biology, Helix termini, Helix capping

AUTHORS AND AFFILIATIONS

Ashish Shelar and Manju Bansal

Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India

Work performed at

Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India

CORRESPONDING AUTHOR

Manju Bansal

Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India

E-mail: [email protected])

Ph: +91-080-22932534

Research Article

Proteins: Structure, Function and Bioinformatics

DOI 10.1002/prot.24696

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1002/prot.24696

© 2014 Wiley Periodicals, Inc.

Received: Jul 15, 2014; Revised: Sep 05, 2014; Accepted: Sep 16, 2014

ABSTRACT

α-helices are amongst the most common secondary structural elements seen in

membrane proteins and are packed in the form of helix bundles. These α-helices

encounter varying external environments (hydrophobic, hydrophilic) that may

influence the sequence preferences at their N and C-termini. The role of the external

environment in stabilization of the helix termini in membrane proteins is still

unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical

membrane proteins and establish that their sequence and conformational preferences

differ from those in globular proteins. We specifically examine these preferences at

the N and C-termini in helices initiating/terminating inside the membrane core as well

as in linkers connecting these transmembrane helices. We find that the sequence

preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N’ and

C’) positions are influenced by a combination of features including the membrane

environment and the innate helix initiation and termination property of residues

forming structural motifs. We also find that a large number of helix termini which do

not form any particular capping motif are stabilized by formation of hydrogen bonds

and hydrophobic interactions contributed from the neighboring helices in the

membrane protein. We further validate the sequence preferences obtained from our

analysis with data from an ultradeep sequencing study that identifies evolutionarily

conserved amino acids in the rat neurotensin receptor. The results from our analysis

provide insights for the secondary structure prediction, modeling and design of

membrane proteins.

Page 2 of 49

John Wiley & Sons, Inc.

PROTEINS: Structure, Function, and Bioinformatics

INTRODUCTION

Membrane proteins constitute roughly 30% of open reading frames in various

sequenced genomes and form about 70% of the current drug targets 1. They are key

components in many regulatory pathways, immune responses and help to maintain the

integrity of cells2. However, due to difficulties in their crystallization, they constitute

only 2-3% of the total structures3 available in the Protein Data Bank (PDB)4. The past

few years have seen a considerable upsurge in research on membrane protein

structure and function, and a large amount of protein and genomic sequence data5 has

become available. Various experimental studies6-8 elucidating the topology of membrane

proteins, as well as studies on computational design of membrane proteins9,10 have

been published. The advancement of X-ray11,12 and cryo-electron microscopy

techniques13-15 over the past decade has led to a substantial increase in the number of

high resolution membrane protein structures being solved which can be used for

sequence and structural analysis (a summary of these structures is provided by White at

http://blanco.biomol.uci.edu/).

α-helices and β-strands are the principal secondary structures observed in membrane

proteins due to the energetic constraints imposed by the lipid bilayer16. Thus, helix

bundles and beta barrels are the two major super secondary structural elements seen in

integral membrane proteins. Historically, transmembrane regions of proteins have

been predicted using various hydrophobicity scales17,18. Topology prediction of α-

helical membrane proteins has been greatly assisted by the ‘positive inside rule’18,

‘glycine outside rule’19 and the presence of aromatic residues (Tryptophan, Tyrosine)

at the bilayer and water interface to anchor the helix in the membrane20-22. These

studies have provided guiding rules to understand the folding process of α-helical

integral membrane proteins. However, the reasons for helical distortions, pi-bulges,

helical transitions, kinks as well as broken and re-entrant helices inside the

membrane23-26 are not yet completely understood. Due to these variations in the

structure, computational modeling27,28 of membrane proteins has only been successful

in predicting structures of small membrane bound peptides29 and four to seven helix

bundle proteins30,31. Thus, a detailed and systematic sequence and structure analysis

of the α-helices in membrane proteins is essential for the understanding of principles

governing their folding and functions as well as developing better prediction tools.

An α-helix in a membrane protein experiences a range of environments along its

length: apolar (in the membrane core), slightly polar (at lipid headgroups) to

completely polar (in the cytoplasmic and extracellular regions)32 and it shows

characteristic residue preferences depending on its location in the membrane19,33. α-

helices in globular proteins have been previously analyzed in terms of the residue

preferences34,35 and capping motifs34-38 at their termini. These motifs act as ‘start’ and

‘stop’ signals and also provide stability to the α-helix. In the present analysis, we have

addressed the following questions: Are the helix ‘start’ and ‘stop’ signals in membrane

proteins similar to those of globular proteins? Does the variable environment of the

membrane play a role in the residue preferences and the helix capping motifs at the

termini? Do α-helices in the vicinity of a helix terminus have a role in the stabilization

of the helix terminus?

To address these questions, we have analyzed 865 α-helices longer than 8 amino acids,

identified by the STRIDE39 program in high-resolution

membrane proteins. These are compared with a dataset of 2680 α-helices in globular

proteins to determine sequence and conformational preferences at helical and near-

helical positions. A detailed comparison of membrane helices initiating or terminating

outside the membrane with those that have their termini embedded in the membrane

indicates differences in amino acid requirements between datasets for the first (N1)

and second (N2) as well as the penultimate (C2) and last (C1) positions in the helix

body. Different amino acids are also preferred at helix capping (Ncap, Ccap) and

near-helical (N’, C’) positions. Upon examining helix initiation and termination

motifs we observe that helix termini have an inclination to form specific motifs

(glycine-Schellman and non-glycine Schellman motifs) that help to maintain the

‘helix bundle’ type of architecture of the membrane protein. Short linkers connecting

transmembrane helices also show a similar positional preference for amino acids as

well as a tendency to take up specific backbone conformations. Overall, our findings

suggest that the sequence and structural preferences at the α-helix termini in

membrane proteins are governed by the demands of helix initiation and termination as

well as the membrane environment.

MATERIALS AND METHODS

X-ray Crystal structure dataset

Membrane Proteins

A non-homologous dataset of X-ray crystal structures with resolution better than 2.5 Å

and sequence identity <25% was created using the PISCES server40. The dataset

comprised 75 proteins with 181 chains. Coordinates for these 75 proteins were then

downloaded from the Orientation of Proteins in Membrane (OPM) database41. The

OPM database aligns the protein structure along the Z-axis and also provides the

membrane (hydrophobic core) boundaries for the protein based on free energy

transfer values for proteins from water to cyclohexanol42.

Globular proteins

A dataset of representative globular protein folds was created using the data from the

ASTRAL-1.75 compendium43 in the SCOP database. From a total of 1195

downloaded representative folds, the dataset was further refined by removing: (i)

domains with a SPACI score less than 0.4 (resolution worse than 2.5 Å); (ii)

folds with a missing ATOM record for any residue; (iii) all-beta folds;

and (iv) membrane and cell surface protein folds. After filtering the data based

on the above-mentioned criteria, the final dataset consisted of 626 representative

folds.

Secondary structure assignment

The Structure Identification (STRIDE) program was used to assign secondary

structures in the 75 membrane proteins and 626 representative folds of globular

proteins. The total dataset included 1164 STRIDE assigned α-helices 865 of which

were longer than 8 amino acids with average helical length of 29 and median

length of 28 amino acids. The globular protein dataset consisted of 3615 α-helices,

2680 of which were longer than 8 amino acids, with mean and median helical

lengths of 14 and 15 amino acids, respectively. These 865 and 2680 α-helices from

membrane and globular proteins were selected for further analysis.

Helix position nomenclature

Fifteen positions (nine helical positions and six near-helical positions) were

considered in and around the helix, namely N’’, N’, Ncap, N1, N2, N3, N4, MID,

C4, C3, C2, C1, Ccap, C’ and C’’, for position-specific analysis of amino acid

occurrence (Fig. S8). The MID region represents the middle region of the helix after

excluding the four terminal positions at each end of the α-helix. The number of

residues at the MID position is N-8, where N is the length of the helix. The

distribution of 20 amino acids at each of the above mentioned 15 positions for

globular and membrane proteins is shown in Tables S8 and S9.
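The nomenclature above can be illustrated with a minimal Python sketch. It is not the authors' code; the function name `label_positions` is hypothetical, and it assumes that the STRIDE-assigned helix runs from N1 to C1, so that Ncap and Ccap are the residues immediately flanking the helix body.

```python
def label_positions(start, end):
    """Map residue indices to position labels for a helix whose body spans
    residues start..end inclusive (assumption: N1 = start, C1 = end)."""
    length = end - start + 1
    if length <= 8:
        raise ValueError("only helices longer than 8 residues are labelled")
    # Six near-helical positions flanking the helix body.
    labels = {
        start - 3: "N''", start - 2: "N'", start - 1: "Ncap",
        end + 1: "Ccap", end + 2: "C'", end + 3: "C''",
    }
    # Four terminal helical positions at each end.
    for k in range(4):
        labels[start + k] = f"N{k + 1}"
        labels[end - k] = f"C{k + 1}"
    # Everything in between is MID (length - 8 residues).
    for idx in range(start + 4, end - 3):
        labels[idx] = "MID"
    return labels
```

For a 12-residue helix spanning residues 10 to 21, this labels residue 9 as Ncap, residue 22 as Ccap and four residues (14 to 17) as MID, in agreement with the N-8 rule.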

Statistical Methods

Distribution of amino acids, Positionwise propensity and Percent frequency

Distribution of 20 amino acids was computed for the 865 α-helices for the 15

positions in and around the α-helix. Positionwise propensities (Pij) and percent

frequencies for the 15 positions were calculated using the following formulae:

Positionwise propensity:

$$P_{ij} = \frac{n_{ij}/n_i}{N_j/N}$$

Positionwise percent frequency:

$$\frac{n_{ij}}{n_i}$$

Where

$n_{ij}$ = Number of amino acid ‘i’ at position ‘j’

$n_i$ = Total number of amino acids of type ‘i’ in 75 membrane proteins

$N_j$ = Number of amino acids at position ‘j’

$N$ = Total number of amino acids in the 75 membrane proteins
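A minimal Python sketch of the propensity calculation is given below for illustration. The function name and the input layout (a dictionary of residue lists per position plus a list of all residues in the dataset) are assumptions made here, not part of the original analysis.

```python
from collections import Counter


def positionwise_propensity(position_residues, all_residues):
    """Return {position: {amino_acid: P_ij}} with
    P_ij = (n_ij / n_i) / (N_j / N).

    position_residues: {position label: list of one-letter residues there}
    all_residues: list of every residue in the dataset (hypothetical input)
    """
    n_i = Counter(all_residues)   # occurrences of each amino acid type
    N = len(all_residues)         # total number of residues
    result = {}
    for pos, residues in position_residues.items():
        N_j = len(residues)       # residues observed at this position
        n_ij = Counter(residues)  # per-type counts at this position
        result[pos] = {
            aa: (n_ij[aa] / n_i[aa]) / (N_j / N) for aa in n_i
        }
    return result
```

A value above 1 indicates that the amino acid occurs at the position more often than expected from its overall abundance.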

Change in proportion

The change in proportion test was carried out as per the methodology followed by Kumar

and coworkers35 to find out amino acids occurring significantly at a position. The change

in proportion for the ith amino acid at the jth position was considered to be significant

at the 95% confidence level if it was greater than twice the estimated standard deviation:

$$(\mathrm{prop}_{ij} - \mathrm{prop}_{ri}) > 2\sigma_{ij}$$

Where:

$\mathrm{prop}_{ij}$ = proportion of ith amino acid at the jth position

$\mathrm{prop}_{ri}$ = proportion of ith amino acid in the reference distribution of amino acids (distribution of amino acids in the 865 helices)

$$\sigma_{ij} = \sqrt{\mathrm{prop}_{avi}\,(1 - \mathrm{prop}_{avi})\left(\frac{1}{N_j} + \frac{1}{R}\right)}$$

where:

$\mathrm{prop}_{avi}$ = average proportion of the ith amino acid

$R$ = Total number of amino acids in the reference distribution
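The test can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the average proportion is taken to be the pooled proportion over the position and the reference distribution, and the standard deviation is the usual pooled estimate for a difference of two proportions, in which the variance term is multiplied by (1/Nj + 1/R). The function name is hypothetical.

```python
import math


def significant_enrichment(n_ij, N_j, r_i, R):
    """Return True if amino acid i is significantly enriched at position j.

    n_ij: count of amino acid i at position j
    N_j:  total residues at position j
    r_i:  count of amino acid i in the reference distribution
    R:    total residues in the reference distribution
    """
    prop_ij = n_ij / N_j
    prop_ri = r_i / R
    # Assumption: "average proportion" = pooled proportion of both samples.
    prop_av = (n_ij + r_i) / (N_j + R)
    sigma = math.sqrt(prop_av * (1 - prop_av) * (1 / N_j + 1 / R))
    return (prop_ij - prop_ri) > 2 * sigma
```

For example, 60 occurrences among 200 residues at a position, against 1000 among 10000 reference residues, gives a difference of 0.2 that is well above twice the estimated standard deviation.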

Significant occurrence of amino acids at a position

Propensity values for all amino acids showing significant occurrence (α=0.05) were

summed up and averaged. Thus, a propensity based cut-off value of 1.2 was obtained

for amino acids occurring in statistically significant amounts at a particular position in

helices in membrane proteins. The propensity cut-off has been indicated by a

horizontal line in each of the propensity plots.

A propensity based cut-off of 1.4 was obtained following an identical methodology

for helices in globular proteins.

Two sample Z-test

The two sample Z-test was used to find out whether the proportions of amino acids

occurring at a particular position in globular and membrane proteins differed

significantly (α = 0.05).

$$Z = \frac{p_{1ij} - p_{2ij}}{S(p_{1ij} - p_{2ij})}$$

Where:

$p_{1ij}$ = Proportion of ith amino acid at the jth position in globular proteins.

$p_{2ij}$ = Proportion of ith amino acid at the jth position in membrane proteins.

$S(p_{1ij} - p_{2ij})$ = Standard error of the difference between the two proportions.

The two-tailed critical value of Z at the 95% level of significance is 1.96.

The proportions of an amino acid at a specific position in globular and membrane proteins

were considered to be significantly different from each other if the absolute value of the Z-statistic

was >1.96. The amino acids showing this difference also differed in their propensity

values and have been marked with a ‘*’ above the propensity values in Figs. 1, 2a, 2b.
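A short Python sketch of such a comparison is shown below. The form of the standard error is not spelled out in the text, so the pooled two-proportion standard error is assumed here; the function name and arguments are hypothetical.

```python
import math


def two_proportion_z(count1, total1, count2, total2):
    """Z-statistic for the difference between two proportions.

    count1/total1: amino acid i at position j in globular proteins
    count2/total2: amino acid i at position j in membrane proteins
    Assumption: pooled standard error of the difference.
    """
    p1 = count1 / total1
    p2 = count2 / total2
    pooled = (count1 + count2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    return (p1 - p2) / se


# Two-tailed decision at alpha = 0.05.
z = two_proportion_z(90, 300, 60, 300)
is_significant = abs(z) > 1.96
```

Taking the absolute value keeps the decision symmetric, so that a residue depleted in globular proteins relative to membrane proteins is flagged in the same way as an enriched one.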

Hydrogen bond calculation

Hydrogen bond identification was carried out using HBPLUS44 (v3.06), employing

default parameters, for both membrane and globular proteins.

Visualization of protein structures

Visualization, structural analysis and creation of cartoon images of α-helices were

performed in PyMOL45.

Contact criteria for establishing inter-helical interactions

Contact criteria for hydrophobic interactions between side chain atoms of proteins

were followed as per the methodology used by Walters and coworkers46. Two atoms

‘a’ and ‘b’ were considered to be interacting if

$$D_{ab} < (\mathrm{vdW}_a + \mathrm{vdW}_b) + 0.6\ \text{Å}$$

Where:

$D_{ab}$: Distance between two interacting atoms ‘a’ and ‘b’

$\mathrm{vdW}_a$: Van der Waals radius of atom ‘a’

$\mathrm{vdW}_b$: Van der Waals radius of atom ‘b’
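The contact criterion can be written as a small Python check. The radii below are Bondi van der Waals radii and are used purely for illustration; the radius set used in the original analysis is not stated here, and the function name is hypothetical.

```python
import math

# Assumption: Bondi van der Waals radii in Å; the study may use other values.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def in_contact(coord_a, elem_a, coord_b, elem_b, tolerance=0.6):
    """True if D_ab < (vdW_a + vdW_b) + tolerance, with distances in Å."""
    d_ab = math.dist(coord_a, coord_b)
    return d_ab < VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance
```

With these radii, two carbon atoms are counted as interacting when they lie less than 4.0 Å apart.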

Division of the membrane protein dataset

Membrane proteins contain many long membrane spanning α-helices. However, apart

from these long α-helices, they also contain short or medium length α-helices which

terminate / initiate inside the membrane. The termini (N and/or C) of some of these

short/medium length α-helices face the hydrophobic membrane environment and

those of long transmembrane α-helices face a polar environment. In order to

understand the effect of the different external environments (hydrophobic and polar)

on the sequence preferences at the helix termini, the dataset of 865 α-helices was

divided into four subsets. We calculated the position of Cα atoms of N1 and/or C1

relative to the edge of the membrane boundary as defined by OPM database, to

classify helices as having ‘Both N and C-termini embedded’, ‘N-terminus embedded’,

‘C-terminus embedded’ and ‘Both N and C-termini protruding’. The number of

helices in each of the four sub-sets has been shown in Table I and Fig. S1.
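The four-way division can be sketched as follows. This is a simplified illustration, assuming that the OPM coordinates place the membrane centre at z = 0 with the hydrophobic boundaries at plus and minus half the hydrophobic thickness, and that a terminus counts as embedded when the Cα atom of N1 or C1 lies strictly between the boundaries. The function name is hypothetical.

```python
def classify_helix(z_n1, z_c1, half_thickness):
    """Assign a helix to one of the four subsets from the z-coordinates (Å)
    of the N1 and C1 C-alpha atoms.

    Assumption: membrane centred at z = 0, boundaries at +/- half_thickness.
    """
    n_embedded = abs(z_n1) < half_thickness
    c_embedded = abs(z_c1) < half_thickness
    if n_embedded and c_embedded:
        return "Both N and C-termini embedded"
    if n_embedded:
        return "N-terminus embedded"
    if c_embedded:
        return "C-terminus embedded"
    return "Both N and C-termini protruding"
```

A long transmembrane helix running from z = -20 Å to z = +19 Å in a membrane with a half thickness of 15 Å would therefore fall in the ‘Both N and C-termini protruding’ subset.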

The helices mentioned in Table I were grouped based on the environment faced at

their N and C terminus (Fig. S1). The total number of residues at each position has

been shown in Table S10.

RESULTS

Preferred residues at the N and C-termini of membrane proteins

The position-wise propensities of 20 amino acids that occur at each of the 15

positions in α-helices (defined in Methods) in membrane and globular proteins were

calculated and compared. Residues showing a marked occurrence at these positions

have been shown in Fig. 1.

Ncap, N1 and N2 are important positions as they facilitate the helix initiation and

propagation. Membrane proteins show an unusual preference for Glycine and

Histidine at Ncap, whereas globular proteins prefer Proline at this position. Globular

and membrane proteins also show common residue preferences at Ncap by preferring

‘good’ Ncaps such as Aspartic acid, Asparagine, Serine, and Threonine. Proline, a

well known ‘helix-initiator’ is preferred at the N1 position in both globular and

membrane protein datasets while it shows a higher occurrence at flanking positions

N’’, N’ and N2 in globular proteins.

Interestingly, membrane proteins also prefer Glutamine (Pij=1.3), Cysteine (Pij=1.3)

and Lysine (Pij=1.3) at Ccap position which are less preferred in globular proteins.

Glycine is highly preferred at the Ccap position in both globular and membrane

proteins as it can adopt a wide range of conformations (‘extended’ or ‘left handed’) to

terminate the helix34. The C’ position strongly prefers Proline in both datasets and

Aspartic acid in globular proteins only. Thus several positions at the N and C-termini

of α-helices generally show common residue preferences in membrane and globular

proteins, confirming the importance of these positions for helix initiation and

termination and the strict residue preferences therein.

Role of the environment in helix initiation and termination

α-helices in membrane proteins experience varying environments on their surface,

such as the hydrophobic membrane core and the polar extracellular/cytoplasmic

region32. To check whether different environments have a role to play in the choice of

α-helix initiation and termination residues, the 865 α-helices were separated into four

classes as indicated in Table I and Fig. S11. Position-wise propensities for the 15

positions were now calculated for helices with their N or C-termini embedded within

the membrane and those with termini protruding from the membrane. These

preferences were again compared with preferences for the globular protein dataset.

Ncap, N1 and N2 positions show unique residue preferences when embedded

The Ncap position inside the membrane modulates its residue preferences so as to

curtail the energetic cost of charged residues occurring inside the membrane (Fig. 2a).

Aspartic acid, a commonly found Ncap residue in previous analyses34,35 is preferred

only in membrane protein helices with N-termini outside and in globular proteins.

However, it is avoided at the Ncap position when the helix terminus is embedded in

the membrane and ranks low with a propensity of 0.73 due to its high energetic cost

of insertion in the bilayer47-49. Relatively uncommon Ncaps such as Glycine, Histidine

and Proline are preferred at the Ncap position when it is embedded in the membrane.

Polar but uncharged residues like Asparagine, Serine and Threonine are however

commonly seen at the Ncap positions in all three datasets.

The membrane environment is highly hydrophobic in nature and is known to avoid

charged residues in the core region33. Hence helices with the Ncap position inside the

membrane select mildly polar residues such as Asparagine, Serine and Threonine, and

Glycine in lieu of the charged Aspartic acid, thus lowering the energetic cost of

residue insertion in the membrane.

N1 and N2 positions in membrane helices are selective in their residue preferences

The N1 position shows a high preference for Proline in all three datasets and confirms

its well known role as a ‘helix initiator’28,34,47 (Fig. 2a). The membrane embedded N-

termini also show a preference for bulky hydrophobic and aromatic residues such as

Tryptophan, Isoleucine, Leucine, and Phenylalanine.

Globular proteins prefer charged amino acids such as Aspartic acid (Pij=1.96) and

Glutamic acid (Pij=1.4) at the N2 position, while these are less preferred inside the

membrane (Propensities 0.12 and 1 for Aspartic and Glutamic acid respectively) (Fig.

2a). Hence, the membrane environment plays a role in governing the sequence

preferences at the N1 and N2 positions which are important for the stability and

propagation of the α-helix.

C2, C1, Ccap and C’ positions show distinct sequence preferences inside the

membrane

The helical and near-helical positions at the C-termini also fine-tune their residue

preferences in accordance with the external helix environment (hydrophobic or polar)

as observed earlier for the Ncap, N1 and N2 positions. Thus, only hydrophobic amino

acids like Leucine, Isoleucine and aromatic amino acids like Phenylalanine are

preferred at C2 and C1 positions inside the membrane whereas charged amino acids

such as Lysine and Arginine are preferred outside the membrane.

The Ccap position inside the membrane prefers a relatively uncommon hydrophobic

amino acid like Isoleucine and aromatic amino acids like Phenylalanine and Histidine

(Fig. 2b). Glycine, a ‘good’ Ccap residue, shows a strong preference for the Ccap

position in helices with termini ‘protruding’ outside the membrane as well as in globular

proteins; however, it shows a comparatively lower preference for Ccap in membrane

proteins with C-termini embedded in the membrane. This decreased preference for

Glycine is compensated by the presence of polar but uncharged amino acids Serine

and Threonine, which are preferred at the Ccap inside the membrane but avoided

outside the membrane and in globular proteins.

The positively charged Lysine is avoided at Ccap positions inside the membrane and

in globular proteins but it is preferred at Ccap positions outside the membrane (Fig.

2b).

Asparagine, a favored Ccap residue is also avoided at Ccap position inside the

membrane but it is strongly preferred outside the membrane and in globular proteins.

Proline is also avoided at Ccap inside as well as outside the membrane but it is

preferred in globular proteins (Fig. 2b).

C’ is an important near-helical position as it helps to propagate the break in the helix,

induced by the Ccap position. The C’ position prefers the charged Lysine in helices

terminating outside the membrane but avoids it inside the hydrophobic environment

of the membrane. Proline is preferred in all three datasets at this position as it is

known to terminate the helix, one or two residues before its occurrence (Fig. 2b).

Thus, the membrane environment plays a role in regulating the sequence preferences

at the helical (C2, C1) as well as the near-helical (Ccap and C’) positions, which signal

the termination of the α-helix.

Short linkers between transmembrane α-helices show sequence and

conformational preferences

Successive transmembrane α-helices in membrane proteins are connected by short

linker regions which can take up a particular ‘turn’ type conformation or may be

present as ‘unstructured loops’. Residues present in short linkers (three and four

residues) perform the dual role of inducing a turn and also capping preceding and

succeeding α-helices. We have identified and analyzed sequence and conformational

preferences of 23 linkers of three and 29 linkers of four residue length found in the

dataset of 75 membrane proteins.

Residue preferences in three and four residue linkers

In three residue linkers, the Cc, C’ and C’’ positions of helix 1 overlap with the N’’,

N’ and Nc position of helix 2 respectively (Fig. 4a). Glycine and Lysine are preferred

at the first position (Cc / N’’) of the three residue linker. The second position (C’ / N’)

prefers ‘turn inducing’ Proline50 and Arginine, whereas the third position (C’’ / Nc)

prefers Asparagine and Glycine. In the four residue linkers, the near-helical regions at

the C-terminus of helix 1 and those at the N-terminus of helix 2 overlap at the second

(C’ / N’’) and third linker positions (C’’ / N’). These linkers prefer Proline, Glycine

and Lysine at the second position (C’ / N’’), while Proline and Arginine are preferred

at the third position (C’’ / N’) (Fig 4b).
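The overlap of near-helical positions in short linkers can be made explicit with a small Python sketch. It simply counts three positions forward from the C-terminus of helix 1 (Ccap, C’, C’’) and three positions backward from the N-terminus of helix 2 (N’’, N’, Ncap); the function name is hypothetical and the sketch is only meant to illustrate the bookkeeping.

```python
def linker_labels(n_residues):
    """Return, for each linker residue, the pair
    (label relative to helix 1, label relative to helix 2);
    None means the residue has no label on that side."""
    c_side = ["Ccap", "C'", "C''"]   # counted from the end of helix 1
    n_side = ["N''", "N'", "Ncap"]   # counted up to the start of helix 2
    labels = []
    for k in range(n_residues):
        c_label = c_side[k] if k < 3 else None
        m = k - (n_residues - 3)     # index into n_side for this residue
        n_label = n_side[m] if 0 <= m < 3 else None
        labels.append((c_label, n_label))
    return labels
```

For a three residue linker every residue carries two labels, whereas for a four residue linker only the second (C’ / N’’) and third (C’’ / N’) residues do.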

Amino acids such as Glycine, Proline, Histidine, Asparagine and Serine show high

propensities for each of the near-helical positions at the N and C-terminus in

individual helices, as seen in Fig 1, while ‘turn inducing’ Proline, Glycine and

charged Arginine, Lysine are preferred in both three and four residue linkers. Each of

the near-helical positions (C’, C’’, N’, N’’) shows a high propensity for Histidine in

individual helices, but it is avoided when these positions overlap in the linker regions.

A similar analysis of short linkers connecting α-helices in globular proteins (84 three

residue and 75 four residue linkers) shows that only Glycine has a strong preference

at the first overlapping position of the three residue linkers, while Leucine and

Isoleucine are preferred at the second position. Serine, Aspartic acid and Proline are

preferred at the third position (data not shown). In four residue linkers, the second

position prefers Proline, Glycine, Glutamic acid and Threonine, while hydrophobic

Alanine and Valine are preferred in the third position. Thus apart from Glycine and

Proline, three and four residue linkers favor charged amino acids at overlapping

positions in membrane proteins, while hydrophobic and polar residues are preferred in

globular proteins.

Conformational preferences in three and four residue linkers

Out of the 23 three residue linkers, 9 terminate their preceding α-helices with a

Schellman motif (Table S1) and show conformational clustering with backbone

torsion angles (φ-ψ) at Ccap corresponding to the left handed α-helical conformation

and ‘Extended’ conformation at their C’ and C’’ positions (Figs. S2 and S3) as noted

previously by Engel and coworkers51. These conformationally clustered linkers

superpose with a backbone RMSD of 0.37Å. The (φ-ψ) distribution of the 29 four-

residue linkers shows poor superposition and conformational clustering (Figs. S2 and

S3).

Hydrogen bond formation involving top ranked Ncap and Ccap residues

An α-helix is characterized by (NH) 5→1 (O=C) hydrogen bonds between the

backbone amide (NH) and carbonyl (CO) groups of the polypeptide chain. However,

due to this hydrogen bond pattern, four NH groups in the first and four CO groups in

the last turn of the α-helix are not hydrogen bonded. These free NH and CO groups

are often ‘capped’52 by side chains of amino acids at the Ncap and Ccap positions as

well as the flanking near-helical region at the N and C terminus respectively giving

rise to characteristic motifs with specific hydrogen bond and/or backbone (φ-ψ)

patterns.

Top ranked amino acids at the Ncap and Ccap positions (propensity >1.2; see

Methods) were analyzed for the ability of their side chains to hydrogen bond to the backbone NH

and CO groups in the first and last α-helical turns, respectively. The percentage

composition of top ranked amino acids at the Ncap and Ccap position in globular

(propensity >1.4) and membrane proteins (propensity >1.2) is shown in Table S2. The

Ncap position has a similar percentage of amino acids (~55%) that are capable of

accepting a hydrogen bond with their side chain in both globular and membrane

proteins. However, at the Ccap position the occurrence of amino acids with side chain

hydrogen bond donor atoms is substantially less in membrane proteins (~19%) as

compared to globular proteins (~58%).

Hydrogen bond formation of top ranked Ncap residues

The hydrogen bond formation of top ranked Ncap residues (with side chain acceptors)

in globular and membrane proteins has been shown in Table S3. In both datasets, a

majority of the Ncap residues hydrogen bond with the amino group at the N3 position

except for Histidine which prefers to hydrogen bond with the N2 position. Despite

being a top ranked Ncap residue in both datasets, Aspartic acid forms considerably

fewer hydrogen bonds in membrane proteins as compared to globular

proteins.

When the maximally preferred Ncap residues in membrane proteins (shown in Table

S3) are subdivided into Ncaps that are ‘embedded’ and ‘protruding’ out of the

membrane, it is found that Aspartic acid, Asparagine and Histidine residues show

considerably higher frequency of hydrogen bond formation when they are located

within the membrane (Table S4). However, Serine and Threonine show similar

frequencies (~75% to 85%) in both datasets. The dataset for ‘Embedded Ncap

residues’ is small; however, it is interesting to note that all 7 Aspartic acid residues

inside the membrane form an intra-helical backbone hydrogen bond.

Hydrogen bond formation of top ranked Ccap residues

The side chains of top ranked Ccap residues show a high preference to form hydrogen

bonds with the carbonyl oxygen of amino acids at the C3 and C4 positions in both datasets

(Table S5). Glutamine and Lysine are preferred Ccaps in membrane proteins and form

hydrogen bonds in 41% and 20% of the cases respectively. Glutamine, a less

frequently observed residue in previous analyses34-37 is highly preferred at Ccap

position in membrane proteins only. However, it forms notably fewer

hydrogen bonds in this case as compared to globular proteins. A visual inspection of

the helices with Lysine at Ccap reveals that all these helices terminate at the

interfacial region of the membrane, which prefers charged and aromatic residues. Of

these helices, 26 (59%) terminate at the inner side of the membrane, emphasizing the

role of the ‘positive inside’18 rule in the distribution of Lysine at Ccap position. Thus,

sequence preference of Lysine at Ccap is dictated by the membrane environment.

Tyrosine, a preferred Ccap only in globular proteins (not shown in Table S5) shows

no hydrogen bond formation to the residues in the last turn. The low frequency (Lysine) or

absence (Tyrosine) of such hydrogen bonds is due to the conformational restrictions on their

long (aliphatic) and aromatic side chains, which cannot easily fold back to form such intra-helical

hydrogen bonds.

The hydrogen bond formation of top ranked Ccap residues in membrane proteins after

subdivision into Ccaps that are ‘embedded’ and ‘protruding’ out of the membrane has

been shown in Table S6. The Ccap residues do not show a substantial difference in

hydrogen bond patterns inside and outside the membrane.

Helix capping at the N and C-termini of α-helices

Motifs at the N-terminus

The capping box and β-box motif are two commonly observed capping motifs at the

N-terminus of α-helices and have been illustrated in Fig. S4. Their frequency of

occurrence has been shown in Table II. The capping box involves a hydrogen bond

between the free amide of N3 residue and side chain acceptor of the Ncap and a

reciprocal hydrogen bond between the free amide of the Ncap and the side chain of

the N3 residue34,53. In this study, an N-terminal motif is defined as a capping box even

if only one of the above mentioned hydrogen bonds is seen53. A β-box motif is

characterized by the presence of N2(mc)→Ncap(sc) hydrogen bond at the N-terminus

of the α-helix34.
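
As an illustration only, the two motif definitions above can be written as a small classifier. The sketch below is not the code used in this study. The `HBond` record, the position labels, the reading of the arrow in N2(mc)→Ncap(sc) as donor→acceptor, and the choice to test for the capping box before the β-box when both patterns are present are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HBond:
    """One hydrogen bond, described by helix position labels (hypothetical record)."""
    donor_pos: str       # e.g. "N3", "Ncap", "N2"
    donor_group: str     # "mc" = main chain amide, "sc" = side chain
    acceptor_pos: str
    acceptor_group: str


def classify_n_terminal_motif(hbonds):
    """Return 'capping box', 'beta-box' or 'none' for one helix N-terminus."""
    found = {
        (h.donor_pos, h.donor_group, h.acceptor_pos, h.acceptor_group)
        for h in hbonds
    }
    # Capping box: N3(mc)->Ncap(sc) or the reciprocal Ncap(mc)->N3(sc).
    # Following the text, one of the two hydrogen bonds is sufficient.
    if ("N3", "mc", "Ncap", "sc") in found or ("Ncap", "mc", "N3", "sc") in found:
        return "capping box"
    # beta-box: N2(mc)->Ncap(sc).
    if ("N2", "mc", "Ncap", "sc") in found:
        return "beta-box"
    return "none"


# Example: a terminus with only the N2(mc)->Ncap(sc) hydrogen bond.
example = [HBond("N2", "mc", "Ncap", "sc")]
print(classify_n_terminal_motif(example))
```

In practice the hydrogen bond list would come from a hydrogen bond detection program applied to each structure; that step is outside the scope of this sketch.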

Helices in both globular as well as membrane proteins prefer the capping box motif

over the β-box (Table II). The percentage of all capping motifs in globular

(47%) and membrane proteins (51%) is comparable, though the small number of

helices with their N-terminus embedded inside the membrane show a lower

percentage of capping motifs (30.8%). α-helices prefer ‘good’ Ncap (Aspartic acid,

Asparagine, Serine and Threonine) and N3 (Glutamic acid, Aspartic acid and

Glutamine) residues in motif forming (capping box, β box) and ‘non-motif forming’

helices in membrane proteins. Thus, the presence of favored Ncap and N3 residues is

not correlated to the presence of any particular motif at the N-terminus of α-helices in

membrane proteins. Other capping motifs such as the Big-box, α-box, β’-box, α’-box,

α-β box and the Cap’-box are completely absent in membrane proteins, and even

their total count is less than 10 in the globular protein dataset. Thus, they have not

been listed in the table.

Motifs at the C-terminus

The C-termini of α-helices are capped by motifs that have characteristic conformations

as indicated by their (φ-ψ) patterns (Fig. S5) as against the N-terminus which is

capped by hydrogen bonds between free main chain amide groups as donors and side

chain of amino acids as acceptors34. The frequency of occurrence of various helix

termination motifs observed at the C-termini of α-helices in globular and membrane

proteins has been listed in Table III.

The glycine-Schellman motif is one of the commonly found motifs in both datasets

and terminates 19.3% of helices in globular and 14.3% of helices in membrane proteins.

However, in membrane proteins, 110 out of 118 (93.2%) of these helices form

Schellman motifs outside the membrane thus confirming the strong inclination of this

motif to be present outside the membrane.

The non-glycine Schellman motif also shows comparable abundance in globular and

membrane protein datasets. Interestingly, these occur more frequently inside the

membrane (21 out of 63 occurrences) as compared to glycine-Schellman motifs (8 out

of 118 occurrences). Visual inspection of all the non-glycine Schellman motifs shows

that they occur near the interfacial region (data not shown) and prefer aromatic and

charged residues at Ccap.
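
The contrast between the two Schellman motif types can be made explicit by converting the counts quoted above into percentages. The snippet below is only an arithmetic check on the numbers given in the text; the dictionary and function names are illustrative.

```python
# Counts quoted in the text for membrane proteins: (inside membrane, total occurrences).
schellman_counts = {
    "glycine Schellman": (8, 118),
    "non-glycine Schellman": (21, 63),
}


def percent_inside(inside, total):
    """Percentage of motif occurrences located inside the membrane."""
    return 100.0 * inside / total


inside_pct = {
    name: percent_inside(inside, total)
    for name, (inside, total) in schellman_counts.items()
}

for name, pct in inside_pct.items():
    print(f"{name}: {pct:.1f}% of occurrences inside the membrane")
```

With these counts, about one third of the non-glycine Schellman motifs lie inside the membrane, against fewer than one in ten of the glycine Schellman motifs.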

The right-handed Schellman motif and the α-L motif show comparable occurrences

individually in both datasets, but show a bias towards terminating helices outside the

membrane, as seen in the case of glycine Schellman motif (Table III).

Thus, most glycines with left-handed torsions at Ccap inside the membrane are

involved in a Schellman motif to terminate the α-helix.

The glycine Schellman, non-glycine Schellman as well as α-L motifs show a

preference for the hydrophobic Alanine and Leucine at C2, C1 and C’ positions. In

addition to these amino acids, the C’ position prefers Serine in α-L, Glycine in glycine

Schellman and Valine in non-glycine Schellman motifs. Asparagine and Glycine are

preferred at C’’ position in the α-L motif whereas both glycine and non-glycine

Schellman motifs prefer Proline and Phenylalanine at C’’ position.

The Pro-C’ motif occurs with equal frequency in globular (8.8%) and membrane proteins (8.8%). In

membrane proteins, out of the total 93 Proline residues present at the C’ position, 69

(74%) form a Pro-C’ motif. Proline at the C’ position also ‘caps’ the α-helix by

forming a hydrogen bond through its Cα-H atom with a free backbone carbonyl

oxygen in the last turn of the α-helix54 (Fig. 5). These Proline Cα-H…O hydrogen bonds

occur in 13 (19%) cases, only one of which occurs inside the membrane.

The ‘Extended Ccap’ motif causes helix termination in 27% and 29% of helices in

globular and membrane proteins respectively and is the most common helix

termination motif observed in both datasets.

Helix-helix interactions are involved in stabilization of helix termini

Apart from the well known capping motifs at the helix termini, the ends of α-helices

are also stabilized by inter-helical hydrogen bonds and hydrophobic interactions

contributed by the tertiary structure of the protein (neighboring α-helices and

loop regions). In certain instances, these helix termini are stabilized by interactions

(hydrogen bonds, hydrophobic interactions) contributed by more than one α-helix in

the vicinity of a helix terminus.

In membrane proteins, 406 (~47%) α-helices do not form any previously reported

motif at their N-terminus. These termini interact with a total of 1166 spatially

proximal residues, 64% of which belong to α-helices. Of all the interacting residues,

43% take part in these ‘termini-stabilizing’ helix-helix interactions inside the

membrane and 21% outside the membrane (Fig. S7). Correspondingly, 212 (~25%)

α-helices that do not form any well known motif at the C-terminus are stabilized by

a total of 1056 residues in their vicinity. 67% of these residues belong to α-helices

(41% and 26% of the total in interactions occurring inside and outside the membrane

respectively).
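
The percentages in this paragraph can be cross-checked with a short calculation. The sketch below assumes that the inside and outside percentages are fractions of all interacting residues, an interpretation suggested by the fact that they add up to the helix share at each terminus. The residue counts it derives are approximate, because the percentages in the text are rounded; all names are illustrative.

```python
# Figures quoted in the text for membrane protein termini that form no known motif.
# Percentages are taken as fractions of all interacting residues (assumption).
termini = {
    "N-terminus": {"residues": 1166, "helix": 64, "inside": 43, "outside": 21},
    "C-terminus": {"residues": 1056, "helix": 67, "inside": 41, "outside": 26},
}


def approx_count(total, percent):
    """Approximate residue count behind a rounded percentage."""
    return round(total * percent / 100)


def shares_are_consistent(entry):
    """True if the inside and outside shares add up to the helix share."""
    return entry["inside"] + entry["outside"] == entry["helix"]


for name, entry in termini.items():
    helix_residues = approx_count(entry["residues"], entry["helix"])
    print(f"{name}: ~{helix_residues} interacting residues from helices, "
          f"shares consistent: {shares_are_consistent(entry)}")
```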

In globular proteins, 29% and 33% of the interactions that help in stabilizing 1396 and

370 ‘non-motif forming’ α-helices at their N and C-termini respectively, arise from

helix-helix interactions.

Thus, a larger number of helix-helix interactions occur between the closely packed

α-helices of membrane proteins and play a major role in the stabilization of helix termini,

especially inside the membrane.

Validation of position-specific residue analysis: A GPCR case study

G-protein coupled receptors (GPCRs) represent one of the most widely studied

families of membrane proteins as they play critical roles in processes such as signal

transduction55, maintenance of cellular homeostasis56 and neurological57 as well as

reproductive physiological processes58, thus making them one of the major drug

targets59,60.

Schlinkmann and co-workers recently developed an in vitro directed evolution system

for a GPCR variant, rat neurotensin receptor 1-D03 (rNTR1-D03), to enhance its

biosynthesis, detergent stability and functionality61. In that study, ultradeep

sequencing was used to uncover amino acids that are ‘not-acceptable’, ‘acceptable’

and ‘preferred’ for each position in the rNTR1-D03 protein. The amino acid

preferences from the ultradeep sequencing have been used to compare and validate

the sequence signatures deduced from the current statistical analysis of crystal

structures. Preferences from ultradeep sequencing reveal the presence of similar

amino acids at Ncap, N1, N2, Ccap, C’ positions to those obtained in the present

study (Table IV) and thus emphasize the strong residue preferences at these positions.

Amino acids in the third transmembrane α-helix (TM3) and its Ccap, C’ positions;

and those at the Ncap, N1 positions of the sixth transmembrane helix (TM6) are

known to contact the G-protein to assist in signal transduction and thus are

functionally important55,61. Among the G-protein interacting residues, those occurring

at the helix termini show conservation by ultradeep sequencing and also match with

the preferences seen in the current propensity based analysis (Table S7). Hence,

analogous to the helix termini, functionally important positions also show selectivity

in residue preferences.

Discussion

Sequence and motif preferences at α-helix termini for globular proteins have been

extensively studied in the past few decades34-37,53,62-64. However, to the best of our

knowledge, such studies have not been carried out for a high resolution dataset of

membrane proteins. Depth-dependent propensities for amino acids in a 19-mer

synthetic peptide and a comparison with results from a smaller membrane protein

dataset have been reported by Hessa et al.65. Here, we have analyzed in detail the

residue preferences at the N and C-termini of α-helices in helical membrane proteins,

particularly their dependence on the environment of the helix. We find that the

amino acids most highly preferred at Ncap (Asparagine, Aspartic acid, Serine,

Threonine), N1 (Proline), Ccap (Glycine) and C’ (Proline) positions are similar in

both globular and membrane proteins. In addition to these, Glycine and Histidine are

often observed at Ncap position, while Cysteine, Glutamine and Lysine occur at Ccap

position in membrane protein α-helices, though they are not preferred in globular

proteins. The MID region of the α-helices in both globular and membrane proteins

prefers hydrophobic amino acids such as Leucine, Isoleucine and Valine. In

membrane protein helices, these amino acids play a functional role by interacting with

the alkyl tails of lipids in the bilayer66,67.

A large number of transmembrane helices are short in length and initiate / terminate

within the bilayer. These helices with their termini embedded in the membrane

display unique selectivity in their residue preferences. Serine, Threonine, Asparagine,

Histidine, Proline and Glycine are preferred at the Ncap inside the membrane while

Aspartic acid is avoided to reduce the energetic cost of its insertion into the

hydrophobic environment47-49. The presence of Glycine and Proline at the Ncap

position inside the membrane is explained by their ability to distort longer

transmembrane helices into shorter fragments possibly for helix packing and

functional reasons23,68,69. Inside the membrane, the N1 position prefers hydrophobic

residues (Isoleucine, Leucine, Valine and Tryptophan) to start the helix apart from

Proline. The N2 position is important for the stability of the nascent α-helix and the

amino acids preferred at this position (Aspartic acid, Glutamic acid, Threonine)

stabilize the helix by forming hydrogen bonds through their side chains with the free

main chain amide groups of the first α-helical turn63,70. This position prefers aromatic

residues (Phenylalanine, Tryptophan) and Proline when embedded inside the

membrane as opposed to polar residues (Aspartic acid and Glutamic acid) when it

is outside the membrane. Apart from the usually preferred Glycine19,33,34, the Ccap

position inside the membrane is more permissive in its sequence preferences as it

favors a range of amino acids, such as Isoleucine, Serine, Threonine, Histidine and

Phenylalanine. The varying amino acid requirements at the Ncap, N1, N2 as well as

Ccap and C’ positions inside and outside the membrane show that the sequence

preferences are governed by the helix initiating and terminating property as well as

the external environment of the helix (Fig. 3).

Short linker regions (three and four residue length) play a role in the organization of

the connected transmembrane helices71 and thus have stringent sequence preferences

therein72,73. Sequence analysis of these linker regions between α-helices in membrane

and globular proteins reveals selectivity in residue preferences at overlapping linker

positions. The preferred residues (Glycine and Proline) serve to cap the helices at their

termini34-36,74 as well as induce a ‘sharp turn’32,44,45 to redirect the polypeptide chain

into the membrane. The linkers in membrane proteins also prefer the charged

Arginine and Lysine as well as polar Asparagine at certain overlapping positions. As

the linkers are present at the membrane interface, membrane proteins have the added

choice of inserting charged and polar amino acids along with Glycine and Proline

which are also observed at overlapping positions within linkers in globular proteins.

Nine Schellman motif forming three residue linkers also show backbone

conformational clustering by taking up an ‘α-L-extended-extended’ conformation17,51.

In spite of similar residue preferences at Ncap position in globular and membrane

proteins, Aspartic acid shows fewer intra-helical hydrogen bonds involving its side

chain acceptor atoms in membrane protein helices. However, an examination of

hydrogen bonds involving Ncap residues inside the membrane reveals that Aspartic

acid, Asparagine and Histidine show higher frequency of hydrogen bonds when

embedded inside the membrane. At Ccap, Lysine and Glutamine are highly preferred

but form intra-helical hydrogen bonds in only 20% and 41% of the cases respectively, owing to conformational

constraints in their side chains. The interfacial region of membrane proteins prefers

charged and aromatic residues which interact with the polar lipid head groups20-22.

The preference of Lysine and Glutamine at the Ccap position in helices terminating at

the interfacial region highlights the role of the membrane environment in preferring

these residues at the Ccap position.

As noted previously in globular proteins34, membrane proteins too show higher

percentage of intra-helical hydrogen bond formation at the Ncap as compared to the

Ccap position. This reiterates the fact that intra-helical hydrogen bonds involving side

chains of Ncap residues stabilize the N-terminus, whereas additional hydrogen bonds

involving flanking residues and hydrophobic interactions are involved in the

formation of structural motifs at the C-termini of α-helices34.

At the N-terminus, the capping box is preferred over the β-box in both globular and

membrane proteins as it has reciprocal hydrogen bonds between main and side chains

of Ncap and N3 which impart more stability to the helix.

We find a higher occurrence of capping motifs in globular (84.6%) as compared to

membrane proteins (74.2%) at the C-terminus. As observed in previous analyses34-36,74,

Glycine is the most frequently occurring Ccap residue in both globular and

membrane proteins (Fig. 1). Most of these glycine residues take up an ‘α-L’

conformation at the Ccap and form Glycine Schellman and α-L-motifs which are

particularly important in membrane proteins as they help to fold the polypeptide chain

upon itself to form a helix hairpin and reinsert it into the membrane to maintain the

‘helix bundle’ type of architecture seen in α-helical integral membrane proteins.

Inside the membrane, only 33 (17%) helices terminate in different Schellman motifs

(glycine, non-glycine and right handed), all of them near the interfacial region and 25

(76%) of them prefer non-glycine (aromatic and charged) Ccaps showing that the

membrane environment dictates sequence preferences in Schellman motifs inside the

membrane. The presence of non-glycine residues at the Ccap position to take up ‘α-L’

conformations and form Schellman motifs indicates that the need for motif formation

overrides the sequence preference at Ccap.

Capping motifs at the N and C-termini of α-helices serve as helix ‘start’ and ‘stop’

signals and prevent the helix from ‘fraying’. Apart from the commonly observed and

well studied capping motifs, we find that helix-helix interactions also play a

substantial role in capping/stabilizing the helix-termini within the membrane. These

interactions thereby compensate for the shortage of capping motifs inside the

membrane. Thus, the external environment and the fold of the membrane protein play

a role in determining sequence and motif preferences at the helix termini in membrane

proteins.

Helices in membrane proteins occasionally ‘unwind’ inside the hydrophobic bilayer

in order to perform a specific function (Fig. S6) as seen in the case of the Calcium

pump75 and the Leucine transporter76. A few studies have also found helices to be

distorted or kinked for packing46 as well as evolutionary reasons26. The functional

nature of membrane proteins involves large amounts of helix-helix dynamics23,77 and

interactions46,78. This probably accounts for the smaller number of stable capping

motifs at helix termini inside the membrane.

GPCRs constitute the largest superfamily of proteins in mammalian genomes.

However, proteins in this superfamily display high sequence diversity, with few

residues being strictly conserved, as indicated by multiple sequence alignment36,61 and

analysis of crystal structures79. Despite this distinctive feature of protein sequences in

this family, signature amino acids at helical (N1, N2, C1) and near-helical positions

(Ncap, Ccap and C’) are evolutionarily conserved within rNTR1-D03 protein60, as

well as in other protein sequences within the GPCR family80. These residue

preferences are akin to the favored amino acids at the corresponding positions from

our current analysis. The conservation of functionally important amino acids in the

helical and near-helical positions of the third and sixth transmembrane helices of the

rNTR1-D03 protein and their conformity with the residue preferences determined in

our analysis highlights the inclination of these positions to prefer only a few favored

amino acids. The relevance of selective residue preferences at the above mentioned

positions is further confirmed by several mutational studies in membrane proteins

which show that mutations in these positions cause loss of function81-83, decrease in

binding affinity to the ligand84-86 or the G-protein87,88 thereby leading to protein

misfolding89 or diseases90.

Our study shows that residues preferred at the capping positions in helix termini in

membrane proteins are comparable to those seen in globular proteins. However, a

subdivision of the membrane protein dataset reveals differences in sequence

preferences at helix termini, depending on whether they are embedded within or

protruding outside the membrane. The inclination of particular amino acids to be

present at the C2, C1, Ccap, C’ and C’’ positions at the C-terminus in motif forming

helices reveals a sequence-dependent structural preference that would aid in the

design of specific motifs in α-helical membrane proteins.

Membrane proteins optimize biophysical and chemical constraints of the external

environment to strategically place select amino acids at crucial helical and near-

helical positions that act as helix ‘start’ and ‘stop’ signals. These amino acid

preferences in turn govern the type of capping motifs formed at the helix termini. In

summary, this study reveals that stabilization of helix termini in membrane proteins

depends upon the interplay of amino acid preferences at specific positions, their role

in formation of capping motifs and helix-helix interactions which determine the fold

of the protein. Insights from our study will be helpful in delineating α-helix

boundaries for newly solved membrane protein structures at low resolution. These

results would also assist in fine tuning computational tools used to model and

rationally design membrane proteins.

ACKNOWLEDGEMENTS

This work was supported by Department of Science and Technology (DST) and

Department of Biotechnology (DBT), India.

CONFLICT OF INTEREST

The authors declare no conflict of interest in this work.

REFERENCES

1. Arinaminpathy Y, Khurana E, Engelman DM, Gerstein MB. Computational analysis of membrane proteins: the largest class of drug targets. Drug Discov Today 2009;14(23-24):1130-1135.

2. Boyd D, Schierle C, Beckwith J. How many membrane proteins are there? Protein Science 1998;7(1):201-205.

3. Fagerberg L, Jonasson K, von Heijne G, Uhlén M, Berglund L. Prediction of the human membrane proteome. Proteomics 2010;10(6):1141-1149.

4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28(1):235-242.

5. Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 2012;149(7):1607-1621.

6. Hennerdal A, Elofsson A. Rapid membrane protein topology prediction. Bioinformatics 2011;27(9):1322-1323.

7. Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melen K, De Gier JW, Von Heijne G. Experimentally based topology models for E. coli inner membrane proteins. Protein Sci 2004;13(4):937-945.

8. Mirzadegan T, Benko G, Filipek S, Palczewski K. Sequence analyses of G-protein-coupled receptors: similarities to rhodopsin. Biochemistry 2003;42(10):2759-2767.

9. Perez-Aguilar JM, Saven JG. Computational design of membrane proteins. Structure 2012;20(1):5-14.

10. Senes A. Computational design of membrane proteins. Curr Opin Struct Biol 2011;21(4):460-466.

11. Cherezov V, Abola E, Stevens RC. Recent progress in the structure determination of GPCRs, a membrane protein family with high potential as pharmaceutical targets. Methods Mol Biol 2010;654:141-168.

12. White SH. The progress of membrane protein structure determination. Protein Sci 2004;13(7):1948-1949.

13. Bill RM, Henderson PJ, Iwata S, Kunji ER, Michel H, Neutze R, Newstead S, Poolman B, Tate CG, Vogel H. Overcoming barriers to membrane protein structure determination. Nat Biotechnol 2011;29(4):335-340.

14. Goldie KN, Abeyrathne P, Kebbel F, Chami M, Ringler P, Stahlberg H. Cryo-electron microscopy of membrane proteins. Methods Mol Biol 2014;1117:325-341.

15. Hu M, Vink M, Kim C, Derr K, Koss J, D'Amico K, Cheng A, Pulokas J, Ubarretxena-Belandia I, Stokes D. Automated electron microscopy for evaluating two-dimensional crystallization of membrane proteins. J Struct Biol 2010;171(1):102-110.

16. White SH. Biophysical dissection of membrane proteins. Nature 2009;459(7245):344-346.

17. Argos P, Rao JK, Hargrave PA. Structural prediction of membrane-bound proteins. Eur J Biochem 1982;128(2-3):565-575.

18. von Heijne G. Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol 1992;225(2):487-494.

19. Jin W, Takada S. Asymmetry in membrane protein sequence and structure: glycine outside rule. J Mol Biol 2008;377(1):74-82.

20. Yau WM, Wimley WC, Gawrisch K, White SH. The preference of tryptophan for membrane interfaces. Biochemistry 1998;37(42):14713-14718.

21. Killian JA, von Heijne G. How proteins adapt to a membrane-water interface. Trends Biochem Sci 2000;25(9):429-434.

22. Norman KE, Nymeyer H. Indole localization in lipid membranes revealed by molecular simulation. Biophysical journal 2006;91(6):2046-2054.

23. Bright JN, Sansom MS. The flexing/twirling helix: exploring the flexibility about molecular hinges formed by proline and glycine motifs in transmembrane helices. The Journal of Physical Chemistry B 2003;107(2):627-636.

24. Cordes FS, Bright JN, Sansom MS. Proline-induced distortions of transmembrane helices. J Mol Biol 2002;323(5):951-960.

25. Screpanti E, Hunte C. Discontinuous membrane helices in transport proteins and their correlation with function. J Struct Biol 2007;159(2):261-267.

26. Yohannan S, Faham S, Yang D, Whitelegge JP, Bowie JU. The evolution of transmembrane helix kinks and the structural diversity of G protein-coupled receptors. Proc Natl Acad Sci U S A 2004;101(4):959-963.

27. Barth P, Wallner B, Baker D. Prediction of membrane protein structures with complex topologies using limited constraints. Proceedings of the National Academy of Sciences 2009;106(5):1409-1414.

28. Yarov‐Yarovoy V, Schonbrun J, Baker D. Multipass membrane protein structure prediction using Rosetta. PROTEINS: Structure, Function, and Bioinformatics 2006;62(4):1010-1025.

29. Polyansky AA, Chugunov AO, Vassilevski AA, Grishin EV, Efremov RG. Recent advances in computational modeling of alpha-helical membrane-active peptides. Curr Protein Pept Sci 2012;13(7):644-657.

30. Kufareva I, Katritch V, Stevens RC, Abagyan R. Advances in GPCR Modeling Evaluated by the GPCR Dock 2013 Assessment: Meeting New Challenges. Structure 2014;22(8):1120-1139.

31. Sadiq SK, Guixa-Gonzalez R, Dainese E, Pastor M, De Fabritiis G, Selent J. Molecular modeling and simulation of membrane lipid-mediated effects on GPCRs. Curr Med Chem 2013;20(1):22-38.

32. Baeza-Delgado C, Marti-Renom MA, Mingarro I. Structure-based statistical analysis of transmembrane helices. Eur Biophys J 2013;42(2-3):199-207.

33. Ulmschneider MB, Sansom MS. Amino acid distributions in integral membrane protein structures. Biochim Biophys Acta 2001;1512(1):1-14.

34. Aurora R, Rose GD. Helix capping. Protein Sci 1998;7(1):21-38.

35. Kumar S, Bansal M. Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins 1998;31(4):460-476.

36. Presta LG, Rose GD. Helix signals in proteins. Science 1988;240(4859):1632-1641.

37. Gunasekaran K, Nagarajaram H, Ramakrishnan C, Balaram P. Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals. J Mol Biol 1998;275(5):917-932.

38. Lahr SJ, Engel DE, Stayrook SE, Maglio O, North B, Geremia S, Lombardi A, DeGrado WF. Analysis and design of turns in α-helical hairpins. J Mol Biol 2005;346(5):1441-1454.

39. Heinig M, Frishman D. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res 2004;32(Web Server issue):W500-502.

40. Wang G, Dunbrack RL, Jr. PISCES: a protein sequence culling server. Bioinformatics 2003;19(12):1589-1591.

41. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI. OPM: orientations of proteins in membranes database. Bioinformatics 2006;22(5):623-625.

42. Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI. Positioning of proteins in membranes: a computational approach. Protein Science 2006;15(6):1318-1333.

43. Brenner SE, Koehl P, Levitt M. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res 2000;28(1):254-256.

44. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238(5):777-793.

45. Schrodinger, LLC. The PyMOL Molecular Graphics System, Version 1.3r1. 2010.

46. Walters RF, DeGrado WF. Helix-packing motifs in membrane proteins. Proc Natl Acad Sci U S A 2006;103(37):13658-13663.

47. Kauko A, Hedin LE, Thebaud E, Cristobal S, Elofsson A, von Heijne G. Repositioning of transmembrane α-helices during membrane protein folding. J Mol Biol 2010;397(1):190-201.

48. Hedin LE, Ojemalm K, Bernsel A, Hennerdal A, Illergard K, Enquist K, Kauko A, Cristobal S, von Heijne G, Lerch-Bader M, Nilsson I, Elofsson A. Membrane insertion of marginally hydrophobic transmembrane helices depends on sequence context. J Mol Biol 2010;396(1):221-229.

49. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 2005;433(7024):377-381.

50. Nilsson I, von Heijne G. Breaking the camel's back: proline-induced turns in a model transmembrane helix. J Mol Biol 1998;284(4):1185-1189.

51. Engel DE, DeGrado WF. Alpha-alpha linking motifs and interhelical orientations. Proteins 2005;61(2):325-337.

52. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of alpha helices. Science 1988;240(4859):1648-1652.

53. Shen Y, Bax A. Identification of helix capping and β-turn motifs from NMR chemical shifts. J Biomol NMR 2012;52(3):211-232.

54. Ashish S, Prasun K, Manju B. Defining α-helix geometry by Cα atom trace vs (φ-ψ) torsion angles: a comparative analysis. In: Srinivasan MBaN, editor. Biomolecular Forms and Functions: Biomolecular Forms and Functions, IISc Press; 2013. p 116-127.

55. White JF, Noinaj N, Shibata Y, Love J, Kloss B, Xu F, Gvozdenovic-Jeremic J, Shah P, Shiloach J, Tate CG, Grisshammer R. Structure of the agonist-bound neurotensin receptor. Nature 2012;490(7421):508-513.

56. Hazell GG, Hindmarch CC, Pope GR, Roper JA, Lightman SL, Murphy D, O'Carroll AM, Lolait SJ. G protein-coupled receptors in the hypothalamic paraventricular and supraoptic nuclei--serpentine gateways to neuroendocrine homeostasis. Front Neuroendocrinol 2012;33(1):45-66.

57. Chien EY, Liu W, Zhao Q, Katritch V, Han GW, Hanson MA, Shi L, Newman AH, Javitch JA, Cherezov V, Stevens RC. Structure of the human

dopamine D3 receptor in complex with a D2/D3 selective antagonist. Science 2010;330(6007):1091-1095.

58. Kruse AC, Ring AM, Manglik A, Hu J, Hu K, Eitel K, Hubner H, Pardon E, Valant C, Sexton PM, Christopoulos A, Felder CC, Gmeiner P, Steyaert J, Weis WI, Garcia KC, Wess J, Kobilka BK. Activation and allosteric modulation of a muscarinic acetylcholine receptor. Nature 2013;504(7478):101-106.

59. Stevens RC, Cherezov V, Katritch V, Abagyan R, Kuhn P, Rosen H, Wuthrich K. The GPCR Network: a large-scale collaboration to determine human GPCR structure and function. Nat Rev Drug Discov 2013;12(1):25-34.

60. Katritch V, Cherezov V, Stevens RC. Structure-function of the G protein-coupled receptor superfamily. Annu Rev Pharmacol Toxicol 2013;53:531-556.

61. Schlinkmann KM, Honegger A, Tureci E, Robison KE, Lipovsek D, Pluckthun A. Critical features for biosynthesis, stability, and functionality of a G protein-coupled receptor uncovered by all-versus-all mutations. Proc Natl Acad Sci U S A 2012;109(25):9810-9815.

62. Cheng RP, Weng YJ, Wang WR, Koyack MJ, Suzuki Y, Wu CH, Yang PA, Hsu HC, Kuo HT, Girinath P, Fang CJ. Helix formation and capping energetics of arginine analogs with varying side chain length. Amino Acids 2012;43(1):195-206.

63. Vasudev PG, Banerjee M, Ramakrishnan C, Balaram P. Asparagine and glutamine differ in their propensities to form specific side chain-backbone hydrogen bonded motifs in proteins. Proteins 2012;80(4):991-1002.

64. Dasgupta B, Dey S, Chakrabarti P. Water and side-chain embedded pi-turns. Biopolymers 2013.

65. Hessa T, Meindl-Beinker NM, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White SH, von Heijne G. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature 2007;450(7172):1026-1030.

66. Stansfeld PJ, Jefferys EE, Sansom MS. Multiscale simulations reveal conserved patterns of lipid interactions with aquaporins. Structure 2013;21(5):810-819.

67. Lee AG. Biological membranes: the importance of molecular detail. Trends Biochem Sci 2011;36(9):493-500.

68. Senes A, Gerstein M, Engelman DM. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with β-branched residues at neighboring positions. J Mol Biol 2000;296(3):921-936.

69. Curran AR, Engelman DM. Sequence motifs, polar interactions and conformational changes in helical membrane proteins. Curr Opin Struct Biol 2003;13(4):412-417.

70. Penel S, Hughes E, Doig AJ. Side-chain structures in the first turn of the α-helix. J Mol Biol 1999;287(1):127-143.

71. Dmitriev OY, Fillingame RH. The rigid connecting loop stabilizes hairpin folding of the two helices of the ATP synthase subunit c. Protein Sci 2007;16(10):2118-2122.

72. Tastan O, Klein-Seetharaman J, Meirovitch H. The effect of loops on the structural organization of α-helical membrane proteins. Biophys J 2009;96(6):2299-2312.

73. Kadota K, Hirokawa T, Mitaku S. Position Dependent Amino Acid Propensity in the Transmembrane Region for Topology Prediction of Membrane Proteins.


74. Sun H, Greathouse DV, Andersen OS, Koeppe RE, 2nd. The preference of tryptophan for membrane interfaces: insights from N-methylation of tryptophans in gramicidin channels. J Biol Chem 2008;283(32):22233-22243.

75. Toyoshima C, Nakasako M, Nomura H, Ogawa H. Crystal structure of the calcium pump of sarcoplasmic reticulum at 2.6 Å resolution. Nature 2000;405(6787):647-655.

76. Yamashita A, Singh SK, Kawate T, Jin Y, Gouaux E. Crystal structure of a bacterial homologue of Na+/Cl--dependent neurotransmitter transporters. Nature 2005;437(7056):215-223.

77. Langosch D, Arkin IT. Interaction and conformational dynamics of membrane-spanning protein helices. Protein Sci 2009;18(7):1343-1358.

78. Bondar AN, White SH. Hydrogen bond dynamics in membrane protein function. Biochim Biophys Acta 2012;1818(4):942-950.

79. Venkatakrishnan AJ, Deupi X, Lebon G, Tate CG, Schertler GF, Babu MM. Molecular signatures of G-protein-coupled receptors. Nature 2013;494(7436):185-194.

80. Vroling B, Sanders M, Baakman C, Borrmann A, Verhoeven S, Klomp J, Oliveira L, de Vlieg J, Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res 2011;39(Database issue):D309-319.

81. Ballesteros JA, Shi L, Javitch JA. Structural mimicry in G protein-coupled receptors: implications of the high-resolution structure of rhodopsin for structure-function analysis of rhodopsin-like receptors. Mol Pharmacol 2001;60(1):1-19.

82. Blin N, Yun J, Wess J. Mapping of single amino acid residues required for selective activation of Gq/11 by the m3 muscarinic acetylcholine receptor. J Biol Chem 1995;270(30):17741-17748.

83. Cho W, Taylor LP, Akil H. Mutagenesis of residues adjacent to transmembrane prolines alters D1 dopamine receptor binding and signal transduction. Mol Pharmacol 1996;50(5):1338-1345.

84. Lundstrom K, Turpin MP, Large C, Robertson G, Thomas P, Lewell XQ. Mapping of dopamine D3 receptor binding site by pharmacological characterization of mutants expressed in CHO cells with the Semliki Forest virus system. J Recept Signal Transduct Res 1998;18(2-3):133-150.

85. Valiquette M, Vu HK, Yue SY, Wahlestedt C, Walker P. Involvement of Trp-284, Val-296, and Val-297 of the human delta-opioid receptor in binding of delta-selective ligands. J Biol Chem 1996;271(31):18789-18796.

86. Pollock NJ, Manelli AM, Hutchins CW, Steffey ME, MacKenzie RG, Frail DE. Serine mutations in transmembrane V of the dopamine D1 receptor affect ligand interactions and receptor activation. J Biol Chem 1992;267(25):17780-17786.

87. Kostenis E, Conklin BR, Wess J. Molecular basis of receptor/G protein coupling selectivity studied by coexpression of wild type and mutant m2 muscarinic receptors with mutant G alpha(q) subunits. Biochemistry 1997;36(6):1487-1495.

88. Rosenbaum DM, Zhang C, Lyons JA, Holl R, Aragao D, Arlow DH, Rasmussen SG, Choi HJ, Devree BT, Sunahara RK, Chae PS, Gellman SH, Dror RO, Shaw DE, Weis WI, Caffrey M, Gmeiner P, Kobilka BK. Structure and function of an irreversible agonist-beta(2) adrenoceptor complex. Nature 2011;469(7329):236-240.


89. Rader AJ, Anderson G, Isin B, Khorana HG, Bahar I, Klein-Seetharaman J. Identification of core amino acids stabilizing rhodopsin. Proc Natl Acad Sci U S A 2004;101(19):7246-7251.

90. Tan K, Pogozheva ID, Yeo GS, Hadaschik D, Keogh JM, Haskell-Luevano C, O'Rahilly S, Mosberg HI, Farooqi IS. Functional characterization and structural modeling of obesity associated mutations in the melanocortin 4 receptor. Endocrinology 2009;150(1):114-125.


FIGURE LEGENDS

Fig. 1. Amino acid preferences at 15 positions (9 helical and 6 near-helical) for α-helices in globular and membrane proteins. The horizontal line at 1.2 corresponds to the propensity cut-off value, above which the residue occurrence in membrane proteins is significant. The stars indicate that the difference between propensity values in globular and membrane proteins is statistically significant at that position.

Fig. 2. Amino acid preferences for select amino acid residues at a) the N-terminus and b) the C-terminus in globular proteins and membrane protein helices, the latter grouped into those which have their N-termini / C-termini embedded in the membrane and those with their N-termini / C-termini located outside the membrane (in the cytosol or extracellular region). The stars indicate that the difference between residue propensity values in membrane helices with their N- or C-termini protruding or embedded is statistically significant.

Fig. 3. Amino acids showing strong preferences at helix termini facing the hydrophobic environment within the membrane and the polar environment when terminating in the cytosolic/extracellular region. Preferred amino acids at helical (N1, N2, C2, C1) and near-helical (N’, Ncap, Ccap, C’) positions are shown, with those colored in blue being highly preferred in membrane proteins as compared to globular proteins. Red and blue planes indicate the outer and inner membrane boundaries, respectively, as specified by the OPM database.


Fig. 4. Sequence preferences in a) three and b) four residue linkers connecting neighboring transmembrane helices. The arrows adjacent to the helices indicate the N to C direction. Amino acids preferred at the N- and C-termini of individual helices have been shown in Figure 1. The preferred residues that are common to the Cc and N’’, C’ and N’, and C’’ and Nc positions, forming a three residue linker, are shown in panel a. Similarly, preferred residues at the overlapping positions C’ and N’’, and C’’ and N’, in four residue linkers are shown in panel b.

Fig. 5. Hydrogen bond formation involving the Cα–H of proline at the C’ position, observed to stabilize the C-terminus in 13 α-helices. A representative example showing a Cα–H···O hydrogen bond between Proline-119 at the C’ position and the carbonyl oxygen atom of Tyrosine-115 at the C3 position in a helix of the L-carnitine antiporter (PDB id: 2wsw).


Table I: Division of the dataset of α-helices in membrane proteins based on the location of their N and C-termini. (See also Fig S1)

| Environment at the helix termini | No. of helices (865) | Mean length (No. of amino acids) | Median length (No. of amino acids) |
|---|---|---|---|
| Both N and C termini embedded | 64 | 16 ± 2 | 16 |
| Only N-terminus embedded | 156 | 21 ± 4 | 22 |
| Only C-terminus embedded | 141 | 22 ± 3 | 22 |
| Both N and C termini protruding | 504 | 28 ± 6 | 30 |
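As a quick consistency check on Table I, the short sketch below tallies the four helix categories and reports the share of each in the dataset. The counts are copied from the table; the variable names are illustrative only and are not taken from the original analysis.

```python
# Helix counts per terminus environment, copied from Table I.
helix_counts = {
    "both termini embedded": 64,
    "only N-terminus embedded": 156,
    "only C-terminus embedded": 141,
    "both termini protruding": 504,
}

# The four categories should account for the whole dataset of 865 helices.
total_helices = sum(helix_counts.values())

# Fraction of the dataset falling in each category.
fractions = {env: n / total_helices for env, n in helix_counts.items()}

for env, frac in fractions.items():
    print(f"{env}: {helix_counts[env]} helices ({frac:.1%})")
```

The tally shows that helices with both termini protruding make up the majority of the dataset, while helices with both termini embedded are the smallest group.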


Table II: Capping motifs at the N-termini in globular and membrane proteins. The membrane protein dataset has been subdivided based on α-helices with their termini ‘embedded’ inside or ‘protruding’ outside the membrane. The helices with residues up to N4’ have been selected for analysis as they play a role in the formation of the capping motifs.

| Motif at N-terminus | Helices in globular proteins (2628) | Helices in membrane proteins (total dataset) (831) | Helices with N-terminus protruding (620) | Helices with N-terminus embedded (211) |
|---|---|---|---|---|
| Capping box | 1065 (40.5%) | 338 (40.6%) | 286 (46%) | 52 (24.6%) |
| β-box | 167 (6.3%) | 87 (10.2%) | 74 (12%) | 13 (6%) |
| Total | 1232 (47%) | 425 (51%) | 360 (58%) | 65 (30.8%) |


Table III: Various helix terminating motifs observed in globular and membrane proteins. The membrane protein dataset has been subdivided based on the helices terminating inside and outside the membrane. The helices with residues up to C4’ have been selected for analysis as they play a role in the formation of different helix termination motifs by mediating hydrophobic interactions with side chains of amino acids in the last turn of the helix.

Glycine Schellman motif - A glycine at Ccap with left-handed helical torsions and the characteristic Schellman motif (6→1 and 5→2) hydrogen bonds.
Non-glycine Schellman motif - Similar to the Glycine Schellman motif, with a non-glycine amino acid at Ccap with left-handed torsions.
Right-handed Schellman motif - A helix termination motif with right-handed torsion angles at Ccap and a 6→1 hydrogen bond between the C’ and C4 residues.
α-L motif - Glycine at Ccap with left-handed helical torsions, lacking the hydrogen bond pattern needed to form a Schellman motif.
Non-glycine α-L motif - Similar to the α-L motif, with a non-glycine amino acid at Ccap with left-handed torsions.
Proline C’ motif - Proline occurs at the C’ position and forces the Ccap to take up an ‘extended’ conformation and break the helix.
Extended Ccap motif - The Ccap residue takes up an ‘extended’ conformation to terminate the α-helix.

| Motif at C-terminus | Helices in globular proteins (2394) | Helices in membrane proteins (total dataset) (824) | Helices with their C-terminus protruding (630) | Helices with their C-terminus embedded (194) |
|---|---|---|---|---|
| Glycine Schellman motif | 464 (19.3%) | 118 (14.3%) | 110 (17.4%) | 8 (4%) |
| Non-glycine Schellman motif | 210 (8.7%) | 63 (7.6%) | 42 (6.6%) | 21 (11%) |
| Right-handed Schellman motif | 31 (1.2%) | 16 (2%) | 12 (2%) | 4 (2%) |
| α-L motif | 322 (13.9%) | 86 (10.4%) | 83 (13%) | 3 (1.5%) |
| Non-glycine α-L motif | 140 (5.8%) | 20 (2.4%) | 18 (2.8%) | 2 (1%) |
| Proline C’ motif | 212 (8.8%) | 69 (8.8%) | 44 (7%) | 25 (13%) |
| Extended Ccap | 648 (27%) | 240 (29%) | 198 (31.4%) | 42 (21.6%) |
| Total | 2024 (84.6%) | 612 (74.2%) | 507 (80.4%) | 105 (54%) |


Table IV: Comparison of sequence preferences at Ncap, N1, N2, Ccap and C’ positions, listed in decreasing order of preference. Column 3 lists residues from the present analysis, a propensity-based analysis carried out on a dataset of 75 high resolution (<2.5 Å) integral membrane proteins. Column 4 gives sequence preferences calculated using data taken from a 454 ultradeep sequencing study that identifies the evolutionarily conserved amino acids in the rat neurotensin receptor by Schlinkmann and co-workers61. Column 5 gives sequence preferences calculated using multiple sequence alignments of more than 20,00 GPCR class-A sequences. Column 6 lists sequence preferences obtained from propensity analysis of a dataset of 17 crystal structures of the GPCR class-A family, solved at resolution <2.5 Å.

† The residue preferences indicated in column 3 do not belong to a particular transmembrane α-helix but to all α-helices in 75 membrane proteins. Only three structures of GPCRs are present in the 'Present analysis' dataset and these have not been included in the analysis for 'GPCR class-A crystal structures' listed in column 6. The color coding scheme in the third and sixth columns is based on propensity values of amino acids at a particular position. Here amino acids are highlighted in blue (propensity>1.2), dark green (1.2>propensity>1), light green (1>propensity>0.8).

* The color coding scheme in columns 4 and 5 follows the nomenclature used by Schlinkmann and co-workers61 and indicates conservation, enrichment, mild enrichment, no significant change, mild deselection and strong deselection of an amino acid.

| Helix No. | Position | Present analysis† | Schlinkmann and co-workers* | GPCR class A consensus* | GPCR class A crystal structures† |
|---|---|---|---|---|---|
| TM1 (61-88) | Ncap (60) | N,D,T,S,P,H,G | N,G,H,D | - | T,G,S,N,P,H |
| | N1 (61) | P,W,E,K,I,L,V,A | M,I,W | - | P,N,L,E |
| | N2 (62) | E,W,P,A,D,F,S,T,R,Q | Y,W,V | E | T,E,W,L,N |
| | Ccap (89) | G,N,H,Q,C,K,R,T,D,F | T,P,I | I | G,H,K,N |
| | C’ (90) | P,H,K,G,S,D,Q | F | L | P,K,T,R |
| TM2 (98-130) | Ncap (97) | N,D,T,S,P,H,G | K,S | T | T,G,S,N,P,H |
| | N1 (98) | P,W,E,K,I,L,V,A | I,V | P | P,N,L,E |
| | N2 (99) | E,W,P,A,D,F,S,T,R,Q | C,M,P | M | T,E,W,L,N |
| | Ccap (131) | G,N,H,Q,C,K,R,T,D,F | I,V | G | G,H,K,N |
| | C’ (132) | P,H,K,G,S,D,Q | H,P,Q | Y | P,K,T,R |
| TM3 (139-172) | Ncap (138) | N,D,T,S,P,H,G | G | S | T,G,S,N,P,H |
| | N1 (139) | P,W,E,K,I,L,V,A | I,D,E | F | P,N,L,E |
| | N2 (140) | E,W,P,A,D,F,S,T,R,Q | F,I,W,L | A | T,E,W,L,N |
| | Ccap (173) | G,N,H,Q,C,K,R,T,D,F | E,T | H | G,H,K,N |
| | C’ (174) | P,H,K,G,S,D,Q | P,G,D | P | P,K,T,R |
| TM4 (187-207) | Ncap (186) | N,D,T,S,P,H,G | T,S | V | T,G,S,N,P,H |
| | N1 (187) | P,W,E,K,I,L,V,A | I,L,V | C | P,N,L,E |
| | N2 (188) | E,W,P,A,D,F,S,T,R,Q | Y,L | L | T,E,W,L,N |
| | Ccap (208) | G,N,H,Q,C,K,R,T,D,F | T,V,A,S | L | G,H,K,N |
| | C’ (209) | P,H,K,G,S,D,Q | M | T | P,K,T,R |
| TM5 (231-266) | Ncap (230) | N,D,T,S,P,H,G | V,G | - | T,G,S,N,P,H |
| | N1 (231) | P,W,E,K,I,L,V,A | G,N | - | P,N,L,E |
| | N2 (232) | E,W,P,A,D,F,S,T,R,Q | N,A | E | T,E,W,L,N |
| | Ccap (267) | G,N,H,Q,C,K,R,T,D,F | W,Q,H | - | G,H,K,N |
| | C’ (268) | P,H,K,G,S,D,Q | S,T | - | P,K,T,R |
| TM6 (302-334) | Ncap (301) | N,D,T,S,P,H,G | E,I | I | T,G,S,N,P,H |
| | N1 (302) | P,W,E,K,I,L,V,A | D,Q | R | P,N,L,E |
| | N2 (303) | E,W,P,A,D,F,S,T,R,Q | C,G,L | S | T,E,W,L,N |
| | Ccap (335) | G,N,H,Q,C,K,R,T,D,F | K,S,R,P,T | P | G,H,K,N |
| | C’ (336) | P,H,K,G,S,D,Q | D,S,N | S | P,K,T,R |
| TM7 (341-374) | Ncap (340) | N,D,T,S,P,H,G | T,S | P | T,G,S,N,P,H |
| | N1 (341) | P,W,E,K,I,L,V,A | T,S | D | P,N,L,E |
| | N2 (342) | E,W,P,A,D,F,S,T | M,L | L | T,E,W,L,N |
| | Ccap (375) | G,N,H,Q,C,K,R,T,D,F | D,S,T | E | G,H,K,N |
| | C’ (376) | P,H,K,G,S,D,Q | D,L,T | F | P,K,T,R |
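The propensity-based color coding described in the Table IV footnote can be sketched as a simple binning function. This is only an illustration: the function name `propensity_colour` and the "unhighlighted" label are hypothetical, and the treatment of values lying exactly on a boundary (1.2, 1.0 or 0.8) is an assumption, since the footnote gives only strict inequalities.

```python
def propensity_colour(propensity: float) -> str:
    """Map a propensity value to the highlight colour used in Table IV.

    Assumption: a value exactly on a boundary falls into the lower bin,
    because the footnote does not say how boundary values are handled.
    """
    if propensity > 1.2:
        return "blue"          # propensity > 1.2
    if propensity > 1.0:
        return "dark green"    # 1.2 > propensity > 1
    if propensity > 0.8:
        return "light green"   # 1 > propensity > 0.8
    return "unhighlighted"     # hypothetical label for all other values


# Example propensity values chosen for illustration only (not data from the paper).
for value in (1.5, 1.1, 0.9, 0.5):
    print(value, propensity_colour(value))
```

Only the blue bin corresponds to the 1.2 propensity cut-off used elsewhere in the analysis; the two green bins mark weaker preferences.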
