TITLE
Sequence and Conformational Preferences at Termini of α-helices in Membrane
Proteins: Role of the Helix Environment
SHORT TITLE
Helix environment dictates sequence and structural preferences at termini of α-
helices.
KEYWORDS: Membrane protein folding, Membrane protein modeling, Protein design,
Structural Biology, Helix termini, Helix capping
AUTHORS AND AFFILIATIONS
Ashish Shelar and Manju Bansal
Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India
Work performed at
Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India
CORRESPONDING AUTHOR
Manju Bansal
Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012 , India
E-mail: [email protected]
Ph: +91-080-22932534
Research Article, Proteins: Structure, Function and Bioinformatics, DOI 10.1002/prot.24696
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1002/prot.24696
© 2014 Wiley Periodicals, Inc.
Received: Jul 15, 2014; Revised: Sep 05, 2014; Accepted: Sep 16, 2014
ABSTRACT
α-helices are amongst the most common secondary structural elements seen in
membrane proteins and are packed in the form of helix bundles. These α-helices
encounter varying external environments (hydrophobic, hydrophilic) that may
influence the sequence preferences at their N and C-termini. The role of the external
environment in stabilization of the helix termini in membrane proteins is still
unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical
membrane proteins and establish that their sequence and conformational preferences
differ from those in globular proteins. We specifically examine these preferences at
the N and C-termini in helices initiating/terminating inside the membrane core as well
as in linkers connecting these transmembrane helices. We find that the sequence
preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N’ and
C’) positions are influenced by a combination of features including the membrane
environment and the innate helix initiation and termination property of residues
forming structural motifs. We also find that a large number of helix termini which do
not form any particular capping motif are stabilized by formation of hydrogen bonds
and hydrophobic interactions contributed by the neighboring helices in the
membrane protein. We further validate the sequence preferences obtained from our
analysis with data from an ultradeep sequencing study that identifies evolutionarily
conserved amino acids in the rat neurotensin receptor. The results from our analysis
provide insights for the secondary structure prediction, modeling and design of
membrane proteins.
Page 2 of 49
John Wiley & Sons, Inc.
PROTEINS: Structure, Function, and Bioinformatics
INTRODUCTION
Membrane proteins constitute roughly 30% of open reading frames in various
sequenced genomes and form about 70% of the current drug targets1. They are key
components of many regulatory pathways and immune responses, and help to maintain the
integrity of cells2. However, owing to difficulties in their crystallization, they constitute
only 2-3% of the total structures3 available in the Protein Data Bank (PDB)4. The past
few years have seen a considerable upsurge in research on membrane protein
structure and function, and a large amount of protein and genomic sequence data5 has
become available. Various experimental6-8 studies elucidating the topology of membrane
proteins, as well as studies on computational design of membrane proteins9,10, have
been published. The advancement of X-ray11,12 and cryo-electron microscopy
techniques13-15 over the past decade has led to a substantial increase in the number of
high-resolution membrane protein structures, which can be used for
sequence and structural analysis (a summary of these structures is maintained by White
at http://blanco.biomol.uci.edu/).
α-helices and β-strands are the principal secondary structures observed in membrane
proteins due to the energetic constraints imposed by the lipid bilayer16. Thus, helix
bundles and beta barrels are the two major super secondary structural elements seen in
integral membrane proteins. Historically, transmembrane regions of proteins have
been predicted using various hydrophobicity scales17,18. Topology prediction of α-
helical membrane proteins has been greatly assisted by the ‘positive inside rule’18,
‘glycine outside rule’19 and the presence of aromatic residues (Tryptophan, Tyrosine)
at the bilayer and water interface to anchor the helix in the membrane20-22. These
studies have provided guiding rules to understand the folding process of α-helical
integral membrane proteins. However, the reasons for helical distortions, π-bulges,
helical transitions, kinks, as well as broken and re-entrant helices inside the
membrane23-26 are not yet completely understood. Owing to these structural variations,
computational modeling27,28 of membrane proteins has so far been successful only
in predicting structures of small membrane-bound peptides29 and four- to seven-helix
bundle proteins30,31. Thus, a detailed and systematic sequence and structure analysis
of the α-helices in membrane proteins is essential for understanding the principles
governing their folding and function, as well as for developing better prediction tools.
An α-helix in a membrane protein experiences a range of environments along its
length, from apolar (in the membrane core) and slightly polar (at the lipid headgroups) to
completely polar (in the cytoplasmic and extracellular regions)32, and it shows
characteristic residue preferences depending on its location in the membrane19,33.
α-helices in globular proteins have been previously analyzed in terms of the residue
preferences34,35 and capping motifs34-38 at their termini. These motifs act as ‘start’ and
‘stop’ signals and also stabilize the α-helix. In the present analysis, we have
addressed the following questions: Are the helix ‘start’ and ‘stop’ signals in membrane
proteins similar to those of globular proteins? Does the variable environment of the
membrane influence the residue preferences and the helix capping motifs at the
termini? Do α-helices in the vicinity of a helix terminus contribute to the stabilization
of that terminus?
To address these questions, we have analyzed 865 α-helices longer than 8
amino acids, identified by the STRIDE39 program in high-resolution
membrane protein structures. These are compared with a dataset of 2680 α-helices in globular
proteins to identify sequence and conformational preferences at helical and
near-helical positions. A detailed comparison of membrane helices initiating or terminating
outside the membrane with those that have their termini embedded in the membrane
indicates differences in amino acid requirements between the datasets for the first (N1)
and second (N2) as well as the penultimate (C2) and last (C1) positions in the helix
body. Different amino acids are also preferred at the helix capping (Ncap, Ccap) and
near-helical (N’, C’) positions. Upon examining helix initiation and termination
motifs, we observe that helix termini tend to form specific motifs
(glycine-Schellman and non-glycine Schellman motifs) that help to maintain the
‘helix bundle’ architecture of the membrane protein. Short linkers connecting
transmembrane helices also show similar positional preferences for amino acids, as
well as a tendency to adopt specific backbone conformations. Overall, our findings
suggest that the sequence and structural preferences at the α-helix termini in
membrane proteins are governed both by the demands of helix initiation and termination
and by the membrane environment.
MATERIALS AND METHODS
X-ray crystal structure dataset
Membrane Proteins
A non-homologous dataset of X-ray crystal structures with resolution better than 2.5 Å
and sequence identity below 25% was created using the PISCES server40. The dataset
comprised 75 proteins with 181 chains. Coordinates for these 75 proteins were then
downloaded from the Orientation of Proteins in Membrane (OPM) database41. The
OPM database aligns the protein structure along the Z-axis and also provides the
membrane (hydrophobic core) boundaries for the protein based on free energy
transfer values for proteins from water to cyclohexanol42.
Globular proteins
A dataset of representative globular protein folds was created using data from the
ASTRAL-1.75 compendium43 of the SCOP database. From a total of 1195
downloaded representative folds, the dataset was further refined by removing: (i)
domains with a SPACI score below 0.4 (resolution worse than 2.5 Å), (ii)
folds with missing ATOM records for any of their residues, (iii) all-β
folds, and (iv) membrane and cell surface protein folds. After filtering the data
on these criteria, the final dataset consisted of 626 representative
folds.
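The four exclusion criteria amount to a simple record filter. The sketch below is illustrative only: it assumes each ASTRAL domain has already been parsed into a record with hypothetical fields `spaci`, `has_missing_atoms` and `scop_class`, and maps criteria (iii) and (iv) onto SCOP classes 'b' (all-β proteins) and 'f' (membrane and cell surface proteins). The domain identifiers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical record for one ASTRAL representative domain."""
    sid: str                 # domain identifier (illustrative values below)
    spaci: float             # ASTRAL SPACI quality score
    has_missing_atoms: bool  # True if any residue lacks ATOM records
    scop_class: str          # SCOP class letter: 'a', 'b', 'c', 'd', 'f', ...


# SCOP class 'b' = all-beta proteins; 'f' = membrane and cell surface proteins
EXCLUDED_CLASSES = {"b", "f"}


def keep_domain(d: Domain, min_spaci: float = 0.4) -> bool:
    """Return True if the domain passes exclusion criteria (i)-(iv)."""
    return (
        d.spaci >= min_spaci                      # (i) drop SPACI below 0.4
        and not d.has_missing_atoms               # (ii) drop incomplete coordinates
        and d.scop_class not in EXCLUDED_CLASSES  # (iii) all-beta, (iv) membrane
    )


domains = [
    Domain("d1abca_", 0.55, False, "a"),
    Domain("d2xyzb_", 0.30, False, "c"),  # fails (i)
    Domain("d3pqra_", 0.70, True, "d"),   # fails (ii)
    Domain("d4lmna_", 0.80, False, "b"),  # fails (iii)
    Domain("d5stua_", 0.60, False, "f"),  # fails (iv)
]
kept = [d.sid for d in domains if keep_domain(d)]
print(kept)  # ['d1abca_']
```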
Secondary structure assignment
The Structure Identification (STRIDE) program was used to assign secondary
structures in the 75 membrane proteins and the 626 representative folds of globular
proteins. The membrane protein dataset included 1164 STRIDE-assigned α-helices, 865 of which
were longer than 8 amino acids, with mean and median helix lengths of 29 and 28 amino
acids, respectively. The globular protein dataset consisted of 3615 α-helices, 2680 of which
were longer than 8 amino acids, with mean and median helix lengths of 14 and 15 amino
acids, respectively. These 865 and 2680 α-helices from membrane and globular proteins
were selected for further analysis.
Helix position nomenclature
Fifteen positions (nine helical positions and six near-helical positions) were
considered in and around the helix, namely N’’, N’, Ncap, N1, N2, N3, N4, MID,
C4, C3, C2, C1, Ccap, C’ and C’’, for position-specific analysis of amino acid
occurrence (Fig. S8). The MID region represents the middle of the helix after
excluding the four terminal positions at each end of the α-helix; the number of
residues at the MID position is therefore N − 8, where N is the length of the helix. The
distribution of the 20 amino acids at each of these 15 positions for
globular and membrane proteins is shown in Tables S8 and S9.
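The position nomenclature can be made concrete with a small labeling function. This is a minimal sketch, assuming residues are numbered consecutively within a single chain and that `start` and `end` (hypothetical parameter names) are the first (N1) and last (C1) STRIDE-assigned helical residues; chain breaks and chain ends are ignored.

```python
def helix_position_labels(start: int, end: int) -> dict[int, str]:
    """Map residue numbers around one helix (start..end inclusive) to the
    15 position labels N'', N', Ncap, N1-N4, MID, C4-C1, Ccap, C', C''."""
    n = end - start + 1
    if n <= 8:
        raise ValueError("only helices longer than 8 residues are analysed")
    labels = {}
    # Three near-helical positions before N1, then the four N-terminal helical positions
    for offset, name in zip(range(-3, 4), ["N''", "N'", "Ncap", "N1", "N2", "N3", "N4"]):
        labels[start + offset] = name
    # Residues between N4 and C4 are pooled into MID (n - 8 residues)
    for i in range(start + 4, end - 3):
        labels[i] = "MID"
    # Four C-terminal helical positions, then three near-helical positions after C1
    for offset, name in zip(range(-3, 4), ["C4", "C3", "C2", "C1", "Ccap", "C'", "C''"]):
        labels[end + offset] = name
    return labels


labels = helix_position_labels(10, 20)  # hypothetical 11-residue helix
print(labels[9], labels[10], labels[20], labels[21])  # Ncap N1 C1 Ccap
```

For an 11-residue helix, three residues fall into MID, consistent with the N − 8 rule.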
Statistical Methods
Distribution of amino acids, positionwise propensity and percent frequency
The distribution of the 20 amino acids was computed for the 865 α-helices at the 15
positions in and around the α-helix. Positionwise propensities (Pij) and percent
frequencies for the 15 positions were calculated using the following formulae:
Positionwise propensity: Pij = (nij / ni) / (Nj / N)
Positionwise percent frequency = (nij / ni) × 100
where
nij = number of occurrences of amino acid ‘i’ at position ‘j’
ni = total number of amino acids of type ‘i’ in the 75 membrane proteins
Nj = total number of amino acids at position ‘j’
N = total number of amino acids in the 75 membrane proteins
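The two formulae translate directly into code. The sketch below is a minimal illustration, assuming amino acids are given as one-letter codes and that the residues observed at position j have already been collected across all helices (for example with a position-labeling step); the toy sequences are invented and are not data from this study.

```python
from collections import Counter


def positionwise_propensity(residues_at_position, all_residues):
    """Return {amino_acid: (P_ij, percent_frequency)} for one position j.

    residues_at_position: one-letter codes observed at position j (all helices)
    all_residues: one-letter codes of every residue in the protein dataset
    """
    n_ij = Counter(residues_at_position)  # occurrences of amino acid i at j
    n_i = Counter(all_residues)           # occurrences of amino acid i overall
    N_j = len(residues_at_position)       # total residues at position j
    N = len(all_residues)                 # total residues in the dataset
    result = {}
    for aa, count in n_ij.items():
        propensity = (count / n_i[aa]) / (N_j / N)  # P_ij = (n_ij/n_i)/(N_j/N)
        percent_freq = 100.0 * count / n_i[aa]      # (n_ij/n_i) x 100
        result[aa] = (propensity, percent_freq)
    return result


# Toy dataset: 4 Gly and 16 Ala overall; position j holds G, G, A, A
res = positionwise_propensity("GGAA", "GGGG" + "A" * 16)
print(res)
```

Here Gly gets a propensity of 2.5 (over-represented at the position) and Ala 0.625.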
Change in proportion
The change in proportion test was carried out following the methodology of Kumar
and coworkers35 to identify amino acids occurring significantly at a position. The change
in proportion for the ith amino acid at the jth position was considered to be significant
at the 95% confidence level if it was greater than twice the estimated standard deviation:
(propij − propri) > 2σij
where
propij = proportion of the ith amino acid at the jth position
propri = proportion of the ith amino acid in the reference distribution of amino acids
(distribution of amino acids in the 865 helices)
σij = sqrt( propavi (1 − propavi) (1/Nj + 1/R) )
where
propavi = average proportion of the ith amino acid
R = total number of amino acids in the reference distribution
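The test can be sketched as below. Two points are assumptions rather than details given in the text: the "average proportion" propavi is taken as the pooled proportion of amino acid i over the position-j sample and the reference sample, and the counts in the example are hypothetical.

```python
import math


def change_in_proportion(n_ij, N_j, n_ref_i, R):
    """Change-in-proportion test for amino acid i at position j.

    n_ij: count of amino acid i at position j;  N_j: all residues at position j
    n_ref_i: count of amino acid i in the reference distribution;  R: reference size
    Returns (difference, sigma, significant) using the 2-sigma (~95%) criterion.
    """
    prop_ij = n_ij / N_j
    prop_ref = n_ref_i / R
    # Assumption: pooled proportion over both samples stands in for prop_av
    prop_av = (n_ij + n_ref_i) / (N_j + R)
    sigma = math.sqrt(prop_av * (1 - prop_av) * (1 / N_j + 1 / R))
    diff = prop_ij - prop_ref
    return diff, sigma, diff > 2 * sigma


# Hypothetical counts: 60 of 400 residues at position j vs 1000 of 20000 overall
print(change_in_proportion(60, 400, 1000, 20000))
```

A three-fold enrichment (15% against 5%) clears the 2σ threshold comfortably, whereas 5.5% against 5% with the same sample sizes does not.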
Significant occurrence of amino acids at a position
Propensity values for all amino acids showing significant occurrence (α = 0.05) were
summed and averaged. This yielded a propensity cut-off value of 1.2
for amino acids occurring in statistically significant amounts at a particular position in
helices in membrane proteins. The propensity cut-off is indicated by a
horizontal line in each of the propensity plots.
A propensity cut-off of 1.4 was obtained by an identical procedure
for helices in globular proteins.
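The cut-off is the mean propensity over all amino acid/position pairs that pass the significance test. A minimal sketch, assuming the propensities and significance flags have already been computed; the input values below are invented for illustration.

```python
def propensity_cutoff(propensities, significant):
    """Average the propensities P_ij of the amino acid/position pairs flagged
    as significant by the change-in-proportion test.

    propensities: list of P_ij values; significant: parallel list of booleans.
    """
    selected = [p for p, s in zip(propensities, significant) if s]
    if not selected:
        raise ValueError("no significant occurrences to average")
    return sum(selected) / len(selected)


# Illustrative values only: two of the four pairs are significant
print(propensity_cutoff([1.1, 1.5, 0.8, 1.3], [False, True, False, True]))
```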
Two-sample Z-test
The two-sample Z-test was used to determine whether the proportions of an amino acid
occurring at a particular position in globular and membrane proteins differed
significantly (α = 0.05).
Z-statistic = (p1ij − p2ij) / S(p1ij − p2ij)
where
p1ij = proportion of the ith amino acid at the jth position in globular proteins
p2ij = proportion of the ith amino acid at the jth position in membrane proteins
S(p1ij − p2ij) = standard error of the difference between the two proportions
The two-tailed critical Z value at the 95% confidence level is 1.96.
The proportions of an amino acid at a specific position in globular and membrane proteins
were considered to be significantly different if the absolute value of the Z-statistic
was > 1.96. The amino acids showing this difference also differed in their propensity
values and have been marked with a ‘*’ above the propensity values in Figs. 1, 2a and 2b.
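A minimal sketch of the two-sample test follows. It assumes the standard error S(p1ij − p2ij) is computed from the pooled proportion, which is one common choice; the text does not state which estimator was used. The occurrence counts are hypothetical, while the sample sizes reuse the helix counts, since each helix contributes exactly one residue at a terminal position such as Ncap.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sample Z statistic for proportions x1/n1 (globular) and x2/n2 (membrane),
    using the pooled standard error of the difference."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


Z_CRIT = 1.96  # two-tailed critical value at alpha = 0.05

# Hypothetical counts of one amino acid at one position in 2680 and 865 helices
z = two_proportion_z(x1=120, n1=2680, x2=20, n2=865)
print(round(z, 2), abs(z) > Z_CRIT)
```

Comparing |Z| rather than Z against 1.96 keeps the test symmetric, so it flags both enrichment and depletion in membrane proteins.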
Hydrogen bond calculation
Hydrogen bonds were identified using HBPLUS (v3.06)44 with default parameters,
for both membrane and globular proteins.
Visualization of protein structures
Visualization, structural analysis and creation of cartoon images of α-helices were
performed in PyMOL45.
Contact criteria for establishing inter-helical interactions
Contact criteria for hydrophobic interactions between side chain atoms were
defined following the methodology of Walters and coworkers46. Two atoms
‘a’ and ‘b’ were considered to be interacting if
Dab< (vdWa+vdWb) + 0.6Å
Where:
Dab : Distance between two interacting atoms ‘a’ and ‘b’
vdWa : Van der Waals radius of atom ‘a’
vdWb : Van der Waals radius of atom ‘b’
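The contact criterion can be checked pairwise as follows. The text does not list the van der Waals radii that were used, so the Bondi radii below are an assumption, as is the simple (element, coordinates) atom representation.

```python
import math

# Assumed van der Waals radii in Å (Bondi values); not specified in the text
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def in_contact(atom_a, atom_b, tolerance=0.6):
    """Return True if D_ab < (vdW_a + vdW_b) + tolerance.

    Each atom is (element, (x, y, z)) with coordinates in Å.
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    d_ab = math.dist(xyz_a, xyz_b)  # Euclidean distance between the two atoms
    return d_ab < VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance


# Two carbons: cut-off is 1.70 + 1.70 + 0.6 = 4.0 Å
print(in_contact(("C", (0.0, 0.0, 0.0)), ("C", (3.9, 0.0, 0.0))))  # True
print(in_contact(("C", (0.0, 0.0, 0.0)), ("C", (4.1, 0.0, 0.0))))  # False
```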
Division of the membrane protein dataset
Membrane proteins contain many long membrane-spanning α-helices. However, apart
from these long α-helices, they also contain short or medium length α-helices which
terminate or initiate inside the membrane. The termini (N and/or C) of some of these
short/medium length α-helices face the hydrophobic membrane environment, whereas
those of long transmembrane α-helices face a polar environment. In order to
understand the effect of the different external environments (hydrophobic and polar)
on the sequence preferences at the helix termini, the dataset of 865 α-helices was
divided into four subsets. We calculated the positions of the Cα atoms of N1 and/or C1
relative to the edge of the membrane boundary, as defined by the OPM database, to
classify helices as having ‘Both N and C-termini embedded’, ‘N-terminus embedded’,
‘C-terminus embedded’ or ‘Both N and C-termini protruding’. The number of
helices in each of the four subsets is given in Table I and Fig. S1.
The helices listed in Table I were grouped based on the environment faced at
their N and C-termini (Fig. S1). The total number of residues at each position is
given in Table S10.
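The four-way classification can be sketched from the z-coordinates of the N1 and C1 Cα atoms. This assumes OPM-oriented coordinates with the membrane normal along Z and the hydrophobic slab centred at z = 0, bounded at ±`half_thickness` (a hypothetical parameter standing in for the OPM boundary); a terminus counts as embedded when its Cα lies strictly inside the slab.

```python
def classify_helix(z_n1, z_c1, half_thickness):
    """Classify a helix by whether the C-alpha atoms of N1 and C1 lie inside the
    hydrophobic slab |z| < half_thickness (membrane centred at z = 0, normal along Z)."""
    n_in = abs(z_n1) < half_thickness
    c_in = abs(z_c1) < half_thickness
    if n_in and c_in:
        return "Both N and C-termini embedded"
    if n_in:
        return "N-terminus embedded"
    if c_in:
        return "C-terminus embedded"
    return "Both N and C-termini protruding"


# A long transmembrane helix spanning a slab with an assumed half-thickness of 15 Å
print(classify_helix(-20.0, 18.0, 15.0))  # Both N and C-termini protruding
print(classify_helix(-5.0, 18.0, 15.0))   # N-terminus embedded
```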
RESULTS
Preferred residues at the N and C-termini of α-helices in membrane proteins
The position-wise propensities of the 20 amino acids at each of the 15
positions in and around α-helices (defined in Methods) in membrane and globular proteins were
calculated and compared. Residues showing a marked occurrence at these positions
are shown in Fig. 1.
Ncap, N1 and N2 are important positions as they facilitate helix initiation and
propagation. Membrane proteins show an unusual preference for Glycine and
Histidine at Ncap, whereas globular proteins prefer Proline at this position. Globular
and membrane proteins also share residue preferences at Ncap, both favoring
‘good’ Ncaps such as Aspartic acid, Asparagine, Serine and Threonine. Proline, a
well-known ‘helix initiator’, is preferred at the N1 position in both the globular and
membrane protein datasets, while it shows a higher occurrence at the flanking positions
N’’, N’ and N2 in globular proteins.
Interestingly, membrane proteins also prefer Glutamine (Pij = 1.3), Cysteine (Pij = 1.3)
and Lysine (Pij = 1.3) at the Ccap position; these residues are less preferred in globular proteins.
Glycine is highly preferred at the Ccap position in both globular and membrane
proteins, as it can adopt a wide range of conformations (‘extended’ or ‘left-handed’) to
terminate the helix34. The C’ position strongly prefers Proline in both datasets, and
Aspartic acid in globular proteins only. Thus, several positions at the N and C-termini
of α-helices show common residue preferences in membrane and globular
proteins, confirming the importance of these positions for helix initiation and
termination and the strict residue preferences therein.
Role of the environment in helix initiation and termination
α-helices in membrane proteins experience varying environments on their surface,
such as the hydrophobic membrane core and the polar extracellular/cytoplasmic
region32. To check whether different environments influence the choice of
α-helix initiation and termination residues, the 865 α-helices were separated into four
classes as indicated in Table I and Fig. S11. Position-wise propensities for the 15
positions were then calculated for helices with their N- or C-termini embedded within
the membrane and for those with termini protruding from the membrane. These
preferences were again compared with those of the globular protein dataset.
Ncap, N1 and N2 positions show unique residue preferences when embedded
Residue preferences at the Ncap position inside the membrane are modulated so as to
curtail the energetic cost of placing charged residues inside the membrane (Fig. 2a).
Aspartic acid, a commonly found Ncap residue in previous analyses34,35, is preferred
only in membrane protein helices with N-termini outside the membrane and in globular proteins.
It is avoided at the Ncap position when the helix terminus is embedded in
the membrane, ranking low with a propensity of 0.73, owing to its high energetic cost
of insertion into the bilayer47-49. Relatively uncommon Ncaps such as Glycine, Histidine
and Proline are preferred at the Ncap position when it is embedded in the membrane.
Polar but uncharged residues like Asparagine, Serine and Threonine are, however,
commonly seen at the Ncap position in all three datasets.
The membrane environment is highly hydrophobic in nature, and charged residues are known
to be avoided in its core region33. Hence, helices with the Ncap position inside the
membrane select mildly polar residues (Asparagine, Serine and Threonine) and
Glycine in lieu of the charged Aspartic acid, thus lowering the energetic cost of
residue insertion in the membrane.
N1 and N2 positions in membrane helices are selective in their residue preferences
The N1 position shows a high preference for Proline in all three datasets, confirming
its well-known role as a ‘helix initiator’28,34,47 (Fig. 2a). The membrane-embedded
N-termini also show a preference for bulky hydrophobic and aromatic residues such as
Tryptophan, Isoleucine, Leucine and Phenylalanine.
Globular proteins prefer charged amino acids such as Aspartic acid (Pij = 1.96) and
Glutamic acid (Pij = 1.4) at the N2 position, while these are less preferred inside the
membrane (propensities of 0.12 and 1.0 for Aspartic and Glutamic acid, respectively)
(Fig. 2a). Hence, the membrane environment plays a role in governing the sequence
preferences at the N1 and N2 positions, which are important for the stability and
propagation of the α-helix.
C2, C1, Ccap and C’ positions show distinct sequence preferences inside the membrane
The helical and near-helical positions at the C-termini also fine-tune their residue
preferences in accordance with the external helix environment (hydrophobic or polar),
as observed earlier for the Ncap, N1 and N2 positions. Thus, only hydrophobic amino
acids like Leucine and Isoleucine and aromatic amino acids like Phenylalanine are
preferred at the C2 and C1 positions inside the membrane, whereas charged amino acids
such as Lysine and Arginine are preferred outside the membrane.
The Ccap position inside the membrane prefers a relatively uncommon hydrophobic
amino acid, Isoleucine, and aromatic amino acids like Phenylalanine and Histidine
(Fig. 2b). Glycine, a ‘good’ Ccap residue, shows a strong preference for the Ccap
position in helices with termini ‘protruding’ outside the membrane as well as in globular
proteins; however, it shows a comparatively weaker preference for Ccap in membrane
protein helices with C-termini embedded in the membrane. This decreased preference for
Glycine is compensated by the presence of the polar but uncharged amino acids Serine
and Threonine, which are preferred at the Ccap inside the membrane but avoided
outside the membrane and in globular proteins.
The positively charged Lysine is avoided at Ccap positions inside the membrane and
in globular proteins but it is preferred at Ccap positions outside the membrane (Fig.
2b).
Asparagine, a favored Ccap residue, is also avoided at the Ccap position inside the
membrane, but it is strongly preferred outside the membrane and in globular proteins.
Proline is also avoided at Ccap inside as well as outside the membrane but it is
preferred in globular proteins (Fig. 2b).
C’ is an important near-helical position, as it helps to propagate the break in the helix
induced by the Ccap position. The C’ position prefers the charged Lysine in helices
terminating outside the membrane but avoids it inside the hydrophobic environment
of the membrane. Proline is preferred at this position in all three datasets, as it is
known to terminate the helix one or two residues before its occurrence (Fig. 2b).
Thus, the membrane environment plays a role in regulating the sequence preferences
at the helical (C2, C1) as well as the near-helical (Ccap, C’) positions, which signal
the termination of the α-helix.
Short linkers between transmembrane α-helices show sequence and conformational preferences
Successive transmembrane α-helices in membrane proteins are connected by short
linker regions which can take up a particular ‘turn’ type conformation or may be
present as ‘unstructured loops’. Residues present in short linkers (three and four
residues) perform the dual role of inducing a turn and also capping preceding and
succeeding α-helices. We have identified and analyzed the sequence and conformational
preferences of 23 three-residue and 29 four-residue linkers found in the
dataset of 75 membrane proteins.
Residue preferences in three and four residue linkers
In three-residue linkers, the Ccap, C’ and C’’ positions of helix 1 overlap with the N’’,
N’ and Ncap positions of helix 2, respectively (Fig. 4a). Glycine and Lysine are preferred
at the first position (Ccap / N’’) of the three-residue linker. The second position (C’ / N’)
prefers the ‘turn inducing’ Proline50 and Arginine, whereas the third position (C’’ / Ncap)
prefers Asparagine and Glycine. In the four-residue linkers, the near-helical regions at
the C-terminus of helix 1 and those at the N-terminus of helix 2 overlap at the second
(C’ / N’’) and third (C’’ / N’) linker positions. These linkers prefer Proline, Glycine
and Lysine at the second position (C’ / N’’), while Proline and Arginine are preferred
at the third position (C’’ / N’) (Fig. 4b).
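The overlap between the two helices' position labels follows from counting offsets: linker residue k lies k residues after C1 of helix 1 and (L + 1 − k) residues before N1 of helix 2, where L is the linker length. A minimal sketch of that bookkeeping, restricted to the near-helical labels of the 15-position scheme (None marks a linker residue outside that scheme); the function name is hypothetical.

```python
C_SIDE = {1: "Ccap", 2: "C'", 3: "C''"}  # residues after C1 of helix 1
N_SIDE = {1: "Ncap", 2: "N'", 3: "N''"}  # residues before N1 of helix 2


def linker_overlap(length):
    """For a linker of `length` residues between helix 1 and helix 2, return
    (helix-1 label, helix-2 label) for each linker residue, first to last."""
    return [(C_SIDE.get(k), N_SIDE.get(length + 1 - k)) for k in range(1, length + 1)]


print(linker_overlap(3))
print(linker_overlap(4))
```

For three-residue linkers this gives Ccap/N’’, C’/N’ and C’’/Ncap; for four-residue linkers only the second (C’/N’’) and third (C’’/N’) positions carry labels from both helices.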
Amino acids such as Glycine, Proline, Histidine, Asparagine and Serine show high
propensities for each of the near-helical positions at the N- and C-termini of
individual helices, as seen in Fig. 1, while the ‘turn inducing’ Proline and Glycine and the
charged Arginine and Lysine are preferred in both three- and four-residue linkers. Each of
the near-helical positions (C’, C’’, N’, N’’) shows a high propensity for Histidine in
individual helices, but Histidine is avoided when these positions overlap in the linker regions.
A similar analysis of short linkers connecting α-helices in globular proteins (84 three-residue
and 75 four-residue linkers) shows that only Glycine has a strong preference
for the first overlapping position of the three-residue linkers, while Leucine and
Isoleucine are preferred at the second position. Serine, Aspartic acid and Proline are
preferred at the third position (data not shown). In four-residue linkers, the second
position prefers Proline, Glycine, Glutamic acid and Threonine, while the hydrophobic
Alanine and Valine are preferred at the third position. Thus, apart from Glycine and
Proline, three- and four-residue linkers favor charged amino acids at overlapping
positions in membrane proteins, while hydrophobic and polar residues are preferred in
globular proteins.
Conformational preferences in three and four residue linkers
Of the 23 three-residue linkers, 9 terminate their preceding α-helices with a
Schellman motif (Table S1) and cluster conformationally, with backbone
torsion angles (φ-ψ) at Ccap corresponding to the left-handed α-helical conformation
and ‘extended’ conformations at their C’ and C’’ positions (Figs. S2 and S3), as noted
previously by Engel and coworkers51. These conformationally clustered linkers
superpose with a backbone RMSD of 0.37 Å. In contrast, the (φ-ψ) distribution of the
29 four-residue linkers shows poor superposition and little conformational clustering
(Figs. S2 and S3).
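A backbone RMSD such as the 0.37 Å reported for the clustered linkers is normally computed after optimal rigid-body superposition. The sketch below uses the Kabsch algorithm with NumPy. The text does not state which tool or which backbone atoms were used, so the sketch simply assumes two matched (n, 3) coordinate arrays, for example N, Cα, C and O atoms of two linkers listed in the same order.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (n, 3) coordinate sets after optimal rigid
    superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q  # rotate every row of P, then compare with Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy superposes with RMSD ~0
theta = 0.5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = np.array([[0, 0, 0], [1.5, 0, 0], [1.5, 1.2, 0], [0.3, 1.0, 1.4], [2.0, 0.5, 0.9]], float)
Q = P @ Rz.T + np.array([3.0, -1.0, 2.0])
print(kabsch_rmsd(P, Q))  # ~0 up to floating-point rounding
```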
Hydrogen bond formation involving top ranked Ncap and Ccap residues
An α-helix is characterized by (NH) 5→1 (O=C) hydrogen bonds between the
backbone amide (NH) and carbonyl (CO) groups of the polypeptide chain. However,
owing to this hydrogen bond pattern, the four NH groups in the first turn and the four
CO groups in the last turn of the α-helix are not hydrogen bonded. These free NH and CO groups
are often ‘capped’52 by the side chains of amino acids at the Ncap and Ccap positions, as
well as in the flanking near-helical regions at the N- and C-termini, respectively, giving
rise to characteristic motifs with specific hydrogen bond and/or backbone (φ-ψ)
patterns.
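The count of four unpaired groups at each end follows directly from the 5→1 pattern. In the idealized sketch below, helix residues are numbered 1..n (N1 = 1, C1 = n), and the NH of residue k bonds to the CO of residue k − 4. It ignores capping residues and distortions, which is exactly why real termini need the capping interactions discussed here.

```python
def unpaired_backbone_groups(helix_length):
    """Residues 1..n of an ideal alpha-helix with NH(k) -> O=C(k-4) hydrogen bonds.
    Return (residues whose NH is unpaired, residues whose CO is unpaired)."""
    residues = range(1, helix_length + 1)
    nh_free = [k for k in residues if k - 4 < 1]             # no CO four residues back
    co_free = [k for k in residues if k + 4 > helix_length]  # no NH four residues ahead
    return nh_free, co_free


print(unpaired_backbone_groups(12))  # ([1, 2, 3, 4], [9, 10, 11, 12])
```

Residues 1 to 4 correspond to N1 to N4, and residues 9 to 12 correspond to C4 to C1.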
Top-ranked amino acids at the Ncap and Ccap positions (propensity > 1.2; see
Methods) were analyzed for the ability of their side chains to hydrogen bond to the
backbone NH and CO groups in the first and last α-helical turns, respectively. The percentage
composition of top-ranked amino acids at the Ncap and Ccap positions in globular
(propensity > 1.4) and membrane proteins (propensity > 1.2) is shown in Table S2. At the
Ncap position, the percentage of amino acids whose side chains can accept a hydrogen
bond is about the same (~55%) in globular and membrane proteins. However, at the
Ccap position, the occurrence of amino acids with side chain hydrogen bond donor atoms
is substantially lower in membrane proteins (~19%) than in globular proteins (~58%).
Hydrogen bond formation of top ranked Ncap residues
The hydrogen bond formation of top-ranked Ncap residues (with side chain acceptors)
in globular and membrane proteins is shown in Table S3. In both datasets, a
majority of the Ncap residues hydrogen bond with the backbone amide (NH) group at the
N3 position, except for Histidine, which prefers to hydrogen bond with the N2 position.
Despite being a top-ranked Ncap residue in both datasets, Aspartic acid forms considerably
fewer hydrogen bonds in membrane proteins than in globular proteins.
When the top-ranked Ncap residues in membrane proteins (shown in Table
S3) are subdivided into Ncaps that are ‘embedded’ in and ‘protruding’ out of the
membrane, Aspartic acid, Asparagine and Histidine residues show a
considerably higher frequency of hydrogen bond formation when they are located
within the membrane (Table S4), whereas Serine and Threonine show similar
frequencies (~75% to 85%) in both subsets. Although the dataset of ‘embedded Ncap
residues’ is small, it is interesting to note that all 7 Aspartic acid residues
inside the membrane form an intra-helical backbone hydrogen bond.
Hydrogen bond formation of top ranked Ccap residues
The side chains of top-ranked Ccap residues show a high preference for hydrogen
bonding with the carbonyl oxygens of amino acids at the C3 and C4 positions in both datasets
(Table S5). Glutamine and Lysine are preferred Ccaps in membrane proteins and form
hydrogen bonds in 41% and 20% of cases, respectively. Glutamine, a less
frequently observed residue in previous analyses34-37, is highly preferred at the Ccap
position in membrane proteins only; however, it forms notably fewer
hydrogen bonds in membrane proteins than in globular proteins. A visual inspection of
the helices with Lysine at Ccap reveals that all of them terminate in the
interfacial region of the membrane, which prefers charged and aromatic residues. Of these,
26 (59%) terminate at the inner side of the membrane, emphasizing the
role of the ‘positive inside’18 rule in the distribution of Lysine at the Ccap position. Thus,
the sequence preference for Lysine at Ccap is dictated by the membrane environment.
Tyrosine, a preferred Ccap only in globular proteins (not shown in Table S5), shows
no hydrogen bond formation to residues in the last turn. The scarcity (Lysine) or
absence (Tyrosine) of such hydrogen bonds can be attributed to conformational
restrictions on their long aliphatic and aromatic side chains, respectively, which must
fold back to form such intra-helical hydrogen bonds.
The hydrogen bond formation of top-ranked Ccap residues in membrane proteins, after
subdivision into Ccaps that are ‘embedded’ in and ‘protruding’ out of the membrane, is
shown in Table S6. The Ccap residues do not show a substantial difference in
hydrogen bond patterns inside and outside the membrane.
Helix capping at the N and C-termini of α-helices
Motifs at the N-terminus
The capping box and β-box motifs are two commonly observed capping motifs at the
N-terminus of α-helices. They are illustrated in Fig. S4, and their frequencies of
occurrence are listed in Table II. The capping box involves a hydrogen bond between
the free backbone amide of the N3 residue and a side-chain acceptor of the Ncap, and a
reciprocal hydrogen bond between the free backbone amide of the Ncap and the side
chain of the N3 residue34,53. In this study, an N-terminal motif is defined as a capping
box even if only one of these hydrogen bonds is present53. A β-box motif is
characterized by an N2(mc)→Ncap(sc) hydrogen bond at the N-terminus
of the α-helix34.
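The motif definitions above can be expressed as a small rule-based classifier. This is a sketch that assumes the hydrogen bonds have already been detected by some hydrogen-bond program and encoded as hypothetical (position, part) tuples. Checking for the capping box before the β-box when both are present is also an assumption made here.

```python
from typing import Iterable, Tuple

# One hydrogen bond: (donor position, donor part, acceptor position, acceptor part),
# where part is "mc" (main chain) or "sc" (side chain).
HBond = Tuple[str, str, str, str]


def classify_n_terminal_motif(hbonds: Iterable[HBond]) -> str:
    """Label an N-terminus as 'capping box', 'beta-box' or 'none'."""
    bonds = set(hbonds)
    # Capping box: N3(mc)->Ncap(sc) and/or the reciprocal Ncap(mc)->N3(sc);
    # as in the text, either bond alone is sufficient.
    if ("N3", "mc", "Ncap", "sc") in bonds or ("Ncap", "mc", "N3", "sc") in bonds:
        return "capping box"
    # beta-box: N2(mc)->Ncap(sc).
    if ("N2", "mc", "Ncap", "sc") in bonds:
        return "beta-box"
    return "none"


print(classify_n_terminal_motif([("Ncap", "mc", "N3", "sc")]))  # capping box
print(classify_n_terminal_motif([("N2", "mc", "Ncap", "sc")]))  # beta-box
```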
Helices in both globular and membrane proteins prefer the capping box motif over the
β-box (Table II). Overall, the percentage of all capping motifs is comparable in
globular (47%) and membrane proteins (51%), though the small number of helices with
their N-terminus embedded inside the membrane shows a lower percentage of capping
motifs (30.8%). In membrane proteins, both motif-forming (capping box, β-box) and
‘non-motif forming’ α-helices prefer ‘good’ Ncap (Aspartic acid, Asparagine, Serine
and Threonine) and N3 (Glutamic acid, Aspartic acid and Glutamine) residues. Thus,
the presence of favored Ncap and N3 residues is not correlated with the presence of
any particular motif at the N-terminus of α-helices in membrane proteins. Other
capping motifs, such as the Big-box, α-box, β’-box, α’-box, α-β box and the
Cap’-box, are completely absent in membrane proteins, and even their total count is
less than 10 in the globular protein datasets. Thus, they have not been listed in the
table.
Motifs at the C-terminus
At the C-terminus, helices are capped by motifs with characteristic conformations, as
indicated by their (φ, ψ) patterns (Fig. S5), whereas the N-terminus is capped by
hydrogen bonds between free main-chain amide groups as donors and amino acid side
chains as acceptors34. The frequencies of occurrence of the various helix
termination motifs observed at the C-termini of α-helices in globular and membrane
proteins are listed in Table III.
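Because C-terminal motifs are recognized from their (φ, ψ) patterns, a common first step is to reduce each residue's torsions to a coarse conformational letter and search the resulting string. The region boundaries below are rough assumptions, not the definitions used for Fig. S5. The example flags a helix whose first non-helical residue is left-handed and is followed by two extended residues, the ‘α-L-extended-extended’ pattern mentioned for Schellman-forming linkers.

```python
import re
from typing import Iterable, Tuple


def region(phi: float, psi: float) -> str:
    """Coarse (phi, psi) region: 'L' left-handed, 'R' right-handed helical,
    'E' extended. The boundaries are illustrative assumptions."""
    if phi > 0:
        return "L"
    if -100.0 <= psi <= 50.0:
        return "R"
    return "E"


def conformation_string(torsions: Iterable[Tuple[float, float]]) -> str:
    """Concatenate the region letter of each residue."""
    return "".join(region(phi, psi) for phi, psi in torsions)


def has_left_handed_stop(conf: str, min_helix: int = 4) -> bool:
    """True if a run of at least min_helix helical ('R') residues is
    immediately followed by a left-handed ('L') residue."""
    return re.search("R{%d,}L" % min_helix, conf) is not None


# Toy torsions (degrees): five helical residues, one left-handed, two extended.
torsions = [(-60.0, -45.0)] * 5 + [(80.0, 20.0), (-120.0, 130.0), (-110.0, 140.0)]
conf = conformation_string(torsions)
print(conf)                        # RRRRRLEE
print(has_left_handed_stop(conf))  # True
```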
The glycine-Schellman motif is among the most common motifs in both datasets,
terminating 19.3% of helices in globular and 14.3% of helices in membrane proteins.
In membrane proteins, 110 of these 118 helices (93.2%) form Schellman motifs
outside the membrane, confirming the strong tendency of this motif to occur outside
the membrane.
The non-glycine Schellman motif also shows comparable abundance in the globular and
membrane protein datasets. Interestingly, these motifs occur inside the membrane
more frequently (21 of 63 occurrences) than glycine-Schellman motifs do (8 of 118
occurrences). Visual inspection of all the non-glycine Schellman motifs shows
that they occur near the interfacial region (data not shown) and prefer aromatic and
charged residues at Ccap.
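The location bias of the two Schellman variants follows directly from the counts quoted above. The short calculation below reproduces the percentages: about 93% of glycine-Schellman motifs lie outside the membrane (about 7% inside), versus one third of non-glycine Schellman motifs inside.

```python
def percent(part: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts quoted in the text for membrane proteins.
GLY_SCHELLMAN_TOTAL, GLY_SCHELLMAN_OUTSIDE = 118, 110
NONGLY_SCHELLMAN_TOTAL, NONGLY_SCHELLMAN_INSIDE = 63, 21

gly_inside = GLY_SCHELLMAN_TOTAL - GLY_SCHELLMAN_OUTSIDE  # 8, as stated in the text
print(percent(GLY_SCHELLMAN_OUTSIDE, GLY_SCHELLMAN_TOTAL))        # 93.2
print(percent(gly_inside, GLY_SCHELLMAN_TOTAL))                   # 6.8
print(percent(NONGLY_SCHELLMAN_INSIDE, NONGLY_SCHELLMAN_TOTAL))   # 33.3
```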
The right-handed Schellman motif and the α-L motif each occur with comparable
frequency in both datasets but, like the glycine-Schellman motif, show a bias toward
terminating helices outside the membrane (Table III).
Thus, most glycines with left-handed torsions at Ccap inside the membrane are
involved in a Schellman motif that terminates the α-helix.
The glycine-Schellman, non-glycine Schellman and α-L motifs all show a preference
for the hydrophobic residues Alanine and Leucine at the C2, C1 and C’ positions. In
addition, the C’ position prefers Serine in α-L, Glycine in glycine-Schellman and
Valine in non-glycine Schellman motifs. Asparagine and Glycine are preferred at the
C’’ position in the α-L motif, whereas both glycine and non-glycine Schellman motifs
prefer Proline and Phenylalanine at the C’’ position.
The Pro-C’ motif occurs with equal frequency in globular and membrane proteins (8.8%
each). In membrane proteins, 69 (74%) of the 93 Proline residues at the C’ position
form a Pro-C’ motif. Proline at the C’ position also ‘caps’ the α-helix by forming a
hydrogen bond through its Cα-H group with a free backbone carbonyl oxygen in the
last turn of the α-helix54 (Fig. 5). These Proline Cα-H…O hydrogen bonds occur in 13
(19%) of these cases, only one of which occurs inside the membrane.
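A Cα-H…O hydrogen bond of the kind described above is usually identified geometrically. The sketch below uses an H…O distance cut-off of 2.7 Å and a Cα-H…O angle (measured at H) of at least 90°. Both thresholds are illustrative assumptions rather than the criteria used in this study, and the coordinates are toy values. The last two lines check the quoted fractions; 19% corresponds to 13 of the 69 Pro-C’ motifs.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def is_ca_h_o_bond(ca: Vec, ha: Vec, o: Vec,
                   max_h_o: float = 2.7, min_angle: float = 90.0) -> bool:
    """Geometric test for a C-alpha-H...O hydrogen bond.
    max_h_o (Å) and min_angle (degrees, at the H atom) are illustrative cut-offs."""
    h_to_ca = _sub(ca, ha)
    h_to_o = _sub(o, ha)
    d_ho = _norm(h_to_o)
    cos_angle = sum(x * y for x, y in zip(h_to_ca, h_to_o)) / (_norm(h_to_ca) * d_ho)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return d_ho <= max_h_o and angle >= min_angle


# Toy coordinates (Å): a linear C-alpha-H...O arrangement passes, a distant O fails.
print(is_ca_h_o_bond((0, 0, 0), (1.09, 0, 0), (3.5, 0, 0)))     # True
print(is_ca_h_o_bond((0, 0, 0), (1.09, 0, 0), (1.09, 3.0, 0)))  # False

# Fractions quoted in the text for membrane proteins.
print(round(100 * 69 / 93))  # 74: Pro-C' motifs among Pro residues at C'
print(round(100 * 13 / 69))  # 19: C-alpha-H...O bonds among Pro-C' motifs
```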
The ‘Extended Ccap’ motif terminates 27% and 29% of helices in globular and
membrane proteins, respectively, and is the most common helix termination motif
observed in both datasets.
Helix-helix interactions are involved in stabilization of helix termini
Apart from the well-known capping motifs at the helix termini, the ends of α-helices
are also stabilized by inter-helical hydrogen bonds and hydrophobic interactions
contributed by the tertiary structure of the protein (neighboring α-helices and loop
regions). In certain instances, a helix terminus is stabilized by interactions
(hydrogen bonds, hydrophobic interactions) contributed by more than one neighboring
α-helix.
In membrane proteins, 406 (~47%) α-helices do not form any previously reported
motif at their N-terminus. These termini interact with a total of 1166 spatially
proximal residues, 64% of which belong to α-helices; such ‘termini-stabilizing’
helix-helix interactions involve 43% of these residues inside the membrane and 21%
outside the membrane (Fig. S7). Correspondingly, the C-termini of 212 (~25%)
α-helices that do not form any well-known motif are stabilized by a total of 1056
residues in their vicinity. Of these residues, 67% belong to α-helices, with 41% and
26% of the total involved in interactions inside and outside the membrane, respectively.
In globular proteins, helix-helix interactions account for 29% and 33% of the
interactions that stabilize the N-termini of 1396 and the C-termini of 370 ‘non-motif
forming’ α-helices, respectively.
Thus, helix-helix interactions between closely packed α-helices contribute a larger
share of the termini-stabilizing interactions in membrane proteins and play a major
role in stabilizing helix termini, especially inside the membrane.
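The percentages above read most naturally with all of them taken relative to the total number of contacting residues, since the inside and outside fractions then add up to the helical fraction (43% + 21% = 64%; 41% + 26% = 67%). A minimal sketch of that bookkeeping follows, assuming each contacting residue has already been labelled by source and location. The labels and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def interaction_breakdown(contacts: List[Tuple[str, str]]) -> Dict[str, float]:
    """contacts: one (source, location) pair per residue near a helix terminus,
    with source in {'helix', 'other'} and location in {'inside', 'outside'}.
    Every percentage uses the total number of contacting residues as denominator."""
    total = len(contacts)
    counts = Counter(contacts)
    helix_in = counts[("helix", "inside")]
    helix_out = counts[("helix", "outside")]
    return {
        "helix": 100.0 * (helix_in + helix_out) / total,
        "helix_inside": 100.0 * helix_in / total,
        "helix_outside": 100.0 * helix_out / total,
    }


# Toy data: 10 residues contacting one terminus.
toy = ([("helix", "inside")] * 4 + [("helix", "outside")] * 2
       + [("other", "inside")] * 3 + [("other", "outside")])
print(interaction_breakdown(toy))
# {'helix': 60.0, 'helix_inside': 40.0, 'helix_outside': 20.0}
```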
Validation of position-specific residue analysis: A GPCR case study
G-protein coupled receptors (GPCRs) are among the most widely studied families of
membrane proteins, as they play critical roles in signal transduction55, maintenance of
cellular homeostasis56, and neurological57 as well as reproductive58 physiological
processes, making them one of the major classes of drug targets59,60.
Schlinkmann and co-workers recently developed an in vitro directed evolution system
for a GPCR variant, rat neurotensin receptor 1-D03 (rNTR1-D03), to enhance its
biosynthesis, detergent stability and functionality61. In that study, ultradeep
sequencing was used to identify amino acids that are ‘not acceptable’, ‘acceptable’
and ‘preferred’ at each position of the rNTR1-D03 protein. We have used these
amino acid preferences to compare with and validate the sequence signatures
deduced from the current statistical analysis of crystal structures. The ultradeep
sequencing preferences reveal amino acids at the Ncap, N1, N2, Ccap and C’
positions similar to those obtained in the present study (Table IV), emphasizing the
strong residue preferences at these positions.
Amino acids in the third transmembrane α-helix (TM3) and at its Ccap and C’
positions, as well as those at the Ncap and N1 positions of the sixth transmembrane
helix (TM6), are known to contact the G-protein to assist in signal transduction and
are thus functionally important55,61. Among the G-protein interacting residues, those
at the helix termini are conserved in the ultradeep sequencing data and also match
the preferences seen in the current propensity-based analysis (Table S7). Hence, like
helix termini in general, these functionally important positions also show selective
residue preferences.
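The comparison with the ultradeep sequencing data amounts to intersecting, position by position, two sets of favored residues. In the sketch below, the structure-derived sets use residues named in this paper for Ncap, Ccap and C’. The deep-sequencing sets are invented placeholders, not the contents of Table IV or of ref. 61.

```python
from typing import Dict, List


def shared_preferences(structure_prefs: Dict[str, List[str]],
                       deepseq_prefs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, for each position, the residues favored by both sources (sorted)."""
    return {pos: sorted(set(res) & set(deepseq_prefs.get(pos, [])))
            for pos, res in structure_prefs.items()}


# One-letter codes. Structure-derived sets follow the text; deep-sequencing sets are hypothetical.
structure = {"Ncap": ["S", "T", "D", "N"], "Ccap": ["G"], "C'": ["P"]}
deepseq = {"Ncap": ["S", "T", "A"], "Ccap": ["G", "N"], "C'": ["A"]}
print(shared_preferences(structure, deepseq))
# {'Ncap': ['S', 'T'], 'Ccap': ['G'], "C'": []}
```

An empty list flags a position where the two sources disagree and which may merit closer inspection.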
Discussion
Sequence and motif preferences at α-helix termini in globular proteins have been
extensively studied over the past few decades34-37,53,62-64. However, to the best of our
knowledge, such studies have not been carried out for a high-resolution dataset of
membrane proteins. Depth-dependent propensities of amino acids in a 19-mer
synthetic peptide, together with a comparison with results from a smaller membrane
protein dataset, have been reported by Hessa et al.65. Here, we have analyzed in detail
the residue preferences at the N and C-termini of α-helices in helical membrane
proteins, particularly their dependence on the environment of the helix. We find that
the amino acids highly preferred at the Ncap (Asparagine, Aspartic acid, Serine,
Threonine), N1 (Proline), Ccap (Glycine) and C’ (Proline) positions are similar in
globular and membrane proteins. In addition, Glycine and Histidine are often observed
at the Ncap position, and Cysteine, Glutamine and Lysine at the Ccap position, in
membrane protein α-helices, though they are not preferred in globular
proteins.
proteins. The MID region of the α-helices in both globular and membrane proteins
prefers hydrophobic amino acids such as Leucine, Isoleucine and Valine. In
membrane protein helices, these amino acids play a functional role by interacting with
the alkyl tails of lipids in the bilayer66,67.
A large number of transmembrane helices are short and initiate or terminate within
the bilayer. These helices, with their termini embedded in the membrane, display
unique selectivity in their residue preferences. Serine, Threonine, Asparagine,
Histidine, Proline and Glycine are preferred at the Ncap inside the membrane, while
Aspartic acid is avoided to reduce the energetic cost of its insertion into the
hydrophobic environment47-49. The presence of Glycine and Proline at the Ncap
position inside the membrane is explained by their ability to break longer
transmembrane helices into shorter fragments, possibly for helix packing and
functional reasons23,68,69. Inside the membrane, the N1 position prefers hydrophobic
residues (Isoleucine, Leucine, Valine and Tryptophan), in addition to Proline, to start
the helix. The N2 position is important for the stability of the nascent α-helix, and the
amino acids preferred at this position (Aspartic acid, Glutamic acid, Threonine)
stabilize the helix by forming hydrogen bonds through their side chains with the free
main-chain amide groups of the first α-helical turn63,70. This position prefers aromatic
residues (Phenylalanine, Tryptophan) and Proline when embedded inside the
membrane, as opposed to polar residues (Aspartic acid and Glutamic acid) when it is
outside the membrane. Apart from the usually preferred Glycine19,33,34, the Ccap
position inside the membrane is more permissive in its sequence preferences, as it
favors a range of amino acids such as Isoleucine, Serine, Threonine, Histidine and
Phenylalanine. The varying amino acid requirements at the Ncap, N1, N2, Ccap and
C’ positions inside and outside the membrane show that the sequence preferences are
governed both by the helix initiating and terminating properties of the residues and by
the external environment of the helix (Fig. 3).
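For modeling or design, the environment-dependent preferences summarized above could be encoded as a lookup table. The sketch below is a hypothetical helper. Its tables contain only the residues named in this paragraph (one-letter codes), and ‘not listed’ means the text makes no statement about that residue, not that it is disfavored.

```python
# Residues favored inside the membrane, as listed in the discussion (one-letter codes).
INSIDE_FAVORED = {
    "Ncap": set("STNHPG"),
    "N1": set("ILVWP"),
    "N2": set("FWP"),
    "Ccap": set("GISTHF"),
}
# Residues favored outside the membrane (only N2 is described in the text).
OUTSIDE_FAVORED = {
    "N2": set("DE"),
}
# Residues reported as avoided inside the membrane.
INSIDE_AVOIDED = {
    "Ncap": set("D"),
}


def check(position: str, residue: str, embedded: bool) -> str:
    """Return 'favored', 'avoided' or 'not listed' for a residue at a helix-terminal
    position, given whether that position lies inside the membrane."""
    if embedded and residue in INSIDE_AVOIDED.get(position, set()):
        return "avoided"
    table = INSIDE_FAVORED if embedded else OUTSIDE_FAVORED
    if residue in table.get(position, set()):
        return "favored"
    return "not listed"


print(check("Ncap", "D", embedded=True))  # avoided
print(check("N2", "W", embedded=True))    # favored
print(check("N2", "E", embedded=False))   # favored
```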
Short linker regions (three or four residues long) play a role in organizing the
connected transmembrane helices71 and thus have stringent sequence
preferences72,73. Sequence analysis of these linker regions between α-helices in
membrane and globular proteins reveals selectivity in residue preferences at
overlapping linker positions. The preferred residues (Glycine and Proline) serve to cap
the helices at their termini34-36,74 as well as induce a ‘sharp turn’32,44,45 that redirects
the polypeptide chain into the membrane. The linkers in membrane proteins also prefer
the charged Arginine and Lysine, as well as the polar Asparagine, at certain
overlapping positions. Because these linkers lie at the membrane interface, membrane
proteins have the added option of inserting charged and polar amino acids alongside
Glycine and Proline, which are also observed at overlapping linker positions in
globular proteins. Nine Schellman-motif-forming three-residue linkers also show
backbone conformational clustering, adopting an ‘α-L-extended-extended’
conformation17,51.
Despite similar residue preferences at the Ncap position in globular and membrane
proteins, Aspartic acid forms fewer intra-helical hydrogen bonds involving its side-chain
acceptor atoms in membrane protein helices. However, an examination of hydrogen
bonds involving Ncap residues inside the membrane reveals that Aspartic acid,
Asparagine and Histidine form hydrogen bonds more frequently when embedded
inside the membrane. At Ccap, Lysine and Glutamine are highly preferred but form
few or no intra-helical hydrogen bonds, owing to conformational constraints on their
side chains. The interfacial region of membrane proteins prefers charged and aromatic
residues, which interact with the polar lipid head groups20-22. The preference for
Lysine and Glutamine at the Ccap position of helices terminating in the interfacial
region highlights the role of the membrane environment in selecting these residues at
this position.
As noted previously for globular proteins34, membrane proteins also show a higher
percentage of intra-helical hydrogen bond formation at the Ncap than at the Ccap
position. This is consistent with the view that intra-helical hydrogen bonds involving
the side chains of Ncap residues stabilize the N-terminus, whereas additional hydrogen
bonds involving flanking residues, together with hydrophobic interactions, are
involved in forming structural motifs at the C-termini of α-helices34.
At the N-terminus, the capping box is preferred over the β-box in both globular and
membrane proteins, as its reciprocal hydrogen bonds between the main chains and side
chains of the Ncap and N3 residues impart more stability to the helix.
We find a higher occurrence of capping motifs at the C-terminus in globular (84.6%)
than in membrane proteins (74.2%). As observed in previous analyses34-36,74, Glycine
is the most frequently occurring Ccap residue in both globular and membrane proteins
(Fig. 1). Most of these glycine residues adopt an ‘α-L’ conformation at the Ccap and
form glycine-Schellman and α-L motifs. These motifs are particularly important in
membrane proteins, as they help fold the polypeptide chain back upon itself to form a
helix hairpin and reinsert it into the membrane, maintaining the ‘helix bundle’
architecture seen in α-helical integral membrane proteins.
Inside the membrane, only 33 (17%) helices terminate in Schellman motifs (glycine,
non-glycine or right-handed), all of them near the interfacial region, and 25 (76%) of
these have non-glycine (aromatic and charged) Ccap residues, showing that the
membrane environment dictates sequence preferences in Schellman motifs inside the
membrane. The presence of non-glycine residues that adopt ‘α-L’ conformations at the
Ccap position to form Schellman motifs indicates that the need for motif formation
overrides the sequence preference at Ccap.
Capping motifs at the N and C-termini of α-helices serve as helix ‘start’ and ‘stop’
signals and prevent the helix from ‘fraying’. Apart from the commonly observed and
well-studied capping motifs, we find that helix-helix interactions also play a
substantial role in capping and stabilizing helix termini within the membrane, thereby
compensating for the shortage of capping motifs inside the membrane. Thus, the
external environment and the fold of the membrane protein both play a role in
determining sequence and motif preferences at the helix termini in membrane
proteins.
Helices in membrane proteins occasionally ‘unwind’ inside the hydrophobic bilayer
in order to perform a specific function (Fig. S6), as seen in the calcium pump75 and the
leucine transporter76. Other studies have found helices to be distorted or kinked for
packing46 as well as evolutionary26 reasons. The function of membrane proteins
involves extensive helix-helix dynamics23,77 and interactions46,78, which probably
accounts for the smaller number of stable capping motifs at helix termini inside the
membrane.
GPCRs constitute the largest superfamily of membrane proteins in mammalian
genomes. However, proteins in this superfamily display high sequence diversity, with
few strictly conserved residues, as indicated by multiple sequence alignments36,61 and
analysis of crystal structures79. Despite this sequence diversity, signature amino acids
at helical (N1, N2, C1) and near-helical (Ncap, Ccap and C’) positions are
evolutionarily conserved in the rNTR1-D03 protein60, as well as in other protein
sequences within the GPCR family80. These residue preferences resemble the favored
amino acids at the corresponding positions in our analysis. The conservation of
functionally important amino acids at the helical and near-helical positions of the third
and sixth transmembrane helices of rNTR1-D03, and their agreement with the residue
preferences determined in our analysis, highlight the tendency of these positions to
accept only a few favored amino acids. The relevance of selective residue preferences
at these positions is further supported by several mutational studies in membrane
proteins, which show that mutations at these positions cause loss of function81-83,
reduced binding affinity for the ligand84-86 or the G-protein87,88, protein misfolding89
or disease90.
Our study shows that the residues preferred at the capping positions of helix termini in
membrane proteins are comparable to those seen in globular proteins. However, a
subdivision of the membrane protein dataset reveals differences in sequence
preferences at helix termini, depending on whether they are embedded within or
protrude outside the membrane. The tendency of particular amino acids to occur at the
C2, C1, Ccap, C’ and C’’ positions of motif-forming helices reveals a
sequence-dependent structural preference that could aid in the design of specific
motifs in α-helical membrane proteins.
Membrane proteins accommodate the biophysical and chemical constraints of their
external environment by strategically placing selected amino acids at crucial helical
and near-helical positions that act as helix ‘start’ and ‘stop’ signals. These amino acid
preferences in turn govern the type of capping motifs formed at the helix termini. In
summary, this study reveals that the stabilization of helix termini in membrane
proteins depends on the interplay of amino acid preferences at specific positions, their
role in forming capping motifs, and the helix-helix interactions that determine the fold
of the protein. Insights from our study will be helpful in delineating α-helix
boundaries in newly solved, low-resolution membrane protein structures. These
results should also assist in fine-tuning computational tools used to model and
rationally design membrane proteins.
ACKNOWLEDGEMENTS
This work was supported by Department of Science and Technology (DST) and
Department of Biotechnology (DBT), India.
CONFLICT OF INTEREST
The authors declare no conflict of interest in this work.
REFERENCES
1. Arinaminpathy Y, Khurana E, Engelman DM, Gerstein MB. Computational analysis of membrane proteins: the largest class of drug targets. Drug Discov Today 2009;14(23-24):1130-1135.
2. Boyd D, Schierle C, Beckwith J. How many membrane proteins are there? Protein Sci 1998;7(1):201-205.
3. Fagerberg L, Jonasson K, von Heijne G, Uhlén M, Berglund L. Prediction of the human membrane proteome. Proteomics 2010;10(6):1141-1149.
4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28(1):235-242.
5. Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 2012;149(7):1607-1621.
6. Hennerdal A, Elofsson A. Rapid membrane protein topology prediction. Bioinformatics 2011;27(9):1322-1323.
7. Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melen K, De Gier JW, Von Heijne G. Experimentally based topology models for E. coli inner membrane proteins. Protein Sci 2004;13(4):937-945.
8. Mirzadegan T, Benko G, Filipek S, Palczewski K. Sequence analyses of G-protein-coupled receptors: similarities to rhodopsin. Biochemistry 2003;42(10):2759-2767.
9. Perez-Aguilar JM, Saven JG. Computational design of membrane proteins. Structure 2012;20(1):5-14.
10. Senes A. Computational design of membrane proteins. Curr Opin Struct Biol 2011;21(4):460-466.
11. Cherezov V, Abola E, Stevens RC. Recent progress in the structure determination of GPCRs, a membrane protein family with high potential as pharmaceutical targets. Methods Mol Biol 2010;654:141-168.
12. White SH. The progress of membrane protein structure determination. Protein Sci 2004;13(7):1948-1949.
13. Bill RM, Henderson PJ, Iwata S, Kunji ER, Michel H, Neutze R, Newstead S, Poolman B, Tate CG, Vogel H. Overcoming barriers to membrane protein structure determination. Nat Biotechnol 2011;29(4):335-340.
14. Goldie KN, Abeyrathne P, Kebbel F, Chami M, Ringler P, Stahlberg H. Cryo-electron microscopy of membrane proteins. Methods Mol Biol 2014;1117:325-341.
15. Hu M, Vink M, Kim C, Derr K, Koss J, D'Amico K, Cheng A, Pulokas J, Ubarretxena-Belandia I, Stokes D. Automated electron microscopy for evaluating two-dimensional crystallization of membrane proteins. J Struct Biol 2010;171(1):102-110.
16. White SH. Biophysical dissection of membrane proteins. Nature 2009;459(7245):344-346.
17. Argos P, Rao JK, Hargrave PA. Structural prediction of membrane-bound proteins. Eur J Biochem 1982;128(2-3):565-575.
18. von Heijne G. Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol 1992;225(2):487-494.
19. Jin W, Takada S. Asymmetry in membrane protein sequence and structure: glycine outside rule. J Mol Biol 2008;377(1):74-82.
20. Yau WM, Wimley WC, Gawrisch K, White SH. The preference of tryptophan for membrane interfaces. Biochemistry 1998;37(42):14713-14718.
21. Killian JA, von Heijne G. How proteins adapt to a membrane-water interface. Trends Biochem Sci 2000;25(9):429-434.
22. Norman KE, Nymeyer H. Indole localization in lipid membranes revealed by molecular simulation. Biophys J 2006;91(6):2046-2054.
23. Bright JN, Sansom MS. The flexing/twirling helix: exploring the flexibility about molecular hinges formed by proline and glycine motifs in transmembrane helices. J Phys Chem B 2003;107(2):627-636.
24. Cordes FS, Bright JN, Sansom MS. Proline-induced distortions of transmembrane helices. J Mol Biol 2002;323(5):951-960.
25. Screpanti E, Hunte C. Discontinuous membrane helices in transport proteins and their correlation with function. J Struct Biol 2007;159(2):261-267.
26. Yohannan S, Faham S, Yang D, Whitelegge JP, Bowie JU. The evolution of transmembrane helix kinks and the structural diversity of G protein-coupled receptors. Proc Natl Acad Sci U S A 2004;101(4):959-963.
27. Barth P, Wallner B, Baker D. Prediction of membrane protein structures with complex topologies using limited constraints. Proc Natl Acad Sci U S A 2009;106(5):1409-1414.
28. Yarov-Yarovoy V, Schonbrun J, Baker D. Multipass membrane protein structure prediction using Rosetta. Proteins 2006;62(4):1010-1025.
29. Polyansky AA, Chugunov AO, Vassilevski AA, Grishin EV, Efremov RG. Recent advances in computational modeling of alpha-helical membrane-active peptides. Curr Protein Pept Sci 2012;13(7):644-657.
30. Kufareva I, Katritch V, Stevens RC, Abagyan R. Advances in GPCR Modeling Evaluated by the GPCR Dock 2013 Assessment: Meeting New Challenges. Structure 2014;22(8):1120-1139.
31. Sadiq SK, Guixa-Gonzalez R, Dainese E, Pastor M, De Fabritiis G, Selent J. Molecular modeling and simulation of membrane lipid-mediated effects on GPCRs. Curr Med Chem 2013;20(1):22-38.
32. Baeza-Delgado C, Marti-Renom MA, Mingarro I. Structure-based statistical analysis of transmembrane helices. Eur Biophys J 2013;42(2-3):199-207.
33. Ulmschneider MB, Sansom MS. Amino acid distributions in integral membrane protein structures. Biochim Biophys Acta 2001;1512(1):1-14.
34. Aurora R, Rose GD. Helix capping. Protein Sci 1998;7(1):21-38.
35. Kumar S, Bansal M. Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins 1998;31(4):460-476.
36. Presta LG, Rose GD. Helix signals in proteins. Science 1988;240(4859):1632-1641.
37. Gunasekaran K, Nagarajaram H, Ramakrishnan C, Balaram P. Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals. J Mol Biol 1998;275(5):917-932.
38. Lahr SJ, Engel DE, Stayrook SE, Maglio O, North B, Geremia S, Lombardi A, DeGrado WF. Analysis and design of turns in α-helical hairpins. J Mol Biol 2005;346(5):1441-1454.
39. Heinig M, Frishman D. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res 2004;32(Web Server issue):W500-502.
40. Wang G, Dunbrack RL, Jr. PISCES: a protein sequence culling server. Bioinformatics 2003;19(12):1589-1591.
41. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI. OPM: orientations of proteins in membranes database. Bioinformatics 2006;22(5):623-625.
42. Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI. Positioning of proteins in membranes: a computational approach. Protein Sci 2006;15(6):1318-1333.
43. Brenner SE, Koehl P, Levitt M. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res 2000;28(1):254-256.
44. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238(5):777-793.
45. Schrodinger, LLC. The PyMOL Molecular Graphics System, Version 1.3r1. 2010.
46. Walters RF, DeGrado WF. Helix-packing motifs in membrane proteins. Proc Natl Acad Sci U S A 2006;103(37):13658-13663.
47. Kauko A, Hedin LE, Thebaud E, Cristobal S, Elofsson A, von Heijne G. Repositioning of transmembrane α-helices during membrane protein folding. J Mol Biol 2010;397(1):190-201.
48. Hedin LE, Ojemalm K, Bernsel A, Hennerdal A, Illergard K, Enquist K, Kauko A, Cristobal S, von Heijne G, Lerch-Bader M, Nilsson I, Elofsson A. Membrane insertion of marginally hydrophobic transmembrane helices depends on sequence context. J Mol Biol 2010;396(1):221-229.
49. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 2005;433(7024):377-381.
50. Nilsson I, von Heijne G. Breaking the camel's back: proline-induced turns in a model transmembrane helix. J Mol Biol 1998;284(4):1185-1189.
51. Engel DE, DeGrado WF. Alpha-alpha linking motifs and interhelical orientations. Proteins 2005;61(2):325-337.
52. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of alpha helices. Science 1988;240(4859):1648-1652.
53. Shen Y, Bax A. Identification of helix capping and β-turn motifs from NMR chemical shifts. J Biomol NMR 2012;52(3):211-232.
54. Ashish S, Prasun K, Manju B. Defining α-helix geometry by Cα atom trace vs (φ-ψ) torsion angles: a comparative analysis. In: Bansal M, Srinivasan N, editors. Biomolecular Forms and Functions. IISc Press; 2013. p 116-127.
55. White JF, Noinaj N, Shibata Y, Love J, Kloss B, Xu F, Gvozdenovic-Jeremic J, Shah P, Shiloach J, Tate CG, Grisshammer R. Structure of the agonist-bound neurotensin receptor. Nature 2012;490(7421):508-513.
56. Hazell GG, Hindmarch CC, Pope GR, Roper JA, Lightman SL, Murphy D, O'Carroll AM, Lolait SJ. G protein-coupled receptors in the hypothalamic paraventricular and supraoptic nuclei--serpentine gateways to neuroendocrine homeostasis. Front Neuroendocrinol 2012;33(1):45-66.
57. Chien EY, Liu W, Zhao Q, Katritch V, Han GW, Hanson MA, Shi L, Newman AH, Javitch JA, Cherezov V, Stevens RC. Structure of the human dopamine D3 receptor in complex with a D2/D3 selective antagonist. Science 2010;330(6007):1091-1095.
58. Kruse AC, Ring AM, Manglik A, Hu J, Hu K, Eitel K, Hubner H, Pardon E, Valant C, Sexton PM, Christopoulos A, Felder CC, Gmeiner P, Steyaert J, Weis WI, Garcia KC, Wess J, Kobilka BK. Activation and allosteric modulation of a muscarinic acetylcholine receptor. Nature 2013;504(7478):101-106.
59. Stevens RC, Cherezov V, Katritch V, Abagyan R, Kuhn P, Rosen H, Wuthrich K. The GPCR Network: a large-scale collaboration to determine human GPCR structure and function. Nat Rev Drug Discov 2013;12(1):25-34.
60. Katritch V, Cherezov V, Stevens RC. Structure-function of the G protein-coupled receptor superfamily. Annu Rev Pharmacol Toxicol 2013;53:531-556.
61. Schlinkmann KM, Honegger A, Tureci E, Robison KE, Lipovsek D, Pluckthun A. Critical features for biosynthesis, stability, and functionality of a G protein-coupled receptor uncovered by all-versus-all mutations. Proc Natl Acad Sci U S A 2012;109(25):9810-9815.
62. Cheng RP, Weng YJ, Wang WR, Koyack MJ, Suzuki Y, Wu CH, Yang PA, Hsu HC, Kuo HT, Girinath P, Fang CJ. Helix formation and capping energetics of arginine analogs with varying side chain length. Amino Acids 2012;43(1):195-206.
63. Vasudev PG, Banerjee M, Ramakrishnan C, Balaram P. Asparagine and glutamine differ in their propensities to form specific side chain-backbone hydrogen bonded motifs in proteins. Proteins 2012;80(4):991-1002.
64. Dasgupta B, Dey S, Chakrabarti P. Water and side-chain embedded pi-turns. Biopolymers 2013.
65. Hessa T, Meindl-Beinker NM, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White SH, von Heijne G. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature 2007;450(7172):1026-1030.
66. Stansfeld PJ, Jefferys EE, Sansom MS. Multiscale simulations reveal conserved patterns of lipid interactions with aquaporins. Structure 2013;21(5):810-819.
67. Lee AG. Biological membranes: the importance of molecular detail. Trends Biochem Sci 2011;36(9):493-500.
68. Senes A, Gerstein M, Engelman DM. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with β-branched residues at neighboring positions. J Mol Biol 2000;296(3):921-936.
69. Curran AR, Engelman DM. Sequence motifs, polar interactions and conformational changes in helical membrane proteins. Curr Opin Struct Biol 2003;13(4):412-417.
70. Penel S, Hughes E, Doig AJ. Side-chain structures in the first turn of the α-helix. J Mol Biol 1999;287(1):127-143.
71. Dmitriev OY, Fillingame RH. The rigid connecting loop stabilizes hairpin folding of the two helices of the ATP synthase subunit c. Protein Sci 2007;16(10):2118-2122.
72. Tastan O, Klein-Seetharaman J, Meirovitch H. The Effect of Loops on the Structural Organization of α-Helical Membrane Proteins. Biophys J 2009;96(6):2299-2312.
73. Kadota K, Hirokawa T, Mitaku S. Position Dependent Amino Acid Propensity in the Transmembrane Region for Topology Prediction of Membrane Proteins.
74. Sun H, Greathouse DV, Andersen OS, Koeppe RE, 2nd. The preference of tryptophan for membrane interfaces: insights from N-methylation of tryptophans in gramicidin channels. J Biol Chem 2008;283(32):22233-22243.
75. Toyoshima C, Nakasako M, Nomura H, Ogawa H. Crystal structure of the calcium pump of sarcoplasmic reticulum at 2.6 Å resolution. Nature 2000;405(6787):647-655.
76. Yamashita A, Singh SK, Kawate T, Jin Y, Gouaux E. Crystal structure of a bacterial homologue of Na+/Cl−-dependent neurotransmitter transporters. Nature 2005;437(7056):215-223.
77. Langosch D, Arkin IT. Interaction and conformational dynamics of membrane-spanning protein helices. Protein Sci 2009;18(7):1343-1358.
78. Bondar AN, White SH. Hydrogen bond dynamics in membrane protein function. Biochim Biophys Acta 2012;1818(4):942-950.
79. Venkatakrishnan AJ, Deupi X, Lebon G, Tate CG, Schertler GF, Babu MM. Molecular signatures of G-protein-coupled receptors. Nature 2013;494(7436):185-194.
80. Vroling B, Sanders M, Baakman C, Borrmann A, Verhoeven S, Klomp J, Oliveira L, de Vlieg J, Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res 2011;39(Database issue):D309-319.
81. Ballesteros JA, Shi L, Javitch JA. Structural mimicry in G protein-coupled receptors: implications of the high-resolution structure of rhodopsin for structure-function analysis of rhodopsin-like receptors. Mol Pharmacol 2001;60(1):1-19.
82. Blin N, Yun J, Wess J. Mapping of single amino acid residues required for selective activation of Gq/11 by the m3 muscarinic acetylcholine receptor. J Biol Chem 1995;270(30):17741-17748.
83. Cho W, Taylor LP, Akil H. Mutagenesis of residues adjacent to transmembrane prolines alters D1 dopamine receptor binding and signal transduction. Mol Pharmacol 1996;50(5):1338-1345.
84. Lundstrom K, Turpin MP, Large C, Robertson G, Thomas P, Lewell XQ. Mapping of dopamine D3 receptor binding site by pharmacological characterization of mutants expressed in CHO cells with the Semliki Forest virus system. J Recept Signal Transduct Res 1998;18(2-3):133-150.
85. Valiquette M, Vu HK, Yue SY, Wahlestedt C, Walker P. Involvement of Trp-284, Val-296, and Val-297 of the human delta-opioid receptor in binding of delta-selective ligands. J Biol Chem 1996;271(31):18789-18796.
86. Pollock NJ, Manelli AM, Hutchins CW, Steffey ME, MacKenzie RG, Frail DE. Serine mutations in transmembrane V of the dopamine D1 receptor affect ligand interactions and receptor activation. J Biol Chem 1992;267(25):17780-17786.
87. Kostenis E, Conklin BR, Wess J. Molecular basis of receptor/G protein coupling selectivity studied by coexpression of wild type and mutant m2 muscarinic receptors with mutant G alpha(q) subunits. Biochemistry 1997;36(6):1487-1495.
88. Rosenbaum DM, Zhang C, Lyons JA, Holl R, Aragao D, Arlow DH, Rasmussen SG, Choi HJ, Devree BT, Sunahara RK, Chae PS, Gellman SH, Dror RO, Shaw DE, Weis WI, Caffrey M, Gmeiner P, Kobilka BK. Structure and function of an irreversible agonist-beta(2) adrenoceptor complex. Nature 2011;469(7329):236-240.
89. Rader AJ, Anderson G, Isin B, Khorana HG, Bahar I, Klein-Seetharaman J. Identification of core amino acids stabilizing rhodopsin. Proc Natl Acad Sci U S A 2004;101(19):7246-7251.
90. Tan K, Pogozheva ID, Yeo GS, Hadaschik D, Keogh JM, Haskell-Luevano C, O'Rahilly S, Mosberg HI, Farooqi IS. Functional characterization and structural modeling of obesity associated mutations in the melanocortin 4 receptor. Endocrinology 2009;150(1):114-125.
FIGURE LEGENDS
Fig. 1. Amino acid preferences at 15 positions (9 helical and 6 near-helical) for α-
helices in globular and membrane proteins. The horizontal line at 1.2 corresponds
to the propensity cut-off value, above which the residue occurrence in membrane
proteins is significant. The stars indicate that the difference between propensity values
in globular and membrane proteins is statistically significant at that position.
Fig. 2. Amino acid preferences of selected residues at a) the N-terminus and
b) the C-terminus of helices in globular proteins and in membrane proteins, the
latter grouped into helices whose N-/C-termini are embedded in the membrane
and those whose N-/C-termini lie outside the membrane (in the cytosol or
extracellular region). Stars indicate that the difference between residue
propensity values for membrane helices with protruding and embedded N- or
C-termini is statistically significant.
Fig. 3. Amino acids showing strong preferences at helix termini that face the
hydrophobic environment within the membrane or the polar environment when
terminating in the cytosolic/extracellular region. Preferred amino acids at the
helical (N1, N2, C2, C1) and near-helical (N’, Ncap, Ccap, C’) positions are shown;
those colored blue are highly preferred in membrane proteins compared with globular
proteins. Red and blue planes indicate the outer and inner membrane boundaries,
respectively, as specified by the OPM database.
Fig. 4. Sequence preferences in a) three-residue and b) four-residue linkers connecting
neighboring transmembrane helices.
The arrows adjacent to the helices indicate the N-to-C direction. Amino acids
preferred at the N- and C-termini of individual helices are shown in Fig. 1.
Panel a shows the preferred residues common to the Ccap and N’’, C’ and N’, and
C’’ and Ncap positions, which together form a three-residue linker. Similarly,
panel b shows the preferred residues at the overlapping positions C’ and N’’, and
C’’ and N’, in four-residue linkers.
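The position overlap described in the Fig. 4 legend follows a simple counting rule. The sketch below assumes, as the legend's pairings imply, that a linker of n residues runs from the Ccap of the preceding helix to the Ncap of the following helix; the function name `linker_labels` is hypothetical and only illustrates the labeling.

```python
def linker_labels(linker_length: int) -> list[tuple[str, str]]:
    """Pair each linker residue with its C-side and N-side position label.

    Assumes the linker runs from the Ccap of the preceding helix to the
    Ncap of the following helix, so residue i (0-based) lies i positions
    past Ccap and (linker_length - 1 - i) positions before Ncap.
    """
    if linker_length < 1:
        raise ValueError("a linker needs at least one residue")

    def label(side: str, primes: int) -> str:
        # Zero primes denotes the capping residue itself (Ccap or Ncap).
        return f"{side}cap" if primes == 0 else side + "'" * primes

    return [
        (label("C", i), label("N", linker_length - 1 - i))
        for i in range(linker_length)
    ]


# Print the dual labels for three- and four-residue linkers.
for n in (3, 4):
    print(n, linker_labels(n))
```

For n = 3 this yields the pairs Ccap/N'', C'/N' and C''/Ncap; for n = 4 the two middle residues are C'/N'' and C''/N', matching the overlapping positions named in the legend.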
Fig. 5. Hydrogen bond formation involving the Cα–H of a proline at the C’ position,
observed to stabilize the C-terminus in 13 α-helices. A representative example
shows a Cα–H···O hydrogen bond between Proline-119 at the C’ position and the carbonyl
oxygen atom of Tyrosine-115 at the C3 position in a helix of the L-carnitine antiporter
(PDB ID 2wsw).
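Cα–H···O contacts such as the one in Fig. 5 are usually screened from coordinates with a distance and an angle test. The sketch below shows that kind of check; the cut-offs (H···O ≤ 2.7 Å, C–H···O angle ≥ 90°) are illustrative assumptions, not necessarily the criteria used in this study, and hydrogen positions are assumed to have been added beforehand, since crystal structures usually lack them. The function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Point) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def ch_o_geometry(c: Point, h: Point, o: Point) -> tuple[float, float]:
    """Return the H...O distance (Angstrom) and the C-H...O angle at H (degrees)."""
    h_to_c, h_to_o = _sub(c, h), _sub(o, h)
    d_ho = _norm(h_to_o)
    cos_angle = sum(x * y for x, y in zip(h_to_c, h_to_o)) / (_norm(h_to_c) * d_ho)
    # Clamp to guard against rounding just outside [-1, 1].
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return d_ho, angle


def is_ch_o_hbond(c: Point, h: Point, o: Point,
                  max_h_o: float = 2.7, min_angle: float = 90.0) -> bool:
    """Assumed, illustrative cut-offs; adjust to the criteria actually in use."""
    d_ho, angle = ch_o_geometry(c, h, o)
    return d_ho <= max_h_o and angle >= min_angle


# Toy coordinates: Cα at the origin, H 1.09 Å along x, O further along x.
print(ch_o_geometry((0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (3.49, 0.0, 0.0)))
```

A linear arrangement with H···O = 2.4 Å passes both tests, whereas moving the oxygen to 3.41 Å from H, or bending the contact well below 90°, fails.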
Table I: Division of the dataset of α-helices in membrane proteins based on the location
of their N- and C-termini (see also Fig. S1).
Environment at the helix termini | No. of helices (865) | Mean length (no. of amino acids) | Median length (no. of amino acids)
Both N- and C-termini embedded | 64 | 16 ± 2 | 16
Only N-terminus embedded | 156 | 21 ± 4 | 22
Only C-terminus embedded | 141 | 22 ± 3 | 22
Both N- and C-termini protruding | 504 | 28 ± 6 | 30
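The four categories of Table I follow from whether each terminus lies inside or outside the membrane. The sketch below assumes coordinates in the OPM frame, where the membrane normal is the z axis and the hydrophobic boundaries lie at z = ±D/2, and treats a terminus as embedded when its |z| is below D/2; that criterion, the record type `HelixTermini` and the toy coordinates are hypothetical illustrations, not this study's data or exact procedure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HelixTermini:
    # Hypothetical record: z coordinates (Angstrom) of the first and last
    # helical residues in the OPM reference frame.
    n_term_z: float
    c_term_z: float


def is_embedded(z: float, half_thickness: float) -> bool:
    """True if z lies inside the slab bounded by -D/2 and +D/2."""
    return abs(z) < half_thickness


def classify(helix: HelixTermini, half_thickness: float) -> str:
    n_in = is_embedded(helix.n_term_z, half_thickness)
    c_in = is_embedded(helix.c_term_z, half_thickness)
    if n_in and c_in:
        return "Both N and C termini embedded"
    if n_in:
        return "Only N-terminus embedded"
    if c_in:
        return "Only C-terminus embedded"
    return "Both N and C termini protruding"


# Toy input only; D/2 = 15 Angstrom is an arbitrary example value.
helices = [HelixTermini(-10.0, 8.0), HelixTermini(-20.0, 10.0),
           HelixTermini(5.0, 22.0), HelixTermini(-18.0, 18.0),
           HelixTermini(-17.0, 19.0)]
print(Counter(classify(h, half_thickness=15.0) for h in helices))
```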
Table II: Capping motifs at the N-termini of helices in globular and membrane proteins. The
membrane protein dataset has been subdivided into α-helices with their N-termini
‘embedded’ inside or ‘protruding’ outside the membrane. Only helices with residues present up to
N4’ were selected for analysis, since these residues take part in forming the capping motifs.
Motif at N-terminus | Helices in globular proteins (2628) | Helices in membrane proteins, total dataset (831) | Helices with N-terminus protruding (620) | Helices with N-terminus embedded (211)
Capping box | 1065 (40.5%) | 338 (40.6%) | 286 (46%) | 52 (24.6%)
β-box | 167 (6.3%) | 87 (10.2%) | 74 (12%) | 13 (6%)
Total | 1232 (47%) | 425 (51%) | 360 (58%) | 65 (30.8%)
Table III: Helix-terminating motifs observed in globular and membrane proteins. The membrane protein dataset has been subdivided into helices terminating inside and outside the membrane. Only helices with residues present up to C4’ were selected for analysis, since these residues contribute to the different helix-termination motifs by mediating hydrophobic interactions with the side chains of amino acids in the last turn of the helix.
Glycine Schellman motif: a glycine at Ccap with left-handed helical torsions and the characteristic Schellman motif hydrogen bonds (6→1 and 5→2).
Non-glycine Schellman motif: similar to the glycine Schellman motif, but with a non-glycine amino acid at Ccap adopting left-handed torsions.
Right-handed Schellman motif: a helix-termination motif with right-handed torsion angles at Ccap and a 6→1 hydrogen bond between the C’ and C4 residues.
α-L motif: a glycine at Ccap with left-handed helical torsions, lacking the hydrogen-bond pattern of a Schellman motif.
Non-glycine α-L motif: similar to the α-L motif, but with a non-glycine amino acid at Ccap adopting left-handed torsions.
Proline C’ motif: a proline at the C’ position forces Ccap to adopt an ‘extended’ conformation and breaks the helix.
Extended Ccap motif: the Ccap residue adopts an ‘extended’ conformation, terminating the α-helix.
Motif at C-terminus | Helices in globular proteins (2394) | Helices in membrane proteins, total dataset (824) | Helices with C-terminus protruding (630) | Helices with C-terminus embedded (194)
Glycine Schellman motif | 464 (19.3%) | 118 (14.3%) | 110 (17.4%) | 8 (4%)
Non-glycine Schellman motif | 210 (8.7%) | 63 (7.6%) | 42 (6.6%) | 21 (11%)
Right-handed Schellman motif | 31 (1.2%) | 16 (2%) | 12 (2%) | 4 (2%)
α-L motif | 322 (13.9%) | 86 (10.4%) | 83 (13%) | 3 (1.5%)
Non-glycine α-L motif | 140 (5.8%) | 20 (2.4%) | 18 (2.8%) | 2 (1%)
Proline C’ motif | 212 (8.8%) | 69 (8.8%) | 44 (7%) | 25 (13%)
Extended Ccap | 648 (27%) | 240 (29%) | 198 (31.4%) | 42 (21.6%)
Total | 2024 (84.6%) | 612 (74.2%) | 507 (80.4%) | 105 (54%)
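The motif definitions of Table III amount to a decision tree over the Ccap conformation, the Ccap and C’ residue identities and the Schellman hydrogen bonds. The sketch below encodes that tree under stated assumptions: the Ramachandran boundaries in `ccap_region` are simplified guesses rather than the study's exact ranges, the record type `CTerminus` and its field names are hypothetical, and checking for a proline at C’ before the generic extended-Ccap case is an ordering choice made here.

```python
from dataclasses import dataclass


@dataclass
class CTerminus:
    # Hypothetical per-helix record; field names are illustrative.
    ccap_residue: str    # one-letter code at Ccap
    cprime_residue: str  # one-letter code at C'
    ccap_phi: float      # backbone torsions at Ccap, degrees
    ccap_psi: float
    hb_6to1: bool        # 6->1 backbone hydrogen bond present
    hb_5to2: bool        # 5->2 backbone hydrogen bond present


def ccap_region(phi: float, psi: float) -> str:
    # Assumed, simplified Ramachandran boundaries.
    if phi > 0:
        return "left-handed"
    if psi > 90 or psi < -150:
        return "extended"
    return "right-handed"


def assign_motif(t: CTerminus) -> str:
    region = ccap_region(t.ccap_phi, t.ccap_psi)
    gly = t.ccap_residue == "G"
    if region == "left-handed":
        if t.hb_6to1 and t.hb_5to2:
            return "Glycine Schellman motif" if gly else "Non-glycine Schellman motif"
        return "α-L motif" if gly else "Non-glycine α-L motif"
    if region == "right-handed":
        return "Right-handed Schellman motif" if t.hb_6to1 else "Unassigned"
    # Extended Ccap: a proline at C' is checked first (an assumed ordering).
    return "Proline C' motif" if t.cprime_residue == "P" else "Extended Ccap"


print(assign_motif(CTerminus("G", "K", 80.0, 20.0, True, True)))
```

Helices that fall through every branch are reported as "Unassigned", which corresponds to the gap between each column's total and 100% in Table III.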
Table IV: Comparison of sequence preferences at the Ncap, N1, N2, Ccap and C’ positions, listed in decreasing order of preference. Column 3 lists residues from the present analysis, a propensity-based analysis of a dataset of 75 high-resolution (< 2.5 Å) integral membrane proteins. Column 4 gives sequence preferences calculated from data taken from a 454 ultradeep sequencing study by Schlinkmann and co-workers61, which identified the evolutionarily conserved amino acids in the rat neurotensin receptor. Column 5 gives sequence preferences calculated from multiple sequence alignments of more than 20,00 GPCR class A sequences. Column 6 lists sequence preferences obtained from a propensity analysis of a dataset of 17 crystal structures of the GPCR class A family, solved at a resolution < 2.5 Å.
† The residue preferences in column 3 are not specific to a particular transmembrane α-helix; they are derived from all α-helices in the 75 membrane proteins. Only three GPCR structures are present in the 'Present analysis' dataset, and these were not included in the 'GPCR class A crystal structures' analysis listed in column 6. The color coding in columns 3 and 6 is based on the propensity values of amino acids at a particular position: blue (propensity > 1.2), dark green (1 < propensity < 1.2) and light green (0.8 < propensity < 1). * The color coding in columns 4 and 5 follows the nomenclature used by Schlinkmann and co-workers61 and indicates conservation, enrichment, mild enrichment, no significant change, mild deselection and strong deselection of an amino acid.
Helix (residues) | Position (residue no.) | Present analysis† | Schlinkmann and co-workers* | GPCR class A consensus* | GPCR class A crystal structures†
TM1 (61-88) | Ncap (60) | N,D,T,S,P,H,G | N,G,H,D | - | T,G,S,N,P,H
TM1 (61-88) | N1 (61) | P,W,E,K,I,L,V,A | M,I,W | - | P,N,L,E
TM1 (61-88) | N2 (62) | E,W,P,A,D,F,S,T,R,Q | Y,W,V | E | T,E,W,L,N
TM1 (61-88) | Ccap (89) | G,N,H,Q,C,K,R,T,D,F | T,P,I | I | G,H,K,N
TM1 (61-88) | C’ (90) | P,H,K,G,S,D,Q | F | L | P,K,T,R
TM2 (98-130) | Ncap (97) | N,D,T,S,P,H,G | K,S | T | T,G,S,N,P,H
TM2 (98-130) | N1 (98) | P,W,E,K,I,L,V,A | I,V | P | P,N,L,E
TM2 (98-130) | N2 (99) | E,W,P,A,D,F,S,T,R,Q | C,M,P | M | T,E,W,L,N
TM2 (98-130) | Ccap (131) | G,N,H,Q,C,K,R,T,D,F | I,V | G | G,H,K,N
TM2 (98-130) | C’ (132) | P,H,K,G,S,D,Q | H,P,Q | Y | P,K,T,R
TM3 (139-172) | Ncap (138) | N,D,T,S,P,H,G | G | S | T,G,S,N,P,H
TM3 (139-172) | N1 (139) | P,W,E,K,I,L,V,A | I,D,E | F | P,N,L,E
TM3 (139-172) | N2 (140) | E,W,P,A,D,F,S,T,R,Q | F,I,W,L | A | T,E,W,L,N
TM3 (139-172) | Ccap (173) | G,N,H,Q,C,K,R,T,D,F | E,T | H | G,H,K,N
TM3 (139-172) | C’ (174) | P,H,K,G,S,D,Q | P,G,D | P | P,K,T,R
TM4 (187-207) | Ncap (186) | N,D,T,S,P,H,G | T,S | V | T,G,S,N,P,H
TM4 (187-207) | N1 (187) | P,W,E,K,I,L,V,A | I,L,V | C | P,N,L,E
TM4 (187-207) | N2 (188) | E,W,P,A,D,F,S,T,R,Q | Y,L | L | T,E,W,L,N
TM4 (187-207) | Ccap (208) | G,N,H,Q,C,K,R,T,D,F | T,V,A,S | L | G,H,K,N
TM4 (187-207) | C’ (209) | P,H,K,G,S,D,Q | M | T | P,K,T,R
TM5 (231-266) | Ncap (230) | N,D,T,S,P,H,G | V,G | - | T,G,S,N,P,H
TM5 (231-266) | N1 (231) | P,W,E,K,I,L,V,A | G,N | - | P,N,L,E
TM5 (231-266) | N2 (232) | E,W,P,A,D,F,S,T,R,Q | N,A | E | T,E,W,L,N
TM5 (231-266) | Ccap (267) | G,N,H,Q,C,K,R,T,D,F | W,Q,H | - | G,H,K,N
TM5 (231-266) | C’ (268) | P,H,K,G,S,D,Q | S,T | - | P,K,T,R
TM6 (302-334) | Ncap (301) | N,D,T,S,P,H,G | E,I | I | T,G,S,N,P,H
TM6 (302-334) | N1 (302) | P,W,E,K,I,L,V,A | D,Q | R | P,N,L,E
TM6 (302-334) | N2 (303) | E,W,P,A,D,F,S,T,R,Q | C,G,L | S | T,E,W,L,N
TM6 (302-334) | Ccap (335) | G,N,H,Q,C,K,R,T,D,F | K,S,R,P,T | P | G,H,K,N
TM6 (302-334) | C’ (336) | P,H,K,G,S,D,Q | D,S,N | S | P,K,T,R
TM7 (341-374) | Ncap (340) | N,D,T,S,P,H,G | T,S | P | T,G,S,N,P,H
TM7 (341-374) | N1 (341) | P,W,E,K,I,L,V,A | T,S | D | P,N,L,E
TM7 (341-374) | N2 (342) | E,W,P,A,D,F,S,T | M,L | L | T,E,W,L,N
TM7 (341-374) | Ccap (375) | G,N,H,Q,C,K,R,T,D,F | D,S,T | E | G,H,K,N
TM7 (341-374) | C’ (376) | P,H,K,G,S,D,Q | D,L,T | F | P,K,T,R
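The color coding of Table IV (and the 1.2 cut-off line in Fig. 1) rests on positional propensities. The sketch below assumes the definition common in helix-capping studies, the fraction of residue X at a position divided by the fraction of X in the whole background set, which may differ in normalization from the one used here. Assigning values exactly at 1.2, 1 or 0.8 to the lower bin is also an assumption, since the legend leaves the boundaries open. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def positional_propensity(at_position: str, background: str) -> dict[str, float]:
    """Propensity of each residue type observed at one position.

    Assumed definition: (fraction of residue X at this position) divided by
    (fraction of X among all background residues).
    """
    pos_counts, bg_counts = Counter(at_position), Counter(background)
    n_pos, n_bg = len(at_position), len(background)
    return {
        aa: (count / n_pos) / (bg_counts[aa] / n_bg)
        for aa, count in pos_counts.items()
        if bg_counts[aa] > 0  # skip residues absent from the background
    }


def color_bin(propensity: float) -> str:
    # Table IV bins; boundary values fall into the lower bin (assumption).
    if propensity > 1.2:
        return "blue"
    if propensity > 1.0:
        return "dark green"
    if propensity > 0.8:
        return "light green"
    return "not highlighted"


# Toy one-letter sequences for illustration only, not data from this study.
props = positional_propensity("NNAL", "NNAAAALLLLLL")
for aa, p in sorted(props.items(), key=lambda kv: -kv[1]):
    print(aa, round(p, 2), color_bin(p))
```

In the toy input, N makes up half of the residues at the position but only one sixth of the background, giving a propensity of 3 and the "blue" bin.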