Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… ·...

35
Chapter 5 PHYLOGENETIC ANALYSIS OF σ 32 AND FtsH

Transcript of Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… ·...

Page 1: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

PHYLOGENETIC ANALYSIS OF σ32

AND FtsH

Page 2: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

71

5.1 EVOLUTIONARY ANALYSIS OF PROKARYOTIC HEAT SHOCK

TRANSCRIPTION REGULATORY PROTEIN SIGMA32

Induction of heat-shock proteins (HSPs) at elevated temperature is a universal cellular

response. The HSPs function as chaperones and proteases (Gross,1996; Yura et al.,2000).

In Escherichia coli HSPs are usually regulated by a single transcription factor, e.g. the heat

shock regulator protein σ32 (the rpoH gene product). On the other hand in eukaryotic

cells the heat shock factors (HSFs) are also involved in the regulation process (Guisbert et

al.,2008). From lower prokaryotes to humans some HSPs are conserved and act as

response to the environmental stress. These HSPs include mainly the chaperones, which

help proteins to fold properly (Guisbert et al.,2008). In Escherichia coli, expression of

more than 30 heat shock ge es is u der the co trol of the alter ative sig a factor σ ,

the product of the rpoH gene. Cellular σ level is very lo duri g gro th at 0C and

increases transiently when temperature increases. Moreover, the half-life of Escherichia

coli σ is e tre ely s all, less tha . i at C; upon temperature up-shift from 30

to 42 C, σ ecomes transiently stabilized at least 8-fold (Gruber and Gross, 2003).

Three regulatory echa is s are respo si le for the σ ediated regulatio of heat

shock responses (HSRs) in Escherichia coli (Yura and Nakahigashi, 1999). They are:

1. Control of synthesis of σ : σ sy thesis is co trolled y te perature.

2. Co trol of activity of σ : The u folded protei titratio odel has ee proposed

to e plai the regulatio of σ . Over-expression of either the DnaK/J or GroEL/S

chaperone machine inactivates σ . Co versely, depletio of either of the chapero e

achi e leads to accu ulatio of active σ .

3. Co trol of degradatio of σ : The post tra slatio al level regulatio occurs via

co trolled proteolysis of σ i volvi g differe t proteases e.g. FtsH, Lon, HslVU) and

chaperones (e.g. DnaKJ, GroELS) [Herman et al.,1995; Tomoyasu et al.,1995; Guisbert et

al., 2004; Kanemori et al., 1997). Normally at 30 C, the DnaK/J heat-shock chaperone

system binds with σ32, limiting its binding to core RNA polymerase (Liberek et al., 1995).

Page 3: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

72

The FtsH, an ATP-dependent heat-shock metalloprotease, degrades σ32 (bound with

DnaK/J) (Ito and Akiyama, 2005; Blaszczak et al., 1999). Upon heat stress, the chaperone

system DnaK/J becomes engaged with the increased cellular level of unfolded proteins

and thus makes the σ32 free to bind with core RNA polymerase, ultimately inducing HSPs

[Arsène et al., 2000). The interaction of σ32 with RNA polymerase core enzyme prevents

degradation of σ32 by the protease FtsH (Blaszczak et al., 1999), finally increasing its

stability.

Usually acteria co tai a si gle housekeepi g σ factor σ70) and several alternative σ s

σ , σ , σ 4 etc. , hich ediate respo se to differe t altered e viro e tal

conditions (Guisbert et al.,2008; Ohnuma et al., 2000; Raina et al., . σ factors ai ly

contains two to four functional regions in their sequences set depending on the particular

group to hich they elo gs. σ is a e er of the group σ s; they co tai regio s ,

3 and 4 (Guisbert et al.,2008). The regions contain recognition determinants for RNA

polymerase binding and for promoter recognition. In Escherichia coli, region2 i) consists

of sub-regions 1.2 to 2.4 (amino acid residue number 16 to 126), ii) recognizes the -10

region of promoter and iii) carries the major RNA polymerease recognition determinants.

On the other hand, region 4 is made up of sub-regions 4.1 and 4.2 (amino acid residue

number 213 to 280) and recognizes -35 region of promoter (Gruber and Gross, 2003). Our

sequence analysis study revealed that the amino acid sequences in these two regions are

conserved throughout the proteobacterial kingdom barring some mutations here and

there. The rpoH o regio i σ a i o acid residue u er to 4 has also ee

proved to play a role in binding to RNA polymerase. In addition to carrying out the

functions common to all σ s, σ must contain determinants that allow it to bind several

heat-shock chaperones like DnaK/J and heat-shock protease like FtsH , so that its activity

and stability can be properly regulated.

“i ce σ is prese t i al ost all acterial species a d ecause they are uite co served

we have attempted to study the evolutionary pattern of this important protein across the

proteobacterial kingdom using their phylogenetic relationships across the proteobacterial

Page 4: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

73

kingdom. The proteobacteria being the largest and the most diversified class of

eubacteria, are classified in different categories on the basis of their 16s rRNA sequences

(Prescott et al., 1999). We have investigated the se ue ces of σ of 4 differe t

proteobacterial species. Our aim is (1) to study the sequence diversity, length variation in

the conserved regions 2 and 4, (2) to determine the secondary structure variations and

(3) to draw the phylogenetic relatio ships a d a alyze the evolutio ary patter of σ

throughout the proteobacterial kingdom. Several groups used forward genetic screens to

study the effects of the utatio s i σ O rist et al., . It as o served that poi t

mutations in region 2.1 led to the stabilization of the protein. In addition, region 2.1 was

rece tly descri ed to e i porta t for chapero e ediated i activatio of σ Yura.,

2007). But the possible roles of the mutations which confer different functionalities to the

σ 2 protein have not yet been reported. No such study, using just the protein sequence

data, has ee reported earlier. Our results o the differe t regio s of σ ay shed

so e light i a alyzi g the roles of differe t utatio s i σ .

5.1.1 METHODS:

5.1.1.1 DATABASES:

Database is a collection of searchable, structured and updated data. Retrieval of dataset

and sketch the new conclusion are major reasons behind the database formation.

Retrieval of data may be done manually or by using computer programme.

5.1.1.2 PROTEIN SEQUENCE DATABASE AND UNIPROTKB

Protein sequence databases are a collection of annotated sequence information from

several source organisms. This is actually the fundamental determinants of biological

structure and functions.

Page 5: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

74

The Universal Protein Resource (UniProt) is a comprehensive annotated resource of

protein sequence. It serves through the collaboration among the European Bioinformatics

Institute (EMBL-EBI), Swiss Institute of Bioinformatics (SIB) and the Protein Information

Resource (PIR). UniProtKB is the central hub for the collection of functional information of

proteins with accurate, consistent, and rich information.

5.1.1.3 PAIRWISE SEQUENCE ALIGNMENT

Pairwise Sequence Alignment is used to identify regions of similarity that confer

functional, structural and/or evolutionary relationship between two protein or nucleotide

sequences. In 1970 Needleman and Wunsch demonstrated dynamic algorithm for global

alignment of two protein sequences. In 1981 Smith and Waterman developed a local

implementation of the dynamic algorithm and remove the obligations of the previous

one. Dynamic programming algorithm provides perfect output alignment for a given

scoring matrix, though is slow to some extent. Almost a decade latter Altschul et al (in

1990) introduce a heuristic approach Basic Local Alignment Search Tool (BLAST) for the

use on large datasets.

5.1.1.4 SEQUENCE HOMOLOGY SEARCH AND PAIR WISE ALIGNMENT OF

SEQUENCES

I itially a i o acid se ue ces of σ32 proteins from 24 different organisms were

extracted from refseq of NCBI, fro hich 4 a i o acid se ue ces of σ32 proteins were

chose for our study. “i ce for the sa e orga is there ere ultiple e tries for the σ32

proteins, each having the same amino acid sequence compositions, we chose one

represe tative se ue ce of σ32 protein for each of the 24 different organisms to eliminate

the data redundancies. NCBI refseq was selected for collecting our required sequences

because it provides comprehensive, integrated, non-redundant, well-annotated set of

sequences. The accession numbers of the proteins were tabulated in Table 5.1.

Page 6: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

75

Species Protein Accession Number

Salmonella enterica subsp. enterica serovar Typhi str. CT18 [GenBank:NP_458353.1]

Yersinia enterocolitica subsp. enterocolitica 8081 [GenBank:YP_001004614.1]

Haemophilus influenzae Rd KW20 [GenBank:NP_438438.1]

Citrobacter koseri ATCC BAA-895 [GenBank:YP_001456366.1]

Dickeya dadantii 3937 [GenBank:YP_003880910.1]

Pectobacterium carotovorum subsp. brasiliensis PBR1692 [GenBank:ZP_03827501.1]

Edwardsiella tarda EIB202 [GenBank:YP_003297368.1]

Xenorhabdus bovienii SS-2004 [GenBank:YP_003466496.1

Enterobacter cloacae SCF1 [GenBank:YP_003939845.1]

Pantoea sp. At-9b [GenBank:YP_004114141.1]

Erwinia amylovora CFBP1430 [GenBank:YP_003532845.1]

Photorhabdus luminescens subsp. laumondii TTO1 [GenBank:NP_931293.1]

Shigella flexneri 2a str. 301 [GenBank:NP_709231.1]

Cronobacter sakazakii ATCC BAA-894 [GenBank:YP_001440292.1]

Klebsiella pneumoniae subsp. pneumoniae MGH 78578 [GenBank:YP_001337481.1]

Sodalis glossinidius str. 'morsitans' [GenBank:YP_453767.1]

Serratia symbiotica str. Tucson [GenBank:ZP_08039036.1]

Proteus mirabilis HI4320 [GenBank:YP_002153283.1]

Rahnella sp. Y9602 [GenBank:YP_004214910.1]

Photobacterium damselae subsp. damselae CIP 102761 [GenBank:ZP_06156576.1]

Mannheimia succiniciproducens MBEL55E [GenBank:YP_087217.1]

Pasteurella multocida subsp. multocida str. Pm70 [GenBank:NP_246523.1]

Tolumonas auensis DSM 9187 [GenBank:YP_002894185.1]

Escherichia coli K-12 [GenBank: AAC76486.1]

Table 5.I: List of σ32 proteins of the 24 different proteobactreria used in our analysis

Page 7: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

76

Only those amino acid sequences for which there were clear annotations and no

ambiguities were chosen. These sequences were used as inputs to run BLAST (Altschul et

al., 1990), using the default parameters, in order to find out the homologous sequences.

The BLAST results again produced the same set of sequences as obtained previously. This

could be considered as a double check of our initial results of downloading the

sequences.

5.1.1.5 MULTIPLE SEQUENCE ALIGNMENT (MSA)

Multiple sequence alignments are an essential tool for protein structure and function

prediction, domain determination/characterization, phylogenetic inference and regular

sequence analysis. Due to requirement of high computational infrastructure dynamic

programming cannot easily utilized here for large number of sequences. So progressive

approach used for global multiple alignments are employed. CLUSTALW, a weighted

variant of CLUSTAL program, till date most widely used software program for multiple

sequence alignment. CLUSTALW constructs a distance matrix reflecting the relatedness of

each pair of sequences through pairwise alignment.

Here a sequence profile was generated by MSA, using the default parameters in the

software tool ClustalW (Thompson et al., 1994). The result shown in fig. 5.1. The MSA, so

generated, showed the presence of mutations in the sequences.

5.1.1.6 PREDICTION OF SECONDARY STRUCTURE

Secondary Structure prediction aims to predict the local secondary structures of

sequences based on the knowledge of their primary structure. Primary generation

prediction methods of early sixties and nineties mainly based on statistically calculated

single and multiple residue propensities. Modern day protein secondary structure

prediction methods based on evolutionary information contained multiple sequence

alignments viz, sequence conservation, hydrophobicity pattern, etc. Dynamic

Page 8: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

77

conformational changes related to the function of proteins and envioronmental

complications.

The secondary structures of the 32 family of proteins were predicted from their amino

acid sequences using the GOR IV, DSC of University of Wisconsin Genetics Computer

Group (GCG) package (Wisconsin Package Version 9.0., 1997). The secondary structures

were classified as helix, sheet and others. The generalized secondary structural patterns

of the 24 proteins were presented in figures 5.2a and 5.2b. The distribution of secondary

structural elements in the proteins has been found to be conserved, albeit a few

differences here and there. Therefore, figures 3a and 3b show a consensus view of the

distribution of secondary structure within the 32 family.

5.1.1.7 PROTEIN FUNCTION PREDICTION

The functions of the different parts of the 32 family of proteins were predicted from the

outputs of Pfam (Bateman et al., 2004). The Pfam results were obtained from the basis of

a MSA of closely related homologous proteins. The amino acid sequences of the

aforementioned 24 proteins were incorporated one by one to Pfam and the results were

analyzed. Pfam also demarcated the amino acid sequences of the proteins on the basis of

their structural components.

5.1.1.8 DISTANCE MATRIX CALCULATION AND CONSTRUCTION OF

PHYLOGENETIC TREE

A distance matrix was generated using MEGA (Tamura et al., 2007). This tool used

Maximum Composite Likelihood (MCL) method to estimate the evolutionary distances

between sequences. The MCL approach was different from the existing approaches for

evolutionary distance estimation, wherein each distance was estimated independent of

others, either by analytical formulae or by likelihood methods (independent estimation

[IE] approach). This implementation of the MCL method allowed for the consideration of

Page 9: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

78

substitution rate variation from site to site, using an approximation of the gamma

distribution of evolutionary rates.

5.1.2 RESULT & DISCUSSION:

5.1.2.1 SEQUENCE CONSERVATION OF σ32

Being the major heat shock regulatory protei σ32 showed great conservatism in their

functional regions among proteobacterial organisms. Both the terminal regions i.e., the

carboxyl terminal (C-ter i al a d the a i o ter i al N ter i al regio s of the σ32

proteins, contain the co served σ70 super family specific conserved regions. Initially 95

a i o acid se ue ces of σ32 proteins from 24 different organisms were extracted from

refse of NCBI, fro hich 4 a i o acid se ue ces of σ32 proteins were chosen for our

study. Since for the sa e orga is there ere ultiple e tries for the σ32 proteins each

having the same amino acid sequence compositions, we chose one representative

se ue ce of σ32 protein for each of the 24 different organisms to eliminate data

redundancies. NCBI refseq was selected for collecting our required sequences because it

provides comprehensive, integrated, non-redundant, well-annotated set of sequences.

As already e tio ed σ32 , a e er of the group σ s, contains regions 2, 3 and 4 in its

amino acid sequence (Gruber and Gross, 2003). The amino acid sequences of region 2

(amino acid residue numbers 53-126) in all the 24 different organisms were highly

conserved barring 14 amino acid residue positions of which at nine amino acid positions

(amino acid residue numbers 54, 55, 62, 64, 91, 94, 96, 97 and 123) there were

synonymous substitutions. There were non-synonymous substitutions at the following 4

amino acid sequence positions 56, 69, 75 and 116. The remaining amino acid sequence

position (position 67) was highly varia le i all of the σ32 proteins of the 24 different

organisms. This was presented in the sequence alignment of the corresponding regions of

σ32 in figure 5.1.

Page 10: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

79

Figure 5.1 - Multiple Sequence Alignment of sigma32 from 24 different proteobacteria

using ClustalW default parameters, showing their conserved/variable regions, functionally

important residues and functional regions shown in the box by CINEMA-MX (Lord et al.,

2002) .

Page 11: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

80

Compared to region 2, region 4 (amino acid sequence positions 216-280) showed more

sequence diversity (figure 5.1). In all the 24 different organisms, out of the 65 amino acid

residues in region 4, 28 positions had exactly the same amino acid residue. On the other

ha d at positio s i the a i o acid se ue ces i regio 4 of σ32 proteins there were

synonymous substitutions in all the 24 different organisms. Again non-synonymous

substitutions were found to be present at 6 different positions in amino acid sequences in

regio 4 of σ32 proteins in all the 24 different organisms. The amino acid sequence

positions of non-synonymous substitutions were 220, 223, 227, 228, 238 and 261. These

mutations led to functional divergences in the proteins as presented in the following

section.

5.1.2.2 FUNCTIONAL AND MUTATIONAL ANALYSIS

σ32 protein plays the regulatory role in heat-shock transcription. To search whether there

as a y fu ctio al diversity i σ32 of different proteobacterial species, we ventured into

Pfam (Bateman et al., 2004)-based functional studies. The Pfam search results showed

that the a i o acid se ue ces of σ32 proteins bore resemblances to other different

functionally important domains of various other proteins.

Pfam search results revealed that the ost co served regio of the σ32 proteins in all the

24 different organisms had the signature sequence of the -10 promoter recognition

motif, being helical in nature, as well as the signature sequence of the primary core RNA

polymerase binding determinant (Malhotra et al., 1996). The aromatic residues at the C-

terminus of this region take part in the transcription initiation by mediating strand

separation (Malhotra et al., 1996; Campbell et al., 2002). The other most conserved

region had the signature sequence of -35 core promoter binding motif (Campbell et al.,

2002).

Pfam search results also revealed the presence of other sequence motifs. A short stretch

of amino acids at positions 60 – 96 of region 2 had similarity with ubiquinol-cytochrome C

reductase, a helper protein of electron transport system (Iwata et al., 1998). This motif

Page 12: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

81

was found to be present in the amino acid sequences of all the 24 different organisms. In

some of organisms the signature sequences of Opi1 family of proteins were present. This

Opi1 family of proteins takes part in phospholipid biosynthesis (White et al., 1991) and

also acts as a repressor of several genes through phosphorylation (Chang et al., 2006).

The amino acid sequences at positions 130 – 171 of the σ32 proteins in all the 24 different

organisms were found to have sequence homology with respiratory-chain NADH

dehydrogenase, which catalyzed the transfer of electron from NADH to coenzyme Q

(Nakamaru-Ogiso et al., 2010).

The amino acid sequences at positions 218 – 262, which lie in the region 4 of the σ32

proteins in all the 24 different organisms, had signature sequence of the double strand

recombinant repair protein of Mei5 family, which played a role in the recombination of

homologous chromosomes during meiosis (Hayase et al., 2004).

A conserved patterns of amino acids were found at the C-terminal parts of amino acid

sequences of the σ32 proteins in all the 24 different organisms resembling the signature

sequence of RHH1 (ribbon helix helix) family of proteins. This RHH1 family of proteins

played a role in nucleotide incorporation in DNA. However, this family of proteins did not

function as DNA binding proteins (Gomis et al., 1998).

In Photobacterium damselea (ZP_06156576.1), the amino acid sequence at positions 166-

217 showed similarity with phospho-protein family which modulated the polymerase

protein (Pattnaik et al., 1997).

5.1.2.3 CONSERVED SECONDARY STRUCTURE/PATTERN ANALYSIS

“eco dary structures of the σ32 proteins in all the 24 different organisms were predicted

using both Chou Fasman (CF), and Garnier Osguthorpe Robson (GOR) algorithms present

in GCG package (Wisconsin Package Version 9.0., 1997). The results showed a great deal

of se ue ce co servatio i the ai fu ctio al regio s regio s a d 4 of the σ32

proteins, although there were a few mutations in the amino acid sequences in some parts

of region 2 (positions 26,27,45,53,,68,70) and region 4 (positions 221,222,224,228,229) of

the σ32 proteins (shown in fig5.2a, 5.2 . The regio of the σ32 proteins in all the 24

different organisms was found to be comprised of 4 long helices interspersed with sheet

Page 13: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

82

regions. On the other hand, the region 4 of the σ32 proteins in all the 24 different

organisms was made up of 4 helices and one sheet joined together by loops and turn

regions. The figures 3a, 3b depicting the arrangements of the secondary structural

ele e ts i regio a d regio 4 of σ32 proteins were prepared using Geneious v5.3

(Drummond et al., 2010).

Figure: 5.2(a)

Figure: 5.2(b)

Figure 5.2 - (a) Secondary structure of region 2 (above) (b) Secondary structure of region

4(below) Both were developed in GeniousPro. Mauve, Ivory white, blue coloured regions

denote alpha helix, beta sheet, and turns respectively.

Page 14: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

83

5.1.2.4 PHYLOGENETIC RELATIONSHIPS OF σ32 BETWEEN DIFFERENT SPECIES

We used Multiple Sequence Alignment (MSA) to detect the sequence conservation /

variatio s i the σ32 proteins in all the 24 different organisms. In order to derive a

phylogenetic relationship between these proteins a phylogenetic tree of the 24 different

proteobacterial species was constructed [Fig. 5.3]. In the top branch of the tree

Salmonella enterica subspecies Enterica serovar Typhi strain CT18 and Shigella flexneri 2a

strain 301 clubbed together. With them Enterobacter cloacae SCF1 and Klebsiella

pneumoniae subspecies Pneumoniae MGH 78578 formed a sub group. Since these

proteobacterial species had similar biochemical functionalities, therefore their grouping

together was not surprising. The same trend followed throughout the tree. The bottom of

the tree had Photobacterium damselae subspecies damselae CIP 102761 and Tolumonas

auensis DSM 9187. The taxonomic details of Photobacterium damselae is Bacteria;

Proteobacteria; Gammaproteobacteria; Vibrionales; Vibrionaceae; Photobacterium

whereas the taxonomic details of Tolumonas auensis DSM 9187 is Bacteria;

Proteobacteria; Gammaproteobacteria; Aeromonadales; Aeromonadaceae; Tolumonas.

The Vibrionaceae family of Proteobacteria produces their own order. They are inhabitants

of fresh or salt water, several species are pathogenic (Fischer-Romero et al., 1996). It is

clearly evident that these two bacterial species are quite far from being related in terms

of taxonomy. This is reflected in the comparatively larger distance of separation between

them relative to the other proteobacterial species in the phylogenetic tree.

Page 15: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

84

Figure 5.3 - Phylogenetic tree drawn using method Neighbor Joining with

1000 bootstrap

Page 16: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

85

5.1.3 CONCLUSIONS:

σ -bound RNA polymerase involves in the bacterial heat-shock gene transcription. In our

study, the diversity in sequence, variations in the secondary structure and function

amongst the major functional regions of the proteo acterial σ fa ily of protei s, a d

their phylogenetic relationships have been established. Bacterial σ protei s ca e

subdivided into mainly three functional regions which are referred to as regions 2, 3, and

4. We have found a great deal of sequence conservation among the functional region 2 of

proteo acterial σ fa ily of proteins though some mutations are also present in these

regions while region 4 has comparatively more variable sequences. In this study, we want

to explore the effects of mutations in these regions. Our study suggests that the sequence

diversities due to natural mutations i the differe t regio s of proteo acterial σ fa ily

lead to different functions. I this ork e a alyzed the se ue ces of σ32 of 24 different

proteobacterial species. We used the NCBI refseq in our analysis. We observed that there

exists a great deal of sequence conservation in the regions 2 and 4 of the proteins though

there ere so e utatio s i regio s a d 4 as ell of σ32 proteins. The mutations in

the proteins confer different additional functionalities to the proteins. These

fu ctio alities ay help the σ32 protein to perform some additional functions other that

its usual function. There were no previous reports regarding the bioinformatic analysis of

the se ue ces of the σ32 proteins. This study also revealed the possible effects of

utatio s i the σ32 proteins. Our study would therefore be useful to elucidate the effects

of mutations in different amino acid sequence positions in this family of proteins. Since

the differe t utatio s ere respo si le for so e additio al fu ctio alities of the σ32

proteins, our study would therefore be the first of its kind to predict the possible

ioche ical roles of utatio s of σ32 proteins. This may shed light to identify the

molecular biochemistry of this family of proteins. So far, this study is the first

ioi for atic approach to ards the u dersta di g of the echa istic details of σ

family of proteins using the protein sequence information only. This study will help in

elucidating the unknown molecular mechanism and fu ctio alities of σ family of

proteins.

Page 17: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

86

5.2 PHYLOGENETIC ANALYSIS OF PROKARYOTIC PROTEASE FtsH

In a cell proteases are primarily required for the maintenance of cell homeostasis. They

act in protein quality control and are indispensable for the integrity and performance of a

cell. Most of the cytoplasmic membrane proteins or cytosolic proteins are assumed to be

stable under normal growth conditions whereas their abnormal, unfolded and non-

functional states are subjected to rapid degradation by proteases. Otherwise they

accumulate inside the cell and arrest cell growth. Escherichia coli cells contain several

ATP-dependent proteases, including ClpAP, ClpXP, HslUV, Lon and FtsH (Gottesman1996).

Among these FtsH is the only membrane bound protease (Akiyama and Ito 2003). It plays

crucial roles in the quality control of membrane proteins like YccA (Kihara and Akiyama

1999) by eliminating nonfunctional membrane proteins in prokaryotic organisms as well

as in the mitochondria and the chloroplasts of eukaryotic cells. It also degrades various

protei s like the heat shock sig a σ factor σ Ito a d Akiya a , To oyasu et al.

, λ phage tra scriptio factor cII hich plays a key role in lysogeny (Banuett et al.

1986), LpxC (Ito and Akiyama 2005, Tomoyasu et al. 1995), a fundamental enzyme that

co trols the a ou t of e ra e lipopolysaccharides, su u it α of F part of proto

ATPase(Ito and Akiyama 2005), uncomplexed SecY(Kihara et al.1999).

FtsH is a cytoplasmic membrane protein which can be subdivided into two major

segments, the N terminal and the C-terminal segment (Ito and Akiyama 2005). In case of

E. coli, the N-terminal segment (approximately 120 residues long) consists of a small

cytoplasmic region followed by two transmembrane regions and a periplasmic stretch.

The i teractio of FtsH ith HflKC co ple is attri uted to the periplas ic regio of FtsH

Kihara et al. . HflK a d HflC HflKC pair of e ra e proteins is known to be

required for proteolytic activities of FtsH against different substrates (Saikawa et al.

2004). The membrane-anchoring domain of FtsH is required for its oligomerization, hence

it is essential for its protease activity.

The C-terminal segment (approximately 525 amino acid long) consists of two

major domains, AAA+ ATPase domain and the metalloprotease domain (Accession No:

Page 18: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

87

sp_P0AAI3). The AAA+ protein modules are ubiquitously found in biological kingdoms and

known to be involved in a wide variety of cellular processes.

The ATPase domain is composed of the conserved Walker A, having the GXXGXGK (T/S)

box (Hanson and Whiteheart 2005) and WalkerB motifs which are essential for nucleotide

binding and hydrolysis. Apart from these two motifs this domain has another region of

homology (SRH), which carries conserved arginine residues argi i e fi gers hich

assists in oligomerization and nucleotide hydrolysis. The conserved FGV pore motif helps

in the substrate recognition and its translocation. The protease domain shows the zinc

binding HEXXH finger print and a coiled-coil leucine zipper sequence (Bieniossek et al.

2009).

Crystal structure of the C-terminal domain shows it forms a hexameric ring like

assembly (Bieniossek et al. 2006, Suno et al. 2006). The crystal structure of the ADP state

of the constructs of FtsH, spanning most of the cytosolic region, thus containing AAA and

protease domains from Thermotogamaritima, showed a hexameric assembly with the

protease region having a 6-foldsymmetry and a AAA ring having 2 –fold symmetry

(Bieniossek et al.2009). The 2.6Å resolution structure of the cytosolic region of apo-FtsH

however reveals a new arrangement where the entire ATPase region has perfect 6-fold

symmetry and the crucial pore residues outline an open circular entrance (Bieniossek et

al.2009).This conformational change in the ADP bound form compared to the apo

structure generates a model of substrate translocation by AAA_ proteases (Bieniossek et

al.2009). In addition to ATP hydrolysis, proton motive force (PMF) has been shown to

stimulate proteolytic activity of membrane-bound FtsH (Akiyama2002).

In this report we aim to draw phylogenetic relationships of the FtsH from 109 different

organisms and study their amino acid sequence diversity, length variation of conserved

region together with the structural diversity between different clusters of the

phylogenetic tree. Our results show that the FtsH proteins have mainly three domains

arising out of different mutations. The presence of the domains correlates well with the

habitat-distributions of the organisms taken into considerations. So far there were no

Page 19: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

88

previous reports that analyze the effects of the mutations on the FtsH proteins. This work

is therefore the first of its kind that shed light in the yet to be identified evolutionary

history of this family of proteins. Thus, the sequence analysis and hence the evolutionary

history of FtsH and its homologues may provide clues as to which activities are shared by

members of this family and may reveal the basis of their diverse cellular functions.

5.2.1 METHODS

5.2.1.1 SEQUENCE HOMOLOGY SEARCH AND PAIR-WISE ALIGNMENT OF

SEQUENCES

Uniprot database (UniProt Consortium 2013) which specifically provides a

comprehensive, integrated, non-redundant, well-annotated and properly reviewed set of

sequences was used to analyze the sequences of FtsH proteins. Initially amino acid

sequences of about 250 FtsH proteins were extracted. Out of these sequences, 109 FtsH

sequences were chosen for our study. During the sequence file preparation we removed

all fragmented sequences, having gene name other than FtsH to eliminate the data

redundancies and finally took one representative sequence among the multiple entries

for the same organism where they have the same amino acid sequence compositions. The

accession numbers of the proteins were tabulated in Table 5.2. Since sequence file

preparation is the most fundamental step for sequence based analysis these sequences

were used as inputs to run psi-BLAST (Altschul et al. 1990), using the default parameters,

in order to find out the homologous sequences including all possible of sequence

information. This step can be considered as a double check of our initial result

downloading the sequences.

Page 20: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

89

SL.

No.

NAME ACCESSION

NUMBER

SL. No. NAME ACCESSION

NUMBER

1. Acaryochloris marina Sp A8ZNZ4 56. Pelodictyon luteolum Sp Q3B6R3

2. Acholeplasma laidlawii Sp A9NE17 57. Protochlamydia amoebophila Sp Q6MDI5

3. Acidimicrobium

ferrooxidans

Sp C7M0M0 58. Pelobacter carbinolicus Sp Q3A579

4. Acidobacterium capsulatum Sp C1F8X6 59. Photobacterium profundum Sp Q6LUJ8

5. Acidothermus cellulolyticus Sp A0LR74 60. Parabacteroides distasonis Sp A6LD25

6. Akkermansia muciniphila Sp B2UMY1 61. Pirellula staleyi Sp D2QZ34

7. Alkaliphilus metalliredigens Sp A6TSZ1 62. Magnetococcus sp. Sp.A0L4S0

8. Anaeromyxobacter

dehalogenans

Sp B8J992 63. Propionibacterium acnes Sp D4HA34

9. Aquifex aeolicus Sp O67077 64. Porphyra purpurea Sp P51327

10. Arabidopsis thaliana Sp Q39102 65. Marinobacter aquaeolei Sp A1TZE0

11. Atopobium parvulum Sp C8W731 66. Ralstoniametallidurans Sp Q1LLA9

12. Bacillus subtilis Sp P37476 67. Pyropia yezoensis Sp Q1XDF9

13. Bartonella bacilliformis Sp A1URA3 68. Pseudomonas putida Sp A5W382

14. Bdellovibrio bacteriovorus Sp Q6MLS7 69. Medicago sativa Sp Q9BAE0

15. Borrelia burgdorferi Sp B7J0N5 70. Rickettsia bellii Sp Q1RGP0

16. Brachybacterium faecium Sp C7MC16 71. Rhodococcus erythropolis Sp C0ZPK5

17. Buchnera aphidicola Sp Q8K9G8 72. Rhodothermus marinus Sp D0MGU8

18. Burkholderia pseudomallei SpQ3JMH0 73. Salmonella typhi Sp P63344

19. Caldicellulosiruptor bescii Sp B9MPK5 74. Staphylococcus haemolyticus Sp Q4L3G8

20. Caulobacter crescentus Sp B8H444 75. Rubrobacter xylanophilus Sp Q1AV13

21. Chlamydia trachomatis Sp B0B970 76. Slackia heliotrinireducens Sp C7N1I1

22. Chloroflexus aggregans Sp B8G4Q6 77. Rothia mucilaginosa Sp D2NQQ7

23. Clostridium sp. Sp A0PXM8 78. Shigella flexneri Sp P0AAI4

24. Conexibacter woesei Sp D3F124 79. Mesoplasma florum Sp Q6F0E5

25. Corynebacterium

glutamicum

Sp Q6M2F0 80. Streptobacillus moniliformis Sp D1AXT4

26. Cyanidioschyzon merolae Sp Q9TJ83 81. Streptococcus pneumoniae Sp O69076

27. Cyanidium caldarium Sp O19922 82. Sulcia muelleri Sp D5D8E3

28. Ectocarpus siliculosus Sp D1J722 83. Synechococcus sp Sp Q2JNP0

29. Escherichia coli-K12 Sp P0AAI3 84. Syntrophus aciditrop

hicus

Sp Q2LUQ1

30. Guillardia theta Sp O78516 85. Syntrophobacter fumaroxidans Sp A0LN68

31. Haemophilus influenzae Sp P71377 86. Sulfurovum sp. Sp A6QBN8

32. Hahella chejuensis Sp Q2SF13 87. Thermotoga lettingae Sp A8F7F7

33. Haliangium ochraceum Sp D0LWB8 88. Thermobaculum terrenum Sp D1CDT8

34. Halothermothrix orenii Sp B8D065 89. Thermus thermophilus Sp Q5SI82

35. Helicobacter pylori Sp P71408 90. Nicotiana tabacum Sp O82150

36. Heterosigma akashiwo Sp B2XTF7 91. Treponema pallidum Sp O83746

37. Hydrogenobaculum sp. Sp B4U7U4 92. Trichodesmium erythraeum Sp Q10ZF7

38. Kosmotoga olearia Sp C5CES8 93. Zymomonas mobilis Sp C8WEG0

39. Lactobacillus plantarum Sp C6VKW6 94. Ureaplasma parvum Sp B1AI94

40. Lactococcus lactis Sp P46469 95. Tropheryma whipplei Sp Q83FV7

41. Leptospira borgpetersenii Sp Q04Q03 96. Vaucheria litorea Sp B7T1V0

42. Leptotrichia buccalis Sp C7N914 97. Veillonella parvula Sp D1BLD0

43. Leuconostoc mesenteroides Sp Q03Z46 98. Methylibium petroleiphilum Sp A2SHH9

44. Methylacidiphilum

infernorum

Sp B3DV46 99. Mycoplasma arthritidis Sp B3PNH3

45. Oryza sativa Sp Q5Z974 100. Myxococcus xanthus Sp Q1D491

46. Petrotoga mobilis Sp A9BFL9 101. Methylococcus capsulatus Sp Q60AK1

47. Phytoplasma mali Sp B3R057 102. Mycobacterium tuberculosis Sp A5QU8T5

48. Rhodopirellula baltica Sp Q7UUZ7 103. Natranaerobius thermophilus Sp B2A3Q4

49. Salinibacter ruber Sp D5H7Z5 104. Nostoc sp. Sp Q8YMZ8

50. Sorangium cellulosum Sp A9GRC9 105. Nitrosococcus oceani Sp Q3JEE4

51. Sphaerobacter thermophilus Sp D1C1U7 106. Neorickettsia risticii Sp C6V4R9

52. Symbiobacterium

thermophilum

Sp Q67LC0 107. Odontella sinensis Sp P49825

53. Synechocystis sp. Sp P73179 108. Oenococcus oeni Sp Q83XX3

54. Thermoanaerobacter sp. Sp B0K5A3 109. Opitutus terrae Sp B1ZMG6

55. Thermomicrobium roseum Sp B9KXV3

Table 5.2 List of the species and their accession numbers

Page 21: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

90

5.2.1.2 MULTIPLE SEQUENCE ALIGNMENT (MSA)

Multiple protein sequences were aligned using Clustal W associated with MEGA (Tamura

et al. 2007). Pairwise alignment parameters were set as: slow/accurate alignment; gap

opening penalty 10; gap extension penalty 0.10; protein weight matrix PAM. Multiple

alignment parameters were set as: gap opening penalty 10; gap extension penalty 0.2;

gap separation distance 4, delay divergent sequences 30%. The MSA, so generated,

showed the presence of mutations in the sequences [Figure 5.4].

Figure: 5.4 Multiple sequence alignmentsof the FtsH proteins. The alignment

shows the presence of mutations; Consensus pattern of the alignment. It reflects

the conserved pattern observed in the AAA domain.

Page 22: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

91

5.2.1.3 PREDICTION OF SECONDARY STRUCTURE

The secondary structures of the FtsH family of proteins were predicted from their amino

acid sequences using psi PRED (Buchan et al. 2013; Jones 1999) and GOR4 (Garnier et al.

1996). The secondary structures were classified as helix, sheet and random coil.

5.2.1.4 PROTEIN FUNCTION PREDICTION

The domain architecture and functions of different parts of the FtsH family of proteins

were predicted from the outputs of Pfam (Bateman et al. 2004). Pfam gives the results on

the basis of an MSA of closely related homologous proteins. The amino acid sequences of

the proteins were incorporated one by one into Pfam sequence search and the results

were analyzed.

5.2.1.5 HOMOLOGY MODELING AND STRUCTURAL ALIGNMENT

Homology model of eight species (Slackia heliotrinireducens, Staphylococcus

haemolyticus, Acidobacterium capsulatum, Thermus thermophilus, Cyanidioschyzon

merolae, Syntrophus aciditrophicus, Atopobium parvulum, Phytoplasma mali from group

1, 2, 3, 4, 5, 6, 7, 8and denoted as SH1, SH2, AC, TT, CM, SA, AP, PM respectively) taken

from every group was build using Swiss model work space usingA chain of 2CE7 as

template structure (Bieniossek et al. 2006) which was searched by BLAST (using PDB

database). After getting the models we checked them in Procheck (Laskowski et al 1993),

verify 3D (Lüthy et al. 1992), ERRAT (Colovos and Yeates 1993). Three models passed

these checks successfully. Remaining five were refined either by using loop refinement

soft are ModBase Pieper et al. 4 or y i i izatio step usi g “teepest Desce d

algorith a d CHA‘M Brooks et al. polar H force field i Discovery “tudio .

Page 23: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

92

suite. Finally all the refined structures were aligned with each other usi g alig

structure protocol of Discovery “tudio . suite.

5.2.1.6 CODON USAGE DETERMINATION

Codon usage patterns of the mutated amino acids in the amino acid sequences of FtsH

proteins from eight species as mentioned in the section 5.2.1.5 were determined using

default parameters of OPTIMIZER (Puigbo et al. 2007) web server.

5.2.1.7 DISTANCE MATRIX CALCULATION AND CONSTRUCTION OF

PHYLOGENETIC TREE

A distance matrix was generated using MEGA (Tamura et al. 2007) and a phylogenetic

tree was built using the Neighbour Joining (NJ) method with 1000 bootstrap values,

Poison model and keeping uniform rate.

5.2.2 RESULT & DISCUSSION

5.2.2.1 SEQUENCE AND FUNCTIONAL DIVERSITY DUE TO MUTATIONS

Pfam results suggest some resemblance with few additional functional domains other

than the FtsH N-terminal domain, AAA domain, and Peptidase M 41 major domains. 201-

277 amino acids of Vaucheria litorea shows similarity with iron-containing alcohol

dehydrogenase, 566-612 amino acids of Thermotoga lettingae with fumarase C, 111-158

of Opitutus terrae with cytochrome B561, 310-412 of Alkaliphilus metallirudigens with

purine nucleoside permease, 511-635 of Autophobium pervulum with choline kinase and

in addition to these there are other few sequences which show some correspondence

with few uncharacterized domains.

Page 24: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

93

5.2.2.2 STRUCTURAL ANALYSIS

5.2.2.2.1 SECONDARY STRUCTURE ANALYSIS

Typical pattern of secondary structure of FtsH shown in the Figure 5.5 using Geneious Pro

[Drummond et al. 2010].We need to mention here that total secondary structural analysis

reveals that secondary structure in the AAA domain is mainly conserved but in other

regions there are some variations. If we consider the analysis in details using psi PRED and

Gor4 [Table 5.3] it has been found that proportion of alpha helix ranges from

approximately 35% to 54%, beta sheet 7% to 21%, and random coil 11% to 52% of the

total stretch of protein [Figure 5.5].

Figure: 5.5 Distribution of the predicted secondary structural patterns of the FtsH

proteins. Helical regions are presented as cylinders (violet). Sheet regions are

presented with a arrowhead. The rest of the parts are loops.

Page 25: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

94

STRAIN ALPHA STRAND RANDOM

COIL

STRAIN ALPHA STRAND RANDOM

COIL

Acaryochloris

marina

43.97%

13.59%

42.44%

Nicotiana tabacum 42.58% 11.20% 46.22%

Acholeplasma

laidlawii

44.15%

14.51%

41.34%

Nitrosococcus

oceani

49.14% 13.62% 37.25%

Acidimicrobium

ferrooxidans

36.21%

17.12%

46.67%

Nostoc sp. 49.70% 11.28% 39.02%

Acidobacterium

capsulatum

45.38%

12.52%

42.10%

Odontella sinensis 38.35% 15.84% 45.81%

Acidothermus

cellulolyticus

43.24%

10.21%

46.55%

Oenococcus oeni 47.83% 12.87% 39.30%

Akkermansia

muciniphila

37.81%

17.61%

44.58%

Opitutus terrae 52.93% 11.73% 35.34%

Alkaliphilus

metalliredigens

45.25%

15.59%

39.15% Oryza sativa

42.42%

12.54%

45.04%

Anaeromyxobacter

dehalogenans

42.92%

13.74%

43.34%

Parabacteroides

distasonis

45.32% 14.77% 39.91%

Aquifex aeolicus

44.64%

15.77%

39.59%

Pelobacter

carbinolicus

46.28% 14.09% 39.63%

Table 5.3 Distribution patterns of secondary structural elements.

Page 26: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

95

Arabidopsis

thaliana

36.45%

14.53%

49.02%

Pelodictyon

luteolum

38.10% 16.29% 45.61%

Atopobium

parvulum

45.65%

10.56%

43.79%

Petrotoga mobilis

49.92%

15.47%

34.61%

Bacillus subtilis

40.19%

16.95%

42.86%

Photobacterium

profundum

45.40% 15.80% 38.79%

Bartonella

bacilliformis

48.56%

11.52%

39.92%

Phytoplasma mali

48.00%

12.50%

39.50%

Bdellovibrio

bacteriovorus

48.99% 9.61%

41.40%

Pirellula staleyi 45.14% 11.14% 43.71%

Borrelia

burgdorferi

42.88%

17.68%

39.44%

Porphyra purpurea 45.70% 11.15% 43.15%

Brachybacterium

faecium

46.31%

10.65%

43.04%

Propionibacterium

acnes

38.63% 11.58% 49.79%

Buchnera

aphidicola

40.46%

21.21%

38.34%

Protochlamydia

amoebophila

52.95% 10.59% 36.46%

Burkholderia

pseudomallei

47.15%

10.96%

41.89%

Pseudomonas putida 53.66% 8.46% 37.89%

Caldicellulosirupto

r bescii

46.92%

14.29%

38.80%

Pyropia yezoensis 42.04% 12.58% 45.38%

Caulobacter

crescentus

49.84%

12.62%

37.54%

Ralstonia

metallidurans

40.22% 18.64% 41.14%

Chlamydia

trachomatis

48.41%

11.17% 40.42% Rhodococcus

erythropolis

36.53% 10.66% 52.81%

Page 27: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

96

Chloroflexus

aggregans

50.46%

8.99%

40.55%

Rhodopirellula

baltica

43.15%

15.62%

41.22%

Clostridium sp.

43.93%

16.72%

39.35%

Rhodothermus

marinus

46.34% 13.34% 40.32%

Conexibacter

woesei

39.97%

15.77%

44.26%

Rickettsia bellii 39.03% 19.12% 41.85%

Corynebacterium

glutamicum

45.02%

7.50%

47.48%

Rothia mucilaginosa 44.58% 10.32% 45.11%

Cyanidioschyzon

merolae

46.93%

16.58%

36.48%

Rubrobacter

xylanophilus

45.78% 11.98% 42.24%

Cyanidium

caldarium

46.58%

15.64%

37.79%

Salinibacter ruber

41.40%

14.43%

44.17%

Ectocarpus

siliculosus

40.85%

15.28%

43.87%

Salmonella typhi 45.03% 16.30% 38.66%

Escherichia coli-

K12

43.94%

15.68%

40.37%

Shigella flexneri 43.94% 15.68% 40.37%

Guillardia theta

43.26%

14.74%

42.00%

Slackia

heliotrinireducens

43.30% 10.73% 45.98%

Haemophilus

influenzae

35.59%

22.20%

42.20%

Sorangium

cellulosum

60.10%

7.11%

32.79%

Hahella chejuensis

45.40%

10.66%

43.94%

Sphaerobacter

thermophilus

43.49%

13.94%

42.57%

Page 28: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

97

Haliangium

ochraceum

47.51%

12.32%

40.18%

Staphylococcus

haemolyticus

38.93% 13.62% 47.46%

Halothermothrix

orenii

46.83%

10.63%

42.54%

Streptobacillus

moniliformis

52.71% 10.25% 37.04%

Helicobacter pylori

47.15%

11.39%

41.46%

Streptococcus

pneumoniae

44.33% 15.64% 40.03%

Heterosigma

akashiwo

42.53%

12.52%

44.95%

Sulcia muelleri 43.78% 14.38% 41.84%

Hydrogenobaculu

m sp.

41.82%

16.67%

41.51%

Sulfurovum sp 50.67% 12.82% 36.51%

Kosmotoga olearia

39.84%

15.35%

44.81%

Symbiobacterium

thermophilum

61.62%

7.41%

30.98%

Lactobacillus

plantarum

44.30%

14.50%

41.21%

Synechococcus sp. 50.63% 10.97% 38.40%

Lactococcus lactis

35.11%

21.15%

43.74%

Synechocystis sp.

46.17%

11.58%

42.26%

Leptospira

borgpetersenii

41.26%

16.41%

42.33%

Syntrophobacter

fumaroxidans

44.51% 14.06% 41.42%

Leptotrichia

buccalis

46.09%

9.24%

44.66%

Syntrophus

aciditrophicus

47.83% 9.78% 42.39%

Leuconostoc

mesenteroides

48.57%

12.00%

39.43%

Thermoanaerobacter

sp.

42.06%

14.08%

43.86%

Magnetococcus sp. 45.47% 13.67% 40.86% Thermobaculum

terrenum

44.43% 13.31% 42.26%

Page 29: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

98

Marinobacter

aquaeolei

51.03% 8.06% 40.92%

Thermomicrobium

roseum

43.40%

13.96%

42.64%

Medicago sativa 40.79% 11.33% 47.88%

Thermotoga

lettingae

46.33% 15.97% 37.70%

Mesoplasma

florum

50.15% 15.23% 34.62%

Thermus

thermophilus

46.47% 12.66% 40.87%

Methylacidiphilum

infernorum

45.75%

13.36%

40.88%

Treponema pallidum 53.69% 11.82% 34.48%

Methylibium

petroleiphilum

51.70% 12.32% 35.98% Trichodesmium

erythraeum

46.63% 13.04% 40.33%

Methylococcus

capsulatus

51.96% 12.24% 35.79%

Tropheryma

whipplei

46.70% 12.76% 40.54%

Mycobacterium

tuberculosis

37.63% 13.95% 48.42% Ureaplasma parvum 51.73% 11.10% 37.17%

Mycoplasma

arthritidis

41.67% 15.19% 43.15%

Vaucheria litorea 40.22% 14.44% 45.34%

Myxococcus

xanthus

38.71% 17.87% 43.42%

Veillonella parvula 44.08% 13.71% 42.21%

Natranaerobius

thermophilus

44.44% 12.55% 43.00% Zymomonas mobilis 41.24% 13.75% 45.02%

Neorickettsia

risticii

45.28% 15.88% 38.84%

Page 30: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

99

5.2.2.2.2 TERTIARY STRUCTURE ANALYSIS

Results of structural alignments (in RMSD) [Table 5.4] of these proteins revealed that

though there are differences in some aspects but their overall conformation remains

almost same which in-turn implies their functional conservation.

SH1 SH2 AC TT CM SA AP PM

SH1 - 0.428 0.408 0.426 0.486 0.428 0.408 0.327

SH2 0.428 - 0.479 0.403 0.514 0.459 0.322 0.365

AC 0.408 0.479 - 0.421 0.298 0.434 0.428 0.271

TT 0.426 0.403 0.421 - 0.466 0.403 0.480 0.396

CM 0.486 0.514 0.298 0.466 - 0.509 0.488 0.443

SA 0.428 0.459 0.434 0.403 0.509 - 0.445 0.286

AP 0.408 0.322 0.428 0.480 0.488 0.445 - 0.405

PM 0.327 0.365 0.271 0.396 0.443 0.286 0.405 -

Table 5.4 Deviations in three dimensional structures measured in RMSD

Page 31: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

100

5.2.2.3 CODON USAGE ANALYSIS

A high codon usage value indicated a higher rate of appearance of that particular amino

acid obtained from the codon. This would give an insight into why that particular amino

acid is favored at that position. Codon usage analysis revealed that it varies widely among

different species (each case numerical values indicated the number of repetitions). The

range of variations of the codons representing the amino acids is presented in the

following section. The codon usage value was presented for only the organism with a high

codon usage value for a particular amino acid [Table 5.5].

Figure 5.6 Distribution of %GC content in the different organisms. for SH1,

SH2, AC, TT, CM, SA, AP, PM denoted as Slackia heliotrinireducens,

Staphylococcus haemolyticus, Acidobacterium capsulatum, Thermus

thermophilus, Cyanidioschyzon merolae, Syntrophus aciditrophicus,

Atopobium parvulum, Phytoplasma mali from group 1, 2, 3, 4, 5, 6, 7, 8

respectively.

Page 32: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

101

for TTA (L) Thermus thermophilus (65), Atopobium parvulum (54); for GCT (A)

Phytoplasma mali (42) , Syntrophus aciditrophicus (83); for GGT (G) Phytoplasma mali

(36), Syntrophus aciditrophicus (75); for GAA (E) Phytoplasma mali (42), Slackia

heliotrinireducens (76); for ATT (I) Thermus thermophilus (22), Phytoplasma mali (54); for

GAT (D) Cyanidioschyzon merolae (29), Staphylococcus haemolyticus (58); for TTC (F) CAC

(H) and ATG (M) there are no major change in the overall distribution observed; for AAA

(K) Thermus thermophilus (27), Phytoplasma mali (61); for AAC (N) Thermus

thermophilus(11), Atopobium parvulum (33); for CCA (P) Phytoplasma mali (15) Slackia

heliotrinireducens (47); for CAA (Q) Thermus thermophilus (13), Slackia heliotrinireducens

(37); for CGT (R) Thermus thermophilus (58) Phytoplasma mali (24); for TCT (S) Thermus

thermophilus (19) have lower Atopobium parvulum and Slackia heliotrinireducens (41)

have similar higher values; for ACT (T) Slackia heliotrinireducens (50) Staphylococcus

haemolyticus (25); for GTT (V) Phytoplasma mali (34) Acidobacterium capsulatum (55).

TAT (Y) TGT (C) and TGG (W) appeared at lowest frequency in the usage pattern but TGG

(W) is entirely absent for Atopobium parvulum , Staphylococcus haemolyticus and TGT (C)

for Syntrophus aciditrophicus.

Variation in this respect reflected in their positions in the phylogenetic tree and total %GC

content. From Fig: 5.6 it was evident that though major strains had very much the equal

proportion of GC population but for the case of Thermus thermophilus as expected for

the thermo-tolerant strains the %GC content was much higher and on the other hand

Phytoplasma mali a plant pathogen had lowest %GC values.

Page 33: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

102

GCT TGT GAT GAA TTC GGT CAC ATT AAA TTA

SH1 77 3 43 76 26 57 12 35 37 59

SH2 43 3 58 63 29 67 14 48 54 55

AC 58 4 37 46 27 56 16 33 35 57

TT 78 1 33 58 26 55 11 22 27 65

CM 63 3 29 41 20 46 7 42 29 61

SA 83 - 45 56 28 75 11 43 34 64

AP 57 4 36 53 27 56 10 44 38 54

PM 42 13 32 42 33 36 13 54 61 62

ATG AAC CCA CAA CGT TCT ACT GTT TGG TAT

SH1 29 36 47 37 42 41 50 58 2 16

SH2 19 38 26 36 45 40 25 47 - 17

AC 21 21 27 29 42 33 29 55 5 8

TT 14 11 34 13 58 19 30 54 5 10

Table 5.5 Codon usage patterns of the mutated amino acids in the

different organisms

Page 34: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

103

CM 25 25 21 26 36 31 29 52 12 5

SA 17 14 41 27 55 35 35 53 5 14

AP 19 33 21 26 36 31 38 41 - 11

PM 16 25 15 23 24 37 34 34 1 13

5.2.2.4 PHYLOGENETIC RELATIONSHIP

In the sequence set there were sequences of FtsH proteins of various species. For

establishing the mutual relationship between these species we constructed a phyogenetic

tree (shown in figure: 5.7) using Neighbour Joining method. To analyze the tree we first

divided the tree into eight major groups and group 6 is then subdivided into two

subgroups 6A, 6B. Our search to find out the reason behind the diversified pattern of

these species yielded some surprising facts. Extremophiles are those which can live in

physically and geochemically extreme conditions, were mainly present at group 4. Some

thermophilic strains that can grow in hot spring are also located at group 6 such as

Thermus thermophilus, Symbiobacterium thermophilum, Thermobaculum terrenum,

Marinobactor aquaeolei etc. Pathogenic i.e., disease causing harmful strains were present

in group 3, last portion of 4 and 7 i.e, Shigella flexneri, Salmonella typhi, Bertonella

bacilliformis, Chlamydia trachomatis etc. Marine microbes.

Page 35: Chapter 5shodhganga.inflibnet.ac.in › bitstream › 10603 › 97285 › 12 › 12_chapt… · Chapter 5 73 kingdom. The proteobacteria being the largest and the most diversified

Chapter 5

104

5.2.3 CONCLUSIONS:

Cellular proteases maintain the cell homeostasis. Among the proteases, FtsH is a

membrane bound protease and is basically involved in the quality controlling of

membrane bound proteins. FtsH proteins are widely distributed among different

organisms with several mutations. In the present work we have tried to analyze the

effects of those mutations and to correlate the mutations with the habitat of the

organisms. We also analyzed the changes in the patterns of secondary structures in the

FtsH proteins due to mutations. Finally we constructed a phylogenetic tree to detect the

effects of the mutations in the evolutionary pathway of the FtsH proteins. This report is

the first of its kind. So our study would therefore be beneficial to determine the probable

modes of activities of the FtsH proteins.

Figure 5.7 Phylogenetic trees of the various species having the FtsH

domains. The tree can be divided into various categories.