Post on 20-Sep-2020
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Symplectic biology:Universals in microbial genomes
24 april 2006
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Symplectic biology:The Delphic Boat
Biology is a science ofrelationships betweenobjects rather than fromobjects: from συνtogether, πλεκτειν, toweaveProteins are part ofcomplexes, as are partsin an engineAs for constructing a boat,failing to understand theirrelationships will result inultimate failure ofsynthetic objects
The Delphic Boat: Harvard University
Press, february 2003
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Three processes are needed for Life:
Information transfer (Living Computers?) => the goalof genomics is to decipher the blueprint of the “read-only” memory of the machine
Driving force for a coupling between the genomestructure and the structure of the cell:
MetabolismCompartmentalisation
What is Life?
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Two processes are needed for computing:
A read/write machine
A program on a physical support (typically, a tapeillustrates the sequential string of symbols that makesup the programme), split (in practice) into two entities:
Programme (providing the goal)Data (providing the context)
The machine is distinct from the programme
What is computing?
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Cells as computers
Genomics rests on an alphabetic metaphor, that of a textwritten with a four-letter alphabet, acting as a programme
Conjecture: do cells behave as computers?
Genetic engineeringVirusesHorizontal gene transferCloning animal cells
all point to separation betweenMachineData + Programme
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
If the machine has not only to behave as acomputer but has also to construct themachine itself, one must find an image ofthe machine somewhere in the machine(John von Neumann)
Is there a map of the cellin the chromosome?
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Genome organisation
Is the gene order random in the chromosomes?
At first sight, consistent with different DNA managementprocesses not much is conserved, and genes transferred fromother organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicityislands tend to cluster in specific places, and they code forproteins with common functions. « Persistent » genes areclustered together
Also, some motifs are ubiquitously present, suggestinggeneral rules constraining genome organisation
E Larsabal, A DanchinGenomes are covered with ubiquitous 11bp periodic patterns, the "class A flexible patterns"BMC Bioinformatics (2005) 6: 206
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
A universal feature of the program: the period of 10-11.5
motifs
0 10 20 30 40 50 60 70 80 90 100bp
0 10 20 30 40 50 60 70 80 90 100bp
ffss(G(G--))-0.01
0
0.01
0 10 20 30 40 50 60 70 80 90 100bp
real
model
difference
Helicobacter pylori
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Flexible motifs of type A
methods
1- 1-xxAAxxxxxxxxTTxxxxxxxxAAxxxxxxxxTTTTxxxxxxxxxxAAxxxxxxxxTTxxxxxxxxAAxxxxxx: : AllAll kindoms kindoms 2-2-xxxxxxxxxxxxxxxxxxxxxxGGxxxxxxxxTTTTxxxxxxCCxxxxxxxxxxTTxxxxxxxxxxxxxxxxxx:: ProteobacteriaProteobacteria 4- 4-xxxxxxxxxxxxTTxxxxxxxxAGAGxxxxxxTTTTxxxxxxxxxxxxxxxxTTxxxxxxxxxxxxxxxxxxxx:: ArchaeaArchaea 55''--xxxxxx-10xxxxxxxxx0xxxxxxxx10xxxxxx-10xxxxxxxxx0xxxxxxxx10xxxxxxbp-3bp-3''
TTTTxxxxxxGGxxxxxxTTxxxxxxxxxxxxxxxxxxxxTTTT
The nucleotides composing this classA flexible pattern are accessiblethrough this side too but thedinucleotides are set in minor grooves.
The nucleotides composing this classA flexible pattern are fully accessiblethrough this side and the dinucleotidesare set in major grooves.TTTT
GG TT TTTT
AAAACC
AA AAAA
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
To lead or to lag...
Is it possible to see whether the position ofgenes in the chromosome is randomlydistributed on the leading and lagging strand?
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
180
90
0
27055% leading
Escherichia coli
Ori
Ter
90270 65% leadingTreponema pallidum
Ori
Ter
180
90270 75% leadingBacillus subtilis
Ori
Ter
9027087% leading
Thermoanaerobactertengcongensis
Ori
Ter
CDS densityLeading CDS density
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Chosing arbitrarily anorigin of replicationand a property of thestrand (basecomposition, codoncomposition, codonusage, amino acidcomposition of thecoded protein…) onecan use discriminantanalysis to seewhether thehypothesis holds.
To lag or to lead...
E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
To lag or to lead, that is thequestion
.
0,450,5
0,550,6
0,650,7
0,750,8
0,85
0 20 40 60 80 100
Bacillussubtilis
accu
racy
Borreliaburgdorferi
0,4
0,5
0,6
0,7
0,8
0,9
1
0 20 40 60 80 100 0,4
0,5
0,6
0,7
0,8
0,9
1Chlamydiatrachomatis
0 20 40 60 80 100
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Escherichiacoli
accu
racy
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Heamophilusinfluenzae
0 20 40 60 80 100
HelicobacterPylori
0,4
0,45
0,5
0,55
0,6
0,65
0,7
0,40,45
0,50,55
0,60,65
0,70,75
0,8
0 20 40 60 80 100
Methanobacteriumthermoautotrophicum
position (%) position (%) position (%)
accu
racy
0,45
0,5
0,55
0,6
0,65
0,7
0,75
0 20 40 60 80 100
Mycobacteriumtuberculosis
0,4
0,5
0,6
0,7
0,8
0,9
1
0 20 40 60 80 100
Treponemapallidum
Bases
Amino acids
Codons
Dinucleotides
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Visible even inproteins…
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Essentiality in B. subtilis
hi ghlyexpressed
0%
25%
50%
75%
100%
non-highlyexpressed
Essential genes Non-essential genes
Lagging
Leading
non-highlyexpressed
highlyexpressed
EPC Rocha, A DanchinEssentiality, not expressiveness, drives gene-strand bias in bacteriaNature Genetics (2003) 34: 377-378
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
When polymerasescollide
DNAPdeceleration
End oftranscription
Arrest of RNAP & DNAP
Transcriptionabortion
Co-oriented Head-onConsequences:1. Replication slow-down
2. Loss of transcripts
Consequences:1. Aborted transcripts
2. Truncated essentialproteins
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Three examples of therole of the context
Microbial genes are of infinite diversity but thereexists universals; only about 10% of their genes are ofpersistent and recognized function; we do not have yeta fair idea of the number of microbial species; thenumber of genes in a given species is highly variable(horizontal gene transfer)
Example 1: persistent genes Example 2: orphan genes and universal amino acids [Example 3: a new metabolic pathway] ....
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
An extension of essentiality:Gene persistence
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Some of the genes missing from the list of persistentgenes have diverged considerably. To assess thecontribution of this effect we measured for each pair ofgenomes the correlation between the similarity oforthologous pairs and that of the 16S rRNA. Thecorrelations were high. For example (A), 38% (resp. 48%)of B. subtilis (resp. E. coli) persistent genes showed acorrelation coefficient >0.9 between the sequencesimilarity of the pair of orthologs and the 16S RNA.
In contrast, some genes (B) evolve in an erratic way.This may be due to horizontal gene transfer, localadaptations leading to faster or slower evolutionary pace,or simply wrong assignments of orthology. The latter canbe a significant problem, especially in large proteinfamilies. The genes presenting such an erratic patternare rare in the persistent set.
Gene persistence
G Fang, EPC Rocha, A DanchinHow essential are non-essential genes?Mol Biol Evol (2005) 22: 2147-2156
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Genomic islands
A clustering method based onthe analysis of codon usagebiases, using an informationtheory leads to group the genesinto homogeneous clusters,which are not distributedrandomly in the chromosome.One cluster corresponds tohighly expressed genes. Otherclusters are linked to specificfunctions or processes:horizontally transferred genes,motility or intermediarymetabolism.
M. Bailly-Béchet
M Bailly-Bechet, A Danchin, M Iqbal, M Marsili, M VergassolaCodon usage domains over bacterial chromosomesPLoS Computational Biology (2006) 2: april 20th
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
One cluster is related to expression levels.Other groups feature an over-representation of genes belonging todifferent functional groups: horizontallytransferred genes, motility andintermediary metabolism. Genes with asimilar bias are close on the chromosomeand organized in coherent domains, moreextended than operons, demonstrating arole of translation in structuring bacterialchromosomes. A sizeable contribution tothis effect comes from the dynamiccompartimentalization induced by therecycling of tRNAs, leading to geneexpression rates dependent on theirgenomic and expression context
Genome islands
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Genomeorganization
P. haloplanktis
origins
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
P. haloplanktis : Correspondence Analysis
Pseudoalteromonas haloplanktis
IIMPsIIMPs IntermediaryIntermediarymetabolismmetabolism
OuterOutermembranemembraneor secretedor secretedInformationInformation
transfertransfer
UnknownUnknownfunction /function /
PhagePhageproteinsproteins
cold
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Universal biases inamino acid composition
First axis: separates Integral Inner Membrane Proteins(IIMP) from the rest; driven by opposition between chargedand large hydrophobic residues
Second axis: separates proteins according to anopposition driven by the G+C content of the first codonbase
Third axis: separates proteins by their content inaromatic amino acids; enriched in orphan proteins
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Temperature-dependentbiases in protein amino acid
composition
The general trend of amino acid compositionbias is to avoid some aminoacids at highertemperatures (associated to aging processes)
Mesophilic bacteria belong to at least twodifferent classes (in a 5-clusters analysis)
Biases are always dominated by the IIMPclustering
C Médigue, E Krin, G Pascal, V Barbe, A Bernsel, PN Bertin, F Cheung, S Cruveiller, S D'Amico, A Duilio, G Fang, G Feller, C Ho, S Mangenot, GMarino, J Nilsson, E Parrilli, EPC Rocha, Z Rouy, A Sekowska, ML Tutino, D Vallenet, G von Heijne, A DanchinCoping with cold: the genome of the versatile marine Antarctica bacterium Pseudoalteromonas haloplanktis TAC125Genome Research (2005) 15: 1325-1335
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
A specific asparagine bias in psychrophiles
Comparative proteomics
IIMPsIIMPs
53%53%mesophilesmesophiles
62%62%thermophilesthermophiles
55%psychrophiles
55%psychrophiles
Motility
Cell wall,outermembrane
Transport(TonB),secretion
Adaptationto stress
Metabolismof DNA andRNA
isoaspartate
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Asparagine deamidates: a major contribution to protein aging
Chemistry
Main post-translational modification
Reaction still poorly understood
Spontaneous reaction (untargeted?)
Affects the protein structure (and function?)
Role in regulating protein folding
Signal for degradation of intracellular proteins
Asparagine (N)
Intermediary:succinimide
Degradation:succinimide
IsoaspartateAspartate
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
In 1991, at the EU meeting on genome programs inElounda, Greece, the presentation of the yeastchromosome III and the first 100 kb of the Bacillussubtilis genome revealed that, contrary to expectation(the only cases where this had been observed werephages, for obvious reasons), at least half of the genesuncovered were totally unknown, whether in structureor in function
The first discovery ofgenomics
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Orphans: the gluons
A remarkable role of aromatic amino acids createsa universal bias. Expressed orphan proteins areenriched in these residues, suggesting that theymight participate in a process of gain of functionduring evolution. We postulate that the majority ismade of proteins — gluons — involved instabilising complexes, thus defining the "self" ofthe species.
G Pascal, C Médigue, A DanchinUniversal biases in protein composition of model prokaryotesProteins (2005) 60: 27-35
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Why aromatic amino acids in orphan proteins?
From Orphans to « Gluons »
♣ Orphan proteins loose their status during evolution Rocha. 2002.Pedulla. 2003
Genetics of Bacterial Genomeshttp://www.pasteur.fr/recherche/unites/REG/ adanchin@pasteur.fr
Thank you