How to Raise the Dead: The Nuts & Bolts of Ancestral Sequence Reconstruction Jeffrey Boucher...

31
How to Raise the Dead: The Nuts & Bolts of Ancestral Sequence Reconstruction Jeffrey Boucher Theobald Laboratory

Transcript of How to Raise the Dead: The Nuts & Bolts of Ancestral Sequence Reconstruction Jeffrey Boucher...

How to Raise the Dead:The Nuts & Bolts of Ancestral

Sequence Reconstruction

Jeffrey BoucherTheobald Laboratory

Talk Outline

• All the Pretty Corals: Convergent or Divergent Evolution

• How’d They Do That?

45°

• 11-stranded β-barrel with chromophore positioned in center

Green Fluorescent Protein (GFP)

GFP Chromophore - Structure & Synthesis

Ser65

Tyr66 Gly67

Wachter 2006 Excitation - UV Blue Emission - Green

• Auto-catalyzation begins upon folding

1. Cyclization

3. Oxidation

2. Dehydation

3. Dehydation

2. Oxidation

dendRFPclawGFP

mc5G5 2

scubGFP1

mc2R2

mc3mc4

mc1R1 2

G1 2

Kaede

Ugalde, Chang & Matz 2004

GFP Superfamily

GFP-likeProteins

from Coral

Colors of the Rainbow

Wachter 2006

2 Chemical Reactions(Oxidation, Dehydration)

3 Chemical Reactions(Oxidation, Dehydration &2nd Oxidation extends π-system)

Was complexity gained or lost?

Convergent vs. Divergent Evolution

Terrestrial Vertebrates

Tree of Life (tolweb.org) - Slide courtesy of Kristine Mackin

0.1 substitutions/site

dendRFPclawGFP

mc5G5 2

scubGFP1

mc2R2

mc3mc4

mc1R1 2

G1 2

Kaede

Ugalde, Chang & Matz 2004

ALL

RED/GREEN

Pre-REDRED

dendRFPclawGFP

0.1 substitutions/site

dendRFPclawGFP

mc5G5 2

scubGFP1

mc2R2

mc3mc4

mc1R1 2

G1 2

Kaede

495 nm

505 nm

518 nm

578 nm

ALL

RED/GREEN

Pre-REDRED

Ugalde, Chang & Matz 2004

mc5G5 2

scubGFP1

mc2R2

mc3mc4

mc1R1 2

G1 2

Kaede

Ugalde, Chang & Matz 2004

How to Resurrect a Protein

1) Acquire/Align Sequences

2) Construct Phylogeny(from Chang et al. 2002)

3) Infer Ancestral Nodes

Acquiring Sequences• BLAST (Basic Local Alignment Search Tool)

• How does alignment compare to alignment of random sequences?– E-value of 1E-3 is a 1:1000 chance of alignment of

random sequences

Homology vs. Identity• Significant BLAST hits inform us about

evolutionary relationships

• Homologous - share a common ancestor– Homology is a hypothesis, identity is calculated

– This is binary, not a percentile

– Homology does not ensure common function

Aligning Sequences

• Gap Penalty of -8 (heuristically determined):

4 -8 5 4 0 6 2 4 6 5 4 0 3 4 -8 7 1 1 = 40

OrangutanChimpanzee

• Multiple sequences aligned by similar method• ClustalW

• MUSCLE (MUltiple Sequence Comparison by Log-Expectation)

• Faster than ClustalW, but methods similar

How to Resurrect a Protein

1) Acquire/Align Sequences

2) Construct Phylogeny(from Chang et al. 2002)

3) Infer Ancestral Nodes

Visual Depiction of Alignment Scores

• Suppose alignment of 3 sequences…

OrangutanChimpanzeeMouse

OCM

M C O

19 40 -18 - 40- 18 19

M O C

Neighbor-Joining

Phylogenetic Programs• PHYLIP (PHYLogeny Inference Package)

• PAUP (Phylogeny Analysis Using Parsimony)– Now incorporates Maximum Likelihood

• PhyML (Phylogenetic Maximum Likelihood)

• MrBayes

• BAli-Phy (Bayesian Alignment and Phylogeny estimation)

Maximum Likelihood (ML)

• Likelihood:

– How surprised we should be by the data– Maximizing the likelihood, minimize your surprise

• Example:– Roll 20-sided die 9 times:

Likelihood = Probability(Data|Model)

Maximum Likelihood (ML)

• Fair Die Model:– 5% chance of rolling a 20

• Trick Die Model:– 100% chance of rolling a 20

Likelihood = Probablity(Data|Model)

Likelihood = (0.05)9 = 2E-11

Likelihood = (1)9 = 1

Assuming trick model maximizes the likelihood

From Dice to Trees

– Data - Sequences/Alignment– Model - Tree topology, Branch lengths & Model of

evolution

Starting Tree

• Choose model that maximizes the likelihood

Likelihood = Probablity(Data|Model)

New Tree

Likelihood Likelihood

How to Resurrect a Protein

1) Acquire/Align Sequences

2) Construct Phylogeny(from Chang et al. 2002)

3) Infer Ancestral Nodes

Reconstruction Methods

Consensus

Parsimony

Maximum Likelihood

Consensus

• Advantage: Easy & fast• Disadvantages: Ignores phylogenetic relationships

X X

Parsimony• Parsimony Principle– Best-supported evolutionary inference requires fewest

changes– Assumes conservation as model

• Advantage over consensus:– Takes phylogenetic relationships into account

ParsimonyA B C D E F G H

ABC D EFGH

Parsimony

V VVILL

Example adapted from David Hillis

IL

{V}{L}

{V, I}

{V, I, L}

{V, I, L}

{V, I, L}

{V, I, L}

Changes = 4

V

L

I

I

I

VL

Parsimony - Alternate Reconstructions

• Resolve ambiguous reconstructions

• Is conservation the best model?

ML Improvements Over Parsimony

• PAML (Phylogeny Analysis by Maximum Likelihood)

• Includes evolutionary process & branch lengths– Reduction in ambiguous sites

• Fit of model included in calculation– Removes a priori choices– Use more complex models (when applicable)

• Confidence in reconstruction– Posterior probabilities

Ugalde, Chang & Matz 2004

Ancestral GFP Reconstructions

Ugalde, Chang & Matz 2004

• Sites with PP<0.8 considered ambiguous

• Alt residues considered if PP>0.2

• Alternative ancestors did not affect the conclusions

Position

Residue PP Residue PP Residue PP

168 P 0.999

169 K 0.407

R 0.236

S 0.210

170 V 0.730

I 0.270

171 I 0.742

V 0.158

R 0.034

Thanks for Your Attention

Questions?