CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's...

81
CODING A LIFE FULL OF ERRORS PITP IAS 2012 PART I c 9 c 10 c 7 c 8 c 11 c 12 c 3 c 4 c 5 c 6 ϕ(c 5 ) ϕ(c 4 ) ϕ(c 3 ) ϕ(c 2 ) ϕ(c 1 ) ϕ(c i ) i c

Transcript of CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's...

Page 1: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

CODING A LIFE FULL OF ERRORS

PITP

IAS 2012

PART I c9 c10 c7 c8 c11 c12 c3 c4 c1 c2 c5 c6

ϕ(c5)

ϕ(c4)

ϕ(c3)

ϕ(c2)

ϕ(c1)

ϕ(ci)

ic

Page 2: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

What is Life? (biological and artificial)

• Self-replication.

• Emergence.

• Evolution.

• Non-equilibrium.

• Information.

• Geometry.

• Stochasticity.

• Viruses (bio and computer).

• Growth and form.

• Natural algorithms .

• Learning & robots.

• Codes and Errors.

Page 3: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Living information is carried (mostly) by molecules

“Living systems”

I. Self-replicating information processors.

II. Evolve collectively.

III. Made of molecules.

• Generic properties of molecular codes subject to evolution?

• Information theory approach?

Environment

Page 4: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Challenges of molecular codes: rate and distortion

Distortion

• Noise, crowded milieu.

• Competing lookalikes.

• Weak recognition interactions ~ kBT.

• Need diverse meanings.

“Synthesis of reliable organisms from unreliable components”

(von Neumann, Automata Studies 1956)

Rate

• How to construct the low-rate molecular codes

at minimal cost of resources?

Rate-distortion theory (Shannon 1956) Inside E. Coli, D. Goodsell

Page 5: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Codes are mappings, channels, representations, models…

• Code ϕ is a mapping between spaces, ϕ: S → M.

• Molecular codes map\translate between molecular spaces\languages.

• Molecular spaces have inherent geometry\topology.

• Coding machinery affects organism's fitness.

S M ϕ

Page 6: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Outline: molecular codes and errors

• Living and artificial self-replication.

• The main molecular codes of life (central dogma).

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

– The ribosome and the problem of molecular recognition.

• Basic coding theory: geometrical aspects.

– How codes cope with errors.

• Emergence and evolution of codes: rate-distortion.

• Accuracy vs. rate: proofreading schemes.

Page 7: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Coding and the problem of self-replication

Proposed demonstration of simple robot self-replication,

from advanced automation for space missions, NASA conference 1980.

Page 8: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Self-replication and accuracy in computers

Page 9: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Von Neumann’s universal constructor

Self-reproducing machine: constructor + tape (1948/9).

• Program on tape:

(i) retrieve parts from “sea” of spares.

(ii) assemble them into a duplicate.

(iii) copy tape.

.

(1966)

Kemeny, Man viewed as a machine , Sci Am (1955)

Page 10: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Von Neumann’s design allows open-ended evolution

Motivated by biological self-replication:

• Construction universality.

• Evolvability.

Key insight (before DNA) separation of information and function.

• Tape is read twice: for construction and when copied.

• How to design fast/accurate/compact constructor?

• Requires efficient and accurate coding… mutations

Implementation by Nobili & Pesavento (1995)

Page 11: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Outline: molecular codes and errors

• Living and artificial self-replication.

• The main molecular codes of life (central dogma).

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

– The ribosome and the problem of molecular recognition.

• Basic coding theory: geometrical aspects.

– How codes cope with errors.

• Emergence and evolution of codes.

• Accuracy vs. rate: proofreading schemes.

Page 12: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Dual spaces of DNA and proteins

• Building blocks:

20 amino acids.

• Polymer = protein.

• Functional molecules (“constructor”)

• Building blocks :

4 nucleic bases = {A, T, G, C}.

• Polymer: DNA double-helix.

• Inert information storage (“tape”)

DNA

protein

Page 13: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

RNA intermediates can be both tapes and machines

DNA

protein

• Primordial “RNA world” :

RNA molecules are both information carriers (DNA) and executers (proteins).

RNA

Page 14: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The Central Dogma of molecular biology

Francis Crick 1956

Francis Crick:

Page 15: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The central dogma graphs the main information channels between nucleotides and proteins

• Information from DNA sequence cannot be channeled

back from protein to either protein or nucleic acid.

• 3 information carriers: DNA, RNA protein

and 3×3 potential channels:

- 3 general channels (occur in most cells).

- 3 special channels (under “specific” conditions).

- 3 unknown transfers (no example (yet?)).

replication

translation

Page 16: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

“special” information transfers

RNA replication

• Reverse transcription (RNA DNA ):

Reverse transcriptase, in retroviruses (e.g. HIV)

and eukaryotes (retrotransposons and telomeres).

• RNA replication (RNA RNA):

Many viruses replicate by RNA-dependent RNA

polymerases (also used in eukaryotes for RNA

silencing).

• Direct translation (DNA protein):

demonstrated in extracts from E. coli which

expressed proteins from foreign ssDNA templates.

Page 17: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Channels outside the dogma: Epigenetic information transfer

• Changes in methylation of

DNA alter gene expression

levels.

• Heritable change is called

epigenetic.

• Effective information change

but not DNA sequence.

• Others:

post-translational modification…

Page 18: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Post-translational modifications of proteins :

• Extends functionality by attaching other groups (e.g. acetate).

• Changes chemical nature of aa.

• Structural changes (disulfide bridges).

• Compensate for missing tRAS (Helicobacter pylori).

• Enzymes may remove amino acids or cut the peptide chain in the middle.

Page 19: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Self-replication requires fast, accurate and robust coding

replication

translation

replication

translation

Page 20: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Universal constructors in the arts

The Santa Claus machine (A. Sward)

The “replicator” (Star Trek)

Page 21: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Universal constructors in the arts and in reality (?)

Self-printing?

Page 22: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Outline: molecular codes and errors

• Living and artificial self-replication.

• The main molecular codes of life (central dogma).

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

– The ribosome and the problem of molecular recognition.

• Basic coding theory: geometrical aspects.

– How codes cope with errors.

• Emergence and evolution of codes.

• Accuracy vs. rate: proofreading schemes.

Page 23: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The translation machinery is the main system of the living von Neumann’s universal constructor

• Machinery parts = tRNA + synthetase + ribosome…

• The translation machinery conveys information from nucleotides to proteins.

tRNA

Amino-acid

Anti-codon

Aminoacyl-tRNA synthetase

(~one per each aa)

• Synthetases charge tRNAs

according to the genetic code. φ(c)

Page 24: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosomes translate nucleic bases to amino acids

Goodsell, The Machinery of Life

• Ribosomes are large molecular machines that

synthesize proteins with mRNA blueprint and

tRNAs that carry the genetic code.

genetic code

mRNA

tRNA

small subunit

large subunit

protein

amino-acid= (codon)

tRNA

Amino-acid

Anti-codon

1. Is the code φ(c) adapted to the noise problem?

Page 25: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome needs to recognize the correct tRNA

2. How to construct fast\accurate\small molecular decoder ?

• Accept tRNA

• Reject tRNA

tRNAs

(i) binding wrong tRNAs:

(ii) unbinding correct tRNAs:

amino-acid (codon)

amino-acid (codon)

Ribosome

No

ise

ϕ(ci)

ic

jc

c9 c10 c7 c8 c11 c12 c3 c4 c1 c2 c5 c6

ϕ(c5)

ϕ(c4)

ϕ(c3)

ϕ(c2)

ϕ(c1)

Ribosome

mRNA

protein

ϕ(ci)

ic

tRNAs

ic

ϕ(ci)

synthetase

Page 26: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Central problem in biology and chemistry:

How to evolve molecules that recognize in a noisy environment?

(crowded, thermally fluctuating, weak interactions).

• How to estimate recognition performance (“fitness”)?

• What are the relevant degrees-of-freedom? Dimension? Scaling?

• What is the role of conformational changes?

2.Decoding at the ribosome is a molecular recognition problem

• Accept tRNA

• Reject tRNA

tRNAs

Ribosome

No

ise

ϕ(ci)

ic

jc

Page 27: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome sets physical limit on self-reproduction rate

Large fraction of cell mass is ribosomes.

• In self-reproduction each ribosome should self-reproduce.

• Sets lower bound on self-reproduction rate .

• “Fastest “ growing bacteria (Clostridium perfringens): T ~ 500 sec.

Problem: how ribosome accuracy affects fitness depends on

(i) Basic protein properties (mutations).

(ii) Biological context (environment etc.).

4ribo

C

mass 10 amino-acids 500 sec

20 amino-acids/secT

R

ribomass

rate CR

error WR

T

Page 28: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Outline: molecular codes and errors

• Living and artificial self-replication.

• The main molecular codes of life (central dogma).

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

– The ribosome and the problem of molecular recognition.

• Basic coding theory: geometrical aspects.

– How codes cope with errors.

• Emergence and evolution of codes.

• Accuracy vs. rate: proofreading schemes.

Page 29: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

1. The genetic code maps DNA to protein

• Genetic code: maps 3-letter words in 4-letter DNA language (43 = 64 codons)

to protein language of 20 amino acids.

• Genetic code embeds the codon-graph (Hamming graph) into space of amino-acids (“digital to analog”).

• Translation machinery, whose main component is the ribosome, facilitates the map.

1 2 3codon = , {A, T, G, C}.

(codon) amino-acid.

ib b b b

polarity

size

charge

G T

C

A ϕ

Genetic code

3

Page 30: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Outline: molecular codes and errors

• Living and artificial self-replication.

• The main molecular codes of life (central dogma).

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

– The ribosome and the problem of molecular recognition.

• Basic coding theory: geometrical aspects.

– How codes cope with errors.

• Emergence and evolution of codes.

• Accuracy vs. rate: proofreading schemes.

Page 31: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Coping with unreliability of coding machinery

• Error detecting code - parity checking.

• One check: Odd parity mistake (e.g. 0111).

• Retransmission.

• Single error can be detected but not corrected.

• The redundancy of the code:

signal binary parity

0 000 0

1 001 1

2 010 1

3 011 0

4 100 1

5 101 0

6 110 0

7 111 1 total # of bits

# of message bits 1

nR

n

Page 32: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Error correction requires minimal redundancy

• Error correcting code – can detect and correct errors.

• Multiple checks – Locating errors by confluence.

• Triplication code – send each message thrice (R = 3).

• What is the minimal number of checks m?

– To locate n positions requires

• Hamming’s code reaches this limit.

1001010111

0 1 0 0 1

10011011010010011110 0011111001 11010100011101001100

1

2 2log ( 1) log1 1

n n nR

n m n n

2 1.m n

Rectangular code 2

1Rn

Page 33: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Geometric view of error correction and detection

• Messages are mapped between hypercubes

• Metric is the Hamming distance:

• Sphere is

: (1001 1011001 )n n mY Y

# of different lettersi jx x

0{ | | }.S x x x r

d

r

Page 34: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Error correction is packing hard spheres

• To correct r errors the spheres should be at least at distance

• Correction: move to nearest sphere center.

• How many words can be encoded?

Or how many spheres can be packed?

2 1.d r

total volume sphere volume # spheres

2 ( 1) 2 2 1n n m mn n

S M

ϕ d

r

2# spheres # words = 2

1

nn m

n

Page 35: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Shannon’s channel coding theorem sets upper limit on the capacity of a noisy channel

• Noisy channel is defined by stochastic input\output ϕ(s|m).

• Channel capacity measures the input\output correlation

• Channel rate

• Shannon’s coding theorem (1948/9):

• Proof: show that # hard spheres is

• Upper limit achieved only “recently” (turbo codes, LDPC).

( , )2 2 2 .nR nI S M nC

( ) ( )

2

( , )max ( ; ) max log .

( ) ( )s s

s mC I S M

s m

S

M ϕ

2log (# words)lim .n

Rn

.R C

Page 36: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Degenerate (20 out of 64) “spheres”.

• Compactness of amino-acid regions.

• Smooth (similar “color” of neighbors).

• But not immune to one-letter errors (“soft” spheres).

Generic properties of molecular codes?

The genetic code is a smooth mapping

Amino-acid polarity

S

M

S M

ϕ

Page 37: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Gray code ”smooths” the impact of errors

• Invented by Émile Baudot for telegraphy.

• Often used in AD and DA applications.

• Minimizes the number of changes between

close by values smooth code.

• Used in many modulation schemes.

(e.g. phase shift).

Page 38: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The smooth genetic code as a combinatorial game

“Marble packing”

(A) Max colors.

(B) Same\similar color of neighbors.

S M

Page 39: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The genetic code maps codons to amino-acids

• Molecular code = map relating two sets of molecules.

• Spaces defined by similarity of molecules (size, polarity etc.)

64 codons 20 amino-acids

Genetic Code

GGG GGC GAG GAC GCG GCC GUG GUC

GGA GGU GAA GAU GCA GCU GUA GUU

AGG AGC AAG AAC ACG ACC AUG AUC

AGA AGU AAA AAU ACA ACU AUA AUU

CGG CGC CAG CAC CCG CCC CUG CUC

CGA CGU CAA CAU CCA CCU

UCA UCU

UGG UGC UAG UAC UCG UCC UUG UUC

UGA UGU UAA UAU

CUA CUU

UUA UUU

tRNA

amino

acid

codon

Page 40: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Distortion of noisy channel, D = average distortion of AA.

• r defines topology of codon space.

• c defines topology of amino-acid space.

Fitter codes have minimal distortion

D Trpath

paths

c p c e r d c

,

tRNA

amino-acids codons

Page 41: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Optimal code must balance contradicting needs for smoothness and diversity.

Smooth codes minimize distortion am

ino

-aci

d

• Noise confuses close codons.

• Smooth code:

close codons = close amino-acids.

→ minimal distortion.

20 # amino-acids

1 64

Max smoothness

Min diversity

Min smoothness

Max diversity

Marble game

Page 42: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Diverse codes require high specificity = high binding energies ε.

• Cost ~ average binding energy < ε >.

• Binding prob. ~ Boltzmann: E ~ e ε/T .

• Cost I = Channel Rate (bits/message)

Channel rate is code’s cost

,

lni i i ei

I e e

i α Encoder e

Page 43: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Rate-distortion theory of noisy information channels

• How well a mapping represents a signal?

• Example: quantization of continuous signal.

• The average distortion of a signal

• Main theorem: there exists a rate-distortion

function R(D) which is the minimal required

rate R to achieve distortion D.

D path

paths

c p c

{ }R(D) min ( , )

D DI S M

Shannon's limit:

( 0)R D C

max

Random

( ) 0R D D

(Shannon, Kolmogorov 1956)

Page 44: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Code’s fitness combines rate and distortion of map

• Gain β increases with organism complexity and environment richness.

• Fitness H is “free energy” with inverse “temperature” κ.

• Evolution varies the gain κ.

• Population of self-replicators evolving according

to code fitness H: mutation, selection, random drift.

H D I Fitness = Gain x Distortion + Rate

Page 45: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Low gain β : Cost too high

→ no specificity → no code.

• Code emerges when β increases:

channel starts to convey information (I ≠ 0).

• Continuous phase transition.

• Emergent code is smooth, low mode of R.

Code emerges at a critical coding transition

Distortion Q

Rate I

Coding transition

codes

no-code code

Rate-distortion theory (Shannon 1956)

Page 46: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The emergent code is smooth

• Example: mapping between two cycles.

• Code emerges at critical transition.

PRL 2008

• Order parameter: deviation from random map

rand.i i aie e e

Page 47: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Emergent code is a smooth mode of error-Laplacian

• Lowest excited modes of graph-Laplacian R .

• Single maximum for lowest excited modes (Courant).

• Every mode corresponds to amino-acid :

# low modes = # amino-acids.

→ single contiguous domain for each amino-acid.

→ Smoothness.

Page 48: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

AAA

AGA

AAG

CAA

ACA

AAT

AAC GAA

ATA

TAA

CCA

ACT

GAT

A GAC

ATC

TTA

TGA

AGG CAG

Probable errors define the graph and the topology of the genetic code

• Codon graph = codon vertices + 1-letter difference edges (mutations).

T

A

G

C

T

A

G

C X X

T

A

G

C K4 X K4 X K4

• Non-planar graph (many crossings).

• Genus γ = # holes of embedding manifold.

• Graph is holey : embedded in γ = 41

(lower limit is γ = 25)

Page 49: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Coloring number limits number of amino-acids

• Q: Minimal # colors suffices to color a map where neighboring

countries have different colors?

• A: Coloring number, a topological invariant (function of genus):

1( ) 7 1 48 .

2chr

max(# amino-acids) ( )chr

• From Courant ‘s theorem + “convexity” (tightness).

• Genetic code: γ = 25-41 → coloring number = 20-25 amino-acids

(41) 25chr

(25) 20chr

(Ringel & Youngs 1968)

Page 50: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

The genetic code coevolves with accuracy

• A path for evolution of codes: from early codes with higher codon

degeneracy and fewer amino acids to lower degeneracy codes with more

amino acids.

1st 2nd 3rd chr #

1 4 1 0 4

2 4 1 1 7

4 4 1 5 11

4 4 2 13 16

4 4 3 25 20

4 4 4 41 25

Page 51: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Part I: Summary

• The translation machinery:

– The genetic code, ϕ: codons → amino-acids.

• Genetic code is a smooth map that minimizes distortion.

• Model for emergence: phase transition in a noisy mapping.

• Free energy is rate-distortion function.

• Continuous transition.

• Topology governs emergent code. Sources:

• Shannon, Mathematical Theory of Communication.

• Hamming, Coding and computation.

• von Neumann, In Automata Studies.

• Feynman, Lectures on Computation.

• Cover & Thomas, Elements of information theory.

• Berger T, Rate distortion theory.

Papers on coding: follow PITP link

Page 52: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

CODING A LIFE FULL OF ERRORS

PITP

IAS 2012

PART II c9 c10 c7 c8 c11 c12 c3 c4 c1 c2 c5 c6

ϕ(c5)

ϕ(c4)

ϕ(c3)

ϕ(c2)

ϕ(c1)

ϕ(ci)

ic

Page 53: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosomes translate nucleic bases to amino acids

Goodsell, The Machinery of Life

• Ribosomes are large molecular machines that

synthesize proteins with mRNA blueprint and

tRNAs that carry the genetic code.

genetic code

mRNA

tRNA

small subunit

large subunit

protein

amino-acid= (codon)

tRNA

Amino-acid

Anti-codon

Is the code φ(c) adapted to the noise problem?

Page 54: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

George Palade (50s)

Page 55: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome needs to recognize the correct tRNA

How to construct fast\accurate\small molecular decoder ?

• Accept tRNA

• Reject tRNA

tRNAs

(i) binding wrong tRNAs:

(ii) unbinding correct tRNAs:

amino-acid (codon)

amino-acid (codon)

Ribosome

No

ise

ϕ(ci)

ic

jc

c9 c10 c7 c8 c11 c12 c3 c4 c1 c2 c5 c6

ϕ(c5)

ϕ(c4)

ϕ(c3)

ϕ(c2)

ϕ(c1)

Ribosome

mRNA

protein

ϕ(ci)

ic

tRNAs

ic

ϕ(ci)

synthetase

Page 56: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

• Central problem in biology and chemistry:

How to evolve molecules that recognize in a noisy environment?

(crowded, thermally fluctuating, weak interactions).

• How to estimate recognition performance (“fitness”)?

• What are the relevant degrees-of-freedom? Dimension? Scaling?

• What is the role of conformational changes?

2.Decoding at the ribosome is a molecular recognition problem

• Accept tRNA

• Reject tRNA

tRNAs

Ribosome

No

ise

ϕ(ci)

ic

jc

Page 57: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome sets physical limit on self-reproduction rate

Large fraction of cell mass is ribosomes.

• In self-reproduction each ribosome should self-reproduce.

• Sets lower bound on self-reproduction rate .

Problem: how ribosome accuracy affects fitness depends on

(i) Basic protein properties (mutations).

(ii) Biological context (environment etc.).

4ribo

C

mass 10 amino-acids 500 sec

20 amino-acids/secT

R

ribomass

rate CR

error WR

T

Page 58: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosomes are complicated machines with many d.o.f.

Ribosomes are made of proteins and RNAs:

• ~ 104 nucleic bases in RNA.

• ~ 104 amino-acids in proteins.

• Total mass : ~ 3·106 a.u.

• High-res structure is known (Yonath et al.).

Within this known complexity:

• What are the relevant degrees-of-freedom?

• How does this machine operate?

(magenta – RNA, grey – protein,

from Goodsell, Nanotechnology ) 2

0 –

30

nm

Page 59: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Decoding is determined by energy landscapes of correct and wrong tRNAs

𝑅𝐶 ~ 1

𝑒𝑏1 + 𝑒𝑏2 + 𝑒𝑏3

Steady-state decoding rates

(Arrhenius law, 𝑘 ∝ 𝑒−∆𝐺)

𝑅𝑊 ~ 1

𝑒𝑏1 + 𝑒𝑏2 + 𝑒𝜹+𝑏3

• Decoding is multi-stage process.

• Kinetics involves large conformational changes.

Page 60: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

In Ehrenberg’s notation

• Merge the first two barriers to get Michaelis-Menten kinetics:

3 ; 1;

/ ; ;

c cb Bd a

ac nc

c a

nc c

d dd d anc c

c c

k ka e d

k k

k kd e d d d e

k k

3

1

1 1

/1

1 1

c

c ccata a

m

C C C W

C a ab B

k d Ak k

K a d

e R RR k k

e e

𝑗𝑐

𝑠𝑐𝑒=

𝑘𝑐𝑎𝑡𝐾𝑚

𝑐

= 𝑘𝑎𝑐

𝑘𝑐𝑐

𝑘𝑐𝑐 + 𝑘𝑑

𝑐 = 𝑘𝑎𝑐1

1 + 𝑎⇔ 𝑅𝐶 = 𝑘𝑎

𝐶1

1 + 𝑒𝑏3−𝐵∝

1

𝑒𝐵 + 𝑒𝑏3

𝑗𝑛𝑐

𝑠𝑛𝑐𝑒=𝑘𝑐𝑎𝑡𝐾𝑚

𝑛𝑐

= 𝑘𝑎𝑛𝑐

𝑘𝑐𝑛𝑐

𝑘𝑐𝑛𝑐 + 𝑘𝑑

𝑛𝑐 = 𝑘𝑎𝑛𝑐

1

1 + 𝑑𝑑𝑎⇔ 𝑅𝑊 = 𝑘𝑎

𝑊1

1 + 𝑒𝑏3−𝐵+𝛿∝

1

𝑒𝐵 + 𝑒𝑏3+𝛿

𝐴 =𝑘𝑐𝑎𝑡𝐾𝑚

𝑐𝑘𝑐𝑎𝑡𝐾𝑚

𝑛𝑐

= 𝑑𝑎1 + 𝑑𝑑𝑎

1 + 𝑎=𝑑𝑎 + 𝑑𝑎

1 + 𝑎⇔

𝑅𝐶𝑅𝑊

=𝑒𝐵 + 𝑒𝑏3+𝛿

𝑒𝐵 + 𝑒𝑏3=1 + 𝑒𝑏3−𝐵+𝛿

1 + 𝑒𝑏3−𝐵

1 2 .b bBe e e

Page 61: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome kinetics exhibits large dimensionality reduction

• Effective dimension decreases by at least 3 orders of magnitude:

~ 104 structural parameters → ~ 10 kinetic parameters (energy landscape).

• Generic phenomenon in biomolecules: many catalytic molecules (enzymes) can be

described by a few kinetic parameters (transition state landscape).

What is the origin of dimensionality reduction?

• Hints:

- Protein function mainly involve the lowest modes of their vibrational spectra (hinges).

- Sectors: “Normal modes” of sequence evolution (Leibler & Ranganthan).

Page 62: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Transition states reduce the dimensionality of

effective parameter space

Page 63: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Theory can be tested with measured rates

• The codon-specific stages are Codon recognition and GTP activation.

(Rodnina’s lab, Gottingen)

(UUU) (CUC)

𝑅𝐶 ~ 1

𝑒𝑏1 + 𝑒𝑏2 + 𝑒𝑏3

𝑅𝑊 ~ 1

𝑒𝑏1 + 𝑒𝑏2 + 𝑒𝜹+𝑏3

Page 64: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

How to estimate recognition performance (“fitness”) ?

What is the actual dimension of the problem ?

Page 65: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Recognition fitness has generic features

• “Fitness” F is often obscure and context-dependent:

look for generic properties of 𝐹 𝑅𝐶 , 𝑅𝑊 = 𝐹(𝐵, δ, 𝑏3).

• Only requirement: “biologically reasonable”, 𝜕𝐹

𝜕𝑅𝐶≥ 0,

𝜕𝐹

𝜕𝑅𝑊≤ 0.

• Searching for optimum in (𝐵, δ, 𝑏3) space:

(i) 𝜕𝐹

𝜕𝛿≥ 0 : 𝜹 approaches biophysical limit.

(ii) 𝜕𝐹

𝜕𝐵= 0 &

𝜕𝐹

𝜕𝑏3≥ 0 or

𝜕𝐹

𝜕𝐵≤ 0 &

𝜕𝐹

𝜕𝑏3= 0 :

Optimization is essentially 1D (2 other parameters approach limit).

𝑅𝐶 ~ 1

𝑒𝐵 + 𝑒𝑏3

𝑅𝑊 ~ 1

𝑒𝐵 + 𝑒𝜹+𝑏3

(𝑒𝐵 = 𝑒𝑏1 + 𝑒𝑏2)

max δ

min𝐵 δ

𝑏3

𝐵

𝜕𝐹

𝜕𝐵= 0

𝜕𝐹

𝜕𝑏3= 0

Yonatan Savir

Page 66: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

What is the optimal energy landscape of the ribosome?

• For example, distortion fitness from engineering (weight 𝑑 is context-dependent) :

𝐹 = 𝑅𝐶 − 𝑑 ∙ 𝑅𝑊 ∝1

𝒆𝐵+𝑒𝑏3−

𝑑

𝒆𝐵+𝑒𝑏3+𝛿

• 1D problem: optimum is along 𝑏3.

(measured: ∆ = 𝑏3 − 𝑏2)

• What is the optimal b3 (or ∆ )?

• Is the ribosome optimal ?

• Role of conformational changes ?

Page 67: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Optimal design is a Max-Min strategy

• Weight 𝑑 can vary.

(i) For each 𝑑 normalize 𝐹.

(ii) “Worst case scenario”:

max min 𝐹 .

• Max-Min solution is “symmetric”:

−𝛿/2 + 𝐵

−𝛿/2 + 𝐵

105

10−5

𝑏3 = −1

2𝛿 + 𝐵

Page 68: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Ribosome shows an energy barrier which is nearly optimal

• Measurements: 𝛥𝐶 ≈ −7 𝑘𝐵𝑇, 𝛿 ≈ 12𝑘𝐵𝑇 , 𝐵 = 𝐵 − 𝑏2 ≈ 1𝑘𝐵𝑇.

• Prediction: the optimal regime is symmetric,

𝛥𝐶 = −1

2𝛿 + 𝐵

• The ribosome is nearly optimal

(according to Max-Min prediction).

𝛥𝐶 < 0, 𝛥𝑊 > 0.

Page 69: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Decoding is optimal for all six measured tRNAs

• Except for UUC which encodes the same amino-acid

(UUC) = (UUU) phenylalanine.

Page 70: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Optimality is valid for wide range of fitness functions

• Ribosome optimal in wide region:

• General feature: any fitness function 𝐹(𝑅𝐶 , 𝑅𝑊)

exhibits optimum as long as both rates are “relevant”.

6 55 10 2 10 .e d e

𝐹 = 𝑅𝐶 − 𝑑 ⋅ 𝑅𝑊

−𝛿/2 + 𝐵

−𝛿/2 + 𝐵

𝑅𝐶 − 𝑑 ⋅ (𝑅𝑊/𝑅𝑪) 𝑅𝐶 − 𝑑 ⋅ 𝑅𝑊𝟐

/C W

F Fe e

R R

Page 71: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Theory predicts optimal regime of ribosomes for all organisms

• Optimal region in the space of all possible landscapes, −𝛿 ≤ 𝛥𝐶 ≤ 0.

• Mutations and antibiotics tend to push away of optimality.

𝛥𝐶 = −1

2𝛿 + 𝐵

Page 72: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

What is the role of conformational changes?

• Energy barrier results from binding energy and deformation energy penalty:

• Therefore

• For any

non-zero deformation is optimal for tRNA recognition.

Energy barrier that discerns the right target from competitors.

deform bind

1.

2G G B

deform bind

1.

2C G G B

deformG

bindG

C

bind B

15 k T,

2G B

deform 0G

Page 73: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Recombination machinery recognizes homologous DNA

• Exchange between two homologous DNAs.

• Essential for:

– Genome integrity (repair machinery).

– Genetic diversity (crossover, sex).

• Task: Detect correct, homologous DNA target

among many incorrect lookalikes.

• DNA stretches during recombination:

large deformation energy barrier.

Yonatan Savir

Page 74: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Energy barriers for optimal recognition may be a general design principle of recognition systems with competition

Recombination optimizes extension energy of dsDNA.

Relevant energy

Fitness F

Relevant energy Ribosome optimizes energy barriers of decoding

• Applies to any enzymatic kinetics in the presence of competition...

• Conformational proofreading: Design principle follows from optimization of information transfer function.

• May explain induced fit (Koshland 1958). Why molecules deform upon binding to target.

c9 c10 c7 c8 c11 c12 c3 c4 c1 c2 c5 c6

ϕ(c5)

ϕ(c4)

ϕ(c3)

ϕ(c2)

ϕ(c1)

ϕ(ci)

ic

Page 75: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Open questions, future directions…

Understanding evolvable matter:

• What are the degrees-of-freedom underlying dimensional reduction?

(Rubisco and other enzymes)

• Basic logic of molecular information channels

(e.g. utilizing conformational changes, worst case scenario).

Translation machinery coevolved with proteins

• Physics of the state of matter called “proteins”

(evolvable, mapped from DNA space, glassy dynamics) .

Page 76: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Kinetic Proofreading

• The basic idea: iterations of irreversible discrimination step lead to

exponential amplification.

N

N

N

NC C

NN

W W

N

Q K

Q KF

Q K

Hopfield (1974), Ninio (1975)

Page 77: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

RecA dynamics exhibits multistage KPR

NF 2 /2NF

Bar-Ziv Libchaber (2002,2004)

Page 78: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

RecA filament strongly fluctuates

• Gradual depolymerization vs. polymerization jumps (similar to microtubules):

• Continuum approximation

• Fluctuations “scan” the sequence and

are therefore sensitive to mutations.

( ( vacancies), )N

n n m

m n

p P n P p

Page 79: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

RecA dynamics is ultra-sensitive to DNA sequence

• At steady-state Gaussian amplification

• General result (Murugan, Huse & Leibler):

• Can sense even single mutations:

~ exp( # loops).P

Page 80: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

THANKS

more: www.weizmann.ac.il\complex\tlusty

Yonatan Savir

Page 81: CODING A LIFE FULL OF ERRORStlusty/talks/PITP2012.pdf · • Coding machinery affects organism's fitness. S ... and in reality (?) Self-printing? Outline: molecular codes and errors

Early simulations of artificial “evolution”

Niels Aall Barricelli