Harvard MIT DOE GtL Center


Page 1: Harvard MIT  DOE GtL  Center

Harvard MIT DOE GtL Center

Collaborating PIs: Chisholm, Polz, Church, Kolter, Ausubel, Lory

arep.med.harvard.edu

C.Ting

7-Feb-2005 4:10-4:40 PM

2-20 μm

0.6 μm

Page 2: Harvard MIT  DOE GtL  Center

Molecular Systems Biology
Access is free of charge

Transcriptomics
Proteomics
Metabolomics
Functional genomics
Structural genomics
Computational biology
Theoretical biology
Mathematical biology
Synthetic biology

www.nature.com/msb/

Page 3: Harvard MIT  DOE GtL  Center

Harvard MIT DOE Center Projects

Prochlorococcus   Photosynthesis, circadian & cell cycles
Escherichia       Synthetic genomes/proteomes
Vibrio            4X faster replication than E. coli
Caulobacter       Asymmetric cell & chromosome structure
Pseudomonas       Biofilms

Poster#                  Topic                         Goal#
  2. Leptos, et al.      Proteomics                    1
121. Nguyen, et al.      Mass spectrometry XML         1
122. Nguyen, et al.      Gene Regulation               2
 67. Thompson, et al.    Vibrio diversity              3
 68. Martiny, et al.     Prochlorococcus diversity     3
 77. Sullivan, et al.    Cyanophage diversity          1,3
  3. Zhang, et al.       Single cell sequencing        1-4
  1. Church, et al.      Metabolic fluxes              4

arep.med.harvard.edu

Page 4: Harvard MIT  DOE GtL  Center

Prochlorococcus 40ºN - 40ºS

Ocean chl a (Aug 1997 –Sept 2000)Provided by the SeaWiFS Project, NASA

Page 5: Harvard MIT  DOE GtL  Center

Humans consume 2 kW per person = 10^10 kW.
Sunlight hits the earth at 40,000 times that rate (70% ocean).

CO2 370 ppm = 730 x 10^15 g globally, increase ~3 x 10^15 /yr.
Ocean productivity = ~100 x 10^15 g CO2/yr … due to

Autotrophs: 10^26 Prochlorococcus cells globally (10^8 per liter)

Sequestration v. respiration v. use: heterotrophs (Pelagibacter), phages, predators (Maxillopoda, Malacostraca, herring)
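
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the derived quantities (implied population, implied ocean volume) are simple quotients rather than values stated on the slide.

```python
# Back-of-envelope arithmetic using only the figures quoted on the slide.
HUMAN_POWER_KW = 1e10        # total human consumption
PER_PERSON_KW = 2.0          # consumption per person
SUNLIGHT_FACTOR = 40_000     # sunlight input relative to human consumption
OCEAN_FRACTION = 0.70        # share of sunlight falling on ocean

implied_population = HUMAN_POWER_KW / PER_PERSON_KW      # 5e9 people
sunlight_kw = HUMAN_POWER_KW * SUNLIGHT_FACTOR           # 4e14 kW
sunlight_on_ocean_kw = sunlight_kw * OCEAN_FRACTION      # 2.8e14 kW

CO2_TOTAL_G = 730e15                 # atmospheric CO2 at 370 ppm
CO2_INCREASE_G_PER_YR = 3e15         # yearly increase
OCEAN_PRODUCTIVITY_G_PER_YR = 100e15 # ocean productivity, g CO2 per year

yearly_increase_fraction = CO2_INCREASE_G_PER_YR / CO2_TOTAL_G
productivity_vs_increase = OCEAN_PRODUCTIVITY_G_PER_YR / CO2_INCREASE_G_PER_YR

CELLS_GLOBAL = 1e26
CELLS_PER_LITER = 1e8
implied_volume_liters = CELLS_GLOBAL / CELLS_PER_LITER   # 1e18 liters

print(f"implied population: {implied_population:.1e}")
print(f"sunlight on ocean: {sunlight_on_ocean_kw:.1e} kW")
print(f"yearly CO2 increase: {yearly_increase_fraction:.2%} of total")
print(f"ocean productivity / yearly increase: {productivity_vs_increase:.0f}x")
print(f"implied Prochlorococcus habitat: {implied_volume_liters:.1e} liters")
```

On these numbers, ocean productivity is roughly 33 times the yearly atmospheric increase, which is why the balance between sequestration and respiration matters.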

Energy & CO2 Sequestration

http://www.gsfc.nasa.gov/gsfc/service/gallery/fact_sheets/earthsci/terra/earths_energy_balance.htm
http://clear.eawag.ch/models/optionenE.html
http://en.wikipedia.org/wiki/Copepod
Morris et al. Nature 2002 Dec 19-26;420(6917):806-10.
http://hosting.uaa.alaska.edu/mhines/biol468/pages/carbon.html
http://www.aeiveos.com/~bradbury/Papers/PhotosyntheticEfficiency.html

[Image scale labels: 0.1 mm; 6 cm]

Page 6: Harvard MIT  DOE GtL  Center

Diel (circadian) cycle
Light output for sun-box: 14hr light – 10hr dark, 230 E at peak

Zinser, Lindell, Chisholm, Leptos, Jaffe, Lin, et al.

Page 7: Harvard MIT  DOE GtL  Center

Diel Expression: All genes

[Figure: normalized expression (scale 0.0 to 5.0) vs. time (hours), across alternating light and dark periods.]

Zinser et al. unpubl.

Page 8: Harvard MIT  DOE GtL  Center

Light regulated Prochlorococcus metabolism

Glycogen metabolism

Pathway labels: α-Glc-1P, ADP-Glc, α-1,4-glucosyl-glucan, glycogen, Central Carbon Metabol.
Genes: glgC, glgA, glgB, glgX, glgP

[Figure: normalized expression (log scale, 0.1 to 10) vs. time (0 to 48 hours) for glgA, glgB, glgC, glgX, glgP.]

Zinser et al. unpubl.

Page 9: Harvard MIT  DOE GtL  Center

Oxygenic Photosynthesis

psbA = D1
D2
HLIP = High Light Induced Protein
Pc = Plastocyanin
Fd = Ferredoxin

[Diagram labels: H2O, O2, NADPH, e-, PSII, PSI]

Core reaction center proteins

Page 10: Harvard MIT  DOE GtL  Center

Photosynthetic Genes in Phage

Podovirus P-SSP7, 46 kb
Myovirus P-SSM4, 181 kb: HLIPs, D1, D2 (6.4kb, 2.8kb)
Myovirus P-SSM2, 255 kb

[Gene-map labels: PC, HLIPs, Fd, D1 (12kb, 24kb); HLIP, D1; ~500 bp]

Lindell, Sullivan, Chisholm et al. 2004

Page 11: Harvard MIT  DOE GtL  Center

RNA Responses to Phage

MED4-0682 (60 aa Conserved URF)

Phage SSP7 psbA

MED4 host psbA

Lindell, Sullivan, Zinser, Chisholm

Page 12: Harvard MIT  DOE GtL  Center

Synthetic - homologous recombination testing of DNA motifs

1.3 2.4 (1.3 in argR)

1.1 1.3

0.7 2.5

0.2 1.4

1.4 3.5

RNA Ratio (motif- to wild type) for each flanking gene

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Page 13: Harvard MIT  DOE GtL  Center

Synthetic Genomes & Proteomes. Why?

• Test or engineer cis-DNA/RNA-elements
• Access to any protein (complex) including post-transcriptional modifications
• Affinity agents for the above.
• Protein design, vaccines, solubility screens
• Utility of molecular biology DNA -- RNA -- Protein in vitro "kits" (e.g. PCR -- T7 -- Roche)

Toward these goals design a chassis:
• 115 kbp genome. 150 genes.
• Nearly all 3D structures known.
• Comprehensive functional data.

Page 14: Harvard MIT  DOE GtL  Center

(PURE) translation utility

Removing tRNA-synthetases, translational release-factors, RNases & proteases

Allows:

Selection of scFvs[antibodies] specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods. 284:147

Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353

High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568

Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.

Also: membrane incompatible expression & diverse amino-acids (>21)

Page 15: Harvard MIT  DOE GtL  Center

in vitro genetic codes

[Figure: mRNA (5' to 3') with codon pairing and a codon table (5' base, second base, 3' base) in which codons are assigned to mS, yU and eU.]

[Plot: 3H-E dpm (0 to 3500) vs. time (30 to 80 min.); legend: fM yU mS eU E.]

Forster, et al. (2003) PNAS 100:6353
Zhang et al. (2004) Science 303:371

80% average yield per unnatural coupling.

eU = 2-amino-4-pentenoic acid
yU = 2-amino-4-pentynoic acid
mS = O-methylserine
gS = O-GlcNAc–serine
bK = biotinyl-lysine
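
To see what an 80% average yield per unnatural coupling implies for longer products, the following minimal sketch compounds the per-step yield. It assumes each coupling is independent and has the same yield, which is a simplification of my own; the function name is illustrative.

```python
def cumulative_yield(per_coupling_yield: float, n_couplings: int) -> float:
    """Overall yield after n couplings, assuming independent, equal-yield steps."""
    if not 0.0 <= per_coupling_yield <= 1.0:
        raise ValueError("per_coupling_yield must be between 0 and 1")
    if n_couplings < 0:
        raise ValueError("n_couplings must be non-negative")
    return per_coupling_yield ** n_couplings


if __name__ == "__main__":
    # Yield falls off geometrically with the number of unnatural couplings.
    for n in (1, 3, 5, 10):
        print(f"{n:2d} couplings: {cumulative_yield(0.80, n):.1%}")
```

Under these assumptions three unnatural couplings in a row, as in a peptide containing yU, mS and eU, would give about 51% overall.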

Page 16: Harvard MIT  DOE GtL  Center

Escherichia coli                                                    Mycoplasma              3D structure
Coliphage φ29 DNA polymerase                                        +                       +
Coliphage P1 Cre recombinase                                        -                       +
>Coliphage Lox/Cre recombinase site                                 -                       +
Coliphage T7 RNA polymerase                                         +                       +
>Coliphage T7 RNA polymerase initiation site                        +                       +
>Coliphage T7 RNA polymerase termination site                       +                       +
RNase P RNA                                                         +                       -
RNase P protein                                                     +                       +
>RNase P site/RNA primer for DNA polymerase                         +                       +
Small subunit 16S ribosomal RNA                                     +                       +
All 21 small subunit ribosomal proteins (1-21)                      + except 1,21           +
Large subunit 5S ribosomal RNA                                      +                       +
Large subunit 23S ribosomal RNA                                     +                       +
Large subunit 23S rRNA G2445>m2G methylase: unknown                 ?                       -
Large subunit 23S rRNA U2449>dihydroU synthetase: unknown           ?                       -
Large subunit 23S rRNA U2457>pseudoU synthetase                     ?                       -
Large subunit 23S rRNA C2498>Cm methylase: unknown                  ?                       -
Large subunit 23S rRNA A2503>m2A methylase: unknown                 ?                       -
Large subunit 23S rRNA U2504>pseudoU synthetase                     ?                       -
All 33 large subunit ribosomal proteins (1-7,9-11,13-25,27-36)      + except 25, 30         +
Translational initiation factor 1                                   +                       +
Translational initiation factor 2                                   +                       +
Translational initiation factor 3                                   +                       +
Translational elongation factor Tu                                  +                       +
Translational elongation factor Ts                                  +                       +
Translational elongation factor G                                   +                       +
Translational release factor 1                                      +                       +
Translational release factor 2                                      -                       +
Translational release factor Gln methylase                          +                       +
Translational release factor 3                                      -                       +
Ribosome recycling factor                                           +                       +
33/45 Transfer RNAs (see Fig. 2)                                    29/33                   +
tRNA(I) C34>lysidine synthetase                                     ?                       +
tRNA(R) A34>I deaminase                                             ?                       +
tRNA(ASV) U34>cmo5U (=V) synthetase: unknown                        -                       -
tRNA(R) U34>2sU Cys desulfurase                                     -                       +
tRNA(R) nm5U34 methylase                                            ?                       +
tRNA(R) U34>cmnm5U GTPase                                           ?                       +
tRNA(R) U34>cmnm5U synthetase                                       ?                       +
tRNA(R) cmnm5U34>nm5U,mnm5U synthetase                              ?                       -
tRNA(R) G37 N1-methylase                                            +                       +
tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown      +                       -
tRNA(CLFSWY) A37>i6A synthetase                                     -                       +
tRNA(CLFSWY) i6A37>s2i6A(ms2i6A) synthetase                         -                       +
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes)              + except G subunit, Q   + except G subunit
Met-tRNA formyltransferase                                          +                       +
Chaperonin DnaK                                                     +                       +
Chaperonin GroEL                                                    +                       +
Chaperonin GroES                                                    +                       +

Total genes = 150
Forster & Church

Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome, respectively)

Page 17: Harvard MIT  DOE GtL  Center

Up to 760K Oligos/Chip
18 Mbp for $700 raw (6-18K genes)

<1K    Oxamer                       Electrolytic acid/base
8K     Atactic/Xeotron/Invitrogen   Photo-Generated Acid         Sheng, Zhou, Gulari, Gao (U. Houston)
24K    Agilent                      Ink-jet standard reagents
48K    Febit
100K   Metrigen
380K   Nimblegen                    Photolabile 5' protection    Nuwaysir, Smith, Albert

Tian, Gong, Church

Page 18: Harvard MIT  DOE GtL  Center

Improve DNA Synthesis Cost

Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!

Solution: Amplify the oligos then release them.

10 50 10 => ss-70-mer (chip)

20-mer PCR primers with restriction sites at the 50mer junctions

Tian, Gong, Sheng , Zhou, Gulari, Gao, Church Nature 2004

=> ds-90-mer

=> ds-50-mer
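
The two quantitative points on this slide can be worked through as follows. This is a sketch under assumptions of my own: equal reaction volumes (so the molecule ratio equals the concentration ratio), and 20-mer primers that anneal to the 10-nt flanks and overhang by 10 nt on each side, which is one reading consistent with the 70-mer, 90-mer and 50-mer lengths given above.

```python
def relative_bimolecular_rate(molecules: float, reference_molecules: float) -> float:
    """Second-order rate relative to the reference, assuming equal volumes."""
    concentration_ratio = molecules / reference_molecules
    return concentration_ratio ** 2


# Chip-released pools vs. conventional synthesis (figures from the slide).
CHIP_MOLECULES = 1e6
USUAL_MOLECULES = 1e12
slowdown = relative_bimolecular_rate(CHIP_MOLECULES, USUAL_MOLECULES)

# Oligo layout: 10-nt flank + 50-nt core + 10-nt flank.
FLANK_NT = 10
CORE_NT = 50
PRIMER_NT = 20

chip_oligo_nt = FLANK_NT + CORE_NT + FLANK_NT          # ss-70-mer on the chip
primer_overhang_nt = PRIMER_NT - FLANK_NT              # assumed 10-nt overhang
pcr_product_nt = chip_oligo_nt + 2 * primer_overhang_nt  # ds-90-mer after PCR
released_nt = CORE_NT                                  # ds-50-mer after digestion

print(f"relative bimolecular rate: {slowdown:.0e}")
print(f"chip oligo {chip_oligo_nt} nt, PCR product {pcr_product_nt} bp, "
      f"released fragment {released_nt} bp")
```

A million-fold drop in concentration gives a 1e-12 relative rate, which is the motivation for amplifying the oligos before assembly.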

Page 19: Harvard MIT  DOE GtL  Center

Improve DNA Synthesis Accuracy via mismatch selection

Tian & Church

Other mismatch methods: MutS (&H,L)

Page 20: Harvard MIT  DOE GtL  Center

Computer Aided Design Polymerase Assembly Multiplexing (CAD-PAM)

Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates
3. 15kb to 5Mb by homologous recombination (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters.

Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.

50, 75, 125, 225, 425, 825 … 100*2^(n-1)
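
The size series above roughly doubles at each assembly round. As a sketch, the listed values 125, 225, 425, 825 are reproduced exactly by 25 + 100*2^(n-1); the 25-bp offset is my own observation from the listed numbers, not something the slide states, and the slide's 100*2^(n-1) is the same expression without it.

```python
def assembly_sizes(n_rounds: int, offset_bp: int = 25, base_bp: int = 100) -> list[int]:
    """Product size after each doubling round: offset + base * 2^(n-1).

    offset_bp=25 is an assumption chosen to match the sizes listed on the slide.
    """
    return [offset_bp + base_bp * 2 ** (n - 1) for n in range(1, n_rounds + 1)]


if __name__ == "__main__":
    print(assembly_sizes(4))
    # Rounds needed before the product exceeds a hypothetical 15 kb target.
    n = 1
    while assembly_sizes(n)[-1] < 15_000:
        n += 1
    print(f"rounds to exceed 15 kb: {n}")
```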

Page 21: Harvard MIT  DOE GtL  Center

All 30S-Ribosomal-protein DNAs (codon re-optimized)

Tian, Gong, Sheng , Zhou, Gulari, Gao, Church

[Gel labels: 1.7 kb; 0.3 kb; s19 0.3 kb]

Nimblegen 95K chip

Atactic <4K chip

Page 22: Harvard MIT  DOE GtL  Center

Improving synthesis accuracy

Method                     Bp/error   Ref.
Chip assembly (PAM)        160        1
Hybridization-selection    1,400      1
MutS-gel-shift             10,000     2
MutHLS cleavage            30,000     3 (10X better than PCR)

1. Tian, Church, et al. 2004 Nature 432:1050
2. Carr, Jacobson, et al. 2004 NAR 32:e162
3. Smith & Modrich 1997 PNAS 94:6847
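
To relate the bp/error figures in the table to how often a synthetic construct is entirely correct, the sketch below computes the expected error-free fraction for a given length. It assumes errors are independent and uniformly distributed along the sequence, which is a simplification of my own; the 1,000-bp construct length is a hypothetical example.

```python
def error_free_fraction(bp_per_error: float, length_bp: int) -> float:
    """Probability that a construct of length_bp contains no errors,
    assuming independent errors at a rate of 1 per bp_per_error bases."""
    if bp_per_error <= 1:
        raise ValueError("bp_per_error must be greater than 1")
    return (1.0 - 1.0 / bp_per_error) ** length_bp


# Bp/error values from the table above.
METHODS = {
    "Chip assembly (PAM)": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "MutHLS cleavage": 30_000,
}

if __name__ == "__main__":
    LENGTH_BP = 1_000  # hypothetical construct length
    for method, bp_per_error in METHODS.items():
        fraction = error_free_fraction(bp_per_error, LENGTH_BP)
        print(f"{method:26s} {fraction:8.2%} error-free at {LENGTH_BP} bp")
```

Under this model, fewer clones need to be screened as bp/error rises, which is the link to reducing the number of intermediates.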

Page 23: Harvard MIT  DOE GtL  Center

Extreme mRNA makeover for protein expression in vitro

RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.

RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.

Solution: Iteratively resynthesize all mRNAs with less mRNA structure.

Tian & Church

Western blot based on His-tags

[Gel lanes: 20w, 20m, 17w, 17m, 16w, 16m; marker 10kd]

W: wild-type
M: modified

Page 24: Harvard MIT  DOE GtL  Center

Safe Synthetic Biology

Church, G.M. (2004) A synthetic biohazard non-proliferation proposal.

http://arep.med.harvard.edu/SBP/Church_Biohazard04c.doc

1. Monitor oligo synthesis via expansion of Controlled substances, Select Agents, &/or Recombinant DNA

2. Computational tools are available; very small number of reagent, instrument & synthetic gene suppliers at present.

3. System modeling checks for synthetic biology projects

4. Multi-auxotroph, novel genetic code for the host genome, prevents functional transfer of DNA to other cells.

Page 25: Harvard MIT  DOE GtL  Center

[Figure: phylogenetic tree with three labeled groups: Marine Synechococcus, high light adapted Prochlorococcus, and low light adapted Prochlorococcus. Scale bar 0.01; bootstrap values (66 to 100) shown at nodes.]

Strain labels: MIT9201, GP2, MIT9107, SAR6, TATL1a, ENATL1, ENATL3, MIT9302, MIT9312, MIT9202, MIT9215, TATL1b, Pac 1, ENATL7, ENATL4, MIT9211, MIT9303, SAR139, WH8112, SAR100, WH8101, WH8012, WH7805, SAR7, Synechococcus PCC6307

Photosynthetic bacterial genomes (for population genetics & proteomics):
MED4
NATL2A
SS120
MIT9313
WH8102

Page 26: Harvard MIT  DOE GtL  Center

Environmental population genomics (of a ribotype cluster)

Monthly samples
Isolate Vibrios
Identity: population as cluster of barcode genes
Quantification: population is continuously present
Genomes: almost each genome different in typical sample
Additional marker gene: highly diverse

[Figure: Hsp60 allelic diversity. Number of sequences (0 to 160) vs. % nucleotide similarity (95 to 100).]

Thompson, Polz, et al. (2005) Science

Page 27: Harvard MIT  DOE GtL  Center

Sequencing single cells

Biome studies focus on single-cells because hard to grow in the lab, multiple DNAs & RNAs per cell, exchange genome subsets.

(Complementary to Biome shotgun and/or 100 kb BACs)

Many input molecules required to sequence one molecule. vs. one molecule sufficient to sequence via many copies of it.

Page 28: Harvard MIT  DOE GtL  Center

Amplifying DNA from single cells

φ29 real-time amplification

No template control

Affymetrix quantitation of independent amplifications

Prochlorococcus & Escherichia

Zhang, Martiny, Chisholm, Church, unpub.

Page 29: Harvard MIT  DOE GtL  Center

Polony Bead Sequencing Pipeline

In vitro libraries via paired tag manipulation

Bead polonies via emulsion PCR [Dre03]

Monolayered immobilization in acrylamide

Enrichment of amplified beads

SOFTWARE
Images → Tag Sequences
Tag Sequences → Genome

FISSEQ or "wobble" sequencing

Epifluorescence Scope with Integrated Flow Cell

Mitra, Shendure, Porreca, Rosenbaum, Church unpub.

Page 30: Harvard MIT  DOE GtL  Center

Read length needs for population surveys

Paired tags are separated by 1000 +/- 100 bases

[Figure: % of paired k-mers with uniquely assignable location (0% to 100%) vs. length of k-mer reads (8 to 20 bp); series: Bacterial, Metazoan.]
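
The question behind this slide, how long paired reads must be to place them uniquely, can be explored on any sequence with a short script. The sketch below is a simplified model of my own: it uses a fixed separation between tag starts (ignoring the +/- 100 base spread), considers one strand only, and counts a pair as uniquely assignable if that exact pair occurs once.

```python
from collections import Counter


def unique_paired_kmer_fraction(genome: str, k: int, separation: int) -> float:
    """Fraction of paired k-mers (starts `separation` bases apart) that occur
    exactly once in `genome`. Single strand, fixed separation (assumptions)."""
    n_pairs = len(genome) - separation - k + 1
    if k <= 0 or separation < 0 or n_pairs <= 0:
        raise ValueError("genome too short for this k and separation")
    pairs = [
        (genome[i:i + k], genome[i + separation:i + separation + k])
        for i in range(n_pairs)
    ]
    counts = Counter(pairs)
    unique = sum(1 for pair in pairs if counts[pair] == 1)
    return unique / n_pairs


if __name__ == "__main__":
    # Toy sequences only; a real survey would load a genome and use separation=1000.
    print(unique_paired_kmer_fraction("ACGTACGTAC", k=2, separation=4))
    print(unique_paired_kmer_fraction("ACGTTGCA", k=2, separation=4))
```

Sweeping `k` from 8 to 20 on a real genome would produce a curve of the kind plotted on the slide, although the exact values depend on the modeling choices noted above.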

Page 31: Harvard MIT  DOE GtL  Center

Polony Fluorescent In Situ Sequencing Libraries

Greg Porreca
Abraham Rosenbaum

1 to 100kb Genomic

[Diagram labels: L, M, R; PCR bead; Sequencing primers; Selector bead]

2x20bp after MmeI (BceAI, AcuI)

Dressman et al PNAS 2003 emulsion

Page 32: Harvard MIT  DOE GtL  Center

Cleavable dNTP-Fluorophore (& terminators)

Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

Reduce or photo-cleave

Page 33: Harvard MIT  DOE GtL  Center

Polony-FISSeq: up to 2 billion beads/slide
Cy5 primer (666nm); Cy3 dNTP (570nm)

Jay Shendure
Self Organizing Monolayer

Page 34: Harvard MIT  DOE GtL  Center

High accuracy special case: homopolymers (e.g. AAA, CC, etc.)

• Use "compressed" tags, ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• FRET between adjacent 3' bases
• Wobble primers, CTAGCGAGCTAGNNNNNNNNA

All five of these work.

• Maintenance of amplification fidelity using linear amplification from initial genomic fragment
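
The "compressed" tag idea from the first bullet, in which ACG, ACCG and ACCCG are treated as the same tag, amounts to collapsing each homopolymer run to a single base. A minimal sketch follows; the function name is illustrative.

```python
from itertools import groupby


def compress_homopolymers(seq: str) -> str:
    """Collapse each run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(seq))


if __name__ == "__main__":
    for tag in ("ACG", "ACCG", "ACCCG"):
        print(tag, "->", compress_homopolymers(tag))
```

Comparing reads and reference in this compressed space sidesteps run-length errors, at the cost of losing the run lengths themselves.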

Page 35: Harvard MIT  DOE GtL  Center

• # of bases sequenced (total) 23,703,953

• # bases sequenced (unique) 73

• Avg fold coverage 324,711 X

• Pixels used per bead (analysis) ~3.6

• Read Length per primer 14-15 bp

• Insertions 0.5%

• Deletions 0.7%

• Substitutions (raw) 4e-5

• Throughput: 360,000 bp/min

Polony FISSeq Stats

Current capillary sequencing 1400 bp/min (600X speed/cost ratio, ~$5K/1X)

(This may omit: PCR, homopolymer, context errors)

Shendure
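
The coverage and throughput figures on this slide can be checked directly from the quoted numbers. The sketch below only divides values stated above; note that the raw throughput ratio it computes is not the same quantity as the 600X speed/cost ratio quoted on the slide, which also involves cost and cannot be reproduced from the numbers given.

```python
TOTAL_BASES = 23_703_953      # bases sequenced (total)
UNIQUE_BASES = 73             # bases sequenced (unique)
POLONY_BP_PER_MIN = 360_000   # polony FISSeq throughput
CAPILLARY_BP_PER_MIN = 1_400  # capillary sequencing throughput

fold_coverage = TOTAL_BASES / UNIQUE_BASES
throughput_ratio = POLONY_BP_PER_MIN / CAPILLARY_BP_PER_MIN

print(f"average fold coverage: {fold_coverage:,.0f} X")
print(f"raw throughput ratio (polony / capillary): {throughput_ratio:.0f} X")
```

The computed coverage agrees with the 324,711 X stated on the slide after truncation.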

Page 36: Harvard MIT  DOE GtL  Center

Wobble vs Simple primer sequencing

1 vs 2.5 bp read/cycle of 4 bases

10 vs 14-200 bp reads

3e-3 vs 4e-5 non-homopolymer errors

3e-3 vs 1e-1 homopolymer errors

40 minutes per base tested =

60 hr per 20 cycles (20 hr, if 4 colors)

Page 37: Harvard MIT  DOE GtL  Center

Harvard MIT DOE Center Projects

Prochlorococcus   Photosynthesis, circadian & cell cycles
Escherichia       Synthetic genomes/proteomes
Vibrio            4X faster replication than E. coli
Caulobacter       Asymmetric cell & chromosome structure
Pseudomonas       Biofilms

Poster#                  Topic                         Goal#
  1. Church, et al.      Metabolic fluxes              4
  2. Leptos, et al.      Proteomics                    1
 68. Martiny, et al.     Prochlorococcus diversity     3
121. Nguyen, et al.      Mass spectrometry XML         1
122. Nguyen, et al.      Gene Regulation               2
 77. Sullivan, et al.    Cyanophages                   1,3
 67. Thompson, et al.    Vibrio diversity              3
  3. Zhang, et al.       Single cell sequencing        1-4

arep.med.harvard.edu