Download - Wet-lab Considerations for Illumina data analysis - UC …bioinformatics.ucdavis.edu/.../Thursday_LF_lecture.pdf ·  · 2015-04-02Wet-lab Considerations for Illumina data analysis

Page 1

Wet-lab Considerations for Illumina data analysis

Based on a presentation by Henriette O’Geen

Lutz Froenicke

DNA Technologies and Expression Analysis Cores

UCD Genome Center

Page 2

Complementary Approaches

Illumina                                            | PacBio
Still-imaging of clusters (~1000 clonal molecules)  | Movie recordings of fluorescence of single molecules
Short reads - 2x300 bp                              | Up to 30 kb, N50 18 kb
Repeats are mostly not analyzable                   | Spans retro elements
High output - up to 75 Gb per lane                  | Up to 1 Gb per SMRT-cell
High accuracy (error rate < 0.5 %)                  | Error rate 14 to 15 %
Considerable base composition bias                  | No base composition bias
Very affordable                                     | Costs 5 to 10 times higher
De novo assemblies of thousands of scaffolds        | “Near perfect” genome assemblies
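
The output figures in this comparison translate directly into how many lanes or SMRT cells a project needs. The following is a minimal sketch of that arithmetic, assuming the quoted upper-bound yields are actually reached; the genome size and coverage target in the example are hypothetical values chosen for illustration, and `units_needed` is an illustrative helper, not part of any sequencing software.

```python
import math

# Per-unit yields quoted in the comparison (upper bounds, in bases).
ILLUMINA_LANE_BASES = 75e9     # "up to 75 Gb per lane"
PACBIO_SMRT_CELL_BASES = 1e9   # "up to 1 Gb per SMRT-cell"


def units_needed(genome_size_bases, target_coverage, bases_per_unit):
    """Return how many lanes or SMRT cells reach the target mean coverage."""
    if genome_size_bases <= 0 or target_coverage <= 0 or bases_per_unit <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = genome_size_bases * target_coverage
    return math.ceil(total_bases / bases_per_unit)


if __name__ == "__main__":
    genome = 500e6   # hypothetical 500 Mb genome
    coverage = 30    # hypothetical 30x mean coverage target
    print("Illumina lanes:", units_needed(genome, coverage, ILLUMINA_LANE_BASES))
    print("PacBio SMRT cells:", units_needed(genome, coverage, PACBIO_SMRT_CELL_BASES))
```

Real yields are usually below the quoted maxima, so the result is best read as a lower bound on the number of units.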

Page 3

High Throughput Short Read Sequencing: Illumina

• RNA-seq Gene Expression
• Small RNA
• ChIP-SEQ
• Genotyping
• De novo genome Sequencing
• DNA Methylation
• Metagenomics
• Exome Sequencing
• Splice Isoform Abundance
• Genome Resequencing
• 3D Organization
• SNPs, Indels, CNVs
• Rearrangements

Page 4

Long Read Sequencing: PacBio

• Rearrangements
• De novo genome Sequencing
• RNA-seq Gene Expression
• Small RNA
• ChIP-SEQ
• Genotyping
• DNA Methylation
• Metagenomics
• Exome Sequencing
• Splice Isoform Abundance
• Genome Resequencing
• 3D Organization
• SNPs, Indels, CNVs

Page 5

2014

Page 6

Sequencing workflow

Library Construction

Cluster Formation

Illumina Sequencing

Data Analysis

Page 7

ChIP-seq

• Input control: verify fragment length (100-300 bp)

• ???? very experiment specific

Page 8

Standard RNA-Seq library protocol

QC of total RNA to assess integrity

Removal of rRNA (most common):
– mRNA isolation
– rRNA depletion

Fragmentation of RNA

Reverse transcription and second-strand cDNA synthesis

Ligation of adapters

PCR Amplify

Purify, QC and Quantify

Page 9

Recommended RNA input

Library prep kit                                  | Starting material
mRNA (TruSeq)                                     | 100 ng – 4 μg total RNA
Directional mRNA (TruSeq)                         | 1 – 5 μg total RNA or 50 ng mRNA
Apollo324 library robot (strand specific)         | 100 ng mRNA
Small RNA (TruSeq)                                | 1 μg total RNA
Ribo depletion (Epicentre)                        | 1 – 5 μg total RNA
SMARTer™ Ultra Low RNA (Clontech)                 | 100 pg – 10 ng
Ovation RNA seq V2, Single Cell RNA seq (NuGen)   | 10 ng – 100 ng
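
The recommended inputs can be turned into a simple lookup that lists which kits fit a given amount of RNA. This is only a sketch: the ranges are transcribed from the slide in nanograms, the Apollo324 entry is left out because it is specified as mRNA rather than total RNA, and treating the SMARTer and Ovation ranges as total RNA is an assumption (the slide does not say). The names `KIT_INPUT_NG` and `kits_for_input` are hypothetical.

```python
# Input ranges in nanograms of total RNA, transcribed from the slide.
# Assumption: the SMARTer and Ovation ranges refer to total RNA.
KIT_INPUT_NG = {
    "mRNA (TruSeq)": (100, 4000),
    "Directional mRNA (TruSeq)": (1000, 5000),
    "Small RNA (TruSeq)": (1000, 1000),
    "Ribo depletion (Epicentre)": (1000, 5000),
    "SMARTer Ultra Low RNA (Clontech)": (0.1, 10),
    "Ovation RNA seq V2 / Single Cell RNA seq (NuGen)": (10, 100),
}


def kits_for_input(total_rna_ng):
    """Return the kits whose recommended input range contains total_rna_ng."""
    return [
        kit
        for kit, (low, high) in KIT_INPUT_NG.items()
        if low <= total_rna_ng <= high
    ]


if __name__ == "__main__":
    for amount in (5, 50, 2000):
        print(amount, "ng:", kits_for_input(amount))
```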

Page 10

• 18S (2500 b), 28S (4000 b)

Page 11
Page 12

RNA integrity <> reproducibility

Chen et al. 2014

Page 13

Considerations in choosing an RNA-Seq method

• Transcript type:
  - mRNA, extent of degradation
  - small/micro RNA

• Strandedness:
  - non-directional ds cDNA library
  - directional library

• Input RNA amount:
  - 0.1-4 μg original total RNA
  - linear amplification from 0.5-10 ng RNA

• Complexity:
  - original abundance
  - cDNA normalization for uniformity

• Boundary of transcripts:
  - identify 5’ and/or 3’ ends
  - poly-adenylation sites
  - degradation, cleavage sites

Page 14
Page 15

Fragmentation

Mechanical shearing (DNA, RNA):

• BioRuptor
• Covaris

Enzymatic (DNA, RNA):

• Fragmentase, RNase III

Chemical (RNA): Mg2+, Zn2+

Page 16

Here:

Page 17

Illumina Sequencing Technology

Sequencing By Synthesis (SBS) Technology

[Figure labels: DNA (0.1-1.0 μg), library preparation, single molecule array, cluster generation, sequencing]

Page 18

TruSeq Chemistry: Flow Cell

Surface of flow cell coated with a lawn of oligo pairs

8 channels

Page 19

1.6 Billion Clusters Per Flow Cell

[Figure: cluster images during sequencing; scale bars 20 microns and 100 microns]

Page 20

[Figure: cluster image during sequencing; scale bar 100 microns]

Page 21

What will go wrong?

cluster identification

bubbles

synthesis errors:

Page 22

What will go wrong?

synthesis errors: Phasing & Pre-Phasing problems
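
As general background (not taken from the slide): phasing means some strands in a cluster fall one base behind in a cycle, and pre-phasing means some run one base ahead. Either way the cluster signal becomes mixed, which is one reason base quality drops toward the end of a read. The sketch below uses a deliberately simplified model in which a strand that leaves phase never returns; the per-cycle rates are hypothetical and `in_phase_fraction` is an illustrative name.

```python
def in_phase_fraction(cycles, phasing_rate, prephasing_rate):
    """Fraction of strands in a cluster still in step after `cycles` cycles.

    Simplified model: each cycle a strand independently lags (phasing) or
    jumps ahead (pre-phasing) with the given probabilities, and a strand
    that has left phase is counted as lost for the rest of the read.
    """
    if phasing_rate < 0 or prephasing_rate < 0:
        raise ValueError("rates must be non-negative")
    per_cycle_ok = 1.0 - phasing_rate - prephasing_rate
    if per_cycle_ok < 0:
        raise ValueError("rates must sum to at most 1")
    return per_cycle_ok ** cycles


if __name__ == "__main__":
    # Hypothetical per-cycle rates, for illustration only.
    for n in (50, 100, 300):
        print(n, "cycles:", round(in_phase_fraction(n, 0.002, 0.001), 3))
```

Under this model the in-phase fraction decays geometrically with read length, so longer reads are affected more.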

Page 23

DNA library construction

[Figure: fragment ends at each step (5’ phosphate and OH groups, A and T overhangs)]

Fragmented DNA

End Repair

Blunt End Fragments

“A” Tailing

Single Overhang Fragments

Adapter Ligation

DNA Fragments with Adapter Ends

Page 24

Enrichment of library fragments

PCR Amplification

Page 25

“If you can put adapters on it, we can sequence it!”

Page 26

Know your sample

single-stranded

Adapter Ligation

Page 27

Optional: PCR-free libraries

PCR-free library:

– Library can be sequenced if concentration allows

– Reduction of PCR bias against e.g. GC-rich or AT-rich regions, especially for metagenomic samples

OR

Library enrichment by PCR:

– Ideal combination: high input and low cycle number; low-bias polymerase
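
The link between input amount and cycle number can be made concrete with a small calculation. The sketch below assumes an idealized PCR in which every cycle multiplies the library by a fixed factor (perfect doubling when `efficiency` is 1.0); real reactions are less efficient and lose material at cleanup, so the numbers are illustrative. `min_pcr_cycles` and the nanogram amounts are hypothetical.

```python
def min_pcr_cycles(input_ng, target_ng, efficiency=1.0):
    """Smallest number of cycles that amplifies input_ng to at least target_ng.

    efficiency=1.0 means perfect doubling per cycle (an idealization);
    efficiency=0.8 means each cycle multiplies the amount by 1.8.
    """
    if input_ng <= 0 or target_ng <= 0:
        raise ValueError("amounts must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cycles = 0
    amount = float(input_ng)
    while amount < target_ng:
        amount *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical target of 1000 ng of amplified library.
    for start in (10, 100, 500):
        print(start, "ng input ->", min_pcr_cycles(start, 1000), "cycles")
```

Each tenfold increase in input saves a little over three cycles under perfect doubling, which is why high input and low cycle number go together.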

Page 28

Quantitation & QC methods

Intercalating dye methods (PicoGreen, Qubit, etc.):
– Specific to dsDNA, accurate at low levels of DNA
– Great for pooling of indexed libraries to be sequenced in one lane
– Requires standard curve generation, many accurate pipetting steps

Bioanalyzer:
– Quantitation is good for rough estimate
– Invaluable for library QC
– High-sensitivity DNA chip allows quantitation of low DNA levels

qPCR:
– Most accurate quantitation method
– More labor-intensive
– Must be compared to a control
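
For pooling indexed libraries, a mass concentration from a dye assay is usually combined with the mean fragment size from the Bioanalyzer to get a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA; the function names and the example libraries are hypothetical, and this is an illustration of the arithmetic rather than any core facility's protocol.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # common approximation for one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a given mean size."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and size must be > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def equimolar_volumes_ul(libraries, fmol_per_library):
    """Volume of each library (uL) that contributes fmol_per_library femtomoles.

    `libraries` maps a name to (ng/uL, mean fragment size in bp).
    Since 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical libraries: (Qubit ng/uL, Bioanalyzer mean size in bp).
    libs = {"sample_A": (2.0, 400), "sample_B": (5.0, 350)}
    for name, vol in equimolar_volumes_ul(libs, 20).items():
        print(name, round(vol, 2), "uL")
```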

Page 29

Library QC by Bioanalyzer

Predominant species of appropriate MW

Minimal primer dimer or adapter dimers

Minimal higher MW material

Page 30

Library QC by Bioanalyzer

[Figure: Bioanalyzer traces labeled “Beautiful”, “Beautiful”, and “100% Adapters” (~125 bp)]

Page 31

Library QC

Examples for successful libraries

Adapter contamination at ~125 bp

Page 32

Is strand-specific information important?

Standard library (non-directional)

[Figure: sense and antisense read coverage at Neu1]

Page 33

Strand-specific RNA-seq

Standard library (non-directional)

Antisense non-coding RNA

Sense transcripts

Informative for non-coding RNAs and antisense transcripts

Essential when NOT using polyA selection (mRNA)

No disadvantage to preserving strand specificity

Page 34

RNA-seq for DGE

• Differential Gene Expression (DGE)
  – 50 bp single end reads
  – 30 million reads per sample (eukaryotes)
    • 10 million reads: > 80% of annotated genes
    • 30 million reads: > 90% of annotated genes
  – 10 million reads per sample (bacteria)
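
These per-sample read targets can be combined with the flow cell figures quoted earlier in the deck (1.6 billion clusters per flow cell, 8 channels) to estimate how many indexed samples fit in one lane. This is a rough sketch that assumes one single-end read per cluster; `usable_fraction` is a hypothetical knob for clusters lost to filtering or uneven pooling, and `samples_per_lane` is an illustrative name.

```python
CLUSTERS_PER_FLOW_CELL = 1_600_000_000  # "1.6 Billion Clusters Per Flow Cell"
LANES_PER_FLOW_CELL = 8                 # "8 channels"


def samples_per_lane(reads_per_sample, usable_fraction=1.0):
    """Whole number of samples per lane at the requested read depth.

    Assumes one single-end read per cluster; usable_fraction (0 to 1]
    discounts clusters that do not yield usable reads.
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    reads_per_lane = int(CLUSTERS_PER_FLOW_CELL // LANES_PER_FLOW_CELL * usable_fraction)
    return reads_per_lane // reads_per_sample


if __name__ == "__main__":
    print("eukaryote DGE (30 M reads):", samples_per_lane(30_000_000))
    print("bacterial DGE (10 M reads):", samples_per_lane(10_000_000))
```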

Page 35

Other RNA-seq

• Transcriptome assembly:

– 300 bp paired end plus

– 100 bp paired end

• Long non coding RNA studies:

– 100 bp paired end

– 60-100 million reads

• Splice variant studies:

– 100 bp paired end

– 60-100 million reads

Page 36

RNA-seq targeted sequencing:

- Capture-seq (Mercer et al. 2014)
- Nimblegen and Illumina
- Low quality DNA (FFPE)
- Lower read numbers: 10 million reads
- Targeting lowly expressed genes

Page 37

RNA-seq reproducibility

• Two big multi-center studies (2014)

• High reproducibility of data given:
  - same library prep kits, same protocols
  - same RNA samples
  - RNA isolation protocols have to be identical
  - robotic library preps?

Page 38

C1 Single cell capture

Page 39

SMARTer cDNA conversion / SMART-seq

Picelli 2014

Page 40

Molecular indexing – for precision counts

Page 41

Molecular indexing – for precision counts
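
As background on why molecular indexing gives more precise counts: each original molecule is tagged with a random barcode before amplification, so PCR copies of the same molecule share a barcode and can be collapsed. The sketch below counts distinct barcodes per gene from already-assigned reads. It ignores sequencing errors in the barcodes and barcode collisions, and the `(gene, barcode)` input format and the name `count_molecules` are assumptions made for illustration.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return (read_counts, molecule_counts) per gene.

    `reads` is an iterable of (gene, barcode) pairs. Reads sharing a gene
    and a barcode are treated as PCR copies of one original molecule.
    """
    read_counts = defaultdict(int)
    barcodes_seen = defaultdict(set)
    for gene, barcode in reads:
        read_counts[gene] += 1
        barcodes_seen[gene].add(barcode)
    molecule_counts = {gene: len(codes) for gene, codes in barcodes_seen.items()}
    return dict(read_counts), molecule_counts


if __name__ == "__main__":
    # Toy data: geneA has 4 reads from 2 molecules, geneB has 2 reads from 2.
    toy_reads = [
        ("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "TTAG"),
        ("geneB", "GGCA"), ("geneB", "CATG"),
    ]
    reads, molecules = count_molecules(toy_reads)
    print("reads:", reads)
    print("molecules:", molecules)
```

In the toy data the raw read counts suggest geneA is twice as abundant as geneB, while the barcode counts show the same number of starting molecules.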

Page 42

Synthetic Spike Ins

• ERCC spike in mix

• QC of library prep

• normalization (internal standards)

• advanced normalization (transcript lengths)

• Sample identity verification

Page 43

ERCC spike-ins

• RNA-seq spike-in standards:

• 92 polyadenylated transcripts - mimic natural eukaryotic mRNAs.

• wide range of lengths (250–2,000 nucleotides)

• and GC-contents (5–51%)

Page 44

ERCC spike-in

• RNA-seq spike-in standards:

• To test Dynamic range and lower limit of detection

• Fold-change response
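
One common way to look at dynamic range and the lower limit of detection is to compare observed spike-in counts against the known input concentrations on a log-log scale. The sketch below fits a straight line by ordinary least squares and reports the lowest concentration that was detected at all. The data are invented toy values, not ERCC measurements, and both function names are hypothetical.

```python
import math


def fit_dose_response(known_conc, observed_counts):
    """Least-squares fit of log2(count) = slope * log2(conc) + intercept.

    Spike-ins with zero counts are skipped because log2(0) is undefined.
    A slope near 1 indicates counts proportional to input.
    """
    points = [
        (math.log2(c), math.log2(n))
        for c, n in zip(known_conc, observed_counts)
        if c > 0 and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two detected spike-ins")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lowest_detected(known_conc, observed_counts, min_count=1):
    """Lowest input concentration with at least min_count reads, or None."""
    detected = [c for c, n in zip(known_conc, observed_counts) if n >= min_count]
    return min(detected) if detected else None


if __name__ == "__main__":
    conc = [0.5, 1, 2, 4, 8, 16]       # toy concentrations (arbitrary units)
    counts = [0, 9, 21, 38, 85, 160]   # toy read counts
    print("slope, intercept:", fit_dose_response(conc, counts))
    print("lowest detected:", lowest_detected(conc, counts))
```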

Page 45

ERCC spike-ins

• Relative but NOT absolute expression: dosage a big problem

• To recover the differences of 2:390 million reads needed

• Protocols reproducible but not accurate (10x differences of ERCC transcripts between protocols)

Page 46

THIRD GENERATION DNA SEQUENCING

Single Molecule Real Time (SMRT™) sequencing

• Sequencing of single DNA molecule by single polymerase
• Very long reads: average reads over 8 kb, up to 30 kb
• High error rate (~13%)
• Complementary to short accurate reads of Illumina

http://pacificbiosciences.com

Page 47

70 nm aperture

“Zero Mode Waveguide”

Page 48

Damien Peltier

Page 49

First Sequencing of CGG-repeat Alleles in Human Fragile X Syndrome using PacBio RS Sequencer

Paul Hagerman, Biochemistry and Molecular Medicine, SOM.

• Single-molecule sequencing of pure CGG array,
  - first for disease-relevant allele. Loomis et al. (2012) Genome Research.
  - applicable to many other tandem repeat disorders.

• Direct genomic DNA sequencing of methyl groups,
  - direct epigenetic sequencing (paper under review).

• Discovered 100% bias toward methylation of 20 CGG-repeat allele in female,
  – first direct methylated DNA sequencing in human disease.

• DoD STTR award with PacBio. Basis of R01 applications.

[Figure: base calls (A, C, G, T) by nucleotide position for CGG36 and CGG95 alleles]

Page 50

Iso-Seq PacBio

• Sequence full-length transcripts, no assembly

• High accuracy (except very long transcripts)

• More than 95% of genes show alternate splicing

• On average more than 5 isoforms/gene

• Precise delineation of transcript isoforms (PCR artifacts? chimeras?)

Page 51

Thank you!