Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological...


Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological variance structure in mRNA-Seq data: life in the real world"

Page 1:

Technical and biological variance structure in mRNA-Seq data: life in the real world

Paper by Ann Oberg, et al.

October 2, 2013

Page 2:

Concept

Suppose x is helpful in predicting y.

y = β₀ + β₁x + ε    (1)

ε ∼ N(0, σ²)

No variation, no model

°C = (°F − 32) × 5/9    (2)
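The contrast between equations (1) and (2) can be sketched in code. This is a minimal illustration, not taken from the paper: the coefficient values, noise level, and function names are assumptions chosen for the example. The temperature conversion has no error term, so there is nothing to estimate, while the linear model needs a fitting step because of ε.

```python
import random

def fahrenheit_to_celsius(f):
    """Deterministic conversion, equation (2): no error term, nothing to estimate."""
    return (f - 32.0) * 5.0 / 9.0

def simulate_linear(xs, beta0, beta1, sigma, rng):
    """Draw y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2), equation (1)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def least_squares(xs, ys):
    """Ordinary least-squares estimates of (beta0, beta1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

rng = random.Random(0)
xs = [float(i) for i in range(50)]
# Illustrative true values (assumptions, not from the slides).
ys = simulate_linear(xs, beta0=2.0, beta1=0.5, sigma=1.0, rng=rng)
b0_hat, b1_hat = least_squares(xs, ys)
print("estimated beta0 %.3f, beta1 %.3f" % (b0_hat, b1_hat))
print("212 F in C:", fahrenheit_to_celsius(212.0))
```

The estimates differ from the true values only because of the simulated noise; with sigma set to 0 the fit would recover them exactly, which is the "no variation, no model" case.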

Page 3:

Concept

RNA-Seq studies: sources of variation

Technical variation: flow cell, replication in lanes, library preparation, etc.

Biological variation: person to person

Observed count data: combination of both types of variation.

Page 4:

Concept

Technical variation: Poisson distribution, Var(Y) = µ

Total variation: over-dispersion, Var(Y) > µ

Within-sample variation ∼ Poisson distribution

Between-sample variation ∼ Gamma distribution

This gives rise to the negative binomial distribution.
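A small simulation can make the Poisson-Gamma construction concrete. The sketch below is an illustration under assumed values (µ = 20, φ = 0.25, both hypothetical): each sample gets its own rate drawn from a Gamma distribution with mean µ and variance φµ², and the count is then Poisson given that rate. The resulting counts should show a variance near µ + φµ², while plain Poisson counts stay near µ.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def gamma_poisson_draw(mu, phi, rng):
    """Rate ~ Gamma(shape 1/phi, scale phi*mu), then a Poisson count at that rate."""
    lam = rng.gammavariate(1.0 / phi, phi * mu)
    return poisson_draw(lam, rng)

rng = random.Random(1)
mu, phi, n = 20.0, 0.25, 20000  # illustrative values, not from the paper
pure = [poisson_draw(mu, rng) for _ in range(n)]
mixed = [gamma_poisson_draw(mu, phi, rng) for _ in range(n)]

print("Poisson        mean %.2f  var %.2f"
      % (statistics.mean(pure), statistics.variance(pure)))
print("Gamma-Poisson  mean %.2f  var %.2f"
      % (statistics.mean(mixed), statistics.variance(mixed)))
print("mu + phi*mu^2 = %.2f" % (mu + phi * mu ** 2))
```

Both sets of counts share the same mean; only the mixed counts carry the extra between-sample variance.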

Page 5:

Purpose of the paper

Describe the mean-variance relationship in mRNA-Seq data

1. Var(Y) = µ: Poisson

2. Var(Y) = kµ: Overdispersed Poisson (OD)

3. Var(Y) = µ + φµ²: Negative binomial distribution
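The three mean-variance relationships listed above can be compared numerically. The following is only a sketch with assumed parameter values (k = 2 and φ = 0.1 are hypothetical): it shows that the overdispersed Poisson variance grows linearly in µ, while the negative binomial variance grows quadratically and dominates at high counts.

```python
def var_poisson(mu):
    """Model 1: Var(Y) = mu."""
    return mu

def var_overdispersed_poisson(mu, k):
    """Model 2: Var(Y) = k * mu."""
    return k * mu

def var_negative_binomial(mu, phi):
    """Model 3: Var(Y) = mu + phi * mu^2."""
    return mu + phi * mu ** 2

k, phi = 2.0, 0.1  # illustrative values, not estimates from the paper
print("%8s %10s %10s %12s" % ("mu", "Poisson", "OD", "NB"))
for mu in (10.0, 100.0, 1000.0):
    print("%8.0f %10.0f %10.0f %12.0f" % (
        mu,
        var_poisson(mu),
        var_overdispersed_poisson(mu, k),
        var_negative_binomial(mu, phi),
    ))
```

With φ = 0 the negative binomial variance reduces to the Poisson variance, and with k = 1 the overdispersed Poisson does the same.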

Page 6:

Purpose of the paper

Estimation of φ is a crucial step

1. Per gene: glm.nb function in the MASS package

2. Local: empirical Bayes estimate shrinking the per-gene estimate towards the global estimate, edgeR

3. Global: quantile-adjusted conditional maximum likelihood, edgeR
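To give a feel for what a per-gene versus a common dispersion looks like, here is a rough method-of-moments sketch. It is a stand-in for illustration only: it is not what glm.nb or edgeR compute (those use likelihood-based estimators), and the gene names and counts are made up. The estimator simply solves Var(Y) = µ + φµ² for φ using the sample mean and variance.

```python
import statistics

def moment_dispersion(counts):
    """Method-of-moments phi from Var(Y) = mu + phi*mu^2, truncated at zero."""
    m = statistics.mean(counts)
    if m == 0:
        return 0.0
    s2 = statistics.variance(counts)
    return max(0.0, (s2 - m) / (m * m))

# Made-up counts for three hypothetical genes across six subjects.
genes = {
    "geneA": [12, 15, 9, 14, 11, 13],
    "geneB": [5, 40, 2, 33, 18, 7],
    "geneC": [100, 180, 90, 260, 140, 75],
}

# Per-gene estimates: one phi for each gene.
per_gene = {name: moment_dispersion(c) for name, c in genes.items()}

# A crude common value: the average of the per-gene estimates (an assumption
# made for this sketch, not the conditional-likelihood estimate in edgeR).
common = statistics.mean(list(per_gene.values()))

for name, phi in per_gene.items():
    print("%s  per-gene phi %.3f" % (name, phi))
print("common phi %.3f" % common)
```

A local estimate in the spirit of item 2 would sit between these two extremes, pulling each per-gene value towards the common one.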

Page 7:

Data and Statistical Experimental Design, Figure 1

25 study subjects (all female Caucasians): 12 high and 13 low antibody responders

13 flow cells, each with 8 lanes: 4 for High response, 4 for Low response

For each response group, two specimens: unstimulated and stimulated

2 replicates for unstimulated and stimulated specimens each

2 subjects from the High response group failed, leaving 10 high and 13 low responders

Only the unstimulated specimens were used, to avoid correlation

Page 8:

Figure: 2

Page 9:

Statistical Analysis

Models were fit to unstimulated specimens only to focus on biological variation

Counts for the two technical replicates were summed for the models.

No normalization with total count per lane-pair OR 75th percentile count per lane-pair as normalization constant.
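The replicate summing and the 75th-percentile constant mentioned above might be computed along these lines. This is a sketch with made-up counts; the percentile definition (the "inclusive" method of Python's statistics.quantiles) and the choice not to drop zero-count genes first are assumptions, since the slides do not specify either.

```python
import math
import statistics

def sum_replicates(rep1, rep2):
    """Add the two technical replicate counts gene by gene."""
    return [a + b for a, b in zip(rep1, rep2)]

def upper_quartile(counts):
    """75th percentile of the per-gene counts for one lane-pair."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]

# Made-up counts for five hypothetical genes in two technical replicates.
rep1 = [0, 3, 10, 25, 7]
rep2 = [1, 2, 12, 30, 5]

lane_pair = sum_replicates(rep1, rep2)
uq = upper_quartile(lane_pair)

# In a log-link count model the constant usually enters as a log offset.
offset = math.log(uq)
print("summed counts:", lane_pair)
print("75th percentile: %.1f, log offset: %.3f" % (uq, offset))
```

Using the total count instead would only change the constant: replace upper_quartile(lane_pair) with sum(lane_pair).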

Page 10:

Technical variation

Representative scatter plot of technical replicate 1 versus technical replicate 2 for one subject. Spearman correlation was 0.9941 for this pair.

Figure: Supplementary plot

Technical reproducibility good
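For readers who want to reproduce this kind of check, a Spearman correlation between two replicates can be computed as the Pearson correlation of their ranks. The sketch below uses made-up counts and hypothetical function names; ties receive average ranks, which is the usual convention.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up counts for six hypothetical genes in two technical replicates.
rep1 = [0, 4, 15, 120, 37, 980]
rep2 = [1, 3, 18, 110, 41, 1005]
print("Spearman correlation: %.4f" % spearman(rep1, rep2))
```

Because the statistic depends only on ranks, it is unaffected by a difference in sequencing depth between the two replicates.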

Page 11:

Technical variation

The vertical axis is the difference between the counts in the two replicates on the log2 scale and the horizontal axis is the average of the two counts on the log2 scale.

Figure: Bland-Altman plot (supplementary plot)

No linear bias: Gray smoother overlays y = 0 line
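The coordinates of such a plot are simple to compute. The sketch below uses made-up counts; the pseudo-count of 1 added before taking logs is an assumption made here to avoid log2(0), and the slides do not say how zero counts were handled.

```python
import math

def bland_altman_points(rep1, rep2, pseudo=1.0):
    """Return (average, difference) pairs on the log2 scale for each gene.

    pseudo is added to every count so that zero counts have a finite log;
    this is an assumption of the sketch, not a detail from the paper.
    """
    points = []
    for a, b in zip(rep1, rep2):
        la = math.log2(a + pseudo)
        lb = math.log2(b + pseudo)
        points.append(((la + lb) / 2.0, la - lb))
    return points

# Made-up counts for five hypothetical genes in two technical replicates.
rep1 = [0, 7, 31, 255, 1023]
rep2 = [1, 7, 15, 255, 511]

points = bland_altman_points(rep1, rep2)
mean_diff = sum(d for _, d in points) / len(points)
for avg, diff in points:
    print("average %.2f  difference %+.2f" % (avg, diff))
print("mean difference %+.2f" % mean_diff)
```

A smoother through these points that stays close to the difference = 0 line corresponds to the "no linear bias" reading of the plot.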

Page 12:

Technical variation

QQ plots assuming a Poisson distribution are in the additional files.

Technical variation in general follows a Poisson distribution.

Page 13:

Biological variation, Figure 3

A. Plot of mean (x̄) and variance (S²)

B. Local estimates of φ and per-group mean count

Figure: 3

Page 14:

Goodness of fit

QQ plots

1. Standard Poisson

2. NB with global estimate of φ

3. NB with per-gene estimate of φ

4. NB with local estimate of φ
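One common way to score how well each of these variance models fits a gene is a Pearson-type statistic, the sum of squared residuals divided by the model variance. The sketch below uses that statistic as an assumption; the paper's own goodness-of-fit statistic may be defined differently, and the counts and φ value are made up. The mean is taken as the sample mean of a single group and is assumed to be nonzero.

```python
def pearson_gof(counts, mu, variance):
    """Sum of squared residuals scaled by the variance the model implies."""
    return sum((y - mu) ** 2 / variance for y in counts)

def poisson_gof(counts):
    """Pearson statistic under Var(Y) = mu (assumes a nonzero mean)."""
    mu = sum(counts) / len(counts)
    return pearson_gof(counts, mu, mu)

def nb_gof(counts, phi):
    """Pearson statistic under Var(Y) = mu + phi*mu^2 (assumes a nonzero mean)."""
    mu = sum(counts) / len(counts)
    return pearson_gof(counts, mu, mu + phi * mu ** 2)

# Made-up counts for one hypothetical gene across six subjects.
counts = [35, 80, 22, 140, 61, 48]
phi = 0.3  # illustrative dispersion; could be a global, per-gene, or local estimate

print("Poisson statistic: %.2f" % poisson_gof(counts))
print("NB statistic:      %.2f" % nb_gof(counts, phi))
```

The same nb_gof call covers items 2 to 4; only the source of the φ value passed in changes.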

Page 15:

Figure: 4

Page 16:

Experimental variation

Potential sources of experimental variation examined (when experimental factors were included in the model):

flow-cell, lane-pair and library preparation batch

Figure 5

Page 17:

Figure: 5

Page 18:

Flow-cell effect: the observed counts on an entire flow cell were smaller than the expected counts.

The reason was a software upgrade midway through the experiment. The number of reads increased with the software upgrade (Figure 6A).

After the 75th percentile offset was used, there was no clear flow-cell effect.

Page 19:

Figure: 6

Page 20:

Characterizing genes with poor model fit

Effect of genes with small counts.

1. Smallest GOF statistics: indicative of overfitting

2. Largest GOF statistics: indicative of underfitting (not explaining enough variance)

Filtering out genes with total counts of up to 10,000 had a minor impact.

GOF statistics for genes with an average count < 5 per subject were distributed throughout the range (Figure 7A).

Page 21:

Figure: 7

Page 22:

Data records of genes with very small GOF statistics.

1. All 0 counts in one response group and nonzero counts in the other

2. Counts very consistent, with small variance

Data records of genes with very large GOF statistics.

1. The variance is very high. An example of one such gene is in Figure 7B.
