Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological...
Technical and biological variance structure in mRNA-Seq data: life in the real world
Paper by Ann Oberg, et al.
October 2, 2013
Concept
Suppose x is helpful in predicting y.
y = β₀ + β₁x + ε   (1)
ε ∼ N(0, σ²)
No variation, no model: a deterministic relationship leaves nothing to model statistically, for example
°C = (°F − 32) × 5/9   (2)
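As a small illustration of "no variation, no model", the sketch below fits equation (1) by ordinary least squares using only the Python standard library. The function name `fit_line` and the example temperatures are illustrative assumptions, not taken from the slides. Applied to exact Fahrenheit/Celsius pairs, the fit recovers the conversion in equation (2) with essentially zero residual variance, so there is no error term left to estimate.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, residual variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    return b0, b1, sigma2


# Illustrative data: an exact temperature conversion, so there is no noise to model.
fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]
celsius = [(f - 32.0) * 5.0 / 9.0 for f in fahrenheit]
b0, b1, sigma2 = fit_line(fahrenheit, celsius)
```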
Concept
RNASeq studies, sources of variation
Technical variation: flow cell, replication in lanes, library preparation, etc.
Biological variation: person to person
Observed count data: combination of both types of variation.
Concept
Technical variation → Poisson distribution: Var(Y) = µ
Total variation → over-dispersion: Var(Y) > µ
- within-sample variation ∼ Poisson distribution
- between-sample variation ∼ Gamma distribution
This gives rise to Negative Binomial distribution
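One way to see why the Poisson-Gamma combination is overdispersed is the law of total variance. The derivation below is a sketch that assumes the per-sample rate λ follows a Gamma distribution with mean µ and squared coefficient of variation φ; the slide does not state a parameterization, so this choice is an assumption.

```latex
\begin{align*}
Y \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\left(\text{shape} = 1/\phi,\ \text{scale} = \phi\mu\right) \\
E[Y] &= E[\lambda] = \mu \\
\mathrm{Var}(Y) &= E[\mathrm{Var}(Y \mid \lambda)] + \mathrm{Var}(E[Y \mid \lambda])
  = E[\lambda] + \mathrm{Var}(\lambda) = \mu + \phi\mu^{2}
\end{align*}
```

The first term is the within-sample (Poisson) part and the second is the between-sample (Gamma) part, so the variance exceeds µ whenever φ > 0.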
Purpose of the paper
Describe the mean-variance relationship in mRNA-Seq data
1. Var(Y) = µ: Poisson
2. Var(Y) = kµ: Overdispersed Poisson (OD)
3. Var(Y) = µ + φµ²: Negative Binomial distribution
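The three mean-variance relationships can be compared numerically. The following is a minimal simulation sketch, not the paper's analysis: it draws a Gamma-distributed rate per sample and then a Poisson count, and compares the sample variance with µ + φµ². The function names, the seed, and the values µ = 10, φ = 0.5, k = 2 are all illustrative assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def gamma_poisson_draw(mu, phi, rng):
    """Between-sample Gamma rate, then a within-sample Poisson count."""
    lam = rng.gammavariate(1.0 / phi, phi * mu)  # mean mu, variance phi * mu**2
    return poisson_draw(lam, rng)


def variance_models(mu, k, phi):
    """Variance implied by each of the three models at mean mu."""
    return {
        "poisson": mu,
        "overdispersed_poisson": k * mu,
        "negative_binomial": mu + phi * mu ** 2,
    }


rng = random.Random(2013)  # fixed seed so the sketch is reproducible
mu, phi = 10.0, 0.5
counts = [gamma_poisson_draw(mu, phi, rng) for _ in range(20000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
expected = variance_models(mu, k=2.0, phi=phi)
```

With these settings the Negative Binomial variance is 60, six times the Poisson variance of 10, and the simulated sample variance should land close to 60 rather than 10.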
Purpose of the paper
Estimation of φ is a crucial step
1. Per gene: glm.nb function in MASS
2. Local: empirical Bayes estimate shrinking the per-gene estimate towards the global estimate, edgeR
3. Global: quantile-adjusted conditional maximum likelihood, edgeR
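To make the per-gene, global, and local ideas concrete, here is a deliberately simplified sketch. It does not reproduce glm.nb or edgeR: it uses a method-of-moments estimate from Var(Y) = µ + φµ² as a stand-in for the per-gene estimate, the mean of the per-gene values as a crude global value, and a fixed weighted average as a stand-in for shrinkage. The gene names, the toy counts, and the weight of 0.5 are hypothetical.

```python
def moment_dispersion(counts):
    """Method-of-moments estimate of phi from Var = mu + phi * mu^2, truncated at 0."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return max(0.0, (var - mean) / mean ** 2)


def shrink(per_gene, global_value, weight):
    """Weighted average pulling a per-gene estimate toward a global one (weight is hypothetical)."""
    return weight * global_value + (1.0 - weight) * per_gene


# Toy counts for three hypothetical genes across five subjects.
genes = {
    "geneA": [10, 12, 9, 11, 8],
    "geneB": [5, 40, 12, 60, 3],
    "geneC": [100, 150, 90, 200, 60],
}
per_gene = {g: moment_dispersion(c) for g, c in genes.items()}
# Crude global value: the mean of the per-gene estimates (an assumption for illustration).
global_phi = sum(per_gene.values()) / len(per_gene)
local_phi = {g: shrink(p, global_phi, weight=0.5) for g, p in per_gene.items()}
```

In this toy data geneA has a variance below its mean, so its per-gene estimate is truncated to 0, while the shrunken value borrows information from the other genes and is pulled up toward the global value.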
Data and Statistical Experimental Design, Figure 1
25 study subjects (all female Caucasians): 12 high and 13 low antibody responders
13 flow cells, each with 8 lanes: 4 for High response, 4 for Low response
For each response group, two specimens: unstimulated and stimulated
2 replicates each for the unstimulated and stimulated specimens
2 subjects failed from the High response group, leaving 10 subjects high and 13 subjects low
Only the unstimulated specimens were used, to avoid correlation
Figure: 2
Statistical Analysis
Models were fit to unstimulated specimens only, to focus on biological variation.
Counts for the two technical replicates were summed for the models.
Normalization options: no normalization, total count per lane-pair, or 75th percentile count per lane-pair as the normalization constant.
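The replicate summing and the two normalization constants can be sketched as follows. This is a minimal illustration under stated assumptions: the toy counts are invented, the percentile is computed by linear interpolation over all genes (the slides do not say how the 75th percentile was computed or whether zero-count genes were excluded), and the constants are turned into offsets by taking logs, as is usual for log-linear count models.

```python
import math


def upper_quartile(values):
    """75th percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = 0.75 * (len(ordered) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def lane_pair_counts(rep1, rep2):
    """Sum the two technical replicates gene by gene."""
    return [a + b for a, b in zip(rep1, rep2)]


# Toy counts for five hypothetical genes in one subject's two lanes.
rep1 = [0, 4, 10, 25, 100]
rep2 = [1, 6, 12, 23, 110]
summed = lane_pair_counts(rep1, rep2)
total_offset = math.log(sum(summed))           # log of total count per lane-pair
q75_offset = math.log(upper_quartile(summed))  # log of 75th percentile count per lane-pair
```

The 75th percentile is less sensitive than the total count to a handful of very highly expressed genes, which is one common motivation for preferring it as a normalization constant.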
Technical variation
Representative scatter plot of technical replicate 1 versus technical replicate 2 for one subject. Spearman correlation was 0.9941 for this pair.
Figure: Supplementary plot
Technical reproducibility good
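For readers who want to reproduce this kind of check, a Spearman correlation is the Pearson correlation of the ranks. The sketch below uses only the standard library; the replicate counts are invented toy values, not data from the paper, and the helper names are assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy counts for six hypothetical genes in two technical replicates.
rep1 = [0, 4, 10, 25, 100, 7]
rep2 = [1, 6, 12, 23, 110, 5]
rho = spearman(rep1, rep2)
```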
Technical variation
The vertical axis is the difference between the counts in the two replicates on the log2 scale, and the horizontal axis is the average of the two counts on the log2 scale.
Figure: Bland Altman plot: Supplementary plot
No linear bias: Gray smoother overlays y = 0 line
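The coordinates of such a plot are simple to compute. The sketch below returns (average, difference) pairs on the log2 scale for two replicates. Adding a pseudocount of 1 before taking logs is an assumption made here to handle zero counts; the slides do not say how zeros were treated. The counts are toy values.

```python
import math


def bland_altman_points(rep1, rep2, pseudocount=1.0):
    """Return (average, difference) pairs on the log2 scale for paired counts."""
    points = []
    for a, b in zip(rep1, rep2):
        la = math.log2(a + pseudocount)  # pseudocount is an assumption to avoid log2(0)
        lb = math.log2(b + pseudocount)
        points.append(((la + lb) / 2.0, la - lb))
    return points


# Toy counts for five hypothetical genes in two technical replicates.
rep1 = [0, 3, 15, 63, 255]
rep2 = [1, 3, 7, 63, 255]
points = bland_altman_points(rep1, rep2)
mean_diff = sum(d for _, d in points) / len(points)
```

A mean difference near zero across the range of averages corresponds to a smoother that overlays the y = 0 line.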
Technical variation
QQ plots assuming a Poisson distribution are in the additional files.
Technical variation in general follows a Poisson distribution.
Biological variation, Figure 3
A. Plot of mean (x̄) and variance (S²)
B. Local estimates of φ and per-group mean count
Figure: 3
Goodness of fit
QQ plots
1. Standard Poisson
2. NB with global estimate of φ
3. NB with per-gene estimate of φ
4. NB with local estimate of φ
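The slides do not say which goodness-of-fit (GOF) statistic underlies the QQ plots. As one plausible illustration, the sketch below computes a Pearson statistic per gene, using the sample mean as the fitted mean and µ + φµ² as the model variance, so that φ = 0 corresponds to the standard Poisson model. Both the choice of statistic and the toy counts are assumptions.

```python
def pearson_gof(counts, phi):
    """Pearson statistic sum((y - mu)^2) / (mu + phi * mu^2), with mu the sample mean."""
    mu = sum(counts) / len(counts)
    if mu == 0:
        return 0.0
    variance = mu + phi * mu ** 2
    return sum((y - mu) ** 2 for y in counts) / variance


# Toy counts for one hypothetical overdispersed gene across five subjects.
counts = [5, 40, 12, 60, 3]
gof_poisson = pearson_gof(counts, phi=0.0)  # standard Poisson
gof_nb = pearson_gof(counts, phi=1.0)       # Negative Binomial with an assumed phi
```

Under a well-fitting model this statistic is roughly comparable to a chi-square quantile with n − 1 degrees of freedom (4 here). For this toy gene the Poisson value is above 100, while the Negative Binomial value is close to 4, which is the kind of contrast a QQ plot of GOF statistics makes visible across all genes.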
Figure: 4
Experimental variation
Potential sources of experimental variation examined (when experimental factors were included in the model):
flow cell, lane-pair and library preparation batch
Figure 5
Figure: 5
Flow cell: the observed counts as a whole were smaller than the expected counts.
The reason was a software upgrade midway through the experiment. The number of reads increased with the software upgrade (Figure 6A).
After the 75th percentile offset was used, there was no clear flow-cell effect.
Figure: 6
Characterizing genes with poor model fit
Effect of genes with small counts.
1. Smallest GOF statistics: indicative of overfitting
2. Largest GOF statistics: indicative of underfitting (not explaining enough variance)
Filtering out genes with total counts of up to 10,000 had a minor impact.
GOF statistics for genes with an average count < 5 per subject were distributed throughout the range (Figure 7A).
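The two count-based selections mentioned above can be sketched as simple filters. The function names, gene names, and toy counts are hypothetical; the thresholds are parameters so that different cutoffs can be tried and the effect on the GOF statistics compared.

```python
def filter_by_total_count(gene_counts, min_total):
    """Keep genes whose total count across subjects exceeds min_total."""
    return {g: c for g, c in gene_counts.items() if sum(c) > min_total}


def low_count_genes(gene_counts, max_mean=5.0):
    """Names of genes whose average count per subject is below max_mean."""
    return [g for g, c in gene_counts.items() if sum(c) / len(c) < max_mean]


# Toy counts for three hypothetical genes across four subjects.
gene_counts = {
    "geneA": [1, 0, 3, 2],
    "geneB": [40, 55, 38, 61],
    "geneC": [4000, 5200, 3900, 4700],
}
kept = filter_by_total_count(gene_counts, min_total=100)
sparse = low_count_genes(gene_counts)
```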
Figure: 7
Data records of genes with very small GOF statistics:
1. All 0 counts in one response group and non-zero counts in the other
2. Counts very consistent, with small variance
Data records of genes with very large GOF statistics:
1. The variance is very high. An example of one such gene is in Figure 7B.