Post on 21-Mar-2017
Higher criticism, SKAT, SKAT-o for WGS
Daisuke Yoneoka
Overview
• Multiple testing problem
  • Which of my 100,000 genotypes have an impact?
  • Difficult because of many false positives (FP)
• Controlling false positives
  • Familywise error rate (FWER)
    • Controls the chance of any FP
    • Ex. When M = 100 and α = 0.05, FWER = 1 − (1 − α)^M ≈ 0.994 ≈ 1 for independent tests (at least one of the 100 tests is almost certain to be an FP!)
    • Correction methods: Bonferroni, stepwise (cf. Holm, Hochberg), Tarone
  • False discovery rate (FDR)
    • Controls the expected fraction of FP among the rejected tests
    • Correction methods: LAMP, SKAT
• Higher Criticism (HC)
  • Recent competitor
  • Explicit control of the threshold
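The FWER example and two of the corrections named above can be sketched in a few lines of Python. This is a minimal illustration, assuming independent tests; the function names (`fwer_independent`, `bonferroni_reject`, `holm_reject`) are hypothetical and not from the slides.

```python
def fwer_independent(m, alpha):
    """Chance of at least one false positive among m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(pvalues, alpha=0.05):
    """Bonferroni: reject test i when p_i <= alpha / M."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (M - k + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank is 0-based
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject


if __name__ == "__main__":
    print(fwer_independent(100, 0.05))  # close to 1, as on the slide
    pvals = [0.0001, 0.013, 0.02, 0.2]
    print(bonferroni_reject(pvals))
    print(holm_reject(pvals))
```

Holm rejects at least as many tests as Bonferroni at the same α, which is why stepwise methods are usually preferred when FWER control is the goal.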
                Actual: True                   Actual: False
Test: True      True positive                  False positive (Type 1, α)
Test: False     False negative (Type 2, β)     True negative
HC statistic Donoho and Jin (2004)
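• Extension of Tukey's method (1976)
• Consider significance not only at the .05 level, but at all levels between 0 and α0
• H0: all M tests are null vs H1: a small fraction (at most Mα0 tests) is non-null
• $HC^* = \max_{0 < \alpha \le \alpha_0} \sqrt{M}\,\frac{(\text{fraction of tests significant at level } \alpha) - \alpha}{\sqrt{\alpha(1 - \alpha)}}$: significance of the overall body of tests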
HC statistics, cont. Donoho and Jin (2004)
• Joint null hypothesis against the alternative hypothesis that signals in a set are sparse
• Common situation in genetic association studies
  • Jointly test the effects of genetic variants within a gene, network, or pathway on a disease/trait
• Let $HC^* = \max_{1 \le i \le \alpha_0 M} \sqrt{M}\,\frac{i/M - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}}$, where $p_{(i)}$ is the i-th p-value sorted in increasing order
  • You should decide this α0: in many cases, α0 = 1/2
• The HC statistic asymptotically follows a Gumbel distribution
  • Li and Siegmund (Annals of Statistics, 2015) provide a precise distribution under finite sample size
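A minimal sketch of how the HC statistic can be computed from a list of p-values, assuming the usual Donoho and Jin form: the maximum over the smallest α0·M sorted p-values of the standardized gap between i/M and p_(i). The function name `higher_criticism` is hypothetical, and p-values equal to 0 or 1 are skipped to avoid division by zero.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Maximum standardized excess of small p-values over the first alpha0 * M ranks."""
    m = len(pvalues)
    p_sorted = sorted(pvalues)
    k = max(1, int(alpha0 * m))  # number of ranks to scan
    best = -math.inf
    for i in range(1, k + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        hc_i = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc_i)
    return best


if __name__ == "__main__":
    # Evenly spread p-values mimic the joint null; five tiny p-values mimic sparse signals.
    null_p = [(i + 0.5) / 100 for i in range(100)]
    signal_p = [1e-6] * 5 + null_p[5:]
    print(higher_criticism(null_p), higher_criticism(signal_p))
```

Under the evenly spread p-values the statistic stays small, while a handful of very small p-values makes it large, which is the sparse-signal behaviour the slide describes.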
Overview of HC statistics
• Global (joint) test
  • Jointly tests M hypotheses
  • H0: all M tests are null vs H1: a small fraction (at most Mα0 tests) is non-null
  • Asymptotically powerful test of the joint null hypothesis when signals are sparse
• The idea is completely different from that of correction for multiple testing (ex. Bonferroni)
• How to alternatively use
  • Detect important i signals with
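One way to use HC for detecting individual signals is HC thresholding: find the rank at which the HC objective is largest and keep every test whose p-value is at or below the p-value at that rank. The sketch below assumes that rule; it is an illustration and not necessarily the expression shown on the original slide. The function name `hc_select` is hypothetical.

```python
import math


def hc_select(pvalues, alpha0=0.5):
    """Return the indices of tests kept by an HC-maximizing p-value threshold."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    k = max(1, int(alpha0 * m))
    best_value, best_rank = -math.inf, 0
    for rank in range(1, k + 1):
        p = pvalues[order[rank - 1]]
        if not 0.0 < p < 1.0:
            continue  # skip p-values where the standardization is undefined
        hc = math.sqrt(m) * (rank / m - p) / math.sqrt(p * (1.0 - p))
        if hc > best_value:
            best_value, best_rank = hc, rank
    # Keep all tests ranked at or before the maximizing rank.
    return sorted(order[:best_rank])


if __name__ == "__main__":
    pvals = [(i + 0.5) / 100 for i in range(100)]
    for idx in (10, 20, 30):
        pvals[idx] = 1e-8  # three planted signals
    print(hc_select(pvals))
```

Unlike Bonferroni, the cut-off here is chosen from the data, so the number of selected tests adapts to how sparse and how strong the signals are.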
Region Based Analysis of Rare Variants
• Single variant test is not powerful → region based analysis
  • Test the joint effect of rare/common variants in a gene/region
• Major classes of statistical tests
  • Burden/collapsing tests
  • Supervised/adaptive burden/collapsing tests
  • Variance component based tests
  • Kernel based methods
    • SKAT (SKAT-o) ← New!
What is kernel?: Gaussian kernel
• A kernel is a similarity function
• The Gaussian kernel (or radial basis function kernel) on two genes $G_j$ and $G_{j'}$ is defined as the (j,j')-th element of K: $K_{jj'} = \exp\left(-\frac{\|G_j - G_{j'}\|^2}{2\sigma^2}\right)$
• SKAT uses the linear kernel (limited expressive power)
[Figure: G1, G2, G3 shown as points; the distance ||G1 − G2|| enters the kernel, and σ² is a tuning parameter that adjusts the similarity]
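To make the "kernel as similarity" idea concrete, here is a small sketch comparing a linear kernel with a Gaussian kernel on toy genotype vectors (minor-allele counts 0/1/2). The scaling by 2σ² is an assumed convention, and the function names are hypothetical.

```python
import math


def linear_kernel(g1, g2):
    """Inner product of two genotype vectors."""
    return sum(a * b for a, b in zip(g1, g2))


def gaussian_kernel(g1, g2, sigma=1.0):
    """exp(-||g1 - g2||^2 / (2 sigma^2)); sigma tunes how fast similarity decays."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(g1, g2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(genotypes, kernel):
    """Matrix K whose (j, j') entry is kernel(G_j, G_j')."""
    return [[kernel(gi, gj) for gj in genotypes] for gi in genotypes]


if __name__ == "__main__":
    toy = [[0, 1, 2], [0, 1, 1], [2, 0, 0]]  # toy data, not from the slides
    for row in kernel_matrix(toy, gaussian_kernel):
        print([round(v, 3) for v in row])
```

A larger σ makes distant vectors look more similar; a smaller σ makes the kernel matrix closer to the identity.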
Sequence Kernel Association Test (SKAT) Wu et al. (2010, 2011)
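• Assume regression model (continuous outcome)
  $y_i = \alpha_0 + X_i \alpha + G_i \beta + \varepsilon_i$
  ($X_i$: other covariates; $G_i$: genotype, 1×p vector)
• Variance component test
  • Assume $\beta_j \sim F(0, w_j \tau)$ (mean 0, variance $w_j \tau$), where F() is an arbitrary distribution
  • Test hypothesis $H_0: \tau = 0$
  • Score test for $\tau$: $Q = (y - \hat{\mu})^\top G W G^\top (y - \hat{\mu})$, where $\hat{\mu}$ is the estimated mean of y under $H_0$, and $W = \mathrm{diag}(w_1, \dots, w_p)$
• Each weight $w_j$ is pre-specified
  • Ex. $\sqrt{w_j} = \mathrm{Beta}(\mathrm{MAF}_j; a_1, a_2)$, where $a_1, a_2$ are shape parameters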
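A rough sketch of the SKAT score statistic for a continuous outcome, written as Q = Σ_j w_j S_j² with S_j = Σ_i G_ij (y_i − μ̂_i). Several simplifications are assumptions of this sketch: the null model is intercept-only (no other covariates), the weights use the Beta density with shape parameters 1 and 25, and no p-value is computed (that would need the null distribution of Q, a mixture of chi-squares). The function names are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def skat_q(y, genotypes, a1=1.0, a2=25.0):
    """SKAT statistic with a weighted linear kernel and an intercept-only null model.

    y: list of n outcomes; genotypes: n rows of p minor-allele counts (0, 1, 2).
    """
    n, p = len(y), len(genotypes[0])
    mu_hat = sum(y) / n  # fitted mean under H0 (assumed intercept-only)
    resid = [yi - mu_hat for yi in y]
    q = 0.0
    for j in range(p):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2.0 * n)  # minor allele frequency of variant j
        if maf <= 0.0 or maf >= 1.0:
            continue  # monomorphic variants carry no information
        weight = beta_pdf(maf, a1, a2) ** 2  # w_j, since sqrt(w_j) is the Beta density
        score = sum(g * r for g, r in zip(column, resid))  # S_j
        q += weight * score ** 2
    return q


if __name__ == "__main__":
    y = [1.2, 0.3, 0.8, 2.1, 0.1, 0.5]  # toy data
    G = [[0, 1], [0, 0], [1, 0], [0, 2], [0, 0], [0, 0]]
    print(skat_q(y, G))
```

With shape parameters (1, 25) the weight drops quickly as MAF grows, so rare variants dominate Q.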
SKAT, cont.
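• General form of variance component test = SKAT
  $y_i = \alpha_0 + X_i \alpha + h(G_i) + \varepsilon_i$ ($h(G_i)$: semiparametric term)
• Assume $h \sim N(0, \tau K)$, where K is a kernel
• Test hypothesis $H_0: \tau = 0$
• Score test for $\tau$: $Q = (y - \hat{\mu})^\top K (y - \hat{\mu})$
• The (j,j')-th element of K is $K(G_j, G_{j'})$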
SKAT-Optimal (SKAT-o) Lee et al. (2012)
• SKAT with the correlated kernel (What's new?)
• Burden test vs SKAT (linear kernel)
  • Burden tests are more powerful when effects are in the same direction and of the same magnitude
  • SKAT is more powerful when the effects have mixed directions
  • Both scenarios can happen
• New class of kernel
  • Combine SKAT variance component and burden test statistics (Lee et al. 2012)
    $Q_\rho = (1 - \rho)\,Q_{\mathrm{SKAT}} + \rho\,Q_{\mathrm{burden}}$
  • where $0 \le \rho \le 1$; $\rho = 0$ gives SKAT and $\rho = 1$ gives the burden test
  • In practice, $\rho$ is estimated by grid search on a set of pre-specified points
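The combination of SKAT and burden statistics over a grid of ρ values can be sketched as below. The inputs are weighted per-variant scores z_j = √w_j · S_j, so that Q_SKAT = Σ z_j² and Q_burden = (Σ z_j)². The grid values and the function name `skat_o_grid` are hypothetical. The actual SKAT-o procedure compares p-values across the grid rather than raw Q values, which this sketch does not attempt.

```python
def skat_o_grid(weighted_scores, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} with Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden."""
    q_skat = sum(z ** 2 for z in weighted_scores)  # sensitive to effects of any sign
    q_burden = sum(weighted_scores) ** 2           # large only when signs agree
    return {rho: (1.0 - rho) * q_skat + rho * q_burden for rho in rhos}


if __name__ == "__main__":
    same_direction = [1.0, 1.0, 1.0]
    mixed_direction = [1.0, -1.0, 1.0, -1.0]
    print(skat_o_grid(same_direction))
    print(skat_o_grid(mixed_direction))
```

When all scores share a sign, Q_ρ grows with ρ (the burden end wins); when signs cancel, the burden term vanishes and only the SKAT end retains signal.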
SKAT vs SKAT-o vs Burden test (Li et al. (2012))
Overview of SKAT, SKAT-o
• Advantage
  • Kernel methods have the expressive power to capture domain knowledge in a general manner
  • Easy to propose a new kernel (i.e., propose a new similarity)
  • Theoretically, SKAT-o is more robust to model settings than SKAT under a wide range of models
• Disadvantage
  • Generally difficult to construct a good kernel for a specific problem
Summary
• HC statistics can detect an important sparse subset of signals
• Kernel method
  • Incorporate prior biological information to construct the kernel
  • SKAT is powerful when the effects have mixed directions
  • SKAT-o is more powerful when the effects of the variants are highly correlated
• According to simulation studies, SKAT-o is better?
  • We need to be careful in selecting the kernel function.