Higher criticism, SKAT and SKAT-o for whole genome studies

Higher criticism, SKAT, SKAT-o for WGS Daisuke Yoneoka 1/13

Page 1: Higher criticism, SKAT and SKAT-o for whole genome studies

Higher criticism, SKAT, SKAT-o for WGS

Daisuke Yoneoka

Page 2: Higher criticism, SKAT and SKAT-o for whole genome studies

Overview

• Multiple testing problem
  • Which of my 100,000 genotypes have an impact?
  • Difficult because of many false positives (FP)
• Controlling false positives
  • Familywise error rate (FWER)
    • Controlling the chance of any FP
    • Ex. When M=100 and α=0.05, FWER = 1 − (1 − α)^M ≈ 0.994 for independent tests (at least one of the 100 tests is almost certain to be an FP!)
    • Correction method: Bonferroni, Stepwise (cf. Holm, Hochberg), Tarone
  • False discovery rate (FDR)
    • Controlling the fraction of FP
    • Correction method: LAMP, SKAT
• Higher Criticism (HC)
  • Recent competitor
  • Explicit control for threshold

               Actual: True                  Actual: False
Test: True     True positive                 False positive (Type 1, α)
Test: False    False negative (Type 2, β)    True negative
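The FWER example in the overview can be checked numerically. The sketch below assumes the M tests are independent, which the slide does not state; the function names `fwer` and `bonferroni_level` are illustrative only.

```python
def fwer(num_tests, alpha):
    """Chance of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_level(num_tests, alpha):
    """Per-test level used by the Bonferroni correction."""
    return alpha / num_tests


M, ALPHA = 100, 0.05
uncorrected = fwer(M, ALPHA)                      # every test run at level alpha
corrected = fwer(M, bonferroni_level(M, ALPHA))   # every test run at alpha / M

print(f"FWER without correction: {uncorrected:.4f}")
print(f"FWER with Bonferroni:    {corrected:.4f}")
```

Without correction the FWER is close to 1; testing each hypothesis at α/M brings it back below α.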

Page 3: Higher criticism, SKAT and SKAT-o for whole genome studies

HC statistic, Donoho and Jin (2004)

• Extension of Tukey’s method (1976)
• Consider not only significance at the .05 level, but at all levels between 0 and α0
• H0: M tests are all null vs H1: a small fraction (at most Mα0) is non-null
• The HC statistic measures the significance of the overall body of tests
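For reference, the idea of scanning all levels up to α0 is usually written as follows in Donoho and Jin (2004). This is a sketch in the notation of this deck (M tests, upper level α0); the symbol names HC_{M,α} and HC*_M are assumptions and may differ from those on the original slide.

```latex
HC_{M,\alpha} = \sqrt{M}\,
  \frac{(\text{fraction of tests significant at level } \alpha) - \alpha}
       {\sqrt{\alpha(1-\alpha)}},
\qquad
HC^{*}_{M} = \max_{0 < \alpha \le \alpha_0} HC_{M,\alpha}
```

Under the joint null, the fraction significant at level α is close to α at every level, so a large value at any level is evidence against H0.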

Page 4: Higher criticism, SKAT and SKAT-o for whole genome studies

HC statistics, cont., Donoho and Jin (2004)

• Joint null hypothesis against the alternative hypothesis that signals in a set are sparse
  • Common situation in genetic association studies
  • Jointly test the effects of genetic variants within a gene, network, or pathway on a disease/trait
• Let $HC^{*}_{M} = \max_{1 \le i \le \alpha_0 M} \sqrt{M}\,\frac{i/M - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}}$, where $p_{(i)}$ is the ith p-value sorted in increasing order
  • You should decide this α0: in many cases, α0 = 1/2
• The HC statistic asymptotically follows a Gumbel distribution
  • Li and Siegmund (Annals of Statistics, 2015) provide a precise distribution under finite sample size
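A minimal sketch of computing the HC statistic from a list of p-values, using the standard Donoho and Jin form (maximum over the smallest α0·M sorted p-values). The function name `higher_criticism` and the clipping constants are assumptions made for illustration.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC statistic, 1-based rank i at which the maximum is attained)."""
    p_sorted = sorted(pvalues)
    m = len(p_sorted)
    k_max = max(1, int(alpha0 * m))   # only the smallest alpha0 * M p-values are scanned
    best_hc, best_i = -math.inf, 0
    for i in range(1, k_max + 1):
        # Clip to keep the denominator away from zero (illustrative safeguard).
        p = min(max(p_sorted[i - 1], 1e-300), 1.0 - 1e-16)
        hc_i = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p))
        if hc_i > best_hc:
            best_hc, best_i = hc_i, i
    return best_hc, best_i


hc, rank = higher_criticism([0.001, 0.5, 0.6, 0.7])
print(hc, rank)
```

One very small p-value among otherwise unremarkable ones is enough to produce a large HC value, which is the sparse-signal situation the test targets.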

Page 5: Higher criticism, SKAT and SKAT-o for whole genome studies

Overview of HC statistics

• Global (joint) test (M-times test)
  • Jointly test all M tests at once
  • H0: M tests are all null vs H1: a small fraction (at most Mα0) is non-null
  • Asymptotically powerful test of the joint null hypothesis when signals are sparse
• Idea is completely different from that of correction for multiple testing (ex. Bonferroni)
• How to use it alternatively
  • Detect the important individual signals i
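One common way to use HC for detecting individual signals is HC thresholding (Donoho and Jin): keep every test whose p-value is at most p_(i*), where i* is the rank that maximizes the HC objective. The sketch below assumes that rule, since the selection criterion is not spelled out in the text above; `hc_select` is a hypothetical helper name.

```python
import math


def hc_select(pvalues, alpha0=0.5):
    """Return (HC statistic, sorted indices of the tests kept by HC thresholding)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])   # indices by increasing p-value
    k_max = max(1, int(alpha0 * m))
    best_hc, best_i = -math.inf, 0
    for i in range(1, k_max + 1):
        # Clip to keep the denominator away from zero (illustrative safeguard).
        p = min(max(pvalues[order[i - 1]], 1e-300), 1.0 - 1e-16)
        hc_i = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p))
        if hc_i > best_hc:
            best_hc, best_i = hc_i, i
    # Keep the best_i smallest p-values, reported by their original positions.
    return best_hc, sorted(order[:best_i])


pvals = [0.6, 0.001, 0.7, 0.002, 0.5, 0.9, 0.8, 0.4]
hc_value, selected = hc_select(pvals)
print(hc_value, selected)
```

Unlike a Bonferroni-style cutoff, the threshold here is driven by the data through the rank at which the HC objective peaks.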

Page 6: Higher criticism, SKAT and SKAT-o for whole genome studies

Region Based Analysis of Rare Variants

• Single variant test is not powerful → Region based analysis
  • Test the joint effect of rare/common variants in a gene/region
• Major classes of statistical tests
  • Burden/Collapsing tests
  • Supervised/Adaptive Burden/Collapsing tests
  • Variance component based tests
  • Kernel based method
    • SKAT (SKAT-o) ← New!

Page 7: Higher criticism, SKAT and SKAT-o for whole genome studies

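What is kernel?: Gaussian kernel

• Kernel is a similarity function
• Gaussian kernel (or Radial basis function kernel) on two genes Gj and Gj’ is defined as ((j,j’)-th element of K)
  $K(G_j, G_{j'}) = \exp\left(-\frac{\lVert G_j - G_{j'} \rVert^2}{2\sigma^2}\right)$
  where $\sigma^2$ is a tuning parameter to adjust the similarity
• SKAT uses the linear kernel (very poor representativeness)

[Figure: genotype vectors G1, G2, G3, with the distance ||G1 − G2|| marked]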
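To make the kernel-as-similarity idea concrete, the sketch below builds a Gaussian and a linear kernel matrix for three toy genotype vectors. The toy data, the function names, and the 2σ² scaling in the exponent are assumptions for illustration.

```python
import math


def gaussian_kernel(g1, g2, sigma2=1.0):
    """exp(-||g1 - g2||^2 / (2 * sigma2)); sigma2 tunes how fast similarity decays."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(g1, g2))
    return math.exp(-sq_dist / (2.0 * sigma2))


def linear_kernel(g1, g2):
    """Plain inner product of the two genotype vectors."""
    return sum(a * b for a, b in zip(g1, g2))


def kernel_matrix(genotypes, kernel):
    """Matrix whose (j, j')-th element is kernel(G_j, G_j')."""
    return [[kernel(gj, gk) for gk in genotypes] for gj in genotypes]


# Toy genotype vectors (minor-allele counts 0/1/2), hypothetical data.
G = [[0, 1, 2], [0, 1, 1], [2, 0, 0]]
K_gauss = kernel_matrix(G, gaussian_kernel)
K_lin = kernel_matrix(G, linear_kernel)
print(K_gauss)
print(K_lin)
```

G1 and G2 differ in one allele and get a Gaussian similarity near 0.6, while G1 and G3 are far apart and get a similarity near 0. A larger sigma2 flattens these differences.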

Page 8: Higher criticism, SKAT and SKAT-o for whole genome studies

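Sequence Kernel Association Test, Wu et al. (2010, 2011)

• Assume regression model (continuous outcome)
  $y_i = \alpha_0 + X_i \alpha + G_i \beta + \varepsilon_i$
  where $X_i$ holds the other covariates and $G_i$ is the genotype (1×p vector)
• Variance component test
  • Assume $\beta_j \sim F(0, w_j^2 \tau)$, where F() is an arbitrary distribution
  • Test hypothesis $H_0: \tau = 0$
  • Score test for $\tau$
    $Q = (y - \hat\mu)^\top G W W G^\top (y - \hat\mu)$
    where $\hat\mu$ is the estimated mean of $y$ under $H_0$, and $W = \mathrm{diag}(w_1, \dots, w_p)$
  • Each weight $w_j$ is pre-specified
    • Ex. $w_j = \mathrm{Beta}(\mathrm{MAF}_j; a_1, a_2)$, where $a_1, a_2$ are shape parameters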
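A minimal sketch of the SKAT score statistic with the weighted linear kernel, Q = Σ_j w_j² S_j² with S_j = Σ_i G_ij (y_i − μ̂_i). Several simplifications are assumptions here: the null model has an intercept only (no other covariates), the Beta density weights use shape parameters (1, 25) as defaults, and no p-value is computed (that requires the null distribution of Q, a mixture of chi-squares). The function names are illustrative and this is not the interface of any SKAT software.

```python
import math


def beta_density(x, a1, a2):
    """Density of the Beta(a1, a2) distribution at x."""
    norm = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return norm * x ** (a1 - 1.0) * (1.0 - x) ** (a2 - 1.0)


def skat_q(y, G, a1=1.0, a2=25.0):
    """Return (Q, per-variant scores S_j, weights w_j) for an n x p genotype matrix G."""
    n, p = len(G), len(G[0])
    mu_hat = sum(y) / n                                   # intercept-only null model (assumption)
    resid = [yi - mu_hat for yi in y]
    # Allele frequency of each variant from 0/1/2 allele counts.
    maf = [sum(G[i][j] for i in range(n)) / (2.0 * n) for j in range(p)]
    w = [beta_density(m, a1, a2) for m in maf]            # rarer variants get larger weights
    scores = [sum(G[i][j] * resid[i] for i in range(n)) for j in range(p)]
    q = sum((w[j] * scores[j]) ** 2 for j in range(p))    # equals resid' G W W G' resid
    return q, scores, w


y = [1.0, 2.0, 3.0, 4.0]
G = [[0, 1], [0, 0], [1, 0], [2, 1]]
q, scores, w = skat_q(y, G)
print(q, scores, w)
```

Because each score is squared before summing, variants with effects in opposite directions do not cancel each other out.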

Page 9: Higher criticism, SKAT and SKAT-o for whole genome studies

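SKAT, cont.

• General form of Variance component test = SKAT
  $y_i = \alpha_0 + X_i \alpha + h(G_i) + \varepsilon_i$
  where $h(G_i)$ is the semiparametric term
• Assume $h \sim F(0, \tau K)$, where $K$ is a kernel
• Test hypothesis $H_0: \tau = 0$
• Score test for $\tau$: $Q = (y - \hat\mu)^\top K (y - \hat\mu)$
• The (j,j’)-th element of K is $K(G_j, G_{j'})$; the weighted linear kernel $K = G W W G^\top$ has elements $\sum_{k=1}^{p} w_k^2 G_{jk} G_{j'k}$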

Page 10: Higher criticism, SKAT and SKAT-o for whole genome studies

SKAT-Optimal (SKAT-o), Lee et al. (2012)

• SKAT with the correlated kernel (What’s new?)
• Burden test vs SKAT (linear kernel)
  • Burden tests are more powerful when effects are in the same direction and of the same magnitude
  • SKAT is more powerful when the effects have mixed directions
  • Both scenarios can happen
• New class of kernel
  • Combine SKAT variance component and burden test statistics (Lee et al. 2012)
    $Q_\rho = (1 - \rho)\, Q_{\mathrm{SKAT}} + \rho\, Q_{\mathrm{burden}}$
  • where $0 \le \rho \le 1$, and $\rho = 0$ gives SKAT while $\rho = 1$ gives the burden test
  • In practice, $\rho$ is estimated by grid search on a set of pre-specified points
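The burden-versus-SKAT trade-off can be seen directly from per-variant scores. The sketch below assumes Q_SKAT = Σ_j (w_j S_j)² and Q_burden = (Σ_j w_j S_j)² and evaluates Q_ρ on a grid. It stops short of the actual SKAT-o procedure, which picks ρ through the p-values of Q_ρ; the function name and the grid are illustrative.

```python
def skat_o_statistics(scores, weights, rho_grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} with Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden."""
    weighted = [w * s for w, s in zip(weights, scores)]
    q_skat = sum(v ** 2 for v in weighted)   # squares first: signs cannot cancel
    q_burden = sum(weighted) ** 2            # sums first: opposite signs cancel
    return {rho: (1.0 - rho) * q_skat + rho * q_burden for rho in rho_grid}


# Effects in the same direction: the burden component dominates.
same_direction = skat_o_statistics([1.0, 1.0], [1.0, 1.0])
# Effects in mixed directions: the burden component vanishes, SKAT does not.
mixed_direction = skat_o_statistics([1.0, -1.0], [1.0, 1.0])
print(same_direction)
print(mixed_direction)
```

With mixed directions the burden term is zero, so only small ρ keeps any signal. With a shared direction the burden term is the larger of the two.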

Page 11: Higher criticism, SKAT and SKAT-o for whole genome studies

SKAT vs SKAT-o vs Burden test (Li et al. (2012))

Page 12: Higher criticism, SKAT and SKAT-o for whole genome studies

Overview of SKAT, SKAT-o

• Advantage
  • Kernel method has expressive power to capture domain knowledge in a general manner
  • Easy to propose a new kernel (i.e., propose a new similarity)
  • Theoretically, SKAT-o is more robust to the model setting than SKAT across a wide range of models
• Disadvantage
  • Generally difficult to construct a good kernel for a specific problem

Page 13: Higher criticism, SKAT and SKAT-o for whole genome studies

Summary

• HC statistics can detect an important sparse subset of signals
• Kernel method
  • Incorporate prior biological information to construct the kernel
  • SKAT is powerful when the effects have mixed directions
  • SKAT-o is more powerful when the effects of the variants are highly correlated
• According to simulation studies, SKAT-o is better?
  • We need to be careful in selecting the kernel function.