Higher criticism, SKAT and SKAT-o for whole genome studies

Higher criticism, SKAT, SKAT-o for WGS

Daisuke Yoneoka

Overview
• Multiple testing problem
  • Which of my 100,000 genotypes have an impact?
  • Difficult because of many false positives (FP)

• Controlling false positives
  • Familywise error rate (FWER)
    • Controls the chance of any FP
    • Ex. When M = 100 and α = 0.05, FWER = 1 − (1 − α)^M ≈ 0.994 ≈ 1 for independent tests (at least one of the 100 tests is almost certain to be an FP!)
    • Correction methods: Bonferroni, stepwise (cf. Holm, Hochberg), Tarone

  • False discovery rate (FDR)
    • Controls the expected fraction of FP among the rejected hypotheses
    • Correction method: Benjamini-Hochberg

  • Higher Criticism (HC)
    • Recent competitor
    • Explicit control for threshold

                Actual True                   Actual False
Test True       True positive                 False positive (Type 1, α)
Test False      False negative (Type 2, β)    True negative
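The FWER example on the overview slide can be checked numerically. The sketch below assumes independent tests (the slide does not state this) and uses hypothetical helper names and made-up p-values; it only illustrates how Bonferroni and Holm, two of the corrections named above, decide which tests to reject.

```python
def fwer_independent(m, alpha):
    """Chance of at least one false positive among m independent null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(pvalues, alpha=0.05):
    """Reject test i when p_i <= alpha / M."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (M - k + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):      # rank = 0 for the smallest p-value
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                           # stop at the first non-rejection
    return reject


fwer = fwer_independent(100, 0.05)          # uncorrected FWER for M = 100
example_p = [0.001, 0.012, 0.014, 0.04, 0.3]  # made-up p-values
bonf = bonferroni_reject(example_p)
holm = holm_reject(example_p)
print(round(fwer, 3), bonf, holm)
```

On these made-up p-values Holm rejects more tests than Bonferroni while controlling the same FWER, which is why the stepwise procedures are usually preferred.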

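HC statistic (Donoho and Jin, 2004)

• Extension of Tukey’s method (1976)
• Consider significance not only at the .05 level, but at all levels between 0 and α0
• H0: M tests are all null vs H1: a small fraction (at most Mα0) is non-null
• $HC^* = \max_{0 < \alpha \le \alpha_0} \sqrt{M}\,\dfrac{(\text{fraction of tests significant at level } \alpha) - \alpha}{\sqrt{\alpha(1-\alpha)}}$: significance of the overall body of tests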

HC statistic, cont. (Donoho and Jin, 2004)

• Tests the joint null hypothesis against the alternative hypothesis that the signals in a set are sparse
  • Common situation in genetic association studies
  • Jointly test the effects of genetic variants within a gene, network, or pathway on a disease/trait
• Let $HC = \max_{1 \le i \le \alpha_0 M} \sqrt{M}\,\dfrac{i/M - p_{(i)}}{\sqrt{p_{(i)}\,(1 - p_{(i)})}}$, where $p_{(i)}$ is the ith p-value sorted in increasing order

• The HC statistic asymptotically follows a Gumbel distribution (after suitable centering and scaling)
  • Li and Siegmund (Annals of Statistics, 2015) provide a precise distribution under finite sample size

• You must choose α0 yourself; in many cases, α0 = 1/2
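The HC statistic over sorted p-values can be sketched in a few lines. This is a minimal illustration using the order-statistic form of Donoho and Jin (2004), with M the number of tests; the function name, the clipping constants, and the simulated p-values are assumptions made for the example, not part of the slides.

```python
import math
import random


def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC, i_star): the maximum over i <= alpha0 * M of
    sqrt(M) * (i/M - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p_sorted = sorted(pvalues)
    m = len(p_sorted)
    k = max(1, int(alpha0 * m))             # only the smallest alpha0 * M p-values
    best_hc, best_i = -math.inf, 0
    for i in range(1, k + 1):
        # Clip to avoid dividing by zero when a p-value is exactly 0 or 1.
        p = min(max(p_sorted[i - 1], 1e-300), 1 - 1e-16)
        hc_i = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1 - p))
        if hc_i > best_hc:
            best_hc, best_i = hc_i, i
    return best_hc, best_i


rng = random.Random(0)
M = 1000
p_null = [rng.random() for _ in range(M)]   # all M tests are null
p_sparse = list(p_null)
for j in range(20):                         # plant 20 sparse, strong signals
    p_sparse[j] = rng.uniform(0.0, 1e-4)

hc_null, _ = higher_criticism(p_null)
hc_sparse, _ = higher_criticism(p_sparse)
print(hc_null, hc_sparse)
```

With only 2% of the tests carrying signal, the statistic for the sparse set is far larger than for the all-null set, which is the behaviour the joint test relies on.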

Overview of HC statistics

• Global (joint) test of M tests
  • H0: M tests are all null vs H1: a small fraction (at most Mα0) is non-null
  • Asymptotically powerful test of the joint null hypothesis when signals are sparse
• The idea is completely different from that of correction for multiple testing (ex. Bonferroni)
• Alternative use
  • Detect the important individual signals i
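The rule the slide uses to pick out individual signals is not recoverable from the text. One published option, assumed here purely for illustration, is HC thresholding (Donoho and Jin, 2008): find the index i* that maximises the HC objective and keep every test whose p-value is at most p_(i*). The function name and the deterministic toy p-values below are hypothetical.

```python
import math


def hc_select(pvalues, alpha0=0.5):
    """Return (selected indices, p-value threshold) under HC thresholding."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])   # indices by increasing p
    k = max(1, int(alpha0 * m))
    best_hc, i_star = -math.inf, 1
    for i in range(1, k + 1):
        p = min(max(pvalues[order[i - 1]], 1e-300), 1 - 1e-16)
        hc_i = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1 - p))
        if hc_i > best_hc:
            best_hc, i_star = hc_i, i
    selected = sorted(order[:i_star])       # the i_star smallest p-values
    return selected, pvalues[order[i_star - 1]]


# Toy data: 20 planted signals (p = 1e-5) among 980 evenly spread null p-values.
M = 1000
signal_idx = list(range(0, M, 50))
signal_set = set(signal_idx)
null_idx = [j for j in range(M) if j not in signal_set]
pvals = [0.0] * M
for rank, j in enumerate(null_idx):
    pvals[j] = (rank + 0.5) / len(null_idx)
for j in signal_idx:
    pvals[j] = 1e-5

selected, threshold = hc_select(pvals)
print(len(selected), threshold)
```

Unlike Bonferroni, the threshold here is chosen from the data, so it adapts to how many signals are present and how strong they are.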

Region Based Analysis of Rare Variants

• Single variant test is not powerful → Region based analysis
  • Test the joint effect of rare/common variants in a gene/region

• Major classes of statistical tests
  • Burden/Collapsing tests
  • Supervised/Adaptive Burden/Collapsing tests
  • Variance component based tests
  • Kernel based method
    • SKAT (SKAT-o) ← New!

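What is a kernel?: Gaussian kernel
• A kernel is a similarity function
• The Gaussian kernel (or radial basis function kernel) on two genotype vectors G_j and G_j' is defined as (the (j,j')-th element of K)
  $K(G_j, G_{j'}) = \exp\left(-\lVert G_j - G_{j'} \rVert^2 / \rho\right)$
  where ρ > 0 is the tuning parameter that adjusts the similarity
• SKAT uses the linear kernel (limited expressive power)

(Figure: three points G1, G2, G3, with the distance ||G1 − G2|| marked between G1 and G2.)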
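To make the kernel idea concrete, the sketch below builds a Gaussian kernel matrix and a linear kernel matrix from a small genotype matrix. The parametrisation exp(−||G_j − G_j'||² / ρ), the function names, and the toy genotypes are assumptions for illustration; other scalings of the tuning parameter are also in use.

```python
import numpy as np


def linear_kernel(G):
    """K[j, j'] = <G_j, G_j'> for the rows of G."""
    return G @ G.T


def gaussian_kernel(G, rho):
    """K[j, j'] = exp(-||G_j - G_j'||^2 / rho) for the rows of G."""
    sq_norms = np.sum(G ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (G @ G.T)
    sq_dist = np.maximum(sq_dist, 0.0)      # guard against tiny negative round-off
    return np.exp(-sq_dist / rho)


# Three toy genotype vectors (minor-allele counts at 4 variants).
G = np.array([[0, 1, 0, 2],
              [0, 1, 0, 2],                 # identical to the first row
              [1, 0, 2, 0]], dtype=float)

K_gauss = gaussian_kernel(G, rho=4.0)
K_lin = linear_kernel(G)
print(K_gauss)
print(K_lin)
```

Identical genotype vectors get similarity 1 under the Gaussian kernel, and a larger ρ makes distant vectors look more similar.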

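Sequence Kernel Association Test (Wu et al., 2010, 2011)

• Assume a regression model (continuous outcome)
  $y_i = \alpha_0 + X_i \alpha + G_i \beta + \varepsilon_i$
  where $X_i$ holds the other covariates and $G_i$ is the genotype (1×p vector)

• Variance component test
  • Assume $\beta_j \sim F(0, w_j \tau)$, where F() is an arbitrary distribution with mean 0 and variance $w_j \tau$
  • Test hypothesis $H_0: \tau = 0$
  • Score test for $\tau$:
    $Q = (y - \hat{\mu})^\top G W G^\top (y - \hat{\mu})$
    where $\hat{\mu}$ is the estimated mean of y under $H_0$, and $W = \mathrm{diag}(w_1, \dots, w_p)$
  • Each weight $w_j$ is pre-specified
    • Ex. $\sqrt{w_j} = \mathrm{Beta}(\mathrm{MAF}_j; a_1, a_2)$, where $a_1, a_2$ are shape parameters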
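A rough sketch of the SKAT score statistic for a continuous outcome follows. It assumes the weighted linear kernel K = G W G' with sqrt(w_j) = Beta(MAF_j; a1, a2) and a1 = 1, a2 = 25 as an example choice. The p-value here comes from a simple permutation of the genotype rows, which is a stand-in chosen for brevity and presumes genotypes are independent of the covariates; published SKAT implementations instead use the mixture-of-chi-squares null distribution. All function names and the simulated data are hypothetical.

```python
import math
import numpy as np


def beta_density(x, a1, a2):
    """Beta(a1, a2) density evaluated elementwise at x."""
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm) * x ** (a1 - 1) * (1 - x) ** (a2 - 1)


def skat_q(y, X, G, a1=1.0, a2=25.0):
    """Q = (y - mu_hat)' G W G' (y - mu_hat) under the null model y ~ 1 + X."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef                    # y - mu_hat under H0
    maf = G.mean(axis=0) / 2.0               # sample minor-allele frequencies
    w = beta_density(maf, a1, a2) ** 2       # w_j, since sqrt(w_j) is the density
    S = G.T @ resid                          # per-variant score contributions
    return float(np.sum(w * S ** 2)), resid, w


def permutation_pvalue(y, X, G, n_perm=499, seed=0):
    """Approximate p-value: shuffle genotype rows and recompute Q."""
    rng = np.random.default_rng(seed)
    q_obs, resid, w = skat_q(y, X, G)
    hits = 0
    for _ in range(n_perm):
        G_perm = G[rng.permutation(len(y))]
        hits += int(np.sum(w * (G_perm.T @ resid) ** 2) >= q_obs)
    return q_obs, (hits + 1) / (n_perm + 1)


# Simulated data: 300 subjects, 8 rare variants with mixed-direction effects.
rng = np.random.default_rng(1)
n, p = 300, 8
X = rng.normal(size=(n, 1))
G = rng.binomial(2, rng.uniform(0.01, 0.05, size=p), size=(n, p)).astype(float)
effects = np.array([1.0, -1.0] * (p // 2))
y = 0.5 * X[:, 0] + G @ effects + rng.normal(size=n)

q_obs, resid, w = skat_q(y, X, G)
_, pval = permutation_pvalue(y, X, G)
print(q_obs, pval)
```

Because Q sums squared per-variant scores, positive and negative effects do not cancel, which is what makes the test suited to mixed-direction effects.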

SKAT, cont.

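• General form of the variance component test = SKAT
  $y_i = \alpha_0 + X_i \alpha + h(G_i) + \varepsilon_i$, where $h(G_i)$ is the semiparametric term

• Assume $h = (h(G_1), \dots, h(G_n))^\top \sim F(0, \tau K)$, where K is a kernel matrix
• Test hypothesis $H_0: \tau = 0$
• Score test for $\tau$: $Q = (y - \hat{\mu})^\top K (y - \hat{\mu})$, where $\hat{\mu}$ is the estimated mean of y under $H_0$
• The (j,j')-th element of K is $K(G_j, G_{j'})$, the similarity between genotype vectors $G_j$ and $G_{j'}$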

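SKAT-Optimal (SKAT-o) (Lee et al., 2012)

• SKAT with the correlated kernel (What’s new?)
• Burden test vs SKAT (linear kernel)
  • Burden tests are more powerful when the effects are in the same direction and of the same magnitude
  • SKAT is more powerful when the effects have mixed directions
  • Both scenarios can happen

• New class of kernel
  • Combine the SKAT variance component and burden test statistics (Lee et al. 2012)
    $Q_\rho = (1 - \rho)\, Q_{\mathrm{SKAT}} + \rho\, Q_{\mathrm{burden}}$
  • where $0 \le \rho \le 1$ and the corresponding kernel is $K_\rho = G W^{1/2} R_\rho W^{1/2} G^\top$ with $R_\rho = (1 - \rho) I + \rho \mathbf{1}\mathbf{1}^\top$ (W is the diagonal matrix of variant weights)
  • In practice, ρ is estimated by grid search over a set of pre-specified points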
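The combined statistic can be sketched as follows, assuming Q_ρ = (1 − ρ) Q_SKAT + ρ Q_burden with variant weights w_j, Q_SKAT = Σ w_j S_j² and Q_burden = (Σ sqrt(w_j) S_j)², where S_j is the score for variant j. The code also checks that this equals the quadratic form with the exchangeable-correlation kernel. The grid of ρ values, the function names, and the toy numbers are illustrative assumptions, and the p-value calculation over the grid is deliberately omitted.

```python
import numpy as np


def skato_q_grid(resid, G, w, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for each rho in the grid."""
    S = G.T @ resid                               # per-variant scores
    q_skat = float(np.sum(w * S ** 2))
    q_burden = float(np.sum(np.sqrt(w) * S) ** 2)
    return {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}


def skato_q_kernel(resid, G, w, rho):
    """Same statistic via K_rho = G W^(1/2) R_rho W^(1/2) G'."""
    p = G.shape[1]
    R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    sqrt_w = np.diag(np.sqrt(w))
    K = G @ sqrt_w @ R @ sqrt_w @ G.T
    return float(resid @ K @ resid)


# Toy inputs: 4 subjects, 2 variants, residuals from some null model.
G = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
resid = np.array([1.0, -1.0, 0.5, -0.5])
w = np.array([1.0, 4.0])

q_grid = skato_q_grid(resid, G, w)
print(q_grid)
```

Here ρ = 0 recovers SKAT and ρ = 1 recovers the burden test; in the toy data the two variant scores have opposite signs, so the burden end of the grid gives a much smaller statistic.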

SKAT vs SKAT-o vs Burden test (Li et al. (2012))

Overview of SKAT, SKAT-o

• Advantages
  • The kernel method has the expressive power to capture domain knowledge in a general manner
  • Easy to propose a new kernel (i.e., a new similarity)
  • Theoretically, SKAT-o is more robust to the model setting than SKAT under a wide range of models

• Disadvantage
  • Generally difficult to construct a good kernel for a specific problem

Summary

• The HC statistic can detect an important sparse subset of signals

• Kernel method
  • Incorporates prior biological information to construct the kernel
  • SKAT is powerful when the effects have mixed directions
  • SKAT-o is more powerful when the effects of the variants are highly correlated

• According to simulation studies, SKAT-o is better?
  • We need to be careful when selecting the kernel function.
