Statistics for high-dimensional data: P-values and Stability Selection
Peter Bühlmann and Sara van de Geer
Seminar für Statistik, ETH Zürich
May 2012
P-values for high-dimensional linear models

Y = Xβ⁰ + ε

goal: statistical hypothesis testing
H_0,j : β⁰_j = 0 or H_0,G : β⁰_j = 0 for all j ∈ G

background: if we could derive the asymptotic distribution of the Lasso β̂(λ) under the null hypothesis, we could construct p-values

this is very difficult! the asymptotic distribution of β̂ has some point mass at zero, ...
Knight and Fu (2000) for p < ∞ and n → ∞; Montanari (2010, 2012) ... for random design with i.i.d. columns
not practical at the moment
Variable selection
Example: motif regression
for finding HIF1α transcription factor binding sites in DNA sequences
(Müller, Meier, PB & Ricci)

Y_i ∈ R: univariate response measuring the binding intensity of HIF1α on coarse DNA segment i (from CHIP-chip experiments)
X_i = (X_i^(1), ..., X_i^(p)) ∈ R^p:
X_i^(j) = abundance score of candidate motif j in DNA segment i
(using sequence data and computational biology algorithms, e.g. MDSCAN)

question: what is the relation between the binding intensity Y and the abundance of short candidate motifs?
; a linear model is often reasonable: "motif regression" (Conlon, X.S. Liu, Lieb & J.S. Liu, 2003)

Y = Xβ + ε, n = 287, p = 195

goal: variable selection; find the relevant motifs among the p = 195 candidates
Motif regression
for finding HIF1α transcription factor binding sites in DNA sequences

Y_i ∈ R: univariate response measuring the binding intensity on coarse DNA segment i (from CHIP-chip experiments)
X_i^(j) = abundance score of candidate motif j in DNA segment i

variable selection in the linear model
Y_i = β_0 + Σ_{j=1}^p β_j X_i^(j) + ε_i, i = 1, ..., n = 287, p = 195

; the Lasso selects 26 covariates and R² ≈ 50%, i.e. 26 interesting candidate motifs
[Figure: motif regression, estimated coefficients β̂(λ_CV) on the original data; x-axis: variables (0–200), y-axis: coefficients (0.00–0.20)]
which variables in Ŝ are false positives?
(p-values would be very useful!)

(Multi) sample splitting
an early but sub-ideal proposal
▶ select variables on the first half of the sample ; Ŝ
▶ compute OLS for the variables in Ŝ on the second half of the sample ; p-values P_j based on the Gaussian linear model:
  if j ∈ Ŝ: P_j from the t-statistic
  if j ∉ Ŝ: P_j = 1 (i.e. if β̂^(j) = 0)

Bonferroni-"style" corrected p-values:

P_corr,j = min(P_j · |Ŝ|, 1)

; (conservative) familywise error control with P_corr,j (j = 1, ..., p)
(Wasserman & Roeder, 2008)
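The split-and-test recipe above can be sketched in a few lines. As a hedge: a simple marginal-correlation screening stands in for the Lasso selection step, and all names (`split_pvalues`, `n_screen`) are illustrative, not the authors' code:

```python
import numpy as np
from scipy import stats

def split_pvalues(X, y, n_screen=10, rng=None):
    """Single sample splitting: screen on one half, OLS t-tests on the
    other half, then Bonferroni-style correction by |S_hat|."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, p = X.shape
    idx = rng.permutation(n)
    I1, I2 = idx[: n // 2], idx[n // 2:]
    # Screening on the first half (marginal correlation stands in
    # for the Lasso selection step used in the slides).
    score = np.abs(X[I1].T @ (y[I1] - y[I1].mean()))
    S = np.sort(np.argsort(score)[::-1][:n_screen])
    pvals = np.ones(p)                      # P_j = 1 for j not in S_hat
    X2, y2 = X[I2][:, S], y[I2]
    beta, *_ = np.linalg.lstsq(X2, y2, rcond=None)
    df = len(I2) - len(S)
    resid = y2 - X2 @ beta
    se = np.sqrt(resid @ resid / df * np.diag(np.linalg.inv(X2.T @ X2)))
    pvals[S] = 2 * stats.t.sf(np.abs(beta / se), df)
    return np.minimum(pvals * len(S), 1.0)  # P_corr,j = min(P_j * |S_hat|, 1)
```

Note that only the variables surviving the screening step ever get a p-value below 1, which is exactly why the result depends so strongly on the random split.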
this is a "P-value lottery"
motif regression example: p = 195, n = 287

adjusted p-values for the same important variable over different random sample-splits:

[Figure: histogram of the adjusted p-value over random splits; x-axis: adjusted p-value (0.0–1.0), y-axis: frequency (0–80)]
; improve by aggregating over many sample-splits (which improves "efficiency" as well)
Sample splitting multiple times

run the sample-splitting procedure B times and apply a non-trivial aggregation of the p-values

p-values: P_j^(1), ..., P_j^(B)

goal: aggregate P_j^(1), ..., P_j^(B) into a single p-value P_final,j
problem: dependence among P_j^(1), ..., P_j^(B)

define

Q_j(γ) = q_γ(P_corr,j^(b)/γ; b = 1, ..., B),

where q_γ denotes the empirical γ-quantile function

e.g. γ = 1/2: aggregation with the median
; (conservative) familywise error control for any fixed value of γ

what is the best γ? it really matters; one can "search" for it and correct with an additional factor
"adaptively" aggregated p-value:

P_final,j = (1 − log(γ_min)) · inf_{γ ∈ (γ_min,1)} Q_j(γ),
Q_j(γ) = q_γ(P_corr,j^(b)/γ; b = 1, ..., B)

; reject H_0,j : β_j = 0 ⇐⇒ P_final,j ≤ α

P_final,j roughly equals a raw p-value based on sample size ⌊n/2⌋, multiplied by a factor ≈ (5–10) · |Ŝ|
(which is to be compared with p)
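A minimal numerical sketch of this aggregation rule, for a single variable j; a finite grid over γ stands in for the exact infimum, and the function name is illustrative:

```python
import numpy as np

def aggregate_pvalues(p_corr, gamma_min=0.05, grid_size=200):
    """Adaptive aggregation of the B split-corrected p-values of one
    variable: P_final = (1 - log(gamma_min)) * inf_gamma Q(gamma),
    with Q(gamma) the empirical gamma-quantile of {P_corr,b / gamma}."""
    p_corr = np.asarray(p_corr, dtype=float)
    gammas = np.linspace(gamma_min, 1.0, grid_size)
    # empirical gamma-quantile of the rescaled (and capped) p-values
    q = np.array([np.quantile(np.minimum(p_corr / g, 1.0), g) for g in gammas])
    return min(1.0, (1.0 - np.log(gamma_min)) * q.min())
```

With γ_min = 0.05 the penalty factor 1 − log(γ_min) ≈ 4, the price paid for searching over γ.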
for the familywise error rate (FWER) = P[at least one false positive selection]:

Theorem
Consider a Gaussian linear model (with fixed design) and assume:
▶ lim_{n→∞} P[Ŝ ⊇ S_0] = 1 (screening property)
▶ |Ŝ| < ⌊n/2⌋ (sparsity property)
Then: strong control for either the familywise error rate or the false discovery rate
motif regression example: p = 195, n = 287

[Figure: motif regression, estimated coefficients; x-axis: variables (0–200), y-axis: coefficients (0.00–0.20); two variables are highlighted]
◦: variable/motif with FWER-adjusted p-value 0.006 (this variable corresponds to a known true motif)
◦: variable/motif with p-value clearly larger than 0.05
discussion: multi sample splitting

▶ assumes P[Ŝ ⊇ S_0] → 1; requires the beta-min condition
  such an assumption should be avoided for hypothesis testing (because the essence of the question in testing is whether β⁰_j is smallish or sufficiently large)
▶ necessarily requires design conditions; but this is unavoidable
Stability Selection (Meinshausen & PB, 2010)
which allows one to go way beyond linear models

selection of "features" from the set {1, ..., p}, e.g.:
▶ variable selection in regression or classification
▶ edge selection in a graph
▶ membership of a cluster
▶ ...

selection procedure Ŝ^λ ⊆ {1, ..., p}, λ a tuning parameter

prime example: the Lasso for selecting variables in a linear model
subsampling:
▶ draw a sub-sample of size ⌊n/2⌋ without replacement, denoted by I* ⊆ {1, ..., n}, |I*| = ⌊n/2⌋
▶ run the selection algorithm Ŝ^λ(I*) on I*
▶ repeat these steps many times and compute the relative selection frequencies

Π̂_j^λ = P*(j ∈ Ŝ^λ(I*)), j = 1, ..., p

(P* is w.r.t. the sub-sampling)

one could also use bootstrap sampling with replacement...
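The subsampling step is generic in the base selector; a sketch under stated assumptions, where a toy "top-q by marginal correlation" selector stands in for the Lasso and all names are illustrative:

```python
import numpy as np

def selection_frequencies(X, y, select, B=100, rng=None):
    """Relative selection frequencies Pi_hat_j over B subsamples of
    size floor(n/2); `select(X_sub, y_sub)` returns an index set."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        I = rng.choice(n, size=n // 2, replace=False)  # without replacement
        counts[np.asarray(list(select(X[I], y[I])))] += 1
    return counts / B

def top_q_selector(q):
    """Toy selector: the q variables with the largest marginal
    correlation (a stand-in for running the Lasso on the subsample)."""
    def select(X, y):
        return np.argsort(np.abs(X.T @ (y - y.mean())))[::-1][:q]
    return select
```

Any include/exclude procedure can be plugged in as `select`, which is what makes the scheme applicable far beyond regression.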
if we consider many regularization parameters {Ŝ^λ; λ ∈ Λ} (Λ can be discrete or continuous):

Ŝ_stable = {j; max_{λ∈Λ} Π̂_j^λ ≥ π_thr}

see also Bach (2009) for a related proposal
How to choose the threshold π_thr?

denote by V = |S_0^c ∩ Ŝ_stable| = the number of false positives

consider a selection procedure which selects q variables (e.g. the top 50 variables when running the Lasso over many λ's)

Theorem (Meinshausen & PB, 2010)
main assumption: exchangeability condition
in addition: Ŝ is better than "random guessing"
Then:

E[V] ≤ q² / ((2π_thr − 1) · p)

i.e. finite-sample control, even if p ≫ n
; choose the threshold π_thr to control e.g. E[V] ≤ 1 or P[V > 0] ≤ E[V] ≤ α
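The bound can be used in both directions: evaluate E[V] for a given threshold, or solve for the smallest threshold guaranteeing a target control level. A small helper (illustrative, not from the paper's code):

```python
def ev_bound(q, p, pi_thr):
    """Upper bound E[V] <= q^2 / ((2*pi_thr - 1) * p), valid for
    pi_thr in (1/2, 1] under the exchangeability condition."""
    assert 0.5 < pi_thr <= 1.0
    return q * q / ((2.0 * pi_thr - 1.0) * p)

def threshold_for(q, p, v=1.0):
    """Smallest pi_thr guaranteeing E[V] <= v by the bound above:
    solve q^2 / ((2*pi - 1) * p) <= v for pi."""
    return min(1.0, 0.5 * (1.0 + q * q / (v * p)))
```

For example, with q = 50 selected variables out of p = 1000 and π_thr = 0.75 the bound gives E[V] ≤ 5.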
note the generality of the Theorem...

▶ it works for any method which is better than "random guessing"
▶ it works not only for regression but also for "any" discrete structure estimation problem (whenever there is an include/exclude decision); variable selection, graphical modeling, clustering, ...

and hence there must be a fairly strong condition...
Exchangeability condition: the distribution of {1{j ∈ Ŝ^λ}; j ∈ S_0^C} is exchangeable
note: this is only a requirement on the noise variables
Many simulations!
[Figure: simulation results for scenarios (A)–(G) with varying p, n, signal-to-noise ratio, sparsity s, number of exchangeability violations and maximal correlation; y-axes: P(first 0.1·s correct) and P(first 0.4·s correct)]
even if the exchangeability condition fails: (conservative) error control for E[V] holds
Motif regression (n = 287, p = 195)

▶ two stable motifs with stability selection (w.r.t. E[V] ≤ 1)
▶ multi sample splitting finds only the one motif which is not biologically validated

Riboflavin data (n = 71, p = 4088): control for E[V] ≤ 1
[Figure: stability selection with E[V] ≤ 1; x-axis: variable index (0–4000), y-axis: relative selection frequency (0.0–0.6)]
▶ stability selection: 4 stable variables (w.r.t. E[V] ≤ 1)
▶ multi sample splitting: 2 significant variables (w.r.t. FWER)
Graphical modeling using the GLasso
(Rothman, Bickel, Levina & Zhu, 2008; Friedman, Hastie & Tibshirani, 2008)

infer the conditional independence graph using ℓ1-penalization, i.e. infer the zeroes of Σ^{−1} from X_1, ..., X_n i.i.d. ∼ N_p(0, Σ)

Σ^{−1}_{jk} ≠ 0 ⇔ X^(j) ⊥̸ X^(k) | X^({1,...,p}\{j,k}) ⇔ edge j–k
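The displayed equivalence in code: given any estimated precision matrix Θ̂ ≈ Σ^{−1} (e.g. from the GLasso), the edge set of the graph is just its nonzero off-diagonal pattern. A minimal sketch, with a tolerance for numerical zeroes:

```python
import numpy as np

def edges_from_precision(Theta, tol=1e-8):
    """Edge set of the conditional independence graph: the nonzero
    off-diagonal entries of the (estimated) precision matrix."""
    p = Theta.shape[0]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(Theta[j, k]) > tol}
```

For a tridiagonal (AR(1)-type) precision matrix this recovers exactly the chain graph 0–1–2–...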
gene expression data ⇒ ? zero-pattern of Σ^{−1}

[Figure: estimated graphs from the Graphical Lasso vs. Stability Selection for λ = 0.46, 0.448, 0.436, 0.424, 0.412, 0.4]
sub-problem of the Riboflavin data: p = 160, n = 115
stability selection with E[V] ≤ 5
varying the regularization parameter λ in the ℓ1-penalization
[Figure: estimated graphs from the Graphical Lasso (top row) and Stability Selection (bottom row) for λ = 0.46, 0.448, 0.436, 0.424, 0.412, 0.4]
with stability selection: the choice of the initial λ tuning parameter does not matter much (as proved by our theory); one just needs to fix the finite-sample control
permutation of variables:
varying the regularization parameter in the null case
[Figure: estimated graphs from the Graphical Lasso (top row) and Stability Selection (bottom row) for λ = 0.065, 0.063, 0.061, 0.059, 0.057, 0.055]
with stability selection: the number of false positives is indeed controlled (as proved by our theory); and here the exchangeability condition holds
Predicting causal effects from observational data
↘ error rate

Causal gene ranking

     Gene        summary  median                  error
                 rank     effect  expression  (PCER)   name
 1   AT2G45660    1       0.60     5.07       0.0017   AGL20 (SOC1)
 2   AT4G24010    2       0.61     5.69       0.0021   ATCSLG1
 3   AT1G15520    2       0.58     5.42       0.0017   PDR12
 4   AT3G02920    5       0.58     7.44       0.0024   replication protein-related
 5   AT5G43610    5       0.41     4.98       0.0101   ATSUC6
 6   AT4G00650    7       0.48     5.56       0.0020   FRI
 7   AT1G24070    8       0.57     6.13       0.0026   ATCSLA10
 8   AT1G19940    9       0.53     5.13       0.0019   AtGH9B5
 9   AT3G61170    9       0.51     5.12       0.0034   protein coding
10   AT1G32375   10       0.54     5.21       0.0031   protein coding
11   AT2G15320   10       0.50     5.57       0.0027   protein coding
12   AT2G28120   10       0.49     6.45       0.0026   protein coding
13   AT2G16510   13       0.50    10.7        0.0023   AVAP5
14   AT3G14630   13       0.48     4.87       0.0039   CYP72A9
15   AT1G11800   15       0.51     6.97       0.0028   protein coding
16   AT5G44800   16       0.32     6.55       0.0704   CHR4
17   AT3G50660   17       0.40     7.60       0.0059   DWF4
18   AT5G10140   19       0.30    10.3        0.0064   FLC
19   AT1G24110   20       0.49     4.66       0.0059   peroxidase, putative
20   AT1G27030   20       0.45    10.1        0.0059   unknown protein
• biological validation by gene knockout experiments in progress.
; see Friday...
The Lasso and its stability path, and why Stability Selection works so well

riboflavin example: n = 71, p = 4099; sparsity s_0 "=" 6 (6 "relevant" genes; all other variables permuted)

[Figure: Lasso coefficient paths vs. stability selection paths]

with stability selection: the 4–6 "true" variables stick out much more clearly from the noise covariates

stability selection cannot be reproduced by simply selecting the right penalty for the Lasso

stability selection provides a fundamentally new solution
Comparative conclusions

without conditions on the "design matrix" or "covariance structure" (etc.): one cannot assign strengths or uncertainties to a variable or a structural component (e.g. an edge in a graph) in a high-dimensional setting

and these conditions are typically uncheckable...

three "easy to use" methods in comparison:

method                  assumptions                 applicability
multi sample splitting  compatibility condition,    GLMs
                        beta-min condition
stability selection     exchangeability condition   "all"

the fewer the assumptions, the more trustworthy the statistical inference!
"yet", given the necessity of often uncheckable "design conditions":
confirmatory high-dimensional inference remains challenging