
Stein’s Method Applied to Some Statistical Problems

Jay Bartroff

Borchard Colloquium 2017


Outline of this talk

1. Stein’s method
2. Bounds to the normal for group sequential statistics with covariates
   - Produce explicit distributional bounds to the limiting normal distribution for the repeatedly-computed MLE (θ̂_1, θ̂_2, ..., θ̂_K) of a parameter vector θ ∈ R^p in group sequential trials
3. Concentration inequalities for occupancy models with log-concave marginals
   - How to get bounded, size biased couplings for certain multivariate occupancy models, then use these to get concentration inequalities
   - Joint work with Larry Goldstein (USC) and Ümit Islak (Bogaziçi University, Istanbul)


Charles Stein (1920-2016)

Stein’s paradox: X is inadmissible for θ in N_p(θ, I) for p ≥ 3
- James-Stein shrinkage estimator
- Stein’s unbiased risk estimator for MSE
- Stein’s Lemma (1): covariance estimation
- Stein’s Lemma (2): sequential sample size tails
- Stein’s method for distributional approximation
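As a quick numerical illustration of the paradox (a minimal Monte Carlo sketch, not from the talk; the dimension, true mean, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 10, 20000                 # illustrative dimension and replication count
theta = np.ones(p)                  # arbitrary true mean

X = rng.normal(loc=theta, size=(reps, p))         # X ~ N_p(theta, I)
norm2 = (X ** 2).sum(axis=1)                      # ||X||^2, replicate by replicate
js = (1 - (p - 2) / norm2)[:, None] * X           # James-Stein shrinkage toward 0

risk_mle = ((X - theta) ** 2).sum(axis=1).mean()  # ~ p, the risk of the MLE X
risk_js = ((js - theta) ** 2).sum(axis=1).mean()  # strictly smaller for p >= 3
print(f"MLE risk ~ {risk_mle:.3f}, James-Stein risk ~ {risk_js:.3f}")
```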

Application 1: Group sequential analysis


But there are other books on this subject...


Setup

Response Y_i ∈ R of the i-th patient depends on
- known covariate vector x_i
- unknown parameter vector θ ∈ R^p

Primary goal: test a null hypothesis about θ, e.g.,

  H_0: θ = 0
  H′_0: θ_j ≤ 0
  H″_0: a^t θ = b, for some vector a and scalar b

Secondary goals: compute p-values or confidence regions for θ at the end of the study


Setup: Group sequential analysis

For efficiency, ethical, practical, and financial reasons, group sequential analysis has become the standard in clinical trials.

A group sequential trial with at most K groups:

  Group 1: Y_1, ..., Y_{n_1}
  Group 2: Y_{n_1+1}, ..., Y_{n_2}
  ...
  Group K: Y_{n_{K−1}+1}, ..., Y_{n_K}

Group sequential has been the dominant format for clinical trials since...

Beta-Blocker Heart Attack Trial (“BHAT”, JAMA 82)
- Randomized trial of propranolol for heart attack survivors
- 3837 patients randomized
- Started June 1978, planned as a ≤ 4-year study, terminated 8 months early due to the observed benefit of propranolol


Setup: Group sequential analysis

Stopping rule related to H_0:
- likelihood ratio, t-, F-, χ²-tests common
- Of the form: stop and reject H_0 at stage min{k ≤ K : T(Y_1, ..., Y_{n_k}) ≥ C_k} for some statistic T(Y_1, ..., Y_{n_k}), often a function of the MLE θ̂_k = θ̂_k(Y_1, ..., Y_{n_k})

The joint distribution of θ̂_1, θ̂_2, ..., θ̂_K is needed to
- choose the critical values C_k
- compute a p-value at the end of the study
- give a confidence region for θ at the end of the study


Background: Group sequential analysis — Jennison & Turnbull (JASA 97)

Asymptotic multivariate normal distribution of (θ̂_1, θ̂_2, ..., θ̂_K) in a regression setup with Y_i ~ind f_i(y_i; x_i, θ), f_i nice

- Asymptotics: n_k − n_{k−1} → ∞ for all k, with K fixed
- E_∞(θ̂_k) = θ
- “Independent increments”: Cov_∞(θ̂_{k_1}, θ̂_{k_2}) = Var_∞(θ̂_{k_2}) for any k_1 ≤ k_2

“Folk Theorem”: the normal limit was widely (over-)used (software packages, etc.) before the Jennison & Turnbull paper. Commonly heard: “Once n is 5 or so the normal limit kicks in!”


Background: Group sequential analysis — Jennison & Turnbull (JASA 97)

Independent increments: Cov_∞(θ̂_{k_1}, θ̂_{k_2}) = Var_∞(θ̂_{k_2}) for any k_1 ≤ k_2

Suppose

  H_0: a^t θ = 0,   T_k = (a^t θ̂_k) I_k,   I_k = [Var_∞(a^t θ̂_k)]^{−1}.

Then

  Cov_∞(T_{k_1}, T_{k_2}) = I_{k_1} I_{k_2} a^t Cov_∞(θ̂_{k_1}, θ̂_{k_2}) a
                          = I_{k_1} I_{k_2} a^t Var_∞(θ̂_{k_2}) a
                          = I_{k_1} I_{k_2} Var_∞(a^t θ̂_{k_2})
                          = I_{k_1} I_{k_2} I_{k_2}^{−1}
                          = I_{k_1} = Var_∞(T_{k_1})

  ⇒ Cov_∞(T_{k_1}, T_{k_2} − T_{k_1}) = 0
  ⇒ T_1, T_2 − T_1, ..., T_K − T_{K−1} are asymptotically independent normals
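The independent-increments calculation is easy to check by simulation in the simplest case, a scalar normal mean with θ̂_k the mean of the first n_k observations, so that T_k is the sum of the first n_k observations (a minimal sketch, not from the talk; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = [50, 100, 200]                    # cumulative group sample sizes n_1 < n_2 < n_3
reps = 40000
Y = rng.normal(size=(reps, n[-1]))    # theta = 0, Var(Y_i) = 1

# Here a = 1 and I_k = n_k, so T_k = n_k * theta_hat_k = sum of first n_k obs.
T = np.stack([Y[:, :nk].sum(axis=1) for nk in n], axis=1)

print(np.cov(T[:, 0], T[:, 1])[0, 1])            # ~ Var(T_1) = n_1 = 50
print(np.cov(T[:, 0], T[:, 1] - T[:, 0])[0, 1])  # ~ 0: independent increments
```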


New Contributions

Extension to the group sequential setting of a Berry-Esseen bound for the multivariate normal limit for smooth functions

- Anastasiou & Reinert 17: Bounds with explicit constants in bounded Wasserstein distance for the scalar MLE (K = 1 analysis), Bernoulli.
- Anastasiou & Ley 17: Bounds for the asymptotic normality of the maximum likelihood estimator using the delta method, ALEA.
- Anastasiou 17: Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables, Statistics & Probability Letters.
- Anastasiou & Gaunt 16+: Multivariate normal approximation of the maximum likelihood estimator via the delta method, arXiv:1609.03970.
- Anastasiou 15+: Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data, arXiv:1510.03679.


New Contributions, Cont’d

Relaxing the independence assumption: assume the log-likelihood of Y_k := (Y_{n_{k−1}+1}, ..., Y_{n_k}) is of the form

  Σ_{i∈G_k} log f_i(Y_i, θ) + g_k(Y_k, θ)

for well-behaved functions f_i, g_k.

- g_k = 0 gives Jennison & Turnbull’s independent setting
- Some generalized linear mixed models (GLMMs) with random stage effect U_k take this form
  - U_k = effect due to lab, monitoring board, cohort, etc.
- Penalized quasi-likelihood (Breslow & Clayton, JASA 93)


GLMM Example: Poisson regression

Letting f_μ denote the Po(μ) density:
- For Y_i in the k-th stage, Y_i | U_k ~ind f_{μ_i}, where μ_i = exp(β^t x_i + U_k)
- {U_k} ~iid h_λ
- θ = (β, λ)

Then the log-likelihood is

  log Π_{k=1}^K ∫ Π_{i∈G_k} f_{μ_i}(Y_i) h_λ(U_k) dU_k = Σ_{k=1}^K [ Σ_{i∈G_k} log f_{μ_i}(Y_i) + g_k(Y_k, θ) ],

where, in the last expression, μ_i = exp(β^t x_i).
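For concreteness, the k-th stage’s marginal likelihood ∫ Π_{i∈G_k} f_{μ_i}(Y_i) h_λ(U_k) dU_k can be evaluated numerically; below is a minimal sketch (not from the talk) assuming h_λ is the N(0, λ²) density and using Gauss-Hermite quadrature, with illustrative data and parameter values:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite rule
from scipy.stats import poisson
from scipy.special import logsumexp

def stage_loglik(y, x, beta, lam, nodes=40):
    """Marginal log-likelihood of one stage: counts y, covariates x,
    integrating out U_k ~ N(0, lam^2) by quadrature."""
    u, w = hermegauss(nodes)              # int g(z) e^{-z^2/2} dz ~ sum w_j g(u_j)
    w = w / np.sqrt(2 * np.pi)            # weights for E g(Z), Z ~ N(0, 1)
    mu = np.exp((x @ beta)[:, None] + lam * u[None, :])   # mu_i at each node
    log_integrand = poisson.logpmf(y[:, None], mu).sum(axis=0)
    return logsumexp(log_integrand, b=w)  # log of the U_k integral

rng = np.random.default_rng(2)
x = rng.normal(size=(30, 2))
beta, lam = np.array([0.3, -0.2]), 0.5
y = rng.poisson(np.exp(x @ beta + lam * rng.normal()))
print(stage_loglik(y, x, beta, lam))
```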


Stein’s Method for MVN Approximation

- Generator approach: Barbour 90, Goetze 91
- Size biasing: Goldstein & Rinott 96, Rinott & Rotar 96
- Zero biasing: Goldstein & Reinert 05
- Exchangeable pair: Chatterjee & Meckes 08, Reinert & Röllin 09
- Stein couplings: Fang & Röllin 15

Theorem (Reinert & Röllin 09). If W, W′ ∈ R^q is an exchangeable pair with EW = 0, EWW^t = Σ positive definite, and E(W′ − W | W) = ΛW + R with Λ invertible, then for any 3-times differentiable h: R^q → R,

  |Eh(W) − Eh(Σ^{1/2} Z)| ≤ (a/4)|h|_2 + (b/12)|h|_3 + c(|h|_1 + (q/2)||Σ||^{1/2}|h|_2)

for certain explicit constants a, b, c.


Bounds to the normal for θ̂_K := (θ̂_1, θ̂_2, ..., θ̂_K)

Approach: apply the Reinert & Röllin 09 result with W = score function increments to get smooth function bounds to the normal.

Result. In the group sequential setup above, if the Y_i are independent or follow GLMMs with the log-likelihood of the k-th group data Y_k = (Y_{n_{k−1}+1}, ..., Y_{n_k}) of the form

  Σ_{i∈G_k} log f_i(Y_i, θ) + g_k(Y_k, θ),

then under regularity conditions on the f_i and g_k there are a, b, c, d such that

  |Eh(J^{−1/2}(θ̂_K − θ_K)) − Eh(Z)| ≤ (a/4) K² ||J^{−1/2}||² |h|_2 + (b/12) K³ ||J^{−1/2}||³ |h|_3
      + cK ||J^{−1/2}|| (|h|_1 + (pK²/2) ||Σ||^{1/2} ||J^{−1/2}|| |h|_2) + d,

where θ_K stacks K copies of the true parameter θ.


Comments on result

The bound:

  |Eh(J^{−1/2}(θ̂_K − θ_K)) − Eh(Z)| ≤ (a/4) K² ||J^{−1/2}||² |h|_2 + (b/12) K³ ||J^{−1/2}||³ |h|_3
      + cK ||J^{−1/2}|| (|h|_1 + (pK²/2) ||Σ||^{1/2} ||J^{−1/2}|| |h|_2) + d.

- The a, b, c terms come directly from the Reinert & Röllin 09 bound
- The c term is ∝ Var(R) in E(W′ − W | W) = ΛW + R and vanishes in the independent case
- The d term comes from Taylor series remainders
- Rate O(1/√n_K) under the usual asymptotic (n_k − n_{k−1})/n_K → γ_k ∈ (0, 1)


Sketch of Proof (Independent Case)

Score statistic:

  S_i(θ) = (∂/∂θ) log f_i(Y_i, θ) ∈ R^p,   W = (Σ_{i∈G_1} S_i(θ), ..., Σ_{i∈G_K} S_i(θ)) ∈ R^q,

where q = pK.

Fisher information:

  J_i(θ) = −E[(∂/∂θ) S_i(θ)^t] ∈ R^{p×p}

  J(θ_1, ..., θ_K) = diag(Σ_{i=1}^{n_1} J_i(θ_1), ..., Σ_{i=1}^{n_K} J_i(θ_K)) ∈ R^{q×q}

  Σ := Var(W) = diag(Σ_{i∈G_1} J_i(θ), ..., Σ_{i∈G_K} J_i(θ)) ∈ R^{q×q}


Sketch of Proof: Exchangeable pair (Independent Case)

1. Choose i* ∈ {1, ..., n_K} uniformly, independent of all else
2. Replace Y_{i*} by an independent copy Y′_{i*} (keeping x_{i*}); call the result W′

⇒ W, W′ exchangeable
⇒ W, W′ satisfy the linearity condition

  E(W′ − W | W) = −n_K^{−1} W,

which is easy to check entry-wise.
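The linearity condition can also be checked numerically: with W the vector of per-group score sums and the data held fixed, averaging W′ − W over many resamples of a single uniformly chosen observation recovers −W/n_K (a minimal sketch, not from the talk; the group sizes and score distribution are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = [5, 12, 20]                                           # cumulative group sizes; n_K = 20
grp = np.searchsorted(n, np.arange(n[-1]), side="right")  # group index of each obs

S = rng.normal(size=n[-1])                                # mean-zero "scores", held fixed
W = np.array([S[grp == k].sum() for k in range(len(n))])

diffs, reps = np.zeros(len(n)), 200000
for _ in range(reps):
    i = rng.integers(n[-1])                    # i* uniform on {1, ..., n_K}
    diffs[grp[i]] += rng.normal() - S[i]       # change in the affected group sum
print(diffs / reps)                            # ~ E(W' - W | W)
print(-W / n[-1])                              # linearity condition: -W / n_K
```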


Sketch of Proof: Relating θ̂_K to W (Independent Case)

By a standard Taylor expansion,

  θ̂_K − θ_K = J(θ*_K)^{−1} S, where S = (Σ_{i=1}^{n_1} S_i(θ_1), ..., Σ_{i=1}^{n_K} S_i(θ_K)) ∈ R^q

and θ*_K ∈ R^q lies on the line segment connecting θ_K and θ̂_K. Then

  |Eh(J^{1/2}(θ̂_K − θ_K)) − Eh(Z)| ≤ |Eh(J^{−1/2}S) − Eh(Z)| + |Eh(J^{1/2}J(θ*_K)^{−1}S) − Eh(J^{−1/2}S)|


Sketch of Proof: Relating θ̂_K to W (Independent Case)

Using S = AW, where

  A = [ 1_p 0_p ··· 0_p ]
      [ 1_p 1_p ··· 0_p ]
      [  ⋮   ⋮   ⋱   ⋮  ]
      [ 1_p 1_p ··· 1_p ]

and 1_p, 0_p ∈ R^{p×p} are the identity and zero matrices, the 1st term is

  |Eh(J^{−1/2}S) − Eh(Z)| = |Eh̃(W) − Eh̃(Σ^{1/2}Z)|,

where h̃(w) = h(J^{−1/2}Aw); then apply Reinert-Röllin and simplify.

The 2nd term is bounded by Taylor series arguments.
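The block matrix A just turns per-group sums into cumulative sums; a one-line construction and sanity check (a sketch, not from the talk; p and K are illustrative):

```python
import numpy as np

p, K = 2, 3
A = np.kron(np.tril(np.ones((K, K))), np.eye(p))   # block row k sums blocks 1..k

rng = np.random.default_rng(4)
W = rng.normal(size=p * K)                         # stacked per-group score sums
S = A @ W                                          # stacked cumulative score sums

S_direct = np.concatenate([W.reshape(K, p)[:k + 1].sum(axis=0) for k in range(K)])
assert np.allclose(S, S_direct)                    # S = AW as claimed
```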


Sketch of Proof: Exchangeable pair (GLMM Case)

1. Choose i* ∈ {1, ..., n_K} uniformly, independent of Y_1, ..., Y_{n_K}
2. If i* is in the k-th group, replace Y_{i*} by an independent copy Y′_{i*} with mean φ(β^t x_{i*} + U_k), where φ^{−1} is the link function (same covariates x_{i*} and group effect U_k); call the result W′

⇒ W, W′ exchangeable
⇒ W, W′ satisfy the linearity condition

  E(W′ − W | W) = −n_K^{−1} W + R,

where R = R(g_1, ..., g_K).


Concentration and Coupling


Application 2: Concentration inequalities for occupancy models with log-concave marginals

Main idea: obtain bounded, size biased couplings for certain multivariate occupancy models, then use methods pioneered by Ghosh and Goldstein 11 to get concentration inequalities.

Concentration inequalities, e.g.,

  P(Y − μ ≥ t) ≤ exp(−t² / (2cμ + ct)),

are widely used in
- high dimensional statistics
- machine learning
- random matrix theory
- applications: wireless communications, physics, ...

(See Raginsky & Sason 15)


Setup

Occupancy model M = (M_α)

M_α may be
- the degree count of vertex α in an Erdös-Rényi random graph
- the # of balls in box α in a multinomial model
- the # of balls of color α in a sample from an urn of colored balls

We consider statistics like

  Y_ge = Σ_{α=1}^m 1{M_α ≥ d},   Y_eq = Σ_{α=1}^m 1{M_α = d}

and their weighted versions

  Y_ge = Σ_{α=1}^m w_α 1{M_α ≥ d_α},   Y_eq = Σ_{α=1}^m w_α 1{M_α = d_α}.
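In the multinomial case, for example, M and the statistics above are a few lines of code (a sketch with illustrative parameters, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, d = 100, 20, 2                       # balls, boxes, threshold (illustrative)
M = rng.multinomial(n, np.full(m, 1 / m))  # M_alpha = # balls in box alpha (uniform)
w = np.ones(m)                             # weights w_alpha

Y_ge = (w * (M >= d)).sum()                # weighted # of boxes with at least d balls
Y_eq = (w * (M == d)).sum()                # weighted # of boxes with exactly d balls
print(M, Y_ge, Y_eq)
```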


Some Methods for Concentration Inequalities

- McDiarmid’s bounded difference inequality
  - Y a function with bounded differences of independent inputs
- Negative association
  - e.g., Dubashi & Ranjan 98
- Certifiable functions
  - Y a certifiable function of independent inputs
  - Controlling a large enough subset of inputs “certifies” the value of the function
  - e.g., McDiarmid & Reed 06
- Bounded size bias couplings


Bounded Size Bias Couplings

If there is a coupling Y^s of Y with the Y-size bias distribution, i.e.,

  E[Y f(Y)] = μ E[f(Y^s)] for all f,

and Y^s ≤ Y + c for some c > 0 with probability one, then

  max{P(Y − μ ≥ t), P(Y − μ ≤ −t)} ≤ b_{μ,c}(t).

Ghosh & Goldstein 11: for all t > 0,

  P(Y − μ ≤ −t) ≤ exp(−t² / (2cμ))
  P(Y − μ ≥ t) ≤ exp(−t² / (2cμ + ct)),

a bound b that is exponential as t → ∞.

Arratia & Baxendale 13:

  b_{μ,c}(t) = exp(−(μ/c) h(t/μ)), where h(x) = (1 + x) log(1 + x) − x,

a bound b that is Poisson-like as t → ∞.
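Evaluating the two right-tail bounds side by side (a sketch, not from the talk, using the reconstruction of b_{μ,c} above; μ, c, t are illustrative):

```python
import numpy as np

def gg_right(mu, c, t):
    """Ghosh & Goldstein 11 right tail: exp(-t^2 / (2c*mu + c*t))."""
    return np.exp(-t**2 / (2 * c * mu + c * t))

def ab_right(mu, c, t):
    """Arratia & Baxendale 13: exp(-(mu/c) h(t/mu)), h(x) = (1+x)log(1+x) - x."""
    x = t / mu
    return np.exp(-(mu / c) * ((1 + x) * np.log1p(x) - x))

mu, c = 50.0, 3.0
for t in [5.0, 20.0, 100.0, 300.0]:
    print(t, gg_right(mu, c, t), ab_right(mu, c, t))  # AB decays faster for large t
```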


Main Result

M = (M_α)_{α∈[m]}, each M_α lattice log-concave,

  Y_ge = Σ_{α∈[m]} w_α 1{M_α ≥ d_α},   Y_ne = Σ_{α∈[m]} w_α 1{M_α ≠ d_α}.

Main Result (in words)
1. If M is bounded from below and can be closely coupled to a version M′ having the same distribution conditional on M′_α = M_α + 1, then there is a bounded size biased coupling Y^s_ge ≤ Y_ge + C and the above concentration inequalities hold.
2. If M is non-degenerate at (d_α) and can be closely coupled to a version M′ having the same distribution conditional on M′_α ≠ d_α, then there is a bounded size biased coupling Y^s_ne ≤ Y_ne + C′ and the above concentration inequalities hold.


Main Result: a few more details on Part 1

M = f(U), where
- U is some collection of random variables
- f is measurable

Closely coupled means: given U_k ~ L(V_k) := L(U | M_α ≥ k), there is a coupling U^+_k and a constant B such that

  L(U^+_k | U_k) = L(V_k | M^+_{k,α} = M_{k,α} + 1) and Y^+_{k,ge,≠α} ≤ Y_{k,ge,≠α} + B,

where Y_{k,ge,≠α} = Σ_{β≠α} 1(M_{k,β} ≥ d_β).

The constant is

  C = |w|(B|d| + 1),

where |w| = max w_α and |d| = max d_α.

Part 2 is similar.


Main Result: main ingredient in proof

Incrementing Lemma. If M is lattice log-concave then there is π(x, d) ∈ [0, 1] such that if

  M′ ~ L(M | M ≥ d) and B | M′ ~ Bern(π(M′, d)),

then

  M′ + B ~ L(M | M ≥ d + 1).

- Extension of Goldstein & Penrose 10 for M binomial, d = 0
- Analogous versions hold for

  L(M | M ≤ d) ↪ L(M | M ≤ d − 1)
  L(M) ↪ L(M | M ≠ d),

where ↪ means “coupled to”.


Example 1: Erdös-Rényi random graph

- m vertices
- Independent edges with probability p_{α,β} = p_{β,α} ∈ [0, 1)

Constructing U^+_k from U_k:
1. Select a non-neighbor β of α with probability ∝ p_{α,β}/(1 − p_{α,β})
2. Add an edge connecting β to α

This affects at most 1 other vertex, so B = 1 and

  Y^s_ge ≤ Y_ge + |w|(|d| + 1).
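A minimal simulation sketch of this construction (not from the talk; m, the edge probability, and d are illustrative, and with a constant edge probability the selection weights p/(1 − p) are uniform):

```python
import numpy as np

rng = np.random.default_rng(6)
m, p, d = 50, 0.1, 3                        # vertices, edge probability, threshold

A = np.triu(rng.random((m, m)) < p, 1)      # independent edges above the diagonal
A = A | A.T                                 # symmetric adjacency, no self-loops
Y_ge = (A.sum(axis=1) >= d).sum()           # # of vertices with degree >= d

alpha = 0                                   # increment step at vertex alpha:
non_nbrs = np.flatnonzero(~A[alpha])
non_nbrs = non_nbrs[non_nbrs != alpha]
if non_nbrs.size:                           # weights ∝ p/(1-p) are all equal here
    beta = rng.choice(non_nbrs)
    A[alpha, beta] = A[beta, alpha] = True  # add edge; only beta's degree also rises
print(Y_ge, (A.sum(axis=1) >= d).sum())
```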


Example 1: Erdös-Rényi random graph

Applying this with d_α = d, w_α = 1,

  P(Y_ge − μ ≤ −t) ≤ exp(−t² / (2(d + 1)μ)) ≤ exp(−t² / (2(d + 1)m)).

Compare with McDiarmid’s bounded difference inequality: writing

  Y_ge = f(X_1, ..., X_{(m choose 2)}), X_i = 1{edge between vertex pair i},

we have

  sup_{X_i, X′_i} |f(X_1, ..., X_i, ..., X_{(m choose 2)}) − f(X_1, ..., X′_i, ..., X_{(m choose 2)})| ≤ 2,

so

  P(Y_ge − μ ≤ −t) ≤ exp(−t² / (4m(m − 1))).

The new bound is an improvement for m > 2d + 3.
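Plugging in numbers makes the comparison concrete (a sketch, not from the talk; m, d, t are illustrative, with m > 2d + 3):

```python
import numpy as np

def new_left_tail(t, d, m):                 # exp(-t^2 / (2(d+1)m))
    return np.exp(-t**2 / (2 * (d + 1) * m))

def mcdiarmid_left_tail(t, m):              # exp(-t^2 / (4m(m-1)))
    return np.exp(-t**2 / (4 * m * (m - 1)))

m, d, t = 100, 3, 20.0
print(new_left_tail(t, d, m))               # ~ 0.607: the stronger bound here
print(mcdiarmid_left_tail(t, m))            # ~ 0.990
```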


Example 2: Multinomial Counts

- n balls dropped independently into m boxes
- Applications in species trapping, linguistics, ...
- # of empty boxes proved asymptotically normal by Weiss 58 and Rényi 62 in the uniform case
- Englund 81: L∞ bound for the # of empty cells, uniform case
- Dubashi & Ranjan 98: concentration inequality via NA
- Penrose 09: L∞ bound for the # of isolated balls, uniform and nonuniform cases
- Bartroff & Goldstein 13: L∞ bound for all d ≥ 2, uniform case


Example 2: Multinomial Counts

  p_{α,j} = prob. that ball j ∈ [n] falls in box α ∈ [m]

  M_α = # of balls in box α = Σ_{j∈[n]} 1{ball j falls in box α}

Constructing U^+_k from U_k: choose a ball j not in box α with probability ∝ p_{α,j}/(1 − p_{α,j}) and add it to box α.

Y^s_{ge,≠α} ≤ Y_{ge,≠α}, so B = 0 and thus Y^s_ge ≤ Y_ge + |w|.
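A sketch of this increment step (not from the talk; the ball probabilities are illustrative): pick a ball outside box α with probability ∝ p_{α,j}/(1 − p_{α,j}) and move it into box α, so no box other than α ever gains a ball:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, alpha = 100, 20, 0
p = rng.dirichlet(np.ones(m), size=n)              # p[j] = box probs of ball j
box = np.array([rng.choice(m, p=pj) for pj in p])  # realized box of each ball

outside = np.flatnonzero(box != alpha)             # balls not currently in box alpha
wgt = p[outside, alpha] / (1 - p[outside, alpha])
j = rng.choice(outside, p=wgt / wgt.sum())         # choose with prob ∝ p/(1-p)
box[j] = alpha                                     # move it into box alpha

M = np.bincount(box, minlength=m)                  # M_alpha incremented; others only lose
print(M[alpha])
```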


Example 3: Multivariate Hypergeometric Sampling

- Urn with n = Σ_{α∈[m]} n_α colored balls, n_α balls of color α
- Sample of size s drawn without replacement
- M_α = # of balls of color α in the sample
- Applications in sampling (and subsampling) theory, gambling, coupon-collector problems

Constructing U^+_k from U_k: select a non-α-colored ball in the sample with probability

  ∝ (n_{α(j)}/n) / (1 − n_{α(j)}/n), where α(j) = color of ball j,

and replace it with an α-colored ball.

Y^s_{ge,≠α} ≤ Y_{ge,≠α}, so B = 0 and thus Y^s_ge ≤ Y_ge + |w|.


Summary

Stein’s method applied to produce
- explicit bounds to the limiting normal distribution for the repeatedly-computed MLE (θ̂_1, θ̂_2, ..., θ̂_K) of a parameter vector in group sequential trials
- concentration inequalities for a class of occupancy models with log-concave marginals

Many unanswered questions in statistics are possibly susceptible to Stein’s method:
- concentration inequalities for heavy-tailed distributions
- convergence of empirical measures and dimension reduction methods
  - projections of empirical measures onto subspaces
  - high-dimensional PCA
- other problems in sequential analysis
  - sequentially stopped test statistics
  - stopping rules for high-dimensional MCMC

Thank You!


Back Up Slides


Background: McDiarmid’s Inequality

If
- X_1, ..., X_n are independent,
- Y = f(X_1, ..., X_n) with f measurable, and
- there are c_i such that

  sup_{x_i, x′_i} |f(x_1, ..., x_i, ..., x_n) − f(x_1, ..., x′_i, ..., x_n)| ≤ c_i,

then

  P(Y − μ ≥ t) ≤ exp(−t² / (2 Σ_{i=1}^n c_i²)) for all t > 0,

and a similar left tail bound holds.
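As a quick reference (a sketch, not from the talk, using the convention of this slide):

```python
import numpy as np

def mcdiarmid_right_tail(t, c):
    """P(Y - mu >= t) <= exp(-t^2 / (2 * sum_i c_i^2)), as stated above."""
    c = np.asarray(c, dtype=float)
    return np.exp(-t**2 / (2 * np.sum(c**2)))

print(mcdiarmid_right_tail(5.0, np.ones(100)))  # 100 inputs, each with c_i = 1
```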

Comparison 2: Negative Association

X_1, X_2, ..., X_m are NA if

  E(f(X_i; i ∈ A_1) g(X_j; j ∈ A_2)) ≤ E(f(X_i; i ∈ A_1)) E(g(X_j; j ∈ A_2))

for any
- disjoint A_1, A_2 ⊂ [m],
- coordinate-wise nondecreasing f, g.

Dubashi & Ranjan 98: If X_1, X_2, ..., X_m are NA indicators, then Y = Σ_{i=1}^m X_i satisfies

  P(Y − μ ≥ t) ≤ (μ/(μ + t))^{t+μ} e^t for all t > 0,

which is O(exp(−t log t)) as t → ∞.

Comparison 2: Negative Association

Both NA and our method yield bounds of the same order for Y_ge in
- multinomial counts
- multivariate hypergeometric sampling

but NA cannot be applied to:
- Y_ne in multinomial counts
- Y_ne in multivariate hypergeometric sampling
- Y_ge or Y_ne in the Erdös-Rényi random graph
- Y_ge or Y_ne in germ-grain models


Comparison 3: Certifiable Functions

McDiarmid & Reed 06: If X_1, X_2, ..., X_n are independent and Y = f(X_1, X_2, ..., X_n) where f is certifiable:
- there is c such that changing any coordinate x_j changes the value of f(x) by at most c,
- if f(x) = s then there is C ⊂ [n] with |C| ≤ as + b such that y_i = x_i for all i ∈ C implies f(y) ≥ s,

then for all t > 0,

  P(Y − μ ≤ −t) ≤ exp(−t² / (2c²(aμ + b + t/3c))) = O(exp(−t)) as t → ∞.

A similar right tail bound holds.

Comparison 3: Certifiable Functions

- Asymptotically O(e^{−t}).
- Best possible rate via log Sobolev inequalities(?)

Multinomial occupancy: we showed C = |w|, so if w_α = 1,

  P(Y_ge − μ_ge ≤ −t) ≤ exp(−t² / (2μ_ge)).

Similar bounds hold for the right tail and for Y_ne.


Another Application: Germ-Grain Models

- Used in forestry, wireless sensor networks, material science, ...
- Germs U_α ~ f_α, strictly positive on [0, r)^p
- Grains B_α = closed ball of radius ρ_α centered at U_α
- d: [0, r)^p → {0, 1, ..., m} gives the # of intersections of interest at x ∈ [0, r)^p
- The choice of r relative to p, ρ_α guarantees a nontrivial distribution of

  M(x) = # of grains containing the point x ∈ [0, r)^p = Σ_{α∈[m]} 1{x ∈ B_α}

  Y_ge = ∫_{[0,r)^p} w(x) 1{M(x) ≥ d(x)} dx = the (weighted) volume of d-way intersections of grains
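A rough grid-based sketch of M(x) and Y_ge in dimension p = 2 (not from the talk; the window, radius, grid resolution, and threshold are illustrative, with w ≡ 1 and constant d):

```python
import numpy as np

rng = np.random.default_rng(8)
r, m, rho, d, res = 1.0, 30, 0.1, 2, 200   # window, germs, radius, threshold, grid

U = rng.random((m, 2)) * r                 # germs uniform on [0, r)^2
g = np.linspace(0, r, res, endpoint=False)
xx, yy = np.meshgrid(g, g)
pts = np.stack([xx.ravel(), yy.ravel()], axis=1)

dist2 = ((pts[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
M = (dist2 <= rho**2).sum(axis=1)          # M(x) = # grains containing x

Y_ge = (M >= d).sum() * (r / res) ** 2     # ~ volume where M(x) >= d
print(Y_ge)
```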


Another Application: Germ-Grain Models — main ideas in proof

A different approach:
1. Generate U_0 independent of U_1, ..., U_m
2. Compute U_0, ..., U_{d(U_0)} and set Y^s_ge = Y_ge(M_{d(U_0)})
3. Y^s_ge has the size bias distribution by the Conditional Lemma with A = {M(U_0) ≥ d(U_0)}:

Conditional Lemma (Goldstein & Penrose 10). If P(A) ∈ (0, 1) and Y = P(A|F), then Y^s has the Y-size bias distribution if L(Y^s) = L(Y|A).


Another Application: Germ-Grain Models — main ideas in proof

Argument: generate U_0 ~ w(x)/∫w. Given U_k ~ L(U_0 | M(U_0) ≥ k), with probability π(M_k(U_0), k) choose germ β, from among the germs whose grains do not contain U_0, with probability

  ∝ p_β(U_0) / (1 − p_β(U_0)), where p_β(x) = P(x ∈ B_β),

and replace it with U′_β ~ P_{U_0} to get U_{k+1}, where

  P_{U_0}(V) = P(U_β ∈ V | D(U_β, U_0) ≤ ρ_β).

Otherwise U_{k+1} = U_k.

- The volume increase from replacing U_β by U′_β is at most ν_p|ρ|^p (ν_p = volume of the unit ball)
- The volume increase between U_0 and U_{d(U_0)} is at most ν_p|ρ|^p|d|
- Y^s_ge increases Y_ge by at most ν_p|ρ|^p|d||w|
