Transcript of "Consistency" (stat.ethz.ch, stahel/courses/resampling/sl-asymp-e.pdf)

• 63

4 Asymptotics and Robustness

4.1 Consistency

b Relative frequency: Rn → P〈A〉 = π = E〈Rn〉. More precisely,

limn→∞ P{|Rn − π| > ε} = 0

Law of Large Numbers. Jakob Bernoulli (published posth. 1713).

X ∼ B〈n, π〉 −→ E〈X/n〉 = E〈X〉/n = nπ/n = π
var〈X/n〉 = var〈X〉/n² = nπ(1 − π)/n² = π(1 − π)/n → 0
=⇒ X/n → π.

• 64

[Figure]

• 65

4.1 c General case: X1, X2, ..., Xn, ... independent −→

P{|X̄n − µ| > ε} −→ 0 as n → ∞, for each ε > 0.

[Figure]

d Empirical cumulative distribution function −→ theoretical:

F̂n〈x〉 = (1/n) #〈i : Xi ≤ x〉 −→ F〈x〉 = P〈X ≤ x〉 as n → ∞.

e Consistency of the characteristic values: γ̂ → γ.

• 66

4.1 f Functionals. T〈F〉: theoretical characteristic value; T〈F̂n〉: empirical.

T〈F̂n〉 −→ T〈F〉 as n → ∞.

* Mathematical conditions, "regularity assumptions".

g Integrals. E〈X〉 = ∫ x dF〈x〉

For continuous random variables: E〈X〉 = ∫ x dF〈x〉 = ∫ x f〈x〉 dx

For discrete ones: E〈X〉 = ∫ x dF〈x〉 = ∑x x P〈X = x〉

For empirical distribution functions: ∫ x dF̂n〈x〉 = ∑i xi · (1/n) = X̄
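The last identity can be checked in a few lines: the empirical distribution function puts mass 1/n on each observation, so the integral collapses to the sample average. A minimal sketch, using the pig weights that appear later in these slides:

```python
# The empirical distribution puts mass 1/n on each observation,
# so the integral  ∫ x dF̂n(x) = Σ_i x_i · (1/n)  is just the mean.
xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
n = len(xs)
integral = sum(x * (1 / n) for x in xs)  # ∫ x dF̂n(x)
mean = sum(xs) / n                       # ordinary sample mean
print(integral, mean)  # both ≈ 108.9
```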

• 67

4.1 h Let Fθ be a parametric family with parameter θ = [θ1, ..., θp].

Sample X1, ..., Xn, Xi ∼ Fθ.

Estimator of the parameters: a function of the sample, Tk〈F̂n〉, designed to estimate the parameter θk.

−→ If F = Fθ, then we want Tk〈Fθ〉 = θk: Fisher consistency.

j The Location Model. X1, X2, ..., Xn, Xi ∼ N〈µ, σ0²〉, independent.

Estimator: X̄, or the median?

med〈X1, X2, ..., Xn〉 → med〈N〈µ, σ0²〉〉 = µ
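Both candidates are consistent for µ in this model. A small simulation sketch (my own illustration, not from the slides; the parameter values are arbitrary):

```python
import random
import statistics

# Hypothetical demo: mean and median of N(mu, sigma0^2) samples
# both settle near mu as n grows (consistency).
random.seed(1)
mu, sigma0 = 5.0, 2.0
for n in (100, 10_000):
    xs = [random.gauss(mu, sigma0) for _ in range(n)]
    print(n, round(statistics.mean(xs), 3), round(statistics.median(xs), 3))
```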

• 68

4.2 Maximum likelihood and M-estimators

a Given a parametric family with density fθ〈x〉, write it as f〈x, θ〉.
Given the observation x, f〈x, θ〉 = likelihood of θ.
Same for probabilities Pθ〈X = x〉, also written as f〈x, θ〉.

Maximum likelihood estimator: maximize the likelihood!

θ̂ = arg maxθ〈f〈x, θ〉〉

• 69

4.2 b Sample X1, ..., Xn −→ joint density ∏i f〈xi, θ〉. Maximize this, or

L〈x1, ..., xn; θ〉 = ∑i log〈f〈xi, θ〉〉

or minimize

D〈θ〉 = −2 ∑i log〈f〈xi, θ〉〉 = ∑i ρ〈xi, θ〉

D: deviance. ρ-function: "deviation" of the observation xi from the model〈θ〉.

For the normal distribution with given variance: ρ〈x, µ〉 = ((x − µ)/σ)² + c −→ Least Squares
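For the normal location model this can be made concrete: minimizing the deviance over a grid of µ values lands on the sample mean, i.e. least squares. A sketch with made-up data (the grid search is illustrative only, not how one would compute an MLE in practice):

```python
import math

xs = [4.2, 5.1, 4.8, 5.5, 4.9]
sigma = 1.0

def deviance(mu):
    # rho(x, mu) = ((x - mu)/sigma)^2 + c  with  c = log(2*pi*sigma^2)
    return sum(((x - mu) / sigma) ** 2 + math.log(2 * math.pi * sigma ** 2)
               for x in xs)

grid = [i / 1000 for i in range(4000, 6001)]   # mu from 4.0 to 6.0
mu_hat = min(grid, key=deviance)
print(mu_hat, sum(xs) / len(xs))  # both 4.9: deviance minimum = sample mean
```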

• 70

4.2 c Example: logistic distribution.

fµ,σ〈x〉 = 1 / (e^(z/2) + e^(−z/2))²,   z = (x − µ)/σ

a location-scale family. Log likelihood: −2 log〈e^(z/2) + e^(−z/2)〉.

ρ〈x; µ, σ〉 = 4 log〈e^((x−µ)/(2σ)) + e^(−(x−µ)/(2σ))〉

d Form the derivative with respect to θk and set it to 0:

∂L/∂θk〈x1, ..., xn; θ〉 = ∑(i=1..n) sk〈xi; θ〉,   "likelihood scores"

sk〈x; θ〉 = ∂/∂θk log〈f〈x, θ〉〉 = −(1/2) ∂/∂θk ρ〈x, θ〉.

Maximum likelihood estimator: solve ∑(i=1..n) s〈xi; θ̂〉 = 0 for θ̂!

• 71

4.2 e Logistic distribution. Location-scale: z = (x − µ)/σ

−→ ∂z/∂µ = −1/σ and ∂z/∂σ = −(x − µ)/σ² = −z/σ

sµ〈x; θ〉 = (1/σ) · (e^(z/2) − e^(−z/2)) / (e^(z/2) + e^(−z/2))

sσ〈x; θ〉 = z · sµ〈x; θ〉
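The location score can be sanity-checked by differentiating the logistic log density numerically; a sketch, with x, µ, σ chosen arbitrarily:

```python
import math

def logf(x, mu, sigma):
    # log of the logistic density, z = (x - mu)/sigma
    z = (x - mu) / sigma
    return -math.log(sigma) - 2 * math.log(math.exp(z / 2) + math.exp(-z / 2))

def s_mu(x, mu, sigma):
    # score formula from the slide
    z = (x - mu) / sigma
    return (1 / sigma) * (math.exp(z / 2) - math.exp(-z / 2)) / (
        math.exp(z / 2) + math.exp(-z / 2))

x, mu, sigma, h = 1.3, 0.2, 1.5, 1e-6
numeric = (logf(x, mu + h, sigma) - logf(x, mu - h, sigma)) / (2 * h)
print(numeric, s_mu(x, mu, sigma))  # agree to ~1e-10
```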

f More general:

θ̂ = argminθ ∑(i=1..n) ρ〈Xi, θ〉   or   θ̂ = solution of ∑(i=1..n) ψ〈Xi, θ〉 = 0

M-estimator. Use the score function of the logistic distribution even if you think (or hope) that the observations follow the normal distribution.

• 72

4.2 g M-estimators as functionals.

Tρ〈F〉 = argminθ ∫ ρ〈x, θ〉 dF〈x〉   or   Tψ〈F〉 = solution of ∫ ψ〈x, θ〉 dF〈x〉 = 0.

Estimator: solution of ∫ ψ〈x, θ〉 dF̂n〈x〉 = (1/n) ∑i ψ〈xi, θ〉 = 0

T〈F̂n〉 → T〈F〉. T estimates θ if T〈Fθ〉 = θ =⇒ ∫ ψ〈x, θ〉 dFθ〈x〉 = 0

• 73

4.2 h Example: location model and Huber estimator

f〈x, µ〉 = f〈x − µ, 0〉,   f〈z, 0〉 = (1/√(2π)) e^(−z²/2).

log-likelihood: −(1/2) log〈2π〉 − (x − µ)²/2,   scores: s〈x, µ〉 = x − µ

−→ estimator: ∑i (xi − µ̂) = 0 −→ µ̂ = (1/n) ∑i xi

−→ M-estimator with ψ〈x, µ〉 = x − µ

The median is an M-estimator with

ψ〈x, µ〉 = { −1  if x − µ < 0
           {  1  if x − µ > 0.

Huber estimator: M-estimator with

ψ〈x, µ〉 = { x − µ  if |x − µ| ≤ k
           { −k     if x − µ < −k
           {  k     if x − µ > k

k: tuning constant.
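A minimal sketch of solving the Huber estimating equation ∑i ψ〈xi, µ〉 = 0. The function names and the bisection approach are my own choices (the sum is nonincreasing in µ, so bisection between the sample extremes works); the data are the pig weights used in the influence-function example later in the slides:

```python
def psi_huber(z, k=1.345):
    # Huber psi: identity in the middle, clipped at +-k
    return max(-k, min(k, z))

def huber_location(xs, k=1.345, tol=1e-10):
    lo, hi = min(xs), max(xs)
    # sum_i psi(x_i - mu) is nonincreasing in mu -> bisection
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(psi_huber(x - mid, k) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
print(huber_location(xs))  # ≈ 108.5 for these data
```

Note that without a scale estimate the tuning constant k is in the units of the data; in practice one works with (x − µ)/σ̂.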

• 74

[Figure: ψ〈z, µ〉 for the mean, median, Huber, and biweight estimators]

• 75

4.2 i Maximum likelihood estimators: ∫ s〈x, θ〉 f〈x, θ〉 dx = 0

−→ Fisher-consistent estimators for the parameter(s) of the family!

Proof:

∂/∂θ log〈f〈x, θ〉〉 = (1 / f〈x, θ〉) · ∂/∂θ f〈x, θ〉 = s〈x, θ〉
=⇒ ∂/∂θ f〈x, θ〉 = s〈x, θ〉 f〈x, θ〉

∫ f〈x, θ〉 dx = 1 =⇒ ∫ ∂/∂θ f〈x, θ〉 dx = 0

Hence ∫ s〈x, θ〉 f〈x, θ〉 dx = ∫ ∂/∂θ f〈x, θ〉 dx = 0.

• 76

4.3 Influence Function

b Empirical influence function. Example: weights of pigs

107 108 111 101 97 113 109 105 116 122.

Add an observation x0 −→ mean

(107 + 108 + ... + 122 + x0)/11 = (n x̄ + x0)/(n + 1)
= ((n + 1) x̄ − x̄)/(n + 1) + x0/(n + 1) = x̄ + (x0 − x̄)/(n + 1)
= 108.9 + (x0 − 108.9)/11.

Median: (108 + 109)/2 = 108.5
−→ 108 if x0 ≤ 108
−→ 109 if x0 ≥ 109

• 77

[Figure: estimate as a function of x0 for the arithmetic mean, the 10% trimmed mean, and the median]

• 78

4.3 c Empirical Influence Function or Sensitivity Curve

SC〈x0; T, x1, ..., xn〉 = n (T〈x1, ..., xn, x0〉 − T〈x1, ..., xn〉)
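The sensitivity curve is easy to compute directly; a sketch on the pig-weight data above, comparing the mean and the median (`sc` is my own helper name):

```python
import statistics

# SC(x0) = n * (T(x1..xn, x0) - T(x1..xn)) for the pig-weight data
xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
n = len(xs)

def sc(T, x0):
    return n * (T(xs + [x0]) - T(xs))

print(sc(statistics.mean, 130))    # unbounded: grows linearly in x0
print(sc(statistics.median, 130))  # bounded: the median moves to 109 at most
```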

d Gross Error Model.

(1 − ε) F〈.〉 + ε ∆x0〈.〉

Model for a "wrong" observation or "gross error".

[Figure: panels (i) and (ii) showing G〈x〉, F〈x〉, ∆〈x〉]

More general: G〈x〉 = (1 − ε) F〈x〉 + ε H〈x〉

• 79

4.3 e Influence Function.

IF〈x; T, F〉 = limε→0 ( T〈(1 − ε)F + ε∆x〉 − T〈F〉 ) / ε.

f IF for X̄:

E〈(1 − ε)F + ε∆x〉 = (1 − ε) E〈F〉 + ε E〈∆x〉 = (1 − ε) E〈F〉 + ε x

IF〈x; X̄, F〉 = x − E〈F〉

• 80

4.3 g* Median: med〈F〉 = F⁻¹〈0.5〉.

If x > t_{+ε} = F⁻¹〈0.5/(1 − ε)〉, then med〈(1 − ε)F + ε∆x〉 = t_{+ε}.
If x < t_{−ε} = F⁻¹〈1 − 0.5/(1 − ε)〉, then ... = t_{−ε}.

d/dε F⁻¹〈0.5/(1 − ε)〉 = (1 / f〈F⁻¹〈0.5/(1 − ε)〉〉) · 0.5 · (−1)/(1 − ε)² · (−1)
→ 1/(2 f〈µ〉),   µ = F⁻¹〈0.5〉

IF〈x; med, F〉 = { −1/(2 f〈µ〉)  if x < med〈F〉
                {  1/(2 f〈µ〉)  if x > med〈F〉

It is not continuous at the median, but jumps from −1/(2 f〈µ〉) to 1/(2 f〈µ〉).

• 81

4.3 h Influence function for M-estimators.

IF〈x; T, F〉 = (1/c) ψ〈x, θ〉   with   c = −∫ ∂/∂θ ψ〈x, θ〉 f〈x, θ〉 dx.

The influence function is proportional to ψ.

If T estimates θ, that is, ∫ ψ〈x, θ〉 f〈x, θ〉 dx = 0, then

c = ∫ ψ〈x, θ〉 s〈x, θ〉 f〈x, θ〉 dx.

Special case: maximum likelihood estimator,

c = −∫ ∂/∂θ s〈x, θ〉 f〈x, θ〉 dx = ∫ s〈x, θ〉² f〈x, θ〉 dx.

• 82

* Proof: T〈G〉 for the distribution G = (1 − ε)F + ε∆x:

∫ ψ〈x, T〈G〉〉 dG〈x〉 = (1 − ε) ∫ ψ〈x, T〈G〉〉 dF〈x〉 + ε ψ〈x, T〈G〉〉

ψ〈x, T〈G〉〉 ≈ ψ〈x, T〈F〉〉 + ∂/∂θ ψ〈x, T〈F〉〉 · (T〈G〉 − T〈F〉)

∫ ... ≈ ∫ ψ〈x, T〈F〉〉 dF〈x〉 + (T〈G〉 − T〈F〉) ∫ ∂/∂θ ψ〈x, T〈F〉〉 dF〈x〉

0 = (1 − ε)(T〈G〉 − T〈F〉) ∫ ∂/∂θ ψ〈x, T〈F〉〉 dF〈x〉
    + ε (ψ〈x, T〈F〉〉 + (T〈G〉 − T〈F〉) ∂/∂θ ψ〈x, T〈F〉〉)

T〈G〉 − T〈F〉 ≈ −ε (ψ〈x, T〈F〉〉 + (T〈G〉 − T〈F〉) ∂/∂θ ψ〈x, T〈F〉〉)
              / ((1 − ε) ∫ ∂/∂θ ψ〈x, T〈F〉〉 dF〈x〉)

• 83

4.3 i Linearization.

T〈F̂n〉 ≈ T〈F〉 + (1/n) ∑(i=1..n) IF〈Xi; T, F〉

T〈G〉 ≈ T〈F〉 + ∫ IF〈x; T, F〉 d(G − F)〈x〉

• 84

4.4 Asymptotic Distribution

a Central limit theorem. The distribution of the standardized average

Zn = (X̄n − µ) / (σ/√n)

approaches the standard normal distribution as n grows:

P〈Zn ≤ z〉 −→ Φ〈z〉 as n → ∞

(Φ: cumulative distribution function of the standard normal)

... if the Xi are independent and identically distributed and the variance is finite.

b Expressed differently: X̄n ≈∼ N〈µ, σ²/n〉

• 85

[Figure]

• 86

4.4 c Central Limit Theorem for functionals.

T〈X1, X2, ..., Xn〉 ≈ T〈F〉 + (1/n) ∑(i=1..n) IF〈Xi; T, F〉 ≈∼ N〈T〈F〉, v/n〉

v = var〈IF〈X; T, F〉〉

We always have E〈IF〈X; T, F〉〉 = 0 −→ v = E〈IF〈X; T, F〉²〉.

d Asymptotic variance for M-estimators:

v = (1/c²) ∫ ψ〈x, θ〉² dF〈x〉.

Maximum likelihood estimator: the integral equals c −→ v = 1/c, with

c = ∫ s〈x, θ〉² dFθ〈x〉:   the Fisher information.

• 87

4.4 e Example: Huber estimator for the standard normal distribution

v = ∫ ψ〈x〉² dΦ〈x〉 / (∫ ψ′〈x〉 dΦ〈x〉)²

[Figure: asymptotic variance as a function of the tuning constant k]
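This variance formula can be evaluated numerically; a sketch using a plain Riemann sum (`huber_v` and the integration limits are my own choices, illustrative rather than a production integrator):

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def huber_v(k, lo=-10.0, hi=10.0, steps=100_000):
    # v = ∫ psi^2 dPhi / (∫ psi' dPhi)^2  via midpoint Riemann sum
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        psi = max(-k, min(k, x))
        dpsi = 1.0 if abs(x) <= k else 0.0
        num += psi * psi * phi(x) * h
        den += dpsi * phi(x) * h
    return num / den ** 2

print(huber_v(1.345))  # ≈ 1.05: about 5% more variance than the mean (v = 1)
print(huber_v(1e9))    # ≈ 1.0: a huge k recovers the mean
```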

• 88

4.4 f Example: logistic distribution. Maximum likelihood estimator for L〈µ = 0, σ = 1〉:

∫ (e^(z/2) − e^(−z/2))² / (e^(z/2) + e^(−z/2))⁴ dz = 0.333

= 1 for σ = 1.732.
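The slide's value 0.333 (= 1/3) can be checked by a simple Riemann sum; a sketch (grid and limits are my own choices):

```python
import math

def integrand(z):
    # (e^{z/2} - e^{-z/2})^2 / (e^{z/2} + e^{-z/2})^4
    a, b = math.exp(z / 2), math.exp(-z / 2)
    return (a - b) ** 2 / (a + b) ** 4

h = 0.001
total = sum(integrand(-40 + (i + 0.5) * h) * h for i in range(80_000))
print(total)  # ≈ 0.333
```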

g The maximum likelihood estimator is asymptotically the best estimator:

vT ≥ 1/c for all Fisher consistent estimators.

• 89

4.4 h Tests and confidence intervals.

• Standardized test statistic: T = (θ̂ − θ0) / √(v/n) ≈∼ Φ

• Confidence interval for θ: θ̂ ± 1.96 √(v/n).

i Sums of squares −→ chi-squared distribution!

T = ∑k (Nk − µk)²/µk,   µk = E〈Nk〉,   T ≈∼ χ²ν

due to the Central Limit Theorem.

• 90

4.5 Likelihood Ratio Tests

a Basic idea. The plausibility of a model in the light of data is measured by the likelihood.

A null hypothesis usually restricts a parameter to a specific value (or to one side of a given value, in the one-sided case).

The restriction deteriorates the "fit" of the data to the model: the likelihood decreases. If it decreases too much, the null hypothesis must be rejected.

• 91

−→Test

statstic:

•likelihood

ratio,or

•log

likelihooddiff

erence,or

•deviance

–diff

erenceof

deviancevalues

betw

een“full

model”

(freeparam

eter)and

“reducedmodel”

(parameter

fixed

at“nullvalue”)

• 92

4.5 b Example: simple regression, scale known. Log likelihood:

c − (1/(2σ²)) ∑i (yi − β1 xi − β0)²

Maximum likelihood = Least Squares.

Null hypothesis: β1 = 0, β0 unspecified.

−→ Log likelihood difference:

(c − (1/(2σ²)) ∑i (yi − β1 xi − β0)²) − (c − (1/(2σ²)) ∑i (yi − β0)²)

= (1/(2σ²)) (∑i (yi − β0)² − ∑i (yi − β1 xi − β0)²)

= difference of Sums of Squares (total minus residual)
= Sum of Squares of the Model
... divided by 2σ².
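The identity "twice the log likelihood difference = total SS minus residual SS" can be verified directly for σ = 1; a sketch with hypothetical data:

```python
# Simple regression with known sigma = 1 (so 2*sigma^2 = 2):
# twice the log likelihood difference equals SS_total - SS_residual.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# least-squares fit (maximum likelihood for normal errors)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - ybar) ** 2 for y in ys)   # residual SS under beta1 = 0

ll_full = -0.5 * ss_res                     # log likelihood up to a constant
ll_null = -0.5 * ss_tot
dev_diff = ss_tot - ss_res
print(dev_diff, 2 * (ll_full - ll_null))    # equal: ≈ 9.801 for these data
```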

• 93

σ2unknow

n−→

estimate

fromresiduals!

multiply

by2

−→Difference

ofdeviances

−→F-Test.

(Tobeprecise,

σisestim

atedunder

thealternative,

notunder

thenull

hypothesis...)

Also

applicablefor

multiple

regression,

more

thanone

coefficient

tobetested.

• 94

4.5 c The same properties hold for deviance differences, asymptotically, in general (under conditions):

Under the null hypothesis, the deviance difference (= twice the log likelihood ratio) is asymptotically distributed ∼ χ²df.

Degrees of freedom df = number of parameters that are fixed by the null hypothesis.

This only applies to "nested" models: the reduced model is obtained by restricting the full model.

• 95

4.6 Robust Estimators

a The influence function should be bounded!

b Gross error sensitivity: γ*〈T, F〉 = supx〈|IF〈x; T, F〉|〉.

c Examples:

• Median: |IF〈x; T, F〉| = 1/(2 f〈med〈F〉〉) = γ*
• γ*〈X̄, F〉 = ∞.

• 96

4.6 d Maximal bias. Gross error distribution (1 − ε̃)F〈.〉 + ε̃ H〈.〉 −→ gross error "neighborhood" U〈F, ε〉

b〈ε; T, F〉 = sup(G ∈ U〈F, ε〉)〈|T〈G〉 − T〈F〉|〉.

e Breakdown point ε*〈T, F〉: the minimal neighborhood around F for which T breaks down,

ε*〈T, F〉 = infε〈b〈ε; T, F〉 = ∞〉.

• 97

4.6 f Empirical breakdown point. Sample x1, x2, ..., xn, plus q arbitrary values x*1, x*2, ..., x*q.

Is T〈x1, x2, ..., xn, x*1, x*2, ..., x*q〉 − T〈x1, x2, ..., xn〉 infinite?

−→ maximal proportion q/(n + q) such that the difference remains bounded. Usually independent of x1, x2, ..., xn.

10% trimmed mean −→ 10%
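A minimal empirical-breakdown sketch: append a single gross error and watch which estimates stay bounded. The `trimmed_mean` helper is my own illustration of the trimmed mean mentioned above:

```python
import statistics

def trimmed_mean(data, prop=0.10):
    # drop the prop smallest and prop largest observations, then average
    s = sorted(data)
    q = int(len(s) * prop)
    core = s[q:len(s) - q] if q else s
    return sum(core) / len(core)

xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
bad = [1e9]  # one arbitrary (q = 1) gross error

print(statistics.mean(xs + bad))    # huge: the mean breaks down already
print(statistics.median(xs + bad))  # stays at 109
print(trimmed_mean(xs + bad))       # stays near the good data too
```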

g Importance.

h Compare to a bridge: it can vibrate under low wind, more or less strongly, and break down under a storm, more or less violent.

• 98

4.6 i If we want minimal gross error sensitivity −→ median
−→ asymptotic variance 1.571, γ* = 1.253.

Compare to insurance: a premium for covering risk.

j Optimal compromise: Hampel

For the normal distribution with fixed variance: Huber estimators. A "premium" of 5% more variance for k = 1.345; γ* = 1.637. The compromise pays off!

• 99

4.7 Outlook

a Block on robust regression

b Multidimensional estimators −→ multivariate statistics

c Small Sample Asymptotics.