4 Asymptotics and Robustness

4.1 Consistency
b  Relative frequency: Rn → P⟨A⟩ = π = E⟨Rn⟩. More precisely,
lim_{n→∞} P{|Rn − π| > ε} = 0
Law of Large Numbers, Jakob Bernoulli (published posthumously 1713).
X ∼ B⟨n, π⟩ ⟶
E⟨X/n⟩ = E⟨X⟩/n = nπ/n = π
var⟨X/n⟩ = var⟨X⟩/n² = nπ(1−π)/n² = π(1−π)/n → 0
⇒ X/n → π.
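A small simulation sketch (not part of the original notes; numpy assumed, the function name is made up for illustration) shows the relative frequency X/n stabilizing around π as n grows:

```python
import numpy as np

def relative_frequency(n, pi, seed=0):
    """Simulate X ~ B<n, pi> and return the relative frequency X/n."""
    rng = np.random.default_rng(seed)
    return rng.binomial(n, pi) / n

# X/n approaches pi as n grows, since var<X/n> = pi*(1-pi)/n -> 0
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n, 0.3))
```

For n = 1 000 000 the standard deviation of X/n is about 0.00046, so the printed values cluster tightly around 0.3.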
c  General case: X1, X2, ..., Xn independent ⟶
P{|X̄n − µ| > ε} → 0 as n → ∞, for each ε > 0.
d  Empirical cumulative distribution function → theoretical:
F̂n⟨x⟩ = (1/n) #⟨i : Xi ≤ x⟩ → F⟨x⟩ = P⟨X ≤ x⟩ as n → ∞.
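A quick numerical check (a sketch, not from the notes; numpy assumed) compares F̂n with the theoretical F at a few points:

```python
import numpy as np
from statistics import NormalDist

def ecdf(sample, x):
    """Empirical cdf: Fn<x> = (1/n) #<i : X_i <= x>."""
    return np.mean(sample <= x)

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, size=100_000)

F = NormalDist().cdf  # theoretical F<x> = P<X <= x> for the standard normal
for x in (-1.0, 0.0, 1.0):
    print(x, ecdf(sample, x), F(x))
```

With n = 100 000 the empirical and theoretical values agree to about two or three decimals, as the convergence F̂n⟨x⟩ → F⟨x⟩ suggests.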
e  Consistency of the characteristic values: γ̂ → γ.
f  Functionals. T⟨F⟩ theoretical characteristic value, T⟨Fn⟩ empirical.
T⟨Fn⟩ → T⟨F⟩ as n → ∞
* under mathematical conditions, "regularity assumptions".
g  Integrals: E⟨X⟩ = ∫ x dF⟨x⟩
For continuous random variables: E⟨X⟩ = ∫ x dF⟨x⟩ = ∫ x f⟨x⟩ dx
For discrete ones: E⟨X⟩ = ∫ x dF⟨x⟩ = Σx x P⟨X = x⟩
For empirical distribution functions: ∫ x dFn⟨x⟩ = Σi xi (1/n) = X̄.
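The empirical integral is just the sample mean, since each observation carries mass 1/n. A one-line check (a sketch; numpy assumed, using the pig-weight data that appears in Section 4.3):

```python
import numpy as np

x = np.array([107, 108, 111, 101, 97, 113, 109, 105, 116, 122.0])

# integral of x dFn<x>: each observation has probability mass 1/n
integral = np.sum(x * (1.0 / len(x)))
print(integral, x.mean())  # both equal the sample mean 108.9
```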
h  Let Fθ be a parametric family with parameter θ = [θ1, ..., θp].
Sample X1, ..., Xn, Xi ∼ Fθ.
Estimator of the parameters: a function of the sample, Tk⟨Fn⟩, designed to estimate the parameter θk.
⟶ If F = Fθ, then we want Tk⟨Fθ⟩ = θk: Fisher consistency.
j  The Location Model: X1, X2, ..., Xn, Xi ∼ N⟨µ, σ0²⟩, independent.
Estimator: X̄, or the median?
med⟨X1, X2, ..., Xn⟩ → med⟨N⟨µ, σ0²⟩⟩ = µ.
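Both candidates are consistent in the location model; a simulation sketch (not from the notes, numpy assumed) shows mean and median converging to µ:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma0 = 5.0, 2.0

# both the mean and the median approach mu = 5 as n grows
for n in (100, 100_000):
    x = rng.normal(mu, sigma0, size=n)
    print(n, x.mean(), np.median(x))
```

Which of the two to prefer is a question of efficiency and robustness, taken up in the following sections.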
4.2 Maximum Likelihood and M-Estimators
a  Given a parametric family with density fθ⟨x⟩, written as f⟨x, θ⟩. Given the observation x, f⟨x, θ⟩ is the likelihood of θ. Same for probabilities Pθ⟨X = x⟩, also written as f⟨x, θ⟩.
Maximum likelihood estimator: maximize the likelihood!
θ̂ = arg maxθ ⟨f⟨x, θ⟩⟩
b  Sample X1, ..., Xn ⟶ joint density ∏i f⟨xi, θ⟩. Maximize this, or
L⟨x1, ..., xn; θ⟩ = Σi log⟨f⟨xi, θ⟩⟩,
or minimize
D⟨θ⟩ = −2 Σi log⟨f⟨xi, θ⟩⟩ = Σi ρ⟨xi, θ⟩.
D: deviance. ρ-function: "deviation" of the observation xi from the model⟨θ⟩.
For the normal distribution with given variance:
ρ⟨x, µ⟩ = ((x−µ)/σ)² + c ⟶ Least Squares.
c  Example: logistic distribution.
fµ,σ⟨x⟩ = 1/(e^{z/2} + e^{−z/2})², z = (x−µ)/σ,
a location-scale family. Log likelihood: −2 log⟨e^{z/2} + e^{−z/2}⟩.
ρ⟨x; µ, σ⟩ = 4 log⟨e^{(x−µ)/(2σ)} + e^{−(x−µ)/(2σ)}⟩
d  Form the derivative with respect to θk and set it to 0:
∂L/∂θk ⟨x1, ..., xn; θ⟩ = Σ_{i=1}^n sk⟨xi; θ⟩, the "likelihood scores"
sk⟨x; θ⟩ = ∂/∂θk log⟨f⟨x, θ⟩⟩ = −(1/2) ∂/∂θk ρ⟨x, θ⟩.
Σ_{i=1}^n s⟨xi; θ⟩ = 0.
Maximum likelihood estimator: solve for θ!
e  Logistic distribution. Location-scale: z = (x−µ)/σ ⟶
∂z/∂µ = −1/σ and ∂z/∂σ = −(x−µ)/σ² = −z/σ
sµ⟨x; θ⟩ = (1/σ) (e^{z/2} − e^{−z/2}) / (e^{z/2} + e^{−z/2})
sσ⟨x; θ⟩ = z · sµ⟨x; θ⟩
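The score formula for µ can be verified numerically against a finite-difference derivative of the log density (a sketch, not from the notes; note that the ratio in sµ is tanh⟨z/2⟩):

```python
import math

def log_f(x, mu, sigma):
    """Log density of the logistic family, up to a constant in mu:
    log f = -2 log(e^{z/2} + e^{-z/2}), z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return -2.0 * math.log(math.exp(z / 2) + math.exp(-z / 2))

def s_mu(x, mu, sigma):
    """Score for mu: (1/sigma)(e^{z/2}-e^{-z/2})/(e^{z/2}+e^{-z/2}) = tanh(z/2)/sigma."""
    z = (x - mu) / sigma
    return math.tanh(z / 2) / sigma

# finite-difference check: d/dmu log f matches the score formula
x, mu, sigma, h = 1.3, 0.2, 1.5, 1e-6
numeric = (log_f(x, mu + h, sigma) - log_f(x, mu - h, sigma)) / (2 * h)
print(numeric, s_mu(x, mu, sigma))
```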
f  More generally:
θ̂ = arg minθ Σ_{i=1}^n ρ⟨Xi, θ⟩, or θ̂ = solution of Σ_{i=1}^n ψ⟨Xi, θ⟩ = 0:
M-estimator.
Use the scores function of the logistic distribution even if you think (or hope) that the observations follow the normal distribution.
g  M-estimators as functionals.
Tρ⟨F⟩ = arg minθ ∫ ρ⟨x, θ⟩ dF⟨x⟩, or Tψ⟨F⟩ = solution of ∫ ψ⟨x, θ⟩ dF⟨x⟩ = 0.
Estimator: solution of ∫ ψ⟨x, θ⟩ dFn⟨x⟩ = (1/n) Σi ψ⟨xi, θ⟩ = 0.
T⟨Fn⟩ → T⟨F⟩.
T estimates θ if T⟨Fθ⟩ = θ ⇒ ∫ ψ⟨x, θ⟩ dFθ⟨x⟩ = 0.
h  Example: location model and Huber estimator.
f⟨x, µ⟩ = f⟨x−µ, 0⟩, f⟨z, 0⟩ = (1/√(2π)) e^{−z²/2}.
Log-likelihood: −(1/2) log⟨2π⟩ − (x−µ)²/2, scores s⟨x, µ⟩ = x−µ
⟶ estimator: Σi (xi − µ) = 0 ⟶ µ̂ = (1/n) Σi xi
⟶ M-estimator with ψ⟨x, µ⟩ = x−µ.
The median is an M-estimator with
ψ⟨x, µ⟩ = −1 for x−µ < 0, +1 for x−µ > 0.
Huber estimator: M-estimator with
ψ⟨x, µ⟩ = x−µ  for |x−µ| ≤ k,
         = −k   for x−µ < −k,
         = k    for x−µ > k,
with k a tuning constant.
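A minimal implementation sketch (not from the notes; numpy assumed, function names made up). Since Σi ψ⟨xi − µ⟩ is decreasing in µ, the estimating equation can be solved by bisection. In practice k applies to residuals standardized by a scale estimate; here it is used on the raw data only to keep the sketch short, which makes the estimate behave much like the median:

```python
import numpy as np

def psi_huber(r, k=1.345):
    """Huber psi: r for |r| <= k, clipped at +/-k outside."""
    return np.clip(r, -k, k)

def m_estimate_location(x, k=1.345, tol=1e-10):
    """Solve sum_i psi<x_i - mu> = 0 by bisection (the sum decreases in mu)."""
    lo, hi = x.min(), x.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_huber(x - mid, k).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# pig weights, then the same data with one gross error appended:
x = np.array([107, 108, 111, 101, 97, 113, 109, 105, 116, 122.0])
print(m_estimate_location(x), m_estimate_location(np.append(x, 1000.0)))
print(x.mean(), np.append(x, 1000.0).mean())  # the mean explodes instead
```

The Huber estimate moves by less than one unit under the gross error, while the mean jumps to around 190.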
[Figure: ψ⟨z, µ⟩ for the mean (Mittel), median, Huber, and biweight estimators.]
i  Maximum likelihood estimators satisfy ∫ s⟨x, θ⟩ f⟨x, θ⟩ dx = 0:
Fisher-consistent estimators for the parameter(s) of the family!
Proof: ∂/∂θ log⟨f⟨x, θ⟩⟩ = (1/f⟨x, θ⟩) · ∂/∂θ f⟨x, θ⟩ = s⟨x, θ⟩
⇒ ∂/∂θ f⟨x, θ⟩ = s⟨x, θ⟩ f⟨x, θ⟩.
∫ f⟨x, θ⟩ dx = 1 ⇒ ∫ ∂/∂θ f⟨x, θ⟩ dx = 0.
4.3 Influence Function
b  Empirical influence function. Example: weights of pigs,
107 108 111 101 97 113 109 105 116 122.
Additional observation x0 ⟶
(107 + 108 + ... + 122 + x0)/11 = (n x̄ + x0)/(n+1)
= ((n+1) x̄ − x̄ + x0)/(n+1) = x̄ + (1/(n+1)) (x0 − x̄)
= 108.9 + (x0 − 108.9)/11.
Median: (108 + 109)/2 = 108.5
⟶ 108 if x0 ≤ 108, ⟶ 109 if x0 ≥ 109.
[Figure: estimate (Schätzwert) as a function of the additional observation x0, for the arithmetic mean, the 10% trimmed mean, and the median.]
c  Empirical influence function or sensitivity curve:
SC⟨x0; T, x1, ..., xn⟩ = n (T⟨x1, ..., xn, x0⟩ − T⟨x1, ..., xn⟩)
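The sensitivity curve can be evaluated directly from the definition (a sketch, not from the notes; numpy assumed), here for the pig-weight data:

```python
import numpy as np

def sensitivity_curve(T, x, x0):
    """SC<x0; T, x1..xn> = n (T<x1..xn, x0> - T<x1..xn>)."""
    n = len(x)
    return n * (T(np.append(x, x0)) - T(x))

x = np.array([107, 108, 111, 101, 97, 113, 109, 105, 116, 122.0])
for x0 in (90.0, 108.5, 130.0):
    print(x0, sensitivity_curve(np.mean, x, x0), sensitivity_curve(np.median, x, x0))
```

For the mean the curve is linear and unbounded in x0 (it equals n (x0 − x̄)/(n+1)); for the median it stays between fixed limits, here ±5.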
d  Gross Error Model:
(1−ε) F⟨.⟩ + ε ∆x0⟨.⟩,
a model for a "wrong" observation or "gross error".
[Figure: G⟨x⟩ = (1−ε)F⟨x⟩ + ε∆x0⟨x⟩ in two panels (i) and (ii), together with F⟨x⟩ and ∆⟨x⟩.]
More generally: G⟨x⟩ = (1−ε) F⟨x⟩ + ε H⟨x⟩.
e  Influence Function:
IF⟨x; T, F⟩ = lim_{ε→0} (T⟨(1−ε)F + ε∆x⟩ − T⟨F⟩) / ε.
f  IF for X̄:
E⟨(1−ε)F + ε∆x⟩ = (1−ε) E⟨F⟩ + ε E⟨∆x⟩ = (1−ε) E⟨F⟩ + ε x
IF⟨x; X̄, F⟩ = x − E⟨F⟩.
g*  Median: med⟨F⟩ = F^{−1}⟨0.5⟩.
If x > t_ε^+ = F^{−1}⟨0.5/(1−ε)⟩, then med⟨(1−ε)F + ε∆x⟩ = t_ε^+.
If x < t_ε^− = F^{−1}⟨1 − 0.5/(1−ε)⟩, then ... = t_ε^−.
d/dε F^{−1}⟨0.5/(1−ε)⟩ = (1 / f⟨F^{−1}⟨0.5/(1−ε)⟩⟩) · (0.5 · (−1)/(1−ε)²) · (−1)
→ 1/(2 f⟨µ⟩), µ = F^{−1}⟨0.5⟩
IF⟨x; med, F⟩ = −1/(2f⟨µ⟩) for x < med⟨F⟩, +1/(2f⟨µ⟩) for x > med⟨F⟩:
not continuous at the median, but jumps from −1/(2f⟨µ⟩) to 1/(2f⟨µ⟩).
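For the standard normal, 1/(2f⟨µ⟩) = 1/(2φ⟨0⟩) = √(π/2) ≈ 1.2533. A numeric check of the limit (a sketch, not from the notes; stdlib only), using the contaminated median t_ε^+ = F^{−1}⟨0.5/(1−ε)⟩ for x above the median:

```python
from statistics import NormalDist
import math

N = NormalDist()  # standard normal: F = Phi, f = phi

# contaminated median for x above the median: t_eps = F^{-1}<0.5/(1-eps)>
eps = 1e-6
t_eps = N.inv_cdf(0.5 / (1 - eps))

# (t_eps - med<F>)/eps should approach IF = 1/(2 f<mu>) = sqrt(pi/2)
numeric_if = t_eps / eps
theoretical_if = 1.0 / (2.0 * N.pdf(0.0))
print(numeric_if, theoretical_if, math.sqrt(math.pi / 2))
```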
h  Influence function for M-estimators:
IF⟨x; F⟩ = (1/c) ψ⟨x, θ⟩ with c = −∫ ∂/∂θ ψ⟨x, θ⟩ f⟨x, θ⟩ dx.
The influence function is proportional to ψ.
If T estimates θ, that is, ∫ ψ⟨x, θ⟩ f⟨x, θ⟩ dx = 0, then
c = ∫ ψ⟨x, θ⟩ s⟨x, θ⟩ f⟨x, θ⟩ dx.
Special case, maximum likelihood estimator:
c = −∫ ∂/∂θ s⟨x, θ⟩ f⟨x, θ⟩ dx = ∫ s⟨x, θ⟩² f⟨x, θ⟩ dx.
*Proof: T⟨G⟩ for the distribution G = (1−ε)F + ε∆x:
∫ ψ⟨x̃, T⟨G⟩⟩ dG⟨x̃⟩ = (1−ε) ∫ ψ⟨x̃, T⟨G⟩⟩ dF⟨x̃⟩ + ε ψ⟨x, T⟨G⟩⟩
ψ⟨x, T⟨G⟩⟩ ≈ ψ⟨x, T⟨F⟩⟩ + ∂/∂θ ψ⟨x, T⟨F⟩⟩ · (T⟨G⟩ − T⟨F⟩)
∫ ... ≈ ∫ ψ⟨x̃, T⟨F⟩⟩ dF⟨x̃⟩ + (T⟨G⟩ − T⟨F⟩) ∫ ∂/∂θ ψ⟨x̃, T⟨F⟩⟩ dF⟨x̃⟩
≈ (1−ε) (T⟨G⟩ − T⟨F⟩) ∫ ∂/∂θ ψ⟨x̃, T⟨F⟩⟩ dF⟨x̃⟩ + ε (ψ⟨x, T⟨F⟩⟩ + (T⟨G⟩ − T⟨F⟩) ∂/∂θ ψ⟨x, T⟨F⟩⟩) = 0
T⟨G⟩ − T⟨F⟩ ≈ −ε (ψ⟨x, T⟨F⟩⟩ + (T⟨G⟩ − T⟨F⟩) ∂/∂θ ψ⟨x, T⟨F⟩⟩) / ((1−ε) ∫ ∂/∂θ ψ⟨x̃, T⟨F⟩⟩ dF⟨x̃⟩)
Dividing by ε and letting ε → 0 yields IF⟨x; T, F⟩ = ψ⟨x, T⟨F⟩⟩ / c.
i  Linearization:
T⟨Fn⟩ ≈ T⟨F⟩ + (1/n) Σ_{i=1}^n IF⟨Xi; T, F⟩
T⟨G⟩ ≈ T⟨F⟩ + ∫ IF⟨x; T, F⟩ d(G − F)
4.4 Asymptotic Distribution
a  Central limit theorem. The distribution of the standardized average
Zn = (X̄n − µ) / (σ/√n)
approaches the standard normal distribution as n grows:
P⟨Zn ≤ z⟩ → Φ⟨z⟩ as n → ∞
(Φ: cumulative distribution function of the standard normal)
... if the Xi are independent and identically distributed and the variance is finite.
b  Expressed differently: X̄n ≈∼ N⟨µ, σ²/n⟩.
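A simulation sketch (not from the notes; numpy assumed) for uniform observations, where µ = 0.5 and σ² = 1/12:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
n, reps = 50, 50_000

# standardized averages Z_n = (mean - mu) / (sigma / sqrt(n))
means = rng.random((reps, n)).mean(axis=1)
z = (means - 0.5) / (np.sqrt(1 / 12) / np.sqrt(n))

# P<Z_n <= q> should be close to Phi<q>
for q in (-1.0, 0.0, 1.0):
    print(q, np.mean(z <= q), NormalDist().cdf(q))
```

Already at n = 50 the empirical probabilities match Φ to about two decimals.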
c  Central limit theorem for functionals:
T⟨X1, X2, ..., Xn⟩ ≈ T⟨F⟩ + (1/n) Σ_{i=1}^n IF⟨Xi; T, F⟩ ≈∼ N⟨T⟨F⟩, v/n⟩,
v = var⟨IF⟨X; T, F⟩⟩.
We always have E⟨IF⟨X; T, F⟩⟩ = 0 ⟶ v = E⟨IF⟨X; T, F⟩²⟩.
d  Asymptotic variance for M-estimators:
v = (1/c²) ∫ ψ⟨x, θ⟩² dF⟨x⟩.
Maximum likelihood estimator: integral = c ⟶ v = 1/c,
c = ∫ s⟨x, θ⟩² dFθ⟨x⟩: the Fisher information.
e  Example: Huber estimator for the standard normal distribution,
v = ∫ ψ⟨x⟩² dΦ⟨x⟩ / (∫ ψ′⟨x⟩ dΦ⟨x⟩)².
[Figure: asymptotic variance (as. Varianz) of the Huber estimator as a function of the tuning constant k.]
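The variance formula above can be evaluated numerically (a sketch, not from the notes; numpy assumed, simple grid integration):

```python
import numpy as np

def huber_asymptotic_variance(k):
    """v = int psi^2 dPhi / (int psi' dPhi)^2 for the Huber psi
    at the standard normal, via grid integration."""
    x = np.linspace(-8, 8, 160_001)
    dx = x[1] - x[0]
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    psi = np.clip(x, -k, k)
    dpsi = (np.abs(x) <= k).astype(float)  # psi' = 1 inside, 0 outside
    num = np.sum(psi**2 * phi) * dx
    den = (np.sum(dpsi * phi) * dx) ** 2
    return num / den

# v -> 1 as k -> infinity (the mean); v<1.345> is about 1.05
print(huber_asymptotic_variance(1.345), huber_asymptotic_variance(8.0))
```

The value v⟨1.345⟩ ≈ 1.05 is the 5% "premium" mentioned in Section 4.6.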
f  Example: logistic distribution. Maximum likelihood estimator for L⟨µ = 0, σ = 1⟩:
c = ∫ (e^{z/2} − e^{−z/2})² / (e^{z/2} + e^{−z/2})⁴ dz = 0.333,
so v = 1/c = 3 for σ = 1, i.e. an asymptotic standard deviation of √3 = 1.732.
g  The maximum likelihood estimator is asymptotically the best estimator:
vT ≥ 1/c for all Fisher-consistent estimators.
h  Tests and confidence intervals.
• Standardized test statistic: T = (θ̂ − θ0) / √(v/n) ≈∼ N⟨0, 1⟩.
• Confidence interval for θ: θ̂ ± 1.96 √(v/n).
i  Sums of squares ⟶ chi-squared distribution!
T = Σk (Nk − µk)² / µk, µk = E⟨Nk⟩, T ≈∼ χ²ν.
A contradiction to the central limit theorem?
4.5 Likelihood Ratio Tests
a  Basic idea. The plausibility of a model in the light of the data is measured by the likelihood. A null hypothesis usually restricts a parameter to a specific value (or to one side of a given value, in the one-sided case). The restriction deteriorates the "fit" of the data to the model: the likelihood decreases. If it decreases too much, the null hypothesis must be rejected.
⟶ Test statistic:
• the likelihood ratio, or
• the log likelihood difference, or
• the deviance difference: difference of deviance values between the "full model" (free parameter) and the "reduced model" (parameter fixed at the "null value").
b  Example: simple regression, scale known. Log likelihood:
c − (1/(2σ²)) Σi (yi − β1 xi − β0)²
Maximum likelihood = Least Squares.
Null hypothesis: β1 = 0, β0 unspecified.
⟶ Log likelihood difference:
(c − (1/(2σ²)) Σi (yi − β̂1 xi − β̂0)²) − (c − (1/(2σ²)) Σi (yi − β̂0)²)
= (1/(2σ²)) (Σi (yi − β̂0)² − Σi (yi − β̂1 xi − β̂0)²)
= difference of sums of squares (total minus residual)
= sum of squares of the model ... divided by 2σ².
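The deviance difference for this example can be computed directly (a sketch, not from the notes; numpy assumed, data simulated under the null hypothesis for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
x = np.linspace(0, 1, 40)
y = 2.0 + 0.0 * x + rng.normal(0, sigma, size=x.size)  # data generated under H0

# full model: least squares fit of beta0 + beta1 x
b1, b0 = np.polyfit(x, y, 1)
ss_residual = np.sum((y - b0 - b1 * x) ** 2)

# reduced model under H0 (beta1 = 0): beta0_hat = mean of y
ss_total = np.sum((y - y.mean()) ** 2)

# deviance difference = twice the log likelihood difference
dev_diff = (ss_total - ss_residual) / sigma**2
print(dev_diff)  # under H0, approximately chi-squared with 1 degree of freedom
```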
σ² unknown ⟶ estimate it from the residuals! Multiply by 2
⟶ difference of deviances ⟶ F-test.
(To be precise, σ is estimated under the alternative, not under the null hypothesis ...)
Also applicable for multiple regression, with more than one coefficient to be tested.
c  The same properties hold for deviance differences, asymptotically, in general (under conditions):
Under the null hypothesis, the deviance difference (= twice the log likelihood ratio) is asymptotically distributed ∼ χ²df.
Degrees of freedom: df = number of parameters that are fixed by the null hypothesis.
This only applies to "nested" models: the reduced model is obtained by restricting the full model.
4.6 Robust Estimators
a  The influence function should be bounded!
b  Gross error sensitivity:
γ*⟨T, F⟩ = supx ⟨|IF⟨x; T, F⟩|⟩.
c  Examples:
• Median: |IF⟨x; T, F⟩| = 1/(2 f⟨med⟨F⟩⟩) = γ*
• γ*⟨X̄, F⟩ = ∞.
d  Maximal bias. Gross error distribution (1−ε)F⟨.⟩ + εH⟨.⟩ ⟶ gross error "neighborhood" U⟨F, ε⟩.
b⟨ε; T, F⟩ = sup_{G∈U⟨F,ε⟩} ⟨T⟨G⟩⟩.
e  Breakdown point ε*⟨T, F⟩: the minimal "radius" ε of a neighborhood around F for which T breaks down,
ε*⟨T, F⟩ = infε ⟨b⟨ε; T, F⟩ = ∞⟩.
f  Empirical breakdown point. Sample x1, x2, ..., xn, plus q arbitrary values x*1, x*2, ..., x*q. Is
T⟨x1, x2, ..., xn, x*1, x*2, ..., x*q⟩ − T⟨x1, x2, ..., xn⟩
infinite? ⟶ Proportion q/(n+q) such that ... remains bounded.
Usually independent of x1, x2, ..., xn.
10% trimmed mean ⟶ 10%.
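The empirical breakdown point can be probed directly (a sketch, not from the notes; numpy assumed, function names made up) by appending q gross errors at a huge value:

```python
import numpy as np

def breaks_down(T, x, q, bad=1e12):
    """Append q gross errors at a huge value; does the estimate explode?"""
    contaminated = np.append(x, [bad] * q)
    return abs(T(contaminated) - T(x)) > 1e6

x = np.arange(1.0, 21.0)  # n = 20 clean observations

print(breaks_down(np.mean, x, 1))     # the mean breaks with a single bad value
print(breaks_down(np.median, x, 1))   # the median does not
print(breaks_down(np.median, x, 21))  # q/(n+q) > 1/2: the median breaks too
```

This illustrates breakdown points of 0 for the mean (asymptotically) and 1/2 for the median.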
g  Importance.
h  Compare to a bridge: it can vibrate under low wind, more or less strongly, and break down under a storm, more or less violent.
i  If we want minimal gross error sensitivity ⟶ median
⟶ asymptotic variance 1.571, γ* = 1.253.
Compare to insurance: a premium for covering risk.
j  Optimal compromise: Hampel.
For the normal distribution with fixed variance: Huber estimators.
A "premium" of 5% more variance for k = 1.345; γ* = 1.637. The compromise pays off!
4.7 Outlook
a  A block on robust regression.
b  Multidimensional estimators ⟶ multivariate statistics.
c  Small sample asymptotics.