Consistency - stat.ethz.chstahel/courses/resampling/sl-asymp-e.pdf


  • 63

    4 Asymptotics and Robustness

    4.1 Consistency

    b Relative frequency: Rn → P〈A〉 = π = E〈Rn〉. More precisely,

    lim_{n→∞} P{ |Rn − π| > ε } = 0:

    the Law of Large Numbers, Jakob Bernoulli (published posthumously 1713).

    X ∼ B〈n, π〉 → E〈X/n〉 = E〈X〉/n = nπ/n = π,
    var〈X/n〉 = var〈X〉/n² = nπ(1 − π)/n² = π(1 − π)/n → 0
    ⇒ X/n → π.
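The variance argument on this slide is easy to check by simulation. A small Python sketch (not part of the slides; the probability and sample sizes are illustrative choices):

```python
import random

random.seed(1)
pi = 0.3  # true success probability of the Bernoulli trials

# Relative frequency R_n = X/n for increasing n: the Law of Large Numbers
# says P{|R_n - pi| > eps} -> 0.
for n in (100, 10_000, 100_000):
    x = sum(random.random() < pi for _ in range(n))
    print(f"n={n:>6}  X/n={x / n:.4f}")

# The variance behind the proof: var(X/n) = pi*(1-pi)/n -> 0.
print("var of X/n at n=100_000:", pi * (1 - pi) / 100_000)
```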

  • 64

    [Figure: repeated simulation panels; axis ticks 0 and 1]

  • 65

    4.1 c General case: X1, X2, ..., Xn, ... independent →

    P{ |X̄n − µ| > ε } → 0 as n → ∞, for each ε > 0.

    [Figure: sample paths of X̄n; axis ticks 0–2, horizontal scale 0.0–2.0]

    d Empirical cumulative distribution function → theoretical:

    F̂n〈x〉 = (1/n) #〈i : Xi ≤ x〉 → F〈x〉 = P〈X ≤ x〉 as n → ∞.

    e Consistency of the characteristic values: γ̂ → γ.
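The convergence F̂n → F can be illustrated numerically. A standard-library Python sketch (not from the slides; sample size and evaluation points are arbitrary choices):

```python
import random
from statistics import NormalDist

random.seed(2)
nd = NormalDist()          # standard normal: the "theoretical" F

n = 20_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

def ecdf(x):
    """F_hat_n(x) = (1/n) * #{i : X_i <= x}."""
    return sum(xi <= x for xi in xs) / n

for x in (-1.0, 0.0, 1.5):
    # the empirical and theoretical columns agree closely for large n
    print(x, round(ecdf(x), 4), round(nd.cdf(x), 4))
```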

  • 66

    4.1 f Functionals. T〈F〉 theoretical characteristic value, T〈F̂n〉 empirical:

    T〈F̂n〉 → T〈F〉 as n → ∞.

    * Mathematical conditions, "regularity assumptions".

    g Integrals: E〈X〉 = ∫ x dF〈x〉.

    For continuous random variables: E〈X〉 = ∫ x dF〈x〉 = ∫ x f〈x〉 dx.

    For discrete ones: E〈X〉 = ∫ x dF〈x〉 = ∑x x P〈X = x〉.

    For empirical distribution functions: ∫ x dF̂n〈x〉 = ∑i xi (1/n) = X̄.

  • 67

    4.1 h Let Fθ be a parametric family with parameter θ = [θ1, ..., θp].
    Sample X1, ..., Xn, Xi ∼ Fθ.

    Estimator of the parameters: a function of the sample, Tk〈F̂n〉,
    designed to estimate the parameter θk.

    → If F = Fθ, then we want Tk〈Fθ〉 = θk: Fisher consistency.

    j The Location Model: X1, X2, ..., Xn, Xi ∼ N〈µ, σ0²〉, independent.

    Estimator: X̄ – or the median? med〈X1, X2, ..., Xn〉 → med〈N〈µ, σ0²〉〉 = µ.

  • 68

    4.2 Maximum likelihood and M-estimators

    a Given a parametric family with density fθ〈x〉, written as f〈x, θ〉.
    Given the observation x, f〈x, θ〉 = likelihood of θ.
    Same for probabilities Pθ〈X = x〉, also written as f〈x, θ〉.

    Maximum likelihood estimator: maximize the likelihood!

    θ̂ = arg maxθ〈 f〈x, θ〉 〉

  • 69

    4.2 b Sample X1, ..., Xn → joint density ∏i f〈xi, θ〉. Maximize this, or

    L〈x1, ..., xn; θ〉 = ∑i log〈 f〈xi, θ〉 〉,

    or minimize

    D〈θ〉 = −2 ∑i log〈 f〈xi, θ〉 〉 = ∑i ρ〈xi, θ〉.

    D: deviance. ρ-function: "deviation" of the observation xi from the
    model〈θ〉.

    For the normal distribution with given variance:
    ρ〈x, µ〉 = ((x − µ)/σ)² + c → Least Squares.

  • 70

    4.2 c Example: logistic distribution,

    fµ,σ〈x〉 = (1/σ) · 1/(e^{z/2} + e^{−z/2})²,  z = (x − µ)/σ,

    a location-scale family. Log likelihood: −2 log〈 e^{z/2} + e^{−z/2} 〉,
    up to constants.

    ρ〈x; µ, σ〉 = 4 log〈 e^{(x−µ)/(2σ)} + e^{−(x−µ)/(2σ)} 〉

    d Form the derivative for θk and set it to 0:

    ∂L/∂θk 〈x1, ..., xn; θ〉 = ∑_{i=1}^{n} sk〈xi; θ〉,

    "likelihood scores"

    sk〈x; θ〉 = ∂/∂θk log〈 f〈x, θ〉 〉 = −(1/2) ∂/∂θk ρ〈x, θ〉.

    ∑_{i=1}^{n} s〈xi; θ̂〉 = 0.

    Maximum likelihood estimator: solve for θ̂!

  • 71

    4.2 e Logistic distribution. Location-scale: z = (x − µ)/σ →
    ∂z/∂µ = −1/σ and ∂z/∂σ = −(x − µ)/σ² = −z/σ.

    sµ〈x; θ〉 = (1/σ) (e^{z/2} − e^{−z/2}) / (e^{z/2} + e^{−z/2})

    sσ〈x; θ〉 = z sµ〈x; θ〉

    f More general:

    θ̂ = arg minθ ∑_{i=1}^{n} ρ〈Xi, θ〉  or  θ̂ = solution of
    ∑_{i=1}^{n} ψ〈Xi, θ〉 = 0:

    M estimator.

    Use the score function of the logistic distribution even if you think
    (or hope) that the observations follow the normal distribution.

  • 72

    4.2 g M estimators as functionals.

    Tρ〈F〉 = arg minθ ∫ ρ〈x, θ〉 dF〈x〉  or  Tψ〈F〉 = solution of
    ∫ ψ〈x, θ〉 dF〈x〉 = 0.

    Estimator: solution of ∫ ψ〈x, θ〉 dF̂n〈x〉 = (1/n) ∑i ψ〈xi, θ〉 = 0.

    T〈F̂n〉 → T〈F〉.

    T estimates θ if T〈Fθ〉 = θ  ⇒  ∫ ψ〈x, θ〉 dFθ〈x〉 = 0.

  • 73

    4.2 h Example: location model and Huber estimator.

    f〈x, µ〉 = f〈x − µ, 0〉,  f〈z, 0〉 = (1/√(2π)) e^{−z²/2}.

    log-lik.: −(1/2) log〈2π〉 − (x − µ)²/2,  scores s〈x, µ〉 = x − µ

    → estimator: ∑i (xi − µ̂) = 0 → µ̂ = (1/n) ∑i xi

    → M estimator with ψ〈x, µ〉 = x − µ.

    The median is an M estimator:

    ψ〈x, µ〉 = −1 for x − µ < 0,  1 for x − µ > 0.

    Huber estimator: M estimator with

    ψ〈x, µ〉 = x − µ for |x − µ| ≤ k,  −k for x − µ < −k,  k for x − µ > k;

    k: tuning constant.
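The Huber estimate can be computed by an iteratively reweighted mean, because ψ〈z〉 = w〈z〉·z with weight w〈z〉 = ψ〈z〉/z. A minimal Python sketch (helper names are ours; the scale is fixed at 1 here, whereas in practice residuals are standardized by a robust scale estimate such as the MAD):

```python
def huber_psi(z, k):
    """Huber's psi: identity in the middle, clipped at +/- k."""
    return max(-k, min(k, z))

def huber_location(xs, k=1.345, tol=1e-8, max_iter=200):
    """Solve sum_i psi(x_i - mu) = 0 by iteratively reweighted means."""
    mu = sorted(xs)[len(xs) // 2]   # start from a median-like value
    for _ in range(max_iter):
        # weight w(z) = psi(z)/z = min(1, k/|z|); w = 1 at z = 0
        ws = [huber_psi(x - mu, k) / (x - mu) if x != mu else 1.0 for x in xs]
        new = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(new - mu) < tol:
            break
        mu = new
    return mu

data = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122, 300]  # one gross error
print(huber_location(data, k=5.0))   # stays near the bulk of the data
print(sum(data) / len(data))         # the mean is dragged toward 300
```

At the fixed point, ∑i wi (xi − µ̂) = 0 is exactly ∑i ψ〈xi − µ̂〉 = 0.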

  • 74

    [Figure: ψ〈z, µ〉 against z (−3 to 3) for the mean, median, Huber, and
    biweight estimators]

  • 75

    4.2 i Maximum likelihood estimator: ∫ s〈x, θ〉 f〈x, θ〉 dx = 0 →
    Fisher-consistent estimators for the parameter(s) of the family!

    Proof:

    ∂/∂θ log〈 f〈x, θ〉 〉 = (1/f〈x, θ〉) ∂/∂θ f〈x, θ〉 = s〈x, θ〉
    ⇒ ∂/∂θ f〈x, θ〉 = s〈x, θ〉 f〈x, θ〉

    ∫ f〈x, θ〉 dx = 1  ⇒  ∫ ∂/∂θ f〈x, θ〉 dx = 0.

  • 76

    4.3 Influence Function

    b Empirical influence function. Example: weights of pigs,

    107 108 111 101 97 113 109 105 116 122.

    Additional observation x0 →

    (107 + 108 + ... + 122 + x0)/11
      = (n x̄ + x0)/(n + 1)
      = ((n + 1) x̄ − x̄)/(n + 1) + x0/(n + 1)
      = x̄ + (1/(n + 1)) (x0 − x̄)
      = 108.9 + (x0 − 108.9)/11.

    Median: (108 + 109)/2 = 108.5
    → 108, if x0 ≤ 108;
    → 109, if x0 ≥ 109.
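The arithmetic on this slide can be reproduced directly (a Python sketch; `with_extra` is our helper name):

```python
from statistics import mean, median

pigs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
print(mean(pigs), median(pigs))     # 108.9 and 108.5

def with_extra(est, x0):
    """Estimate after appending one additional observation x0."""
    return est(pigs + [x0])

# The mean follows x0 linearly: xbar + (x0 - xbar)/(n+1).
# The median moves only between 108 and 109.
for x0 in (90, 108.9, 200):
    print(x0, with_extra(mean, x0), with_extra(median, x0))
```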

  • 77

    [Figure: estimate as a function of the added observation x0 (107 to 111):
    arithmetic mean, 10% trimmed mean, median]

  • 78

    4.3 c Empirical influence function or sensitivity curve:

    SC〈x0; T, x1, ..., xn〉 = n ( T〈x1, ..., xn, x0〉 − T〈x1, ..., xn〉 )

    d Gross Error Model: (1 − ε) F〈.〉 + ε ∆x0〈.〉,
    a model for a "wrong" observation or "gross error".

    [Figure: two panels (i) and (ii) showing G〈x〉 as a mixture of F〈x〉
    and ∆〈x〉]

    More general: G〈x〉 = (1 − ε) F〈x〉 + ε H〈x〉.
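A direct implementation of the sensitivity curve, applied to the pig weights of 4.3 b (a sketch; the evaluation points are arbitrary):

```python
from statistics import mean, median

def sensitivity_curve(T, xs, x0):
    """SC(x0; T, x_1..x_n) = n * (T(x_1..x_n, x_0) - T(x_1..x_n))."""
    n = len(xs)
    return n * (T(xs + [x0]) - T(xs))

xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]
for x0 in (95, 108.5, 130):
    # the mean's curve is linear in x0, the median's is bounded
    print(x0, sensitivity_curve(mean, xs, x0), sensitivity_curve(median, xs, x0))
```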

  • 79

    4.3 e Influence Function:

    IF〈x; T, F〉 = lim_{ε→0} ( T〈(1 − ε)F + ε∆x〉 − T〈F〉 ) / ε.

    f IF for X̄:

    E〈(1 − ε)F + ε∆x〉 = (1 − ε) E〈F〉 + ε E〈∆x〉 = (1 − ε) E〈F〉 + ε x

    IF〈x; X̄, F〉 = x − E〈F〉.

  • 80

    4.3 g* Median: med〈F〉 = F⁻¹〈0.5〉.

    If x > t⁺ε = F⁻¹〈0.5/(1 − ε)〉, then med〈(1 − ε)F + ε∆x〉 = t⁺ε.
    If x < t⁻ε = F⁻¹〈1 − 0.5/(1 − ε)〉, then ... = t⁻ε.

    d/dε F⁻¹〈0.5/(1 − ε)〉
      = (1/f〈F⁻¹〈0.5/(1 − ε)〉〉) · 0.5 · (−1)/(1 − ε)² · (−1)
      → 1/(2 f〈µ〉),  µ = F⁻¹〈0.5〉

    IF〈x; med, F〉 = −1/(2 f〈µ〉) for x < med〈F〉,  1/(2 f〈µ〉) for x > med〈F〉;

    it is not continuous at the median, but jumps from −1/(2 f〈µ〉) to
    1/(2 f〈µ〉).

  • 81

    4.3 h Influence function for M estimators:

    IF〈x; F〉 = (1/c) ψ〈x, θ〉  with  c = −∫ ∂/∂θ ψ〈x, θ〉 f〈x, θ〉 dx.

    The influence function is proportional to ψ.

    If T estimates θ, that is, ∫ ψ〈x, θ〉 f〈x, θ〉 dx = 0, then

    c = ∫ ψ〈x, θ〉 s〈x, θ〉 f〈x, θ〉 dx.

    Special case: maximum likelihood estimator,

    c = −∫ ∂/∂θ s〈x, θ〉 f〈x, θ〉 dx = ∫ s〈x, θ〉² f〈x, θ〉 dx.

  • 82

    * Proof: T〈G〉 for the distribution G = (1 − ε)F + ε∆x:

    ∫ ψ〈x̃, T〈G〉〉 dG〈x̃〉 = (1 − ε) ∫ ψ〈x̃, T〈G〉〉 dF〈x̃〉 + ε ψ〈x, T〈G〉〉

    ψ〈x, T〈G〉〉 ≈ ψ〈x, T〈F〉〉 + ∂/∂θ ψ〈x, T〈F〉〉 (T〈G〉 − T〈F〉)

    ∫ ... ≈ ∫ ψ〈x̃, T〈F〉〉 dF〈x̃〉 + (T〈G〉 − T〈F〉) ∫ ∂/∂θ ψ〈x̃, T〈F〉〉 dF〈x̃〉

    ≈ (1 − ε)(T〈G〉 − T〈F〉) ∫ ∂/∂θ ψ〈x̃, T〈F〉〉 dF〈x̃〉
      + ε ( ψ〈x, T〈F〉〉 + (T〈G〉 − T〈F〉) ∂/∂θ ψ〈x, T〈F〉〉 ) = 0

    T〈G〉 − T〈F〉 ≈ − ε ( ψ〈x, T〈F〉〉 + (T〈G〉 − T〈F〉) ∂/∂θ ψ〈x, T〈F〉〉 )
                      / ( (1 − ε) ∫ ∂/∂θ ψ〈x̃, T〈F〉〉 dF〈x̃〉 )

  • 83

    4.3 i Linearization:

    T〈F̂n〉 ≈ T〈F〉 + (1/n) ∑_{i=1}^{n} IF〈Xi; T, F〉

    T〈G〉 ≈ T〈F〉 + ∫ IF〈x; T, F〉 d(G − F)〈x〉

  • 84

    4.4 Asymptotic Distribution

    a Central limit theorem. The distribution of the standardized average

    Zn = (X̄n − µ) / (σ/√n)

    approaches the standard normal distribution as n grows:

    P〈Zn ≤ z〉 → Φ〈z〉 as n → ∞

    (Φ: cumulative distribution function of the standard normal)

    ... if the Xi are independent and identically distributed and the
    variance is finite.

    b Expressed differently: X̄n ≈∼ N〈µ, σ²/n〉.
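A quick simulation of the standardized average Zn (a sketch; the uniform distribution and the sample sizes are arbitrary choices):

```python
import random
from statistics import NormalDist, pstdev

random.seed(3)

# Standardized averages of uniform(0,1) samples: mu = 0.5, sigma^2 = 1/12
mu, sigma = 0.5, (1 / 12) ** 0.5
n = 30
zs = []
for _ in range(20_000):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / n ** 0.5))

# P(Z_n <= z) should be close to Phi(z)
nd = NormalDist()
for z in (-1.0, 0.0, 1.0):
    emp = sum(zi <= z for zi in zs) / len(zs)
    print(z, round(emp, 4), round(nd.cdf(z), 4))
```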

  • 85

    [Figure: simulation panels illustrating the central limit theorem]

  • 86

    4.4 c Central limit theorem for functionals:

    T〈X1, X2, ..., Xn〉 ≈ T〈F〉 + (1/n) ∑_{i=1}^{n} IF〈Xi; T, F〉
      ≈∼ N〈T〈F〉, v/n〉

    v = var〈 IF〈X; T, F〉 〉

    We always have E〈 IF〈X; T, F〉 〉 = 0 → v = E〈 IF〈X; T, F〉² 〉.

    d Asymptotic variance for M estimators:

    v = (1/c²) ∫ ψ〈x, θ〉² dF〈x〉.

    Maximum likelihood estimator: integral = c → v = 1/c,

    c = ∫ s〈x, θ〉² dFθ〈x〉:  Fisher information.

  • 87

    4.4 e Example: Huber estimator for the standard normal distribution,

    v = ∫ ψ〈x〉² dΦ〈x〉 / ( ∫ ψ′〈x〉 dΦ〈x〉 )²

    [Figure: asymptotic variance as a function of the tuning constant k
    (k from 1.0 to 1.6, variance from 0.0 to 2.0)]
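For the Huber ψ both integrals have closed forms under the standard normal, so the curve in the figure can be reproduced numerically. A sketch (the helper name is ours; it uses the standard identity E〈X² 1{|X| ≤ k}〉 = (2Φ〈k〉 − 1) − 2kφ〈k〉):

```python
from statistics import NormalDist

nd = NormalDist()

def huber_asymptotic_variance(k):
    """v = E[psi^2] / (E[psi'])^2 at the standard normal."""
    # E[psi^2]: x^2 inside [-k, k] plus k^2 on both tails
    inside = (2 * nd.cdf(k) - 1) - 2 * k * nd.pdf(k)
    tails = 2 * k * k * nd.cdf(-k)
    denom = 2 * nd.cdf(k) - 1          # E[psi'] = P(|X| <= k)
    return (inside + tails) / denom ** 2

for k in (0.5, 1.0, 1.345, 2.0):
    print(k, round(huber_asymptotic_variance(k), 4))
```

At k = 1.345 the value is about 1.05, the "5% premium" mentioned in 4.6 j; for large k it tends to 1 (the mean), for small k to π/2 ≈ 1.571 (the median).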

  • 88

    4.4 f Example: logistic distribution. Maximum likelihood estimator for
    L〈µ = 0, σ = 1〉:

    ∫ (e^{z/2} − e^{−z/2})² / (e^{z/2} + e^{−z/2})⁴ dz = 0.333

    (= 1 for σ = 1.732).

    g The maximum likelihood estimator is asymptotically the best
    estimator:

    vT ≥ 1/c for all Fisher consistent estimators.
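The value 0.333 (the Fisher information 1/3 of the logistic location family) can be checked by crude numerical integration (a sketch; step size and integration range are arbitrary):

```python
import math

def integrand(z):
    # s_mu(z)^2 * f(z) for the standard logistic written with e^{z/2}:
    # (e^{z/2} - e^{-z/2})^2 / (e^{z/2} + e^{-z/2})^4
    a, b = math.exp(z / 2), math.exp(-z / 2)
    return (a - b) ** 2 / (a + b) ** 4

# midpoint rule over [-40, 40]; the tails beyond that are ~e^{-40}
h = 0.001
total = h * sum(integrand(-40 + (i + 0.5) * h) for i in range(80_000))
print(round(total, 4))   # 0.3333 = 1/3
```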

  • 89

    4.4 h Tests and confidence intervals.

    • Standardized test statistic: T = (θ̂ − θ0) / √(v/n) ≈∼ Φ

    • Confidence interval for θ: θ̂ ± 1.96 · √(v/n).

    i Sums of squares → chi-squared distribution!

    T = ∑k (Nk − µk)²/µk,  µk = E〈Nk〉,  T ≈∼ χ²ν.

    A contradiction to the Central Limit Theorem?

  • 90

    4.5 Likelihood Ratio Tests

    a Basic idea. The plausibility of a model in the light of data is
    measured by the likelihood.

    A null hypothesis usually restricts a parameter to a specific value
    (or to one side of a given value, in the one-sided case).

    The restriction deteriorates the "fit" of the data to the model: the
    likelihood decreases. If it decreases too much, the null hypothesis
    must be rejected.

  • 91

    → Test statistic:

    • likelihood ratio, or
    • log likelihood difference, or
    • deviance: difference of deviance values between the "full model"
      (free parameter) and the "reduced model" (parameter fixed at the
      "null value")

  • 92

    4.5 b Example: simple regression, scale known. Log likelihood:

    c − (1/(2σ²)) ∑i (yi − β1 xi − β0)²

    Maximum likelihood = Least Squares.

    Null hypothesis: β1 = 0, β0 unspecified.

    → Log likelihood difference

    ( c − (1/(2σ²)) ∑i (yi − β̂1 xi − β̂0)² ) − ( c − (1/(2σ²)) ∑i (yi − β̂0)² )

    = (1/(2σ²)) ( ∑i (yi − β̂0)² − ∑i (yi − β̂1 xi − β̂0)² )

    = difference of Sums of Squares (total minus residual)

    = Sum of Squares of the Model ... divided by 2σ².
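The deviance-difference computation can be traced on simulated data (a sketch; the data-generating values are arbitrary and the fit is plain least squares, i.e. the maximum likelihood fit):

```python
import random

random.seed(7)

# Simulated simple regression with known sigma = 1
n = 200
beta0, beta1, sigma = 2.0, 0.5, 1.0
xs = [i / n for i in range(n)]
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

# least squares fit = maximum likelihood estimate
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - ybar) ** 2 for y in ys)   # fit under H0: beta1 = 0

# deviance difference = (SS_total - SS_residual) / sigma^2
# (= twice the log likelihood ratio)
dev_diff = (ss_tot - ss_res) / sigma ** 2
print(dev_diff)
```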

  • 93

    σ² unknown → estimate it from the residuals! Multiply by 2
    → difference of deviances → F-test.

    (To be precise, σ is estimated under the alternative, not under the
    null hypothesis ...)

    Also applicable for multiple regression, with more than one
    coefficient to be tested.

  • 94

    4.5 c The same properties hold for deviance differences,
    asymptotically, in general (under conditions):

    Under the null hypothesis, the deviance difference (= twice the log
    likelihood ratio) is asymptotically distributed ∼ χ²df.

    Degrees of freedom df = number of parameters that are fixed by the
    null hypothesis.

    This only applies to "nested" models: the reduced model is obtained
    by restricting the full model.

  • 95

    4.6 Robust Estimators

    a The influence function should be bounded!

    b Gross error sensitivity: γ*〈T, F〉 = supx〈 |IF〈x; T, F〉| 〉.

    c Examples:

    • Median: |IF〈x; T, F〉| = 1/(2 f〈med〈F〉〉) = γ*
    • γ*〈X̄, F〉 = ∞.

  • 96

    4.6 d Maximal bias. Gross error distribution (1 − ε̃)F〈.〉 + ε̃ H〈.〉
    → gross error "neighborhood" U〈F, ε〉,

    b〈ε; T, F〉 = sup_{G ∈ U〈F, ε〉} 〈 |T〈G〉 − T〈F〉| 〉.

    e Breakdown point ε*〈T, F〉: minimal "radius" ε of a neighborhood
    around F for which T breaks down,

    ε*〈T, F〉 = infε〈 b〈ε; T, F〉 = ∞ 〉.

  • 97

    4.6 f Empirical breakdown point. Sample x1, x2, ..., xn, plus q
    arbitrary values x*1, x*2, ..., x*q.

    Is T〈x1, x2, ..., xn, x*1, x*2, ..., x*q〉 − T〈x1, x2, ..., xn〉
    infinite?

    → Proportion q/(n + q) such that ... remains bounded.
    Usually independent of x1, x2, ..., xn.

    10% trimmed mean → 10%.

    g Importance.

    h Compare to a bridge: it can vibrate under low wind, more or less
    strongly, and break down under a storm, more or less violent.
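The empirical breakdown point of mean and median can be probed directly (a sketch reusing the pig weights; `shifts` is our helper name):

```python
from statistics import mean, median

xs = [107, 108, 111, 101, 97, 113, 109, 105, 116, 122]

def shifts(est, q, big=1e9):
    """Add q arbitrary (here: huge) observations; does the estimate explode?"""
    return est(xs + [big] * q) - est(xs)

for q in range(1, 12):
    print(q, shifts(mean, q) > 1e6, shifts(median, q) > 1e6)
```

The mean explodes already at q = 1 (breakdown point 0); the median stays bounded until the contaminants make up half the data (q = n = 10, proportion q/(n + q) = 50%).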

  • 98

    4.6 i If we want minimal gross error sensitivity → median
    → asymptotic variance 1.571, γ* = 1.253.

    Compare to insurance: a premium for covering risk.

    j Optimal compromise: Hampel.

    For the normal distribution with fixed variance: Huber estimators.
    A "premium" of 5% more variance for k = 1.345; γ* = 1.637.
    The compromise pays off!

  • 99

    4.7 Outlook

    a Block on robust regression

    b Multidimensional estimators → multivariate statistics

    c Small sample asymptotics.