
Transcript of The Automatic Construction of Bootstrap Confidence Intervals

  • The Automatic Construction of Bootstrap Confidence Intervals

    Bradley Efron

    Stanford University

  • The Standard Intervals

    θ̂ ± z(α) σ̂        ( z(0.975) = 1.96 )

    θ̂  a point estimate of θ

    σ̂  an estimate of its standard error
       (Taylor series, jackknife, bootstrap)

    Automatic!

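    A minimal R sketch of this standard-interval recipe, with the bootstrap supplying σ̂ for a sample mean (the data and statistic here are hypothetical, purely for illustration):

        set.seed(1)
        x <- rnorm(50, mean = 10, sd = 3)            # hypothetical data
        thetahat <- mean(x)                          # point estimate
        boot  <- replicate(2000, mean(sample(x, replace = TRUE)))
        sehat <- sd(boot)                            # bootstrap estimate of the standard error
        thetahat + c(-1, 1) * qnorm(0.975) * sehat   # standard 95% interval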

  • Poisson Example

    x ∼ Poisson(θ)

    Observe x = 16

    95% central intervals (Right/Left = (Up − θ̂)/(θ̂ − Lo), the ratio of the
    interval's right arm to its left arm):

                    Lo       Up     Right/Left
      Standard     8.16    23.84       1
      Exact        9.47    25.41       1.44
      Bootstrap    9.42    25.53       1.45

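    A quick R check of the standard endpoints and of the right/left asymmetry ratio, taking the exact and bootstrap endpoints as quoted in the table:

        x  <- 16
        se <- sqrt(x)                        # Poisson: variance = theta, estimated by x
        x + c(-1, 1) * qnorm(0.975) * se     # standard interval: 8.16, 23.84
        rl <- function(lo, up, th) (up - th) / (th - lo)
        rl(9.47, 25.41, 16)                  # exact interval: about 1.44
        rl(9.42, 25.53, 16)                  # bootstrap interval: about 1.45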

  • [Figure: Poisson example, x ∼ Poisson(θ), observe x = 16; the probability
    function f(x) with exact (black) and standard (red) 95% endpoints and the
    observed value 16 marked on the x axis.]

  • Student-t Corrections

    xi ∼ N(θ, σ²) independently,  i = 1, 2, . . . , n

    θ̂ = x̄ ,   σ̂² = Σ (xi − x̄)² / (n − 1)

    θ ∈ θ̂ ± t_{n−1}(α) σ̂

    t_{15}(0.975) = 2.13   cf.  z(0.975) = 1.96

    The t correction is of order O(1/n)

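    The correction factor itself is easy to verify in R for n = 16:

        qt(0.975, df = 15)                  # 2.13, the Student-t quantile
        qnorm(0.975)                        # 1.96, the normal quantile
        qt(0.975, df = 15) / qnorm(0.975)   # about 1.09: the O(1/n) widening of the interval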

  • Bootstrap Confidence Intervals

    bca algorithm makes 3 corrections to standard endpoints

    Corrections of order O(1/√n)

    Nonparametric: bcajack (automatic)

    Parametric: bcapar (almost automatic)


  • The Diabetes Data

    n = 442 patients

    xi = (pi, yi)

    pi  vector of 10 baseline predictors

    yi  measure of disease progression at 1 year

    x   a 442 × 11 matrix

    θ̂ = adjusted R² of y regressed on p
        ( lm(y ∼ p)$adj.r.squared = 0.507 )

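    The statistic function passed to bcajack later in the talk is not shown in this transcript; one plausible version, a sketch assuming the data arrive as the 442 × 11 matrix with the ten predictors in columns 1-10 and y in column 11, is:

        rfun <- function(xy) {                    # adjusted R^2 of y regressed on the 10 predictors
          xy <- as.data.frame(xy)
          names(xy)[11] <- "y"
          summary(lm(y ~ ., data = xy))$adj.r.squared
        }
        ## rfun(x)   # about 0.507 for the diabetes data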

  • Diabetes Data  n = 442 subjects, 10 predictors, response y “progression”

      age    sex    bmi    map     tc    ldl    hdl    tch    ltg    glu        y
      .04    .05    .06    .02   −.04   −.03   −.04    .00    .02   −.02    −1.13
      .00   −.04   −.05   −.03   −.01   −.02    .07   −.04   −.07   −.09   −77.13
      .09    .05    .04   −.01   −.05   −.03   −.03    .00    .00   −.03   −11.13
     −.09   −.04   −.01   −.04    .01    .02   −.04    .03    .02   −.01    53.87
      .01   −.04   −.04    .02    .00    .02    .01    .00   −.03   −.05   −17.13
     −.09   −.04   −.04   −.02   −.07   −.08    .04   −.08   −.04   −.10   −55.13
      ...    ...    ...    ...    ...    ...    ...    ...    ...    ...      ...


  • Nonparametric Bootstrapping

    Data matrix x (n × p): rows are p-vectors

    θ̂ = t(x) estimates parameter of interest θ

    Bootstrap data matrix x∗: randomly choose n rows from x, with replacement

    Gives θ̂∗ = t(x∗)

    { θ̂∗(1), θ̂∗(2), . . . , θ̂∗(B) } provides inferences for θ
    ( σ̂_boot = standard deviation of the θ̂∗ values )

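    A direct R sketch of this resampling scheme, reusing the hypothetical rfun above and assuming x is the 442 × 11 diabetes matrix:

        thetahat  <- rfun(x)                       # 0.507 for the diabetes data
        B <- 2000
        thetastar <- replicate(B, rfun(x[sample(nrow(x), replace = TRUE), , drop = FALSE]))
        sd(thetastar)                # sigma_boot, the bootstrap standard error
        mean(thetastar < thetahat)   # fraction of replications below thetahat (37.2% in the talk)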

  • [Figure: histogram of the B = 2000 nonparametric bootstrap replications θ̂∗
    of adj.r.squared (roughly 0.45 to 0.60), with the observed value .507
    marked; only 37.2% of the replications are less than .507.]

  • [Figure: bca (black) and standard (green) two-sided interval limits for the
    adjusted R² statistic, plotted against coverage level (68%, 80%, 90%, 95%);
    from bcajack, red bars indicate Monte Carlo error.]

  • Correcting the Standard Intervals

    Standard CIs assume θ̂ ∼ N(θ, σ²) with σ² constant.

    bca makes 3 corrections:

    1  for non-normality: using the bootstrap cdf Ĝ

    2  for bias: using ẑ0 = Φ⁻¹ Ĝ(θ̂)
       ( diabetes: ẑ0 = Φ⁻¹(0.372) = −0.327 )

    3  for “acceleration” (nonconstant σ²): using the jackknife

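    The diabetes bias-correction value can be verified directly:

        qnorm(0.372)    # -0.327 = z0, since 37.2% of the bootstrap replications fell below thetahat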

  • The bca Level α Endpoint

    θ̂_bca(α) = Ĝ⁻¹[ Φ( ẑ0 + (ẑ0 + z(α)) / (1 − â (ẑ0 + z(α))) ) ]

    Ĝ   bootstrap cdf

    Φ   standard normal cdf

    ẑ0 = Φ⁻¹ Ĝ(θ̂)

    â   acceleration estimate

    Second-order accuracy  Claimed coverage accurate to O(1/n),
    compared to O(1/√n) for standard CIs

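    A compact R sketch of this endpoint formula (not the bcaboot implementation: z0 is recomputed from the bootstrap replications and the acceleration a is taken as given):

        bca_endpoint <- function(thetastar, thetahat, a, alpha) {
          z0   <- qnorm(mean(thetastar < thetahat))                 # z0 = Phi^{-1} Ghat(thetahat)
          zalf <- qnorm(alpha)
          lev  <- pnorm(z0 + (z0 + zalf) / (1 - a * (z0 + zalf)))   # corrected level
          quantile(thetastar, lev, names = FALSE)                   # Ghat^{-1} of the corrected level
        }
        ## bca_endpoint(thetastar, thetahat = 0.507, a = -0.004, alpha = c(0.025, 0.975))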

  • Jackknife Calculations

    x(i) is the (n − 1) × p matrix having xi removed from x

    θ̂(i) = t(x(i))

    θ̂(·) = Σ θ̂(i) / n

    Di = θ̂(i) − θ̂(·)

    Jackknife standard error   σ̂_jack = [ ((n − 1)/n) Σ Di² ]^(1/2)

    Acceleration estimate      â = (1/6) Σ Di³ / [ Σ Di² ]^(3/2)

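    These two formulas in R, a sketch that again assumes the data matrix x and statistic rfun from above (the sign convention for Di follows the slide):

        jack_se_accel <- function(x, rfun) {
          n    <- nrow(x)
          th_i <- sapply(1:n, function(i) rfun(x[-i, , drop = FALSE]))   # leave-one-out estimates
          D    <- th_i - mean(th_i)                                      # Di = theta(i) - theta(.)
          se   <- sqrt((n - 1) / n * sum(D^2))                           # jackknife standard error
          a    <- sum(D^3) / (6 * sum(D^2)^1.5)                          # acceleration estimate
          c(sd.jack = se, a = a)
        }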

  • Grouped Jackknife

    n can be enormous these days

    Partition x into m groups of size g = n/m:

    Xk = { xi1, xi2, . . . , xig }   (k = 1, 2, . . . , m)

    Now X = (X1, X2, . . . , Xm) has just m “points”

    Diabetes: m = 40, g = 11 (2 left over)

    bcajack(x, 2000, rfun, m = 40)

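    A sketch of the grouping step; the random assignment of rows shown here is my own illustration (with n = 442 and m = 40, two groups come out one row larger), whereas the m = 40 argument in the bcajack call on the slide requests this grouping internally:

        m      <- 40
        groups <- sample(rep(1:m, length.out = nrow(x)))   # random assignment of rows to m groups
        th_k   <- sapply(1:m, function(k) rfun(x[groups != k, , drop = FALSE]))  # delete one group at a time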

  • Partial output of bcajack(x, 2000, rfun, m = 40)
    x = diabetes data, rfun = adjusted R²

        α      bcalims   jacksd   standard
       .025     .424      .004      .443
       .16      .465      .001      .474
       .5       .497      .001      .507
       .84      .529      .001      .539
       .975     .560      .002      .570


  • Monte Carlo Error

    Two types of error: “sampling” and “Monte Carlo”

    σ̂_boot concerns the sampling error of θ̂

    Was B = 2000 enough bootstrap replications?

    Answer: Jackknife the bootstrap replications.


  • Jackknife Estimates of Monte Carlo Error

    Randomly partition the bootstrap replications
    { θ̂∗(1), θ̂∗(2), . . . , θ̂∗(2000) } into 10 groups of 200 each

    Remove one group at a time, and rerun bcajack

    Use the jackknife formula to get “jacksd”

    No new bootstrapping required

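    A sketch of this jackknife-after-bootstrap idea applied to a single bca endpoint; it re-applies the bca_endpoint sketch from above to the retained replications rather than rerunning bcajack itself, and uses the slide-14 jackknife formula with n = J = 10 groups:

        J    <- 10
        grp  <- sample(rep(1:J, length.out = length(thetastar)))    # 10 groups of 200 replications
        lims <- sapply(1:J, function(j)
          bca_endpoint(thetastar[grp != j], thetahat = 0.507, a = -0.004, alpha = 0.025))
        sqrt((J - 1) / J * sum((lims - mean(lims))^2))               # "jacksd" for the .025 limit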

  • [Figure: bca (black) and standard (green) limits versus coverage level
    (68%, 80%, 90%, 95%), as in the earlier plot: the Monte Carlo error is
    small for the upper limits, not so small for the lower limits.]

  • More output of bcajack(x, 2000, rfun, m = 40)

                    θ     sdboot      z0        a     sdjack
      estimate    .507     .033     −.327    −.004     .034
      jacksd      .000     .001      .027     .000     .000


  • Parametric Example: The Pediatric Death Data

    800 cases from an African facility for very sick babies

    600 survived, 200 died

    11 predictors  x = (respiration, heart rate, weight, . . . )

    Logistic regression model

    πi death probability (i = 1, 2, . . . , 800)

    yi = 1 with probability πi,  0 with probability 1 − πi,
    where πi = 1 / (1 + e^(−xi′α))

    θ = “resp” coefficient α1


  • Parametric Bootstrapping

    glm(y ∼ X, binomial), with X the 800 × 11 predictor matrix,
    gives the MLE α̂ and θ̂ = α̂1 = 0.943

    π̂i = 1 / (1 + e^(−xi′α̂)),   i = 1, 2, . . . , 800

    Bootstrap sample:  y∗i = 1 with prob π̂i,  0 with prob 1 − π̂i

    glm(y∗ ∼ X, binomial) gives α̂∗ and θ̂∗ = α̂∗1

    Also need b∗ = X′y∗ for the bca calculations

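    An R sketch of this parametric bootstrap loop, assuming X is the 800 × 11 predictor matrix with “resp” in its first column and y the 0/1 outcome vector (intercept handling is glossed over; glm adds one by default here):

        fit   <- glm(y ~ X, family = binomial)    # MLE alpha-hat; theta-hat = coef(fit)[2]
        pihat <- fitted(fit)                      # pi_i = 1 / (1 + exp(-x_i' alpha-hat))
        B     <- 2000
        reps  <- replicate(B, {
          ystar <- rbinom(length(y), 1, pihat)              # parametric resample of the outcomes
          fstar <- glm(ystar ~ X, family = binomial)
          c(coef(fstar)[2],                                 # theta*-hat: the "resp" coefficient
            t(X) %*% ystar)                                 # b* = X'y*, needed by the bca calculations
        })

    The replications θ̂∗ together with the sufficient statistics b∗ are the ingredients that bcapar works from on the next slide.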

  • Program bcapar

    B bootstrap samples give

    θ̂∗ =