Bootstrap Confidence Intervals
ST552 Lecture 13

Charlotte Wickham
2019-02-11

Motivation

The inferences we’ve covered so far relied on our assumption of Normal errors:

$$\epsilon \sim N(0, \sigma^2 I_{n \times n})$$

For example, we’ve seen that, under this assumption, the least squares estimates are also Normally distributed:

$$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$$

If the errors aren’t truly Normally distributed, what distribution do the estimates have?


Warm-up: Your Turn

Imagine the errors are in fact $t_3$ distributed.

With your neighbour: design a simulation to understand the distribution of the least squares estimates.


Example: 1. Fix n, fix X

[Figure: plot of Response against x]

Example: 1. Fix β, find y

[Figure: plot of Response against x]

Example: 2. Simulate errors, find y

[Figure: plot of Response against x]

Example: 3. Find least squares line

[Figure: plot of Response against x]

Example: 4. Repeat #2. and #3. many times

[Figure: plot of Response against x]

Example: Examine distribution of estimates

[Figure: density plots in two panels: Estimate of intercept and Estimate of slope]

Example: Compared to theory

[Figure: density plots in two panels, Estimate of intercept and Estimate of slope, compared to theory]
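
The simulation steps above can be sketched in a few lines of Python. This is only an illustration: the design points, sample size, true parameter values, number of repetitions and seed are all assumptions chosen here, not the values behind the lecture's plots, and `fit_line`, `draw_t` and `simulate_estimates` are hypothetical helper names.

```python
import random
import statistics


def fit_line(x, y):
    """Return the least squares (intercept, slope) for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def draw_t(df, rng):
    """Draw one t-distributed value as Z / sqrt(V / df), with V chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # Gamma(shape=df/2, scale=2) is chi-square(df)
    return z / (v / df) ** 0.5


def simulate_estimates(x, beta0, beta1, n_sims, rng):
    """Steps 2 to 4: simulate t_3 errors, build y, refit, and repeat n_sims times."""
    estimates = []
    for _ in range(n_sims):
        errors = [draw_t(3, rng) for _ in x]
        y = [beta0 + beta1 * xi + e for xi, e in zip(x, errors)]
        estimates.append(fit_line(x, y))
    return estimates


rng = random.Random(552)                   # arbitrary seed, for reproducibility
x = [float(i) for i in range(1, 21)]       # step 1: n = 20 and a fixed X (assumed)
beta0, beta1 = 2.0, 0.4                    # step 1: fixed true parameters (assumed)
sims = simulate_estimates(x, beta0, beta1, 2000, rng)
intercepts = [b0 for b0, _ in sims]
slopes = [b1 for _, b1 in sims]

# Compare to theory: t_3 errors have variance 3, so sd(slope) = sqrt(3 / Sxx).
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
print("simulated sd of slope:", statistics.stdev(slopes))
print("theoretical sd of slope:", (3.0 / sxx) ** 0.5)
```

A histogram or density estimate of `intercepts` and `slopes` would play the role of the density plots on these slides.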

When the errors aren’t Normal: CLT

Think of our estimates as linear combinations of the errors, i.e. a sort of average of i.i.d. random variables. Some version of the Central Limit Theorem will apply.

For large samples, even when the errors aren’t Normal, we have approximately

$$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$$
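
To see why the estimates are linear combinations of the errors, substitute the model into the least squares formula. This is a short derivation sketch; it assumes the model $y = X\beta + \epsilon$ with fixed $X$ and errors that have mean zero and common variance $\sigma^2$. The weights $c_{ji}$ are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\hat{\beta} &= (X^T X)^{-1} X^T y \\
            &= (X^T X)^{-1} X^T (X\beta + \epsilon) \\
            &= \beta + (X^T X)^{-1} X^T \epsilon, \\
\hat{\beta}_j - \beta_j &= \sum_{i=1}^{n} c_{ji}\, \epsilon_i,
  \qquad c_{ji} = \left[ (X^T X)^{-1} X^T \right]_{ji}.
\end{align*}
```

Each $\hat{\beta}_j$ is its true value plus a weighted sum of the independent errors, which is the kind of quantity a Central Limit Theorem describes. The mean $\beta$ and variance $\sigma^2 (X^T X)^{-1}$ hold without Normality; only the Normal shape is approximate.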


Summary so far

If we knew the error distribution and true parameters, we could use simulation to understand the sampling distribution of the least squares estimates.

Simulation can also be used to demonstrate the CLT at work in regression.


Bootstrap confidence intervals

In practice, with data in front of us, we don’t know the distribution of the errors (nor the true parameter values).

The bootstrap is one approach to estimate the sampling distribution of $\hat{\beta}$, by using the simulation idea, and substituting in our best guesses for the things we don’t know.


Bootstrapping regression

(Model based resampling)

0. Fit model and find estimates, $\hat{\beta}$, and residuals, $e_i$
1. Fix $X$
2. For $k = 1, \ldots, B$
   i. Generate errors, $\epsilon^*_i$, sampled with replacement from $e_i$
   ii. Construct $y^*$, using the model, $y^* = \hat{y} + \epsilon^*$
   iii. Use least squares to find $\hat{\beta}^*_{(k)}$
3. Examine the distribution of $\hat{\beta}^*$ and compare to $\hat{\beta}$

One confidence interval for $\beta_j$ is given by the 2.5% and 97.5% quantiles of the distribution of $\hat{\beta}^*_j$. (Known as the Percentile method; there are other (better?) methods.)
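
The algorithm above can be sketched for simple linear regression as follows. This is a minimal illustration under stated assumptions: the data are made-up toy values with skewed errors (not the Galapagos data), `B = 2000` and the seed are arbitrary choices, and `fit_line`, `residual_bootstrap` and `percentile_interval` are hypothetical helper names.

```python
import random
import statistics


def fit_line(x, y):
    """Return the least squares (intercept, slope) for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_bootstrap(x, y, n_boot, rng):
    """Model-based (residual) bootstrap for simple linear regression."""
    b0, b1 = fit_line(x, y)                                  # step 0: fit the model
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    boot = []
    for _ in range(n_boot):                                  # step 2, with X fixed (step 1)
        e_star = rng.choices(residuals, k=len(residuals))    # i. resample residuals
        y_star = [fi + e for fi, e in zip(fitted, e_star)]   # ii. y* = fitted + e*
        boot.append(fit_line(x, y_star))                     # iii. refit by least squares
    return (b0, b1), boot


def percentile_interval(values):
    """Return the 2.5% and 97.5% quantiles (the Percentile method)."""
    cuts = statistics.quantiles(values, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Toy data, made up for illustration: a linear trend plus skewed errors.
rng = random.Random(13)
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.4 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

(b0_hat, b1_hat), boot = residual_bootstrap(x, y, 2000, rng)
slope_lo, slope_hi = percentile_interval([b1 for _, b1 in boot])
print("slope estimate:", b1_hat)
print("95% percentile interval for slope:", (slope_lo, slope_hi))
```

Step 3 corresponds to inspecting `boot`, for instance by plotting the bootstrap slopes and marking `b1_hat`.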


Example: Faraway Galapagos Islands

(I’ll illustrate with simple linear regression; Faraway does the multiple regression case in 3.6.)

[Figure: Observed data, Species against Elevation]

Bootstrap: 1. Find $\hat{\beta}$, $\hat{y}$, and $e_i$.

[Figure: Observed data, Species against Elevation]

Bootstrap: Using fixed $X$, $\hat{\beta}$ from observed data

[Figure: Observed data, Species against Elevation]

Bootstrap: 2. Resample residuals to construct bootstrapped response

[Figure: Bootstrapped data, Species against Elevation]

Bootstrap: 3. Fit regression model to bootstrapped response

[Figure: Bootstrapped data, Species against Elevation]

Bootstrap: 4. Repeat #2. and #3. many times

[Figure: Species against Elevation]

Examine distribution of estimates

[Figure: density plots in two panels: Bootstrap estimates of intercept and Bootstrap estimates of slope]

High level: bootstrap idea

We don’t know the distribution of the errors, but our best guess is probably the empirical c.d.f. of the residuals.

Sampling from a random variable with a c.d.f. defined as the empirical c.d.f. of the residuals boils down to sampling with replacement from the residuals.
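
In symbols, a brief sketch of why this holds (the notation $\hat{F}_n$ is introduced here, and the residuals are assumed distinct for simplicity):

```latex
\begin{align*}
\hat{F}_n(t) &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ e_i \le t \}, \\
\epsilon^* \sim \hat{F}_n
  &\iff P(\epsilon^* = e_i) = \frac{1}{n}, \quad i = 1, \ldots, n.
\end{align*}
```

So one draw from $\hat{F}_n$ picks a residual uniformly at random, and $n$ independent draws amount to sampling $n$ residuals with replacement.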


Limitations

We might rely on bootstrap confidence intervals when we are worried about the assumption of Normal errors. But, there are limitations.

• We still rely on the assumption that the errors are independent and identically distributed.

• Generally scaled residuals are used (residuals don’t have the same variance, more later).

• An alternative bootstrap resamples the $(y_i, x_{i1}, \ldots, x_{ip})$ vectors, i.e. resamples the rows of the data, a.k.a. the resampling cases bootstrap.
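
The resampling cases alternative in the last bullet might look like the sketch below for simple linear regression. As before, the data are made-up toy values, the number of resamples and seed are arbitrary, and `fit_line` and `case_bootstrap` are hypothetical helper names. Redrawing a resample in which every x value is identical is a choice made here so that the slope is always defined.

```python
import random
import statistics


def fit_line(x, y):
    """Return the least squares (intercept, slope) for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def case_bootstrap(x, y, n_boot, rng):
    """Resample whole (x_i, y_i) rows with replacement and refit each time."""
    n = len(x)
    boot = []
    while len(boot) < n_boot:
        idx = rng.choices(range(n), k=n)       # row indices, drawn with replacement
        x_star = [x[i] for i in idx]
        y_star = [y[i] for i in idx]
        if len(set(x_star)) < 2:               # slope undefined: redraw (assumed rule)
            continue
        boot.append(fit_line(x_star, y_star))
    return boot


# Toy data, made up for illustration.
rng = random.Random(24)
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.4 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0_hat, b1_hat = fit_line(x, y)
boot = case_bootstrap(x, y, 1000, rng)
boot_slopes = [b1 for _, b1 in boot]
print("slope estimate:", b1_hat)
print("mean of bootstrap slopes:", statistics.fmean(boot_slopes))
```

Unlike the model based version, X is not held fixed here: each resample has its own set of x values.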
