
STAT 141 REGRESSION: CONFIDENCE vs PREDICTION INTERVALS (12/2/04)
Inference for coefficients
Mean response at x vs. new observation at x

Linear Model (or Simple Linear Regression) for the population. ("Simple" means a single explanatory variable; in fact we can easily add more variables.)
x – explanatory variable (independent variable / predictor); Y – response (dependent variable).

Probability model for linear regression:

$$Y_i = \alpha + \beta x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2) \ \text{independent deviations}$$
$\alpha + \beta x_i$ = mean response at $x = x_i$.

Goals:
• unbiased estimates of the three parameters $(\alpha, \beta, \sigma^2)$
• tests for null hypotheses: $\alpha = \alpha_0$ or $\beta = \beta_0$
• C.I.'s for $\alpha$, $\beta$, or to predict $E(Y \mid X = x_0)$.

(A model is our ‘stereotype’ – a simplification for summarizing the variation in data)

For example, if we simulate data from a temperature model of the form
$$Y_i = 65 + \tfrac{1}{3} x_i + \epsilon_i, \qquad x_i = 1, 2, \ldots, 30,$$
the model is exactly true, by construction.
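A minimal R sketch of this simulation; the handout does not specify the error SD, so sd = 2 below is an illustrative assumption:

## Simulate Y_i = 65 + (1/3) x_i + eps_i for x_i = 1,...,30
## (sd = 2 is assumed for illustration; the handout leaves sigma unspecified)
set.seed(141)
x <- 1:30
y <- 65 + x/3 + rnorm(30, mean = 0, sd = 2)
fit <- lm(y ~ x)
coef(fit)   # estimates (a, b) should land near (65, 1/3)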

An equivalent statement of the LM model: assume the $x_i$ fixed, the $Y_i$ independent, and
$$Y_i \mid x_i \sim N(\mu_{y|x_i}, \sigma^2), \qquad \mu_{y|x_i} = \alpha + \beta x_i \ \text{(the population regression line)}$$

Remark: Suppose that $(X_i, Y_i)$ are a random sample from a bivariate normal distribution with means $(\mu_X, \mu_Y)$, variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$, and suppose that we condition on the observed values $X = x_i$. Then the data $(x_i, y_i)$ satisfy the LM model. Indeed, we saw last time that $Y \mid x_i \sim N(\mu_{y|x_i}, \sigma^2_{y|x_i})$, with
$$\mu_{y|x_i} = \alpha + \beta x_i, \qquad \sigma^2_{Y|X} = (1 - \rho^2)\,\sigma_Y^2$$

Example: Galton's fathers and sons: $\mu_{y|x} = 35 + 0.5x$; $\sigma = 2.34$ (in inches). For comparison: $\sigma_Y = 2.7$. Note: $\sigma$ is the SD of $y$ given $x$. It is less than the unconditional SD of $Y$, $\sigma_Y$ (without $x$ fixed); e.g., consider the extreme case when $y = x$: $\sigma = 0$, but $\sigma_Y = \sigma_X$.

Estimating parameters $(\alpha, \beta, \sigma^2)$ from a sample
Recall the least squares estimates: $(a, b)$ are chosen to minimize $\sum (y_i - a - b x_i)^2$:
$$a = \bar{y} - b\bar{x}, \qquad b = r\,\frac{s_y}{s_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
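As a quick check, a sketch computing $(a, b)$ from these closed-form formulas, continuing the simulated x, y from above, and comparing with lm():

## Least squares by hand vs. lm()
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)
c(a = a, b = b)
cor(x, y) * sd(y) / sd(x)   # same b, via r * s_y / s_x
coef(lm(y ~ x))             # lm() returns the identical pair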

Fitted values: $\hat{y}_i = a + b x_i$
Residuals:
$$e_i = y_i - \hat{y}_i = y_i - a - b x_i = \alpha + \beta x_i + \epsilon_i - a - b x_i = (\alpha - a) + (\beta - b) x_i + \epsilon_i$$
where $e_i$ = deviations in the sample, $\epsilon_i$ = deviations in the model.

Estimate of $\sigma^2$: Again, use the squared scale:
$$s^2_{Y|x} = \frac{1}{n-2} \sum_i e_i^2 = \frac{1}{n-2} \sum_i (y_i - a - b x_i)^2, \qquad s = \sqrt{s^2_{Y|x}}$$
$(n-2)$ = "degrees of freedom". Why $n-2$?


1. Recall that with an SRS, $Y_i \sim N(\mu, \sigma^2)$ and $s^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2$ (and $\bar{y}$ estimates $\mu$ – we estimate one parameter). Here there are two parameter estimates: $a$ estimates $\alpha$, and $b$ estimates $\beta$.

2. There are two linear constraints on the $e_i$ (verified numerically in the sketch after this list):
• $\bar{e} = \bar{y} - a - b\bar{x} = \bar{y} - (\bar{y} - b\bar{x}) - b\bar{x} = 0$
• $\sum (x_i - \bar{x})\, e_i = 0$
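Both constraints, and the $n-2$ divisor in $s^2$, are easy to confirm numerically; a sketch continuing the simulated example (fit was defined above):

e <- resid(fit)
sum(e)                          # constraint 1: zero up to rounding error
sum((x - mean(x)) * e)          # constraint 2: also zero
n <- length(y)
s <- sqrt(sum(e^2) / (n - 2))   # s^2_{Y|x} divides by n - 2, not n - 1
c(manual = s, lm = summary(fit)$sigma)   # lm's "residual standard error" agrees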

Properties of Regression Estimates. Assume the model (LM): then $(a, b, s^2 = s^2_{Y|x})$ are random variables.
1. Means: $(a, b, s^2)$ are unbiased estimates of $(\alpha, \beta, \sigma^2)$.

2. Variances: for $(a, b)$ the variances are
$$\sigma_a^2 = \mathrm{Var}(a) = \sigma^2 \left\{ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right\}, \qquad \sigma_b^2 = \mathrm{Var}(b) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$

3. $(a, b)$ are jointly normal – in particular
$$a \sim N(\alpha, \sigma_a^2), \qquad b \sim N(\beta, \sigma_b^2), \qquad \frac{(n-2)\, s^2}{\sigma^2} \sim \chi^2_{n-2}$$
(and $s^2$ is independent of $(a, b)$).
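These sampling properties can be checked by Monte Carlo; a sketch (again assuming $\sigma = 2$, as in the simulation above) comparing the empirical SD of $b$ with the formula $\sigma_b = \sigma/\sqrt{\sum(x_i-\bar{x})^2}$:

## Re-simulate the model many times and look at the spread of b
B <- 5000
bs <- replicate(B, {
  ysim <- 65 + x/3 + rnorm(30, sd = 2)
  coef(lm(ysim ~ x))[2]
})
sd(bs)                           # empirical SD of the slope estimates
2 / sqrt(sum((x - mean(x))^2))   # theoretical sigma_b; the two should be close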

Confidence Intervals: at confidence level $1 - \alpha$, CI's always have the form $\text{Est} \pm t_{1-\alpha/2,\,(n-2)}\, SE_{\text{Est}}$, where $SE_{\text{Est}}$ means $\sigma_{\text{Est}}$ with $\sigma$ replaced by its estimate $s$. Thus
$$SE_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}, \qquad SE_a = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$$

Thus, the $100(1-\alpha)\%$ CI for the slope is $b \pm t_{1-\alpha/2,\,(n-2)}\, SE_b$, and for the intercept, $a \pm t_{1-\alpha/2,\,(n-2)}\, SE_a$.
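In R, confint() produces both intervals directly; a sketch continuing the simulated example, with the slope CI also built by hand from the t quantile:

confint(fit, level = 0.95)   # CIs for intercept (alpha) and slope (beta)
## Manual 95% CI for the slope:
SEb <- summary(fit)$coefficients[2, 2]
coef(fit)[2] + c(-1, 1) * qt(0.975, df = length(y) - 2) * SEb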

Tests: Described here for the slope $\beta$ (but the same applies for the intercept $\alpha$). For $H_0: \beta = \beta_0$, use
$$t = \frac{b - \beta_0}{SE_b}$$
which under $H_0$ has the $t_{n-2}$ distribution.
P values: One-sided (e.g. $H_1: \beta < \beta_0$): $P = P(t_{n-2} < t_{obs})$. Two-sided ($H_1: \beta \neq \beta_0$): $P = 2P(t_{n-2} > |t_{obs}|)$.
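summary(lm(...)) reports exactly this t statistic and two-sided P value for $\beta_0 = 0$; for a general $\beta_0$, a sketch (reusing SEb from above):

## Two-sided test of H0: beta = beta0 (beta0 = 0 reproduces summary(fit))
beta0 <- 0
tobs  <- (coef(fit)[2] - beta0) / SEb
2 * pt(abs(tobs), df = length(y) - 2, lower.tail = FALSE)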

(Aside: Why $t_{n-2}$? By definition,
$$t_\nu \sim \frac{N(0,1)}{\sqrt{\chi^2_\nu / \nu}},$$
with the numerator and denominator variables independent. But we can write the slope test statistic in the form
$$t = \frac{(b - \beta_0)/\sigma_b}{\sqrt{SE_b^2 / \sigma_b^2}}$$
and now we can note from properties (1)–(3) that
$$\frac{b - \beta_0}{\sigma_b} \sim N(0,1) \qquad \text{and} \qquad \frac{SE_b^2}{\sigma_b^2} = \frac{s^2}{\sigma^2} \sim \frac{\chi^2_{n-2}}{n-2},$$
and it can be shown that $b$ and $s^2$ are independent. This shows that the test statistic $t \sim t_{n-2}$.)


Mean response at x
Want to estimate $\mu_{Y|x^*} = \alpha + \beta x^*$, the mean for the subpopulation given $x = x^*$.
Point estimate: $\hat{\mu}_{Y|x^*} = a + b x^*$.
Sources of uncertainty: $a$, $b$ estimated.
$$SE_{\hat{\mu}_{Y|x^*}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
CI: level $100(1-\alpha)\%$, $n-2$ df:
$$\hat{\mu}_{Y|x^*} \pm t_{1-\alpha/2,\,n-2}\, SE_{\hat{\mu}_{Y|x^*}}$$
Claim: 95% of the time, the CI covers $\mu_{Y|x^*} = \alpha + \beta x^*$, i.e.
$$\hat{\mu} - t_{1-\alpha/2,\,n-2}\, SE_{\hat{\mu}} \;\le\; \mu \;\le\; \hat{\mu} + t_{1-\alpha/2,\,n-2}\, SE_{\hat{\mu}}$$
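This SE can be reproduced by hand and compared with predict()'s se.fit; a sketch continuing the simulated example, where xstar = 5 is an arbitrary illustrative value (the data frame `new` in the handout's output below is not shown being built):

## SE of the estimated mean response at x* = 5, by formula and by predict()
xstar <- 5
s <- summary(fit)$sigma
s * sqrt(1/length(x) + (xstar - mean(x))^2 / sum((x - mean(x))^2))
predict(fit, newdata = data.frame(x = xstar), se.fit = TRUE)$se.fit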

> predict(lm(y ~ x), new, interval="confidence", se.fit=T)$fit
        fit       lwr      upr
1 -1.82756 -3.165220 -0.48991
2 -1.55835 -2.690602 -0.42610
3 -1.28914 -2.222694 -0.35558
4 -1.01992 -1.766869 -0.27298

New observation at x
Want to predict $y_{new}(x^*) = \alpha + \beta x^* + \epsilon_{new}$, a new draw of $y$ from the subpopulation given $x = x^*$.
Point prediction: $\hat{y}(x^*) = a + b x^*$.
Sources of uncertainty: $a$, $b$ estimated; random error $\epsilon$ of the new observation.
$$SE^2_{\hat{y}(x^*)} = s^2_{Y|x} + SE^2_{\hat{\mu}_{Y|x^*}}$$
Prediction Interval: level $100(1-\alpha)\%$, $n-2$ df:
$$\hat{y}(x^*) \pm t_{1-\alpha/2,\,n-2}\, SE_{\hat{y}(x^*)}$$
Claim: 95% of the time, the prediction interval covers $y_{new}(x^*)$, i.e.
$$\hat{y} - t_{1-\alpha/2,\,n-2}\, SE \;\le\; y_{new} \;\le\; \hat{y} + t_{1-\alpha/2,\,n-2}\, SE$$
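The extra $s^2_{Y|x}$ term is what widens prediction intervals relative to CIs; a sketch of the prediction SE by hand, reusing s and xstar from above:

## SE for predicting a new observation at x*: add s^2 to the mean-response SE^2
se.mu <- predict(fit, newdata = data.frame(x = xstar), se.fit = TRUE)$se.fit
sqrt(s^2 + se.mu^2)   # always larger than se.mu itself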

> predict(lm(y ~ x), data, interval="prediction", se.fit=T)$fit
         fit      lwr     upr
1 -1.827565 -3.99873 0.34360
2 -1.558351 -3.60936 0.49266
3 -1.289137 -3.23751 0.65924
4 -1.019922 -2.88608 0.84624
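Comparing the two outputs: the fitted values are identical, but at every $x$ the prediction limits are noticeably wider than the confidence limits, because the prediction SE must also absorb the new observation's own deviation $\epsilon$.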
