
STAT 141 REGRESSION: CONFIDENCE vs PREDICTION INTERVALS (12/2/04)

Inference for coefficients; mean response at x vs. new observation at x

Linear Model (or Simple Linear Regression) for the population. ("Simple" means a single explanatory variable; in fact we can easily add more variables.)
– x: explanatory variable (independent variable / predictor)
– y: response (dependent variable)

Probability model for linear regression:

Y_i = α + β x_i + ε_i,   ε_i ∼ N(0, σ^2) independent deviations;   α + β x_i = mean response at x = x_i

Goals: unbiased estimates of the three parameters (α, β, σ^2); tests for null hypotheses α = α_0 or β = β_0; C.I.'s for α and β, or to predict E(Y | X = x_0).

(A model is our ‘stereotype’ – a simplification for summarizing the variation in data)

For example, if we simulate data from a temperature model of the form

Y_i = 65 + (1/3) x_i + ε_i,   x_i = 1, 2, . . . , 30

The model is exactly true, by construction.
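As a concrete illustration, a minimal simulation of this model (the notes use R for computation; this sketch is in Python, and σ = 2 is an illustrative choice the notes do not specify):

```python
# Simulate one data set from the temperature model above:
#   Y_i = 65 + (1/3) x_i + eps_i,  eps_i ~ N(0, sigma^2),  x_i = 1, ..., 30
# sigma = 2.0 is an illustrative choice (the notes do not fix sigma).
import random

random.seed(141)          # reproducible draws
sigma = 2.0
x = list(range(1, 31))
y = [65 + xi / 3 + random.gauss(0, sigma) for xi in x]
```

Each y-value is its mean response 65 + (1/3)x_i plus an independent normal deviation, so the LM assumptions hold exactly for the simulated data.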

An equivalent statement of the LM model: assume x_i fixed, Y_i independent, and

Y_i | x_i ∼ N(µ_{y|x_i}, σ^2),   µ_{y|x_i} = α + β x_i   (the population regression line)

Remark: Suppose that (X_i, Y_i) are a random sample from a bivariate normal distribution with means (µ_X, µ_Y), variances σ_X^2, σ_Y^2 and correlation ρ. Suppose that we condition on the observed values X = x_i. Then the data (x_i, y_i) satisfy the LM model. Indeed, we saw last time that Y | x_i ∼ N(µ_{y|x_i}, σ^2_{y|x_i}), with

µ_{y|x_i} = α + β x_i,   σ^2_{Y|X} = (1 − ρ^2) σ_Y^2

Example: Galton's fathers and sons: µ_{y|x} = 35 + 0.5x; σ = 2.34 (in inches). For comparison: σ_Y = 2.7.

Note: σ is the SD of y given x. It is less than the unconditional SD of Y, σ_Y (without x fixed). E.g. consider the extreme case where y = x: then σ = 0, but σ_Y = σ_X.

Estimating parameters (α, β, σ^2) from a sample

Recall the least squares estimates: (a, b) chosen to minimize ∑(y_i − a − b x_i)^2:

a = ȳ − b x̄,   b = r s_y / s_x = ∑(x_i − x̄)(y_i − ȳ) / ∑(x_i − x̄)^2

Fitted values: ŷ_i = a + b x_i

Residuals:

e_i = y_i − ŷ_i = y_i − a − b x_i = α + β x_i + ε_i − a − b x_i = (α − a) + (β − b) x_i + ε_i

where e_i = deviations in the sample, ε_i = deviations in the model
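The least-squares formulas above are easy to compute directly. A small Python sketch (the helper name `least_squares` is mine, not from the notes):

```python
def least_squares(x, y):
    """Least-squares estimates from the formulas above:
    b = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),  a = ybar - b * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Data lying exactly on y = 1 + 2x are recovered exactly:
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # a = 1.0, b = 2.0
```

When the points fall exactly on a line, every residual is zero, so the fitted line coincides with the true one.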

Estimate of σ^2: Again, use the squared scale:

s^2_{Y|x} = (1/(n − 2)) ∑_i e_i^2 = (1/(n − 2)) ∑_i (y_i − a − b x_i)^2,   s = √(s^2_{Y|x}),   (n − 2) = "degrees of freedom"

Why n-2??


1. Recall that with a SRS, Y_i ∼ N(µ, σ^2), s^2 = (1/(n − 1)) ∑(y_i − ȳ)^2 (and ȳ estimates µ: we estimate one parameter). Here there are two parameter estimates: a estimates α, and b estimates β.

2. There are two linear constraints on the e_i:

• ē = ȳ − a − b x̄ = ȳ − (ȳ − b x̄) − b x̄ = 0
• ∑(x_i − x̄) e_i = 0
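Both constraints, and the n − 2 divisor, can be checked numerically. A Python sketch on a made-up five-point data set:

```python
def residuals_and_s2(x, y):
    """Fit by least squares; return the residuals e_i and s^2 = sum(e_i^2)/(n-2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)   # n - 2: two estimated parameters (a, b)
    return e, s2

x = [1, 2, 3, 4, 5]            # toy data, for illustration only
y = [2, 3, 5, 4, 6]
e, s2 = residuals_and_s2(x, y)
xbar = sum(x) / len(x)
constraint1 = sum(e)                                        # should be 0
constraint2 = sum((xi - xbar) * ei for xi, ei in zip(x, e)) # should be 0
```

Both sums come out zero (up to floating-point rounding), confirming that the residuals lose two degrees of freedom.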

Properties of Regression Estimates. Assume model (LM): then (a, b, s^2 = s^2_{Y|x}) are random variables.

1. Means: (a, b, s^2 = s^2_{Y|x}) are unbiased estimates of (α, β, σ^2).

2. Variances: for (a, b) the variances are

σ_a^2 = Var(a) = σ^2 { 1/n + x̄^2 / ∑(x_i − x̄)^2 },   σ_b^2 = Var(b) = σ^2 / ∑(x_i − x̄)^2

3. (a, b) are jointly normal; in particular

a ∼ N(α, σ_a^2),   b ∼ N(β, σ_b^2),   (n − 2) s^2 / σ^2 ∼ χ^2_{n−2}   (and s^2 is independent of (a, b)).

Confidence Intervals: at confidence level 1 − α, CI's always have the form: Est ± t_{1−α/2, (n−2)} SE_Est

SE_Est means σ_Est with σ replaced by its estimate s. Thus

SE_b = s / √(∑(x_i − x̄)^2),   SE_a = s √(1/n + x̄^2 / ∑(x_i − x̄)^2)

Thus, level (1 − α) CI's:

slope:      b ± t_{1−α/2, (n−2)} SE_b
intercept:  a ± t_{1−α/2, (n−2)} SE_a

Tests: Described here for the slope β (but the same for the intercept α). For H_0: β = β_0, use t = (b − β_0) / SE_b, which under H_0 has the t_{n−2} distribution.

P-values: One-sided (e.g. H_1: β < β_0): P = P(t_{n−2} < t_obs). Two-sided (H_1: β ≠ β_0): P = 2 P(t_{n−2} > |t_obs|).
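Putting the SE formulas and the slope test together, a Python sketch on toy data (the critical value t_{0.975, 3} = 3.182 is taken from a t table, since Python's standard library has no t quantile function):

```python
import math

x = [1, 2, 3, 4, 5]            # toy data, for illustration only
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s = math.sqrt(s2)

se_b = s / math.sqrt(sxx)                      # SE_b = s / sqrt(sum (xi - xbar)^2)
se_a = s * math.sqrt(1 / n + xbar ** 2 / sxx)  # SE_a = s * sqrt(1/n + xbar^2 / sxx)

t_obs = b / se_b          # test of H0: beta = 0, i.e. t = (b - beta0) / SE_b
t_crit = 3.182            # t_{0.975, n-2} for n - 2 = 3 df, from a t table
ci_slope = (b - t_crit * se_b, b + t_crit * se_b)
```

Here the 95% CI for the slope excludes 0, which matches rejecting H_0: β = 0 at the 5% level with the two-sided test.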

(Aside: Why t_{n−2}? By definition t_ν ∼ N(0,1) / √(χ^2_ν / ν), with numerator and denominator independent. But we can write the slope test statistic in the form

t = ((b − β_0)/σ_b) / √(SE_b^2 / σ_b^2)

and now we can note from properties (1)–(3) that

(b − β_0)/σ_b ∼ N(0, 1)   and   SE_b^2 / σ_b^2 = s^2 / σ^2 ∼ χ^2_{n−2} / (n − 2),

and it can be shown that b and s^2 are independent. This shows that the test statistic t ∼ t_{n−2}.)


Mean response at x

Want to estimate µ_{Y|x*} = α + β x*, the mean for the subpopulation given x = x*.

Point estimate: µ̂_{Y|x*} = a + b x*

Sources of uncertainty: a, b estimated.

SE_{µ̂_{Y|x*}} = s √(1/n + (x* − x̄)^2 / ∑(x_i − x̄)^2)

CI: level 1 − α, n − 2 df:

µ̂_{Y|x*} ± t_{1−α/2, n−2} SE_{µ̂_{Y|x*}}

Claim: 95% of the time (for a 95% CI), the CI covers µ_{Y|x*} = α + β x*, i.e.

µ̂ − t_{1−α/2, n−2} SE_{µ̂} ≤ µ ≤ µ̂ + t_{1−α/2, n−2} SE_{µ̂}

> predict(lm(y ~ x), new, interval = "confidence", se.fit = T)$fit

        fit       lwr       upr
1  -1.82756 -3.165220  -0.48991
2  -1.55835 -2.690602  -0.42610
3  -1.28914 -2.222694  -0.35558
4  -1.01992 -1.766869  -0.27298

New observation at x

Want to predict y_new(x*) = α + β x* + ε_new, a new draw of y from the subpopulation given x = x*.

Point prediction: ŷ(x*) = a + b x*

Sources of uncertainty: a, b estimated; random error ε of the new observation.

SE^2_{ŷ(x*)} = s^2_{Y|x} + SE^2_{µ̂_{Y|x*}}

Prediction Interval: level 1 − α, n − 2 df:

ŷ(x*) ± t_{1−α/2, n−2} SE_{ŷ(x*)}

Claim: 95% of the time (for a 95% interval), the prediction interval covers y_new(x*), i.e.

ŷ − t_{1−α/2, n−2} SE_{ŷ} ≤ y_new ≤ ŷ + t_{1−α/2, n−2} SE_{ŷ}

> predict(lm(y ~ x), data, interval = "prediction", se.fit = T)$fit

         fit      lwr      upr
1  -1.827565 -3.99873  0.34360
2  -1.558351 -3.60936  0.49266
3  -1.289137 -3.23751  0.65924
4  -1.019922 -2.88608  0.84624
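The two standard errors can be compared directly; a Python sketch at x* = x̄ on toy data, showing why prediction intervals are wider than confidence intervals for the mean (the extra s^2 term under the square root):

```python
import math

x = [1, 2, 3, 4, 5]            # toy data, for illustration only
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

xstar = 3.0
point = a + b * xstar                                          # both intervals center here
se_mean = math.sqrt(s2 * (1 / n + (xstar - xbar) ** 2 / sxx))  # SE for the mean response
se_pred = math.sqrt(s2 + se_mean ** 2)                         # SE for a new observation
```

Since SE^2_pred = s^2 + SE^2_mean, the prediction SE always exceeds the mean-response SE, exactly as in the lwr/upr columns of the two R outputs above.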
