STAT 141 REGRESSION: CONFIDENCE vs PREDICTION INTERVALS (12/2/04)
Inference for coefficients
Mean response at x vs. New observation at x
Linear Model (or Simple Linear Regression) for the population. ("Simple" means a single explanatory variable; in fact we can easily add more variables.)
x – explanatory variable (independent variable / predictor); Y – response (dependent variable)
Probability model for linear regression:

Yi = α + βxi + εi,  εi ∼ N(0, σ²) independent deviations;  α + βxi = mean response at x = xi

Goals: unbiased estimates of the three parameters (α, β, σ²); tests for null hypotheses α = α0 or β = β0; C.I.'s for α, β, or to predict E(Y | X = x0).
(A model is our ‘stereotype’ – a simplification for summarizing the variation in data)
For example, if we simulate data from a temperature model of the form:

Yi = 65 + (1/3)xi + εi,  xi = 1, 2, . . . , 30

the model is exactly true, by construction.
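A minimal sketch of this simulation in Python; the error SD σ = 2 is an assumed value for the demo, since the notes do not fix σ for this example:

```python
import numpy as np

# Simulate from the lecture's temperature model:
# Yi = 65 + (1/3) xi + eps_i, eps_i ~ N(0, sigma^2), xi = 1, ..., 30.
# sigma = 2 is an assumed value (not specified in the notes).
rng = np.random.default_rng(0)
sigma = 2.0
x = np.arange(1, 31)
eps = rng.normal(0.0, sigma, size=x.size)
y = 65 + x / 3 + eps          # model is exactly true, by construction
```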
An equivalent statement of the LM model: Assume xi fixed, Yi independent, and

Yi | xi ∼ N(μy|xi, σ²),  μy|xi = α + βxi,  the population regression line
Remark: Suppose that (Xi, Yi) are a random sample from a bivariate normal distribution with means (μX, μY), variances σ²X, σ²Y and correlation ρ. Suppose that we condition on the observed values X = xi. Then the data (xi, yi) satisfy the LM model. Indeed, we saw last time that Y | xi ∼ N(μy|xi, σ²Y|X), with

μy|xi = α + βxi,  σ²Y|X = (1 − ρ²)σ²Y
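The conditional-variance formula can be checked numerically. A sketch, with assumed demo values (means 0, σX = 1, σY = 2, ρ = 0.8): draw many bivariate normal pairs, keep those with X in a narrow band, and compare the sample variance of Y in the band to (1 − ρ²)σ²Y = 0.36 · 4 = 1.44.

```python
import numpy as np

# Numerical check of sigma^2_{Y|X} = (1 - rho^2) sigma^2_Y.
# All parameter values below are assumed for the demo.
rng = np.random.default_rng(2)
sd_x, sd_y, rho = 1.0, 2.0, 0.8
cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

# Condition on X near 0.5 (a narrow band approximates X = x).
sel = np.abs(xy[:, 0] - 0.5) < 0.05
var_cond = xy[sel, 1].var()       # should be close to 1.44
```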
Example: Galton's fathers and sons: μy|x = 35 + 0.5x; σ = 2.34 (in inches). For comparison: σY = 2.7.

Note: σ is the SD of y given x. It is less than the unconditional SD of Y, σY (without x fixed). E.g., consider the extreme case where y = x: σ = 0, but σY = σX.

Estimating parameters (α, β, σ²) from a sample

Recall the least squares estimates: (a, b) chosen to minimize ∑(yi − a − bxi)²:

a = ȳ − b x̄,  b = r sy/sx = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²
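The least squares formulas can be computed directly; a sketch on made-up illustrative data (not from the lecture), cross-checked against numpy's own least-squares fit:

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Least squares estimates from the formulas above.
xbar, ybar = x.mean(), y.mean()
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)   # slope
a = ybar - b * xbar                                             # intercept

# Cross-check against numpy's degree-1 fit (returns slope, intercept).
b_np, a_np = np.polyfit(x, y, 1)
```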
Fitted values: ŷi = a + bxi

Residuals:

ei = yi − ŷi = yi − a − bxi = α + βxi + εi − a − bxi = (α − a) + (β − b)xi + εi

where ei = deviations in sample, εi = deviations in model
Estimate of σ²: Again, use the squared scale:

s²Y|x = (1/(n − 2)) ∑i e²i = (1/(n − 2)) ∑i (yi − a − bxi)²,  s = √(s²Y|x);  (n − 2) = "degrees of freedom"
Why n − 2?
1. Recall that with a SRS Yi ∼ N(μ, σ²), s² = (1/(n − 1)) ∑(yi − ȳ)² (and ȳ estimates μ – we estimate one parameter). Here, there are two parameter estimates: a estimates α, and b estimates β.
2. There are two linear constraints on the ei:

• ē = ȳ − a − b x̄ = ȳ − (ȳ − b x̄) − b x̄ = 0
• ∑(xi − x̄)ei = 0
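Both constraints can be verified numerically on any least-squares fit; a sketch on the same illustrative data as before:

```python
import numpy as np

# Illustrative data; fit by the least squares formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
e = y - a - b * x             # residuals

# Constraint 1: residuals sum to zero (so e-bar = 0).
# Constraint 2: residuals are orthogonal to the centered x's.
sum_e = e.sum()
sum_xe = np.sum((x - x.mean()) * e)
```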
Properties of Regression Estimates. Assume the model (LM): then (a, b, s² = s²Y|x) are random variables.

1. Means: (a, b, s² = s²Y|x) are unbiased estimates of (α, β, σ²).
2. Variances: For (a, b) the variances are

σ²a = Var(a) = σ² { 1/n + x̄² / ∑(xi − x̄)² },  σ²b = Var(b) = σ² / ∑(xi − x̄)²
3. (a, b) are jointly normal – in particular

a ∼ N(α, σ²a),  b ∼ N(β, σ²b),  (n − 2)s²/σ² ∼ χ²n−2  (and s² is independent of (a, b)).
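Property 1 (unbiasedness) can be illustrated by Monte Carlo: averaging a, b and s² over many simulated samples should recover α, β and σ². A sketch, reusing the temperature-model parameters from earlier as assumed true values:

```python
import numpy as np

# Monte Carlo check of unbiasedness of (a, b, s^2).
# True parameters are assumed demo values (the earlier temperature model).
rng = np.random.default_rng(1)
alpha, beta, sigma = 65.0, 1 / 3, 2.0
x = np.arange(1, 31)
n = x.size

a_s, b_s, s2_s = [], [], []
for _ in range(2000):
    y = alpha + beta * x + rng.normal(0, sigma, n)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - a - b * x
    s2 = np.sum(e ** 2) / (n - 2)       # note the n - 2 divisor
    a_s.append(a); b_s.append(b); s2_s.append(s2)
```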
Confidence Intervals: at level 1 − α, CI's always have the form: Est ± t1−α/2,(n−2) SEEst

SEEst means σEst with σ replaced by its estimate s. Thus

SEb = s / √(∑(xi − x̄)²),  SEa = s √( 1/n + x̄² / ∑(xi − x̄)² )

Thus, the 100(1 − α)% CI for the:

slope: b ± t1−α/2,(n−2) SEb
intercept: a ± t1−α/2,(n−2) SEa
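A sketch of the slope CI on the illustrative data used earlier; the t quantile is hard-coded from a t table rather than computed (t0.975,3 = 3.182 for n − 2 = 3 df):

```python
import numpy as np

# 95% CI for the slope, following the formulas above (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))

SE_b = s / np.sqrt(Sxx)
t_crit = 3.182                 # t_{0.975, 3} from a t table
ci = (b - t_crit * SE_b, b + t_crit * SE_b)
```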
Tests: Described here for the slope β (but the same for the intercept α). For H0: β = β0, use t = (b − β0)/SEb, which under H0 has the tn−2 distribution.

P-values: One-sided (e.g. H1: β < β0): P = P(tn−2 < tobs). Two-sided (H1: β ≠ β0): P = 2P(tn−2 > |tobs|).
(Aside: Why tn−2? By definition tν ∼ N(0, 1) / √(χ²ν/ν), with the numerator and denominator variables independent. But we can write the slope test statistic in the form

t = [(b − β0)/σb] / √(SE²b/σ²b)

and now we can note from properties (1)–(3) that

(b − β0)/σb ∼ N(0, 1)  and  SE²b/σ²b = s²/σ² ∼ χ²n−2/(n − 2)

and it can be shown that b and s² are independent. This shows that the test statistic t ∼ tn−2.)
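The slope test statistic is easy to compute by hand; a sketch on the same illustrative data, testing H0: β = 0 (β0 = 0 is a conventional choice here, not specified in the notes):

```python
import numpy as np

# t statistic for H0: beta = beta0, compared to t_{n-2} (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))
SE_b = s / np.sqrt(Sxx)

beta0 = 0.0
t_obs = (b - beta0) / SE_b     # refer to the t_{n-2} = t_3 distribution
```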
Mean response at x

Want to estimate μY|x∗ = α + βx∗, the mean for the subpopulation given x = x∗.

Point estimate: μ̂Y|x∗ = a + bx∗

Sources of uncertainty: a, b estimated.

SEμ̂Y|x∗ = s √( 1/n + (x∗ − x̄)² / ∑(xi − x̄)² )

CI: level 1 − α, n − 2 df:

μ̂Y|x∗ ± t1−α/2,n−2 SEμ̂Y|x∗

Claim: 95% of the time (when α = 0.05), the CI covers μY|x∗ = α + βx∗, i.e.

μ̂ − t1−α/2,n−2 SEμ̂ ≤ μ ≤ μ̂ + t1−α/2,n−2 SEμ̂
> predict(lm(y ~ x), new, interval="confidence", se.fit=T)$fit
        fit       lwr      upr
1 -1.82756 -3.165220 -0.48991
2 -1.55835 -2.690602 -0.42610
3 -1.28914 -2.222694 -0.35558
4 -1.01992 -1.766869 -0.27298
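The same confidence interval for the mean response can be computed from the formulas above; a sketch on the illustrative data used earlier, at an assumed point x∗ = 3.5, with the t quantile hard-coded from a t table:

```python
import numpy as np

# CI for the mean response at x* (mirrors R's interval="confidence").
# Data and x* = 3.5 are illustrative choices.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))

xstar = 3.5
mu_hat = a + b * xstar
SE_mu = s * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / Sxx)
t_crit = 3.182                 # t_{0.975, 3} from a t table
ci = (mu_hat - t_crit * SE_mu, mu_hat + t_crit * SE_mu)
```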
New observation at x

Want to predict ynew(x∗) = α + βx∗ + εnew, a new draw of y from the subpopulation given x = x∗.

Point prediction: ŷ(x∗) = a + bx∗

Sources of uncertainty: a, b estimated; random error ε of the new observation.

SE²ŷ(x∗) = s²Y|x + SE²μ̂Y|x∗

Prediction Interval: level 1 − α, n − 2 df:

ŷ(x∗) ± t1−α/2,n−2 SEŷ(x∗)

Claim: 95% of the time (when α = 0.05), the Pred. Interv. covers ynew(x∗), i.e.

ŷ − t1−α/2,n−2 SEŷ ≤ ynew ≤ ŷ + t1−α/2,n−2 SEŷ
> predict(lm(y ~ x), data, interval="prediction", se.fit=T)$fit
        fit      lwr     upr
1 -1.827565 -3.99873 0.34360
2 -1.558351 -3.60936 0.49266
3 -1.289137 -3.23751 0.65924
4 -1.019922 -2.88608 0.84624
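A matching sketch of the prediction interval (mirrors R's interval="prediction"), on the same illustrative data and x∗; note the extra s² term in the SE, which makes the interval wider than the CI for the mean:

```python
import numpy as np

# Prediction interval at x*: SE^2 adds the new observation's own error
# variance s^2 to SE^2 of the estimated mean. Data and x* are illustrative.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s2 = np.sum((y - a - b * x) ** 2) / (n - 2)

xstar = 3.5
y_hat = a + b * xstar
SE2_mu = s2 * (1 / n + (xstar - x.mean()) ** 2 / Sxx)
SE_pred = np.sqrt(s2 + SE2_mu)     # SE^2_pred = s^2 + SE^2_mu
t_crit = 3.182                     # t_{0.975, 3} from a t table
pi = (y_hat - t_crit * SE_pred, y_hat + t_crit * SE_pred)
```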