MATH2931 Lecture 6
MATH2831/2931
Linear Models/ Higher Linear Models.
August 2, 2013
-
Week 2 Lecture 3 - Last lecture:
Review of hypothesis testing: two-sided and one-sided tests.
Hypothesis testing for $\beta_0$ and $\beta_1$.
Example 1: zinc concentrations in plants.
Example 2: sales and advertising data.
-
Week 2 Lecture 3 - This lecture:
The analysis of variance table
Inference on $\beta_1$: some further examples on testing and ANOVA.
Confidence intervals for the mean.
Prediction intervals.
-
Week 2 Lecture 3 - The ANOVA table
So far we have considered the model with a single independent variable to be the true model, i.e. we have presumed that $\mu_{y|x}$ is related to x linearly.
NOTE 1: Prediction of the response will be poor in situations in which there are several independent variables, each affecting the response.
NOTE 2: Prediction of the response will be poor in situations in which the presumption that $\mu_{y|x}$ is related to x linearly is false in the range of the variables considered.
Here we use ANOVA to analyse the quality of the estimated regression line.
-
Week 2 Lecture 3 - The ANOVA table
If the true unknown model is linear in more than one variable x, i.e.
$$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$
then the standard least squares estimate derived so far,
$$b_1 = \frac{S_{xy}}{S_{xx}},$$
which is calculated considering only $x_1$, is a biased estimate of $\beta_1$.
The bias will be a function of the additional coefficient $\beta_2$.
We need a methodology to assess the quality of our regression line!
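The bias claim above can be checked by simulation. The sketch below is illustrative only: the true coefficients, the noise level and the way $x_2$ depends on $x_1$ are all hypothetical choices. It repeatedly fits the simple regression of y on $x_1$ alone and compares the average $b_1$ with the standard fixed-design result $E(b_1) = \beta_1 + \beta_2 S_{x_1x_2}/S_{x_1x_1}$.

```python
import random

# Hypothetical true model (values chosen for illustration only):
#   y = BETA0 + BETA1*x1 + BETA2*x2 + error, with x2 correlated with x1.
BETA0, BETA1, BETA2, SIGMA = 1.0, 2.0, 1.5, 1.0


def slr_slope(x, y):
    """Least squares slope b1 = Sxy / Sxx from regressing y on x alone."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


random.seed(1)
n = 50
# Fixed design: x2 depends partly on x1, so omitting x2 matters.
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]

reps = 2000
slopes = []
for _ in range(reps):
    y = [BETA0 + BETA1 * a + BETA2 * b + random.gauss(0, SIGMA)
         for a, b in zip(x1, x2)]
    slopes.append(slr_slope(x1, y))  # simple regression that ignores x2

mean_b1 = sum(slopes) / reps
# For a fixed design, E(b1) = beta1 + beta2 * S_{x1x2} / S_{x1x1},
# and S_{x1x2} / S_{x1x1} is exactly the slope of x2 regressed on x1.
predicted_mean_b1 = BETA1 + BETA2 * slr_slope(x1, x2)
print(f"average b1 = {mean_b1:.3f}, predicted E(b1) = {predicted_mean_b1:.3f}, "
      f"beta1 = {BETA1}")
```

The average slope sits near the predicted biased value, not near $\beta_1 = 2$.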
-
Week 2 Lecture 3 - The ANOVA table
The quality of an estimated regression line can be analysed using an ANOVA approach.
ANOVA procedure: the total variation in the dependent variable is subdivided into meaningful components, which are then observed and treated in a systematic fashion (recall Week 1 Lecture 3 - partitioning of variance).
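Fundamental identity: $SS_{total} = SS_{reg} + SS_{res}$, where $SS_{total} = \sum_{i=1}^n (y_i - \bar y)^2$ is the total variation in the response.
Regression sum of squares (variation explained by the fit):
$$SS_{reg} = \sum_{i=1}^n (\hat y_i - \bar y)^2.$$
Residual sum of squares (variation unexplained by the fit):
$$SS_{res} = \sum_{i=1}^n (y_i - \hat y_i)^2.$$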
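A minimal sketch of the three sums of squares, computed from first principles on a small made-up dataset (the numbers are not from the lecture), confirms numerically that $SS_{total} = SS_{reg} + SS_{res}$ for a least squares fit.

```python
def sums_of_squares(x, y):
    """Return (SS_total, SS_reg, SS_res) for the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)            # explained by the fit
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    return ss_total, ss_reg, ss_res


# Small made-up dataset, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(f"SS_total = {ss_total:.4f}, SS_reg + SS_res = {ss_reg + ss_res:.4f}")
```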
-
Week 2 Lecture 3 - The ANOVA table
Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?
Answer: It helps answer the question of whether the predictor is useful for explaining the response.
t-test: the statistic for testing $H_0: \beta_1 = 0$ was
$$T = \frac{b_1}{\hat\sigma/\sqrt{S_{xx}}}.$$
Under $H_0$, this statistic T has a t-distribution with $n-2$ degrees of freedom.
From our distribution results (Week 2 Lecture 1) we can also derive that, under $H_0$,
$$F = T^2 = \frac{b_1^2 S_{xx}}{\hat\sigma^2}$$
has an F distribution with 1 and $n-2$ degrees of freedom.
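The equivalence $F = T^2$ can be seen numerically. This sketch assumes SciPy is available and uses made-up data; it computes T and F for $H_0: \beta_1 = 0$ and shows that the two-sided t p-value equals the upper-tail $F_{1,n-2}$ p-value.

```python
import math

from scipy import stats


def slope_t_and_f(x, y):
    """t statistic and F = T^2 for H0: beta1 = 0, with their p-values."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))

    t_stat = b1 / (sigma_hat / math.sqrt(sxx))
    f_stat = b1 ** 2 * sxx / sigma_hat ** 2
    p_two_sided_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    p_f = stats.f.sf(f_stat, dfn=1, dfd=n - 2)
    return t_stat, f_stat, p_two_sided_t, p_f


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.2, 2.9, 4.8, 4.1, 6.3, 5.2, 7.9, 6.4]
t_stat, f_stat, p_t, p_f = slope_t_and_f(x, y)
print(f"T^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}, p (t) = {p_t:.6f}, p (F) = {p_f:.6f}")
```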
-
Week 2 Lecture 3 - The ANOVA table
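Theorem: The regression sum of squares $SS_{reg}$ can be expressed as a function of the least squares estimate $b_1$ as follows:
$$SS_{reg} = b_1^2 S_{xx}.$$
With $SS_{reg} = b_1^2 S_{xx}$ and $\hat\sigma^2 = SS_{res}/(n-2)$, the statistic above, under $H_0$,
$$F = \frac{b_1^2 S_{xx}}{\hat\sigma^2},$$
can be written as
$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$
(the ratio of the variation explained by the model to the scaled residual variation).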
-
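Week 2 Lecture 3 - The ANOVA table
Proof (using $b_0 = \bar y - b_1 \bar x$):
$$\begin{aligned}
SS_{reg} &= \sum_{i=1}^n (\hat y_i - \bar y)^2 \\
&= \sum_{i=1}^n (b_0 + b_1 x_i - \bar y)^2 \\
&= \sum_{i=1}^n (\bar y - b_1 \bar x + b_1 x_i - \bar y)^2 \\
&= \sum_{i=1}^n b_1^2 (x_i - \bar x)^2 \\
&= b_1^2 S_{xx}.
\end{aligned}$$
So
$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$
has an $F_{1,n-2}$ distribution under $H_0: \beta_1 = 0$, as claimed.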
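As a quick numerical sanity check of the theorem (not a substitute for the proof), the sketch below computes $SS_{reg}$ directly from the fitted values and via $b_1^2 S_{xx}$ on made-up data.

```python
def ssreg_two_ways(x, y):
    """Compute SS_reg directly and via the identity SS_reg = b1^2 * Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # so b0 + b1*xi - ybar = b1*(xi - xbar)
    direct = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
    via_identity = b1 ** 2 * sxx
    return direct, via_identity


x = [0.5, 1.7, 2.2, 3.9, 4.4, 6.1]  # made-up data
y = [1.0, 2.8, 2.9, 5.1, 5.0, 7.7]
direct, via_identity = ssreg_two_ways(x, y)
print(f"direct: {direct:.6f}, b1^2 * Sxx: {via_identity:.6f}")
```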
-
Week 2 Lecture 3 - The ANOVA table
With F as the test statistic, we obtain a test of
$$H_0: \beta_1 = 0$$
against
$$H_1: \beta_1 \neq 0$$
at significance level $\alpha$ by using the critical region
$$F > F_{\alpha;1,n-2}.$$
Computation of the p-value:
$$p = \Pr(F \geq f \mid \beta_1 = 0),$$
where f is the observed value of F (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).
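A minimal sketch of the decision rule and the p-value, assuming SciPy is available. The observed value f = 5.0 and sample size n = 20 are hypothetical, chosen only to show the calls.

```python
from scipy import stats


def f_test_slope(f_obs, n, alpha=0.05):
    """Critical value, p-value and decision for the ANOVA F-test of H0: beta1 = 0."""
    critical = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)  # F_{alpha;1,n-2}
    p_value = stats.f.sf(f_obs, dfn=1, dfd=n - 2)        # Pr(F >= f | beta1 = 0)
    return critical, p_value, f_obs > critical


# Hypothetical observed statistic f = 5.0 from n = 20 observations.
critical, p_value, reject = f_test_slope(5.0, 20)
print(f"F_crit = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Rejecting when $f > F_{\alpha;1,n-2}$ and rejecting when $p < \alpha$ are the same decision.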
-
Week 2 Lecture 3 - The ANOVA table
Source       Sum of Squares   Degrees of Freedom   Mean Square                                  F
Regression   $SS_{reg}$       1                    $MS_{reg} = SS_{reg}/1$                      $MS_{reg}/\hat\sigma^2$
Residual     $SS_{res}$       $n-2$                $MS_{res} = SS_{res}/(n-2) = \hat\sigma^2$
Total        $SS_{total}$     $n-1$
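The table above can be assembled mechanically from the data. This is a sketch on made-up data; the row layout is one reasonable choice, not a standard format. It returns one tuple (source, SS, df, MS, F) per row and prints them.

```python
def anova_table(x, y):
    """Rows (source, SS, df, MS, F) of the simple linear regression ANOVA table."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = b1 ** 2 * sxx
    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)  # this is sigma_hat^2
    return [
        ("Regression", ss_reg, 1, ms_reg, ms_reg / ms_res),
        ("Residual", ss_res, n - 2, ms_res, None),
        ("Total", ss_total, n - 1, None, None),
    ]


x = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data
y = [3.2, 2.9, 4.8, 4.1, 6.3, 5.2, 7.9, 6.4]
rows = anova_table(x, y)
print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
for source, ss, df, ms, f in rows:
    ms_txt = f"{ms:10.4f}" if ms is not None else " " * 10
    f_txt = f"{f:8.3f}" if f is not None else ""
    print(f"{source:<12}{ss:10.4f}{df:5d}{ms_txt}{f_txt}")
```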
-
Week 2 Lecture 3 - The ANOVA table
When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $F_{\alpha;1,n-2}$, the conclusion is that there is a significant amount of variation in the response accounted for by the postulated model (simple linear regression).
NOTE: the t-test allows testing against both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.
-
Week 2 Lecture 3 - Example 1: Market model of stock returns
The monthly rate of return on a stock (R) is linearly related to the monthly return on the overall stock market ($R_m$):
$$R = \beta_0 + \beta_1 R_m + \varepsilon.$$
$R_m$ is taken to be the monthly rate of return on some major stock market index.
RECALL:
Coefficient $\beta_1$ is called the beta coefficient of the stock.
$\beta_1 > 1$ indicates that the stock's rate of return is more sensitive to the overall market than average.
$\beta_1 < 1$ indicates that it is less sensitive than average.
Estimate $\beta_1$: is it significantly different from 1?
-
Week 2 Lecture 3 - Example 1: Market model of stock returns
Scatter plot of Host International (y-axis) versus overall marketreturns (x-axis) with fitted regression line.
-
Week 2 Lecture 3 - Example 1: Market model of stock returns
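Fitted line is $\hat R = 0.14 + 1.60 R_m$, with $\hat\sigma = 9.27$ and $S_{xx} = 1117.90$.
RECALL: a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\left(b_1 - t_{\alpha/2;n-2}\frac{\hat\sigma}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2;n-2}\frac{\hat\sigma}{\sqrt{S_{xx}}}\right).$$
95% confidence interval for $\beta_1$:
$$\left(1.60 - 2.002\,\frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002\,\frac{9.27}{\sqrt{1117.90}}\right) = (1.04, 2.16),$$
which does not contain 1. A value of 1 for the slope does not seem plausible based on the data.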
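The interval arithmetic can be reproduced directly from the summary numbers on the slide ($b_1 = 1.60$, $\hat\sigma = 9.27$, $S_{xx} = 1117.90$, $t_{0.025;58} = 2.002$). The helper name `slope_ci` is a hypothetical one introduced for this sketch.

```python
import math


def slope_ci(b1, sigma_hat, sxx, t_crit):
    """Confidence interval b1 -/+ t_crit * sigma_hat / sqrt(Sxx)."""
    se = sigma_hat / math.sqrt(sxx)
    return b1 - t_crit * se, b1 + t_crit * se


lower, upper = slope_ci(b1=1.60, sigma_hat=9.27, sxx=1117.90, t_crit=2.002)
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
print("contains 1:", lower <= 1 <= upper)
```

This gives roughly (1.0449, 2.1551), which rounds to the (1.04, 2.16) quoted above and excludes 1.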
-
Week 2 Lecture 3 - Example 1: Market model of stockreturns
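Hypothesis testing equivalent:
$$H_0: \beta_1 = 1$$
versus
$$H_1: \beta_1 \neq 1.$$
Test statistic:
$$T = \frac{b_1 - 1}{\hat\sigma/\sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27/\sqrt{1117.90}} = 2.16.$$
Under $H_0$, $T \sim t_{58}$, so the p-value for the test is
$$p = \Pr(|T| \geq 2.16) = 2\Pr(T \geq 2.16) = 0.0349,$$
so we reject $H_0: \beta_1 = 1$ at the 5% level.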
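The same test can be sketched in code, assuming SciPy is available. The sample size n = 60 is inferred from the $t_{58}$ reference distribution ($n - 2 = 58$). Because the statistic is not rounded to 2.16 here, the p-value comes out slightly below the quoted 0.0349.

```python
import math

from scipy import stats

b1, beta1_null, sigma_hat, sxx, n = 1.60, 1.0, 9.27, 1117.90, 60

t_stat = (b1 - beta1_null) / (sigma_hat / math.sqrt(sxx))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value, t_{n-2}
print(f"T = {t_stat:.3f}, p = {p_value:.4f}, reject at 5%: {p_value < 0.05}")
```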
-
Week 2 Lecture 3 - Example 2: Risk assessment from financial reports
Data collection:
Investors are interested in the riskiness of a stock.
They want company financial reports to provide information helpful for assessing risk.
Seven accounting-determined measures of risk (available from a company's financial reports):
Dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.
These were computed for 25 well-known stocks based on annual reports.
-
Week 2 Lecture 3 - Example 2: Risk assessment from financial reports
Experiment: the data were sent to a random sample of 500 financial analysts, of whom 209 responded. The mean rating assigned by the 209 analysts was recorded for each of the 25 stocks.
The mean rating by the analysts is taken as a reasonable surrogate of market risk for each stock.
AIM: to predict market risk from the accounting measures. The response is market risk; the predictors are the (multiple) accounting measures.
-
Week 2 Lecture 3 - Example 2: Risk assessment example
Estimated market risk (y-axis) versus log(asset size) (x-axis).
Fitted line is $\hat y = 8.143 - 0.412x$ and $R^2 = 0.21$.
-
Week 2 Lecture 3 - Prediction in the simple linear regression model
A common reason for building a simple linear regression model is to predict a new response value when the value of the predictor is known.
Example: risk assessment data.
The riskiness of a stock is rated by 209 financial analysts (response). The predictors are various accounting-determined measures of risk.
Aim: a simple linear regression model for risk with asset size as the predictor.
Outcome: for a company not assessed by the financial analysts, we can determine asset size from company reports and predict risk using the fitted model.
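A minimal sketch of this point prediction, taking the fitted line as $\hat y = 8.143 - 0.412x$ with $x = \log(\text{asset size})$. The base of the logarithm is not stated, so the hypothetical helper `predict_risk` takes the log value directly; the input 10.0 is a made-up company, not from the data.

```python
B0, B1 = 8.143, -0.412  # fitted intercept and slope; x = log(asset size)


def predict_risk(log_asset_size):
    """Point prediction of market risk from the fitted simple linear regression."""
    return B0 + B1 * log_asset_size


# Hypothetical company with log(asset size) = 10 (not from the original data).
print(round(predict_risk(10.0), 3))
```

This only gives a point estimate; quantifying its uncertainty requires the intervals developed below.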
-
Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals
Prediction of a new response value when the predictor is $x_0$:
$$\hat y(x_0) = b_0 + b_1 x_0.$$
True conditional mean of the response at $x_0$:
$$\beta_0 + \beta_1 x_0.$$
New response value $y_0$ when the predictor is $x_0$: write
$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$
where $\varepsilon_0$ is independent of the previous responses and normal with mean zero and variance $\sigma^2$.
We want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (a prediction interval).
-
Week 2 Lecture 3 - Confidence and Prediction Intervals
The confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$ and $\beta_1$.
The prediction interval will reflect our uncertainty due to estimating $\beta_0$ and $\beta_1$, and also the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).
First we define a statistic which can be used for constructing a confidence interval for the conditional mean at $x_0$.
Consider $\hat y(x_0) = b_0 + b_1 x_0$.
$\hat y(x_0)$ is a Gaussian random variable.
$E(\hat y(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0$.
$$\text{Var}(\hat y(x_0)) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right).$$
-
Week 2 Lecture 3 - Confidence interval for the mean
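$$\begin{aligned}
\text{Var}(\hat y(x_0)) &= \text{Var}(b_0 + b_1 x_0) \\
&= \text{Var}(b_0) + x_0^2\,\text{Var}(b_1) + 2x_0\,\text{Cov}(b_0, b_1) \\
&= \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right) + x_0^2\left(\frac{\sigma^2}{S_{xx}}\right) - 2x_0\left(\frac{\sigma^2 \bar x}{S_{xx}}\right) \\
&= \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right).
\end{aligned} \tag{1}$$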
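Formula (1) can be checked by Monte Carlo. In this sketch the true coefficients, $\sigma$, the fixed design $x = 1, \dots, 15$ and the point $x_0 = 12$ are all hypothetical. It simulates many datasets, records $\hat y(x_0)$ each time, and compares the empirical variance with $\sigma^2\left(1/n + (x_0 - \bar x)^2/S_{xx}\right)$.

```python
import random
import statistics

# Hypothetical true model and fixed design, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 0.5, 2.0
x = [float(i) for i in range(1, 16)]  # x = 1, ..., 15
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
x0 = 12.0


def fitted_mean_at(x0, x, y):
    """hat y(x0) = b0 + b1 * x0 from the least squares fit of y on x."""
    m = len(x)
    xb, yb = sum(x) / m, sum(y) / m
    s_xx = sum((xi - xb) ** 2 for xi in x)
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / s_xx
    b0 = yb - b1 * xb
    return b0 + b1 * x0


random.seed(2)
preds = []
for _ in range(20000):
    y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x]
    preds.append(fitted_mean_at(x0, x, y))

empirical = statistics.variance(preds)
theoretical = SIGMA ** 2 * (1 / n + (x0 - xbar) ** 2 / sxx)  # formula (1)
print(f"simulated Var = {empirical:.4f}, formula (1) = {theoretical:.4f}")
```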
-
Week 2 Lecture 3 - Confidence interval for the mean
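$\hat y(x_0)$ is normally distributed, so
$$\frac{\hat y(x_0) - \beta_0 - \beta_1 x_0}{\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}} \sim N(0, 1).$$
As in previous lectures,
$$\frac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}$$
independently of $\hat y(x_0)$ (since $\hat y(x_0)$ is a linear combination of $b_0$ and $b_1$, both independent of $\hat\sigma^2$).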
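A small simulation illustrates the $\chi^2_{n-2}$ result: with a hypothetical model and a fixed design of n = 10 points, the scaled residual sum of squares $(n-2)\hat\sigma^2/\sigma^2 = SS_{res}/\sigma^2$ should have mean close to $n - 2 = 8$ and variance close to $2(n-2) = 16$. This checks only those two moments, not the independence claim.

```python
import random
import statistics

BETA0, BETA1, SIGMA = 1.0, 0.5, 2.0   # hypothetical values
x = [float(i) for i in range(1, 11)]  # n = 10 fixed design points
n = len(x)


def residual_ss(x, y):
    """SS_res = (n - 2) * sigma_hat^2 for the least squares line of y on x."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


random.seed(3)
scaled = []
for _ in range(20000):
    y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x]
    scaled.append(residual_ss(x, y) / SIGMA ** 2)  # (n-2) sigma_hat^2 / sigma^2

# chi^2_{n-2} has mean n - 2 and variance 2(n - 2).
print(f"mean = {statistics.mean(scaled):.3f} (expect {n - 2}), "
      f"variance = {statistics.variance(scaled):.3f} (expect {2 * (n - 2)})")
```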
-
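Week 2 Lecture 3 - Confidence intervals for the mean
We have
$$\frac{\hat y(x_0) - \beta_0 - \beta_1 x_0}{\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}} \sim t_{n-2}.$$
Write $t_{\alpha/2;n-2}$ for the upper $100\alpha/2$ percentage point of the t distribution with $n-2$ degrees of freedom. Then
$$\Pr\left(-t_{\alpha/2;n-2} \leq \frac{\hat y(x_0) - \beta_0 - \beta_1 x_0}{\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}} \leq t_{\alpha/2;n-2}\right) = 1 - \alpha.$$
Confidence interval for $\beta_0 + \beta_1 x_0$:
$$\hat y(x_0) \pm t_{\alpha/2;n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}.$$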
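A minimal sketch of this confidence interval on made-up data, assuming SciPy is available for $t_{\alpha/2;n-2}$. For comparison only, it also returns a prediction interval using the standard result $\hat y(x_0) \pm t_{\alpha/2;n-2}\,\hat\sigma\sqrt{1 + 1/n + (x_0 - \bar x)^2/S_{xx}}$, where the extra 1 accounts for $\text{Var}(\varepsilon_0) = \sigma^2$.

```python
import math

from scipy import stats


def mean_ci_and_pi(x, y, x0, alpha=0.05):
    """100(1-alpha)% CI for beta0 + beta1*x0 and, for comparison, a PI for y0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t_{alpha/2;n-2}

    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - xbar) ** 2 / sxx
    ci_half = t_crit * sigma_hat * math.sqrt(h)      # uncertainty in b0, b1 only
    pi_half = t_crit * sigma_hat * math.sqrt(1 + h)  # also the new error eps_0
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up data
y = [2.3, 2.9, 4.1, 4.8, 5.2, 6.9, 7.1, 8.4, 8.8, 10.3]
for x0 in (5.5, 10.0):
    y_hat, ci, pi = mean_ci_and_pi(x, y, x0)
    print(f"x0 = {x0}: fit = {y_hat:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), "
          f"PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

The confidence interval is narrowest at $x_0 = \bar x = 5.5$ and widens as $x_0$ moves away from $\bar x$, exactly as the $(x_0 - \bar x)^2/S_{xx}$ term predicts.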
-
Week 2 Lecture 3 - Learning Expectations.
Understand the quantities within the analysis of variance table.
Be able to use the ANOVA table to answer questions regarding the suitability of a postulated statistical model.
Be able to formulate and evaluate a confidence interval for the mean.
Be able to formulate and evaluate a prediction interval.