MATH2931 Lecture 6

MATH2831/2931

Linear Models/ Higher Linear Models.

August 2, 2013

Week 2 Lecture 3 - Last lecture:

Review of hypothesis testing: two-sided and one-sided tests.

Hypothesis testing for $\beta_0$ and $\beta_1$.

Example 1: zinc concentrations in plants.

Example 2: sales and advertising data.

Week 2 Lecture 3 - This lecture:

The analysis of variance table

Inference on $\beta_1$: some further examples on testing and ANOVA.

Confidence intervals for the mean.

Prediction intervals.

Week 2 Lecture 3 - The ANOVA table

So far we have taken the model with a single independent variable to be the true model, i.e. we have presumed that $\mu_{y|x}$ is linearly related to $x$.

NOTE 1: Prediction of the response will be poor in situations where several independent variables each affect the response.

NOTE 2: Prediction of the response will be poor in situations where the presumption that $\mu_{y|x}$ is linearly related to $x$ is false over the range of values considered.

Here we use ANOVA to assess the quality of the estimated regression line.


If the true but unknown model is linear in more than one variable, e.g.

$$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$

then the least squares estimate derived so far,

$$b_1 = \frac{S_{xy}}{S_{xx}},$$

which is calculated using $x_1$ alone, is a biased estimate of $\beta_1$.

The bias is a function of the additional coefficient $\beta_2$.
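A small numerical illustration of this bias, as a sketch under assumed values: the data `x1`, `x2` and the coefficients `beta0`, `beta1`, `beta2` below are hypothetical. Because $b_1 = S_{xy}/S_{xx}$ is linear in the responses, fitting it to the noise-free means $\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$ gives $E(b_1)$ exactly. The standard omitted-variable result is $E(b_1) = \beta_1 + \beta_2 S_{x_1x_2}/S_{x_1x_1}$, where $S_{x_1x_2} = \sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)$; the code checks the computed bias against that expression.

```python
def slope(x, y):
    """Least squares slope b1 = Sxy / Sxx for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def cross(u, v):
    """Centred cross-product sum S_uv = sum (u_i - ubar)(v_i - vbar)."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))

# Hypothetical design in which x2 is correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
beta0, beta1, beta2 = 1.0, 0.5, 2.0  # assumed "true" coefficients

# Noise-free responses E(y_i), so the fitted slope equals E(b1).
mean_y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

expected_b1 = slope(x1, mean_y)
bias = expected_b1 - beta1
predicted_bias = beta2 * cross(x1, x2) / cross(x1, x1)
print(f"E(b1) = {expected_b1:.4f}, bias = {bias:.4f}, formula = {predicted_bias:.4f}")
```

With $\beta_2 = 0$, or with $x_2$ uncorrelated with $x_1$ in the sample, the bias term vanishes.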

We need a methodology to assess the quality of our regression line!


Analysis of the quality of an estimated regression line can be handled by an ANOVA approach.

ANOVA procedure: the total variation in the dependent variable is subdivided into meaningful components, which are then examined and treated in a systematic fashion (recall Week 1 Lecture 3: partitioning of variance).

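Fundamental identity: $SS_{total} = SS_{reg} + SS_{res}$, where $SS_{total} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

Regression sum of squares (variation explained by the fit):

$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Residual sum of squares (variation unexplained by the fit):

$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$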
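A minimal sketch of this partition in plain Python, using a small hypothetical data set (the `x`, `y` values are made up for illustration). It fits the least squares line, computes the three sums of squares and checks the fundamental identity numerically; the identity relies on the fitted line being the least squares fit with an intercept.

```python
def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def sums_of_squares(x, y):
    """Return (SS_total, SS_reg, SS_res) for the least squares fit."""
    b0, b1 = least_squares(x, y)
    ybar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)          # explained by the fit
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left unexplained
    return ss_total, ss_reg, ss_res

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.4, 4.8, 6.1, 6.4]

ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(f"SS_total        = {ss_total:.4f}")
print(f"SS_reg + SS_res = {ss_reg + ss_res:.4f}")
```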


Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?

Answer: It helps answer the question of whether the predictor is useful for explaining the response.

t-test: the statistic for testing $H_0: \beta_1 = 0$ was

$$T = \frac{b_1}{\hat{\sigma}/\sqrt{S_{xx}}}.$$

This statistic $T$ has a t-distribution with $n-2$ degrees of freedom under $H_0$.

From our distribution results (Week 2 Lecture 1) we can also derive that, under $H_0$,

$$F = T^2 = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$$

has an F distribution with $1$ and $n-2$ degrees of freedom.


Theorem: The regression sum of squares $SS_{reg}$ can be expressed in terms of the least squares slope estimate $b_1$ as

$$SS_{reg} = b_1^2 S_{xx}.$$

Using $SS_{reg} = b_1^2 S_{xx}$ and $\hat{\sigma}^2 = SS_{res}/(n-2)$, the statistic above,

$$F = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2},$$

can be written under $H_0$ as

$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$

(the ratio of the variation explained by the model to the scaled residual variation).

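Proof: using $b_0 = \bar{y} - b_1 \bar{x}$,

$$
\begin{aligned}
SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx}.
\end{aligned}
$$

So

$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$

has an $F_{1,n-2}$ distribution under $H_0: \beta_1 = 0$, as claimed.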
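The theorem and the identity $F = T^2$ can be checked numerically. This is a sketch on hypothetical data (the `x`, `y` values are made up): it computes $SS_{reg}$ directly from the fitted values and compares it with $b_1^2 S_{xx}$, then compares the ANOVA ratio with the square of the t statistic.

```python
import math

# Hypothetical data, for illustration only.
x = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5, 6.0]
y = [1.2, 2.0, 2.9, 3.1, 4.6, 4.9, 6.3]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]

ss_reg = sum((fi - ybar) ** 2 for fi in fitted)           # from the definition
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma2_hat = ss_res / (n - 2)                             # sigma_hat^2

t_stat = b1 / math.sqrt(sigma2_hat / sxx)                 # T = b1 / (sigma_hat / sqrt(Sxx))
f_stat = (ss_reg / 1) / (ss_res / (n - 2))                # ANOVA form of F

print(f"SS_reg = {ss_reg:.6f}, b1^2 * Sxx = {b1 ** 2 * sxx:.6f}")
print(f"F = {f_stat:.6f}, T^2 = {t_stat ** 2:.6f}")
```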


With $F$ as the test statistic we obtain a test of

$$H_0: \beta_1 = 0 \quad \text{against} \quad H_1: \beta_1 \neq 0$$

at significance level $\alpha$ by using the critical region

$$F > F_{\alpha;1,n-2}.$$

Computation of the p-value:

$$p = \Pr(F \geq f \mid \beta_1 = 0),$$

where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).
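Since $F_{1,n-2}$ is the distribution of $T^2$ with $T \sim t_{n-2}$, the p-value can be computed as $\Pr(|T| \geq \sqrt{f})$. The sketch below does this with only the Python standard library by integrating the t density with Simpson's rule; that numerical approach is an assumption made to keep the example self-contained (in practice one would use tables or a statistics package), and the function names are hypothetical. As a check, $t_{0.025;58} \approx 2.002$, so $f = 2.002^2$ is approximately the 5% critical value $F_{0.05;1,58}$ and its p-value should be close to 0.05.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Pr(|T| >= |t|) for T ~ t_df, via Simpson's rule on [0, |t|] (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)  # central area Pr(-|t| <= T <= |t|) removed

def f_test_p_value(f, df_res):
    """p = Pr(F >= f) for F ~ F_{1, df_res}, using F = T^2."""
    return t_two_sided_p(math.sqrt(f), df_res)

def f_test(f, df_res, alpha=0.05):
    """Return (p-value, whether H0: beta1 = 0 is rejected at level alpha)."""
    p = f_test_p_value(f, df_res)
    return p, p < alpha

p, reject = f_test(2.002 ** 2, 58)
print(f"p = {p:.4f}, reject H0: {reject}")
```

Rejecting when $p < \alpha$ is equivalent to using the critical region $F > F_{\alpha;1,n-2}$.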


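| Source | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Regression | $SS_{reg}$ | $1$ | $MS_{reg} = SS_{reg}/1$ | $MS_{reg}/\hat{\sigma}^2$ |
| Residual | $SS_{res}$ | $n-2$ | $MS_{res} = SS_{res}/(n-2) = \hat{\sigma}^2$ | |
| Total | $SS_{total}$ | $n-1$ | | |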
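A sketch that assembles the table above from raw data in plain Python; the data and the helper name `anova_table` are hypothetical. The F entry is $MS_{reg}/MS_{res}$, i.e. $MS_{reg}/\hat{\sigma}^2$.

```python
def anova_table(x, y):
    """Return the simple linear regression ANOVA table as a dict of rows."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]

    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - ybar) ** 2 for yi in y)

    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)  # this is sigma_hat^2
    return {
        "Regression": {"SS": ss_reg, "df": 1, "MS": ms_reg, "F": ms_reg / ms_res},
        "Residual": {"SS": ss_res, "df": n - 2, "MS": ms_res},
        "Total": {"SS": ss_total, "df": n - 1},
    }

# Hypothetical data, for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
y = [3.1, 4.0, 5.2, 5.9, 7.4, 8.1, 8.8, 10.6]

table = anova_table(x, y)
print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>9}")
for source, row in table.items():
    ms = f"{row['MS']:.3f}" if "MS" in row else ""
    f_val = f"{row['F']:.2f}" if "F" in row else ""
    print(f"{source:<12}{row['SS']:>10.3f}{row['df']:>5}{ms:>10}{f_val:>9}")
```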


When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $F_{\alpha;1,n-2}$, the conclusion is that a significant amount of the variation in the response is accounted for by the postulated model (simple linear regression).

NOTE: the t-test can be used against both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.

Week 2 Lecture 3 - Example 1: market model of stock returns

The monthly rate of return on a stock ($R$) is modelled as linearly related to the monthly return on the overall stock market ($R_m$):

$$R = \beta_0 + \beta_1 R_m + \varepsilon.$$

$R_m$ is taken to be the monthly rate of return on some major stock market index.

RECALL:

Coefficient $\beta_1$ is called the beta coefficient of the stock.

$\beta_1 > 1$ indicates that the stock's rate of return is more sensitive to the overall market than average.

$\beta_1 < 1$ indicates that it is less sensitive than average.

Question: what is the estimate of $\beta_1$, and is it significantly different from 1?


Scatter plot of Host International (y-axis) versus overall marketreturns (x-axis) with fitted regression line.


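The fitted line is $\hat{R} = 0.14 + 1.60 R_m$, with $\hat{\sigma} = 9.27$ and $S_{xx} = 1117.90$.

RECALL: a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$$\left(b_1 - t_{\alpha/2;n-2}\frac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2;n-2}\frac{\hat{\sigma}}{\sqrt{S_{xx}}}\right).$$

A 95% confidence interval for $\beta_1$ is

$$\left(1.60 - 2.002\frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002\frac{9.27}{\sqrt{1117.90}}\right) = (1.04, 2.16),$$

which does not contain 1. A value of 1 for the slope does not seem plausible based on the data.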
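The interval arithmetic can be reproduced as follows. This sketch only plugs in the summary statistics quoted for this example ($b_1 = 1.60$, $\hat{\sigma} = 9.27$, $S_{xx} = 1117.90$) together with the tabled value $t_{0.025;58} = 2.002$; it does not recompute anything from the raw returns, which are not given here.

```python
import math

b1 = 1.60            # estimated slope (beta coefficient)
sigma_hat = 9.27     # residual standard error
sxx = 1117.90        # Sxx for the market returns
t_crit = 2.002       # t_{0.025; 58}, as used in the example

se_b1 = sigma_hat / math.sqrt(sxx)   # standard error of b1
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")  # → (1.04, 2.16)
print("contains 1:", lower <= 1 <= upper)               # → False
```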


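Equivalent hypothesis test:

$$H_0: \beta_1 = 1 \quad \text{versus} \quad H_1: \beta_1 \neq 1.$$

Test statistic:

$$T = \frac{b_1 - 1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27/\sqrt{1117.90}} = 2.16.$$

If $T \sim t_{58}$, the p-value for the test is

$$p = \Pr(|T| \geq 2.16) = 2\Pr(T \geq 2.16) = 0.0349,$$

so we reject $H_0: \beta_1 = 1$ at the 5% level.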
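A sketch of the same test in plain Python. The inputs are the summary statistics quoted above, with $n = 60$ so that $n - 2 = 58$. The t tail probability is computed by Simpson's rule integration of the $t_{58}$ density, a self-contained numerical substitute (an assumption of this sketch, not the lecture's method) for tables or statistical software; the helper names are hypothetical. Because $T$ is not rounded to 2.16 here, the p-value may differ slightly from 0.0349 in the last digit.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Pr(|T| >= |t|) for T ~ t_df, via Simpson's rule on [0, |t|] (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)

b1, sigma_hat, sxx, n = 1.60, 9.27, 1117.90, 60  # n - 2 = 58 degrees of freedom
beta1_null = 1.0                                 # H0: beta1 = 1

t_stat = (b1 - beta1_null) / (sigma_hat / math.sqrt(sxx))
p_value = t_two_sided_p(t_stat, n - 2)
print(f"T = {t_stat:.2f}, p = {p_value:.4f}")
print("reject H0 at 5%:", p_value < 0.05)
```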

Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

DATA collection:

Investors are interested in the riskiness of a stock.

They want company financial reports to provide information helpful for assessing risk.

Seven accounting-determined measures of risk are available from a company's financial reports:

dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.

These were computed for 25 well-known stocks based on their annual reports.


Experiment: the data were sent to a random sample of 500 financial analysts, of whom 209 responded. The mean rating assigned by the 209 analysts was recorded for each of the 25 stocks.

The mean rating by the analysts is taken as a reasonable surrogate for the market risk of each stock.

AIM: predict market risk from the accounting measures. The response is market risk and the predictors are the (multiple) accounting measures.

Week 2 Lecture 3 - Example 2: Risk assessment example

Estimated market risk (y-axis) versus log(asset size) (x-axis). The fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$.
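As a rough cross-check linking this example to the ANOVA F-test: since $R^2 = SS_{reg}/SS_{total}$ and $SS_{res} = (1 - R^2)SS_{total}$, the F statistic can be rewritten as $F = (n-2)R^2/(1-R^2)$. The sketch below applies this to the quoted $R^2 = 0.21$ with $n = 25$ stocks. Because $R^2$ is rounded to two decimals, the resulting $F$ and p-value are only approximate and are not figures reported in the lecture. The p-value uses a hypothetical standard-library helper (Simpson's rule integration of the t density, via $F_{1,n-2} = T^2$) in place of tables.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Pr(|T| >= |t|) for T ~ t_df, via Simpson's rule on [0, |t|] (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)

r2 = 0.21   # quoted R^2 (rounded)
n = 25      # number of stocks
f_stat = (n - 2) * r2 / (1 - r2)
p_value = t_two_sided_p(math.sqrt(f_stat), n - 2)  # F_{1,n-2} = T^2, T ~ t_{n-2}
print(f"F approx {f_stat:.2f} on (1, {n - 2}) df, p approx {p_value:.3f}")
```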

Week 2 Lecture 3 - Prediction in the simple linear regression model

A common reason for building a simple linear regression model is to predict a new response value when the value of the predictor is known.

Example: risk assessment data

The riskiness of a stock was rated by 209 financial analysts (response). The predictors are various accounting-determined measures of risk.

Aim: a simple linear regression model for risk with asset size as the predictor.

Outcome: for a company not assessed by the financial analysts, we can determine asset size from company reports and predict risk using the fitted model.

Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals

Prediction of a new response value when the predictor is $x_0$:

$$\hat{y}(x_0) = b_0 + b_1 x_0.$$

True conditional mean of the response at $x_0$:

$$\beta_0 + \beta_1 x_0.$$

New response value $y_0$ when the predictor is $x_0$: write

$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$

where $\varepsilon_0$ is independent of the previous responses and normal with mean zero and variance $\sigma^2$.

We want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (a prediction interval).

Week 2 Lecture 3 - Confidence and Prediction Intervals

The confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$ and $\beta_1$.

The prediction interval will reflect our uncertainty due to estimating $\beta_0$ and $\beta_1$, and also the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).

First we'll define a statistic which can be used to construct a confidence interval for the conditional mean at $x_0$. Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

$\hat{y}(x_0)$ is a Gaussian random variable.

$E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0$.

$$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right).$$

Week 2 Lecture 3 - Confidence interval for the mean

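$$
\begin{aligned}
\mathrm{Var}(\hat{y}(x_0)) &= \mathrm{Var}(b_0 + b_1 x_0) \\
&= \mathrm{Var}(b_0) + x_0^2\,\mathrm{Var}(b_1) + 2x_0\,\mathrm{Cov}(b_0, b_1) \\
&= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) + x_0^2\left(\frac{\sigma^2}{S_{xx}}\right) - 2x_0\left(\frac{\sigma^2 \bar{x}}{S_{xx}}\right) \\
&= \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right).
\end{aligned}
\tag{1}
$$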
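A numerical check of (1), as a sketch on a hypothetical design (`x`, `x0` and `sigma2` are assumed values). It writes $\hat{y}(x_0) = \sum_i w_i y_i$ with $w_i = 1/n + (x_0 - \bar{x})(x_i - \bar{x})/S_{xx}$, so that $\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \sum_i w_i^2$ for independent responses with common variance $\sigma^2$, and compares this with both the covariance expansion and the closed form.

```python
# Hypothetical design and parameters, for illustration only.
x = [1.0, 2.5, 3.0, 4.5, 6.0, 7.5]
x0 = 5.2
sigma2 = 2.0

n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# y_hat(x0) = ybar + b1 (x0 - xbar) is linear in y with these weights.
w = [1 / n + (x0 - xbar) * (xi - xbar) / sxx for xi in x]
var_weights = sigma2 * sum(wi ** 2 for wi in w)

# Covariance expansion: Var(b0) + x0^2 Var(b1) + 2 x0 Cov(b0, b1).
var_b0 = sigma2 * (1 / n + xbar ** 2 / sxx)
var_b1 = sigma2 / sxx
cov_b0_b1 = -sigma2 * xbar / sxx
var_expansion = var_b0 + x0 ** 2 * var_b1 + 2 * x0 * cov_b0_b1

# Closed form (1).
var_formula = sigma2 * (1 / n + (x0 - xbar) ** 2 / sxx)

print(var_weights, var_expansion, var_formula)
```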


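$\hat{y}(x_0)$ is normally distributed, so

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1).$$

As in previous lectures,

$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2},$$

independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$ and $b_1$, both of which are independent of $\hat{\sigma}^2$).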

We have

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}.$$

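Writing $t_{\alpha/2,n-2}$ for the upper $100\alpha/2$ percentage point of the t distribution with $n-2$ degrees of freedom,

$$\Pr\left(-t_{\alpha/2,n-2} \leq \frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2,n-2}\right) = 1 - \alpha.$$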
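This probability statement can be checked by simulation. The sketch below works under assumed values: the true line ($\beta_0 = 2$, $\beta_1 = 0.5$), $\sigma = 1.5$, the design `x` with $n = 12$ points and the point `x0` are all hypothetical, and $t_{0.025,10} = 2.228$ is the standard table value for $n - 2 = 10$ degrees of freedom. Over many simulated data sets, the proportion of intervals that contain $\beta_0 + \beta_1 x_0$ should be close to $1 - \alpha = 0.95$.

```python
import math
import random

random.seed(2013)

beta0, beta1, sigma = 2.0, 0.5, 1.5        # assumed true model
x = [float(i) for i in range(1, 13)]       # n = 12 design points
x0 = 9.0                                   # point at which the mean is estimated
t_crit = 2.228                             # t_{0.025, 10} from tables
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
true_mean = beta0 + beta1 * x0

def covers_true_mean():
    """Simulate one data set; return True if the 95% CI contains the true mean."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))
    y_hat = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return abs(y_hat - true_mean) <= half_width

reps = 4000
coverage = sum(covers_true_mean() for _ in range(reps)) / reps
print(f"empirical coverage: {coverage:.3f}")
```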

Confidence interval for $\beta_0 + \beta_1 x_0$:

$$\hat{y}(x_0) \pm t_{\alpha/2,n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$
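A self-contained sketch of this interval in plain Python. The function names and the data are hypothetical. The critical value $t_{\alpha/2,n-2}$ is found by bisection on a Simpson's-rule integral of the t density, a numerical stand-in for t tables (an assumption of this sketch, not the lecture's method). The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from $\bar{x}$, reflecting the $(x_0 - \bar{x})^2/S_{xx}$ term.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Pr(|T| >= |t|) for T ~ t_df, via Simpson's rule on [0, |t|] (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)

def t_crit(alpha, df):
    """Upper alpha/2 point t_{alpha/2, df}, by bisection on the two-sided tail."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > alpha:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(x, y, x0, alpha=0.05):
    """100(1 - alpha)% confidence interval for beta0 + beta1 * x0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))
    y_hat = b0 + b1 * x0
    half = t_crit(alpha, n - 2) * sigma_hat * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return y_hat - half, y_hat + half

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.8, 2.9, 3.2, 4.8, 5.1, 6.3, 6.8, 8.2]
print(mean_ci(x, y, x0=4.5))  # x0 equal to xbar: narrowest interval
print(mean_ci(x, y, x0=8.0))  # x0 far from xbar: wider interval
```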

Week 2 Lecture 3 - Learning Expectations.

Understand the quantities within the analysis of variance table

Be able to use the ANOVA table to answer questions regarding the suitability of a postulated statistical model.

Be able to formulate and evaluate confidence intervals for the mean.

Be able to formulate and evaluate a prediction interval
