MATH2931 Lecture 6


Transcript of MATH2931 Lecture 6

  • MATH2831/2931

    Linear Models/ Higher Linear Models.

    August 2, 2013

  • Week 2 Lecture 3 - Last lecture:

    Review of hypothesis testing: two-sided and one-sided tests.

    Hypothesis testing for $\beta_0$ and $\beta_1$.

    Example 1: zinc concentrations in plants.

    Example 2: sales and advertising data.

  • Week 2 Lecture 3 - This lecture:

    The analysis of variance table

    Inference on $\beta_1$: some further examples on testing and ANOVA.

    Confidence intervals for the mean.

    Prediction intervals.

  • Week 2 Lecture 3 - The ANOVA table

    So far we have considered the model with a single independent variable to be the true model, i.e. the presumption that $y|x$ is related to $x$ linearly.

    NOTE 1: Prediction of the response will be poor in situations in which there are several independent variables, each affecting the response.

    NOTE 2: Prediction of the response will be poor in situations in which the presumption that $y|x$ is related to $x$ linearly is false in the range of variables considered.

    Here we use ANOVA to analyze the quality of the estimated regression line.

  • Week 2 Lecture 3 - The ANOVA table

    If the true unknown model is linear in more than one variable, i.e.

    $y|x = \beta_0 + \beta_1 x_1 + \beta_2 x_2$,

    then the standard least squares estimate derived so far,

    $b_1 = \dfrac{S_{xy}}{S_{xx}}$,

    which is calculated considering only $x_1$, is a biased estimate for $\beta_1$.

    The bias will be a function of the additional coefficient $\beta_2$.

    We need a methodology to assess the quality of our regression line!

  • Week 2 Lecture 3 - The ANOVA table

    Analysis of the quality of an estimated regression line can be handled by an ANOVA approach.

    ANOVA procedure: considers the total variation in the dependent variable as subdivided into meaningful components which are then observed and treated in a systematic fashion. (Recall Week 1 Lecture 3 - partitioning of variance.)

    Fundamental identity: $SS_{total} = SS_{reg} + SS_{res}$

    Regression sum of squares (variation explained by the fit):

    $SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$.

    Residual sum of squares (variation unexplained by the fit):

    $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
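
    A minimal numerical sketch (not part of the original slides) of the fundamental identity. The data are simulated and all variable names and values are illustrative assumptions only:

```python
import numpy as np

# Simulated illustrative data (values are assumptions, not from the lecture)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=30)

# Least squares estimates b0, b1
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# Sums of squares in the fundamental identity
SS_total = np.sum((y - y.mean()) ** 2)
SS_reg = np.sum((y_hat - y.mean()) ** 2)
SS_res = np.sum((y - y_hat) ** 2)

print(SS_total, SS_reg + SS_res)  # equal up to floating-point rounding
```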

  • Week 2 Lecture 3 - The ANOVA table

    Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?

    Answer: It helps answer the question of whether the predictor is useful for explaining the response.

    t-test: the statistic for testing $H_0: \beta_1 = 0$ was

    $T = \dfrac{b_1}{\hat{\sigma}/\sqrt{S_{xx}}}$.

    This statistic $T$ has a t-distribution with $n-2$ degrees of freedom under $H_0$.

    From our distribution results (Week 2 Lecture 1) we can also derive that under $H_0$

    $F = T^2 = \dfrac{b_1^2 S_{xx}}{\hat{\sigma}^2}$

    has an F distribution with 1 and $n-2$ degrees of freedom.
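
    A quick numerical check of the $F = T^2$ relationship (an added sketch, not from the slides): the squared two-sided t critical value equals the corresponding upper-tail F critical value. The sample size here is an illustrative assumption:

```python
from scipy import stats

n = 30            # illustrative sample size (an assumption, not from the slides)
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)      # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)  # F critical value with 1 and n-2 df

print(t_crit ** 2, f_crit)  # the two agree: t_{alpha/2, n-2}^2 = F_{alpha; 1, n-2}
```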

  • Week 2 Lecture 3 - The ANOVA table

    Theorem: One can express the regression sum of squares $SS_{reg}$ as a function of the least squares slope estimate $b_1$ as follows:

    $SS_{reg} = b_1^2 S_{xx}$

    With $SS_{reg} = b_1^2 S_{xx}$, the above statistic, under $H_0$,

    $F = \dfrac{b_1^2 S_{xx}}{\hat{\sigma}^2}$

    can be written as

    $F = \dfrac{SS_{reg}/1}{SS_{res}/(n-2)}$

    (the ratio of variation explained by the model to scaled residual variation).

  • Week 2 Lecture 3 - The ANOVA table

    Proof (substituting $b_0 = \bar{y} - b_1\bar{x}$ in the second step):

    $$
    \begin{aligned}
    SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
             &= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2 \\
             &= \sum_{i=1}^{n} (\bar{y} - b_1\bar{x} + b_1 x_i - \bar{y})^2 \\
             &= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
             &= b_1^2 S_{xx}.
    \end{aligned}
    $$

    So

    $F = \dfrac{SS_{reg}/1}{SS_{res}/(n-2)}$

    has an $F_{1,n-2}$ distribution under $H_0: \beta_1 = 0$, as claimed.

  • Week 2 Lecture 3 - The ANOVA table

    With $F$ as test statistic we obtain a test of

    $H_0: \beta_1 = 0$

    against

    $H_1: \beta_1 \neq 0$

    at significance level $\alpha$ by using the critical region

    $F > F_{\alpha; 1, n-2}$.

    Computation of p-value:

    $p = \Pr(F \geq f \mid \beta_1 = 0)$

    where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).

  • Week 2 Lecture 3 - The ANOVA table

    | Source     | Sum of Squares | Degrees of freedom | Mean Square                                   | F                             |
    |------------|----------------|--------------------|-----------------------------------------------|-------------------------------|
    | Regression | $SS_{reg}$     | $1$                | $MS_{reg} = SS_{reg}/1$                       | $MS_{reg}/\hat{\sigma}^2$     |
    | Residual   | $SS_{res}$     | $n-2$              | $MS_{res} = SS_{res}/(n-2) = \hat{\sigma}^2$  |                               |
    | Total      | $SS_{total}$   | $n-1$              |                                               |                               |
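
    A short sketch (not from the slides) of how the ANOVA table entries and the F-test p-value could be computed; the simulated data and variable names are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# Illustrative simulated data (assumed values, not from the lecture)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=30)
n = len(y)

# Least squares fit
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# ANOVA table entries
SS_reg = np.sum((y_hat - y.mean()) ** 2)   # df = 1
SS_res = np.sum((y - y_hat) ** 2)          # df = n - 2
MS_reg = SS_reg / 1
MS_res = SS_res / (n - 2)                  # = sigma-hat squared
F = MS_reg / MS_res
p_value = stats.f.sf(F, dfn=1, dfd=n - 2)  # Pr(F_{1, n-2} >= observed F)

print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```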

  • Week 2 Lecture 3 - The ANOVA table

    When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $F_{\alpha; 1, n-2}$, the conclusion is: there is a significant amount of variation in the response accounted for by the postulated model (simple linear regression).

    NOTE: the t-test allows for testing both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.

  • Week 2 Lecture 3 - Example 1: market model of stock returns

    The monthly rate of return on a stock ($R$) is linearly related to the monthly return on the overall stock market ($R_m$):

    $R = \beta_0 + \beta_1 R_m + \varepsilon$

    $R_m$ is taken to be the monthly rate of return on some major stock market index.

    RECALL:

    The coefficient $\beta_1$ is called the beta coefficient of the stock.

    $\beta_1 > 1$ indicates the stock's rate of return is more sensitive to the overall market than average.

    $\beta_1 < 1$: less sensitive than average.

    Estimate $\beta_1$: is it significantly different from 1?

  • Week 2 Lecture 3 - Example 1: Market model of stock returns

    Scatter plot of Host International returns (y-axis) versus overall market returns (x-axis), with fitted regression line.

  • Week 2 Lecture 3 - Example 1: Market model of stock returns

    Fitted line is $\hat{R} = 0.14 + 1.60 R_m$, with $\hat{\sigma} = 9.27$ and $S_{xx} = 1117.90$.

    RECALL: the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

    $\left( b_1 - t_{\alpha/2; n-2}\dfrac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2; n-2}\dfrac{\hat{\sigma}}{\sqrt{S_{xx}}} \right)$.

    95% confidence interval for $\beta_1$:

    $\left( 1.60 - 2.002 \times \dfrac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002 \times \dfrac{9.27}{\sqrt{1117.90}} \right) = (1.04,\, 2.16)$,

    which doesn't contain 1. A value of 1 for the slope does not seem plausible based on the data.
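
    A small sketch (not from the slides) reproducing this interval numerically from the quoted summary values; the sample size $n = 60$ is an inference from the $t_{58}$ distribution used on the next slide:

```python
import numpy as np
from scipy import stats

# Summary values quoted on the slide; n = 60 inferred from the 58 degrees of freedom
b1, sigma_hat, Sxx, n = 1.60, 9.27, 1117.90, 60

t_crit = stats.t.ppf(0.975, df=n - 2)    # approx. 2.002
half_width = t_crit * sigma_hat / np.sqrt(Sxx)
print(b1 - half_width, b1 + half_width)  # approx. (1.04, 2.16)
```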

  • Week 2 Lecture 3 - Example 1: Market model of stock returns

    Hypothesis testing equivalent:

    $H_0: \beta_1 = 1$

    versus

    $H_1: \beta_1 \neq 1$

    Test statistic:

    $T = \dfrac{b_1 - 1}{\hat{\sigma}/\sqrt{S_{xx}}} = \dfrac{1.60 - 1}{9.27/\sqrt{1117.90}} = 2.16$.

    So with $T \sim t_{58}$ under $H_0$, the p-value for the test is

    $p = \Pr(|T| \geq 2.16) = 2\Pr(T \geq 2.16) = 0.0349$,

    so we reject $H_0: \beta_1 = 1$ at the 5% level.
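
    The same test carried out numerically (a sketch, not part of the slides), again using the summary values quoted above:

```python
from math import sqrt
from scipy import stats

# Slide values for the market-model example
b1, sigma_hat, Sxx, df = 1.60, 9.27, 1117.90, 58

T = (b1 - 1) / (sigma_hat / sqrt(Sxx))  # test statistic for H0: beta1 = 1
p = 2 * stats.t.sf(abs(T), df)          # two-sided p-value, approx. 0.035
print(T, p)
```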

  • Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

    DATA collection:

    Investors are interested in the riskiness of a stock.

    Want company financial reports to provide information helpful for assessing risk.

    Seven accounting-determined measures of risk (available from a company's financial reports):

    Dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.

    These were computed for 25 well-known stocks based on annual reports.

  • Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

    Experiment:

    Data sent to a random sample of 500 financial analysts, of which 209 responded.

    Mean rating assigned by the 209 analysts recorded for each of the 25 stocks.

    Mean rating by analysts taken as a reasonable surrogate of market risk for each stock.

    AIM: Want to predict market risk from accounting measures: response is market risk, predictors are accounting measures (multiple).

  • Week 2 Lecture 3 - Example 2: Risk assessment example

    Estimated market risk (y-axis) versus log(asset size) (x-axis). Fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$.

  • Week 2 Lecture 3 - Prediction in the simple linear regression model

    A reason for building a simple linear regression model is often to predict a new response value when the value of the predictor is known.

    Example: risk assessment data.

    Riskiness of a stock rated by 209 financial analysts (response). Predictors are various accounting-determined measures of risk.

    Aim: Simple linear regression model for risk with asset size as predictor.

    Outcome: For a company not assessed by the financial analysts we can determine asset size from company reports and predict risk using the fitted model.

  • Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals

    Prediction of a new response value when the predictor is $x_0$:

    $\hat{y}(x_0) = b_0 + b_1 x_0$.

    True conditional mean of the response at $x_0$:

    $\beta_0 + \beta_1 x_0$

    New response value $y_0$ when the predictor is $x_0$: write

    $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$,

    $\varepsilon_0$ independent of previous responses, normal with mean zero and variance $\sigma^2$.

    We want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (a prediction interval).

  • Week 2 Lecture 3 - Confidence and Prediction Intervals

    A confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$.

    A prediction interval will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$, and the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$); see the short comparison after this slide.

    First we'll define a statistic which can be used for constructing a confidence interval for the conditional mean at $x_0$. Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

    $\hat{y}(x_0)$ is a Gaussian random variable.

    $E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0$.

    $\mathrm{Var}(\hat{y}(x_0)) = \sigma^2\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)$
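
    A brief added note (not on the slide) contrasting the two variances: the prediction error for a new response $y_0$ includes the extra $\varepsilon_0$ term, which is independent of $b_0$ and $b_1$, so under the stated assumptions

```latex
\mathrm{Var}\big(\hat{y}(x_0)\big)
  = \sigma^2\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right),
\qquad
\mathrm{Var}\big(y_0 - \hat{y}(x_0)\big)
  = \sigma^2 + \mathrm{Var}\big(\hat{y}(x_0)\big)
  = \sigma^2\left(1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right).
```

    This is why the prediction interval is always wider than the confidence interval for the mean at the same $x_0$.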

  • Week 2 Lecture 3 - Confidence interval for the mean

    $$
    \begin{aligned}
    \mathrm{Var}(\hat{y}(x_0)) &= \mathrm{Var}(b_0 + b_1 x_0) \\
    &= \mathrm{Var}(b_0) + x_0^2\,\mathrm{Var}(b_1) + 2x_0\,\mathrm{Cov}(b_0, b_1) \\
    &= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) + x_0^2\left(\frac{\sigma^2}{S_{xx}}\right) - 2x_0\left(\frac{\sigma^2 \bar{x}}{S_{xx}}\right) \\
    &= \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right). \qquad (1)
    \end{aligned}
    $$

  • Week 2 Lecture 3 - Confidence interval for the mean

    $\hat{y}(x_0)$ is normally distributed, so

    $\dfrac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1)$.

    As in previous lectures,

    $\dfrac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$

    independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$, $b_1$, both independent of $\hat{\sigma}^2$).

  • Week 2 Lecture 3 - Confidence intervals for the mean

    We have

    $\dfrac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}$.

    Write $t_{\alpha/2, n-2}$ for the upper $100\,\alpha/2$ percentage point of the t distribution with $n-2$ degrees of freedom, so that

    $\Pr\left( -t_{\alpha/2, n-2} \leq \dfrac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha$

    Confidence interval for $\beta_0 + \beta_1 x_0$:

    $\hat{y}(x_0) \pm t_{\alpha/2, n-2}\,\hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$
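
    A sketch (not from the slides) of how this confidence interval for the mean, and the prediction interval mentioned at the start of the lecture, could be computed. The data are simulated, and the prediction interval uses the standard extra "1 +" term inside the square root, which is not derived on these slides:

```python
import numpy as np
from scipy import stats

# Simulated illustrative data (assumed values, not from the lecture)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=30)
n = len(y)

# Least squares fit and residual variance estimate
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma_hat = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 5.0                               # point at which to form the intervals
y0_hat = b0 + b1 * x0
t_crit = stats.t.ppf(0.975, df=n - 2)  # 95% intervals

# Confidence interval for the conditional mean beta0 + beta1*x0
se_mean = sigma_hat * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)
ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)

# Prediction interval for a new response y0 at x0 (extra "1 +" term)
se_pred = sigma_hat * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)

print("CI for mean:", ci)
print("Prediction interval:", pi)
```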

  • Week 2 Lecture 3 - Learning Expectations

    Understand the quantities within the analysis of variance table.

    Be able to use the ANOVA table to answer questions regarding the suitability of a postulated statistical model.

    Be able to formulate and evaluate a confidence interval for the mean.

    Be able to formulate and evaluate a prediction interval.