PS4 Solutions Exercise 1 i - sungpark · PS4 Solutions Exercise 1 Consider a linear model yi = ...


PS4 Solutions

Exercise 1 Consider a linear model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the error terms $\epsilon_i$ are independently and normally distributed with mean zero. You observe the following pairs of observations:

x 7 4 6 2 1 1 3

y 2 4 2 5 7 6 5

1. Find the least squares estimates $\hat\beta_0$ and $\hat\beta_1$.

2. Compute the sum of squared errors (SSE) of the least squares fitted line.

3. Calculate the $R^2$ for the least squares fitted line.

4. Find the 95% confidence interval of $\beta_1$.

5. Calculate the F-test statistic of the model. Using a significance level $\alpha = 0.05$, decide whether you reject the null hypothesis $H_0: \beta_1 = 0$ based on the F-test statistic.

Answer

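1. $\hat\beta_1 = \dfrac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = -0.7797$

$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 7.1017$

2. $SSE = \sum (Y_i - \hat Y_i)^2 = 1.22$

3. $r^2 = \dfrac{SSR}{SST} = 0.9438$

4. The 95% C.I. for $\beta_1$ is $(-0.998, -0.561)$.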

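5. $F = \dfrac{SSR/1}{SSE/(7-2)} = \dfrac{0.9438}{(1-0.9438)/5} = 83.97$

$F_{0.05}(1,5) = 6.61 < F$

Therefore we reject the null hypothesis $H_0: \beta_1 = 0$. This agrees with the 95% confidence interval for $\beta_1$, which does not contain zero.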
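The quantities in this answer can be reproduced numerically. The following is a minimal sketch in plain Python, not part of the original solution; the variable names are illustrative, and the critical value `T_CRIT` is an assumption taken from a standard t table for 5 degrees of freedom.

```python
import math

# Data from Exercise 1.
x = [7, 4, 6, 2, 1, 1, 3]
y = [2, 4, 2, 5, 7, 6, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross-products.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of the slope and intercept.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Sum of squared errors, R^2, and the F statistic with (1, n - 2) d.f.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = ss_yy - sse
r2 = ssr / ss_yy
f_stat = (ssr / 1) / (sse / (n - 2))

# 95% confidence interval for the slope.
# T_CRIT is the tabulated t_{0.025, 5}; assumed from a standard t table.
T_CRIT = 2.571
se_b1 = math.sqrt(sse / (n - 2) / ss_xx)
ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SSE = {sse:.4f}, R^2 = {r2:.4f}")
print(f"95% CI for b1: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"F = {f_stat:.2f}")
```

The printed F value can be compared directly with the critical value $F_{0.05}(1,5) = 6.61$ to reach the test decision in part 5.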

Exercise 2 Consider a linear model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the error terms $\epsilon_i$ are independently and normally distributed with mean zero. Let $t$ denote the t-test statistic of $H_0: \beta_1 = 0$, let $F$ denote the F-test statistic of the same $H_0: \beta_1 = 0$, and let $r$ denote the coefficient of correlation between $x$ and $y$. Show that $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.

Proof.

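$$
\begin{aligned}
t = \frac{\hat\beta_1}{s_{\hat\beta_1}}
&= \frac{SS_{xy}/SS_{xx}}{\sqrt{\dfrac{SSE}{(n-2)\,SS_{xx}}}} \\
&= \frac{SS_{xy}\sqrt{n-2}}{\sqrt{SS_{xx}\,SSE}} \\
&= \frac{SS_{xy}\sqrt{n-2}}{\sqrt{SS_{xx}\,SS_{yy}\,(1-r^2)}} \\
&= \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}\cdot\frac{\sqrt{n-2}}{\sqrt{1-r^2}} \\
&= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\end{aligned}
$$

Please notice that

$$
r = \frac{SS_{xy}}{\sqrt{SS_{xx}}\sqrt{SS_{yy}}}
\;\Rightarrow\;
r^2 = \frac{SS_{yy} - SSE}{SS_{yy}}
\;\Rightarrow\;
SSE = SS_{yy}\,(1-r^2)
$$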
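As a numerical sanity check of this identity, the sketch below computes $t$ both directly (slope divided by its standard error) and from the correlation coefficient, reusing the Exercise 1 data purely as an example. It also compares $t^2$ with the F statistic on $(1, n-2)$ degrees of freedom, which follows from squaring the identity. The variable names are illustrative assumptions, not part of the original proof.

```python
import math

# Exercise 1 data, reused here only as a numerical check.
x = [7, 4, 6, 2, 1, 1, 3]
y = [2, 4, 2, 5, 7, 6, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# t computed directly as slope / standard error of the slope.
b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy  # SSE = SS_yy - SSR, with SSR = b1 * SS_xy
t_direct = b1 / math.sqrt(sse / (n - 2) / ss_xx)

# t computed from the correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# F statistic with (1, n - 2) degrees of freedom.
f_stat = (ss_yy - sse) / (sse / (n - 2))

print(f"t (direct) = {t_direct:.6f}")
print(f"t (from r) = {t_from_r:.6f}")
print(f"t^2 = {t_direct ** 2:.6f}, F = {f_stat:.6f}")
```

Both routes to $t$ should agree up to floating-point rounding, and $t^2$ should match $F$.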


Exercise 3 In a given population of two-earner male/female households, male earnings (x1) have

a mean of $45,000 per year and a standard deviation of $12,000. Female earnings (x2) have a

mean of $40,000 per year and a standard deviation of $15,000. The correlation between male

and female earnings within the same household is 0.60.

1. Consider a linear model that regresses the male earnings (x1) on the female earnings (x2)

within the same household for which all least squares assumptions hold. Find the least

squares estimates of the intercept and the slope, respectively.

2. Consider a linear model that regresses the female earnings (x2) on the male earnings (x1)

within the same household for which all least squares assumptions hold. Find the least

squares estimates of the intercept and the slope, respectively.

Answer:

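(1)

The regression model is $x_1 = \beta_0 + \beta_1 x_2$.

The stated standard deviations give

$$
\sigma(x_1) = \sqrt{\frac{\sum (x_{1i} - \bar x_1)^2}{n}} = 12000,
\qquad
\sigma(x_2) = \sqrt{\frac{\sum (x_{2i} - \bar x_2)^2}{n}} = 15000
$$

$$
\mathrm{corr}(x_1, x_2) = \frac{\sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2)}{n\,\sigma(x_1)\,\sigma(x_2)} = 0.6
$$

Therefore,

$$
\sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) = 0.6 \cdot n \cdot 12000 \cdot 15000 = 1.08 \times 10^{8}\, n
$$

$$
\hat\beta_1 = \frac{\sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2)}{\sum (x_{2i} - \bar x_2)^2}
= \frac{1.08 \times 10^{8}\, n}{15000^2\, n} = 0.48
$$

$$
\hat\beta_0 = \bar x_1 - \hat\beta_1 \bar x_2 = 45000 - 0.48 \cdot 40000 = 25800
$$

(2)

The regression model is $x_2 = \beta_0 + \beta_1 x_1$.

$$
\hat\beta_1 = \frac{\sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2)}{\sum (x_{1i} - \bar x_1)^2}
= \frac{0.6 \cdot n \cdot 12000 \cdot 15000}{12000^2\, n} = 0.75
$$

$$
\hat\beta_0 = \bar x_2 - \hat\beta_1 \bar x_1 = 40000 - 0.75 \cdot 45000 = 6250
$$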
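Both regressions depend only on the means, the standard deviations, and the correlation, since the slope of $y$ on $x$ equals $r\,\sigma_y/\sigma_x$. The following is a minimal sketch under the assumption that 12,000 and 15,000 are standard deviations, as the exercise states; the helper `ols_from_moments` is a hypothetical name introduced only for illustration.

```python
def ols_from_moments(mean_y, sd_y, mean_x, sd_x, corr):
    """Return (intercept, slope) for regressing y on x from summary moments."""
    slope = corr * sd_y / sd_x            # equals cov(x, y) / var(x)
    intercept = mean_y - slope * mean_x   # fitted line passes through the means
    return intercept, slope


# Part 1: male earnings (x1) regressed on female earnings (x2).
b0_male, b1_male = ols_from_moments(45_000, 12_000, 40_000, 15_000, 0.60)

# Part 2: female earnings (x2) regressed on male earnings (x1).
b0_female, b1_female = ols_from_moments(40_000, 15_000, 45_000, 12_000, 0.60)

print(f"x1 on x2: intercept = {b0_male:.2f}, slope = {b1_male:.3f}")
print(f"x2 on x1: intercept = {b0_female:.2f}, slope = {b1_female:.3f}")
```

Note that the product of the two slopes equals the squared correlation, $0.6^2 = 0.36$, which is a quick consistency check on any pair of answers.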

Exercise 4 The following multiple regression model was fitted to a sample of 30 families in order to explain household milk consumption:

$$
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i
$$

where $y_i$ = milk consumption (in quarts per week);

$x_{1i}$ = income (in thousands of HK dollars per week);

$x_{2i}$ = family size.

The least squares estimates of the model are:

$$
\hat y_i = \underset{(2.173)}{0.485} + \underset{(0.115)}{0.187}\,x_{1i} + \underset{(0.600)}{0.705}\,x_{2i},
\qquad R^2 = 0.216
$$

where standard errors are in parentheses.

1. At $\alpha = 0.05$, test the null hypothesis $H_0: \beta_1 = 0$ against the alternative hypothesis $H_a: \beta_1 \neq 0$.

2. At $\alpha = 0.05$, test the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ against the alternative hypothesis $H_a$: at least one of $\beta_1$ and $\beta_2$ is nonzero.

3. Give a statistical interpretation of your results in (1) and (2).

Answer:

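(1) $t = \dfrac{0.187 - 0}{0.115} = 1.626$

$t_{0.025,27} = 2.052 > t$

Therefore, we fail to reject the null hypothesis.

(2) $F = \dfrac{SSR/2}{SSE/(30-3)} = \dfrac{0.216}{1-0.216}\cdot\dfrac{27}{2} = 3.719$

$F_{0.05}(2,27) = 3.35 < F$

Therefore we reject the null hypothesis.

(3) For part (1), if the population parameter $\beta_1 = 0$, then $\Pr(|t| > 2.052) = 5\%$. The observed $|t| = 1.626$ lies outside this rejection region, so income is not individually significant at the 5% level.

For part (2), if the population parameters $\beta_1 = \beta_2 = 0$, then $\Pr(F > 3.35) = 5\%$. The observed $F = 3.719$ lies inside the rejection region, so income and family size are jointly significant at the 5% level.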
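The two test statistics need only the reported coefficient, its standard error, $R^2$, and the sample size. The sketch below is a minimal illustration; the critical values are the tabulated ones quoted in the answer and are hard-coded rather than computed, and the variable names are assumptions.

```python
n, k = 30, 2               # sample size and number of regressors
b1, se_b1 = 0.187, 0.115   # income coefficient and its standard error
r2 = 0.216

# t statistic for H0: beta_1 = 0, with n - k - 1 = 27 degrees of freedom.
t_stat = (b1 - 0) / se_b1

# Overall F statistic written in terms of R^2, with (k, n - k - 1) d.f.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# Tabulated critical values quoted in the answer (not computed here).
T_CRIT = 2.052   # t_{0.025, 27}
F_CRIT = 3.35    # F_{0.05}(2, 27)

reject_t = abs(t_stat) > T_CRIT
reject_f = f_stat > F_CRIT

print(f"t = {t_stat:.3f}, reject H0: {reject_t}")
print(f"F = {f_stat:.3f}, reject H0: {reject_f}")
```

Writing $F$ in terms of $R^2$ works because $SSR = R^2 \cdot SST$ and $SSE = (1 - R^2) \cdot SST$, so $SST$ cancels in the ratio.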

Exercise 5 $R^2$ and Model Fit.

Because the coefficient of determination $R^2$ always increases when a new independent variable is added to a model, it is tempting to include many variables in the model in order to increase $R^2$. However, doing so reduces the number of degrees of freedom available for estimating the error variance, which adversely affects our ability to make reliable inferences. Suppose you want to use 18 psychological and sociological factors to predict a student's Scholastic Assessment Test (SAT) score. You fit the model

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{18} x_{18} + \varepsilon
$$

where $y$ = SAT score and $x_1, x_2, \ldots, x_{18}$ are the psychological and sociological factors. Only 20 observations ($n = 20$) are used to fit the model, and you obtain $R^2 = 0.95$.

1. Test to see whether this impressive-looking $R^2$ is large enough for you to infer that the model is useful, that is, that at least one term in the model is important in predicting SAT scores. Use $\alpha = 0.05$.

2. Calculate the adjusted $R^2$ of this model.

Answer:

(1) $H_0: \beta_1 = \beta_2 = \cdots = \beta_{18} = 0$; $H_1$: at least one of the parameters is not equal to zero.

$F = \dfrac{R^2/18}{(1-R^2)/(20-19)} = \dfrac{0.95/18}{(1-0.95)/1} = 1.056$

$F_{0.05}(18,1) = 247.323 > F$

Therefore, we fail to reject the null hypothesis.

(2) $R_a^2 = 1 - (1-R^2)\,\dfrac{20-1}{20-19} = 1 - (1-0.95)\cdot 19 = 0.05$
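The effect of fitting 18 regressors to 20 observations is easy to see numerically. The sketch below wraps the two formulas used in this answer in small helper functions; the names `f_from_r2` and `adjusted_r2` are hypothetical and introduced only for illustration.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for k regressors and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """Adjusted R^2, penalizing for the k regressors used."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, k, r2 = 20, 18, 0.95

f_stat = f_from_r2(r2, n, k)
r2_adj = adjusted_r2(r2, n, k)

# Tabulated critical value quoted in the answer (not computed here).
F_CRIT = 247.323   # F_{0.05}(18, 1)

print(f"F = {f_stat:.3f}, reject H0: {f_stat > F_CRIT}")
print(f"adjusted R^2 = {r2_adj:.3f}")
```

With only one error degree of freedom, the critical value is so large that even $R^2 = 0.95$ is not evidence of a useful model, and the adjusted $R^2$ drops to about 0.05.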
