Universidad Carlos III de Madrid - UC3M - … Carlos III de Madrid Econometrics Simple Linear...


Universidad Carlos III de Madrid
Econometrics
Simple Linear Regression II

Problem Set 3

1. Consider the regression model $Y_i = \beta X_i + U_i$, where $U_i$ and $X_i$ satisfy the classical assumptions. Let $\bar{\beta} = \bar{Y}/\bar{X}$ be an estimator of $\beta$, where $\bar{Y}$ and $\bar{X}$ are the sample means of $Y$ and $X$.

(a) Show that $\bar{\beta}$ is a linear estimator (a linear combination of $\{Y_i\}_{i=1}^{n}$).

(b) Show that $\bar{\beta}$ is conditionally unbiased.

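2. Let $X_i$ be a binary variable and consider the regression $Y_i = \beta_0 + \beta_1 X_i + U_i$. Let $\bar{Y}_0$ be the sample mean of the observations with $X_i = 0$ and $\bar{Y}_1$ the sample mean of the observations with $X_i = 1$. Prove that the OLS estimators satisfy $\hat{\beta}_0 = \bar{Y}_0$, $\hat{\beta}_0 + \hat{\beta}_1 = \bar{Y}_1$ and $\hat{\beta}_1 = \bar{Y}_1 - \bar{Y}_0$.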
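A small numerical check (not a proof) can make the claim concrete. The sketch below fits OLS with the usual textbook formulas on a made-up five-observation sample and compares the coefficients with the two group means. The data and the helper name `ols_simple` are illustrative assumptions, not part of the problem set.

```python
# Numerical illustration only: OLS on a binary regressor reproduces group means.
# The sample below is invented for the example.

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS fit of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [0, 0, 0, 1, 1]                 # binary regressor
y = [2.0, 4.0, 3.0, 10.0, 8.0]      # outcome

b0, b1 = ols_simple(x, y)

# Group means computed directly from the data.
y0_bar = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
y1_bar = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

print("intercept vs. mean of Y when X = 0:", b0, y0_bar)
print("intercept + slope vs. mean of Y when X = 1:", b0 + b1, y1_bar)
print("slope vs. difference in means:", b1, y1_bar - y0_bar)
```

The three printed pairs should agree up to floating-point error, which is what the algebraic proof establishes in general.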

3. Consider two attributes $Y$ and $X$ of a population, where $Y$ is earnings and $X$ is years of schooling. We have two independent samples, one drawn from the subpopulation of men, $\{Y_{mi}, X_{mi}\}_{i=1}^{n_m}$, and one from the subpopulation of women, $\{Y_{wi}, X_{wi}\}_{i=1}^{n_w}$. The regression model for men is $Y_{mi} = \beta_{m0} + \beta_{m1} X_{mi} + U_{mi}$ and for women is $Y_{wi} = \beta_{w0} + \beta_{w1} X_{wi} + U_{wi}$. Let $\hat{\beta}_{m1}$ and $\hat{\beta}_{w1}$ be the OLS estimators of $\beta_{m1}$ and $\beta_{w1}$ obtained from the respective samples, and let $SE(\hat{\beta}_{m1})$ and $SE(\hat{\beta}_{w1})$ be the corresponding standard errors. Show that the standard error of $\hat{\beta}_{m1} - \hat{\beta}_{w1}$ is given by

$$SE(\hat{\beta}_{m1} - \hat{\beta}_{w1}) = \sqrt{SE(\hat{\beta}_{m1})^2 + SE(\hat{\beta}_{w1})^2}.$$
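One possible sketch of the key step, assuming (as the problem states) that the two samples are independent, so the covariance between the two slope estimators is zero:

```latex
\begin{aligned}
\operatorname{Var}(\hat{\beta}_{m1} - \hat{\beta}_{w1})
  &= \operatorname{Var}(\hat{\beta}_{m1}) + \operatorname{Var}(\hat{\beta}_{w1})
     - 2\operatorname{Cov}(\hat{\beta}_{m1}, \hat{\beta}_{w1}) \\
  &= \operatorname{Var}(\hat{\beta}_{m1}) + \operatorname{Var}(\hat{\beta}_{w1}).
\end{aligned}
```

Replacing each variance by the square of the corresponding standard error and taking the square root gives the stated expression.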

4. Suppose that a researcher, with data on class size (TC) and the average test score (CalificacionExamen) for 100 third-grade classes, estimates the following OLS regression (standard errors in parentheses):

$$\widehat{CalificacionExamen} = \underset{(20.4)}{520.4} - \underset{(2.21)}{5.82} \times TC, \quad R^2 = 0.08, \quad SER = 1.5$$

(a) Build a 95% confidence interval for β1, the regression slope coefficient.

(b) Obtain the p-value for the two-sided test of the null hypothesis $H_0: \beta_1 = 0$. Would you reject the null hypothesis at the 5% level? And at the 1% level?

(c) Obtain the p-value for the two-sided test of the null hypothesis $H_0: \beta_1 = -5.6$. Without doing further calculations, determine whether $-5.6$ is contained in a 95% confidence interval for $\beta_1$.

(d) Build a 99% confidence interval for β0.

5. Suppose that a researcher, with wage data for 250 male workers and 280 female workers, randomly selected, fits the OLS regression (standard errors in parentheses):

$$\widehat{Wage} = \underset{(0.23)}{12.52} + \underset{(0.36)}{2.12} \times Male, \quad R^2 = 0.06, \quad SER = 4.2$$

where Wage is the hourly wage in dollars and Male is a binary variable equal to 1 if the worker is a man and 0 if the worker is a woman. Define the gender wage gap as the difference in average wages between men and women.

(a) What is the estimated gender wage gap?

(b) Is the estimated gender wage gap statistically significantly different from zero? (Obtain the p-value for the test of the null hypothesis that there is no gender gap.)

(c) Build a 95% confidence interval for the gender gap.

(d) In the sample, what is the average wage of women? And of men?

(e) Another researcher uses the same data but regresses the variable Wage on the variable Female, which is equal to 1 if the worker is a woman and 0 if the worker is a man. What are the estimates for this regression?

$$\widehat{Wage} = \_\_\_ + \_\_\_ \times Female, \quad R^2 = \_\_\_, \quad SER = \_\_\_.$$


6. With a sample of 2989 full-time workers, between 29 and 30 years old, with between 6 and 18 years of education, in the US in 2008, the following linear regression was fitted (standard errors in parentheses):

$$\widehat{Income} = \underset{(1.05)}{-5.38} + \underset{(0.8)}{1.76} \times YearsEducation, \quad R^2 = 0.159, \quad SER = 9.5$$

where Income is the average hourly wage and YearsEducation is the number of years of completed education.

(a) A randomly chosen 30-year-old worker has 16 years of education. What is the expected average hourly wage for this worker?

(b) A high school graduate (12 years of education) is considering attending college to graduate (2 years). By how much is this worker's average hourly wage expected to increase?

(c) A high school counselor tells a student that, on average, university graduates earn $10 more per hour than high school graduates. Is this statement consistent with the evidence provided by the regression? Which range of values is consistent with the regression evidence?

7. Suppose that $(X_i, Y_i)$ satisfy the classical OLS assumptions (zero conditional mean of the error, i.i.d. random sampling and no outliers). A random sample of size $n = 250$ is drawn, which gives the following results (standard errors in parentheses):

$$\hat{Y} = \underset{(3.1)}{5.4} - \underset{(1.5)}{3.2}\, X, \quad R^2 = 0.26, \quad SER = 6.2$$

(a) Test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ at the 5% level.

(b) Build a 95% confidence interval for $\beta_1$.

(c) Suppose that $Y_i$ and $X_i$ are independent. Would these results surprise you? Explain.

(d) Suppose that $Y_i$ and $X_i$ are independent and that many random samples of size $n = 250$ are drawn, the regression is estimated on each, and (a) and (b) are answered. In what proportion of the samples would $H_0$ from (a) be rejected? In what proportion of the samples would the confidence interval from (b) contain the value $\beta_1 = 0$?

8. Using the CollegeDistance dataset from the Stock and Watson (2012) webpage

http://wps.aw.com/aw_stock_ie_3/178/45691/11696965.cw/index.html

which contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986, fit a linear regression of years of education (ED) on the distance to the closest university (Dist) and answer the following questions.

(a) Is the estimated slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0: \beta_1 = 0$ against the two-sided alternative at the 10%, 5% and 1% levels? What is the p-value associated with the t-statistic of the coefficient?

(b) Build a 95% confidence interval for the slope coefficient.

(c) Fit the regression using only data for women and repeat (b).

(d) Fit the regression using only data for men and repeat (b).

(e) Is the effect of distance on completed years of education different for men and for women? [Hint: use Problem 3].

SOLUTIONS:

4. (a) $-5.82 \pm 1.96 \times 2.21$.

(b) p-value $= \Pr(|Z| > |t|) = 2\Pr(Z > |-2.63|) = 2(1 - 0.9957) = 0.0086$, where $t = -5.82/2.21 = -2.63$ and $Z$ is a standard normal. Since $0.0086 < 0.01$, $H_0$ is rejected at both the 5% and the 1% level.

(c) p-value $= \Pr(|Z| > |t|) = 2\Pr(Z > |-0.1|) = 2(1 - 0.5398) = 0.9204$, where $t = (-5.82 - (-5.6))/2.21 = -0.1$ and $Z$ is a standard normal. As the p-value is larger than 5%, the hypothesis $H_0$ cannot be rejected at the 5% level, and the 95% confidence interval will contain the value $-5.6$.
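The p-value arithmetic above can be reproduced without a normal table. The following is a minimal sketch using only the Python standard library; it relies on the identity 2(1 - Φ(|t|)) = erfc(|t|/√2) for a standard normal, and the function name `two_sided_p_value` is an illustrative choice. Because it does not round the t-statistic before the lookup, its output differs slightly from the table-based figures.

```python
import math

def two_sided_p_value(estimate, null_value, std_error):
    """Return (t, p) for a two-sided test using the standard normal distribution."""
    t = (estimate - null_value) / std_error
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)) for a standard normal Z.
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

# Problem 4: slope estimate -5.82 with standard error 2.21.
t_b, p_b = two_sided_p_value(-5.82, 0.0, 2.21)    # H0: beta_1 = 0
t_c, p_c = two_sided_p_value(-5.82, -5.6, 2.21)   # H0: beta_1 = -5.6

print(f"(b) t = {t_b:.2f}, p-value = {p_b:.4f}")
print(f"(c) t = {t_c:.2f}, p-value = {p_c:.4f}")
```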


5. (a) 2.12.

(b) Yes. For $H_0: \beta_1 = 0$, $t = 2.12/0.36 = 5.889$ and p-value $= \Pr(|Z| > |t|) = 2\Pr(Z > 5.889) \approx 0$.

(c) $2.12 \pm 1.96 \times 0.36$.

(d) Women: 12.52. Men: 14.64.

(e) $\widehat{Wage} = 14.64 - 2.12 \times Female, \quad R^2 = 0.06, \quad SER = 4.2$
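The answers to Problem 5 follow from simple arithmetic on the reported coefficients, which the sketch below spells out. The helper `recode_dummy` is a hypothetical name. It encodes the fact that replacing a dummy D by 1 - D turns the intercept into intercept + slope and flips the sign of the slope, while the fitted values (and hence the fit statistics) are unchanged.

```python
def recode_dummy(intercept, slope):
    """Coefficients after replacing a binary regressor D with 1 - D."""
    return intercept + slope, -slope

# Reported estimates from the Wage-on-Male regression.
b0, b1, se_b1 = 12.52, 2.12, 0.36

women_mean = b0          # fitted value when Male = 0
men_mean = b0 + b1       # fitted value when Male = 1
t_stat = b1 / se_b1      # test of "no gender gap"
ci_95 = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)

female_intercept, female_slope = recode_dummy(b0, b1)

print(f"women: {women_mean:.2f}, men: {men_mean:.2f}")
print(f"t = {t_stat:.3f}, 95% CI = ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
print(f"Wage on Female: intercept {female_intercept:.2f}, slope {female_slope:.2f}")
```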

6. (c) No.

7. (c) Yes!

(d) 5% and 95%.
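The answer to 7(d) can be illustrated with a small Monte Carlo experiment. This is only a sketch under stated assumptions: X and Y are drawn as independent standard normals, the standard error is the homoskedasticity-only formula, the critical value is 1.96, and the number of replications (1000) and the seed are arbitrary choices. None of these details come from the problem set.

```python
import math
import random

def slope_t_stat(x, y):
    """t-statistic for H0: slope = 0 in an OLS fit of y on x (homoskedastic SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / (n - 2) / s_xx)
    return b1 / se_b1

def rejection_rate(n_samples=1000, n=250, seed=0):
    """Share of samples in which |t| > 1.96 when X and Y are independent."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / n_samples

rate = rejection_rate()
print("share of samples rejecting H0:", rate)
print("share of 95% intervals containing 0:", 1.0 - rate)
```

The 95% interval contains zero exactly when the test fails to reject, so the two shares sum to one. With enough replications they should be close to 5% and 95%.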

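8. (a) $\widehat{ED} = \underset{(0.04)}{13.96} - \underset{(0.013)}{0.073} \times Dist$

The t-statistic is reported as $-5.46$ (the ratio of the rounded values shown, $-0.073/0.013$, is about $-5.6$; the reported figure presumably uses unrounded estimates). It has a p-value of 0.000, so the null hypothesis can be rejected at the 1% level (and thus also at the 10% and 5% levels).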

(b) The 95% confidence interval is −0.073± 1.96× 0.013 or -0.100 to -0.047.

(c) $\widehat{ED} = \underset{(0.05)}{13.94} - \underset{(0.018)}{0.064} \times Dist$

The 95% confidence interval is $-0.064 \pm 1.96 \times 0.018$, or $-0.100$ to $-0.028$.
