Chapter 4: Multiple Regression Analysis – Inference
Econometrics
Michal Houda
University of South Bohemia in České Budějovice
Department of Applied Mathematics and Informatics


Page 1: Chapter 4: Multiple Regression Analysis Inference - Econometricshouda/econometrics/lectures/04-inference.pdf · 2014-11-11 · TestingHypothesesaboutasinglepopulationparameter One-SidedandTwo-Sidedt-Test


Sampling Distributions of the OLS Estimators
Classical Linear Model (CLM) Assumptions

Assumption MLR.6 (Normality)
$\varepsilon \sim N(0, \sigma^2)$ and $\varepsilon$ is independent of $x_1, \dots, x_k$.

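Justifying normality

- normality of $\varepsilon$ is motivated by the central limit theorem (and departures from normality are not a serious problem with large sample sizes)
- under MLR.1–6, the OLS estimators are the best (minimum variance) unbiased estimators, not only the best linear ones
- the population assumptions can be summarized as

$y \mid x \sim N(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \sigma^2)$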

Theorem 1 (Normal Sampling Distribution)

Under MLR.1–6, $\hat\beta_j \sim N(\beta_j, \operatorname{var}\hat\beta_j)$, that is,

$\dfrac{\hat\beta_j - \beta_j}{\operatorname{sd}\hat\beta_j} \sim N(0, 1).$
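A small Monte Carlo experiment can make Theorem 1 tangible. The sketch below is only an illustration: the sample size, the error standard deviation and the coefficients are made-up assumptions, and a simple regression with fixed regressor values is used so that the exact standard deviation of the slope estimator, $\sigma / \sqrt{\sum_i (x_i - \bar x)^2}$, is available in closed form.

```r
# Monte Carlo sketch: sampling distribution of the OLS slope under normal errors.
# All numbers below (n, sigma, beta0, beta1) are illustrative assumptions.
set.seed(1)
n     <- 30
sigma <- 2
beta0 <- 1
beta1 <- 0.5
x     <- runif(n, 0, 10)                          # regressor values, held fixed

# exact standard deviation of the slope estimator given x
sd_b1 <- sigma / sqrt(sum((x - mean(x))^2))

z <- replicate(2000, {
  y  <- beta0 + beta1 * x + rnorm(n, sd = sigma)  # MLR.6: normal errors
  b1 <- unname(coef(lm(y ~ x))[2])                # OLS slope estimate
  (b1 - beta1) / sd_b1                            # standardized estimate
})

c(mean = mean(z), sd = sd(z))                     # should be close to 0 and 1
```

A histogram of `z` (for example `hist(z, freq = FALSE)` with `curve(dnorm(x), add = TRUE)`) should then look approximately standard normal.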


Testing Hypotheses about a Single Population Parameter
One-Sided and Two-Sided t-Test

Corollary 2 (t Distribution for the Standardized Estimators)
Under MLR.1–6,

$\dfrac{\hat\beta_j - \beta_j}{\operatorname{se}\hat\beta_j} \sim t_{n-k-1}.$

Statistical packages usually report the t-ratio $t_{\hat\beta_j} := \hat\beta_j / \operatorname{se}\hat\beta_j$ automatically, so tests of

$H_0: \beta_j = 0$ against $H_A: \beta_j \neq 0$

(two-sided tests) are straightforward.

One-sided alternatives should also be considered in econometrics.
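As a hedged illustration of where the reported t-ratio comes from, the sketch below recomputes it by hand and compares it with the output of `summary()`. The built-in `mtcars` data and the chosen regressors are stand-ins (an assumption); they are not one of the data sets used in the lecture.

```r
# mtcars is a stand-in data set shipped with R (an assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)
tab <- summary(fit)$coefficients     # columns: Estimate, Std. Error, t value, Pr(>|t|)

t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]  # t-ratio for H0: beta_j = 0
df       <- df.residual(fit)                         # n - k - 1
p_manual <- 2 * pt(-abs(t_manual), df)               # two-sided p-value

cbind(t_manual, t_reported = tab[, "t value"],
      p_manual, p_reported = tab[, "Pr(>|t|)"])
```

The manual and reported columns agree, which confirms that the default output tests $H_0: \beta_j = 0$ against the two-sided alternative.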


Testing Hypotheses about a Single Population Parameter
Right-Tailed t-Test

One-sided alternative (right-tailed test):

$H_0: \beta_j = 0$ against $H_A: \beta_j > 0$

- significance level $\alpha$: the probability of rejecting $H_0$ when it is true (the most popular choice: $\alpha := 0.05 = 5\%$)
- critical value: the $(1 - \alpha)$-quantile (percentile) of the appropriate distribution ($t_{n-k-1}$ here)
- rejection rule: $t_{\hat\beta_j} > t_{n-k-1}(1 - \alpha)$

As the degrees of freedom $(n - k - 1)$ get larger, the t distribution approaches the standard normal distribution $N(0, 1)$.
Compare: $t_{120}(0.95) = 1.658$ with $u(0.95) = 1.645$.

R

# standard normal density (red) against t densities with 5 (blue) and 120 (green) degrees of freedom
curve(dnorm(x), xlim=c(-5,5), col="red", lwd=4)
curve(dt(x, df=5), col="blue", lwd=2, add=TRUE)
curve(dt(x, df=120), col="green", lwd=1, add=TRUE)
# list the color names available in R
colors()
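The critical values quoted above can be reproduced with the quantile functions `qt()` and `qnorm()`. The following is a minimal sketch; the particular degrees of freedom in `dfs` are chosen only for illustration.

```r
alpha <- 0.05
dfs   <- c(5, 30, 120, 522)                # illustrative degrees of freedom

# right-tailed critical values t_df(1 - alpha) for each df
crit <- data.frame(df = dfs, t_crit = qt(1 - alpha, df = dfs))
crit

# limiting standard normal quantile u(1 - alpha)
u_crit <- qnorm(1 - alpha)
u_crit
```

The t critical values decrease towards the normal quantile as the degrees of freedom grow, which is the numerical counterpart of the density plot.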


Testing Hypotheses about a Single Population Parameter
Right-Tailed t-Test

Example 3 (Hourly Wage Equation)
Data: WAGE1

$\widehat{\ln(wage)} = 0.284 + 0.092\,educ + 0.0041\,exper + 0.022\,tenure$

$H_0: \beta_{exper} = 0$ against $H_A: \beta_{exper} > 0$
$t_{exper} \approx 0.0041/0.0017 \approx 2.39 > t_{522}(0.95) = 1.648$ (or $u(0.95) = 1.645$); 2.39 is the t-ratio reported by the software, while the rounded figures alone give about 2.41

p-value $= \Pr\{T > 2.39\} = \tfrac{1}{2} \cdot 0.0171 = 0.0085$, where $T \sim t_{522}$

- $H_0$ is rejected at $\alpha = 5\%$ (even at 1%): the effect of experience on wages is statistically significant
- but the estimated return to experience is not large: for example, 3 additional years of experience raise wages by only about $3 \times 0.0041 = 0.0123$, i.e. 1.23%
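The numbers in Example 3 can be checked without the data, using only the reported t-ratio and degrees of freedom. This is a sketch that takes the values 2.39, 522 and 0.0041 from the example as given; because the t-ratio is rounded, the p-value may differ slightly in the last digit.

```r
t_exper <- 2.39          # t-ratio reported in Example 3
df      <- 522           # n - k - 1 reported in Example 3

crit <- qt(0.95, df)                          # 5% right-tailed critical value
p    <- pt(t_exper, df, lower.tail = FALSE)   # one-sided p-value Pr{T > t}

reject_5pct <- t_exper > crit                 # right-tailed rejection rule
c(critical = crit, p_value = p)
reject_5pct

# approximate percentage change in wage for 3 more years of experience
effect_3y <- 100 * 3 * 0.0041
effect_3y
```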


Testing Hypotheses about a Single Population Parameter
Left-Tailed t-Test

One-sided alternative (left-tailed test):

$H_0: \beta_j = 0$ against $H_A: \beta_j < 0$

- critical value: the $\alpha$-quantile (percentile) of the appropriate distribution
- rejection rule: $t_{\hat\beta_j} < t_{n-k-1}(\alpha)$


Testing Hypotheses about a Single Population Parameter
Left-Tailed t-Test

Example 4 (Student Performance and School Size)
Data: MEAP93

$\widehat{math10} = 2.274 + 0.00046\,totcomp + 0.048\,staff - 0.00020\,enroll$

- math10: percentage of students passing the Michigan Educational Assessment Program (MEAP) standardized tenth-grade math test
- totcomp: average annual teacher compensation (measure of teacher quality)
- staff: number of staff per 1000 students (measure of attention received)
- enroll: student enrollment (measure of school size)

$H_0: \beta_{enroll} = 0$ against $H_A: \beta_{enroll} < 0$
$t_{enroll} \approx -0.918 > t_{404}(0.05) = -1.649$

p-value $= \Pr\{T < -0.918\} = \tfrac{1}{2} \cdot 0.36 = 0.18$, where $T \sim t_{404}$

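$H_0$ is not rejected at 5% (not even at 15%); changing the model to one with logarithms of the regressors:

$math10 = \beta_0' + \beta_1' \ln(totcomp) + \beta_2' \ln(staff) + \beta_3' \ln(enroll) + \varepsilon$
$t_{\ln(enroll)} \approx -1.829 < t_{404}(0.05) = -1.649$ ⇒ $H_0'$ is rejected at 5%!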
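The left-tailed decisions in Example 4 can be retraced from the reported t-ratios alone. The sketch below takes the values −0.918, −1.829 and 404 degrees of freedom from the example; the variable names are hypothetical.

```r
df      <- 404
t_level <- -0.918        # t-ratio of enroll in the level model
t_log   <- -1.829        # t-ratio of ln(enroll) in the log model

crit <- qt(0.05, df)     # 5% left-tailed critical value (a negative number)

# left-tailed p-values Pr{T < t}
p_level <- pt(t_level, df)
p_log   <- pt(t_log, df)

data.frame(model  = c("level", "log"),
           t      = c(t_level, t_log),
           p      = c(p_level, p_log),
           reject = c(t_level, t_log) < crit)
```

Only the second t-ratio falls below the critical value, so only the log specification rejects the null at the 5% level.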


Testing Hypotheses about a Single Population Parameter
Two-Sided t-Test

$H_0: \beta_j = 0$ against $H_A: \beta_j \neq 0$

- critical value: the $(1 - \frac{\alpha}{2})$-quantile (percentile) of the appropriate distribution
- rejection rule: $|t_{\hat\beta_j}| > t_{n-k-1}(1 - \frac{\alpha}{2})$

Example 5 (Determinants of College GPA)
Data: GPA1

$\widehat{colGPA} = 1.39 + 0.412\,hsGPA + 0.015\,ACT - 0.083\,skipped$

skipped: average number of lectures missed per week

1. $H_0: \beta_{hsGPA} = 0$ against $H_A: \beta_{hsGPA} \neq 0$: $t_{hsGPA} = 4.396$, p-value $\approx 10^{-5}$ ⇒ $H_0$ rejected at any conventional level;
2. $H_0: \beta_{ACT} = 0$ against $H_A: \beta_{ACT} \neq 0$: $t_{ACT} = 1.393$, $t_{137}(0.95) = 1.656$, p-value $= 0.166$ ⇒ $H_0$ not rejected at 10%; the effect is also small in practice;
3. $H_0: \beta_{skipped} = 0$ against $H_A: \beta_{skipped} \neq 0$: $t_{skipped} = -3.197$, $t_{137}(0.995) = 2.612$, p-value $= 0.0017$ ⇒ $H_0$ rejected at 1%, but the effect is small in practice!
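The three kinds of p-value used so far (right-tailed, left-tailed, two-sided) differ only in which tail of the $t_{n-k-1}$ distribution is measured. A minimal sketch follows; `t_pvalue` is a hypothetical helper written for this illustration, not a function from R or from the lecture, and it is applied to the t-ratios reported in Example 5.

```r
# Hypothetical helper: p-value of a t-ratio for the three alternatives
t_pvalue <- function(t, df, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pt(-abs(t), df),            # both tails
         greater   = pt(t, df, lower.tail = FALSE),  # right tail
         less      = pt(t, df))                      # left tail
}

# t-ratios reported in Example 5 (df = 137)
t_stats <- c(hsGPA = 4.396, ACT = 1.393, skipped = -3.197)
p_vals  <- t_pvalue(t_stats, df = 137)

round(p_vals, 4)
abs(t_stats) > qt(0.995, 137)    # two-sided rejection rule at the 1% level
```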


Testing Hypotheses about a Single Population Parameter
Tests against Other Alternatives

$H_0: \beta_j = a_j$ against $H_A: \beta_j \neq a_j$ (or $\beta_j > a_j$, or $\beta_j < a_j$)

Example 6 (Campus Crime and Student Enrollment)
Data: CAMPUS (FBI's Uniform Crime Report for 1992, n = 97)

$\ln(crime) = \beta_0 + \beta_1 \ln(enroll) + \varepsilon$

$\widehat{\ln(crime)} = -6.63 + 1.27 \ln(enroll)$
$H_0: \beta_1 = 1$ against $H_A: \beta_1 > 1$

(crime is more of a problem on larger campuses)

$t_{\ln(enroll)} \approx (1.27 - 1)/0.11 \approx 2.46 > t_{95}(0.95) = 1.66$
p-value $= \Pr\{T > 2.46\} \approx 0.0079$, where $T \sim t_{95}$

- $H_0$ is rejected (at 1%)
- warning: this analysis holds no other factor fixed, so the elasticity 1.27 is not necessarily a good estimate of the ceteris paribus effect
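When the hypothesized value is not zero, the t-ratio printed by the software cannot be used directly; the hypothesized value has to be subtracted first. The sketch below redoes Example 6 from the rounded estimate 1.27 and standard error 0.11, so its t-ratio is slightly below the 2.46 quoted above.

```r
b_hat <- 1.27     # estimated elasticity (Example 6)
se_b  <- 0.11     # its standard error (Example 6)
a     <- 1        # hypothesized value under H0
df    <- 95       # n - k - 1 = 97 - 1 - 1

t_stat <- (b_hat - a) / se_b                   # t-ratio for H0: beta_1 = a
p      <- pt(t_stat, df, lower.tail = FALSE)   # right-tailed p-value

c(t = t_stat, critical_5pct = qt(0.95, df), p_value = p)
```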


Testing Hypotheses about a Single Population Parameter
Economic (Practical) vs. Statistical Significance

Example 7 (Participation Rates in 401(k) Plans)

$\widehat{prate} = 80.29 + 5.44\,mrate + 0.269\,age - 0.00013\,totemp$

totemp: total number of employees (firm size)

$H_0: \beta_{totemp} = 0$ against $H_A: \beta_{totemp} \neq 0$
$t_{totemp} \approx -3.25$, p-value $\approx 0.001$

- $H_0$ is rejected (even at about the 0.1% level) ⇒ $\beta_{totemp}$ is statistically significant
- but: holding mrate and age fixed, 10,000 additional employees imply only a $10{,}000 \times 0.00013 = 1.3$ percentage point decrease in the participation rate, which is not very large in practice (not economically significant)


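Confidence Intervals

- 95% confidence interval for $\beta_j$: $\left(\hat\beta_j \pm t_{n-k-1}(0.975) \cdot \operatorname{se}\hat\beta_j\right)$
- interpretation: the unknown $\beta_j$ lies in the (known) interval $(\underline{\beta}_j; \overline{\beta}_j)$ with 95% probability, that is, for 95% of samples
- we can only hope that our sample is one of these 95% of samples
- connection with $H_0: \beta_j = a_j$ against $H_A: \beta_j \neq a_j$: $H_0$ is rejected at (say) the 5% level ⇔ $a_j \notin (\underline{\beta}_j; \overline{\beta}_j)$

R

confint(model)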
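To show what `confint()` computes, the sketch below builds the 95% intervals by hand and checks the duality with the two-sided test. The built-in `mtcars` data and the regressors `wt` and `hp` are stand-ins chosen for illustration (an assumption), not the lecture's data.

```r
fit   <- lm(mpg ~ wt + hp, data = mtcars)    # stand-in model
est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))               # standard errors of the estimates
tcrit <- qt(0.975, df.residual(fit))         # t_{n-k-1}(0.975)

ci_manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
ci_manual
confint(fit)                                 # same numbers, default level 0.95

# duality with the two-sided test of H0: beta_wt = a at the 5% level
a       <- 0
reject  <- unname(abs((est["wt"] - a) / se["wt"]) > tcrit)
outside <- unname(a < ci_manual["wt", "lower"] || a > ci_manual["wt", "upper"])
c(reject = reject, outside = outside)        # the two always agree
```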


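Testing a Single Linear Combination of Parameters

Example 8 (Return to Education)

$\ln(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + \varepsilon$

- jc: number of years attending a two-year college
- univ: number of years attending a four-year college
- exper: number of months in the workforce

$H_0: \beta_1 = \beta_2$ against $H_A: \beta_1 < \beta_2$

- the individual t statistics cannot simply be used, as $\operatorname{se}(\hat\beta_1 - \hat\beta_2) \neq \operatorname{se}\hat\beta_1 - \operatorname{se}\hat\beta_2$
- the standard error is estimated by $\operatorname{se}(\hat\beta_1 - \hat\beta_2) = \sqrt{(\operatorname{se}\hat\beta_1)^2 + (\operatorname{se}\hat\beta_2)^2 - 2\operatorname{cov}(\hat\beta_1, \hat\beta_2)}$ (sometimes reported by the software)
- easier technique: define $\theta_1 := \beta_1 - \beta_2$ and $totcoll := jc + univ$, and rewrite the model as

$\ln(wage) = \beta_0 + \theta_1 jc + \beta_2 totcoll + \beta_3 exper + \varepsilon$

$H_0: \theta_1 = 0$ against $H_A: \theta_1 < 0$
$t_{\theta_1} \approx -1.48$, p-value $\approx 0.070$ (data: TWOYEAR)

⇒ $H_0$ is not rejected at 5% (it is rejected at 10%): there is some, but not strong, evidence that a year at a two-year college is worth less than a year at a four-year college.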
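The reparametrization works because $\beta_1 jc + \beta_2 univ = (\theta_1 + \beta_2) jc + \beta_2 univ = \theta_1 jc + \beta_2 (jc + univ)$. The sketch below checks on simulated data that the direct approach (using the estimated covariance) and the rewritten model give the same t-ratio. The data are artificial and all coefficients are assumptions made for illustration; this is not the TWOYEAR data set.

```r
set.seed(42)
n     <- 500
jc    <- rpois(n, 1)                    # simulated years at a two-year college
univ  <- rpois(n, 2)                    # simulated years at a four-year college
exper <- runif(n, 0, 200)               # simulated months in the workforce
lwage <- 1.5 + 0.06 * jc + 0.08 * univ + 0.005 * exper + rnorm(n, sd = 0.4)

# Approach 1: standard error of the difference from the covariance matrix
fit <- lm(lwage ~ jc + univ + exper)
b   <- coef(fit)
V   <- vcov(fit)
theta_hat <- b["jc"] - b["univ"]
se_theta  <- sqrt(V["jc", "jc"] + V["univ", "univ"] - 2 * V["jc", "univ"])
t_direct  <- unname(theta_hat / se_theta)

# Approach 2: rewritten model, theta_1 is the coefficient on jc
totcoll   <- jc + univ
fit2      <- lm(lwage ~ jc + totcoll + exper)
t_reparam <- summary(fit2)$coefficients["jc", "t value"]

p_left <- pt(t_reparam, df.residual(fit2))   # left-tailed p-value for HA: theta_1 < 0
c(t_direct = t_direct, t_reparam = t_reparam, p_left = p_left)
```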


Multiple Regression Analysis – Statistical Inference
Testing Multiple Linear Restrictions: F test

Example 9 (Baseball Players' Salaries)
$\ln(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + \varepsilon$

- salary: 1993 total salary
- years: number of years in the league
- gamesyr: average number of games played per year
- bavg: career batting average
- hrunsyr: number of home runs per year
- rbisyr: runs batted in per year

             Estimate  Std. Error t value Pr(>|t|)
(Intercept) 1.119e+01   2.888e-01  38.752  < 2e-16 ***
years       6.886e-02   1.211e-02   5.684 2.79e-08 ***
gamesyr     1.255e-02   2.647e-03   4.742 3.09e-06 ***
bavg        9.786e-04   1.104e-03   0.887    0.376
hrunsyr     1.443e-02   1.606e-02   0.899    0.369
rbisyr      1.077e-02   7.175e-03   1.500    0.134

$H_0: \beta_3 = \beta_4 = \beta_5 = 0$ against $H_A$: not $H_0$


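. . . multiple (joint, three) exclusion restrictions

- Data: MLB1 (n = 353)
- Restricted model: $\ln(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \varepsilon$
- Test statistic: the F-ratio

$F := \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q, n-k-1}$

where $SSR_r$ and $SSR_{ur}$ are the sums of squared residuals of the restricted and unrestricted models and $q$ is the number of restrictions (here $q = 3$).

In the example, $F \approx 9.55$, p-value $\approx 4 \cdot 10^{-6}$ ⇒ $H_0$ rejected.
Note again that all three t-statistics are insignificant!
(Reason: $\operatorname{corr}(hrunsyr, rbisyr) \approx 0.89$.)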
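The F-ratio can be computed directly from the two sums of squared residuals and compared with what `anova()` reports for nested models. This is a sketch under the assumption that the built-in `mtcars` data may stand in for MLB1; the regressors are chosen only so that three exclusion restrictions are tested.

```r
# Unrestricted and restricted models on a stand-in data set (assumption)
ur <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
r  <- lm(mpg ~ wt, data = mtcars)            # H0: coefficients of hp, qsec, drat are 0

ssr_ur <- deviance(ur)                       # sum of squared residuals, unrestricted
ssr_r  <- deviance(r)                        # sum of squared residuals, restricted
q      <- df.residual(r) - df.residual(ur)   # number of restrictions
df_ur  <- df.residual(ur)                    # n - k - 1

F_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
p_val  <- pf(F_stat, q, df_ur, lower.tail = FALSE)
c(F = F_stat, p_value = p_val)

anova(r, ur)                                 # reports the same F and p-value
```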

Overall significance test: $H_0: \beta_1 = \dots = \beta_k = 0$

- $H_0$ is often rejected, even if $R^2$ is small
- occasionally, the overall F is the focus of a study (e.g., to test whether some variable is predictable based on selected factors; cf. the efficient markets hypothesis)
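For the overall significance test the restricted model contains only the intercept, and the F-ratio can then be written in terms of $R^2$ as $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$. This form is a standard identity that is not stated on the slide, so treat it as an added assumption. The sketch below compares it with the F statistic printed by `summary()`, again on the stand-in `mtcars` data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)      # stand-in model (assumption)
s   <- summary(fit)

k  <- length(coef(fit)) - 1                  # number of slope parameters
df <- df.residual(fit)                       # n - k - 1

# overall F from R^2
F_r2  <- (s$r.squared / k) / ((1 - s$r.squared) / df)
p_val <- pf(F_r2, k, df, lower.tail = FALSE)

s$fstatistic                                 # value, numdf, dendf as reported by R
c(F_from_R2 = F_r2, p_value = p_val)
```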


Multiple Regression Analysis – Statistical Inference
Testing General Linear Restrictions: F test

Example 10

$\ln(price) = \beta_0 + \beta_1 \ln(assess) + \beta_2 \ln(lotsize) + \beta_3 \ln(sqrft) + \beta_4 bdrms + \varepsilon$

- price: house price
- assess: the assessed housing value (before the house was sold)
- lotsize: size of the lot (in square feet)
- sqrft: square footage
- bdrms: number of bedrooms

Data: HPRICE1, n = 88

              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.263743   0.569665   0.463    0.645
log(assess)   1.043065   0.151446   6.887 1.01e-09 ***
log(lotsize)  0.007438   0.038561   0.193    0.848
log(sqrft)   -0.103238   0.138430  -0.746    0.458
bdrms         0.033839   0.022098   1.531    0.129

Is the assessed housing price a rational valuation?

$H_0: \beta_1 = 1, \beta_2 = \beta_3 = \beta_4 = 0$


$F \approx 0.661$, p-value $\approx 0.62$ ⇒ we fail to reject $H_0$.
There is no evidence against rational valuation.
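With a restriction such as $\beta_1 = 1$, the restricted model is obtained by moving the restricted term to the left-hand side: $y - x_1 = \beta_0 + \varepsilon$. The sketch below illustrates this on simulated data in which $H_0$ holds by construction. The variable names and all numbers are assumptions, not the HPRICE1 data, and the Wald form at the end is added only as a cross-check; it is not taken from the slides.

```r
set.seed(7)
n  <- 88
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y  <- 0.2 + 1 * x1 + rnorm(n, sd = 0.3)      # H0 true: beta1 = 1, beta2 = beta3 = beta4 = 0

ur <- lm(y ~ x1 + x2 + x3 + x4)              # unrestricted model
r  <- lm(I(y - x1) ~ 1)                      # restricted model: y - x1 = beta0 + eps

q     <- 4                                   # number of restrictions
df_ur <- df.residual(ur)                     # n - k - 1 = 83
F_ssr <- ((deviance(r) - deviance(ur)) / q) / (deviance(ur) / df_ur)
p_val <- pf(F_ssr, q, df_ur, lower.tail = FALSE)

# Cross-check with the Wald form of the same statistic
R      <- cbind(0, diag(4))                  # picks beta1, ..., beta4
d      <- R %*% coef(ur) - c(1, 0, 0, 0)     # estimated minus hypothesized values
F_wald <- as.numeric(t(d) %*% solve(R %*% vcov(ur) %*% t(R)) %*% d) / q

c(F_ssr = F_ssr, F_wald = F_wald, p_value = p_val)
```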
