4.3 Confidence Intervals -Using our CLM assumptions, we can construct CONFIDENCE INTERVALS or...


Page 1: 4.3 Confidence Intervals -Using our CLM assumptions, we can construct CONFIDENCE INTERVALS or CONFIDENCE INTERVAL ESTIMATES of the form: -Given a significance.

4.3 Confidence Intervals

-Using our CLM assumptions, we can construct CONFIDENCE INTERVALS or CONFIDENCE INTERVAL ESTIMATES of the form:

$CI = \hat{\beta}_j \pm t^* \cdot se(\hat{\beta}_j)$

-Given a significance level α (which is used to determine t*), we construct 100(1- α)% confidence intervals

-Across repeated random samples, 100(1- α)% of the confidence intervals constructed this way contain the true value Bj

-we don’t know whether an individual confidence interval contains the true value
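The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: it uses the simplest possible regression (an intercept only, so the estimate is the sample mean), a made-up true mean of 5.0 with standard deviation 1.0, and the two-sided 1% critical value t*=2.704 for 40 degrees of freedom. The function names are illustrative.

```python
import math
import random


def mean_confidence_interval(sample, t_star):
    """Return (lower, upper) computed as estimate +/- t* times its standard error."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
    se = math.sqrt(variance / n)
    return mean - t_star * se, mean + t_star * se


def coverage_rate(true_mean, n, t_star, repetitions, seed=0):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        lower, upper = mean_confidence_interval(sample, t_star)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repetitions


# n = 41 leaves 40 degrees of freedom, matching t* = 2.704 at alpha = 0.01.
rate = coverage_rate(true_mean=5.0, n=41, t_star=2.704, repetitions=2000)
print(f"share of 99% intervals containing the true value: {rate:.3f}")
```

The printed share should sit close to 0.99, while any single interval either contains the true value or does not.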


4.3 Confidence Intervals

-Confidence intervals are similar to 2-tailed tests in that α/2 is in each tail when finding t*

-if our hypothesis test and confidence interval use the same α:

1) we cannot reject the null hypothesis (at the given significance level) that Bj=aj if aj is within the confidence interval

2) we can reject the null hypothesis (at the given significance level) that Bj=aj if aj is not within the confidence interval


4.3 Confidence Example

-Going back to our Pepsi example, we now look at geekiness:

$\widehat{Cool} = \underset{(2.1)}{4.3} + \underset{(0.25)}{0.3}\,Geek + \underset{(0.21)}{0.5}\,Pepsi$

$N = 43, \quad R^2 = 0.62$

-From before our 2-sided t* with α=0.01 was t*=2.704, therefore our 99% CI is:

$CI = \hat{\beta}_j \pm t^* \cdot se(\hat{\beta}_j)$

$CI = 0.3 \pm 2.704(0.25)$

$CI = [-0.376,\ 0.976]$
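The interval arithmetic, and the link between confidence intervals and two-sided tests, can be sketched in a few lines. The function names are illustrative assumptions; the inputs (estimate 0.3, standard error 0.25, t*=2.704) are the ones used in this example.

```python
def confidence_interval(beta_hat, se, t_star):
    """Return (lower, upper) for beta_hat +/- t_star * se."""
    margin = t_star * se
    return beta_hat - margin, beta_hat + margin


def rejects_null(a_j, interval):
    """Two-sided test of H0: Bj = a_j at the same alpha as the interval."""
    lower, upper = interval
    return not (lower <= a_j <= upper)


geek_ci = confidence_interval(beta_hat=0.3, se=0.25, t_star=2.704)
print(tuple(round(bound, 3) for bound in geek_ci))

# A hypothesized value inside the interval cannot be rejected;
# a value outside the interval can be rejected.
print(rejects_null(0.0, geek_ci))
print(rejects_null(1.0, geek_ci))
```

Because 0 lies inside the 99% interval, the hypothesis that geekiness has no effect cannot be rejected at α=0.01.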


4.3 Confidence Intervals

-Remember that a CI is only as good as the 6 CLM assumptions:

1) Omitted variables cause the estimates (Bjhats) to be unreliable

-CI is not valid

2) If heteroskedasticity is present, standard error is not a valid estimate of standard deviation

-CI is not valid

3) If normality fails, CI MAY not be valid if our sample size is too small


4.4 Complicated Single Tests

-In this section we will see how to test a single hypothesis involving more than one Bj

-Take again our coolness regression:

$\widehat{Cool} = \underset{(2.1)}{4.3} + \underset{(0.25)}{0.3}\,Geek + \underset{(0.21)}{0.5}\,Pepsi$

$N = 43, \quad R^2 = 0.62$

-If we wonder if geekiness has more impact on coolness than Pepsi consumption:

$H_0: \beta_1 = \beta_2$

$H_a: \beta_1 > \beta_2$


4.4 Complicated Single Tests

-This test is similar to our one coefficient tests, but our standard error will be different

-We can rewrite our hypotheses for clarity:

$H_0: \beta_1 - \beta_2 = 0$

$H_a: \beta_1 - \beta_2 > 0$

-We can reject the null hypothesis if the estimated difference between B1hat and B2hat is positive enough


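4.4 Complicated Single Tests

-Our new t statistic becomes:

$t = \dfrac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}$

-And our test continues as before:

1) Calculate t

2) Pick α and calculate t*

3) Reject H0 if t>t* (the alternative is one-sided: B1-B2>0)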


4.4 Complicated Standard Errors

-The standard error in this test is more complicated than before

-If we simply subtract standard errors, we may end up with a negative value

-this is theoretically impossible

-se must always be positive since it estimates standard deviations


4.4 Complicated Standard Errors

-Using the properties of variances, we know that:

$Var(\hat{\beta}_1 - \hat{\beta}_2) = Var(\hat{\beta}_1) + Var(\hat{\beta}_2) - 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)$

-Where the variances are always added and the covariance always subtracted

-transferring to standard deviation, this becomes:

$se(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{\{se(\hat{\beta}_1)\}^2 + \{se(\hat{\beta}_2)\}^2 - 2 s_{12}}$

-Where s12 is an estimate of the covariance between coefficients

-s12 can either be calculated using matrix algebra or be supplied by econometrics programs
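As a rough numerical sketch of this formula: the standard errors 0.25 and 0.21 and the estimates 0.3 and 0.5 are read from the coolness regression, while the covariance estimate s12=0.01 is a made-up value, since the slides do not report one. The function names are illustrative.

```python
import math


def se_of_difference(se1, se2, s12):
    """Standard error of (b1_hat - b2_hat): variances add, covariance is subtracted twice."""
    variance = se1 ** 2 + se2 ** 2 - 2 * s12
    if variance < 0:
        raise ValueError("inputs imply a negative variance; check s12")
    return math.sqrt(variance)


def t_statistic(b1_hat, b2_hat, se_diff):
    """t statistic for H0: B1 - B2 = 0."""
    return (b1_hat - b2_hat) / se_diff


se_geek, se_pepsi = 0.25, 0.21
s12 = 0.01  # hypothetical covariance estimate, not from the slides

se_diff = se_of_difference(se_geek, se_pepsi, s12)
t = t_statistic(0.3, 0.5, se_diff)
print(round(se_diff, 4), round(t, 4))

# Simply subtracting standard errors can give an impossible negative value.
naive = se_pepsi - se_geek
print(round(naive, 2))
```

With a different s12 the standard error changes, which is why the covariance has to come from matrix algebra or from the regression software.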


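4.4 Complicated Standard Errors

-To see how to find this standard error, take our typical regression:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

-and consider the related equation where θ=B1-B2 or B1= θ+B2:

$y = \beta_0 + (\theta + \beta_2) x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

$y = \beta_0 + \theta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u$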

-where x1 and x2 could be related concepts (ie: sleep time and naps) and x3 could be relatively unrelated (ie: study time)


4.4 Complicated Standard Errors

-By running this new regression, we can find the standard error for our hypothesis test

-using an econometric program is easier

-Empirically:

1) B0hat and se(B0hat) are the same for both regressions

2) B2hat and B3hat are the same for both regressions

3) Only the coefficient on x1 changes: it now estimates θ instead of B1, and its standard error is se(B1hat-B2hat)

-given this new standard error, CIs for θ are created as normal
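The substitution B1=θ+B2 only relabels the model, which can be checked numerically. In this sketch every coefficient and data value is hypothetical; it does not run a regression, it only confirms that the original and rewritten equations give the same systematic part of y for any inputs.

```python
def original_model(x1, x2, x3, b0, b1, b2, b3):
    """Systematic part of y = B0 + B1*x1 + B2*x2 + B3*x3 + u."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x3


def reparametrized_model(x1, x2, x3, b0, theta, b2, b3):
    """Same model written with theta = B1 - B2 as the coefficient on x1."""
    return b0 + theta * x1 + b2 * (x1 + x2) + b3 * x3


# Hypothetical coefficients and observations (x1, x2, x3).
b0, b1, b2, b3 = 1.0, 0.8, 0.5, -0.3
theta = b1 - b2
observations = [(7.5, 1.0, 4.0), (6.0, 0.5, 6.0), (8.0, 0.0, 2.5)]

gaps = [
    abs(original_model(x1, x2, x3, b0, b1, b2, b3)
        - reparametrized_model(x1, x2, x3, b0, theta, b2, b3))
    for x1, x2, x3 in observations
]
print(max(gaps))
```

Since both forms describe the same model, regressing y on x1, (x1+x2) and x3 reports the estimate of θ and its standard error directly.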


4.5 Testing Multiple Restrictions

-Thus far we have tested whether a SINGLE variable is significant, or how two different variables' impacts compare

-In this section we will test whether a SET of variables is jointly significant, that is, whether the set has a partial effect on the dependent variable

-Even though a group of variables may be individually insignificant, they may be significant as a group due to multicollinearity


4.5 Testing Multiple Restrictions

-Consider our general true model and an example measuring reading week utility (rwu):

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

$rwu = \beta_0 + \beta_1\,ski + \beta_2\,trips + \beta_3\,homework + u$

-we want to test the hypothesis that B1 and B2 equal zero at the same time, that x1 and x2 have no partial effect simultaneously:

$H_0: \beta_1 = 0,\ \beta_2 = 0$

-in our example, we are testing that positive activities have no effect on r.w. utility


4.5 Testing Multiple Restrictions

-our null hypothesis had two EXCLUSION RESTRICTIONS

-this set of MULTIPLE RESTRICTIONS is tested using a MULTIPLE HYPOTHESIS TEST or JOINT HYPOTHESIS TEST

-the alternate hypothesis is unique:

$H_a: H_0 \text{ is not true}$

-note that we CANNOT use individual t tests to test this multiple restriction; we need to test the restriction jointly


4.5 Testing Multiple Restrictions

-to test joint significance, we need to use SSR and R squared values obtained from two different regressions

-we know that SSR never decreases and R2 never increases when variables are dropped from the model

-in order to conduct our test, we need to regress two models:

1) An UNRESTRICTED MODEL with all of the variables

2) A RESTRICTED MODEL that excludes the variables in the test


4.5 Testing Multiple Restrictions

-Given a hypothesis test with q restrictions, we have the following regressions:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$ (4.34)

$H_0: \beta_{k-q+1} = 0, \dots, \beta_k = 0$ (4.35)

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{k-q} x_{k-q} + u$ (4.36)

-Where 4.34 is the UNRESTRICTED MODEL giving us SSRur and 4.36 is the RESTRICTED MODEL giving us SSRr


4.5 Testing Multiple Restrictions

-These SSR values combine to give us our F STATISTIC or TEST F STATISTIC:

$F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$ (4.37)

-Where q is the number of restrictions in the null hypothesis and q=numerator degrees of freedom

-n-k-1=denominator degrees of freedom (the denominator is the unbiased estimator of σ2)

-since SSRr≥SSRur, F is never negative


4.5 Testing Multiple Restrictions

-One can think of our test F stat as measuring the relative increase in SSR from moving from the unrestricted model to the restricted model

-a large F indicates that the excluded variables have much explanatory power

-using H0 and our CLM assumptions, we know that F has an F distribution with q, n-k-1 degrees of freedom:

$F \sim F_{q,\,n-k-1}$

-we obtain F* from F tables and reject H0 if:

$F > F^*$


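4.5 Multiple Example

-Given our previous example of reading week utility, an unrestricted and a restricted model give us:

Unrestricted:

$\widehat{rwu} = \underset{(4.3)}{15.9} + \underset{(0.9)}{2.0}\,ski + \underset{(1.3)}{3.0}\,trips + \underset{(0.12)}{0.5}\,homework$

$N = 572, \quad SSR = 141$

Restricted:

$\widehat{rwu} = \underset{(6.3)}{17.6} + \underset{(0.17)}{0.6}\,homework$

$N = 572, \quad SSR = 175$

-Which correspond to the hypotheses:

$H_0: \beta_1 = 0,\ \beta_2 = 0$

$H_a: H_0 \text{ is not true}$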


4.5 Multiple Example

-We use these SSR to construct a test statistic:

$F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$

$F = \dfrac{(175 - 141)/2}{141/(572 - 3 - 1)} \approx 68.5$

-given α=0.05, F*2,568≈3.00

-since F>F*, reject H0 at the 5% significance level; positive activities have an impact on reading week utility
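The same calculation can be written as a small function. This is a minimal sketch: the function name is illustrative, the SSR values, q, n and k are those of the reading week example, and the critical value 3.00 is the table value quoted above.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """SSR form of the F statistic for q exclusion restrictions.

    n is the sample size and k the number of regressors in the unrestricted model.
    """
    numerator = (ssr_r - ssr_ur) / q
    denominator = ssr_ur / (n - k - 1)
    return numerator / denominator


f_stat = f_statistic(ssr_r=175, ssr_ur=141, q=2, n=572, k=3)
f_star = 3.00  # 5% critical value taken from an F table

print(round(f_stat, 2))
print("reject H0" if f_stat > f_star else "fail to reject H0")
```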


4.5 Multiple Notes

-Once the degrees of freedom in F’s denominator reach about 120, the F distribution is no longer sensitive to it

-hence the infinity entry in the F table

-if H0 is rejected, the variables in question are JOINTLY (STATISTICALLY) SIGNIFICANT at the given alpha level

-if H0 is not rejected the variables in question are JOINTLY INSIGNIFICANT at the alpha level

-an F test can often reject H0 even when none of the individual t tests do, due to multicollinearity


4.5 F, t’s secret identity?

-the F statistic can also be used to test significance of a single variable

-in this case, q=1

-it can be shown that $F = t^2$ in this case

-or $t^2_{n-k-1} \sim F_{1,\,n-k-1}$

-this only applies to two-sided tests

-therefore t statistic is more flexible since it allows for one-sided tests

-the t statistic is always best suited for testing a single hypothesis
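The identity F=t² for a single restriction can be verified on a tiny simple regression. The data below are hypothetical and the closed-form OLS formulas for one regressor are used so that the sketch needs no external libraries.

```python
import math

# Hypothetical data for the simple regression y = B0 + B1*x + u.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates for the unrestricted model.
b1_hat = sxy / sxx
b0_hat = y_bar - b1_hat * x_bar

# Unrestricted SSR, and restricted SSR under H0: B1 = 0 (intercept only).
ssr_ur = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
ssr_r = sum((yi - y_bar) ** 2 for yi in y)

# t statistic for B1 and F statistic with q = 1, both using n - k - 1 = n - 2.
sigma2_hat = ssr_ur / (n - 2)
t_stat = b1_hat / math.sqrt(sigma2_hat / sxx)
f_stat = ((ssr_r - ssr_ur) / 1) / (ssr_ur / (n - 2))

print(t_stat ** 2, f_stat)
```

The two printed numbers agree up to floating-point rounding, but only the t statistic keeps its sign, which is what makes one-sided tests possible.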


4.5 F tests and abuse

-we have already seen where individually insignificant variables may be jointly significant due to multicollinearity

-a significant variable can also prove to be jointly insignificant if grouped with enough insignificant variables

-an insignificant variable can also prove to be significant if grouped with significant variables

-therefore t tests are much better than F tests at determining individual significance


4.5 R2 and F

-While SSR can be large, R2 is bounded, often making it an easier way to calculate F:

$F = \dfrac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}$ (4.41)

-Which is also called the R-SQUARED FORM OF THE F STATISTIC

-since R2ur≥R2r, F is still never negative

-this form is NOT valid for testing all linear restrictions (as seen later)
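A short sketch of the R-squared form follows. The function name and the input values (R2ur=0.48, R2r=0.34, one restriction, n=33, k=2) are illustrative assumptions chosen only to show the arithmetic.

```python
def f_statistic_r_squared(r2_ur, r2_r, q, n, k):
    """R-squared form of the F statistic.

    Only valid when both models share the same dependent variable.
    """
    numerator = (r2_ur - r2_r) / q
    denominator = (1 - r2_ur) / (n - k - 1)
    return numerator / denominator


# Illustrative inputs: dropping one variable lowers R2 from 0.48 to 0.34.
f_stat = f_statistic_r_squared(r2_ur=0.48, r2_r=0.34, q=1, n=33, k=2)
print(round(f_stat, 2))
```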


4.5 F and p-values

-similar to t-tests, F tests can produce p-values which are defined as:

$p\text{-value} = P(\mathcal{F} > F)$ (4.43)

-where $\mathcal{F}$ is an F random variable with q, n-k-1 degrees of freedom and F is the test statistic we calculated

-the p-value is the “probability of observing a value of F at least as large as we did, given that the null hypothesis is true”

-a small p-value is therefore evidence against H0

-as before, reject H0 if p<α

-p-values can give us a more complete view of significance
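In general an F p-value requires tables or software, but for the special case of exactly two restrictions (q=2) the upper tail has the closed form (1 + 2F/d)^(-d/2), where d=n-k-1. The sketch below relies on that special case only; the function name is illustrative, and the inputs F≈68.48 and d=568 come from the reading week example.

```python
def f_p_value_two_restrictions(f_stat, denominator_df):
    """P(F_{2,d} > f_stat); this closed form holds only when q = 2."""
    return (1 + 2 * f_stat / denominator_df) ** (-denominator_df / 2)


alpha = 0.05

# p-value of the reading week test statistic: far below alpha, so reject H0.
p_example = f_p_value_two_restrictions(68.48, 568)
print(p_example, p_example < alpha)

# p-value at the tabulated critical value 3.00: close to alpha, as expected.
p_at_critical = f_p_value_two_restrictions(3.00, 568)
print(round(p_at_critical, 3))
```

The second value is slightly above 0.05 because 3.00 is the table entry for infinite denominator degrees of freedom.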


4.5 Overall significance

-Often it is useful to test if the model is significant overall

-the hypothesis that NONE of the explanatory variables have an effect on y is given as:

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (4.44)

-as before with multiple restrictions, we compare against the restricted model:

$y = \beta_0 + u$ (4.45)


4.5 Overall significance

-Since our restricted model has no independent variables, its R2 is zero and our F formula simplifies to:

$F = \dfrac{R^2/k}{(1 - R^2)/(n-k-1)}$ (4.46)

-Which is only valid for this special test

-this test determines the OVERALL SIGNIFICANCE OF THE REGRESSION

-if we fail to reject H0 in this test, we need to find other explanatory variables
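As an illustration, the overall significance statistic can be computed for the coolness regression reported earlier (R2=0.62, N=43, two explanatory variables). The function name is an assumption made for this sketch.

```python
def overall_f_statistic(r2, n, k):
    """F statistic for H0: all k slope coefficients equal zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


f_overall = overall_f_statistic(r2=0.62, n=43, k=2)
print(round(f_overall, 2))
```

The result is then compared with the critical value of an F distribution with k and n-k-1 (here 2 and 40) degrees of freedom.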


4.5 Testing General Linear Restrictions

-Sometimes economic theory (generally using elasticity) requires us to test complicated joint restrictions, such as:

$H_0: \beta_1 = 0,\ \beta_2 = 1,\ \beta_3 = 2$

-Which expects our model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

-To be of the form:

$y = \beta_0 + 0 x_1 + 1 x_2 + 2 x_3 + u$


4.5 Testing General Linear Restrictions

-We rewrite this expected model to obtain a restricted model:

$y - x_2 - 2 x_3 = \beta_0 + u$

-We then calculate the F statistic using the SSR formula

-note that since the dependent variable changes between the two models, the R2 F formula is not valid in this case

-note that the number of restrictions (q) is simply equal to the number of equals signs (=) in the null hypothesis
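The restricted SSR for this kind of test can be sketched as follows. The data are hypothetical and the function name is illustrative; because the restricted model has only an intercept, its OLS estimate is simply the mean of the transformed dependent variable.

```python
def restricted_ssr(y, x2, x3):
    """SSR of the restricted model y - x2 - 2*x3 = B0 + u.

    x1 does not appear because H0 sets its coefficient to zero.
    """
    transformed = [yi - x2i - 2 * x3i for yi, x2i, x3i in zip(y, x2, x3)]
    b0_hat = sum(transformed) / len(transformed)  # OLS on a constant only
    return sum((value - b0_hat) ** 2 for value in transformed)


# Hypothetical observations.
y = [5.1, 7.9, 9.2, 12.0]
x2 = [1.0, 2.0, 2.0, 3.0]
x3 = [1.5, 2.5, 3.0, 4.0]

ssr_r = restricted_ssr(y, x2, x3)
q = 3  # three equals signs in H0: B1 = 0, B2 = 1, B3 = 2
print(round(ssr_r, 4), q)
```

This SSRr would then be combined with SSRur from the full regression of y on x1, x2 and x3 in the SSR form of the F statistic.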


4.6 Reporting Regression Results

-When reporting single regressions, the proper reporting method is:

$\ln(\widehat{Taste}_i) = \underset{(0.9)}{3.7} + \underset{(0.15)}{0.2}\,\ln(Time_i) + \underset{(0.78)}{1.4}\,\ln(Skill_i)$

$N = 431, \quad R^2 = 0.41$

-where R2, estimated coefficients, and N MUST be reported (note also the ^ and i’s)

-either standard errors or t-values must also be reported (se is more robust for tests other than Bk=0)

-SSR and standard error of the regression can also be reported


4.6 Reporting Regression Results

-When multiple, related regressions are run (often to test for joint significance), the results can be expressed in table format, as seen on the next slide

-whether a simple or table reporting method is done, the meanings and scaling of all the included variables must always be explained in a proper project

Ie:

price: average price, measured weekly, in American dollars

College: Dummy Variable. 0 if no college education, 1 if college education


4.6 Reporting Regression Results

Dependent variable: Midterm readiness

Ind. variables    (1)            (2)
Study Time        0.47 (0.12)    -
Intellect         1.89 (1.7)     2.36 (1.4)
Intercept         2.5 (0.03)     2.8 (0.02)
Observations      33             33
R2                0.48           0.34

(standard errors in parentheses)