Chapter 5
Inference in the Simple Regression Model: Interval Estimation, Hypothesis Testing,
and Prediction
Assumptions of the Simple Linear Regression Model
SR1. y_t = β1 + β2x_t + e_t
SR2. E(e_t) = 0 ⇔ E[y_t] = β1 + β2x_t
SR3. var(e_t) = σ² = var(y_t)
SR4. cov(e_i, e_j) = cov(y_i, y_j) = 0
SR5. x_t is not random and takes at least two different values
SR6. e_t ~ N(0, σ²) ⇔ y_t ~ N(β1 + β2x_t, σ²) (optional)
If all of the above assumptions are correct, then the least squares estimators b1 and b2 are, from Chapter 4.4, normally distributed random variables with the following means and variances:
$$b_1 \sim N\!\left(\beta_1,\; \frac{\sigma^2 \sum x_t^2}{T \sum (x_t - \bar{x})^2}\right), \qquad b_2 \sim N\!\left(\beta_2,\; \frac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$$
From Chapter 4.5 we know that the unbiased estimator of the error variance is as follows:
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_t^2}{T - 2}$$
By replacing the unknown parameter σ² with this estimator we can estimate the variances of the least squares estimators and their covariance.
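As a concrete numerical sketch of these formulas, the following code computes b1, b2, the unbiased estimate σ̂², and the estimated variances of both estimators for a small data set. The x and y values here are made up purely for illustration and do not come from the text.

```python
import numpy as np

# Illustrative data (made-up values, T = 8 observations)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
T = len(y)
xbar, ybar = x.mean(), y.mean()

# Least squares estimates (Chapter 4 formulas)
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1 = ybar - b2 * xbar

# Least squares residuals and the unbiased estimator of sigma^2
e_hat = y - b1 - b2 * x
sigma2_hat = np.sum(e_hat ** 2) / (T - 2)

# Estimated variances, replacing sigma^2 with sigma2_hat in the formulas above
var_b2 = sigma2_hat / np.sum((x - xbar) ** 2)
var_b1 = sigma2_hat * np.sum(x ** 2) / (T * np.sum((x - xbar) ** 2))

print(b1, b2, sigma2_hat, var_b1, var_b2)
```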
In Chapter 4 you learned how to calculate point estimates of the regression parameters β1 and β2 using the best linear unbiased estimation procedure. The estimates represent an inference about the regression function E(y) = β1 + β2x of the population from which the sample data were drawn.
In this chapter we introduce the additional tools of statistical inference: interval estimation, prediction, interval prediction, and hypothesis testing. A prediction is a forecast of a future value of the dependent variable y. Interval estimation and interval prediction are procedures for creating ranges of values, sometimes called confidence intervals, in which the unknown parameters, or the value of y, are likely to be located. Hypothesis testing procedures are a means of comparing a conjecture that we as economists might have about the regression parameters to the information about the parameters contained in a sample of data. Hypothesis tests allow
us to say that the data are compatible, or are not compatible, with a particular conjecture,
or hypothesis.
The procedures for interval estimation, prediction, and hypothesis testing depend heavily on assumption SR6 of the simple linear regression model and the resulting normality of the least squares estimators. If assumption SR6 is not made, then the sample size must be sufficiently large so that the least squares estimators' distributions are
approximately normal, in which case the procedures we develop in this chapter are also
approximate. In developing the procedures in this chapter we will be using the normal
distribution, and distributions related to the normal, namely “Student’s” t-distribution and
the chi-square distribution.
5.1 Interval Estimation
5.1.1 The Theory
A standard normal random variable that we will use to construct an interval estimator is
based on the normal distribution of the least squares estimator. Consider, for example,
the normal distribution of b2, the least squares estimator of β2, which we denote as
$$b_2 \sim N\!\left(\beta_2,\; \frac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$$
A standardized normal random variable is obtained from b2 by subtracting its mean and
dividing by its standard deviation:
$$Z = \frac{b_2 - \beta_2}{\sqrt{\operatorname{var}(b_2)}} \sim N(0,1) \qquad (5.1.1)$$
That is, the standardized random variable Z is normally distributed with mean 0 and
variance 1.
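A quick Monte Carlo check of this result can be coded directly. In the sketch below the parameter values (β1 = 1, β2 = 0.5, σ = 2) and the fixed regressor are arbitrary assumptions chosen for illustration; under them, the standardized b2 should have sample mean near 0 and sample variance near 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters (not from the text)
beta1, beta2, sigma = 1.0, 0.5, 2.0
x = np.linspace(1, 20, 20)                    # fixed regressor (SR5)
true_var_b2 = sigma**2 / np.sum((x - x.mean())**2)

z_draws = []
for _ in range(10_000):
    e = rng.normal(0.0, sigma, size=x.size)   # SR6: normal errors
    y = beta1 + beta2 * x + e                 # SR1
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    z_draws.append((b2 - beta2) / np.sqrt(true_var_b2))

z = np.array(z_draws)
print(z.mean(), z.var())                      # should be close to 0 and 1
```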
5.1.1a The Chi-Square Distribution
• Chi-square random variables arise when standard normal, N(0,1), random variables are
squared.
If Z1, Z2, ..., Zm denote m independent N(0,1) random variables, then

$$V = Z_1^2 + Z_2^2 + \cdots + Z_m^2 \sim \chi^2_{(m)} \qquad (5.1.2)$$
The notation $V \sim \chi^2_{(m)}$ is read as: the random variable V has a chi-square distribution
with m degrees of freedom. The degrees of freedom parameter m indicates the
number of independent N(0,1) random variables that are squared and summed to form
V.
• The value of m determines the entire shape of the chi-square distribution, and its mean and variance are

$$E[V] = E\!\left[\chi^2_{(m)}\right] = m, \qquad \operatorname{var}[V] = \operatorname{var}\!\left[\chi^2_{(m)}\right] = 2m \qquad (5.1.3)$$
In Figure 5.1, graphs of the chi-square distribution for various degrees of freedom, m,
are presented.
• Since V is formed by squaring and summing m standardized normal [N(0,1)] random
variables, the value of V must be nonnegative, v ≥ 0.
• The distribution has a long tail, or is skewed, to the right.
• As the degrees of freedom m gets larger, the distribution becomes more symmetric and
“bell-shaped.”
• As m gets large, the chi-square distribution converges to, and essentially becomes, a
normal distribution.
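The properties in the bullet points above are easy to verify by simulation. The sketch below (degrees of freedom and replication counts chosen arbitrarily) forms V by squaring and summing m independent N(0,1) draws, compares its sample moments with (5.1.3), and checks one quantile against scipy's chi-square distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for m in (2, 5, 30):
    # V = Z_1^2 + ... + Z_m^2 for independent N(0,1) draws
    Z = rng.standard_normal(size=(100_000, m))
    V = np.sum(Z**2, axis=1)
    # Moments from (5.1.3): E[V] = m, var[V] = 2m
    print(m, V.mean(), V.var())
    # Agreement with scipy's chi-square, e.g. at the 95th percentile
    print(np.quantile(V, 0.95), stats.chi2(df=m).ppf(0.95))
```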
5.1.1b The Probability Distribution of σ̂²
• If SR6 holds, then the random error term et has a normal distribution, et ~ N(0,σ2).
• Standardize the random variable by dividing by its standard deviation, so that e_t/σ ~ N(0,1).
• The square of a standard normal random variable is a chi-square random variable with one degree of freedom, so $(e_t/\sigma)^2 \sim \chi^2_{(1)}$.
• If all the random errors are independent then
$$\sum_t \left(\frac{e_t}{\sigma}\right)^2 = \left(\frac{e_1}{\sigma}\right)^2 + \left(\frac{e_2}{\sigma}\right)^2 + \cdots + \left(\frac{e_T}{\sigma}\right)^2 \sim \chi^2_{(T)} \qquad (5.1.4)$$
Since the true random errors are unobservable, we replace them by their sample counterparts, the least squares residuals $\hat{e}_t = y_t - b_1 - b_2 x_t$, to obtain
$$V = \frac{\sum_t \hat{e}_t^2}{\sigma^2} = \frac{(T-2)\,\hat{\sigma}^2}{\sigma^2} \qquad (5.1.5)$$
• The random variable V in Equation (5.1.5) does not have a $\chi^2_{(T)}$ distribution because
the least squares residuals are not independent random variables.
• All T residuals $\hat{e}_t = y_t - b_1 - b_2 x_t$ depend on the least squares estimators b1 and b2. It can be shown that only T − 2 of the least squares residuals are independent in the simple linear regression model. That is, when multiplied by the constant (T − 2)/σ², the random variable σ̂² has a chi-square distribution with T − 2 degrees of freedom,
$$V = \frac{(T-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{(T-2)} \qquad (5.1.6)$$
• We have not established the fact that the chi-square random variable V is statistically independent of the least squares estimators b1 and b2, but it is. Now we turn our attention to defining a t random variable.
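As a simulation sketch of (5.1.6), with all parameter values (β1, β2, σ, T) assumed for illustration: repeatedly draw samples from the model, compute σ̂² for each, and compare the simulated moments of (T − 2)σ̂²/σ² with those of a χ²(T−2) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

beta1, beta2, sigma, T = 1.0, 0.5, 2.0, 12   # assumed illustrative values
x = np.linspace(1, 10, T)                    # fixed regressor (SR5)

v_draws = []
for _ in range(20_000):
    y = beta1 + beta2 * x + rng.normal(0.0, sigma, size=T)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1 = y.mean() - b2 * x.mean()
    sigma2_hat = np.sum((y - b1 - b2 * x)**2) / (T - 2)
    v_draws.append((T - 2) * sigma2_hat / sigma**2)

V = np.array(v_draws)
chi2 = stats.chi2(df=T - 2)
print(V.mean(), chi2.mean())                 # both near T - 2 = 10
print(V.var(), chi2.var())                   # both near 2(T - 2) = 20
```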
5.1.1c The t-Distribution
• A “t” random variable (no uppercase) is formed by dividing a standard normal, Z ~
N(0,1), random variable by the square root of an independent chi-square random
variable, 2( )~ mV χ , that has been divided by its degrees of freedom, m.
If Z ~ N(0,1) and $V \sim \chi^2_{(m)}$, and if Z and V are independent, then

$$t = \frac{Z}{\sqrt{V/m}} \sim t_{(m)} \qquad (5.1.7)$$
• The shape of the t-distribution is completely determined by the degrees of freedom
parameter, m, and the distribution is symbolized by t(m).
• Figure 5.2 shows a graph of the t-distribution with m = 3 degrees of freedom, relative
to the N(0,1). Note that the t-distribution is less “peaked,” and more spread out than
the N(0,1).
• The t-distribution is symmetric, with mean E[t(m)] = 0 and variance var[t(m)] = m/(m−2) (for m > 2).
• As the degrees of freedom parameter m→∞, the t(m) distribution approaches the standard normal distribution N(0,1).
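To close, a simulation sketch of definition (5.1.7), with degrees of freedom and sample sizes chosen arbitrarily: it constructs t random variables from independent Z and V, checks the variance formula m/(m − 2), and shows a tail quantile approaching the corresponding N(0,1) quantile (about 1.645 at the 95th percentile) as m grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200_000

for m in (3, 10, 100):
    Z = rng.standard_normal(n)               # Z ~ N(0,1)
    V = rng.chisquare(df=m, size=n)          # V ~ chi-square(m), independent of Z
    t = Z / np.sqrt(V / m)                   # definition (5.1.7)
    print(m, t.var(), m / (m - 2))           # sample vs. theoretical variance
    # Tail quantile approaches the N(0,1) value as m grows
    print(np.quantile(t, 0.95), stats.t(df=m).ppf(0.95))
```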