# SIMPLE REGRESSION MODEL - Yıldız Teknik Üniversitesi (yildiz.edu.tr/~tastan/teaching/Handout 02...)

1

SIMPLE REGRESSION MODEL

Huseyin Tastan1

1 Yıldız Technical University, Department of Economics

These presentation notes are based on Introductory Econometrics: A Modern Approach (2nd ed.)

by J. Wooldridge.

14 October 2012

2

Simple Regression Model

Simple (Bivariate) Regression Model

y = β0 + β1x+ u

- y: dependent variable

- x: explanatory variable

- Also called the “bivariate linear regression model” or “two-variable linear regression model”

- Purpose: to explain the dependent variable y by the independent variable x

3

Simple Regression Model

Terminology

| y | x |
| --- | --- |
| Dependent variable | Independent variable |
| Explained variable | Explanatory variable |
| Response variable | Control variable |
| Predicted variable | Predictor variable |
| Regressand | Regressor |

4

y = β0 + β1x + u

u: random error term (disturbance term). Represents factors other than x that affect y. These factors are treated as “unobserved” in the simple regression model.

Slope parameter β1

- If the other factors in u are held fixed, i.e. ∆u = 0, then the linear effect of x on y is:

∆y = β1∆x

- β1: slope term.

Intercept term (also called constant term): β0

the value of y when x = 0.

5

Simple Regression Model: Examples

Agricultural production and fertilizer usage

yield = β0 + β1fertilizer + u

yield: quantity of wheat production

Slope parameter β1

∆yield = β1∆fertilizer

Ceteris paribus, a one-unit change in fertilizer leads to a β1-unit change in wheat yield.

Random error term u: contains factors such as land quality, rainfall, etc., which are assumed to be unobserved.
Ceteris paribus ⇔ holding all other factors fixed ⇔ ∆u = 0

6

Simple Regression Model: Examples

A Simple Wage Equation

wage = β0 + β1educ+ u

wage: hourly wage (in dollars); educ: education level (in years)

Slope parameter β1

∆wage = β1∆educ

β1 measures the change in hourly wage given another year of education, holding all other factors fixed (ceteris paribus).

Random error term u: other factors include labor force experience, innate ability, tenure with current employer, gender, quality of education, marital status, number of children, etc. In short, any factor that may potentially affect worker productivity.

7

Linearity

- The linearity of the simple regression model means that a one-unit change in x has the same effect on y regardless of the initial value of x.

- This is unrealistic for many economic applications.

- For example, if there are increasing or decreasing returns to scale then this model is inappropriate.

- In the wage equation, the next year of education may have a larger effect on wages than the previous year did.

- An extra year of experience may also have similar increasing returns.

- We will see how to allow for such possibilities in the following classes.

8

Assumptions for Ceteris Paribus conclusions

1. Expected value of the error term u is zero

- If the model includes a constant term (β0) then we can assume:

E(u) = 0

- This assumption is about the distribution of u (the unobservables). Some u terms will be positive and some will be negative, but on average u is zero.

- This assumption can always be guaranteed to hold by redefining β0.

9

Assumptions for Ceteris Paribus conclusions

2. Conditional mean of u is zero

- How can we be sure that the ceteris paribus notion is valid (which means that ∆u = 0)?

- For this to hold, x and u must be uncorrelated. But since the correlation coefficient measures only the linear association between two variables, zero correlation alone is not enough.

- u must also be uncorrelated with functions of x (e.g. x², √x, etc.)

- The Zero Conditional Mean assumption ensures this:

E(u|x) = E(u) = 0

- This equation says that the average value of the unobservables is the same across all slices of the population determined by the value of x.

10

Zero Conditional Mean Assumption

Conditional mean of u given x is zero

E(u|x) = E(u) = 0

- Both u and x are random variables, so we can define the conditional distribution of u given a value of x.

- A given value of x represents a slice of the population. The conditional mean of u for this specific slice of the population can be defined.

- This assumption means that the average value of u does not depend on x.

- For a given value of x, the unobservable factors have zero mean. The unconditional mean of the unobservables is also zero.

11

Zero Conditional Mean Assumption: Example

Wage equation:

wage = β0 + β1educ + u

- Suppose that u represents the innate ability of employees, denoted abil.

- The E(u|x) = 0 assumption implies that average innate ability is the same across all levels of education in the population:

E(abil|educ = 8) = E(abil|educ = 12) = . . . = 0

- If we believe that average ability increases with years of education, this assumption will not hold.

- Since we cannot observe ability, we cannot determine whether average ability is the same for all education levels.

12

Zero Conditional Mean Assumption: Example

Agricultural production-fertilizer model:

- Recall the fertilizer experiment: the land is divided into equal plots and different amounts of fertilizer are applied to each plot.

- If the amounts of fertilizer are assigned to land plots independently of the quality of the land, then the zero-conditional-mean assumption will hold.

- However, if larger amounts of fertilizer are assigned to land plots with higher quality, then the expected value of the error term will increase with the amount of fertilizer.

- In this case the zero conditional mean assumption is false.

13

Population Regression Function (PRF)

- Expected value of y given x:

E(y|x) = β0 + β1x + E(u|x) = β0 + β1x

since E(u|x) = 0.

- This is called the PRF. The conditional expectation of the dependent variable is a linear function of x.

- Linearity of the PRF: for a one-unit change in x, the conditional expectation of y changes by β1.

- The center of the conditional distribution of y for a given value of x is E(y|x).

Population Regression Function (PRF)

15

Systematic and Unsystematic Parts of Dependent Variable

- In the simple regression model

y = β0 + β1x + u

under E(u|x) = 0, the dependent variable y can be decomposed into two parts:

- Systematic part: β0 + β1x. This is the part of y explained by x.

- Unsystematic part: u. This is the part of y that cannot be explained by x.

16

Estimation of Unknown Parameters

- How can we estimate the unknown population parameters (β0, β1) given a cross-sectional data set?

- Suppose that we have a random sample of n observations:

{(yi, xi) : i = 1, 2, . . . , n}

- The regression model can be written for each observation as follows:

yi = β0 + β1xi + ui, i = 1, 2, . . . , n

- Now we have a system of n equations with two unknowns.

17

Estimation of unknown population parameters (β0, β1)

yi = β0 + β1xi + ui, i = 1, 2, . . . , n

n equations with 2 unknowns:

y1 = β0 + β1x1 + u1

y2 = β0 + β1x2 + u2

y3 = β0 + β1x3 + u3

⋮

yn = β0 + β1xn + un

Random Sample Example: Savings and income for 15 families

19

Estimation of unknown population parameters (β0, β1): Method of Moments

We just made two assumptions for ceteris paribus conclusions to be valid:

E(u) = 0

Cov(x, u) = E(xu) = 0

NOTE: If E(u|x) = 0 then Cov(x, u) = 0, but the reverse may not be true. Since E(u) = 0, we have Cov(x, u) = E(xu). Using these assumptions and u = y − β0 − β1x, the population moment conditions can be written as:

E(y − β0 − β1x) = 0

E[x(y − β0 − β1x)] = 0

Now we have 2 equations with 2 unknowns.

20

Method of Moments: Sample Moment Conditions

Population moment conditions:

E(y − β0 − β1x) = 0

E[x(y − β0 − β1x)] = 0

Replacing these with their sample analogs we obtain:

(1/n) ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1xi) = 0

(1/n) ∑ᵢ₌₁ⁿ xi(yi − β̂0 − β̂1xi) = 0

This system can easily be solved for β̂0 and β̂1 using sample data. Note that β̂0 and β̂1 have hats on them: they are not fixed quantities. They change as the data change.
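For concreteness, the two sample moment conditions can be solved numerically. Below is a minimal sketch in Python with made-up data (the numbers are illustrative, not from the handout):

```python
import numpy as np

# Illustrative data (not from the handout)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# The two sample moment conditions,
#   (1/n) sum(y_i - b0 - b1*x_i)       = 0
#   (1/n) sum(x_i*(y_i - b0 - b1*x_i)) = 0
# are linear in (b0, b1): solve A @ [b0, b1] = c.
A = np.array([[1.0, x.mean()],
              [x.mean(), np.mean(x * x)]])
c = np.array([y.mean(), np.mean(x * y)])
b0_hat, b1_hat = np.linalg.solve(A, c)

# The residuals satisfy both moment conditions (up to rounding)
u_hat = y - b0_hat - b1_hat * x
```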

21

Method of Moments: Sample Moment Conditions

Using the properties of the summation operator, from the first sample moment condition:

ȳ = β̂0 + β̂1x̄

where ȳ and x̄ are the sample means. Using this we can write

β̂0 = ȳ − β̂1x̄

Substituting this into the second sample moment condition we can solve for β̂1.

22

Method of Moments

Substituting β̂0 into the second moment condition (the 1/n factor can be dropped):

∑ᵢ₌₁ⁿ xi(yi − (ȳ − β̂1x̄) − β̂1xi) = 0

This expression can be written as

∑ᵢ₌₁ⁿ xi(yi − ȳ) = β̂1 ∑ᵢ₌₁ⁿ xi(xi − x̄)

23

Slope Estimator

β̂1 = ∑ᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / ∑ᵢ₌₁ⁿ (xi − x̄)²

The following properties have been used in deriving the expression above:

∑ᵢ₌₁ⁿ xi(xi − x̄) = ∑ᵢ₌₁ⁿ (xi − x̄)²

∑ᵢ₌₁ⁿ xi(yi − ȳ) = ∑ᵢ₌₁ⁿ (xi − x̄)(yi − ȳ)

24

Slope Estimator

β̂1 = ∑ᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / ∑ᵢ₌₁ⁿ (xi − x̄)²

1. The slope estimator is the ratio of the sample covariance between x and y to the sample variance of x.

2. The sign of β̂1 depends on the sign of the sample covariance. If x and y are positively correlated in the sample, β̂1 is positive; if x and y are negatively correlated then β̂1 is negative.

3. To be able to calculate β̂1, x must have enough variability:

∑ᵢ₌₁ⁿ (xi − x̄)² > 0

If all x values are the same then the sample variance will be 0. In this case, β̂1 will be undefined. For example, if all employees have the same level of education, say 12 years, then it is not possible to measure the impact of education on wages.
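The covariance/variance form and the intermediate form used in the derivation give the same slope. A quick numerical check with hypothetical data:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope as sample covariance of (x, y) over sample variance of x
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Equivalent form used in the derivation:
# sum x_i (y_i - ybar) over sum x_i (x_i - xbar)
b1_alt = np.sum(x * (y - y.mean())) / np.sum(x * (x - x.mean()))
```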

25

Ordinary Least Squares (OLS) Estimation

Fitted values of y can be calculated after β̂0 and β̂1 are found:

ŷi = β̂0 + β̂1xi

Residuals are the difference between observed and fitted values:

ûi = yi − ŷi = yi − β̂0 − β̂1xi

The residual is not the same as the error term. The random error term u is unobserved, whereas û is estimated from a sample of observations.

OLS Objective Function

OLS estimators are found by making the sum of squared residuals (SSR) as small as possible:

min over β̂0, β̂1 of ∑ᵢ₌₁ⁿ ûi²

Residuals

27

Ordinary Least Squares (OLS) Estimators

OLS Problem

min over β̂0, β̂1 of SSR = ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1xi)²

OLS First Order Conditions

∂SSR/∂β̂0 = −2 ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1xi) = 0

∂SSR/∂β̂1 = −2 ∑ᵢ₌₁ⁿ xi(yi − β̂0 − β̂1xi) = 0

The solution of this system is the same as the solution of the system obtained using the method of moments. Notice that if we multiply the sample moment conditions by −2n we obtain the OLS first order conditions.
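One can verify numerically that the closed-form estimates set both first order conditions to zero and minimize the SSR. A sketch with made-up numbers:

```python
import numpy as np

# Made-up data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 5.1, 7.2, 8.8])

# Closed-form OLS estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Evaluate the two first-order conditions at the solution
resid = y - b0 - b1 * x
foc0 = -2 * np.sum(resid)        # dSSR/db0, should be ~0
foc1 = -2 * np.sum(x * resid)    # dSSR/db1, should be ~0

def ssr(a, b):
    # Sum of squared residuals at candidate coefficients (a, b)
    return np.sum((y - a - b * x) ** 2)
```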

28

Population and Sample Regression Functions

Population Regression Function - PRF

E(y|x) = β0 + β1x

PRF is unique and unknown.

Sample Regression Function - SRF

ŷ = β̂0 + β̂1x

The SRF may be thought of as the estimated version of the PRF. Interpretation of the slope coefficient:

β̂1 = ∆ŷ / ∆x

or

∆ŷ = β̂1∆x

29

Example: CEO Salary and Firm Performance

- We want to model the relationship between CEO salary and firm performance:

salary = β0 + β1roe + u

- salary: annual CEO salary (in thousands of US$); roe: average return on equity for the last three years (in percent)

- Using n = 209 firms in the ceosal1.gdt data set in GRETL, the SRF is estimated as follows:

salary = 963.191 + 18.501 roe

- β̂1 = 18.501. Interpretation: if return on equity increases by one percentage point, i.e. ∆roe = 1, then salary is predicted to increase by 18.501, i.e. 18,501 US$ (ceteris paribus).

CEO Salary Model - SRF

[Scatter plot: salary versus roe (with least squares fit), Y = 963. + 18.5X]

CEO Salary Model - Fitted values, Residuals

33

Algebraic Properties of OLS Estimators

- The sum of the OLS residuals, as well as their sample mean, is zero:

∑ᵢ₌₁ⁿ ûi = 0

This follows from the first sample moment condition.

- The sample covariance between x and the residuals is zero:

∑ᵢ₌₁ⁿ xiûi = 0

This follows from the second sample moment condition.

- The point (x̄, ȳ) is always on the regression line.

- The sample average of the fitted values is equal to the sample average of the observed y values: the mean of the ŷi equals ȳ.
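These algebraic properties are easy to confirm on simulated data. A sketch (the data-generating process below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_fit = b0 + b1 * x     # fitted values
u_hat = y - y_fit       # residuals

# Properties: residuals sum to zero, residuals are orthogonal to x,
# mean of fitted values equals mean of y, (xbar, ybar) is on the line.
```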

34

Sum of Squares

- For each observation i we have

yi = ŷi + ûi

This decomposition leads to the following sums of squares:

- SST: Total Sum of Squares

SST = ∑ᵢ₌₁ⁿ (yi − ȳ)²

- SSE: Explained Sum of Squares

SSE = ∑ᵢ₌₁ⁿ (ŷi − ȳ)²

- SSR: Residual Sum of Squares

SSR = ∑ᵢ₌₁ⁿ ûi²

35

Sum of Squares

- SST gives the total variation in y:

SST = ∑ᵢ₌₁ⁿ (yi − ȳ)²

Note that the sample variance of y is SST/(n − 1).

- Similarly, SSE measures the variation in the fitted values:

SSE = ∑ᵢ₌₁ⁿ (ŷi − ȳ)²

- SSR measures the sample variation in the residuals:

SSR = ∑ᵢ₌₁ⁿ ûi²

- Total sample variation in y can be written as

SST = SSE + SSR
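The decomposition SST = SSE + SSR can be checked directly on simulated data (a sketch; the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 - 0.3 * x + rng.normal(0, 1, 200)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_fit = b0 + b1 * x
u_hat = y - y_fit

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSE = np.sum((y_fit - y.mean()) ** 2)   # explained sum of squares
SSR = np.sum(u_hat ** 2)                # residual sum of squares
# SST = SSE + SSR holds up to rounding
```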

36

Goodness-of-fit

I By definition total sample variation in y can be decomposedinto two parts:

SST = SSE + SSR

- Dividing both sides by SST we obtain:

1 = SSE/SST + SSR/SST

- The ratio of explained variation to total variation is called the coefficient of determination, denoted R²:

R² = SSE/SST = 1 − SSR/SST

- Since SSE can never be larger than SST, we have 0 ≤ R² ≤ 1.

- R² is interpreted as the fraction of the sample variation in y that is explained by x. After multiplying by 100 it can be interpreted as the percentage of the sample variation in y explained by x.

- R² can also be calculated as the squared sample correlation between y and ŷ: R² = [Corr(y, ŷ)]²
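Both formulas for R² give the same number, which can be verified on simulated data (a sketch with an arbitrary DGP):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 150)
y = 1.0 + 2.0 * x + rng.normal(0, 3, 150)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_fit = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y - y_fit) ** 2)
r2_from_ssr = 1 - SSR / SST                       # R^2 = 1 - SSR/SST
r2_from_corr = np.corrcoef(y, y_fit)[0, 1] ** 2   # R^2 = Corr(y, yhat)^2
```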

37

Incorporating Nonlinearities in Simple Regression

- Linear relationships may not be appropriate in some cases.

- By appropriately redefining variables we can easily incorporate nonlinearities into the simple regression model.

- Our model will still be linear in parameters. We do not use nonlinear transformations of the parameters.

- In practice, natural logarithmic transformations are widely used (log(y) = ln(y)).

- Other transformations may also be used, e.g., adding quadratic or cubic terms, the inverse form, etc.

38

Linearity in Parameters

- The linearity of the regression model is determined by linearity in the βs, not in x and y.

- We can still use nonlinear transformations of x and y such as log x, log y, x², √x, 1/x, y^(1/4). The model is still linear in parameters.

- But models that include nonlinear transformations of the βs are not linear in parameters and cannot be analyzed in the OLS framework.

- For example, the following models are not linear in parameters:

consumption = 1/(β0 + β1income) + u

y = β0 + β1²x + u

y = β0 + e^(β1x) + u

39

Functional Forms using Natural Logarithms

Log-Level

log y = β0 + β1x + u

∆log y = β1∆x

%∆y = (100β1)∆x

Interpretation: for a one-unit change in x, y changes by (100β1) percent. Note: 100∆log y ≈ %∆y

Level-Log

y = β0 + β1 log x + u

∆y = β1∆log x = (β1/100)(100∆log x) = (β1/100) %∆x

Interpretation: for a 1% change in x, y changes by β1/100 (in its own units of measurement).
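The approximation 100∆log y ≈ %∆y behind these interpretations can be illustrated numerically; it is accurate for small changes and deteriorates for large ones (made-up numbers):

```python
import numpy as np

# 100 * (change in log y) approximates the percentage change in y;
# good for small changes, poor for large ones.
y0, y1, y2 = 50.0, 51.0, 75.0

exact_small = 100 * (y1 - y0) / y0               # 2% exact change
approx_small = 100 * (np.log(y1) - np.log(y0))   # close to 2

exact_big = 100 * (y2 - y0) / y0                 # 50% exact change
approx_big = 100 * (np.log(y2) - np.log(y0))     # noticeably below 50
```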

40

Functional Forms using Natural Logarithms

Log-Log (Constant Elasticity Model)

log y = β0 + β1 log x+ u

∆ log y = β1∆ log x

%∆y = β1%∆x

Interpretation: β1 is the elasticity of y with respect to x. It gives the percentage change in y for a 1% change in x:

%∆y / %∆x = β1

Example: Wage-Education Relationship, log(wage) = β0 + β1educ + u

42

Wage-Education Relationship

[Scatter plots and histograms: wage versus educ (with least squares fit), Y = -0.905 + 0.541X; relative frequency histogram of wage; lwage versus educ (with least squares fit), Y = 0.584 + 0.0827X; relative frequency histogram of lwage]

43

Log-Level Simple Wage Equation

log(wage) = 0.584 + 0.083 educ
           (0.097)   (0.008)

n = 526, R² = 0.186

(standard errors in parentheses)

- After multiplying the slope estimate by 100 it can be interpreted as a percentage: 100 × 0.083 = 8.3.

- An additional year of education is predicted to increase average wages by 8.3%. This is called the return to another year of education.

- WRONG: “an additional year of education increases log wage by 8.3%.” Here, wage increases by 8.3%, not log wage.

- R² = 0.186: education explains about 18.6% of the variation in log wage.

44

Log-Log Example: CEO Salaries (ceosal1.gdt)

Model:

log(salary) = β0 + β1 log(sales) + u

Estimation results:

log(salary) = 4.822 + 0.257 log(sales)
             (0.288)   (0.035)

n = 209, R² = 0.211

(standard errors in parentheses)

- Interpretation: a 1% increase in firm sales increases CEO salary by 0.257%. In other words, the elasticity of CEO salary with respect to sales is 0.257. Conversely, about a 4% increase in firm sales is needed to increase CEO salary by 1%.

- R² = 0.211: log sales explains about 21.1% of the variation in log salary.

Functional Forms using Natural Logarithms: Summary

46

Statistical Properties of OLS Estimators β̂0, β̂1

- What are the properties of the distributions of β̂0 and β̂1 over different random samples from the population?

- What are the expected values and variances of the OLS estimators?

- We will first examine finite sample properties: unbiasedness and efficiency. These are valid for any sample size n.

- Recall that unbiasedness means that the mean of the sampling distribution of an estimator is equal to the unknown parameter value.

- Efficiency is related to the variance of the estimators. An estimator is said to be efficient if its variance is the smallest among a set of unbiased estimators.

47

Unbiasedness of OLS Estimators

We need the following assumptions for unbiasedness:

1. SLR.1: The model is linear in parameters: y = β0 + β1x + u

2. SLR.2: Random sampling: we have a random sample from the target population.

3. SLR.3: Zero conditional mean: E(u|x) = 0. Since we have a random sample we can write:

E(ui|xi) = 0, ∀ i = 1, 2, . . . , n

4. SLR.4: Sample variation in x. The sample variance of x must not be zero:

∑ᵢ₌₁ⁿ (xi − x̄)² > 0

48

Unbiasedness of OLS Estimators

THEOREM (Unbiasedness of OLS): If assumptions SLR.1-SLR.4 all hold, then the OLS estimators are unbiased:

E(β̂0) = β0, E(β̂1) = β1

PROOF:

This theorem says that the centers of the sampling distributions of the OLS estimators (i.e. their expectations) are equal to the unknown population parameters.

49

Notes on Unbiasedness

- Unbiasedness is a feature of the sampling distributions of β̂0 and β̂1 that are obtained via repeated random sampling.

- As such, it says nothing about the estimate we obtain for a given sample. It is possible to obtain an estimate that is far from the true value.

- Unbiasedness generally fails if any of the SLR assumptions fail.

- SLR.2 needs to be relaxed for time series data, and there are also ways it can fail in cross-sectional data.

- If SLR.3 fails, the OLS estimators will generally be biased. This is the most important issue with nonexperimental data.

- If x and u are correlated then we have biased estimators.

- Spurious correlation: we find a relationship between y and x that is really due to other unobserved factors that affect y.

50

Unbiasedness of OLS: A Simple Monte Carlo Experiment

- Population model (DGP - Data Generating Process):

y = 1 + 0.5x + 2 × N(0, 1)

- True parameter values are known: β0 = 1, β1 = 0.5, u = 2 × N(0, 1), where N(0, 1) represents a random draw from the standard normal distribution.

- The values of x are drawn from the uniform distribution: x = 10 × Unif(0, 1)

- Using random numbers we can generate artificial data sets. Then, for each data set, we can apply the OLS method to find estimates.

- After repeating these steps many times, say 1000, we would obtain 1000 slope and intercept estimates.

- Then we can analyze the sampling distribution of these estimates.

- This is a simple example of a Monte Carlo simulation experiment. Such experiments can be useful in analyzing the properties of estimators.

- The following code is written in GRETL.

51

Unbiasedness of OLS: A Simple Monte Carlo Experiment

```
# Monte Carlo simulation of OLS unbiasedness in GRETL
nulldata 50
set seed 123
genr x = 10 * uniform()
# --progressive mode accumulates the stored coefficients across iterations
loop 1000 --progressive
    genr u = 2 * normal()
    genr y = 1 + 0.5 * x + u
    ols y const x
    genr a = $coeff(const)
    genr b = $coeff(x)
    genr r2 = $rsq
    store MC1coeffs.gdt a b
endloop
```
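The same experiment can be sketched in Python (the names b0_draws and b1_draws are our own; same DGP, with OLS computed from the closed-form formulas):

```python
import numpy as np

rng = np.random.default_rng(123)
n, reps = 50, 1000
x = 10 * rng.uniform(size=n)   # x is held fixed across replications

b0_draws = np.empty(reps)
b1_draws = np.empty(reps)
for r in range(reps):
    u = 2 * rng.normal(size=n)   # error term, sd = 2
    y = 1 + 0.5 * x + u          # DGP: beta0 = 1, beta1 = 0.5
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    b0_draws[r], b1_draws[r] = b0, b1

# The averages of the 1000 estimates should be close to the true values.
```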

52

Variances of the OLS Estimators

- Unbiasedness of the OLS estimators β̂0 and β̂1 is a feature of the center of their sampling distributions.

- We should also know how far β̂1 can be expected to be from β1 on average.

- In other words, we need the sampling variation of the OLS estimators in order to establish efficiency and to calculate standard errors.

- SLR.5: Homoscedasticity (constant variance assumption): the variance of u conditional on x is constant:

Var(u|x) = σ²

- This is then also the unconditional variance: Var(u) = σ²

- Both conditions hold when u and x are statistically independent: E(u|x) = E(u) = 0 and Var(u|x) = Var(u) = σ²

53

Variances of the OLS Estimators

- Assumptions SLR.3 and SLR.5 can be rewritten in terms of the conditional mean and variance of y:

E(y|x) = β0 + β1x

Var(y|x) = σ²

- The conditional expectation of y given x is linear in x.

- The conditional variance of y given x is constant and equal to the error variance, σ².

Simple Regression Model under Homoscedasticity

An example of Heteroskedasticity

56

Sampling Variances of the OLS Estimators

Under assumptions SLR.1 through SLR.5:

Var(β̂1) = σ² / ∑ᵢ₌₁ⁿ (xi − x̄)² = σ²/s²ₓ

where s²ₓ ≡ ∑ᵢ₌₁ⁿ (xi − x̄)², and

Var(β̂0) = σ² (n⁻¹ ∑ᵢ₌₁ⁿ xi²) / ∑ᵢ₌₁ⁿ (xi − x̄)²

- These formulas are not valid under heteroscedasticity (if SLR.5 does not hold).

- The sampling variances of the OLS estimators increase with the error variance and decrease with the sample variation in x.

57

Error Terms and Residuals

- Error terms and residuals are not the same.

- Error terms are not observable:

yi = β0 + β1xi + ui

- Residuals can be calculated after the model is estimated:

yi = β̂0 + β̂1xi + ûi

- Residuals can be rewritten as a function of the error terms:

ûi = yi − β̂0 − β̂1xi = β0 + β1xi + ui − β̂0 − β̂1xi

ûi = ui − (β̂0 − β0) − (β̂1 − β1)xi

- From unbiasedness: E(û) = E(u) = 0.

58

Estimating the Error Variance

- We would like to find an unbiased estimator of σ².

- Since by assumption E(u²) = σ², an unbiased estimator is:

(1/n) ∑ᵢ₌₁ⁿ ui²

- But we cannot use this because we do not observe u. Replacing the errors with the residuals:

(1/n) ∑ᵢ₌₁ⁿ ûi² = SSR/n

- However, this estimator is biased. We need to make a degrees-of-freedom adjustment. The unbiased estimator is:

σ̂² = (1/(n − 2)) ∑ᵢ₌₁ⁿ ûi² = SSR/(n − 2)

- degrees of freedom (dof) = number of observations − number of parameters = n − 2
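The effect of the degrees-of-freedom adjustment can be seen in a small simulation (a sketch; the DGP and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, sigma2 = 30, 2000, 4.0
x = rng.uniform(0, 10, n)

s2_biased = np.empty(reps)     # SSR / n
s2_unbiased = np.empty(reps)   # SSR / (n - 2)
for r in range(reps):
    u = rng.normal(0.0, np.sqrt(sigma2), n)
    y = 1 + 0.5 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    ssr = np.sum((y - b0 - b1 * x) ** 2)
    s2_biased[r] = ssr / n
    s2_unbiased[r] = ssr / (n - 2)

# Averaged over replications, SSR/(n-2) is close to sigma2 = 4,
# while SSR/n is systematically smaller.
```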

59

Standard Errors of OLS estimators

- The square root of the estimated error variance is called the standard error of the regression:

σ̂ = √(σ̂²) = √( (1/(n − 2)) ∑ᵢ₌₁ⁿ ûi² ) = √( SSR/(n − 2) )

- σ̂ is also called the root mean squared error.

- The standard error of the OLS slope estimate can be written as:

se(β̂1) = σ̂ / √( ∑ᵢ₌₁ⁿ (xi − x̄)² ) = σ̂/sₓ

60

Regression through the Origin

- In some rare cases we want y = 0 when x = 0. For example, tax revenue is zero whenever income is zero.

- We can redefine the simple regression model without the constant term as follows: y = β1x + u.

- Using the OLS principle:

min over β̃1 of ∑ᵢ₌₁ⁿ (yi − β̃1xi)²

- First Order Condition:

∑ᵢ₌₁ⁿ xi(yi − β̃1xi) = 0

- Solving this we obtain the OLS estimator of the slope parameter:

β̃1 = ∑ᵢ₌₁ⁿ xiyi / ∑ᵢ₌₁ⁿ xi²
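The through-the-origin estimator is easy to compute and to check against its first order condition. A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data where y = 0 at x = 0 is plausible
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 2.1, 2.9, 4.2])

# Slope of the regression through the origin: sum(x_i y_i) / sum(x_i^2)
b1_origin = np.sum(x * y) / np.sum(x * x)

# The FOC makes the residuals orthogonal to x,
# but they need not sum to zero (there is no intercept).
resid = y - b1_origin * x
```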