Regression Analysis 04

Transcript
Page 1: Regression Analysis 04

1

QM II

Lecture 9: Exploration of Multiple

Regression Assumptions.

Page 2: Regression Analysis 04

2

Organization of Lecture 9
1. Review of Gauss-Markov Assumptions
2. Assumptions to calculate βhat

- Multicollinearity

3. Assumptions to show βhat is unbiased
- Omitted Variable Bias
- Irrelevant Variables

4. Assumptions to calculate variance of βhat

- Limited Degrees of Freedom
- Non-Constant Variance

Page 3: Regression Analysis 04

3

1. 6 Classical Linear Model Assumptions (Wooldridge, p. 167)

1. Linear in Parameters

2. Random Sampling of n observations

3. No perfect collinearity

4. Zero conditional mean. The error u has an expected value of 0, given any values of the independent variables

5. Homoskedasticity. The error has the same variance given any value of the explanatory variables.

6. Normality. The population error u is independent of the explanatory variables and is normally distributed

None of the independent variables is constant, and there are no exact linear relationships among the independent variables

Page 4: Regression Analysis 04

4

Categories of Assumptions

• Total number of assumptions necessary depends upon how you count
– Matrix vs. Scalar

• Four categories of assumptions:
– Assumptions to calculate βhat
– Assumptions to show βhat is unbiased
– Assumptions to calculate variance of βhat

– Assumption for Hypothesis Testing (Normality of the Errors)

Page 5: Regression Analysis 04

5

Categories of Assumptions

• Note that these sets of assumptions follow in sequence

• Each step builds on the previous results

• Thus if an assumption is necessary to calculate βhat, it is also necessary to show that βhat is unbiased, etc.

• If an assumption is only necessary for a later step, earlier results are unaffected

Page 6: Regression Analysis 04

6

2. Assumptions to calculate βhat

Page 7: Regression Analysis 04

7

Assumptions to Calculate βhat :

GM3: x’s vary

• Every x takes on at least two distinct values

• Recall our bivariate estimator:

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

• If x does not vary, then the denominator is 0
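A minimal Python sketch of this point, using invented numbers: the bivariate slope and the denominator that collapses to zero when x is constant.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Bivariate OLS slope: sum of cross-deviations over sum of squared deviations of x.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(beta1_hat)  # ~1.96: y rises roughly 2 units per unit of x

# If x does not vary, the denominator is 0 and the slope is undefined.
x_const = np.full(5, 3.0)
print(np.sum((x_const - x_const.mean()) ** 2))  # 0.0 -> division by zero
```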

Page 8: Regression Analysis 04

8

• If x does not vary then our data points become a vertical line.

• The slope of a vertical line is undefined

• Conceptually, if we do not observe variation in x, we cannot draw conclusions about how y varies with x

Assumptions to Calculate βhat: GM3: x’s vary

Page 9: Regression Analysis 04

9

Assumptions to Calculate βhat: GM3 New: x’s are Not Perfectly Collinear

• Matrix (X’X)-1 exists

• Recall our multivariate estimator:

$$\hat{\beta} = (X'X)^{-1}X'y$$

• This means that (X’X) must be of full rank

• No columns can be linearly related
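A small Python sketch of the matrix version, on simulated data (all names and coefficients are made up): βhat = (X'X)⁻¹X'y is computable when X'X is invertible, and an exactly collinear column destroys full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])          # design matrix with intercept
y = 1.0 + 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y        # (X'X)^{-1} X'y
print(beta_hat)

# With a perfectly collinear column, X'X is not of full rank and cannot be inverted.
X_bad = np.column_stack([np.ones(n), x1, 2.0 * x1])   # third column = 2 * second
print(np.linalg.matrix_rank(X_bad.T @ X_bad))          # 2, not full rank (3)
# np.linalg.inv(X_bad.T @ X_bad) would raise LinAlgError: Singular matrix
```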

Page 10: Regression Analysis 04

10

• If one x is a perfect linear function of another then OLS cannot distinguish among their effects

• If an x has no variation independent of other x’s, OLS has no information to estimate its effect.

• This is a more general statement of the previous assumption – x varies.

Assumptions to Calculate βhat: GM3 New: x’s are Not Perfectly Collinear

Page 11: Regression Analysis 04

11

Multicollinearity

Page 12: Regression Analysis 04

12

Multicollinearity

• What happens if two of the variables are not perfectly related, but are still highly correlated?

Excerpts from readings:

• Wooldridge, p. 866: “A term that refers to correlation among the independent variables in a multiple regression model; it is usually invoked when some correlations are ‘large,’ but an actual magnitude is not well-defined.”

• KKV, p. 122. “Any situation where we can perfectly predict one explanatory variable from one or more of the remaining explanatory variables.”

• UCLA On-line Regression Course: “The primary concern is that as the degree of multicollinearity increases, the regression model estimates of the coefficients become unstable and the standard errors for the coefficients can get wildly inflated.”

• Wooldridge, p. 144: Large standard errors can result from multicollinearity.

Page 13: Regression Analysis 04

13

Multicollinearity: Definition

• If multicollinearity is not perfect, then (X’X)-1 exists and analysis can proceed.

• But multicollinearity can cause problems for hypothesis testing even if correlations are not perfect.

• As the level of multicollinearity increases, the amount of independent information about the x’s decreases

• The problem is insufficient information in the sample

Page 14: Regression Analysis 04

14

Multicollinearity: Implications

• Our only assumption in deriving βhat and showing it is BLUE was that (X’X)–1 exists.

• Thus if multicollinearity is not perfect, then OLS is unbiased and is still BLUE.

• But while βhat will have the least variance among unbiased linear estimators, its variance will increase.

Page 15: Regression Analysis 04

15

Multicollinearity: Implications

• Recall our MR equation for βhat in scalar notation:

$$\hat{\beta}_k = \frac{\sum_i (x_{ik} - \hat{x}_{ik})\, y_i}{\sum_i (x_{ik} - \hat{x}_{ik})^2}$$

where x̂ik is the fitted value from regressing xk on all the other x’s

• As multicollinearity increases, the correlation between xik and x̂ik increases

Page 16: Regression Analysis 04

16

Multicollinearity: Implications

• Reducing x’s variation from xhat is the same as reducing x’s variation from the mean of x in the bivariate model.

• Recall the equation for the variance of βhat in the bivariate regression model:

$$Var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum_i (x_i - \bar{x})^2}$$

• Now, we have the analogous formula for Multiple Regression:

$$Var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum_i (x_{i1} - \hat{x}_{i1})^2} = \frac{\hat{\sigma}_u^2}{SST_1 (1 - R_1^2)}$$

where SST1 is the total variation in x1 and R1² is the R-squared from the regression of x1 on all the other x’s.

The closer x1 is to its predicted value, the smaller this quantity, and therefore the larger the variance of βhat1 (i.e., the larger the standard error).
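A quick check of the formula above, on simulated data (variable names and coefficients are invented): the scalar expression σ̂u²/(SST₁(1 − R₁²)) matches the corresponding diagonal element of σ̂u²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)          # x1 and x2 are correlated
y = 1.0 + 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = resid @ resid / (n - 3)                  # 2 slopes + intercept

# Matrix formula: Var(beta_hat) = sigma2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)

# Scalar formula: auxiliary regression of x1 on the other regressors
Z = np.column_stack([np.ones(n), x2])
x1_hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1 - np.sum((x1 - x1_hat) ** 2) / sst1
var_scalar = sigma2 / (sst1 * (1 - r2_1))

print(var_matrix[1, 1], var_scalar)               # the two agree
```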

Page 17: Regression Analysis 04

17

Multicollinearity: Implications

• Thus as the correlation between x and xhat increases, the denominator for the variance of βhat decreases – increasing the variance of βhat

• Notice if x1 is uncorrelated with x2…xn, then the formula is the same as in the bivariate case.

Page 18: Regression Analysis 04

18

Multicollinearity: Implications

• In practice, x’s that are causing the same y are often correlated to some extent

• If the correlation is high, it becomes difficult to distinguish the impact of x1 on y from the impact of x2…xn

• OLS estimates tend to be sensitive to small changes in the data.

Page 19: Regression Analysis 04

19

Illustrating Multicollinearity

• When correlation among x’s is low, OLS has lots of information to estimate βhat

• This gives us confidence in our estimates of βhat

[Figure: Venn diagram of y, x1, and x2 with little overlap between x1 and x2]

Page 20: Regression Analysis 04

20

Illustrating Multicollinearity

• When correlation among x’s is high, OLS has very little information to estimate βhat

• This makes us relatively uncertain about our estimate of βhat

[Figure: Venn diagram of y, x1, and x2 with heavy overlap between x1 and x2]

Page 21: Regression Analysis 04

21

Multicollinearity: Causes

• x’s are causally related to one another and you put them both in your model. Proximate versus ultimate causality.

(i.e. legacy v. political institutions and economic reform)

• Poorly constructed sampling design causes correlation among x’s.

(e.g., you survey 50 districts in Indonesia where the richest districts are all ethnically homogeneous, so you cannot distinguish the effects of ethnic tension and wealth on the propensity to violence).

• Poorly constructed measures that over-aggregate information can make cases correlate.

(i.e. Freedom House – Political Liberties, Global Competitiveness Index)

Page 22: Regression Analysis 04

22

Multicollinearity: Causes

• Statistical model specification: adding polynomial terms or trend indicators.

(i.e. Time since the last independent election correlates with Eastern Europe or Former Soviet Union).

• Too many variables in the model – x’s measure the same conceptual variable.

(e.g., two causal variables essentially pick up on the same underlying variable).

Page 23: Regression Analysis 04

23

Multicollinearity: Warning Signs

• F-test of joint-significance of several variables is significant but coefficients are not.

• Coefficients are substantively large but statistically insignificant.

• Standard errors of the βhat’s change when other variables are included or removed, but the estimated values of βhat do not.

Page 24: Regression Analysis 04

24

Multicollinearity: Warning Signs

• Multicollinearity could be a problem any time that a coefficient is not statistically significant

• Should always check analyses for multicollinearity levels

• If coefficients are significant, then multicollinearity is not a problem for hypothesis testing.

• Its only effect is to increase σ²βhat, the variance of βhat

Page 25: Regression Analysis 04

25

Multicollinearity: The Diagnostic

• Diagnostic of multicollinearity is the auxiliary R-squared

• Regress each x on all other x’s in the model

• R-squared will show you linear correlation between each x and all other x’s in the model
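A sketch of this diagnostic in Python, with made-up data; the helper auxiliary_r2 below is not a standard routine, just an illustration of regressing each x on the others.

```python
import numpy as np

def auxiliary_r2(X):
    """For each column of X, the R-squared from regressing it on the other columns
    (an intercept is added to each auxiliary regression)."""
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        sst = np.sum((xj - xj.mean()) ** 2)
        out.append(1 - np.sum((xj - fitted) ** 2) / sst)
    return out

# Hypothetical data: x3 is close to a linear combination of x1 and x2.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
x3 = x1 + x2 + 0.1 * rng.normal(size=100)
print(auxiliary_r2(np.column_stack([x1, x2, x3])))  # third value is close to 1
```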

Page 26: Regression Analysis 04

26

Multicollinearity: The Diagnostic

• There is no definitive threshold when the auxiliary R-squared is too high.
– Depends on whether β is significant

• Tolerance for multicollinearity depends on n
– Larger n means more information

• If auxiliary R-squared is close to 1 and βhat is insignificant, you should be concerned

– If n is small then .7 or .8 may be too high

Page 27: Regression Analysis 04

27

Multicollinearity: Remedies

• Increase sample size to get more information

• Change sampling mechanism to allow greater variation in x’s

• Change unit of analysis to allow more cases and more variation in x’s (districts instead of states)

• Look at bivariate correlations prior to modeling

Page 28: Regression Analysis 04

28

Multicollinearity: Remedies

• Disaggregate measures to capture independent variation

• Create a composite scale or index if variables measure the same concept

• Construct measures to avoid correlations
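A sketch of the composite-scale remedy (standardize each collinear item and average), loosely analogous to the Stata alpha example on the following slides; all data here are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
latent = rng.normal(size=150)                       # the shared underlying concept
items = np.column_stack([latent + 0.3 * rng.normal(size=150) for _ in range(3)])

z = (items - items.mean(axis=0)) / items.std(axis=0)   # z-score each item
index = z.mean(axis=1)                                  # composite scale
print(np.corrcoef(index, latent)[0, 1])                 # index tracks the shared concept
```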

Page 29: Regression Analysis 04

29

Multicollinearity: An Example

reg approval cpi unemrate realgnp

Source | SS df MS Number of obs = 148

---------+------------------------------ F( 3, 144) = 5.38

Model | 2764.51489 3 921.504964 Prob > F = 0.0015

Residual | 24686.582 144 171.434597 R-squared = 0.1007

---------+------------------------------ Adj R-squared = 0.0820

Total | 27451.0969 147 186.742156 Root MSE = 13.093

------------------------------------------------------------------------------

approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

cpi | -.0519523 .1189559 -0.437 0.663 -.2870775 .1831729

unemrate | 1.38775 .9029763 1.537 0.127 -.3970503 3.172551

realgnp | -.0120317 .007462 -1.612 0.109 -.0267809 .0027176

_cons | 61.52899 5.326701 11.551 0.000 51.00037 72.05762

------------------------------------------------------------------------------

Page 30: Regression Analysis 04

30

Multicollinearity: An Example

reg cpi unemrate realgnp

Source | SS df MS Number of obs = 148

---------+------------------------------ F( 2, 145) = 493.51

Model | 82467.9449 2 41233.9725 Prob > F = 0.0000

Residual | 12115.0951 145 83.5523803 R-squared = 0.8719

---------+------------------------------ Adj R-squared = 0.8701

Total | 94583.0401 147 643.422041 Root MSE = 9.1407

------------------------------------------------------------------------------

cpi | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

unemrate | 3.953796 .5381228 7.347 0.000 2.890218 5.017374

realgnp | .0538443 .0026727 20.146 0.000 .0485618 .0591267

_cons | -30.68007 2.708701 -11.326 0.000 -36.03371 -25.32644

------------------------------------------------------------------------------

Page 31: Regression Analysis 04

31

Multicollinearity: An Example

alpha realgnp unemrate cpi, item gen(economy) std

Test scale = mean(standardized items)

item-test item-rest interitem

Item | Obs Sign correlation correlation correlation alpha

---------+--------------------------------------------------------------------

realgnp | 148 + 0.9179 0.8117 0.7165 0.8348

unemrate | 148 + 0.8478 0.6720 0.9079 0.9517

cpi | 148 + 0.9621 0.9092 0.5961 0.7469

---------+--------------------------------------------------------------------

Test 0.7402 0.8952

---------+--------------------------------------------------------------------

Page 32: Regression Analysis 04

32

Multicollinearity: An Example

. reg approval economy

Source | SS df MS Number of obs = 148

---------+------------------------------ F( 1, 146) = 7.89

Model | 1408.13844 1 1408.13844 Prob > F = 0.0056

Residual | 26042.9585 146 178.376428 R-squared = 0.0513

---------+------------------------------ Adj R-squared = 0.0448

Total | 27451.0969 147 186.742156 Root MSE = 13.356

------------------------------------------------------------------------------

approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

economy | -3.403865 1.211486 -2.810 0.006 -5.798181 -1.00955

_cons | 54.68946 1.097837 49.816 0.000 52.51975 56.85916

------------------------------------------------------------------------------

Page 33: Regression Analysis 04

33

3. Assumptions to Show βhat is Unbiased

Page 34: Regression Analysis 04

34

Assumptions to Show βhat is Unbiased – GM4: x’s are fixed

• Conceptually, this assumption implies that we could repeat an “experiment” in which we held x constant and observed new values for u, and y

• This assumption is necessary to allow us to calculate a distribution of βhat

• Knowing the distribution of βhat is essential for calculating its mean.

Page 35: Regression Analysis 04

35

• Without knowing the mean of βhat, we cannot know whether E(βhat )= β

• In addition, without knowing the mean of βhat or its distribution, we cannot know the variance of βhat

• In practical terms, we must assume that the independent variables are measured without error.

Assumptions to Show βhat is Unbiased –

GM4: x’s are fixed

Page 36: Regression Analysis 04

36

• Recall our equation for βhat:

$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u)$$

$$\hat{\beta} = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u$$

$$\hat{\beta} = \beta + (X'X)^{-1}X'u$$

• X is fixed and non-zero

• Thus if E(u) = 0, then E(βhat) = β

Assumptions to Show βhat is Unbiased – GM4: x’s are fixed
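A simulation sketch of the “fixed X, new draws of u” thought experiment from the slides above: holding X fixed and redrawing u with E(u) = 0, the average of βhat across replications sits on the true β (all values here are invented).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # fixed design
beta_true = np.array([1.0, 0.5, 0.4])

estimates = []
for _ in range(5000):
    u = rng.normal(size=n)                 # new errors each replication, E(u) = 0
    y = X @ beta_true + u
    estimates.append(np.linalg.inv(X.T @ X) @ X.T @ y)

print(np.mean(estimates, axis=0))          # close to (1.0, 0.5, 0.4): beta_hat is unbiased
```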

Page 37: Regression Analysis 04

37

• Conceptually, this assumption means that we believe that our theory – described in the equation (y = Xβ + u) – accurately represents our dependent variable.

• If E(u) is not equal to zero then we have the wrong equation – an omitted variable

Assumptions to Show βhat is Unbiased –

GM4: x’s are fixed

Page 38: Regression Analysis 04

38

• Note that the assumption E(u)=0 implies that E(u1)=E(u2)…E(un)=0

• Therefore, the assumption E(u)=0 also implies that u is independent of the values of X

• That is, E(ui|xik)=0

Assumptions to Show βhat is Unbiased – x’s are fixed

Page 39: Regression Analysis 04

39

Omitted Variable Bias

Page 40: Regression Analysis 04

40

What is Model Specification?

• Model Specification is two sets of choices:
– The set of variables that we include in a model
– The functional form of the relationships we specify

• These are central theoretical choices

• Can’t get the right answer if we ask the wrong question

Page 41: Regression Analysis 04

41

What is the “Right” Model?

• In truth, all our models are misspecified to some extent.

• Our theories are always a simplification of reality, and all our measures are imperfect.

• Our task is to seek models that are reasonably well specified – keeps our errors relatively modest

Page 42: Regression Analysis 04

42

Model Specification: Causes

• There are four basic types of model misspecification:

– Inclusion of an irrelevant variable

– Exclusion of a relevant variable

– Measurement error

– Erroneous functional form for the relationship

Page 43: Regression Analysis 04

43

Omitted Variable Bias: Excluding Relevant Variables

• Imagine that the true causes of y can be represented as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

• But we estimate the model:

$$y = \beta_0 + \beta_1 x_1 + u^*, \quad \text{where } u^* = \beta_2 x_2 + u$$

Page 44: Regression Analysis 04

44

Omitted Variable Bias: Consequences

• Clearly this model violates our assumption that E(u*) = 0
– E(β2x2 + u) is not 0

• We required this assumption in order to show that βhat is an unbiased estimator of β

• Thus by excluding x2, we risk biasing our coefficients for x1

• If βhat is biased, then σβhat is also biased

Page 45: Regression Analysis 04

45

Omitted Variable Bias: Consequences

• Under these circumstances, what does our estimate of βhat reflect?

• Recall that:

$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'y$$

$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u)$$

$$\hat{\beta}_1 = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u$$

Page 46: Regression Analysis 04

46

Omitted Variable Bias: Consequences

• We retain the assumption that E(u)=0

• That is the expectation of the errors is 0 after we account for the impact of x2

• Thus the expected value of our estimate βhat1 is:

$$E(\hat{\beta}_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$$

Page 47: Regression Analysis 04

47

Omitted Variable Bias:Consequences

• This equation indicates that our estimate of βhat 1 will be biased by two factors

• The extent of the correlation between x1 and x2

• The extent of x2’s impact on y

• Often difficult to know direction of bias

Page 48: Regression Analysis 04

48

Omitted Variable Bias: Consequences

• Remember, if either of these sources of bias is 0, then the overall bias is 0

• Thus E(βhat1)= β1 if:

– Correlation of x1 & x2=0; note this is a characteristic of the sample

OR

– β2=0; note this is not a statement about the sample, but a theoretical assertion

Page 49: Regression Analysis 04

49

Summary of Bias in the Estimator βhat1 when x2 is Omitted

Relationship | Corr(x1, x2) > 0 | Corr(x1, x2) < 0
β2 > 0 | Positive bias: βhat1 will appear to have a strong positive relationship with y (also called upward bias, or biased to the right) | Negative bias: βhat1 will appear to have a strong negative relationship with y (also called downward bias, or biased to the left)
β2 < 0 | Negative bias | Positive bias
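A simulation sketch of the top-left cell of the table: with β2 > 0 and Corr(x1, x2) > 0, omitting x2 biases βhat1 upward by roughly β2 times the slope of x2 on x1 (all values here are invented).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)              # corr(x1, x2) ~ 0.6 > 0
y = 1.0 + 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)    # beta2 = 0.4 > 0

X_full = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])

b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
print(b_full[1])   # ~0.50 (unbiased when x2 is included)
print(b_short[1])  # ~0.74 (0.5 + 0.4 * 0.6: upward bias when x2 is omitted)
```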

Page 50: Regression Analysis 04

50

Omitting Variables and Model Specification

• These criteria give us our conceptual standards for determining when a variable must be included

• A variable must be included in a regression equation IF:
– The variable is correlated with other x’s
AND
– The variable is also a cause of y

Page 51: Regression Analysis 04

51

Illustrating Omitted Variable Bias

• Imagine a true model where x1 has a small effect on y and is correlated with x2 that has a large effect on y.

• Specifying both variables can distinguish these effects

[Figure: Venn diagram of y, x1, and x2]

Page 52: Regression Analysis 04

52

Illustrating Omitted Variable Bias

• But when we run a simple model excluding x2, we attribute all causal influence to x1

• Coefficient is too big and variance of coefficient is too small

[Figure: Venn diagram of y and x1 only]

Page 53: Regression Analysis 04

53

Omitted Variable Bias: Causes

• This problem is primarily theoretical rather than “statistical”

• Though OVTEST or LINKTEST in Stata are designed to offer some indication, ultimately they are not satisfying. They may reveal that the residuals are not correlated with the independent variables, but this may still not be theoretically satisfying.

• Scholars can form hypotheses about other x’s that may be a source of omitted variable bias

Page 54: Regression Analysis 04

54

The Classic Case of Omitted Variable Bias

[Causal diagram: % of Patients Ill, % of Patients Taking Medicine, # of Patient Deaths]

Page 55: Regression Analysis 04

55

Spuriousness and Omitted Variable Bias: An IR Example

[Causal diagram: National Interests, International Institutions, Cooperative Behavior]

Page 56: Regression Analysis 04

56

Including Irrelevant Variables

• Do we see the same kinds of problems if we include irrelevant variables?

• NO!

• Imagine we estimate the equation:

$$y = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{u}$$

Page 57: Regression Analysis 04

57

Including Irrelevant Variables

• In this case:

$$E\begin{pmatrix}\hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2\end{pmatrix} = \begin{pmatrix}\beta_0 \\ \beta_1 \\ 0\end{pmatrix}$$

• If βhat2 = 0, our estimate of βhat1 is not affected

• If x1’x2 = 0, our estimate of βhat1 is not affected

• But including x2 if it is not relevant does unnecessarily inflate σ²βhat1

Page 58: Regression Analysis 04

58

Including Irrelevant Variables: Consequences

• σ²βhat1 increases for two reasons:

• Addition of a parameter for x2 reduces the degrees of freedom
– Part of the estimator for σ̂u²

• If βhat2 = 0 but x1’x2 is not, then including x2 unnecessarily reduces the independent variation of x1 (multicollinearity)

• Thus parsimony remains a virtue
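A simulation sketch of the second reason, with invented values: x2 is irrelevant (β2 = 0) but correlated with x1, and including it leaves βhat1 unbiased while inflating its sampling variability.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 3000
with_x2, without_x2 = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)   # correlated with x1
    y = 1.0 + 0.5 * x1 + rng.normal(size=n)                     # x2 is irrelevant (beta2 = 0)
    X1 = np.column_stack([np.ones(n), x1])
    X12 = np.column_stack([np.ones(n), x1, x2])
    without_x2.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])
    with_x2.append(np.linalg.lstsq(X12, y, rcond=None)[0][1])

print(np.mean(without_x2), np.std(without_x2))   # ~0.5, spread ~0.10
print(np.mean(with_x2), np.std(with_x2))         # still ~0.5, but spread ~0.23
```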

Page 59: Regression Analysis 04

59

Rules for Model Specification

• Model specification is fundamentally a theoretical exercise. We build models to reflect our theories

• As much as we would like, the theorizing process cannot be replaced with statistical tests

• Avoid mechanistic rules for specification such as stepwise regression

Page 60: Regression Analysis 04

60

The Evils of Stepwise Regression

• Stepwise regression is a method of model specification that chooses variables on:
– Significance of their t-scores

– Their contribution to R2

• Variables will be selected in or out depending on the order they are introduced into the model

Page 61: Regression Analysis 04

61

The Evils of Stepwise Regression

• Stepwise regression can fall victim to specification errors as a result of multicollinearity or spuriousness problems

• R2 is not a generalizable property of the model

• Thus stepwise regression is atheoretical curve-fitting

• Our forecasting ability becomes limited

Page 62: Regression Analysis 04

62

Choosing Your Variables

• Variable selection should be based on your theoretical understanding of the causal process underlying the dependent variable

• This is a theoretical rather than a statistical exercise

• There are rules of thumb for including or excluding variables from an analysis, but ultimately Mr. Theory needs to be your guide.

Page 63: Regression Analysis 04

63

Reasons to Include a Variable

• x2 is a potential source of Omitted Variable Bias
– Variable is a cause of y and correlated with x1

• Methodological control variables
– Selection parameters, temporal dependence, etc.

• You must include these x’s as controls

• x2 is a benchmark for evaluating the substantive impact of x1
– Selected at your discretion

Page 64: Regression Analysis 04

64

Reasons to Exclude a Variable

• x2 is NOT a cause of y, but it creates multicollinearity problems

• x2 is an intervening variable in between x1 and y (a proximate cause of y, while x1 remains the ultimate cause).

• With intervening variables, a good strategy is to run the model both with and without x2
– Examine the change in the coefficient for x1

Page 65: Regression Analysis 04

65

4. Assumptions to Calculate the Variance of βhat

Page 66: Regression Analysis 04

66

Calculating the Variance of βhat: Degrees of Freedom

• We must have more cases than we have x’s

• In other words, n > k

• Recall that our estimator of βhat is the result of numerous summation operations

• Each summation has n pieces of information about x in it, but…

Page 67: Regression Analysis 04

67

Calculating the Variance of βhat : Degrees of Freedom

• Not all n pieces of information about x in the summations are independent

• Take the calculation:

$$\sum_{i=1}^{n} (x_i - \bar{x})$$

• Once we calculate x-bar, then for the final observation xn, we know what that observation of x must be, given x-bar

Page 68: Regression Analysis 04

68

Calculating the Variance of βhat: Degrees of Freedom

• Thus the degrees of freedom for the summation is n − 1 − 1 (for the intercept)

• We lose one additional piece of information in estimating the parameter x-bar.

• For each parameter, we lose one more piece of independent information because the parameters depend on the values in the data.

$$\sum_{i=1}^{n} (x_i - \bar{x})$$

Page 69: Regression Analysis 04

69

Calculating the Variance of βhat: Degrees of Freedom

• Dividing by the degrees of freedom was necessary to make an unbiased estimate of the variance of βhat

• Recall the formulas:

• If k > n, then the variance of βhat is undefined

$$\hat{\sigma}_u^2 = \frac{SSR}{n - k - 1} = \frac{\sum_i \hat{u}_i^2}{n - k - 1}$$

Scalar Notation:

$$Var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum_i (x_i - \bar{x})^2}$$

Matrix Notation:

$$Var(\hat{\beta}) = \hat{\sigma}_u^2 (X'X)^{-1}$$
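A short Python sketch of the two estimators just shown, on simulated data (names and values are invented): σ̂u² from the residuals with n − k − 1 degrees of freedom, then Var(βhat) = σ̂u²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept + k = 2 x's
y = X @ np.array([1.0, 0.5, 0.4]) + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
resid = y - X @ beta_hat
k = X.shape[1] - 1                                # number of slope coefficients
sigma2_hat = resid @ resid / (n - k - 1)          # SSR / (n - k - 1)

var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
print(sigma2_hat)                                 # estimate of the error variance
print(np.sqrt(np.diag(var_beta_hat)))             # standard errors of beta_hat
```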

Page 70: Regression Analysis 04

70

Calculating the Variance of βhat: Sufficient Degrees of Freedom

• Conceptually, this means that the model is over-parameterized: the values of the βhat vector are not uniquely determined

• Hypothesis tests become impossible

• One could calculate a βhat “by hand,” though it would be useless.

• Homework 2 - Problem 6.

Page 71: Regression Analysis 04

71

Calculating the Variance of βhat: GM5: Error Variance is Constant

• More specifically, we assume that:

$$E(u_i u_i) = E(u_i^2) = \sigma_u^2$$

• To calculate the variance of βhat we factored σu² out of a summation across the n data points.

• If σu² is not constant, then our numerator for σ²βhat will be wrong

Page 72: Regression Analysis 04

72

Calculating the Variance of β: Error Variance is Constant

• Conceptually, this assumption states that each of our observations is equally reliable as a piece of information for estimating β

• If E(ui²) is not equal to σu² for every observation, then some data points hold more information than others.

• This is known as heteroskedasticity
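A simulation sketch of the consequence, with invented values: when Var(ui) depends on xi, the usual formula (which assumes a constant error variance) understates the true sampling variability of βhat1.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 3000
slopes, textbook_se = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    u = np.abs(x) * rng.normal(size=n)        # error variance grows with |x|
    y = 1.0 + 0.5 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)
    slopes.append(b[1])
    textbook_se.append(np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2)))

print(np.std(slopes))          # true sampling variability of beta1_hat (~0.12)
print(np.mean(textbook_se))    # conventional formula reports much less (~0.07)
```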

Page 73: Regression Analysis 04

73

Calculating the Variance of βhat: The Independence of Errors

• More specifically, we assume that:

$$E[(u_t - \bar{u})(u_s - \bar{u})] = cov(u_t, u_s) = 0 \quad \text{for } t \neq s$$

• Where t and s index particular observations and ū is the mean of the overall model error

• Proof of minimum variance assumed that the matrix uu’ can be factored out of the variance of βhat as σu²I

• All elements of the uu’ matrix off the diagonal are assumed to be 0

Page 74: Regression Analysis 04

74

Calculating the Variance of βhat

The Independence of Errors

• If this is not true, then OLS is not BLUE and our equation for the variance of βhat is wrong – usually too small

• Conceptually, this assumption means that the only systematic factors predicting y are the variables in the X matrix; u is not systematic
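A simulation sketch of this point, with invented values: when the off-diagonal elements of uu' are not zero in expectation (here, AR(1) errors with a slowly moving regressor), the conventional standard error understates the true variability of βhat1.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, rho = 200, 3000, 0.8
slopes, usual_se = [], []

for _ in range(reps):
    x = np.cumsum(rng.normal(size=n)) / 10.0      # slowly-moving (trending) regressor
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + rng.normal()      # AR(1) errors: cov(u_t, u_s) != 0
    y = 1.0 + 0.5 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)
    slopes.append(b[1])
    usual_se.append(np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]))

print(np.std(slopes))        # true sampling variability of beta1_hat
print(np.mean(usual_se))     # conventional formula reports noticeably less
```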

Page 75: Regression Analysis 04

75

Examples to Work Through

Page 76: Regression Analysis 04

76

WARNING !!!

• ALL DATA IN THE FOLLOWING EXAMPLES ARE COMPLETELY ARTIFICIAL!

• PLEASE DON’T DRAW ANY SUBSTANTIVE CONCLUSIONS FROM THESE RESULTS!!!

Page 77: Regression Analysis 04

77

Omitted Variable Bias: Trade as a Cause of Democracy

[Path diagram: Trade Openness → Democracy (β = 0.5); Economic Wealth → Democracy (β = 0.4); r(Trade Openness, Economic Wealth) = 0.6]

Page 78: Regression Analysis 04

78

Regressing Democracy on Trade, Omitting Wealth

reg democ tottrade

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 1, 98) = 34.98

Model | 3446.51643 1 3446.51643 Prob > F = 0.0000

Residual | 9656.70632 98 98.5378196 R-squared = 0.2630

---------+------------------------------ Adj R-squared = 0.2555

Total | 13103.2227 99 132.355785 Root MSE = 9.9266

------------------------------------------------------------------------------

democ | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tottrade | 1.010671 .1708917 5.914 0.000 .6715418 1.3498

_cons | 2.372788 2.009129 1.181 0.240 -1.614264 6.35984

------------------------------------------------------------------------------

Page 79: Regression Analysis 04

79

Regressing Democracy on Trade AND Wealth

. reg democ tottrade wealth

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 2, 97) = 28.27

Model | 4825.11958 2 2412.55979 Prob > F = 0.0000

Residual | 8278.10316 97 85.3412697 R-squared = 0.3682

---------+------------------------------ Adj R-squared = 0.3552

Total | 13103.2227 99 132.355785 Root MSE = 9.238

------------------------------------------------------------------------------

democ | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tottrade | .5284415 .1992195 2.653 0.009 .1330461 .923837

wealth | .3886304 .0966934 4.019 0.000 .1967208 .5805399

_cons | 1.199239 1.892422 0.634 0.528 -2.556694 4.955173

------------------------------------------------------------------------------

Page 80: Regression Analysis 04

80

Irrelevant Controls and Multicollinearity

[Path diagram: Trade Openness → Democracy (β = 0.5); Economic Wealth → Democracy (β = 0.4); r(Trade, Wealth) = 0.6; Military Power → Democracy (β = 0), highly correlated with the other x’s (r = .95, r = .75)]

Page 81: Regression Analysis 04

81

Regressing Democracy on Wealth, Trade and Power

. reg democ tottrade wealth milpow

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 3, 96) = 18.70

Model | 4833.01576 3 1611.00525 Prob > F = 0.0000

Residual | 8270.20698 96 86.1479894 R-squared = 0.3688

---------+------------------------------ Adj R-squared = 0.3491

Total | 13103.2227 99 132.355785 Root MSE = 9.2816

------------------------------------------------------------------------------

democ | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tottrade | .6306823 .3925667 1.607 0.111 -.1485564 1.409921

wealth | .5127502 .4213262 1.217 0.227 -.3235757 1.349076

milpow | -.0619981 .2047824 -0.303 0.763 -.468488 .3444917

_cons | 1.088159 1.936422 0.562 0.575 -2.755609 4.931927

------------------------------------------------------------------------------

Page 82: Regression Analysis 04

82

Auxiliary Regression on Wealth, Trade and Power

reg wealth tottrade milpow

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 2, 97) = 1382.91

Model | 13837.6049 2 6918.80243 Prob > F = 0.0000

Residual | 485.297631 97 5.00306836 R-squared = 0.9661

---------+------------------------------ Adj R-squared = 0.9654

Total | 14322.9025 99 144.675783 Root MSE = 2.2368

------------------------------------------------------------------------------

wealth | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tottrade | -.7139595 .0607854 -11.746 0.000 -.8346018 -.5933173

milpow | .4729452 .0113791 41.562 0.000 .4503608 .4955297

_cons | 1.007912 .4552951 2.214 0.029 .1042778 1.911547

------------------------------------------------------------------------------

Page 83: Regression Analysis 04

83

Irrelevant Controls and Extraneous (but harmless) β’s

[Path diagram: Trade Openness → Democracy (β = 0.5); Economic Wealth → Democracy (β = 0.4); r(Trade, Wealth) = 0.6; Olympic Medals → Democracy (β = 0), nearly uncorrelated with the other x’s (r = .10, r = .03)]

Page 84: Regression Analysis 04

84

Regressing Democracy on Wealth, Trade and Medals

reg democ tottrade wealth medals

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 3, 96) = 19.03

Model | 4886.75371 3 1628.9179 Prob > F = 0.0000

Residual | 8216.46904 96 85.5882191 R-squared = 0.3729

---------+------------------------------ Adj R-squared = 0.3533

Total | 13103.2227 99 132.355785 Root MSE = 9.2514

------------------------------------------------------------------------------

democ | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tottrade | .5201341 .1997475 2.604 0.011 .1236385 .9166298

wealth | .3988365 .0975772 4.087 0.000 .2051473 .5925257

medals | -.1334243 .1572285 -0.849 0.398 -.4455204 .1786718

_cons | 2.471675 2.416604 1.023 0.309 -2.325246 7.268597

------------------------------------------------------------------------------

Page 85: Regression Analysis 04

85

Control Variables for Comparisons

[Path diagram: GNP Growth → Presidential Approval (β = 1.0); International Threats → Presidential Approval (β = 0.5); r(GNP Growth, International Threats) = 0.05]

Page 86: Regression Analysis 04

86

Regressing Approval on GNP Growth, Omitting Threats

reg approve gnpgro

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 1, 98) = 13.70

Model | 4419.43882 1 4419.43882 Prob > F = 0.0004

Residual | 31604.8543 98 322.498513 R-squared = 0.1227

---------+------------------------------ Adj R-squared = 0.1137

Total | 36024.2931 99 363.881748 Root MSE = 17.958

------------------------------------------------------------------------------

approve | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

gnpgro | 1.144467 .3091601 3.702 0.000 .5309485 1.757985

_cons | 27.2151 3.634715 7.488 0.000 20.00213 34.42807

------------------------------------------------------------------------------

Page 87: Regression Analysis 04

87

Regressing Approval on GNP Growth AND Threats

reg approve gnpgro threat

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 2, 97) = 162.55

Model | 27745.7297 2 13872.8648 Prob > F = 0.0000

Residual | 8278.56343 97 85.3460147 R-squared = 0.7702

---------+------------------------------ Adj R-squared = 0.7655

Total | 36024.2931 99 363.881748 Root MSE = 9.2383

------------------------------------------------------------------------------

approve | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

gnpgro | 1.015053 .1592343 6.375 0.000 .6990165 1.331089

threat | .4972377 .0300769 16.532 0.000 .4375434 .556932

_cons | 4.292249 2.327818 1.844 0.068 -.3278251 8.912323

------------------------------------------------------------------------------

Page 88: Regression Analysis 04

88

Controlling for Intervening Variables

[Path diagram: Similarity of Regime Type → Similarity of Security Interests → Bilateral Trade Flows; direct Regime → Trade effect β = 0; other path coefficients β = 2.0 and β = 1.5]

Page 89: Regression Analysis 04

89

Regressing Trade Flows Directly on Regime Similarity

reg tradeflo regimsim

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 1, 98) = 46.42

Model | 21021.1745 1 21021.1745 Prob > F = 0.0000

Residual | 44375.3387 98 452.809579 R-squared = 0.3214

---------+------------------------------ Adj R-squared = 0.3145

Total | 65396.5132 99 660.570841 Root MSE = 21.279

------------------------------------------------------------------------------

tradeflo | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

regimsim | 2.49602 .3663341 6.814 0.000 1.769042 3.222998

_cons | 10.20431 4.306895 2.369 0.020 1.657422 18.75121

------------------------------------------------------------------------------

Page 90: Regression Analysis 04

90

Regressing Trade Flows on Regime & Security Similarity

reg tradeflo allysim regimsim

Source | SS df MS Number of obs = 100

---------+------------------------------ F( 2, 97) = 334.65

Model | 57118.4095 2 28559.2047 Prob > F = 0.0000

Residual | 8278.10377 97 85.341276 R-squared = 0.8734

---------+------------------------------ Adj R-squared = 0.8708

Total | 65396.5132 99 660.570841 Root MSE = 9.238

------------------------------------------------------------------------------

tradeflo | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------+--------------------------------------------------------------------

allysim | 1.98863 .0966934 20.566 0.000 1.796721 2.18054

regimsim | .0284416 .1992195 0.143 0.887 -.3669539 .423837

_cons | 4.199239 1.892422 2.219 0.029 .4433061 7.955173

------------------------------------------------------------------------------