Other Stuff and Examples: Polynomials, Logs, Time Series ...


Transcript of Other Stuff and Examples: Polynomials, Logs, Time Series ...

Page 1: Other Stu and Examples: Polynomials, Logs, Time Series ...

Other Stuff and Examples: Polynomials, Logs, Time Series, Model Selection,

Logistic Regression...

Carlos M. Carvalho, The University of Texas at Austin

McCombs School of Business

http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/

1

Page 2: Other Stu and Examples: Polynomials, Logs, Time Series ...

Regression Model Assumptions

Yi = β0 + β1Xi + ε

Recall the key assumptions of our linear regression model:

(i) The mean of Y is linear in the X's.

(ii) The additive errors (deviations from line)

I are normally distributed

I independent from each other

I identically distributed (i.e., they have constant variance)

Yi | Xi ∼ N(β0 + β1Xi, σ²)

2

Page 3: Other Stu and Examples: Polynomials, Logs, Time Series ...

Regression Model Assumptions

Inference and prediction rely on this model being “true”!

If the model assumptions do not hold, then all bets are off:

I prediction can be systematically biased

I standard errors, intervals, and t-tests are wrong

We will focus on using graphical methods (plots!) to detect

violations of the model assumptions.

3

Page 4: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

[Figure: four scatterplots — y1 vs. x1, y2 vs. x2, y3 vs. x3, y4 vs. x4]

Here we have four datasets... Which one looks compatible with our modeling assumptions?

4

Page 5: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

[Figure: regression output screenshots for datasets (1) and (2)]

Example where things can go bad!

5

Page 6: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

The regression output values are exactly the same...

[Figure: the four datasets again — y1 vs. x1, y2 vs. x2, y3 vs. x3, y4 vs. x4]

Thus, whatever decision or action we might take based on the

output would be the same in both cases!

6

Page 7: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

...but the residuals (plotted against Ŷ, the fitted values) look totally different!!

[Figure: residuals vs. fitted values for each of the four regressions (reg1$fitted vs. reg1$residuals, ..., reg4$fitted vs. reg4$residuals)]

Plotting e vs. Ŷ is your #1 tool for finding fit problems.

7

Page 8: Other Stu and Examples: Polynomials, Logs, Time Series ...

Residual Plots

We use residual plots to “diagnose” potential problems with the

model.

From the model assumptions, the error term (ε) should have a few

properties... we use the residuals (e) as a proxy for the errors as:

εi = yi − (β0 + β1x1i + β2x2i + · · · + βp xpi)

≈ yi − (b0 + b1x1i + b2x2i + · · · + bp xpi) = ei

8

Page 9: Other Stu and Examples: Polynomials, Logs, Time Series ...

Residual Plots

What kind of properties should the residuals have??

ei ≈ N(0, σ²), iid and independent of the X's

I We should see no pattern between e and each of the X ’s

I This can be summarized by looking at the plot between Ŷ and e

I Remember that Ŷ is “pure X”, i.e., a linear function of the X's.

If the model is good, the regression should have pulled out of Y all of its “X-ness”... what is left over (the residuals) should have nothing to do with X.

9
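A minimal R sketch of this check (the response y and the predictors x1, x2 are hypothetical placeholders; any fitted lm object works the same way):

# Fit the regression, then plot residuals against the fitted values (y-hat)
fit <- lm(y ~ x1 + x2)

plot(fitted(fit), resid(fit), xlab = "fitted values (y-hat)", ylab = "residuals")
abline(h = 0, lty = 2)

# Residuals against each X should also show no pattern
plot(x1, resid(fit)); abline(h = 0, lty = 2)
plot(x2, resid(fit)); abline(h = 0, lty = 2)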

Page 10: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example – Mid City (Housing)

The Mid City housing regression:

Left: y vs. fitted values (ŷ). Right: fitted values vs. residuals (ŷ vs. e).

[Figure: left panel — yhat vs. y = price; right panel — yhat vs. e = resid]

10

Page 11: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example – Mid City (Housing)

Size vs. e: x = size of house vs. residuals for the Mid City multiple regression.

[Figure: size vs. residuals]

11

Page 12: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example – Mid City (Housing)

I In the Mid City housing example, the residual plots (both X vs. e and Ŷ vs. e) showed no obvious problems...

I This is what we want!!

I Although these plots don't guarantee that all is well, it is a very good sign that the model is doing a good job.

12

Page 13: Other Stu and Examples: Polynomials, Logs, Time Series ...

Non Linearity

Example: Telemarketing

I How does length of employment affect productivity (number

of calls per day)?

13

Page 14: Other Stu and Examples: Polynomials, Logs, Time Series ...

Non Linearity

Example: Telemarketing

I Residual plot highlights the non-linearity!

14

Page 15: Other Stu and Examples: Polynomials, Logs, Time Series ...

Non Linearity

What can we do to fix this?? We can use multiple regression and transform our X to create a nonlinear model...

Let’s try

Y = β0 + β1X + β2X² + ε

The data...

months   months²   calls
10       100       18
10       100       19
11       121       22
14       196       23
15       225       25
...      ...       ...

15
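A hedged sketch of this fit in R (tele is a hypothetical data frame with columns months and calls):

# Quadratic regression: add months^2 as a second regressor
quad <- lm(calls ~ months + I(months^2), data = tele)
summary(quad)

# The marginal effect depends on where you are: compare 10 -> 11 vs. 25 -> 26 months
predict(quad, newdata = data.frame(months = c(10, 11, 25, 26)))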

Page 16: Other Stu and Examples: Polynomials, Logs, Time Series ...

Telemarketing: Adding Polynomials

[Figure: linear model fit — calls vs. months]

16

Page 17: Other Stu and Examples: Polynomials, Logs, Time Series ...

Telemarketing: Adding Polynomials

[Figure: fit with X² added]

17

Page 18: Other Stu and Examples: Polynomials, Logs, Time Series ...

Telemarketing: Adding Polynomials

18

Page 19: Other Stu and Examples: Polynomials, Logs, Time Series ...

Telemarketing

What is the marginal effect of X on Y?

∂E[Y | X] / ∂X = β1 + 2β2X

I To better understand the impact of changes in X on Y you

should evaluate different scenarios.

I Moving from 10 to 11 months of employment raises

productivity by 1.47 calls

I Going from 25 to 26 months only raises the number of calls

by 0.27.

I This is just like variable interactions from Section 3. “The effect of X1 on Y depends on the value of X2.” Here, X1 and X2 are the same variable!

19

Page 20: Other Stu and Examples: Polynomials, Logs, Time Series ...

Polynomial Regression

Even though we are limited to a linear mean, it is possible to get

nonlinear regression by transforming the X variable.

In general, we can add powers of X to get polynomial regression:

Y = β0 + β1X + β2X² + · · · + βm X^m

You can fit any mean function if m is big enough.

Usually, m = 2 does the trick.

20

Page 21: Other Stu and Examples: Polynomials, Logs, Time Series ...

Closing Comments on Polynomials

We can always add higher powers (cubic, etc) if necessary.

Be very careful about predicting outside the data range. The curve

may do unintended things beyond the observed data.

Watch out for over-fitting... remember, simple models are

“better”.

21

Page 22: Other Stu and Examples: Polynomials, Logs, Time Series ...

Be careful when extrapolating...

[Figure: the quadratic fit extrapolated beyond the observed range — calls vs. months]

22

Page 23: Other Stu and Examples: Polynomials, Logs, Time Series ...

...and, be careful when adding more polynomial terms!

[Figure: higher-order polynomial fits — calls vs. months]

23

Page 24: Other Stu and Examples: Polynomials, Logs, Time Series ...

Non-constant Variance

Example...

This violates our assumption that all εi have the same σ².

24

Page 25: Other Stu and Examples: Polynomials, Logs, Time Series ...

Non-constant Variance

Consider the following relationship between Y and X :

Y = γ0 X^β1 (1 + R)

where we think about R as a random percentage error.

I On average we assume R is 0...

I but when it turns out to be 0.1, Y goes up by 10%!

I Often we see this, the errors are multiplicative and the

variation is something like ±10% and not ±10.

I This leads to non-constant variance (or heteroskedasticity)

25

Page 26: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Log-Log Model

We have data on Y and X and we still want to use a linear

regression model to understand their relationship... what if we take

the log (natural log) of Y ?

log(Y) = log[γ0 X^β1 (1 + R)]

log(Y ) = log(γ0) + β1 log(X ) + log(1 + R)

Now, if we call β0 = log(γ0) and ε = log(1 + R) the above leads to

log(Y ) = β0 + β1 log(X ) + ε

a linear regression of log(Y ) on log(X )!

26
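A minimal R sketch of this log-log regression (oj is a hypothetical data frame with columns Sales and Price):

# Log-log (elasticity) regression: both Y and X on the log scale
loglog <- lm(log(Sales) ~ log(Price), data = oj)
summary(loglog)   # the slope on log(Price) is the estimated price elasticity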

Page 27: Other Stu and Examples: Polynomials, Logs, Time Series ...

Price Elasticity

In economics, the slope coefficient β1 in the regression

log(sales) = β0 + β1 log(price) + ε is called price elasticity.

This is the % change in sales per 1% change in price.

The model implies that E[sales] = A · price^β1, where A = exp(β0).

27

Page 28: Other Stu and Examples: Polynomials, Logs, Time Series ...

Price Elasticity of OJ

A chain of gas-station convenience stores was interested in the dependency between price and sales of orange juice...

They decided to run an experiment and change prices randomly at different locations. With the data in hand, let's first run a regression of Sales on Price:

Sales = β0 + β1 Price + ε

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.719
  R Square            0.517
  Adjusted R Square   0.507
  Standard Error      20.112
  Observations        50

ANOVA
              df    SS          MS          F        Significance F
  Regression   1    20803.071   20803.071   51.428   0.000
  Residual    48    19416.449     404.509
  Total       49    40219.520

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept       89.642          8.610       10.411     0.000      72.330     106.955
  Price          -20.935          2.919       -7.171     0.000     -26.804     -15.065

28

Page 29: Other Stu and Examples: Polynomials, Logs, Time Series ...

Price Elasticity of OJ

[Figure: left — fitted model, Sales vs. Price; right — residual plot, residuals vs. Price]

No good!!

29

Page 30: Other Stu and Examples: Polynomials, Logs, Time Series ...

Price Elasticity of OJ

But... would you really think this relationship would be linear? Moving a price from $1 to $2 is the same as changing it from $10 to $11?? We should probably be thinking about the price elasticity of OJ...

log(Sales) = γ0 + γ1 log(Price) + ε

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.869
  R Square            0.755
  Adjusted R Square   0.750
  Standard Error      0.386
  Observations        50

ANOVA
              df    SS       MS       F         Significance F
  Regression   1    22.055   22.055   148.187   0.000
  Residual    48     7.144    0.149
  Total       49    29.199

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
  Intercept       4.812           0.148        32.504     0.000      4.514       5.109
  LogPrice       -1.752           0.144       -12.173     0.000     -2.042      -1.463

How do we interpret γ1 = −1.75? (When prices go up 1%, sales go down by 1.75%.)

30

Page 31: Other Stu and Examples: Polynomials, Logs, Time Series ...

Price Elasticity of OJ

[Figure: left — fitted log-log model, Sales vs. Price; right — residual plot, residuals vs. Price]

Much better!!

31

Page 32: Other Stu and Examples: Polynomials, Logs, Time Series ...

Making Predictions

What if the gas station store wants to predict their sales of OJ if

they decide to price it at $1.8?

The predicted log(Sales) = 4.812 + (−1.752)× log(1.8) = 3.78

So, the predicted Sales = exp(3.78) = 43.82.

How about the plug-in prediction interval?

In the log scale, our prediction interval is

[predicted log(Sales) − 2s; predicted log(Sales) + 2s] = [3.78 − 2(0.38); 3.78 + 2(0.38)] = [3.02; 4.54].

In terms of actual Sales the interval is

[exp(3.02), exp(4.54)] = [20.5; 93.7]

32
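The same calculation in R, as a hedged sketch (it reuses the hypothetical loglog fit from above):

# Plug-in prediction at Price = 1.80 using the fitted log-log model
s       <- summary(loglog)$sigma                              # residual std. error (about 0.38 here)
logpred <- predict(loglog, newdata = data.frame(Price = 1.8)) # about 3.78 on the log scale

exp(logpred)                  # point prediction on the Sales scale (about 44)
exp(logpred + c(-2, 2) * s)   # plug-in interval back on the Sales scale (about [20.5, 93.7])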

Page 33: Other Stu and Examples: Polynomials, Logs, Time Series ...

Making Predictions

[Figure: left — plug-in prediction bands, Sales vs. Price; right — plug-in prediction bands, log(Sales) vs. log(Price)]

I In the log scale (right) we have [Ŷ − 2s; Ŷ + 2s]

I In the original scale (left) we have [exp(Ŷ) · exp(−2s); exp(Ŷ) · exp(2s)]

33

Page 34: Other Stu and Examples: Polynomials, Logs, Time Series ...

Some additional comments...

I Another useful transformation to deal with non-constant

variance is to take only the log(Y ) and keep X the same.

Clearly the “elasticity” interpretation no longer holds.

I Always be careful in interpreting the models after a

transformation

I Also, be careful in using the transformed model to make

predictions

34

Page 35: Other Stu and Examples: Polynomials, Logs, Time Series ...

Summary of Transformations

Coming up with a good regression model is usually an iterative

procedure. Use plots of residuals vs X or Y to determine the next

step.

Log transform is your best friend when dealing with non-constant

variance (log(X ), log(Y ), or both).

Add polynomial terms (e.g. X 2) to get nonlinear regression.

The bottom line: you should combine what the plots and the

regression output are telling you with your common sense and

knowledge about the problem. Keep playing around with it until

you get something that makes sense and has nothing obviously

wrong with it. 35

Page 36: Other Stu and Examples: Polynomials, Logs, Time Series ...

Outliers

Body weight vs. brain weight...

X = body weight of a mammal in kilograms

Y = brain weight of a mammal in grams

[Figure: left — brain vs. body; right — standardized residuals vs. body]

Do additive errors make sense here??

Also, what are the standardized residuals plotted above? 36

Page 37: Other Stu and Examples: Polynomials, Logs, Time Series ...

Standardized Residuals

In our model ε ∼ N(0, σ²).

The residuals e are a proxy for ε, and the standard error s is an estimate of σ.

Call z = e/s the standardized residuals... We should expect z ≈ N(0, 1).

(How often should we see an observation with |z| > 3?)

37
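A small R sketch of this (mammals is assumed to be a data frame with columns body and brain, e.g. the classic dataset in the MASS package):

# Standardized residuals: z = e / s, where s estimates sigma
fit <- lm(brain ~ body, data = mammals)
z   <- resid(fit) / summary(fit)$sigma

plot(mammals$body, z, xlab = "body", ylab = "std residuals")
abline(h = c(-3, 0, 3), lty = 2)
which(abs(z) > 3)   # under N(0,1), |z| > 3 should happen for only ~0.3% of points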

Page 38: Other Stu and Examples: Polynomials, Logs, Time Series ...

Outliers

Let’s try logs...

[Figure: left — log(brain) vs. log(body); right — standardized residuals vs. log(body)]

Great, a lot better!! But we see a large and positive potential outlier... the Chinchilla!

38

Page 39: Other Stu and Examples: Polynomials, Logs, Time Series ...

Outliers

It turns out that the data had the brain of a Chinchilla weighing

64 grams!! In reality, it is 6.4 grams... after correcting it:

[Figure: after the correction — left: log(brain) vs. log(body); right: standardized residuals vs. log(body)]

39

Page 40: Other Stu and Examples: Polynomials, Logs, Time Series ...

How to Deal with Outliers

When should you delete outliers?

Only when you have a really good reason!

There is nothing wrong with running regression with and without

potential outliers to see whether results are significantly impacted.

Any time outliers are dropped the reasons for

removing observations should be clearly noted.

40

Page 41: Other Stu and Examples: Polynomials, Logs, Time Series ...

Time Series Data and Dependence

Time-series data are simply a collection of observations gathered

over time. For example, suppose y1 . . . yT are

I Annual GDP.

I Quarterly production levels

I Weekly sales.

I Daily temperature.

I Five-minute stock returns.

In each case, we might expect what happens at time t to be

correlated with what happens at time t − 1.

41

Page 42: Other Stu and Examples: Polynomials, Logs, Time Series ...

Time Series Data and Dependence

Suppose we measure temperatures daily for several years.

Which would work better as an estimate for today’s temp:

I The average of the temperatures from the previous year?

I The temperature on the previous day?

42

Page 43: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Length of a bolt...

Suppose you have to check the performance of a machine making

bolts... in order to do so you want to predict the length of the next

bolt produced...

[Figure: bolt length vs. bolt index (in time)]

What is your best guess for the next part?

43

Page 44: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Beer Production

Now, say you want to predict the monthly U.S. beer production (in

millions of barrels).

[Figure: monthly U.S. beer production — beer vs. month]

What about now, what is your best guess for the production in the next

month? 44

Page 45: Other Stu and Examples: Polynomials, Logs, Time Series ...

Examples: Temperatures

Now you need to predict tomorrow's temperature at O'Hare from daily data (Jan–Feb).

[Figure: daily temperature at O'Hare — temp vs. day]

Is this one harder? Our goal in this section is to use regression models to

help answer these questions... 45

Page 46: Other Stu and Examples: Polynomials, Logs, Time Series ...

Fitting a Trend

Here’s a time series plot of monthly sales of a company...

[Figure: monthly sales vs. time]

What would be a reasonable prediction for Sales 5 months from now?

46

Page 47: Other Stu and Examples: Polynomials, Logs, Time Series ...

Fitting a Trend

The sales numbers are “trending” upwards... What model could

capture this trend?

St = β0 + β1t + εt,   εt ∼ N(0, σ²)

This is a regression of Sales (y variable) on “time” (x variable).

This allows for shifts in the mean of Sales as a function of time.

47
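A minimal R sketch of the trend regression (sales is a hypothetical vector of the monthly sales numbers):

# Trend regression: sales on a time index t = 1, ..., T
t     <- seq_along(sales)
trend <- lm(sales ~ t)
summary(trend)

# Plug-in prediction 5 months ahead, with a rough +/- 2s band
s <- summary(trend)$sigma
predict(trend, newdata = data.frame(t = length(sales) + 5)) + c(-2, 0, 2) * s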

Page 48: Other Stu and Examples: Polynomials, Logs, Time Series ...

Fitting a Trend

The data for this regression looks like:

months (t)   Sales
  1          69.95
  2          59.64
  3          61.96
  4          61.55
  5          45.10
  6          77.31
  7          49.33
  8          65.49
 ...          ...
100         140.27

48

Page 49: Other Stu and Examples: Polynomials, Logs, Time Series ...

Fitting a Trend

St = β0 + β1t + εt,   εt ∼ N(0, σ²)

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.892
  R Square            0.796
  Adjusted R Square   0.794
  Standard Error      14.737
  Observations        100

ANOVA
              df    SS           MS          F         Significance F
  Regression   1    82951.076    82951.076   381.944   0.000
  Residual    98    21283.736      217.181
  Total       99   104234.812

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept      51.442           2.970       17.323     0.000      45.549      57.335
  t               0.998           0.051       19.543     0.000       0.896       1.099

St = 51.44 + 0.998t ± 2 ∗ 14.73

49

Page 50: Other Stu and Examples: Polynomials, Logs, Time Series ...

Fitting a Trend

Plug-in prediction...

[Figure: sales vs. time with the fitted trend line and plug-in prediction bands]

50

Page 51: Other Stu and Examples: Polynomials, Logs, Time Series ...

Residuals

How should our residuals look? If our model is correct, the trend should have captured the time series structure in sales, and what is left should not be associated with time... i.e., it should be iid normal.

[Figure: standardized residuals vs. time]

Great! 51

Page 52: Other Stu and Examples: Polynomials, Logs, Time Series ...

Time Series Regression... Hotel Occupancy Case

In a recent legal case, a Chicago downtown hotel claimed that it

had suffered a loss of business due to what was considered an

illegal action by a group of hotels that decided to leave the plaintiff

out of a hotel directory.

In order to estimate the lost business, the hotel had to predict what its level of business (in terms of occupancy rate) would have been in the absence of the alleged illegal action.

In order to do this, experts testifying on behalf of the hotel used data collected before the period in question and fit a relationship between the hotel's occupancy rate and the overall occupancy rate in the city of Chicago. This relationship would then be used to predict the occupancy rate during the period in question.

52

Page 53: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

Hotelt = β0 + β1Chicago + εt

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.7111011
  R Square            0.5056648
  Adjusted R Square   0.48801
  Standard Error      7.5055176
  Observations        30

ANOVA
              df    SS            MS          F             Significance F
  Regression   1    1613.468442   1613.4684   28.64172598   1.06082E-05
  Residual    28    1577.318225     56.332794
  Total       29    3190.786667

              Coefficients   Standard Error   t Stat      P-value       Lower 95%       Upper 95%
  Intercept    16.135666      8.518889357     1.8941044   0.068584205   -1.314487337    33.5858198
  ChicagoInd    0.7161318     0.133811486     5.3517965   1.06082E-05    0.442031445     0.990232246

I In the month after the omission from the directory the

Chicago occupancy rate was 66. The plaintiff claims that its

occupancy rate should have been 16 + 0.71*66 = 62.

I It was actually 55!! The difference added up to a big loss!!

53

Page 54: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

A statistician was hired by the directory to assess the regression

methodology used to justify the claim. As we should know by now,

the first thing he looked at was the residual plot...

[Figure: standardized residuals vs. Chicago occupancy]

Looks fine. However... 54

Page 55: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

... this is a time series regression, as we are regressing one time

series on another.

In this case, we should also check whether or not the residuals

show some temporal pattern.

If our model is correct the residuals should look iid normal over

time.

55

Page 56: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

[Figure: standardized residuals vs. time, with a red trend line through them]

Does this look iid to you? Can you guess what the red line represents?

56

Page 57: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

It looks like part of hotel occupancy (y) not explained by the

Chicago downtown occupancy (x) is moving down over time. We

can try to control for that by adding a trend factor to our model...

Hotelt = β0 + β1Chicago + β2t + εt


SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.869389917
  R Square            0.755838827
  Adjusted R Square   0.737752815
  Standard Error      5.37162026
  Observations        30

ANOVA
              df    SS            MS        F             Significance F
  Regression   2    2411.720453   1205.86   41.79134652   5.41544E-09
  Residual    27     779.0662139    28.8543
  Total       29    3190.786667

              Coefficients    Standard Error   t Stat      P-value       Lower 95%      Upper 95%
  Intercept    26.69391108     6.418837165      4.158683   0.000290493   13.52354525    39.8642769
  ChicagoInd    0.69523791     0.095849831      7.253408   8.41391E-08    0.498570304    0.89190552
  t            -0.596476666    0.113404099     -5.259745   1.51653E-05   -0.82916265    -0.3637907

57

Page 58: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

[Figure: standardized residuals vs. time for the model with the trend term]

Much better!! What is the slope of the red line?

58

Page 59: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

Okay, what happened?!

Well, once we account for the downward trend in the occupancy of

the plaintiff, the prediction for the occupancy rate is

26 + 0.69 ∗ 66− 0.59 ∗ 31 = 53.25

What do we conclude?

59

Page 60: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

Take away lessons...

I When regressing a time series on another, always check the

residuals as a time series

I What does that mean... plot the residuals over time. If all is

well, you should see no patterns, i.e., they should behave like

iid normal samples.

60

Page 61: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Hotel Occupancy Case

Question

I What if we were interested in predicting the hotel occupancy

ten years from now?? We should compute

26 + 0.69 ∗ 66− 0.59 ∗ 150 = −16.96

I Would you trust this prediction? Could you defend it in court?

I Remember: always be careful with extrapolating relationships!

61

Page 62: Other Stu and Examples: Polynomials, Logs, Time Series ...

Examples: Temperatures

Now you need to predict tomorrow's temperature at O'Hare from daily data (Jan–Feb).

[Figure: daily temperature at O'Hare — temp vs. day]

Does this look iid? If it is iid, tomorrow’s temperatures should not

depend on today’s... does that make sense? 62

Page 63: Other Stu and Examples: Polynomials, Logs, Time Series ...

Checking for Dependence

To see if Yt−1 would be useful for predicting Yt , just plot them

together and see if there is a relationship.

[Figure: daily temp at O'Hare — temp(t) vs. temp(t−1); Corr = 0.72]

Correlation between Yt and Yt−1 is called autocorrelation.

63

Page 64: Other Stu and Examples: Polynomials, Logs, Time Series ...

Checking for Dependence

You need to create a “lagged” variable tempt−1... the data looks like this:

t    temp(t)   temp(t-1)
1    42        35
2    41        42
3    50        41
4    19        50
5    19        19
6    20        19
...  ...       ...

64
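A minimal R sketch of building the lag and checking the autocorrelation (temp is a hypothetical vector of daily temperatures):

# Build the lag-1 variable by shifting the series one position
n        <- length(temp)
temp_t   <- temp[2:n]
temp_lag <- temp[1:(n - 1)]

plot(temp_lag, temp_t, xlab = "temp(t-1)", ylab = "temp(t)")
cor(temp_t, temp_lag)   # lag-1 autocorrelation (about 0.72 in the slides)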

Page 65: Other Stu and Examples: Polynomials, Logs, Time Series ...

Checking for Dependence

We can plot Yt against Yt−L to see L-period lagged relationships.

[Figure: temp(t) vs. temp(t−2), Lag 2 Corr = 0.46; temp(t) vs. temp(t−3), Lag 3 Corr = 0.21]

I It appears that the correlation is getting weaker with increasing L.

I How can we test for this dependence?

65

Page 66: Other Stu and Examples: Polynomials, Logs, Time Series ...

Checking for Dependence

Back to the “length of a bolt” example. When things are not

related in time we should see...

[Figure: length(t+1) vs. length(t), corr = 0.004; length(t+2) vs. length(t), corr = −0.018]

66

Page 67: Other Stu and Examples: Polynomials, Logs, Time Series ...

The AR(1) Model

A simple way to model dependence over time is with the autoregressive model of order 1...

Yt = β0 + β1Yt−1 + εt

I What is the mean of Yt for a given value of Yt−1?

I If the model successfully captures the dependence structure in

the data then the residuals should look iid.

I Remember: if our data is collected in time, we should always

check for dependence in the residuals...

67
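A hedged R sketch of fitting the AR(1) by regression (y is a hypothetical time-series vector):

# AR(1) by regression: Y(t) on Y(t-1)
n   <- length(y)
ar1 <- lm(y[2:n] ~ y[1:(n - 1)])
summary(ar1)

# The residuals should now look iid: check their lag-1 correlation
e <- resid(ar1)
cor(e[-1], e[-length(e)])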

Page 68: Other Stu and Examples: Polynomials, Logs, Time Series ...

The AR(1) Model

Again, the regression tool is our friend here... (Why?)

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.722742583
  R Square            0.522356842
  Adjusted R Square   0.5138275
  Standard Error      8.789861051
  Observations        58

ANOVA
              df    SS            MS            F             Significance F
  Regression   1    4731.684433   4731.684433   61.24233673   1.49699E-10
  Residual    56    4326.652809     77.2616573
  Total       57    9058.337241

                Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
  Intercept      6.705800085    2.516614758     2.664611285   0.010050177   1.664414964   11.74718521
  X Variable 1   0.723288866    0.092424243     7.825748317   1.49699E-10   0.53814086     0.908436873

68

Page 69: Other Stu and Examples: Polynomials, Logs, Time Series ...

The AR(1) Model

[Figure: e(t) vs. e(t−1); corr(e(t), e(t−1)) = 0.066]

No dependence left! 69

Page 70: Other Stu and Examples: Polynomials, Logs, Time Series ...

The AR(1) Model

[Figure: standardized residuals vs. time]

Again, looks good... 70

Page 71: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Seasonal Model

I Many time-series data exhibit some sort of seasonality

I The simplest solution is to add a set of dummy variables to

deal with the “seasonal effects”

[Figure: monthly U.S. beer production — beer vs. month]

Yt = monthly U.S. beer production (in millions of barrels).

71
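A minimal R sketch of the seasonal-dummy regression (beer is a hypothetical vector of monthly production, assumed to start in January):

# A factor for month gives the 11 dummies plus an intercept automatically
month <- factor(rep(1:12, length.out = length(beer)))
seas  <- lm(beer ~ month)
summary(seas)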

Page 72: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Seasonal Model


SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.959010553
  R Square            0.919701241
  Adjusted R Square   0.904979802
  Standard Error      0.588667988
  Observations        72

ANOVA
              df    SS           MS        F             Significance F
  Regression  11    238.138728   21.649    62.47359609   1.20595E-28
  Residual    60     20.7918      0.34653
  Total       71    258.930528

                 Coefficients   Standard Error   t Stat    P-value       Lower 95%      Upper 95%
  Intercept      13.24166667    0.2403227        55.0995   4.32368E-53   12.7609497     13.72238
  X Variable 1    1.911666667   0.33986762        5.62474  5.15088E-07    1.23183021     2.591503
  X Variable 2    1.693333333   0.33986762        4.98233  5.64079E-06    1.013496877    2.37317
  X Variable 3    3.936666667   0.33986762       11.5829   6.13313E-17    3.25683021     4.616503
  X Variable 4    3.983333333   0.33986762       11.7202   3.74305E-17    3.303496877    4.66317
  X Variable 5    5.083333333   0.33986762       14.9568   6.59589E-22    4.403496877    5.76317
  X Variable 6    5.19          0.33986762       15.2707   2.44866E-22    4.510163543    5.869836
  X Variable 7    4.978333333   0.33986762       14.6479   1.77048E-21    4.298496877    5.65817
  X Variable 8    4.581666667   0.33986762       13.4807   8.22861E-20    3.90183021     5.261503
  X Variable 9    2.016666667   0.33986762        5.93368  1.58522E-07    1.33683021     2.696503
  X Variable 10   1.923333333   0.33986762        5.65907  4.52211E-07    1.243496877    2.60317
  X Variable 11   0.118333333   0.33986762        0.34817  0.728927584  -0.561503123     0.79817

Let’s look at the Excel file... 72

Page 73: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Seasonal Model

[Figure: beer production vs. time with fitted values from the seasonal model]

What would our future predictions look like? 73

Page 74: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Seasonal Model

[Figure: e(t) vs. e(t−1); corr(e(t), e(t−1)) = 0.113]

Okay... good enough. 74

Page 75: Other Stu and Examples: Polynomials, Logs, Time Series ...

The Seasonal Model

[Figure: standardized residuals vs. time]

Still, no obvious problems... 75

Page 76: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Monthly passengers in the U.S. airline industry (in 1,000s of

passengers) from 1949 to 1960... we need to predict the number of

passengers in the next couple of months.

[Figure: monthly airline passengers vs. time]

Any ideas?

76

Page 77: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

How about a “trend model”? Yt = β0 + β1t + εt

[Figure: passengers vs. time with fitted values from the trend model]

What do you think?

77

Page 78: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Let’s look at the residuals...

[Figure: standardized residuals vs. time for the trend model]

Is there any obvious pattern here? YES!!

78

Page 79: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

The variance of the residuals seems to be growing in time... Let’s

try taking the log. log(Yt) = β0 + β1t + εt

[Figure: log(Passengers) vs. time with fitted values from the trend model]

Any better?

79

Page 80: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Residuals...

[Figure: standardized residuals vs. time for the log trend model]

Still we can see some obvious temporal/seasonal pattern....

80

Page 81: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Okay, let’s add dummy variables for months (only 11 dummies)...

log(Yt) = β0 + β1t + β2Jan + · · · + β12Dec + εt

[Figure: log(Passengers) vs. time with fitted values from the trend + monthly dummies model]

Much better!!

81

Page 82: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Residuals...

[Figure: standardized residuals vs. time]

I am still not happy... it doesn't look normal iid to me...

82

Page 83: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Residuals...

[Figure: e(t) vs. e(t−1); corr(e(t), e(t−1)) = 0.786]

I was right! The residuals are dependent on time...

83

Page 84: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

We have one more tool... let’s add one legged term.

log(Yt) = β0 + β1t + β2Jan + ...β12Dec + β13 log(Yt−1) + εt

[Figure: log(Passengers) vs. time with fitted values from the model with trend, monthly dummies, and the lagged term]

Okay, good...

84
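A hedged R sketch of this full model (passengers is a hypothetical vector of monthly counts, assumed to start in January):

# Trend + monthly dummies + one lag of log(passengers)
n     <- length(passengers)
ly    <- log(passengers)
month <- factor(rep(1:12, length.out = n))

air <- lm(ly[2:n] ~ seq(2, n) + month[2:n] + ly[1:(n - 1)])
summary(air)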

Page 85: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Residuals...

[Figure: standardized residuals vs. time]

Much better!! 85

Page 86: Other Stu and Examples: Polynomials, Logs, Time Series ...

Airline Data

Residuals...

[Figure: e(t) vs. e(t−1); corr(e(t), e(t−1)) = −0.11]

Much better indeed!! 86

Page 87: Other Stu and Examples: Polynomials, Logs, Time Series ...

Summary of Time Series

Whenever working with time series data we need to look for

dependencies over time.

We can deal with lots of types of dependencies by using regression

models... our tools are:

I trends

I lags

I seasonal dummies

87

Page 88: Other Stu and Examples: Polynomials, Logs, Time Series ...

Model Building Process

When building a regression model remember that simplicity is your

friend... smaller models are easier to interpret and have fewer

unknown parameters to be estimated.

Keep in mind that every additional parameter represents a cost!!

The first step of every model building exercise is the selection of the universe of variables to be potentially used. This task is entirely solved through your experience and context-specific knowledge...

I Think carefully about the problem

I Consult subject matter research and experts

I Avoid the mistake of selecting too many variables

88

Page 89: Other Stu and Examples: Polynomials, Logs, Time Series ...

Model Building Process

With a universe of variables in hand, the goal now is to select the model. Why not just include all of the variables?

Big models tend to over-fit and find features that are specific to the data in hand... i.e., not generalizable relationships.

The results are bad predictions and bad science!

In addition, bigger models have more parameters and potentially

more uncertainty about everything we are trying to learn... (check

the beer and weight example!)

We need a strategy to build a model in a way that accounts for the trade-off between fitting the data and the uncertainty associated with the model.

89

Page 90: Other Stu and Examples: Polynomials, Logs, Time Series ...

Out-of-Sample Prediction

One idea is to focus on the model’s ability to predict... How do we

evaluate a forecasting model? Make predictions!

Basic Idea: We want to use the model to forecast outcomes for

observations we have not seen before.

I Use the data to create a prediction problem.

I See how our candidate models perform.

We’ll use most of the data for training the model,

and the left over part for validating the model.

90

Page 91: Other Stu and Examples: Polynomials, Logs, Time Series ...

Out-of-Sample Prediction

In a cross-validation scheme, you fit a bunch of models to most of

the data (training sample) and choose the model

that performed the best on the rest (left-out sample).

I Fit the model on the training data

I Use the model to predict Yj values for all of the NLO left-out

data points

I Calculate the Mean Square Error for these predictions

MSE = (1/NLO) Σj=1..NLO (Yj − Ŷj)²

91
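A minimal R sketch of this scheme (dat is a hypothetical data frame with response y and predictors x1, x2):

# Simple train / left-out split and out-of-sample MSE
n     <- nrow(dat)
train <- sample(1:n, size = round(0.8 * n))   # ~80% for training

fit  <- lm(y ~ x1 + x2, data = dat[train, ])
pred <- predict(fit, newdata = dat[-train, ])

mean((dat$y[-train] - pred)^2)   # MSE on the left-out points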

Page 92: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

To illustrate the potential problems of “over-fitting” the data, let’s

look again at the Telemarketing example... let’s look at multiple

polynomial terms...

[Figure: calls vs. months]

92

Page 93: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

Let’s evaluate the fit of each model by their R2

(on the training data)

[Figure: in-sample R² vs. polynomial order]

93

Page 94: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example

How about the MSE?? (on the left-out data)

[Figure: out-of-sample RMSE vs. polynomial order]

94

Page 95: Other Stu and Examples: Polynomials, Logs, Time Series ...

BIC for Model Selection

Another way to evaluate a model is to use Information Criteria

metrics which attempt to quantify how well our model would have

predicted the data (regardless of what you’ve estimated for the

βj ’s).

A good alternative is the BIC: Bayes Information Criterion, which

is based on a “Bayesian” philosophy of statistics.

BIC = n log(s²) + p log(n)

You want to choose the model that leads to minimum BIC.

95

Page 96: Other Stu and Examples: Polynomials, Logs, Time Series ...

BIC for Model Selection

One (very!) nice thing about the BIC is that you can

interpret it in terms of model probabilities.

Given a list of possible models {M1,M2, . . . ,MR}, the probability

that model i is correct is

P(Mi) ≈ exp(−½ BIC(Mi)) / Σr=1..R exp(−½ BIC(Mr))

      = exp(−½ [BIC(Mi) − BICmin]) / Σr=1..R exp(−½ [BIC(Mr) − BICmin])

(Subtract BICmin = min{BIC(M1) . . .BIC(MR)} for numerical stability.)

96
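A small R sketch of this calculation, using the three BIC values reported for the NBA example later in these notes:

# Turn a set of BIC values into approximate model probabilities
bic <- c(534.287, 540.4333, 545.6371)
d   <- bic - min(bic)                          # subtract BICmin for numerical stability
round(exp(-0.5 * d) / sum(exp(-0.5 * d)), 3)   # roughly 0.95, 0.04, 0.00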

Page 97: Other Stu and Examples: Polynomials, Logs, Time Series ...

BIC for Model Selection

Thus BIC is an alternative to testing for comparing models.

I It is easy to calculate.

I You are able to evaluate model probabilities.

I There are no “multiple testing” type worries.

I It generally leads to simpler models than F-tests.

As with testing, you need to narrow down your options before

comparing models. What if there are too many possibilities?

97

Page 98: Other Stu and Examples: Polynomials, Logs, Time Series ...

Stepwise Regression

One computational approach to build a regression model

step-by-step is “stepwise regression”. There are 3 options:

I Forward: adds one variable at a time until no remaining variable makes a significant contribution (or meets a certain criterion... which could be out-of-sample prediction)

I Backward: starts with all possible variables and removes one at a time until further deletions would do more harm than good

I Stepwise: just like the forward procedure, but allows for deletions at each step

98

Page 99: Other Stu and Examples: Polynomials, Logs, Time Series ...

LASSO

The LASSO is a shrinkage method that performs automatic

selection. Yet another alternative... has similar properties as

stepwise regression but it is more automatic... R does it for you!

The LASSO solves the following problem:

argminβ { Σi=1..N (Yi − Xi′β)² + λ|β| }

I Coefficients can be set exactly to zero (automatic model

selection)

I Very efficient computational method

I λ is often chosen via CV

99
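A hedged sketch of running the LASSO in R with the glmnet package (dat is a hypothetical data frame with response y; glmnet wants a numeric matrix of predictors):

# LASSO with glmnet; lambda picked by cross-validation
library(glmnet)

x     <- model.matrix(y ~ . - 1, data = dat)
cvfit <- cv.glmnet(x, dat$y, alpha = 1)     # alpha = 1 is the lasso penalty

coef(cvfit, s = "lambda.min")               # some coefficients come back exactly zero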

Page 100: Other Stu and Examples: Polynomials, Logs, Time Series ...

One informal but very useful idea to put it all together...

I like to build models from the bottom, up...

I Set aside a set of points to be your validation set (if the dataset is large enough)

I Working on the training data, add one variable at a time, deciding which one to add based on some criteria:

1. larger increases in R² while significant

2. larger reduction in MSE while significant

3. BIC, etc...

I At every step, carefully analyze the output and check the residuals!

I Stop when no additional variable produces a “significant” improvement

I Always make sure you understand what the model is doing in the specific context of your problem

100

Page 101: Other Stu and Examples: Polynomials, Logs, Time Series ...

Binary Response Data

Let’s now look at data where the response Y

is a binary variable (taking the value 0 or 1).

I Win or lose.

I Sick or healthy.

I Buy or not buy.

I Pay or default.

I Thumbs up or down.

The goal is generally to predict the probability that Y = 1, and

you can then do classification based on this estimate.

101

Page 102: Other Stu and Examples: Polynomials, Logs, Time Series ...

Binary Response Data

Y is an indicator: Y = 0 or 1. The conditional mean is thus

E [Y |X ] = p(Y = 1|X )× 1 + p(Y = 0|X )× 0 = p(Y = 1|X )

The mean function is a probability: We need a model that gives

mean/probability values between 0 and 1.

We’ll use a transform function that takes the right-hand side of the

model (x′β) and gives back a value between zero and one.

102

Page 103: Other Stu and Examples: Polynomials, Logs, Time Series ...

Binary Response Data

The binary choice model is

p(Y = 1|X1 . . .Xd) = S(β0 + β1X1 . . .+ βdXd)

where S is a function that increases in value from zero to one.

103

Page 104: Other Stu and Examples: Polynomials, Logs, Time Series ...

Binary Response Data

There are two main functions that are used for this:

I Logistic Regression: S(z) = e^z / (1 + e^z).

I Probit Regression: S(z) = pnorm(z), the standard normal CDF.

Both functions are S-shaped and take values in (0, 1).

Probit is used by economists, logit by biologists, and the rest of us

are fairly indifferent: they result in practically the same fit.

104

Page 105: Other Stu and Examples: Polynomials, Logs, Time Series ...

Logistic Regression

We’ll use logistic regression, such that

p(Y = 1 | X1 . . . Xd) = exp[β0 + β1X1 + . . . + βdXd] / (1 + exp[β0 + β1X1 + . . . + βdXd])

The “logit” link is more common, and it’s the default in R.

These models are easy to fit in R:

glm(Y ∼ X1 + X2, family=binomial)

“g” stands for generalized, and binomial indicates Y = 0 or 1.

Otherwise, generalized linear models use the same syntax as lm().

105

Page 106: Other Stu and Examples: Polynomials, Logs, Time Series ...

Logistic Regression

What is happening here? Instead of least-squares,

glm is maximizing the product of probabilities:

∏i=1..n P(Yi | xi) = ∏i=1..n ( exp[x′b] / (1 + exp[x′b]) )^Yi · ( 1 / (1 + exp[x′b]) )^(1−Yi)

This maximizes the likelihood of our data

(which is also what least-squares did).

106

Page 107: Other Stu and Examples: Polynomials, Logs, Time Series ...

Logistic Regression

The important things are basically the same as before:

I Individual parameter p-values are interpreted as always.

I extractAIC(reg,k=log(n)) will get your BICs.

I The predict function works as before, but you need to add type="response" to get p̂i = exp[x′b]/(1 + exp[x′b]) (otherwise it just returns the linear function x′b).

Unfortunately, techniques for residual diagnostics and model

checking are different (but we’ll not worry about that today).

Also, without sums of squares there are no R2, anova, or F -tests!

107
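A minimal R sketch of this workflow (dat is a hypothetical data frame with a 0/1 response Y and predictors X1, X2):

# Typical logistic-regression workflow
n   <- nrow(dat)
fit <- glm(Y ~ X1 + X2, family = binomial, data = dat)

summary(fit)                             # z statistics instead of t
extractAIC(fit, k = log(n))              # second element is the BIC
p   <- predict(fit, type = "response")   # fitted probabilities exp(x'b)/(1 + exp(x'b))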

Page 108: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

NBA basketball point spreads: we have Las Vegas betting point

spreads for 553 NBA games and the resulting scores.

We can use logistic regression of scores onto spread to predict the

probability of the favored team winning.

I Response: favwin=1 if favored team wins.

I Covariate: spread is the Vegas point spread.

[Figure: left — histogram of spread, split by favwin = 1 vs. favwin = 0; right — favwin (0/1) vs. spread]

108

Page 109: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

This is a weird situation where we assume there is no intercept.

I There is considerable evidence that betting odds are efficient.

I A spread of zero implies p(win) = 0.5 for each team.

I Thus p(win) = exp[β0]/(1 + exp[β0]) = 1/2 ⇔ β0 = 0.

The model we want to fit is thus

p(favwin | spread) = exp[β × spread] / (1 + exp[β × spread])

109

Page 110: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

summary(nbareg <- glm(favwin ∼ spread-1, family=binomial))

Some things are different (z not t) and some are missing (F , R2).

110

Page 111: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

The fitted model is

p(favwin | spread) = exp[0.156 × spread] / (1 + exp[0.156 × spread])

[Figure: fitted P(favwin) vs. spread]

111

Page 112: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

We could consider other models... and compare with BIC!

Our “Efficient Vegas” model:

> extractAIC(nbareg, k=log(553))

1.000 534.287

A model that includes a non-zero intercept:

> extractAIC(glm(favwin ∼ spread, family=binomial), k=log(553))
2.0000 540.4333

What if we throw in home-court advantage?

> extractAIC(glm(favwin ∼ spread+favhome, family=binomial), k=log(553))
3.0000 545.6371

The simplest model is best. (The model probabilities are 19/20, 1/20, and zero.)

112

Page 113: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Basketball Spreads

Let’s use our model to predict the result of a game:

I Portland vs Golden State: spread is PRT by 8

p(PRT win) = exp[0.156 × 8] / (1 + exp[0.156 × 8]) = 0.78

I Chicago vs Orlando: spread is ORL by 4

p(CHI win) = 1 / (1 + exp[0.156 × 4]) = 0.35

113
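The same two numbers in R, as a hedged sketch using the fitted coefficient (plogis() is base R's logistic CDF):

b <- 0.156
plogis(b * 8)       # Portland favored by 8: P(favorite wins) ~ 0.78
1 - plogis(b * 4)   # Chicago is the 4-point underdog: P(CHI wins) ~ 0.35

# Equivalently, with the fitted glm object from the earlier slide:
# predict(nbareg, newdata = data.frame(spread = c(8, 4)), type = "response")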

Page 114: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Credit Scoring

A common business application of logistic regression is in

evaluating the credit quality of (potential) debtors.

I Take a list of borrower characteristics.

I Build a prediction rule for their credit.

I Use this rule to automatically evaluate applicants

(and track your risk profile).

You can do all this with logistic regression, and then use the

predicted probabilities to build a classification rule.

114

Page 115: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Credit Scoring

We have data on 1000 loan applicants at German community

banks, and judgement of the loan outcomes (good or bad).

The data has 20 borrower characteristics, including

I Credit history (5 categories).

I Housing (rent, own, or free).

I The loan purpose and duration.

I Installment rate as a percent of income.

115

Page 116: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Credit Scoring

We can use forward stepwise regression to build a model.

null <- glm(Y ∼ history3, family=binomial, data=credit[train,])

full <- glm(Y ∼., family=binomial, data=credit[train,])

reg <- step(null, scope=formula(full), direction="forward", k=log(n))

.

.

.

Step: AIC=882.94

Y[train] ∼ history3 + checkingstatus1 + duration2 + installment8

The null model has credit history as a variable, since I’d include

this regardless, and we’ve left-out 200 points for validation.

116

Page 117: Other Stu and Examples: Polynomials, Logs, Time Series ...

Classification

A common goal with logistic regression is to classify the inputs

depending on their predicted response probabilities.

For example, we might want to classify the German borrowers as

having “good” or “bad” credit (i.e., do we loan to them?).

A simple classification rule is to say that anyone with

p(good |x) > 0.5 can get a loan, and the rest do not.

117

Page 118: Other Stu and Examples: Polynomials, Logs, Time Series ...

Example: Credit Scoring

Let’s use the validation set to compare this and the full model.

> full <- glm(formula(terms(Y[train] ∼., data=covars)),

data=covars[train,], family=binomial)

> predreg <- predict(reg, newdata=covars[-train,], type="response")

> predfull <- predict(full, newdata=covars[-train,], type="response")

> # 1 = false negative, -1 = false positive

> errorreg <- Y[-train]-(predreg >= .5)

> errorfull <- Y[-train]-(predfull >= .5)

> # misclassification rates:

> mean(abs(errorreg))

0.220

> mean(abs(errorfull))

0.265

Our model classifies borrowers correctly 78% of the time.

118