
Linear Regression (part 2)

Ryan Miller


Multiple Regression

Generally speaking, linear regression models a quantitative outcome using a linear combination of explanatory variables:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$

- This format allows us to express the null/alternative models of ANOVA as regression models
- It also allows us to model the outcome as a function of multiple different explanatory variables (see the sketch below)
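We will fit these models in Minitab, but as a rough illustration of the same idea, here is a minimal Python sketch using statsmodels' formula interface; the data frame and the column names y, x1, x2, x3 are hypothetical placeholders.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: an outcome y and three explanatory variables
df = pd.DataFrame({
    "y":  [12.0, 15.1, 9.8, 20.3, 17.6, 11.2, 14.4, 18.9],
    "x1": [1.0, 2.0, 0.5, 3.5, 2.8, 1.1, 1.9, 3.0],
    "x2": [4.2, 3.9, 5.0, 2.1, 2.7, 4.8, 3.3, 2.5],
    "x3": [0.3, 0.8, 0.1, 1.2, 0.9, 0.4, 0.7, 1.0],
})

# Least squares estimates of beta_0, beta_1, beta_2, beta_3
model = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(model.params)
```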


Example - Ozone Concentration

- Ozone is a pollutant that has been linked with respiratory ailments and heart attacks
- Ozone concentrations fluctuate on a day-to-day basis depending on multiple factors
- It is useful to be able to predict concentrations to protect vulnerable individuals (ozone alert days)
- The data we will use in this example consist of daily ozone concentration (ppb) measurements collected in New York City, along with some potential explanatory variables:
  - Solar: the amount of solar radiation (in Langleys)
  - Wind: the average wind speed that day (in mph)
  - Temp: the high temperature for that day (in Fahrenheit)


Ozone Concentration in New York City

- A first step in modeling with multiple variables is to inspect the scatterplot matrix (a sketch for producing one appears below)
- What do you see?

[Scatterplot matrix of Ozone, Solar, Wind, and Temp]
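A minimal sketch of building such a matrix in Python, assuming the classic airquality data (the same New York ozone measurements) fetched through statsmodels; a local CSV with these columns would work just as well.

```python
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.datasets import get_rdataset

# The classic New York air quality data (download requires internet access)
air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

# Scatterplot matrix of the outcome and the candidate predictors
pd.plotting.scatter_matrix(air[["Ozone", "Solar", "Wind", "Temp"]], figsize=(8, 8))
plt.show()
```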


Ozone Concentration in New York City

- Wind and Temp each appear to have a strong linear relationship with Ozone
- Solar shows a more diffuse, possibly quadratic relationship with Ozone
- Potentially problematic is that many of these explanatory variables are related to each other
  - Wind and Temp have a strong negative correlation


Modeling Ozone Concentration

- We should also look at the correlation matrix to further understand these relationships

          Ozone       Solar        Wind        Temp
Ozone  1.0000000   0.3483417  -0.6124966   0.6985414
Solar  0.3483417   1.0000000  -0.1271835   0.2940876
Wind  -0.6124966  -0.1271835   1.0000000  -0.4971897
Temp   0.6985414   0.2940876  -0.4971897   1.0000000

- Temp is most highly correlated with Ozone, so let's start with the simple linear regression model:

$$\text{Ozone}_i = \beta_0 + \beta_1 \text{Temp}_i + \varepsilon_i$$
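A sketch of computing this matrix with pandas, under the same airquality assumption as before; the values should be close to the matrix above, though they may differ slightly depending on how missing rows are handled.

```python
import pandas as pd
from statsmodels.datasets import get_rdataset

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

# Pairwise correlations among the outcome and the candidate predictors
print(air[["Ozone", "Solar", "Wind", "Temp"]].corr())
```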


Modeling Ozone Concentration

- The estimated model is $\widehat{\text{Ozone}}_i = -147 + 2.4\,\text{Temp}_i$
- The $R^2$ of this model is 0.49, so it explains almost half of the variability in Ozone

[Scatterplot of Ozone vs. Temp with the fitted least squares line]
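The estimates above can be reproduced along these lines; a minimal sketch, again assuming the airquality data.

```python
import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

slr = smf.ols("Ozone ~ Temp", data=air).fit()
print(slr.params)     # intercept near -147, Temp slope near 2.4
print(slr.rsquared)   # roughly 0.49
```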


Modeling Ozone Concentration

- Can this model be improved?
- Let's consider also using Wind, the variable with the second strongest marginal relationship with Ozone
- We'll use the model: $\text{Ozone}_i = \beta_0 + \beta_1 \text{Temp}_i + \beta_2 \text{Wind}_i + \varepsilon_i$
- The estimated model is $\widehat{\text{Ozone}}_i = -147 + 1.8\,\text{Temp}_i - 3.3\,\text{Wind}_i$
- Notice the effect of temperature is less pronounced now that the model includes wind (see the sketch below)
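A sketch of fitting the two-predictor model under the same airquality assumption:

```python
import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

mlr = smf.ols("Ozone ~ Temp + Wind", data=air).fit()
print(mlr.params)  # Temp's coefficient drops to roughly 1.8 once Wind is included
```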


Modeling Ozone Concentration

- This model is defined by two different slopes, creating a regression plane

[3D plot of the fitted regression plane over Wind and Temp]


Modeling Ozone Concentration

- An incredibly important feature of multiple regression is that it allows us to estimate the conditional effect of each variable
  - $b_1 = 1.8$ is the expected increase in Ozone for a 1-unit increase in Temp when Wind is held unchanged (see the sketch below)

[Plot: Conditional Effect of Temp (Ozone vs. Temp with Wind held fixed)]
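One rough way to see the conditional effect numerically: predict Ozone at two Temp values while holding Wind fixed. A sketch, assuming the airquality setup from the earlier sketches; the Wind value of 10 mph is an arbitrary illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()
mlr = smf.ols("Ozone ~ Temp + Wind", data=air).fit()

# Predictions for a 1-degree increase in Temp with Wind held at 10 mph
grid = pd.DataFrame({"Temp": [80, 81], "Wind": [10, 10]})
preds = mlr.predict(grid)
print(preds.iloc[1] - preds.iloc[0])  # equals b1, about 1.8
```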


Confounding

Discuss the following with your group:

1. Is Wind a confounding variable in the relationship between Temp and Ozone? Justify your answer (hint: use the scatterplot/correlation matrix and the definition of confounding)

2. How does stratification relate to the idea of a conditional regression effect?


Confounding

- Because multiple regression provides conditional effects, it can be used to control for confounding variables
- Unlike stratification, we can use multiple regression to control for quantitative confounding variables
  - We can also control for many confounding variables simultaneously by including them in the multiple regression model


Example - Professor Evaluations

- At the University of Texas at Austin, students anonymously evaluate each professor at the end of the year on a 1-5 scale
- In this study, 6 students (3 male, 3 female) gave each UT-Austin professor a beauty rating on a 1-10 scale after viewing photographs of the professor
- We will model a professor's average student evaluation score based upon their beauty (defined as their average beauty rating from these 6 students), as well as other variables collected by the researchers


Professor Evaluations

Practice: With your group:

1. Load the UT-Austin professor beauty data into Minitab

2. Create a scatterplot matrix to visualize the relationships between "score", "bty_avg", and "age"

3. Is age a confounding variable in the relationship between "score" and "bty_avg"? (you might want to use correlation coefficients to help you interpret the matrix plot)

4. Compare and interpret the effects of "bty_avg" in the simple linear regression model and the multiple regression model that includes "age" as an explanatory variable


Professor Evaluations (solutions)

1. There is a negative correlation between age and beauty rating: older professors tend to receive lower beauty ratings. There is also a negative correlation between age and score, making age a confounding variable.

2. In the simple linear regression model, the effect of bty_avg is 0.067, so a 1-point increase in beauty rating corresponds with a 0.067 increase in evaluation score

3. In the multiple regression model that adjusts for age, the effect of bty_avg is 0.061, so a 1-point increase in beauty rating, while holding age constant, corresponds with a 0.061 increase in evaluation score
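The same comparison could be done outside Minitab; here is a hedged sketch assuming the evaluations data has been exported to a hypothetical file evals.csv with columns named as in the slides.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of the UT-Austin evaluations data
evals = pd.read_csv("evals.csv")

slr = smf.ols("score ~ bty_avg", data=evals).fit()
mlr = smf.ols("score ~ bty_avg + age", data=evals).fit()

# Per the slides: roughly 0.067 unadjusted vs. 0.061 after adjusting for age
print(slr.params["bty_avg"], mlr.params["bty_avg"])
```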


ANOVA and Multiple Regression

- Previously we've seen that ANOVA can be used to compare nested models
  - In the multiple regression setting, there are many models nested within the full model

- Consider the ozone concentration model:

$$\text{Ozone} = \alpha + \beta_1 \text{Solar} + \beta_2 \text{Wind} + \beta_3 \text{Temp} + \varepsilon$$

- Nested within this model are:
  - the null model (intercept-only)
  - 3 different models each containing a single variable
  - 3 different models each containing two variables


ANOVA and Multiple Regression

- ANOVA can be used to evaluate the importance of a single variable by comparing the full model to the nested model that contains everything but that variable
  - For example, we could evaluate the importance of "Wind" in the model on the previous slide by comparing the following models:

$$M_0: \text{Ozone} = \alpha + \beta_1 \text{Solar} + \beta_3 \text{Temp} + \varepsilon$$
$$M_1: \text{Ozone} = \alpha + \beta_1 \text{Solar} + \beta_2 \text{Wind} + \beta_3 \text{Temp} + \varepsilon$$

- Due to some neat mathematical properties of least squares modeling, we don't need to fit the smaller models; we can do an ANOVA test on each variable having fit only the full model (a sketch of this comparison appears below)
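A hedged sketch of the nested-model F-test in Python, assuming the airquality setup from the earlier sketches:

```python
import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset
from statsmodels.stats.anova import anova_lm

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

m0 = smf.ols("Ozone ~ Solar + Temp", data=air).fit()          # without Wind
m1 = smf.ols("Ozone ~ Solar + Wind + Temp", data=air).fit()   # full model

# F-test for the improvement gained by adding Wind
print(anova_lm(m0, m1))
```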


ANOVA and Multiple Regression

- The ANOVA table for the ozone concentration model looks like:

[Minitab ANOVA table for the full ozone model]

- Error (SSE) and Total (SST) are familiar
  - SSE is the sum of squared residuals for the full model
  - SST is the sum of squared residuals for the null (intercept-only) model
- Source = "Regression" refers to what we've been calling SSG; it is the overall portion of SST that is explained by the full model, and we'll call it SSM


ANOVA and Multiple Regression

- SSM is further partitioned by how much variability is explained by each individual variable
  - In this example SSM = 73799, of which 11642 is attributable to the variable "Wind", 19050 to "Temp", 2986 to "Solar", and 40121 to the model's intercept (which is often omitted)

- In this example, all three explanatory variables play an important role in the model (a sketch of the per-variable partition appears below)
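A sketch of a per-variable breakdown, again assuming the airquality data. Note these are sequential (Type I) sums of squares, which depend on the order the variables enter the formula, so they need not match Minitab's adjusted sums of squares exactly.

```python
import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset
from statsmodels.stats.anova import anova_lm

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

full = smf.ols("Ozone ~ Solar + Wind + Temp", data=air).fit()

# Sequential (Type I) sums of squares attributed to each variable
print(anova_lm(full))
```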


ANOVA and Multiple Regression - Example

Practice: With your group:

1. Using the UT-Austin professor data, fit a multiple regression model that uses "bty_avg", "age", "ethnicity", and "pic_outfit" to predict "score"

2. Which variables play an important role in the model? (Hint: use ANOVA for this)

3. How would you interpret the regression coefficient of "pic_outfit"?


ANOVA and Multiple Regression - Example (solution)

Coming Soon


Choosing a Model

- It is rarely the case that every variable in a data set should be included in a model
- How to determine which variables belong in a model is a broad area of statistics and could encompass an entire course
- That said, we will discuss a couple of principles of model selection and look at a few model selection procedures that exist in Minitab


Choosing a Model

Principle #1 - The Bias vs. Variance Tradeoff

- As a model includes more variables it becomes less biased (think about what happens if you omit a quadratic term for a truly quadratic relationship)
- However, additional variables also increase a model's variance (think about what happens if you include a 6th-degree polynomial for a truly linear relationship)
- If too many variables are included, the model might fit the sample data well (low bias) but its coefficients will change dramatically if data are added or removed (high variance)


The Bias vs. Variance Tradeoff

[Plot: Biased but Low Variance; simple linear fits to the original data and to 20 new simulated data sets]

- Simple linear regression is biased because it doesn't account for the curvature in the true relationship between X and Y
- However, it shows low variance: fitting it to a different sample doesn't change it much


The Bias vs. Variance Tradeoff

[Plot: Low Bias but High Variance (6th-degree polynomials); fits to the original data and to 20 new simulated data sets]

- This model is very capable of capturing the curvature in the true relationship between X and Y
- However, it contains too many parameters, so it changes dramatically depending on the specific sample that it is fit to (the simulation sketch below shows both behaviors numerically)
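A minimal simulation sketch of the tradeoff, in the spirit of the two plots above; the quadratic truth, noise level, and evaluation point are illustrative assumptions, not the slides' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40)
truth = 2 + 3 * x**2   # a truly quadratic relationship; truth at x = 5 is 77

for degree in (1, 6):
    fits_at_5 = []
    for _ in range(20):          # 20 simulated samples, as in the plots
        y = truth + rng.normal(0, 4, size=x.size)
        coefs = np.polyfit(x, y, degree)
        fits_at_5.append(np.polyval(coefs, 5.0))
    print(f"degree {degree}: mean fit {np.mean(fits_at_5):.1f}, "
          f"sd {np.std(fits_at_5):.1f}")

# Expected pattern: degree 1 is biased (mean well below 77) but stable,
# while degree 6 is roughly unbiased but varies far more across samples
```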


Choosing a Model

Principle #2 - Parsimony

- If two models are (roughly) equally good at explaining an outcome, the simpler one should be preferred (this principle is sometimes called "Occam's razor")
- Simpler models are easier to interpret and have lower variance; however, we don't want to simplify things too much


Choosing a Model - Exhaustive Approaches

- So how do we find the sweet spot where the model isn't too complex or too simple?
- A metric like $R^2$ will always suggest the largest model
  - But this model will have high variance (it fits the current data well, but its coefficients could change dramatically if data points are added or removed)
- A better metric will adjust for the number of variables a model includes, potentially penalizing larger models which might be overfit
  - Adjusted $R^2$ does exactly this: it modifies $R^2$ to account for the number of predictor variables (see the sketch below)
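The usual formula is $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ the number of predictors. A minimal sketch; the $n = 116$ below is illustrative, roughly the number of days with a recorded Ozone measurement.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Penalize R^2 for using p predictors with n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative values: the Temp-only ozone model (R^2 = 0.49, one predictor)
print(adjusted_r2(0.49, 116, 1))
```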


Choosing a Model - Exhaustive Approaches

- A metric like adjusted $R^2$ makes it reasonable to compare many possible models and objectively choose one of them
  - When the number of variables is small enough, it can be feasible to use a best subsets approach that considers all possible combinations of the available variables (a rough sketch appears below)
- In Minitab, this can be done using "Stat -> Regression -> Regression -> Best Subsets"
- Unfortunately, Minitab only allows you to use quantitative predictors when doing best subsets
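A hedged Python analogue of best subsets for the three ozone predictors, ranking each candidate model by adjusted $R^2$ (assuming the airquality setup from earlier):

```python
from itertools import combinations

import statsmodels.formula.api as smf
from statsmodels.datasets import get_rdataset

air = get_rdataset("airquality", "datasets").data
air = air.rename(columns={"Solar.R": "Solar"}).dropna()

predictors = ["Solar", "Wind", "Temp"]
results = []
for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        fit = smf.ols("Ozone ~ " + " + ".join(subset), data=air).fit()
        results.append((fit.rsquared_adj, subset))

# Rank all 7 candidate models by adjusted R^2, best first
for adj_r2, subset in sorted(results, reverse=True):
    print(f"{adj_r2:.3f}  {' + '.join(subset)}")
```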


Choosing a Model - Exhaustive Approaches

[Minitab best subsets output for the ozone model]

Which model appears to be the best?


Choosing a Model - Algorithmic Approaches

- When there are too many possible models to manually sift through, an alternative approach is to use an algorithm:
  - For example, we could start with an intercept-only model
  - Then add the variable that is "most significant" (based upon that variable's F-test)
  - We could keep doing this until there are no statistically significant variables left to add
- This procedure is known as forward selection (sketched below)
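A minimal sketch of forward selection, assuming quantitative predictors (for a single quantitative term, the coefficient's t-test p-value matches the partial F-test p-value):

```python
import statsmodels.formula.api as smf

def forward_select(df, response, candidates, alpha=0.10):
    """Greedy forward selection: at each step, add the most significant
    remaining variable; stop when nothing clears alpha."""
    candidates = list(candidates)
    selected = []
    while candidates:
        pvals = {}
        for var in candidates:
            formula = f"{response} ~ " + " + ".join(selected + [var])
            pvals[var] = smf.ols(formula, data=df).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        candidates.remove(best)
    return selected
```

With the airquality data frame from the earlier sketches, forward_select(air, "Ozone", ["Solar", "Wind", "Temp"]) would carry out this procedure.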


Choosing a Model - Algorithmic Approaches

- Alternatively, our algorithm could start with the full model and eliminate variables with high p-values one at a time
  - When there are no more variables that can be eliminated, the algorithm ends
  - This procedure is known as backward selection (sketched below)
- A compromise algorithm known as stepwise selection is like the aforementioned procedures, but it can either add or drop variables at every step (rather than only dropping variables like backward selection, or only adding variables like forward selection)
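The backward counterpart, under the same assumption of quantitative predictors (so each p-value's name matches its variable name):

```python
import statsmodels.formula.api as smf

def backward_select(df, response, variables, alpha=0.10):
    """Greedy backward elimination: at each step, drop the least significant
    variable; stop once every remaining p-value is below alpha."""
    variables = list(variables)
    while variables:
        fit = smf.ols(f"{response} ~ " + " + ".join(variables), data=df).fit()
        pvals = fit.pvalues.drop("Intercept")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            break
        variables.remove(worst)
    return variables
```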


Choosing a Model - Algorithmic Approaches

- These selection algorithms are implemented in Minitab and can be accessed using the "Stepwise" button under "Fit Regression Model"
- Practice: With your group, apply backward selection to find a model for "score" in the UT-Austin professor data
  - Start with the predictors "bty_avg", "age", "ethnicity", "gender", "rank", and "pic_outfit" and use α = 0.1
  - What is your final model? Which variable is most important?


Choosing a Model

- Algorithmic approaches, despite being frequently used, have several downsides:
  - They are greedy algorithms, a computer science term meaning they focus on making a short-term optimization at each step but aren't guaranteed to yield the best overall model
  - They rarely agree: forward, backward, and stepwise approaches often choose different models
  - They rely on multiple hypothesis tests and don't make corrections (this is difficult because we are never sure how many tests will be conducted during the model search)
  - They remove the human element from modeling


Choosing a Model

- Better approaches exist, including:
  - cross validation
  - model selection criteria like AIC and BIC
  - penalization approaches like LASSO
- These approaches are beyond the scope of this course, but you can learn about some of them in STA-230 (Intro to Data Science)


Choosing a Model - My Recommendations

- Each variable in your model should both make sense to you contextually and have a relatively small p-value
  - How small is up to you, but I suggest not setting any hard thresholds
- Algorithmic approaches can provide useful starting points, but you shouldn't lock yourself into using the model they suggest
- Polynomial effects should be included with skepticism; you should have a clear reason to consider using them:
  - Quadratic or cubic effects should be clearly visible in residual plots
  - Your real-world knowledge of the situation should suggest their use (for example, very high and very low blood glucose both increase the risk of negative health outcomes)
