Harlan D. Harris, PhD Jared P. Lander, MA NYC Predictive Analytics Meetup October 14, 2010 Predicting Pizza in Chinatown: An Intro to Multilevel Regression


Page 1:

Harlan D. Harris, PhD
Jared P. Lander, MA

NYC Predictive Analytics Meetup
October 14, 2010

Predicting Pizza in Chinatown: An Intro to Multilevel Regression

Page 2:

1. How do cost and fuel type affect pizza quality?

2. How do those factors vary by neighborhood?

Page 3:

Linear Regression (OLS)

$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \varepsilon_i$$

Find the β's that minimize $\sum_i \varepsilon_i^2$
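A minimal sketch of what "minimize the sum of squared errors" means in practice. The five prices and ratings below are made up for illustration (they are not from the talk's dataset), and the variable names are hypothetical. The code computes the closed-form least-squares estimates for a single predictor and checks that `lm()` arrives at the same values.

```r
# Hypothetical toy data: price per slice (dollars) and rating (0-5 scale).
price  <- c(1.75, 2.00, 2.50, 3.00, 3.50)
rating <- c(3.3, 3.5, 2.8, 3.6, 3.8)

# Closed-form OLS for one predictor:
#   slope     = cov(price, rating) / var(price)
#   intercept = mean(rating) - slope * mean(price)
b_price <- cov(price, rating) / var(price)
b_0     <- mean(rating) - b_price * mean(price)

# Residual sum of squares for any candidate (intercept, slope) pair.
rss <- function(b0, b1) sum((rating - (b0 + b1 * price))^2)

# lm() minimizes the same quantity, so its coefficients should agree.
fit <- lm(rating ~ price)
print(c(b_0 = b_0, b_price = b_price))
print(coef(fit))

# Moving the slope away from the OLS value cannot lower the RSS.
print(rss(b_0, b_price) <= rss(b_0, b_price + 0.1))
```

The `rss()` helper is only there to make the objective explicit; in everyday use `lm()` alone is enough.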

 

Page 4:

Linear Regression (OLS)

$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \varepsilon_i$$

Find the β's that minimize $\sum_i \varepsilon_i^2$

 

Page 5:

Multiple Regression

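$$
\begin{aligned}
\text{rating}_i = {} & \beta_{\text{intercept}} \cdot 1 \\
& + \beta_{\text{price}} \cdot \text{price}_i \\
& + \beta_{\text{oven=wood}} \cdot I(\text{oven}_i = \text{wood}) \\
& + \beta_{\text{oven=coal}} \cdot I(\text{oven}_i = \text{coal}) \\
& + \text{error}_i
\end{aligned}
$$

Goal: find betas/coefficients that minimize $\sum_i \text{error}_i^2$

3 types of oven = 2 coefficients (gas is the reference)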
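As a sketch of how the indicator terms arise, the snippet below builds the design matrix R would use for such a model. The six ovens and prices are hypothetical. Gas is set as the reference level explicitly, since R would otherwise pick the alphabetically first level (Coal) as the reference.

```r
# Hypothetical ovens for six pizzerias; "Gas" is listed first so that it
# becomes the reference level and is absorbed into the intercept.
oven  <- factor(c("Gas", "Wood", "Coal", "Gas", "Wood", "Coal"),
                levels = c("Gas", "Coal", "Wood"))
price <- c(2.00, 1.75, 3.50, 2.50, 2.25, 3.00)

# model.matrix() returns the design matrix lm() would build:
# an intercept column, price, and one 0/1 indicator per non-reference oven.
X <- model.matrix(~ price + oven)
print(colnames(X))
print(X)
```

Three oven types yield two indicator columns; rows for gas ovens have zeros in both.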

Page 6:

Multiple Regression (OLS)

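$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \beta_{\text{wood}} \cdot I(\text{oven}_i = \text{"wood"}) + \beta_{\text{coal}} \cdot I(\text{oven}_i = \text{"coal"}) + \varepsilon_i$$

Find the β's that minimize $\sum_i \varepsilon_i^2$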

  

Page 7:

Multiple Regression (OLS) with Interactions

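$$
\begin{aligned}
\text{rating}_i = {} & \beta_0 + \beta_{\text{price}} \cdot \text{price}_i \\
& + \beta_{\text{wood}} \cdot I(\text{oven}_i = \text{"wood"}) + \beta_{\text{wood,price}} \cdot \text{price}_i \cdot I(\text{oven}_i = \text{"wood"}) \\
& + \beta_{\text{coal}} \cdot I(\text{oven}_i = \text{"coal"}) + \beta_{\text{coal,price}} \cdot \text{price}_i \cdot I(\text{oven}_i = \text{"coal"}) \\
& + \varepsilon_i
\end{aligned}
$$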
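To make the interaction terms concrete, here is a small sketch on noise-free, made-up data in which every oven type has its own intercept and slope (all values hypothetical). With interactions in the model, the slope for a non-reference oven is the base price coefficient plus that oven's interaction coefficient.

```r
# Hypothetical noise-free data: each oven type has its own line.
price <- rep(c(1, 2, 3, 4), times = 3)
oven  <- factor(rep(c("Gas", "Wood", "Coal"), each = 4),
                levels = c("Gas", "Wood", "Coal"))
true_intercept <- c(Gas = 1.0, Wood = 2.0, Coal = 1.5)
true_slope     <- c(Gas = 0.5, Wood = 0.2, Coal = 0.8)
rating <- unname(true_intercept[as.character(oven)] +
                 true_slope[as.character(oven)] * price)

# price * oven expands to price + oven + price:oven.
fit <- lm(rating ~ price * oven)
b   <- coef(fit)

# Slope for the reference oven is the price coefficient itself;
# other ovens add their interaction term to it.
slope_gas  <- unname(b["price"])
slope_wood <- unname(b["price"] + b["price:ovenWood"])
slope_coal <- unname(b["price"] + b["price:ovenCoal"])
print(c(Gas = slope_gas, Wood = slope_wood, Coal = slope_coal))
```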

  

Page 8:

Groups

Examples:

teachers / test scores

states / poll results

pizza ratings / neighborhoods

Page 9:

Full Pooling (ignore groups)

Examples:

teachers / test scores

states / poll results

pizza ratings / neighborhoods

$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \varepsilon_i$$

Page 10:

No Pooling (groups as factors)

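$$
\begin{aligned}
\text{rating}_i = {} & \beta_0 + \beta_{\text{price}} \cdot \text{price}_i \\
& + \beta_{B} \cdot I(\text{group}_i = \text{"B"}) + \beta_{B,\text{price}} \cdot \text{price}_i \cdot I(\text{group}_i = \text{"B"}) \\
& + \beta_{C} \cdot I(\text{group}_i = \text{"C"}) + \beta_{C,\text{price}} \cdot \text{price}_i \cdot I(\text{group}_i = \text{"C"}) \\
& + \varepsilon_i
\end{aligned}
$$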
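A small sketch contrasting the two extremes on simulated data (the groups A, B, C, their sizes, and all coefficients are invented for illustration). Full pooling fits one line to everything; no pooling gives every group its own intercept and slope, even a group with only a handful of observations.

```r
set.seed(3)  # simulated data, for illustration only

# Three hypothetical groups; group C has very few observations.
group <- factor(rep(c("A", "B", "C"), times = c(30, 30, 4)))
price <- runif(length(group), 1, 5)
group_shift <- c(A = 0, B = 0.5, C = -0.5)
rating <- unname(2 + 0.5 * price + group_shift[as.character(group)] +
                 rnorm(length(group), sd = 0.5))

full_pool <- lm(rating ~ price)          # ignores groups
no_pool   <- lm(rating ~ price * group)  # groups as factors, with interactions

# Full pooling estimates 2 coefficients; no pooling estimates 2 per group.
print(length(coef(full_pool)))
print(length(coef(no_pool)))
print(coef(no_pool))
```

Group C's intercept and slope rest on four points only, which is the weakness that partial pooling is meant to address.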

 

  

Page 11:

Pizzas

Name        Rating   $/Slice   Fuel Type   Neighborhood
Rosario’s   3.5      2.00      Gas         Lower East Side
Ray’s       2.8      2.50      Gas         Chinatown
Joe’s       3.3      1.75      Wood        East Village
Pomodoro    3.8      3.50      Coal        SoHo

Column roles: Rating = Response; $/Slice = Continuous; Fuel Type = Categorical; Neighborhood = Group
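For readers who want to follow along without the original file, the four example rows can be typed into a data frame by hand. This is only a sketch: the object name `za.sample` is hypothetical, and the column names are chosen to match those in the talk's later R output.

```r
# The four example rows from the slide, entered by hand.
# za.sample is a hypothetical name; the full dataset lives in the talk's repository.
za.sample <- data.frame(
  Name         = c("Rosario's", "Ray's", "Joe's", "Pomodoro"),
  Rating       = c(3.5, 2.8, 3.3, 3.8),          # response
  CostPerSlice = c(2.00, 2.50, 1.75, 3.50),      # continuous predictor
  HeatSource   = factor(c("Gas", "Gas", "Wood", "Coal")),  # categorical predictor
  Neighborhood = factor(c("Lower East Side", "Chinatown",
                          "East Village", "SoHo"))         # grouping variable
)
str(za.sample)
```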

Page 12:

Data Summary in R

> za.df <- read.csv("Fake Pizza Data.csv")
> summary(za.df)
     Rating       CostPerSlice   HeatSource      Neighborhood
 Min.   :0.030   Min.   :1.250   Coal: 17   Chinatown  :14
 1st Qu.:1.445   1st Qu.:2.000   Gas :158   EVillage   :48
 Median :4.020   Median :2.500   Wood: 25   LES        :35
 Mean   :3.222   Mean   :2.584              LittleItaly:43
 3rd Qu.:4.843   3rd Qu.:3.250              SoHo       :60
 Max.   :5.000   Max.   :5.250

http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza

Page 13:

Viewing the Data in R

> plot(za.df)

Page 14:

Visualize

library(ggplot2)
ggplot(za.df, aes(CostPerSlice, Rating, color=HeatSource)) +
    geom_point() +
    facet_wrap(~ Neighborhood) +
    geom_smooth(aes(color=NULL), color='black', method='lm',
                se=FALSE, size=2)

Page 15:

Multiple Regression in R

> lm.full.main <- lm(Rating ~ CostPerSlice + HeatSource, data=za.df)
> plotCoef(lm.full.main)

http://www.jaredlander.com/code/plotCoef.r

Page 16:

Full-Pooling: Include Interaction

> lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)
> plotCoef(lm.full.int)

Page 17:

Visualize the Fit (Full-Pooling)

> lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)

                

Page 18:

Page 19:

No Pooling Model

lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)

Page 20:

Visualize the Fit (No-Pooling)

lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)

Page 21:

Evaluation of Fitted Model

• Cross-Validation Error

• Adjusted R²

• AIC

• BIC

• RSS

• Tests for Normal Residuals
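The criteria above can all be computed with base R. The sketch below does so for a simple linear model on simulated data (the data-generating values are arbitrary assumptions, not the talk's dataset). For cross-validation it uses the leave-one-out shortcut for linear models, where each held-out residual equals the ordinary residual divided by one minus its hat value.

```r
set.seed(1)  # simulated data, not the talk's dataset
n      <- 100
price  <- runif(n, 1.25, 5.25)
rating <- 2 + 0.4 * price + rnorm(n, sd = 0.5)
fit    <- lm(rating ~ price)

rss     <- sum(residuals(fit)^2)             # residual sum of squares
adj_r2  <- summary(fit)$adj.r.squared        # adjusted R-squared
aic     <- AIC(fit)
bic     <- BIC(fit)
shapiro <- shapiro.test(residuals(fit))      # one test for normal residuals

# Leave-one-out cross-validation error via the hat-value shortcut.
loo_resid <- residuals(fit) / (1 - hatvalues(fit))
cv_mse    <- mean(loo_resid^2)

print(c(RSS = rss, adjR2 = adj_r2, AIC = aic, BIC = bic, LOO_MSE = cv_mse))
print(shapiro$p.value)
```

Lower is better for AIC, BIC, RSS, and cross-validation error; RSS alone always improves as terms are added, which is why the penalized and held-out measures are listed alongside it.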

Page 22:

Use Natural Groupings

Cluster Sampling

Intercluster Differences

Intracluster Similarities

Page 23:

Multilevel Characteristics

Model gravitates toward big groups

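Small groups gravitate toward the model

Best when groups are similar to each other

$$y_i = \text{Intercept}_{j[i]} + \text{Slope}_{j[i]} \cdot x_i + \text{noise}$$

$$\text{Intercept}_j = \text{Intercept}_\alpha + \text{Slope}_\alpha \cdot u_j + \text{noise}$$

$$\text{Slope}_j = \text{Intercept}_\beta + \text{Slope}_\beta \cdot u_j + \text{noise}$$

(Here $x_i$ is an individual-level predictor, $u_j$ is a group-level predictor, and $j[i]$ is the group that observation $i$ belongs to.)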

Model the effects of the groups

Page 24:

Multi-Names for Multilevel Models

Multilevel

Hierarchical

Mixed-Effects

Bayesian

Partial-Pooling

Page 25:

Multi-Names for Multilevel Models

(1) Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts a_i and fixed slope b corresponds to parallel lines for different individuals i, or the model y_it = a_i + b t. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.

(2) Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.

(3) "When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random." (Green and Tukey, 1960)

(4) "If an effect is assumed to be a realized value of a random variable, it is called a random effect." (LaMotte, 1983)

(5) Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage ("linear unbiased prediction" in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.

http://www.stat.columbia.edu/~cook/movabletype/archives/2005/01/why_i_dont_use.html

Page 26:

Bayesian Interpretation

Everything has a distribution (including the groups)

Group-level model is prior information for the individual-level coefficients

Group-level model has an assumed-normal prior

(Can fit multilevel models with Bayesian methods, or with simpler/faster/easier approximations.)
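One way to write this interpretation down, as a sketch: the symbols below ($\alpha$, $\beta$, $\mu$, $\sigma$) are notation chosen here for illustration and do not appear on the slides. The first line is the individual-level model; the other two are the group-level model, which plays the role of a normal prior on each group's intercept and slope.

$$
\begin{aligned}
y_i &\sim \mathrm{N}\left(\alpha_{j[i]} + \beta_{j[i]} x_i,\ \sigma_y^2\right) \\
\alpha_j &\sim \mathrm{N}\left(\mu_\alpha,\ \sigma_\alpha^2\right) \\
\beta_j &\sim \mathrm{N}\left(\mu_\beta,\ \sigma_\beta^2\right)
\end{aligned}
$$

For simplicity this treats intercepts and slopes as independent; a fuller version would give the pair $(\alpha_j, \beta_j)$ a joint normal distribution with a correlation parameter.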

Page 27:

R Options

• lme4::lmer()

• nlme::lme()

• MCMCglmm()

• BUGS

• Others/niche approaches…

Page 28:

Back to the Pizza

Model the overall pattern among neighborhoods

Natural clustering of pizzerias in neighborhoods adds information

Neighborhoods with many/few pizzerias

Many: trust data, à la no-pooling model

Few: trust overall patterns, à la full-pooling model

Page 29:

Back to the Pizza

Use Neighborhoods as natural grouping

Page 30:

Multilevel Pizza

5 slope coefficients and 5 intercept coefficients, one of each per neighborhood

Slopes/intercepts are assumed to have a Gaussian distribution

Ideally, could describe all 5 slopes with 2 numbers (mean/variance)

Neighborhoods with little data don’t get freedom to set their own coefficients – get pulled towards overall slope or intercept
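The "pull" toward the overall value can be illustrated with the standard precision-weighted average for a varying-intercept model with known variances. This is a simplified sketch, not what `lmer()` does internally: the group sizes, means, and both standard deviations below are assumed values, whereas a real fit estimates the variances from the data.

```r
# Hypothetical neighborhood sizes and raw mean ratings.
n_j    <- c(Chinatown = 3, SoHo = 60)
ybar_j <- c(Chinatown = 1.5, SoHo = 4.0)

ybar_all <- 3.2  # assumed overall mean rating
sigma_y  <- 1.0  # assumed within-neighborhood standard deviation
sigma_a  <- 0.5  # assumed between-neighborhood standard deviation

# Weight on a neighborhood's own mean grows with its sample size.
w <- (n_j / sigma_y^2) / (n_j / sigma_y^2 + 1 / sigma_a^2)

# Partially pooled estimate: a blend of the group mean and the overall mean.
alpha_hat <- w * ybar_j + (1 - w) * ybar_all

print(round(w, 3))
print(round(alpha_hat, 3))
```

The small neighborhood's estimate moves most of the way toward the overall mean, while the large neighborhood's estimate stays close to its own data.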

Page 31:

R syntax

lm.me.cost2 <- lmer(Rating ~ HeatSource + (1+CostPerSlice | Neighborhood), data=za.df)
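To try the formula without the original CSV, the sketch below simulates a stand-in dataset and fits the same model structure. It assumes the lme4 package is installed. The data frame `sim.df` and all simulated coefficients are hypothetical; only the neighborhood names and sizes are borrowed from the data summary shown earlier. In the formula, `(1 + CostPerSlice | Neighborhood)` requests an intercept and a CostPerSlice slope that vary by neighborhood.

```r
library(lme4)  # assumes the lme4 package is installed

set.seed(42)  # simulated stand-in data, not the talk's dataset
hoods <- c("Chinatown", "EVillage", "LES", "LittleItaly", "SoHo")
n_per <- c(14, 48, 35, 43, 60)
n_tot <- sum(n_per)

sim.df <- data.frame(
  Neighborhood = factor(rep(hoods, times = n_per), levels = hoods),
  CostPerSlice = round(runif(n_tot, 1.25, 5.25), 2),
  HeatSource   = factor(sample(c("Coal", "Gas", "Wood"), n_tot,
                               replace = TRUE, prob = c(0.1, 0.8, 0.1)))
)

# Hypothetical neighborhood-specific intercepts and slopes.
hood_int   <- rnorm(5, mean = 2.5, sd = 0.5)
hood_slope <- rnorm(5, mean = 0.3, sd = 0.2)
idx <- as.integer(sim.df$Neighborhood)
sim.df$Rating <- hood_int[idx] + hood_slope[idx] * sim.df$CostPerSlice +
  rnorm(n_tot, sd = 0.5)

# Same model structure as on the slide, fitted to the simulated data.
fit <- lmer(Rating ~ HeatSource + (1 + CostPerSlice | Neighborhood),
            data = sim.df)

print(fixef(fit))              # effects shared by all neighborhoods
print(coef(fit)$Neighborhood)  # per-neighborhood intercepts and slopes
print(VarCorr(fit))            # estimated spread of intercepts and slopes
```

With only five groups the fit may report a singular or near-singular covariance estimate; the coefficients are still returned.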

Page 32:

Results (Partial-Pooling)

lm.me.cost2 <- lmer(Rating ~ HeatSource + (1+CostPerSlice | Neighborhood), data=za.df)

Page 33:

Predicting a New Pizzeria

Neighborhood: Chinatown

Cost: $4.20

Fuel: Wood

Page 34:

Uncertainty in Prediction

Fitted coefficients are uncertain

arm::sim()

Model error term

rnorm(1, model matrix %*% sim$Neighborhood[ , "Chinatown", ], variance)

New neighborhood – model possible coefficients

mvrnorm(1, 0, VarCorr(model)$Neighborhood)
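The "new neighborhood" case can be sketched end to end with hypothetical numbers in place of a fitted model. Every value below (average coefficients, covariance matrix, wood-oven effect, residual standard deviation) is an assumption made for illustration. The sketch draws plausible coefficient deviations for an unseen neighborhood with `MASS::mvrnorm()` and then adds observation noise with `rnorm()`, whose third argument is a standard deviation. It leaves out the uncertainty in the fitted coefficients themselves, which is the part `arm::sim()` addresses.

```r
library(MASS)  # for mvrnorm(); MASS ships with standard R installations

set.seed(7)
n_sims <- 1000

# Hypothetical values standing in for estimates from a fitted model.
mean_coef <- c(intercept = 2.5, slope = 0.3)  # assumed average intercept and slope
Sigma <- matrix(c(0.25, 0.02,
                  0.02, 0.04), nrow = 2)      # assumed covariance of neighborhood deviations
beta_wood <- 0.4                              # assumed effect of a wood oven
sigma_y   <- 0.5                              # assumed residual standard deviation
cost      <- 4.20                             # price of the new pizzeria's slice

# One draw of (intercept, slope) deviations per simulation for the new neighborhood.
dev <- mvrnorm(n_sims, mu = c(0, 0), Sigma = Sigma)

# Expected rating under each draw, then one noisy observation per draw.
mu   <- (mean_coef[["intercept"]] + dev[, 1]) +
        (mean_coef[["slope"]] + dev[, 2]) * cost + beta_wood
pred <- rnorm(n_sims, mean = mu, sd = sigma_y)

print(quantile(pred, c(0.025, 0.5, 0.975)))
```

The spread of `pred` combines two sources of uncertainty: not knowing the new neighborhood's coefficients, and the noise around any single pizzeria's rating.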

http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza

Page 35:

Other Examples

Red State Blue State

Page 36:

Other Examples

Tobacco Usage

Page 37:

Other Examples

Diabetes Prevalence

Page 38:

Other Examples

Insufficient Fruit and Vegetable Intake

Page 39:

Other Examples

Clean Drinking Water

Page 40:

Steps to Multilevel Models

Full-Pooling Model

No-Pooling Model

Separate Models

Two-Step Analysis

Page 41:

How Many Groups? How Many Observations?

As few as one or two groups

Even two observations per group

Can have many groups with just one observation

Page 42:

Larger Datasets

Andy Gelman: “The Blessing of Dimensionality”

More Data Add Complexity

Because you can

Page 43:

Resources

Gelman and Hill (ARM)

Pinheiro & Bates

Snijders and Bosker

R-SIG-Mixed-Models (http://glmm.wikidot.com/faq)

(SAS/SPSS)  

Page 44:

Thanks!