On Model Selection Criteria for Climate Change Impact Studies

Xiaomeng Cui (UC Davis), Dalia Ghanem (UC Davis), Todd Kuffner (WUSTL)

December 8, 2017, University of Southern California

1 / 47


Motivation

Empirical Literature

Setting: For $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, we observe a scalar outcome $Y_{it}$; in addition, for each $i$ and $t$, we observe a regressor $W_{it\tau}$ at a higher frequency, $\tau = 1, 2, \ldots, H$.

- Weather and Climate Change Impacts (Review: Dell, Jones and Olken, JEL 2014)
  - Climate Change and Agriculture: Deschenes and Greenstone (2007, AER), Fisher et al (2012, AER), Schlenker and Roberts (2009, PNAS), Schlenker and Lobell (2010, ERL), Feng, Krueger and Oppenheimer (2010, PNAS), Lobell, Schlenker and Costa-Roberts (2011, Science), Cui (2017), ...
  - Climate Change and Health: Deschenes, Greenstone and Guryan (2009, AER), Deschenes and Greenstone (2011, AEJ: Applied), ...
- Pollution Impacts
  - Boone et al (2013), Burney and Ramanathan (2014, PNAS), Carter et al (2016, Scientific Reports)

2 / 47

Motivation

Empirical Practice: CHOOSE summary statistics $T^k_{it}(\{W_{it\tau}\}_{\tau=1}^H)$ as regressors in a linear fixed effects model

$$y_{it} = \sum_{k=1}^{K} \beta_k T^k_{it}(\{W_{it\tau}\}_{\tau=1}^H) + a_i + u_{it} \qquad (1)$$

Note: straightforward to accommodate year fixed effects and other control variables

[Figure: two time-series panels for 1970-1985. Left panel: Annual Mortality Rate (vertical axis 0.000 to 0.020) by Year. Right panel: Daily Mean Temperature (vertical axis 0 to 80) by Year.]

Examples of Summary Statistics

- temperature bins

- various degree day measures

- annual or seasonal means

- quadratic function of annual mean

- ...

3 / 47
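The summary statistics listed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the periods are consecutive blocks of near-equal length rather than calendar months, and the toy "year" has only H = 8 made-up observations.

```python
from bisect import bisect_right

def annual_mean(daily):
    """Mean of all high-frequency observations in one unit-year."""
    return sum(daily) / len(daily)

def period_means(daily, n_periods):
    """Means over n_periods consecutive blocks (e.g. 4 quarters, 12 months).

    Assumption: blocks of near-equal length, not calendar periods.
    """
    h = len(daily)
    bounds = [round(p * h / n_periods) for p in range(n_periods + 1)]
    return [sum(daily[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def bin_counts(daily, edges):
    """Number of days in (-inf, e0), [e0, e1), ..., [e_last, +inf)."""
    counts = [0] * (len(edges) + 1)
    for temp in daily:
        counts[bisect_right(edges, temp)] += 1
    return counts

def degree_days(daily, base):
    """Sum of daily exceedances above a base temperature."""
    return sum(max(temp - base, 0.0) for temp in daily)

# Toy "year" with H = 8 observations (hypothetical values).
daily = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
stats = {
    "annual_mean": annual_mean(daily),
    "quarterly_means": period_means(daily, 4),
    "bins": bin_counts(daily, [20.0, 40.0, 60.0]),
    "degree_days_50": degree_days(daily, 50.0),
}
print(stats)
```

Each function maps the same high-frequency series to a different regressor vector, which is why the candidate models in this problem are all built from one underlying variable.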


Motivation

Goals of the Literature

1. Causal Inference: identify the damage/response function that governs the relationship between $Y_{it}$ and the high-frequency regressor $\{W_{it\tau}\}_{\tau=1}^H$

2. Prediction: estimate the impact of future climate change on outcomes of interest

This paper

- importance of model selection in this problem
- special features of the model selection problem faced by empirical researchers
- properties of cross-validation (CV) and information criteria (IC) as model selection tools for climate change impact studies
  - importance of guarding against over-fitting
  - mean squared error as a loss function

Disclaimer: the paper will focus on linear fixed effects models only.

4 / 47

Outline of the Talk

Why Model Selection in Climate Change Impact Studies?

Model Selection Problem

Overview of Model Selection Criteria

Simulation Study

Empirical Illustration

5 / 47

Why Model Selection in Climate Change Impact Studies?
An Economic Reason

Consider the following models:

- Annual Mean (A): $Y_{it} = -0.5 W_{it} + a_i + u_{it}$
- Quadratic in Annual Mean (QinA): $Y_{it} = 0.2 W_{it} - 0.02 W_{it}^2 + a_i + u_{it}$
- Quarterly Mean (Q): $Y_{it} = 0.5 W^{Q1}_{it} - 0.75 W^{Q2}_{it} + a_i + u_{it}$

Different damage functions have different policy implications!

- Differences in responses across space and seasons
  - A: an additional degree in annual mean temperature will affect all areas in the same way.
  - QinA: an additional degree in annual mean temperature will be more/less harmful in warmer regions.
  - Q: additional warming is harmful in the summer, but beneficial in the winter.
- Different adaptation mechanisms. E.g., suppose the outcome is agricultural yield:
  - QinA: agriculture will move to historically cooler areas as they get warmer.
  - Q: the growing season can be shifted within the year in regions with warming patterns in the winter.
- Different predictions for future damages due to climate change

6 / 47
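One way to see why the three specifications carry different policy implications is to write out the marginal effect of temperature implied by each. The derivatives below follow directly from the coefficients on the slide; the units of $W_{it}$ are not stated there, so the turning point is only illustrative.

```latex
\begin{align*}
\text{A:}    &\quad \frac{\partial Y_{it}}{\partial W_{it}} = -0.5 \\
\text{QinA:} &\quad \frac{\partial Y_{it}}{\partial W_{it}} = 0.2 - 0.04\,W_{it}
              \quad (\text{zero at } W_{it} = 5,\ \text{negative above}) \\
\text{Q:}    &\quad \frac{\partial Y_{it}}{\partial W^{Q1}_{it}} = 0.5, \qquad
              \frac{\partial Y_{it}}{\partial W^{Q2}_{it}} = -0.75
\end{align*}
```

Under A the effect is the same everywhere, under QinA it depends on the level of the annual mean, and under Q it depends on when in the year the warming occurs.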


Why Model Selection in Climate Change Impact Studies?
A Statistical Reason

Simulation Illustration

Design: n = 100, T = 5, S = 500

- $W_{it}$ is a random sample of counties from the NCDC temperature dataset for the years 1968-1972
- "Fixed Effects": $a_i \mid W_i \sim N(0.5 W_i, 1)$
- Idiosyncratic Shocks: $u_{it} \mid W_i, a_i$ are i.i.d. across $i$, mixture normal, mean 0, heteroskedastic and correlated across time
- Outcome is generated using the following models:
  - (i) Annual Mean: $Y_{it} = W_{it} + a_i + u_{it}$
  - (ii) Quadratic in Annual Mean: $Y_{it} = 0.2 W_{it} - 0.05 W_{it}^2 + a_i + u_{it}$
  - (iii) Quarterly Mean: $Y_{it} = -0.5 W^{Q1}_{it} + 0.75 W^{Q2}_{it} + a_i + u_{it}$

7 / 47
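A stripped-down version of design (i) can be sketched as follows. It is not the authors' simulation: temperatures are drawn from a normal distribution instead of the NCDC data, the shocks are i.i.d. normal rather than a heteroskedastic mixture, and $W_i$ in the fixed-effect distribution is read as the unit's mean temperature. The sketch only illustrates why a fixed effects (within) estimator is used when $a_i$ is correlated with the regressor.

```python
import random

def within_slope(panel):
    """Fixed-effects (within) estimator for a single regressor.

    panel: list of units, each a list of (w, y) pairs over time.
    """
    num = den = 0.0
    for unit in panel:
        w_bar = sum(w for w, _ in unit) / len(unit)
        y_bar = sum(y for _, y in unit) / len(unit)
        num += sum((w - w_bar) * (y - y_bar) for w, y in unit)
        den += sum((w - w_bar) ** 2 for w, _ in unit)
    return num / den

def pooled_slope(panel):
    """Pooled OLS slope (with intercept), ignoring the fixed effects."""
    obs = [pair for unit in panel for pair in unit]
    w_bar = sum(w for w, _ in obs) / len(obs)
    y_bar = sum(y for _, y in obs) / len(obs)
    num = sum((w - w_bar) * (y - y_bar) for w, y in obs)
    den = sum((w - w_bar) ** 2 for w, _ in obs)
    return num / den

def simulate_panel(n, t, beta, rng):
    """Simplified annual-mean DGP: Y_it = beta * W_it + a_i + u_it."""
    panel = []
    for _ in range(n):
        level = rng.gauss(55.0, 10.0)              # hypothetical unit climate
        w = [level + rng.gauss(0.0, 1.0) for _ in range(t)]
        a = rng.gauss(0.5 * sum(w) / t, 1.0)       # a_i | W_i ~ N(0.5 * mean(W_i), 1)
        panel.append([(w_it, beta * w_it + a + rng.gauss(0.0, 1.0)) for w_it in w])
    return panel

rng = random.Random(0)
panel = simulate_panel(n=100, t=5, beta=1.0, rng=rng)
fe, pooled = within_slope(panel), pooled_slope(panel)
print(f"within: {fe:.3f}, pooled: {pooled:.3f}")
```

Because $a_i$ loads on the unit's mean temperature, the pooled slope is pushed well above the true coefficient of 1, while the within estimator removes $a_i$ by demeaning.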

Why Model Selection in Climate Change Impact Studies?
A Statistical Reason

Models Considered

- "Seasonal" Mean Models
  - A: Annual Mean Model
  - B: Bi-Annual Mean Model
  - Q: Quarterly Mean Model
  - M: Monthly Mean Model
- Other Models
  - QinA: Quadratic in Annual Mean Model
  - Bins: 10°F bins

Simulation Summary Statistics

For each model considered, we report:

- simulation Mean and S.D. of the linear FE estimator
- probability of rejecting in a 5% significance test (p0.05)

8 / 47

Why Model Selection in Climate Change Impact Studies?
A Statistical Reason

                     DGP: Annual Mean      DGP: Quadratic in     DGP: Quarterly Mean
                                           Annual Mean
Model  Regressor     Mean   S.D.  p0.05    Mean   S.D.  p0.05    Mean   S.D.  p0.05
A      A Mean        1      0.05  1        -5.50  0.05  1        -0.27  0.05  0.86

B      B1 Mean       0.50   0.04  1        -2.81  0.04  1        -0.82  0.04  1
       B2 Mean       0.50   0.03  1        -2.69  0.03  1        0.30   0.03  1

Q      Q1 Mean       0.25   0.02  1        -1.42  0.02  1        -0.50  0.02  1
       Q2 Mean       0.25   0.04  1        -1.27  0.04  1        0      0.04  0.06
       Q3 Mean       0.24   0.04  1        -1.31  0.04  1        0.75   0.04  1
       Q4 Mean       0.25   0.02  1        -1.36  0.02  1        0      0.02  0.04

M      M1 Mean       0.08   0.02  1        -0.48  0.02  1        -0.17  0.02  1
       M2 Mean       0.08   0.02  0.92     -0.48  0.02  1        -0.17  0.02  1
       M3 Mean       0.08   0.01  1        -0.49  0.01  1        -0.17  0.01  1
       M4 Mean       0.08   0.03  0.89     -0.32  0.03  1        0      0.03  0.06
       M5 Mean       0.09   0.03  0.94     -0.54  0.03  1        0      0.03  0.07
       M6 Mean       0.08   0.03  0.78     -0.49  0.03  1        0      0.03  0.06
       M7 Mean       0.08   0.04  0.67     -0.34  0.04  1        0.25   0.04  1
       M8 Mean       0.08   0.03  0.82     -0.43  0.03  1        0.25   0.03  1
       M9 Mean       0.08   0.03  0.69     -0.46  0.03  1        0.25   0.03  1
       M10 Mean      0.08   0.03  0.84     -0.36  0.03  1        0      0.03  0.09
       M11 Mean      0.08   0.03  0.81     -0.53  0.03  1        0      0.03  0.05
       M12 Mean      0.09   0.02  0.99     -0.45  0.02  1        0      0.02  0.05

Notes: Qj denotes the jth quarter, Mj denotes the jth month.

9 / 47

Why Model Selection in Climate Change Impact Studies?
A Statistical Reason

                      DGP: Annual Mean      DGP: Quadratic in     DGP: Quarterly Mean
                                            Annual Mean
Model  Regressor      Mean   S.D.  p0.05    Mean   S.D.  p0.05    Mean   S.D.  p0.05
QinA   A Mean         1.01   0.32  0.89     0.21   0.32  0.14     1.91   0.32  0.95
       A Mean^2       0      0     0.06     -0.05  0     1        -0.02  0     0.99

Bins   Bin (-∞, 10)   -0.15  0.02  1        0.74   0.02  1        0.13   0.02  1
       Bin [10, 20)   -0.11  0.01  1        0.55   0.01  1        0.14   0.01  1
       Bin [20, 30)   -0.08  0.01  1        0.43   0.01  1        0.08   0.01  1
       Bin [30, 40)   -0.06  0.01  1        0.29   0.01  1        0.02   0.01  0.80
       Bin [40, 50)   -0.03  0.01  0.99     0.17   0.01  1        0.01   0.01  0.16
       Bin [60, 70)   0.03   0.01  0.96     -0.18  0.01  1        -0.04  0.01  1
       Bin [70, 80)   0.05   0.01  1        -0.31  0.01  1        0.01   0.01  0.01
       Bin [80, 90)   0.07   0.01  1        -0.42  0.01  1        0.04   0.01  0.75
       Bin [90, +∞)   0.24   0.18  0.37     -1.62  0.18  1        0.97   0.18  1

10 / 47

Why Model Selection in Climate Change Impact Studies?
A Statistical Reason

DGP (Y): Quadratic in Annual Mean

                    n = 100               n = 500               n = 1000
Model  Regressor    Mean   S.D.  p0.05    Mean   S.D.  p0.05    Mean   S.D.  p0.05
A      A Mean       -5.50  0.05  1        -5.26  0.02  1        -5.25  0.02  1

QinA   A Mean       0.21   0.32  0.14     0.19   0.15  0.26     0.20   0.11  0.46
       A Mean^2     -0.05  0     1        -0.05  0     1        -0.05  0     1

Q      Q Mean 1     -1.42  0.02  1        -1.33  0.01  1        -1.33  0.01  1
       Q Mean 2     -1.27  0.04  1        -1.20  0.02  1        -1.22  0.01  1
       Q Mean 3     -1.31  0.04  1        -1.32  0.02  1        -1.32  0.01  1
       Q Mean 4     -1.36  0.02  1        -1.31  0.01  1        -1.31  0.01  1

Model Selection based on "Stars" (p-hacking)

- if one model is true, all other models will likely have significant coefficients
- less likely to choose the QinA model when it is true

11 / 47

Model Selection Problem

Set of Models Considered: $\{M_j\}_{j=1}^{J}$ where $J < \infty$

For each $j$, $M_j$:

$$Y_{it} = X^j_{it}\beta^j + a^j_i + u^j_{it}$$

where

- $k_j \equiv \dim((X^j_{it})') = \dim(\beta^j)$
- $W_{it} \equiv \{W_{it\tau}\}_{\tau=1}^H$
- $X^j_{it} = f_j(W_{it})$, where $X^j_{it}$ is a $1 \times k_j$ vector.

Remarks

- All models considered are linear in the parameters and separable in $a^j_i$ and $u^j_{it}$.
- We assume that the true model $M^* = M_j$ for some $j$.
- The key difference between the models is the included regressors.
- Additional regressors and fixed effects can be readily accommodated.

The special feature of this problem is that all models considered consist of different summary statistics of the same underlying high-frequency variables.

12 / 47

Model Selection Problem: Definitions

For two models $M_1$ and $M_2$, assume wlog $k_1 < k_2$.

- nested: if $M_1$ is a restricted version of $M_2$.
  Example: annual and quarterly mean model
- overlapping: if restricted versions of $M_1$ and $M_2$ yield the same model.
  Example: quadratic in annual mean and quarterly mean model
- strictly non-nested: if $M_1$ and $M_2$ are neither nested nor overlapping

13 / 47
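The two examples can be written out explicitly. The sketch below assumes four quarters of equal length, so that the annual mean is the simple average of the quarterly means; the symbols $\beta$, $\beta_q$, $\gamma_1$ and $\gamma_2$ are introduced here for illustration only.

```latex
\begin{align*}
W_{it} &= \tfrac{1}{4}\sum_{q=1}^{4} W^{Q_q}_{it}
  && \text{(annual mean as the average of quarterly means)} \\
\text{Q:}\quad Y_{it} &= \sum_{q=1}^{4} \beta_q W^{Q_q}_{it} + a_i + u_{it} \\
\text{A:}\quad Y_{it} &= \beta W_{it} + a_i + u_{it}
  = \sum_{q=1}^{4} \tfrac{\beta}{4}\, W^{Q_q}_{it} + a_i + u_{it}
  && \text{(Q with } \beta_1 = \dots = \beta_4 = \beta/4\text{)} \\
\text{QinA:}\quad Y_{it} &= \gamma_1 W_{it} + \gamma_2 W_{it}^2 + a_i + u_{it}
  && \text{(reduces to A when } \gamma_2 = 0\text{)}
\end{align*}
```

So A is nested in Q, while QinA and Q overlap: neither is a restricted version of the other, but both reduce to A under suitable restrictions. This matches the earlier simulation table, where an annual-mean DGP with coefficient 1 produces quarterly coefficients of about 0.25.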

Overview of Model Selection Criteria

- Cross Validation: MCCV (Monte Carlo Cross-Validation)
  - Create b random subsamples (training sets) of the original sample
  - For each subsample, estimate each model under consideration (training)
  - Use the complement of each subsample (test data) to predict each model out-of-sample
  - Calculate the mean squared error (MSE) across subsamples
  - Decision Rule: Choose the model that minimizes the MSE of the MCCV procedure

Source: https://stats.stackexchange.com/questions/51416/k-fold-vs-monte-carlo-cross-validation

14 / 47
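The MCCV steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it uses a toy cross-section without fixed effects, a hand-rolled least-squares solver, and hypothetical names (`ols`, `mccv`, `annual`, `half_year`). The two candidate models are an "annual mean" and a "two half-year means" specification built from the same underlying data.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (fine for small k)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):                      # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]

def mccv(models, y, train_frac, b, seed=0):
    """Average out-of-sample MSE of each model over b random splits."""
    rng = random.Random(seed)
    n = len(y)
    n_train = int(train_frac * n)
    totals = {name: 0.0 for name in models}
    for _ in range(b):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        for name, X in models.items():      # same split for every model
            beta = ols([X[i] for i in train], [y[i] for i in train])
            errs = [y[i] - sum(bj * xj for bj, xj in zip(beta, X[i])) for i in test]
            totals[name] += sum(e * e for e in errs) / len(errs)
    return {name: total / b for name, total in totals.items()}

# Toy cross-section: two half-year means with different effects.
rng = random.Random(1)
n = 200
h1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
h2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [-0.5 * a + 0.75 * c + rng.gauss(0.0, 0.5) for a, c in zip(h1, h2)]
models = {
    "annual": [[1.0, (a + c) / 2.0] for a, c in zip(h1, h2)],
    "half_year": [[1.0, a, c] for a, c in zip(h1, h2)],
}
mse_p = mccv(models, y, train_frac=0.75, b=100)            # fixed training share
mse_shao = mccv(models, y, train_frac=n ** -0.25, b=100)   # shrinking training share
best = min(mse_p, key=mse_p.get)
print(mse_p, mse_shao, best)
```

The only difference between the two calls is the training fraction: a fixed share of 0.75 versus a share of $n^{-1/4}$ that shrinks as the sample grows.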

Overview of Model Selection Criteria

- Information Criteria: for linear models

$$GIC_{\lambda_n} = n \log(MSE_n) + \lambda_n k_j \qquad (2)$$

  - $\varepsilon_i = y_i - \hat{y}^j_i$
  - $MSE_n = \sum_{i=1}^{n} \varepsilon_i^2 / n$ is the MSE of the original sample
  - $\lambda_n$ is the penalty term for the dimension of the model
    - AIC: $\lambda_n = 2$
    - BIC: $\lambda_n = \log(n)$
  - $k_j$ is the number of parameters in $M_j$
  - Decision Rule: Choose the model that minimizes the information criterion.

15 / 47
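As a rough illustration of equation (2) and the decision rule, the sketch below evaluates the criterion for two candidate fits. The function names and the MSE values are hypothetical; they are chosen so that the larger model fits only slightly better in sample.

```python
import math

def gic(mse, n, k, lam):
    """GIC = n * log(MSE_n) + lambda_n * k_j, as in equation (2)."""
    return n * math.log(mse) + lam * k

def select(candidates, n, lam):
    """Return the name of the candidate with the smallest GIC.

    candidates: dict mapping model name -> (in-sample MSE, number of parameters).
    """
    return min(candidates, key=lambda m: gic(candidates[m][0], n, candidates[m][1], lam))

# Hypothetical in-sample fits: the larger model fits only slightly better.
n = 500
candidates = {"small": (1.000, 1), "large": (0.985, 4)}
print("AIC picks:", select(candidates, n, lam=2.0))
print("BIC picks:", select(candidates, n, lam=math.log(n)))
```

With these made-up numbers the lighter AIC penalty favours the larger model while the BIC penalty favours the smaller one, which is the overfitting tendency the following slides discuss.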

Overview of Model Selection Criteria: Consistent Model Selection

Definition: $P(\text{choosing the true model}) \to 1$ as $n \to \infty$

- MC Cross-validation (Shao 1993)
  - MCCV can deliver consistent model selection if the training sample size ($n_c$) is a proportionately smaller share of larger samples, with proportionately more observations going to the testing sample: $n_c/n \to 0$.
  - Formal results are only available/possible for homoskedastic and serially uncorrelated errors.
- Information Criteria (Vuong 1989, Shao 1997, Sin and White 1996, Hong and Preston 2012)
  - AIC tends to overfit.
  - BIC is consistent, but not for non-nested model selection.
  - $GIC_{\lambda_n}$ with $\lambda_n = \sqrt{n \log\log(n)}$ is consistent for all cases.
  - Formal results can accommodate heteroskedasticity and serial correlation.

In practice, we never know the true model, so what?

- If a model that nests the true model is considered, then a consistent procedure will choose that model.
- If no model that nests the true model is considered, then a consistent procedure will choose the model with the smallest mean squared error asymptotically.

16 / 47


Simulation Study

Preview

- When the true model is considered or the models considered nest it:
  - AIC and MCCV-p tend to overfit, with the former faring worse in that regard than the latter.
  - BIC, MCCV-Shao, SW1 and SW2 choose the true model/the most parsimonious model that nests the true model.
- When the true model is not considered and the models considered do not nest it:
  - AIC, MCCV-p, BIC and MCCV-Shao tend to overfit.
  - SW1 and SW2 choose the most parsimonious model with the smallest MSE. However, in this case, minimizing MSE may be misleading about which aspects of the damage function are important.

What is interesting here is that the relationship between the considered models and the true model is what matters for consistency, not how the models relate to each other as in "classical model selection problems".

17 / 47

Simulation Study

Back to the Previous Design

Model Selection Problems

Nested Models

- A: Annual Mean Model
- B: Bi-Annual Mean Model
- Q: Quarterly Mean Model
- M: Monthly Mean Model ($X^j_{it}\beta^j = \sum_{s=1}^{12} \beta^j_s W^{M_s}_{it}$, where $W^{M_s}_{it} = \sum_{\tau \in M_s} W_{it\tau} / |M_s|$)

Overlapping/Non-nested Models

- A: Annual Mean Model
- QinA: Quadratic in Annual Mean Model
- Q: Quarterly Mean Model
- Bins: 10°F bins Model

We will consider the choice between 2, 3 and all models in each problem.

18 / 47

Simulation Study

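Model Selection Criteria

- MCCV (training fraction $n_c/n$)
  - $p = 0.75$ (MCCV-p)
  - $p_n = n^{-1/4}$ (MCCV-Shao)
- Information criteria ($GIC_{\lambda_{nT}}$)
  - $\lambda_{nT} = 2$ (AIC)
  - $\lambda_{nT} = \log(nT)$ (BIC)
  - $\lambda_{nT} = \sqrt{nT \log(\log(nT))}$ (SW1)
  - $\lambda_{nT} = \sqrt{nT \log(nT)}$ (SW2)

[Figure: penalty as a function of N (0 to 10,000) for BIC, SW1 and SW2.]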

19 / 47
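To get a feel for how differently these criteria penalize model size, the penalties can be tabulated for a few sample sizes. The sketch assumes the square roots in SW1 and SW2 cover the whole product; the function names are hypothetical.

```python
import math

def penalties(n_obs):
    """Penalty terms lambda for a sample with n_obs = n * T observations."""
    return {
        "AIC": 2.0,
        "BIC": math.log(n_obs),
        "SW1": math.sqrt(n_obs * math.log(math.log(n_obs))),
        "SW2": math.sqrt(n_obs * math.log(n_obs)),
    }

def shao_train_fraction(n):
    """Training fraction p_n = n^(-1/4) used for MCCV-Shao."""
    return n ** -0.25

for n_obs in (100, 1000, 10000):
    lam = penalties(n_obs)
    print(n_obs, {name: round(value, 1) for name, value in lam.items()})

print(round(shao_train_fraction(3074), 2))
```

The BIC penalty grows only logarithmically, while SW1 and SW2 grow roughly like the square root of the sample size. For n = 3074, the number of counties in the empirical illustration, the Shao training fraction is about 0.13, which is the p reported for MCCV-Shao there.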

Simulation Study

DGP = A (Annual Mean): $Y_{it} = W_{it} + a_i + u_{it}$

Nested Models

Sim MSE:   A 0.583   B 0.584   Q 0.584   M 0.583

Selection frequencies (p and Shao are MCCV; AIC, BIC, SW1 and SW2 are $GIC_{\lambda_{nT}}$)

         n = 100                              n = 500
Models   p     Shao  AIC   BIC   SW1   SW2    p     Shao  AIC   BIC   SW1   SW2
A        0.99  1     0.88  1     1     1      0.99  1     0.89  1     1     1
M        0.01  0     0.12  0     0     0      0.01  0     0.11  0     0     0

A        0.88  0.95  0.78  0.98  1     1      0.88  0.98  0.80  0.98  1     1
B        0.12  0.05  0.22  0.02  0     0      0.12  0.02  0.20  0.02  0     0

A        0.94  1     0.79  1     1     1      0.93  1     0.82  1     1     1
Q        0.06  0     0.21  0     0     0      0.07  0     0.18  0     0     0

A        0.87  0.95  0.71  0.98  1     1      0.87  0.98  0.74  0.98  1     1
B        0.12  0.05  0.19  0.02  0     0      0.12  0.02  0.17  0.02  0     0
M        0.01  0     0.10  0     0     0      0.01  0     0.09  0     0     0

A        0.93  1     0.75  1     1     1      0.92  1     0.77  1     1     1
Q        0.06  0     0.17  0     0     0      0.07  0     0.16  0     0     0
M        0.01  0     0.07  0     0     0      0.01  0     0.08  0     0     0

A        0.85  0.95  0.68  0.97  1     1      0.84  0.98  0.72  0.98  1     1
B        0.11  0.05  0.17  0.02  0     0      0.11  0.02  0.15  0.02  0     0
Q        0.03  0     0.15  0     0     0      0.05  0     0.13  0     0     0

B        0.92  0.99  0.71  0.99  1     1      0.87  1     0.69  1     1     1
Q        0.07  0.01  0.20  0.01  0     0      0.11  0     0.21  0     0     0
M        0.01  0     0.09  0     0     0      0.02  0     0.10  0     0     0

A        0.84  0.95  0.64  0.97  1     1      0.83  0.98  0.68  0.98  1     1
B        0.11  0.05  0.15  0.02  0     0      0.11  0.02  0.13  0.02  0     0
Q        0.03  0     0.13  0     0     0      0.05  0     0.11  0     0     0
M        0.01  0     0.07  0     0     0      0.01  0     0.08  0     0     0

- The overfitting of MCCV-p and AIC does not vanish as n increases.

Definition. Sim MSE is the mean squared error for the full sample of 3074 counties using the simulation "pseudo-true" parameters, i.e. the simulation mean of the estimated parameters of each model.

20 / 47

Simulation Study

DGP = A (Annual Mean): $Y_{it} = W_{it} + a_i + u_{it}$

Non-Nested Models

Sim MSE:   A 0.583   QinA 0.583   Q 0.584   Bins 0.610

         n = 100                              n = 500
Models   p     Shao  AIC   BIC   SW1   SW2    p     Shao  AIC   BIC   SW1   SW2
A        0.87  0.95  0.77  0.95  1     1      0.88  0.98  0.76  0.97  1     1
QinA     0.13  0.05  0.23  0.05  0     0      0.12  0.02  0.24  0.03  0     0

A        1     1     0.98  1     1     1      1     1     1     1     1     1
Bins     0     0     0.02  0     0     0      0     0     0     0     0     0

QinA     0.99  1     0.98  1     1     1      1     1     1     1     1     1
Bins     0.01  0     0.02  0     0     0      0     0     0     0     0     0

A        0.87  0.95  0.75  0.95  1     1      0.98  0.81  0.76  0.97  1     1
QinA     0.13  0.05  0.23  0.05  0     0      0.02  0.19  0.24  0.03  0     0
Bins     0     0     0.02  0     0     0      0     0     0     0     0     0

A        0.94  1     0.77  1     1     1      0.93  1     0.81  1     1     1
Q        0.06  0     0.20  0     0     0      0.07  0     0.19  0     0     0
Bins     0     0     0.02  0     0     0      0     0     0     0     0     0

A        0.82  0.95  0.62  0.95  1     1      0.82  0.98  0.62  0.97  1     1
QinA     0.13  0.05  0.20  0.05  0     0      0.11  0.02  0.22  0.03  0     0
Q        0.05  0     0.18  0     0     0      0.07  0     0.16  0     0     0

QinA     0.82  0.93  0.72  0.99  1     1      0.83  0.99  0.77  1     1     1
Q        0.17  0.07  0.26  0.01  0     0      0.17  0.01  0.23  0     0     0
Bins     0.01  0     0.02  0     0     0      0     0     0     0     0     0

A        0.82  0.95  0.60  0.95  1     1      0.82  0.98  0.62  0.97  1     1
QinA     0.13  0.05  0.20  0.05  0     0      0.11  0.02  0.22  0.03  0     0
Q        0.05  0     0.18  0     0     0      0.07  0     0.16  0     0     0
Bins     0     0     0.02  0     0     0      0     0     0     0     0     0

- All models considered nest the true model, except the Bins model, which has a larger MSE.
- MCCV-Shao, BIC, SW1 and SW2 exhibit consistency.

21 / 47

Simulation Study

DGP = Q (Quarterly Mean): $Y_{it} = -0.25 W^{Q1}_{it} + 0.75 W^{Q2}_{it} + a_i + u_{it}$

Nested Models

Sim. MSE:   A 1.57   B 1.22   Q 0.58   M 0.58

         n = 100                              n = 500
Models   p     Shao  AIC   BIC   SW1   SW2    p     Shao  AIC   BIC   SW1   SW2
A        0     0     0     0     0     1      0     0     0     0     0     0
M        1     1     1     1     1     0      1     1     1     1     1     1

A        0     0     0     0     0     0      0     0     0     0     0     0
B        1     1     1     1     1     1      1     1     1     1     1     1

A        0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1

A        0     0     0     0     0     0      0     0     0     0     0     0
B        0     0     0     0     0.01  1      0     0     0     0     0     0
M        1     1     1     1     0.99  0      1     1     1     1     1     1

A        0     0     0     0     0     0      0     0     0     0     0     0
Q        0.98  1     0.86  1     1     1      0.98  1     0.90  1     1     1
M        0.02  0     0.14  0     0     0      0.02  0     0.10  0     0     0

A        0     0     0     0     0     0      0     0     0     0     0     0
B        0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1

B        0     0     0     0     0     0      0     0     0     0     0     0
Q        0.98  1     0.86  1     1     1      0.98  1     0.90  1     1     1
M        0.02  0     0.14  0     0     0      0.02  0     0.10  0     0     0

A        0     0     0     0     0     0      0     0     0     0     0     0
B        0     0     0     0     0     0      0     0     0     0     0     0
Q        0.98  1     0.86  1     1     1      0.98  1     0.90  1     1     1
M        0.02  0     0.14  0     0     0      0.02  0     0.10  0     0     0

- SW2 has a tendency to underfit for smaller sample sizes, but the problem eventually subsides.

22 / 47

Simulation Study

DGP = Q (Quarterly Mean): $Y_{it} = -0.25 W^{Q1}_{it} + 0.75 W^{Q2}_{it} + a_i + u_{it}$

Non-nested Models

Sim. MSE:   A 1.57   QinA 1.57   Q 0.58   Bins 1.14

         n = 100                              n = 500
Models   p     Shao  AIC   BIC   SW1   SW2    p     Shao  AIC   BIC   SW1   SW2
A        0.03  0.16  0.01  0.07  1     1      0.09  0.52  0.03  0.47  1     1
QinA     0.97  0.84  0.99  0.93  0     0      0.91  0.48  0.97  0.53  0     0

A        0     0     0     0     1     1      0     0     0     0     0     1
Bins     1     1     1     1     0     0      1     1     1     1     1     0

QinA     0     0     0     0     1     1      0     0     0     0     0     1
Bins     1     1     1     1     0     0      1     1     1     1     1     0

A        0     0     0     0     1     1      0     0     0     0     0     1
QinA     0     0     0     0     0     0      0     0     0     0     0     0
Bins     1     1     1     1     0     0      1     1     1     1     1     0

A        0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1
Bins     0     0     0     0     0     0      0     0     0     0     0     0

A        0     0     0     0     0     0      0     0     0     0     0     0
QinA     0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1

QinA     0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1
Bins     0     0     0     0     0     0      0     0     0     0     0     0

A        0     0     0     0     0     0      0     0     0     0     0     0
QinA     0     0     0     0     0     0      0     0     0     0     0     0
Q        1     1     1     1     1     1      1     1     1     1     1     1
Bins     0     0     0     0     0     0      0     0     0     0     0     0

- BIC and MCCV-Shao exhibit inconsistency issues in (A vs. QinA).
- SW1 and SW2 underfit for smaller samples (A vs. Bins).

23 / 47

Simulation Study

DGP = Q (Quarterly Mean): $Y_{it} = -0.25 W^{Q1}_{it} + 0.75 W^{Q2}_{it} + a_i + u_{it}$

Non-nested Models

Sim. MSE:   A 1.57   QinA 1.57   Q 0.58   Bins 1.14

         n = 1000                  n = 3000
Models   AIC   BIC   SW1   SW2     AIC   BIC   SW1   SW2
A        0.02  0.40  1     1       0     0     1     1
QinA     0.98  0.60  0     0       1     1     0     0

A        0     0     0     0.62    0     0     0     0
Bins     1     1     1     0.38    1     1     1     1

QinA     0     0     0     0       0     0     0     0
Bins     1     1     1     1       1     1     1     1

A        0     0     0     0.62    0     0     0     0
QinA     0     0     0     0       0     0     0     0
Bins     1     1     1     0.38    1     1     1     1

A        0     0     0     0       0     0     0     0
Q        1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

A        0     0     0     0       0     0     0     0
QinA     0     0     0     0       0     0     0     0
Q        1     1     1     1       1     1     1     1

QinA     0     0     0     0       0     0     0     0
Q        1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

A        0     0     0     0       0     0     0     0
QinA     0     0     0     0       0     0     0     0
Q        1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

- BIC and MCCV-Shao remain inconsistent for larger samples (A vs. QinA).
- The underfitting problem of SW1 and SW2 subsides (A vs. Bins).

24 / 47

Simulation Study

DGP = QinA (Quadratic in Annual Mean): $Y_{it} = 0.2 W_{it} - 0.05 W_{it}^2 + a_i + u_{it}$

Nested Models

Sim MSE:   A 1.05   B 1.06   Q 1.04   M 0.97

         n = 1000                  n = 3000
Models   AIC   BIC   SW1   SW2     AIC   BIC   SW1   SW2
A        0     0     1     1       0     0     1     1
M        1     1     0     0       1     1     0     0

A        1     1     1     1       1     1     1     1
B        0     0     0     0       0     0     0     0

A        0.05  0.44  1     1       0     0.01  1     1
Q        0.95  0.56  0     0       1     0.99  0     0

A        0     0     1     1       0     0     1     1
B        0     0     0     0       0     0     0     0
M        1     1     0     0       1     1     0     0

A        0     0     1     1       0     0     1     1
Q        0     0     0     0       0     0     0     0
M        1     1     0     0       1     1     0     0

A        0.05  0.44  1     1       0     0.01  1     1
B        0     0     0     0       0     0     0     0
Q        0.95  0.56  0     0       1     0.99  0     0

B        0     0     1     1       0     0     1     1
Q        0     0     0     0       0     0     0     0
M        1     1     0     0       1     1     0     0

A        0     0     1     1       0     0     1     1
B        0     0     0     0       0     0     0     0
Q        0     0     0     0       0     0     0     0
M        1     1     0     0       1     1     0     0

- BIC is not consistent in the comparison between A and Q, which is interesting because they are nested models, but neither one of them nests the true model.
- The MSE criterion may not be ideal for this problem: note that choosing the monthly mean model is misleading from a policy perspective.

25 / 47

Simulation Study

DGP = QinA (Quadratic in Annual Mean): $Y_{it} = 0.2 W_{it} - 0.05 W_{it}^2 + a_i + u_{it}$

Non-nested Models

Sim MSE:   A 1.05   QinA 0.58   Q 1.04   Bins 1.59

         n = 1000                  n = 3000
Models   AIC   BIC   SW1   SW2     AIC   BIC   SW1   SW2
A        0     0     0     0       0     0     0     0
QinA     1     1     1     1       1     1     1     1

A        1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

QinA     1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

A        0     0     0     0       0     0     0     0
QinA     1     1     1     1       1     1     1     1
Bins     0     0     0     0       0     0     0     0

A        0.05  0.44  1     1       0     0.01  1     1
Q        0.95  0.56  0     0       1     0.99  0     0
Bins     0     0     0     0       0     0     0     0

A        0     0     0     0       0     0     0     0
QinA     1     1     1     1       1     1     1     1
Q        0     0     0     0       0     0     0     0

QinA     1     1     1     1       1     1     1     1
Q        0     0     0     0       0     0     0     0
Bins     0     0     0     0       0     0     0     0

A        0     0     0     0       0     0     0     0
QinA     1     1     1     1       1     1     1     1
Q        0     0     0     0       0     0     0     0
Bins     0     0     0     0       0     0     0     0

- BIC is inconsistent in the choice between A and Q.

26 / 47

Simulation Study

Summary of Recommendations

- Information criteria are consistent under the most general conditions.
- AIC (MCCV-p) will tend to overfit.
- BIC (MCCV-Shao) leads to consistent model selection, unless no model considered nests the true model. It is important to consider the physical relationship to ensure that we are close to the true model.
- SW1/SW2 leads to consistent model selection in general; however, it may underfit in some settings.

27 / 47

Empirical Illustration

Data: a subsample of US counties in Deschenes & Greenstone (2011) (3074 counties in 49 states from 1968-1988)

Regression:

$$y_{it} = X^j_{it}\beta^j + \theta P_{it} + \delta_s t + a_i + \gamma_t + \varepsilon_{it}$$

- $y_{it}$:
  - overall annual mortality rate (number of deaths per 100,000, not age-adjusted)
  - age-specific annual mortality rate (number of deaths per 100,000) for ages < 1, 1-45, 45-65, > 65
- $X^j_{it} = f_j(T_{it})$: temperature variables constructed from daily average temperature $T_{it} \equiv \{T_{it\tau}\}_{\tau=1}^H$
- $P_{it}$: annual total precipitation
- $\delta_s$: state-level linear trends
- $\gamma_t$: year fixed effects

Note: This is NOT a replication of Deschenes and Greenstone (2011). The dataset in Deschenes and Greenstone (2011) spans 1968-2002, and they use precipitation bins and an age-adjusted mortality rate.

28 / 47

Empirical Illustration

Temperature specifications:

- No temperature variables (N)
- Annual mean (A)
- Quarterly mean (Q)
- Monthly mean (M)
- Quadratic in annual mean (QinA)
- Annual 20°F bins (20°F Bins), reference bin: 40-60°F
- Annual 10°F bins (10°F Bins), reference bin: 40-50°F

29 / 47

Empirical Illustration: Estimation Results

Variable        A               Q                  M
Annual Mean     -0.82 (0.55)
Q1 Mean                         -0.85*** (0.22)
Q2 Mean                         0.09 (0.43)
Q3 Mean                         0.92* (0.45)
Q4 Mean                         -0.08 (0.29)
M1 Mean                                            -0.50** (0.18)
M2 Mean                                            -0.21 (0.18)
M3 Mean                                            -0.34 (0.22)
M4 Mean                                            0.50 (0.27)
M5 Mean                                            -0.12 (0.26)
M6 Mean                                            -0.70** (0.26)
M7 Mean                                            0.78* (0.33)
M8 Mean                                            0.43 (0.32)
M9 Mean                                            0.01 (0.29)
M10 Mean                                           0.53* (0.25)
M11 Mean                                           0.37 (0.22)
M12 Mean                                           -0.46* (0.19)
Observations    64,554          64,554             64,554

Notes: All regressions control for total precipitation, state linear trends, county and year fixed effects. Standard errors (in parentheses) are clustered at county-level. Significance: * (p < 0.05), ** (p < 0.01), *** (p < 0.001).

30 / 47

Empirical Illustration: Estimation Results

Variable        A               QinA            20°F Bins        10°F Bins
Annual Mean     -0.82 (0.55)    -2.52 (3.29)
Annual Mean^2                   0.02 (0.03)
[-∞,20]                                         0.21* (0.09)
[20,40]                                         0.16** (0.06)
[60,80]                                         -0.01 (0.06)
[80,+∞]                                         0.11 (0.08)
[-∞,10]                                                          0.29* (0.13)
[10,20]                                                          0.19 (0.13)
[20,30]                                                          0.17* (0.08)
[30,40]                                                          0.20** (0.07)
[50,60]                                                          0.05 (0.07)
[60,70]                                                          0.01 (0.08)
[70,80]                                                          0.03 (0.08)
[80,+∞]                                                          0.15 (0.09)
Observations    64,554          64,554          64,554           64,554

Notes: All regressions control for total precipitation, state linear trends, county and year fixed effects. Standard errors (in parentheses) are clustered at county-level. Significance: * (p < 0.05), ** (p < 0.01), *** (p < 0.001).

31 / 47

Empirical Illustration: Model Selection

Nested Models

Criterion                      N         A         Q         M
MSE                            13,317    13,316    13,313    13,307
AIC                            613,193   613,192   613,182   613,171
BIC                            613,819   613,827   613,844   613,906
SW1                            640,241   640,632   641,798   644,923
SW2                            671,398   672,240   674,761   681,499
MCCV (p=0.75, b=1000)          13,423    13,423    13,422    13,421
MCCV-Shao (p=0.13, b=6148)     11,958    11,959    11,962    11,971

Non-Nested Models

Criterion                      N         A         QinA      Bin 20F   Bin 10F
MSE                            13,317    13,316    13,316    13,313    13,313
AIC                            613,193   613,192   613,194   613,184   613,192
BIC                            613,819   613,827   613,838   613,847   613,891
SW1                            640,241   640,632   641,025   641,800   643,376
SW2                            671,398   672,240   673,086   674,763   678,145
MCCV (p=0.75, b=1000)          13,423    13,423    13,424    13,422    13,424
MCCV-Shao (p=0.13, b=6148)     11,958    11,959    11,960    11,961    11,967

N: No temperature variables, A: Annual mean, Q: Quarterly mean, M: Monthly mean, QinA: Quadratic in annual mean, Bin 10F: Annual 10°F bins (reference bin: 40-50°F), Bin 20F: Annual 20°F bins (reference bin: 40-60°F)

32 / 47

Empirical Illustration: Estimation Results (Age: >65)

Variable        A                  Q                   M
Annual Mean     -16.84*** (3.48)
Q1 Mean                            -10.27*** (1.56)
Q2 Mean                            -2.79 (2.58)
Q3 Mean                            7.47** (2.86)
Q4 Mean                            -2.64 (1.85)
M1 Mean                                                -3.30** (1.13)
M2 Mean                                                -4.14*** (1.17)
M3 Mean                                                -3.87** (1.43)
M4 Mean                                                0.81 (1.72)
M5 Mean                                                -0.25 (1.59)
M6 Mean                                                -3.60* (1.72)
M7 Mean                                                2.85 (2.04)
M8 Mean                                                3.13 (1.93)
M9 Mean                                                1.21 (1.85)
M10 Mean                                               -0.19 (1.69)
M11 Mean                                               1.59 (1.23)
M12 Mean                                               -2.40* (1.17)
Observations    64,554             64,554              64,554

Notes: All regressions control for total precipitation, state linear trends, county and year fixed effects. Standard errors (in parentheses) are clustered at county-level. Significance: * (p < 0.05), ** (p < 0.01), *** (p < 0.001).

33 / 47

Empirical Illustration: Estimation Results (Age: >65)

Variable        A                  QinA              20°F Bins         10°F Bins
Annual Mean     -16.84*** (3.48)   -42.68* (18.79)
Annual Mean^2                      0.25 (0.17)
[-∞,20]                                              2.74*** (0.62)
[20,40]                                              0.84* (0.35)
[60,80]                                              -0.95* (0.37)
[80,+∞]                                              -0.31 (0.49)
[-∞,10]                                                                3.25*** (0.78)
[10,20]                                                                1.91* (0.88)
[20,30]                                                                0.57 (0.52)
[30,40]                                                                0.65 (0.45)
[50,60]                                                                -0.65 (0.45)
[60,70]                                                                -1.46** (0.50)
[70,80]                                                                -1.16* (0.49)
[80,+∞]                                                                -0.61 (0.57)
Observations    64,554             64,554            64,554            64,554

Notes: All regressions control for total precipitation, state linear trends, county and year fixed effects. Standard errors (in parentheses) are clustered at county-level. Significance: * (p < 0.05), ** (p < 0.01), *** (p < 0.001).

34 / 47

Empirical Illustration: Model Selection (Age: >65)

Nested Models

Criterion                      N         A         Q         M
MSE                            469,982   469,728   469,403   469,321
AIC                            843,242   843,209   843,171   843,175
BIC                            843,869   843,845   843,833   843,911
SW1                            870,290   870,649   871,787   874,927
SW2                            901,447   902,258   904,750   911,503
MCCV (p=0.75, b=1000)          472,738   472,506   472,246   472,333
MCCV-Shao (p=0.13, b=6148)     479,201   479,033   478,954   479,489

Non-Nested Models

Criterion                      N         A         QinA      Bin 20F   Bin 10F
MSE                            469,982   469,728   469,712   469,579   469,579
AIC                            843,242   843,209   843,209   843,195   843,203
BIC                            843,869   843,845   843,853   843,857   843,902
SW1                            870,290   870,649   871,041   871,811   873,387
SW2                            901,447   902,258   903,101   904,774   908,156
MCCV (p=0.75, b=1000)          472,738   472,506   472,508   472,458   472,491
MCCV-Shao (p=0.13, b=6148)     479,201   479,033   479,085   479,141   479,376

N: No temperature variables, A: Annual mean, Q: Quarterly mean, M: Monthly mean, QinA: Quadratic in annual mean, Bin 10F: Annual 10°F bins (reference bin: 40-50°F), Bin 20F: Annual 20°F bins (reference bin: 40-60°F)

35 / 47
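The information-criterion rows of the nested table can be approximately reproduced from the reported MSE values. The sketch below assumes the criteria are computed as $nT \log(MSE) + \lambda_{nT} k_j$ with nT = 64,554, and that models A, Q and M add 1, 4 and 12 temperature regressors to model N. Parameters shared by all models (fixed effects, trends, precipitation) cancel when differences relative to N are taken, so their count is not needed. The MSE values are rounded as reported, so the differences match the table only up to rounding.

```python
import math

n_obs = 64554                                  # observations reported in the estimation tables
mse = {"N": 469982.0, "A": 469728.0, "Q": 469403.0, "M": 469321.0}
extra_k = {"N": 0, "A": 1, "Q": 4, "M": 12}    # temperature regressors added to N

lams = {
    "AIC": 2.0,
    "BIC": math.log(n_obs),
    "SW1": math.sqrt(n_obs * math.log(math.log(n_obs))),
    "SW2": math.sqrt(n_obs * math.log(n_obs)),
}

def gic_relative_to_n(model, lam):
    """GIC of `model` minus GIC of model N; shared parameters cancel."""
    return n_obs * math.log(mse[model] / mse["N"]) + lam * extra_k[model]

chosen = {}
for name, lam in lams.items():
    diffs = {m: gic_relative_to_n(m, lam) for m in mse}
    chosen[name] = min(diffs, key=diffs.get)
    print(name, {m: round(d) for m, d in diffs.items()}, "->", chosen[name])
```

Under these assumptions AIC and BIC select the quarterly mean model, while the heavier SW1 and SW2 penalties select the model with no temperature variables, in line with the minima in the table.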

Empirical Illustration: Summary

- For the mortality rate of those aged > 65, BIC/MCCV-Shao choose the quarterly mean model. The result is intuitive, since it shows that warming in the winter reduces the mortality rate but warming in the summer increases it.
- For the overall mortality rate and for all other age groups, all consistent model selection criteria (BIC, etc.) choose a model with no temperature variables or the annual mean model (which is not significant).
- The results are qualitatively consistent with Deschenes and Greenstone (2011), even though we use a different sample size and specification.

36 / 47

Concluding Remarks

Summary

- The model selection problem examined here is relevant whether we are interested in examining climate change impacts or adaptation to climate change.
- Practical recommendations
  - BIC behaves similarly to MCCV-Shao, and it is less computationally intensive.
  - BIC is consistent only when at least one of the models under consideration nests the true model. (This is different from the typical model selection setting.)
  - SW1 can guard against overfitting, but it can underfit in some cases.

Further Questions/Caveats

- Prediction of impacts of future climate change
  - Is MSE the right criterion?
- Post-selection inference
  - Data splitting?
- Nonseparability in unobservables
  - e.g. random coefficient models
- Infinite-dimensional true model

Thank you!

37 / 47
