Robust linear static panel data models using ε-contamination

2017-06

Robust linear static panel data models using ε-contamination

Badi H. BaltagiaGeorges BressonAnoop ChaturvediGuy Lacroix

Août / August 2017

Centre de recherche sur les risquesles enjeux économiques et les politiques publiques

www.crrep.ca

Abstract

Baltagi : Department of Economics and Center for Policy Research, Syracuse University, Syracuse, New York, U.S.A. and Department of Economics, Leicester University, Leicester, UK - [email protected]

Bresson: Université Paris II / Sorbonne Universités, France - [email protected]

Chaturvedi: University of Allahabad, India - [email protected]

Lacroix: Départment d'économique, Université Laval, Québec, Canada - [email protected]

The paper develops a general Bayesian framework for robust linear static panel data models using ε-contamination. A two-step approach is employed to derive the conditional type-II maximum likelihood (ML-II) posterior distribution of the coeffcients and individual effects. The ML-II poste-rior densities are weighted averages of the Bayes estimator under a base prior and the data-de-pendent empirical Bayes estimator. Two-stage and three stage hierarchy estimators are developed and their finite sample performance is investigated through a series of Monte Carlo experiments. These include standard random effects as well as Mundlak-type, Chamberlain-type and Haus-man-Taylor-type models. The simulation results underscore the relatively good performance of the three-stage hierarchy estimator. Within a single theoretical framework, our Bayesian approach encompasses a variety of specifications while conventional methods require separate estimators for each case.

Keywords: ε-contamination, hyper g-priors, type-II maximum likelihood posterior density, panel data, robust Bayesian estimator, three-stage hierarchy

JEL Classification: C11, C23, C26

We are grateful to Richard Blundell, Matthew Harding, Franz Palm, participants at the 34th annual convention of Indian Society for Probability and Statistics (ISPS), Tirupati, India, Dec. 02, 2014, par-ticipants at the 21th International Panel Data Conference, Budapest, Hungary, June, 29-30, 2015 and at the research seminars of Maastricht University (The Netherlands), Université Laval (Canada) and University of Munich (Germany) for helpful comments and suggestions. We also thank the editor Oliver Linton, the associate editor and two anonymous referees of the Journal of Econome-trics for valuable comments and suggestions. The usual disclaimers apply.

1. Introduction

The choice of which classic panel data estimator to use in a linear static regression frameworkdepends upon the hypothesized correlation between the individual effects and the regressors. Oneusually assumes either of two polar situations. The random effects model assumes that the regressorsand the individual effects are uncorrelated. On the other hand, the fixed effects model assumes thatall the regressors are correlated with the individual effects (see e.g. Mundlak (1978); Chamberlain(1982)). An intermediate situation arises when both a subset of time-varying and time-invariant re-gressors are assumed correlated with the individual effects, in which case the instrumental variablesestimator of Hausman and Taylor (1981) is an appropriate alternative.

To a Bayesian analyst, the distinction between fixed, random and mixed models boils downto the specification of the number of stages in a given hierarchical model. While intuitively moreattractive, the Bayesian approach nevertheless rests upon hypothesized prior distributions (andpossibly on their hyperparameters). The choice of specific distributions is often made out of con-venience rather than being based upon relevant subjective information.1 Yet, it is well-known thatthe estimators can be sensitive to misspecification of the latter. Fortunately, this difficulty canbe partly circumvented by use of the robust Bayesian approach which relies upon a class of priordistributions and selects an appropriate one in a data dependent fashion. This paper studies therobustness of Bayesian panel data models to possible misspecification of the prior distribution inthe spirit of the works of Good (1965), Dempster (1977), Rubin (1977), Hill (1980), Berger (1985),Berger and Berliner (1984) and Berger and Berliner (1986) to mention a few. In particular, it isconcerned with what they call the posterior robustness which is different from the robustness a laWhite (1980). The idea is to acknowledge the prior uncertainty by specifying a class Γ of possibleprior distributions and then investigating the robustness of the posterior distribution as the priorvaries over Γ. Several classes of priors have been proposed in the literature but the most commonlyused one is the ε-contamination class. As mentioned by Berger (1985), the ε-contamination classof priors is particularly attractive to work with when investigating posterior robustness. The ε-contamination class combines the elicited prior, termed the base prior, with a contamination class ofpriors. This approach implements the Type-II maximum likelihood (ML-II) procedure for selectingthe appropriate prior distribution. The primary benefit of using such a contamination class of priordistributions is that the resulting estimator performs well even if the elicited base prior distributiondiffers from the prior distribution that is consistent with the data.

The objective of our paper is to propose a robust Bayesian approach for linear static paneldata models which departs from the standard Bayesian one in two ways. First, we consider the ε-contamination class of prior distributions for the model parameters (and for the individual effects).Second, both the base elicited priors and the ε-contamination priors use Zellner (1986)’s g-priorsrather than the standard Wishart distributions for the variance-covariance matrices. The papercontributes to the panel data literature by proposing a general robust Bayesian framework whichencompasses all the above mentioned conventional frequentist specifications and their associatedestimation methods as special cases.

Section 2 gives the general framework, while Section 3 derives the Type-II maximum likeli-

1For instance, conventional proper priors in the normal linear model have been based on the conjugate Normal-Gamma family essentially because all the marginal likelihoods have closed-form solutions. Likewise, statisticianscustomarily assume that the variance-covariance matrix of the slope parameters follow a Wishart distribution becauseit is convenient from an analytical point of view.

1

hood posterior mean and the variance-covariance matrix of the coefficients in a two-stage hierarchymodel. Section 4 introduces a three-stage hierarchy with generalized hyper-g priors on the variance-covariance matrix of the individual effects. The predictive densities corresponding to the base priorsand the ε-contaminated priors turn out to be Gaussian and Appell hypergeometric functions, respec-tively. The main difference between the two-stage and the three-stage hierarchy models pertains tothe definition of the Bayes estimators, the empirical Bayes estimators and the weights of the ML-IIposterior means. Section 5 investigates the finite sample performance of our robust Bayesian esti-mator through extensive Monte Carlo experiments. The simulation results underscore the relativelygood performance of the three-stage hierarchy estimator as compared to the standard frequentistestimation methods. Two applications on an earnings equation and a crime model, available inthe supplementary appendix, illustrate and confirm the simulation results. Section 6 gives ourconclusion.

2. The general setup

Let the Gaussian linear mixed model be written as:

yit = X ′itβ +W ′itbi + uit , i = 1, ..., N , t = 1, ..., T, (1)

where X ′it is a (1×K1) vector of explanatory variables including the intercept, and β is a (K1 × 1)vector of parameters. Furthermore, let W ′it denote a (1×K2) vector of covariates and bi a (K2 × 1)vector of parameters. The subscript i of bi indicates that the model allows for heterogeneityon the W variables. Finally, uit is a remainder term assumed to be normally distributed, i.e.uit ∼ N

(0, τ−1

). The distribution of uit is parametrized in terms of its precision τ rather than

its variance σ2u (= 1/τ) . In the statistics literature, the elements of β do not differ across i and are

referred to as fixed effects whereas the bi’s are referred to as random effects.2 This terminologydiffers from the one used in econometrics. In the latter, the bi’s are treated either as randomvariables, and hence referred to as random effects, or as constant but unknown parameters andhence referred to as fixed effects. In line with the econometrics terminology, whenever the bi’sare assumed to be correlated (uncorrelated) with all the X ′its, they will be termed fixed (random)effects.3

In the Bayesian context, following the seminal papers of Lindley and Smith (1972) and Smith(1973), several authors have proposed a very general three-stage hierarchy framework to handlesuch models (see, e.g., Chib and Carlin (1999); Koop (2003); Chib (2008); Greenberg (2008); Zhenget al. (2008); Rendon (2013)):

First stage : y = Xβ +Wb+ u, u ∼ N(0,Σ),Σ = τ−1INT

Second stage : β ∼ N (β0,Λβ) and b ∼ N (b0,Λb) (2)

Third stage : Λ−1b ∼Wish (νb, Rb) and τ ∼ G(·).

where y is (NT × 1), X is (NT ×K1), W is (NT ×K2), u is (NT × 1) and INT is a (NT ×NT )identity matrix. The parameters depend upon hyperparameters which themselves follow random

2See Lindley and Smith (1972), Smith (1973), Laird and Ware (1982), Chib and Carlin (1999), Koop (2003), Chib(2008) and Greenberg (2008) to mention a few.

3When we write fixed effects in italics, we refer to the terminology of the statistical or Bayesian literature.Conversely, when we write fixed effects (in normal characters), we refer to the terminology of panel data econometrics.

2

distributions. The second stage (also called fixed effects model in the Bayesian literature) updatesthe distribution of the parameters. The third stage (also called random effects model in the Bayesianliterature) updates the distribution of the hyperparameters. As stated by Smith (1973) (page 67)“for the Bayesian model the distinction between fixed, random and mixed models, reduces to thedistinction between different prior assignments in the second and third stages of the hierarchy”.In other words, the fixed effects model is a model that does not have a third stage. The randomeffects model simply updates the distribution of the hyperparameters. The precision τ is assumedto follow a Gamma distribution and Λ−1

b is assumed to follow a Wishart distribution with νb degreesof freedom and a hyperparameter matrix Rb which is generally chosen close to an identity matrix.In that case, the hyperparameters only concern the variance-covariance matrix of the b coefficients4

and the precision τ . As is well-known, Bayesian models may be sensitive to misspecification of thedistributions of the priors. Conventional proper priors in the normal linear model have been basedon the conjugate Normal-Gamma family because they allow closed form calculations of all marginallikelihoods. Likewise, rather than specifying a Wishart distribution for the variance-covariancematrices as is customary, Zellner’s g-prior (Λβ = (τgX ′X)

−1for β or Λb = (τhW ′W )

−1for b)

has been widely adopted because of its computational efficiency in evaluating marginal likelihoodsand because of its simple interpretation as arising from the design matrix of observables in thesample. Since the calculation of marginal likelihoods using a mixture of g-priors involves only aone-dimensional integral, this approach provides an attractive computational solution that madethe original g-priors popular while insuring robustness to misspecification of g (see Zellner (1986)and Fernandez et al. (2001) to mention a few). To guard against mispecifying the distributions ofthe priors, many suggest considering classes of priors (see Berger (1985)).

3. The robust linear static model in the two-stage hierarchy

Following Berger (1985), Berger and Berliner (1984, 1986), Zellner (1986), Moreno and Pericchi(1993), Chaturvedi (1996), and Chaturvedi and Singh (2012) among others, we consider the ε-contamination class of prior distributions for (β, b, τ):

Γ = {π (β, b, τ | g0, h0) = (1− ε)π0 (β, b, τ | g0, h0) + εq (β, b, τ | g0, h0)} . (3)

π0 (·) is the base elicited prior, q (·) is the contamination belonging to some suitable class Q ofprior distributions, 0 ≤ ε ≤ 1 is given and reflects the amount of error in π0 (·) . The precision τ isassumed to have a vague prior, p (τ) ∝ τ−1, 0 < τ < ∞, and π0 (β, b, τ | g0, h0) is the base priorassumed to be a specific g-prior with β ∼ N

(β0ιK1 , (τg0ΛX)

−1)

with ΛX = X ′X

b ∼ N(b0ιK2 , (τh0ΛW )

−1)

with ΛW = W ′W,(4)

where ιK1is a (K1 × 1) vector of ones. Furthermore, β0, b0, g0 and h0 are known scalar hyperpa-

rameters of the base prior π0 (β, b, τ | g0, h0). The probability density function (henceforth pdf) of

4Note that in (2), the prior distribution of β and b are assumed to be independent, so Var[θ] is block-diagonalwith θ = (β′, b′)′. The third stage can be extended by adding hyperparameters on the prior mean coefficientsβ0 and b0 and on the variance-covariance matrix of the β coefficients: β0 ∼ N

(β00,Λβ0

), b0 ∼ N

(b00,Λb0

)and

Λ−1β ∼ Wish

(νβ , Rβ

)(see for instance, Koop (2003); Greenberg (2008); Hsiao and Pesaran (2008); Bresson and

Hsiao (2011)).

3

the base prior π0 (.) is given by:

π0 (β, b, τ | g0, h0) = p (β | b, τ, β0, b0, g0, h0)× p (b | τ, b0, h0)× p (τ) . (5)

The possible class of contamination Q is defined as:

Q =

{q (β, b, τ | g0, h0) = p (β | b, τ, βq, bq, gq, hq)× p (b | τ, bq, hq)× p (τ)

with 0 < gq ≤ g0, 0 < hq ≤ h0

}, (6)

with β ∼ N(βqιK1 , (τgqΛX)

−1)

b ∼ N(bqιK2 , (τhqΛW )

−1),

(7)

where βq, bq, gq and hq are unknown. The restrictions gq ≤ g0 and hq ≤ h0 imply that the baseprior is the best possible so that the precision of the base prior is greater than any prior belongingto the contamination class. The ε-contamination class of prior distributions for (β, b, τ) is thenconditional on known g0 and h0 and two estimation strategies are possible:

1. a one-step estimation of the ML-II posterior distribution5 of β, b and τ ;

2. or a two-step approach as follows6:

(a) Let y∗ = (y −Wb). Derive the conditional ML-II posterior distribution of β given thespecific effects b.

(b) Let y = (y − Xβ). Derive the conditional ML-II posterior distribution of b given thecoefficients β.

We use the two-step approach because it simplifies the derivation of the predictive densities(or marginal likelihoods). In the one-step approach the pdf of y and the pdf of the base priorπ0 (β, b, τ | g0, h0) need to be combined to get the predictive density. It thus leads to a complicatedexpression whose integration with respect to (β, b, τ) may be involved. Using a two-step approachwe can integrate first with respect to (β, τ) given b and then, conditional on β, we can next integratewith respect to (b, τ) . Thus, the marginal likelihoods (or predictive densities) corresponding to thebase priors are:

m (y∗ | π0, b, g0) =

∞∫0

∫RK1

π0 (β, τ | g0)× p (y∗ | X, b, τ) dβ dτ

and

m (y | π0, β, h0) =

∞∫0

∫RK2

π0 (b, τ | h0)× p (y |W,β, τ) db dτ,

5“We consider the most commonly used method of selecting a hopefully robust prior in Γ, namely choice of thatprior π which maximizes the marginal likelihood m (y | π) over Γ. This process is called Type II maximum likelihoodby Good (1965)” (Berger and Berliner, 1986, page 463).

6We will see that the mean of the ML-II posterior of β (resp. of b) depends on draws from multivariate t-distributions. The two steps can thus be viewed as a Gibbs sampler.

4

with

π0 (β, τ | g0) =(τg0

2π

)K12

τ−1 |ΛX |1/2 exp(−τg0

2(β − β0ιK1)′ΛX(β − β0ιK1))

),

π0 (b, τ | h0) =

(τh0

2π

)K22

τ−1 |ΛW |1/2 exp

(−τh0

2(b− b0ιK2

)′ΛW (b− b0ιK2)

).

Solving these equations is considerably easier than solving the equivalent expression in the one-stepapproach.

3.1. The first step of the robust Bayesian estimator

Let y∗ = y −Wb. Combining the pdf of y∗ and the pdf of the base prior, we get the predictivedensity corresponding to the base prior7:

m (y∗ | π0, b, g0) =

∞∫0

∫RK1

π0 (β, τ | g0)× p (y∗ | X, b, τ) dβ dτ (8)

= H

(g0

g0 + 1

)K1/2(

1 +

(g0

g0 + 1

)(R2β0

1−R2β0

))−NT2

with H =Γ(NT2 )

π(NT2 )v(b)(NT2 )

, R2β0

=(β(b)−β0ιK1

)′ΛX(β(b)−β0ιK1)

(β(b)−β0ιK1)′ΛX(β(b)−β0ιK1

)+v(b), β (b) = Λ−1

X X ′y∗ and v (b) =

(y∗ −Xβ (b))′(y∗ −Xβ (b)), and where Γ (·) is the Gamma function.

Likewise, we can obtain the predictive density corresponding to the contaminated prior for thedistribution q (β, τ | g0, h0) ∈ Q from the class Q of possible contamination distributions:

m (y∗ | q, b, g0) = H

(gq

gq + 1

)K12

(1 +

(gq

gq + 1

)(R2βq

1−R2βq

))−NT2, (9)

where

R2βq =

(β (b)− βqιK1)′ΛX(β (b)− βqιK1)

(β (b)− βqιK1)′ΛX(β (b)− βqιK1

) + v (b).

As the ε-contamination of the prior distributions for (β, τ) is defined by π (β, τ | g0) = (1− ε)π0 (β, τ | g0)+εq (β, τ | g0), the corresponding predictive density is given by:

m (y∗ | π, b, g0) = (1− ε)m (y∗ | π0, b, g0) + εm (y∗ | q, b, g0)

andsupπ∈Γ

m (y∗ | π, b, g0) = (1− ε)m (y∗ | π0, b, g0) + ε supq∈Q

m (y∗ | q, b, g0) .

7Derivation of all the following expressions can be found in the supplementary appendix.

5

The maximization of m (y∗ | π, b, g0) requires the maximization of m (y∗ | q, b, g0) with respect toβq and gq. The first-order conditions lead to

βq =(ι′K1

ΛXιK1

)−1ι′K1

ΛX β (b) (10)

and

gq = min (g0, g∗) , (11)

with g∗ = max

( (NT −K1)

K1

(β (b)− βqιK1)′ΛX(β (b)− βqιK1)

v (b)− 1

)−1

, 0

= max

( (NT −K1)

K1

(R2βq

1−R2βq

)− 1

)−1

, 0

.Denote supq∈Qm (y∗ | q, b, g0) = m (y∗ | q, b, g0). Then

m (y∗ | q, b, g0) = H

(gq

gq + 1

)K12

(1 +

(gq

gq + 1

)( R2βq

1−R2βq

))−NT2.

Let π∗0 (β, τ | g0) denote the posterior density of (β, τ) based upon the prior π0 (β, τ | g0). Also, letq∗ (β, τ | g0) denote the posterior density of (β, τ) based upon the prior q (β, τ | g0). The ML-IIposterior density of β is thus given by:

π∗ (β | g0) =

∞∫0

π∗ (β, τ | g0) dτ

= λβ,g0

∞∫0

π∗0 (β, τ | g0) dτ +(

1− λβ,g0) ∞∫

0

q∗ (β, τ | g0) dτ

= λβ,g0π∗0 (β | g0) +

(1− λβ,g0

)q∗ (β | g0) (12)

with

λβ,g0 =

1 +ε

1− ε

gqgq+1g0g0+1

K1/21 +

(g0g0+1

)(R2β0

1−R2β0

)1 +

(gqgq+1

)( R2

βq

1−R2

βq

)

NT2−1

.

Note that λβ,g0 depends upon the ratio of the R2β0

and R2βq

, but primarily on the sample size NT .

Indeed, λβ,g0 tends to 0 when R2β0> R2

βqand tends to 1 when R2

β0< R2

βq, irrespective of the model

fit (i.e, the absolute values of R2β0

or R2βq

). Only the relative values of R2βq

and R2β0

matter.

It can be shown that π∗0 (β | g0) is the pdf (see the supplementary appendix) of a multivariate

t-distribution with mean vector β∗(b | g0), variance-covariance matrix

(ξ0,βM

−10,β

NT−2

)and degrees of

6

freedom (NT ) with

M0,β =(g0 + 1)

v (b)ΛX and ξ0,β = 1 +

(g0

g0 + 1

)(R2β0

1−R2β0

). (13)

β∗(b | g0) is the Bayes estimate of β for the prior distribution π0 (β, τ) :

β∗ (b | g0) =β (b) + g0β0ιK1

g0 + 1. (14)

Likewise q∗ (β) is the pdf of a multivariate t-distribution with mean vector βEB (b | g0), variance-

covariance matrix

(ξq,βM

−1q,β

NT−2

)and degrees of freedom (NT ) with

ξq,β = 1 +

(gq

gq + 1

)( R2βq

1−R2βq

)and Mq,β =

((gq + 1)

v (b)

)ΛX , (15)

where βEB (b | g0) is the empirical Bayes estimator of β for the contaminated prior distributionq (β, τ) given by:

βEB (b | g0) =β (b) + gqβqιK1

gq + 1. (16)

The mean of the ML-II posterior density of β is then:

βML−II = E [π∗ (β | g0)] (17)

= λβ,g0E [π∗0 (β | g0)] +(

1− λβ,g0)E [q∗ (β | g0)]

= λβ,g0β∗(b | g0) +(

1− λβ,g0)βEB (b | g0) .

The ML-II posterior density of β, given b and g0 is a shrinkage estimator. It is a weighted averageof the Bayes estimator β∗(b | g0) under base prior g0 and the data-dependent empirical Bayes

estimator βEB (b | g0). If the base prior is consistent with the data, the weight λβ,g0 → 1 and theML-II posterior density of β gives more weight to the posterior π∗0 (β | g0) derived from the elicited

prior. In this case βML−II is close to the Bayes estimator β∗(b | g0). Conversely, if the base prior is

not consistent with the data, the weight λβ,g0 → 0 and the ML-II posterior density of β is then close

to the posterior q∗ (β | g0) and to the empirical Bayes estimator βEB (b | g0). The ability of theε-contamination model to extract more information from the data is what makes it superior to theclassical Bayes estimator based on a single base prior. Following Berger (1985)(page 207), we derivethe analytical ML-II posterior variance-covariance matrix of β in the supplementary appendix.

3.2. The second step of the robust Bayesian estimator

Let y = y − Xβ. Moving along the lines of the first step, the ML-II posterior density of b isgiven by:

π∗ (b | h0) = λb,h0π∗0 (b | h0) +

(1− λb,h0

)q∗ (b | h0)

7

with

λb,h0=

1 +ε

1− ε

h

h+1h0

h0+1

K2/21 +

(h0

h0+1

)(R2b0

1−R2b0

)1 +

(h

h+1

)( R2

bq

1−R2

bq

)

NT2−1

,

where

R2b0 =

(b (β)− b0ιK2)′ΛW (b (β)− b0ιK2

)

(b (β)− b0ιK2)′ΛW (b (β)− b0ιK2

) + v (β),

R2bq

=(b (β)− bqιK2

)′ΛW (b (β)− bqιK2)

(b (β)− bqιK2)′ΛW (b (β)− bqιK2

) + v (β),

with b (β) = Λ−1W W ′y and v (β) = (y −Wb (β))′(y −Wb (β)),

bq =(ι′K2

ΛW ιK2

)−1ι′K2

ΛW b (β)

and

hq = min (h0, h∗)

with h∗ = max

( (NT −K2)

K2

(b (β)− bqιK2)′ΛW (b (β)− bqιK2

)

v (β)− 1

)−1

, 0

= max

( (NT −K2)

K2

(R2bq

1−R2bq

)− 1

)−1

, 0

.π∗0 (b | h0) is the pdf of a multivariate t-distribution with mean vector b∗(β | h0), variance-covariance

matrix

(ξ0,bM

−10,b

NT−2


M0,b =(h0 + 1)

v (β)ΛW and ξ0,b = 1 +

(h0

h0 + 1

)(b (β)− b0ιK2)′ΛW (b (β)− b0ιK2)

v (β).

b∗(β | h0) is the Bayes estimate of b for the prior distribution π0 (b, τ | h0) :

b∗(β | h0) =b (β) + h0b0ιK2

h0 + 1.

q∗ (b | h0) is the pdf of a multivariate t-distribution with mean vector bEB (β | h0), variance-covariance

matrix

(ξ1,bM

−11,b

NT−2


ξ1,b = 1 +

(hq

hq + 1

)(b (β)− bqιK2

)′ΛW (b (β)− bqιK2)

v (β)and M1,b =

(h+ 1

v (β)

)ΛW

8

and where bEB (β | h0) is the empirical Bayes estimator of b for the contaminated prior distributionq (b, τ | h0) :

bEB (β | h0) =β(b) + hq bqιK2

hq + 1.

The mean of the ML-II posterior density of b is hence given by:

bML−II = λbb∗(β | h0) +(

1− λβ)bEB (β | h0) .

The ML-II posterior variance-covariance matrix of b can be derived in a similar fashion8 to that ofβML−II .

3.3. Estimating the ML-II posterior variance-covariance matrix

Many have raised concerns about the unbiasedness of the posterior variance-covariance matricesof βML−II and bML−II . Indeed, they will both be biased towards zero as λβ,g0 and λb,h0

→ 0and converge to the empirical variance which is known to underestimate the true variance (see e.g.Berger and Berliner (1986); Gilks et al. (1997); Robert (2007)). Consequently, the assessment of

the performance of either βML−II or bML−II using standard quadratic loss functions can not beconducted using the analytical expressions. What is needed is an unbiased estimator of the trueML-II variances. In what follows, we propose two different strategies to approximate these, eachwith different desirable properties.

3.3.1. MCMC with multivariate t-distributions

Recall that the ML-II posterior densities of β and b, π∗(β|g0) and π∗(b|h0), are both multivariatet-distributions. We thus propose to use the following algorithm to approximate the variance matricesonce the model has been estimated:

1. Loop over D draws

2. In the first step of the model, draw a set of K1 values from the multivariate t-distributionsπ∗0 (β | g0) and q∗ (β) to get a (K1 × 1) vector of βd as

π∗ (β | g0) = λβ,g0π∗0 (β | g0) +

(1− λβ,g0

)q∗ (β | g0)

3. In the second step of the model, draw a set of N values from the multivariate t-distributionsπ∗0 (b | h0) and from q∗ (b | h0) to get a (N × 1) vector of bd as

π∗ (b | h0) = λb,h0π∗0 (b | h0) +

(1− λb,h0

)q∗ (b | h0)

4. Once the D draws are completed, use D∗(= D −Dburn) draws to compute

βML−II = E[β(D∗)

], σβML−II =

√diag

(V ar

[β(D∗)

])bML−II = E

[b(D

∗)]

, σbML−II =√diag

(V ar

[b(D∗)

]),

where Dburn are the burn in draws.

8See the supplementary appendix.

9

3.3.2. Block resampling bootstrap

As an alternative to MCMC simulations, we follow Laird and Louis (1987) and directly bootstrapthe data. In the context of panel data, various strategies may be used, and chief among them is theso-called individual block resampling bootstrap (see, e.g. Bellman et al. (1989); Andersson andKarlsson (2001); Kapetanios (2008)). Thus for an (N × T ) matrix Y , individual block resamplingconsists in drawing an (N×T ) matrix Y BR whose rows are obtained by resampling those of Y withreplacement. Conditionally on Y , the rows of Y BR are independent and identically distributed.9

We use such a procedure for all the dependent Y and explanatory X variables. Fortunately, weneed as few as BR = 20 bootstrap samples to achieve acceptable results.10 The following algorithmis used to approximate the variance matrices:

1. Loop over BR samples

2. In the first step, compute the mean of the ML-II posterior density of β using our initialshrinkage procedure

βML−II,br = E [π∗ (β | g0)]

= λβ,g0β∗(b | g0) +(

1− λβ,g0)βEB (b | g0) .

3. In the second step, compute the mean of the ML-II posterior density of b:

bML−II,br = λbb∗(β | h0) +(

1− λβ)bEB (β | h0)

4. Once the BR bootstraps are completed, use the (K1 ×BR) matrix of coefficients β(BR) andthe (N ×BR) matrix of coefficients b(BR) to compute:

βML−II = E[β(BR)

], σβML−II =

√diag

(V ar

[β(BR)

])bML−II = E

[b(BR)

], σbML−II =

√diag

(V ar

[b(BR)

])4. The robust linear static model in the three-stage hierarchy

As stressed earlier, the Bayesian literature introduces a third stage in the hierarchical model inorder to discriminate between fixed effects and random effects. Hyperparameters can be defined forthe mean and the variance-covariance of b (and sometimes β). Our goal in this paper is to considera contamination class of priors to account for uncertainty pertaining to the base prior π0 (β, b, τ),i.e., uncertainty about the prior means of the base prior. Consequently, assuming hyper priorsfor the means β0 and b0 of the base prior is tantamount to assuming the mean of the base priorto be unknown, which is contrary to our initial assumption. Following Chib and Carlin (1999),Chib (2008), Greenberg (2008), Zheng et al. (2008) among others, hyperparameters only concern

9Note that this assumes homoskedasticity as defined in (2). We do not account for heteroskedasticity in the blockbootstrap.

10For convenience, the number of bootstrap samples BR is relatively small compared to the sample size N . Increas-ing the number of bootstrap samples does not change the results but increases the computation time considerably.Note also that this approach is many times less computationally intensive than both the 2S with MCMC on themultivariate t-distributions and the full Bayesian estimator.

10

the variance-covariance matrix of the b coefficients. Because we use g-priors in the second stagefor β and b, g0 is kept fixed and assumed known. We need only define mixtures of g-priors on theprecision matrix of b, or equivalently on h0.

Maruyama and George (2011, 2014) have proposed the following generalized hyper-g prior :

p (g) =gc−1 (1 + g)

−(c+d)

B (c, d), c > 0, d > 0, (18)

where B (·) is the Beta function. This Beta-prime (or Pearson Type-VI) hyper prior for g is a gener-alization of the so-called Pareto type-II hyper-g. Using the generalized hyper-g prior specification,the three-stage hierarchy of the model can be defined as:

First stage : y ∼ N (Xβ +Wb,Σ) , Σ = τ−1INT (19)

Second stage : β ∼ N(β0ιK1

, (τg0ΛX)−1)

, b ∼ N(b0ιK2

, (τh0ΛW )−1)

Third stage : h0 ∼ β′(c, d) → p (h0) =hc−1

0 (1 + h0)−(c+d)

B (c, d), c > 0, d > 0.

We thus use hyperparameters only on the variance (τh0ΛW )−1, i.e. only on h0. Furthermore, we

do not introduce an ε-contamination class of prior distributions for this hyperparameter of thethird stage of the hierarchy, for example, p (h0) = (1− εa)π0 (h0) + εaq (h0), as our objective is toaccount for the uncertainty about the prior means of the base prior π0 (β, b, τ). The third stagedoes depend on the priors, so the specification uses Gaussian distributions for the parameters atthe second stage and a Beta-prime distribution for the priors at the third stage. Moreover, Berger(Berger, 1985, page 232) has stressed that the choice of a specific functional form for the third stagematters little.11 Therefore we restrict the ε-contamination class of prior distributions to the firststage prior only (the second stage of the hierarchy, i.e., for (β, b, τ)).

The first step of the robust Bayesian estimator in the three-stage hierarchy is strictly similarto the one in the two-stage hierarchy. But the three-stage hierarchy differs from the two-stagehierarchy in that it introduces a generalized hyper-g prior on h0. The unconditional predictivedensity corresponding to the base prior is then given by

m (y | π0, β) =

∞∫0

m (y | π0, β, h0) p(h0)dh0

=H

B (c, d)

1∫0

(ϕ)K22 +c−1

(1− ϕ)d−1

(1 + ϕ

(R2b0

1−R2b0

))−NT2dϕ

which can be written as:

m (y | π0, β) =B(d, K2

2 + c)

B (c, d)H ×2 F1

(NT

2;K2

2+ c;

K2

2+ c+ d;−

(R2b0

1−R2b0

)), (20)

11Sinha and Jayaraman (2010a,b) studied a ML-II contaminated class of priors at the third stage of hierarchicalpriors using normal, lognormal and inverse Gaussian distributions to investigate the robustness of Bayes estimateswith respect to possible misspecification at the third stage. Their results confirmed Berger (1985)’s assertion thatthe form of the second stage prior (the third stage of the hierarchy) does not affect the Bayes decision.

11

where 2F1(.) is the Gaussian hypergeometric function (see Abramovitz and Stegun (1970) and thesupplementary appendix). As shown by Liang et al. (2008), numerical overflow is problematic formoderate to large NT and large R2

b0. As the Laplace approximation involves an integral with

respect to a normal kernel, we can follow the suggestion of Liang et al. (2008) and develop an

expansion12 after a change of variable given by log(

h0

h0+1

).

Similar to the conditional predictive density corresponding to the contaminated prior on β (seeequation (9)), the unconditional predictive density corresponding to the contaminated prior on bcan be shown to give:

m (y | q, β) =H

B (c, d)

2( h∗h∗+1 )

K22

+c

K2+2c

× F1

(K2

2 + c; 1− d; NT2 ; K2

2 + c+ 1; h∗

h∗+1 ;− h∗

h∗+1

(R2

bq

1−R2

bq

))

+

[(h∗

h∗+1

)K22

(1 +

(h∗

h∗+1

)( R2

bq

1−R2

bq

))−NT2 ]

×

B (c, d)− ( h∗h∗+1 )

c

c

×2F1

(c; d− 1; c+ 1; h∗

h∗+1

)

, (21)

where F1(·) is the Appell hypergeometric function (see Appell (1882); Slater (1966); Abramovitzand Stegun (1970) and the supplementary appendix). m (y | q, β) can also be approximated usingthe same clever transformation as in Liang et al. (2008)(see the supplementary appendix).

We have shown earlier that the posterior density of (b, τ) for the base prior π0 (b, τ | h0) in thetwo-stage hierarchy model is given by:

π∗ (b, τ | h0) = λb,h0π∗0 (b, τ | h0) +

(1− λb,h0

)q∗ (b, τ | h0) ,

with

λb,h0=

(1− ε)m (y | π0, β, h0)

(1− ε)m (y | π0, β, h0) + εm (y | q, β, h0).

Hence, we can write

λb =

∞∫0

λb,h0p(h0)dh0 =

[1 +

(ε

1− ε

).m (y | q, β)

m (y | π0, β)

]−1

. (22)

Therefore, under the base prior, the Bayes estimator of b in the three-stage hierarchy model is givenby:

b∗ (β) =

∞∫0

b∗ (β | h0) p(h0)dh0 =1

c+ d

[d · b (β) + c · b0ιK2

].

12See the supplementary appendix. For the Monte Carlo simulation study and the empirical applications, insteadof using Gaussian hypergeometric integrals 2F1 and Appel integrals F1 with Laplace approximations, we prefer tosolve the integrals numerically with adaptive quadrature methods.

12

Likewise, under the contamination class of priors, the empirical Bayes estimator of b for the three-stage hierarchy model is given by

bEB (β) =

∞∫0

bEB (β | h0) p(h0)dh0 (23)

=1

B (c, d)

b (β)( h∗h∗+1 )

c

c ×2 F1

(c;−d; c+ 1; h∗

h∗+1

)+bqιK2

( h∗h∗+1 )

c+1

c+1 ×2 F1

(c+ 1; 1− d; c+ 2; h∗

h∗+1

)+{b (β)

(1

h∗+1

)+ bqιK2

(h∗

h∗+1

)}×

B (c, d)− ( h∗h∗+1 )

c

c

×2F1

(c; d− 1; c+ 1; h∗

h∗+1

)

and the ML-II posterior density of b is given by:

π∗ (b) =

∞∫0

π∗ (b, τ) dτ = λb

∞∫0

π∗0 (b, τ) dτ +(

1− λb) ∞∫

0

q∗ (b, τ) dτ

= λbπ∗0 (b) +

(1− λb

)q∗ (b) .

π∗0 (b) is the pdf of a multivariate t-distribution with mean vector b∗(β), variance-covariance matrix(ξ0,bM

−10,b

NT−2


M0,b =(h0 + 1)

v (β)ΛW and ξ0,b = 1 +

(h0

h0 + 1

)(R2b0

1−R2b0

).

q∗ (b) is the pdf of a multivariate t-distribution with mean vector bEB (β), variance-covariance

matrix

(ξq,bM

−1q,b

NT−2


ξq,b = 1 +

(hq

hq + 1

)(R2bq

1−R2bq

)and Mq,b =

((hq + 1)

v (β)

)ΛW .

The mean of the ML-II posterior density of b is thus given by

bML−II = E [π∗ (b)] = λbE [π∗0 (b)] +(

1− λb)E [q∗ (b)]

= λbb∗(β) +(

1− λb)bEB (β) . (24)

The ML-II posterior variance-covariance matrix of b can be derived as in the two-stage hierarchymodel. The main difference with the latter relates to the definition of the Bayes estimator b∗(β),

the empirical Bayes estimator bEB (β) and the weights λb (as compared to b∗(β | h0), bEB (β | h0)

and λb,h0). Since the variance-covariance matrix of both β and b are likely underestimated (see the

13

supplementary appendix) for the same reasons as those raised previously, they are computed usingboth MCMC with multivariate t-distributions and block resampling bootstrap.

As stressed earlier, the two-stage and three-stage hierarchical models correspond to fixed andrandom effects models, respectively. In the classical framework, these models refer to entirely differ-ent assumptions about the data generating process. Bayesian analysis with non-hierarchical priorsis analogous to the fixed effects model in panel data econometrics while introducing hierarchicalpriors leads to the random effects model in panel data econometrics. These specifications can betested using the Chib method of marginal likelihood (see Chib (1995) and Koop (2003)). In partic-ular, the maximum value of this marginal likelihood determines the choice between the two-stage(FE) and the three-stage (RE) estimators. Adaptation of the Chib method to our ML-II estimatoris outside the scope of our study. Our estimator needs to be compared to classical frequentistestimators as well as the standard full hierarchical model in order to assess its relative efficiency. Inthe next section, we perform a series of simulations to unearth the relative merit of each estimator.

5. A Monte Carlo simulation study

5.1. The DGP of the Monte Carlo study

Following Baltagi et al. (2003, 2009) and Baltagi and Bresson (2012), consider the static linearmodel:

yit = x1,1,itβ1,1 + x1,2,itβ1,2 + x2,itβ2 + Z1,iη1 + Z2,iη2 + µi + uit,

for i = 1, ..., N , t = 1, ..., T, with

x1,1,it = 0.7x1,1,it−1 + δi + ζit

x1,2,it = 0.7x1,2,it−1 + θi + ςit

uit ∼ N(0, τ−1

), (δi, θi, ζit, ςit) ∼ U(−2, 2)

and β1,1 = β1,2 = β2 = 1.

1. For a random effects (RE) world, we assume that:

η1 = η2 = 0

x2,it = 0.7x2,it−1 + κi + ϑit , (κi, ϑit) ∼ U(−2, 2)

µi ∼ N(0, σ2

µ

), ρ =

σ2µ

σ2µ + τ−1

= 0.3, 0.8.

Furthermore, x1,1,it, x1,2,it and x2,it are assumed to be exogenous in that they are not corre-lated with µi and uit.

2. For a Mundlak-type fixed effects (FE) world, we assume that:

η1 = η2 = 0;

x2,it = δ2,i + ω2,it , δ2,i ∼ N(mδ2 , σ2δ2), ω2,it ∼ N(mω2

, σ2ω2

);

mδ2 = mω2= 1, σ2

δ2 = 8, σ2ω2

= 2;

µi = x2,iπ + νi, νi ∼ N(0, σ2ν), x2,i =

1

T

T∑t=1

x2,it;

σ2ν = 1, π = 0.8.

14

x1,1,it and x1,2,it are assumed to be exogenous but x2,it is correlated with the µi and weassume a constant correlation coefficient, π = 0.8.

3. For a Chamberlain-type fixed effects (FE) world, we assume that:

η1 = η2 = 0;

x2,it = δ2,i + ω2,it , δ2,i ∼ N(mδ2 , σ2δ2), ω2,it ∼ N(mω2

, σ2ω2

);

mδ2 = mω2= 1, σ2

δ2 = 8, σ2ω2

= 2;

µi = x2,i1π1 + x2,i2π2 + ...+ x2,iTπT + νi, νi ∼ N(0, σ2ν);

σ2ν = 1, πt = (0.8)T−t for t = 1, ..., T.

x1,1,it and x1,2,it are assumed to be exogenous but x2,it is correlated with the µi and weassume an exponential growth for the correlation coefficient πt.

4. For a Hausman-Taylor (HT) world, we assume that:

η1 = η2 = 1;

x2,it = 0.7x2,it−1 + µi + ϑit , ϑit ∼ U(−2, 2);

Z1,i = 1, ∀i;Z2,i = µi + δi + θi + ξi, ξi ∼ U(−2, 2);

µi ∼ N(0, σ2

µ

), and ρ =

σ2µ

σ2µ + τ−1

= 0.3, 0.8.

x1,1,it and x1,2,it and Z1,i are assumed to be exogenous while x2,it and Z2,i are endogenousbecause they are correlated with the µi but not with the uit.

For each set-up, we vary the size of the sample and the duration of the panel. We choose several(N,T ) pairs with N = 100, 500 and T = 5, 10. We set the initial values of x1,1,it and x1,2,it tozero. We next generate x1,1,it, x1,2,it, uit, ζit, uit, ςit, ω2,it over T + T0 time periods and we dropthe first T0(= 50) observations to reduce the dependence on the initial values. The robust Bayesianestimators for the two-stage hierarchy (2S) and for the three-stage hierarchy (3S) are estimatedwith ε = 0.5, though we also investigate their robustness to various values of ε.13

We must set the hyperparameters values β0, b0, g0, h0, τ for the initial distributions of β ∼N(β0ιK1

, (τg0ΛX)−1)

and b ∼ N(b0ιK2

, (τh0ΛW )−1)

. While we can choose arbitrary values for

β0, b0 and τ , the literature generally recommends using the unit information prior (UIP) to setthe g-priors.14 In the normal regression case, and following Kass and Wasserman (1995), the UIPcorresponds to g0 = h0 = 1/NT , leading to Bayes factors that behave like the Bayesian InformationCriterion (BIC).

For the three-stage hierarchy (3S), we need to choose the coefficients (c, d) of the generalizedhyper-g priors. Following Liang et al. (2008) we set c = d = 1 for the Beta-prime distribution. Inthat case, the density is shaped as a hyperbola. In order to have the same shape under the UIP

13ε = 0.5 is an arbitrary value. We implicitly assume that the amount of error in the base elicited prior is 50%. Inother words, ε = 0.5 means that we elicit the π0 prior but feel we could be as much as 50% off (in terms of impliedprobability sets). We could have chosen any value for ε.

14We chose: β0 = 0, b0 = 0 and τ = 1.

15

principle (i.e., h0 close to 1/NT ), we chose c = 0.1 and d = 1. For the three-stage hierarchy (3S),we solve the integrals numerically with adaptive quadrature methods (Davis and Rabinowitz, 1984;Press et al., 2007).

Most simulations are based upon N = 100, T = 5, ρ = 0.8 and ε = 0.5, although we investigatethe robustness of our estimators to different parameter sets. For the 2S and 3S robust estimators,we use BR = 20 samples in the block resampling bootstrap and D = 1, 000 with Dburn = 500when drawing from the multivariate t-distributions. Finally, the Full Bayesian (FB) estimators arenot derived in the paper for the sake of brevity but are presented in the supplementary appendix.When estimating the FB models, we also use D=1,000 draws with Dburn = 500.

For each experiment, we run R = 1, 000 replications and we compute the means, the standarderrors and the root mean squared errors (RMSEs) of the coefficients.

5.2. The results of the Monte Carlo study

In its most general form, the Gaussian linear mixed model can be written as (see equation (1)):

y = Xβ +Wb+ u. (25)

In what follows, we show how the usual classical estimators can be rewritten as a transformation ofthis model. We then estimate the classical and Bayesian counterparts and compare their properties.

5.2.1. The random effects world

Rewrite the general model (25) as follows:

y = Xβ + Zµµ+ u,

where u ∼ N(0,Σ), Σ = τ−1INT , Zµ = IN ⊗ ιT is (NT ×N), ⊗ is the Kronecker product, ιTis a (T × 1) vector of ones and µ is a (N × 1) vector of idiosyncratic parameters. When W ≡Zµ, the random effects, µ ∼ N

(0, σ2

µIN), are associated with the error term ν = Zµµ + u with

Var (ν) = σ2µ (IN ⊗ JT ) + σ2

uINT , where JT = ιT ι′T . This model is customarily estimated using

Feasible Generalized Least Squares (FGLS)(see Hsiao (2003) or Baltagi (2013)).Table 1 reports the results of fitting the classical FGLS model along with those from the 2S,

3S and Full Bayesian (FB) models, each in a separate panel. The true parameter values appear onthe first line of the table. The last column reports the computation time in seconds.15 Note thatthe computation time increases significantly as we move from the FGLS to the 2S-3S (bootstrap),to the 2S-3S (t-dist) and finally to the FB models. The performance of the robust estimators withblock resampling is nevertheless within an acceptable range.

The first noteworthy feature of the table is that all the estimators yield essentially the sameparameter estimates, standard errors16 and RMSEs. The main difference between the 2S and the3S lies in the estimated value of the weight λµ, which tends towards zero in the 3S. This implies that

the empirical Bayes estimator bEB(β) (or equivalently µEB(β)) accounts for almost 100% of the

weight in estimating the individual specific effects µi while the empirical Bayes estimator βEB(b)

15The simulations were conducted using R version 3.3.2 on a MacBook Pro, 2.8 GHz core i7 with 16Go 1600 MGzDDR3 ram.

16Strictly speaking, we should mention “posterior means” and “posterior standard errors” whenever we refer toBayesian estimates and “coefficients” and “standard errors” when discussing frequentist ones. For the sake of brevity,we will use “coefficients” and “standard errors” in both cases.

16

(or equivalently βEB(µ)) accounts for about 95% of the weight in estimating the β coefficients. Forthe two-stage hierarchical robust Bayesian estimator, the base prior for the specific effects b (orequivalently µ) is “more consistent” with the data as compared to the base prior for β since the

weight λb (or equivalently λµ) is around 36% while the weight λβ is around 6%. For the 3S robustBayesian estimator, to the contrary, the base prior for the specific effects b (or equivalently µ) isinconsistent with the data as the ML-II posterior density of b is close to the posterior q∗ (b | h0)

and to the empirical Bayes estimator bEB (β | h0).Using either the individual block resampling bootstrap or the multivariate t-distributions yields

essentially the same standard errors for both the 2S and 3S models.17 The table also reports thenumerical standard error (nse), often referred to as the Monte Carlo error, for the Full Bayesianestimator and the 3S with multivariate t-distributions. The nse is equal to the difference between themean of the sampled values and the true posterior mean. As a rule of thumb, as many simulationsas necessary should be conducted to ensure that the Monte Carlo error of each parameter of interestis less than approximately 10% of the sample standard error (see Brooks and Gelman (1998); Gilkset al. (1997); Koop (2003)). As shown in the table, the estimated nse for the 2S and 3S withmultivariate t-distributions and the Full Bayesian estimators easily satisfy this criterion. Moreover,Table A1 in the supplementary appendix underlines the very good behavior of the 3S bootstrapcompared to FGLS for several values of N and T , and for ρ = 0.3 and ρ = 0.8.

5.2.2. The Mundlak-type fixed effects world

In the fixed effects world, we allow the individual effects µ and the covariates X to be correlated.This is usually accounted for through a Mundlak-type (see Mundlak (1978)) or a Chamberlain-typespecification (see Chamberlain (1982)). For the Mundlak-type specification, the individual effectsare defined as: µ = (Z ′µX/T )π+$, $ ∼ N(0, σ2

$IN ) where π is a (K1 × 1) vector of parameters to

be estimated. The model can be rewritten as y = Xβ + PXπ +Zµ$+ u, where P =(IN ⊗ JT

T

)is

the between-transformation (see Baltagi (2013)). We can concatenate [X,PX] into a single matrixof observables and let Wb ≡ Zµ$.

For the Mundlak world, we compare the standard FGLS estimator on the transformed model toour robust 2S, 3S and FB estimators of that same specification. In our case, as µi = x2,iπ+ νi, thetransformed model is given by: y = x1,1β1,1 +x1,2β1,2 +x2β2 +Px2π+Zµν+u. In this specification,X = [x1,1, x1,2, x2, Px2], W = Zµ and b = ν.

The simulation results are reported in Table 2. As with the fixed effects model, all the estimatorsyield essentially the same parameter estimates, standard errors and RMSEs, except for the 2S t-distestimator. Indeed, relative to the 3S t-dist estimator, the former overestimates the standard errorsof the slope parameters as well as the variance of the individual effects, σ2

µ. This result is also foundwhen investigating the Chamberlain-type fixed effects world and the Hausman-Taylor world (seeTables 3 and 4).

Finally, note that the nse statistics underlines the accuracy of the posterior estimates of the 3St-dist and FB estimators. Furthermore, increasing the size of N and/or T (N = 500, T = 5) or(N = 100, T = 10) leads to very stable results (see Table A2 in the supplementary appendix).

17Recall that we use only BR = 20 individual block bootstrap resamples. Fortunately, the results are very robustto the value of BR. For instance, increasing BR from 20 to 200 in the random effects world increases the computationtime tenfold but yields the same results.

17

5.2.3. The Chamberlain-type fixed effects world

For the Chamberlain-type specification, the individual effects are given by µ = XΠ +$, whereX is a (N × TK1) matrix with Xi = (X ′i1, ..., X

′iT ) and Π = (π′1, ..., π

′T )′ is a (TK1 × 1) vector.

Here πt is a (K1 × 1) vector of parameters to be estimated. The model can be rewritten as:y = Xβ +ZµXΠ +Zµ$+ u. We can concatenate [X,ZµX] into a single matrix of observables andlet Wb ≡ Zµ$.

For the Chamberlain world, we compare the Minimum Chi-Square (MCS) estimator (see Cham-berlain (1982); Hsiao (2003); Baltagi et al. (2009)) to our estimators. These are based on the

transformed model: yit = x1,1,itβ1,1 +x1,2,itβ1,2 +x2,itβ2 +∑Tt=1 x2,itπt + νi +uit or y = x1,1β1,1 +

x1,2β1,2 + x2β2 + x2Π +Zµν + u. In that specification, X =[x1,1, x1,2, x2, x2

], W = Zµ and b = ν.

Table 3 once again shows that the results of the 3S bootstrap, the 3S t-dist and the FB arevery close to those of the classical MCS estimator. This is true of the slope coefficients β11, β12,β2, the πt coefficients for t = 1, ..., 5 (not reported), their standard errors and their RMSEs’, aswell as of the two variances σ2

ε and σ2µ. Increasing the size of N and/or T (N = 500, T = 5) or

(N = 100, T = 10) does not change the results qualitatively, if anything it improves marginallythe 3S estimator with individual block resampling bootstrap (see Table A2 in the supplementaryappendix).18

5.2.4. The Hausman-Taylor world

The Hausman-Taylor model (henceforth HT, see Hausman and Taylor (1981)) posits that y =Xβ + Zη + Zµµ + u, where Z is a vector of time-invariant variables, and that subsets of X (e.g.,X ′2,i) and Z (e.g., Z ′2i) may be correlated with the individual effects µ, but leave the correlationsunspecified. Hausman and Taylor (1981) proposed a two-step IV estimator. For our general model(2): y = Xβ +Wb+ u, we assume that (X ′2,i, Z

′2i and µi) are jointly normally distributed: µi(

X ′2,iZ ′2i

) ∼ N 0(

EX′2EZ′2

) ,

(Σ11 Σ12

Σ21 Σ22

) ,

where X ′2,i is the individual mean of X ′2,it. The conditional distribution of µi | X ′2,i, Z ′2i is given by:

µi | X ′2,i, Z′2i ∼ N

(Σ12Σ−1

22 .

(X ′2,i − EX′2Z ′2i − EZ′2

),Σ11 − Σ12Σ−1

22 Σ21

).

Since we do not know the elements of the variance-covariance matrix Σjk, we can write:

µi =(X ′2,i − EX′2

)θX +

(Z ′2i − EZ′2

)θZ +$i,

where $i ∼ N(0,Σ11 − Σ12Σ−1

22 Σ21

)is uncorrelated with uit, and where θX and θZ are vectors of

parameters to be estimated. In order to identify the coefficient vector of Z ′2i and to avoid possiblecollinearity problems, we assume that the individual effects are given by:

µi =(X ′2,i − EX′2

)θX + f

[(X ′2,i − EX′2

)�(Z ′2i − EZ′2

)]θZ +$i, (26)

18We only report the results for MCS and the 3S with individual block resampling bootstrap for the sake of brevity.

18

where � is the Hadamard product and f[(X ′2,i − EX′2

)�(Z ′2i − EZ′2

)]can be a nonlinear function

of(X ′2,i − EX′2

)�(Z ′2i − EZ′2

). The first term on the right-hand side of equation (26) corresponds

to the Mundlak transformation while the middle term captures the correlation between Z ′2i and µi.The individual effects, µ, are a function of PX and (f [PX � Z]), i.e., a function of the column-by-column Hadamard product of PX and Z. We can once again concatenate [X,PX, f [PX � Z]]into a single matrix of observables and let Wb ≡ Zµ$.

For our model, yit = x1,1,itβ1,1 + x1,2,itβ1,2 + x2,itβ2 + Z1,iη1 + Z2,iη2 + µi + uit or y = X1β1+x2β2 + Z1η1 + Z2η2 + Zµµ+ u. Then, we assume that

µi = (x2,i − Ex2) θX + f [(x2,i − Ex2)� (Z2i − EZ2)] θZ + νi. (27)

We propose adopting the following strategy: If the correlation between µi and Z2i is quite large(> 0.2), use f [.] = (x2,i − Ex2)

2� (Z2i − EZ2)s

with s = 1. If the correlation is weak, set s = 2. Inreal-world applications, we do not know the correlation between µi and Z2i a priori. We can use a

proxy of µi defined by the OLS estimation of µ: µ =(Z ′µZµ

)−1Z ′µy where y are the fitted values

of the pooling regression y = X1β1+ x2β2 + Z1η1 + Z2η2 + ζ. Then, we compute the correlationbetween µ and Z2. In our simulation study, it turns out the correlations between µ and Z2 arelarge: 0.97 and 0.70 when ρ = 0.8, and ρ = 0.3, respectively. Hence, we choose s = 1. In thisspecification, X = [x1,1, x1,2, x2, Z1, Z2, Px2, f [Px2 � Z2]], W = Zµ and b = ν.

Table 4 compares results of the classical IV estimator to those of the robust hierarchical (2Sand 3S) and Full Bayesian estimators. Once again, the results are very close to one another. Thisis true for the slope coefficients β11, β12 of the exogenous time-varying variables x1,1,it, xi,1,2,it, forthe slope coefficients β2 of the endogenous time-varying variable xi,2,it, for the coefficient η1 of thetime-invariant variable Z1,i (uncorrelated with µi) and for the remainder variance σ2

ε . On the otherhand, the 3S bootstrap, the 3S t-dist and the FB estimators yield slightly upward biased estimates ofη2, the coefficient associated with the time-invariant variable Z2,i which is itself correlated with µi.While the biases are relatively small (4.53%, 4.54% and 5.99%, respectively), they are insensitive tothe sample size, N , but tend to taper off with T (see Table A3 in the supplementary appendix).19

Interestingly, the standard errors of that same coefficient are considerably smaller when using theBayesian estimators, irrespective of N and T . Consequently, the RMSEs of the Bayesian estimatorsare smaller and the 95% confidence intervals are narrower and entirely nested within those obtainedwith the IV procedure of Hausman-Taylor. The slight bias is thus entirely offset by increasedprecision.

Finally, note that both the IV and the Bayesian estimators yield biased estimates of σ2µ. While

the former overestimates the true value and the latter does the opposite, only the IV estimatorbehaves better when N and/or T increase.

It must be noted that the full Bayesian (FB) estimator sometimes yield smaller RMSEs for β11,β12 and β2 than the classical (FGLS, Mundlak, MCS, IV-HT) estimators and the 3S estimators inthe Chamberlain world and more specifically in the Hausman-Taylor world (see Tables 3 and 4).Since N(= 100) is relatively small, the estimated standard errors are likely underestimated. AsN gets larger (e.g., N = 500, see Table A4 in the supplementary appendix), the coefficients, SEsand RMSEs of the FB become similar to those of the MCS, IV and 3S bootstrap estimators. Note

19Table A3 only reports the results for the IV procedure and the 3S with individual block resampling bootstrapto save space.

19

that the computational burden of the full Bayesian method is measured in hours and even dayscompared to the ε-contamination approach.20

5.2.5. Sensitivity to ε-contamination values

Table 5 investigates the sensitivity of the Hausman-Taylor estimator 21 with respect to ε, thecontamination part of prior distributions, which varies between 0 and 90%. As shown, most param-eter estimates are relatively insensitive to ε. The only noteworthy change concerns the estimatedvalue of λµ in the 2S bootstrap estimator. It more or less corresponds to (1 − ε). This particular

relation may occur whenever h/(h+ 1

)= h0/ (h0 + 1) and R2

b0/(1−R2

b0

)= R2

bq/(

1−R2bq

)(see

the definition of λb,h0 in section 3.2). For the 3S bootstrap, the estimated standard errors of β2

are much larger than those obtained from usual HT method. But for larger N or larger T , thisdoes not occur (see Table A3 and A4 in the supplementary appendix). The observed stability ofthe coefficients estimates stems from the fact that the base prior is not consistent with the data asthe weight λβ → 0. The ML-II posterior mean of β is thus close to the posterior q∗ (β | g0) and to

the empirical Bayes estimator βEB (µ | g0). Hence, the numerical value of the ε-contamination, forε 6= 0, does not seem to play an important role in our simulated worlds.

As suggested by a referee, we conducted further simulations with ε = 0. Table 5 reportsthe results for the Hausman-Taylor model.22 Under the null, H0 : ε = 0, it follows that theweights λβ,g0 = 1 and λb,h0

= 1 so that the restricted ML-II estimator of β is given by βrest =

β∗(b | g0). Under H1 : ε 6= 0 the unrestricted estimator is βun

(≡ βML−II

)= λβ,g0β∗(b | g0) +(

1− λβ,g0)βEB (b | g0) . The restricted ML-II estimator β∗ (b | g0) is the Bayes estimator under

the base prior g0. It differs from the full Bayesian estimator (FB) (described in the supplementaryappendix) in that the latter is based on other priors. The use of Zellner’s g-priors leads to morediffuse priors than those of the FB. In other words, the FB is likely more sensitive to the choiceof priors than our ML-II estimator even when ε = 0. The full Bayesian estimator (FB) doesnot guarantee the posterior robustness of our ML-II estimator. Additionally, the restricted ML-II estimator (ε = 0) constrains the model to rely exclusively on a base elicited prior which isimplicitly assumed error-free. This is a strong assumption. Applying Theorem 1 of Magnus and

Durbin (1999), we can derive the conditions under which MSE(βrest

)−MSE

(βun

)is positive

or negative semidefinite (see the supplementary appendix).23 As shown in Table 5 for ε = 0,

20Recall that in our simulation study, we only use 1, 000 draws and 500 burn-in draws for the full Bayesian estimator(with N = 500, T = 5 and 1, 000 replications). Had we used more draws (as is typically done in MCMC analyses),for instance 10, 000 draws and 5, 000 burn-in draws, then the computing time would have been approximately 4 days.The computation time of our 3S bootstrap estimators would have taken only 3 hours.

21This exercise could be conducted for the other worlds (RE, FE) but we only report the results for the Hausman-Taylor world for the sake of brevity.

22Note that when ε = 0, the unconditional predictive density corresponding to the base prior, m (y | π0, β) , dependson a Gaussian hypergeometric function which can not be numerically evaluated with the Laplace approximation norwith adaptive quadrature methods if Zellner’s g-priors are too large (e.g., g0 > 1).

23The difference of MSE is given by:

MSE(βrest

)−MSE

(βun

)= QQ′ −D.

HenceMSE

(βrest

)≤ (resp.≥) MSE

(βun

)iff Q′D−Q ≤ (resp.≥) 1,

20

the β’s (of the 2S and 3S bootstrap) are slightly downward-biased, and even more so for the 2Sbootstrap η’s and the 3S bootstrap η1. Moreover, the estimated variance of the specific effects,σ2µ, are downward biased in both the 2S bootstrap (σ2

µ = 3.53) and the 3S bootstrap (σ2µ = 3.38)

estimators. Irrespective of the value of ε, the 3S bootstrap standard errors of β11 and β12 are closeto those of the Hausman-Taylor IV estimator. However, when ε = 0 the 3S bootstrap standarderrors of η2 is clearly downward-biased compared to those of the 2S and the 3S bootstrap estimatorswhen ε 6= 0. This also holds true for the RMSE of this coefficient.

The restricted ML-II estimator (ε = 0) depends on the value of Zellner’s g-priors, g0 (andh0 for b). Whatever the value of ε, we have chosen g0 and h0 so that the prior is as diffuse aspossible. A good strategy is to follow the unit information prior (UIP) principle suggested by Kassand Wasserman (1995) which corresponds to g0 = h0 = 1/NT . This leads to Bayes factors thatbehave like the Bayesian Information Criterion (BIC). In that case, the difference in the MSEs of theparameter estimates computed with ε ∈ ]0, 1[ and those with ε = 0 could be small since in both casesthe elicited prior is diffuse. From Table 5, and Tables A4 and A5 in the supplementary appendix,the results differ from the full Bayesian method. In Table A5, we investigate the sensitivity of the 3Sbootstrap estimator when both ε and Zellner’s g-priors change. When ε = 0.5, increasing Zellner’sg-priors from g0 = h0 = 1/NT to g0 = h0 = 0.1 has little impact on the results. On the otherhand, when ε = 0 (with g0 = h0 = 1/NT ), the β’s and η1 are biased and the estimated variance ofthe specific effects, σ2

µ, is downward-biased (σ2µ = 3.38) as indicated previously. Increasing Zellner’s

g-priors (g0 = h0 = 0.1) when ε = 0, worsens the biases on all the parameter estimates, andespecially for the two η’s and σ2

µ (3.30 instead of 4). Moreover, the RMSEs of all the coefficientsincrease significantly, thus emphasizing the differences in the MSEs when we move from ε = 0 toε 6= 0.

Table 5 reports the results for ε = 0.1, 0.5, 0.9. The RMSEs are almost the same across the 3Sbootstrap specifications. Recall that this estimator is data driven and implicitly adjusts the weightsto the different values of ε-contamination. This may by why, even though the choice of ε = 0.5 issomewhat arbitrary, the adjustment compensates for it not being optimal (see Berger (1985)). Apossible guide for the choice of parameters of the 3S estimator is as follows:

• First, choose g0 and h0 so that the prior is as diffuse as possible. A good approach is to followthe unit information prior (UIP) principle suggested by Kass and Wasserman (1995) whichcorresponds to g0 = h0 = 1/NT .

• Second, choose any ε value lying in ε ∈ ]0, 1[.

with

with Q =(1− λβ,g0

)(β∗(b | g0)− βEB (b | g0))

and D =(1− λβ,g0

)[V ar [q∗ (β | g0)]− V ar [π∗0 (β | g0)]]

+λβ,g0(1− λβ,g0

)(β∗(b | g0)− βEB (b | g0)) (β∗(b | g0)− βEB (b | g0))′ ,

where D− denotes the generalized inverse of D. If MSE(βrest

)and MSE

(βun

)are two positive semidefinite

matrices, the notationMSE(βrest

)≤MSE

(βun

)means thatMSE

(βun

)−MSE

(βrest

)is positive semidefinite.

In contrast, the difference of MSE between our ML-II estimator and the FB cannot be defined analytically but onlybe numerically evaluated with our simulation studies.

21

5.2.6. Departure from normality

Table 6 investigates the robustness of the estimators to a non-normal framework. The remainderdisturbances, uit, are now assumed to follow a right-skewed t-distribution instead of the usualnormal distribution. The parameters of the right-skewed t-distribution are mean = 0, degrees offreedom = 3, and shape = 2 (see Fernandez and Steel (1998)). Our 3S bootstrap estimator yieldssimilar slope parameter estimates to those of the classical estimators, i.e. Mundlak FGLS, theMCS or the IV. The main difference concerns the estimates of the remainder variance σ2

u and theindividual effects variance σ2

µ. Indeed, in the Mundlak-type fixed effects world, the estimate of σ2u

computed from the 3S bootstrap is closer to the theoretical value than that of the FGLS estimate.In the Chamberlain-type fixed effects world, the estimate of σ2

µ (resp. σ2u) computed from the 3S

bootstrap estimator is closer (resp. is further) to the theoretical value than that of the standardMCS estimator. Finally, the 3S bootstrap estimates of σ2

µ and σ2u in the Hausman-Taylor world are

both closer to their true values than those of the HT estimator.Yet, the more interesting result concerns the precision and the bias of the estimate of η2 in the

Hausman-Taylor world. Notice first that the standard error of the 3S bootstrap estimate is onceagain much smaller than what obtains using the standard HT estimator (0.2039 versus 0.3478).Consequently, the 95% confidence interval of the 3S bootstrap estimator is much narrower andentirely nested within the one obtained with the IV procedure of Hausman-Taylor ([0.6189; 1.418]versus [0.3180; 1.6814]). Second, it is worth emphasizing that in this non-Gaussian framework, thebias of η2 has decreased significantly (1.88% versus 4.53%) and that the estimate of λβ , while stillrelatively small, is now more important.

Following the suggestions of the referees, we further investigated the consequences of relaxingthe normality of the remainder terms. We first simulated the model using a t-distribution thatexhibited more right-skewness, i.e. with mean zero, five degrees of freedom and shape param-eter of 3. The simulation results show that increasing the skewness amplifies the gains of ourε-contamination methodology over the standard Hausman and Taylor (1981) estimator (see TableA6 of the supplementary appendix). Second, we assumed instead that the remainder terms followeda χ2 distribution with 2 degrees of freedom. The results in Table A7 of the supplementary appendixclearly indicate that our approach is preferable to the Hausman-Taylor estimator.

5.2.7. Sensitivity to endogenous covariates

As shown above, our robust 3S bootstrap estimator exhibits interesting properties in Gaussianand non-Gaussian frameworks. As a final check on its properties, it is worth investigating itsrelative behavior in the context of endogenous covariates.24 We focus on the Hausman-Taylor andthe Chamberlain worlds with N = 100 and T = 5. For the Hausman-Taylor world, we assume that

x2,it = 0.7x2,it−1 + µi + ϑit with ϑit ≡ uit and uit ∼ N(0, 8),

as opposed to assuming ϑit ∼ U(−2, 2) above. In addition to being correlated with µi and Z2,i,x2,it is now an endogenous covariate since it is correlated with yit through uit. We have chosena large variance for uit so as to amplify the potential impact of the endogeneity problem on theresults.

For the Chamberlain world, we assume that

x2,it = ρx2x2,it−1 + uit , uit ∼ N(0, σ2

u), σ2u = 8 and ρx2

= 0.7.

24We are grateful to Richard Blundell for this suggestion and for helpful comments.

22

The covariance between x2,it and uit is given by E [x2,ituit] ' σ2u

∑t−1j=0 ρ

2jx2

= σ2u

(1− ρ2t

x2

)/(1− ρ2

x2

).

It ranges from σ2u to approximately σ2

u/(1− ρ2

x2

)' 1.96σ2

u as t goes from 1 to T . The specific

effect µi =∑Tt=1 x2,i,tπt + νt '

∑Tt=1

(∑t−1j=0 ρ

jx2uit−j

)πt + νt is also correlated with uit and the

covariance is given by E [µiuit] = σ2u

∑Tt=1 πt = σ2

u

∑Tt=1 (0.8)

T−t= 5σ2

u

(1− 0.8T

).

Table 7 presents the results for the Chamberlain world in the upper panel and those of theHausman-Taylor world in the bottom one. As for the Chamberlain world, the endogeneity has littleif any impact on both the MCS and the robust Bayesian estimators. The parameter estimates arevery close to those found in Table 3. The correlation between x2,it and uit increases the standarderror (and the RMSE) of x2,it but less so for the other parameter estimates. Note that the Bayesianestimates of σ2

µ are closer to the true value than what obtains from the MCS estimator.In the Hausman-Taylor world, as opposed to the Chamberlain world, the classical IV estimator

is strongly affected by the endogeneity problem. Oddly, the estimate of the coefficient associatedwith x2,it and its standard error are little impacted. Rather, the estimates of the intercept andof Z2i lose all statistical significance. Likewise, the estimated variance of the specific effects σ2

µ

is strongly upward biased (+34.67%) while the estimated variance of the remainder terms σ2u is

unbiased. Lastly, the standard errors of the exogenous variables x1,1,it and x1,2,it are now twiceas large as those found in Table 4 but nevertheless lead to coefficients which are still statisticallysignificant.

The robust estimators (3S bootstrap and 3S t-dist) and the full Bayesian estimator (FB) behavebetter, especially in terms of precision. The standard errors of the intercept and the coefficient ofZ2i are smaller and lead to coefficients that are statistically significant.25 The slope parametersare all slightly biased (except β2), and in particular the one associated with Z2i. Nevertheless,the null hypothesis that β1,1 = β1,2 = β2 = η1 = η2 = 1 cannot be rejected. Furthermore, asthe standard errors of these coefficients are relatively small, the RMSEs are quite acceptable andoften smaller than those of the usual IV estimator. In fact, the IV estimator only outperformsthe Bayesian estimators in that it provides a slightly less biased estimate of the variance of thespecific effects, σ2

µ. The relative (negative) biases of the 3S bootstrap, 3S t-dist and FB amountto 46.34%, 48.11%, 47.04%, respectively. This is somewhat more than the (positive) bias of theHausman-Taylor IV estimator which amounts to 35%. But more interestingly, the intercept η1 andthe slope parameter η2 of the time-invariant variable Z2i are strongly significant as compared tothe classical IV estimator. Our 3S robust Bayesian estimator is thus more robust to endogeneityproblems than the Hausman-Taylor estimator.

One may wonder why Chamberlain’s MCS estimator is more robust to an endogeneity problemthan the Hausman-Taylor IV estimator. Recall that the MCS, sometimes called the optimal mini-mum distance estimator, is qualitatively similar to the GMM estimator and whose properties leadto more efficient estimates than the 2SLS instrumental variable of the Hausman-Taylor estimator.Notwithstanding this, our robust 3S estimators are well-behaved in the face of endogeneity problemsas evidenced above.

25The values of the coefficients c and d of the generalized hyper-g priors appear to play an important role in theprecision of the intercept. Recall that we have chosen c = d = 1 for the Beta-prime distribution to increase theprecision. With c = 0.1 and d = 1, the intercept is not significantly different from zero.

23

6. Conclusion

To our knowledge, our paper is the first to analyze the static linear panel data model using an ε-contamination approach within a two-stage and three-stage hierarchical approach. The main benefitof this approach is its ability to extract more information from the data than the classical Bayesestimator with a single base prior. In addition, we show that our approach encompasses a variety ofclassical or frequentist specifications such as random effects, Mundlak-type and Chamberlain-typefixed effects and Hausman-Taylor models. The frequentist approach, on the other hand, requiresseparate estimators for each model, respectively FGLS, MCS and IV.

Following Chaturvedi and Singh (2012), we estimate the Type-II maximum likelihood (ML-II)posterior distribution of the slope coefficients and the individual effects using a two-step procedure.If the base prior is consistent with the data, the ML-II posterior density gives more weight to theconditional posterior density derived from the elicited prior. Conversely, if the base prior is notconsistent with the data, the ML-II posterior density is closer to the conditional posterior densityderived from the ε-contaminated prior.

The finite sample performance of the two-stage and three-stage hierarchical models are investi-gated using extensive Monte Carlo experiments. The experimental design includes a random effectsworld, a Mundlak-type fixed effects world, a Chamberlain-type fixed effects world and a Hausman-Taylor-type world. The simulation results underscore the relatively superior performance of thethree-stage hierarchy estimator, irrespective of the data generating process considered. The biasesand the RMSEs are close and often smaller than those of the conventional (classical) estimators.In the two-stage hierarchy, their exists a trade-off between the Bayes estimators and the empiricalBayes estimators. In the three-stage hierarchy, this trade-off vanishes and only the empirical Bayesestimator matters in the estimation of the coefficients and the individual effects.

We also investigate the sensitivity of the estimators to the contamination part of the priordistribution. It turns out most parameter estimates are relatively stable. Next we investigatethe robustness of the estimators when the remainder disturbances are assumed to follow either aright-skewed t-distribution or a χ2 distribution. Our robust estimators globally outperform theclassical estimators both in terms of precision and bias. Finally, we investigate the consequencesof introducing an endogenous covariate. Our results show that both the MCS and our robust 3Sestimators perform well. On the other hand, our 3S estimator clearly outperforms the Hausman-Taylor IV estimator both in terms of precision and bias.

The robust Bayesian approach we propose is arguably a relevant all-in-one panel data framework.In future work we intend to broaden its scope by addressing issues such as heteroskedasticity,autocorrelation of residuals, general IV, dynamic and spatial models.

References

Abramovitz, M., Stegun, I.A., 1970. Handbook of Mathematical Functions. Dover Publications,Inc. New York.

Andersson, M., Karlsson, S., 2001. Bootstrapping error component models. Computational Statis-tics 16, 221–231.

Appell, P., 1882. Sur les fonctions hypergeometriques de deux variables. Journal de MathematiquesPures et Appliquees 3, 173–216.

Baltagi, B.H., 2013. Econometric Analysis of Panel Data. Fifth ed., Wiley, Chichester, UK.

24

Baltagi, B.H., Bresson, G., 2012. A robust Hausman-Taylor estimator, in: Baltagi, B.H., Hill, C.,Newey, W., White, H. (Eds.), Advances in Econometrics: Essays in Honor of Jerry Hausman.Emerald Group Publishing Limited. volume 29, pp. 175–214.

Baltagi, B.H., Bresson, G., Pirotte, A., 2003. Fixed effects, random effects or Hausman-Taylor? apretest estimator. Economics Letters 79, 361–369.

Baltagi, B.H., Bresson, G., Pirotte, A., 2009. Testing the fixed effects restrictions? A Monte Carlostudy of chamberlain’s minimum chi-square test. Statistics and Probability Letters 79, 1358–1362.

Bellman, L., Breitung, J., Wagner, J., 1989. Bias correction and bootstrapping of error componentmodels for panel data: Theory and applications. Empirical Economics 14, 329–342.

Berger, J., 1985. Statistical Decision Theory and Bayesian Analysis. Springer, New York.

Berger, J., Berliner, M., 1984. Bayesian input in Stein estimation and a new minimax empiricalBayes estimator. Journal of Econometrics 25, 87–108.

Berger, J., Berliner, M., 1986. Robust Bayes and empirical Bayes analysis with ε-contaminatedpriors. Annals of Statistics 14, 461–486.

Bresson, G., Hsiao, C., 2011. A functional connectivity approach for modeling cross-sectionaldependence with an application to the estimation of hedonic housing prices in Paris. Advancesin Statistical Analysis 95, 501–529.

Brooks, S.P., Gelman, A., 1998. Alternative methods for monitoring convergence of iterative sim-ulations. Journal of Computational and Graphical Statistics 7, 434–455.

Chamberlain, G., 1982. Multivariate regression models for panel data. Journal of Econometrics 18,5–46.

Chaturvedi, A., 1996. Robust Bayesian analysis of the linear regression. Journal of StatisticalPlanning and Inference 50, 175–186.

Chaturvedi, A., Singh, A., 2012. Robust Bayesian analysis of autoregressive fixed effects panel datamodel. Working Paper, Department of Statistics, University of Allahabad.

Chib, S., 1995. Marginal likelihood from the Gibbs sampler. Journal of the American StatisticalAssociation 90, 1313–1321.

Chib, S., 2008. Panel data modeling and inference: a Bayesian primer, in: Matyas, L., Sevestre, P.(Eds.), The Handbook of Panel Data, Fundamentals and Recent Developments in Theory andPractice, pp. 479–516.

Chib, S., Carlin, B.P., 1999. On MCMC sampling in hierarchical longitudinal models. Statisticsand Computing 9, 17–26.

Davis, P.J., Rabinowitz, P., 1984. Methods of Numerical Integration. 2nd ed., Academic Press,New York.

Dempster, A.P., 1977. Examples relevant to the robustness of applied inference, in: Gupta, S.S.,Moore, D.S. (Eds.), Statistical Decision Theory and Related Topics II. Academic Press, NewYork, pp. 121–138.

25

Fernandez, C., Ley, E., Steel, M.F.J., 2001. Benchmark priors for Bayesian model averaging. Journalof Econometrics 100, 381–427.

Fernandez, C., Steel, M.F.J., 1998. On Bayesian modeling of fat tails and skewness. Journal of TheAmerican Statistical Association 93, 359–371.

Gilks, W.R., Richardson, S., Spiegelhalter, D.J., 1997. Markov Chain Monte Carlo in Practice. 2nded., Chapman & Hall, London, UK.

Good, I.J., 1965. The Estimation of Probabilities. MIT Press, Cambridge, MA.

Greenberg, E., 2008. Introduction to Bayesian Econometrics. Cambridge University Press, Cam-bridge, UK.

Hausman, J.A., Taylor, W.E., 1981. Panel data and unobservable individual effects. Econometrica49, 1377–1398.

Hill, B., 1980. Invariance and robustness of the posterior distribution of characteristics of a finitepopulation with reference to contingency tables and the sampling of species, in: Zellner, A.(Ed.), Bayesian Analysis in Econometrics and Statistics, Essays in Honor of Harold Jeffreys.North-Holland, Amsterdam, pp. 383–391.

Hsiao, C., 2003. Analysis of Panel Data. Second ed., Cambridge University Press, Cambridge, MA.

Hsiao, C., Pesaran, M.H., 2008. Random coefficient models, in: The Handbook of Panel Data,Fundamentals and Recent Developments in Theory and Practice. Matyas, L. and Sevestre, P.,pp. 185–214.

Kapetanios, G., 2008. A bootstrap procedure for panel datasets with many cross-sectional units.Econometrics Journal 11, 377–395.

Kass, R.E., Wasserman, L., 1995. A reference Bayesian test for nested hypotheses and its relation-ship to the Schwarz criterion. Journal of the American Statistical Association 90, 928–934.

Koop, G., 2003. Bayesian Econometrics. Wiley, New York.

Laird, N.M., Louis, T.A., 1987. Empirical Bayes confidence intervals based on bootstrap samples.Journal of the American Statistical Association, 82, 739–750.

Laird, N.M., Ware, J.H., 1982. Random-effects models for longitudinal data. Biometrics 38, 963–74.

Liang, F., Paulo, R., Molina, G., Clyde, M.A., Berger, J.O., 2008. Mixtures of g priors for Bayesianvariable selection. Journal of the American Statistical Association 103, 410–423.

Lindley, D., Smith, A., 1972. Bayes estimates for the linear model. Journal of the Royal StatisticalSociety, Series B (Methodological), 34, 1–41.

Magnus, J.R., Durbin, J., 1999. Estimation of regression coefficients of interest when other regres-sion coefficients are of no interest. Econometrica 67, 639–643.

Maruyama, Y., George, E.I., 2011. Fully Bayes factors with a generalized g-prior. Annals ofStatistics 39, 2740–2765.

26

Maruyama, Y., George, E.I., 2014. Posterior odds with a generalized hyper-g prior. EconometricReviews 33, 251–269.

Moreno, E., Pericchi, L.R., 1993. Bayesian robustness for hierarchical ε-contamination models.Journal of Statistical Planning and Inference 37, 159–167.

Mundlak, Y., 1978. On the pooling of time series and cross-section data. Econometrica 46, 69–85.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical Recipes: The Artof Scientific Computing. 3rd ed., Cambridge University Press, New York.

Rendon, S.R., 2013. Fixed and random effects in classical and Bayesian regression. Oxford Bulletinof Economics and Statistics, 75, 460–476.

Robert, C.P., 2007. The Bayesian Choice. From Decision-Theoretic Foundations to ComputationalImplementation. 2nd ed., Springer, New York, USA.

Rubin, H., 1977. Robust Bayesian estimation, in: Gupta, S.S., Moore, D.S. (Eds.), StatisticalDecision Theory and Related Topics II. Academic Press, New York, pp. 351–356.

Sinha, P., Jayaraman, P., 2010a. Bayes reliability measures of lognormal and inverse Gaussiandistributions under ML-II ε-contaminated class of prior distributions. Defence Science Journal60, 442–450.

Sinha, P., Jayaraman, P., 2010b. Robustness of Bayes decisions for normal and lognormal distri-butions under hierarchical priors. MPRA Working Paper No. 22416. Faculty of ManagementStudies, University of Delhi.

Slater, L.J., 1966. Generalized Hypergeometric Functions. Cambridge University Press, Cambridge,UK.

Smith, A., 1973. A general Bayesian linear model. Journal of the Royal Statistical Society, SeriesB (Methodological) 35, 67–75.

White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test forheteroskedasticity. Econometrica 48, 817–838.

Zellner, A., 1986. On assessing prior distributions and Bayesian regression analysis with g-priordistribution, in: Bayesian Inference and Decision Techniques: Essays in Honor of Bruno deFinetti, Studies in Bayesian Econometrics. North-Holland, Amsterdam. volume 6, pp. 389–399.

Zheng, Y., Zhu, J., Li, D., 2008. Analyzing spatial panel data of cigarette demand: a Bayesianhierarchical modeling approach. Journal of Data Science 6, 467–489.

27

Tab

le1:

Ran

dom

Eff

ects

Worl

d(F

GL

S,

Rob

ust

2S

,3S

an

dF

ull

Bay

esia

n)

N=

100,T

=5,ρ

=0.8

,ε

=0.5

,R

eplica

tion

s=1,0

00

β11

β12

β2

σ2 u

σ2 µ

λβ

λµ

Com

pu

tati

on

Tim

e(s

ecs.

)T

rue

11

11

4F

GL

Sco

ef0.9

986

0.9

992

1.0

005

1.0

023

3.9

701

47.7

03

se0.0

337

0.0

334

0.0

341

rmse

0.0

340

0.0

342

0.0

333

2Sb

oot

stra

pco

ef0.9

977

0.9

983

0.9

997

0.9

975

3.9

405

0.0

643

0.3

574

351.2

75

se0.0

336

0.0

333

0.0

344

rmse

0.0

352

0.0

351

0.0

350

3Sb

oot

stra

pco

ef0.9

972

0.9

982

0.9

994

0.9

974

3.9

367

0.0

660

<10−

4558.1

13

se0.0

327

0.0

327

0.0

337

rmse

0.0

346

0.0

347

0.0

341

2St-

dis

t.co

ef0.9

915

0.9

990

1.0

010

0.9

962

4.0

150

0.0

580

0.3

708

1,9

80.5

79

se0.0

410

0.0

413

0.0

402

rmse

0.0

418

0.0

413

0.0

402

nse

0.0

015

0.0

015

0.0

015

3St-

dis

t.co

ef0.9

918

0.9

988

1.0

009

0.9

959

3.9

902

0.0

480

<10−

42,6

40.0

01

se0.0

373

0.0

375

0.0

377

rmse

0.0

382

0.0

375

0.0

376

nse

0.0

016

0.0

016

0.0

017

FB

coef

0.9

987

0.9

992

1.0

004

0.9

986

3.9

101

7,7

78.2

72

se0.0

340

0.0

342

0.0

333

rmse

0.0

341

0.0

342

0.0

333

nse

0.0

017

0.0

016

0.0

017

2S

boots

trap

(3S

boots

trap

):

2S

(or

3S

)w

ith

ind

ivid

ual

blo

ckre

sam

pli

ng

boots

trap

.

2S

(3S

)t-

dis

t:2S

(3S

)w

ith

mu

ltiv

ari

atet-

dis

trib

uti

on

s.F

B:

Fu

llB

ayes

ian

.

28

Tab

le2:

Mu

nd

lak-t

yp

eF

ixed

Eff

ects

Worl

d(F

GL

S,

Rob

ust

2S

,3S

an

dF

ull

Bay

esia

n)

N=

100,T

=5,ρ

=0.8

,ε

=0.5

,re

plica

tion

s=1,0

00

β11

β12

β2

πσ

2 uσ

2 µλβ

λµ

Com

pu

tati

on

Tim

e(s

ecs.

)T

rue

11

10.8

16.3

334

Mu

nd

lak

coef

1.0

014

1.0

010

1.0

012

0.7

992

0.9

981

6.3

431

51.1

84

se0.0

231

0.0

232

0.0

354

0.0

473

rmse

0.0

236

0.0

242

0.0

348

0.0

467

2Sb

oot

stra

pco

ef1.0

000

0.9

995

1.0

010

0.8

019

0.9

928

6.3

905

0.0

008

0.4

136

369.3

46

se0.0

289

0.0

293

0.0

344

0.0

482

rmse

0.0

310

0.0

311

0.0

353

0.0

487

3Sb

oot

stra

pco

ef1.0

002

1.0

005

1.0

010

0.8

017

0.9

918

6.3

690

0.0

006

<10−

4576.0

94

se0.0

261

0.0

263

0.0

345

0.0

463

rmse

0.0

271

0.0

289

0.0

356

0.0

473

2St-

dis

tco

ef0.9

953

1.0

003

1.0

011

0.8

376

0.9

917

7.7

819

0.0

090

0.4

050

2,0

40.1

13

se0.0

394

0.0

404

0.0

348

0.2

083

rmse

0.0

396

0.0

404

0.0

348

0.2

116

nse

0.0

014

0.0

014

0.0

015

0.0

042

3St-

dis

tco

ef0.9

967

1.0

004

1.0

011

0.8

030

0.9

918

6.4

045

0.0

060

<10−

42,7

96.0

30

se0.0

304

0.0

316

0.0

349

0.0

492

rmse

0.0

306

0.0

316

0.0

349

0.0

493

nse

0.0

015

0.0

015

0.0

015

0.0

030

FB

coef

1.0

014

1.0

011

1.0

012

0.7

992

0.9

938

6.3

182

8,4

16.1

63

se0.0

237

0.0

245

0.0

348

0.0

467

rmse

0.0

237

0.0

245

0.0

348

0.0

467

nse

0.0

012

0.0

013

0.0

017

0.0

025

2S

boots

trap

(3S

boots

trap

):

2S

(or

3S

)w

ith

ind

ivid

ual

blo

ckre

sam

plin

gb

oots

trap

.

2S

(3S

)t-

dis

t:2S

(3S

)w

ith

mu

ltiv

ari

atet-

dis

trib

uti

on

s.F

B:

Fu

llB

ayes

ian

.

29

Tab

le3:

Ch

amb

erla

in-t

yp

eF

ixed

Eff

ects

Worl

d(M

CS

,R

ob

ust

2S

,3S

an

dF

ull

Bay

esia

n)

N=

100,T

=5,ρ

=0.8

,ε

=0.5

,re

plica

tion

s=1,0

00

β11

β12

β2

σ2 u

σ2 µ

λβ

λµ

Com

pu

tati

on

Tim

e(s

ecs.

)T

rue

11

11

96.2

228

MC

Sco

ef1.0

025

0.9

989

1.0

018

1.0

038

94.6

900

17.0

07

se0.0

255

0.0

256

0.0

346

rmse

0.0

286

0.0

267

0.0

372

2Sb

oot

stra

pco

ef0.9

934

0.9

961

1.0

009

0.9

935

95.6

672

<10−

40.4

136

343.3

61

se0.0

301

0.0

304

0.0

345

rmse

0.0

337

0.0

299

0.0

368

3Sb

oot

stra

pco

ef0.9

946

0.9

959

1.0

015

0.9

955

95.2

535

<10−

4<

10−

4531.1

74

se0.0

267

0.0

268

0.0

345

rmse

0.0

296

0.0

276

0.0

367

2St-

dis

tco

ef0.9

832

0.9

935

1.0

011

0.9

941

108.1

559

<10−

40.4

628

2,1

44.2

38

se0.0

452

0.0

426

0.0

362

rmse

0.0

482

0.0

430

0.0

362

nse

0.0

015

0.0

015

0.0

015

3St-

dis

tco

ef0.9

891

0.9

955

1.0

011

0.9

947

96.6

751

<10−

4<

10−

42,7

94.7

58

se0.0

322

0.0

300

0.0

361

rmse

0.0

340

0.0

304

0.0

361

nse

0.0

015

0.0

015

0.0

016

FB

coef

1.0

007

0.9

996

1.0

015

0.9

970

96.1

373

8,7

97.5

42

se0.0

251

0.0

230

0.0

362

rmse

0.0

251

0.0

230

0.0

362

nse

0.0

013

0.0

013

0.0

019

2S

boots

trap

(3S

boots

trap

):

2S

(or

3S

)w

ith

ind

ivid

ual

blo

ckre

sam

plin

gb

oots

trap

.

2S

(3S

)t-

dis

t:2S

(3S

)w

ith

mu

ltiv

ari

atet-

dis

trib

uti

on

s.F

B:

Fu

llB

ayes

ian

.T

he

para

met

ersπt

are

om

itte

dfr

om

the

tab

le.

30

Tab

le4:

Hau

sman-T

aylo

rW

orl

d(I

V,

Rob

ust

2S

,3S

an

dF

ull

Bay

esia

n)

N=

100,T

=5,ρ

=0.8

,ε

=0.5

,re

plica

tion

s=1,0

00

β11

β12

β2

η 1η 2

σ2 u

σ2 µ

λβ

λµ

Com

pu

tati

on

Tim

e(s

ecs.

)T

rue

11

11

11

4H

Tco

ef0.

9976

0.9

981

1.0

037

1.0

173

0.9

958

0.9

924

4.2

904

3.6

37

se0.

0395

0.0

394

0.0

444

0.2

143

0.1

857

rmse

0.04

050.0

385

0.0

456

0.2

071

0.1

903

2Sb

oot

stra

pco

ef0.

9965

0.9

983

1.0

002

1.0

031

1.0

320

0.9

935

3.8

244

<10−

40.5

041

335.8

15

se0.

0529

0.0

506

0.0

624

0.2

056

0.1

029

rmse

0.05

300.0

506

0.0

624

0.2

057

0.1

078

3Sb

oot

stra

pco

ef0.

9905

0.9

921

1.0

002

1.0

136

1.0

453

0.9

947

3.7

420

<10−

4<

10−

4635.6

44

se0.

0375

0.0

356

0.0

628

0.2

051

0.0

799

rmse

0.03

870.0

365

0.0

628

0.2

055

0.0

919

2St-

dis

tco

ef0.

9938

0.9

997

1.0

003

1.0

471

0.9

582

0.9

925

7.4

181

<10−

40.3

320

2,5

07.2

05

se0.

0459

0.0

420

0.0

450

1.1

895

0.3

458

rmse

0.04

630.0

419

0.0

450

1.1

899

0.3

481

nse

0.00

160.0

016

0.0

019

0.0

209

0.0

095

3St-

dis

tco

ef0.

9886

0.9

920

1.0

004

1.0

172

1.0

454

0.9

937

3.7

171

<10−

4<

10−

43,2

97.5

87

se0.

0288

0.0

268

0.0

449

0.1

966

0.0

615

rmse

0.03

100.0

279

0.0

449

0.1

972

0.0

764

nse

0.00

140.0

014

0.0

020

0.0

071

0.0

045

FB

coef

0.98

640.9

875

1.0

004

1.0

173

1.0

599

0.9

954

3.6

260

8,5

64.8

63

se0.

0205

0.0

189

0.0

450

0.1

935

0.0

459

rmse

0.02

460.0

226

0.0

450

0.1

941

0.0

754

nse

0.00

130.0

013

0.0

022

0.0

054

0.0

033

2S

boots

trap

(3S

boots

trap

):

2S

(or

3S

)w

ith

ind

ivid

ual

blo

ckre

sam

plin

gb

oots

trap

.

2S

(3S

)t-

dis

t:2S

(3S

)w

ith

mu

ltiv

ari

atet-

dis

trib

uti

on

s.F

B:

Fu

llB

ayes

ian

.

31

Tab

le5:

Hau

sman-T

aylo

rW

orl

d,

Rob

ust

nes

stoε-

conta

min

ati

on

N=

100,T

=5,ρ

=0.8

,re

plica

tion

s=1,0

00

β11

β12

β2

η1

η2

σ2 u

σ2 µ

λβ

λµ

True

11

11

11

4

HT

coef

0.9976

0.9981

1.0037

1.0173

0.9958

0.9924

4.2904

se0.0395

0.0394

0.0444

0.2143

0.1857

rmse

0.0405

0.0385

0.0456

0.2071

0.1903

ε=

02Sbootstrapcoef

0.9835

0.9851

0.9983

0.7195

0.9423

0.9941

3.5298

11

se0.0524

0.0501

0.0623

0.3006

0.1043

rmse

0.0549

0.0522

0.0623

0.4111

0.1192

3Sbootstrapcoef

0.9854

0.9839

0.9977

0.9725

1.0085

0.9904

3.3834

11

se0.0355

0.0342

0.0626

0.1944

0.0761

rmse

0.0384

0.0378

0.0627

0.1963

0.0766

ε=

0.1

2Sbootstrapcoef

0.9965

0.9983

1.0002

1.0045

1.0320

0.9935

3.8240

<10−4

0.9008

se0.0529

0.0506

0.0624

0.2058

0.1029

rmse

0.0530

0.0506

0.0624

0.2058

0.1078

3Sbootstrapcoef

0.9905

0.9919

1.0003

1.0140

1.0453

0.9956

3.7408

<10−4

<10−4

se0.0376

0.0358

0.0629

0.2042

0.0797

rmse

0.0388

0.0367

0.0629

0.2046

0.0917

ε=

0.5

2Sbootstrapcoef

0.9965

0.9983

1.0002

1.0031

1.0320

0.9935

3.8244

<10−4

0.5041

se0.0529

0.0506

0.0624

0.2056

0.1029

rmse

0.0530

0.0506

0.0624

0.2057

0.1078

3Sbootstrapcoef

0.9905

0.9921

1.0002

1.0136

1.0453

0.9947

3.7420

<10−4

<10−4

se0.0375

0.0356

0.0628

0.2051

0.0799

rmse

0.0387

0.0365

0.0628

0.2055

0.0919

ε=

0.9

2Sbootstrapcoef

0.9965

0.9983

1.0002

1.0015

1.0320

0.9935

3.8249

<10−4

0.1079

se0.0529

0.0506

0.0624

0.2058

0.1029

rmse

0.0530

0.0506

0.0624

0.2058

0.1078

3Sbootstrapcoef

0.9905

0.9919

1.0003

1.0140

1.0453

0.9956

3.7408

<10−4

<10−4

se0.0376

0.0358

0.0629

0.2042

0.0797

rmse

0.0388

0.0367

0.0629

0.2046

0.0917

32

Tab

le6:

Dep

art

ure

from

Norm

ali

ty:

Th

eS

kew

edt-

dis

trib

uti

on

N=

100,T

=5,ε

=0.5

,ρ

=0.8

,re

plica

tion

s=1,0

00

Mundlak-typeFixed

EffectsW

orld†

β11

β12

β2

η1

η2

σ2 u

σ2 µ

λβ

λµ

True

11

15.0459

6.3257

MundlakCoef

1.0007

1.0021

1.0015

6.6390

6.3816

se0.0357

0.0357

0.0898

rmse

0.0351

0.0357

0.0884

3Sbootstrapcoef

0.9979

1.0036

1.0017

6.4584

6.4126

0.0363

<10−4

se0.0521

0.0522

0.0867

rmse

0.0531

0.0541

0.0901

Chamberlain-typeFixed

EffectsW

orld‡

True

11

16.6292

95.5874

MCScoef

1.0005

1.0000

1.0008

6.9099

91.8444

se0.0334

0.0333

0.0808

rmse

0.0368

0.0374

0.0850

3Sbootstrapcoef

0.9982

0.9943

1.0004

7.4226

95.7630

0.0005

<10−4

se0.0546

0.0588

0.0871

rmse

0.0583

0.0672

0.0927

Hausm

an-T

aylor

True

11

11

16.9095

4

HT

coef

0.9979

0.9977

1.0113

0.9901

0.9997

6.8540

5.1608

se0.0897

0.0902

0.1137

0.2680

0.3478

rmse

0.0897

0.0891

0.1152

0.2529

0.3427

3Sbootstrapcoef

0.9905

0.9865

1.0051

0.9479

1.0186

6.9214

4.3901

0.0041

<10−4

se0.0986

0.0955

0.1607

0.3237

0.2039

rmse

0.0990

0.0964

0.1608

0.3278

0.2047

†T

he

para

met

erπ

isom

itte

d.

‡T

he

para

met

ersπ1,...,π

5are

om

itte

d.

33

Tab

le7:

Sen

siti

vit

yto

En

dogen

ou

sC

ovari

ate

s

N=

100,T

=5,ε

=0.5

,re

plica

tion

s=1,0

00

Chamberlain-typeFixed

EffectsW

orld†

β11

β12

β2

η1

η2

σ2 u

σ2 µ

λβ

λµ

True

11

18.0000

112.0657

MCScoef

1.0010

1.0010

0.9999

7.9944

115.3597

se0.0374

0.0374

0.0503

rmse

0.0401

0.0413

0.0518

3Sbootstrapcoef

1.0005

0.9996

0.9996

7.9584

111.4192

<10−4

<10−4

se0.0403

0.0404

0.0499

rmse

0.0410

0.0418

0.0511

3St-distcoef

0.9984

1.0001

0.9990

7.9520

112.5857

<10−4

<10−4

se0.0393

0.0404

0.0505

rmse

0.0393

0.0404

0.0505

nse

0.0020

0.0020

0.0022

FB

coef

1.0016

1.0011

0.9995

7.9517

112.5285

se0.0395

0.0406

0.0503

rmse

0.0395

0.0406

0.0503

nse

0.0021

0.0021

0.0023

Hausm

an-T

aylorW

orld

β11

β12

β2

η1

η2

σ2 u

σ2 µ

λβ

λµ

True

11

11

18

32

HT

coef

0.9904

0.9880

1.0059

0.9879

1.0195

7.8967

43.0953

se0.1082

0.1092

0.0511

0.6813

0.6150

rmse

0.1139

0.1093

0.0521

0.6424

0.5878

3Sbootstrapcoef

0.9341

0.9312

1.0011

0.9894

1.2898

7.9191

17.1702

<10−4

<10−4

se0.0668

0.0648

0.0720

0.4734

0.1386

rmse

0.0938

0.0945

0.0720

0.4735

0.3212

3St-distcoef

0.9270

0.9261

1.0005

0.9902

1.3023

7.9441

16.6030

<10−4

<10−4

se0.0433

0.0416

0.0513

0.4345

0.0947

rmse

0.0848

0.0848

0.0513

0.4344

0.3168

nse

0.0021

0.0021

0.0023

0.0090

0.0047

FB

coef

0.9325

0.9305

1.0008

0.9893

1.2948

7.9402

16.9453

se0.0438

0.0426

0.0513

0.4412

0.0951

rmse

0.0805

0.0815

0.0513

0.4411

0.3097

nse

0.0024

0.0024

0.0023

0.0107

0.0054

†T

he

para

met

ersπ1,...,π

5are

om

itte

dfr

om

the

tab

le.

34

Robust linear static panel data models using ε-contamination

Documents

Transcript of Robust linear static panel data models using ε-contamination