Lecture 1: Introduction to Spatial Autoregressive (SAR) Models 1....

Lecture 1: Introduction to Spatial Autoregressive (SAR) Models

1. Some SAR models:

Some specified models for cross sectional data which may capture possible spatial interac-

tions across spatial units.

The following models are generalizations of autoregressive processes and autoregression models

in time series.

1.1) (First order) spatial autoregressive (SAR) process:

yi = λwi,nYn + εi, i = 1, · · · , n,

where Yn = (y1, · · · , yn)′ is the column vector of dependent variables, wi,n is a n-dimensional row

vector of constants, and εi’s are i.i.d.(0, σ2). In the vector/matrix form,

Yn = λWnYn + En.

The term WnYn has been termed ‘spatial lag’. This is supposed to be an equilibrium model. Under

the assumption that Sn(λ) = In − λWn is nonsingular, one has

Yn = S−1n (λ)En.

This process model may not be not useful alone in empirical econometrics but it is used to

model possible spatial correlations in disturbances of a regression equation. The regression model

with SAR disturbance Un is specified as

Yn = Xnβ + Un, Un = ρWnUn + En,

where En has zero mean and variance σ2In. In this regression model, the disturbances εi’s

in En are correlated across units according to a SAR process. The variance matrix of Un is

σ20S

−1n (ρ0)S

′−1n (ρ0). As the diagonal elements of S−1

n (ρ0)S′−1n (ρ0) may not be zero, there are

correlations of disturbances ui’s across units.

1.2) Mixed regressive, spatial autoregressive model (MRSAR):

This model generalizes the SAR process by incorporating exogenous variables xi in the SAR

process. It has also simply been called the spatial autoregressive model. In vector/matrix form,

Yn = λWnYn + Xnβ + En,

1

where En is (0, σ2In). This model has the feature of a simultaneous equation model and its reduced

form is

Yn = S−1n (λ)Xnβ + S−1

n (λ)En.

1.3) Some Intuitions on Spatial Weights Matrix Wn

The matrix Wn is called the spatial weights matrix. The value wn,ij of the jth element of wi,n

represents the link (or distance) between the neighbor j to the spatial unit i. Usually, the diagonal

of Wn is specified to be zero, i.e., wn,ii = 0 for all i, because λwi,nYn represents the effect of other

spatial units on the spatial unit i.

In some empirical applications, it is a common practice to have Wn having a zero diagonal and

being row-normalized such that the summation of elements in each row of Wn is a unity. In some

applications, the ith row wi,n of Wn may be constructed as wi,n = (di1, di2, . . . , din)/∑n

j=1 dij ,

where dij ≥ 0, represents a function of the spatial distance of the ith and jth units, for example,

the inverse of the distance of the i and jth units, in some (characteristic) space. The weighting

operation may be interpreted as an average of neighboring values. However, in some cases, one may

argue that row-normalization for weights matrix may not be meaningful (e.g., Bell and Bochstael

(2000) RESTAT for real estate problems with microlevel data).

When neighbors are defined as adjacent ones for each unit, the correlation is local in the sense

that correlations across units will be stronger for neighbors but will be weak for units far away.

This becomes clear with an expansion of S−1n (ρ0). Suppose that ‖ ρWn ‖≤ 1 for some matrix norm

‖ ‖, then

S−1n (ρ) = In +

∞∑i=1

ρiW in,

(see, e.g., Horn and Johnson 1985). As ‖ ∑∞i=m ρiW i

n ‖≤ |ρWn|i ‖ S−1n (ρ) ‖, in particular, if Wn

is row normalized, ‖ ∑∞i=m ρiW i

n ‖∞≤ |∑∞i=m |ρ|i = |ρ|m

1−|ρ| , will become small when m becomes

larger. The Un can be represented as

Un = En + ρWnEn + ρ2W 2nEn + · · · ,

where ρWn may represent the influence of neighbors on each unit, ρ2W 2n represents the second

layer neighborhood influence, etc.

2

In the social science (or social interactions) literature, WnS−1n (ρ) is a vector of measures of

centrality, which summaries the position of each spatial unit (in a network).

In conventional spatial weights matrices, neighboring units are defined by only a few adjacent

ones. However, there are cases where ‘neighbors’ may consist of many units. An example is a

social interactions model, where ‘neighbors’ refer to individuals in a same group. The latter may

be regarded as models with large group interactions (a kind of models with social interactions).

For models with a large number of interactions for each unit, the spatial weights matrix Wn will

associate with the sample size. In a model, suppose there are R groups and there are m individuals

in each group. The sample size will be n = mR. In a model without special network (e.g., frindship)

information, one may assume that each individual in a group is given equal weight. In that case,

Wn = IR ⊗ Bm where Bm = (lml′m − Im)/(m − 1), ⊗ is the Kronecker product, and lm is the

m-dimensional vector of ones. Both m and R can be large in a sample. In which case, asymptotic

analysis for large sample will be relevant with both R and m go to infinity. Of course, in the

general case, the number of members in each district may be large but have different sizes. This

model has many interesting applications in the study of social interactions issues.

1.4) Other generalizations of those models can have high order spatial lags and/or SAR

disturbances.

A more rich SAR model may combine the MRSAR equation with SAR distrubances

Yn = λWnYn + Xnβ + Un, Un = ρMnUn + En, (1.5)

where Wn and Mn are spatial weights matrices, which may or may not be identical.

Further extension of a SAR model may allow high-order spatial lags as in

Yn =p∑

j=1

λjWjnYn + Xnβ + En,

where Wjn’s are p distinct spatial weights matrices.

2. Some matrix algebra

2.1) Vector and Matrix Norms

Definition. Let V be a vector space. A function ‖ · ‖: V −→ R is a vector norm if for any

x, y ∈ V ,

3

(1) nonnegative, ‖ x ‖≥ 0,

(1a) positive, ‖ x ‖= 0 if and only if x = 0,

(2) homogeneous, ‖ cx ‖= |c| ‖ x ‖ for any scalar c,

(3) triangle inequality, ‖ x + y ‖≤‖ x ‖ + ‖ y ‖.

Examples:

a) The Euclidean norm (or l2 norm) is

‖ x ‖2= (|x1|2 + · · ·+ |xn|2)1/2.

b) The sum norm (or l1 norm) is

‖ x ‖1= |x1| + · · ·+ |xn|.

c) The max norm (or l∞ norm) is

‖ x ‖∞= max{|x1|, · · · , |xn|}.

Definition. A function ‖ · ‖ of a square matrix is a matrix norm if, for any square matrices A

and B, it satisfies the following axioms:

(1) ‖ A ‖≥ 0,

(1a) ‖ A ‖= 0 if and only if A = 0,

(2) ‖ cA ‖= |c| ‖ A ‖ for any scalar c,

(3) ‖ A + B ‖≤‖ A ‖ + ‖ B ‖,(4) submultiplicative, ‖ AB ‖≤‖ A ‖ · ‖ B ‖.

Associated with each vector norm is a natural matrix norm induced by the vector norm.

Let ‖ · ‖ be a vector norm. Define ‖ A ‖= max‖x‖=1 ‖ Ax ‖= maxx‖Ax‖‖x‖ . This function ‖ · ‖

for A is a matrix norm and it has the properties that ‖ Ax ‖≤‖ A ‖ · ‖ x ‖ and ‖ I ‖= 1.

Examples:

a) The maximum column sum matrix norm ‖ · ‖1 is

‖ A ‖1= max1≤j≤n

n∑i=1

|aij |,

4

which is a matrix norm induced by the l1 vector norm.

b) The maximum row sum matrix norm ‖ · ‖∞ is

‖ A ‖∞= max1≤i≤n

n∑j=1

|aij |,

which is a matrix norm induced by the l∞ vector norm.

One important application of matrix norms is in giving bounds for the eigenvalues of a matrix.

Definition: The spectral radius ρ(A) of a matrix A is

ρ(A) = max{|λ| : λ is an eigenvalue of A}.

Let λ be an eigenvalue of A and x be the corresponding eigenvector, i.e., Ax = λx. Then,

|λ|· ‖ x ‖=‖ λx ‖=‖ Ax ‖≤‖ A ‖ · ‖ x ‖,

and, therefore, |λ| ≤‖ A ‖. That is, ρ(A) ≤‖ A ‖.Another useful application of the matrix norm is that, if ‖ · ‖ is a matrix norm, and if ‖ A ‖< 1,

then I −A is invertible and (I − A)−1 =∑∞

j=0 Aj .

When all of the elements of Wn are non-negative and its rows are normalized to one, ‖ Wn ‖∞=

1. For this case, when |λ| < 1,

(In − λWn)−1 =∞∑

j=0

(λWn)j = In + λWn + λ2W 2n + · · · .

2.2) Useful (important) regularity conditions on Wn and associated matrix:

Definition: Let {An} be a sequence of square matrices, where An is of dimension n.

(i) {An} is said to be uniformly bounded in row sums if {‖ An ‖1} is a bounded sequence;

(ii) {An} is said to be uniformly bounded in column sums if {‖ An ‖∞} is a bounded sequence.

A useful property is that if {An} and {Bn} are uniformly bounded in row sums (column

sums), then {AnBn} is also uniformly bounded in row sums (column sums). This follows from

the submultiplicative property of matrix norms. For example, ‖ AnBn ‖∞≤‖ An ‖∞‖ Bn ‖∞≤ c2

when ‖ An ‖∞≤ c and ‖ Bn ‖∞≤ c.

5

An Important Assumption: The spatial weights matrices {Wn} and {S−1n } are uniformly

bounded in both row and column sums.

Equivalently, this assumes that {‖ Wn ‖1} and {‖ Wn ‖∞} are bounded sequences. Similarly,

the maximum row and column sum norms of S−1n are bounded.

Rule out unit root process:

The assumption that S−1n are uniformly bounded in both row and column sums rules out the

unit root case (in time series). Consider the unit root process that yt = yt−1 + εt, t = 2, · · · , n and

y1 = ε1, i.e., ⎛⎜⎝

y1...

yT

⎞⎟⎠ =

⎛⎜⎜⎜⎜⎝

0 0 · · · 0 01 0 · · · 0 00 1 · · · 0 0...

......

...0 0 · · · 1 0

⎞⎟⎟⎟⎟⎠⎛⎜⎝

y1...

yT

⎞⎟⎠ +

⎛⎝ ε1

...εT

⎞⎠ .

It implies that

yt =t∑

j=1

εj .

That is, S−1n is a lower diagonal matrix with all nonzero elements being unity. The sum of its

first column is n, and the sum of its last row is also n. For the unit root process, S−1n are neither

bounded in row sums nor column sums.

3. Estimation Methods

Estimation methods, which have been considered in the existing literature, are mainly the ML

(or QML) method, the 2SLS (or IV) method, and the generalized method of moments (GMM).

The QML method has usually good finite sample properties relative to the other methods for the

estimation of SAR models with the first order of spatial lag. However, the ML method is not

computationally attractive for models with more than one single spatial lag. For the higher spatial

lags model, the IV and GMM methods are feasible. With properly designed moment equations, the

best GMM estimator exists and can be asymptotically efficient as the ML estimate under normal

disturbances (Lee 2007).

The most popular and traditional estimation method is the ML under the assumption that

En is N (0, σ2In). Other estimation methods are the method of moments (MOM), GMM and the

2SLS. The latter is applicable only to the MRSAR model.

6

3.1) ML for the SAR process,

ln Ln(λ, σ2) = −n

2ln(2π)− n

2lnσ2 + ln |Sn(λ)| − 1

2σ2Y ′

nS′n(λ)Sn(λ)Yn,

where Sn(λ) = In − λWn.

For the regression model with SAR disturbances, the log likelihood function is

lnLn(λ, σ2) = −n

2ln(2π)− n

2ln σ2 + ln |Sn(λ)| − 1

2σ2(Yn − Xnβ)′S′

n(λ)Sn(λ)(Yn − Xnβ).

The log likelihood function for the MRSAR model is

ln Ln(λ, β, σ2) = −n

2ln(2π)− n

2ln σ2 + ln |Sn(λ)| − 1

2σ2(YnSn(λ)− Xnβ)′(Sn(λ)Yn −Xnβ).

The likelihood function involves the computation of the determinant of Sn(λ), which is a

function of the unknown parameter λ, and may have a large dimension n.

A computationally tractable method is due to Ord 1975, where Wn is a row-normalized weights

matrix with Wn = DnW ∗n , where W ∗

n is a symmetric matrix and Dn = Diag{∑nj=1 w∗

n,ij}−1. Note

that, in this case, Wn is not a symmetric matrix. However, the eigenvalues of Wn are still all real.

This is so, because

|Wn − νIn| = |DnW ∗n − νIn| = |DnW ∗

nD1/2n D−1/2

n − νD1/2n D−1/2

n |

= |D 12n | · |D

12nW ∗

nD12n − νIn| · |D− 1

2n | = |D1/2

n W ∗nD1/2

n − νIn|,

the eigenvalues of Wn are the same as those of D1/2n W ∗

nD1/2n , which is a symmetric matrix. As

the eigenvalues of a symmetric matrix are real, the eigenvalues of Wn are real. Let μi’s be the

eigenvalues of Wn, which are the same as those of D1/2n W ∗

nD1/2n . Let Γ be the orthogonal matrix

such that D1/2n W ∗

nD1/2n = ΓDiag{μi}Γ′. The above relations show also that

|In − λWn| = |In − λDnW ∗n | = |In − λD1/2

n W ∗nD1/2

n | = |In − λΓDiag{μi}Γ′|

= |In − λDiag{μi}| =n∏

i=1

(1− λμi).

Thus, |In − λWn| can be easily updated during iterations within a maximization subroutine, as

μi’s need be computed only once.

When Wn is a spare matrix, specific numerical methods for handling spare matrices are useful.

7

Another computational tractable method is the characteristic polynomial. The determinant

|Wn −μIn| is a polynomial in μ and is called the characteristic polynomial of Wn. (note: the zeros

of the characteristic polynomial are the eigenvalues of Wn). Thus,

|In − λWn| = anλn + · · ·+ a1λ + a0,

the constants a’s depend only on Wn. So the a’s can be computed once during the maximization

algorithm.

3.2) 2SLS Estimation for the MRSAR model

For the MRSAR Yn = λWnYn + Xnβ + En, the spatial lag WnYn can be correlated with the

disturbance vector En. So, in general, OLS may not be a consistent estimator. However, there is a

class of spatial Wn (with large group interaction) that the OLS estimator can be consistent (Lee

2002).

To avoid the bias due to the correlation of WnYn with En, Kelejian and Prucha (1998) has

suggested the use of instrumental variables (IVs). Let Qn be a matrix of instrumental variables.

Denote Zn = (WnYn, Xn) and θ = (λ, β′)′. The MRSAR equation can be rewritten as Yn =

Znθ + En. The 2SLS estimator of θ with Qn is

θ2sl,n = [Z′nQn(Q′

nQn)−1Q′nZn]−1Z′

nQn(Q′nQn)−1Q′

nYn.

It can be shown that the asymptotic distribution of θ2sl,n follows from

√n(θ2sl,n − θ0)

d→ N (0, σ20 lim

n→∞{ 1n

(GnXnβ0, Xn)′Qn(Q′nQn)−1Q′

n(GnXnβ0, Xn)}−1),

where Gn = WnS−1n , under the assumption that the limiting matrix 1

n(GnXnβ0, Xn) has the full

column rank (k + 1) where k is the dimension of β or, equivalently, the number of columns of

Xn. In practice, Kelejian and Prucha (1998) suggest the use of linearly independent variables in

(Xn, WnXn) for the construction of Qn.

By the Schwartz inequality, the optimum IV matrix is (GnXnβ0, Xn).

This 2SLS method can not be used for the estimation of the (pure) SAR process. The SAR

process is a special case of the MRSAR model with β0 = 0. However, when β0 = 0, GnXnβ0 = 0

8

and, hence, (GnXnβ0, Xn) = (0, Xn) would have rank k but not full rank (k+1). Intuitively, when

β0 = 0, Xn does not appear in the model and there is no other IV available.

3.3) Method of Moments (MOM) for SAR process:

Kelejian and Prucha (1999) has suggested a MOM estimation:

minθ

g′n(θ)gn(θ).

The moment equations are based on three moments:

E(E ′nEn) = nσ2, E(E ′

nW ′nWnEn) = σ2tr(W ′

nWn), and E(E ′nWnEn) = 0.

These correspond to

gn(θ) = (Y ′nS′

n(λ)Sn(λ)Yn − nσ2, Y ′nS′

n(λ)W ′nWnSn(λ)Yn − σ2tr(W ′

nWn), Y ′nS′

n(λ)WnSn(λ)Yn)′

for the SAR process. For the regression model with SAR disturbances, Yn shall be replaced by

least squares residuals.

Remark: In Kelejian and Prucha (1999), they have shown consistency of the MOM estimator

only but not asymptotic normality of the estimator.

3.4) Moments for GMM estimation

For the MRSAR model, in addition to the moments based on the IV matrix Qn, other moment

equations can also be constructed for estimation.

Let Qn be an n×kx IV matrix constructed as functions of Xn and Wn. Let εn(θ) = Sn(λ)Yn−Xnβ for any possible value θ. The kx moment functions correspond to the orthogonality conditions

are Q′nεn(θ).

Now consider a finite number, say m, of n × n constant matrices P1n, · · · , Pmn of which each

has a zero diagonal. The moment functions (Pjnεn(θ))′εn(θ) can be used in addition to Q′nεn(θ).

These moment functions form a vector

gn(θ) = (P1nεn(θ), · · · , Pmnεn(θ), Qn)′εn(θ) =

⎛⎜⎜⎝

ε′n(θ)P1nεn(θ)...

ε′n(θ)Pmnεn(θ)Q′

nεn(θ)

⎞⎟⎟⎠

9

in the estimation in the GMM framework.

Proposition. For any constant n × n matrix Pn with tr(Pn) = 0, Pnεn is uncorrelated with

εn, i.e., E((Pnεn)′εn) = 0.

Proof: E((Pnεn)′εn) = E(ε′nP ′nεn) = E(ε′nPnεn) = σ2

0tr(Pn) = 0.

This shows, in particular, that E(gn(θ0)) = 0. Thus, gn(θ) are valid moment equations for

estimation.

INTUITION:

As WnYn = GnXnβ0 + Gnεn, where Gn = WnS−1n and Sn = Sn(λ0), and Gnεn is correlated

with the disturbance εn in the model,

Yn = λWnYn + Xnβ + εn,

any Pjnεn, which is uncorrelated with εn, can be used as IV for WnYn as long as Pjnεn and Gnεn

are correlated.

4. Basic Asymptotic Foundation

Some laws of large numbers and central limit theorem

Lemma. Suppose that {An} are uniformly bounded in both row and column sums. Then,

E(ε′nAnεn) = O(n) and var(ε′nAnεn) = O(n). Furthermore, ε′nAnεn = OP (n) and, 1nε′nAnεn −

1nE(ε′nAnεn) = oP (1).

Lemma. Suppose that An is a n × n matrix with its column sums being uniformly bounded,

elements of the n× k matrix Cn are uniformly bounded, and εn1, · · · , εnn are i.i.d. with zero mean,

finite variance σ2 and E|ε|3 < ∞. Then, 1√nC ′

nAnεn = OP (1) and 1nC ′

nAnεn = op(1). Further-

more, if the limit of 1nC ′

nAnA′nCn exists and is positive definite, then

1√n

C ′nAnεn

D→ N (0, σ2 limn→∞

1n

C ′nAnA′

nCn).

This lemma can be shown with the Liapounov central limit theorem.

Theorem: (Liapounov’s theorem) Let {Xn} be a sequence of independent random variables. Let

E(Xn) = μn, E(Xn−μn)2 = σ2n = 0. Denote Cn = (

∑ni=1 σ2

i )1/2. If 1

C2+δn

∑nk=1 E|Xk−μk|2+δ → 0

for a positive δ > 0, then∑n

i=1(Xi−μi)

Cn

d→ N (0, 1).

10

Lemma. Suppose that An is a constant n × n matrix uniformly bounded in both row and

column sums. Let cn be a column vector of constants. If 1nc′ncn = o(1), then 1√

nc′nAnεn = oP (1).

On the other hand, if 1nc′ncn = O(1), then 1√

nc′nAnεn = OP (1).

Proof: The results follow from Chebyshev’s inequality by investigating var(√

1nc′nAnεn) =

σ20

1nc′nAnA′

ncn. Let Λn be the diagonal matrix of eigenvalues of AnA′n and Γn be the orthonormal

matrix of eigenvectors. As eigenvalues in absolute values are bounded by any norm of the matrix,

eigenvalues in Λn in absolute value are uniformly bounded because ‖ An ‖∞ (and ‖ An ‖1) are

uniformly bounded. Hence, 1nc′nAnA′

ncn ≤ 1nc′nΓnΓ′

ncn · |λn,max| = 1nc′ncn|λn,max| = o(1), where

λn,max is the eigenvalue of AnA′n with the largest absolute value.

When 1nc′ncn = O(1), 1

nc′nAnA′

ncn ≤ 1nc′ncn|λn,max| = O(1). In this case, var(

√1nc′nAnεn) =

σ2 1nc′nAnA′

ncn = O(1). Therefore,√

1nc′nAnεn = OP (1). Q.E.D.

A Martingale Central Limit Theorem:

Definition: Let {zt} be a sequence of random scalars, and let {Jt} be a sequence of σ-fields

such that Jt−1 ⊆ Jt for all t. If zt is measurable with respect to Jt, then {zt,Jt} is called an

adapted stochastic sequence.

The σ-field Jt can be thought of as the σ-algebra generated by the entire current and past

history of zt and some other random variables.

Definition: Let {yt,Jt} be an adapted stochastic sequence. Then {yt,Jt} is a martingale

difference sequence if and only if E(yt|Jt−1) = 0, for all t ≥ 2.

Theorem (A Martingale Central Limit Theorem):

Let {ξT t} be a martingale-difference array. If as T → ∞,

(i)∑T

t=1 vT tp→ σ2, where σ2 > 0 and vT t = E(ξ2

T t|JT,t−1) denotes the conditional variances,

(ii) for any ε > 0,∑T

t=1 E(ξ2T tI(|ξT t| > ε)

∣∣JT,t−1) converges in probability to zero (a Lindeberg

condition),

then ξT1 + · · ·+ ξT TD→ N (0, σ2).

A CLT for quadratic-linear function of εn:

Suppose that {An} is a sequence of symmetric n × n matrices with row and column sums

uniformly bounded and bn is a n-dimensional column vector satisfying supn1n

∑ni=1 |bni|2+η < ∞

11

for some η > 0. The εn1, · · · , εnn are i.i.d. random variables with zero mean and finite variance

σ2, and its moment E(|ε|4+δ) for some δ > 0 exists. Let σ2Qn

be the variance of Qn where Qn =

ε′nAnεn + b′nεn − σ2tr(An). Assume that the variance σ2Qn

is bounded away from zero at the rate

n. Then, Qn

σQn

D−→ N (0, 1).

This theorem has been proved by Kelejian and Prucha (2001) with a martingale central limit

theorem.

5. The Cliff and Ord (1973) Test for Spatial Correlation

Consider the model

Y = Xβ + U, U = ρWU + ε,

where Y is an n-dimensional column vector of the dependent variable, X is an n × k matrix of

regressors, U is an n × 1 vector of disturbances, W is a n × n spatial weights matrix with a zero

diagonal, and ε is N (0, σ2In). One may want to test H0 : ρ = 0, i.e., the test of spatial correlation

in U .

The log likelihood of this model is

ln L(θ) = −n

2ln(2πσ2) + ln |I − ρW | − 1

2σ2(y − Xβ)′(I − ρW )′(I − ρW )(y − Xβ).

The first-order conditions are

∂ ln L

∂β′ =1σ2

X ′(I − ρW )′(I − ρW )(y −Xβ),

∂ lnL

∂σ2= − n

2σ2+

12σ4

(y − Xβ)′(I − ρW )′(I − ρW )(y −Xβ),

∂ lnL

∂ρ= −tr(Wn(I − ρW )−1) +

12σ2

(y − Xβ)′[(I − ρW )′W + W ′(I − ρW )](y −Xβ),

where ∂ ln |I−ρW |∂ρ

= −tr(W (I − ρW )−1) by using the matrix differentiation formulae ddξ

ln |A| =

tr(A−1 dAdξ

) (see, e.g., Amemiya (1985) Advanced Econometrics, p461).

The second order derivatives with respect to ρ have

∂2 ln L

∂β′∂ρ= − 1

σ2X ′[W ′(I − ρW ) + (I − ρW )′W ](y − Xβ),

∂2 ln L

∂σ2∂ρ= − 1

2σ4(y −Xβ)′[W ′(I − ρW ) + (I − ρW )′W ](y − Xβ),

12

and∂2 lnL

∂ρ2= −tr([W (I − ρW )−1]2) − 1

σ2(y − Xβ)′W ′W (y − Xβ),

where ∂(I−ρW )−1

∂ρ= (I−ρW )−1W (I−ρW )−1, by using the matrix differentiation formulae d

dξA−1 =

−A−1 dAdξ A−1. We see that EH0(

∂2 ln L∂β∂ρ ) = 0, EH0(

∂2 ln L∂σ∂ρ ) = 0 because E(ε′Wε) = σ2tr(W ) = 0 as

W has a zero diagonal. Furthermore,

EH0(∂2 lnL

∂ρ2) = −tr(W 2 + W ′W ).

Because of the block diagonal structure of the information matrix EH0(∂2 ln L∂θ∂θ′ ), an efficient score

test statistic for testing H0 can have a simple expression. The efficient score test is also known as

a Lagrange multiplier (LM) test.

The efficient score test (Rao) statistic in its general form can be

∂ ln L(θ)∂θ′

[−E(∂2 ln L(θ)

∂θ∂θ′)]−1∂ ln L(θ)

∂θ

where θ is the constrained MLE of θ under H0.

Let β and σ2 be the MLE under H0 : ρ = 0, i.e., β is the usual OLS estimate and σ2 = 1ne′e,

where e = y − Xβ the least squares residual, is the average sum of squared residuals. Then

θ = (β′, σ2, 0)′ and

∂ ln L(θ)∂ρ

= −tr(W ) +1

2σ2(y − Xβ)′(W + W ′)(y − Xβ) =

12σ2

(y − Xβ)′(W + W ′)(y − Xβ),

because tr(W ) = 0. Hence, the LM test statistic is

ne′(W + W ′)e/(2e′e)/(tr(W 2 + W ′W ))12 ,

which shall be asymptotically normal N (0, 1).

The test statistic given by Moran (1950) and Cliff and Ord (1973) is

I = e′We/e′e,

which is the the LM statistic with an unscaled denominator. The LM interpretation of the Cliff

and Ord (1973) test of spatial correlation is due to Burridge (1980), J.R.Statist.Soc.B, vol 42,

pp.107-108.

13

Burridge (1980) has also pointed out the LM test for a moving average alternative, i.e., U =

ε + ρWε, is exactly the same as that given above.

Note that for the regression model with SAR disturbances, y = Xβ + u with u = ρWu +

ε, 1√nu′Wu and 1√

nu′Wu, where u is the least squares residual vector, can be asymptotically

equivalent(under both H0 and H1, i.e., 1√nu′Wu = 1√

nu′Wu + op(1). This can be shown as

follows. As u = [I −X(X ′X)−1X ′]u,

u′Wu = u′[I −X(X ′X)−1X ′]W [I −X(X ′X)−1X ′]u

= u′Wu − u′(W + W ′)X(X ′X)−1X ′u + u′X(X ′X)−1X ′WX(X ′X)−1X ′u.

Because u = (I − ρW )−1ε, under the assumptions that elements of X are uniformly bounded and

(I − ρW )−1 are uniformly bounded in both row and column sums, 1√nX ′u = 1√

nX ′(I − ρW )−1ε =

Op(1) and 1√nX ′Wu = 1√

nX ′W (I−ρW )−1ε = Op(1) by the CLT for the linear form. These imply,

in particular, 1nX ′u = op(1) and 1

nX ′Wu = op(1). Hence,

1√n

u′Wu

=u′Wu√

n− u′X

n(X ′X

n)−1 1√

nX ′(W + W ′)u +

u′Xn

(X ′X

n)−1 X ′WX

n(X ′X

n)−1X ′u√

n

=1√n

u′Wu + oP (1).

Consequently, the Cliff-Ord test using the least squares residual is asymptotically χ2(1) under H0.

6. More on GMM and 2SLS Estimation of MRSAR Models

6.1) IVs and Quadratic Moments

The MRSAR model is

Yn = λWnYn + Xnβ + εn,

where Xn is a n × k dimensional matrix of nonstochastic exogenous variables, Wn is a spatial

weights matrix of known constants with a zero diagonal.

We assume that

Assumption 1. The εni’s are i.i.d. with zero mean, variance σ2 and that a moment of order

higher than the fourth exists.

Assumption 2. The elements of Xn are uniformly bounded constants, Xn has the full rank

k, and limn→∞ 1nX ′

nXn exists and is nonsingular.

14

Assumption 3. The spatial weights matrices {Wn} and {(In − λWn)−1} at λ = λ0 are

uniformly bounded in both row and column sums in absolute value.

Let Qn be an n × kx matrix of IVs constructed as functions of Xn and Wn in a 2SLS

approach. Denote εn(θ) = (In − λWn)Yn − Xnβ for any possible value θ. Let P1n = {P :

P is n × n matrix , tr(P ) = 0} be the class of constant n × n matrices which have a zero trace. A

subclass P2n of P1n is P2n = {P : P ∈ P1n, diag(P ) = 0}.Assumption 4. The matrices Pjns from P1n are uniformly bounded in both row and column

sums in absolute value, and elements of Qn are uniformly bounded.

With the selected matrices Pjns and IV matrices Qn, the set of moment functions forms a vector

gn(θ) = (P1nεn(θ), · · · , Pmnεn(θ), Qn)′εn(θ) = (ε′n(θ)P1nεn(θ), · · · , ε′n(θ)Pmnεn(θ), ε′n(θ)Qn)′

for the GMM estimation.

At θ0, gn(θ0) = (ε′nP1nεn, · · · , ε′nPmnεn, ε′nQn)′. It follows that E(gn(θ0)) = 0 because

E(Q′nεn) = Q′

nE(εn) = 0 and E(ε′nPjnεn) = σ20tr(Pjn) = 0 for j = 1, · · · , m. The intuition is

as follows:

(1) The IV variables in Qn, can be used as IV variables for WnYn and Xn.

(2) The Pjnεn is uncorrelated with εn. As WnYn = Gn(Xnβ0 + εn), Pjnεn can be used as an IV

for WnYn as long as Pjnεn and Gnεn are correlated, where Gn = WnS−1n .

6.2) Identification

In the GMM framework, the identification condition requires the unique solution of the limiting

equations, limn→∞ 1nE(gn(θ)) = 0 at θ0 (Hansen 1982). The moment equations corresponding to

Qn are

limn→∞

1n

Q′ndn(θ) = lim

n→∞1n

(Q′nXn, Q′

nGnXnβ0)((β0 − β)′, λ0 − λ)′ = 0.

They will have a unique solution at θ0 if (Q′nXn, Q′

nGnXnβ0) has a full column rank, i.e., rank

(k + 1), for large enough n.

This sufficient rank condition implies the necessary rank condition that (Xn, GnXnβ0) has a

full column rank (k +1) and that Qn has a rank at least (k +1), for large enough n. The sufficient

rank condition requires Qn to be correlated with WnYn in the limit as n goes to infinity. This is so

15

because E(Q′nWnYn) = Q′

nGnXnβ0. Under the sufficient rank condition, θ0 can thus be identified

via limn→∞ 1nQ′

ndn(θ) = 0.

The necessary full rank condition of (Xn, GnXnβ0) for large n is possible only if the set

consisting of GnXnβ0 and Xn is not asymptotically linearly dependent. This rank condition would

not hold, in particular, if β0 were zero. There are other cases when this dependence can occur (see,

e.g., Kelejian and Prucha 1998).

As Xn has rank k but (Xn, GnXnβ0) does not have a full rank (k + 1), the corresponding

moment equations Q′ndn(θ) = 0 will have many solutions but the solutions are all described by the

relation that β = β0 + (λ0 − λ)c as long as Q′nXn has a full rank k. Under this scenario, β0 can

be identified only if λ0 is identifiable. The identification of λ0 will rely on the remaining quadratic

moment equations. In this case,

Yn = λ0(GnXnβ0) + Xnβ0 + S−1n εn = Xn(β0 + λ0c) + un,

where un = S−1n εn. The relationship un = λ0Wnun + εn is a SAR process. The identification of

λ0 thus comes from the SAR process un.

The following assumption summarizes some sufficient conditions for the identification of θ0

from the moment equations limn→∞ 1nE(gn(θ)) = 0.

Assumption 5. Either (i) limn→∞ 1nQ′

n(GnXnβ0, Xn) has the full rank (k + 1), or (ii)

limn→∞ 1nQ′

nXn has the full rank k, limn→∞ 1ntr(P s

jnGn) = 0 for some j,

and limn→∞ 1n [tr(P s

1nGn), · · · , tr(P smnGn)]′ is linearly independent of

limn→∞

1n

[tr(G′nP1nGn), · · · , tr(G′

nPmnGn)]′.

5.3) Optimum GMM

The variance matrix of these moments functions involves variances and covariances of linear

and quadratic forms of εn. For any square n × n matrix A, let vecD(A) = (a11, · · · , ann)′ denote

the column vector formed with the diagonal elements of A. Then,

E(Q′nεn · ε′nPnεn) = Q′

n

n∑i=1

n∑j=1

pn,ijE(εnεniεnj) = Q′nvecD(Pn)μ3

16

and

E(ε′nPjnεn · ε′nPlnεn) = (μ4 − 3σ40)vec′D(Pjn)vecD(Pln) + σ4

0tr(PjnP sln).

It follows that var(gn(θ0)) = Ωn where

Ωn =(

(μ4 − 3σ40)ω

′nmωnm μ3ω

′nmQn

μ3Q′nωnm 0

)+ Vn,

with ωnm = [vecD(P1n), · · · , vecD(Pmn)] and

Vn = σ40

⎛⎜⎜⎜⎝

tr(P1nP s1n) · · · tr(P1nP s

mn) 0...

......

tr(PmnP s1n) · · · tr(PmnP s

mn) 00 · · · 0 1

σ20Q′

nQn

⎞⎟⎟⎟⎠ = σ4

0

(Δmn 0

0 1σ20Q′

nQn

),

where Δmn = [vec(P ′1n), · · · , vec(P ′

mn)]′[vec(P s1n), · · · , vec(P s

mn)], by tr(AB) = vec(A′)′vec(B) for

any conformable matrices A and B.

When εn is normally distributed, Ωn is simplified to Vn because μ3 = 0 and μ4 = 3σ40 . If

Pjn’s are from P2n, Ωn = Vn also because ωnm = 0. In general, Ωn is nonsingular if and only if

both (vec(P1n), · · · , vec(Pmn)) and Qn have full column ranks. This is so, because Ωn would be

singular if and only if the moments in gn(θ0) are functionally dependent, equivalently, if and only

if∑m

j=1 ajPjn = 0 and Qnb = 0 for some constant vector (a1, · · · , am, b′) = 0.

Let an converge to a constant full rank matrix a0.

Proposition 1. Under Assumptions 1-5, suppose that Pjn for j = 1, · · · , m, are from P1n

and Qn is a n × kx IV matrix so that a0 limn→∞ 1nE(gn(θ)) = 0 has a unique root at θ0 in Θ.

Then, the GMME θn derived from minθ∈Θ g′n(θ)a′

nangn(θ) is a consistent estimator of θ0, and√

n(θn − θ0)D→ N (0, Σ), where

Σ = limn→∞

[(1n

D′n)a′

nan(1n

Dn)]−1

(1n

D′n)a′

nan(1n

Ωn)a′nan(

1n

Dn)[(1n

D′n)a′

nan(1n

Dn)]−1

,

and

Dn =(

σ20tr(P

s1nGn) · · · σ2

0tr(PsmnGn) (GnXnβ0)′Qn

0 · · · 0 X ′nQn

)′,

under the assumption that limn→∞ 1nanDn exists and has the full rank (k + 1).

Proposition 2. Under Assumptions 1-6, suppose that ( Ωn

n )−1 − (Ωn

n )−1 = oP (1), then the

feasible OGMME θo,n derived from minθ∈Θ g′n(θ)Ω−1

n gn(θ) based on gn(θ) with Pjn’s from P1n has

17

the asymptotic distribution

√n(θo,n − θ0)

D→ N (0, ( limn→∞

1n

D′nΩ−1

n Dn)−1).

Furthermore,

g′n(θo,n)Ω−1

n gn(θo,n) D→ χ2((m + kx) − (k + 1)).

5.4) Efficiency and the BGMME

The optimal GMME θo,n can be compared with the 2SLSE. With Qn as the IV matrix, the

2SLSE of θ0 is

θ2sl,n = [Z′nQn(Q′

nQn)−1Q′nZn]−1Z′

nQn(Q′nQn)−1Q′

nYn, (4.1)

where Zn = (WnYn, Xn). The asymptotic distribution of θ2sl,n is

√n(θ2sl,n − θ0)

D→

N (0, σ20 lim

n→∞{ 1n

(GnXnβ0, Xn)′Qn(Q′nQn)−1Q′

n(GnXnβ0, Xn)}−1),(4.2)

under the assumptions that limn→∞ 1nQ′

nQn is nonsingular and limn→∞ 1nQn(GnXnβ0, Xn) has

the full column rank (k +1) (Kelejian and Prucha 1998). Because the 2SLSE can be a special case

of the GMM estimation, by Proposition 2, θo,n shall be efficient relative to θ2sl,n. The best Qn is

Q∗n = (GnXnβ0, Xn) by the Schwartz inequality.

The remaining issue is on the best selection of Pjn’s. When the disturbance εn is normally

distributed or Pjn’s are from P2n, with Q∗n for Qn,

D′nΩ−1

n Dn =(

CmnΔ−1mnC ′

mn 00 0

)+

1σ2

0

(GnXnβ0, Xn)′(GnXnβ0, Xn),

where Cmn = [tr(P s1nGn), · · · , tr(P s

mnGn)]. Note that, because tr(PjnP sln) = 1

2tr(P s

jnP sln), Δmn

can be rewritten as

Δmn =12

⎛⎜⎝

tr(P s1nP s

1n) · · · tr(P s1nP s

mn)...

...tr(P s

mnP s1n) · · · tr(P s

mnP smn)

⎞⎟⎠ =

12[vec(P s

1n) · · ·vec(P smn)]′[vec(P s

1n) · · ·vec(P smn)].

(i) When Pjns are from P2n,

tr(P sjnGn) =

12tr(P s

jn[Gn −Diag(Gn)]s)

=12vec′

([Gn − Diag(Gn)]s

)vec(P s

jn)

18

for j = 1, · · · , m, in Cmn, where Diag(A) denotes the diagonal matrix formed by the diagonal

elements of a square matrix A. Therefore, the generalized Schwartz inequality implies that

CmnΔ−1mnC ′

mn ≤ 12vec′

([Gn − Diag(Gn)]s

)· vec

([Gn − Diag(Gn)]s

)= tr

([Gn − Diag(Gn)]sGn

).

Thus, in the subclass P2n, [Gn − Diag(Gn)] and together with [GnXnβ0, Xn] provide the set of

best IV functions.

(ii) For the case where εn is N (0, σ20In), because, for any Pjn ∈ P1n,

tr(P sjnGn) =

12vec′

([Gn − tr(Gn)

nIn]s

)vec(P s

jn),

for j = 1, · · · , m, the generalized Schwartz inequality implies that

CmnΔ−1mnC ′

mn ≤ tr([Gn − tr(Gn)

nIn]sGn

).

Hence, in the broader class P1n, [Gn − tr(Gn)n

In] and [GnXnβ0, Xn] provide the best set of IV

functions.

Proposition 3. Under Assumptions 1-3, suppose that λn is a√

n-consistent estimate of λ0,

βn is a consistent estimate of β0, and σ2n is a consistent estimate of σ2

0.

Within the class of GMME’s derived with P2n, the BGMME θ2b,n has the limiting distribution

that√

n(θ2b,n − θ0)D→ N (0, Σ−1

2b ) where

Σ2b = limn→∞

1n

(tr[(Gn − Diag(Gn))sGn] + 1

σ20(GnXnβ0)′(GnXnβ0) 1

σ20(GnXnβ0)′Xn

1σ20X ′

n(GnXnβ0) 1σ20X ′

nXn

),

which is assumed to exist.

In the event that εn ∼ N (0, σ20In), within the broader class of GMME’s derived with P1n, the

BGMME θ1b,n has the limiting distribution that√

n(θ1b,n − θ0)D→ N (0, Σ−1

1b ) where

Σ1b = limn→∞

1n

(tr[(Gn − tr(Gn)

nIn)sGn] + 1

σ20(GnXnβ0)′(GnXnβ0) 1

σ20(GnXnβ0)′Xn

1σ20X ′

n(GnXnβ0) 1σ20X ′

nXn

),

which is assumed to exist.

5.5) Links between BGMME and MLE

19

When εn is N (0, σ20In), the model can be estimated by the ML method. The log likelihood

function of the MRSAR model via its reduced form equation is

lnLn = −n

2ln(2π)− n

2ln σ2 + ln |(In − λWn)|

− 12σ2

[Yn − (In − λWn)−1Xnβ]′(In − λW ′n)(In − λWn)[Yn − (In − λWn)−1Xnβ].

(4.6)

The asymptotic variance of the MLE (θml,n, σ2ml,n) is

AsyVar(θml,n, σ2ml,n)

=

⎛⎜⎜⎝

tr(G2n) + tr(G′

nGn) + 1σ20(GnXnβ0)′(GnXnβ0) 1

σ20(X ′

nGnXnβ0)′tr(Gn)

σ20

1σ20X ′

nGnXnβ01

σ20X ′

nXn 0tr(Gn)

σ20

0 n2σ4

0

⎞⎟⎟⎠

−1

(see, e.g., Anselin and Bera (1998), p.256). From the inverse of a partitioned matrix, the asymptotic

variance of the MLE θml,n is

AsyV ar(θml,n)

=

(tr(G2

n) + tr(G′nGn) + 1

σ20(GnXnβ0)′(GnXnβ0) − 2

ntr2(Gn) 1

σ20(X ′

nGnXnβ0)′1

σ20X ′

nGnXnβ01

σ20X ′

nXn

)−1.

As tr(G2n) + tr(G′

nGn) − 2ntr2(Gn) = tr((Gn − tr(Gn)

nIn)sGn), the GMME θ1b,n has the same

limiting distribution as the MLE of θ0 from Proposition 3.

There is an intuition on the best GMM approach compared with the maximum likelihood one.

The derivatives of the log likelihood in (4.6) are

∂ ln Ln

∂β=

1σ2

X ′nεn(θ),

∂ ln Ln

∂σ2= − n

2σ2+

12σ4

ε′n(θ)εn(θ),

and∂ lnLn

∂λ= −tr(Wn(In − λWn)−1) +

1σ2

[Wn(In − λWn)−1Xnβ]′εn(θ)

+1σ2

ε′n(θ)[Wn(In − λWn)−1]′εn(θ).

The equation ∂ ln Ln

∂σ2 = 0 implies that the MLE is σ2n(θ) = 1

nε′n(θ)εn(θ) for a given value θ. By

substituting σ2n(θ) into the remaining likelihood equations, the MLE θml,n will be characterized by

the moment equations: X ′nεn(θ) = 0, and

[Wn(In − λWn)−1Xnβ]′εn(θ) + ε′n(θ)[Wn(In − λWn)−1 − 1

ntr(Wn(In − λWn)−1)

]εn(θ) = 0.

20

The similarity of the best GMM moments and the above likelihood equations is revealing. The

best GMM approach has the linear and quadratic moments of εn(θ) in its formation and uses

consistently estimated matrices in its linear and quadratic form.

5.6) More recent results

1) generalization to the estimation of higher order autoregressive models

2) robust GMM estimation in the presence of unknown heteroskedasticity

3) improve GMM estimators when disturbances are non-normal.

21

Lecture 2: Econometric Models with Social Interactions

We consider interactions of individuals in a group setting.

2.1) Linear in the Mean (Expectation) Model and the Reflection Problem (Man-

ski)

Endogenous effects, wherein the propensity of an individual to behave in some way varies with

the behaviour of the group;

Exogenous (contextual) effects, wherein the propensity of an individual to behave in some way

varies with the exogenous characteristics of the group, and

Correlated effects, wherein individuals in the same group tend to behave similarly because

they have similar individual characteristics or face similar institutional environment.

The Manski’s linear model specification:

yri = λ0E(yrj) + xri,1β10 + E(xrj,2)β20 + αr + εri

where E(yrj) with the coefficient λ0 captures the endogenous interactions, E(xrj,2) with β20 cap-

tures the exogenous interactions, and αr represents the correlation effect. The E(yr) is assumed

to solve the “social equilibrium” equation,

E(yr) = λ0E(yr) + E(xr,1)β10 + E(xr,2)β20 + αr.

Provided that λ0 = 1, this equation has a unique solution, which is

E(yr) = E(xr1)β10

1 − λ0+ E(xr,2)

β20

1 − λ0+

αr

1 − λ0.

The case where xr1 = xr2(= xr), or more general, xr,1 is a subset of xr,2, will have an underiden-

tification problem.

When xr,1 = xr,2 = xr, it follows E(yr) = E(xr)β10+β20(1−λ0)

+ αr

1−λ0and that the reduced form of

this model is

yri = E(xr)[β20 +λ0

1 − λ0(β10 + β20)] + xriβ10 +

αr

1 − λ0+ εri.

The β10 and the composite parameter vecotr β20 + λ01−λ0

(β10 + β20) can be identified, but λ0 and

β20 can not be separately identified. This is so, because E(yr) is linearly dependent on E(xr) and

the constant intercept term αr — the “reflection problem”.

22

From the implied equation, β10 is apparently identifiable. For the identification of λ0 and

β20, a sufficent identification condition is that some relevant variables are in xri,1, but they are

not in xri,2 (under the random components specification for the overall disturbance). This follows,

because, with the relevant variables in xri,1 but excluded in xri,2, it allows us to identify λ0 from

the composite parameter vector. In that case, the other coefficients λ and β2 are also identifiable.

2.2) A SAR Model with Social Interactions

Instead of rational expectation formulation, an alternative model is to specify direct interac-

tions of individuals in a group setting. This model will be useful in particular for cases where each

group has a small or moderate number of members. Suppose there are mr members in the group

r, for each unit i in the group r,

yri = λ0(1

mr − 1

mr∑j=1,j =i

yrj) + xri,1β10 + (1

mr − 1

mr∑j=1,j =i

xrj,2)β20 + αr + εri,

with i = 1, · · · , mr and r = 1, · · · , R, where yri is the ith individual in the rth group, xri,1 and

xri,2 are, respectively, k1 and k2-dimensional row vectors of exogenous variables, and εri’s are i.i.d.

(0, σ20).

It is revealing to decompose this equation into two parts. Because of the group structure, the

model is equivalent to the following two equations:

(1 − λ0)yr = xr1β10 + xr2β20 + αr + εr, r = 1, · · · , R,

and

(1 +λ0

mr − 1)(yri − yr) = (xri,1 − xr1)β10 − 1

mr − 1(xri,2 − xr2)β20 + (εri − εr),

for i = 1, · · · , mr; r = 1, · · · , R, where yr = 1mr

∑mr

i=1 yri, xr1 = 1mr

∑mr

i=1 xri,1 and xr2 =

1mr

∑mr

i=1 xri,2 are means for the r-th group. The first equation may be called a ‘between’ equation

and that in the second one is a ‘within’ equation as they have similarity with those of a panel

data regression model (Hsiao 1986). The possible effects due to interactions are revealing in the

reduced-form between and within equations:

yr = xr1β10

(1 − λ0)+ xr2

β20

(1 − λ0)+

αr

(1 − λ0)+

εr

(1 − λ0), r = 1, · · · , R,

23

and(yri − yr) = (xri,1 − xr1)

(mr − 1)β10

(mr − 1 + λ0)− (xri,2 − xr2)

β20

(mr − 1 + λ0)

+(mr − 1)

(mr − 1 + λ0)(εri − εr),

with i = 1, · · · , mr; r = 1, · · · , R.

Identification of the within equation

When all groups have the same number of members, i.e., mr is a constant, say m, for all r, the

effect λ can not be identifiable from the within equation. This is apparent as only the functions(m−1)β10(m−1+λ0)

, β20(m−1+λ0)

, and (m−1)σ20

(m−1+λ0)may ever be identifiable from the within equation.

Estimation of the within equation

Conditional likelihood and CMLE:

ln Lw,n(θ) = c +R∑

r=1

(mr − 1) ln(mr − 1 + λ) − (n − R)2

lnσ2

− 12σ2

R∑r=1

[1

cr(λ)Yr − Zrδm]′Jr[

1cr(λ)

Yr − Zrδm],

where cr(λ) = ( mr−1mr−1+λ

), zri = (xri,1,− mmr−1

xri,2), m = 1R

∑Rr=1 mr; δm = (β1,

β2m

).

CMLE of β and σ2:

βn(λ) =(

Ik1 00 mIk2

)( R∑r=1

Z′rJrZr

)−1 R∑r=1

Z′rJrYr

1cr(λ)

,

and

σ2n(λ) =

1n − R

{ R∑r=1

1c2r(λ)

Y ′rJrYr

−R∑

r=1

1cr(λ)

Y ′rJrZr

(R∑

r=1

Z′rJrZr

)−1 R∑r=1

Z′rJrYr

1cr(λ)

}.

Concentrated log likelihood at λ:

lnLc,n(λ) = c1 +R∑

r=1

(mr − 1) ln(mr − 1 + λ) − (n − R)2

ln σ2n(λ).

Instrumental Variables Estimation

yri − yr = −λ0(yri − yr)mr − 1

+ (xri,1 − xr,1)β10 − (xri,2 − xr,2)mr − 1

β20 + (εri − εr).

24

The IV estimator of θ0 = (λ0, β′10, β

′20)

′ is

θn,IV =

[R∑

r=1

(Qr

mr − 1, Xr,1,− Xr,2

mr − 1)′Jr(− Yr

mr − 1, Xr,1,− Xr,2

mr − 1)

]−1

·R∑

r=1

(Qr

mr − 1, Xr,1,− Xr,2

mr − 1)′JrYr.

2.3) Expectation Model (self-influence excluded)

The social interactions model under consideration is

yri = λ0(1

mr − 1

mr∑j=1, =i

E(yrj |Jr)) + xri,1β10

+ (1

mr − 1

n∑j=1, =i


where Jr is the information set of the rth group, which includes all observed x’s and the group

fixed effect αr . This model assumes that each invidual knows all the exogenous characteristics

of all other members in the group but does not know their actual outcomes. Because he does

not know what the outcomes of others will be, he formulates his expectation about their possible

(rational expectation) outcomes.

The group means of the expected outcomes are

1mr

mr∑i=1

E(yri|Jr) =1

1 − λ0(

1mr

mr∑i=1

xri,1β10 +1

mr

mr∑i=1

xri,2β20 + αr),

for r = 1, · · · , R, which are linear functions of group means of exogenous variables. The model

implies the between equation representation

yr =1

(1 − λ0)(xr,1β10 + xr,2β20) + vr + εr,

and the within equation representation

yri − yr =mr − 1

mr − 1 + λ0(xri,1 − xr,1)β10 − 1

mr − 1 + λ0(xri,2 − xr,2)β20

+ (εri − εr).

From these equations, we see no implication on the possible correlation of disturbances due to

endogenous interactions, while the mean components are exactly the same as the simultaneous

25

interaction model. For this model, there is an underidentification when β20 + λ0β10 = 0 with

xri,1 = xri,2(= xri). These features imply (mr−1)β10−β20mr−1+λ0

= β10 and, hence,

yri − yr = (xri − xr)β10 + (εri − εr).

We see that, in this case, λ0 and β20 do not appear in the equation and they can not be identified

from the within equation. In the simultaneous equation interactions model, the λ0 can be identified

from the reduced form disturbances of the within equation, even that constraint on β10, β20 and

λ0 occurs.

Lee (2007) suggests the estimation of the within equation by the conditional ML method. In

addition, the IV method is also feasible, and optimal IV’s are also available.

We note that if the model includes self-influence as in

yri = λ0(1

mr

mr∑j=1

E(yrj |Jr)) + xri,1β10 + (1

mr

mr∑j=1


the social effects in the model would also not be identifiable when xri,1 = xri,2. For this model,

the group means of the expected outcomes are

1mr

mr∑i=1

E(yri|Jr) =1

1 − λ0(

1mr

mr∑i=1

xri,1β10 +1

mr

mr∑i=1

xri,2β20 + αr),

which is a linear function of group means of exogenous variables.

2.4) A Network Model with Social Interactions

Network Model: rth group

Yr = λWrYr + Xrβ1 + WrXrβ2 + lmrαr + ur ,

and

ur = ρMrur + εr,

where εr = (ε1r, · · · , εmrr)′, εir i.i.d. (0, σ2).

R # of groups

mr # of members in r group

Wr , Mr exogenous network (social) matrices

26

λWrYr endogenous effect

WrXrβ2 exogenous effect

αr group (fixed) effect; lmr= (1, · · · , 1)′

ρMrur correlation effects

Lin (2005, 2006) — AddHealth Data;

The Add Health Survey:

Students in grades 7-12; 132 schools

Wave I in school survey — 90,118 students;

Friendship network — name up to 5 male and 5 female friends

Special features (in this paper):

Group structure (R can be large – incidental parameters);

Wr and Mr – row normalized, i.e., Wrlmr= lmr

.

Bonacich measure of centrality of members in a group:

(Imr− λ0Wr)−1Wrlmr

= (1 − λ0)lmr,

i.e., all members in the same group has the same measure of centrality.

Related works:

Lin (2005, 2006): Eliminate αr by (In −Wn) and estimation by IV method.

Calvo-Armengol, Patacchini and Zenou (2006) — social network in education;

Bramoulle, Djebbari and Fortin (2006) — identification.

This study:

• allow ‘correlated effects’;

• more on identification issues;

• more efficient estimation methods;

• MC study on sensitivity of estimated effects by omission of a certain effects;

• empirical applications with AddHealth data.

3.5) Estimation

27

Elimination of group fixed effects:

1) Quasi-difference

RrYr = λ0RrWrYr + RrZrβ0 + (1 − ρ0)lmrαr0 + εr,

where Zr = (Xr, WrXr) and Rr = Imr− ρ0Mr .

2) Eliminate group fixed effect

• deviation from group mean

• and eliminate linear dependence on disturbances

R∗rY

∗r = λ0R

∗rW

∗r Y ∗

r + R∗rZ

∗r β0 + ε∗r ,

where

i) Jr = Imr− 1

mrlmr

l′mr= FrF

′r – an eigenvalue-eigenvector decomposition;

[Fr ,1√mr

lmr] an orthonormal matrix,

ii) R∗r = F ′

rRrFr; W ∗r = F ′

rWrFr

iii) Y ∗r = F ′

rYr; Z∗r = F ′

rZr;

iv) using Wr = Wr(FrF′r + 1

mrlmr

l′mr), Wrlmr

= lmr, and F ′

rlmr= 0, one has

F ′rWr = F ′

rWrFrF′r.

Similarly for Mr ;

v) ε∗r has zero mean and the variance matrix σ20Imr−1.

—- spatial AR model.

Estimation: quasi-ML

ln Ln(θ) = − (n − R)2

ln(2πσ2) +R∑

r=1

ln|Sr(λ)|1 − λ

+R∑

r=1

ln|Rr(ρ)|1 − ρ

− 12σ2

R∑r=1

[Rr(ρ)(Sr(λ)Yr − Zrβ)]′Jr [Rr(ρ)(Sr(λ)Yr − Zrβ)].

Concentrated log likelihood function

28

lnLn(λ, ρ) = − (n − R)2

(ln(2π) + 1) − (n − R)2

ln σ2n(λ, ρ)

+ ln |Sn(λ)|+ ln |Rn(ρ)| −R ln[(1− λ)(1− ρ)].

Another (feasible) estimation method: 2SLS or G2SLS

apply IVs

F ′rRrYr = λ0F

′rRrWrYr + F ′

rRrZrβ0 + ε∗r. (∗∗)

(stacked up all the groups together – all sample observations)

i) Estimate F ′rYr = λ0F

′rWrYr +F ′

rZrβ0+F ′rur by 2sls with some IV matrix Q1r to get an initial

(consistent) estimate of λ and β;

ii) use the estimated residuals F ′rur(= u∗

r) to estimate ρ; via (Kelejian-Prucha 1999, MOM)

(Imr−1 − ρ0M∗r )u∗

r = ε∗r ;

iii) estimate Rr;

iv) then feasible 2SLS for (**).

• The best IV matrix is F ′rRr[Wr(Imr

− λ0Wr)−1Zrβ0, Zr].

• Under normal disturbances, the (transformed) ML is asymptotically more efficient.

29

Lecture 1: Introduction to Spatial Autoregressive (SAR) Models 1....

Documents

Transcript of Lecture 1: Introduction to Spatial Autoregressive (SAR) Models 1....