
Lecture 1: Introduction to Spatial Autoregressive (SAR) Models

1. Some SAR models:

This section presents some specified models for cross-sectional data which may capture possible spatial interactions across spatial units. The following models are generalizations of autoregressive processes and autoregression models in time series.

1.1) (First order) spatial autoregressive (SAR) process:

y_i = λw_{i,n}Y_n + ε_i,  i = 1, …, n,

where Y_n = (y_1, …, y_n)' is the column vector of dependent variables, w_{i,n} is an n-dimensional row vector of constants, and the ε_i's are i.i.d. (0, σ²). In vector/matrix form,

Y_n = λW_nY_n + E_n.

The term W_nY_n has been termed the 'spatial lag'. This is supposed to be an equilibrium model. Under the assumption that S_n(λ) = I_n − λW_n is nonsingular, one has

Y_n = S_n^{-1}(λ)E_n.

This process model may not be useful alone in empirical econometrics, but it is used to model possible spatial correlations in the disturbances of a regression equation. The regression model with SAR disturbance U_n is specified as

Y_n = X_nβ + U_n,  U_n = ρW_nU_n + E_n,

where E_n has zero mean and variance σ²I_n. In this regression model, the disturbances ε_i's in E_n are correlated across units according to a SAR process. The variance matrix of U_n is σ₀²S_n^{-1}(ρ₀)S_n^{-1'}(ρ₀). As the off-diagonal elements of S_n^{-1}(ρ₀)S_n^{-1'}(ρ₀) may not be zero, there are correlations of the disturbances u_i's across units.

1.2) Mixed regressive, spatial autoregressive model (MRSAR):

This model generalizes the SAR process by incorporating exogenous variables x_i in the SAR process. It has also simply been called the spatial autoregressive model. In vector/matrix form,

Y_n = λW_nY_n + X_nβ + E_n,

where E_n is (0, σ²I_n). This model has the feature of a simultaneous equation model, and its reduced form is

Y_n = S_n^{-1}(λ)X_nβ + S_n^{-1}(λ)E_n.
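As a numerical check of the reduced form, the following sketch (with an illustrative circular, row-normalized weights matrix; the specific W_n, parameter values, and seed are not from the notes) simulates Y_n from the reduced form and verifies that it solves the structural equation:

```python
# Sketch: simulate an MRSAR model and verify the reduced form
# Y_n = S_n^{-1}(lambda)(X_n beta + E_n) solves Y_n = lambda W_n Y_n + X_n beta + E_n.
import numpy as np

rng = np.random.default_rng(0)
n = 5
lam, beta, sigma = 0.4, 2.0, 1.0

# Illustrative circular "neighbor" weights matrix: zero diagonal, row-normalized.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = rng.normal(size=(n, 1))
eps = rng.normal(scale=sigma, size=n)
S = np.eye(n) - lam * W                      # S_n(lambda) = I_n - lambda W_n

# Reduced form: Y = S^{-1}(X beta + eps)
Y = np.linalg.solve(S, X[:, 0] * beta + eps)

# Y indeed satisfies the structural (simultaneous) equation
assert np.allclose(Y, lam * W @ Y + X[:, 0] * beta + eps)
```

Solving the linear system with `np.linalg.solve` rather than forming S_n^{-1} explicitly is the standard numerically stable choice.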

1.3) Some Intuitions on Spatial Weights Matrix Wn

The matrix W_n is called the spatial weights matrix. The value w_{n,ij} of the jth element of w_{i,n} represents the link (or distance) between the neighbor j and the spatial unit i. Usually, the diagonal of W_n is specified to be zero, i.e., w_{n,ii} = 0 for all i, because w_{i,n}Y_n represents the effect of other spatial units on the spatial unit i.

In some empirical applications, it is a common practice to have W_n with a zero diagonal and row-normalized such that the sum of the elements in each row of W_n is unity. In some applications, the ith row w_{i,n} of W_n may be constructed as w_{i,n} = (d_{i1}, d_{i2}, …, d_{in})/∑_{j=1}^n d_{ij}, where d_{ij} ≥ 0 represents a function of the spatial distance between the ith and jth units, for example, the inverse of the distance between the ith and jth units, in some (characteristic) space. The weighting operation may then be interpreted as an average of neighboring values. However, in some cases, one may argue that row-normalization of the weights matrix may not be meaningful (e.g., Bell and Bockstael (2000), REStat, for real estate problems with micro-level data).
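A minimal sketch of this construction, using inverse distances between units placed on a line (an illustrative geometry, not from the notes):

```python
# Sketch: row-normalized weights from inverse distances d_ij^{-1},
# for units at positions 0, 1, ..., n-1 on a line.
import numpy as np

pos = np.arange(5.0)
D = np.abs(pos[:, None] - pos[None, :])      # pairwise distances
inv = np.where(D > 0, 1.0 / np.where(D > 0, D, 1.0), 0.0)  # d_ij >= 0, zero on diagonal
W = inv / inv.sum(axis=1, keepdims=True)     # w_{i,n} = (d_i1,...,d_in) / sum_j d_ij

assert np.allclose(W.sum(axis=1), 1.0)       # each row sums to unity
assert np.allclose(np.diag(W), 0.0)          # zero diagonal
```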

When neighbors are defined as the adjacent units of each unit, the correlation is local in the sense that correlations across units will be stronger for neighbors but weak for units far away. This becomes clear with an expansion of S_n^{-1}(ρ₀). Suppose that ‖ρW_n‖ < 1 for some matrix norm ‖·‖; then

S_n^{-1}(ρ) = I_n + ∑_{i=1}^∞ ρ^i W_n^i

(see, e.g., Horn and Johnson 1985). As ‖∑_{i=m}^∞ ρ^i W_n^i‖ ≤ ∑_{i=m}^∞ |ρ|^i ‖W_n‖^i, in particular, if W_n is row-normalized, ‖∑_{i=m}^∞ ρ^i W_n^i‖ ≤ ∑_{i=m}^∞ |ρ|^i = |ρ|^m/(1 − |ρ|), which becomes small as m becomes larger. The U_n can be represented as

U_n = E_n + ρW_nE_n + ρ²W_n²E_n + ⋯,

where ρW_n may represent the influence of neighbors on each unit, ρ²W_n² represents the second layer of neighborhood influence, etc.


In the social science (or social interactions) literature, W_nS_n^{-1}(λ) provides measures of centrality, which summarize the position of each spatial unit (in a network).

In conventional spatial weights matrices, neighboring units are defined by only a few adjacent ones. However, there are cases where neighbors may consist of many units. An example is a social interactions model, where neighbors refer to individuals in the same group. The latter may be regarded as models with large group interactions (a kind of model with social interactions). For models with a large number of interactions for each unit, the spatial weights matrix W_n will vary with the sample size. Suppose there are R groups and there are m individuals in each group, so that the sample size is n = mR. In a model without special network (e.g., friendship) information, one may assume that each individual in a group is given equal weight. In that case, W_n = I_R ⊗ B_m, where B_m = (l_m l_m' − I_m)/(m − 1), ⊗ is the Kronecker product, and l_m is the m-dimensional vector of ones. Both m and R can be large in a sample, in which case asymptotic analysis for large samples will be relevant with both R and m going to infinity. Of course, in the general case, the number of members in each group may be large, with groups having different sizes. This model has many interesting applications in the study of social interactions.
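The block-diagonal structure W_n = I_R ⊗ B_m is easy to build directly; a short sketch (group sizes chosen only for illustration):

```python
# Sketch: large-group-interactions weights matrix W_n = I_R (x) B_m,
# where B_m puts equal weight 1/(m-1) on the other m-1 group members.
import numpy as np

R, m = 3, 4                                   # R groups, m members each; n = mR
l_m = np.ones((m, 1))
B_m = (l_m @ l_m.T - np.eye(m)) / (m - 1)     # B_m = (l_m l_m' - I_m)/(m-1)
W = np.kron(np.eye(R), B_m)                   # Kronecker product gives block-diagonal W_n

assert W.shape == (m * R, m * R)
assert np.allclose(W.sum(axis=1), 1.0)        # row-normalized
assert np.allclose(np.diag(W), 0.0)           # zero diagonal
```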

1.4) Other generalizations of these models can have high-order spatial lags and/or SAR disturbances.

A richer SAR model may combine the MRSAR equation with SAR disturbances:

Y_n = λW_nY_n + X_nβ + U_n,  U_n = ρM_nU_n + E_n,  (1.5)

where W_n and M_n are spatial weights matrices, which may or may not be identical.

A further extension of the SAR model may allow high-order spatial lags as in

Y_n = ∑_{j=1}^p λ_j W_{jn}Y_n + X_nβ + E_n,

where the W_{jn}'s are p distinct spatial weights matrices.

2. Some matrix algebra

2.1) Vector and Matrix Norms

Definition. Let V be a vector space. A function ‖·‖ : V → R is a vector norm if, for any x, y ∈ V,

(1) nonnegativity: ‖x‖ ≥ 0;
(1a) positivity: ‖x‖ = 0 if and only if x = 0;
(2) homogeneity: ‖cx‖ = |c|‖x‖ for any scalar c;
(3) triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

a) The Euclidean norm (or l₂ norm) is ‖x‖₂ = (|x₁|² + ⋯ + |x_n|²)^{1/2}.

b) The sum norm (or l₁ norm) is ‖x‖₁ = |x₁| + ⋯ + |x_n|.

c) The max norm (or l∞ norm) is ‖x‖∞ = max{|x₁|, …, |x_n|}.

Definition. A function ‖·‖ of a square matrix is a matrix norm if, for any square matrices A and B, it satisfies the following axioms:

(1) ‖A‖ ≥ 0;
(1a) ‖A‖ = 0 if and only if A = 0;
(2) ‖cA‖ = |c|‖A‖ for any scalar c;
(3) ‖A + B‖ ≤ ‖A‖ + ‖B‖;
(4) submultiplicativity: ‖AB‖ ≤ ‖A‖‖B‖.

Associated with each vector norm is a natural matrix norm induced by the vector norm. Let ‖·‖ be a vector norm. Define ‖A‖ = max_{‖x‖=1} ‖Ax‖ = max_{x≠0} ‖Ax‖/‖x‖. This function of A is a matrix norm, and it has the properties that ‖Ax‖ ≤ ‖A‖‖x‖ and ‖I‖ = 1.


a) The maximum column sum matrix norm ‖·‖₁ is

‖A‖₁ = max_{1≤j≤n} ∑_{i=1}^n |a_{ij}|,

which is the matrix norm induced by the l₁ vector norm.

b) The maximum row sum matrix norm ‖·‖∞ is

‖A‖∞ = max_{1≤i≤n} ∑_{j=1}^n |a_{ij}|,

which is the matrix norm induced by the l∞ vector norm.

One important application of matrix norms is in giving bounds for the eigenvalues of a matrix.

Definition: The spectral radius ρ(A) of a matrix A is

ρ(A) = max{|μ| : μ is an eigenvalue of A}.

Let μ be an eigenvalue of A and x a corresponding eigenvector, i.e., Ax = μx. Then

|μ|·‖x‖ = ‖μx‖ = ‖Ax‖ ≤ ‖A‖·‖x‖,

and, therefore, |μ| ≤ ‖A‖. That is, ρ(A) ≤ ‖A‖.

Another useful application of the matrix norm is that, if ‖·‖ is a matrix norm and ‖A‖ < 1, then I − A is invertible and (I − A)^{-1} = ∑_{j=0}^∞ A^j. When all of the elements of W_n are nonnegative and its rows are normalized to one, ‖W_n‖∞ = 1. For this case, when |λ| < 1,

(I_n − λW_n)^{-1} = ∑_{j=0}^∞ (λW_n)^j = I_n + λW_n + λ²W_n² + ⋯.
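The Neumann series can be verified numerically; the sketch below (weights matrix and λ chosen for illustration) truncates the series at a power where the remainder |λ|^m/(1 − |λ|) is negligible:

```python
# Sketch: (I - lambda W)^{-1} equals the truncated series sum_j (lambda W)^j
# when W is row-normalized (||W||_inf = 1) and |lambda| < 1.
import numpy as np

n, lam = 6, 0.3
W = np.ones((n, n)) - np.eye(n)
W /= W.sum(axis=1, keepdims=True)            # row-normalized, zero diagonal

inv = np.linalg.inv(np.eye(n) - lam * W)
series = sum(np.linalg.matrix_power(lam * W, j) for j in range(60))

# remainder bound |lam|^60 / (1 - |lam|) is ~1e-32 here, so agreement is tight
assert np.allclose(inv, series, atol=1e-10)
```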

2.2) Useful (important) regularity conditions on W_n and associated matrices:

Definition: Let {A_n} be a sequence of square matrices, where A_n is of dimension n.
(i) {A_n} is said to be uniformly bounded in row sums if {‖A_n‖∞} is a bounded sequence;
(ii) {A_n} is said to be uniformly bounded in column sums if {‖A_n‖₁} is a bounded sequence.

A useful property is that if {A_n} and {B_n} are uniformly bounded in row sums (column sums), then {A_nB_n} is also uniformly bounded in row sums (column sums). This follows from the submultiplicative property of matrix norms: for example, ‖A_nB_n‖∞ ≤ ‖A_n‖∞‖B_n‖∞ ≤ c² when ‖A_n‖∞ ≤ c and ‖B_n‖∞ ≤ c.

An Important Assumption: The spatial weights matrices {W_n} and {S_n^{-1}} are uniformly bounded in both row and column sums.

Equivalently, this assumes that {‖W_n‖∞} and {‖W_n‖₁} are bounded sequences and, similarly, that the maximum row and column sum norms of S_n^{-1} are bounded.

Ruling out the unit root process:

The assumption that the S_n^{-1} are uniformly bounded in both row and column sums rules out the unit root case (in time series). Consider the unit root process y_t = y_{t−1} + ε_t, t = 2, …, n, with y_1 = ε_1, i.e., S_n = I_n − W_n with

W_n = [ 0 0 ⋯ 0 0
        1 0 ⋯ 0 0
        0 1 ⋯ 0 0
        ⋮       ⋱ ⋮
        0 0 ⋯ 1 0 ].

It implies that

y_t = ∑_{j=1}^t ε_j.

That is, S_n^{-1} is a lower triangular matrix with all nonzero elements being unity. The sum of its first column is n, and the sum of its last row is also n. For the unit root process, the S_n^{-1} are bounded in neither row sums nor column sums.
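A tiny sketch makes the unboundedness concrete: the first-column and last-row sums of this S_n^{-1} equal n, so they grow without bound as n increases.

```python
# Sketch: for the unit root process, S_n^{-1} is lower triangular with all
# nonzero entries equal to one, so its norms grow linearly in n.
import numpy as np

def s_inv(n):
    """S_n^{-1} for the unit root process: lower triangular matrix of ones."""
    return np.tril(np.ones((n, n)))

for n in (10, 100, 1000):
    A = s_inv(n)
    assert A[:, 0].sum() == n      # first-column sum (a column sum) equals n
    assert A[-1, :].sum() == n     # last-row sum (a row sum) equals n
```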

3. Estimation Methods

The estimation methods which have been considered in the existing literature are mainly the ML (or QML) method, the 2SLS (or IV) method, and the generalized method of moments (GMM). The QML method usually has good finite sample properties relative to the other methods for the estimation of SAR models with a first-order spatial lag. However, the ML method is not computationally attractive for models with more than a single spatial lag. For the higher-order spatial lag models, the IV and GMM methods are feasible. With properly designed moment equations, the best GMM estimator exists and can be as efficient asymptotically as the ML estimate under normal disturbances (Lee 2007).

The most popular and traditional estimation method is the ML under the assumption that E_n is N(0, σ²I_n). Other estimation methods are the method of moments (MOM), GMM, and the 2SLS. The latter is applicable only to the MRSAR model.


3.1) ML for the SAR process:

ln L_n(λ, σ²) = −(n/2)ln(2π) − (n/2)ln σ² + ln|S_n(λ)| − (1/(2σ²)) Y_n'S_n'(λ)S_n(λ)Y_n,

where S_n(λ) = I_n − λW_n.

For the regression model with SAR disturbances, the log likelihood function is

ln L_n(β, ρ, σ²) = −(n/2)ln(2π) − (n/2)ln σ² + ln|S_n(ρ)| − (1/(2σ²))(Y_n − X_nβ)'S_n'(ρ)S_n(ρ)(Y_n − X_nβ).

The log likelihood function for the MRSAR model is

ln L_n(λ, β, σ²) = −(n/2)ln(2π) − (n/2)ln σ² + ln|S_n(λ)| − (1/(2σ²))(S_n(λ)Y_n − X_nβ)'(S_n(λ)Y_n − X_nβ).

The likelihood function involves the computation of the determinant of S_n(λ), which is a function of the unknown parameter λ and may have a large dimension n.

A computationally tractable method is due to Ord (1975), where W_n is a row-normalized weights matrix W_n = D_nW_n*, with W_n* a symmetric matrix and D_n = Diag{∑_{j=1}^n w*_{n,ij}}^{-1}. Note that, in this case, W_n is not a symmetric matrix. However, the eigenvalues of W_n are still all real. This is so because

|W_n − μI_n| = |D_nW_n* − μI_n| = |D_n^{1/2}| · |D_n^{1/2}W_n*D_n^{1/2} − μI_n| · |D_n^{-1/2}| = |D_n^{1/2}W_n*D_n^{1/2} − μI_n|,

so the eigenvalues of W_n are the same as those of D_n^{1/2}W_n*D_n^{1/2}, which is a symmetric matrix. As the eigenvalues of a symmetric matrix are real, the eigenvalues of W_n are real. Let the μ_i's be the eigenvalues of W_n, which are the same as those of D_n^{1/2}W_n*D_n^{1/2}. Let Γ be the orthogonal matrix such that Γ'D_n^{1/2}W_n*D_n^{1/2}Γ = Diag{μ_i}. The above relations show also that

|I_n − λW_n| = |I_n − λD_nW_n*| = |I_n − λD_n^{1/2}W_n*D_n^{1/2}| = |I_n − λ·Diag{μ_i}| = ∏_{i=1}^n (1 − λμ_i).

Thus, |I_n − λW_n| can be easily updated during iterations within a maximization subroutine, as the μ_i's need be computed only once.

When W_n is a sparse matrix, specific numerical methods for handling sparse matrices are useful.
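Ord's device can be checked numerically; the sketch below (a random symmetric W_n*, chosen only for illustration) compares the log-determinant computed directly with the one computed from the eigenvalues of the symmetric matrix D_n^{1/2}W_n*D_n^{1/2}:

```python
# Sketch of Ord's (1975) device: ln|I - lambda W| = sum_i ln(1 - lambda mu_i),
# where the mu_i are the (real) eigenvalues of D^{1/2} W* D^{1/2}.
import numpy as np

rng = np.random.default_rng(1)
n = 6
Wstar = rng.random((n, n))
Wstar = (Wstar + Wstar.T) / 2                 # symmetric W_n*
np.fill_diagonal(Wstar, 0.0)
D = np.diag(1.0 / Wstar.sum(axis=1))          # D_n = Diag{sum_j w*_ij}^{-1}
W = D @ Wstar                                  # row-normalized W_n = D_n W_n*

mu = np.linalg.eigvalsh(np.sqrt(D) @ Wstar @ np.sqrt(D))  # real eigenvalues, once

lam = 0.5
sign, logdet = np.linalg.slogdet(np.eye(n) - lam * W)     # direct computation
via_eig = np.log(1.0 - lam * mu).sum()                    # cheap update in lambda
assert np.isclose(sign * logdet, via_eig)
```

Within an optimization loop, only `via_eig` needs to be recomputed for each new λ, at O(n) cost.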


Another computationally tractable method uses the characteristic polynomial. The determinant |W_n − μI_n| is a polynomial in μ and is called the characteristic polynomial of W_n. (Note: the zeros of the characteristic polynomial are the eigenvalues of W_n.) Thus,

|I_n − λW_n| = a_nλ^n + ⋯ + a_1λ + a_0,

where the constants a_i depend only on W_n. So the a_i's can be computed once before the maximization.

3.2) 2SLS Estimation for the MRSAR model

For the MRSAR model Y_n = λW_nY_n + X_nβ + E_n, the spatial lag W_nY_n can be correlated with the disturbance vector E_n, so, in general, OLS may not be a consistent estimator. However, there is a class of spatial W_n (with large group interactions) for which the OLS estimator can be consistent (Lee).

To avoid the bias due to the correlation of W_nY_n with E_n, Kelejian and Prucha (1998) have suggested the use of instrumental variables (IVs). Let Q_n be a matrix of instrumental variables. Denote Z_n = (W_nY_n, X_n) and δ = (λ, β')'. The MRSAR equation can be rewritten as Y_n = Z_nδ + E_n. The 2SLS estimator of δ with Q_n is

δ̂_{2sl,n} = [Z_n'Q_n(Q_n'Q_n)^{-1}Q_n'Z_n]^{-1} Z_n'Q_n(Q_n'Q_n)^{-1}Q_n'Y_n.

It can be shown that the asymptotic distribution of δ̂_{2sl,n} follows from

√n(δ̂_{2sl,n} − δ₀) →D N(0, σ₀² lim_{n→∞} {(1/n)(G_nX_nβ₀, X_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(G_nX_nβ₀, X_n)}^{-1}),

where G_n = W_nS_n^{-1}, under the assumption that the limit of (1/n)Q_n'(G_nX_nβ₀, X_n) has full column rank (k + 1), where k is the dimension of β or, equivalently, the number of columns of X_n. In practice, Kelejian and Prucha (1998) suggest the use of linearly independent variables in (X_n, W_nX_n) for the construction of Q_n.

By the generalized Schwarz inequality, the optimum IV matrix is (G_nX_nβ₀, X_n).

This 2SLS method cannot be used for the estimation of the (pure) SAR process. The SAR process is a special case of the MRSAR model with β₀ = 0. However, when β₀ = 0, G_nX_nβ₀ = 0 and, hence, (G_nX_nβ₀, X_n) = (0, X_n) would have rank k but not full rank (k + 1). Intuitively, when β₀ = 0, X_n does not appear in the model and there is no other IV available.
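A minimal implementation sketch of this 2SLS estimator, with the practical choice Q_n = (X_n, W_nX_n) (the simulated design, weights matrix, and sample size are illustrative assumptions, not from the notes):

```python
# Sketch of Kelejian-Prucha (1998) style 2SLS for the MRSAR model,
# using Q_n = (X_n, W_n X_n) as the IV matrix.
import numpy as np

rng = np.random.default_rng(2)
n, lam0, beta0 = 400, 0.4, 1.5
W = np.zeros((n, n))
for i in range(n):                            # illustrative circular neighbors
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = rng.normal(size=(n, 1))
eps = rng.normal(size=n)
Y = np.linalg.solve(np.eye(n) - lam0 * W, X[:, 0] * beta0 + eps)

Z = np.column_stack([W @ Y, X])               # Z_n = (W_n Y_n, X_n)
Q = np.column_stack([X, W @ X])               # IV matrix Q_n
PQ = Q @ np.linalg.solve(Q.T @ Q, Q.T)        # projector onto the column space of Q_n
delta = np.linalg.solve(Z.T @ PQ @ Z, Z.T @ PQ @ Y)   # (lambda_hat, beta_hat)
```

With this design the estimates should land near (λ₀, β₀) = (0.4, 1.5) up to sampling error of order 1/√n.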

3.3) Method of Moments (MOM) for the SAR process:

Kelejian and Prucha (1999) have suggested a MOM estimation. The moment equations are based on three moments:

E(E_n'E_n) = nσ²,  E(E_n'W_n'W_nE_n) = σ²tr(W_n'W_n),  and  E(E_n'W_nE_n) = 0.

These correspond to

g_n(ρ, σ²) = (Y_n'S_n'(ρ)S_n(ρ)Y_n − nσ², Y_n'S_n'(ρ)W_n'W_nS_n(ρ)Y_n − σ²tr(W_n'W_n), Y_n'S_n'(ρ)W_nS_n(ρ)Y_n)'

for the SAR process. For the regression model with SAR disturbances, Y_n shall be replaced by the least squares residuals.

Remark: Kelejian and Prucha (1999) have shown consistency of the MOM estimator, but not asymptotic normality of the estimator.

3.4) Moments for GMM estimation

For the MRSAR model, in addition to the moments based on the IV matrix Q_n, other moment equations can also be constructed for estimation.

Let Q_n be an n × k_x IV matrix constructed as functions of X_n and W_n. Let ε_n(δ) = S_n(λ)Y_n − X_nβ for any possible value δ. The k_x moment functions corresponding to the orthogonality conditions are Q_n'ε_n(δ).

Now consider a finite number, say m, of n × n constant matrices P_{1n}, …, P_{mn}, each of which has a zero diagonal. The moment functions (P_{jn}ε_n(δ))'ε_n(δ) can be used in addition to Q_n'ε_n(δ). These moment functions form a vector

g_n(δ) = (P_{1n}ε_n(δ), …, P_{mn}ε_n(δ), Q_n)'ε_n(δ)

in the estimation in the GMM framework.

Proposition. For any constant n × n matrix P_n with tr(P_n) = 0, P_nε_n is uncorrelated with ε_n, i.e., E((P_nε_n)'ε_n) = 0.

Proof: E((P_nε_n)'ε_n) = E(ε_n'P_n'ε_n) = σ₀²tr(P_n') = σ₀²tr(P_n) = 0. Q.E.D.

This shows, in particular, that E(g_n(δ₀)) = 0. Thus, g_n(δ) provides valid moment equations for estimation.

As W_nY_n = G_nX_nβ₀ + G_nε_n, where G_n = W_nS_n^{-1} and S_n = S_n(λ₀), and G_nε_n is correlated with the disturbance ε_n in the model

Y_n = λW_nY_n + X_nβ + ε_n,

any P_{jn}ε_n, which is uncorrelated with ε_n, can be used as an IV for W_nY_n as long as P_{jn}ε_n and G_nε_n are correlated.

4. Basic Asymptotic Foundation

Some laws of large numbers and central limit theorems.

Lemma. Suppose that {A_n} are uniformly bounded in both row and column sums. Then E(ε_n'A_nε_n) = O(n) and var(ε_n'A_nε_n) = O(n). Furthermore, ε_n'A_nε_n = O_P(n) and

(1/n)ε_n'A_nε_n − (1/n)E(ε_n'A_nε_n) = o_P(1).

Lemma. Suppose that A_n is an n × n matrix with its column sums uniformly bounded, the elements of the n × k matrix C_n are uniformly bounded, and ε_{n1}, …, ε_{nn} are i.i.d. with zero mean, finite variance σ², and E|ε|³ < ∞. Then (1/√n)C_n'A_nε_n = O_P(1) and (1/n)C_n'A_nε_n = o_P(1). Furthermore, if the limit of (1/n)C_n'A_nA_n'C_n exists and is positive definite, then

(1/√n)C_n'A_nε_n →D N(0, σ² lim_{n→∞} (1/n)C_n'A_nA_n'C_n).

This lemma can be shown with the Liapounov central limit theorem.

Theorem (Liapounov's theorem): Let {X_n} be a sequence of independent random variables with E(X_n) = μ_n and E(X_n − μ_n)² = σ_n² ≠ 0. Denote C_n = (∑_{i=1}^n σ_i²)^{1/2}. If (1/C_n^{2+δ}) ∑_{k=1}^n E|X_k − μ_k|^{2+δ} → 0 for some positive δ > 0, then

∑_{k=1}^n (X_k − μ_k)/C_n →D N(0, 1).

Lemma. Suppose that A_n is a constant n × n matrix uniformly bounded in both row and column sums, and let c_n be a column vector of constants. If (1/n)c_n'c_n = o(1), then (1/√n)c_n'A_nε_n = o_P(1). On the other hand, if (1/n)c_n'c_n = O(1), then (1/√n)c_n'A_nε_n = O_P(1).

Proof: The results follow from Chebyshev's inequality by investigating var((1/√n)c_n'A_nε_n) = (σ²/n)c_n'A_nA_n'c_n. Let Λ_n be the diagonal matrix of eigenvalues of A_nA_n' and Γ_n the orthonormal matrix of eigenvectors. As eigenvalues in absolute value are bounded by any norm of the matrix, the eigenvalues in Λ_n are uniformly bounded in absolute value because ‖A_n‖∞ (and ‖A_n‖₁) are uniformly bounded. Hence, (1/n)c_n'A_nA_n'c_n = (1/n)c_n'Γ_nΛ_nΓ_n'c_n ≤ (1/n)c_n'c_n·|λ_{n,max}| = o(1), where λ_{n,max} is the eigenvalue of A_nA_n' with the largest absolute value.

When (1/n)c_n'c_n = O(1), (1/n)c_n'A_nA_n'c_n ≤ (1/n)c_n'c_n·|λ_{n,max}| = O(1). In this case, var((1/√n)c_n'A_nε_n) = (σ²/n)c_n'A_nA_n'c_n = O(1). Therefore, (1/√n)c_n'A_nε_n = O_P(1). Q.E.D.

A Martingale Central Limit Theorem:

Definition: Let {z_t} be a sequence of random scalars, and let {J_t} be a sequence of σ-fields such that J_{t−1} ⊆ J_t for all t. If z_t is measurable with respect to J_t, then {z_t, J_t} is called an adapted stochastic sequence.

The σ-field J_t can be thought of as the σ-algebra generated by the entire current and past history of z_t and some other random variables.

Definition: Let {y_t, J_t} be an adapted stochastic sequence. Then {y_t, J_t} is a martingale difference sequence if and only if E(y_t|J_{t−1}) = 0 for all t ≥ 2.

Theorem (A Martingale Central Limit Theorem):

Let {ε_{Tt}} be a martingale-difference array. If, as T → ∞,
(i) ∑_{t=1}^T v_{Tt} →p σ², where σ² > 0 and v_{Tt} = E(ε_{Tt}²|J_{T,t−1}) denotes the conditional variance, and
(ii) for any η > 0, ∑_{t=1}^T E(ε_{Tt}²·I(|ε_{Tt}| > η) | J_{T,t−1}) converges in probability to zero (a Lindeberg condition),
then ε_{T1} + ⋯ + ε_{TT} →D N(0, σ²).

A CLT for a quadratic-linear function of ε_n:

Suppose that {A_n} is a sequence of symmetric n × n matrices with row and column sums uniformly bounded, and b_n is an n-dimensional column vector satisfying sup_n (1/n)∑_{i=1}^n |b_{ni}|^{2+η} < ∞ for some η > 0. The ε_{n1}, …, ε_{nn} are i.i.d. random variables with zero mean and finite variance σ², and the moment E(|ε|^{4+δ}) exists for some δ > 0. Let σ_{Q_n}² be the variance of Q_n, where Q_n = ε_n'A_nε_n + b_n'ε_n − σ²tr(A_n). Assume that the variance σ_{Q_n}² is bounded away from zero at the rate n. Then Q_n/σ_{Q_n} →D N(0, 1).

This theorem has been proved by Kelejian and Prucha (2001) with a martingale central limit theorem.

5. The Cliff and Ord (1973) Test for Spatial Correlation

Consider the model

Y = Xβ + U,  U = ρWU + ε,

where Y is an n-dimensional column vector of the dependent variable, X is an n × k matrix of regressors, U is an n × 1 vector of disturbances, W is an n × n spatial weights matrix with a zero diagonal, and ε is N(0, σ²I_n). One may want to test H₀: ρ = 0, i.e., the test of spatial correlation in U.

The log likelihood of this model is

ln L(θ) = −(n/2)ln(2πσ²) + ln|I − ρW| − (1/(2σ²))(y − Xβ)'(I − ρW)'(I − ρW)(y − Xβ).

The first-order conditions are

∂ln L/∂β = (1/σ²)X'(I − ρW)'(I − ρW)(y − Xβ),

∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴))(y − Xβ)'(I − ρW)'(I − ρW)(y − Xβ),

and

∂ln L/∂ρ = −tr(W(I − ρW)^{-1}) + (1/(2σ²))(y − Xβ)'[(I − ρW)'W + W'(I − ρW)](y − Xβ),

where ∂ln|I − ρW|/∂ρ = −tr(W(I − ρW)^{-1}), by using the matrix differentiation formula (d/dθ)ln|A| = tr(A^{-1}·dA/dθ) (see, e.g., Amemiya (1985), Advanced Econometrics, p. 461).

The second-order derivatives with respect to ρ are

∂²ln L/∂ρ∂β = −(1/σ²)X'[W'(I − ρW) + (I − ρW)'W](y − Xβ),

∂²ln L/∂ρ∂σ² = −(1/(2σ⁴))(y − Xβ)'[W'(I − ρW) + (I − ρW)'W](y − Xβ),

and

∂²ln L/∂ρ² = −tr([W(I − ρW)^{-1}]²) − (1/σ²)(y − Xβ)'W'W(y − Xβ),

where ∂(I − ρW)^{-1}/∂ρ = (I − ρW)^{-1}W(I − ρW)^{-1}, by using the matrix differentiation formula dA^{-1}/dθ = −A^{-1}(dA/dθ)A^{-1}. We see that E_{H₀}(∂²ln L/∂ρ∂β) = 0 and E_{H₀}(∂²ln L/∂ρ∂σ²) = 0, because E(ε'Wε) = σ²tr(W) = 0 as W has a zero diagonal. Furthermore,

E_{H₀}(−∂²ln L/∂ρ²) = tr(W² + W'W).

Because of the block diagonal structure of the information matrix −E_{H₀}(∂²ln L/∂θ∂θ'), an efficient score test statistic for testing H₀ can have a simple expression. The efficient score test is also known as a Lagrange multiplier (LM) test.

The efficient score (Rao) test statistic in its general form is

(∂ln L(θ̃)/∂θ)' [−E(∂²ln L(θ̃)/∂θ∂θ')]^{-1} (∂ln L(θ̃)/∂θ),

where θ̃ is the constrained MLE of θ under H₀.

Let β̃ and σ̃² be the MLE under H₀: ρ = 0, i.e., β̃ is the usual OLS estimate and σ̃² = (1/n)e'e, where e = y − Xβ̃ is the least squares residual vector, is the average of the squared residuals. Then θ̃ = (β̃', σ̃², 0)' and

∂ln L(θ̃)/∂ρ = −tr(W) + (1/(2σ̃²))(y − Xβ̃)'(W + W')(y − Xβ̃) = (1/(2σ̃²))(y − Xβ̃)'(W + W')(y − Xβ̃),

because tr(W) = 0. Hence, the LM test statistic is

[n e'(W + W')e / (2e'e)] / [tr(W² + W'W)]^{1/2},

which shall be asymptotically normal N(0, 1).
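A short numerical sketch of this statistic from OLS residuals (the data-generating design and weights matrix are illustrative assumptions, with no spatial correlation so that H₀ holds):

```python
# Sketch: the scaled LM-type statistic from OLS residuals, which should be
# approximately N(0,1) under H0 (no spatial correlation).
import numpy as np

rng = np.random.default_rng(3)
n = 200
W = np.zeros((n, n))
for i in range(n):                                      # illustrative circular W
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)       # H0: rho = 0

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_ols                                    # OLS residuals

moran_I = e @ W @ e / (e @ e)                           # Moran's I (unscaled)
lm = n * e @ (W + W.T) @ e / (2 * e @ e) / np.sqrt(np.trace(W @ W + W.T @ W))
```

Under H₀ the value of `lm` should typically fall within the usual normal range (roughly ±2 for a 5% two-sided test).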

The test statistic given by Moran (1950) and Cliff and Ord (1973) is

I = e'We / e'e,

which is the LM statistic with an unscaled denominator. The LM interpretation of the Cliff and Ord (1973) test of spatial correlation is due to Burridge (1980), J. R. Statist. Soc. B, vol. 42.

Burridge (1980) has also pointed out that the LM test for a moving average alternative, i.e., U = ε + ρWε, is exactly the same as that given above.

Note that, for the regression model with SAR disturbances, y = Xβ + u with u = ρWu + ε, the statistics (1/√n)ũ'Wũ and (1/√n)u'Wu, where ũ is the least squares residual vector, are asymptotically equivalent (under both H₀ and H₁), i.e., (1/√n)ũ'Wũ = (1/√n)u'Wu + o_P(1). This can be shown as follows. As ũ = [I − X(X'X)^{-1}X']u,

ũ'Wũ = u'[I − X(X'X)^{-1}X']W[I − X(X'X)^{-1}X']u
     = u'Wu − u'(W + W')X(X'X)^{-1}X'u + u'X(X'X)^{-1}X'WX(X'X)^{-1}X'u.

Because u = (I − ρW)^{-1}ε, under the assumptions that the elements of X are uniformly bounded and (I − ρW)^{-1} is uniformly bounded in both row and column sums, (1/√n)X'u = (1/√n)X'(I − ρW)^{-1}ε = O_P(1) and (1/√n)X'Wu = (1/√n)X'W(I − ρW)^{-1}ε = O_P(1) by the CLT for the linear form. These imply, in particular, that (1/n)X'u = o_P(1) and (1/n)X'Wu = o_P(1). Hence,

(1/√n)ũ'Wũ = (1/√n)u'Wu − (1/√n)·(u'(W + W')X/√n)((1/n)X'X)^{-1}(X'u/√n) + (1/√n)·(u'X/√n)((1/n)X'X)^{-1}((1/n)X'WX)((1/n)X'X)^{-1}(X'u/√n) = (1/√n)u'Wu + o_P(1).

Consequently, the Cliff–Ord test using the least squares residuals is asymptotically χ²(1) under H₀.

6. More on GMM and 2SLS Estimation of MRSAR Models

6.1) IVs and Quadratic Moments

The MRSAR model is

Y_n = λW_nY_n + X_nβ + ε_n,

where X_n is an n × k matrix of nonstochastic exogenous variables and W_n is a spatial weights matrix of known constants with a zero diagonal.

We assume the following:

Assumption 1. The ε_{ni}'s are i.i.d. with zero mean and variance σ², and a moment of order higher than the fourth exists.

Assumption 2. The elements of X_n are uniformly bounded constants, X_n has full rank k, and lim_{n→∞} (1/n)X_n'X_n exists and is nonsingular.

Assumption 3. The spatial weights matrices {W_n} and {(I_n − λW_n)^{-1}} at λ = λ₀ are uniformly bounded in both row and column sums in absolute value.

Let Q_n be an n × k_x matrix of IVs constructed as functions of X_n and W_n in a 2SLS approach. Denote ε_n(δ) = (I_n − λW_n)Y_n − X_nβ for any possible value δ. Let P_{1n} = {P : P is an n × n matrix, tr(P) = 0} be the class of constant n × n matrices which have a zero trace. A subclass P_{2n} of P_{1n} is P_{2n} = {P : P ∈ P_{1n}, diag(P) = 0}.

Assumption 4. The matrices P_{jn}'s from P_{1n} are uniformly bounded in both row and column sums in absolute value, and the elements of Q_n are uniformly bounded.

With the selected matrices P_{jn}'s and the IV matrix Q_n, the set of moment functions forms a vector

g_n(δ) = (P_{1n}ε_n(δ), …, P_{mn}ε_n(δ), Q_n)'ε_n(δ) = (ε_n'(δ)P_{1n}'ε_n(δ), …, ε_n'(δ)P_{mn}'ε_n(δ), ε_n'(δ)Q_n)'

for the GMM estimation.

At δ₀, g_n(δ₀) = (ε_n'P_{1n}'ε_n, …, ε_n'P_{mn}'ε_n, ε_n'Q_n)'. It follows that E(g_n(δ₀)) = 0 because E(Q_n'ε_n) = Q_n'E(ε_n) = 0 and E(ε_n'P_{jn}'ε_n) = σ₀²tr(P_{jn}) = 0 for j = 1, …, m. The intuition is as follows:

(1) The variables in Q_n can be used as IVs for W_nY_n and X_n.

(2) The P_{jn}ε_n is uncorrelated with ε_n. As W_nY_n = G_n(X_nβ₀ + ε_n), P_{jn}ε_n can be used as an IV for W_nY_n as long as P_{jn}ε_n and G_nε_n are correlated, where G_n = W_nS_n^{-1}.
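A small Monte Carlo sketch of these moment conditions (illustrative design: one zero-diagonal P matrix and Q_n = (X_n, W_nX_n); none of the numerical choices come from the notes) confirms that the moments average to zero at the true parameters:

```python
# Sketch: E(g_n(delta_0)) = 0 for the GMM moment vector, checked by Monte Carlo.
import numpy as np

rng = np.random.default_rng(4)
n, lam0, beta0 = 6, 0.4, 1.0
W = np.zeros((n, n))
for i in range(n):                            # illustrative circular neighbors
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = rng.normal(size=(n, 1))
Q = np.column_stack([X, W @ X])               # IV matrix Q_n
P = W.copy()                                  # W_n has a zero diagonal, so tr(P) = 0

def g(Y, lam, beta):
    """Moment vector (eps' P' eps, Q' eps) at delta = (lam, beta)."""
    eps = (np.eye(n) - lam * W) @ Y - X[:, 0] * beta
    return np.concatenate([[eps @ P.T @ eps], Q.T @ eps])

# average the moments over many simulated samples at the true parameters
G = np.zeros(3)
reps = 2000
for _ in range(reps):
    eps = rng.normal(size=n)
    Y = np.linalg.solve(np.eye(n) - lam0 * W, X[:, 0] * beta0 + eps)
    G += g(Y, lam0, beta0)
G /= reps
```

The averaged vector `G` should be close to zero, shrinking at the Monte Carlo rate 1/√reps.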

6.2) Identification

In the GMM framework, the identification condition requires that the limiting equations lim_{n→∞} (1/n)E(g_n(δ)) = 0 have a unique solution at δ₀ (Hansen 1982). As d_n(δ) = E(ε_n(δ)) = X_n(β₀ − β) + (λ₀ − λ)G_nX_nβ₀, the moment equations corresponding to Q_n are

lim_{n→∞} (1/n)Q_n'd_n(δ) = lim_{n→∞} (1/n)(Q_n'X_n, Q_n'G_nX_nβ₀)((β₀ − β)', λ₀ − λ)' = 0.

They will have a unique solution at δ₀ if (Q_n'X_n, Q_n'G_nX_nβ₀) has full column rank, i.e., rank (k + 1), for large enough n.

This sufficient rank condition implies the necessary rank condition that (X_n, G_nX_nβ₀) has full column rank (k + 1) and that Q_n has rank at least (k + 1), for large enough n. The sufficient rank condition requires Q_n to be correlated with W_nY_n in the limit as n goes to infinity. This is so because E(Q_n'W_nY_n) = Q_n'G_nX_nβ₀. Under the sufficient rank condition, δ₀ can thus be identified via lim_{n→∞} (1/n)Q_n'd_n(δ) = 0.

The necessary full rank condition on (X_n, G_nX_nβ₀) for large n is possible only if G_nX_nβ₀ and the columns of X_n are not asymptotically linearly dependent. This rank condition would not hold, in particular, if β₀ were zero. There are other cases where this dependence can occur (see, e.g., Kelejian and Prucha 1998).

If X_n has rank k but (X_n, G_nX_nβ₀) does not have full rank (k + 1), so that G_nX_nβ₀ = X_nc for some vector c, the corresponding moment equations Q_n'd_n(δ) = 0 will have many solutions, but the solutions are all described by the relation β = β₀ + (λ₀ − λ)c as long as Q_n'X_n has full rank k. Under this scenario, β₀ can be identified only if λ₀ is identifiable. The identification of λ₀ will rely on the remaining quadratic moment equations. In this case,

Y_n = λ₀(G_nX_nβ₀) + X_nβ₀ + S_n^{-1}ε_n = X_n(β₀ + λ₀c) + u_n,

where u_n = S_n^{-1}ε_n. The relationship u_n = λ₀W_nu_n + ε_n is a SAR process. The identification of λ₀ thus comes from the SAR process u_n.

Assumption 5. Either (i) limn 1nQn(GnXn0, Xn) has the full rank (k + 1), or (ii)

limn 1nQnXn has the full rank k, limn

1ntr(P sjnGn) = 0 for some j,

and limn 1n [tr(Ps1nGn), , tr(P smnGn)] is linearly independent of



[tr(GnP1nGn), , tr(GnPmnGn)].

6.3) Optimum GMM

The variance matrix of these moment functions involves variances and covariances of linear and quadratic forms of ε_n. For any square n × n matrix A, let vecD(A) = (a_{11}, …, a_{nn})' denote the column vector formed by the diagonal elements of A. With μ₃ and μ₄ denoting the third and fourth moments of ε,

E(Q_n'ε_n · ε_n'P_nε_n) = Q_n'(∑_{i,j} p_{n,ij}E(ε_n ε_{ni}ε_{nj})) = μ₃Q_n'vecD(P_n)

and

E(ε_n'P_{jn}ε_n · ε_n'P_{ln}ε_n) = (μ₄ − 3σ₀⁴)vecD'(P_{jn})vecD(P_{ln}) + σ₀⁴tr(P_{jn}P_{ln}^s).

It follows that var(g_n(δ₀)) = Ω_n, where

Ω_n = [ (μ₄ − 3σ₀⁴)Δ_{nm}'Δ_{nm}   μ₃Δ_{nm}'Q_n
        μ₃Q_n'Δ_{nm}               0            ] + V_n,

with Δ_{nm} = [vecD(P_{1n}), …, vecD(P_{mn})] and

V_n = σ₀⁴ [ tr(P_{1n}P_{1n}^s)  ⋯  tr(P_{1n}P_{mn}^s)   0
            ⋮                        ⋮                  ⋮
            tr(P_{mn}P_{1n}^s)  ⋯  tr(P_{mn}P_{mn}^s)   0
            0                   ⋯  0                    σ₀^{-2}Q_n'Q_n ]
    = σ₀⁴ [ Δ̃_{mn}   0
            0        σ₀^{-2}Q_n'Q_n ],

where Δ̃_{mn} = [vec(P_{1n}'), …, vec(P_{mn}')]'[vec(P_{1n}^s), …, vec(P_{mn}^s)], by tr(A'B) = vec'(A)vec(B) for any conformable matrices A and B.

When ε_n is normally distributed, Ω_n simplifies to V_n because μ₃ = 0 and μ₄ = 3σ₀⁴. If the P_{jn}'s are from P_{2n}, Ω_n = V_n also, because Δ_{nm} = 0. In general, Ω_n is nonsingular if and only if both (vec(P_{1n}), …, vec(P_{mn})) and Q_n have full column ranks. This is so because Ω_n would be singular if and only if the moments in g_n(δ₀) are functionally dependent, equivalently, if and only if ∑_{j=1}^m a_jP_{jn} = 0 and Q_nb = 0 for some constant vector (a_1, …, a_m, b')' ≠ 0.

Let a_n be a matrix that converges to a constant full rank matrix a₀.

Proposition 1. Under Assumptions 1–5, suppose that the P_{jn} for j = 1, …, m are from P_{1n} and Q_n is an n × k_x IV matrix so that a₀ lim_{n→∞} (1/n)E(g_n(δ)) = 0 has a unique root at δ₀. Then the GMME δ̂_n derived from min_δ g_n'(δ)a_n'a_ng_n(δ) is a consistent estimator of δ₀, and √n(δ̂_n − δ₀) →D N(0, Σ), where (in the standard GMM sandwich form)

Σ = lim_{n→∞} [(1/n)D_n'a_n'a_nD_n]^{-1} (1/n)D_n'a_n'a_nΩ_na_n'a_nD_n [(1/n)D_n'a_n'a_nD_n]^{-1}

with

D_n = [ σ₀²tr(P_{1n}^sG_n)  ⋯  σ₀²tr(P_{mn}^sG_n)  (G_nX_nβ₀)'Q_n
        0                   ⋯  0                   X_n'Q_n        ]',

under the assumption that lim_{n→∞} (1/n)a_nD_n exists and has full rank (k + 1).

Proposition 2. Under Assumptions 1–6, suppose that (Ω̂_n/n)^{-1} − (Ω_n/n)^{-1} = o_P(1). Then the feasible OGMME δ̂_{o,n} derived from min_δ g_n'(δ)Ω̂_n^{-1}g_n(δ), based on g_n(δ) with P_{jn}'s from P_{1n}, has the asymptotic distribution

√n(δ̂_{o,n} − δ₀) →D N(0, (lim_{n→∞} (1/n)D_n'Ω_n^{-1}D_n)^{-1}),

and, furthermore,

g_n'(δ̂_{o,n})Ω̂_n^{-1}g_n(δ̂_{o,n}) →D χ²((m + k_x) − (k + 1)).

6.4) Efficiency and the BGMME

The optimal GMME δ̂_{o,n} can be compared with the 2SLSE. With Q_n as the IV matrix, the 2SLSE of δ₀ is

δ̂_{2sl,n} = [Z_n'Q_n(Q_n'Q_n)^{-1}Q_n'Z_n]^{-1}Z_n'Q_n(Q_n'Q_n)^{-1}Q_n'Y_n,  (4.1)

where Z_n = (W_nY_n, X_n). The asymptotic distribution of δ̂_{2sl,n} is

√n(δ̂_{2sl,n} − δ₀) →D N(0, σ₀² lim_{n→∞} {(1/n)(G_nX_nβ₀, X_n)'Q_n(Q_n'Q_n)^{-1}Q_n'(G_nX_nβ₀, X_n)}^{-1}),

under the assumptions that lim_{n→∞} (1/n)Q_n'Q_n is nonsingular and lim_{n→∞} (1/n)Q_n'(G_nX_nβ₀, X_n) has full column rank (k + 1) (Kelejian and Prucha 1998). Because the 2SLSE can be a special case of the GMM estimation, by Proposition 2, δ̂_{o,n} shall be efficient relative to δ̂_{2sl,n}. The best Q_n is Q_n* = (G_nX_nβ₀, X_n), by the generalized Schwarz inequality.

The remaining issue is the best selection of the P_{jn}'s. When the disturbance ε_n is normally distributed or the P_{jn}'s are from P_{2n}, with Q_n* for Q_n,

D_n'Ω_n^{-1}D_n = [ C_{mn}'Δ̃_{mn}^{-1}C_{mn}  0
                    0                        0 ] + (1/σ₀²)(G_nX_nβ₀, X_n)'(G_nX_nβ₀, X_n),

where C_{mn} = [tr(P_{1n}^sG_n), …, tr(P_{mn}^sG_n)]'. Note that, because tr(P_{jn}P_{ln}^s) = (1/2)tr(P_{jn}^sP_{ln}^s), Δ̃_{mn} can be rewritten as

Δ̃_{mn} = (1/2) [ tr(P_{1n}^sP_{1n}^s)  ⋯  tr(P_{1n}^sP_{mn}^s)
                 ⋮                         ⋮
                 tr(P_{mn}^sP_{1n}^s)  ⋯  tr(P_{mn}^sP_{mn}^s) ]
        = (1/2)[vec(P_{1n}^s), …, vec(P_{mn}^s)]'[vec(P_{1n}^s), …, vec(P_{mn}^s)].

(i) When the P_{jn}'s are from P_{2n},

tr(P_{jn}^sG_n) = (1/2)tr(P_{jn}^s[G_n − Diag(G_n)]^s) = (1/2)vec'([G_n − Diag(G_n)]^s)vec(P_{jn}^s)

for j = 1, …, m, in C_{mn}, where Diag(A) denotes the diagonal matrix formed by the diagonal elements of a square matrix A. Therefore, the generalized Schwarz inequality implies that

C_{mn}'Δ̃_{mn}^{-1}C_{mn} ≤ (1/2)vec'([G_n − Diag(G_n)]^s)vec([G_n − Diag(G_n)]^s) = tr([G_n − Diag(G_n)]^sG_n).

Thus, in the subclass P_{2n}, G_n − Diag(G_n), together with [G_nX_nβ₀, X_n], provides the set of best IV functions.

(ii) For the case where ε_n is N(0, σ₀²I_n), because, for any P_{jn} ∈ P_{1n},

tr(P_{jn}^sG_n) = (1/2)vec'([G_n − (tr(G_n)/n)I_n]^s)vec(P_{jn}^s)

for j = 1, …, m, the generalized Schwarz inequality implies that

C_{mn}'Δ̃_{mn}^{-1}C_{mn} ≤ tr([G_n − (tr(G_n)/n)I_n]^sG_n).

Hence, in the broader class P_{1n}, G_n − (tr(G_n)/n)I_n and [G_nX_nβ₀, X_n] provide the best set of IV functions.

Proposition 3. Under Assumptions 1–3, suppose that λ̂_n is a √n-consistent estimate of λ₀, β̂_n is a consistent estimate of β₀, and σ̂_n² is a consistent estimate of σ₀².

Within the class of GMMEs derived with P_{2n}, the BGMME δ̂_{2b,n} has the limiting distribution √n(δ̂_{2b,n} − δ₀) →D N(0, Σ_{2b}^{-1}), where

Σ_{2b} = lim_{n→∞} (1/n) [ tr([G_n − Diag(G_n)]^sG_n) + (1/σ₀²)(G_nX_nβ₀)'(G_nX_nβ₀)   (1/σ₀²)(G_nX_nβ₀)'X_n
                           (1/σ₀²)X_n'(G_nX_nβ₀)                                       (1/σ₀²)X_n'X_n        ],

which is assumed to exist.

In the event that ε_n is N(0, σ₀²I_n), within the broader class of GMMEs derived with P_{1n}, the BGMME δ̂_{1b,n} has the limiting distribution √n(δ̂_{1b,n} − δ₀) →D N(0, Σ_{1b}^{-1}), where

Σ_{1b} = lim_{n→∞} (1/n) [ tr([G_n − (tr(G_n)/n)I_n]^sG_n) + (1/σ₀²)(G_nX_nβ₀)'(G_nX_nβ₀)   (1/σ₀²)(G_nX_nβ₀)'X_n
                           (1/σ₀²)X_n'(G_nX_nβ₀)                                            (1/σ₀²)X_n'X_n        ],

which is assumed to exist.

6.5) Links between the BGMME and the MLE

When ε_n is N(0, σ₀²I_n), the model can be estimated by the ML method. The log likelihood function of the MRSAR model via its reduced form equation is

ln L_n = −(n/2)ln(2π) − (n/2)ln σ² + ln|I_n − λW_n| − (1/(2σ²))[Y_n − (I_n − λW_n)^{-1}X_nβ]'(I_n − λW_n)'(I_n − λW_n)[Y_n − (I_n − λW_n)^{-1}X_nβ].  (4.6)

The asymptotic variance of the MLE $(\hat\lambda_{ml,n}, \hat\beta_{ml,n}, \hat\sigma_{ml,n}^2)$ is

$$\operatorname{AsyVar}(\hat\lambda_{ml,n}, \hat\beta_{ml,n}, \hat\sigma_{ml,n}^2) = \begin{pmatrix} \operatorname{tr}(G_n^2) + \operatorname{tr}(G_n'G_n) + \frac{1}{\sigma_0^{2}}(G_nX_n\beta_0)'(G_nX_n\beta_0) & \frac{1}{\sigma_0^{2}}(X_n'G_nX_n\beta_0)' & \frac{1}{\sigma_0^{2}}\operatorname{tr}(G_n) \\ \frac{1}{\sigma_0^{2}}X_n'G_nX_n\beta_0 & \frac{1}{\sigma_0^{2}}X_n'X_n & 0 \\ \frac{1}{\sigma_0^{2}}\operatorname{tr}(G_n) & 0 & \frac{n}{2\sigma_0^{4}} \end{pmatrix}^{-1}$$

(see, e.g., Anselin and Bera (1998), p. 256). From the inverse of a partitioned matrix, the asymptotic variance of the MLE $\hat\lambda_{ml,n}$ is

$$\operatorname{AsyVar}(\hat\lambda_{ml,n}) = \Big(\operatorname{tr}(G_n^2) + \operatorname{tr}(G_n'G_n) + \frac{1}{\sigma_0^{2}}(G_nX_n\beta_0)'(G_nX_n\beta_0) - \frac{2}{n}\operatorname{tr}^2(G_n) - \frac{1}{\sigma_0^{2}}(X_n'G_nX_n\beta_0)'(X_n'X_n)^{-1}(X_n'G_nX_n\beta_0)\Big)^{-1}.$$

As $\operatorname{tr}(G_n^2) + \operatorname{tr}(G_n'G_n) - \frac{2}{n}\operatorname{tr}^2(G_n) = \operatorname{tr}\big((G_n - \tfrac{\operatorname{tr}(G_n)}{n}I_n)^{s}G_n\big)$, the GMME $\hat\lambda_{1b,n}$ has the same limiting distribution as the MLE of $\lambda_0$ from Proposition 3.
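The trace identity that links the GMM and ML asymptotic variances can be verified numerically (a sketch with an arbitrary row-normalized weights matrix and an arbitrary $\lambda_0$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)           # row-normalized weights
G = W @ np.linalg.inv(np.eye(n) - 0.3 * W)  # G_n at an arbitrary lambda_0

B = G - (np.trace(G) / n) * np.eye(n)       # G_n - (tr(G_n)/n) I_n
Bs = B + B.T
lhs = np.trace(G @ G) + np.trace(G.T @ G) - (2.0 / n) * np.trace(G) ** 2
assert np.isclose(lhs, np.trace(Bs @ G))
```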

There is an intuition for the best GMM approach compared with the maximum likelihood one. The derivatives of the log likelihood in (4.6) are

$$\frac{\partial \ln L_n}{\partial \beta} = \frac{1}{\sigma^2}X_n'\epsilon_n(\delta), \qquad \frac{\partial \ln L_n}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\epsilon_n'(\delta)\epsilon_n(\delta),$$

and

$$\frac{\partial \ln L_n}{\partial \lambda} = -\operatorname{tr}\big(W_n(I_n - \lambda W_n)^{-1}\big) + \frac{1}{\sigma^2}[W_n(I_n - \lambda W_n)^{-1}X_n\beta]'\epsilon_n(\delta) + \frac{1}{\sigma^2}\epsilon_n'(\delta)[W_n(I_n - \lambda W_n)^{-1}]\epsilon_n(\delta).$$

The equation $\partial \ln L_n/\partial\sigma^2 = 0$ implies that the MLE is $\hat\sigma_n^2(\delta) = \frac{1}{n}\epsilon_n'(\delta)\epsilon_n(\delta)$ for a given value $\delta$. By substituting $\hat\sigma_n^2(\delta)$ into the remaining likelihood equations, the MLE $\hat\delta_{ml,n}$ will be characterized by the moment equations $X_n'\epsilon_n(\delta) = 0$ and

$$[W_n(I_n - \lambda W_n)^{-1}X_n\beta]'\epsilon_n(\delta) + \epsilon_n'(\delta)\Big[W_n(I_n - \lambda W_n)^{-1} - \frac{1}{n}\operatorname{tr}\big(W_n(I_n - \lambda W_n)^{-1}\big)I_n\Big]\epsilon_n(\delta) = 0.$$


The similarity of the best GMM moments and the above likelihood equations is revealing: the best GMM approach has linear and quadratic moments of $\epsilon_n(\delta)$ of exactly this form, and uses consistently estimated matrices in its linear and quadratic forms.
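As a sanity check on the score formula for $\lambda$, the analytic derivative can be compared with a finite-difference derivative of the log likelihood (a NumPy sketch on simulated data; all parameter values, sizes, and seeds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta, sig2, lam0 = np.array([1.0, -0.5]), 1.0, 0.3
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta + rng.standard_normal(n))

def loglik(lam):
    # log likelihood in lambda with beta, sigma^2 held fixed
    S = np.eye(n) - lam * W
    e = S @ Y - X @ beta
    return (-0.5 * n * np.log(2 * np.pi * sig2)
            + np.linalg.slogdet(S)[1]
            - 0.5 * (e @ e) / sig2)

# analytic score in lambda, as in the text, evaluated away from the truth
lam = 0.25
G = W @ np.linalg.inv(np.eye(n) - lam * W)
e = (np.eye(n) - lam * W) @ Y - X @ beta
score = -np.trace(G) + (G @ X @ beta) @ e / sig2 + e @ (G @ e) / sig2

h = 1e-6
fd = (loglik(lam + h) - loglik(lam - h)) / (2 * h)
assert np.isclose(score, fd, atol=1e-4)
```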

5.6) More recent results

1) generalization to the estimation of higher-order spatial autoregressive models;

2) robust GMM estimation in the presence of unknown heteroskedasticity;

3) improved GMM estimators when disturbances are non-normal.


Lecture 2: Econometric Models with Social Interactions

We consider interactions of individuals in a group setting.

2.1) Linear in the Mean (Expectation) Model and the Reflection Problem (Manski)

Manski distinguishes three types of social effects:
Endogenous effects, wherein the propensity of an individual to behave in some way varies with

the behaviour of the group;

Exogenous (contextual) effects, wherein the propensity of an individual to behave in some way

varies with the exogenous characteristics of the group, and

Correlated effects, wherein individuals in the same group tend to behave similarly because they have similar individual characteristics or face similar institutional environments.

Manski's linear model specification:

$$y_{ri} = \lambda_0 E(y_{rj}) + x_{ri,1}\beta_{10} + E(x_{rj,2})\beta_{20} + \alpha_r + \epsilon_{ri},$$

where $E(y_{rj})$ with the coefficient $\lambda_0$ captures the endogenous interactions, $E(x_{rj,2})$ with $\beta_{20}$ captures the exogenous interactions, and $\alpha_r$ represents the correlated effect. The $E(y_r)$ is assumed to solve the social equilibrium equation

$$E(y_r) = \lambda_0 E(y_r) + E(x_{r,1})\beta_{10} + E(x_{r,2})\beta_{20} + \alpha_r.$$

Provided that $\lambda_0 \neq 1$, this equation has the unique solution

$$E(y_r) = \frac{E(x_{r,1})\beta_{10}}{1 - \lambda_0} + \frac{E(x_{r,2})\beta_{20}}{1 - \lambda_0} + \frac{\alpha_r}{1 - \lambda_0}.$$

The case where $x_{r,1} = x_{r,2}(= x_r)$, or, more generally, where $x_{r,1}$ is a subset of $x_{r,2}$, will have an underidentification problem.

When $x_{r,1} = x_{r,2} = x_r$, it follows that $E(y_r) = E(x_r)\frac{\beta_{10} + \beta_{20}}{1 - \lambda_0} + \frac{\alpha_r}{1 - \lambda_0}$ and that the reduced form of this model is

$$y_{ri} = E(x_r)\Big[\beta_{20} + \frac{\lambda_0}{1 - \lambda_0}(\beta_{10} + \beta_{20})\Big] + x_{ri}\beta_{10} + \frac{\alpha_r}{1 - \lambda_0} + \epsilon_{ri}.$$

The $\beta_{10}$ and the composite parameter vector $\beta_{20} + \frac{\lambda_0}{1 - \lambda_0}(\beta_{10} + \beta_{20})$ can be identified, but $\lambda_0$ and $\beta_{20}$ cannot be separately identified. This is so because $E(y_r)$ is linearly dependent on $E(x_r)$ and the constant intercept term $\alpha_r$: the reflection problem.
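The reflection problem can be illustrated with a two-line computation: distinct pairs $(\lambda_0, \beta_{20})$ generate exactly the same reduced-form coefficient on $E(x_r)$ while $\beta_{10}$ is held fixed (the parameter values below are arbitrary):

```python
def reduced_form_coef(lam, b1, b2):
    # coefficient on E(x_r) in the reduced form:
    # b2 + lam/(1 - lam) * (b1 + b2)
    return b2 + lam / (1 - lam) * (b1 + b2)

b1 = 1.0
# (lam, b2) = (0.5, 0.2) and (0.2, 0.92) are observationally equivalent
assert abs(reduced_form_coef(0.5, b1, 0.2)
           - reduced_form_coef(0.2, b1, 0.92)) < 1e-12
```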


From the implied equation, $\beta_{10}$ is apparently identifiable. For the identification of $\lambda_0$ and $\beta_{20}$, a sufficient identification condition is that some relevant variables are in $x_{ri,1}$ but not in $x_{ri,2}$ (under the random components specification for the overall disturbance). This follows because, with relevant variables included in $x_{ri,1}$ but excluded from $x_{ri,2}$, $\lambda_0$ can be identified from the composite parameter vector. In that case, the other coefficients and $\sigma^2$ are also identifiable.

2.2) A SAR Model with Social Interactions

Instead of the rational-expectations formulation, an alternative model specifies direct interactions of individuals in a group setting. This model is useful in particular for cases where each group has a small or moderate number of members. Suppose there are $m_r$ members in group $r$; for each unit $i$ in group $r$,

$$y_{ri} = \lambda_0\Big(\frac{1}{m_r - 1}\sum_{j=1, j\neq i}^{m_r} y_{rj}\Big) + x_{ri,1}\beta_{10} + \Big(\frac{1}{m_r - 1}\sum_{j=1, j\neq i}^{m_r} x_{rj,2}\Big)\beta_{20} + \alpha_r + \epsilon_{ri},$$

with $i = 1, \cdots, m_r$ and $r = 1, \cdots, R$, where $y_{ri}$ is the outcome of the $i$th individual in the $r$th group, $x_{ri,1}$ and $x_{ri,2}$ are, respectively, $k_1$- and $k_2$-dimensional row vectors of exogenous variables, and the $\epsilon_{ri}$'s are i.i.d. $(0, \sigma_0^2)$.

It is revealing to decompose this equation into two parts. Because of the group structure, the model is equivalent to the following two equations:

$$(1 - \lambda_0)\bar y_r = \bar x_{r1}\beta_{10} + \bar x_{r2}\beta_{20} + \alpha_r + \bar\epsilon_r, \quad r = 1, \cdots, R,$$

and

$$\Big(1 + \frac{\lambda_0}{m_r - 1}\Big)(y_{ri} - \bar y_r) = (x_{ri,1} - \bar x_{r1})\beta_{10} - \frac{1}{m_r - 1}(x_{ri,2} - \bar x_{r2})\beta_{20} + (\epsilon_{ri} - \bar\epsilon_r),$$

for $i = 1, \cdots, m_r$; $r = 1, \cdots, R$, where $\bar y_r = \frac{1}{m_r}\sum_{i=1}^{m_r} y_{ri}$, $\bar x_{r1} = \frac{1}{m_r}\sum_{i=1}^{m_r} x_{ri,1}$ and $\bar x_{r2} = \frac{1}{m_r}\sum_{i=1}^{m_r} x_{ri,2}$ are means for the $r$th group. The first equation may be called a between equation and the second a within equation, as they have similarity with those of a panel data regression model (Hsiao 1986). The possible effects due to interactions are revealing in the reduced-form between and within equations:

$$\bar y_r = \frac{\bar x_{r1}\beta_{10}}{1 - \lambda_0} + \frac{\bar x_{r2}\beta_{20}}{1 - \lambda_0} + \frac{\alpha_r}{1 - \lambda_0} + \frac{\bar\epsilon_r}{1 - \lambda_0}, \quad r = 1, \cdots, R,$$

and

$$(y_{ri} - \bar y_r) = (x_{ri,1} - \bar x_{r1})\frac{(m_r - 1)\beta_{10}}{m_r - 1 + \lambda_0} - (x_{ri,2} - \bar x_{r2})\frac{\beta_{20}}{m_r - 1 + \lambda_0} + \frac{m_r - 1}{m_r - 1 + \lambda_0}(\epsilon_{ri} - \bar\epsilon_r),$$

with $i = 1, \cdots, m_r$; $r = 1, \cdots, R$.

Identification of the within equation

When all groups have the same number of members, i.e., $m_r$ is a constant, say $m$, for all $r$, the endogenous effect cannot be identified from the within equation. This is apparent, as only the functions $\frac{(m-1)\beta_{10}}{m-1+\lambda_0}$, $\frac{\beta_{20}}{m-1+\lambda_0}$, and $\frac{(m-1)^2\sigma_0^2}{(m-1+\lambda_0)^2}$ may ever be identifiable from the within equation.

Estimation of the within equation

Conditional likelihood and CMLE:

$$\ln L_{w,n}(\theta) = c + \sum_{r=1}^{R}(m_r - 1)\ln(m_r - 1 + \lambda) - \frac{n - R}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{r=1}^{R}\Big[\frac{1}{c_r(\lambda)}Y_r - Z_r\gamma_m\Big]'J_r\Big[\frac{1}{c_r(\lambda)}Y_r - Z_r\gamma_m\Big],$$

where $c_r(\lambda) = \frac{m_r - 1}{m_r - 1 + \lambda}$, $z_{ri} = (x_{ri,1}, -\frac{\bar m}{m_r - 1}x_{ri,2})$, $\bar m = \frac{1}{R}\sum_{r=1}^{R} m_r$, and $\gamma_m = (\beta_1', \beta_2'/\bar m)'$.

CMLE of $\beta$ and $\sigma^2$ given $\lambda$:

$$\hat\beta_n(\lambda) = \begin{pmatrix} I_{k_1} & 0 \\ 0 & \bar m I_{k_2} \end{pmatrix}\Big(\sum_{r=1}^{R} Z_r'J_rZ_r\Big)^{-1}\sum_{r=1}^{R}\frac{1}{c_r(\lambda)}Z_r'J_rY_r,$$

and

$$\hat\sigma_n^2(\lambda) = \frac{1}{n - R}\Big\{\sum_{r=1}^{R}\frac{1}{c_r^2(\lambda)}Y_r'J_rY_r - \sum_{r=1}^{R}\frac{1}{c_r(\lambda)}Y_r'J_rZ_r\Big(\sum_{r=1}^{R} Z_r'J_rZ_r\Big)^{-1}\sum_{r=1}^{R}\frac{1}{c_r(\lambda)}Z_r'J_rY_r\Big\}.$$

Concentrated log likelihood at $\lambda$:

$$\ln L_{c,n}(\lambda) = c_1 + \sum_{r=1}^{R}(m_r - 1)\ln(m_r - 1 + \lambda) - \frac{n - R}{2}\ln\hat\sigma_n^2(\lambda).$$

Instrumental Variables Estimation

The within equation can be rearranged as

$$y_{ri} - \bar y_r = -\lambda_0\frac{(y_{ri} - \bar y_r)}{m_r - 1} + (x_{ri,1} - \bar x_{r,1})\beta_{10} - \frac{(x_{ri,2} - \bar x_{r,2})}{m_r - 1}\beta_{20} + (\epsilon_{ri} - \bar\epsilon_r).$$

The IV estimator of $\theta_0 = (\lambda_0, \beta_{10}', \beta_{20}')'$ is

$$\hat\theta_{n,IV} = \Big[\sum_{r=1}^{R}\Big(\frac{-\hat Y_r}{m_r - 1}, X_{r,1}, \frac{-X_{r,2}}{m_r - 1}\Big)'J_r\Big(\frac{-Y_r}{m_r - 1}, X_{r,1}, \frac{-X_{r,2}}{m_r - 1}\Big)\Big]^{-1}\sum_{r=1}^{R}\Big(\frac{-\hat Y_r}{m_r - 1}, X_{r,1}, \frac{-X_{r,2}}{m_r - 1}\Big)'J_rY_r,$$

where $\hat Y_r$ denotes the instrumented (fitted) values of $Y_r$.

2.3) Expectation Model (self-influence excluded)

The social interactions model under consideration is

$$y_{ri} = \lambda_0\Big(\frac{1}{m_r - 1}\sum_{j=1, j\neq i}^{m_r} E(y_{rj}|J_r)\Big) + x_{ri,1}\beta_{10} + \Big(\frac{1}{m_r - 1}\sum_{j=1, j\neq i}^{m_r} x_{rj,2}\Big)\beta_{20} + \alpha_r + \epsilon_{ri},$$

where $J_r$ is the information set of the $r$th group, which includes all observed $x$'s and the group fixed effect $\alpha_r$. This model assumes that each individual knows the exogenous characteristics of all other members in the group but does not know their actual outcomes. Because he does not know what the outcomes of others will be, he forms his (rational) expectation about their possible outcomes.

The group means of the expected outcomes are

$$\frac{1}{m_r}\sum_{i=1}^{m_r}E(y_{ri}|J_r) = \frac{1}{1 - \lambda_0}\Big(\frac{1}{m_r}\sum_{i=1}^{m_r}x_{ri,1}\beta_{10} + \frac{1}{m_r}\sum_{i=1}^{m_r}x_{ri,2}\beta_{20} + \alpha_r\Big),$$

for $r = 1, \cdots, R$, which are linear functions of group means of exogenous variables. The model implies the between equation representation

$$\bar y_r = \frac{1}{1 - \lambda_0}(\bar x_{r,1}\beta_{10} + \bar x_{r,2}\beta_{20} + \alpha_r) + \bar\epsilon_r,$$

and the within equation representation

$$y_{ri} - \bar y_r = \frac{m_r - 1}{m_r - 1 + \lambda_0}(x_{ri,1} - \bar x_{r,1})\beta_{10} - \frac{1}{m_r - 1 + \lambda_0}(x_{ri,2} - \bar x_{r,2})\beta_{20} + (\epsilon_{ri} - \bar\epsilon_r).$$

From these equations, we see no implication of correlation in the disturbances due to endogenous interactions, while the mean regressors are exactly the same as in the simultaneous interactions model. For this model, there is underidentification when $\beta_{20} + \lambda_0\beta_{10} = 0$ with $x_{ri,1} = x_{ri,2}(= x_{ri})$. These features imply $\frac{(m_r - 1)\beta_{10} - \beta_{20}}{m_r - 1 + \lambda_0} = \beta_{10}$ and, hence,

$$y_{ri} - \bar y_r = (x_{ri} - \bar x_r)\beta_{10} + (\epsilon_{ri} - \bar\epsilon_r).$$

We see that, in this case, $\lambda_0$ and $\beta_{20}$ do not appear in the equation and cannot be identified from the within equation. In the simultaneous equation interactions model, by contrast, $\lambda_0$ can be identified from the reduced-form disturbances of the within equation even when that constraint on $\beta_{10}$, $\beta_{20}$ and $\lambda_0$ occurs.
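The within-equation representation of the expectations model can be verified numerically by solving the rational-expectations fixed point for one group (a NumPy sketch; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
m, lam, b1, b2, alpha = 6, 0.4, 1.0, 0.5, 0.3
x1 = rng.standard_normal(m)
x2 = rng.standard_normal(m)
eps = rng.standard_normal(m)
P = (np.ones((m, m)) - np.eye(m)) / (m - 1)  # peer-mean operator

# rational-expectations fixed point:
# E(y|J) = lam * P E(y|J) + x1 b1 + (P x2) b2 + alpha
Ey = np.linalg.solve(np.eye(m) - lam * P, x1 * b1 + (P @ x2) * b2 + alpha)
y = lam * (P @ Ey) + x1 * b1 + (P @ x2) * b2 + alpha + eps

# within representation: disturbances enter untransformed
lhs = y - y.mean()
rhs = ((m - 1) / (m - 1 + lam) * (x1 - x1.mean()) * b1
       - 1 / (m - 1 + lam) * (x2 - x2.mean()) * b2
       + (eps - eps.mean()))
assert np.allclose(lhs, rhs)
```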

Lee (2007) suggests estimating the within equation by the conditional ML method. The IV method is also feasible, and optimal IVs are available.

We note that if the model includes self-influence, as in

$$y_{ri} = \lambda_0\Big(\frac{1}{m_r}\sum_{j=1}^{m_r}E(y_{rj}|J_r)\Big) + x_{ri,1}\beta_{10} + \Big(\frac{1}{m_r}\sum_{j=1}^{m_r}x_{rj,2}\Big)\beta_{20} + \alpha_r + \epsilon_{ri},$$

the social effects in the model would also not be identifiable when $x_{ri,1} = x_{ri,2}$. For this model, the group means of the expected outcomes are

$$\frac{1}{m_r}\sum_{i=1}^{m_r}E(y_{ri}|J_r) = \frac{1}{1 - \lambda_0}\Big(\frac{1}{m_r}\sum_{i=1}^{m_r}x_{ri,1}\beta_{10} + \frac{1}{m_r}\sum_{i=1}^{m_r}x_{ri,2}\beta_{20} + \alpha_r\Big),$$

which is a linear function of group means of exogenous variables.

2.4) A Network Model with Social Interactions

Network Model for the $r$th group:

$$Y_r = \lambda W_rY_r + X_r\beta_1 + W_rX_r\beta_2 + l_{m_r}\alpha_r + u_r,$$
$$u_r = \rho M_ru_r + \epsilon_r,$$

where $\epsilon_r = (\epsilon_{1r}, \cdots, \epsilon_{m_r r})'$ and the $\epsilon_{ir}$'s are i.i.d. $(0, \sigma^2)$.

$R$: number of groups;
$m_r$: number of members in group $r$;
$W_r$, $M_r$: exogenous network (social) matrices;
$W_rY_r$: endogenous effect;
$W_rX_r\beta_2$: exogenous (contextual) effect;
$\alpha_r$: group (fixed) effect, with $l_{m_r} = (1, \cdots, 1)'$;
$M_ru_r$: correlated effects.
Lin (2005, 2006): AddHealth data.

The Add Health Survey:
Students in grades 7-12; 132 schools;
Wave I in-school survey: 90,118 students;
Friendship network: each student names up to 5 male and 5 female friends.

Special features (in this paper):
Group structure ($R$ can be large: incidental parameters);
$W_r$ and $M_r$ row-normalized, i.e., $W_rl_{m_r} = l_{m_r}$.

Bonacich measure of centrality of members in a group:

$$(I_{m_r} - \lambda_0W_r)^{-1}W_rl_{m_r} = \frac{1}{1 - \lambda_0}l_{m_r},$$

i.e., all members in the same group have the same measure of centrality.
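The equal-centrality claim follows directly from row-normalization ($W_rl_{m_r} = l_{m_r}$) and can be confirmed numerically (a sketch; the weights matrix, group size, and $\lambda_0$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
m, lam = 5, 0.4
W = rng.random((m, m))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)   # row-normalized: W l = l
l = np.ones(m)

centrality = np.linalg.solve(np.eye(m) - lam * W, W @ l)
# every member has centrality 1/(1 - lam)
assert np.allclose(centrality, l / (1 - lam))
```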

Related works:
Lin (2005, 2006): eliminates $\alpha_r$ by $(I_n - W_n)$ and estimates by the IV method;
Calvo-Armengol, Patacchini and Zenou (2006): social networks in education;
Bramoulle, Djebbari and Fortin (2006): identification.

This study:
allows correlated effects; more on identification issues; more efficient estimation methods; a MC study on the sensitivity of estimated effects to the omission of certain effects; empirical applications with AddHealth data.

2.5) Estimation


Elimination of group fixed effects:

1) Quasi-difference:

$$R_rY_r = \lambda_0R_rW_rY_r + R_rZ_r\delta_0 + (1 - \rho_0)l_{m_r}\alpha_r + \epsilon_r,$$

where $Z_r = (X_r, W_rX_r)$ and $R_r = I_{m_r} - \rho_0M_r$.

2) Eliminate the group fixed effect by taking deviations from the group mean (which also removes the linear dependence in the disturbances):

$$R_r^*Y_r^* = \lambda_0R_r^*W_r^*Y_r^* + R_r^*Z_r^*\delta_0 + \epsilon_r^*,$$

where

i) $J_r = I_{m_r} - \frac{1}{m_r}l_{m_r}l_{m_r}' = F_rF_r'$ is an eigenvalue-eigenvector decomposition, with $[F_r, \frac{1}{\sqrt{m_r}}l_{m_r}]$ an orthonormal matrix;

ii) $R_r^* = F_r'R_rF_r$; $W_r^* = F_r'W_rF_r$;

iii) $Y_r^* = F_r'Y_r$; $Z_r^* = F_r'Z_r$;

iv) using $W_r = W_r(F_rF_r' + \frac{1}{m_r}l_{m_r}l_{m_r}')$, $W_rl_{m_r} = l_{m_r}$, and $F_r'l_{m_r} = 0$, one has $F_r'W_r = F_r'W_rF_rF_r'$; similarly for $M_r$;

v) $\epsilon_r^*$ has zero mean and variance matrix $\sigma_0^2I_{m_r-1}$.

The transformed equation is a spatial AR model.
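The key algebraic facts behind the transformation, $J_r = F_rF_r'$, $F_r'l_{m_r} = 0$, and $F_r'W_r = F_r'W_rF_rF_r'$, can be checked numerically with an $F_r$ built by QR decomposition (a sketch; any orthonormal basis of the orthogonal complement of $l_{m_r}$ works):

```python
import numpy as np

rng = np.random.default_rng(7)
m = 5
W = rng.random((m, m))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)   # row-normalized: W l = l
l = np.ones((m, 1))

# F: m x (m-1) orthonormal basis orthogonal to l, so that
# J = I - (1/m) l l' = F F' and [F, l/sqrt(m)] is orthonormal
Q, _ = np.linalg.qr(np.hstack([l / np.sqrt(m), rng.standard_normal((m, m - 1))]))
F = Q[:, 1:]
J = np.eye(m) - np.ones((m, m)) / m
assert np.allclose(F @ F.T, J)
assert np.allclose(F.T @ l, 0)
# F'W = F'W F F'   (uses W l = l and F'l = 0)
assert np.allclose(F.T @ W, F.T @ W @ F @ F.T)
```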

Estimation: quasi-ML

$$\ln L_n(\theta) = -\frac{n - R}{2}\ln(2\pi\sigma^2) + \sum_{r=1}^{R}\ln|S_r^*(\lambda)| + \sum_{r=1}^{R}\ln|R_r^*(\rho)| - \frac{1}{2\sigma^2}\sum_{r=1}^{R}[R_r(\rho)(S_r(\lambda)Y_r - Z_r\delta)]'J_r[R_r(\rho)(S_r(\lambda)Y_r - Z_r\delta)].$$

The concentrated log likelihood function is

$$\ln L_n(\lambda, \rho) = -\frac{n - R}{2}(\ln(2\pi) + 1) - \frac{n - R}{2}\ln\hat\sigma_n^2(\lambda, \rho) + \ln|S_n(\lambda)| + \ln|R_n(\rho)| - R\ln[(1 - \lambda)(1 - \rho)].$$

Another (feasible) estimation method: 2SLS or G2SLS

Apply IVs to

$$F_r'R_rY_r = \lambda_0F_r'R_rW_rY_r + F_r'R_rZ_r\delta_0 + \epsilon_r^*, \quad (**)$$

stacking all the groups together (all sample observations):

i) estimate $F_r'Y_r = \lambda_0F_r'W_rY_r + F_r'Z_r\delta_0 + F_r'u_r$ by 2SLS with some IV matrix $Q_{1r}$ to get an initial (consistent) estimate of $\lambda$ and $\delta$;

ii) use the estimated residuals $F_r'\hat u_r(= \hat u_r^*)$ to estimate $\rho$ via the Kelejian-Prucha (1999) method of moments applied to $(I_{m_r-1} - \rho_0M_r^*)u_r^* = \epsilon_r^*$;

iii) estimate $R_r^*$ with the estimated $\rho$;

iv) then apply feasible 2SLS to (**).

The best IV matrix is $F_r'R_r[W_r(I_{m_r} - \lambda_0W_r)^{-1}Z_r\delta_0, Z_r]$. Under normal disturbances, the (transformed) ML is asymptotically more efficient.