Ekreg ho-11-spatial ec 231112

SPATIAL ECONOMETRICS, PROGRAM PASCA SARJANA ILMU EKONOMI, UNIVERSITAS INDONESIA, 23 October 2012


Page 1: Ekreg ho-11-spatial ec 231112

SPATIAL ECONOMETRICS

PROGRAM PASCA SARJANA ILMU EKONOMI UNIVERSITAS INDONESIA

23 October 2012

Page 2: Ekreg ho-11-spatial ec 231112

Ordinary Least Squares

y = Xβ + ε

y: dependent variable (land price)

X: independent (explanatory) variables (center of CBD)

β: regression coefficients

ε: random error term

• OLS estimates found by minimizing the sum of squared prediction errors.

• Assumptions

– error has mean zero

– error terms are uncorrelated

– error terms have a constant variance (homoskedastic)

– error term follows a normal distribution, N(0, σ²)

These assumptions are violated in regional/spatial models.
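The least-squares criterion on this slide can be written out explicitly. The closed form below is the standard textbook result and assumes that X has full column rank, so that X'X is invertible; it is added here as a reminder rather than taken from the slide.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\,(y - X\beta)'(y - X\beta) \;=\; (X'X)^{-1}X'y
```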

Page 3: Ekreg ho-11-spatial ec 231112

Spatial Dependence

• Autocorrelation occurs not only in time series analysis but also over space, where the error terms of one spatial unit are correlated with the error terms of another spatial unit.

• The dependency of the error term over space is generally called spatial autocorrelation or spatial dependence.

• It occurs in applied empirical work and in geographical analysis (Anselin and Rey, 1991; Anselin and Griffith, 1988; Siebert, 1975).

Page 4: Ekreg ho-11-spatial ec 231112

When is this an issue in regression analysis?

• Let’s say that you have multivariate spatial data and you are looking for relationships between the variables.

• You can use regression to find relationships, as we have been doing so far in the course. However, if there is spatial autocorrelation in the residuals, your model is systematically overestimating the observed values in some regions and underestimating them in others, so you get unrealistic values for the significance and confidence limits of the coefficients.

• Why? Because OLS assumes the cases are all independent. If there is correlation between them, they are not independent: your estimate of the number of degrees of freedom is too high and your estimate of the standard errors is too low, which can lead you to believe that some coefficients are significant when in fact they are not.

Page 5: Ekreg ho-11-spatial ec 231112

What is spatial autocorrelation?

• First law of geography: “everything is related to everything else, but near things are more related than distant things” – Waldo Tobler

• Spatial autocorrelation is said to exist when a variable exhibits a regular pattern over space in which its value at a set location depends on the values of the same variable at other locations (Odland, 1988:7).

• What happens at one location is related to what happens at other locations (Gatrell, 1979). The value at any one point in space is dependent on values at the surrounding points. The arrangement of values is not just random.

• Dependency caused by: clusters, contiguity, externalities, interaction, spill-overs, heterogeneity

• Positive spatial correlation means that similar values tend to be near each other. Negative spatial correlation means that different values tend to be near each other.

Page 6: Ekreg ho-11-spatial ec 231112

Spatial Autocorrelation as missing variables

• Spatial autocorrelation arises as the manifestation of the missing variable problem, improper structural form, or measurement error problem

• Leaving spatial factors out of the model makes the results of an Ordinary Least Squares (OLS) model difficult to interpret, because the t-tests and the assessment of sampling variability become unreliable.

• Ignoring the spatial error structure:

– OLS is unbiased but inefficient

– standard errors and t-tests are biased

Page 7: Ekreg ho-11-spatial ec 231112

Regression diagnostics: Spatial autocorrelation

• Observations from different locations for the error term are autocorrelated.

• If spatial autocorrelation is ignored, then all inference (including t and F statistics, and R²) based on the standard regression model will be incorrect.

• Moran’s I and Lagrange multiplier (LM) tests require normality of the error terms.

• Robust Lagrange multiplier test

– does not require normality of the error terms

– may not have much power for small data sets.

• If normality is satisfied, use LM statistics for ‘error’ and ‘lag’.

Page 8: Ekreg ho-11-spatial ec 231112

Pattern over Space: Similarity and Dissimilarity

• Spatial autocorrelation is said to exist when a variable exhibits a regular pattern over space in which its value at a set location depends on the values of the same variable at other locations (Odland, 1988:7)

• What happens at one location is related to what happens at other locations (Gatrell, 1979). The regular pattern can be one of similarity or dissimilarity.

• Dependency caused by: clusters, contiguity, externalities, interaction, spill-overs, heterogeneity

Page 9: Ekreg ho-11-spatial ec 231112

ML spatial lag model

y = ρWy + Xβ + ε

y : dependent variable (e.g. land price)

X : independent (explanatory) variables (shopping centers, office districts, etc.)

β : regression coefficients

ε : random error term

ρ : spatial autoregressive coefficient

Wy : spatially lagged dependent variable
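Because y appears on both sides of the spatial lag equation, generating data from it means solving (I − ρW)y = Xβ + ε. The sketch below illustrates this with NumPy. The function name `simulate_spatial_lag`, the four-region row layout, the row-standardised weights and all numeric values are assumptions made for illustration, not part of the slides.

```python
import numpy as np

def simulate_spatial_lag(X, beta, W, rho, eps):
    """Solve y = rho*W*y + X*beta + eps for y, i.e. (I - rho*W) y = X*beta + eps."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# Hypothetical setup: four regions in a row, each adjacent to the next one
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)   # row-standardise (an assumption)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4), rng.normal(size=4)])  # intercept + one regressor
beta = np.array([1.0, 2.0])
eps = rng.normal(scale=0.1, size=4)
rho = 0.5

y = simulate_spatial_lag(X, beta, W, rho, eps)
print(y)
```

With row-standardised weights and |ρ| < 1 the matrix I − ρW is invertible, which is why the linear solve is safe in this example.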

Page 10: Ekreg ho-11-spatial ec 231112

ML spatial error model

y = Xβ + ε, where ε = λWε + ξ

y : dependent variable (land price)

X : independent (explanatory) variables (shopping centers, CBD, housing, etc.)

β : regression coefficients

ε : random error term

λ : autoregressive coefficient

Wε : spatial lag for the errors

ξ : normally distributed with mean 0 and variance σ²I

Total effect: $(I - \lambda W)^{-1}$
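The error process can be solved for ε to show how a shock ξ spreads across locations. The derivation below is a standard rearrangement and assumes that I − λW is invertible.

```latex
\varepsilon = \lambda W \varepsilon + \xi
\;\Longrightarrow\; (I - \lambda W)\,\varepsilon = \xi
\;\Longrightarrow\; \varepsilon = (I - \lambda W)^{-1}\xi

y = X\beta + (I - \lambda W)^{-1}\xi, \qquad
\mathrm{Var}(\varepsilon) = \sigma^{2}\left[(I - \lambda W)'(I - \lambda W)\right]^{-1}
```

The non-diagonal covariance matrix is what breaks the OLS assumption of uncorrelated errors.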

Page 11: Ekreg ho-11-spatial ec 231112

Measures of fit: R²

Measures the extent to which the predicted values match the observed values of the dependent variable.

• Calculated as R² = 1 − RSS / SST

• Model with highest R² = best (subject to diagnostics)

• Problem: R² increases with additional (explanatory) variables, which encourages overfitting; use adjusted R² instead

• For the spatial error model, use pseudo R² because R² is not applicable. Pseudo R² = ratio of the variance of the predicted values to the variance of the observed values of the dependent variable
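The two measures of fit defined on this slide can be computed directly from observed and predicted values. The following is a minimal sketch: the function names and the five data points are hypothetical, and the predicted values are simply the OLS fit of y on x = 1, ..., 5 with an intercept.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS / SST."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - rss / sst

def pseudo_r_squared(y, y_hat):
    """Variance of the predicted values over variance of the observed values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.var(y_hat) / np.var(y)

# Hypothetical data; y_hat is the OLS fit 2.2 + 0.6 * x for x = 1..5
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_hat = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

print(r_squared(y, y_hat), pseudo_r_squared(y, y_hat))
```

For an OLS fit with an intercept the two numbers coincide; for spatial models estimated by maximum likelihood they generally differ.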

Page 12: Ekreg ho-11-spatial ec 231112

Maximum likelihood (ML) estimates

• ML estimates for the parameters = those values that obtain the highest probability or joint likelihood.

• Based on the assumption of normal error terms.

• For standard regression, OLS and ML estimates are the same. To compare standard and spatial regression models, use ML-based measures of fit instead of R².

• Model with highest log likelihood = best (subject to diagnostics)

• To deal with the problem of overfitting, use Akaike Information Criterion (AIC)

• Model with lowest AIC = best (subject to diagnostics)

• Alternatively, use the Schwarz Criterion (SC), which is similar to AIC
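Both criteria penalise the log likelihood for the number of estimated parameters. The sketch below uses the usual definitions, AIC = −2 log L + 2k and SC = −2 log L + k ln(n). The function names and the example log-likelihood values are made up for illustration.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion for log likelihood loglik and k parameters."""
    return -2.0 * loglik + 2.0 * k

def schwarz(loglik, k, n):
    """Schwarz Criterion for log likelihood loglik, k parameters, n observations."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical comparison: the spatial model has one extra parameter
ols_fit = {"loglik": -100.0, "k": 3}
lag_fit = {"loglik": -95.0, "k": 4}
n_obs = 100

for name, fit in [("OLS", ols_fit), ("spatial lag", lag_fit)]:
    print(name, aic(fit["loglik"], fit["k"]), schwarz(fit["loglik"], fit["k"], n_obs))
```

In this made-up comparison the spatial lag model has the lower AIC and SC, so it would be preferred, subject to diagnostics.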

Page 13: Ekreg ho-11-spatial ec 231112

t-statistic

For checking the significance of individual regression coefficients, β.

• The estimate of β divided by its standard error follows a t distribution with degrees of freedom equal to the number of observations minus the number of parameters estimated in the model.

• Check the probability, p, of the t-statistic: if p < 0.05, the hypothesis β = 0 is rejected at the 5% significance level.

F-statistic

• For checking the significance of the regression specification as a whole.

• Check the probability, p, of the F-statistic: it is usually significant except for a very poor model specification (which is then also indicated by R²).
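The t-statistic computation can be sketched with NumPy alone. The function name `ols_t_stats` and the five data points are hypothetical. p-values are left out because they would need a statistics library, so the t-values are compared with a tabulated critical value instead.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients, standard errors, t-statistics and degrees of freedom."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - k                                  # observations minus parameters
    sigma2 = resid @ resid / df                 # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se, df

# Hypothetical data: intercept plus one regressor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

beta, se, t, df = ols_t_stats(X, y)
print(beta, se, t, df)
```

With only 3 degrees of freedom the two-sided 5% critical value of the t distribution is about 3.18, so a slope with |t| below that would not be judged significant.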

Page 14: Ekreg ho-11-spatial ec 231112

Example

• Imagine you have data from 10 counties spread all over the northeast. The data include variables such as median income, number of highways, and access to the internet.

• You regress access to the internet (dependent variable) on the other two (independent) variables. You realize that n = 10 is not enough to get significant results, so you collect data from 10 additional counties, but you choose counties that are immediately adjacent to the original 10.

• If incomes and infrastructure hardly change from one county to the next, you are not really getting any additional information. This would be spatial autocorrelation

• So how does this affect the regression results? You would be doing the calculations as if you had n=20 cases, but in reality you only had 10 independent cases. So, you would be overestimating the number of degrees of freedom, getting unrealistic t and p values, and underestimating the standard errors of the coefficients.

Page 15: Ekreg ho-11-spatial ec 231112

Moran’s I

• One of the oldest indicators of spatial autocorrelation (Moran, 1950). Still a de facto standard for determining spatial autocorrelation

• Applied to zones or points with continuous variables associated with them.

• Compares the value of the variable at any one location with the value at all other locations

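$$I = \frac{N \sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\left(\sum_i \sum_j W_{ij}\right) \sum_i (X_i - \bar{X})^2}$$

Equivalently, in deviations from the mean:

$$I = \frac{N}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}, \qquad z_i = x_i - \bar{x}; \; z_j = x_j - \bar{x}$$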

• Moran’s I statistic can take values between –1 and 1. Positive values of Moran’s I indicate positive spatial autocorrelation in which similar values are more likely than dissimilar values between neighbors and vice versa.

N = number of regions
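Moran’s I in its deviation form translates almost line by line into NumPy. The sketch below is illustrative only: the function name `morans_i` and the four-region row layout (each region adjacent to the next) are assumptions, chosen so that the positive and negative cases are easy to check by hand.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I = (N / sum of weights) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()            # deviations from the mean
    n = x.size                  # number of regions
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical layout: four regions in a row, binary contiguity weights
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

print(morans_i([1, 2, 3, 4], W))   # smooth gradient: similar values are neighbors
print(morans_i([1, 4, 1, 4], W))   # alternating values: dissimilar neighbors
```

The smooth gradient gives a positive value and the alternating pattern a negative one, matching the interpretation of the sign given above.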

Page 16: Ekreg ho-11-spatial ec 231112

Geary’s C

• Values typically range between 0 and 2.

• If the values in any one zone are spatially unrelated to those in any other zone, the expected value of C is 1.

• Values less than 1 indicate positive spatial autocorrelation; values between 1 and 2 indicate negative spatial autocorrelation.

• Inversely related to Moran’s I.

• Does not provide identical inference, because it emphasizes the differences in values between pairs of observations rather than the covariation between the pairs.

• Moran’s I gives a more global indicator, whereas the Geary coefficient is more sensitive to differences in small neighborhoods.

$$C = \frac{(N - 1) \sum_i \sum_j W_{ij} (X_i - X_j)^2}{2 \left(\sum_i \sum_j W_{ij}\right) \sum_i (X_i - \bar{X})^2}$$
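Geary’s C replaces the cross products of Moran’s I with squared differences between pairs of values. A minimal NumPy sketch follows; the function name `gearys_c` and the four-region row layout are assumptions used only for illustration.

```python
import numpy as np

def gearys_c(x, W):
    """Geary's C = (N-1) * sum_ij W_ij (x_i - x_j)^2 / (2 * sum_ij W_ij * sum_i z_i^2)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    diff_sq = (x[:, None] - x[None, :]) ** 2    # (x_i - x_j)^2 for every pair
    z = x - x.mean()
    return (n - 1) * (W * diff_sq).sum() / (2.0 * W.sum() * (z @ z))

# Hypothetical layout: four regions in a row, binary contiguity weights
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

print(gearys_c([1, 2, 3, 4], W))   # below 1: positive spatial autocorrelation
print(gearys_c([1, 4, 1, 4], W))   # above 1: negative spatial autocorrelation
```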

Page 17: Ekreg ho-11-spatial ec 231112

Expected Value and Variance

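• Expected Value

$$E(I) = \frac{N}{S} \cdot \frac{\mathrm{tr}(MW)}{N - K}$$

• Variance

$$V(I) = \left(\frac{N}{S}\right)^2 \frac{\mathrm{tr}(MWMW') + \mathrm{tr}[(MW)^2] + [\mathrm{tr}(MW)]^2}{(N - K)(N - K + 2)} - [E(I)]^2$$

$$M = I - X(X'X)^{-1}X'$$

Here S is the sum of all elements of W, K is the number of regressors, and the I in the definition of M is the identity matrix.

• Statistical Significance

$$Z(I) = \frac{I - E(I)}{\sqrt{V(I)}}$$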
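The moments of Moran’s I for regression residuals can be evaluated numerically. The sketch below assumes the trace-based expressions usually quoted for OLS residuals under normality, with S taken as the sum of all weights and K as the number of columns of X. The function names, the 3 × 3 rook-contiguity grid and the intercept-only regression are illustrative choices.

```python
import numpy as np

def rook_grid(rows, cols):
    """Binary rook-contiguity matrix for a rows x cols grid, cells numbered row by row."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:                    # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < rows:                    # neighbor below
                W[i, i + cols] = W[i + cols, i] = 1
    return W

def moran_residual_test(y, X, W):
    """Moran's I of OLS residuals with E(I), V(I) and the z-value (trace formulas)."""
    n, k = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    e = M @ y                                           # OLS residuals
    S = W.sum()
    I = (n / S) * (e @ W @ e) / (e @ e)
    MW = M @ W
    tr_MW = np.trace(MW)
    EI = (n / S) * tr_MW / (n - k)
    VI = ((n / S) ** 2
          * (np.trace(MW @ M @ W.T) + np.trace(MW @ MW) + tr_MW ** 2)
          / ((n - k) * (n - k + 2))) - EI ** 2
    return I, EI, VI, (I - EI) / np.sqrt(VI)

W = rook_grid(3, 3)
y = np.array([9, 6, 3, 8, 5, 2, 7, 4, 1], dtype=float)
X = np.ones((9, 1))                                     # intercept only
I, EI, VI, z = moran_residual_test(y, X, W)
print(I, EI, VI, z)
```

With an intercept as the only regressor the residuals are deviations from the mean, and E(I) reduces to −1/(N − 1).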

Page 18: Ekreg ho-11-spatial ec 231112

Empirical Application (1)

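• Basic Model

$$Y_{it} = X_{it}\beta + \varepsilon_{it}$$

• Block Matrices

$$Y = \begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1T} \\ \vdots \\ Y_{N1} \\ Y_{N2} \\ \vdots \\ Y_{NT} \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & X_{111} & X_{112} \\ 1 & X_{121} & X_{122} \\ \vdots & \vdots & \vdots \\ 1 & X_{1T1} & X_{1T2} \\ \vdots & \vdots & \vdots \\ 1 & X_{N11} & X_{N12} \\ 1 & X_{N21} & X_{N22} \\ \vdots & \vdots & \vdots \\ 1 & X_{NT1} & X_{NT2} \end{bmatrix}, \qquad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$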

Page 19: Ekreg ho-11-spatial ec 231112

Block Matrices

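The stacked weight matrix W is block diagonal, with one N × N block (written here as $W_N$) for each period t = 1, 2, ..., T and zeros elsewhere:

$$W = \begin{bmatrix} W_N & 0 & \cdots & 0 \\ 0 & W_N & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_N \end{bmatrix}, \qquad
W_N = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NN} \end{bmatrix}$$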

Page 20: Ekreg ho-11-spatial ec 231112

Weight Matrix and Contiguity

Page 21: Ekreg ho-11-spatial ec 231112

Cell Locations and Values

Cell locations:

a b c
d e f
g h i

Cell values:

9 6 3
8 5 2
7 4 1

Contiguity matrix (Sum = row or column total):

      a  b  c  d  e  f  g  h  i  Sum
a     0  1  0  1  0  0  0  0  0    2
b     1  0  1  0  1  0  0  0  0    3
c     0  1  0  0  0  1  0  0  0    2
d     1  0  0  0  1  0  1  0  0    3
e     0  1  0  1  0  1  0  1  0    4
f     0  0  1  0  1  0  0  0  1    3
g     0  0  0  1  0  0  0  1  0    2
h     0  0  0  0  1  0  1  0  1    3
i     0  0  0  0  0  1  0  1  0    2
Sum   2  3  2  3  4  3  2  3  2   24
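The contiguity matrix above follows the rook rule: two cells are neighbors when they share an edge. A small sketch of how such a matrix could be generated for any rectangular grid is given below. The function name `rook_contiguity` is hypothetical, and the cells are assumed to be numbered row by row (a, b, c, d, ...).

```python
import numpy as np

def rook_contiguity(n_rows, n_cols):
    """Binary contiguity matrix for a grid; cells sharing an edge get weight 1."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c                  # index of the current cell
            if c + 1 < n_cols:                  # cell to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < n_rows:                  # cell below
                W[i, i + n_cols] = W[i + n_cols, i] = 1
    return W

W = rook_contiguity(3, 3)
print(W)
print(W.sum(axis=1), W.sum())   # row totals and grand total
```

For the 3 × 3 grid the row totals are 2, 3, 2, 3, 4, 3, 2, 3, 2 and the grand total is 24, as in the table.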

Page 22: Ekreg ho-11-spatial ec 231112

Cij Matrix Based on Squared Euclidean Distance

Cij = (xi – xj)²

      a   b   c   d   e   f   g   h   i
a     0   9  36   1  16  49   4  25  64
b     9   0   9   4   1  16   1   4  25
c    36   9   0  25   4   1  16   1   4
d     1   4  25   0   9  36   1  16  49
e    16   1   4   9   0   9   4   1  16
f    49  16   1  36   9   0  25   4   1
g     4   1  16   1   4  25   0   9  36
h    25   4   1  16   1   4   9   0   9
i    64  25   4  49  16   1  36   9   0

The WijCij Matrix for the General Cross Product

      a   b   c   d   e   f   g   h   i
a     0   9   0   1   0   0   0   0   0
b     9   0   9   0   1   0   0   0   0
c     0   9   0   0   0   1   0   0   0
d     1   0   0   0   9   0   1   0   0
e     0   1   0   9   0   9   0   1   0
f     0   0   1   0   9   0   0   0   1
g     0   0   0   1   0   0   0   9   0
h     0   0   0   0   1   0   9   0   9
i     0   0   0   0   0   1   0   9   0
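The entries of the WijCij matrix for the general cross product are the numerator terms of Geary’s C. As a worked check (not part of the original slides): the entries sum to 120, the weights sum to 24, and the cell values 1 to 9 have mean 5, so the sum of squared deviations is 60.

```latex
C = \frac{(N-1)\sum_i\sum_j W_{ij}(X_i - X_j)^2}{2\left(\sum_i\sum_j W_{ij}\right)\sum_i (X_i - \bar{X})^2}
  = \frac{(9-1)\times 120}{2 \times 24 \times 60}
  = \frac{960}{2880} \approx 0.33
```

A value well below 1 points to positive spatial autocorrelation in the grid.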

Page 23: Ekreg ho-11-spatial ec 231112

WijCij Matrix under Moran’s I

      a   b   c   d   e   f   g   h   i  Sum
a     0   4   0  12   0   0   0   0   0   16
b     4   0  -2   0   0   0   0   0   0    2
c     0  -2   0   0   0   6   0   0   0    4
d    12   0   0   0   0   0   6   0   0   18
e     0   0   0   0   0   0   0   0   0    0
f     0   0   6   0   0   0   0   0  12   18
g     0   0   0   6   0   0   0  -2   0    4
h     0   0   0   0   0   0  -2   0   4    2
i     0   0   0   0   0  12   0   4   0   16
Sum  16   2   4  18   0  18   4   2  16   80
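The table total can be carried through to the statistic itself. As a worked check (not shown on the original slide): with N = 9 cells, a weight total of 24, a cross-product total of 80, and squared deviations from the mean of 5 summing to 60,

```latex
I = \frac{N}{\sum_i\sum_j w_{ij}} \cdot \frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}
  = \frac{9}{24} \times \frac{80}{60} = 0.5
```

This indicates positive spatial autocorrelation, as expected for a grid whose values change smoothly from one corner to the opposite one.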

Page 24: Ekreg ho-11-spatial ec 231112

Statistical Significance of Moran’s I

Page 25: Ekreg ho-11-spatial ec 231112

Exercise: Spatial Statistics

A province has the per capita income pattern depicted in the following hypothetical map:

[Figure: hypothetical map of a province with 13 regions labelled A to M. Per capita income values shown on the map: 75, 85, 50, 80, 85, 85, 70, 60, 90, 82, 65, 50, 88.]

Questions:

• Construct the contiguity weight matrix as the basis for computing Moran’s I.

• What do you conclude: does the per capita income pattern in the province show a systematic pattern? (No significance test is required.)