Comparison coefficients from OLS and Spatial lag model
2 messages · David Marguerit, Roger Bivand
On Fri, 20 Apr 2012, David Marguerit wrote:
Dear list members,

I am trying to run a spatial lag regression with the spdep package for R (version 0.5-41). My computer has the following configuration:

Windows XP Service Pack 3
Intel Pentium 4 CPU, 2.80 GHz
1.99 GB of RAM

I have a geocoded (longitude and latitude) individual-level database with n = 2696. On the left-hand side, I have the distance between each individual and the nearest polluting industry, and on the right-hand side I have 7 explanatory variables. When I compare the coefficients obtained from OLS and from the spatial lag model, I see huge differences. Is this normal?
Yes, but the models are different. You may compare spatial error coefficients with linear model coefficients, but not spatial lag model coefficients (see LeSage & Pace 2009).

Your response is constructed in a really weird way: you are trying to account for the distance between individuals and polluting industry. The autocorrelation is of course being created by you when you define the response in this way. In addition, the distances are not going to change if you change the explanatory variables.

I think that your model should be reconsidered completely, with distance to polluting industry as an explanatory variable, but I don't know what your response would be. You simply cannot model in this way; look at what Waller & Gotway (2004) do, as for example in spdep in ?NY_data.

Hope this clarifies,

Roger
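The point about lag coefficients can be made concrete. In a spatial lag model, y = rho W y + X beta + e, a change in x_k feeds back through the neighbours, so (following LeSage & Pace 2009) the quantity to set beside a linear model coefficient is an impact rather than beta_k itself. With row-standardised weights the average total impact is beta_k / (1 - rho). The base-R sketch below is only an illustration: rho and the three coefficients are copied from the lagsarlm output quoted later in this thread, and W_toy is a hypothetical 5-unit ring, not the poster's weights. For a fitted lagsarlm object, spdep's impacts() method is the proper tool.

```r
# Values copied from the lagsarlm summary quoted in this thread
rho <- 0.91255
beta_lag <- c("log(rev_hab)" = 0.00990161,
              nonblanc       = -0.01971576,
              ocup3          = 0.00732140)

# Average total impact under row-standardised weights: beta / (1 - rho)
total_impact <- beta_lag / (1 - rho)
print(round(total_impact, 4))

# Check the 1 / (1 - rho) multiplier on a toy row-standardised matrix
# (hypothetical ring of 5 units, each with two neighbours weighted 0.5)
n <- 5
W_toy <- matrix(0, n, n)
for (i in seq_len(n)) {
  left  <- if (i == 1) n else i - 1
  right <- if (i == n) 1 else i + 1
  W_toy[i, c(left, right)] <- 0.5
}
S <- solve(diag(n) - rho * W_toy)   # (I - rho W)^{-1}
multiplier <- mean(rowSums(S))      # average total multiplier
print(c(multiplier = multiplier, closed_form = 1 / (1 - rho)))
```

For nonblanc the total impact comes out at about -0.225, which is on the scale of the linear model estimate, while the raw lag coefficient is about -0.020. This only shows why the raw coefficients are not comparable; it does not address the objection to how the response is defined.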
The following lines summarise my program. First, I run an OLS regression:
> library(foreign)  # provides read.dta()
> tab <- read.dta("coord_modif.dta")
> form.lin <- as.formula(log(entr) ~ log(rev_hab) + edu2 + nonblanc + nonmaison + matr2 + ocup2 + ocup3 + nonowner)
> ols.lin <- lm(form.lin, data = tab)
> summary(ols.lin)
Call:
lm(formula = form.lin, data = tab)
Residuals:
     Min       1Q   Median       3Q      Max
-3.12600 -0.42976  0.06275  0.48021  1.87649

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.71921    0.19842   3.625 0.000295 ***
log(rev_hab)  0.06113    0.01813   3.372 0.000757 ***
edu2         -0.13266    0.05570  -2.381 0.017315 *
nonblanc     -0.22557    0.03892  -5.795 7.61e-09 ***
nonmaison    -0.09585    0.03794  -2.526 0.011582 *
matr2        -0.04329    0.02902  -1.492 0.135905
ocup2         0.06978    0.08512   0.820 0.412425
ocup3         0.09880    0.02901   3.406 0.000670 ***
nonowner     -0.09370    0.04046  -2.316 0.020625 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7101 on 2687 degrees of freedom
Multiple R-squared: 0.0562, Adjusted R-squared: 0.05339
F-statistic: 20 on 8 and 2687 DF, p-value: < 2.2e-16
Then I create a weights matrix in order to check for the presence of
spatial autocorrelation. I use the sphere of influence method
described in *Applied Spatial Data Analysis with R* (Bivand et al., 2008):
> library(spdep)  # tri2nb(), soi.graph(), graph2nb(), nb2listw(), ...
> coords <- cbind(tab$long_dav, tab$lat_dav)
> nb.temp <- tri2nb(coords)
> nb <- graph2nb(soi.graph(nb.temp, coords))
> plot(coords, col = "red")
> plot(nb, coords, add = TRUE)
> title(main = "Sphere of Influence Graph")
> nb_1 <- list(SOI = nb)
> sapply(nb_1, function(x) is.symmetric.nb(x, verbose = FALSE, force = TRUE))
 SOI
TRUE
> listw <- nb2listw(nb, style = "W")
> summary(listw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 2696
Number of nonzero links: 7186
Percentage nonzero weights: 0.09886611
Average number of links: 2.66543
Link number distribution:

  1   2   3   4   5   6   7   8   9
452 889 759 382 150  50  12   1   1
452 least connected regions:
2 8 16 22 23 33 35 39 66 68 70 78 80 83 84 95 101 106 110 112 120 124
126 137 140 141 146 162 163 168 177 180 190 196 197 201 206 207 213
218 220 225 235 241 254 255 260 261 268 279 282 285 286 290 298 305
314 319 327 332 335 339 344 366 373 382 383 394 395 401 402 403 404
405 406 409 411 415 417 421 422 424 430 431 440 471 474 482 483 484
486 492 498 499 502 507 513 515 526 539 540 541 545 546 547 552 554
556 578 585 600 602 604 607 613 616 625 629 658 668 669 676 683 686
696 698 701 704 718 723 729 739 751 757 759 760 766 768 773 776 779
787 792 793 798 799 802 806 807 809 811 829 835 840 851 855 860 863
868 881 888 893 895 896 903 910 913 922 938 946 947 959 970 977 990
992 995 1000 1008 1018 1022 1023 1026 1027 1028 1031 1038 1039 1046
1048 1051 1080 1097 1104 1110 1111 1120 1145 1147 1165 1171 1178 1182
1188 1194 1196 1201 1202 1220 1222 1227 1230 1242 1246 1267 1269 1271
1275 1280 1289 1290 1299 1302 1303 1304 1305 1318 1327 1328 1348 1386
1391 1394 1395 1396 1406 1425 1432 1440 1441 1442 1451 1461 1474 1479
1487 1490 1495 1503 1506 1513 1518 1528 1539 1546 1551 1552 1559 1560
1571 1583 1585 1588 1593 1594 1604 1605 1606 1608 1611 1615 1617 1620
1631 1640 1644 1646 1649 1650 1655 1662 1676 1678 1683 1684 1693 1709
1713 1720 1725 1732 1740 1751 1770 1779 1781 1790 1794 1815 1831 1833
1835 1836 1837 1841 1848 1850 1855 1857 1860 1865 1871 1893 1911 1913
1920 1926 1933 1934 1936 1947 1949 1951 1953 1960 1962 1963 1964 1965
1972 1981 1991 2000 2004 2014 2017 2021 2032 2033 2046 2056 2058 2060
2061 2065 2066 2076 2077 2090 2091 2099 2102 2107 2108 2118 2122 2131
2133 2135 2136 2148 2151 2181 2182 2186 2190 2196 2207 2209 2214 2218
2224 2226 2230 2233 2235 2238 2242 2243 2248 2250 2258 2262 2268 2290
2292 2295 2296 2300 2329 2332 2346 2352 2355 2370 2383 2390 2393 2394
2398 2404 2413 2416 2418 2422 2440 2441 2442 2444 2450 2454 2460 2464
2465 2492 2498 2508 2510 2512 2523 2529 2531 2543 2547 2551 2552 2554
2556 2580 2582 2598 2603 2615 2618 2624 2627 2631 2634 2635 2637 2644
2645 2648 2653 2654 2655 2668 2673 2681 2688 2689 2692 with 1 link
1 most connected region:
2045 with 9 links
Weights style: W
Weights constants summary:
     n      nn   S0       S1       S2
W 2696 7268416 2696 2445.314 11176.87
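The figures in this summary follow directly from the link counts, which can be checked in a few lines of base R. The sketch below recomputes the number of links, the percentage of nonzero weights and the average number of links from the link number distribution above, and uses a hypothetical 3-unit binary matrix B to show why S0 equals n under style "W" (every row sums to 1). It does not call spdep.

```r
# Link number distribution copied from the summary above
n      <- 2696
k      <- 1:9
counts <- c(452, 889, 759, 382, 150, 50, 12, 1, 1)

links       <- sum(k * counts)      # total number of nonzero links
pct_nonzero <- 100 * links / n^2    # percentage of nonzero weights
avg_links   <- links / n            # average number of links per point
print(c(links = links, pct_nonzero = pct_nonzero, avg_links = avg_links))

# Row standardisation (style "W") on a hypothetical binary neighbour matrix:
# unit 1 has two neighbours, units 2 and 3 have one each
B <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)
W_toy <- B / rowSums(B)   # divide each row by its number of neighbours
S0 <- sum(W_toy)          # sum of all weights, equal to the number of rows
print(S0)
```

Note that 452 of the 2696 points have a single neighbour, so for those points the spatially lagged value is just the value at that one neighbour.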
Lastly, I run the Moran's I test, the LM test and the spatial lag
regression:
> moran.ols <- lm.morantest(ols.lin, listw)
> summary(moran.ols)
            Length Class  Mode
statistic   1      -none- numeric
p.value     1      -none- numeric
estimate    3      -none- numeric
method      1      -none- character
alternative 1      -none- character
data.name   1      -none- character
> print(moran.ols)
Global Moran's I for regression residuals
data:
model: lm(formula = form.lin, data = tab)
weights: listw
Moran I statistic standard deviate = 50.1449, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran's I        Expectation           Variance
      0.9184522908      -0.0006379116       0.0003359414
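The standard deviate reported by the test is the observed Moran's I minus its expectation, divided by the square root of its variance. A minimal check in base R, using only the three sample estimates printed above (the variable names are illustrative):

```r
# Sample estimates copied from the lm.morantest output above
I_obs <- 0.9184522908
E_I   <- -0.0006379116
V_I   <- 0.0003359414

# Standard deviate and one-sided p-value (alternative: greater)
z <- (I_obs - E_I) / sqrt(V_I)
p <- pnorm(z, lower.tail = FALSE)
print(c(z = z, p = p))
```

An observed I of about 0.92 is very close to the upper end of its usual range, which is consistent with the remark that the autocorrelation is built into the way the response is defined.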
> test.lm <- lm.LMtests(ols.lin, listw, test = c("LMerr", "RLMerr", "LMlag", "RLMlag", "SARMA"))
> print(test.lm)

Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = form.lin, data = tab)
weights: listw
LMerr = 2507.369, df = 1, p-value < 2.2e-16

Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = form.lin, data = tab)
weights: listw
RLMerr = 7.4774, df = 1, p-value = 0.006248

Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = form.lin, data = tab)
weights: listw
LMlag = 2646.837, df = 1, p-value < 2.2e-16

Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = form.lin, data = tab)
weights: listw
RLMlag = 146.945, df = 1, p-value < 2.2e-16

Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = form.lin, data = tab)
weights: listw
SARMA = 2654.314, df = 2, p-value < 2.2e-16
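The five statistics are linked: the joint SARMA statistic equals LMerr + RLMlag and also LMlag + RLMerr, and each p-value is an upper-tail chi-squared probability with the stated degrees of freedom. The base-R sketch below checks both points against the numbers printed above; it re-uses the reported statistics and does not re-run lm.LMtests.

```r
# Statistics copied from the lm.LMtests output above
LMerr  <- 2507.369
RLMerr <- 7.4774
LMlag  <- 2646.837
RLMlag <- 146.945
SARMA  <- 2654.314

# The joint statistic decomposes in two ways (agreement up to rounding)
print(c(LMerr_plus_RLMlag = LMerr + RLMlag,
        LMlag_plus_RLMerr = LMlag + RLMerr,
        SARMA             = SARMA))

# Upper-tail chi-squared p-value for the robust error test (df = 1)
p_RLMerr <- pchisq(RLMerr, df = 1, lower.tail = FALSE)
print(p_RLMerr)
```

The robust lag statistic is much larger than the robust error statistic, which is the pattern usually read as pointing to a lag specification, although that reading presupposes a sensibly defined response.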
> reg.slm <- lagsarlm(form.lin, data = tab, listw = listw)
> summary(reg.slm)

Call:
lagsarlm(formula = form.lin, data = tab, listw = listw)

Residuals:
      Min        1Q    Median        3Q       Max
-1.558639 -0.042609  0.019559  0.071850  1.367678

Type: lag
Coefficients: (asymptotic standard errors)
                Estimate  Std. Error z value Pr(>|z|)
(Intercept)   0.01565469  0.04231648  0.3699  0.71142
log(rev_hab)  0.00990161  0.00385757  2.5668  0.01026
edu2         -0.01153030  0.01184878 -0.9731  0.33049
nonblanc     -0.01971576  0.00828432 -2.3799  0.01732
nonmaison    -0.01407983  0.00807180 -1.7443  0.08110
matr2        -0.00027776  0.00617216 -0.0450  0.96411
ocup2         0.04241982  0.01810448  2.3431  0.01913
ocup3         0.00732140  0.00617091  1.1864  0.23545
nonowner     -0.01439652  0.00860479 -1.6731  0.09431
Rho: 0.91255, LR test value: 6394.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0033595
z-value: 271.63, p-value: < 2.22e-16
Wald statistic: 73784, p-value: < 2.22e-16
Log likelihood: 299.2953 for lag model
ML residual variance (sigma squared): 0.022811, (sigma: 0.15103)
Number of observations: 2696
Number of parameters estimated: 11
AIC: -576.59, (AIC for lm: 5816.3)
LM test for residual autocorrelation
test value: 0.00077273, p-value: 0.97782
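Most of the summary figures for the lag model can be reproduced from one another, which helps when reading the output. The base-R sketch below recomputes the two AIC values, the z-value and Wald statistic for rho, and sigma from the numbers printed above. It assumes that the linear model has one parameter fewer than the lag model (no rho), which is an inference from the output rather than something stated in it.

```r
# Numbers copied from the lagsarlm summary above
logLik_lag <- 299.2953
k_lag      <- 11          # number of parameters estimated
LR         <- 6394.9      # LR test of the lag model against lm
rho        <- 0.91255
se_rho     <- 0.0033595
sigma2     <- 0.022811

# AIC = -2 * logLik + 2 * k
aic_lag <- -2 * logLik_lag + 2 * k_lag

# LR = 2 * (logLik_lag - logLik_lm); lm assumed to have k_lag - 1 parameters
logLik_lm <- logLik_lag - LR / 2
aic_lm    <- -2 * logLik_lm + 2 * (k_lag - 1)

z_rho <- rho / se_rho     # z-value for rho
wald  <- z_rho^2          # Wald statistic
sigma <- sqrt(sigma2)

print(round(c(aic_lag = aic_lag, aic_lm = aic_lm,
              z_rho = z_rho, wald = wald, sigma = sigma), 4))
```

Small discrepancies in the last digits are expected, because the inputs are themselves rounded in the printed summary.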
I get huge differences in the coefficients between the OLS and spatial
lag models (e.g. nonblanc: -0.225 vs. -0.019). Does anyone know whether
this is normal?
Thank you very much for your help
Marguerit David
PhD student
University Paris Dauphine
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no