Hello All, I have gone through ASDAR and hope that I have properly applied it's content. I have water color data sampled from polygons, which lie at the mouths of a rivers around an island. Due to ocean currents, bathymetry, and wind etc., I have the need to account for spatial autocorrelation for those polygons which are near to each other. I am trying to build a statistical model to explain the correlations between land cover and water quality. Therefore: ### Coordinates of the polygons coords <- coordinates(outlets) ### k=2 - assume that each outlet point could be/is possibly only affecting the two nearest outlets outlets.nb <- knn2nb(knearneigh(coords,k=2),row.names=outlets$BASIN_ID) ### We assume that SA is related to distance between outlets ### Calculate distance (in meters, the units of our projection) distance.nb <- nbdists(outlets.nb, coords) ### According to ASDAR (Page 254) this shouldn't be a problem because these distances are not on a grid but are asymmetrically placed in space, ### but we do ignore the dynamics physical processes that might transport water (currents, wind, bathymetry etc.) and assume equal diffusion/dispersal in all directions. ### Invert the distances: the bigger the distance the smaller the weight. Weights are proportional to the inverse distance invert.dist <- lapply(distance.nb, function(x) 1/x) (outlets.lw <- nb2listw(outlets.nb, glist=invert.dist, style='B')) OK, is this a correct structure? It may be better to just base it on distance (dnearneigh) i.e. assume that beyond 50km distance a nearest neighbor has no influence. Any comments, suggestions, insults? ## Give it a whirl. moran.test(response, listw=outlets.lw, randomisation=F) Moran's I test under normality data: response weights: outlets.lw Moran I statistic standard deviate = 2.8826, p-value = 0.001972 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.48914375 -0.02325581 0.03159666 ### Tweaking suggests that heteroscedasticity is present as well the data not being normally distributed. Then a boxcox transformation suggests a 1/sqrt(response) transformation but it is difficult to interpret so I opt for a log transformation. > moran.test(log(response), listw=outlets.lw, randomisation=F) Moran's I test under normality data: log(response) weights: outlets.lw Moran I statistic standard deviate = 2.0702, p-value = 0.01922 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.34473683 -0.02325581 0.03159666 Moran's I changed quite a bit here. This is due to non-normality, right? It appears the log takes most of the heteroscedasticity out and transforms the distribution to almost normally ditributed (seen via plot(model)). ### Continuing ### after narrowing down the explanatory variables while removing the 'worst' of those that are collinear (rho>0.5). sp.model <- spautolm(log(response) ~., data=data, listw=outlets.lw) I used the SAR default. Also, without narrowing my explanatory variables down to a bare minimum, I usually run into the error: Error in solve.default(crossprod(X, as.matrix(IlW %*% X)), tol = tol.solve) : system is computationally singular: reciprocal condition number = 2.85163e-019 Which still exists even when the same non-spatial (just lm) model contains no singularities. I get this often when adding an interaction term to the model. I am worried that the spatial structure has been misspecified, when looking at the other post which addresses this same error: http://r-sig-geo.2731867.n2.nabble.com/error-in-spautolm-spdep-and-model-selection-td2766159.html > sp.model <- spautolm(form, data=borneo.data,listw=outlets.lw) > summary(sp.model,Nagelkerke=T) Call: spautolm(formula = form, data = borneo.data, listw = outlets.lw) Residuals: Min 1Q Median 3Q Max -1.1164557 -0.3936249 0.0074938 0.3652388 1.4859714 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -9.2321e+00 1.6446e+00 -5.6135 1.983e-08 p_bare 1.2761e+00 2.5148e-01 5.0743 3.889e-07 mean_rfact 1.0769e-03 2.1112e-04 5.1010 3.378e-07 km_roads 8.0991e-05 3.3864e-05 2.3917 0.016772 p_bare:mean_rfact -1.5911e-04 3.1192e-05 -5.1009 3.381e-07 p_bare:km_roads -1.5490e-05 5.9758e-06 -2.5922 0.009537 Lambda: 3040.2 LR test value: 0.94862 p-value: 0.33007 Log likelihood: -44.52488 ML residual variance (sigma squared): 0.43704, (sigma: 0.66109) Number of observations: 44 Number of parameters estimated: 8 AIC: 105.05 Nagelkerke pseudo-R-squared: 0.54572 So there seems to be little spatial autocorrelation in the residuals. ### This is how it looks in non-spatial manner. summary(model);AIC(model) Call: lm(formula = response ~ p_bare + mean_rfact + km_roads + p_bare:mean_rfact + p_bare:km_roads, data = borneo.data) Residuals: Min 1Q Median 3Q Max -0.7111 -0.3342 -0.1216 0.3065 2.0646 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -4.067e+00 1.378e+00 -2.951 0.00539 ** p_bare 6.878e-01 2.142e-01 3.210 0.00270 ** mean_rfact 6.132e-04 1.766e-04 3.472 0.00130 ** km_roads 6.282e-05 2.863e-05 2.194 0.03439 * p_bare:mean_rfact -8.611e-05 2.660e-05 -3.237 0.00251 ** p_bare:km_roads -1.091e-05 5.067e-06 -2.154 0.03767 * --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Residual standard error: 0.559 on 38 degrees of freedom Multiple R-squared: 0.4248, Adjusted R-squared: 0.3491 F-statistic: 5.613 on 5 and 38 DF, p-value: 0.0005723 ## and then... lm.morantest(model.log,listw=outlets.lw)### Moran's I is big Global Moran's I for regression residuals data: model: lm(formula = log(response) ~ p_bare + mean_rfact + km_roads + p_bare:mean_rfact + p_bare:km_roads, data = borneo.data) weights: outlets.lw Moran I statistic standard deviate = 1.1192, p-value = 0.1315 alternative hypothesis: greater sample estimates: Observed Moran's I Expectation Variance 0.15521190 -0.04488877 0.03196742 So SA may or may not exist. It doesn't appear significant here, however, the R2 values increase in the spatial model and the AIC drops. The spatial model seems better, but is it overkill? Thanks for the comments, suggestions, or insults! Regards, Brian
spautolm - sanity check
1 message · Brian