error in spautolm spdep and model selection?

4 messages · Michael Denslow, Roger Bivand, Kingsford Jones

Original

1

4

Michael Denslow

Sat, Jan 17, 2009 2:23 PM #

Dear r-sig-geo,

I am attempting to compare the results from an lm() regression with an appropriate spatial regression. I have run into a problem when using the spautolm function in spdep.
When I include a polynomial term in the model I get the following error message:

floras2a$mid.elev +  I(floras2a$mid.elev^2), 
 listw = nb72km.w)

Error in solve.default(t(X) %*% as.matrix(IlW %*% X), tol = tol.solve) : 
  system is computationally singular: reciprocal condition number = 2.07505e-17

I am not sure how to interpret this. The function seems to work fine if I remove the polynomial term. 

Perhaps to larger question is which is the appropriate SAR/CAR model to use for my data. I realize that this is not an R question but any help would be most appreciated! My data has some spatial autocorrelation in the predictor variables. The residuals of the lm() model are not autocorrelated. Also the elevation variable is strongly autocorrelated and directional in my data set. As I understand it this would rule out CAR as an option. 

Thank you in advance for any help you can provide,
Michael




Michael Denslow

I.W. Carpenter Jr. Herbarium [BOON]
Appalachian State University
Boone, North Carolina U.S.A.

-- AND --

Communications Manager
Southeastern Regional Network of Expertise and Collections
sernec.org

Roger Bivand

Sun, Jan 18, 2009 1:41 AM #

On Sat, 17 Jan 2009, Michael Denslow wrote:

For spautolm, this is a new report, and I'll try to find out how to make 
the error message clearer. What is happening is the when the squared term 
is included, X' (I - \lambda W)' \Omega (I - \lambda W) X is singular. My 
guess would be that:

SAR <- errorsarlm(log10ns2a ~ log10a2a + floras2a$Field.Year + 
floras2a$mid.elev +  I(floras2a$mid.elev^2), listw = nb72km.w)

might very well fail too - almost certainly floras2a$mid.elev and 
I(floras2a$mid.elev^2) are highly correlated. spautolm does check whether 
the X variables are aliased, but using QR, rather than the less robust 
solve.default(). Can you try changing the scale of mid.elev, which will 
then change the scale of its square?

If you believe that the variable scaling is as it should be, and want to 
proceed, set the tol.solve= argument in spautolm to a value less than 
2.07505e-17, such as 1e-17. In principle, the internal function should not 
use solve.default(), but this is the first report of it causing 
trouble since spautolm was introduced in late 2005.

The models you are fitting are trying to harvest spatial structure in the 
residual. If there isn't any spatial structure to harvest, lm() is quite 
good enough. Spatial pattern in the X variables is not a problem, 
especially if it matches the spatial patterning in the response. Do you 
have a reference for your claims above, they sound like heresay to me? In 
this case, the only interesting spatial patterning is in the residuals - 
if they are clean, and you beleive that your model is well specified, 
there is no spatial story left.

Best wishes,

Roger

PS. access to your data set might be helpful in inserting a better error 
message into spautolm.

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

Kingsford Jones

Sun, Jan 18, 2009 10:55 AM #

On Sun, Jan 18, 2009 at 2:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

As Roger mentioned, without spatially autocorrelated residuals the
spatial regression is not the 'appropriate' model.

Note that it's safer to use the 'data' argument rather than using '$'
within model formulas

Also, orthogonal polynomial terms can be created automatically using
'poly' within model formulas

It does seem suspicious that just adding a squared term causes
singularity.  I wonder if there's something else going on with the
model matrix or weights

Tobler's law guarantees that your spatial predictors will be
autocorrelated.  If spatial variables were random the world would be a
very strange place indeed!

We don't know enough about your study to offer statistical advice
(e.g. I assume it's regional/lattice/areal data?), and this really
isn't the place for consulting, but I'll reiterate what Roger said in
different terms.

The problem with spatially autocorrelated residuals in an lm model is
that you are not meeting the assumption of independence; points close
together contain redundant information and you overestimate the
degrees of freedom available for formal inference.  This not only
provides anti-conservative SEs, but when GLS estimators are used (as I
believe they are under the hood of spautolm), it also changes the
estimated coefficients.  If there is no autocorrelation in the
residuals this suggests the conditional mean part of the model has
already explained the spatial structure in the response and you don't
have lack of independence (at least not due to spatial
autocorrelation).  I've never seen a spatial regression where I
believed there was truly no spatial autocorrelation in the residuals,
but formal inference is full of approximations and if we can get close
to no autocorrelation then we can get close to an optimal estimator.
The tricky part is looking for the spatial structure in the residuals
-- it can come in many forms.

hth,

Kingsford Jones

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Michael Denslow

Sun, Jan 18, 2009 5:05 PM #

Thank you Roger and others who responded on and off list,
Roger, I am happy to send the data that generated the error and will do so off list.

I think that the problem is that I miss understood a statement that I read about the CAR model. In Dormann et al. (2008 p. 614, Methods to account for spatial autocorrelation in the analysis of species distributional data: a review) they state:

'CAR is unsuitable when directional processes such as stream flow effects or prevalent wind directions are coded as non-Euclidean distances, resulting in an asymmetric covariance matrix.'

I originally took this to mean that if you have some effect that is strongly directional CAR is not appropriate. After looking at it again, I am frankly not sure what it means. In any case, it appears that I may not need any of these models at this time. This leads to my last question. 
Should I present the results of my Moran's I test of lm() residuals (lm.morantest) to show that they are indeed not autocorrelated and a spatial regression is therefore not needed? 

Thank you again,
Michael