On Tue, 1 Oct 2013, Philine Gaffron wrote:
Thank you for the Geographical Analysis reference, very helpful. I have deduced that the method argument should be "Matrix" for my data rather than "LU", since the "W" weights are symmetric. However, I am still getting the warnings on non-convergence, and since you state that this may be a specification problem, I would like to find the error.
I think that, unless you need the Hausman test from errorsarlm, you should just turn it off with errorsarlm(..., control=list(returnHcov=FALSE)); the ML fit is quite OK, it is only the Hausman test component that gives the warnings when the spatial coefficient is close to its limits and a method other than "eigen" is used. If you do need it and cannot use method="eigen", increase the order until the warning (it is a warning, not an error) goes away with errorsarlm(..., control=list(pWOrder=<big integer>)).

I have had win-builder roll up a Windows binary of the development version with more details on the problem, at: http://win-builder.r-project.org/5IVvaH0M3Qsj See the examples for powerWeights, and look at the component called $pWinternal returned by the warning-displaying errorsarlm fit, which shows how convergence failed; the series needs to fall below tol for convergence. As your n is not large, you can compare the "eigen" and "Matrix" methods to see how the Hausman test output diverges as the approximation to the term needed deteriorates.

The underlying problem that you have touched is (as in the sections in Bivand et al. 2013 on approximations) that using a power series to generate the product of the inverse of the n x n (I - \rho W) matrix and an arbitrary vector also deteriorates when \rho is near its limits.

Thanks for the detailed results; I've deleted them in this reply to save space.

Roger
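To see concretely why the series term can fail to fall below tol, here is a minimal base-R sketch of the power-series idea (this is an illustration, not spdep's powerWeights itself; the function name, the toy W, and the tolerance are assumptions made for the example):

```r
# Approximate (I - rho*W)^{-1} %*% x by the power series
# x + rho*W x + rho^2*W^2 x + ..., stopping when the term norm falls below tol.
power_series_solve <- function(W, rho, x, order = 250, tol = 1e-10) {
  acc <- x
  term <- x
  for (k in seq_len(order)) {
    term <- rho * (W %*% term)
    acc <- acc + term
    if (sqrt(sum(term^2)) < tol)
      return(list(value = acc, converged = TRUE, iters = k))
  }
  list(value = acc, converged = FALSE, iters = order)
}

# Toy row-standardised weights for 5 units along a line:
W <- matrix(0, 5, 5)
for (i in 1:4) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
W <- W / rowSums(W)
x <- rep(1, 5)

moderate <- power_series_solve(W, rho = 0.5, x)    # converges well within 250 terms
extreme  <- power_series_solve(W, rho = 0.999, x)  # terms shrink like 0.999^k:
                                                   # no convergence in 250 terms
```

With rho near its upper limit, the term magnitudes decay roughly like rho^k, so a fixed order cannot drive them below a small tolerance; that is the mechanism behind the "not converged within order iterations" warning.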
I did eliminate all polygons without neighbours from all my datasets before calling errorsarlm, so this should not be the problem. In case you have time to look at it and can find a possible error source that I have not identified, I include a full code example inline below (including documentation of the intermediate objects and the R outputs) for one dataset with the warning and one without. Comments on the crash are also inline.

On 30.09.2013 12:31, Roger Bivand wrote:
On Mon, 30 Sep 2013, Philine Gaffron wrote:
Thank you for the response, Roger. My answers below. I have also added a question on a warning message I have received and would be glad to find out whether this means I need to change the analysis approach or just have to take it as additional information.
Thanks for reporting back: some points for clarification inline.
On 25.09.2013 00:53, Roger Bivand wrote:
On Wed, 25 Sep 2013, Philine Gaffron wrote:
Hello, I am trying to run a spatial regression analysis using the errorsarlm function in the spdep package. I first ran into memory problems (on a larger dataset than the one described below) but have adjusted the memory size limit:
memory.limit(size = 800000)
[1] 8e+05
Which sets your memory limit under Windows to 781 Gb (size is in Mb) - that probably explains the error exit unless you have a lot of memory. Best avoid making any changes from the default in a fresh session unless you know exactly what you are doing.
I did up the virtual memory of the machine to 800Gb first (it has 1Tb in total and there is currently nothing else running on it) so I hope that that was not the problem. Thanks for the hint, though.
However, when I try to execute the following code in R Studio, I receive the message "R Session aborted - R encountered a fatal error - The session was terminated"
Always rerun any code with problems outside R Studio, because it can interfere with the reporting of warnings and errors. Certainly errorsarlm() has no reason to cause R to crash - and this is what you are suggesting. If you can reproduce this outside R Studio, and without setting memory limits, I may need to see your data to try to reproduce any error exit from the R session.
I tried this and had no more crashes, especially after I made the changes suggested/outlined below.
Does this mean that the R error exit occurred when you used R Studio, and did not occur when R was run directly, using exactly the same data and code? If so, please also let the R Studio people know, so that they can continue debugging.
I will try to reproduce this once I have understood / solved my non-convergence problem. If and when I do, I will send the code and data to the RStudio people and cc you (?).
ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data = ElDorado_urban, listw=ElDoU_listw)
Did you read the advice on ?errorsarlm about the use of alternative methods, such as "LU" or "Matrix" for larger data sets? You are using the default "eigen" method, but this should certainly have worked for the modest n=2081. Can you reproduce the problem with a built-in data set, such as elect80?
I opted for the "LU" method and changed the weights style (see below). Is there any documentation that tells me the implications of using method="LU"? I only found the note in the spdep documentation "LU_order default FALSE; used in 'LU_prepermutate', note warnings given for lu method" but nothing further. I did receive warnings in the output, though (for several but not all of my datasets), and I am not sure what they mean / what their implications are for the results. Do I need to choose another method? The warnings were in every case:
See Bivand, Hauke & Kossowski (2013) Geographical Analysis for some details; LU_prepermutate emulates the implementation by finding an initial fill-in re-ordering, but in R has little speed-up effect, hence it is turned off by default.
1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B, :
  not converged within order iterations
2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C, :
  not converged within order iterations
These are unusual, and may signal a mis-specified setting. The default number of powers is 250, and lambda^250 is a very small number unless lambda is close to its (probably upper) limit. The computation in fact only affects the accuracy of the Hausman test on the difference between OLS and spatial error coefficients, so may be adjusted if lambda is close to limits by setting control argument pWOrder to a larger value than 250, see ?powerWeights. As a symptom of lambda close to its limits, it suggests mis-specification, possibly in terms of observation support, with neighbouring properties sharing important drivers. This may be suggested by your data scenario described below; scrutinising the intermediate objects in your script may also put you on the track of things behaving in unexpected ways. I would expect many observations to have no neighbours with a 150m threshold, but you don't show no-neighbour control being turned off.
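A quick numerical check of the magnitudes involved (the specific lambda values here are illustrative, not taken from the fit above):

```r
# lambda^250 is tiny for moderate lambda, but not when lambda nears its limit:
magnitude_moderate <- 0.9^250    # on the order of 1e-12
magnitude_extreme  <- 0.999^250  # still close to 0.8, so 250 powers cannot
                                 # drive the series term below a small tolerance
```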
I removed all polygons without neighbours prior to calling errorsarlm - see example below.
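For readers following along, the idea of dropping no-neighbour units can be sketched in base R on a plain neighbour list (a hedged illustration: spdep's nb objects encode a no-neighbour unit as 0L and are counted with card(), whereas this toy list simply uses integer(0)):

```r
# Toy neighbour list for 4 units; unit 4 has no neighbours within the threshold
nb <- list(c(2L), c(1L, 3L), c(2L), integer(0))

n_neigh <- lengths(nb)     # number of neighbours per unit
keep <- n_neigh > 0        # drop units with empty neighbour sets
subset_ids <- which(keep)  # indices with which to subset the spatial data frame
```

After subsetting, the neighbour object must be rebuilt from the reduced data, since the old indices no longer match.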
Hope this helps, Roger
I have not tried reproducing the error with the built-in dataset but if that is of interest to you from the point of view of spdep development, I could give it a try.
The data was imported from an ArcGIS shape file:
Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")
The subset contains data from 2081 polygons. The listw object was created as follows:
Eld_U_coords <- coordinates(ElDorado_urban)
Eld_U_IDs <- row.names(ElDorado_urban)
ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names = Eld_U_IDs)
Is 492 a well-chosen upper distance threshold? I assume that the input data are projected, so that these are measures in the units of the projection - metres? If Eld_U_coords are actually geographical coordinates but longlat=NULL, d2 is taken as a planar measure; if you declare longlat=TRUE, it is in km. There is no easy way to trap undeclared geographical coordinates.
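There is indeed no reliable trap, but a rough plausibility check is easy to sketch in base R (the function name and thresholds here are assumptions for illustration; values inside these ranges are merely suspicious, not proven geographical):

```r
# Heuristic: planar State Plane coordinates in feet fall far outside the
# [-180, 180] x [-90, 90] box that long/lat values must occupy.
looks_geographical <- function(coords) {
  all(abs(coords[, 1]) <= 180) && all(abs(coords[, 2]) <= 90)
}

stateplane_ft <- cbind(x = c(6712000, 6713500), y = c(2105000, 2106200))
longlat       <- cbind(x = c(-120.8, -120.7),  y = c(38.7, 38.8))
```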
The distance thresholds are in feet (NAD_1983_StatePlane_California_II_FIPS_0402_Feet) and are related to dispersion distances for road traffic emissions. By way of explanation: my polygons are land-use parcels for which I have calculated emission loads in grams/area from traffic loads on a road network. Since I am assuming a dispersion distance of 492 feet / 150 m for emissions away from the network in the example above, I am further assuming that parcels which are within that distance of one another are affected by the same emission sources (i.e. traffic loads on the network).
ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="S", zero.policy=NULL)
Does choosing a different style make a difference ("S" is not often used)?
I chose "S" as a variance-stabilising coding scheme following Bivand, Pebesma et al. (2008), p. 253, but have now switched back to "W". Apart from the warnings cited above I encountered no further issues. Thank you for your help and for making this package available. Philine
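As an aside for readers comparing the codings, the "B" and "W" styles of nb2listw are simple to reproduce in base R on a toy adjacency matrix (the "S" variance-stabilising scheme involves an extra global rescaling and is not reproduced here; the toy matrix is an assumption for illustration):

```r
# Binary adjacency for 3 units: unit 1 neighbours units 2 and 3
B <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)  # style "B": raw binary weights

W <- B / rowSums(B)  # style "W": row-standardised, each row sums to 1
```

Row standardisation gives units with many neighbours individually smaller weights, which is why switching styles can shift the fitted spatial coefficient.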
Roger
Is there anything inherent in my approach that could cause this problem or would one need to look at the data? I include the session info below. Thanks for any help, Philine
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13

loaded via a namespace (and not attached):
[1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12 MASS_7.3-26 nlme_3.1-109 splines_3.0.1 tools_3.0.1
library(help=spdep)
Information on package 'spdep'

Description:

Package: spdep
Version: 0.5-65
Date: 2013-09-04
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no