SAR Poisson GLM model
On Mon, 1 Feb 2016, Clément Gorin wrote:
Hi,
I am estimating a gravity model of migration on cross-sectional data. The Moran's I statistic indicates positive and significant spatial autocorrelation in the residuals of the aspatial model, and the Lagrange Multiplier test points to the Spatial Autoregressive (SAR) model as the preferred specification. While I have no issue fitting a linear SAR (LeSage and Pace 2008) to my data, it does not accommodate the very large number of zeroes (> 90%) in my dependent variable. This clearly points to a Poisson process (Santos Silva and Tenreyro 2006). In short, I am having trouble running the SAR Poisson GLM. I have two questions: (1) Is there a method to run a SAR Poisson GLM in R? (I searched a lot before posting here)
Look at your model first. A gravity model estimated as Poisson implies that the 278k observations are interactions, that is origin/destination pairs, not counts of origins or destinations. sqrt(278784) is 528, which I think is your real n. Your data are zero-inflated, so you may need a zero-inflated approach. Your use of SAR is ambiguous: you mean y ~ rho W y + X beta, but SAR really means simultaneous autoregressive, as distinct from conditional autoregressive (CAR). Saying that you want to run a SAR Poisson suggests autocorrelation in the interactions, but the spatial autocorrelation is likely in the origin and destination fixed effects, not in the interactions. Did you look at the spatial regression section of the Spatial task view? There is a SAR Poisson approach (in your terms) in INLA - the slm latent model does something like this, but it will not handle the zero-inflation, and most likely isn't appropriate to your setting. In any case, a Poisson approach without an offset (log expected interactions) may not be sensible, quite apart from the spatial autocorrelation actually "belonging to" the origins and destinations, not the interactions.
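The arithmetic behind "your real n" can be checked with a minimal sketch. It assumes the 278784 rows form a complete n x n table of origin/destination pairs, own-region pairs included, stored origin by origin; that ordering and the helper name pair_index are assumptions made for illustration, not something stated in the thread.

```python
import math

n_obs = 278784                      # number of rows reported in the question

# Integer square root: if the rows are all origin/destination pairs,
# n_obs should be a perfect square n * n.
n_regions = math.isqrt(n_obs)
is_square = n_regions * n_regions == n_obs

def pair_index(k, n):
    """Map row k of the interaction table to (origin, destination).

    Hypothetical helper: assumes rows are ordered origin by origin,
    with every destination listed for each origin (zero-based).
    """
    return divmod(k, n)

print(n_regions, is_square)         # number of regions, perfect-square check
print(pair_index(529, n_regions))   # second origin, second destination
```

If is_square were false, the rows could not be a complete n x n set of pairs and the reading above would need revisiting.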
(2) If the answer to (1) is no, I should at least use a spatially filtered Poisson GLM. Yet both SpatialFiltering() and ME() crash even with a very simple connectivity structure (symmetric knn = 5). By this I mean that no error message was given, but RStudio simply "lost the connection with the R session". I suspect this is due to the large number of observations (278 784). Do you have tips to increase computational efficiency?
Never run sensitive processes under RStudio. They have not yet (after tens of months of waiting) replied to a query of this kind, and the problem in some cases may be with them. In a former case under Windows, a similar message was seen under RStudio, but the underlying command (not yours) ran to completion in RGui. Always report the output of sessionInfo() - your platform is unknown. Why would you think that a dense 278k x 278k matrix could be handled? SpatialFiltering() and ME() work by selecting eigenvectors from the weights matrix (knn is pretty unsatisfactory too - it does not yield a planar graph). Your initial memory needs are roughly 600GB, so this is probably not the way to go. If the earlier comments are correct, you actually have n=528, meaning that you'd first need to align the spatial filtering with the origin and destination fixed effects (or add WX to X) - see work by Griffith and Chun - Yongwan Chun may even have code for DOI: 10.1080/00045608.2011.561070 and other publications. IIRC you roll out the eigenvectors to the nxn interactions like other fixed effects. Hope this clarifies, Roger
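The memory figure in the reply above, and one possible reading of "rolling out" a region-level eigenvector to the interactions, can be sketched as follows. The 8 bytes per entry assumes double-precision storage of a dense matrix; roll_out is a hypothetical helper, and the origin-by-origin row ordering is an assumption, not the method of Griffith and Chun.

```python
n_obs = 278784
bytes_per_double = 8                # assumption: dense double-precision storage

# A dense n_obs x n_obs weights matrix needs n_obs^2 entries.
dense_bytes = n_obs * n_obs * bytes_per_double
dense_gb = dense_bytes / 1e9        # decimal gigabytes
dense_gib = dense_bytes / 2**30     # binary gibibytes

def roll_out(e):
    """Expand one region-level vector e (length n) to the n * n pairs.

    Hypothetical helper: rows are assumed ordered origin by origin.
    Returns the value attached to each pair's origin and to each
    pair's destination, like origin and destination fixed effects.
    """
    n = len(e)
    origin_side = [e[i] for i in range(n) for j in range(n)]
    dest_side = [e[j] for i in range(n) for j in range(n)]
    return origin_side, dest_side

print(round(dense_gb, 1), round(dense_gib, 1))
print(roll_out([1.0, 2.0]))
```

On this reading, the eigenvectors would be extracted once from the 528 x 528 weights matrix and only then expanded, so the 278k x 278k matrix never has to be formed.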
Best, Clément Gorin PhD student, GATE LSE
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 91 00 e-mail: Roger.Bivand at nhh.no http://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en http://depsy.org/person/434412