Skip to content

Spatial filtering with glm with grid sampling

8 messages · Roger Bivand, Sima Usvyatsov

#
Hello,

I am running a negative binomial model (MASS) on count data collected on a
grid. The dataset is large - ~4,000 points, with many predictors. Being
counts, there are a lot of zeroes. All data are collected on a grid with 20
points, with high spatial autocorrelation.

I would like to filter out the spatial autocorrelation. My question is:
since I have very limited spatial info (only 20 distinct spatial
locations), is it possible to simplify ME() so that I don't have to run it
on the whole dataset? When I try to run ME() on a 100-point subset of the
data, I get error in glm.fit: NA/NaN/Inf in 'x'. When I run it on a single
instance of the grid, I "get away" with a warning ("algorithm did not
converge").

Here's a fake dataset. It was grinding for a while but not throwing errors
(like my original data would). Regardless, it demonstrates the repeated
sampling at the same points and the large number of zeroes.

Any advice would be most welcome.

library(spdep)
library(MASS)

df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rnorm(100, 30,
0.1), Lon = rnorm(1000, -75, 1), x = rnegbin(100, 1, 1))
coordinates(df) <- ~Lon + Lat
proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
nb <- dnearneigh(x=coordinates(df), d1=0, d2=200,longlat = TRUE)
dists <- nbdists(nb, coordinates(df), longlat=TRUE)
glist <- lapply(dists, function(x) 1/x)
lw <- nb2listw(nb, glist, style="W")
me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha = 0.5)
#
On Wed, 10 Jan 2018, Sima Usvyatsov wrote:

            
The data set has 1000 values in Lon, so is probably bigger than you 
intended, and when 100 is used is not autocorrelated. You seem to have a 
hierarchical model, with repeated measurements at the locations, so a 
multi-level treatment of some kind may be sensible. If you want to stay 
with ME-based spatial filtering, maybe look at the literature on spatial 
panel (repeated measurements are in time) with ME/SF, and on network 
autocorrelation (dyadic relationships with autocorrelation among origins 
and/or destinations). Both these cases use Kronecker products on the 
selected eigenvectors, I think.

Alternatively, use a standard GLMM with a grouped iid random effect and/or 
a spatially structured random effect at the 20 location level. If the 
groups are repeated observations in time, you should model the whole 
(non-)separable space-time process.

Hope this helps,

Roger

  
    
#
Thank you so much for your response.

Yes, I managed to muck up the fake data in 2 (!) ways - the 1,000 lons and
the fact that the lon/lats weren't repeated. Here's the correct structure.

df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rep(rnorm(20,
30, 0.1), each = 5), Lon = rep(rnorm(20, -75, 1), each = 5), x =
rnegbin(100, 1, 1),
Stratum = rep(1:5, each = 20))

In the meantime, I (somewhat) resolved the issue with the glmmPQL fixed
effects - my fault, of course.

My current model is set up as follows:

mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
random = ~ 1 |RoundStart,
family = quasipoisson,
correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

where RoundStart is the date/time of starting each count. I'm assuming that
by "using a standard GLMM" you were thinking of MASS's glmmPQL()?
Does this look correct for specifying the full space/time dependence? The
variogram and pacf look very decent, but one can never have too many
checks...
On Thu, Jan 11, 2018 at 5:01 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

            

  
  
#
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:

            
I can't judge whether it really makes sense, but I think this is much more 
robust. I'd explore some other GLMMs too, to see what they contribute.

Roger

  
    
#
What other GLMM options do I have, under the restrictions of count data
with lots of zeroes and spatial autocorrelation? I was under the impression
that glmmPQL was my only choice?
On Thu, Jan 11, 2018 at 8:18 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

            

  
  
#
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:

            
See for example:

https://www.fromthebottomoftheheap.net/2017/05/04/compare-mgcv-with-glmmTMB/

https://rjournal.github.io/archive/2017/RJ-2017-066/index.html 
(forthcoming)

for surveys and discussions; maybe your current choice is the best bet for 
negbin.

Roger

  
    
#
I had no idea this was coming up - can't believe it didn't come up in any
of my searches. Looks super useful, and I'm sure I'll be using it sooner
than later, whether on this analysis or not. The worked example is
ridiculously useful - so glad to have it.

If I'm reading the help page correctly (the autocorrelation bits,
struc(terms|group)) - to replicate the spatial autocorrelation within
Round, would I add this?

exp(site.Easting + site.Northing | RoundStart)

So something like this:

mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors +
# random variable
(1 | RoundStart) +
# autocorrelation
exp(site.Easting + site.Northing | RoundStart),

family = nbinom2, # or nbinom1 - I guess decide based on residuals?
correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

Does that look like the twin of the glmmPQL specification I had (other than
family)? I can also email sig-ME if it's a better forum for this.
On Thu, Jan 11, 2018 at 8:33 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

            

  
  
#
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:

            
Not sure - please report back if you find an equivalent to the glmmPQL 
approach.

Roger