Skip to content

spatial autocorrelation as random effect with count data

7 messages · Sima Usvyatsov, Paul Buerkner, Ben Bolker +1 more

#
Hello,

I am working on a spatially autocorrelated dataset with a negative binomial
(count) response variable. I have been using the glmmPQL approach (MASS),
but I seem to have a hard time fitting the fixed effects. I came across the
mention that one could build the spatial autocorrelation into a random
effect (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/015364.html).


I've done some searching but could not find a straightforward example of
this practice. I have 20 sampling locations (sampled repeatedly to a 4,000
point dataset) and I know that there is spatial autocorrelation between
them (by looking at autocorrelation plots of a naive model). The 20 grid
points are clustered into 4 strata, and I am interested in the strata
effects (so would like to keep the strata as fixed).

How would I go about expressing the spatial autocorrelation in this setup?
In the future I'd like to explore GAMs for this application, but for now
I'm stuck with a GLM approach... I would love to be able to use glmer()
with a random effect that expresses spatial autocorrelation.

Here's a fake dataset.

library(MASS)

df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rep(rnorm(20,
30, 0.1), each = 5), Lon = rep(rnorm(20, -75, 1), each = 5), x =
rnegbin(100, 1, 1), Stratum = rep(1:5, each = 20))

Thank you so much!
#
It seems as if GAM(M)s are well suited for this situation, why would you
prefer to stay with the GLM approach?

2018-01-10 21:59 GMT+01:00 Sima Usvyatsov <ghiaco at gmail.com>:

  
  
#
It will be hard to make this work with glmer at present.

 If you want to use geostatistical models (i.e., specifying spatial
autocorrelation function explicitly), as far as I know at present your
best options are

  INLA  (possibly via the inlabru package): looks like this implements
the Matern family (which includes exponential and Gaussian
autocorrelation functions as special cases)

  spaMM: spatial models etc. via h-likelihood

  would welcome more informed advice from other people on the list!
On Wed, Jan 10, 2018 at 3:59 PM, Sima Usvyatsov <ghiaco at gmail.com> wrote:
#
PS: depending on how badly you wanted this, it would be possible to
what Doug Bates said (impose spatial dependence on the random effects
for the 20 spatial points) via the modular machinery of glmer, but it
would take some effort and knowledge ...
On Wed, Jan 10, 2018 at 3:59 PM, Sima Usvyatsov <ghiaco at gmail.com> wrote:
1 day later
#
I was introduced today (by Roger Bivand) to the glmmTMB package that looks
very exciting. As a co-author, I was wondering why you didn't suggest it -
is there a reason it's a no-go in my situation? From a super quick read,
and a very naive thinking, is this not equivalent to the gpmmPQL setup
below?

mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors +
# random variable
(1 | RoundStart) +
# autocorrelation
exp(site.Easting + site.Northing | RoundStart),

family = nbinom2, # or nbinom1 - I guess decide based on residuals?
correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

where RoundStart is the time of starting sampling along the repeated, set,
20-point sampling grid, easting and northing are the 20 points' coords,
Stratum is the allocation of the 20 sampling points to 5 strata and
SiteInStratum is  the 1:4 allocation within stratum.

This is my current setup:

mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
random = ~ 1 |RoundStart,
family = quasipoisson,
correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

I have INLA and GAMs on my to-do list for this year - sounds like really
helpful ways to go about things, I just haven't gotten there yet...

Thank you so much!
On Wed, Jan 10, 2018 at 5:34 PM, Ben Bolker <bbolker at gmail.com> wrote:
binomial
(MASS),
the
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/015364.html).
4,000
setup?
rep(rnorm(20,

  
  
#
Because the spatial correlation machinery is extremely experimental
(and, to be honest, I didn't realize it was actually working already).
Kasper Kristensen added the spatial example to the "covstruct" vignette
a couple of months ago, and I just looked at it now - it seems
interesting.  Try it out: if it works (or doesn't work), I'd be happy to
hear about it. If you encounter problems (and please do check your
results **very** carefully), post an issue at
https://github.com/glmmtmb/glmmTMB/issues ...

  I think your "correlation=" specification below is
incorrect/redundant. Reading vignette("covstruct",package="glmmTMB") and
looking at the volcano example, I would try:

your_data$pos <- numFactor(d$site.Easting, d$site.Northing)

mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors +
cheers
   Ben Bolker
On 18-01-11 05:15 PM, Sima Usvyatsov wrote:
#
Hi Sima,
 You might do this as a GAM using the mgcv package with a syntax something like this -

mod1 <- gam( Count ~ Stratum + SiteInStratum + s( site.Easting , site.Northing ) + s( Roundstart , bs="re" ) , family = Poisson , data=yourdata )

I haven't used quasipoisson models in mgcv so you'd need to look to see if that's an available family

The s( site.Easting , site.Northing ) term produces a 2-dimensional smooth of the coordinate pairs, in essence modeling how Count varies by location within 2-dimensional coordinate space.

The s( Roundstart , bs="re" ) term treats Roundstart as a random effect. You might do a model just using Roundstart as a linear predictor to see how it compares.

mgcv has some nice plotting options. My preference is to create a grid of the x/y or easting/northing coordinates and predict your model to that grid

You could also do a Bayesian version using the brms package. The syntax would be nearly identical except that the function call would be brm instead of gam.

Paul
_____________________________________________
Paul M. Lantos, MD, MS GIS, FIDSA, FAAP, FAAP            
Assistant Professor of Internal Medicine and Pediatrics
  Pediatric Infectious Diseases
  General Internal Medicine
Duke University School of Medicine
Duke Global Health Institute
_____________________________________________



-----Original Message-----
From: Sima Usvyatsov [mailto:ghiaco at gmail.com] 
Sent: Thursday, January 11, 2018 5:15 PM
To: Ben Bolker <bbolker at gmail.com>
Cc: R-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] spatial autocorrelation as random effect with count data

I was introduced today (by Roger Bivand) to the glmmTMB package that looks very exciting. As a co-author, I was wondering why you didn't suggest it - is there a reason it's a no-go in my situation? From a super quick read, and a very naive thinking, is this not equivalent to the gpmmPQL setup below?

mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors + # random variable
(1 | RoundStart) +
# autocorrelation
exp(site.Easting + site.Northing | RoundStart),

family = nbinom2, # or nbinom1 - I guess decide based on residuals?
correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

where RoundStart is the time of starting sampling along the repeated, set, 20-point sampling grid, easting and northing are the 20 points' coords, Stratum is the allocation of the 20 sampling points to 5 strata and SiteInStratum is  the 1:4 allocation within stratum.

This is my current setup:

mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors, random = ~ 1 |RoundStart, family = quasipoisson, correlation = corExp(form=~site.Easting + site.Northing + RoundStart)

I have INLA and GAMs on my to-do list for this year - sounds like really helpful ways to go about things, I just haven't gotten there yet...

Thank you so much!
On Wed, Jan 10, 2018 at 5:34 PM, Ben Bolker <bbolker at gmail.com> wrote:
binomial
(MASS),
the
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Dsig-2Dmixed-2Dmodels_2011q1_015364.html&d=DwICaQ&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=pCiIbVPBNWU0azo0D48ChUH_fawb-FalNVOf1sUn1r4&m=ueQxw_NiRoOZc6904koetGkyojO7A7EYG2wnkc-_Q5o&s=inAqkRaPqS5USvZxt-SLYQqnz1nQJra1b8WGyXU9Gno&e= ).
4,000
setup?
rep(rnorm(20,