I'm running into some computational issues when trying to fit a binomial
model for spatially correlated data with glmmPQL, and was wondering if
anyone could help me out.
My full dataset consists of about 300,000 points, each with a suite of
environmental variables (I'm trying to build a habitat model for a species
of seal, using real dives (presence) and simulated dives (absence) as my
response variable).
Since the dataset is so large, I split it into thirds and ran the model
without a spatial correlation structure. However, when checking the
results, I found spatial correlation in the residuals, which I'm now
trying to account for with a spherical variogram. The problem is that when
I run the model on one third of the data (about 70k points), the
calculations go on forever: the first model had been running for over a
week and still had not finished the first iteration.
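In case it helps, this is roughly how I checked for the residual
correlation (a minimal sketch; M1.f is the fit without a correlation
structure, i.e. the model below minus the correlation argument):

    library(MASS)    # glmmPQL
    library(nlme)    # Variogram, corSpher

    ## same fixed and random effects, but no correlation structure
    M1.f <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d + sbot,
                    random = ~ 1 | fPTT, family = binomial, data = sample.df1)

    ## empirical semivariogram of the normalized residuals against distance
    ## in x/y -- this is where the spatial correlation showed up (it works
    ## on all pairwise distances, so it is itself slow on large subsets)
    plot(Variogram(M1.f, form = ~ x + y, resType = "normalized"))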
This is the model I used:

    M1.f.Spa <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d + sbot,
                        random = ~ 1 | fPTT, family = binomial,
                        correlation = corSpher(c(91323.53, 0.4279603),
                                               form = ~ x + y, nugget = TRUE),
                        data = sample.df1)
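(The c(91323.53, 0.4279603) are the starting values for the range and the
nugget of the spherical correlation, with the range expressed in the units
of x and y.)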
This week I tried a different approach and split the 70k subset into
smaller datasets of 10,000 points each, and now each model runs much
faster (a couple of hours at most). However, the output of these models
changes with regard to the significance of one of the variables, which
makes me think that I need a larger dataset than 10,000 points.
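For completeness, this is roughly how I split the 70k subset and refit (a
minimal sketch; the partition is just a random assignment into seven
chunks of about 10,000 points):

    set.seed(1)   # illustrative seed
    chunk <- sample(rep(1:7, length.out = nrow(sample.df1)))
    fits <- lapply(split(sample.df1, chunk), function(d)
        glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d + sbot,
                random = ~ 1 | fPTT, family = binomial,
                correlation = corSpher(c(91323.53, 0.4279603),
                                       form = ~ x + y, nugget = TRUE),
                data = d))

    ## it is the fixed-effects tables of these fits that disagree on the
    ## significance of one variable
    lapply(fits, function(f) summary(f)$tTable)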