glmmPQL and spatial correlation
2 messages · Luis Huckstadt, Ben Bolker
Luis Huckstadt <lahuckst <at> ucsc.edu> writes:
I'm running into some computer issues when trying to run a binomial model for spatially correlated data using glmmPQL and was wondering if anyone could help me out. My whole dataset consists of about 300,000 points for which I have a suite of environmental variables (I'm trying to come up with a habitat model for a species of seal, using real dives (presence) and simulated dives (absence) as my response variable). Since my dataset is so large, I split it into thirds and ran the model without spatial correlation. However, when checking my results, I did find spatial correlation in the residuals, which I'm trying to incorporate using a spherical variogram.

The problem is that when I run the model on 1/3 of my data (about 70k points), the calculations go on forever. I ran the first model for over a week and it still had not got past the first iteration. This is the model I used:

    M1.f.Spa <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d + sbot,
                        random = ~ 1 | fPTT,
                        family = binomial,
                        correlation = corSpher(c(91323.53, 0.4279603),
                                               form = ~ x + y, nugget = TRUE),
                        data = sample.df1)

This week I tried a different approach and split the 70k subset into smaller datasets of 10,000 points, and now the model runs much faster (a couple of hours at most). Yet the output of these models changes with regard to the significance of one variable, which makes me think I need a larger dataset than 10,000 points.
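[For readers following along: a minimal, untested sketch of the residual check described above, reusing the variable and data-frame names from the post, might look like this.]

    ## Sketch only: fit the model without a correlation structure, then
    ## inspect an empirical semivariogram of the normalized residuals.
    library(MASS)   # glmmPQL
    library(nlme)   # Variogram, corSpher

    M1.f <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d + sbot,
                    random = ~ 1 | fPTT, family = binomial, data = sample.df1)

    ## a curve that rises with distance and then levels off suggests residual
    ## spatial correlation; its apparent range and nugget can seed corSpher()
    plot(Variogram(M1.f, form = ~ x + y, resType = "normalized"))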
That's a bit surprising. When you say the significance of one variable changes -- does that mean it changes from (e.g.) 0.045 to 0.055, which crosses the magic line at p=0.05 but is not otherwise meaningful?
Does anyone have a suggestion on how to improve or run this code with the spatial correlation structure on a dataset larger than 10,000 points without it taking weeks to run? I'm working on an Intel Core i7 with 12 GB of RAM (plus a couple of hundred GB of virtual memory) under Windows 7 64-bit.
This is not going to be easy in any case. I would look into the INLA package and AD Model Builder ... (I would also be mildly nervous about using PQL for binary data -- it might be OK, but this is known to be more or less the worst-case scenario for PQL ...)

Ben Bolker
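[As a rough illustration of the INLA route suggested above (not code from the thread): a binomial spatial model via INLA's SPDE approach might be sketched as below. The mesh edge lengths, the iid term for individual seals, and the default priors are placeholder assumptions; the column and data-frame names are reused from the original glmmPQL call, and the sketch is untested.]

    library(INLA)  # available from r-inla.org, not CRAN

    ## triangulated mesh over the dive locations
    ## (x, y assumed to be projected coordinates; max.edge values are placeholders)
    loc  <- cbind(sample.df1$x, sample.df1$y)
    mesh <- inla.mesh.2d(loc = loc, max.edge = c(50000, 200000))
    spde <- inla.spde2.matern(mesh)

    ## projector matrix mapping observations onto mesh nodes
    A <- inla.spde.make.A(mesh, loc = loc)

    ## stack the response, the spatial effect index, and the covariates
    stk <- inla.stack(data    = list(presence = sample.df1$presence),
                      A       = list(A, 1),
                      effects = list(spatial = 1:spde$n.spde,
                                     data.frame(intercept = 1,
                                                seal = as.integer(factor(sample.df1$fPTT)),
                                                sample.df1[, c("sst", "tmax100", "tbot", "sss",
                                                               "ssu", "tmax100d", "sbot")])))

    ## binomial model with a Matern spatial field plus an iid term per seal
    fit <- inla(presence ~ 0 + intercept + sst + tmax100 + tbot + sss + ssu +
                    tmax100d + sbot + f(spatial, model = spde) + f(seal, model = "iid"),
                family = "binomial",
                data = inla.stack.data(stk),
                control.predictor = list(A = inla.stack.A(stk)))
    summary(fit)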