problem using geeglm: R keeps crashing
On 10-11-28 07:17 PM, Chris Howden wrote:
(sorry if this has posted before, I?ve sent it twice but because I haven?t received a copy or any replies it appears that it?s not getting onto the list)
It has showed up (I think three times including this one): you can always check the archive at <https://stat.ethz.ch/pipermail/r-sig-mixed-models/> to see if your messages get through).
I?m using geeglm to do a resource function analysis on some dingo radio collar data. When I try to fit the following model R crashes and I get the following error message (I?m actually running R through ESS, just in case that?s relevant)
fit.geeglm <- geeglm(resp~rubbish, data=data, id=dogname,
family=binomial, corstr="exchangeable") ?This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.? I?m also getting a Microsoft Windows pop up that says: "R for Windows terminal front-end has stopped working" A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available "
In general if a package crashes R, that by definition constitutes a bug in the package and should be reported to the maintainers -- but see below ...
The weird thing is that if I reorder my data the model seems to fit OK. I can reorder the data and run a model as follows:
sample <- data[sample(93948,93948),] # (my data has 93948 rows)
fit.geeglm
<-geeglm(resp~rubbish,data=sample,id=dogname,family=binomial,corstr="exchangeable") My understanding of the corstr="exchangeable" correlation structure in geeglm() is that it assumes equal correlation amongst all data points for each dingo?.so I can?t see why reordering the data changes anything?.if I was fitting something like an AR(1) model then I can see why reordering changes things?. One reason I have considered is that geeglm() actually fits a correlation structure for each *block *of* *id?s, rather than for all id?s? For example, if my data set had the following id structure id1, id1, id1, id2, id2, id1, id1. Then it would fit 1 correlation structure to the first set of id1?s and then a different correlation structure to the second set of id1?s? Rather than fitting a single correlation structure to id1?
Based on my reading of ?geeglm, and this note under the "id" argument description: "Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula." Thus I would not be surprised if the example you give above failed, because it violates the rules laid out in the documentation (on the other hand, this is pretty easy to check and it would be wise for the package authors to make their code a bit more "fool-resistant" by checking for this condition and throwing an error if the elements aren't properly sorted). You can easily sort your observations by id to make them conform to the rules, or, if you want to fit each contiguous block separately, you can use something like factor(c(0,cumsum(diff(as.numeric(id))!=0))) to generate a new id variable that identifies contiguous blocks. Given all this it seems really surprising that geeglm works with permuted data. Does it work consistently (say 10 or 20 times in a row) and give the same answers for different permutations ... ? Maybe you just got lucky. (It would also seem pretty dicey to me to run a random effects model with only 4 levels (animals), although if you do want to use 'bout' (contiguous block) as the id variable you will have a lot more levels ...)
Does anyone have any suggestions on why this is happening??? And what I can do to get the model to run? My fall back model is to use glmer with random intercepts, but I really want to use gee since it?s a marginal model and gives me robust SE?s. Thanks for any ideas The following is some more info that may be relevant: For those unfamiliar with this type of resource function analysis it is conceptually *similar* to a case control study. The ?used? resource units being the cases and coming from the radio collared telemetry data. The ?available? resource units are the controls and are randomly sampled from the home range of each dingo (I have sampled 5 controls for each case). The resource units are 40 by 40 m pixels from a GIS. I have data for 4 dingos. It has 93948 rows and 9 variables. The first 1/6 of my data is the radio collar data ?case? or ?used resource units? (so resp=1) and the next 5/6 of my data is the ?control? or ?available resource units? (so resp =0).