problem using geeglm: R keeps crashing

5 messages · Ben Bolker, Chris Howden

Sun, Nov 28, 2010 4:17 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20101129/6da1494d/attachment.pl>

Ben Bolker

Sun, Nov 28, 2010 5:06 PM #

On 10-11-28 07:17 PM, Chris Howden wrote:

It has showed up (I think three times including this one): you can
always check the archive at
<https://stat.ethz.ch/pipermail/r-sig-mixed-models/> to see if your
messages get through).

In general if a package crashes R, that by definition constitutes a
bug in the package and should be reported to the maintainers -- but see
below ...

Based on my reading of ?geeglm, and this note under the "id" argument
description: "Data are assumed to be sorted so that observations on a
cluster are contiguous rows for all entities in the formula."  Thus I
would not be surprised if the example you give above failed, because it
violates the rules laid out in the documentation (on the other hand,
this is pretty easy to check and it would be wise for the package
authors to make their code a bit more "fool-resistant" by checking for
this condition and throwing an error if the elements aren't properly
sorted).
  You can easily sort your observations by id to make them conform to
the rules, or, if you want to fit each contiguous block separately, you
can use something like

factor(c(0,cumsum(diff(as.numeric(id))!=0)))

to generate a new id variable that identifies contiguous blocks.
 Given all this it seems really surprising that geeglm works with
permuted data.  Does it work consistently (say 10 or 20 times in a row)
and give the same answers for different permutations ... ? Maybe you
just got lucky.

  (It would also seem pretty dicey to me to run a random effects model
with only 4 levels (animals), although if you do want to use 'bout'
(contiguous block) as the id variable you will have a lot more levels ...)

Chris Howden

Sun, Nov 28, 2010 8:57 PM #

Thanks for the reply Ben,

(And sorry for sending the same email 3 times everyone. I'm on other R
lists where I receive a copy of my sent posts. And I also logged onto the
r-sig-models admin site to confirm my settings are set so I receive copies
of emails I send to the list. So not sure why I didn't receive them.)

Sorting the id's doesn't help, R still crashes.

What's interesting is that permutating the data actually makes geeglm()
fit a model, while it is incapable of doing so on the correct id sorted
data. (As U suggested I have run it again a number of times and get
similar results each time. The model estimates are also quite similar to a
standard GLM)

Looking at the summary() results for the permutated data model has given
me a clue as to why a model can be fit to the permutated data, but not the
'correct' data. The permutated model has fit 68987 clusters, which is
incorrect since there should only be 4 (1 for each dog)

I suspect the problem here is that geeglm cannot fit an "exchangeable"
correlation structure to such large blocks of data (the smallest of which
has 24000 rows).

PS: I was wondering why u think it is dicey to run a random effects model
with only 4 levels. If I only have 4 individuals then I only have 4
levels.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Monday, 29 November 2010 10:36 AM
To: r-sig-mixed-models at r-project.org; Chris Howden
Subject: Re: [R-sig-ME] problem using geeglm: R keeps crashing

On 10-11-28 07:17 PM, Chris Howden wrote:

haven't

It has showed up (I think three times including this one): you can
always check the archive at
<https://stat.ethz.ch/pipermail/r-sig-mixed-models/> to see if your
messages get through).

the

case

unusual

"

  In general if a package crashes R, that by definition constitutes a
bug in the package and should be reported to the maintainers -- but see
below ...

<-geeglm(resp~rubbish,data=sample,id=dogname,family=binomial,corstr="excha
ngeable")

for

correlation

id2,

set of

id1's?

Based on my reading of ?geeglm, and this note under the "id" argument
description: "Data are assumed to be sorted so that observations on a
cluster are contiguous rows for all entities in the formula."  Thus I
would not be surprised if the example you give above failed, because it
violates the rules laid out in the documentation (on the other hand,
this is pretty easy to check and it would be wise for the package
authors to make their code a bit more "fool-resistant" by checking for
this condition and throwing an error if the elements aren't properly
sorted).
  You can easily sort your observations by id to make them conform to
the rules, or, if you want to fit each contiguous block separately, you
can use something like

factor(c(0,cumsum(diff(as.numeric(id))!=0)))

to generate a new id variable that identifies contiguous blocks.
 Given all this it seems really surprising that geeglm works with
permuted data.  Does it work consistently (say 10 or 20 times in a row)
and give the same answers for different permutations ... ? Maybe you
just got lucky.

  (It would also seem pretty dicey to me to run a random effects model
with only 4 levels (animals), although if you do want to use 'bout'
(contiguous block) as the id variable you will have a lot more levels ...)

can

want

units

from

The

1/6

resource

Ben Bolker

Mon, Nov 29, 2010 11:02 AM #

If the model with only 4 clusters is the one you wanted (i.e. data
sorted in id order, contiguous blocks), then I would definitely contact
the maintainers to see why it's crashing.  That is, you seem to be
following the rules laid out in the documentation, so it shouldn't crash.

  The reason it's dicey to run a RE model with only 4 levels is that you
are in effect trying to estimate a variance from 4 observations, which
tends to be very inaccurate/unstable.  Since you apparently have very
large quantities of data per individual, I would expect you to get
approximately the same answers if you just treated individuals as a
fixed effect (although admittedly the model is philosophically different).

  Ben Bolker

On 11/28/2010 11:57 PM, Chris Howden wrote:

Thanks for the reply Ben,

(And sorry for sending the same email 3 times everyone. I'm on other R
lists where I receive a copy of my sent posts. And I also logged onto the
r-sig-models admin site to confirm my settings are set so I receive copies
of emails I send to the list. So not sure why I didn't receive them.)

Sorting the id's doesn't help, R still crashes.

What's interesting is that permutating the data actually makes geeglm()
fit a model, while it is incapable of doing so on the correct id sorted
data. (As U suggested I have run it again a number of times and get
similar results each time. The model estimates are also quite similar to a
standard GLM)

Looking at the summary() results for the permutated data model has given
me a clue as to why a model can be fit to the permutated data, but not the
'correct' data. The permutated model has fit 68987 clusters, which is
incorrect since there should only be 4 (1 for each dog)

I suspect the problem here is that geeglm cannot fit an "exchangeable"
correlation structure to such large blocks of data (the smallest of which
has 24000 rows).

PS: I was wondering why u think it is dicey to run a random effects model
with only 4 levels. If I only have 4 individuals then I only have 4
levels.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Monday, 29 November 2010 10:36 AM
To: r-sig-mixed-models at r-project.org; Chris Howden
Subject: Re: [R-sig-ME] problem using geeglm: R keeps crashing


On 10-11-28 07:17 PM, Chris Howden wrote:

(sorry if this has posted before, I've sent it twice but because I

haven't

received a copy or any replies it appears that it's not getting onto the
list)

 It has showed up (I think three times including this one): you can
always check the archive at
<https://stat.ethz.ch/pipermail/r-sig-mixed-models/> to see if your
messages get through).

I'm using geeglm to do a resource function analysis on some dingo radio
collar data. When I try to fit the following  model R crashes and I get

the

following error message (I'm actually running R through ESS, just in

case

that's relevant)

 fit.geeglm <- geeglm(resp~rubbish, data=data, id=dogname,

family=binomial, corstr="exchangeable")

"This application has requested the Runtime to terminate it in an

unusual

way.

Please contact the application's support team for more information."

I'm also getting a Microsoft Windows pop up that says:

"R for Windows terminal front-end has stopped working"

A problem caused the program to stop working correctly.

Windows will close the program and notify you if a solution is available

"

  In general if a package crashes R, that by definition constitutes a
bug in the package and should be reported to the maintainers -- but see
below ...

The weird thing is that if I reorder my data the model seems to fit OK.

can reorder the data and run a model as follows:

 sample <- data[sample(93948,93948),]  # (my data has 93948 rows)

fit.geeglm

<-geeglm(resp~rubbish,data=sample,id=dogname,family=binomial,corstr="excha
ngeable")

My understanding of the corstr="exchangeable" correlation structure in
geeglm() is that it assumes equal correlation amongst all data points

for

each dingo..so I can't see why reordering the data changes anything..if

was fitting something like an AR(1) model then I can see why reordering
changes things..

One reason I have considered is that geeglm() actually fits a

correlation

structure for each *block *of* *id's, rather than for all id's?  For
example, if my data set had the following id structure id1, id1, id1,

id2,

id2, id1, id1. Then it would fit 1 correlation structure to the first

set of

id1's and then a different correlation structure to the second set of

id1's?

Rather than fitting a single correlation structure to id1?

Based on my reading of ?geeglm, and this note under the "id" argument
description: "Data are assumed to be sorted so that observations on a
cluster are contiguous rows for all entities in the formula."  Thus I
would not be surprised if the example you give above failed, because it
violates the rules laid out in the documentation (on the other hand,
this is pretty easy to check and it would be wise for the package
authors to make their code a bit more "fool-resistant" by checking for
this condition and throwing an error if the elements aren't properly
sorted).
  You can easily sort your observations by id to make them conform to
the rules, or, if you want to fit each contiguous block separately, you
can use something like

factor(c(0,cumsum(diff(as.numeric(id))!=0)))

to generate a new id variable that identifies contiguous blocks.
 Given all this it seems really surprising that geeglm works with
permuted data.  Does it work consistently (say 10 or 20 times in a row)
and give the same answers for different permutations ... ? Maybe you
just got lucky.

  (It would also seem pretty dicey to me to run a random effects model
with only 4 levels (animals), although if you do want to use 'bout'
(contiguous block) as the id variable you will have a lot more levels ...)

Does anyone have any suggestions on why this is happening??? And what I

can

do to get the model to run?

My fall back model is to use glmer with random intercepts, but I really

want

to use gee since it's a marginal model and gives me robust SE's.

Thanks for any ideas

The following is some more info that may be relevant:

For those unfamiliar with this type of resource function analysis it is
conceptually *similar* to a case control study. The "used" resource

units

being the cases and coming from the radio collared telemetry data. The
"available" resource units are the controls and are randomly sampled

from

the home range of each dingo (I have sampled 5 controls for each case).

The

resource units are 40 by 40 m pixels from a GIS.

I have data for 4 dingos. It has 93948 rows and 9 variables. The first

1/6

of my data is the radio collar data "case" or "used resource units" (so
resp=1) and the next 5/6 of my data is the "control" or "available

resource

units" (so resp =0).

Chris Howden

Mon, Nov 29, 2010 4:09 PM #

Thanks Ben,

I've contacted the administrators and we're looking into it.

And your absolutely right, a GLMM with random intercept is giving me
pretty much the same answer as a GLM.

Although I'm hoping the GEE model with it's robust SE's will account for
the obvious autocorrelation in my dataset.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au


-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Tuesday, 30 November 2010 4:33 AM
To: Chris Howden
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] problem using geeglm: R keeps crashing

  If the model with only 4 clusters is the one you wanted (i.e. data
sorted in id order, contiguous blocks), then I would definitely contact
the maintainers to see why it's crashing.  That is, you seem to be
following the rules laid out in the documentation, so it shouldn't crash.

  The reason it's dicey to run a RE model with only 4 levels is that you
are in effect trying to estimate a variance from 4 observations, which
tends to be very inaccurate/unstable.  Since you apparently have very
large quantities of data per individual, I would expect you to get
approximately the same answers if you just treated individuals as a
fixed effect (although admittedly the model is philosophically different).

  Ben Bolker

On 11/28/2010 11:57 PM, Chris Howden wrote:

the

copies

the

which

model

Innovation,

the

available

<-geeglm(resp~rubbish,data=sample,id=dogname,family=binomial,corstr="excha

ngeable")

My understanding of the corstr="exchangeable" correlation structure in
geeglm() is that it assumes equal correlation amongst all data points

for

each dingo..so I can't see why reordering the data changes anything..if

was fitting something like an AR(1) model then I can see why reordering
changes things..

One reason I have considered is that geeglm() actually fits a

correlation

structure for each *block *of* *id's, rather than for all id's?  For
example, if my data set had the following id structure id1, id1, id1,

id2,

id2, id1, id1. Then it would fit 1 correlation structure to the first

set of

id1's and then a different correlation structure to the second set of

id1's?

Rather than fitting a single correlation structure to id1?

Based on my reading of ?geeglm, and this note under the "id" argument
description: "Data are assumed to be sorted so that observations on a
cluster are contiguous rows for all entities in the formula."  Thus I
would not be surprised if the example you give above failed, because it
violates the rules laid out in the documentation (on the other hand,
this is pretty easy to check and it would be wise for the package
authors to make their code a bit more "fool-resistant" by checking for
this condition and throwing an error if the elements aren't properly
sorted).
  You can easily sort your observations by id to make them conform to
the rules, or, if you want to fit each contiguous block separately, you
can use something like

factor(c(0,cumsum(diff(as.numeric(id))!=0)))

to generate a new id variable that identifies contiguous blocks.
 Given all this it seems really surprising that geeglm works with
permuted data.  Does it work consistently (say 10 or 20 times in a row)
and give the same answers for different permutations ... ? Maybe you
just got lucky.

  (It would also seem pretty dicey to me to run a random effects model
with only 4 levels (animals), although if you do want to use 'bout'
(contiguous block) as the id variable you will have a lot more levels

...)