
Beta-binomial distributions with lmer?

16 messages · Christine Griffiths, ONKELINX, Thierry, Ben Bolker +5 more

#
Hi R users,

Just a query as to whether lme4 can handle beta-binomial distributions as I 
read that this was not available.

If not, any suggestions on how to handle such a distribution to plot the 
following model:
y <- cbind(Biotic, Abiotic)
m1 <- lmer(y ~ Treatment + Month.rain + (1|Month) + (1|Block/EnclosureID/Quadrat))

y referring to percentage cover of biotic matter.

Cheers,
Christine
#
No.  You can use a quasi-binomial model, although
the support is a little bit spotty (and beware that
quasi- models may falsely report inflation of the
random effects).
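
A minimal sketch of Ben's suggestion, using Christine's own formula. It assumes the lme4 interface of that era (where lmer accepted a family argument) and that Biotic and Abiotic are integer counts; newer lme4 versions dropped the quasi families, so this is a period sketch rather than current advice:

```r
library(lme4)

# Successes/failures response, as in Christine's post
y <- cbind(Biotic, Abiotic)

# Quasi-binomial fit: binomial mean-variance relationship scaled by an
# estimated dispersion parameter phi
m1 <- lmer(y ~ Treatment + Month.rain + (1 | Month) +
             (1 | Block/EnclosureID/Quadrat),
           family = quasibinomial)
```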

  Ben Bolker
Christine Griffiths wrote:

#
Thanks. I was hoping for a miracle that this had been developed within the 
last couple of months.

I am on the stats learning curve and am not quite sure how flexible to be 
with regard to distributions.  Is quasibinomial acceptable, despite having 
data with a lot of 0s and a lot of 100s?

Many thanks in advance,
Christine
--On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:

#
That's a good question, answers will differ.  Since "all models are
wrong" anyway, provided that a mean-variance relationship of V =
phi*N*p*(1-p) seems plausible, I would say you should go for it.  You're
near the cutting edge anyway ... (I don't have a copy, but you might see
whether Zuur et al's book has anything to say on the subject -- they're
very pragmatic ecologists, and I think they use GEE/quasi models quite a
lot ...)

  Ben Bolker
Christine Griffiths wrote:

#
Dear Christine,

We recently had a vivid discussion on whether it is appropriate to model
percentages with a (quasi)binomial model. We were modelling the percentage
of leaves missing from trees. The mixed model with the binomial family had
random effects with extremely small variances. My colleague argued that
this percentage did not come from a Bernoulli experiment, and hence that
the binomial family was not appropriate. He suggested putting the
percentage on a 0 to 100 scale and applying a log(x+1) transformation.
This resulted in a linear mixed model with random effects that had
reasonable variances. This convinced me that the binomial family only
makes sense with binary data.
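
A minimal sketch of the transformation Thierry describes, written as a helper function (the data frame and column names below are hypothetical):

```r
# Thierry's suggestion: put the proportion on a 0-100 scale, then
# apply a log(x + 1) transformation
log1p.pct <- function(prop) log(100 * prop + 1)

# Then fit an ordinary linear mixed model on the transformed response,
# e.g. (with a hypothetical data frame d):
# m <- lmer(log1p.pct(cover) ~ Treatment + (1 | Block), data = d)
```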

HTH,

Thierry


----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu /
www.zoology.ufl.edu/bolker GPG key:
www.zoology.ufl.edu/bolker/benbolker-publickey.asc

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The views expressed in this message and any annex are purely those of the 
writer and may not be regarded as stating an official position of INBO, as 
long as the message is not confirmed by a duly signed document.
#
Yes, but ...
  If the data get "scrunched" near 100% (as well as near zero), then
I'm not sure that this procedure would lead to stable variances?
(If it does, that's great.) Why not logit((proportion+m)/(1+2*m)) [where
m is a small value which can be interpreted as coming from a Bayesian
prior, if you like] instead? Once we've done all that, we're getting
pretty close to a quasi-binomial model anyway ...  (It sounds like all
the N values are the same in this example anyway, so there's no scaling
of variance with N to worry about.)
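
Ben's adjusted logit can be written as a one-liner; m is the small constant he mentions:

```r
# logit((p + m)/(1 + 2*m)): shrinks p toward 0.5 just enough that
# p = 0 and p = 1 map to finite values on the logit scale
adj.logit <- function(p, m = 0.01) qlogis((p + m) / (1 + 2 * m))

adj.logit(c(0, 0.5, 1))  # finite at both endpoints; p = 0.5 maps to 0
```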
ONKELINX, Thierry wrote:

#
Dear Ben and Thierry,

Thank you for the advice. I tried both suggested methods, but got stumped 
on Ben's logit suggestion. Thierry's suggestion did improve the variances 
(e.g. from 7.7e-04 to 1.94 for the residual variance) when I used 
quasipoisson family errors. Given that the values aren't discrete, I am not 
sure this is correct. Ben, you only suggest this method if it leads to 
"stable variance". I have tried searching for what is meant by this term, 
but have not found any information. If you could clarify or point me in the 
right direction, I would greatly appreciate it.

Cheers
Christine
--On 10 June 2009 10:25 -0400 Ben Bolker <bolker at ufl.edu> wrote:

----------------------
Christine Griffiths
School of Biological Sciences
University of Bristol
Woodland Road
Bristol BS8 1UG
Tel: 0117 9287593
Fax 0117 3317985
Christine.Griffiths at bristol.ac.uk
http://www.bio.bris.ac.uk/research/mammal/tortoises.html
#
Hi,

You can fit this model with AD Model Builder's random effects module,
which is now freely available at http://admb-project.org.
Rather than arguing a priori about the applicability of various models, I
prefer to fit them and look at the estimates and diagnostics for each one.

   dave
#
Christine Griffiths wrote:
If you transformed the data in some significant way, then the
residual variances aren't necessarily going to be comparable, so
I'm not sure I would take that as confirmation.

I think Thierry meant to suggest a LMM (i.e., assume normal
distributions, no transformation after the initial one) rather
than a GLMM (link function/exponential-family distribution or
quasi-distribution).

You may find more under "variance-stabilizing" rather
than "stable variance" -- what I meant was that the variability in the
Pearson residuals (residuals scaled by the expected standard deviation,
which is what lmer gives you) should be independent of the fitted value
-- so try plot(sqrt(abs(residuals(model))) ~ fitted(model)) and see if the
"amplitude" appears reasonably constant (this is approximately the same
as the "scale-location" plot that plot.lm gives you for a linear model).

#
Dear Ben,

Indeed. I suggested using an LMM with the transformed data. Then I would
have a look at how the residuals behave.

Best regards,

Thierry



#
Hello All,

I would like to use GLMMs with a binary response variable (logit link) to 
model the effects of three environmental covariates on whether resource 
units were used or unused by a wildlife species.  I have 15 different study 
areas, with very different numbers of used and unused units in each.  I'm 
interested in using the fixed-effects parameter estimates to predict the 
relative probabilities that resource units will be used across the entire 
population of study areas.  The numbers of used and unused units in each 
area look something like this:

Area    Unused    Used
01         281       2
02        4415       1
03         343      30
04         256       1
05        2052       4
06        4050       1
07         238       2
08         743       3
09        2476      18
10        2524       1
11         805       1
12         754       4
13         272       1
14          52       1
15         124       1

I've been using study area as a grouping factor for a random intercept and 
random slope effects:

fullmodel <- glmer(Used ~ 1 + x1 + x2 + x3 + (1 + x1 + x2 + x3 | Area),
                   family = binomial, data = mydata)

Using 'glmer', I've been able to fit models to my data without convergence 
issues, model fit is pretty good, and the results seem to make sense.  My 
questions are:  Given that the numbers of used units in each area are very 
unbalanced, to what degree can I generalize across the entire population of 
study areas?  Will my estimates for the fixed-effects parameters be so 
reliant on areas 3 and 9 that I'm really just limited to inferences on these 
two areas?  Is there a way to quantify the relative weight of each study 
area in the estimation of the fixed-effects parameters (i.e. the degree to 
which I can generalize across the entire population of study areas)?

I've read about "borrowing strength", which will certainly play a big role 
with this dataset, but I haven't found any examples of datasets that are as 
unbalanced as mine.

I realize that my questions relate to mixed models in general and less to 
their implementation in R, so I hope I'm not out-of-line in posting these 
questions here.  I'd guess there are probably answers to these questions in 
the literature, so I'd truly appreciate any advice on where I should look 
for more info.

Thanks in advance,

-Grant
#
Hello,

as a simple quick check, have you tried fitting your model without area 3/9 or without both of them and compared the estimates? You could then also look at how well your fixed effects estimates predict the values in the left-out areas.
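
A sketch of Luca's check against Grant's model (column names and Area codes are assumed to match Grant's post):

```r
library(lme4)

full <- glmer(Used ~ 1 + x1 + x2 + x3 + (1 + x1 + x2 + x3 | Area),
              family = binomial, data = mydata)

# Refit without the two areas that dominate the used counts
reduced <- update(full,
                  data = subset(mydata, !(Area %in% c("03", "09"))))

# Compare the fixed-effects estimates side by side
cbind(full = fixef(full), without.3.9 = fixef(reduced))
```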

HTH

Cheers,

Luca


----- Original Message -----
From: Grant T. Stokke <gts127 at psu.edu>
To: r-sig-mixed-models at r-project.org
Sent: Wed, 10 Jun 2009 23:41:52 -0400 (EDT)
Subject: [R-sig-ME] GLMMs with unequal group sizes

#
Hello Grant,

related to the previous remark (re-fitting the model without some of
the areas), you might be interested in the influence.ME package, which
I developed. Although it focuses on measures of influential data for
variables at the level of the area (in your example), you could use the
estex() function, which returns the fixed-effects estimates while
iteratively excluding each of the areas.

If you need help with using the package, you can contact me off-list.

Kind regards,

Rense Nieuwenhuis
On 11 jun 2009, at 10:12, Luca Borger wrote:

#
On Wed, Jun 10, 2009 at 11:41 PM, Grant T. Stokke<gts127 at psu.edu> wrote:
Does this mean that the number of Unused units is not included
anywhere in the model?
#
Sorry, I should have been clearer.  The number of unused units is 
included.  The response variable is "Used", so that used units have Used=1 
and unused units have Used=0.

Thanks for the input.  I've been looking at the influence of the areas with 
many used units on parameter estimates (by removing areas 3 and 9), and they 
appear to be more influential than I would like.  Ideally, I'd like each 
study area to receive approximately equal weight in parameter estimation. 
Is that possible?  Would it be inappropriate to use the 'weights' argument 
in glmer to give study areas more equal weight?  Any suggestions?

Thanks,

-Grant

----- Original Message ----- 
From: "Daniel Ezra Johnson" <danielezrajohnson at gmail.com>
To: "Grant T. Stokke" <gts127 at psu.edu>
Cc: <r-sig-mixed-models at r-project.org>
Sent: Thursday, June 11, 2009 8:56 AM
Subject: Re: [R-sig-ME] GLMMs with unequal group sizes