Model validation for Presence / Absence, (binomial) GLMs
The current CRAN package of lme4 (0.999999-2), running under R 3, says that VarCorr returns an (S4) object of class VarCorr.
The function does not seem to do that; it still returns a list, as in earlier versions of lme4.
Have I missed something, or has the help file got a bit ahead of the code?
Steve Ellison
LGC
Highland Statistics Ltd <highstat at ...> writes:
This is something I always battle with, given the plethora of great model-fitting methods available for other models. I always use a variant of Hugh's suggestion and look at the % of correct predictions between models as a quick model-fitting statistic. And for overdispersion, I believe one way is to fit individual-level random effects and see if this is a substantively better model. There is more on this in the wiki: http://glmm.wikidot.com/faq
Yes, but this is unidentifiable for Bernoulli responses (as also explained there).
The statement on 'unidentifiable for Bernoulli responses'... well... apparently this is not that trivial. See http://www.highstat.com/BGGLM.htm, follow the link to the Discussion Board, go to "Chapter 1: Introduction to generalized linear models", and see the topic "Can binary logistic models be overdispersed?"
Alain
That's an interesting document. I think the bottom line is: if the Bernoulli data can be grouped, i.e. if there are in general multiple observations with the same set of covariates, then overdispersion can be identified, because the data are really equivalent to a binomial response within the groups. For example, the trivial example

grp resp
A   1
A   0
A   1
B   0
B   0
B   1

is equivalent to:

grp successes total
A   2         3
B   1         3

This sort of aggregation can be done with plyr::ddply, or in lots of different ways; it's generally faster and more efficient to fit the data in the grouped form than in the binary form. It might have been more convenient for readers to be able to go directly to the link http://www.highstat.com/BGS/GLMGLMM/pdfs/HILBE-Can_binary_logistic_models_be_overdispersed2Jul2013.pdf rather than having to click five times through the Highland Statistics material to the PDF of interest.
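For readers who don't use plyr, the aggregation Ben describes can be sketched in base R; the data frame and column names below simply mirror the toy example above.

```r
## Collapse Bernoulli rows into a grouped binomial response (a base-R
## alternative to plyr::ddply), using the toy data from the example above.
dat <- data.frame(grp  = c("A", "A", "A", "B", "B", "B"),
                  resp = c(1, 0, 1, 0, 0, 1))
agg <- aggregate(resp ~ grp, data = dat,
                 FUN = function(x) c(successes = sum(x), total = length(x)))
## aggregate() returns the two summaries as a matrix column; flatten it
grouped <- data.frame(grp = agg$grp, agg$resp)
grouped
##   grp successes total
## 1   A         2     3
## 2   B         1     3
```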
S Ellison <S.Ellison at ...> writes:
The current CRAN package of lme4 (0.999999-2), running under R 3, says that VarCorr returns an (S4) object of class VarCorr. The function does not seem to do that; it still returns a list, as in earlier versions of lme4. Have I missed something, or has the help file got a bit ahead of the code?
This does look like a documentation typo. It will be fixed in lme4.0 on R-forge; we probably won't release another version of lme4 to CRAN before the 1.0 release. Thanks, Ben Bolker
Ben Bolker <bbolker at ...> writes:
[...]
That's an interesting document: I think the bottom line is: if the Bernoulli data can be grouped, i.e. if there are in general multiple observations with the same set of covariates, then overdispersion can be identified, because the data are really equivalent to a binomial response within the groups. For example, the trivial example

grp resp
A   1
A   0
A   1
B   0
B   0
B   1

is equivalent to:

grp successes total
A   2         3
B   1         3
Agreed that this is very interesting, but still a bit mysterious, as everything looks the same on the surface. The likelihoods only differ by the log of the binomial coefficients, as can easily be verified on Ben's example above and as expected from the likelihood equations:
Binary <- read.table(
    textConnection("grp resp
A 1
A 0
A 1
B 0
B 0
B 1"), header = TRUE)
Grouped <- read.table(
    textConnection("grp successes total
A 2 3
B 1 3"), header = TRUE)
## grouped minus binary log-likelihood ...
-logLik(glm(resp ~ grp, binomial, Binary)) +
    logLik(glm(cbind(successes, total - successes) ~ grp, binomial, Grouped))
## ... equals the sum of the log binomial coefficients
with(Grouped, sum(log(choose(total, successes))))
However, looking at the outputs of the glm, the residual degrees of freedom differ: 4 for the binary responses and 0 for the binomial response. Should degrees of freedom really be computed differently in the two cases, since it is easy to transform the two cases back and forth? And, if so, what does that mean?
Ken
Kenneth Knoblauch
Inserm U846, Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine, 69500 Bron, France
tel: +33 (0)4 72 91 34 77  fax: +33 (0)4 72 91 34 61  portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html
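Ken's two numbers are easy to check directly; a minimal sketch, constructing the same toy data inline rather than via read.table:

```r
## Verify the residual degrees of freedom (6 - 2 = 4 for the binary fit,
## 2 - 2 = 0 for the grouped fit) and the log-likelihood identity.
binary  <- data.frame(grp  = rep(c("A", "B"), each = 3),
                      resp = c(1, 0, 1, 0, 0, 1))
grouped <- data.frame(grp = c("A", "B"), successes = c(2, 1), total = c(3, 3))

fit_bin <- glm(resp ~ grp, family = binomial, data = binary)
fit_grp <- glm(cbind(successes, total - successes) ~ grp,
               family = binomial, data = grouped)

df.residual(fit_bin)  # 4
df.residual(fit_grp)  # 0

## grouped and binary log-likelihoods differ by the log binomial coefficients
diff_ll <- as.numeric(logLik(fit_grp)) - as.numeric(logLik(fit_bin))
all.equal(diff_ll, with(grouped, sum(log(choose(total, successes)))))  # TRUE
```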
On 05/07/2013 12:48 PM, Ken Knoblauch wrote:
[...]
However, looking at the outputs of the glm, the degrees of freedom differ, being 4 on the binary responses and 0 for the binomial response. Should degrees of freedom really be computed differently in the two cases, since it is easy to transform the two cases back and forth? And, if so, what does that mean?
The issue of degrees of freedom seems similar in assessing goodness of fit for logistic regression, which can be done with grouped data (or by defining bins as in Hosmer's test, which is based on the same idea of pooling data). I would have to look back at the references below to see how these goodness-of-fit tests are affected by overdispersion.

Hosmer et al. The importance of assessing the fit of logistic regression models: a case study. American Journal of Public Health. December 1991;81(12).
Hosmer DW, Lemeshow S. A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics. 1980;A10:1043-1069.
Lemeshow S, Hosmer DW. A review of goodness-of-fit statistics for use in the development of logistic regression models. Am J Epidemiol. 1982;115:92-106.
Hosmer DW, Lemeshow S, Klar J. Goodness-of-fit testing for multiple logistic regression analysis when the estimated probabilities are small. Biometrical Journal. 1988;30(7):1-14.

Gabriel
---------------------------------------------------------------------
Gabriel Baud-Bovy
UHSR University
via Olgettina, 58
20132 Milan, Italy
tel.: (+39) 02 2643 4839 (office)
      (+39) 02 2643 3429 (laboratory)
      (+39) 02 2643 4891 (secretary)
fax:  (+39) 02 2643 4892
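The binning idea behind Hosmer's test that Gabriel mentions can be sketched in a few lines. This is an illustrative toy implementation only (the function name hoslem_sketch is invented here; a tested implementation such as ResourceSelection::hoslem.test should be preferred for real work):

```r
## Toy sketch of the Hosmer-Lemeshow statistic: bin fitted probabilities
## into roughly g equal-size groups, then compare observed and expected
## successes with a chi-square statistic on g - 2 degrees of freedom.
hoslem_sketch <- function(y, phat, g = 10) {
  breaks <- unique(quantile(phat, probs = seq(0, 1, length.out = g + 1)))
  bin  <- cut(phat, breaks = breaks, include.lowest = TRUE)
  obs  <- tapply(y,    bin, sum)    # observed successes per bin
  expd <- tapply(phat, bin, sum)    # expected successes per bin
  n    <- tapply(y,    bin, length)
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n)))
  df   <- nlevels(bin) - 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

## Illustrative use on simulated Bernoulli data
set.seed(1)
x   <- rnorm(200)
y   <- rbinom(200, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
hoslem_sketch(y, fitted(fit))
```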
This does look like a documentation typo. Will be fixed in lme4.0 on R-forge; we probably won't release another version of lme4 to CRAN before the 1.0 release.
No problem; the present format is reasonably easily deciphered, especially if you have an older version's help file about.
An aside, though: I found it because I was trying to get numbers out of VarCorr for further use (specifically, I needed the SDs rather than the variances), and made sure I was getting the right data by comparing with the summary output.
Which leads me to a suggestion. All the numbers (at least for my model) are packaged much more tidily in the formatted table of random effects in summary() for lmer objects. That table, though, is apparently in character format. Wouldn't it be more useful to create it as a numerical data frame and use the default methods for data-frame output to display it, or to format at print time? That way the summary object would hold the numbers, and we could get the random-effect variances or SDs easily from the summary object.
Steve E
On 13-07-05 09:20 AM, S Ellison wrote:
[...] Wouldn't it be more useful to create that as a numerical data frame and use the default methods for data frame output to display it, or to format at print time? That way the summary object would hold the numbers, and we could get the random-effect variances or SDs easily from the summary object.
Maybe. I'd say we (or I, at least) are still thinking about the best way to return var-corr information in the most useful/general way possible. I had some ideas for this (see vcconv.R in the source code if you're interested), but I don't think they're going to make it in for release 1.0. The main thing about exporting the var-corr info (the reason it's done as character()) is that for more complex structures there are often empty correlation cells. Probably the quickest solution to this would be to write an as.data.frame.VarCorr.merMod (yuck!) method based on converting lme4::formatVC() output to numeric. If you want to post this as an issue on GitHub, that would be fine.
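In the meantime, a user-level workaround along the lines Steve asks for can be sketched. The helper name vc_frame is invented here, and this assumes a fitted merMod object; it bypasses the formatted character table entirely by reading the covariance matrices that VarCorr() returns:

```r
## Hypothetical helper: pull numeric variances and SDs out of VarCorr()
## into a plain data frame, one row per random-effect term.
## (The residual variance is not in this list; for lmer fits it is
## stored separately as attr(VarCorr(fit), "sc").)
vc_frame <- function(fit) {
  vc <- lme4::VarCorr(fit)          # list of covariance matrices, by group
  do.call(rbind, lapply(names(vc), function(g) {
    m <- as.matrix(vc[[g]])
    data.frame(grp = g, term = rownames(m),
               variance = diag(m), sd = sqrt(diag(m)),
               row.names = NULL)
  }))
}
## e.g., after fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy),
## vc_frame(fm) gives numeric 'variance' and 'sd' columns directly.
```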