
Model validation for Presence / Absence, (binomial) GLMs

8 messages · Highland Statistics Ltd, Ken Knoblauch, Gabriel Baud-Bovy +2 more

#
The current CRAN package of lme4 (0.999999-2), running under R 3, says that VarCorr returns an (S4) object of  class VarCorr.

The function does not seem to do that; it still returns a list, as in earlier versions of lme4.

Have I missed something, or has the help file got a bit ahead of the code?

Steve Ellison
LGC

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}
#
Highland Statistics Ltd <highstat at ...> writes:
That's an interesting document. I think the bottom line is:
  
  * if the Bernoulli data can be grouped, i.e. if there are
in general multiple observations with the same set of covariates,
then overdispersion can be identified, because the data are
really equivalent to a binomial response within the groups.

  For example, the trivial example

grp  resp
A    1
A    0
A    1
B    0
B    0
B    1

is equivalent to:

grp  successes total
A    2         3
B    1         3

This sort of aggregation can be done with plyr::ddply, or
in lots of other ways -- it's generally faster and
more efficient to fit the model to the grouped form than
to the binary form.
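The same aggregation in base R (a minimal sketch; tapply is just one of the many options, and the object names here are mine):

```r
## Collapse Bernoulli data (one row per trial) into binomial counts per group.
binary <- data.frame(grp  = c("A", "A", "A", "B", "B", "B"),
                     resp = c(1, 0, 1, 0, 0, 1))

grouped <- data.frame(
  grp       = sort(unique(binary$grp)),
  successes = as.vector(tapply(binary$resp, binary$grp, sum)),    # 2, 1
  total     = as.vector(tapply(binary$resp, binary$grp, length))  # 3, 3
)
```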

It might have been more convenient for readers to be able to
go directly to the link:

http://www.highstat.com/BGS/GLMGLMM/pdfs/
 HILBE-Can_binary_logistic_models_be_overdispersed2Jul2013.pdf
{URL broken to make Gmane  happy}

rather than having to click 5 times to get through the
Highland Statistics material to the PDF of interest ...
#
S Ellison <S.Ellison at ...> writes:
This does look like a documentation typo.  Will be fixed
in lme4.0 on R-forge; we probably won't release another 
version of lme4 to CRAN before the 1.0 release.

  thanks
   Ben Bolker
#
Ben Bolker <bbolker at ...> writes:
Agreed that this is very interesting but still a bit mysterious
as everything looks the same on the surface.  
The likelihoods differ only by the log of the binomial coefficients,
as can easily be verified with Ben's example above and as expected
from the likelihood equations:

Binary <- read.table(
textConnection("grp  resp
A    1
A    0
A    1
B    0
B    0
B    1"), header = TRUE)

Grouped <- read.table(
textConnection("grp  successes total
A    2         3
B    1         3"), header = TRUE)

## difference of the two log-likelihoods ...
-logLik(glm(resp ~ grp, binomial, Binary)) +
logLik(glm(cbind(successes, total - successes) ~ grp, binomial, Grouped))

## ... equals the sum of the log binomial coefficients
with(Grouped, sum(log(choose(total, successes))))

However, looking at the output of the glm fits, the residual degrees of
freedom differ: 4 for the binary responses and 0 for the binomial responses.
Should degrees of freedom really be computed differently in the two cases
since it is easy to transform the two cases back and forth?
And, if so, what does that mean?
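Ken's observation is easy to verify directly; a self-contained sketch (the data re-entered inline, fit names mine):

```r
Binary  <- data.frame(grp  = c("A", "A", "A", "B", "B", "B"),
                      resp = c(1, 0, 1, 0, 0, 1))
Grouped <- data.frame(grp = c("A", "B"), successes = c(2, 1), total = c(3, 3))

fit_bin <- glm(resp ~ grp, family = binomial, data = Binary)
fit_grp <- glm(cbind(successes, total - successes) ~ grp,
               family = binomial, data = Grouped)

## Residual degrees of freedom: 6 rows - 2 parameters vs 2 rows - 2 parameters
df.residual(fit_bin)  # 4
df.residual(fit_grp)  # 0

## The log-likelihoods differ by exactly sum(log(choose(total, successes)))
as.numeric(logLik(fit_grp) - logLik(fit_bin))  # log(3) + log(3)
```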

Ken
#
On 05/07/2013 12:48 PM, Ken Knoblauch wrote:
The issue of DoF seems similar in assessing Goodness of Fit for logistic 
regression, which can be done with grouped data (or by defining bins as 
in Hosmer's test, which is based on the same idea of pooling data).

I would have to look back at the references below to see how these
goodness-of-fit tests are affected by overdispersion.
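The binning idea behind Hosmer and Lemeshow's test is straightforward to sketch in base R. This is a toy implementation on simulated data, with the conventional g = 10 decile bins; hl_test and everything in it are my own names, not from any package:

```r
## Toy Hosmer-Lemeshow statistic: bin fitted probabilities into (up to) g
## quantile groups and compare observed vs expected successes per bin.
hl_test <- function(y, phat, g = 10) {
  breaks <- unique(quantile(phat, probs = seq(0, 1, length.out = g + 1)))
  bins   <- cut(phat, breaks, include.lowest = TRUE)
  obs    <- tapply(y,    bins, sum)     # observed successes per bin
  n      <- tapply(y,    bins, length)  # bin sizes
  expct  <- tapply(phat, bins, sum)     # expected successes per bin
  stat   <- sum((obs - expct)^2 / (expct * (1 - expct / n)))
  df     <- length(obs) - 2
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(1)
x   <- rnorm(200)
y   <- rbinom(200, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
hl  <- hl_test(y, fitted(fit))  # correctly specified model: no evidence of misfit
```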

Hosmer et al. The importance of assessing the fit of logistic
regression models: a case study. American Journal of Public Health.
1991;81(12).

Hosmer DW, Lemeshow S. A goodness-of-fit test for the multiple logistic
regression model. Commun Stat. 1980;A10:1043-1069.

Lemeshow S, Hosmer DW. A review of goodness-of-fit statistics for use in
the development of logistic regression models. Am J Epidemiol.
1982;115:92-106.

Hosmer DW, Lemeshow S, Klar J. Goodness-of-fit testing for multiple
logistic regression analysis when the estimated probabilities are small.
Biometrical J. 1988;30(7):1-14.

Gabriel
#
No problem; the present format is reasonably easily deciphered, especially if you have an older version's help file about.

An aside, though: I found it because I was trying to get numbers out of VarCorr for further use (specifically, I needed the SDs rather than the variances), and I made sure I was getting the right data by comparing with the summary output.

Which leads me to a suggestion. All the numbers (at least for my model) are packaged much more tidily in the formatted table of random effects in summary() for lmer objects. That table, though, is apparently in character format. Wouldn't it be more useful to create it as a numeric data frame and use the default data frame methods to display it, or to format it at print time? That way the summary object would hold the numbers, and we could get the random-effect variances or SDs easily from the summary object.

Steve E


#
On 13-07-05 09:20 AM, S Ellison wrote:
Maybe.  I'd say we (or I, at least) are still thinking about the best
way to return var-corr information in the most useful/general way
possible.  I had some ideas for this (see vcconv.R in the source code if
you're interested), but I don't think they're going to make it in for
release 1.0 ... the main thing about exporting the var-corr info (the
reason it's done as character()) is that for more complex structures
there are often empty correlation cells ...  Probably the quickest
solution to this would be to write an as.data.frame.VarCorr.merMod
(yuck!) method based on converting lme4::formatVC() output to numeric ...
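A base-R sketch of that conversion, with no lme4 dependency: the character matrix below only mimics the shape of formatVC() output (the numbers are illustrative, borrowed from the familiar sleepstudy example; nothing here is a real lme4 function):

```r
## A formatVC()-style character matrix: empty cells where there is no
## correlation (or no group label) to print.
vc_chars <- matrix(
  c("Subject",  "(Intercept)", "612.09", "24.74", "",
    "",         "Days",        "35.07",  "5.92",  "0.066",
    "Residual", "",            "654.94", "25.59", ""),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Groups", "Name", "Variance", "Std.Dev.", "Corr")))

vc_num  <- as.data.frame(vc_chars, stringsAsFactors = FALSE)
numcols <- c("Variance", "Std.Dev.", "Corr")
vc_num[numcols] <- lapply(vc_num[numcols],
                          function(x) suppressWarnings(as.numeric(x)))  # "" -> NA
```

I believe later lme4 releases did eventually gain an as.data.frame method for VarCorr() output, which solves the same problem upstream.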

  If you want to post this as an issue on github that would be fine ...