Sample size and mixed models

Sat, Dec 13, 2008 7:37 PM
Section 4.5.3 of Agresti's Categorical Data Analysis (pages 140-141 of the
second edition) discusses "grouped vs ungrouped" for the binomial case.
The same issue arises in Poisson models for count data.  All individuals
with the same "covariate pattern" can be collapsed to a single record, with
the sum of the counts as the response and the log of the number of
individuals as an offset, and the same fitted model is obtained.
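
As a rough illustration of the collapsing step, here is a minimal sketch in
Python with NumPy.  The data are made up, and fit_poisson is a hypothetical
helper (a bare Newton-Raphson fit of a log-link Poisson model), not a routine
from any package.  The grouped fit uses the log of the group size as the
offset.

```python
import numpy as np

def fit_poisson(X, y, offset, n_iter=50):
    """Newton-Raphson for a log-link Poisson model; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)        # fitted means
        score = X.T @ (y - mu)                # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])        # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return beta

# Made-up individual-level data: 4 individuals at each of x = 0, 1, 2.
x_ind = np.repeat([0.0, 1.0, 2.0], 4)
y_ind = np.array([0, 1, 2, 1,  2, 3, 1, 2,  4, 3, 5, 4], dtype=float)
X_ind = np.column_stack([np.ones_like(x_ind), x_ind])
beta_ungrouped = fit_poisson(X_ind, y_ind, offset=np.zeros_like(y_ind))

# Collapse to one record per covariate pattern: the response is the sum of
# the counts and the offset is the log of the number of individuals.
x_grp = np.unique(x_ind)
y_grp = np.array([y_ind[x_ind == v].sum() for v in x_grp])
n_grp = np.array([(x_ind == v).sum() for v in x_grp])
X_grp = np.column_stack([np.ones_like(x_grp), x_grp])
beta_grouped = fit_poisson(X_grp, y_grp, offset=np.log(n_grp))

print(beta_ungrouped, beta_grouped)
```

The two coefficient vectors agree up to rounding, because the score and
information sums are identical for the two data layouts.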

Agresti refers to two different versions of the "saturated" model, but I
like to reserve the term "saturated" for the model that fits the grouped
data perfectly and call the other the "perfect" model (since it predicts
all the individuals correctly).

Nagelkerke's R^2 will be larger when computed using the grouped data
likelihood, but that's because the "saturated" model is the definition
of perfection in that case.  This is analogous to defining the model
"y ~ factor(x)" as perfect when assessing "y ~ x" - you're throwing away
the "within groups" sum of squares and treating the "between groups" sum
of squares as the total.
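
The linear-model analogy can be checked numerically.  The sketch below (Python
with NumPy, made-up data, all variable names hypothetical) fits "y ~ x" by
least squares and compares the usual R^2 with the value obtained when the
"between groups" sum of squares from "y ~ factor(x)" is treated as the total.

```python
import numpy as np

# Made-up data: three replicates at each of x = 0, 1, 2.
x = np.repeat([0.0, 1.0, 2.0], 3)
y = np.array([1, 2, 3,  3, 4, 5,  7, 9, 11], dtype=float)

# Least-squares fit of y ~ x.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

# Fitted values of y ~ factor(x) are just the group means.
group_means = np.array([y[x == v].mean() for v in x])

ss_total = np.sum((y - y.mean()) ** 2)
ss_between = np.sum((group_means - y.mean()) ** 2)
ss_within = np.sum((y - group_means) ** 2)
ss_model = np.sum((fitted - y.mean()) ** 2)

r2_usual = ss_model / ss_total        # within-groups variation counted
r2_vs_factor = ss_model / ss_between  # within-groups variation thrown away

print(r2_usual, r2_vs_factor)
```

For these data the usual R^2 is about 0.82, while the version that treats
"y ~ factor(x)" as perfect is about 0.94.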

Strictly speaking, the choice of which version of n to use (the number of
individuals or the number of groups) should probably not be made
independently of this issue.  If the count of individuals is used with the
grouped data likelihood, it reduces the amount by which the R^2 value is
inflated, which is my (admittedly weak) reason for the blanket recommendation.
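
To make the effect of the two choices concrete, here is a hedged sketch in
Python with NumPy for a small made-up binomial data set.  It assumes the
usual form of Nagelkerke's R^2,
(1 - exp(-2(l1 - l0)/n)) / (1 - exp(2 l0/n)), where l0 and l1 are the null
and fitted log-likelihoods.  The helpers fit_logistic, bernoulli_loglik and
nagelkerke are hypothetical names written for this illustration only.

```python
import math
import numpy as np

def fit_logistic(X, successes, trials, n_iter=50):
    """Newton-Raphson for a binomial logit model; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (successes - trials * p)
        info = X.T @ (X * (trials * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta

def bernoulli_loglik(p, successes, trials):
    """Ungrouped (individual-level) log-likelihood, summed within groups."""
    return float(np.sum(successes * np.log(p)
                        + (trials - successes) * np.log(1.0 - p)))

def nagelkerke(l0, l1, n):
    """Nagelkerke's R^2 from null and fitted log-likelihoods and a chosen n."""
    cox_snell = 1.0 - math.exp(-2.0 * (l1 - l0) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * l0 / n)
    return cox_snell / max_cox_snell

# Made-up grouped data: 10 individuals at each of three covariate patterns.
x = np.array([0.0, 1.0, 2.0])
trials = np.array([10, 10, 10])
successes = np.array([2, 5, 7])
X = np.column_stack([np.ones_like(x), x])

beta = fit_logistic(X, successes, trials)
p_fit = 1.0 / (1.0 + np.exp(-(X @ beta)))
p_null = np.full_like(x, successes.sum() / trials.sum())

l0_ungrouped = bernoulli_loglik(p_null, successes, trials)
l1_ungrouped = bernoulli_loglik(p_fit, successes, trials)

# The grouped likelihood differs only by the binomial coefficients.
log_binom = sum(math.log(math.comb(int(n), int(s)))
                for n, s in zip(trials, successes))
l0_grouped = l0_ungrouped + log_binom
l1_grouped = l1_ungrouped + log_binom

n_individuals = int(trials.sum())
n_groups = len(trials)

r2_ungrouped = nagelkerke(l0_ungrouped, l1_ungrouped, n_individuals)
r2_grouped_n_groups = nagelkerke(l0_grouped, l1_grouped, n_groups)
r2_grouped_n_individuals = nagelkerke(l0_grouped, l1_grouped, n_individuals)

print(r2_ungrouped, r2_grouped_n_individuals, r2_grouped_n_groups)
```

In this example the ungrouped value is the smallest, the grouped likelihood
with n set to the number of groups gives the largest, and the grouped
likelihood with n set to the count of individuals falls in between.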

Regards,   Rob
Andrew Robinson wrote: