
deviance in glm

2 messages · Chong Gu, Brian Ripley

#
Folks,

I am not sure if it's a feature or a "bug".  The same is observed in
Splus.

Suppose I have Poisson counts, and I would like to estimate the
parameter using glm.  I would assume I can feed it either the
individual counts or the distinct counts with their frequencies as
weights, and get the same results.  I do, but the deviance df are
returned differently.  Here is a short session.

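y <- rpois(1000, 5)
tab <- table(y)
fr <- as.vector(tab)            # frequency of each distinct count
yy <- as.numeric(names(tab))    # the distinct counts; safe if a value never occurs
glm(y ~ 1, poisson)                   # one row per observation
glm(yy ~ 1, poisson, weights = fr)    # one row per distinct count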

I believe the first call to glm gives the correct df, but with real
data, do I have to break up the tabulated data to get it right from R
(or Splus), or do I just have to calculate the df manually?  Could this
be misleading to practitioners?
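
For reference, a minimal sketch of the "break up the tabulated data" route. The table below is hypothetical (it is not from the session above), and the object names y_long, fit_long and fit_grouped are illustrative only. rep() rebuilds one row per original observation from the distinct counts and their frequencies.

```r
# Hypothetical tabulated data: distinct counts and how often each occurred.
yy <- c(0, 1, 2, 3, 4, 6)        # note the gap at 5
fr <- c(3, 10, 12, 8, 4, 1)

# Rebuild one row per original observation.
y_long <- rep(yy, times = fr)

fit_long    <- glm(y_long ~ 1, family = poisson)
fit_grouped <- glm(yy ~ 1, family = poisson, weights = fr)

# Same estimate from both fits ...
coef(fit_long)
coef(fit_grouped)

# ... but the residual df count rows, not observations.
df.residual(fit_long)      # sum(fr) - 1
df.residual(fit_grouped)   # length(yy) - 1
```

The manual alternative is to keep the weighted fit and take sum(fr) minus the number of fitted coefficients as the df.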

Or maybe my thinking was off?

I tried similar things with Bernoulli data and got similar results.
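
A sketch of what the Bernoulli version might look like, assuming 0/1 responses tabulated into two rows and passed with the frequencies as weights. The data and object names here are made up for illustration.

```r
# Hypothetical Bernoulli data: 70 zeros and 30 ones.
y <- rep(c(0, 1), times = c(70, 30))

tab <- table(y)
yy  <- as.numeric(names(tab))   # distinct outcomes: 0 and 1
fr  <- as.vector(tab)           # their frequencies

fit_ind <- glm(y ~ 1, family = binomial)
fit_grp <- glm(yy ~ 1, family = binomial, weights = fr)

coef(fit_ind)          # logit of the sample proportion
coef(fit_grp)          # same estimate
df.residual(fit_ind)   # length(y) - 1
df.residual(fit_grp)   # length(yy) - 1
```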

Chong Gu
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 8 Mar 2001, Chong Gu wrote:

            
The deviance is measured against a saturated model, and because the data
are different, so is the saturated model.  For this problem, the saturated
model has one parameter per x observation (that is, per row supplied to
glm), not one per y observation.  So in the second case you are specifying
that there are 14 (in my run) (x,y) pairs, each of which occurred a number
of times, *and* that this would always have occurred.  Given that you
grouped on y, that seems invalid except as a computational device.
Grouping data can also affect the likelihood and the MLE in other problems.
It's neither a feature nor a bug, but part of the definitions.
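
To make the saturated-model point concrete, here is a hedged sketch that repeats the Poisson comparison and computes the weighted deviance by hand. The seed, the object names and the helper variable dev_hand are assumptions added for illustration. In R's glm the residual df is the number of rows minus the number of fitted coefficients, so the two fits report different df even though the deviance value and the estimate agree.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
y   <- rpois(1000, 5)
tab <- table(y)
yy  <- as.numeric(names(tab))    # distinct counts
fr  <- as.vector(tab)            # frequencies

fit_ind <- glm(y ~ 1, family = poisson)
fit_grp <- glm(yy ~ 1, family = poisson, weights = fr)

# Residual df: rows supplied minus parameters fitted.
df.residual(fit_ind)             # length(y) - 1
df.residual(fit_grp)             # length(yy) - 1

# Poisson deviance against the saturated fit (mu_i = y_i), by hand,
# using the convention that y * log(y / mu) is 0 when y is 0.
mu       <- unname(exp(coef(fit_grp)))
term     <- ifelse(yy == 0, 0, yy * log(yy / mu))
dev_hand <- 2 * sum(fr * (term - (yy - mu)))

c(hand = dev_hand, grouped = deviance(fit_grp), individual = deviance(fit_ind))
```

If the grouping is only a computational device, the df from the ungrouped fit, sum(fr) - 1 here, is the one to use.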