Folks,

I am not sure if it's a feature or a "bug". The same is observed in Splus.

Suppose I have Poisson counts, and I would like to estimate the parameter using glm. I would assume I can feed it the individual counts, or I can feed it the distinct counts with the frequencies as the weights, and I would get the same results. I do, but the deviance df are returned differently. Here is a short session.

y<-rpois(1000,5)
fr<-as.vector(table(y))
yy<-0:(length(fr)-1)
glm(y~1,poisson)
glm(yy~1,poisson,weight=fr)

I believe the first call to glm gives the correct df, but with real data, do I have to break up the tabulated data to get it right from R (or Splus), or do I just have to calculate the df manually? Can this be potentially misleading to practitioners? Or maybe my thinking was off?

I tried similar things with Bernoulli data and got similar results.

Chong Gu
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
deviance in glm
2 messages · Chong Gu, Brian Ripley
On Thu, 8 Mar 2001, Chong Gu wrote:
Folks,

I am not sure if it's a feature or a "bug". The same is observed in Splus.

Suppose I have Poisson counts, and I would like to estimate the parameter using glm. I would assume I can feed it the individual counts, or I can feed it the distinct counts with the frequencies as the weights, and I would get the same results. I do, but the deviance df are returned differently. Here is a short session.

y<-rpois(1000,5)
fr<-as.vector(table(y))
yy<-0:(length(fr)-1)
glm(y~1,poisson)
glm(yy~1,poisson,weight=fr)

I believe the first call to glm gives the correct df, but with real data, do I have to break up the tabulated data to get it right from R (or Splus), or do I just have to calculate the df manually? Can this be potentially misleading to practitioners? Or maybe my thinking was off?
The deviance is by comparison with a saturated model, and because the data are different, so is the saturated model. For this problem, the saturated model has one parameter per x observation, not one per y observation. So in the second case you are specifying that there are 14 (in my run) (x,y) pairs that occurred a number of times *and* this would always have occurred. Given that you grouped on y, that seems invalid except as a computational device.
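The point can be checked directly. The following is a minimal sketch, assuming simulated data as in the session above; the seed and the object names `tab`, `fit_ind` and `fit_grp` are illustrative additions. It takes the distinct values from the names of the table rather than from `0:(length(fr)-1)`, so that a count value missing from the sample cannot misalign values and frequencies.

```r
# Simulated counts; the seed is an assumption added for reproducibility.
set.seed(1)
y <- rpois(1000, 5)

# Tabulate the data. The distinct values come from the table names,
# so gaps in the observed counts are handled correctly.
tab <- table(y)
yy <- as.numeric(names(tab))
fr <- as.vector(tab)

fit_ind <- glm(y ~ 1, family = poisson)                 # one row per observation
fit_grp <- glm(yy ~ 1, family = poisson, weights = fr)  # one row per distinct count

# The estimate and the residual deviance agree between the two fits ...
all.equal(unname(coef(fit_ind)), unname(coef(fit_grp)))
all.equal(deviance(fit_ind), deviance(fit_grp))

# ... but the residual df are (number of rows) - (number of parameters),
# and the weighted fit has only one row per distinct count.
df.residual(fit_ind)   # length(y) - 1
df.residual(fit_grp)   # length(yy) - 1

# Residual df for the individual-level data, computed by hand
# from the grouped fit: total frequency minus number of coefficients.
sum(fr) - length(coef(fit_grp))
```

If the grouped form is used only as a computational device, the last line gives the df that the ungrouped call would have reported.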
I tried similar things with Bernoulli data and got similar results.
Grouping data can also affect the likelihood and the MLE in other problems. It's neither a feature nor a bug, but part of the definitions.
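The Bernoulli case reported above behaves the same way. A minimal sketch, assuming simulated 0/1 data (the sample size, success probability, seed and object names are illustrative and not from the original session):

```r
# Simulated binary responses; all settings here are assumptions.
set.seed(2)
b <- rbinom(200, 1, 0.3)

# Grouped form: the two distinct outcomes with their frequencies as weights.
bb <- c(0, 1)
w <- c(sum(b == 0), sum(b == 1))

fit_b_ind <- glm(b ~ 1, family = binomial)                # 200 rows
fit_b_grp <- glm(bb ~ 1, family = binomial, weights = w)  # 2 rows

# Same estimate and same residual deviance in both fits.
all.equal(unname(coef(fit_b_ind)), unname(coef(fit_b_grp)))
all.equal(deviance(fit_b_ind), deviance(fit_b_grp))

# The residual df again count rows: length(b) - 1 versus 2 - 1.
c(df.residual(fit_b_ind), df.residual(fit_b_grp))
```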
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595