more on nbinom1 vs 2
Very short answer: all your insights here look correct, and well expressed. I think the problem with your earlier aggregation question (I vaguely remember it) is a fairly common one with well-posed but moderately interesting/difficult questions: questions that take more than a few minutes to come up with an adequate answer/understanding, and that don't happen to be in someone's wheelhouse -- so that they've neither thought about it before and have an answer ready, *nor* is it worth it to them to take some time to work on it -- often get neglected and gradually sink into the pile. This is one of the advantages of forums like StackOverflow or CrossValidated, which (1) are much easier to search for old questions and (2) allow people to offer 'brownie points' for solutions to interesting questions. (I think a sufficient interval has gone by that it would be reasonable to cross-post it to CrossValidated ...)
On 10/22/20 9:40 PM, Don Cohen wrote:
I'm still hoping to see some reaction to my message of 10-16 on aggregation of count data. In the meanwhile, here's an attempt to explain something related. I'm again hoping for feedback: is this all correct, and am I missing something important?

I now think I (finally) understand that nbinom1 is really the SAME distribution as nbinom2. How well a set of values fits a single NB distribution has nothing to do with whether that distribution is described by the parameters of nbinom1 or those of nbinom2. It is a *set* of different NB distributions that can fit one parameterization better than the other, and most models actually do predict a set of distributions rather than just one. In particular, if there are covariates, then a different distribution is predicted for each value of the covariates. If there are no covariates, then there should be no difference between nbinom1 and nbinom2, apart from different overdispersion parameters describing the same variance. (This variance is presumably observed in the spread of the result values.)

Getting rid of covariates: if

  glmmTMB(result ~ 1, family = nbinom1, data = D)

reports "Overdispersion parameter for nbinom1 family (): x" with (Intercept) y, while

  glmmTMB(result ~ 1, family = nbinom2, data = D)

reports "Overdispersion parameter for nbinom2 family (): z" with (Intercept) w, then y had better be the same as w, since the mean would be exp(y) in the first case and exp(w) in the second. Similarly the variance would be

  mean * (1 + param)       = exp(y) * (1 + x)          in the first case, and
  mean * (1 + mean/param)  = exp(w) * (1 + exp(w)/z)   in the second,

which again should be the same value. This was indeed what I found when I tried it. It remained true when I added an offset: result ~ offset(log(exposure)).

However, when I added a random effect, result ~ (1|group), I was surprised to get different results for nbinom1 and nbinom2, i.e. different AIC and a different intercept. I also noticed a difference in the variance of the random effect. I now think I understand why.
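The intercept-only equivalence described above can be checked with a short sketch. This is my own illustration, not code from the thread; the data are simulated and all names (D, m1, m2) are made up:

```r
## Simulate intercept-only count data and check that nbinom1 and nbinom2
## agree on the fitted mean and implied variance (glmmTMB's sigma() returns
## the family-specific overdispersion parameter).
library(glmmTMB)
set.seed(1)
D <- data.frame(result = rnbinom(1000, mu = 5, size = 2))

m1 <- glmmTMB(result ~ 1, family = nbinom1, data = D)
m2 <- glmmTMB(result ~ 1, family = nbinom2, data = D)

x <- sigma(m1); y <- fixef(m1)$cond[["(Intercept)"]]  # nbinom1: x, y above
z <- sigma(m2); w <- fixef(m2)$cond[["(Intercept)"]]  # nbinom2: z, w above

c(exp(y), exp(w))                      # means: should agree
exp(y) * (1 + x)                       # nbinom1 variance: mean * (1 + param)
exp(w) * (1 + exp(w) / z)              # nbinom2 variance: mean * (1 + mean/param)
```

The two variance expressions should match to within numerical tolerance, exactly as described above.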
The random effect allows different means and variances for different groups, and this (unlike any of the previous examples) can fit nbinom1 better or worse than nbinom2, depending on whether the relation between the group means and group variances is closer to linear or quadratic. Perhaps I should stop here and wait for replies before moving on to how this relates to the aggregation issue in the earlier message.
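One way to see the mean-variance point is a hypothetical simulation (again my own sketch, not from the thread): generate groups whose within-group variance is *linear* in the group mean, and compare the two fits. For the NB2 density with size parameter theta, the variance is mu + mu^2/theta, so choosing size = mu/phi per group yields variance mu*(1 + phi), i.e. an NB1-type mean-variance relationship:

```r
## Groups with NB1-type (linear) mean-variance relation; compare nbinom1 vs
## nbinom2 fits with a random intercept. All names and values are illustrative.
library(glmmTMB)
set.seed(2)
ngroup <- 50; n <- 20
mu_g <- exp(rnorm(ngroup, log(5), 1))          # group-level means
phi  <- 2                                      # NB1 overdispersion, so var = mu*(1+phi)
D <- data.frame(
  group  = rep(factor(seq_len(ngroup)), each = n),
  result = rnbinom(ngroup * n,
                   mu   = rep(mu_g, each = n),
                   size = rep(mu_g, each = n) / phi)  # size = mu/phi => variance linear in mu
)

m1 <- glmmTMB(result ~ (1 | group), family = nbinom1, data = D)
m2 <- glmmTMB(result ~ (1 | group), family = nbinom2, data = D)
AIC(m1, m2)   # with variance linear in the mean, nbinom1 would be expected to win
```

Swapping in a fixed size per group (quadratic mean-variance relation) should tip the comparison the other way, toward nbinom2.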
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models