Distribution of deviance residuals
2 messages · Roelof Coster, Ben Bolker
1 day later
Roelof Coster <roelofcoster at ...> writes:
Hello,

What is the theoretical distribution of the residual deviances of a well-fitting logistic regression mixed model?

The background of my question is as follows. I am looking for a way to combine the ideas of regression tree modelling (a.k.a. model trees) with mixed models. I have a data set to which I want to fit a logistic regression model. My data come in groups, so I need a random effect to account for those groups. The mob function in the party package does what I need, but only for fixed-effects models.

As I understand it, that function arranges the data according to the levels of a given categorical predictor and then looks at the sequence obtained by accumulating the deviance residuals. A hypothesis test is then used to assess whether this sequence can plausibly be a Brownian motion. If it cannot, that is an indication that the data set should be split in two and that two separate models should be fitted. This process is repeated recursively, producing a binary tree with a logistic regression model, fitted to a part of the data, in each leaf of the tree.

I would be grateful for any advice on how this technique could be made to work for my problem.
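[For reference, the fixed-effects case described above can be run with party's mob. This is only a sketch on simulated data; the variable names y, x, and g are made up for illustration, not taken from the poster's data set.]

```r
library(party)  # provides mob(); glinearModel comes from modeltools, loaded by party

## simulated data: binary response y, node-level regressor x,
## and a grouping factor g used as the partitioning variable
set.seed(1)
d <- data.frame(
  y = rbinom(200, 1, 0.5),
  x = rnorm(200),
  g = factor(sample(letters[1:4], 200, replace = TRUE))
)

## logistic regression (y ~ x) in each node, partitioning over g;
## mob tests parameter instability along g and splits where it finds it
fit <- mob(y ~ x | g, data = d,
           model = glinearModel, family = binomial())
print(fit)
```

[In the newer partykit package, glmtree(y ~ x | g, data = d, family = binomial) is the corresponding call, but neither handles random effects, which is the point of the question.]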
Do you mean the deviance residuals (i.e. the per-observation contributions to the deviance, or the signed square root of the contribution) or the total residual deviance (i.e., the sum of squared deviance residuals)?

If the former, I think you're probably in trouble -- empirically, the distribution of residuals is usually pretty ugly, and theoretically I can't think of a reason it should be nice. If the latter, then you can presumably just rely on asymptotic theory, which would say (I'm being sloppy here of course) that the sum of squares of lots of iid things should be chi-square distributed (and then eventually Normal).

For what it's worth, a great deal of the theory of GLMMs is inherited from GLM theory, so if you can solve your problem or find a solution for a plain old *non*-mixed logistic regression, it is likely to work reasonably well for a mixed logistic regression as well (provided you have a reasonable number of levels of the random effect/your estimate isn't singular). (Conversely, if it's known to be nasty for ordinary logistic regression, you're probably screwed.)

As always I'm happy to be corrected by more sensible/knowledgeable people.
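[The distinction drawn above can be checked empirically for a plain non-mixed logistic regression. A sketch on simulated data, showing that the squared per-observation deviance residuals sum to the residual deviance, and that the per-observation residuals for binary data are far from normal:]

```r
## simulated logistic regression with a known true model
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial())

## per-observation deviance residuals
dr <- residuals(fit, type = "deviance")

## their sum of squares is exactly the total residual deviance
all.equal(sum(dr^2), deviance(fit))

## for binary (0/1) data the residuals fall into two bands,
## one per outcome -- nothing like a normal distribution
hist(dr, breaks = 40)
```

[One caveat on the asymptotic argument: for ungrouped binary responses the chi-square approximation to the residual deviance itself is known to be poor; it behaves much better for grouped/binomial counts with several observations per covariate pattern.]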