Comparing Gaussian and beta regression
Thank you very much for the hints.

The "not very satisfactory" is from a "theoretical" point of view: I'm not very comfortable modeling, with a Gaussian, a value constrained between 0 and 10, with the extremes observed not so rarely. From a practical point of view, it does not seem to produce unexpected results. Of course, some effects are borderline significant, which also raises the question: how much of these effects is true signal, and how much comes from a basically inadequate model? Still, finding them with a sounder model would make them a little more "trustworthy"...

For the ordinal outcome: I chose my example value badly and it was misleading, sorry; the step is 0.1 and not 0.5; in practice, 46 different values were observed. Integer values and, to a lesser extent, half-integer values are clearly over-represented, I guess because of unconscious rounding during scoring. I don't know how to handle this in a model, but that's another problem, and maybe there is no need for that. As for the ordinal aspect, I fear that would put too many parameters in the model...

Just thinking... Would it be conceivable to make inferences with the beta-distribution model, since it seems to describe the data much better, but use the linear model on the raw scale just to get point estimates of the changes in an easier-to-interpret form? [Even though it is problematic close to the boundaries...]

Is the Gelman & Hill book you're thinking of this one?

Data Analysis Using Regression and Multilevel/Hierarchical Models
Cambridge University Press
ISBN-10: 052168689X

On Wed, Sep 19, 2018 at 09:49:51AM -0400, Ben Bolker wrote:
>
> On 2018-09-19 03:30 AM, Emmanuel Curis wrote:
> > Hello,
> >
> > I'm making my first attempt at beta regression, with a mixed-effects
> > model, and was wondering if my reasoning is correct...
> >
> > The context is a clinical study where the outcome is a score variable,
> > with continuous values between 0 and 10 (both included) and, in
> > practice, values with only one decimal digit (e.g. 1.5). There are
> > about 400 patients. The random effect is the clinician who does the
> > examination and afterwards collects the score that evaluates their
> > intervention.
> >
> > As a quick-and-dirty analysis, I fitted a linear mixed-effects model
> > to the raw data, with lmer. Residuals and random effects are not so
> > bad, and results are consistent & easy to interpret, but assuming a
> > Gaussian distribution is not very satisfactory.
>
> Can you expand on why "not very satisfactory"? Do you get unrealistic
> predictions etc.?
>
> This sounds like it could also be treated as an ordinal response (with
> 21 values {0, 0.5, 1, ... 9.5, 10}).
> >
> > Hence, I tried a beta regression on the data after the transformation
> > (y/10 * (n-1) + 0.5) / n, and used glmmTMB for that. And of course I
> > wondered if the fit was better.
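
For readers who want to see what the transformation above does to the boundary scores, here is a minimal sketch in Python (not the R code used in the thread). The function name `squeeze` and the value n = 400 are assumptions for illustration; the thread only says "about 400 patients".

```python
def squeeze(score, n, upper=10.0):
    """Map a score in [0, upper] into the open interval (0, 1).

    Implements (y/upper * (n - 1) + 0.5) / n, the transformation quoted
    above, so that exact 0s and 10s no longer sit on the boundary.
    """
    unit = score / upper                  # rescale to [0, 1]
    return (unit * (n - 1) + 0.5) / n     # pull both ends inside (0, 1)


n = 400  # assumed sample size, for illustration only
for score in (0.0, 1.5, 5.0, 10.0):
    print(score, squeeze(score, n))
```

The mapping is linear in the score, which is what makes the later log-likelihood comparison tractable.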
> >
> > 1) Is it right that the log-likelihoods of the model on the raw data
> > (Gaussian) and on the transformed data (beta) cannot be compared,
> > because they involve probability densities and not probabilities,
> > hence depend on the data scale?
>
> You can compare log-likelihoods (actually technically they're
> log-likelihood *densities*, which is where the problem comes from) if
> you account for the scaling. In this case since you're doing a linear
> transformation the scaling should be pretty easy.
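
A minimal sketch of the scaling correction, in Python rather than R and with made-up scores. For a linear transformation y' = a*y + b with a > 0, the densities satisfy f_Y(y) = a * f_Y'(y'), so a log-likelihood computed on the transformed scale moves to the raw scale by adding N * log(a). The simple i.i.d. Gaussian fit below is only a stand-in for the mixed models discussed in the thread.

```python
import math


def gaussian_loglik(values):
    """Maximised i.i.d. Gaussian log-likelihood (ML mean and variance)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


# Made-up scores on the 0-10 scale, for illustration only
scores = [0.0, 1.5, 3.2, 4.0, 5.5, 7.1, 8.0, 10.0]
n = len(scores)

# Slope and intercept of y' = (y/10 * (n - 1) + 0.5) / n
a = (n - 1) / (10 * n)
b = 0.5 / n
transformed = [a * y + b for y in scores]

ll_raw = gaussian_loglik(scores)
ll_transformed = gaussian_loglik(transformed)
jacobian = n * math.log(a)   # log-Jacobian summed over the N observations

# The two numbers printed here agree up to rounding error
print(ll_raw, ll_transformed + jacobian)
```

The same constant N * log(a) would apply to a beta log-likelihood computed on the transformed scale; equivalently, both models can be fitted on the transformed scale, where no correction is needed.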
> >
> > 2) Is it right that the lmer model fitted to the raw data and the same
> > one fitted to the transformed data are conceptually the same, since
> > the transformation is linear, so that the log-likelihood it gives
> > is "the same", expressed in the two different scales? (Of course,
> > coefficients and so on will be different because of the scale
> > change.)
>
> Should be. (You could do a simple test of this ...)
> >
> > 3) And so, is it correct to compare the log-likelihood (using logLik)
> > or the AIC given by glmmTMB with the beta model and by lmer on
> > transformed data to compare the two models (raw data Gaussian vs
> > beta)?
>
> I would think so.
> >
> > If so, the beta model seems better than the Gaussian one. But now
> > comes the interpretation problem, other than "are coefficients
> > significantly different from 0?".
> >
> > 4) Since the default link is the logit for the mean, interpretation is
> > not quite clear to me. For the Gaussian model on raw data,
> > interpretation is clear, for instance "men score 1 point lower
> > than women on average". But how can the coefficients of the
> > beta model be back-converted in a similar fashion?
>
> You probably need to go read stuff about interpretation of
> logit/log-odds parameters: Gelman and Hill's book is good.
>
> Quick rules of thumb:
>
> * for β·x small, as for log (proportional)
> * for intermediate values, linear change in probability with
> slope ≈ β/4
> * for large values, as for log(1 − x)
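
The three rules of thumb above can be checked numerically. This is a small sketch in Python; the coefficient value 0.2 and the baseline linear predictors (-5, 0, 5) are hypothetical, chosen only to sit in the low, middle and high parts of the logistic curve.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


beta = 0.2  # hypothetical coefficient on the logit scale

# Low baseline: the mean is multiplied by roughly exp(beta), as with a log link
ratio_low = inv_logit(-5.0 + beta) / inv_logit(-5.0)

# Middle of the range (mean near 0.5): the mean shifts by roughly beta / 4
diff_mid = inv_logit(0.0 + beta) - inv_logit(0.0)

# High baseline: the distance to the upper bound, 1 - mean,
# is multiplied by roughly exp(-beta)
ratio_high = (1 - inv_logit(5.0 + beta)) / (1 - inv_logit(5.0))

print(ratio_low, math.exp(beta))
print(diff_mid, beta / 4)
print(ratio_high, math.exp(-beta))
```

On the original 0-10 score scale, the mid-range shift corresponds to about 10 * beta / 4 points, ignoring the small squeeze applied before fitting.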
> >
> > Would it be easier to use a log link and express changes in the
> > scale as percent changes on the mean?
>
> This will work fine for low score values, but will run into trouble at
> the upper end of the score range.
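
A small Python sketch of why a log link causes trouble near the top of the range: the fitted mean is exp(linear predictor), which is not bounded above by 1. The baseline means (0.2 and 0.8 on the rescaled 0-1 scale) and the +30 % effect are hypothetical values chosen for illustration.

```python
import math


def mean_with_log_link(baseline_mean, log_effect):
    """Mean implied by a log link: exp(log(baseline) + effect)."""
    return math.exp(math.log(baseline_mean) + log_effect)


log_effect = math.log(1.3)   # hypothetical effect: +30 % on the mean

low = mean_with_log_link(0.2, log_effect)    # stays inside (0, 1)
high = mean_with_log_link(0.8, log_effect)   # exceeds 1: not a valid beta mean

print(low, high)
```

A percent-change reading is therefore only safe where the predicted means stay well below the upper bound.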
>
> >
> > Thanks in advance,
> >
>
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Web page: http://emmanuel.curis.online.fr/index.html