Comparing Gaussian and beta regression
On 2018-09-19 03:30 AM, Emmanuel Curis wrote:
Hello, I'm making my first attempt at beta regression, with a mixed-effects model, and was wondering if my reasoning is correct... The context is a clinical study where the outcome is a score variable, with continuous values between 0 and 10 (both included) and, in practice, values with only one decimal digit (e.g. 1.5). There are about 400 patients. The random effect is the clinician who does the examination and afterwards collects the score that evaluates the intervention. As a quick-and-dirty analysis, I fitted a linear mixed-effects model on the raw data, with lmer. Residuals and random effects are not so bad, and the results are consistent and easy to interpret, but assuming a Gaussian distribution is not very satisfactory.
Can you expand on why "not very satisfactory"? Do you get unrealistic
predictions etc.?
This sounds like it could also be treated as an ordinal response (with
21 values {0, 0.5, 1, ... 9.5, 10}).
Hence, I tried a beta regression on the data after the transformation (y/10 * (n-1) + 0.5) / n, with n the number of observations, and used glmmTMB for that. And of course I wondered whether the fit was better. 1) Is it right that the ln-likelihood of the model on the raw data (Gaussian) and on the transformed data (beta) cannot be compared, because they involve probability densities and not probabilities, and hence depend on the data scale?
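For concreteness, here is a minimal Python sketch of the transformation quoted above. It is the compression often attributed to Smithson and Verkuilen, which keeps a beta response strictly inside (0, 1). The function names `squeeze`/`unsqueeze` and the value n = 400 are illustrative assumptions, not part of the original analysis.

```python
def squeeze(y, n, upper=10.0):
    """Map a score y in [0, upper] into the open interval (0, 1).

    First rescale to [0, 1], then shrink towards 0.5 so that the
    endpoints 0 and upper no longer map to exactly 0 or 1, where the
    beta density is not usable.
    """
    p = y / upper
    return (p * (n - 1) + 0.5) / n


def unsqueeze(z, n, upper=10.0):
    """Inverse of squeeze: map z back to the original 0-upper score scale."""
    return (z * n - 0.5) / (n - 1) * upper


n = 400  # assumed sample size, roughly that of the study described above
print(squeeze(0.0, n), squeeze(10.0, n))  # endpoints map to 0.00125 and 0.99875
```

Because the map is linear in y, it only rescales and shifts the data. This is why the likelihood comparison discussed next needs only a constant correction.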
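You can compare log-likelihoods (technically they're log-likelihood *densities*, which is where the problem comes from) if you account for the scaling. In this case, since you're doing a linear transformation, the scaling is easy: the two log-likelihoods differ by the constant N log|dz/dy|, where dz/dy = (n-1)/(10n) is the slope of the transformation and N is the number of observations.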
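As a sketch of that adjustment (assumptions: stdlib Python, a beta density in the mean/precision parameterization with shape1 = mu*phi and shape2 = (1 - mu)*phi, which is the usual one in beta regression; all function names are hypothetical), the beta log-likelihood computed on z can be moved to the raw 0-10 scale by adding the log-Jacobian. It then becomes comparable with a Gaussian log-likelihood computed on the raw scores.

```python
import math

def beta_logpdf(z, mu, phi):
    """Beta log-density, mean/precision parameterization (0 < z < 1):
    shape1 = mu * phi, shape2 = (1 - mu) * phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(z) + (b - 1.0) * math.log1p(-z))

def log_jacobian(n_obs, n, upper=10.0):
    """Sum over observations of log|dz/dy| for z = (y/upper*(n-1) + 0.5)/n.
    The transform is linear, so dz/dy is the constant (n-1)/(upper*n)."""
    return n_obs * math.log((n - 1) / (upper * n))

def beta_loglik_raw_scale(z_values, mu_values, phi, n):
    """Beta log-likelihood of the transformed data, re-expressed as a
    density on the original score scale (log L_y = log L_z + N log|dz/dy|)."""
    ll_z = sum(beta_logpdf(z, mu, phi) for z, mu in zip(z_values, mu_values))
    return ll_z + log_jacobian(len(z_values), n)
```

Since dz/dy < 1 here, the correction is negative: the same data look "more likely" on the compressed z scale simply because the density is taller there.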
2) Is it right that the lmer model fitted on the raw data and the same one fitted on the transformed data are conceptually the same, since the transformation is linear, so that the ln-likelihoods they give are "the same", expressed on two different scales? (Of course, coefficients and so on will be different because of the scale change.)
Should be. (You could do a simple test of this ...)
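One such simple test, sketched in stdlib Python on simulated scores (the data are made up for illustration, and an i.i.d. Gaussian stands in for the full mixed model): the maximized Gaussian log-likelihoods on y and on z = a*y + b differ by exactly N log(a). The sketch uses ML rather than REML. In lme4 that would mean `lmer(..., REML = FALSE)`, which also matters because glmmTMB reports an ML log-likelihood.

```python
import math
import random

def gaussian_ml_loglik(x):
    """Maximized Gaussian log-likelihood (ML, not REML) for an i.i.d. sample."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n  # ML variance estimate
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

random.seed(1)
# Simulated scores on the 0-10 scale, rounded to one decimal (illustrative only).
y = [round(min(10.0, max(0.0, random.gauss(6, 2))), 1) for _ in range(400)]
n = len(y)
slope = (n - 1) / (10 * n)                   # dz/dy of the transformation
z = [(yi / 10 * (n - 1) + 0.5) / n for yi in y]

diff = gaussian_ml_loglik(y) - gaussian_ml_loglik(z)
print(diff, n * math.log(slope))             # the two numbers agree
```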
3) And so, is it correct to compare the ln-likelihood (using logLik) or the AIC given by glmmTMB for the beta model and by lmer on the transformed data, in order to compare the two models (raw data Gaussian vs beta)?
I would think so.
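A short sketch of why that works (stdlib Python; the function names and the log-likelihood values in the usage line are hypothetical). When both models are fitted to the same transformed response z, any Jacobian term would be added to both log-likelihoods. It therefore cancels from the AIC difference, so no adjustment is needed. With the same fixed and random effects, the parameter counts also match, because the Gaussian residual SD and the beta precision each add one parameter.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params

def delta_aic(ll_gauss, k_gauss, ll_beta, k_beta):
    """AIC(Gaussian) - AIC(beta); positive values favour the beta model.
    Both log-likelihoods must refer to the same response scale."""
    return aic(ll_gauss, k_gauss) - aic(ll_beta, k_beta)

# Hypothetical log-likelihoods, for illustration only.
print(delta_aic(-100.0, 4, -90.0, 4))
```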
If so, the beta model seems better than the Gaussian one. But now comes the problem of interpretation, beyond "are the coefficients significantly different from 0?". 4) Since the default link for the mean is the logit, the interpretation is not quite clear to me. For the Gaussian model on the raw data, the interpretation is clear, for instance "men score 1 point lower than women, on average". But how can the coefficients of the beta model be back-converted in a similar fashion?
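You probably need to go read up on the interpretation of logit (log-odds) parameters: Gelman and Hill's book is good. Quick rules of thumb, for a mean proportion x:
* for small x, the logit behaves like log(x), so effects are approximately proportional;
* for intermediate values, the change in the mean is approximately linear, with slope ≈ β/4;
* for large x (close to 1), the logit behaves like −log(1 − x), so effects are approximately proportional on 1 − x.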
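To make the back-conversion concrete, here is a stdlib Python sketch. The coefficient -0.4 is a hypothetical value, and the small squeeze factor n/(n-1) is ignored. It computes the exact change in mean score implied by a logit-scale coefficient at several baseline means and compares it with the β/4 rule. The slope of the inverse logit is mu(1 - mu), which is at most 1/4. Note that in a mixed model, the inverse logit of the linear predictor gives the mean for a typical clinician (random effect = 0), not the population-averaged mean.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

def score_change(mu0, beta, upper=10.0):
    """Exact change in the mean score (0-upper scale) when the logit of the
    mean moves by beta from a baseline mean proportion mu0."""
    return upper * (inv_logit(logit(mu0) + beta) - mu0)

def divide_by_four(beta, upper=10.0):
    """Rule-of-thumb change, accurate near mu0 = 0.5 where mu(1 - mu) = 1/4."""
    return upper * beta / 4.0

beta_men = -0.4  # hypothetical "men vs women" coefficient on the logit scale
for mu0 in (0.1, 0.5, 0.9):
    print(mu0, round(score_change(mu0, beta_men), 2), divide_by_four(beta_men))
```

Near a mean score of 5, a coefficient of about -0.4 plays the role of "1 point lower". Near the ends of the scale, the same coefficient corresponds to a smaller shift in points.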
Would it be easier to use a log link and express changes on the score scale as percent changes in the mean?
This will work fine for low score values, but will run into trouble at the upper end of the score range.
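The trouble at the upper end can be seen with a few lines of stdlib Python (the baseline and coefficient below are hypothetical). With a log link, mu = exp(eta), so exp(β) is a multiplicative (percent) effect. However, nothing keeps mu below 1, so near the top of the score range a modest percent increase pushes the mean outside the support of the beta distribution.

```python
import math

def mean_log_link(eta):
    """Mean implied by a log link: mu = exp(eta)."""
    return math.exp(eta)

def percent_change(beta):
    """Percent change in the mean for a one-unit change in a predictor."""
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical values, for illustration only.
baseline = 0.9   # mean proportion, i.e. a mean score of 9 on the 0-10 scale
beta = 0.15      # log-scale coefficient: about +16% on the mean
mu_new = mean_log_link(math.log(baseline) + beta)
print(round(percent_change(beta), 1), round(mu_new, 3))  # prints 16.2 1.046
```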
Thanks in advance,