Binomial glmer(): appropriateness of link and influential points

4 messages · Ben Bolker, Hedyeh Ahmadi, John Maindonald


Ben Bolker

Thu, Apr 22, 2021 6:21 PM #

On 4/22/21 11:45 AM, Hedyeh Ahmadi wrote:

Try the DHARMa package, which uses simulated quantile residuals to 
overcome this problem.
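
For readers following along, a minimal sketch of the DHARMa workflow. The data are simulated here purely for illustration (the names `dat`, `id`, `x`, `y` and all parameter values are assumptions, not the poster's data), with two binary observations per subject and a single random intercept as described in the thread:

```r
library(lme4)
library(DHARMa)

# Hypothetical data: 300 subjects, 2 binary observations each
set.seed(1)
n_subj <- 300
id_int <- rep(seq_len(n_subj), each = 2)
dat <- data.frame(id = factor(id_int), x = rnorm(2 * n_subj))
b <- rnorm(n_subj, sd = 1)                       # subject-level random intercepts
dat$y <- rbinom(nrow(dat), size = 1,
                prob = plogis(-0.5 + 0.8 * dat$x + b[id_int]))

fit <- glmer(y ~ x + (1 | id), data = dat, family = binomial)

# Simulate from the fitted model and compute scaled (quantile) residuals
sim <- simulateResiduals(fittedModel = fit, n = 250)

plot(sim)                         # QQ plot plus residuals vs. predicted values
plotResiduals(sim, form = dat$x)  # residuals against a single predictor
```

If the model is adequate, the scaled residuals should look roughly uniform on [0, 1], with no trend against the predicted values or against individual predictors.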

I would think so (to be honest, most of the advice about model 
diagnostics is based on "this works for linear models and should work, 
at least asymptotically, for GLM(M)s as well").

Best,

Hedyeh Ahmadi, Ph.D.
Statistician
Keck School of Medicine
Department of Preventive Medicine
University of Southern California

Postdoctoral Scholar
Institute for Interdisciplinary Salivary Bioscience Research (IISBR)
University of California, Irvine

LinkedIn
www.linkedin.com/in/hedyeh-ahmadi




------------------------------------------------------------------------
*From:* R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> on 
behalf of Ben Bolker <bbolker at gmail.com>
*Sent:* Thursday, April 22, 2021 8:21 AM
*To:* r-sig-mixed-models at r-project.org <r-sig-mixed-models at r-project.org>
*Subject:* Re: [R-sig-ME] Binomial glmer(): appropriateness of link and 
influential points


On 4/22/21 11:12 AM, Hedyeh Ahmadi wrote:

Hello all,
I have two questions regarding a GLMM with a binomial family and logit link. Here is some information about my model/data before I ask my questions:

  * My outcome is 0/1.
  * I have continuous and categorical predictors.
  * My data has 19000 rows with 2 observations per subject.
  * My model only has one random intercept for each subject.
  * I am using the glmer() command in R.

My questions are as follows, and any sample R code would be appreciated:

  1. What's the best way to evaluate the appropriateness of my link function?
  2. What's the best way to find influential points? Can I still use Cook's distance?
     * If yes, with what package?
     * What would be the rule of thumb for Cook's distance for glmer() with a binomial link?

   An inappropriate link function will lead to nonlinearity of the
response on the linear-predictor scale, so the first thing to check is
the fitted vs. residual plot (with a smoothed line added so you can see
the trends): either

plot(fitted_model, type=c("p", "smooth"))

(maybe include pch="." since your data set is big)

or the analog via ggplot+broom.mixed: use broom.mixed::augment() to get
a data frame including .fitted and .resid, then plot it with
geom_point() and geom_smooth().
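
A minimal sketch of the ggplot + broom.mixed route just described, again on simulated data (the data frame `dat`, its columns, and the parameter values are hypothetical). Which residual type and fitted-value scale `augment()` returns is best checked in the broom.mixed documentation for your installed version:

```r
library(lme4)
library(broom.mixed)
library(ggplot2)

# Hypothetical data: 300 subjects, 2 binary observations each
set.seed(2)
n_subj <- 300
id_int <- rep(seq_len(n_subj), each = 2)
dat <- data.frame(id = factor(id_int), x = rnorm(2 * n_subj))
b <- rnorm(n_subj, sd = 1)
dat$y <- rbinom(nrow(dat), size = 1,
                prob = plogis(-0.5 + 0.8 * dat$x + b[id_int]))

fit <- glmer(y ~ x + (1 | id), data = dat, family = binomial)

# augment() returns the data plus per-observation columns such as
# .fitted and .resid
aug <- augment(fit)

p <- ggplot(aug, aes(x = .fitted, y = .resid)) +
  geom_point(shape = ".") +   # small points, useful for large data sets
  geom_smooth()               # smoothed trend; ideally flat around zero
print(p)
```

With a 0/1 outcome the raw points fall into two bands, so the smoothed line is the part worth reading.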

   There are "goodness-of-link" tests that might be generalizable to
GLMMs, but I'm not too familiar with them.

   2. There is an influence.merMod method for GLMM fits (it may be slow
for large data sets! You may want to set ncores>1). The 'car' package
has some additional functionality for plotting etc.
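
A small sketch of the influence approach, deleting one subject at a time. The data are simulated and deliberately small (80 hypothetical subjects) because the model is refitted once per deleted group; on 19000 rows this would take considerably longer, which is where `ncores` comes in:

```r
library(lme4)

# Hypothetical data: 80 subjects, 2 binary observations each
set.seed(3)
n_subj <- 80
id_int <- rep(seq_len(n_subj), each = 2)
dat <- data.frame(id = factor(id_int), x = rnorm(2 * n_subj))
b <- rnorm(n_subj, sd = 1.5)
dat$y <- rbinom(nrow(dat), size = 1,
                prob = plogis(-0.5 + 0.8 * dat$x + b[id_int]))

fit <- glmer(y ~ x + (1 | id), data = dat, family = binomial)

# Refit the model with each subject removed in turn.
# Omitting `groups` deletes single rows instead; `ncores` can be raised
# to run the refits in parallel.
inf <- influence(fit, groups = "id")

cd <- cooks.distance(inf)            # one Cook's distance per subject
head(sort(cd, decreasing = TRUE))    # subjects with the largest influence
head(dfbeta(inf))                    # change in each fixed effect per deletion
```

Since no rule of thumb is offered in the thread, one pragmatic option is to look for subjects whose Cook's distance stands well apart from the rest and refit without them to see whether the conclusions change.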

   I'm not sure about rules of thumb.

   If you are going to fit a mixed model with two binary observations
per cluster, you will be far from the range where PQL/Laplace/etc. are
going to be applicable; consider using nAGQ>1 to fit with Gauss-Hermite
quadrature.
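
A minimal sketch comparing the default Laplace fit (nAGQ = 1) with adaptive Gauss-Hermite quadrature. The data and the choice of 10 quadrature points are illustrative assumptions; nAGQ > 1 is only available for models with a single scalar random effect, which matches the random-intercept model described above:

```r
library(lme4)

# Hypothetical data: 300 subjects, 2 binary observations each
set.seed(4)
n_subj <- 300
id_int <- rep(seq_len(n_subj), each = 2)
dat <- data.frame(id = factor(id_int), x = rnorm(2 * n_subj))
b <- rnorm(n_subj, sd = 1)
dat$y <- rbinom(nrow(dat), size = 1,
                prob = plogis(-0.5 + 0.8 * dat$x + b[id_int]))

# Default: Laplace approximation (nAGQ = 1)
fit_laplace <- glmer(y ~ x + (1 | id), data = dat, family = binomial)

# Adaptive Gauss-Hermite quadrature with 10 points
fit_agq <- glmer(y ~ x + (1 | id), data = dat, family = binomial, nAGQ = 10)

# Compare fixed effects and the estimated random-intercept SD
cbind(Laplace = fixef(fit_laplace), AGQ10 = fixef(fit_agq))
c(Laplace = unname(attr(VarCorr(fit_laplace)$id, "stddev")),
  AGQ10   = unname(attr(VarCorr(fit_agq)$id, "stddev")))
```

If the two sets of estimates differ noticeably, that is a sign the Laplace approximation is not adequate for the data at hand.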

Thank you in advance for your time.

Best,

Hedyeh Ahmadi, Ph.D.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


Thu, Apr 22, 2021 7:02 PM #

Thank you for the DHARMa suggestion. I tried it, but I am not sure how to interpret the plot from simulateResiduals(). I am getting the attached plot, and I think it is pretty linear, so is this a pass?

Best,

Hedyeh Ahmadi, Ph.D.

John Maindonald

Thu, Apr 22, 2021 7:28 PM #

My comments, which were a bit off the cuff without looking at your queries
with all the care that was desirable, were designed to highlight issues with
binomial models.  Also, for checking purposes you want to plot partial
residuals against explanatory variables in turn.  As Ben suggests, plots
using DHARMa can be a good way to go.

Alternatives to fitting a mixed model are, in your case, a model with quasibinomial
error, or a betabinomial. A betabinomial using glmmTMB allows you to model the
scale parameter.  Those sorts of abilities (and plots of simulated quantile
residuals) are also available in the gamlss package.  Which model is more appropriate will
depend on how the within-subject component of variance (for the mixed model),
or the scale parameter, varies (if at all) with the fitted value.

It is worth checking these alternatives.
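
A rough sketch of what these alternatives could look like, assuming the two binary observations per subject are collapsed to successes out of 2 and that the covariate `x` is measured at the subject level (both are assumptions made for illustration; the data are simulated). The glmmTMB part only runs if that package is installed:

```r
# Hypothetical subject-level data: successes out of 2 trials per subject
set.seed(5)
n_subj <- 300
subj <- data.frame(id = seq_len(n_subj), x = rnorm(n_subj))
p <- plogis(-0.5 + 0.8 * subj$x + rnorm(n_subj, sd = 1))
subj$succ <- rbinom(n_subj, size = 2, prob = p)
subj$fail <- 2 - subj$succ

# Quasibinomial: binomial mean structure with an estimated dispersion
fit_qb <- glm(cbind(succ, fail) ~ x, family = quasibinomial, data = subj)
summary(fit_qb)$dispersion   # values above 1 suggest overdispersion

# Betabinomial via glmmTMB; dispformula lets the scale parameter depend on x
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  fit_bb <- glmmTMB::glmmTMB(cbind(succ, fail) ~ x,
                             dispformula = ~ x,
                             family = glmmTMB::betabinomial(link = "logit"),
                             data = subj)
  print(summary(fit_bb))
}
```

With observation-level covariates that differ within a subject, this aggregation is not possible as written, and the mixed model remains the more natural starting point.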

John Maindonald             email: john.maindonald at anu.edu.au

On 23/04/2021, at 14:02, Hedyeh Ahmadi <hedyehah at usc.edu> wrote:

Thank you for the DHARMa suggestion. I tried it, but I am not sure how to interpret the plot from simulateResiduals(). I am getting the attached plot, and I think it is pretty linear, so is this a pass?


3 days later

Mon, Apr 26, 2021 9:05 AM #

Hi All,
Thank you for all your help on this. I finally found some good plots along with interpretation help, and I thought I would share the link here in case anyone is interested:

https://github.com/florianhartig/DHARMa/issues/278



Best,

Hedyeh Ahmadi, Ph.D.