A three-level GLMM with binomial link in R
On 3/11/21 1:11 PM, Hedyeh Ahmadi wrote:
Hi Ben,
Thank you for all the help and informative replies. Here is some info
about my data and my answers:
* Yes, I am running the exact model however this is just my toy model
to get the code to work.
* Reminder, I am trying to get lmer with a continuous outcome (but
same data structure, with 3 random intercept) work before moving to
glmer() with a logit link with a categorical 0/1 outcome.
* Here are some information about my data:
o X1: num [1:22945] 9.25 NA 9.22 NA 9.22 NA 9.42 NA 9.25 NA ...
o X2: int [1:22945] 117 140 128 125 113 138 126 130 120 121 ...
o X3: chr [1:22945] "Female" "Male" "Male" "Male" "Male"
o site: chr [1:22945] "site15" "site15" "site15" "site15" "site15"
... (I have 21 sites)
o family: int [1:22945] 0 1 1 1 1 3 3 4 4 5 ...
o id: chr [1:22945] "id1" "id2" "id2" "id3" "id3"...
* The following model gives me the error that "Error: couldn't
evaluate grouping factor id:(family:site) within model frame: try
adding grouping factor to data frame explicitly if possible" - I ran
this same model on a supercomputer and it *gave me the same error*,
so this is not a memory limit issue.
So far I can't replicate this. It seemed fishy to me that family was an
integer (I would try explicitly converting it to a factor), but doesn't
actually cause any problems in my example.
(This differs from your example but not in ways I would expect to be
important: in particular, the types of the grouping variables are the
same [chr, int, chr]
n <- 10000
set.seed(101)
dd <- data.frame(y=rnorm(n),
x1=rnorm(n),
x2=rnorm(n),
x3=rnorm(n),
site=sample(paste0("s",1:15),size=n,replace=TRUE),
family=sample(1:100,size=n,replace=TRUE),
id=sample(paste0("id",1:20),size=n,replace=TRUE))
library(lme4)
lmer(y~1 + x1 + x2 + x3 + (1|site/family/id), data=dd)
###
What is meant by "adding grouping factor to data frame explicitly" would
be in this case expanding out the nested terms:
dd <- within(dd,
{
site_family <- interaction(site,family)
site_family_id <- interaction(site,family,id)
})
lmer(y~1 + x1 + x2 + x3 + (1|site) + (1|site_family) +
(1|site_family_id), data =dd)
o m2 <- lmer(cbcl_scr_syn_internal_r ~ 1 + X1 + X2 + X3 +
(1|site/family/id), data=dd, REML = FALSE)
* The following *model ran on a supercomputer*.? This is the model
that previously would give me the error "Error: cannot allocate
vector of size 2.3 Gb". Here I am*reducing the number of sites to
see if it runs, and it did* run on a supercomputer.
o m11 <- lmer(cbcl_scr_syn_internal_r ~ 1 + X1 +X2 + X3
+(1|site/family/id) ,
? ? data=dd[-which(dd$abcd_site %in% c("site01","site02",
"site04", "site05", "site06", "site07", "site08","site09")),],
na.action=na.exclude, REML=FALSE)
Questions:
1. What I don't understand is that in m2, why lmer() does not
recognizes the grouping factor that it does recognize in m11?
I'm not sure either.
2. At this point I don't think this is a memory issue as *m2 did not
run on a supercompute*r and it's giving me *the same error as my
personal computer*. However, in m11, the same grouping structure is
being recognized when I reduce the number of sites from 21 to 13 -
could this be because of severe imbalance in my family/id nesting,
meaning most families have only one kid but some have 2 and a few
have 3 kids?
It seems surprising. Is there any way we can get a reproducible example? For example, suppose you randomize your X1, X2, X3 and anonymize all of your grouping variables: would you then be able to share the data set? Does the problem (let's say just the first problem) still occur if you leave out the fixed effects? (I would expect so, and that would simplify things somewhat - we're trying to get to the *simplest* example that still demonstrates the problem.)
Sorry about the long email and thank you for your help in advance. Best, Hedyeh Ahmadi, Ph.D. Statistician Keck School of Medicine Department of Preventive Medicine University of Southern California Postdoctoral Scholar Institute for Interdisciplinary Salivary Bioscience Research (IISBR) University of California, Irvine LinkedIn www.linkedin.com/in/hedyeh-ahmadi <http://www.linkedin.com/in/hedyeh-ahmadi> <http://www.linkedin.com/in/hedyeh-ahmadi><http://www.linkedin.com/in/hedyeh-ahmadi> ------------------------------------------------------------------------ *From:* Ben Bolker <bbolker at gmail.com> *Sent:* Tuesday, March 9, 2021 4:12 PM *To:* Hedyeh Ahmadi <hedyehah at usc.edu>; r-sig-mixed-models at r-project.org <r-sig-mixed-models at r-project.org> *Subject:* Re: [R-sig-ME] A three-level GLMM with binomial link in R ?? OK, then it would be very helpful to have more details about your data set. Are one or more of X1, X2, X3 categorical predictors with many levels ... ?? Any chance we can see str() or summary() ?? Are you using exactly the formula you told us about, or something slightly different? ?? The computational burden of a large number of fixed effect parameters will be much worse for glmer than for lmer, unless you use nAGQ=0. glmmTMB would be able to help in two ways: (1) it doesn't scale as badly for large numbers of fixed effects; (2) it allows sparse fixed-effect model matrices (something we keep meaning to (re)implement in lme4) ... ? Making X1 into a 1000-level factor brings the required memory up by about a factor of 10 and increases the run time from seconds to minutes. Base model: ????????????????????? Function_Call Elapsed_Time_sec Total_RAM_Used_MiB 1 m<-lmer(form,data=dd,REML=FALSE)??????????? 1.903?????????????? 10.4 ?? Peak_RAM_Used_MiB 1???????????? 107.7 With X=1000-level factor (no derivative calculations): ? Function_Call 1 m2<-lmer(form,data=dd2,REML=FALSE,control=lmerControl(calc.derivs=FALSE),verbose=10) ?? Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB 1????????? 559.735????????????? 608.8???????????? 984.5 With derivative calculation at the end: ????????????????????????????????? Function_Call Elapsed_Time_sec 1 m3<-lmer(form,data=dd2,REML=FALSE,verbose=10)????????? 677.975 ?? Total_RAM_Used_MiB Peak_RAM_Used_MiB 1????????????? 608.7??????????? 1148.7 ?> ?> --- ibrary(lme4) set.seed(101) N <- 22945 ns <- 100 nf <- 100 nid <- 100 dd <- data.frame(X1=rnorm(N), ????????????????? X2=rnorm(N), ????????????????? X3=rnorm(N), ????????????????? site=sample(ns, replace=TRUE, size=N), ????????????????? family=sample(nf, replace=TRUE, size=N), ????????????????? id=sample(nid, replace=TRUE, size=N)) form <- Y ~ 1 + X1 + X2 + X3 + (1|site/family/id) dd$Y <- simulate(form[-2], newdata=dd, family=gaussian, newparams=list(beta=rep(1,4),theta=rep(1,3),sigma=1))[[1]] library(peakRAM) system.time(mem <- peakRAM( ???? m <- lmer(form, data=dd, REML=FALSE) ???? )) print(mem) dd2 <- transform(dd, ???????? X1=factor(sample(1000,size=N,replace=TRUE))) system.time(mem2 <- peakRAM( ???? m2 <- lmer(form, data=dd2, REML=FALSE, control=lmerControl(calc.derivs=FALSE), verbose=10) )) print(mem2) system.time(mem3 <- peakRAM( ???? m3 <- lmer(form, data=dd2, REML=FALSE, verbose=10) )) print(mem3) On 3/8/21 11:29 AM, Hedyeh Ahmadi wrote:
Thank you for the toy example. This same example in my PC gives the following memory use output: ? > mem ? ? ? ? ? ? ? ? ? ? ? ?Function_Call Elapsed_Time_sec ? ? ? ? Total_RAM_Used_MiB ?? Peak_RAM_Used_MiB ? ? m<-lmer(form,data=dd,REML=FALSE) ? ? ? ? ? ? 2.64 ? ? ? ? ? ? ? ? ? 10.3 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 130.1 When I run peakRAM() for my actual data, it take one hour plus then my PC slows down, then I have to stop R to be able to use my PC. Best, Hedyeh Ahmadi, Ph.D. Statistician Keck School of Medicine Department of Preventive Medicine University of Southern California Postdoctoral Scholar Institute for Interdisciplinary Salivary Bioscience Research (IISBR) University of California, Irvine LinkedIn https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!5O8KlsAUILt44__IHSzbCUgVs7dWxAB8FS9wiKC3j5P6hGrnuJSGckmCIR91AZs$
<https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!5O8KlsAUILt44__IHSzbCUgVs7dWxAB8FS9wiKC3j5P6hGrnuJSGckmCIR91AZs$> <https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!5O8KlsAUILt44__IHSzbCUgVs7dWxAB8FS9wiKC3j5P6hGrnuJSGckmCIR91AZs$
> <https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!5O8KlsAUILt44__IHSzbCUgVs7dWxAB8FS9wiKC3j5P6hGrnuJSGckmCIR91AZs$ ><https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!5O8KlsAUILt44__IHSzbCUgVs7dWxAB8FS9wiKC3j5P6hGrnuJSGckmCIR91AZs$ > ------------------------------------------------------------------------ *From:* R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> on behalf of Ben Bolker <bbolker at gmail.com> *Sent:* Friday, March 5, 2021 5:44 PM *To:* r-sig-mixed-models at r-project.org <r-sig-mixed-models at r-project.org> *Subject:* Re: [R-sig-ME] A three-level GLMM with binomial link in R ? ?? Here's an example that conforms approximately to the structure of your data: on my machine the peak RAM usage is 121 Mb, far short of your 2.3 Gb ...? so I still suspect there is something going on that we don't know about/haven't guessed yet. set.seed(101) N <- 22945 ns <- 100 nf <- 100 nid <- 100 dd <- data.frame(X1=rnorm(N), ? ????????????????? X2=rnorm(N), ? ????????????????? X3=rnorm(N), ? ????????????????? site=sample(ns, replace=TRUE, size=N), ? ????????????????? family=sample(nf, replace=TRUE, size=N), ? ????????????????? id=sample(nid, replace=TRUE, size=N)) form <- Y ~ 1 + X1 + X2 + X3 + (1|site/family/id) dd$Y <- simulate(form[-2], newdata=dd, family=gaussian, newparams=list(beta=rep(1,4),theta=rep(1,3),sigma=1))[[1]] library(peakRAM) mem <- peakRAM( ? ???? m <- lmer(form, data=dd, REML=FALSE) ) mem ? ????????????????????? Function_Call Elapsed_Time_sec Total_RAM_Used_MiB 1 m<-lmer(form,data=dd,REML=FALSE)??????????? 1.754?????????????? 10.3 ? ?? Peak_RAM_Used_MiB 1???????????? 120.7 On 3/5/21 4:52 PM, Robert Long wrote:
Perhaps because of the different ways they store objects internally while fitting the models. On Fri, 5 Mar 2021, 21:47 Hedyeh Ahmadi, <hedyehah at usc.edu> wrote:
Thank you for the reply Robert - If I am running out of memory in lmer(), do you know why lme() runs just fine? Best, Hedyeh Ahmadi, Ph.D. Statistician Keck School of Medicine Department of Preventive Medicine University of Southern California Postdoctoral Scholar Institute for Interdisciplinary Salivary Bioscience Research (IISBR) University of California, Irvine LinkedIn https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k70OXW4fU$
? >
? >
------------------------------ *From:* Robert Long <longrob604 at gmail.com> *Sent:* Friday, March 5, 2021 1:43 PM *To:* Hedyeh Ahmadi <hedyehah at usc.edu> *Cc:* Dexter Locke <dexter.locke at gmail.com>; R-mixed models mailing list < r-sig-mixed-models at r-project.org>; Megan M. Herting <herting at usc.edu>; Elisabeth Burnor <burnor at usc.edu> *Subject:* Re: [R-sig-ME] A three-level GLMM with binomial link in R You've run out of memory. Try running it on a machine with much larger RAM. On Fri, 5 Mar 2021, 21:18 Hedyeh Ahmadi, <hedyehah at usc.edu> wrote: Hi - Thank you for the informative replies. I will try those other packages. If MASS::glmmPQL uses lme() then this should work for me. My data structure is complex so it's hard to give reproducible example but for example for two of my models I get the following errors in lmer() while lme() runs smoothly. 1- Error: cannot allocate vector of size 2.3 Gb. 2- Error: couldn't evaluate grouping factor id:(family:site) within model frame: try adding grouping factor to data frame explicitly if possible. (note that the id variable works for simpler lmer() models so the variable itself is not an issue) My model looks like the following nested structure: lmer(Y ~ 1 + X1 + X2 + X3 + (1|site/family/id), data=dd, REML = FALSE) Best, Hedyeh Ahmadi, Ph.D. Statistician Keck School of Medicine Department of Preventive Medicine University of Southern California Postdoctoral Scholar Institute for Interdisciplinary Salivary Bioscience Research (IISBR) University of California, Irvine LinkedIn https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k70OXW4fU$
________________________________ From: Dexter Locke <dexter.locke at gmail.com> Sent: Friday, March 5, 2021 1:01 PM To: Hedyeh Ahmadi <hedyehah at usc.edu> Cc: r-sig-mixed-models at r-project.org <r-sig-mixed-models at r-project.org>; Megan M. Herting <herting at usc.edu>; Elisabeth Burnor <burnor at usc.edu> Subject: Re: [R-sig-ME] A three-level GLMM with binomial link in R Hi Hedyeh, What is the problem you are having? Specifically, what is the estimation issue with lmer and lme? Can you provide a reproducible example? Can you provide the complete errors you are seeing? The current questions are vague, so it will be challenging for list members to provide much guidance. -Dexter On Fri, Mar 5, 2021 at 3:27 PM Hedyeh Ahmadi <hedyehah at usc.edu<mailto: hedyehah at usc.edu>> wrote: Hi All, I was wondering what would be a powerful package in R to run GLMM with logit link that can handle a data set with N=22945 and 3 nested random intercepts. So far, I have tried glmer() from lme4 and it's giving me a lot of estimation issues. Any other package I should try? I am asking for another package as I am having the same issue with lmer() for similar LMM with continuous outcome, while lme() from nlme package runs the models with no problem. Thank you in advance. Best, Hedyeh Ahmadi, Ph.D. Statistician Keck School of Medicine Department of Preventive Medicine University of Southern California Postdoctoral Scholar Institute for Interdisciplinary Salivary Bioscience Research (IISBR) University of California, Irvine LinkedIn https://urldefense.com/v3/__http://www.linkedin.com/in/hedyeh-ahmadi__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k70OXW4fU$
????????? [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k7pcSE0ic$
????????? [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k7pcSE0ic$
??????? [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k7pcSE0ic$
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models__;!!LIr3w8kk_Xxm!7bcqr_mYJEwUBwSQ-hoyGUZPrR-Wz2RGDxVbFlFtbA0oveGsE7tg-2k7pcSE0ic$