Hi all,

I'm modeling fMRI imaging data using lme4. There are 4 time points and roughly 550 subjects, each with 27,730 regions of interest (ROIs); these are the variables. Since I have access to a supercomputer, my plan was to create a long dataset with repeated measures of ROI per time point, nested within subjects over the 4 time points. Stacking the ROIs this way yields roughly 70 million observations, which is why I'm using the supercomputer. Timepoint is discrete (a factor) and timepoint.nu is the same time point coded numerically. I'm fitting the model below:

    lmer(connectivity ~ roi * timepoint +
           (timepoint.nu | subjectID) +
           (timepoint.nu | subjectID:roi),
         na.action = 'na.exclude',
         control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE),
         REML = FALSE,
         data = data)

I received back the following error:

    cannot allocate vector of size 30206.2 Gb
    Execution halted

So I'm wondering how I can estimate only the essential parameters I need (group means rather than individual-level effects) while modeling, such that the supercomputer can finish the job without exhausting memory. I say group means because I will eventually be adding covariates.

Also, the supercomputer rules require that a job finish within two days. I'm not sure this one would, so I'm also wondering whether there is any way to parallelize lme4 so that it can make use of multiple cores and nodes.

I've included a slice of the data here: https://drive.google.com/file/d/1mhTj6qZZ2nT35fXUuYG_ThQ-QtWbb-8L/view?usp=sharing

Thanks much,

James
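[Editor's sketch: one way the long dataset described above might be built, assuming a hypothetical wide layout with one row per subject x time point and one column per ROI. All column names here are illustrative, not taken from the post.]

```r
library(data.table)

# Hypothetical wide data: one row per subject x timepoint, with ROI columns
# roi_1 ... roi_27730. Column names are illustrative assumptions.
long <- melt(
  as.data.table(wide),
  id.vars       = c("subjectID", "timepoint"),
  variable.name = "roi",
  value.name    = "connectivity"
)
# Numeric copy of the time-point factor (level index 1..4) for random slopes
long[, timepoint.nu := as.numeric(timepoint)]
```

data.table::melt is used here because base reshape() copies heavily and becomes slow at ~70 million rows.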
Pulling specific parameters from models to prevent exhausting memory.
3 messages · Voeten, C.C., Ades, James
Hi James,

You may have luck using mgcv::bam instead of lme4. It can also fit random-slopes models and is optimized for "big data", in terms of memory usage and computational efficiency. The modeling syntax is slightly different, though; the correct translation of lme4 random effects into mgcv's s(...,bs='re') terms depends on whether timepoint.nu is a covariate or a factor.

HTH,
Cesko
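[Editor's sketch: one possible translation of the lmer() call into mgcv::bam, assuming timepoint.nu is a numeric covariate and subjectID and roi are factors. As the reply notes, the right form depends on the data, so treat this as an illustration rather than the definitive model.]

```r
library(mgcv)

# Hedged sketch, not from the original thread. Caveat: s(..., bs = "re")
# terms fit *uncorrelated* random effects, whereas lmer()'s
# (timepoint.nu | g) terms also estimate an intercept-slope correlation.
fit <- bam(
  connectivity ~ roi * timepoint +
    s(subjectID, bs = "re") +                    # per-subject random intercept
    s(subjectID, timepoint.nu, bs = "re") +      # per-subject random slope
    s(subjectID, roi, bs = "re") +               # subject:roi random intercept
    s(subjectID, roi, timepoint.nu, bs = "re"),  # subject:roi random slope
  data     = data,
  discrete = TRUE,  # discretize covariates: large memory/speed savings
  nthreads = 4      # bam() can use multiple cores, unlike lmer()
)
```

discrete = TRUE uses bam's covariate-discretization methods (with the default fREML criterion), which is the main lever for fitting models of this size in bounded memory.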
-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> On Behalf Of Ades, James
Sent: Sunday, October 18, 2020 2:01 AM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Pulling specific parameters from models to prevent exhausting memory.

[quoted message above]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks, Cesko. I'll look into bam.

James
From: Voeten, C.C. <c.c.voeten at hum.leidenuniv.nl>
Sent: Sunday, October 18, 2020 1:16 AM
To: Ades, James <jades at health.ucsd.edu>; r-sig-mixed-models at r-project.org <r-sig-mixed-models at r-project.org>
Subject: RE: Pulling specific parameters from models to prevent exhausting memory.