Hello,
I'm trying to fit a GLMM that accounts for spatial autocorrelation (SAC)
using the spaMM::fitme() function in R. I have a longitudinal data set
where observations were collected repeatedly from a number of sites over 13
years. I'm interested in understanding what the effect of time (year) is on
the dependent variable (y), as well as the fixed effect of a categorical
variable (class) while accounting for the random factors biome, continent,
and ID (a unique ID for each site sampled). My full data set contains ~ 180
000 rows and attached is a subset of these data ('sampleDF'). My current
fitme() model looks like this:
    library(spaMM)
    M1 <- fitme(y ~ year + class + (1|biome) + (1|continent) + (1|ID) +
                  Matern(1|long + lat),
                data = df, family = "gaussian", method = "REML")
I have two questions:
1) I'm uncertain if this is an appropriate way of applying the
spaMM::fitme() function to longitudinal data. I have some experience with
fitting GLS models that account for SAC to a longitudinal data set where I
had to group the data set by year using the nlme::groupedData() function
before fitting the model. Does a similar method need to be used in the case
of spaMM::fitme() and longitudinal data?
2) Is there another R package out there that can fit a similar model (a
GLMM that accounts for SAC)? I've found very few resources explaining the
use of functions in the spaMM package other than the user guide (F.
Rousset, 2020. An introduction to the spaMM package for mixed models) and
I'm not quite getting the help that I need from it. I'm wondering if
there's another approach to modeling these data that has a broader user
base and thus more easily accessible resources / online help (e.g. Stack
Exchange / Cross Validated Q&As).
Thank you!
Sarah
spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation
Dear Sarah,

I don't know the spaMM package. Have a look at the inlabru package; it has several tutorials on its website (inlabru.org). Or the INLA package (r-inla.org). Both fit the same models, but inlabru has a more user-friendly interface. I can recommend Zuur et al. (2017), Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Sarah,

perhaps try to contact that package's author directly...

That being said, I am not quite sure what the question is, maybe because I am not familiar with constraints on the models nlme can fit and with its syntax. What would be the formula you would use with glmer if there were no spatial random effect?

Best,
F.
Thank you both for your replies.

Thierry, the inlabru package sounds interesting. However, I should have mentioned that I'm not very familiar with Bayesian statistics and would prefer to use other methods if possible.

Francois, I apologize for not contacting you directly first. To clarify my question: when using the nlme::gls() function with longitudinal data, it is necessary to group the data first. I'm *pretty sure* this is to avoid having distances of zero in the corSpatial object, although I'm not entirely sure of the details of fitting this model. What I'm wondering is: will the fitme() function recognize that there are repeated measurements through time on the same sites (and thus duplicates of the sites' coordinate points in the data set), so as to avoid calculating distances of zero between the same site in different years?

If I were to use lme4::lmer() (for a normally distributed response variable) without a spatial random effect, the model would look like this:

    M1 <- lmer(y ~ year + class + (1|biome) + (1|continent) + (1|ID),
               data = df, REML = TRUE)

Thanks so much,
Sarah

--
Sarah Chisholm
MSc Candidate
Department of Biology
University of Ottawa
LinkedIn <http://www.linkedin.com/in/sarah-chisholm-422a5785>
On 13/07/2020 at 22:42, Sarah Chisholm wrote:

> What I'm wondering is, will the fitme() function recognize that there
> are repeated measurements through time on the same sites (and thus,
> duplicates of the sites' coordinate points in the data set)

Yes, it does.

> to avoid calculating distances of zero between the same site from
> different years.

Internally, spaMM avoids zero distances (or rather, the singularities that would occur if different rows of a distance matrix represented the same location) by handling a distance matrix only among distinct spatial locations in the data. There is no need to declare something like nlme::groupedData() to achieve this, and your call to spaMM::fitme() is OK. If the data contain distinct but very close locations, near-singularities may occur, but spaMM also tries to deal with them automatically.

Best,
F.
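To illustrate the point about repeated coordinates, here is a minimal base-R sketch. It is not spaMM's internal code, only an illustration on a made-up data frame: the names `toy`, `locs` and `loc_index` and all values are hypothetical. It shows that a distance matrix built over data rows contains zero off-diagonal distances when a site is measured in several years, whereas a matrix built only among distinct locations does not.

```r
# Toy longitudinal data: 3 sites, each measured in 2 years (hypothetical values).
toy <- data.frame(
  ID   = rep(c("a", "b", "c"), each = 2),
  year = rep(c(2001, 2002), times = 3),
  long = rep(c(10.0, 10.5, 12.0), each = 2),
  lat  = rep(c(50.0, 50.2, 51.0), each = 2)
)

# Row-level distances: repeated sites give zero off-diagonal entries.
d_rows <- as.matrix(dist(toy[, c("long", "lat")]))
n_zero_offdiag <- sum(d_rows[upper.tri(d_rows)] == 0)

# Distances among distinct locations only: no zero off-diagonal entries.
locs  <- unique(toy[, c("long", "lat")])
d_loc <- as.matrix(dist(locs))
min_offdiag <- min(d_loc[upper.tri(d_loc)])

# Map each data row to its location; several rows may share one location.
loc_index <- match(
  paste(toy$long, toy$lat),
  paste(locs$long, locs$lat)
)

n_zero_offdiag  # number of row pairs at distance zero
nrow(locs)      # number of distinct locations
loc_index       # location index for each data row
```

The mapping in `loc_index` plays the role that grouping plays elsewhere: rows from different years point to the same location, so the spatial correlation only needs to be defined among the distinct locations.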
Hi Sarah,

Sorry my reply is a bit late, but I think you could also fit Matérn and other spatial correlation structures via glmmTMB. They are documented in this vignette:
https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html

I think the code might be something like

    library(glmmTMB)
    # numFactor() encodes the coordinates; a single-level 'group' puts all
    # observations in one spatial field
    df2 <- transform(df, pos = numFactor(lat, long), group = factor(1))
    M1 <- glmmTMB(y ~ year + class + (1|biome) + (1|continent) + (1|ID) +
                    mat(pos + 0 | group),  # the Matérn structure is spelled mat()
                  dispformula = ~0, data = df2, REML = TRUE)

I don't have the most experience with this type of model, so maybe someone else has more advice to give.

cheers,
Mollie
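A minimal sketch of how such a spatial term can be set up in glmmTMB, on simulated data. It assumes the glmmTMB package is installed (the fit is skipped otherwise) and follows the covariance-structure vignette linked above, where coordinates are wrapped with `numFactor()` and the Matérn structure is written `mat()`. The data frame `sim`, its values and the reduced formula are hypothetical. As an assumption of this sketch, the residual variance is kept (no `dispformula = ~0`) because each simulated site is observed in several years.

```r
set.seed(1)
n_sites <- 20
n_years <- 4

# Hypothetical sites with fixed coordinates, each observed in every year.
sites <- data.frame(
  ID   = factor(seq_len(n_sites)),
  long = runif(n_sites, 0, 10),
  lat  = runif(n_sites, 0, 10)
)
sim <- merge(sites, data.frame(year = seq_len(n_years)))  # all site-year pairs
sim$y <- 0.3 * sim$year + rnorm(nrow(sim))

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  # One factor level per distinct coordinate pair.
  sim$pos <- numFactor(sim$long, sim$lat)
  # Single-level grouping factor: all observations share one spatial field.
  sim$group <- factor(rep(1, nrow(sim)))
  fit <- glmmTMB(y ~ year + mat(pos + 0 | group), data = sim, REML = TRUE)
  print(nlevels(sim$pos))  # number of distinct locations seen by the spatial term
}
```

With a grouping factor that has several levels, the spatial random effects would be fitted as independent between levels, so a single-level `group` is the choice that keeps one correlation structure across the whole data set.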
Hi Mollie,

Thank you for your suggestion. glmmTMB seems like a good option for my needs as well. In your sample code above, can you explain what the term 'group' does in the Matérn term (pos + 0 | group)? Does this allow the spatial correlation structure to be applied to specific groupings in the data (in my case, for example, by 'continent')?

Francois, thank you for this very clear answer. This is a very convenient feature of the function! May I ask you a couple of other questions about some issues that I've had with spaMM::fitme()?

In particular, when I try fitting this model to a large data set (~14 000 rows x 7 columns, ~2 MB), the model runs for so long that I've had to terminate the computation. I've tried applying the suggestions that are mentioned in the user guide, i.e. setting init=list(lambda=0.1) and init=list(lambda=NaN). Using init=list(lambda=0.1) returned an error suggesting that there was a lack of memory, while the model with init=list(lambda=NaN) also ran for an extended period of time without completing. Is there something else I can do to speed up the fit of these models?

I've had a similar problem with an even larger data set (~185 000 rows x 8 columns, ~21 MB), where, when I try running the model, this error is returned immediately:

    Error in ZA %*% xmatrix :
      Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

I've tried running this model on two devices, both running 64-bit Windows 10, one with 32 GB of RAM and the other with 64 GB. I've gotten the same error on both devices. Is there a way that fitme() can accommodate these large data sets?

Thank you,
Sarah
Dear Sarah,

spaMM can handle large data sets, but the first issue to consider here is the number of distinct locations for the spatial random effect. The large correlation matrices of geostatistical models will always be a problem, both in terms of memory requirements and of potentially huge computation times. My guess from past experiments is that one should still be able to fit models with ~10K locations within a few days on a computer with <60 GB of RAM (given perhaps some tinkering with the arguments), so at least the data set of 14 000 rows should be feasible, particularly if the number of locations is smaller.

Anyone planning to analyze large spatial data sets should anticipate these problems and check by themselves whether there is any practical alternative suitable for their particular problem. The discussion in section 6.2 of the "gentle introduction" to spaMM may then be useful.

Best,
F.
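As a rough back-of-the-envelope check of why the number of distinct locations matters, the sketch below computes the memory needed just to store one dense n x n correlation matrix in double precision (8 bytes per entry). It assumes a dense matrix and ignores the extra working copies that any fit needs, so real requirements are higher. The helper `dense_matrix_gb()` is hypothetical, and the location counts are only illustrative, since the thread does not say how many distinct locations the data contain.

```r
# Memory (in GB, 1 GB = 1e9 bytes) for one dense n x n matrix of doubles.
dense_matrix_gb <- function(n_locations) {
  n_locations^2 * 8 / 1e9
}

dense_matrix_gb(10000)   # 0.8 GB
dense_matrix_gb(14000)   # 1.568 GB
dense_matrix_gb(185000)  # 273.8 GB
```

In practice n would be the number of distinct coordinate pairs, which can be counted with something like nrow(unique(df[, c("long", "lat")])), using `df`, `long` and `lat` as in the model call earlier in the thread.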
Dear François and Sarah,

INLA seems more efficient. I ran a model with a Matérn correlation structure on 13K locations (1 observation per location) in under 10 minutes on a laptop with 16 GB RAM.

Best regards,
ir. Thierry Onkelinx
Dear Thierry,

please provide a reproducible example so that we know what you have actually done.

Best,
F.
Dear Fran?ois, Here you go: https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio Almost 30K locations. Fit in little over 7 min on my laptop with 16 GB RAM. Best regards, ir. Thierry Onkelinx Statisticus / Statistician Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance thierry.onkelinx at inbo.be Havenlaan 88 bus 73, 1000 Brussel www.inbo.be /////////////////////////////////////////////////////////////////////////////////////////// To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey /////////////////////////////////////////////////////////////////////////////////////////// <https://www.inbo.be> Op wo 15 jul. 2020 om 00:10 schreef Francois Rousset < francois.rousset at umontpellier.fr>:
Dear Thierry,
thanks. So (expectedly) this is a different issue. spaMM can fit some correlation models described by objects produced by INLA::inla.spde2.matern() and then, in my past experiments, the computation times were close to those of INLA, and the memory requirements were much smaller than what I wrote previously, where this is not what I meant by "Matern".
Beyond general features that contribute to these computational differences (the use of sparse matrix methods, and to a lesser extent the constraint on the smoothness parameter of the approximated Matern model), the 'cutoff' argument in your call to inla.mesh.2d() appears important to reduce the number of locations actually considered, in the most costly computations, below the number of locations in the data (to 8804 rather than 30K, if I get it right), and this would also allow a faster fit by spaMM when called on the resulting inla.spde2 object.
Best,
F.
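The workflow described in this message (build a mesh with a 'cutoff', turn it into an inla.spde2 object, pass that object to spaMM) might look roughly like the sketch below. It is not code from the thread: the data are simulated, the column names `long`, `lat`, `year` and `y` only mirror Sarah's description, the `max.edge` and `cutoff` values are arbitrary, and the other random effects of her model are left out. The fit runs only if both INLA and spaMM are installed.

```r
# Simulated stand-in for the real data: 100 sites, each observed in 3 years.
set.seed(42)
sites <- data.frame(long = runif(100, 0, 10), lat = runif(100, 0, 10))
df <- merge(sites, data.frame(year = 1:3))  # no shared columns: cross join
df$y <- rnorm(nrow(df))

if (requireNamespace("INLA", quietly = TRUE) &&
    requireNamespace("spaMM", quietly = TRUE)) {
  library(spaMM)
  coords <- as.matrix(unique(df[, c("long", "lat")]))
  # 'cutoff' merges points closer than this distance into one mesh node,
  # so the mesh can have fewer nodes than there are locations in the data.
  # The values used here are arbitrary illustrations.
  mesh <- INLA::inla.mesh.2d(loc = coords, max.edge = c(1, 3), cutoff = 0.5)
  cat("distinct locations:", nrow(coords), " mesh nodes:", mesh$n, "\n")
  spde <- INLA::inla.spde2.matern(mesh, alpha = 2)
  # IMRF() term using the approximate Matern model defined on the mesh.
  fit <- fitme(y ~ year + IMRF(1 | long + lat, model = spde),
               data = df, family = gaussian(), method = "REML")
  print(summary(fit))
}
```

Comparing `mesh$n` for a few values of `cutoff` before fitting gives a quick idea of how much the costly part of the computation will grow.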
Thanks Francois. I hadn't considered that the number of unique locations could be the source of the problem, rather than the size of the entire data set. It is a possibility for me to simply remove observations for a number of locations to bring the total sample size (of unique coordinates) down. I'll also test a lattice model using the IMRF() notation to describe the random spatial effect - I believe this is what you referred to in your previous email?
Sarah
--
Sarah Chisholm
MSc Candidate
Department of Biology
University of Ottawa
Linkedin <http://www.linkedin.com/in/sarah-chisholm-422a5785>
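The distinction between rows and distinct locations can be checked before fitting. The base R sketch below uses toy data (the `toy` data frame and the helper `dense_matrix_gb()` are hypothetical, not from the thread) to count distinct coordinates and to estimate the memory of one dense correlation matrix, assuming 8 bytes per value. A real fit typically needs more than one such matrix, so these figures are only a rough lower bound.

```r
# Hypothetical helper: memory of one dense n x n matrix of doubles.
dense_matrix_gb <- function(n_locations) {
  n_locations^2 * 8 / 1e9  # 8 bytes per double, decimal gigabytes
}

# Toy data: 100 sites, each observed in 13 years (1300 rows in total).
set.seed(1)
sites <- data.frame(ID = 1:100,
                    long = runif(100, -80, -70),
                    lat = runif(100, 40, 50))
toy <- merge(sites, data.frame(year = 1:13))  # cross join: one row per site-year

n_rows <- nrow(toy)
n_locations <- nrow(unique(toy[, c("long", "lat")]))
cat("rows:", n_rows, " distinct locations:", n_locations, "\n")

# Rough cost of a single dense matrix at sizes mentioned in the thread,
# treating every row as a distinct location in the worst case.
sizes <- c(1e4, 3e4, 1.85e5)
print(data.frame(locations = sizes, gigabytes = dense_matrix_gb(sizes)))
```

With repeated yearly observations per site, the number of distinct locations is what determines the size of the spatial correlation matrix, not the number of rows.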
On 15/07/2020 at 16:48, Sarah Chisholm wrote:
> Thanks Francois. I hadn't considered that the number of unique locations could be the source of the problem, rather than the size of the entire data set. It is a possibility for me to simply remove observations for a number of locations to bring the total sample size (of unique coordinates) down. I'll also test a lattice model using the IMRF() notation to describe the random spatial effect - I believe this is what you referred to in your previous email?
Yes, use the IMRF formula term for this purpose.
F.
1 day later
Dear François,
Point taken about the mesh size. Halving the cutoff to 0.5 increased the number of points on the mesh to 31k nodes. This increases the runtime to 65 min on my laptop.
https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
I'm curious what you mean by "Matern" if it isn't similar to the spde model of INLA.
Best regards,
ir. Thierry Onkelinx