spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation

Wed, Jul 15, 2020 7:48 AM

Thanks Francois. I hadn't considered that the number of unique locations
could be the source of the problem, rather than the size of the entire data
set. It is a possibility for me to simply remove observations for a number
of locations to bring the total sample size (of unique coordinates) down.
I'll also test a lattice model using the IMRF() notation to describe the
random spatial effect - I believe this is what you referred to in your
previous email?
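
A minimal base-R sketch of the thinning idea above, assuming a hypothetical data frame `dat` with coordinate columns `x` and `y` and a response `resp` (none of these names come from the actual data). It counts distinct locations, which is the quantity that matters here, and then keeps a random subset of locations while retaining every repeated observation at the locations kept:

```r
# Hypothetical longitudinal data: 200 sites on a grid, 5 visits per site
set.seed(1)
sites <- expand.grid(x = 1:20, y = 1:10)
dat <- sites[rep(seq_len(nrow(sites)), each = 5), ]
dat$resp <- rnorm(nrow(dat))

# Distinct coordinate pairs: this count, not the number of rows,
# determines the size of the spatial correlation matrix
locs <- unique(dat[, c("x", "y")])
n_rows <- nrow(dat)
n_locs <- nrow(locs)

# Map each row to its location, then keep 50 randomly chosen locations
# together with all observations recorded at them
loc_id <- match(paste(dat$x, dat$y), paste(locs$x, locs$y))
keep_id <- sample(n_locs, 50)
dat_thin <- dat[loc_id %in% keep_id, ]
n_locs_thin <- nrow(unique(dat_thin[, c("x", "y")]))

cat("rows:", n_rows, " locations:", n_locs,
    " locations after thinning:", n_locs_thin, "\n")
```

Dropping whole locations rather than individual rows keeps the longitudinal series at each retained site intact.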

Sarah

On Wed, Jul 15, 2020 at 10:01 AM Francois Rousset <francois.rousset at umontpellier.fr> wrote:

Dear Thierry,

thanks. So (as expected) this is a different issue. spaMM can fit some
correlation models described by objects produced by
INLA::inla.spde2.matern(). In my past experiments with such models, the
computation times were close to those of INLA, and the memory requirements
were much smaller than those I described previously; but this approximation
is not what I meant by "Matern" in my earlier message.

Beyond general features that contribute to these computational differences
(the use of sparse matrix methods, and to a lesser extent the constraint on
the smoothness parameter of the approximated Matern model), the 'cutoff'
argument in your call to inla.mesh.2d() appears important: it reduces the
number of locations actually considered in the most costly computations to
below the number of locations in the data (to 8804 rather than 30K, if I
get it right). This would also allow a faster fit by spaMM when called on
the resulting inla.spde2 object.
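
A hedged sketch of that workflow, not the code actually used in this thread: it builds a mesh with a 'cutoff', turns it into an inla.spde2 object, and passes that object to an IMRF() term in spaMM::fitme(). It uses the small `blackcap` data set shipped with spaMM as a stand-in; the `max.edge` and `cutoff` values are arbitrary assumptions, and the block does nothing unless both spaMM and INLA are installed.

```r
# Runs only if both packages are installed (INLA is not distributed on CRAN)
has_pkgs <- requireNamespace("spaMM", quietly = TRUE) &&
  requireNamespace("INLA", quietly = TRUE)

if (has_pkgs) {
  library(spaMM)
  data("blackcap")  # small example data set shipped with spaMM

  coords <- as.matrix(blackcap[, c("longitude", "latitude")])

  # 'cutoff' is the minimum allowed distance between mesh points: input
  # locations closer than this are replaced by a single vertex.
  # The numeric values below are arbitrary, for illustration only.
  mesh <- INLA::inla.mesh.2d(loc = coords, max.edge = c(3, 20), cutoff = 1)
  cat("data locations:", nrow(unique(coords)),
      " mesh vertices:", mesh$n, "\n")

  spde <- INLA::inla.spde2.matern(mesh)  # object of class 'inla.spde2'

  # Random effect defined on the mesh rather than on the raw locations
  fit <- fitme(migStatus ~ means + IMRF(1 | longitude + latitude, model = spde),
               data = blackcap)
  print(summary(fit))
}
```

On a data set this small the mesh brings no benefit; the point is only the shape of the calls. With tens of thousands of locations, comparing `mesh$n` with the number of distinct data locations shows how much the cutoff has reduced the problem.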

Best,

F.
On 15/07/2020 at 12:50, Thierry Onkelinx wrote:

Dear François,

Here you go:
https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
Almost 30K locations. It fit in a little over 7 min on my laptop with 16 GB RAM.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

On Wed, Jul 15, 2020 at 00:10, Francois Rousset <francois.rousset at umontpellier.fr> wrote:

Dear Thierry,

please provide a reproducible example so that we know what you have
actually done.

Best,

F.
On 14/07/2020 at 20:00, Thierry Onkelinx wrote:

Dear François and Sarah,

INLA seems more efficient. I ran a model with a Matern correlation
structure on 13K locations (1 observation per location) in under 10 minutes
on a laptop with 16 GB RAM.

Best regards,

ir. Thierry Onkelinx

On Tue, Jul 14, 2020 at 18:22, Francois Rousset <francois.rousset at umontpellier.fr> wrote:

Dear Sarah,

On 14/07/2020 at 16:55, Sarah Chisholm wrote:

Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
option for my needs as well. In your sample code above, can you
explain what the term 'group' does in matern(pos+0|group)? Does this
allow the spatial correlation structure to be applied to specific
groupings in the data (in my case, for example, by 'continent')?

Francois, thank you for this very clear answer. This is a very
convenient feature of the function! May I ask you a couple of other
questions about some issues that I've had with spaMM::fitme()?

In particular, when I try fitting this model to a large data set (~14
000 rows x 7 columns, ~2 MB), the model will run for an extended
period of time, to the point where I've had to terminate the
computation. I've tried applying the suggestions that are mentioned in
the user guide, i.e. setting init=list(lambda=0.1)
and init=list(lambda=NaN). Implementing init=list(lambda=0.1) returned
an error suggesting that there was a lack of memory, while running the
model with init=list(lambda=NaN) also ran for an extended period of
time without completing. Is there something else I can do to speed up
the fit of these models?

I've had a similar problem with an even larger data set (~185 000 rows
x 8 columns, ~21 MB), where, when I try running the model, this error
is returned immediately:

Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
../Core/cholmod_dense.c, line 105

I've tried running this model on two devices, both with a 64-bit OS
with Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
gotten the same error from both devices. Is there a way that fitme()
can accommodate these large data sets?

spaMM can handle large data sets, but the first issue to consider here
is the number of distinct locations for the spatial random effect. The
large correlation matrices of geostatistical models will always be a
problem, both in terms of memory requirements and of potentially huge
computation times. My guess from past experiments is that one should
still be able to fit models with ~10K locations within a few days on a
computer with <60 GB of RAM (perhaps given some tinkering with the
arguments), so at least the data set of 14 000 rows should be feasible,
particularly if the number of locations is smaller.
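
To make the memory argument concrete, here is a small base-R calculation of the storage needed for a single dense n x n matrix of doubles. The location counts are taken from figures mentioned in this thread, treating the row counts (14 000 and 185 000) as a worst case in which every row is a distinct location; that is an assumption, since the actual numbers of unique locations were not given. A fit presumably needs more than one such matrix at a time, so these are lower bounds.

```r
# Memory for one dense n x n matrix of doubles (8 bytes per entry), in GiB
dense_gib <- function(n) n^2 * 8 / 2^30

# Location counts discussed in the thread (row counts used as a worst case)
n_locations <- c(1e4, 1.4e4, 3e4, 1.85e5)

mem <- data.frame(n_locations = n_locations,
                  GiB = round(dense_gib(n_locations), 1))
print(mem)
```

The quadratic growth is why the number of distinct locations, rather than the number of rows, is the first thing to check.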

Anyone planning to analyze large spatial data sets should anticipate
these problems and check by themselves whether there is any practical
alternative suitable for their particular problem. The discussion in
section 6.2 of the "gentle introduction" to spaMM may then be useful.

Best,

F.

Thank you,

Sarah


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Sarah Chisholm
MSc Candidate
Department of Biology
University of Ottawa
Linkedin <http://www.linkedin.com/in/sarah-chisholm-422a5785>
