Large data set and mixed models

3 messages · Michael Beaulieu, Rense Nieuwenhuis, Douglas Bates

#
I would like to compare the diving behaviour of two groups of penguins
(7 penguins in each group). Each penguin performed several dives within
several foraging trips. As a result, I have a very large data set of
nearly 100,000 dives.
To compare the diving behaviour of the two groups, I used a mixed model
with:
- the penguin as a random factor,
- the number of dives nested in the foraging trip as a repeated factor,
- the group, the foraging trip and maximal depth as fixed factors.
The covariance structure was auto-regressive.
I tried this model in SPSS, SAS and R, but all failed.
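
For reference, a model of this kind could be written in R roughly as follows. This is only a minimal sketch under stated assumptions, not the specification that was actually tried: the data are simulated, the column names (`penguin`, `trip`, `dive`, `group`, `max_depth`) and the response `duration` are hypothetical, and `nlme` is used because its `corAR1` class provides an auto-regressive correlation structure.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)

# Simulated stand-in data (NOT the real penguin data):
# 14 penguins (7 per group), 3 foraging trips each, 50 dives per trip.
dives <- expand.grid(dive = 1:50, trip = 1:3, penguin = 1:14)
dives$group     <- factor(ifelse(dives$penguin <= 7, "A", "B"))
dives$max_depth <- runif(nrow(dives), min = 5, max = 100)

# Hypothetical response: dive duration with a penguin-level random shift.
penguin_effect <- rnorm(14, sd = 5)
dives$duration <- 60 + 0.8 * dives$max_depth +
  penguin_effect[dives$penguin] + rnorm(nrow(dives), sd = 10)

dives$penguin <- factor(dives$penguin)
dives$trip    <- factor(dives$trip)

# Fixed effects: group, foraging trip and maximal depth.
# Random effect: intercept per penguin.
# AR(1) correlation between successive dives within each trip of each penguin.
fit <- lme(duration ~ group + trip + max_depth,
           random      = ~ 1 | penguin,
           correlation = corAR1(form = ~ dive | penguin/trip),
           data        = dives)

summary(fit)

# Estimated AR(1) parameter on its natural scale.
coef(fit$modelStruct$corStruct, unconstrained = FALSE)
```

Posting the actual call in this form, together with any error message, would make the failure much easier to diagnose.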

Has anybody analysed such a large data set with mixed models?

Thank you

MiKL
#
Dear Michael,

Perhaps you could send more details about the model you're trying to
estimate, so that we can help.

For example:
- What is the model specification?
- What happens: an error message, uninterpretable findings?
- A closer description of the data.
- What system are you trying to estimate this model with?

In general, I wouldn't call 100,000 cases 'huge' for R. Sure, some
models will take some time to converge, but it should be doable.

If you send me (a sample of) your data, I'd be willing to take a
look at it.

Kind regards,

Rense
On 17 Oct 2008, at 15:02, Michael Beaulieu wrote:

            
#
On Fri, Oct 17, 2008 at 8:34 AM, Rense Nieuwenhuis
<rense.nieuwenhuis at me.com> wrote:
Agreed.  The largest example that I have fit with lme4 in R has about
1.7 million observations and over 60,000 non-nested random effects.
A good start would be for Michael to show us a transcript of his
attempt to fit the model he wants in R, including the output from

sessionInfo()

so we know the versions of all packages being used.
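
One way to collect that information is sketched below. The output file name is hypothetical, and `nlme` is loaded only as an example of a package whose version would then appear in the output; load whichever packages the model was actually fitted with.

```r
library(nlme)  # example only: attach the packages used for the model fit

# Collect R version, platform and attached package versions.
info <- sessionInfo()
print(info)

# Versions of the attached (non-base) packages, as a named character vector.
pkg_versions <- vapply(info$otherPkgs, function(p) p$Version, character(1))
print(pkg_versions)

# Save the printed session information to a text file (hypothetical name)
# so it can be pasted into a message together with the model call.
out_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(print(info)), out_file)
```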