Skip to content

Data frame size limits in MCMCglmm?

3 messages · Stuart Luppescu, Joshua Wiley, Jarrod Hadfield

#
Hello, I'm having problems running a simple ordinal outcome mixed
effects model, and I'm thinking it may be because of the size of the
dataset (or, it very may well be that I'm not specifying the model
correctly). I must confess to insecurity about how to specify the
priors. Here is the structure of the data frame (with columns not in
this model omitted). Note that there are more than 2.4 million rows. Is
that a problem?

 str(all.subj)
'data.frame':	2438922 obs. of  112 variables:
 $ gr10                  : num  0 0 0 0 0 0 1 0 1 0 ...
 $ gr11                  : num  0 0 0 1 1 1 0 0 0 0 ...
 $ gr12                  : num  1 1 1 0 0 0 0 1 0 0 ...
 $ tid                   : Factor w/ 14982 levels "........","A.D46607",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ final.points          : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 4 4 4 2 3 3 2 2 3 2 ...

Here are two attempts and their results:

glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
                         prior=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0))),
                         random = ~tid ,
                         family = "ordinal",
                         nitt=100000,
                         data = all.subj)

Error: segfault from C stack overflow


 glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
                         prior=list(R=list(V=1, nu=0), G=list(G1=list(V=1, nu=0))),
                         random = ~tid ,
                         family = "ordinal",
                         nitt=100000,
                         data = all.subj)


Process R segmentation fault (core dumped) at Fri Jan 18 12:53:49 2013

Here is my sessionInfo
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreign_0.8-52  MCMCglmm_2.17   corpcor_1.6.4   ape_3.0-6      
[5] coda_0.16-1     Matrix_1.0-10   lattice_0.20-13 tensorA_0.36   

loaded via a namespace (and not attached):
[1] compiler_2.15.1 gee_4.13-18     grid_2.15.1     nlme_3.1-107   
[5] tools_2.15.1   

and memory info.

 gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   1196029   63.9    1835812    98.1    1710298    91.4
Vcells 678385919 5175.7 1777423119 13560.7 2114537169 16132.7

Any help will be appreciated.
#
Hi Stuart,

How many (if any) iterations completed before you got the seg fault?
Also, how much memory does your system have?

Currently at 4000 iterations, I have not reproduced the error so far
with this made up example (although I have practically finished an
algorithm to sort any array in O(log log n) while waiting to get this
far ~1 hour per thousand iterations on a 6 core 3.9GHZ machine).

require(MCMCglmm)
require(MASS)
set.seed(10)
rint <- rep(rnorm(14982, 0, 4), each = floor(2.44e6/14982))
ID <- factor(rep(1:14982,  each = floor(2.44e6/14982)))
X <- MASS::mvrnorm(length(rint), mu = c(0, 0, 0),
  Sigma = matrix(c(1, .3, .3, .3, 1, .3, .3, .3, 1), 3))
b <- matrix(c(1.2, -.5, 2))
ycont <- rnorm(length(ID), mean = 2 + rint + X %*% b)
yord <- cut(ycont, breaks = quantile(ycont, c(0, .2, .4, .6, .8, 1)),
  include.lowest=TRUE, ordered_result=TRUE)
testdat <- data.frame(yord, ID, x1 = X[,1], x2 = X[, 2], x3 = X[, 3])
## > gc()
##            used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells  1117490  59.7    1710298  91.4  1368491  73.1
## Vcells 25878798 197.5   47836772 365.0 47753473 364.4
m <- MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
  family = "ordinal", data = testdat,
  prior = list(
    R = list(V = 1, fix = 1),
    G = list(
      G1 = list(V = 1, nu = 0))),
  nitt = 13000, thin = 10, burnin = 3000)

on a Win 8 pro x64 system with 32GB of memory and

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] MASS_7.3-22        ggplot2_0.9.3      MCMCglmm_2.17
 [4] corpcor_1.6.4      ape_3.0-6          coda_0.16-1        Matrix_1.0-10
 [8] lattice_0.20-10    tensorA_0.36
On Fri, Jan 18, 2013 at 1:55 PM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
6 days later
#
Hi Stuart,

2.4 million records is bigger than anything I've tried but in theory  
it should run, or return an error if it can't allocate enough memory.  
It definitely shouldn't be seg-faulting.  If you could send a  
reproducible example (preferably one where it fails quickly) I will  
take a look into it.

Cheers,

Jarrod.






Quoting Joshua Wiley <jwiley.psych at gmail.com> on Fri, 18 Jan 2013  
20:28:33 -0800: