Dealing with large datasets in lme/lmer

Tue, Sep 11, 2007 8:30 AM

Again thanks a lot for the help, Dr. Bates!

Now with the development version of lmer I got an error for the  
following model which works fine with the old lmer:

 >fit.lmer <- lmer(y ~ FA*FB*FC+weight+(1|Subj), Model);
Error in validObject(.Object) : invalid class "lmer" object: dims  
slot not named or incorrect length

What went wrong?

Regarding compiling, it works fine on a Mac OS X 10.4 with with duo  
2.7GHz processors, but failed on a Mac OS X 10.4.10 with a 2 GHz  
Intel Core Duo processor with the following error (does it have  
something to do with the Intel processor?):

 >  R CMD INSTALL ./lme4
* Installing to library '/Library/Frameworks/R.framework/Resources/ 
library'
* Installing *source* package 'lme4' ...
** libs
** arch - i386
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp- 
precomp -I/Library/Frameworks/R.framework/Resources/include -I/ 
Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/ 
Library/Frameworks/R.framework/Resources/library/Matrix/include" - 
D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c init.c - 
o init.o
gcc-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -no-cpp- 
precomp -I/Library/Frameworks/R.framework/Resources/include -I/ 
Library/Frameworks/R.framework/Resources/include/i386  -msse3 -I"/ 
Library/Frameworks/R.framework/Resources/library/Matrix/include" - 
D__NO_MATH_INLINES  -fPIC  -g -O2 -std=gnu99 -march=nocona -c lmer.c - 
o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this  
function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this  
function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/ 
lme4/libs/i386/*: No such file or directory
** arch - ppc
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk - 
std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/ 
include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/ 
usr/local/include -I"/Library/Frameworks/R.framework/Resources/ 
library/Matrix/include"   -fPIC  -g -O2 -c init.c -o init.o
gcc-4.0 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk - 
std=gnu99 -no-cpp-precomp -I/Library/Frameworks/R.framework/Resources/ 
include -I/Library/Frameworks/R.framework/Resources/include/ppc  -I/ 
usr/local/include -I"/Library/Frameworks/R.framework/Resources/ 
library/Matrix/include"   -fPIC  -g -O2 -c lmer.c -o lmer.o
lmer.c:356:48: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'Mer_eta':
lmer.c:356: error: 'N_AS_CHM_DN' undeclared (first use in this function)
lmer.c:356: error: (Each undeclared identifier is reported only once
lmer.c:356: error: for each function it appears in.)
lmer.c:1448:40: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'nglmer_condMode':
lmer.c:1448: error: 'N_AS_CHM_DN' undeclared (first use in this  
function)
lmer.c:1449:30: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c:1718:45: error: macro "N_AS_CHM_DN" passed 3 arguments, but  
takes just 2
lmer.c: In function 'lmer_MCMC_betab':
lmer.c:1718: error: 'N_AS_CHM_DN' undeclared (first use in this  
function)
make: *** [lmer.o] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.5/Resources/library/ 
lme4/libs/ppc/*: No such file or directory
ERROR: compilation failed for package 'lme4'
** Removing '/Library/Frameworks/R.framework/Versions/2.5/Resources/ 
library/lme4'
** Restoring previous '/Library/Frameworks/R.framework/Versions/2.5/ 
Resources/library/lme4'


Thanks,
Gang

=========
Gang Chen, Ph. D.
National Institutes of Health, DHHS

On Sep 10, 2007, at 4:26 PM, Douglas Bates wrote:

The best way to obtain the package sources is with a subversion
client.  On Linux it is called svn.  The call to check out a copy of
the package sources is

svn co https://svn.r-project.org/R-packages/branches/gappy-lmer ./lme4

You need the whole directory to build the package.  Under Linux or Mac
OS X I use

R CMD INSTALL ./lme4

For Windows I use Uwe's win-builder at win-builder.R-project.org

On 9/10/07, Gang Chen <gangchen at mail.nih.gov> wrote:

Dr. Bates,

Thank you very much for the quick response and suggestions.

I really want to try out the lmer development package, but could not
figure out how to download it. I tried to run

wget https://svn.r-project.org/R-packages/branches/gappy-lmer/
(Or wget --no-check-certificate -r -nv -m -np -nH https://svn.r-
project.org/R-packages/branches/gappy-lmer/ )

but it didn't work. So how can I obtain the package? Do I only need
the files under

https://svn.r-project.org/R-packages/branches/gappy-lmer/R/

or do I need to compile/build the package with the source code  
somehow?

Thanks,
Gang

=========
Gang Chen, Ph. D.
National Institutes of Health, HHS


On Sep 5, 2007, at 3:11 PM, Douglas Bates wrote:

On 9/4/07, Gang Chen <gangchen at mail.nih.gov> wrote:

Dear all,

I'm running mixed-effects analysis with large datasets in a loop  
like
this:

for (i in 1:60) {
for (j in 1:60) {
for (k in 1:60) {
    [...update y here in Model here...]
    fit.lme <- lme(y ~ FA*FB*FC+weight, random = pdBlocked(list
(pdCompSymm(~FB-1), pdCompSymm(~FC-1), pdIdent(~1))),
weights=varIdent
(form=~1|FA), Model);
    Stat[i, j, k,] <- anova(fit.lme)$F[-1];
}
}
}

Did you create the array Stat outside the loop?  If not you will be
doing a lot of copying of elements of that array.

This takes a little over 100 hours to finish on a Mac G5 with duo
2.7GHz processors and 4GB memory.

In the mixed-effects model

y = X*beta + Z*b + e

the fixed-effects nxp matrix X and random-effects matrix nxq Z are
always the same for all the iterations in my case, and the only  
thing
that differs is y (and the estimates of beta, b and e also  
differ of
course). In my case n = 504 (large), p and q are moderate.  I just
read Dr. Douglas Bates's presentation during uerR! 2007 (very
informative by the way):

Thank you.

http://user2007.org/program/presentations/bates.pdf

It seems many components in the extended system matrix (equation  
(2)
on page 22) for the Cholesky decomposition remain the same  
during the
iterations. So there are a lot of repetitive computations on those
same matrix operations in the above loop. How can I achieve a  
better
efficiency? Someone suggested to me running lme/lmer with a two-
dimensional response Y instead of one-dimensional y. My questions
are:

(1) So far I have only seen people running lme/lmer with y in a
format of one-dimensional array from a table. If I combine all  
those
y's (indices i, j, k) into an two-dimensional array Y, is there  
a way
I can run lme/lmer on Y instead of y? In other words, does lme/lmer
take a two-dimensional array Y?

Not at present.

If so, do I have to save the huge
array in a table in text file and then read in R before I run lme/
lmer?

No.  There are many ways of getting data into R other than  
creating a
text file and reading it.  See the manual "R Data Import/Export" and
also Martin Maechler's presentation at useR!2007.
http://user2007.org/program/presentations/maechler.pdf

Also if that is the case, how can I label those many columns
somehow associated with Y?

(2) A more serious concern is about memory. With the current  
looping
approach it uses about 1GB. If I could possibly go with the matrix
method described in (1), I'm worried that it might not be  
practically
feasible with the current computers. Any thoughts?

Well first you are discussing the computational methods used in lmer
but you want to fit a model with different residual variances for
different groups.  At present you can't do that in lmer.

If you look at the lmer function in the development version of the
lme4 package (currently at
https://svn.r-project.org/R-packages/branches/gappy-lmer, soon to be
at http://r-forge.r-project.org/projects/lme4 for some value of
"soon") you will see that it follows the equations in my useR
presentation fairly closely.  The Xy array is n by (p + 1) with X in
the first p columns and y in the p + 1st column.  The object of  
class
"lmer" has slots named y, Zt (Z-transpose), ZtXy (Zt %*% Xy), and
XytXy (crossprod(Xy)). After fitting the model to the first  
simulated
response, producing the object 'fm',  the only operations needed to
update the model are

 fm at y <- newy
 Xy <- cbind(fm at X, fm at y)
 fm at ZtXy <- fm at Zt %*% Xy
 fm at XytXy <- crossprod(Xy)
 lme4:::mer_finalize(fm, verbose)

where 'verbose' is a logical scalar indicating if you want verbose
output during the optimization phase.  Once you get things  
working on
a small example you would probably want to turn that off.

Please note that this code applies to the development version of the
lme4 package.

Dealing with large datasets in lme/lmer

Thread (7 messages)