Skip to content

Reproducing results from an old lmer fit

4 messages · Afshartous, David, Douglas Bates

#
All,

For a paper revision I'm trying to reproduce some results from an old lmer
fit with Rv2.7.1 prior to 5/28/08.  However, when I currently load Rv2.7.1
and lmer, the variance component estimates are slightly different than the
original fit; the sessionInfo() is as follows:
R version 2.7.1 (2008-06-23)
i386-apple-darwin8.10.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lme4_0.999375-24   Matrix_0.999375-11 lattice_0.17-8

loaded via a namespace (and not attached):
[1] grid_2.7.1  nlme_3.1-89

Thus, I assume that I need to use the same older version of lme4 and/or
Matrix which might be responsible for the difference in the results?  If
this is possible, how is this done?

Cheers,
David

PS - for whatever it's worth, if I do the fit with lme (nlme_3.1-89) under
Rv2.7.1 the results are closer to the original lmer results.



___________________________________________________

Original lmer fit from 5/08:
Model 2:
AIC  BIC logLik MLdeviance REMLdeviance
 2813 2843  -1397       2829         2795
Random effects:
 Groups     Name            Variance Std.Dev. Corr
 subject   (Intercept)      2226.3   47.183
            Drug            2132.9   46.184  -0.865
 Residual                   13673.6  116.934

Current lmer fit:
 AIC  BIC logLik deviance REMLdev
 2815 2849  -1397     2830    2795
Random effects:
 Groups     Name               Variance Std.Dev. Corr
 Patient_no (Intercept)         2165.1   46.531
            Drug.full.reverseC  1386.3   37.233  -1.000
 Residual                      13947.5  118.100


Current lme fit:
   AIC      BIC    logLik
  2814.611 2848.638 -1397.305

                   StdDev    Corr
(Intercept)         47.21031 (Intr)
Drug.full.reverseC  46.14014 -0.866
Residual           116.93541
#
On Thu, Feb 26, 2009 at 10:31 AM, Afshartous, David
<DAfshartous at med.miami.edu> wrote:
Notice the large change in the estimated correlation with very little
change in the log-likelihood or deviance.  This is an indication that
the model is over-specified.

Are you able to install R packages from the sources?  If so, you could
try the branches/allcoef version from the SVN archive.  On an
optimization problem like this it may be more successful in converging
to the global optimum instead of the local optimum.

This, by the way, is why I am always looking for better optimization
code to incorporate in R.  The code in the nlme and lme4 packages just
evaluates the log-likelihood or the REML criterion for the model at
the observed data and a proposed value of the parameters.  The actual
optimization is done by the nlminb optimizer which is based on very
old Fortran code written by David Gay.  Even though the code is old
this optimizer is, in my experience, more reliable than the optimizers
used by optim and by nlm.

It is surprisingly difficult to find good optimization code that is
covered by an open source license.  There is not a strong tradition of
open source code in the numerical analysis world.  Many users are
enthused about the ipopt library (projects.coin-or.org/Ipopt) but even
though that code is open source it depends on other software, some of
which is commercial.

The optimization in lme4 is minimization of a real-valued function of
real parameters, some of which are subject to non-negativity
constraints.  It is not an unconstrained optimization problem but the
constraints are very simple.  The objective function can be evaluated
and, in theory, the gradient can also be evaluated.  However, for
models with non-nested random effects evaluation of the gradient is
much, much more difficult and time consuming than is evaluation of the
objective function.  Thus the ideal optimizer would allow for simple
"box constraints" on the parameters and would be derivative-free or at
least allow for numeric evaluation of the gradient.  If anyone knows
of such code covered by a valid open-source license I would be
delighted to hear of it.
#
I haven't installed R packages from the sources before.  Looking in the e-mail help archives, it appears that this is potentially problematic; do you have a suggested link for the entire procedure? For Windows I found a very brief description at http://win-builder.r-project.org (I think I might need more detail than this).  I can do it on
either Mac or Windows OS, is one easier than the other?

RE the branches/allcoef version you mention, I looked at https://svn.r-project.org/R/branches/ and didn't see this.

My main goal is to produce the exact results I produced before since I need to extract some of the information from this model that I hadn't saved.  I'm not sure which lmer version I had installed on 5/08.  Thus, I need to attempt to install the version that was available on 5/08 and perhaps the one preceding that, and check which one matches my results.
On 2/26/09 1:17 PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
On Thu, Feb 26, 2009 at 10:31 AM, Afshartous, David
<DAfshartous at med.miami.edu> wrote:
Notice the large change in the estimated correlation with very little
change in the log-likelihood or deviance.  This is an indication that
the model is over-specified.

Are you able to install R packages from the sources?  If so, you could
try the branches/allcoef version from the SVN archive.  On an
optimization problem like this it may be more successful in converging
to the global optimum instead of the local optimum.


This, by the way, is why I am always looking for better optimization
code to incorporate in R.  The code in the nlme and lme4 packages just
evaluates the log-likelihood or the REML criterion for the model at
the observed data and a proposed value of the parameters.  The actual
optimization is done by the nlminb optimizer which is based on very
old Fortran code written by David Gay.  Even though the code is old
this optimizer is, in my experience, more reliable than the optimizers
used by optim and by nlm.

It is surprisingly difficult to find good optimization code that is
covered by an open source license.  There is not a strong tradition of
open source code in the numerical analysis world.  Many users are
enthused about the ipopt library (projects.coin-or.org/Ipopt) but even
though that code is open source it depends on other software, some of
which is commercial.

The optimization in lme4 is minimization of a real-valued function of
real parameters, some of which are subject to non-negativity
constraints.  It is not an unconstrained optimization problem but the
constraints are very simple.  The objective function can be evaluated
and, in theory, the gradient can also be evaluated.  However, for
models with non-nested random effects evaluation of the gradient is
much, much more difficult and time consuming than is evaluation of the
objective function.  Thus the ideal optimizer would allow for simple
"box constraints" on the parameters and would be derivative-free or at
least allow for numeric evaluation of the gradient.  If anyone knows
of such code covered by a valid open-source license I would be
delighted to hear of it.
#
I was actually able to install the old lme4 version and did indeed produce my previous results.  For those interested, below is a summary of the steps I followed.   There is more detailed information at http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset for those that want to build R as well as packages, but for those only interested in packages, perhaps the instructions below well be more useful.  (thanks to Henrik Parn for several useful pointers).

1) The Toolset
Install the R Toolset from http://www.murdoch-sutherland.com/Rtools/
I installed it directly under C:\

Although the instructions during installation mention that there are several remaining tasks to complete installation, it appears that these are only necessary if one wants to build R in addition to just add-on packages.

2) The old version of the add-on package
I downloaded the old version of lme4 that I needed from http://cran.r-project.org/src/contrib/Archive/lme4/

I put it under C:\Documents and Settings\parn\Desktop\lme4_0.99875-9.tar.gz" and did NOT unpack it.

3) Setting the PATH:
I added the following to the beginning of my PATH variable:
C:\Rtools\bin;C:\Rtools\perl\bin;C:\Rtools\MinGW\bin;C:\R\R-2.7.1\bin;

I didn't add the other parts mentioned at http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset.
My entire path was now thus:
C:\Rtools\bin;C:\Rtools\perl\bin;C:\Rtools\MinGW\bin;C:\R\R-2.7.1\bin;C:\Program Files\MiKTeX 2.6\miktex\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\WinLD;

I noticed also that the change in the path was not dynamic, i.e., for it to take effect I had to exit the DOS prompt and re-enter the DOS prompt, then type "path" at the DOS prompt to see make sure that it had taken effect.

4) Deleting old version of lme4:
I deleted the entire folder of the unwanted version of lme4, on my machine located at C:\R\R-2.7.1\library

5) R CMD
At the DOS prompt, I typed:
R CMD "C:\Documents and Settings\parn\Desktop\lme4_0.99385-9.tar.gz"

A 'new' lme4-folder with all its contents was created 'automatically' in
the appropriate place, i.e. here: C:\R\R-2.8.1\library. I did not specify this target myself.  Thus, I could load the package from the menu interface after starting R.
On 2/26/09 1:17 PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
On Thu, Feb 26, 2009 at 10:31 AM, Afshartous, David
<DAfshartous at med.miami.edu> wrote:
Notice the large change in the estimated correlation with very little
change in the log-likelihood or deviance.  This is an indication that
the model is over-specified.

Are you able to install R packages from the sources?  If so, you could
try the branches/allcoef version from the SVN archive.  On an
optimization problem like this it may be more successful in converging
to the global optimum instead of the local optimum.

This, by the way, is why I am always looking for better optimization
code to incorporate in R.  The code in the nlme and lme4 packages just
evaluates the log-likelihood or the REML criterion for the model at
the observed data and a proposed value of the parameters.  The actual
optimization is done by the nlminb optimizer which is based on very
old Fortran code written by David Gay.  Even though the code is old
this optimizer is, in my experience, more reliable than the optimizers
used by optim and by nlm.

It is surprisingly difficult to find good optimization code that is
covered by an open source license.  There is not a strong tradition of
open source code in the numerical analysis world.  Many users are
enthused about the ipopt library (projects.coin-or.org/Ipopt) but even
though that code is open source it depends on other software, some of
which is commercial.

The optimization in lme4 is minimization of a real-valued function of
real parameters, some of which are subject to non-negativity
constraints.  It is not an unconstrained optimization problem but the
constraints are very simple.  The objective function can be evaluated
and, in theory, the gradient can also be evaluated.  However, for
models with non-nested random effects evaluation of the gradient is
much, much more difficult and time consuming than is evaluation of the
objective function.  Thus the ideal optimizer would allow for simple
"box constraints" on the parameters and would be derivative-free or at
least allow for numeric evaluation of the gradient.  If anyone knows
of such code covered by a valid open-source license I would be
delighted to hear of it.