
Julia vs nlme vs lme4 implementation of fitting linear mixed models

7 messages · Phillip Alday, Robert Long, Douglas Bates

#
What are the resources that compare how linear and generalised linear 
mixed models are fitted in Julia, lme4 and nlme, in terms of how they 
differ in their implementation and what advantages/disadvantages each 
has? I'm asking about the theoretical and computational issues rather 
than comparing speeds for any particular dataset/model.
#
There's a bit on the FAQ under "Which R packages (functions) fit GLMMs?":

http://glmm.wikidot.com/faq

which is fleshed out on this page:

http://glmm.wikidot.com/pkg-comparison

And check out this question on StackOverflow:

http://stats.stackexchange.com/questions/5344/how-to-choose-nlme-or-lme4-r-library-for-mixed-effects-models


Those pages only discuss R packages.  For Julia, you should check out
the MixedModels package from Doug Bates, which has examples
worked as parallels to the lme4 examples:

https://github.com/dmbates/MixedModels.jl

If I recall correctly, he also has an IPython notebook with a more
involved technical examination of mixed models in Julia, but I can't
find the link at the moment.

Best,
Phillip
On 16.10.2014 10:31, W Robert Long wrote:
#
Thanks Phillip

I've seen the glmm wikidot pages. I've also seen some of the code 
comparisons between lme4 and Julia on GitHub.  I've also seen the book 
by Pinheiro and Bates about nlme and the recent paper on lme4 at
http://arxiv.org/abs/1406.5823
[this paper gives some hints about what is done differently in julia, 
but I would really like more detail if possible]

I'm interested in the differences in internal implementation, rather 
than the differences that a typical user of the packages would be 
concerned with.

Thanks
Rob
On 16/10/2014 09:40, Phillip Alday wrote:
#
Then this might be more what you're looking for:

http://dmbates.github.io/MixedModels.jl/

although the mathematical discussion is largely restricted to the
introduction.

Doug will probably comment when it's a decent hour in Wisconsin, but
my understanding was that the Julia code was based on roughly the same
computational approach as lme4, but plays to Julia's strengths. The
standard reference for nlme is Pinheiro and Bates; I'm still waiting
for my copy to be delivered, but the portions viewable on Google Books
suggest that the computational approach behind nlme is also discussed
there. The approach in lme4 is based on sparse matrix methods, so lme4
tends to be faster and more memory-efficient (this
was recently hinted at on this list:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q4/022752.html )

Some aspects of the computational approach behind lme4 were
tangentially discussed recently
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q4/022773.html), but
I suspect that discussion was part of the motivation for your question!

Cheers,
Phillip
On 16.10.2014 10:55, W Robert Long wrote:
#
Thanks for those links Phillip - though the github.io link does not 
properly render the maths for me. Indeed, my question was partly 
motivated by the discussion on here yesterday (I had wondered whether 
that discussion had ended or gone off-list; I hope it's not the latter).
On 16/10/2014 10:23, Phillip Alday wrote:
#
Thanks for the question.  It is a good exercise for me to answer it so that
I can get things straight in my mind.  I am doing this off the top of my
head, relying on my all-too-faulty memory.  Corrections and clarifications
are welcome.  Ben, should we archive such information somewhere?

The basics:
nlme:
- developed in the mid-1990s by Pinheiro and Bates
- documented in P & B (2000, Springer) and several papers
- implemented in R and C using the .Call interface and R's C API.
- fits linear mixed-effects models and nonlinear mixed-effects models
- allows for additional variance and/or correlation structures in the
conditional distribution of the response, given the random effects.
- multiple formula model specification with PDMat, VarCorr, ... classes.
- allows for multiple nested grouping factors.  There is a kludgy method
for fitting models with crossed grouping factors but I wouldn't push it too
hard.
- uses S3 methods and class tags (IMO "S3 classes" don't exist)
- optimization of parameter estimates via EM and Newton-Raphson
algorithms, profiled w.r.t. residual variance only
- Internal representation uses relative precision factor and log-Cholesky
formulation.  Singular covariance matrices correspond to infinite parameter
values.
- no longer actively developed.  Maintained by R-Core primarily to adjust
to changing requirements on R packages
- no explicit methods for fitting GLMMs.
- lme with iterative reweighting is used in the glmmPQL function in the
MASS package.
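
The log-Cholesky formulation mentioned above can be illustrated with a
small sketch (a hypothetical helper, not nlme's actual code): an
unconstrained parameter vector is mapped to a covariance matrix by
building a Cholesky factor whose diagonal is exponentiated, so the
result is always positive semi-definite.

```python
import math

def log_cholesky_to_cov(theta):
    """Illustrative log-Cholesky parameterization of a 2x2 covariance.
    theta = (log l11, l21, log l22) is unconstrained; the diagonal of the
    lower-triangular factor L is exponentiated, so Sigma = L L' is always
    positive semi-definite."""
    l11 = math.exp(theta[0])  # diagonal entries: exp() keeps them positive
    l21 = theta[1]            # off-diagonal entry: unconstrained
    l22 = math.exp(theta[2])
    # Sigma = L L' with L = [[l11, 0], [l21, l22]]
    return [[l11 * l11,             l11 * l21],
            [l21 * l11, l21 * l21 + l22 * l22]]
```

This also shows why singular covariance matrices correspond to infinite
parameter values in this formulation: a diagonal entry of L reaches zero
only as the corresponding log-parameter goes to minus infinity.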

lme4:
- development began in the late 1990s by Bates, Maechler, DebRoy, Sarkar and
others.  Current development is primarily by Bolker and Walker.
- documented in papers, vignettes.  Bates started writing a book but didn't
complete the project - some draft chapters still online.
- implemented in R, C and C++ using facilities provided by Rcpp and Eigen
- fits linear mixed-effects models and GLMMs.  Nominally has NLMM
capabilities but the implementation is not solid
- single formula model specification.  Grouping factors can be nested,
partially or fully crossed.
- uses S4 classes and methods, S3 methods, reference classes, Matrix
package.
- internal representation uses relative covariance factor and sparse
Cholesky factorization.  Optimization of profiled log-likelihood uses
general nonlinear optimizers that allow for box constraints (in practice
only non-negativity of some of the parameters is required) for LMMs.  GLMMs use
PIRLS (penalized iteratively reweighted least squares) to determine the
conditional modes of the random effects with Laplace or adaptive
Gauss-Hermite approximation to the log-likelihood.  Settling on a single
robust and reliable optimizer has been difficult.
- all models use a sparse matrix representation to solve the penalized
least squares problem
- actively maintained and developed
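
The shrinkage structure of the penalized least squares problem mentioned
above can be sketched in a stripped-down scalar case (a hypothetical toy
function, not lme4's implementation): with a single scalar random effect
u and relative covariance factor lam, the penalty term u**2 shrinks the
conditional mode toward zero.

```python
def pls_conditional_mode(y, z, lam):
    """Toy penalized least squares step for one scalar random effect u:
        minimize sum_i (y_i - lam * z_i * u)**2 + u**2
    Closed form: u = lam * <z, y> / (lam**2 * <z, z> + 1).
    As lam -> 0 the estimate is shrunk entirely to zero."""
    zy = sum(zi * yi for zi, yi in zip(z, y))
    zz = sum(zi * zi for zi in z)
    return lam * zy / (lam * lam * zz + 1.0)
```

In the real packages Z is a sparse model matrix with one block per
grouping factor and the solve goes through a sparse Cholesky
factorization, but the shrinkage structure is the same.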

MixedModels
- development started in 2012 by Bates
- little or no documentation outside of examples
- implemented exclusively in Julia (about 1600 lines of code)
- fits LMMs.  Development of GLMM capabilities is planned.
- single formula specification similar to lme4.  Because Julia is still a
young language, not all lme4 formulas will work with MixedModels.  (Building
the formula language interpretation code from scratch is not easy.)
- Grouping factors can be nested or crossed (partially or completely).
- uses Julia methods and user-defined types.  Julia methods provide for
multiple dispatch like S4.
- internal representation (all at the Julia level) provides for several
different PLS (penalized least squares) solvers according to the
configuration of the grouping factors.
- Models with a nested sequence of grouping factors, including a single
grouping factor as a trivial special case, use dense matrix methods and
provide an analytic gradient of the profiled log-likelihood.  Similarly for
models with two crossed or nearly crossed grouping factors (think "subject"
and "item").
- Optimizers come from Steven Johnson's NLopt Julia package, which
interfaces to his C library (also used in the R packages nloptr and
nloptwrap).  At present LN_BOBYQA is used for derivative-free optimization
and LD_MMA for models that provide a gradient, but switching optimizers
within the NLopt library is trivial.
- Considerable effort made to reuse storage and cut down on
allocation/garbage collection.
- Actively developed.  Maintenance hasn't been too much of an issue because
I don't think there are many users at present.

Examples of MixedModels fits are at

http://nbviewer.ipython.org/github/dmbates/slides/blob/master/2014-10-01-Ithaca/lmm%20Examples.ipynb

Because I was involved in the development of all three packages I will take
the liberty of commenting on the philosophy.

The purpose of developing nlme was for fitting nonlinear mixed-effects
models.  Linear mixed-effects models were incorporated mainly as an
iterative step in nlme.  Numerical methods used are not nearly as
sophisticated as those in lme4 and MixedModels.

lme4 was developed to provide a use-case for S4 classes and methods. The
Matrix package was initially part of lme4 then split off into a separate
package.  The numerical methods implemented in lme4 are, in my opinion,
superior to those in nlme, mainly through the use of the relative
covariance factor and the profiled log-likelihood.  These may seem like
details but to me they are very important.  The motivation for
incorporating sparse matrix classes in the Matrix package and accessing the
CHOLMOD code was to provide a general method for fitting such models.
Using C++, Rcpp and RcppEigen was motivated by trying to provide generality
and speed.  The end result is confusing (my fault entirely) and fragile.

MixedModels was developed because I became interested in the Julia language
at the same time that I was disillusioned with at least some aspects of R.
As a new language Julia doesn't have the infrastructure of R (dataframes,
formula language, graphics, ...) but does have a clean implementation of
the parts that are available.  The most important aspect of Julia is "one
language".  You develop in the same language in which you optimize.  The
type system in Julia allows me to incorporate the different kinds of
penalized least squares solvers in what to me is a clean way, thereby
taking advantage of structural simplifications in simple, but common,
cases.  It is possible to do this in R/C++/Rcpp/Eigen but it would be a
massive headache and perhaps beyond my abilities to do it well.

Optimizing Julia code is often done at the expense of transparency.  It is
obvious what

 C = C - A*B

means when C, A and B are matrices (* means matrix multiplication in
Julia).  It is less obvious what

  BLAS.gemm!('N','N',-1.,A,B,1.,C)

means but the second form avoids taking two unnecessary copies of the
matrix C and running a couple of extra loops.  This isn't important if you
only do it once.  If you do it several dozen times in each function
evaluation in an optimization that requires tens of thousands of function
evaluations, it is important.
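
The same allocation-saving idea can be mimicked in any language.  As a
sketch (a hypothetical function, written in Python only for
illustration), here is an in-place update that has the same effect as
the gemm! call above without ever materializing the product A*B:

```python
def gemm_in_place(A, B, C):
    """Update C in place as C <- C - A*B, the effect of the Julia call
    BLAS.gemm!('N', 'N', -1., A, B, 1., C), without allocating a
    temporary matrix for the product.  Matrices are lists of row lists."""
    n, k, m = len(A), len(B), len(B[0])
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] -= acc  # subtract directly into C; no intermediate matrix
    return C
```

Each call touches only the existing storage of C, which is exactly the
property that matters when the update runs inside a tight optimization
loop.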

Please let me know if this answers your question.
#
Dear Professor Bates

This is a great help - thank you. I may have some follow-on questions, 
in which case I shall reply to the list but your response below gives me 
plenty to occupy myself with for now.

Regards
Robert Long
On 16/10/2014 17:57, Douglas Bates wrote: