nlme and NONMEM
On 11/1/07, Rob Forsyth <r.j.forsyth at newcastle.ac.uk> wrote:
I'd appreciate hearing from anyone (off list if you think it more appropriate) who can share their comparative experiences of non-linear mixed effects modelling with both nlme and NONMEM. The latter appears to be the traditional tool of choice, particularly in pharmacology. Having built up some familiarity with nlme, I am now collaborating (on a non-pharmacological project) with someone strongly encouraging me to move to NONMEM, although that clearly represents another considerable learning curve. The main argument in favour is the relative difficulty I have had in getting nlme models to converge on my relatively sparse datasets, particularly when (as in my case) I am interested in the random-effects covariance matrix and wish to avoid having to coerce it to a diagonal structure using pdDiag().
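[For readers unfamiliar with the pdDiag() constraint mentioned above, a minimal sketch of the idea, using the self-starting Theophylline example from the nlme documentation rather than the poster's own (unspecified) model:]

```r
library(nlme)

## Fit a first-order pharmacokinetic model to the Theoph data.
## random = pdDiag(...) forces the random-effects covariance matrix to be
## diagonal (no correlations between lKe, lKa and lCl), which often helps
## convergence on sparse data at the cost of a more restrictive model.
fm1 <- nlme(conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
            data   = Theoph,
            fixed  = lKe + lKa + lCl ~ 1,
            random = pdDiag(lKe + lKa + lCl ~ 1))

## Dropping pdDiag() in favour of a general positive-definite matrix,
## random = lKe + lKa + lCl ~ 1, estimates all the covariances as well --
## the fit the poster would prefer, but the one that is harder to converge.
summary(fm1)
```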
I note the following comment from Douglas Bates on the R-help archive
The nonlinear optimization codes used by S-PLUS and R are different. There are advantages to the code used in R relative to the code used in S-PLUS, but there are also disadvantages. One of the disadvantages is that the code in R will try very large steps during its initial exploration phase and can then get trapped in remote regions of the parameter space. For nlme this means that the estimate of the variance-covariance matrix of the random effects becomes singular.
Recent versions of the nlme library for R have a subdirectory called scripts that contains R scripts for the examples from each of the chapters in our book. If you check them you will see that not all of the nonlinear examples work in the R version of nlme. We plan to modify the choice of starting estimates and the internal algorithms to improve this but it is a long and laborious process. I ask for your patience.
Can Doug or anyone comment on whether the development work on lme4:::nlmer has included any steps in this direction or not?
Yes. The algorithm in nlme alternates between solving a linear mixed-effects problem, to update estimates of the variance components, and solving a penalized nonlinear least squares problem, to update estimates of the fixed-effects parameters and of our approximation to the conditional distribution of the random effects. This type of algorithm, which alternates between two conditional optimizations, is appealing because each sub-problem is much simpler than the general problem. However, it may have poor convergence properties. In particular, it may end up bouncing back and forth between two different conditional optima.

Also, at the time we wrote nlme we tried to remove the constraints on the variance components by transforming them away (in simple situations we iterate on the logarithm of the relative variances of the random effects). This works well except when the estimate of a variance component is zero: trying to reach zero while iterating on the logarithm scale can lead to very flat likelihood surfaces.

In the nlmer function I use the same parameterization of the variance-covariance matrix of the random effects as in lmer, and I use the Laplace approximation to the log-likelihood. Both of these changes should provide more reliable convergence, although the nlmer code has not been vetted to nearly the same extent as the nlme code. In other words, I am confident that the algorithm is superior but the implementation may still need some work.

Regarding NONMEM, I think the work Jose Pinheiro and I did on nlme, and my current work on lme4, is based on a different philosophy from the one underlying NONMEM. As I have mentioned on this and other forums (fora?), I want to be confident that the results from the code I write actually do represent an optimum of the objective function (such as the likelihood or log-likelihood). Nonlinear mixed-effects models for sparse data frequently end up being over-parameterized.
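[As a sketch of the nlmer interface described above — the model and starting values here are taken from the lme4 documentation's Orange-tree example, not from this thread, and the exact syntax has varied across lme4 versions:]

```r
library(lme4)

## nlmer takes a three-part formula: response ~ nonlinear model ~ random effects.
## Unlike nlme, nlmer requires explicit starting values for the fixed effects,
## and it parameterizes the random-effects covariance as in lmer, optimizing a
## Laplace approximation to the log-likelihood.
startvec <- c(Asym = 200, xmid = 725, scal = 350)
nm1 <- nlmer(circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree,
             data  = Orange,
             start = startvec)
summary(nm1)
```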
In such cases I view it as a feature, not a bug, that nlme or nlmer will indicate failure to converge. They may also fail to converge when there is a well-defined optimum; that behavior is not a feature.

As I understand it from people who have used NONMEM (I once had access to a copy of NONMEM but was never successful in getting it to run, and haven't tried since), it will produce estimates just about every time it is run. Considering how ill-defined the parameter estimates in some nonlinear mixed-effects model fits can be, I don't view this as a feature. Many people feel that statistical techniques and statistical software are some sort of magic that can extract information from data even when the information is not there.

As I understand it from conversations many years ago with Lewis Sheiner, his motivation in developing NONMEM (with Stu Beal) was to be able to use routine clinical data (such as the Quinidine data in the nlme package) to estimate population pharmacokinetic parameters. Routine clinical data like these are very sparse. In the Quinidine example the majority of subjects have 1, 2 or 3 concentration measurements:
> table(table(subset(Quinidine, !is.na(Quinidine$conc))$Subject))

 1  2  3  4  5  6  7 10 11
46 33 31  9  3  8  2  1  3

and frequently these measurements are taken at widely spaced time points relative to the dosing schedule. Such cases contribute almost no information to the parameter estimates, yet I have had pharmacologists suggest to me that it would be wonderful to use study designs in which each patient has only one concentration measurement, and that somehow the magic of nonlinear mixed-effects modelling will conjure estimates from such data. The real world doesn't work like that. If you have only one observation per person, it should make sense that no amount of statistical magic will be able to separate the per-observation noise from the per-person variability.

So when I am told that NONMEM converged to parameter estimates on a problem where nlme or nlmer failed to converge, I think (and sometimes say) "You mean NONMEM *declared* convergence to a set of estimates". Declaring convergence and converging can be different.
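[A hypothetical illustration of the one-observation-per-person point, not from the original post: with a single measurement per subject, the between-subject variance and the residual variance enter the marginal likelihood only through their sum, so no fit can separate them.]

```r
library(nlme)

## Simulate 50 subjects with exactly one observation each.  The total
## variance 2 is an arbitrary split of sigma_b^2 + sigma^2; any split
## produces exactly the same data distribution.
set.seed(1)
one_obs <- data.frame(Subject = factor(1:50),
                      y       = rnorm(50, mean = 10, sd = sqrt(2)))

## The profiled likelihood is flat along the ridge sigma_b^2 + sigma^2 =
## constant, so lme() may stop at an arbitrary split or fail outright;
## try() guards against the latter.
fit <- try(lme(y ~ 1, random = ~ 1 | Subject, data = one_obs,
               method = "ML"),
           silent = TRUE)
if (!inherits(fit, "try-error")) {
  ## Whatever split lme() reports, its maximized log-likelihood can do no
  ## better than a model with a single variance parameter:
  fit0 <- gls(y ~ 1, data = one_obs, method = "ML")
  print(c(lme = as.numeric(logLik(fit)), gls = as.numeric(logLik(fit0))))
}
```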