nlme and NONMEM

3 messages · Rob Forsyth, Douglas Bates, Nathan Leon Pace, MD, MStat

#
I'd appreciate hearing from anyone (off list if you think it more appropriate) who can share their comparative experiences of non-linear mixed effects modelling with both nlme and NONMEM. The latter appears to be the traditional tool of choice, particularly in pharmacology. Having built up some familiarity with nlme, I am now collaborating (on a non-pharmacological project) with someone strongly encouraging me to move to NONMEM, although that clearly represents another considerable learning curve. The main argument in favour is the relative difficulty I have had in getting nlme models to converge on my relatively sparse datasets, particularly when (as in my case) I am interested in the random effects covariance matrix and wish to avoid having to coerce it using pdDiag().

I note the following comment from Douglas Bates on the R-help archive. Can Doug or anyone comment on whether the development work on lme4:::nlmer has included any steps in this direction or not?

Thanks

Rob Forsyth
#
On 11/1/07, Rob Forsyth <r.j.forsyth at newcastle.ac.uk> wrote:
Yes.

The algorithm in nlme alternates between solving a linear mixed-effects problem to update estimates of the variance components and solving a penalized nonlinear least squares problem to update estimates of the fixed-effects parameters and our approximation to the conditional distribution of the random effects. This type of algorithm, which alternates between two conditional optimizations, is appealing because each of the sub-problems is much simpler than the general problem. However, it may have poor convergence properties; in particular, it may end up bouncing back and forth between two different conditional optima.
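As a toy illustration (a Python sketch of alternating conditional optimization in general, not nlme's actual LME/PNLS steps), consider minimizing a strongly coupled quadratic by alternating exact conditional minimizations. Each sub-problem is trivial to solve, yet joint progress is slow:

```python
# Toy analogue of an alternating ("conditional") optimization scheme.
# f(x, y) = x^2 + y^2 + 1.9*x*y is minimized at (0, 0), but the strong
# coupling between x and y means coordinate-wise updates crawl along a
# narrow valley; in nonconvex problems the iterates can even cycle
# between two different conditional optima instead of converging.

def conditional_min_x(y):
    # argmin_x f(x, y) for fixed y: df/dx = 2x + 1.9y = 0
    return -0.95 * y

def conditional_min_y(x):
    # argmin_y f(x, y) for fixed x, by symmetry
    return -0.95 * x

x, y = 1.0, 1.0
for _ in range(50):
    x = conditional_min_x(y)
    y = conditional_min_y(x)

# After 50 full alternations the iterate is still visibly away from
# the joint optimum at (0, 0): each sweep shrinks the error by only
# a factor of 0.95**2.
print(x, y)
```

Each conditional solve is exact, but the joint error decays geometrically with a ratio close to 1, which is the flavor of the convergence trouble described above.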

Also, at the time we wrote nlme we tried to remove the constraints on the variance components by transforming them away. (In simple situations we iterate on the logarithm of the relative variances of the random effects.) This works well except when the estimate of a variance component is zero: trying to reach zero when iterating on the logarithm scale can lead to very flat likelihood surfaces.
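A minimal numerical sketch of that flatness (Python, with a made-up one-parameter marginal model, not nlme's actual profiled likelihood): once exp(theta) is negligible relative to the residual variance, enormous moves in theta barely change the log-likelihood, so a gradient-based optimizer has almost nothing to work with.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=100)   # data with zero "extra" variance

def loglik(theta, y):
    # Toy marginal log-likelihood: residual variance fixed at 1.0,
    # random-effect variance parameterized as exp(theta).
    v = 1.0 + np.exp(theta)
    return -0.5 * np.sum(np.log(2 * np.pi * v) + y**2 / v)

# Moving theta from -5 to -15 changes the log-likelihood only slightly,
# and from -15 to -30 it changes essentially not at all: the surface
# flattens out as theta heads toward -infinity (variance -> 0).
print(loglik(-5, y) - loglik(-15, y))
print(loglik(-15, y) - loglik(-30, y))
```

The true optimum at variance zero sits at theta = -infinity, which the optimizer can approach only along this near-flat plateau.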

In the nlmer function I use the same parameterization of the
variance-covariance of the random effects as in lmer and use the
Laplace approximation to the log-likelihood.  Both of these changes
should provide more reliable convergence, although the nlmer code has
not been vetted to nearly the same extent as has the nlme code.  In
other words, I am confident that the algorithm is superior but the
implementation may still need some work.
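For the Laplace approximation itself, here is a hand-rolled one-dimensional sketch (Python; the function h below is a made-up stand-in, not lme4's penalized deviance): expand h to second order around its minimizer and integrate the resulting Gaussian in closed form.

```python
import numpy as np
from math import sqrt, pi

# Laplace approximation of integral exp(-h(u)) du:
#   integral ~= exp(-h(u_hat)) * sqrt(2*pi / h''(u_hat)),
# where u_hat minimizes h. Toy stand-in for a penalized deviance:
def h(u):
    return 0.5 * u**2 + 0.1 * u**4

us = np.linspace(-5.0, 5.0, 200001)

# Crude minimizer on a grid, curvature by central finite difference.
u_hat = us[np.argmin(h(us))]
eps = 1e-4
h2 = (h(u_hat + eps) - 2 * h(u_hat) + h(u_hat - eps)) / eps**2

laplace = np.exp(-h(u_hat)) * sqrt(2 * pi / h2)

# Reference value by trapezoidal quadrature over the same grid.
f = np.exp(-h(us))
exact = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(us))

print(laplace, exact)
```

Because the quartic term makes the integrand lighter-tailed than the matched Gaussian, the Laplace value overshoots the quadrature value here; the point is only the mechanics of the approximation, not its accuracy for any particular model.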

Regarding NONMEM, I think the work Jose Pinheiro and I did on nlme and my current work on lme4 are based on a different philosophy than the one underlying NONMEM. As I have mentioned on this and other forums (fora?), I want to be confident that the results from the code that I write actually do represent an optimum of the objective function (such as the likelihood or log-likelihood). Nonlinear mixed-effects models for sparse data frequently end up being over-parameterized. In such cases I view it as a feature, and not a bug, that nlme or nlmer will indicate failure to converge. They may also fail to converge when there is a well-defined optimum; that behavior is not a feature.

As I understand it from people who have used NONMEM (I once had access
to a copy of NONMEM but was never successful in getting it to run and
haven't tried since then) it will produce estimates just about every
time it is run.  Considering how ill-defined the parameter estimates
in some nonlinear mixed-effects model fits can be, I don't view this
as a feature.

Many people feel that statistical techniques and statistical software
are some sort of magic that can extract information from data, even
when the information is not there.  As I understand it from
conversations many years ago with Lewis Sheiner, his motivation in
developing NONMEM (with Stu Beal) was to be able to use routine
clinical data (such as the Quinidine data in the nlme package) to
estimate population pharmacokinetic parameters.

Routine clinical data like these are very sparse. In the Quinidine example the majority of subjects have 1, 2 or 3 concentration measurements:

    measurements:   1  2  3  4  5  6  7 10 11
    subjects:      46 33 31  9  3  8  2  1  3

and frequently these measurements are at widely spaced time points
relative to the dosing schedule.  Such cases contribute almost no
information to the parameter estimates, yet I have had pharmacologists
suggest to me that it would be wonderful to use study designs in which
each patient has only one concentration measurement and somehow the
magic of nonlinear mixed effects will conjure estimates from such
data.

The real world doesn't work like that.  If you have only one
observation per person it should make sense that no amount of
statistical magic will be able to separate the per-observation noise
from the per-person variability.
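A quick simulation (Python; a toy one-observation-per-person model, not the Quinidine fit) makes this concrete: with a single observation per person, the marginal likelihood depends on the per-person and per-observation variances only through their sum, so any split of the total variance fits the data exactly equally well.

```python
import numpy as np

rng = np.random.default_rng(7)
# One observation per person: y_i = b_i + e_i with
# b_i ~ N(0, s2_b) (per-person effect), e_i ~ N(0, s2_e) (noise).
# Simulate with total variance 1.0.
y = rng.normal(0.0, 1.0, size=200)

def loglik(s2_b, s2_e, y):
    v = s2_b + s2_e   # marginal variance: only the SUM ever appears
    return -0.5 * np.sum(np.log(2 * np.pi * v) + y**2 / v)

# Opposite splits of the same total variance give identical
# log-likelihoods, so the two components are not separately
# identifiable from these data.
print(loglik(0.9, 0.1, y) == loglik(0.1, 0.9, y))
```

An optimizer pointed at this likelihood faces an entire ridge of equally good (s2_b, s2_e) pairs, which is exactly the kind of ill-defined estimate the text warns about.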

So when I am told that NONMEM converged to parameter estimates on a
problem where nlme or nlmer failed to converge I think (and sometimes
say) "You mean NONMEM *declared* convergence to a set of estimates".
Declaring convergence and converging can be different.
#
Hi All,

This thread reminds me of an experience using nlme about 10 years ago. I was re-modeling a previously analyzed (and published) pharmacokinetic data set on the drug remifentanil; NONMEM had been used to estimate a 3-compartment (6-parameter) model. The data included multiple plasma concentration values for each subject.

Using nlme, no convergence was possible for a three-compartment model despite various choices of control settings and covariance structure. A two-compartment model converged.

Doug provided very useful tips to me at the time. For example, a visual inspection of the raw data (the time course of remifentanil concentration decay) revealed only one inflection point in the decay curves for most subjects, whereas two inflection points would be consistent with a three-compartment model. The data were not sufficient to fit a three-compartment model.

I have never used NONMEM, but associates who used it in the 90s assured me that NONMEM could always be tweaked to converge. This was considered a virtue.

This is an example of NONMEM allowing over-parameterized models.

Nathan