
question about nls

4 messages · John C Nash, Gabor Grothendieck

Actually, it likely won't matter where you start. The Gauss-Newton 
direction is nearly always close to 90 degrees from the gradient, as 
can be seen by setting trace=TRUE in nlxb() from package nlmrt, which 
does a safeguarded Marquardt calculation. nlxb() can be used in place of 
nls(), except that you need to put your data in a data frame. It finds a 
solution pretty straightforwardly, though with quite a few iterations 
and function evaluations.

Of course, one may not really want to do any statistics with 4 
observations and 3 parameters, but the problem illustrates the GN vs. 
Marquardt directions.
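[Editor's note: as a rough illustration of the near-orthogonality described above, here is a small sketch (my own, not from the original message) that writes out the Jacobian of this model by hand, computes the Gauss-Newton step and a Marquardt-damped step, and reports the angle each makes with the gradient. The starting point (b = 0.01) is an arbitrary choice that keeps the Jacobian full rank; at a = b = 0 the columns for a and d coincide and the Jacobian is singular.]

```r
## Sketch: angle between the Gauss-Newton step and the gradient for
## y ~ exp(a + b*x) + d, plus a Marquardt-damped step for comparison.
x <- c(60, 80, 100, 120)
y <- c(0.8, 6.5, 20.5, 45.9)
a <- 0; b <- 0.01; d <- 1          # b != 0 so the Jacobian has full rank

e <- exp(a + b * x)
r <- y - (e + d)                   # residuals
## Jacobian of the residuals with respect to (a, b, d)
J <- cbind(a = -e, b = -x * e, d = rep(-1, length(x)))
g <- 2 * as.vector(crossprod(J, r))          # gradient of sum(r^2)

angle <- function(u, v)
  180 / pi * acos(sum(u * v) / sqrt(sum(u^2) * sum(v^2)))

delta <- qr.solve(J, -r)                     # Gauss-Newton step (least squares)
lambda <- 1                                  # illustrative damping value
delta_lm <- solve(crossprod(J) + lambda * diag(3), -crossprod(J, r))

ang    <- angle(g, delta)                    # the "g-delta-angle" of the trace
ang_lm <- angle(g, as.vector(delta_lm))
## Any descent direction makes an angle greater than 90 degrees with the
## gradient; the trace above shows the GN angle hovering just past 90.
cat("GN g-delta angle:", ang, " damped (lambda=1):", ang_lm, "\n")
```

Increasing lambda pulls the damped step away from the Gauss-Newton direction and toward steepest descent, which is the safeguard nlxb() exploits.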

JN


> sol <- nlxb(y ~ exp(a + b*x) + d, start=list(a=0, b=0, d=1), data=mydata, trace=TRUE)
formula: y ~ exp(a + b * x) + d
lower:[1] -Inf -Inf -Inf
upper:[1] Inf Inf Inf
...snip...
Data variable  y :[1]  0.8  6.5 20.5 45.9
Data variable  x :[1]  60  80 100 120
Start:lamda: 1e-04  SS= 2291.15  at  a = 0  b = 0  d = 1  1 / 0
gradient projection =  -2191.093  g-delta-angle= 90.47372
Stepsize= 1
lamda: 0.001  SS= 4.408283e+55  at  a = -25.29517  b = 0.74465  d = -24.29517  2 / 1
gradient projection =  -2168.709  g-delta-angle= 90.48307
Stepsize= 1
lamda: 0.01  SS= 3.986892e+54  at  a = -24.55223  b = 0.7284461  d = -23.55223  3 / 1
gradient projection =  -1991.804  g-delta-angle= 90.58199
Stepsize= 1
lamda: 0.1  SS= 2.439544e+46  at  a = -18.71606  b = 0.6010118  d = -17.71606  4 / 1
gradient projection =  -1476.935  g-delta-angle= 92.79733
Stepsize= 1
lamda: 1  SS= 4.114152e+23  at  a = -2.883776  b = 0.2505892  d = -1.883776  5 / 1
gradient projection =  -954.5234  g-delta-angle= 91.78881
Stepsize= 1
lamda: 10  SS= 39033042903  at  a = 2.918809  b = 0.07709855  d = 3.918809  6 / 1
gradient projection =  -264.9953  g-delta-angle= 91.41647
Stepsize= 1
<<lamda: 4  SS= 571.451  at  a = 1.023367  b = 0.01762421  d = 2.023367  7 / 1
gradient projection =  -60.46016  g-delta-angle= 90.96421
Stepsize= 1
<<lamda: 1.6  SS= 462.3257  at  a = 1.080764  b = 0.0184132  d = 1.981399  8 / 2
gradient projection =  -56.91866  g-delta-angle= 90.08103
Stepsize= 1
<<lamda: 0.64  SS= 359.6233  at  a = 1.135265  b = 0.01942354  d = 0.9995471  9 / 3
gradient projection =  -65.90027  g-delta-angle= 90.04527
Stepsize= 1

... snip ...

lamda: 0.2748779  SS= 0.5742948  at  a = -0.1491842  b = 0.03419761  d = -6.196575  31 / 20
gradient projection =  -6.998402e-25  g-delta-angle= 90.07554
Stepsize= 1
lamda: 2.748779  SS= 0.5742948  at  a = -0.1491842  b = 0.03419761  d = -6.196575  32 / 20
gradient projection =  -2.76834e-25  g-delta-angle= 90.16973
Stepsize= 1
lamda: 27.48779  SS= 0.5742948  at  a = -0.1491842  b = 0.03419761  d = -6.196575  33 / 20
gradient projection =  -4.632864e-26  g-delta-angle= 90.08759
Stepsize= 1
No parameter change
On 13-03-15 07:00 AM, r-help-request at r-project.org wrote:
On Fri, Mar 15, 2013 at 9:45 AM, Prof J C Nash (U30A) <nashjc at uottawa.ca> wrote:
Interesting observation but it does converge in 5 iterations with the
improved starting value whereas it fails due to a singular gradient
with the original starting value.
+ x    y
+ 60 0.8
+ 80 6.5
+ 100 20.5
+ 120 45.9
+ "
Error in nlsModel(formula, mf, start, wts) :
  singular gradient matrix at initial parameter estimates
Nonlinear regression model
  model:  y ~ exp(a + b * x) + d
   data:  DF
      a       b       d
-0.1492  0.0342 -6.1966
 residual sum-of-squares: 0.5743

Number of iterations to convergence: 5
Achieved convergence tolerance: 6.458e-07
As Gabor indicates, starting from a good approximation is usually 
helpful, and nls() will generally find solutions to problems where such 
starts are available, hence the SelfStart methods. Marquardt methods, by 
contrast, take a pit-bull approach to the original specification: they 
grind away at the problem without much finesse, but generally get there 
eventually. If one is solving lots of problems of a similar type, good 
starts are the way to go. For one-off problems (or when being lazy), I 
like Marquardt.
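[Editor's note: one common way to obtain a good start for this particular model (a sketch of the general idea; not necessarily the approximation Gabor used) is to ignore the offset d and fit a straight line to log(y):]

```r
## Sketch: starting values for y ~ exp(a + b*x) + d from a log-linear fit.
## If d is small relative to y, then log(y) is roughly a + b*x.
x <- c(60, 80, 100, 120)
y <- c(0.8, 6.5, 20.5, 45.9)
fit <- lm(log(y) ~ x)
st <- list(a = unname(coef(fit)[1]), b = unname(coef(fit)[2]), d = 0)
st
## These values can then be supplied to nls() or nlxb() via start = st.
```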

It would be interesting to know what proportion of random starting 
points in some reasonable bounding box produce the "singular gradient" 
message or other early termination with nls() versus a Marquardt 
approach, especially as this is a tiny problem. This is just one example 
of the trade-off R developers face between performance and robustness: 
the GN method in nls() is almost always a good deal more efficient than 
Marquardt approaches when it works, but suffers from a fairly high 
failure rate.


JN
On 13-03-15 10:01 AM, Gabor Grothendieck wrote:
I decided to follow up my own suggestion and look at the robustness of 
nls() vs. nlxb(). NOTE: this is NOT a problem nls() would usually be 
applied to. The script below is very crude, but it does illustrate that 
nls() is unable to find a solution in over 70% of tries where nlxb() (a 
Marquardt approach) succeeds.

I make no claim for elegance of this code -- very quick and dirty.

JN



debug<-FALSE
library(nlmrt)
x<-c(60,80,100,120)
y<-c(0.8,6.5,20.5,45.9)
mydata<-data.frame(x,y)
mydata
xmin<-c(0,0,0)  # intended bounding box for the random starts
xmax<-c(8,8,8)  # (unused below; this quick script draws starts from runif(3) in [0,1])
set.seed(123456)
nrep <- as.numeric(readline("Number of reps:"))
pnames<-c("a", "b", "d")
npar<-length(pnames)
# set up structure to record results
#  need start, failnls, parnls, ssnls, failnlxb, parnlxb, ssnlxb
tmp<-matrix(NA, nrow=nrep, ncol=3*npar+4)
outcome<-as.data.frame(tmp)
rm(tmp)
colnames(outcome)<-c(paste0("st-", pnames),
         "failnls", paste0("nls-", pnames), "ssnls",
         "failnlxb", paste0("nlxb-", pnames), "ssnlxb")


for (i in 1:nrep){

cat("Try ",i,":\n")

st<-runif(3)
names(st)<-pnames
if (debug) print(st)
rnls<-try(nls(y ~ exp(a + b*x)+d,start=st, data=mydata), silent=TRUE)
if (inherits(rnls, "try-error")) {
    failnls<-TRUE
    parnls<-rep(NA,length(st))
    ssnls<-NA
} else {
    failnls<-FALSE
    ssnls<-deviance(rnls)
    parnls<-coef(rnls)
}
names(parnls)<-pnames
if (debug) {
   cat("nls():")
   print(rnls)
}
rnlxb<-try(nlxb(y ~ exp(a + b*x)+d,start=st, data=mydata), silent=TRUE)
if (inherits(rnlxb, "try-error")) {
    failnlxb<-TRUE
    parnlxb<-rep(NA,length(st))
    ssnlxb<-NA
} else {
    failnlxb<-FALSE
    ssnlxb<-rnlxb$ssquares
    parnlxb<-rnlxb$coeffs
}
names(parnlxb)<-pnames
if (debug) {
   cat("nlxb():")
   print(rnlxb)
   tmp<-readline()
   cat("\n")
   }
  solrow<-c(st, failnls=failnls, parnls, ssnls=ssnls,
      failnlxb=failnlxb, parnlxb, ssnlxb=ssnlxb)
   outcome[i,]<-solrow
} # end loop

cat("Proportion of nls  runs that failed = ",sum(outcome$failnls)/nrep,"\n")
cat("Proportion of nlxb runs that failed = ",sum(outcome$failnlxb)/nrep,"\n")