
non-differentiable evaluation points in nlminb(), follow-up of PR#15052

4 messages · Ravi Varadhan, Spencer Graves, Sebastian Meyer

#
This is a follow-up question for PR#15052
<http://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15052>

There is another thing I would like to discuss wrt how nlminb() should
proceed with NAs. The question is: what is a sensible way to deal
with an evaluation point of the objective function where the gradient
and the Hessian are not well defined?

If the gradient and the Hessian both return NA values (assuming R <
r60789, e.g. R 2.15.1), and likewise if both return +Inf values, nlminb
steps to an NA parameter vector.
Here is a deliberately artificial one-dimensional example for demonstration:

f <- function (x) {
  cat("evaluating f(", x, ")\n")
  if (is.na(x)) {
    Inf  # to prevent an infinite loop for R < r60789
  } else abs(x)
}
gr <- function (x) if (abs(x) < 1e-5) Inf else sign(x)
hess <- function (x) matrix(if (abs(x) < 1e-5) Inf else 0, 1L, 1L)
trace(gr)
trace(hess)
nlminb(5, f, gr, hess, control=list(eval.max=30, trace=1))

Thus, if nlminb reaches a point where the derivatives are not defined,
the optimization is effectively lost. Is there a way to deal with such
points in nlminb? Otherwise, the objective function is doomed to an
emergency stop() if it receives NA parameters (see the sketch below),
because nlminb will not recover, regardless of the value the objective
function subsequently returns.
As far as I can assess the situation, nlminb is currently not capable
of optimizing objective functions with non-differentiable points.
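
To make that concrete, here is a minimal sketch of such an emergency
stop() (f_guarded is a hypothetical name, not code from my actual model):

f_guarded <- function (x) {
  # abort as soon as nlminb steps to an NA parameter vector,
  # since it will not recover from this state anyway
  if (any(is.na(x))) stop("nlminb stepped to an NA parameter vector")
  abs(x)
}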

Best regards,
  Sebastian Meyer
1 day later
#
Can you provide a correct/sensible example that illustrates the problem?

Your gradient function is wrong.  So, how do you expect the algorithms to work?  

Why is the gradient Inf when |x| < 1e-5?  It should be 0.

Here the following works fine:

require(optimx)

f <- function (x) {
if(is.na(x)) Inf else abs(x)
}

gr <- function (x) if (abs(x) < 1e-5) 0 else sign(x)

hess <- function (x) matrix(0, 1L, 1L)  # second derivative of |x| is 0 wherever it exists

nlminb(5, f, gr, hess, control=list(eval.max=30, trace=1))

ans <- optimx(par=5, fn=f, gr=gr, control=list(all.methods=TRUE))

Ravi
#
On 9/26/2012 2:13 AM, Sebastian Meyer wrote:
Are you familiar with the CRAN Task View on Optimization and
Mathematical Programming?  I ask because, as far as I know, "nlminb" is
one of the oldest nonlinear optimizers in R.  If I understand the
history, it was ported from S-Plus after at least one individual on the
R Core team decided it was better for a certain task than "optim", and
it seemed politically too difficult to enhance "optim".  Other nonlinear
optimizers have been developed more recently and are available in
specialized packages.


       In my opinion, a function like "nlminb" should never stop because
it gets NA for a derivative at some point -- unless that point honestly
happened to be a local optimum.  If a function like "nlminb" computes an
NA for a derivative at a point that is not a local optimum, it should
fall back to a derivative-free optimizer and then try to compute the
derivative again at the optimum found.


       Also, any general optimizer that uses analytic derivatives should
check that the analytic derivatives are reasonably close to numeric
derivatives.  This can easily be done using the compareDerivatives
function in the maxLik package; see the sketch below.
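
A minimal sketch of such a check (the toy objective and gradient are
illustrative, not from this thread), assuming maxLik is installed:

library(maxLik)
f  <- function(x) sum(x^2)  # toy objective
gr <- function(x) 2 * x     # its analytic gradient
# prints a side-by-side comparison of gr() and a numeric gradient at t0
compareDerivatives(f, gr, t0 = c(1, 2, 3))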


       Hope this helps.
       Spencer

1 day later
#
Thanks for the pointers to the packages optimx and maxLik.
Up to now I have actually not spent much time searching for elaborate
R packages specifically dealing with optimization problems, but was
simply satisfied using the optim methods or nlminb. I chose nlminb
because it can take advantage of the analytical Hessian, which optim
cannot. In my experience with numerical log-likelihood maximizations it
works pretty well and is more efficient (in the sense of faster and
more precise) in finding an optimum than the optim methods (if both
analytical derivatives are provided).
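
To illustrate the difference with a minimal sketch (a toy quadratic,
not from my actual model): nlminb() accepts an analytic Hessian, whereas
optim() has no such argument and can only return a numerically
differentiated Hessian via hessian = TRUE.

f    <- function(x) sum((x - 1)^2)      # toy objective
gr   <- function(x) 2 * (x - 1)         # analytic gradient
hess <- function(x) diag(2, length(x))  # analytic (constant) Hessian
nlminb(c(5, 5), f, gradient = gr, hessian = hess)
optim(c(5, 5), f, gr = gr, method = "BFGS", hessian = TRUE)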

The situation where I was confronted with non-differentiable evaluation
points involved an iterative optimization alternating between a penalized
likelihood of regression coefficients (about 430) and a marginal
likelihood of 6 variance parameters for spatio-temporal data (I won't
bother you with any more details). If the starting values of the
regression coefficients were bad, the marginal likelihood looked really
irregular, with multiple local non-differentiable maxima at which my
analytical gradient and Hessian were not well defined (a high-dimensional
matrix that needs to be inverted became (numerically) singular). However,
returning NA from the gradient or Hessian function to nlminb was not
helpful, as illustrated by my very simple _artificial_ example (@Ravi: I
know that the gradient is not correct, but it illustrates how nlminb can
get lost in NAs). Meanwhile, I have probably solved the problem by simply
continuing with the generalized inverse implemented in MASS::ginv, which
got the algorithm back to work in my case (see the sketch below). A
proper alternative would be to exit from nlminb and switch to Nelder-Mead
at that point.
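
A minimal sketch of that fallback (safe_inverse is a hypothetical
helper, not the actual code from my model):

library(MASS)
safe_inverse <- function(M) {
  # try the ordinary inverse first; if M is (numerically) singular,
  # fall back to the Moore-Penrose generalized inverse
  tryCatch(solve(M), error = function(e) ginv(M))
}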

Best regards,
   Sebastian