
checkpointing

10 messages · Brian Ripley, Gabor Grothendieck, Roger D. Peng +3 more

#
I would like to checkpoint some of my calculations in R, specifically
those using optim.  As far as I can tell, R doesn't have this facility,
and there seems to have been little discussion of it.

Checkpointing is saving enough of the current state so that work can
resume where it left off if, to take my own example, the system
crashes after 8 days of calculation.

My thought is that this could be added as an option to optim as one of
the control parameters.

I thought I'd check here to see if anyone is aware of any work in this
area or has any thoughts about how to proceed.  In particular, is save a
reasonable way to save a few variables to disk?  I could also make the
code available when/if I get it working.
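For a handful of variables, save() with the objects named explicitly, followed by load() on restart, is the usual round trip. A minimal sketch, assuming the state worth keeping is just a parameter vector and an iteration count (the variable and file names are made up for illustration):

```r
# Hypothetical file and variable names, for illustration only.
ckptFile <- file.path(tempdir(), "optim-ckpt.RData")

params <- c(1.5, -0.3, 2.0)   # current parameter estimates
iter   <- 42L                 # how far the calculation has got

# save() writes only the objects named, not the whole workspace
save(params, iter, file = ckptFile)

# ... later, e.g. in a fresh R session after a crash ...
rm(params, iter)
restored <- load(ckptFile)    # load() returns the names of the restored objects
```

save.image() would instead write the entire global environment, which is convenient but can be large.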
#
I use save.image() or save(), which seem exactly what you are asking for.
On Mon, 2 Jan 2006, Ross Boylan wrote:

            

  
    
#
On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:

            
I have the (perhaps unsupported) impression that Ross wanted to save  
the progress during the optim run. Since it spends most of its time  
in the .Internal(optim(***)) call, save/save.image would not work.

/Kasper
#
One possibility for overcoming this problem might be to divide the
variables being optimized over into two sets, put a grid over one
set (which should probably consist of only one or two variables), and then,
holding the gridded variables fixed, use optim over the rest.  In many problems it's
really just one or two variables that cause all the problems, and if that
were the case, each of the many runs of optim would be fast
and one could save its state upon completion.

Of course it would be even more convenient if there were some
builtin facility as the poster stated but this might work depending
on the particulars of the problem.
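The grid idea can be sketched as follows, assuming a toy objective fn(a, b) in which the scalar a is the troublesome variable put on a grid and b is optimized by optim() for each grid value. The objective, the grid and the file name are all hypothetical. Because the results table is saved after every grid point, rerunning the script skips the points already done.

```r
# Toy objective (hypothetical): a is gridded, b is optimized by optim().
fn <- function(a, b) (a - 1)^2 + sum((b - a)^2)

aGrid    <- seq(0, 2, by = 0.5)
ckptFile <- file.path(tempdir(), "grid-ckpt.RData")

# Resume from an earlier run if a checkpoint file exists.
if (file.exists(ckptFile)) {
  load(ckptFile)                  # restores 'results'
} else {
  results <- data.frame(a = numeric(0), value = numeric(0),
                        b1 = numeric(0), b2 = numeric(0))
}

for (a in setdiff(aGrid, results$a)) {
  # With a held fixed, each optim() run is short.
  fit <- optim(c(0, 0), function(b) fn(a, b))
  results <- rbind(results,
                   data.frame(a = a, value = fit$value,
                              b1 = fit$par[1], b2 = fit$par[2]))
  save(results, file = ckptFile)  # checkpoint after every grid point
}

best <- results[which.min(results$value), ]
```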
On 1/3/06, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
#
On Tue, 3 Jan 2006, Kasper Daniel Hansen wrote:

            
It certainly does not!  It is most likely spending time in the callbacks 
to evaluate the function/gradient.  We have used save() to save the 
current information (e.g. current parameter values) from inside optim so a 
restart could be done, but then I have only once encountered someone 
running a single optimization for over a week: there normally are ways to 
speed things up.
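A sketch of saving from inside optim so that a restart can be done, assuming a toy objective (the Rosenbrock function) and a hypothetical wrapper makeSavingObjective(). The wrapper saves the best point seen so far rather than the last one evaluated, since Nelder-Mead (and finite-difference gradients) also evaluate trial points that are worse. Only the parameters survive: the restarted optim() call builds a fresh simplex (or, for BFGS, a fresh Hessian approximation).

```r
# Toy objective (hypothetical): the Rosenbrock function.
rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

ckptFile <- file.path(tempdir(), "optim-best.RData")

# Hypothetical helper, not part of optim(): wrap fn so that every
# improvement on the best value seen so far is written to 'file'.
makeSavingObjective <- function(fn, file) {
  bestSoFar <- Inf
  function(x) {
    v <- fn(x)
    if (v < bestSoFar) {
      bestSoFar <<- v
      bestPar   <- x   # local copies so save() finds them by name
      bestValue <- v
      save(bestPar, bestValue, file = file)
    }
    v
  }
}

obj <- makeSavingObjective(rosen, ckptFile)

# First run, cut short with maxit = 50 to mimic an interrupted job.
fit1 <- optim(c(-1.2, 1), obj, control = list(maxit = 50))

# Restart: recover the best point saved so far and continue from it.
load(ckptFile)                    # restores 'bestPar' and 'bestValue'
fit2 <- optim(bestPar, obj)
```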

  
    
#
One possibility is to build some checkpointing into your objective function,
such as saving the current parameter values via 'save()' or 'dput()'.
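A sketch of the dput() variant, with hypothetical names and a toy objective. dput() writes the parameters as plain R text, so the checkpoint can be inspected by eye and read back with dget(); by default doubles are written to about 15 significant digits, which is ample for a restart. With method = "BFGS" and no gradient supplied, optim() also calls the objective at finite-difference points, so the last line written may be a slightly perturbed point rather than the current iterate.

```r
# Hypothetical names, for illustration only.
paramFile <- file.path(tempdir(), "optim-params.R")

objective <- function(x) sum((x - c(3, -1))^2)   # toy objective

checkpointed <- function(x) {
  dput(x, file = paramFile)   # overwrite with the parameters just tried
  objective(x)
}

fit <- optim(c(0, 0), checkpointed, method = "BFGS")

cat(readLines(paramFile), sep = "\n")   # human-readable R source
restart <- dget(paramFile)              # numeric vector, usable as optim()'s par
```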

-roger
Ross Boylan wrote:

  
    
#
Roger D. Peng wrote:
Has anyone successfully checkpointed and restarted R using any of the
Linux process checkpointing solutions I find when I google for 'linux
process checkpointing'? I can't see why you'd bother implementing
checkpointing within optim() if you can do it at the process level and
hence in the middle of anything.

  Unless you're running Windows.

An example and some links here:

  http://www.cise.ufl.edu/~mfoster/research/uclik/uclik.htm

Barry
#
On Jan 3, 2006, at 2:26 PM, Prof Brian Ripley wrote:

            
I stand corrected. Actually I should have thought of this.
/Kasper
#
On Tue, Jan 03, 2006 at 01:26:39PM +0000, Prof Brian Ripley wrote:
I'm having trouble following: does that sentence mean the preceding
one is wrong, or that save won't work?
Yes.
Did you do this by
* using an existing feature of optim I don't know about,
* modifying the code for optim, or
* writing an objective function that saved the parameters with which
  it was called (which, now that I think of it, might be the simplest
  approach)?

My guess was that optim keeps its state in local variables that would
not be captured by a save.image.  Are you saying the relevant
variables are saved and can be fished out if needed?

It would also probably save some time if the estimated matrix of 2nd
derivatives were saved too (I supply only the objective function, not
derivatives), but that's minor compared to having the parameter
values.
I certainly hope so.  However, the problem size is likely to remain
large.

In answer to the other question about using OS checkpointing
facilities, I haven't tried them since the application will be running
on a cluster.  More precisely, the optimization will be driven from a
single machine, but the calculation of the objective function will be
distributed.  So checkpointing at the level of the optimization
function is a good fit to my needs.  There are some cluster operating systems that
provide a kind of unified process space across the processors (scyld,
mosix), but we're not using them, and checkpointing them is an unsolved
problem.  At least, it was unsolved a couple of years ago when I
looked into it.

Ross
#
Here's some code I put together for checkpointing a function being
optimized. Hooking directly into optim would require modifying its C
code, so this seemed the easiest route.  I've wanted more information on
the iterations than optim currently provides, so this also stuffs some
info back into the calling environment (by default).

# wrapper to do checkpointing

# Ross Boylan ross at biostat.ucsf.edu
# 06-Jan-2006
# (C) 2006 Regents of University of California
# Distributed under the Gnu Public License v2 or later at your option

# If you want to checkpoint the optimization of a function f,
# use checkpoint(f) instead.  See below for other possible arguments.

# Default operation for checkpoint(fnfoo) is to record the iterations
# in fnfoo.trace in the calling environment.

# WARNING: Any existing variable whose name is given by the argument
# 'name' will be deleted from the indicated frame.
checkpoint <- function(f,
                       name = paste(deparse(substitute(f)), ".trace", sep=""),
                       fileName = deparse(substitute(f)),
                       nCalls = 1,
                       nTime = 60*15,
                       frame = parent.frame()) {
  # f is the objective function
  # frame is where to put the variable name
  # name will be a data.frame with rows containing
  #   iteration (as the row name), time, value, parameters
  # fileName is the stem of the file name to save for checkpointing;
  #   saving will alternate between files with 0 and 1 appended
  # Saving to disk will happen every nCalls calls or nTime seconds,
  #   whichever comes first
  if (exists(name, envir=frame, inherits=FALSE))
      rm(list=name, envir=frame)
  ckpt.lastSave <- 0           # alternate 0/1 for file to write to
  ckpt.lastTime <- Sys.time()  # last time saved
  function(params, ...) {
    p <- as.list(params)
    names(p) <- seq(length(params))
    if (exists(name, envir=frame, inherits=FALSE)) {
      progress <- get(name, envir=frame, inherits=FALSE)
      progress <- rbind(progress,
                        data.frame(row.names=nrow(progress)+1,
                                   time=Sys.time(), val=NA, p))
    } else
        progress <- data.frame(row.names=1, time=Sys.time(), val=NA, p)
    n <- nrow(progress)
    # Write to disk.  This happens before f is called, so the saved
    # copy has val = NA in its last row.  Elapsed time is converted to
    # seconds explicitly: a bare difference of two times is a difftime
    # whose units (secs, mins, hours, ...) are chosen automatically.
    elapsed <- as.numeric(difftime(progress[n, 1], ckpt.lastTime,
                                   units="secs"))
    if (n %% nCalls == 0 || elapsed > nTime) {
      ckpt.lastSave <<- (ckpt.lastSave+1) %% 2
      save(progress, file=paste(fileName, ckpt.lastSave, sep=""))
      ckpt.lastTime <<- progress[n, 1]
    }
    v <- f(params, ...)
    progress[n, 2] <- v
    assign(name, progress, envir=frame)
    v
  }
}
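To restart after a crash, one has to pick the newer of the two alternating files written by checkpoint(). A sketch with a hypothetical helper, latestCheckpoint(), assuming each file holds a data.frame named progress with columns time, val and then the parameters (data.frame() renames the parameter columns to X1, X2, ...). Since progress only grows, the newer file is the one with more rows; its last row has val = NA because checkpoint() saves before calling f. The usage below fakes two checkpoint files in place of real ones.

```r
# Hypothetical helper: return the 'progress' data.frame from whichever
# of <stem>0 / <stem>1 is newer (i.e. has more rows).
latestCheckpoint <- function(stem) {
  files <- paste0(stem, 0:1)
  files <- files[file.exists(files)]
  if (length(files) == 0) stop("no checkpoint files for stem ", stem)
  newest <- NULL
  for (f in files) {
    env <- new.env()
    load(f, envir = env)          # each file holds a data.frame 'progress'
    p <- get("progress", envir = env)
    if (is.null(newest) || nrow(p) > nrow(newest)) newest <- p
  }
  newest
}

# Usage with two fake checkpoint files standing in for real ones.
stem <- file.path(tempdir(), "fnfoo")
progress <- data.frame(time = Sys.time() + 0:1, val = c(5, NA),
                       X1 = c(1, 2), X2 = c(3, 4))
save(progress, file = paste0(stem, 1))
progress <- rbind(progress, data.frame(time = Sys.time() + 2, val = NA,
                                       X1 = 2.5, X2 = 4.5))
save(progress, file = paste0(stem, 0))

ck <- latestCheckpoint(stem)
# Columns 1 and 2 are time and val; the rest are the parameters.
restartPar <- unname(unlist(ck[nrow(ck), -(1:2)]))
```

restartPar can then be passed as the par argument of a fresh optim() call.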