Skip to content

caching frequently used values

8 messages · Tamas K Papp, Robert Gentleman, Seth Falcon +2 more

#
Hi,

I am trying to find an elegant way to compute and store some
frequently used matrices "on demand".  The Matrix package already uses
something like this for storing decompositions, but I don't know how
to do it.

The actual context is the following:

A list has information about a basis of a B-spline space (nodes,
order) and gridpoints at which the basis functions would be evaluated
(not necessarily the nodes).  Something like this:

bsplinegrid <- list(nodes=1:8,order=4,grid=seq(2,5,by=.2))

I need the design matrix (computed by splineDesign) for various
derivatives (not necessarily known in advance), to be calculated by
the function

bsplinematrix <- function(bsplinegrid, deriv=0) {
  x <- bsplinegrid$grid
  Matrix(splineDesign(bslinegrid$knots, x, ord=basis$order,
                      derivs = rep(deriv,length(x))))
}

However, I don't want to call splineDesign all the time.  A smart way
would be storing the calculated matrices in a list inside bsplinegrid.
Pseudocode would look like this:

bsplinematrix <- function(bsplinegrid, deriv=0) {
  if (is.null(bsplinegrid$matrices[[deriv+1]])) {
    ## compute the matrix and put it in the list bsplinegrid$matrices,
    ## but not of the local copy
  }
  bsplinegrid$matrices[[deriv+1]]
}

My problem is that I don't know how to modify bsplinegrid$matrices
outside the function -- assignment inside would only modify the local
copy.

Any help would be appreciated -- I wanted to learn how Matrix does it,
but don't know how to display the source with s3 methods (getAnywhere
doesn't work).

Tamas
#
the idea you are considering is also, at times, referred to as 
memoizing. I would not use a list, but rather an environment, and 
basically you implement something that first looks to see if there is a 
value, and if not, compute and store. It can speed things up a lot in 
some examples (and slow them down a lot in others).

Wikipedia amongst other sources:
  http://en.wikipedia.org/wiki/Memoization

Environments have advantages over lists here (if there are lots of 
matrices the lookup can be faster - make sure you use hash=TRUE), and 
reference semantics, which you probably want.
Tamas K Papp wrote:

  
    
#
Hi Robert,

Thanks for your answer.  I would create and environment with
new.env(), but how can I assign and retrieve values based on a
numerical index (the derivative)?  The example of the help page of
assign explicitly shows that assign("a[1]") does not work for this
purpose.

Thanks,

Tamas
On Wed, Dec 13, 2006 at 01:54:28PM -0800, Robert Gentleman wrote:

            
#
e1 = new.env(hash=TRUE)

e1[["1"]] = whateveryouwant

ie. just transform to characters, but I don't see why you want to do 
that - surely there are more informative names to be used -
Tamas K Papp wrote:

  
    
#
On Wed, Dec 13, 2006 at 03:05:46PM -0800, Robert Gentleman wrote:

            
Because they are derivatives, and best indexed by numbers.  I wrote an
example to demonstrate what I think the solution is (for memoizing
powers of numbers).  It works, but I am not an experienced programmer:
can you please look at it to check that I do things right and do not
abuse any feature of R?

## memoize powers of integers

createpowerlist <- function(n) {
  list(n=n,env=new.env(hash=TRUE))
}

getpower <- function(powerlist,i) {
  iname <- as.character(i)
  if (exists(iname,powerlist$env))
    get(iname,powerlist$env)
  else {
    res <- i^powerlist$n                # result
    assign(iname,res,powerlist$env)
    res
  }
}

cubelist <- createpowerlist(3)
exists("12",cubelist$env)               # FALSE
getpower(cubelist,12)                   # 1728
exists("12",cubelist$env)               # TRUE


Thanks,

Tamas
#
Tamas K Papp <tpapp at princeton.edu> writes:
I would do:
  new.env(hash=TRUE, parent=emptyenv())
That's fine, but you could do:

  get(iname, powerlist$env) ==> powerlist$env[[iname]]

  assing(iname, res, powerlist$env) ==> powerlist$env[[iname]] <- res

Perhaps that's easier to read since is familiar to those who have used
lists?
#
I use the R.oo Object class for what has been suggested previously.
The Object class can be thought of as utility wrapper class for
environments (actually environments gained much of its behavior some
time ago when "$" etc was being mapped to get() calls).

For caching to file, take a look at the R.cache package.  From ?loadCache:

 simulate <- function(mean, sd) {
       # 1. Try to load cached data, if already generated
       key <- list(mean, sd)
       data <- loadCache(key)
       if (!is.null(data)) {
         cat("Loaded cached data\n")
         return(data);
       }

       # 2. If not available, generate it.
       cat("Generating data from scratch...")
       data <- rnorm(1000, mean=mean, sd=sd)
       Sys.sleep(1)             # Emulate slow algorithm
       cat("ok\n")
       saveCache(data, key=key, comment="simulate()")

       data;
     }

     data <- simulate(2.3, 3.0)
     data <- simulate(2.3, 3.5)
     data <- simulate(2.3, 3.0) # Will load cached data
hexadecimal hashcode (using the digest package), which is used as a
filename in the .Rcache/ directory.  Thus, you don't have to worry
about coming up with unique cache names; the chances for a false hit
using the digest algorithms are very small. As long as the 'key'
object is not too large, this procedure is very fast.

Then, combining both in-memory and on-file caching you can start doing
clever this.  In my already too long wish list, I have some ideas on a
special class for in-memory/on-file caching which can be used
globally. For instance, when we run out of memory, in-memory cached
values can be stored to file or discarded and so on.  Now probably
someone says this is just about reinventing the swap of already well
developed OSes; I would say we can do much better than a generic swap
since we know what the data is and how long it takes to recalculate it
etc.

Hope this helps

Henrik



Combining the k
On 12/14/06, Tamas K Papp <tpapp at princeton.edu> wrote:
2 days later
#
See:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83547.html
On 12/13/06, Tamas K Papp <tpapp at princeton.edu> wrote: