How to speed up grouping time series, help please

I ran some tests on your solution and on Gabor's; my findings are below:

- Your solution is as fast as my xts solution (below) but MUCH MORE
READABLE; in particular, I think your test should take into account
the creation of the xts object from the data.frame (see below);
- Gabor's solution with read.zoo is as fast as xts, but the resulting
object has some problems with time zones.

Any better ideas for speeding up the grouping of time series?

Thanks!

Below are a few lines of code to test (I suggest increasing the size
of X to get better comparison results):
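library(xts)

# Build one xts object from the data.frame, then split VALUE by ID
# and merge the pieces into one column per ID
xtsSplit <- function(x)
{
  x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"]))
  x <- do.call(merge, split(x$VALUE,x$ID))
  return(x)
}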

xtsSplitTime <- replicate(100,
  system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)

zooReadTime <- replicate(100,
 system.time(z <- read.zoo(X, split = 1, index = 2, tz = ""))[[1]])
median(zooReadTime)
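
The snippets above assume a data frame X with the columns ID, DATE and VALUE, in that order (read.zoo is called with split = 1, index = 2). X itself is not shown in this message, so the following is only a guess at its shape, meant for trying the timings at different sizes. The names makeX, nId and nObs are made up for this sketch, and the one-minute spacing and random values are arbitrary:

```r
# Hypothetical helper (not from the original post): builds a long-format
# data frame shaped like the X assumed by the timing code.
makeX <- function(nId = 10, nObs = 1000)
{
  # nObs timestamps, one minute apart, stored as character strings
  dates <- format(as.POSIXct("2011-01-01 00:00:00", tz = "GMT") +
    60 * (seq_len(nObs) - 1), '%Y-%m-%d %H:%M:%S', tz = "GMT")
  data.frame(
    ID = rep(paste("id", seq_len(nId), sep = ""), each = nObs),
    DATE = rep(dates, times = nId),
    VALUE = rnorm(nId * nObs),
    stringsAsFactors = FALSE)
}

X <- makeX(nId = 10, nObs = 1000)   # increase nId / nObs to grow X
str(X)
```

Every ID shares the same timestamps here, which may not match the real data; irregular or partly overlapping indexes could change the relative timings.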

And my (unreadable) solution:
library(xts)
library(plyr)  # provides daply()
buildXtsFromDataFrame <- function(x, env)
{
  # Build a one-column xts object from the rows of a single ID
  VALUE <- as.matrix(x$VALUE)
  colnames(VALUE) <- as.character(x$ID[1])
  piece <- xts(VALUE,
    as.POSIXct(x$DATE, tz = "GMT",
      format = '%Y-%m-%d %H:%M:%S'),
    tzone = 'GMT')
  # Accumulate the columns in "xx" inside env; inherits = FALSE keeps
  # exists() from finding an unrelated "xx" in an enclosing environment
  if(exists("xx", envir = env, inherits = FALSE))
  {
    assign("xx", cbind(get("xx", env), piece), envir = env)
  } else
  {
    assign("xx", piece, envir = env)
  }
  return(TRUE)
}

xtsDaply <- function(...)
{
  e1 <- new.env(parent = baseenv())
  res <- daply(X, "ID", buildXtsFromDataFrame,
      env = e1)
  return(get("xx", e1))
}

Time04 <- replicate(100,
  system.time(xtsDaply(X, X$ID))[[1]])




2011/4/4 Joshua Ulrich <josh.m.ulrich at gmail.com>: