Skip to content
Prev 199943 / 398502 Next

partial cumsum

William Dunlap wrote:
Nice simple function!  Here's a different approach I use that's faster for long vectors with many NA values.  Note however, that this approach can suffer from catastrophic round-off error because it does a cumsum over the whole vector (after replacing NA's with zeros) and then subtracting out the cumsum at the most recent NA values.  Most of the body of this function is devoted to allowing (an unreasonable degree of) flexibility in specification of where to reset.

cumsum.reset <- function(x, reset.at=which(is.na(x)), na.rm=F) {
    # compute the cumsum of x, resetting the cumsum to 0 at each element indexed by reset.at
    if (is.logical(reset.at)) {
        if (length(reset.at)>length(x)) {
            if ((length(reset.at) %% length(x))!=0)
                stop("length of reset.at must be a multiple of length of x")
            x <- rep(x, len=length(reset.at))
        } else if (length(reset.at)<length(x)) {
            if ((length(x) %% length(reset.at))!=0)
                stop("length of x must be a multiple of length of reset.at")
            reset.at <- rep(reset.at, len=length(x))
        }
        reset.at <- which(reset.at)
    } else if (!is.numeric(reset.at)) {
        stop("reset.at must be logical or numeric")
    }
    if (length(i <- which(reset.at<=1)))
        reset.at <- reset.at[-i]
    if (any(i <- is.na(x[reset.at])))
        x[reset.at[i]] <- 0
    if (na.rm && any(i <- which(is.na(x))))
        x[i] <- 0
    y <- cumsum(x)
    d <- diff(c(0, y[reset.at-1]))
    r <- rep(0, length(x))
    r[reset.at] <- d
    return(y - cumsum(r))
}
[1]  1  3  6  0  5 11 18 26 35 45
[1]  1  3  6 NA  5 11 18 26 35 45
The speedup from not breaking the input vector into smaller vectors is (to me) surprisingly small -- only a factor of 3:
[1] TRUE
user  system elapsed 
   0.31    0.03    0.35
user  system elapsed 
   0.99    0.05    1.15
So, I'd go with the ave() approach unless this type of cumsum is the core of a long computationally intensive job.  And if that's the case, it would make sense to code it in C.

-- Tony Plate