
adding rows without loops

"exact", really?

Is the time-consuming part the initial merge
   DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

or the postprocessing to turn runs of NAs into the last non-NA
value in the column
   while (any(is.na(DFm))) {
      if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
      ind <- which(is.na(DFm), arr.ind = TRUE)
      prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
      DFm[is.na(DFm)] <- DFm[prind]
   }

If it is the latter, you may get better performance by applying zoo::na.locf()
to each non-key column of DFm.  E.g.,
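   library(zoo)
   f2 <- function(DFm) {
      # columns 1 and 2 are the merge keys (X.DATE, X.TIME); fill the rest
      for(i in 3:length(DFm)) {
         # carry the last non-NA value forward within this column
         DFm[[i]] <- na.locf(DFm[[i]])
      }
      DFm
   }
   f2(DFm)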
gives the same result as Blaser's algorithm
   f1 <- function(DFm) {
      while (any(is.na(DFm))) {
         if (any(is.na(DFm[1, ])))
            stop("Complete first row required!")
         ind <- which(is.na(DFm), arr.ind = TRUE)
         prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
         DFm[is.na(DFm)] <- DFm[prind]
      }
      DFm
   }

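Unless there are a huge number of columns, I would guess that f2() would be
much faster: f1() rescans the whole data frame once for every step of the
longest run of NAs, while f2() handles each column with a single call.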
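
One way to check that guess is to time both approaches on the same input.
What follows is only a sketch under stated assumptions, not part of the
original exchange: the data frame is made up, every column (keys included)
is numeric, and locf_base() and f3() are hypothetical base-R stand-ins for
na.locf() and f2(), so that the example runs without zoo installed.  f1()
is the loop version quoted above.

```r
# f1(): the while-loop version quoted above
f1 <- function(DFm) {
   while (any(is.na(DFm))) {
      if (any(is.na(DFm[1, ])))
         stop("Complete first row required!")
      ind <- which(is.na(DFm), arr.ind = TRUE)
      prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
      DFm[is.na(DFm)] <- DFm[prind]
   }
   DFm
}

# Hypothetical base-R stand-in for zoo::na.locf(): carry the last
# non-NA value forward.  Assumes x[1] is not NA.
locf_base <- function(x) {
   ok <- !is.na(x)
   x[ok][cumsum(ok)]
}

# Hypothetical counterpart of f2(), built on locf_base()
f3 <- function(DFm) {
   for (i in seq_along(DFm)[-(1:2)]) {   # skip the two key columns
      DFm[[i]] <- locf_base(DFm[[i]])
   }
   DFm
}

# Made-up data: two numeric key columns, two value columns with NA runs.
# Row 1 is never set to NA, so the first row stays complete.
set.seed(1)
n <- 2000
DFm <- data.frame(X.DATE = rep(1, n), X.TIME = as.numeric(seq_len(n)),
                  A = rnorm(n), B = rnorm(n))
DFm$A[sample(2:n, n / 2)] <- NA
DFm$B[sample(2:n, n / 2)] <- NA

t1 <- system.time(r1 <- f1(DFm))
t3 <- system.time(r3 <- f3(DFm))

all.equal(r1, r3)                                   # do the results agree?
c(f1 = t1[["elapsed"]], f3 = t3[["elapsed"]])       # elapsed seconds
```

The timings depend on the machine, the number of rows and the length of
the NA runs, so they are best measured on the real DFm rather than on toy
data like this.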

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com