
How can I do this better? (Filling in last traded price for NA)

Hi Ajay,

You will probably get other suggestions,
but here is one that uses 'rle' and 'rep'
to speed things up.

fillIn2 <- function(x)
{ bef <- x # keep a copy for display purposes only.
  xRle <- rle(is.na(x))
  # get indices where each NA run starts (low) and stops (upp)
  upp <- (sumX <- cumsum(xRle$lengths))[xRle$values]
  low <- sumX[which(xRle$values)-1]+1
  # special case: no NAs, or NAs at the start _only_,
  # i.e. c(NA, ..., NA, notNA, ..., notNA): nothing to fill
  if (length(low) == 0) return(cbind(before = x , after = x))
  # special case: NAs at the start and elsewhere;
  # drop the leading run, which has no earlier value to carry forward
  if (length(upp) == length(low)+1) upp <- upp[-1]
  # Critical bit is 'rep' on the RHS: it repeats the index of the last
  # non-NA value before each run once for every NA in that run.
  # On the LHS, don't replace NAs at the start, if any.
  ind <- low[1]-1
  x[ind + which(is.na(x[-seq(ind)]))] <- x[rep(low-1, upp-low+1)]
  cbind(before = bef , after = x) # show off before and after effect
}
set.seed(123)
x <- 1:10
x[sample(length(x), floor(length(x)/2))] <- NA
fillIn2(x)

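should produce (with the sample() algorithm used before R 3.6.0;
newer versions of R draw different NA positions from the same seed)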
      before after
 [1,]      1     1
 [2,]      2     2
 [3,]     NA     2
 [4,]     NA     2
 [5,]      5     5
 [6,]     NA     5
 [7,]     NA     5
 [8,]     NA     5
 [9,]      9     9
[10,]     10    10
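To see what the rle()/cumsum()/rep() bookkeeping in fillIn2 does, here it is step by step on the same values as in the output above. This is a sketch that assumes there is no leading NA, so the plain x[is.na(x)] on the left-hand side is safe and neither special case is needed.

```r
# Step-by-step view of the run bookkeeping used in fillIn2,
# on a vector with no leading NA (so no special cases are needed).
x <- c(1, 2, NA, NA, 5, NA, NA, NA, 9, 10)
r <- rle(is.na(x))
r$lengths                              # 2 2 1 3 2
r$values                               # FALSE TRUE FALSE TRUE FALSE
ends <- cumsum(r$lengths)              # last index of every run: 2 4 5 8 10
upp  <- ends[r$values]                 # last index of each NA run: 4 8
low  <- ends[which(r$values) - 1] + 1  # first index of each NA run: 3 6
src  <- rep(low - 1, upp - low + 1)    # donor index for each NA: 2 2 5 5 5
x[is.na(x)] <- x[src]
x                                      # 1 2 2 2 5 5 5 5 9 10
```

If the vector starts with an NA run, which(r$values) - 1 contains a 0, which silently drops out of the subscript, so upp and low no longer pair up; that mismatch is what the two special cases in fillIn2 deal with.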

The code is a bit clunky and needs special cases,
so it is probably not optimal.
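One way to avoid the special cases altogether is a different technique from the rle()/rep() one above: take a running maximum over the positions of the non-NA values, which gives, for every element, the index of the most recent observed value. A minimal sketch, with the hypothetical name locf(); like fillIn2, it leaves leading NAs as NA.

```r
# Last observation carried forward via cummax() (hypothetical helper,
# not the rle()/rep() method from this post).
locf <- function(x) {
  # position of each non-NA element, 0 where x is NA
  pos <- seq_along(x) * !is.na(x)
  # running maximum = index of the latest non-NA value seen so far
  last <- cummax(pos)
  # leading NAs have no earlier value: map index 0 to NA so they stay NA
  last[last == 0] <- NA
  x[last]
}
locf(c(NA, 1, 2, NA, NA, 5, NA, NA, NA, 9, 10))
# returns NA 1 2 2 2 5 5 5 5 9 10
```

Because last is an integer vector, x[last] with NA entries returns NA at those positions rather than recycling, so no branch is needed for leading NAs or for a vector with no NAs at all.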

However, it is faster than, say, this version using 'mapply':

fillIn <- function(x)
{ bef <- x
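  xRle <- rle(is.na(x))
  # same run boundaries as in fillIn2
  upp <- cumsum(xRle$lengths)[xRle$values]
  low <- cumsum(xRle$lengths)[which(xRle$values)-1]+1
  # drop a leading NA run, which has no earlier value to carry forward
  if (length(upp) == length(low)+1) upp <- upp[-1]
  # fill one run per call; '<<-' assigns to the 'x' in fillIn's frame
  mapply(function(l, u) x[l:u] <<- x[l-1], low, upp)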
  cbind(before = bef , after = x) # show off before and after effect
}
fillIn(x)
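The mapply() call above is used only for its side effect through '<<-'. For comparison, here is a sketch of the same run-by-run fill written as a plain for loop without '<<-'. fillInLoop is a hypothetical name, and it returns only the filled vector rather than the before/after matrix; no speed claim is made for it.

```r
# Loop-based variant of fillIn (hypothetical, not from the original post):
# same NA runs, filled with an explicit for loop instead of mapply() + '<<-'.
fillInLoop <- function(x) {
  r <- rle(is.na(x))
  ends <- cumsum(r$lengths)            # last index of each run
  starts <- ends - r$lengths + 1       # first index of each run
  for (k in which(r$values)) {         # visit only the NA runs
    # a run starting at index 1 has no earlier value, so leave it as NA
    if (starts[k] > 1) x[starts[k]:ends[k]] <- x[starts[k] - 1]
  }
  x
}
fillInLoop(c(NA, 1, NA, NA, 4, NA))
# returns NA 1 1 1 4 4
```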

Some simulations to compare times,
based on vectors of varying lengths with 50% of the elements set to NA:

simFillIn <- function(n, method = c("rep", "mapply"))
{ aa <- rpois(n, 5)
  aa[sample(seq(n), floor(n * .5))] <- NA
  method <- match.arg(method)
  ansTime <- system.time(ans <- 
    switch(method,
      mapply = fillIn(aa),
      rep = fillIn2(aa), 
      stop("wrong method")
  )) # close switch() and system.time()
  list(time = ansTime) # add 'ans = ans' to also return the filled result
}
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "mapply")
lapply(ans, "[[", "time")
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "rep")
lapply(ans, "[[", "time")

fillIn (with 'mapply') seems to be at least 10 times slower
than fillIn2 (with 'rep').

Regards,

John.

John Gavin <john.gavin at ubs.com>,
Quantitative Risk Models and Statistics,
UBS Investment Bank, 6th floor, 
100 Liverpool St., London EC2M 2RH, UK.
Phone +44 (0) 207 567 4289
Fax   +44 (0) 207 568 5352
Ajay Shah wrote:

            