(no subject)

2 messages · Nick Manginelli, Dennis Murphy

Hi:

To quote one of the sages of this list: 'Loops? We don't need no
steenking loops!!'.

Here's one way to do what you were asking, using a two-pass approach.
Generate some random data, then use the sample() function to pick 20 indices
and set those positions of the original vector to NA. In the first pass,
replace each missing value with the value that precedes it (an ifelse() call
handles the case where position 1 is missing, since it has no predecessor).
Any value whose predecessor was itself missing is still NA at that point, so
the second pass replaces the remaining NAs with the vector's mean.

# Generate 100 random Poisson(10) values
x <- rpois(100, 10)
# Get the indices to set to NA
midx <- sample(length(x), 20)
# Replace x[midx] with NA
x[midx] <- NA
# If first value of x is NA, keep NA, else replace missing value
# by previous value
x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
# Replace remaining NAs with the vector's mean
x[is.na(x)] <- mean(x, na.rm = TRUE)
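
As a small check of the two passes, here is the same logic run on a hand-made
vector instead of random data. The values in x and the use of which(is.na(x))
to recover midx are assumptions made purely for illustration; they are not
taken from the original question.

```r
# Toy vector with NAs at positions 1, 3 and 4 (illustrative values only)
x <- c(NA, 5, NA, NA, 8)
midx <- which(is.na(x))

# Pass 1: copy the preceding value; position 1 has none, so it stays NA.
# Position 4 also stays NA, because the right-hand side is evaluated
# before the assignment, while x[3] is still missing.
x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
pass1 <- x

# Pass 2: fill whatever is still missing with the mean of the rest
x[is.na(x)] <- mean(x, na.rm = TRUE)

print(pass1)
print(x)
```

After pass 1 only position 3 has been filled (with 5), and pass 2 fills
positions 1 and 4 with 6. Note that the mean is computed after pass 1, so the
carried-forward values count towards it.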

To repeat the whole procedure 1000 times, wrap it up into a function and then
use the raply() function in plyr or the replicate() function in base R to
run it and collect the results in a 1000 x 100 matrix (one simulated vector
per row):

hdimp <- function() {
  x <- rpois(100, 10)
  midx <- sample(length(x), 20)
  x[midx] <- NA
  x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

library(plyr)
u <- raply(1000, hdimp)

An alternative is to use the replicate() function:

v <- t(replicate(1000, hdimp()))

The latter approach is about 20% faster in my tests.
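
To see why the t() is needed, and to sanity-check the result, a sketch along
these lines may help. It redefines hdimp() so that it runs on its own. The
seed and the timing calls are my additions for illustration, and the timings
will vary with the machine and the plyr version, so treat them as a rough
guide only.

```r
hdimp <- function() {
  x <- rpois(100, 10)
  midx <- sample(length(x), 20)
  x[midx] <- NA
  x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

set.seed(1)  # arbitrary seed, only so the run is reproducible

# replicate() stores each run as a column, giving a 100 x 1000 matrix ...
r <- replicate(1000, hdimp())
# ... so transpose it to get one simulated vector per row
v <- t(r)

print(dim(r))
print(dim(v))
print(anyNA(v))  # no NAs should survive the two passes

# Rough timing comparison; the plyr part is skipped if plyr is not installed
print(system.time(t(replicate(1000, hdimp()))))
if (requireNamespace("plyr", quietly = TRUE)) {
  print(system.time(plyr::raply(1000, hdimp())))
}
```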

HTH,
Dennis
On Fri, May 6, 2011 at 2:32 PM, Nick Manginelli <themang99 at yahoo.com> wrote: