Skip to content
Prev 226161 / 398500 Next

anyone know why package "RandomForest" na.roughfix is so slow??

Here's another version that's a bit easier to read:

na.roughfix2 <- function (object, ...) {
  res <- lapply(object, roughfix)
  structure(res, class = "data.frame", row.names = seq_len(nrow(object)))
}

roughfix <- function(x) {
  missing <- is.na(x)
  if (!any(missing)) return(x)

  if (is.numeric(x)) {
    x[missing] <- median.default(x[!missing])
  } else if (is.factor(x)) {
    freq <- table(x)
    x[missing] <- names(freq)[which.max(freq)]
  } else {
    stop("na.roughfix only works for numeric or factor")
  }
  x
}

I'm cheating a bit because as.data.frame is so slow.

Hadley
On Thu, Jul 1, 2010 at 6:44 PM, Mike Williamson <this.is.mvw at gmail.com> wrote: