Data Extraction
On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Berend, You have compared all 3 ways. ... very nicely evaluated.
Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:
N <- 100000
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
library(rbenchmark)
f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
f2 <- function(df) {df[!is.na(rowSums(df)),]}
f3 <- function(df) {df[complete.cases(df),]}
f4 <- function(df) {data.frame(na.omit(df))}
benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
identical(d1,d2)
[1] TRUE
identical(d1,d3)
[1] TRUE
identical(d1,d4)
[1] TRUE

Berend
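
A note on when the four results stop being identical: the test data above contains only integers and NA, so the rowSums() approach (f2) and complete.cases() (f3) pick the same rows. That need not hold in general. Below is a minimal sketch with a made-up data frame called small (hypothetical, not part of the benchmark) in which one row holds Inf and -Inf but no NA.

```r
# Hypothetical toy data, not from the thread:
# row 2 contains an NA, row 3 contains Inf and -Inf but no NA.
small <- data.frame(a = c(1, NA, Inf, 4),
                    b = c(2, 5, -Inf, 6))

# Same definitions as in the benchmark above.
f2 <- function(df) {df[!is.na(rowSums(df)),]}
f3 <- function(df) {df[complete.cases(df),]}

# complete.cases() only looks for missing values, so row 3 is kept.
kept_f3 <- rownames(f3(small))

# Inf + -Inf is NaN, and is.na(NaN) is TRUE, so f2 also drops row 3.
kept_f2 <- rownames(f2(small))

print(kept_f3)
print(kept_f2)
```

So f2 is best read as a shortcut for all-numeric data without mixed infinities. rowSums() also stops with an error if the data frame has a character or factor column, whereas complete.cases() handles those.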