Message-ID: <F60BD9DD-4EB0-4B93-9CDC-ACB4E8EFECBA@xs4all.nl>
Date: 2012-11-22T16:03:10Z
From: Berend Hasselman
Subject: Data Extraction
In-Reply-To: <A9F29369832FC5489130D24565FAA6B90143C6C86245@PL-EMSMB3.ees.hhs.gov>
On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Berend,
>
> You have compared all 3 ways. ... very nicely evaluated.
>
Bert's solution is indeed nice and simple, but Petr's solution is still the quickest:
> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
> f2 <- function(df) {df[!is.na(rowSums(df)),]}
> f3 <- function(df) {df[complete.cases(df),]}
> f4 <- function(df) {data.frame(na.omit(df))}
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
>
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
> identical(d1,d4)
[1] TRUE
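
For a quick check without the benchmark, the same row filters can be tried on a tiny data frame. This is only a minimal sketch: the data frame `small` and the `keep_*` names are made up for illustration and are not part of the session above.

```r
# Hypothetical 4-row example (not the benchmark data):
# row 2 has an NA in column a, row 3 has an NA in column b.
small <- data.frame(a = c(1, NA, 3, 4), b = c(5, 6, NA, 8))

# Same three filters as f1, f2 and f3 above
keep_apply    <- small[apply(small, 1, function(x) all(!is.na(x))), ]
keep_rowsums  <- small[!is.na(rowSums(small)), ]
keep_complete <- small[complete.cases(small), ]

# Each result should contain only rows 1 and 4
print(keep_complete)
stopifnot(identical(keep_apply, keep_rowsums),
          identical(keep_apply, keep_complete))
```

One thing to keep in mind when choosing between them: the rowSums() variant relies on every column being numeric, whereas complete.cases() also accepts data frames with character or factor columns.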
Berend