Message-ID: <F60BD9DD-4EB0-4B93-9CDC-ACB4E8EFECBA@xs4all.nl>
Date: 2012-11-22T16:03:10Z
From: Berend Hasselman
Subject: Data Extraction
In-Reply-To: <A9F29369832FC5489130D24565FAA6B90143C6C86245@PL-EMSMB3.ees.hhs.gov>

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi Berend,
> 
> You have compared all 3 ways.  ... very nicely evaluated. 
> 

Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:

> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
> f2 <- function(df) {df[!is.na(rowSums(df)),]}
> f3 <- function(df) {df[complete.cases(df),]}
> f4 <- function(df) {data.frame(na.omit(df))}
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
>
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
> identical(d1,d4)
[1] TRUE
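
For anyone who wants to see what the three row filters do without running the full benchmark, here is a minimal sketch on a tiny data frame. The data frames `small` and `mixed` and the `keep_*` names are made up for illustration and are not part of the timing above. It also shows one caveat as I understand it: the rowSums() approach needs all-numeric columns, while complete.cases() also accepts character columns.

```r
# Hypothetical toy data: rows 2 and 3 each contain one NA
small <- data.frame(a = c(1, NA, 3, 4),
                    b = c(5, 6, NA, 8),
                    c = c(9, 10, 11, 12))

# Logical masks built the same way as in f1, f2 and f3
keep_apply    <- unname(apply(small, 1, function(x) all(!is.na(x))))
keep_rowsums  <- unname(!is.na(rowSums(small)))
keep_complete <- complete.cases(small)

# All three masks keep only rows 1 and 4
stopifnot(all(keep_apply == keep_complete),
          all(keep_rowsums == keep_complete))
small[keep_complete, ]

# With a character column, rowSums() stops with an error,
# so the f2 approach is limited to numeric data
mixed <- data.frame(id = c("x", NA, "z"),
                    v  = c(1, 2, NA),
                    stringsAsFactors = FALSE)
rowsums_ok <- tryCatch({ rowSums(mixed); TRUE },
                       error = function(e) FALSE)
keep_mixed <- complete.cases(mixed)   # only the first row is complete
```

The apply() version calls an R function once per row, which is the likely reason it is so much slower than the two vectorised alternatives in the benchmark.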

Berend