recode: how to avoid nested ifelse

6 messages · Joshua Wiley, Neal Fultz, Paul Johnson

#
Hi Paul,

Unless you have truly offended the data-generating oracle*, the
pattern NA, 1, NA should be a data entry error: graduating HS
implies graduating ES, no?  I would argue fringe cases like that
should be corrected in the data, not through coding workarounds.
Then you can just do:

x <- do.call(paste0, list(es, hs, cg))
none   es   hs   cg
   4    1    1    2
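
A minimal sketch of how the pasted patterns could be tallied into counts like those above. The vectors are hypothetical (not Paul's data) and were chosen so the tallies match; the levels/labels mapping in factor() is an assumption about the step between the paste0 call and the printed table.

```r
# Hypothetical 0/1 indicators, consistent by construction
# (cg implies hs, hs implies es). Not the original poster's data.
es <- c(0, 1, 1, 1, 0, 1, 0, 0)
hs <- c(0, 0, 1, 1, 0, 1, 0, 0)
cg <- c(0, 0, 0, 1, 0, 1, 0, 0)

# One string per case, e.g. "110" = finished es and hs but not cg
x <- do.call(paste0, list(es, hs, cg))

# Assumed mapping: only the four consistent patterns are valid levels.
# Any other pattern (e.g. "NA1NA" or "001") becomes NA in the factor,
# which flags it for correction in the data.
edu <- factor(x,
              levels = c("000", "100", "110", "111"),
              labels = c("none", "es", "hs", "cg"))
table(edu)
# counts: none = 4, es = 1, hs = 1, cg = 2
```

Listing only the valid patterns as levels means inconsistent rows are not silently recoded; they drop out of the table and can be found with `which(is.na(edu))`.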

Cheers,

Josh

*Drawn from comments by Judea Pearl during one lively session.
On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:

  
    
#
I would do this to get the highest non-missing level:

x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
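
To make the effect of na.rm concrete, here is a small sketch on hypothetical vectors (not the original data) that include the NA, 1, NA pattern discussed earlier in the thread.

```r
# Hypothetical indicators; case 5 is the pattern es = NA, hs = 1, cg = NA,
# case 6 is missing on all three.
es <- c(0, 1, 1, 1, NA, NA)
hs <- c(0, 0, 1, 1,  1, NA)
cg <- c(0, 0, 0, 1, NA, NA)

# na.rm = TRUE: missing values are ignored, so case 5 is coded from hs
# alone, and the constant 0 makes an all-missing case come out as 0.
x_rm <- pmax(3*cg, 2*hs, es, 0, na.rm = TRUE)
x_rm
# 0 1 2 3 2 0

# na.rm = FALSE: any missing input makes the result missing.
x_keep <- pmax(3*cg, 2*hs, es, 0, na.rm = FALSE)
x_keep
# 0 1 2 3 NA NA
```

Note that with na.rm=TRUE an entirely missing case is indistinguishable from "none", which is one reason to prefer na.rm=FALSE when missingness should be preserved.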

rock chalk...

-nfultz
On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
#
I still argue for na.rm=FALSE, but that is cute, and also substantially faster:

f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, x1, 0, na.rm=FALSE)
f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)

system.time(replicate(1000, f1(cg, hs, es)))
system.time(replicate(1000, f2(cg, hs, es)))
system.time(replicate(1000, f3(cg, hs, es)))
system.time(replicate(1000, f4(cg, hs, es)))
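> system.time(replicate(1000, f1(cg, hs, es)))
   user  system elapsed
  22.73    0.03   22.76
> system.time(replicate(1000, f2(cg, hs, es)))
   user  system elapsed
   0.92    0.04    0.95
> system.time(replicate(1000, f3(cg, hs, es)))
   user  system elapsed
   0.19    0.02    0.20
> system.time(replicate(1000, f4(cg, hs, es)))
   user  system elapsed
   0.95    0.03    0.98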


R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
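
The timings say nothing about whether the approaches return the same codes. A minimal sketch of such a check, on hypothetical clean data (not the benchmark vectors): it assumes the pmax weights are tied to argument positions with the lowest level first, so the functions are called as f(es, hs, cg) here.

```r
# pmax, Reduce and rowSums versions; x1 = lowest level, x3 = highest
f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, x1, 0, na.rm = FALSE)
f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

# Hypothetical consistent data: each level implies the ones below it,
# and the last case is missing on all three indicators.
es <- c(0, 1, 1, 1, NA)
hs <- c(0, 0, 1, 1, NA)
cg <- c(0, 0, 0, 1, NA)

r2 <- f2(es, hs, cg)
r3 <- f3(es, hs, cg)
r4 <- f4(es, hs, cg)

# On consistent data all three give 0, 1, 2, 3, NA
all.equal(r2, r3)
all.equal(r2, unname(r4))
```

For the sum-based versions the argument order is irrelevant; for the pmax version it is not, since each position carries a different weight.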
On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfultz at gmail.com> wrote:

  
    
#
rowSums and Reduce will have the same problems with the bad data you alluded to earlier, e.g.
cg = 1, hs = 0: the sum then no longer identifies the highest level.

But that's something to check for with crosstabs anyway.
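
A short sketch of both points, on hypothetical data (not from the thread): the sum-based coding cannot tell an inconsistent case from a valid one, and a crosstab exposes it.

```r
# Hypothetical data: case 2 is inconsistent (cg = 1 but hs = 0 and es = 0)
es <- c(1, 0, 1)
hs <- c(0, 0, 1)
cg <- c(0, 1, 0)

# Sum-based coding: cases 1 (es only) and 2 (cg only) both come out as 1
s <- rowSums(cbind(es, hs, cg))
s
# 1 1 2

# Weighted pmax coding: case 2 is coded 3, the highest level present
p <- pmax(3*cg, 2*hs, es, 0)
p
# 1 3 2

# Crosstab check: any count in the hs = 0, cg = 1 cell is suspect
tab <- table(hs = hs, cg = cg)
tab["0", "1"]
# 1
```

Neither coding repairs the inconsistency; the crosstab only shows which combinations need to be looked at in the data.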


Side note: you should check out the microbenchmark pkg, it's quite handy.


R>require(microbenchmark)
R>microbenchmark(
+   f1(cg,hs,es),
+   f2(cg,hs,es),
+   f3(cg,hs,es),
+   f4(cg,hs,es)
+ )
Unit: microseconds
           expr       min         lq     median         uq       max neval
 f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112   100
 f2(cg, hs, es)   730.665   755.5750   811.7445   934.3320  6179.798   100
 f3(cg, hs, es)    85.029   101.6785   129.8605   196.2835  2820.187   100
 f4(cg, hs, es)   762.232   804.4850   843.7170  1079.0800 24869.548   100
On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
1 day later