How to create a new column based on the values from multiple columns which are matching a particular string?

While Eric's solution is correct (modulo "corner" cases such as a row
of all NAs), it can be made considerably more efficient.

One minor improvement can be made by using the idiom
any(x == "A")
instead of matching via %in% for the simple case of matching just a
single value.
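As a small illustration of how the two idioms differ when NA is
present (the vectors x and y are made up for this example, not taken
from the original data):

```r
x <- c("B", NA, "A")   # contains the target value
y <- c("B", NA)        # does not contain it

in_x  <- "A" %in% x    # TRUE; %in% never returns NA
any_x <- any(x == "A") # TRUE; one TRUE outweighs the NA
in_y  <- "A" %in% y    # FALSE
any_y <- any(y == "A") # NA; the missing value might have been "A"

# Dropping NAs first makes any() agree with %in%
any_y_rm <- any(y == "A", na.rm = TRUE)  # FALSE
```

So any() needs na.rm = TRUE to behave like %in% on rows that contain
NA.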

However, a considerable improvement can be made by getting fancy,
taking advantage of do.call() and the pmax() function to mostly
vectorize the calculation. Here are the details and timing on a large
data frame.

(Note: I removed the names in the %in% approach for simplicity; this
has almost no effect on timings. I also moved the as.integer() call
out of the function so that it is called only once at the end, which
improves efficiency a bit.)

1. Eric's original:
fun1 <- function(df, what)
{
   as.integer(unname(apply(df, MARGIN = 1, function(v) { what %in% v })))
}

2. Using any(x == "A") instead:
fun2 <- function(df, what)
{
   as.integer(unname(apply(df, MARGIN = 1,
                           function(x) any(x == what, na.rm = TRUE))))
}

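3. Getting fancy to use pmax():
fun3 <- function(df, what)
{
   ## one integer vector per column: 1 = match, 0 = no match, NA = missing
   z <- lapply(df, function(x) as.integer(x == what))
   ## element-wise maximum across the columns, i.e. one value per row
   do.call(pmax, c(z, na.rm = TRUE))
}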
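
To make the pmax() step concrete, here is a minimal sketch on a tiny
data frame. The data frame df, its columns a and b, and the target
"A" are invented for illustration only.

```r
df <- data.frame(a = c("A", "B", NA, NA),
                 b = c("C", "A", "B", NA),
                 stringsAsFactors = FALSE)

# Step 1: one integer vector per column (1 = match, 0 = no match, NA = missing)
z <- lapply(df, function(x) as.integer(x == "A"))

# Step 2: do.call() spreads the list into pmax(z$a, z$b, na.rm = TRUE),
# which takes the maximum across columns separately for each row
flag <- do.call(pmax, c(z, na.rm = TRUE))

# For comparison, the row-by-row any() version
flag_apply <- as.integer(apply(df, 1, function(x) any(x == "A", na.rm = TRUE)))

print(flag)        # 1 1 0 NA
print(flag_apply)  # 1 1 0 0
```

The last row shows the corner case: when every value in a row is
missing, pmax() returns NA even with na.rm = TRUE, whereas the any()
version returns 0. If 0 is wanted there, the NA can be replaced
afterwards.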

Here are the timings:
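
Timings of this kind vary with machine and data size. One way to run
such a comparison is sketched below, assuming a hypothetical data
frame of random letters (its dimensions and the seed are illustrative,
not those of the original test). fun1 to fun3 are repeated in compact
form so that the block runs on its own.

```r
fun1 <- function(df, what)
  as.integer(unname(apply(df, 1, function(v) what %in% v)))
fun2 <- function(df, what)
  as.integer(unname(apply(df, 1, function(x) any(x == what, na.rm = TRUE))))
fun3 <- function(df, what) {
  z <- lapply(df, function(x) as.integer(x == what))
  do.call(pmax, c(z, na.rm = TRUE))
}

# Hypothetical test data: 10,000 rows, 10 character columns, no NAs
set.seed(1)
n  <- 1e4
df <- as.data.frame(matrix(sample(LETTERS[1:5], n * 10, replace = TRUE),
                           ncol = 10),
                    stringsAsFactors = FALSE)

t1 <- system.time(r1 <- fun1(df, "A"))
t2 <- system.time(r2 <- fun2(df, "A"))
t3 <- system.time(r3 <- fun3(df, "A"))

# All three should agree on data without NAs
same <- identical(r1, r2) && isTRUE(all.equal(r1, r3))

# Elapsed seconds for each version
print(c(fun1 = t1[["elapsed"]], fun2 = t2[["elapsed"]], fun3 = t3[["elapsed"]]))
```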
