
How to create a new column based on the values from multiple columns which are matching a particular string?

While Eric's solution is correct (modulo "corner" cases such as a row
that is all NAs), it can be made considerably more efficient.

One minor improvement can be made by using the idiom
any(x == "A")
instead of matching via %in% for the simple case of matching just a
single value.
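
To illustrate the difference between the two idioms, here is a small
sketch on a made-up vector v (hypothetical, not data from the thread).
It shows why na.rm = TRUE is needed when using any() with ==:

```r
v <- c("B", NA, "C")                  # hypothetical row containing an NA

in_res  <- "A" %in% v                 # FALSE: %in% never returns NA
any_raw <- any(v == "A")              # NA: comparing with NA yields NA
any_rm  <- any(v == "A", na.rm = TRUE)  # FALSE: NAs are dropped first
```

Without na.rm = TRUE, a row with no match but at least one NA would
yield NA instead of FALSE.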

However, a considerable improvement can be made by getting fancy,
taking advantage of do.call() and the pmax() function to mostly
vectorize the calculation. Here are the details and timing on a large
data frame.

(Note: I removed the names in the %in% approach for simplicity; this has
almost no effect on the timings.
I also moved the as.integer() call out of the per-row function so that
it is called only once at the end, which improves efficiency a bit.)

1. Eric's original:
fun1 <- function(df, what)
{
  as.integer(unname(apply(df, MARGIN = 1, function(v) { what %in% v })))
}

2. Using any(x == "A") instead:
fun2 <- function(df, what)
{
  as.integer(unname(apply(df, MARGIN = 1,
                          function(x) any(x == what, na.rm = TRUE))))
}

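3. Getting fancy to use pmax():
fun3 <- function(df, what)
{
  ## one 0/1 (or NA) integer vector per column
  z <- lapply(df, function(x) as.integer(x == what))
  ## row-wise maximum across all columns, ignoring NAs
  do.call(pmax, c(z, na.rm = TRUE))
}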
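
To make the pmax() idea concrete, here is a minimal step-by-step sketch
on a toy data frame (the data and the names z and res are made up for
illustration and are not from the thread):

```r
# Toy data frame (hypothetical)
df <- data.frame(a = c("A", "B", NA),
                 b = c("C", "A", "B"),
                 stringsAsFactors = FALSE)

# Step 1: one integer vector per column; 1 = match, 0 = no match, NA = missing
z <- lapply(df, function(x) as.integer(x == "A"))
# z$a is 1, 0, NA and z$b is 0, 1, 0

# Step 2: do.call() passes the list elements to pmax() as separate
# arguments; pmax() then takes the elementwise maximum, which amounts
# to a row-wise "any match" across the columns
res <- do.call(pmax, c(z, na.rm = TRUE))
res   # 1 1 0
```

The point is that the loop now runs over columns (a few hundred at most
here) rather than over rows, and each step works on whole vectors.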

Here are the timings:
[1] 100000    250
   user  system elapsed
  2.204   0.432   2.637
   user  system elapsed
  1.898   0.403   2.302
   user  system elapsed
  0.187   0.048   0.235

## 10 times faster!
[1] TRUE
[1] TRUE
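
The code that produced the output above is not shown here. The following
is only a sketch of how such a comparison might be run; the seed, the
data dimensions (much smaller than 100000 x 250), and the way the test
data are generated are all assumptions, so the timings will differ:

```r
fun1 <- function(df, what)
  as.integer(unname(apply(df, MARGIN = 1, function(v) what %in% v)))
fun2 <- function(df, what)
  as.integer(unname(apply(df, MARGIN = 1,
                          function(x) any(x == what, na.rm = TRUE))))
fun3 <- function(df, what) {
  z <- lapply(df, function(x) as.integer(x == what))
  do.call(pmax, c(z, na.rm = TRUE))
}

set.seed(1)                      # assumed seed, for reproducibility only
n <- 10000; p <- 50              # assumed (smaller) dimensions
df <- as.data.frame(matrix(sample(LETTERS, n * p, replace = TRUE), nrow = n),
                    stringsAsFactors = FALSE)
dim(df)

t1 <- system.time(r1 <- fun1(df, "A"))
t2 <- system.time(r2 <- fun2(df, "A"))
t3 <- system.time(r3 <- fun3(df, "A"))
print(t1); print(t2); print(t3)

# Check that all three approaches agree on this NA-free data
same12 <- all(r1 == r2)
same13 <- all(r1 == r3)
```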


NB: I freely admit that Eric's original solution may well be perfectly
adequate, and the speed improvement is pointless. In that case, maybe
this is at least somewhat instructive for someone.

Nevertheless, I would welcome further suggestions for improvement, as
I suspect my "fancy" approach is still a ways from what one can do (in
R code, without resorting to C++).
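
As one possible direction (a sketch only; fun4 is a hypothetical name,
and nothing in this thread establishes whether it is faster than the
pmax() version), the per-column results could be kept as logicals and
combined with Reduce():

```r
# fun4 is a hypothetical name, not from the thread
fun4 <- function(df, what) {
  # x %in% what is vectorized over a whole column and is FALSE for NA
  hits <- lapply(df, function(x) x %in% what)
  # OR the per-column logical vectors together, then convert once
  as.integer(Reduce(`|`, hits))
}

# Toy data (hypothetical), including an all-NA row
df <- data.frame(a = c("A", "B", NA),
                 b = c("C", "A", NA),
                 stringsAsFactors = FALSE)
res4 <- fun4(df, "A")
res4   # 1 1 0
```

Note that this variant returns 0 rather than NA for a row that is all
NAs, so it is not identical to the pmax() version in that corner case.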

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Jul 29, 2019 at 12:38 PM Eric Berger <ericjberger at gmail.com> wrote: