
identifying when one element of a row has a positive number

Hi,

This problem seemed deceptively simple to me.  After chasing a
considerable number of dead ends, I came up with fg().  It lacks the
elegance of Dennis' solution but is substantially faster, particularly
on large datasets.  I still feel like I'm missing something,
but....

###############################################
## Data
df1 <- data.frame(x = seq(1860,1950,by=10),
  y = seq(-290,-200,by=10), ANN = c(3,0,0,0,1,0,1,1,0,0),
  CTA = c(0,1,0,0,0,0,1,0,0,2), GLM = c(0,0,2,0,0,0,0,1,0,0))
## larger test dataset
dftest <- do.call("rbind", rep(list(df1), 100))


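## Row-wise versions used for the timing comparison below
## f: name of the single positive element of a row, else NA
f <- function(x) ifelse(sum(x > 0) == 1L, names(which(x > 0)), NA)
## g: name of the zero element when exactly two are positive, else NA
g <- function(x) ifelse(sum(x > 0) == 2L, names(which(x == 0L)), NA)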

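fg <- function(dat) {
  cnames <- colnames(dat)
  ## logical matrix of positives and the count of positives per row
  dat <- dat > 0; z <- rowSums(dat)
  z1 <- z == 1L; z2 <- z == 2L; rm(z)
  output <- matrix(NA, nrow = nrow(dat), ncol = 2)
  ## drop = FALSE keeps a single matching row as a matrix,
  ## so apply() still works when only one row qualifies
  output[z1, 1] <- apply(dat[z1, , drop = FALSE], 1, function(x) cnames[x])
  output[z2, 2] <- apply(dat[z2, , drop = FALSE], 1, function(x) cnames[!x])
  return(output)
}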

## Compare times on larger dataset
system.time(cbind(apply(dftest[, 3:5], 1, f),
  apply(dftest[, 3:5], 1, g)))
system.time(fg(dftest[, 3:5]))

## compare times under repetitions
system.time(for (i in 1:100) cbind(apply(df1[, 3:5], 1, f),
  apply(df1[, 3:5], 1, g)))
system.time(for (i in 1:100) fg(df1[, 3:5]))
###############################################
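
One possible way to drop the remaining apply() calls is to look up the
matching columns with which(..., arr.ind = TRUE).  The sketch below is
only an illustration: fg_vec() is a hypothetical name that does not
appear in the original code, and it assumes exactly three indicator
columns and no missing values, as in df1.  Whether it is faster than
fg() has not been timed here.

```r
## Example data, same as df1 above
df1 <- data.frame(x = seq(1860, 1950, by = 10),
  y = seq(-290, -200, by = 10), ANN = c(3,0,0,0,1,0,1,1,0,0),
  CTA = c(0,1,0,0,0,0,1,0,0,2), GLM = c(0,0,2,0,0,0,0,1,0,0))

## fg_vec() is a hypothetical name used for illustration only.
## Assumes three indicator columns and no NA values.
fg_vec <- function(dat) {
  cnames <- colnames(dat)
  pos <- as.matrix(dat) > 0          # logical matrix of positives
  z <- rowSums(pos)                  # number of positives per row
  rows1 <- which(z == 1L)            # rows with exactly one positive
  rows2 <- which(z == 2L)            # rows with exactly two positives
  out <- matrix(NA_character_, nrow = nrow(pos), ncol = 2)
  ## (row, col) positions of the single TRUE in each one-positive row
  hit1 <- which(pos[rows1, , drop = FALSE], arr.ind = TRUE)
  out[rows1[hit1[, 1]], 1] <- cnames[hit1[, 2]]
  ## (row, col) positions of the single FALSE in each two-positive row
  hit2 <- which(!pos[rows2, , drop = FALSE], arr.ind = TRUE)
  out[rows2[hit2[, 1]], 2] <- cnames[hit2[, 2]]
  out
}

res <- fg_vec(df1[, 3:5])
```

To time it, the same system.time() calls used above should work with
fg_vec() in place of fg().  Note that it returns a character matrix
even when no row matches, whereas fg() starts from a logical NA matrix.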

Josh
On Thu, Jan 27, 2011 at 12:36 AM, Dennis Murphy <djmuser at gmail.com> wrote: