identifying when one element of a row has a positive number
Hi,
This problem seemed deceptively simple to me. After chasing a
considerable number of dead ends, I came up with fg(). It lacks the
elegance of Dennis' solution but is substantially faster, particularly
for large datasets. I still feel like I'm missing something,
but....
###############################################
## Data
df1 <- data.frame(x = seq(1860, 1950, by = 10),
  y = seq(-290, -200, by = 10), ANN = c(3, 0, 0, 0, 1, 0, 1, 1, 0, 0),
  CTA = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 2), GLM = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0))
## larger test dataset
dftest <- do.call("rbind", rep(list(df1), 100))
f <- function(x) ifelse(sum(x > 0) == 1L, names(which(x > 0)), NA)
g <- function(x) ifelse(sum(x > 0) == 2L, names(which(x == 0L)), NA)
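fg <- function(dat) {
  cnames <- colnames(dat)
  ## logical matrix of positive entries, and number of positives per row
  dat <- dat > 0; z <- rowSums(dat)
  z1 <- z == 1L; z2 <- z == 2L; rm(z)
  output <- matrix(NA, nrow = nrow(dat), ncol = 2)
  ## drop = FALSE keeps a matrix even when only one row is selected
  output[z1, 1] <- apply(dat[z1, , drop = FALSE], 1, function(x) cnames[x])
  output[z2, 2] <- apply(dat[z2, , drop = FALSE], 1, function(x) cnames[!x])
  return(output)
}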
## Compare times on larger dataset
system.time(cbind(apply(dftest[, 3:5], 1, f),
  apply(dftest[, 3:5], 1, g)))
system.time(fg(dftest[, 3:5]))
## Compare times under repetitions
system.time(for (i in 1:100) cbind(apply(df1[, 3:5], 1, f),
  apply(df1[, 3:5], 1, g)))
system.time(for (i in 1:100) fg(df1[, 3:5]))
###############################################
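One further variation, offered as a sketch rather than as part of the code above: the two apply() calls in fg() could probably be replaced with max.col(), which returns the column index of each row's maximum. The name fg2(), the use of ties.method = "first", and the generalisation from "exactly 2 positives" to "all but one column positive" are assumptions made for illustration. I have not timed it; system.time() as above would show whether it helps.

```r
## Example data, as defined above
df1 <- data.frame(x = seq(1860, 1950, by = 10),
  y = seq(-290, -200, by = 10), ANN = c(3, 0, 0, 0, 1, 0, 1, 1, 0, 0),
  CTA = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 2), GLM = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0))

## fg2() is a hypothetical name; it avoids apply() entirely
fg2 <- function(dat) {
  pos <- as.matrix(dat) > 0          # logical matrix of positive entries
  n <- rowSums(pos)                  # number of positives in each row
  cn <- colnames(pos)
  ## column index of the first TRUE in each row (only used when n == 1)
  first_pos <- max.col(pos + 0L, ties.method = "first")
  ## column index of the first FALSE in each row (only used when one is 0)
  first_zero <- max.col(!pos + 0L, ties.method = "first")
  one_presence <- ifelse(n == 1L, cn[first_pos], NA_character_)
  one_absence <- ifelse(n == ncol(pos) - 1L, cn[first_zero], NA_character_)
  cbind(one_presence, one_absence)
}

fg2(df1[, 3:5])
```

On df1 this should give the same two columns as f and g; rows with no positives, or with all three positive, come out as NA in both columns.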
Josh
On Thu, Jan 27, 2011 at 12:36 AM, Dennis Murphy <djmuser at gmail.com> wrote:
Hi:

Try this:

f <- function(x) ifelse(sum(x > 0) == 1L, names(which(x > 0)), NA)
g <- function(x) ifelse(sum(x > 0) == 2L, names(which(x == 0L)), NA)
apply(df1[, 3:5], 1, f)
 [1] "ANN" "CTA" "GLM" NA    "ANN" NA    NA    NA    NA    "CTA"
apply(df1[, 3:5], 1, g)
 [1] NA    NA    NA    NA    NA    NA    "GLM" "CTA" NA    NA

HTH,
Dennis

On Wed, Jan 26, 2011 at 9:36 PM, Daisy Englert Duursma <daisy.duursma at gmail.com> wrote:
Hello,
I am not sure where to begin with this problem or what to search for
in r-help. I just don't know what to call this.
I have 5 columns: the first 2 are the x, y locations and the last
three are variables about those locations.
x<-seq(1860,1950,by=10)
y<-seq(-290,-200,by=10)
ANN<-c(3,0,0,0,1,0,1,1,0,0)
CTA<-c(0,1,0,0,0,0,1,0,0,2)
GLM<-c(0,0,2,0,0,0,0,1,0,0)
df1<-as.data.frame(cbind(x,y,ANN,CTA,GLM))
What I would like to produce is an additional column that tells when
only one of the three variables has a value greater than 0. I would like
this new column to give the name of that variable. Likewise, I would
like a column that tells when only one of the three variables for a
given row has a value of 0. For my example the new columns would be:
one_presence<-c("ANN","CTA","GLM","NA","ANN","NA","NA","NA","NA","CTA")
one_absence<-c("NA","NA","NA","NA","NA","NA","GLM","CTA","NA","NA")
The end result should look like
df2<-(cbind(df1,one_presence,one_absence))
I am sure I can do this with a loop or maybe grep but I am out of ideas.
Any help would be appreciated.
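To make the target concrete, here is a minimal sketch of the desired df2 built from the example data. It assumes the "NA" entries in one_presence and one_absence are meant as missing values, so it uses NA rather than the quoted string "NA"; that reading is an assumption, not something stated above.

```r
x <- seq(1860, 1950, by = 10)
y <- seq(-290, -200, by = 10)
ANN <- c(3, 0, 0, 0, 1, 0, 1, 1, 0, 0)
CTA <- c(0, 1, 0, 0, 0, 0, 1, 0, 0, 2)
GLM <- c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0)
df1 <- data.frame(x, y, ANN, CTA, GLM)

## Target columns written with real missing values (assumed intent)
one_presence <- c("ANN", "CTA", "GLM", NA, "ANN", NA, NA, NA, NA, "CTA")
one_absence <- c(NA, NA, NA, NA, NA, NA, "GLM", "CTA", NA, NA)
df2 <- cbind(df1, one_presence, one_absence)

## With real NAs, is.na() can pick out the rows with no single presence
sum(is.na(df2$one_presence))  # 5
```

With the quoted string "NA" instead, is.na() would return FALSE for every row.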
Cheers,
Daisy
--
Daisy Englert Duursma
Room E8C156
Dept. Biological Sciences
Macquarie University  NSW  2109
Australia
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/