Hi,
I have a dataset in which I would like to select rows based on matching conditions and return the maximum value of a variable, or else return one row if duplicate counts exist. My dataset looks like this:
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2
I would like to select rows if PTID and Year match and return the maximum count, or else return one row if the counts are the same, such that I get this output:
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53122 2008 3 1
6755 53122 2009 3 2
I tried the following code and the output is almost correct, but duplicate values were included:
df2<-with(df, sapply(split(df, list(PTID, Year)),
function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
df2<-do.call(rbind,df2)
rownames(df2)<-1:nrow(df2)
Any suggestions?
Thanks much for your responses!
--
View this message in context: http://r.789695.n4.nabble.com/Select-rows-based-on-matching-conditions-and-logical-operators-tp4637809.html
Sent from the R help mailing list archive at Nabble.com.
Select rows based on matching conditions and logical operators
9 messages · Rui Barradas, kborgmann, arun +2 more
Hello,
Apart from the output order this does it.
(I have changed 'df' to 'df1', because 'df' is an R function, the F distribution
density.)
df1 <- read.table(text="
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header=TRUE)
df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
function(x) if (nrow(x)) x[which.max(x$Count), ]))
df2 <- do.call(rbind, df2)
rownames(df2) <- 1:nrow(df2)
df2
which.max(), not which().
Hope this helps,
Rui Barradas
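To see why that one change matters: which(x == max(x)) returns the position of every element tied for the maximum, whereas which.max(x) returns only the first one. A small illustration using the Count values of the first PTID/Year group (the names `counts`, `all_ties` and `first_max` are made up for this example):

```r
counts <- c(0, 0, 0)   # Count values for PTID 53121 in 2009

all_ties  <- which(counts == max(counts))  # every position tied for the maximum
first_max <- which.max(counts)             # only the first such position

all_ties   # 1 2 3, so all three rows would be kept
first_max  # 1, so a single row is kept
```

This is why the original attempt returned duplicate rows for groups whose counts were all equal.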
Thanks! which.max did the trick
Hi,
Try this:
dat1<-read.table(text="
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2
",sep="",header=TRUE)
dat2<-lapply(split(dat1,dat1$Count),function(x) x[which.max(x$Count),])
do.call(rbind,dat2)
  PGID  PTID Year Visit Count
0 6755 53121 2009     1     0
1 6755 53122 2008     3     1
2 6755 53122 2009     3     2
A.K.
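One caveat on the snippet above: split(dat1, dat1$Count) groups the rows by Count rather than by PTID and Year, so it reproduces the requested output only because the three group maxima in the sample data happen to be three distinct Count values. A minimal sketch of the same lapply/which.max idea grouped by PTID and Year might look like this; `groups` and `picked` are names chosen here for illustration, and drop = TRUE is used so that PTID/Year combinations that never occur are skipped:

```r
dat1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2
", header = TRUE)

# Split on the PTID/Year combination; drop = TRUE discards combinations
# that do not occur in the data (here, PTID 53121 in 2008).
groups <- split(dat1, list(dat1$PTID, dat1$Year), drop = TRUE)

# which.max() gives the index of the first maximum, so ties yield one row.
picked <- do.call(rbind, lapply(groups, function(x) x[which.max(x$Count), ]))
picked
```

The rows come back in the order of the split groups, not in the order of the original data.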
Rui,
Your solution works, but it can be faster for large data.frames if you compute
the indices of the desired rows of the input data.frame and then use one
subscripting call to select the rows, instead of splitting the input data.frame
into a list of data.frames, extracting the desired row from each component,
and then calling rbind to put the rows together again. E.g., compare your
approach, which I've put into the function f1
f1 <- function (dataFrame) {
    retval <- with(dataFrame, sapply(split(dataFrame, list(PTID, Year)),
        function(x) if (nrow(x)) x[which.max(x$Count), ]))
    retval <- do.call(rbind, retval)
    rownames(retval) <- 1:nrow(retval)
    retval
}
with one that computes a logical subscripting vector (by splitting just the
Counts vector, not the whole data.frame)
f2 <- function (dataFrame) {
    keep <- as.logical(ave(dataFrame$Count,
        droplevels(interaction(dataFrame$PTID, dataFrame$Year)),
        FUN = function(x) if (length(x)) seq_along(x) == which.max(x)))
    dataFrame[keep, ]
}
They both compute the same thing, aside from the fact that the rows
are in a different order (f2 keeps the order of the original data.frame)
and f2 leaves the original row label with the row.
f1(df1)
  PGID  PTID Year Visit Count
1 6755 53122 2008     3     1
2 6755 53121 2009     1     0
3 6755 53122 2009     3     2
f2(df1)
  PGID  PTID Year Visit Count
1 6755 53121 2009     1     0
6 6755 53122 2008     3     1
9 6755 53122 2009     3     2
When there are a lot of output rows, f2 can be quite a bit faster.
(I put the call to droplevels(interaction(...)) into the call to ave
because ave can waste a lot of time calling FUN for nonexistent
interaction levels.)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
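The comparison can be tried with a rough sketch along the following lines. The data are synthetic, and `big`, `n_groups`, `by_split` and `by_index` are names invented here rather than taken from the thread. `by_split` is an adaptation that uses lapply() with drop = TRUE instead of sapply(), because sapply() may simplify its result to a matrix when every PTID/Year combination is present. Timings depend on the machine and R version, so none are quoted.

```r
# Synthetic data: n_groups PTID/Year groups with three visits each.
set.seed(1)
n_groups <- 500
big <- data.frame(
  PTID  = rep(seq_len(n_groups), each = 3),
  Year  = rep(sample(2008:2009, n_groups, replace = TRUE), each = 3),
  Visit = rep(1:3, times = n_groups),
  Count = rpois(3 * n_groups, lambda = 1)
)

# Split-based selection: split the data frame, pick one row per piece,
# then bind the pieces back together.
by_split <- function(d) {
  pieces <- split(d, list(d$PTID, d$Year), drop = TRUE)
  do.call(rbind, lapply(pieces, function(x) x[which.max(x$Count), ]))
}

# Index-based selection: flag the first maximum within each group,
# then subset the data frame once.
by_index <- function(d) {
  g <- interaction(d$PTID, d$Year, drop = TRUE)
  keep <- as.logical(ave(d$Count, g,
                         FUN = function(x) seq_along(x) == which.max(x)))
  d[keep, ]
}

a <- by_split(big)
b <- by_index(big)

# Put both results in a common order before comparing the chosen rows.
oa <- a[order(a$PTID, a$Year), ]
ob <- b[order(b$PTID, b$Year), ]
same_rows <- identical(oa$Visit, ob$Visit) && identical(oa$Count, ob$Count)
same_rows

system.time(by_split(big))
system.time(by_index(big))
```

Increasing `n_groups` should make any difference between the two timings easier to see.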
Hello,
You're right, thanks.
In my solution, I had tried to keep to the OP as much as possible. A glance at it made me realize that one change only would do the job, and that was it, no performance worries.
I particularly liked the interaction/droplevels trick.
Rui Barradas
Wouldn't
interaction(..., drop=TRUE)
be the same, but terser in this situation? Also I tend to use paste() for this, i.e. instead of
interaction(v1,v2, drop=TRUE)
simply
paste(v1,v2)
Again, this seems shorter and simpler -- but are there good reasons to prefer the use of interaction()?
Cheers,
Bert
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
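As a quick check of the two grouping keys discussed here, the sketch below builds both for the sample data and confirms that they lead to the same group-wise result. The names `key_interaction`, `key_paste`, `max_interaction` and `max_paste` are illustrative only. One difference worth keeping in mind: interaction() returns a factor while paste() returns a character vector, and pasted keys could collide if the values themselves contained the separator (for example, paste("a b", "c") and paste("a", "b c") give the same string).

```r
df1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2
", header = TRUE)

key_interaction <- interaction(df1$PTID, df1$Year, drop = TRUE)  # a factor
key_paste       <- paste(df1$PTID, df1$Year)                     # a character vector

# Group-wise maximum of Count under each key.
max_interaction <- ave(df1$Count, key_interaction, FUN = max)
max_paste       <- ave(df1$Count, key_paste, FUN = max)

identical(max_interaction, max_paste)
```

This only shows that the two keys agree on this sample; it says nothing about their relative speed.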
Any of those would work. I wish ave() did that part of the job.
I don't think there is any reason it shouldn't. The following
only needs to call FUN three times, not the nine times it actually does:
> z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x))
[1] "A"
character(0)
character(0)
character(0)
[1] "B"
character(0)
character(0)
character(0)
[1] "C"
> z
[1] "A" "B" "C"
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
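The extra calls can also be counted rather than printed. In this sketch the counters `calls_all` and `calls_used` and the helper vectors `x`, `g1` and `g2` are names made up for the illustration. The first ave() call passes the two grouping vectors directly; the second passes a single key built with interaction(..., drop = TRUE), so only the combinations that occur are visited. How many calls the first form makes may depend on the R version, so only the second count is stated.

```r
x  <- LETTERS[1:3]
g1 <- 1:3
g2 <- 1:3

# Count how often FUN is invoked when ave() builds the interaction itself.
calls_all <- 0L
invisible(ave(x, g1, g2,
              FUN = function(v) { calls_all <<- calls_all + 1L; v }))

# Count again with a pre-built key that keeps only the levels in use.
calls_used <- 0L
invisible(ave(x, interaction(g1, g2, drop = TRUE),
              FUN = function(v) { calls_used <<- calls_used + 1L; v }))

calls_all
calls_used   # 3: one call per combination that actually occurs
```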
And another way to drop the unneeded interaction levels is to supply drop=TRUE to ave():
> z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x), drop=TRUE)
[1] "A"
[1] "B"
[1] "C"
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com