Counting number of rows with two criteria in dataframe
8 messages · Henrique Dallazuanna, Ista Zahn, David Winsemius +4 more
Hi Ryan,

One option would be

X$a <- paste(X$x, X$y, sep=".")
table(X$a)

Best,
Ista
On Tue, Jan 25, 2011 at 2:25 PM, Ryan Utz <utz.ryan at gmail.com> wrote:
Hi R-users,

I'm trying to find an elegant way to count the number of rows in a data frame that share each unique combination of two column values. My data has one column with a year, one with a month, and one with a day, and I'm trying to count the number of days in each year/month combination. But for simplicity's sake, the following dataset will do:

x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
X <- data.frame(x, y, z)

So with dataset X, how would I count the number of z values (3rd column in X) for each unique combination of the first two columns (x and y)? (For instance, in the above example, there are 2 instances per unique combination of the first two columns.) I can do this in Matlab and it's easy, but since I'm new to R this is royally stumping me.

Thanks,
Ryan
--
Ryan Utz
Postdoctoral research scholar
University of California, Santa Barbara
(724) 272 7769
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
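A self-contained, base-R-only sketch of the paste()/table() idea from the reply above, with interaction() and a two-way table() shown as alternatives. Note that these count rows per x/y pair, which equals the number of z values here because every z is distinct. The object names (counts, counts2, df_counts) are illustrative, not from the thread.

```r
# Data from the question
x <- c(1,1,1,1,2,2,2,2,3,3,3,3)
y <- c(1,1,2,2,3,3,4,4,5,5,6,6)
z <- c(1,2,3,4,5,6,7,8,9,10,11,12)
X <- data.frame(x, y, z)

# One combined "x.y" key per row, then tabulate it (the paste()/table() idea)
counts <- table(paste(X$x, X$y, sep = "."))
print(counts)

# interaction() builds an equivalent combined factor;
# drop = TRUE discards x/y pairs that never occur in the data
counts2 <- table(interaction(X$x, X$y, drop = TRUE))

# A two-way table keeps x and y as separate columns once converted to a
# data frame; pairs that never occur get Freq 0, so filter them out
df_counts <- as.data.frame(table(x = X$x, y = X$y))
df_counts <- df_counts[df_counts$Freq > 0, ]
print(df_counts)
```

The two-way version is convenient when the result should keep x and y as separate columns rather than one pasted label.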
> tapply(X$z, list(X$x, X$y), function(xx) length(unique(xx)) )
   1  2  3  4  5  6
1  2  2 NA NA NA NA
2 NA NA  2  2 NA NA
3 NA NA NA NA  2  2
David Winsemius, MD West Hartford, CT
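A small base-R sketch contrasting two readings of the question with tapply(): counting rows per x/y cell versus counting distinct z values per cell, as the tapply() call above does. The extra row (x = 1, y = 1, z = 1) is hypothetical and only added to show where the two differ.

```r
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Rows per x/y cell; cells with no rows come back as NA
n_rows <- tapply(X$z, list(X$x, X$y), length)

# Distinct z values per cell, as in the tapply() call above
n_unique <- tapply(X$z, list(X$x, X$y), function(v) length(unique(v)))

# In the question's data every z is distinct, so both matrices agree.
# A hypothetical duplicated row (x = 1, y = 1, z = 1) makes them differ:
X2 <- rbind(X, data.frame(x = 1, y = 1, z = 1))
n_rows2   <- tapply(X2$z, list(X2$x, X2$y), length)
n_unique2 <- tapply(X2$z, list(X2$x, X2$y), function(v) length(unique(v)))
n_rows2["1", "1"]    # 3 rows in the (1, 1) cell
n_unique2["1", "1"]  # but only 2 distinct z values
```

For Ryan's year/month/day case, where each day should appear once, the row count (length) is likely the simpler choice.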
Note that a key is not actually required, so the syntax is even simpler:
library(data.table)
dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
x y V1
[1,] 1 1 2
[2,] 1 2 2
[3,] 2 3 2
[4,] 2 4 2
[5,] 3 5 2
[6,] 3 6 2
Alternatively, passing a list() to 'by' gives exactly the same result:
dX[,length(unique(z)),by=list(x,y)]
The advantage of the list() form is that you can group by expressions
of columns; for example, if x were a date column:
dX[,length(unique(z)),by=list(month(x),y)]
Matthew
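Grouping by an expression of a column, as in the month(x) example above, has a rough base-R analogue. This sketch assumes a hypothetical daily series for January to March 2011 and uses only base R (table() on a formatted date, and aggregate() on separate year/month/day columns, the layout from the original question); it is not data.table's mechanism.

```r
# Hypothetical daily series covering January to March 2011
dates <- seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = "day")

# Group directly on an expression of the column: its "YYYY-MM" string
by_month <- table(format(dates, "%Y-%m"))
print(by_month)

# The same count with year, month and day stored as separate columns,
# which is the layout described in the original question
d <- data.frame(year  = as.integer(format(dates, "%Y")),
                month = as.integer(format(dates, "%m")),
                day   = as.integer(format(dates, "%d")))
n_days <- aggregate(day ~ year + month, data = d, FUN = length)
names(n_days)[3] <- "n_days"
print(n_days)
```

Both forms count rows (days) per year/month; 2011 is not a leap year, so February has 28.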
"Dennis Murphy" <djmuser at gmail.com> wrote in message
news:AANLkTi=8TYSrRfzfm01m7fpzydh-cLS-J-cMbkAkjXxf at mail.gmail.com...
Hi:
Here are two more candidates, using the plyr and data.table packages:
library(plyr)
ddply(X, .(x, y), function(d) length(unique(d$z)))
x y V1
1 1 1 2
2 1 2 2
3 2 3 2
4 2 4 2
5 3 5 2
6 3 6 2
The function counts the number of unique z values in each sub-data frame
with the same x and y values. The argument d in the anonymous function is
a data frame object.
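The split-apply-combine pattern behind ddply() can be spelled out in base R for comparison. This is a sketch using split(), vapply() and data.frame() only; it is not how plyr is implemented, and the names pieces, n_unique and result are illustrative.

```r
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Split: one sub-data frame per observed x/y combination
pieces <- split(X, list(X$x, X$y), drop = TRUE)

# Apply: count distinct z values in each piece; d is a data frame,
# just like the argument of the anonymous function passed to ddply()
n_unique <- vapply(pieces, function(d) length(unique(d$z)), integer(1))

# Combine: rebuild a data frame with the grouping columns
result <- data.frame(x  = vapply(pieces, function(d) d$x[1], numeric(1)),
                     y  = vapply(pieces, function(d) d$y[1], numeric(1)),
                     V1 = n_unique)
rownames(result) <- NULL
print(result)
```

ddply() performs the split and combine steps automatically, which is why its call is a single line.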
# data.table version:
library(data.table)
dX <- data.table(X, key = 'x, y')
dX[, list(nz = length(unique(z))), by = 'x, y']
x y nz
[1,] 1 1 2
[2,] 1 2 2
[3,] 2 3 2
[4,] 2 4 2
[5,] 3 5 2
[6,] 3 6 2
The key columns sort the data by x, y combinations, and nz is then
computed within each data subset.
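The sort-then-scan idea described above can be illustrated in base R: once rows are ordered by (x, y), each group is a contiguous run, and rle() returns the run lengths. This is only a sketch of the concept under that assumption, not what data.table does internally; unique() is applied first so that the counts are distinct z values (nz) rather than raw rows.

```r
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Keep one row per distinct (x, y, z) so that rows == distinct z values
Xu <- unique(X)

# Sort by the grouping columns, as setting a key on (x, y) would
Xs <- Xu[order(Xu$x, Xu$y), ]

# Each x/y group is now one contiguous run of rows, so the run
# lengths of a combined key are the per-group counts
runs <- rle(paste(Xs$x, Xs$y, sep = "."))
res <- data.frame(group = runs$values, nz = runs$lengths)
print(res)
```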
If you intend to do a lot of summarization/data manipulation in R, these
packages are worth learning.
HTH,
Dennis
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy <djmuser at gmail.com> wrote:
Hi:
Here are two more candidates, using the plyr and data.table packages:
library(plyr)
ddply(X, .(x, y), function(d) length(unique(d$z)))
  x y V1
1 1 1  2
2 1 2  2
3 2 3  2
4 2 4  2
5 3 5  2
6 3 6  2
The function counts the number of unique z values in each sub-data frame with the same x and y values. The argument d in the anonymous function is a data frame object.
Another approach is to use the much faster count function:

count(unique(X), c("x", "y"))

Hadley
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
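The de-duplicate-then-count idea from Hadley's reply can be sketched in base R with unique() and table(). This is an analogue under the assumption that only base R is available, not plyr's implementation; the duplicated row used at the end is hypothetical.

```r
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Step 1: drop duplicated rows, so each distinct z counts once per x/y pair
u <- unique(X)

# Step 2: count the remaining rows for each x/y pair
freq <- as.data.frame(table(x = u$x, y = u$y))

# table() also lists x/y pairs that never occur (Freq 0); keep observed ones
freq <- freq[freq$Freq > 0, ]
print(freq)

# A hypothetical exact duplicate row leaves the counts unchanged,
# because unique() removes it before counting
X2 <- rbind(X, X[1, ])
u2 <- unique(X2)
freq2 <- as.data.frame(table(x = u2$x, y = u2$y))
freq2 <- freq2[freq2$Freq > 0, ]
```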