Counting number of rows with two criteria in dataframe

Hi Ryan,
One option would be

X$a <- paste(X$x, X$y, sep=".")
table(X$a)
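
For reference, here is a self-contained sketch of this idea in base R, with a variant that skips the pasted key by passing both columns to table(). The names key_counts and counts are illustrative choices, not from the thread. Note that both forms count rows per combination, which equals the number of distinct z values only when z has no repeats within a group (as in the example data).

```r
# Example data from the original question
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Pasted-key approach: one table cell per observed "x.y" combination
key_counts <- table(paste(X$x, X$y, sep = "."))

# Two-way table flattened to long form; combinations that never
# occur appear with Freq == 0, so they are dropped here
counts <- as.data.frame(table(x = X$x, y = X$y))
counts <- counts[counts$Freq > 0, ]
```

The long form keeps x and y in separate columns (as factors), which may be easier to merge back onto other data than a pasted key.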

Best,
Ista
Hi R-users,

I'm trying to find an elegant way to count the number of rows in a dataframe
with a unique combination of 2 values in the dataframe. My data is
specifically one column with a year, one with a month, and one with a day.
I'm trying to count the number of days in each year/month combination. But
for simplicity's sake, the following dataset will do:

x<-c(1,1,1,1,2,2,2,2,3,3,3,3)
y<-c(1,1,2,2,3,3,4,4,5,5,6,6)
z<-c(1,2,3,4,5,6,7,8,9,10,11,12)
X<-data.frame(x, y, z)

So with dataset X, how would I count the number of z values (3rd column in
X) with unique combinations of the first two columns (x and y)? (for
instance, in the above example, there are 2 instances per unique combination
of the first two columns). I can do this in Matlab and it's easy, but since
I'm new to R this is royally stumping me.

Thanks,
Ryan

--
Ryan Utz
Postdoctoral research scholar
University of California, Santa Barbara
(724) 272 7769

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

> tapply(X$z, list(X$x, X$y), function(xx) length(unique(xx)) )
   1  2  3  4  5  6
1  2  2 NA NA NA NA
2 NA NA  2  2 NA NA
3 NA NA NA NA  2  2
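
The NA cells in the table above mark x/y combinations that do not occur in the data. If a long-format result with one row per observed combination is preferred, base R's aggregate() is one possible alternative. This is only a sketch, assuming X is built as in the question; the name n_unique is illustrative.

```r
# Example data from the original question
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# One row per observed (x, y) pair; the z column of the result
# holds the number of distinct z values in that group
n_unique <- aggregate(z ~ x + y, data = X,
                      FUN = function(v) length(unique(v)))
```

Unlike the tapply() result, this returns a data frame and simply omits combinations that are absent from the data.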
David Winsemius, MD
West Hartford, CT
Note that a key is not actually required, so the syntax is even simpler:

dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
     x y V1
[1,] 1 1  2
[2,] 1 2  2
[3,] 2 3  2
[4,] 2 4  2
[5,] 3 5  2
[6,] 3 6  2

Passing list() to 'by' is exactly equivalent:

dX[,length(unique(z)),by=list(x,y)]

The advantage of the list() form is that you can group by expressions
of columns, for example if x were a date column:

dX[,length(unique(z)),by=list(month(x),y)]
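
Applied to the original year/month/day problem, grouping by an expression of a date column might look like the following base R sketch. The dates and the names D, ym and days_per_month are made up for illustration and do not come from the thread; format() with "%Y-%m" is used here to derive the year/month key.

```r
# Hypothetical daily records (illustrative, not data from the thread)
D <- data.frame(date = as.Date(c("2010-01-03", "2010-01-03", "2010-01-17",
                                 "2010-02-05", "2011-01-09", "2011-01-10")))

# Derive a year/month key from the date column
ym <- format(D$date, "%Y-%m")

# Count the distinct days within each year/month combination
days_per_month <- tapply(D$date, ym, function(d) length(unique(d)))
```

Here the duplicated 2010-01-03 is counted once, so January 2010 contributes two distinct days.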

Matthew

"Dennis Murphy" <djmuser at gmail.com> wrote in message 
news:AANLkTi=8TYSrRfzfm01m7fpzydh-cLS-J-cMbkAkjXxf at mail.gmail.com...
Hi:

Here are two more candidates, using the plyr and data.table packages:

library(plyr)
ddply(X, .(x, y), function(d) length(unique(d$z)))
  x y V1
1 1 1  2
2 1 2  2
3 2 3  2
4 2 4  2
5 3 5  2
6 3 6  2

The function counts the number of unique z values in each sub-data frame
with the same x and y values. The argument d in the anonymous function is a
data frame object.
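
To make the "sub-data frame" idea concrete, the same split-apply-combine steps can be sketched in base R without plyr. This is only an illustration of the principle, not how ddply() is implemented; the names pieces and n_unique_z are illustrative.

```r
# Example data from the original question
X <- data.frame(x = c(1,1,1,1,2,2,2,2,3,3,3,3),
                y = c(1,1,2,2,3,3,4,4,5,5,6,6),
                z = c(1,2,3,4,5,6,7,8,9,10,11,12))

# Split X into one sub-data frame per observed (x, y) pair;
# drop = TRUE omits pairs that never occur in the data
pieces <- split(X, list(X$x, X$y), drop = TRUE)

# Apply the same anonymous function to each piece; d is a data frame
n_unique_z <- sapply(pieces, function(d) length(unique(d$z)))
```

The result is a named vector with one element per group, where each name combines the x and y values of that group.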

# data.table version:

library(data.table)
dX <- data.table(X, key = 'x, y')
dX[, list(nz = length(unique(z))), by = 'x, y']
     x y nz
[1,] 1 1  2
[2,] 1 2  2
[3,] 2 3  2
[4,] 2 4  2
[5,] 3 5  2
[6,] 3 6  2

The key columns sort the data by x, y combinations and then find nz in each
data subset.

If you intend to do a lot of summarization/data manipulation in R, these
packages are worth learning.

HTH,
Dennis

On Tue, Jan 25, 2011 at 11:25 AM, Ryan Utz <utz.ryan at gmail.com> wrote:

[snip]

Another approach is to use plyr's much faster count function on the
unique rows, counting by x and y:

count(unique(X), c("x", "y"))

Hadley
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/