Hi again,
Let say, I have following data frame:
Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"),
A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names = c("A1",
"A2", "C1"), row.names = c(NA, -30L), class = "data.frame")
Now my goal is :
1: Find all possible unique combinations of column 'A1' & column 'A2'.
For example A1 = c, A2 = 2 is 1 unique combination.
2. For each such unique combination, calculate sum for 'A3'.
Is there any direct R function to achieve this faster way? I have very
large data-frame to handle with such calculation.
I tried with spilt() function. However it looks to me that, it can
split a data-frame w.r.t. only one column.
Thanks for your suggestion
Working with data-frame
4 messages · Christofer Bogaso, Bert Gunter, William Dunlap +1 more
Christopher: If I understand correctly, see ?ave, or ?tapply, depending on what form you want to be returned. The trick is to first paste the columns together on whose unique you want to split to form a singtl factor. e.g. of the form lapply(split(yourcolumn,paste0(...)),FUN= sum) However, the above functions already have this "built" in, so you don't need to do this explicitly, although my impression is that it may be a tad more efficient to do it the long way. But don't quote me on this! Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
Hi again,
Let say, I have following data frame:
Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"),
A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names = c("A1",
"A2", "C1"), row.names = c(NA, -30L), class = "data.frame")
Now my goal is :
1: Find all possible unique combinations of column 'A1' & column 'A2'.
For example A1 = c, A2 = 2 is 1 unique combination.
2. For each such unique combination, calculate sum for 'A3'.
Is there any direct R function to achieve this faster way? I have very
large data-frame to handle with such calculation.
I tried with spilt() function. However it looks to me that, it can
split a data-frame w.r.t. only one column.
Thanks for your suggestion
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I tried with spilt() function. However it looks to me that, it can split a data-frame w.r.t. only one column.
(I assume you you meant 'split', not 'spilt'.)
You did not show what you tried, but the following splits Dat by its "A1"
and "A2" columns (creating a list of data.frames):
split(Dat, f=Dat[,c("A1","A2")])
aggregate(), in core R, combine the split and the lapply needed to
calculate groupwise sums. E.g.,
aggregate(Dat$C1, by=Dat[,c("A1","A2")], FUN=sum)
aggregate(C1 ~ A1 + A2, data=Dat, FUN=sum)
The plyr and dplyr packages have other ways to do this sort of thing.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso <
bogaso.christofer at gmail.com> wrote:
Hi again,
Let say, I have following data frame:
Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"),
A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names = c("A1",
"A2", "C1"), row.names = c(NA, -30L), class = "data.frame")
Now my goal is :
1: Find all possible unique combinations of column 'A1' & column 'A2'.
For example A1 = c, A2 = 2 is 1 unique combination.
2. For each such unique combination, calculate sum for 'A3'.
Is there any direct R function to achieve this faster way? I have very
large data-frame to handle with such calculation.
I tried with spilt() function. However it looks to me that, it can
split a data-frame w.r.t. only one column.
Thanks for your suggestion
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
... so...
#1 ... flexible syntax for split-apply-combine, not very efficient for large data
library(plyr)
ddply(Dat,c("A1", "A2"), function(DF){data.frame(C1=sum(DF$C1))})
#2 ... compatible with large data on disk
library(sqldf)
sqldf("select A1,A2,sum(C1) as C1 from Dat group by A1, A2")
#3 ... better for large data in memory
library(data.table)
dtt <- data.table(Dat)
#speed for large data
setkeyv(dtt,c("A1", "A2"))
dtt[,list(C1=sum(C1)),by=list(A1,A2)]
#4 ... package still under development, but potentially can support operations on data stored in memory or relational databases
library(dplyr)
Dat %>% group_by(A1,A2) %>% summarise( C1=sum( C1 ) )
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On November 9, 2014 1:39:45 PM PST, William Dunlap <wdunlap at tibco.com> wrote:
I tried with spilt() function. However it looks to me that, it can split a data-frame w.r.t. only one column.
(I assume you you meant 'split', not 'spilt'.)
You did not show what you tried, but the following splits Dat by its
"A1"
and "A2" columns (creating a list of data.frames):
split(Dat, f=Dat[,c("A1","A2")])
aggregate(), in core R, combine the split and the lapply needed to
calculate groupwise sums. E.g.,
aggregate(Dat$C1, by=Dat[,c("A1","A2")], FUN=sum)
aggregate(C1 ~ A1 + A2, data=Dat, FUN=sum)
The plyr and dplyr packages have other ways to do this sort of thing.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso <
bogaso.christofer at gmail.com> wrote:
Hi again, Let say, I have following data frame: Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L,
2L,
3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class =
"factor"),
A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names =
c("A1",
"A2", "C1"), row.names = c(NA, -30L), class = "data.frame") Now my goal is : 1: Find all possible unique combinations of column 'A1' & column
'A2'.
For example A1 = c, A2 = 2 is 1 unique combination. 2. For each such unique combination, calculate sum for 'A3'. Is there any direct R function to achieve this faster way? I have
very
large data-frame to handle with such calculation. I tried with spilt() function. However it looks to me that, it can split a data-frame w.r.t. only one column. Thanks for your suggestion
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.