
aggregate / collapse big data frame efficiently

According to the way you have used 'aggregate', you are taking the
column means.  A couple of suggestions for faster processing:


1. use matrices instead of data.frames (I converted your example just
before using it)
2. use 'colMeans'

I created a 120 x 100000 matrix with 10 levels, and it does the
computation in less than 2 seconds:
int [1:120, 1:100000] 111 13 106 61 16 39 25 94 53 38 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:100000] "col1" "col2" "col3" "col4" ...
> system.time({
+ # split the indices of rows for each level
+ x <- split(seq(nrow(df)), df$levels)
+ result <- sapply(x, function(a) colMeans(df.m[a, ]))
+ })
   user  system elapsed
   1.33    0.00    1.35
num [1:100000, 1:10] 57 57 57 57 57 57 57 57 57 57 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:100000] "col1" "col2" "col3" "col4" ...
  ..$ : chr [1:10] "1" "2" "3" "4" ...
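The approach above (coerce the numeric columns to a matrix, split the row
indices by level, then take colMeans within each group) can be sketched
end-to-end at a small scale.  The object names (df, df.m) follow the
snippet above, but the sizes here are reduced for illustration:

```r
# small-scale sketch of the matrix + split + colMeans approach
set.seed(1)
n.rows <- 120; n.cols <- 5; n.levels <- 10

# a data.frame with one grouping column plus numeric columns
df <- data.frame(levels = rep(1:n.levels, length.out = n.rows),
                 matrix(rnorm(n.rows * n.cols), n.rows, n.cols,
                        dimnames = list(NULL, paste0("col", 1:n.cols))))

# convert the numeric part to a matrix before the computation
df.m <- as.matrix(df[, -1])

# split the row indices by level, then column means within each group
x <- split(seq(nrow(df)), df$levels)
result <- sapply(x, function(a) colMeans(df.m[a, ]))

str(result)  # an n.cols x n.levels matrix of per-level column means
```

The gain over aggregate comes from doing the arithmetic once per group
on a numeric matrix (colMeans is implemented in C) rather than
dispatching a generic mean over each column of a data.frame subset.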
On Tue, Dec 25, 2012 at 11:34 AM, Martin Batholdy
<batholdy at googlemail.com> wrote: