
Message-ID: <ce9ee9b7-3e47-405c-b554-1fb01cb7208e@email.android.com>
Date: 2012-12-25T17:00:33Z
From: Jeff Newmiller
Subject: aggregate / collapse big data frame efficiently
In-Reply-To: <E0ADE6E0-F4E8-42D0-84D5-9A1D26D43A09@googlemail.com>

You might consider using the sqldf package.
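A minimal sketch of that suggestion (assuming the sqldf package is installed; the column names `g`, `v1`..`v3` are illustrative, not from the original post — sqldf runs the query in an embedded SQLite database, so the grouping happens in SQL):

```r
library(sqldf)

# Toy data mirroring the example: one grouping column, several numeric ones.
x <- data.frame(g = rep(letters, 2),
                v1 = rnorm(52), v2 = rnorm(52), v3 = rnorm(52))

# Group-wise means computed in SQLite; one AVG() term per numeric column.
res <- sqldf("SELECT g, AVG(v1) AS v1, AVG(v2) AS v2, AVG(v3) AS v3
              FROM x GROUP BY g")
```

With very many columns the SELECT list can be built programmatically, e.g. with `paste0("AVG(", names(x)[-1], ") AS ", names(x)[-1], collapse = ", ")`, rather than written out by hand.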
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Martin Batholdy <batholdy at googlemail.com> wrote:

>Hi,
>
>
>I need to aggregate rows of a data.frame by computing the mean for rows
>with the same factor-level on one factor-variable;
>
>here is the sample code:
>
>
>x <- data.frame(rep(letters,2), rnorm(52), rnorm(52), rnorm(52))
>
>aggregate(x[,-1], list(x[,1]), mean)  # exclude the factor column itself
>
>
>Now my problem is that the actual data set is much bigger (120 rows
>and approximately 100,000 columns) -- and it takes very, very long
>(at some point I just stopped it).
>
>Is there anything that can be done to make the aggregate routine more
>efficient?
>Or is there a different approach that would work faster?
>
>
>Thanks for any suggestions!
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.