Hi,
I use the code below to aggregate and count my test data. It works fine,
but with my real data (33,000 rows) the function is really slow
(it had not finished after half an hour).
Does anybody know of other functions that I could use?
Thanks,
Hans-Peter
--------------
dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656,
32656, 32656, 32672, 32672, 32699 ),
FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953,
89953, 64395, 62896, 62870 ),
Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
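# one summary row per group: the two key columns, total Anzahl and row count
f <- function(x) data.frame( Datum = x[1,1], FischerID = x[1,2],
                             Anzahl = sum( x[,3] ), Cnt = nrow( x ) )
# by() splits dat by Datum x FischerID; rbind() stacks the per-group results
t.a <- do.call("rbind", by(dat, dat[,1:2], f)) # slow for 33,000 rows
# sort by Datum, then FischerID
t.a <- t.a[order( t.a[,1], t.a[,2] ),]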
# show data
dat
t.a
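For comparison, here is a base-R sketch (not code from the thread) that avoids building one small data frame per group and then rbind-ing them all. It forms a single key from Datum and FischerID, lets `rowsum()` add up Anzahl and a column of ones within each key, and then sorts the result as above. The key format and the name `t.b` are illustrative choices, and it assumes both key columns are numeric, as in the test data.

```r
dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656,
                              32656, 32656, 32672, 32672, 32699 ),
                   FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953,
                                  89953, 64395, 62896, 62870 ),
                   Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )

# one character key per (Datum, FischerID) pair; only pairs that occur get a key
key <- paste( dat$Datum, dat$FischerID, sep = "_" )

# rowsum() sums each column within each key; the column of ones gives the count
sums <- rowsum( cbind( Anzahl = dat$Anzahl, Cnt = 1 ), key )

# look up Datum and FischerID from the first row carrying each key
idx <- match( rownames( sums ), key )
t.b <- data.frame( Datum = dat$Datum[idx], FischerID = dat$FischerID[idx],
                   Anzahl = unname( sums[, "Anzahl"] ),
                   Cnt = unname( sums[, "Cnt"] ) )

# same ordering as t.a: by Datum, then FischerID
t.b <- t.b[order( t.b$Datum, t.b$FischerID ), ]
rownames( t.b ) <- NULL
t.b
```

On the test data this gives the same Datum, FischerID, Anzahl and Cnt values as `t.a`; how it scales to 33,000 rows has not been measured here.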
aggregate slow with many rows - alternative?
3 messages · Hans-Peter, Gabor Grothendieck, Frank E Harrell Jr
Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.
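A minimal sketch of what working on a matrix might look like (our own illustration, not code posted in the thread): after `as.matrix()`, the rows are sorted by Datum and FischerID so that each group forms a contiguous run, a run id is built with `cumsum()`, and `rowsum()` and `tabulate()` give the per-group total and count. The name `t.m` is hypothetical, and the sketch assumes all columns are numeric so that `as.matrix()` yields a numeric matrix.

```r
dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656,
                              32656, 32656, 32672, 32672, 32699 ),
                   FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953,
                                  89953, 64395, 62896, 62870 ),
                   Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )

m <- as.matrix( dat )                            # numeric matrix: Datum, FischerID, Anzahl
m <- m[order( m[, 1], m[, 2] ), , drop = FALSE]  # each group is now one contiguous run

# a new group starts where Datum or FischerID differs from the row before
n <- nrow( m )
new_group <- c( TRUE, m[-1, 1] != m[-n, 1] | m[-1, 2] != m[-n, 2] )
grp <- cumsum( new_group )                       # run id 1, 2, ... for every row

# first row of each run holds the keys; rowsum() and tabulate() give total and count
t.m <- cbind( m[new_group, 1:2, drop = FALSE],
              Anzahl = as.vector( rowsum( m[, 3], grp ) ),
              Cnt = tabulate( grp ) )
t.m
```

The result is a numeric matrix already sorted by Datum and FischerID; `as.data.frame(t.m)` turns it back into a data frame if needed.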
On 10/13/05, Hans-Peter <gchappi at gmail.com> wrote:
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Gabor Grothendieck wrote:
Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.
In the Hmisc package, the asNumericMatrix and matrix2dataFrame functions facilitate this. Also look at the summarize and mApply functions in Hmisc, which can be quite fast.

Frank Harrell