aggregate() slow with variables of type 'dates': how to solve it?
On 4/15/05, Christoph Lehmann <christoph.lehmann at gmx.ch> wrote:
Dear all
I use aggregate() with variables of type numeric and of type 'dates'. For
numeric variables, functions such as sum() are very fast, but similarly simple
functions such as min() are much slower on variables of type 'dates'. The
difference grows with the number of distinct levels of the grouping variable
'id'. See this sample code:
library(chron)
d0 <- dates(c("02/27/92", "02/27/92", "01/14/92",
              "02/28/92", "02/01/92"))
ntimes <- 700000
dts <- data.frame(rep(1:40, ntimes/8),
                  chron(rep(d0, ntimes), format = c(dates = "m/d/y")),
                  rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
names(dts) <- c("id", "date", "tbs")
# earliest date per id ('dates' column): 82 seconds
system.time(dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x)
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
dat.1st
# sum of tbs per id (numeric column): 17 seconds
system.time(tbs.s <- aggregate(as.numeric(dts$tbs), list(id = dts$id), sum))
tbs.s
Is this a problem with the data type 'dates'? If so, is there any way around
it? For huge data sets this becomes a real problem. As I mentioned, if 'id'
has only 5 levels the two timings are roughly the same, but with 40 different
ids the difference is large.
Thanks a lot,
Christoph
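One workaround worth trying, offered as a sketch and not as an answer from this thread: chron stores dates as numbers of days since an origin, so the class can be stripped with as.numeric() before aggregating, letting min() run on plain doubles in each group, and the result can be re-wrapped with chron() afterwards, just as the code above already does for dat.1st. The assumption is that the slowdown comes from dispatching chron's own min() method once per group; that has not been measured here. The toy vectors d and id are hypothetical stand-ins for dts$date and dts$id.

```r
library(chron)

# Hypothetical toy data standing in for dts$date and dts$id.
d  <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
id <- c(1, 1, 2, 2, 2)

# Aggregate the underlying day counts instead of the 'dates' objects,
# so min() works on plain numeric vectors in every group.
agg <- aggregate(as.numeric(d), list(id = id), min)

# Restore the 'dates' class on the per-group minima (default origin,
# matching the default used when d was created).
first <- chron(agg$x, format = c(dates = "m/d/y"))
first
```

Run on the full data, the same pattern would be aggregate(as.numeric(dts$date), list(id = dts$id), min), and its timing can be compared directly with the 82 and 17 seconds reported above.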
--
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html