tapply huge speed difference if X has names
Please use a current version of R!
This was fixed long ago, and you will find it in the NEWS file:
split() now handles vectors with names internally and so is
almost as fast as on vectors without names (and maybe 100x
faster than before).
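To check this on a given installation, a scaled-down version of the comparison quoted below can be run. This is only a sketch: the sizes are reduced here (an assumption, so that it finishes quickly on any version), `g` is a name introduced for the illustration, and the timings will differ from machine to machine.

```r
# Scaled-down version of the named vs. unnamed comparison.
# Sizes are smaller than in the original post (assumption made for speed).
X <- 1:10000
names(X) <- X
g <- rep(1:1000, each = 10)   # 1000 groups of 10 elements each

t_unnamed <- system.time(fast <- tapply(as.vector(X), g, mean))  # names dropped
t_named   <- system.time(slow <- tapply(X, g, mean))             # names kept

print(t_unnamed)   # timings vary by machine and R version
print(t_named)

# The results themselves should not depend on the names.
stopifnot(identical(fast, slow))
```

On a version containing the fix described in the NEWS entry, the two timings should be of similar size.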
On Mon, 8 Aug 2005, Matthew Dowle wrote:
Hi all,
Apologies if this has been raised before ... R's tapply is very fast, but if X has names, as in this example, there seems to be a huge slowdown: under 1 second compared to 151 seconds. The following timings are repeatable and were taken on a single-user machine:
X = 1:100000
names(X) = X
system.time(fast<<-tapply(as.vector(X), rep(1:10000,each=10), mean))   # as.vector() to drop the names
[1] 0.36 0.00 0.35 0.00 0.00
system.time(slow<<-tapply(X, rep(1:10000,each=10), mean))
[1] 149.95 1.83 151.79 0.00 0.00
head(fast)
   1    2    3    4    5    6
 5.5 15.5 25.5 35.5 45.5 55.5
head(slow)
   1    2    3    4    5    6
 5.5 15.5 25.5 35.5 45.5 55.5
identical(fast,slow)
[1] TRUE
Looking inside tapply, which then calls split, it seems there is an is.null(names(x)) which prevents R's internal fast version from being called. Why is that there? Could it be removed? I often do something like tapply(mat[,"colname"],...) where mat has rownames. Therefore the rownames of mat become the names of the vector mat[,"colname"], and this seems to slow down tapply a lot. Perhaps other functions which call split also suffer this problem?
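On an affected version, one possible workaround for the matrix case is to drop the names before grouping, in the same way as.vector() is used in the timings above. The following is a minimal sketch: the matrix `mat`, its contents, and the grouping vector `g` are hypothetical stand-ins, not data from the post.

```r
# Hypothetical matrix with row names, standing in for `mat` in the post.
mat <- matrix(as.numeric(1:20), ncol = 2,
              dimnames = list(paste0("row", 1:10), c("colname", "other")))
g <- rep(1:5, each = 2)        # hypothetical grouping, 5 groups of 2 rows

x <- mat[, "colname"]          # the row names become names(x)
stopifnot(!is.null(names(x)))

res_named   <- tapply(x, g, mean)           # names still attached
res_unnamed <- tapply(unname(x), g, mean)   # names dropped before split() is reached

# For a summary such as mean() the element names play no role in the result.
stopifnot(identical(res_named, res_unnamed))
```

Dropping the names is only appropriate when the function applied to each group does not need them.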
split.default
function (x, f)
{
    if (is.list(f))
        f <- interaction(f)
    f <- factor(f)
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f %in% k]
    y
}
<environment: namespace:base>
version
         _
platform x86_64-redhat-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    0.1
year     2004
month    11
day      15
language R
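The fallback loop in the split.default listing above evaluates f %in% k once per level, and each evaluation scans all of f. The work therefore grows with length(x) times the number of levels; in the example from this post that is 100000 elements times 10000 levels, on the order of 10^9 element comparisons, which plausibly accounts for much of the slowdown. A small sketch of that loop follows; `split_slow` is a hypothetical name used only for this illustration, and the sizes are reduced.

```r
# Re-creation of the fallback loop from the listing, for illustration only.
split_slow <- function(x, f) {
  f <- factor(f)
  lf <- levels(f)
  y <- vector("list", length(lf))
  names(y) <- lf
  for (k in lf) y[[k]] <- x[f %in% k]   # one full scan of f per level
  y
}

x <- 1:2000
names(x) <- x
f <- rep(1:200, each = 10)

res_loop <- split_slow(x, f)
res_base <- split(x, f)

# Compare group contents only; how names are carried may differ by R version.
same <- identical(lapply(res_loop, unname), lapply(res_base, unname))
stopifnot(same)

# Element comparisons performed by the loop: length(f) * number of levels.
n_comparisons <- length(f) * nlevels(factor(f))
print(n_comparisons)
```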
Thanks and regards,
Matthew
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595