Fastest non-overlapping binning mean function out there?

On 10/02/2012 06:11 PM, Hervé Pagès wrote:
Of course, if you have a lot of bins, using aggregate() is not optimal.
But you can replace it by your own optimized version e.g.:

   ## 'bin' must be a sorted vector of non-negative integers of the
   ## same length as 'x'.
   fast_aggregate_mean <- function(x, bin, nbins)
   {
     bin_count <- tabulate(bin + 1L, nbins=nbins)
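     ## Pad the running sum with a leading 0 so that empty leading bins
     ## (where cumsum(bin_count) is 0) are not dropped by the indexing.
     ## Empty bins come out as NaN (0 / 0).
     cs <- c(0, cumsum(x))
     diff(cs[c(1L, cumsum(bin_count) + 1L)]) / bin_count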
   }

Then:

   bin <- findInterval(x, bx)
   fast_aggregate_mean(x, bin, nbins=length(bx)+1L)
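To make the cumulative-sum idea concrete, here is a small walk-through on made-up data. The values of x and bx below are illustrative assumptions, not taken from the thread, and the toy data is chosen so that no bin is empty. Each bin's sum is the running total at the end of that bin minus the running total at the end of the previous bin.

```r
x  <- c(-2, 1, 2, 3, 10, 11, 25)    # sorted, illustrative values
bx <- c(0, 5, 20)                    # illustrative bin boundaries

bin       <- findInterval(x, bx)     # 0 1 1 1 2 2 3
bin_count <- tabulate(bin + 1L, nbins = length(bx) + 1L)   # 1 3 2 1

ends    <- cumsum(bin_count)         # index of the last element of each bin: 1 4 6 7
running <- cumsum(x)                 # running total of x: -2 -1 1 4 14 25 50

bin_sum  <- diff(c(0, running[ends]))   # per-bin sums: -2 6 21 25
bin_mean <- bin_sum / bin_count         # per-bin means: -2 2 10.5 25
```

Only two passes over the data are needed (one cumsum() and one tabulate()), which is presumably where the speed-up over a per-group function call comes from.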

On my machine this is 100x faster or more than using aggregate() when
the number of bins is > 100k. Memory usage is also reduced a lot.
Another benefit of using fast_aggregate_mean() over aggregate() is
that all the bins are represented in the output (aggregate() ignores
empty bins).
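
The difference in how empty bins are reported can be checked on a small example. This is only a sketch: the data is made up, and the function is restated under a hypothetical name, binned_mean, with the running total padded by a leading zero so that an empty first bin is indexed safely.

```r
# Hypothetical restatement of the cumulative-sum approach.
# 'x' must be sorted so that 'bin' is sorted; empty bins yield NaN (0 / 0).
binned_mean <- function(x, bin, nbins) {
  bin_count <- tabulate(bin + 1L, nbins = nbins)
  running   <- c(0, cumsum(x))       # running[k + 1] is the sum of the first k values
  diff(running[c(1L, cumsum(bin_count) + 1L)]) / bin_count
}

x   <- c(1, 2, 3, 25, 30)            # illustrative data; nothing falls in [5, 20)
bx  <- c(0, 5, 20)
bin <- findInterval(x, bx)           # 1 1 1 3 3

res_fast <- binned_mean(x, bin, nbins = length(bx) + 1L)
res_agg  <- aggregate(x, by = list(bin = bin), FUN = mean)

res_fast   # one entry per bin; bins 0 and 2 are empty and show up as NaN
res_agg    # a data frame with rows for bins 1 and 3 only
```

Keeping one entry per bin means the result can be indexed directly by bin number, at the cost of having to handle the NaN entries downstream.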

Cheers,
H.