
Performance issue in stats:::weighted.mean.default method

See weightedMean() in the matrixStats package.  It's optimized for
data type, speed and memory, and it's implemented in native code so it
can avoid some of these intermediate copies.  It's a few times faster
than weighted.mean[.default](), about four times at n = 5000 going by
the median timings.
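For reference, here is a minimal sketch of where those intermediate copies come from. It assumes the last line of stats:::weighted.mean.default() is sum((x*w)[w != 0])/sum(w) (print stats:::weighted.mean.default to check your R version); wmean_steps() is a hypothetical name, and the na.rm handling of the real function is left out.

```r
# Sketch: the final expression of stats:::weighted.mean.default(),
#   sum((x * w)[w != 0]) / sum(w)
# split into named steps to show the temporaries it allocates.
# wmean_steps() is a hypothetical helper, not part of stats.
wmean_steps <- function(x, w) {
  xw    <- x * w      # temporary 1: double vector, length(x)
  keep  <- w != 0     # temporary 2: logical vector, length(w)
  xw_nz <- xw[keep]   # temporary 3: copy of the non-zero-weight products
  sum(xw_nz) / sum(w) # sum(w) is a scalar; no extra vector here
}

x <- c(1, 2, 3)
w <- c(0.5, 0, 1.5)
wmean_steps(x, w)  # [1] 2.5
```

Each step allocates a new vector of roughly length n before sum() reduces it, whereas a compiled loop can accumulate the weighted sum and the weight total in a single pass.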

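library(matrixStats)     # provides weightedMean()
library(microbenchmark)
n <- 5000
x <- sample(500, n, replace=TRUE)
# Weights in (0, 1]; roughly 1 in 10 of them is set to zero
w <- sample(1000, n, replace=TRUE)/1000 *
     ifelse((sample(10, n, replace=TRUE) - 1) > 0, 1, 0)
# Proposed variant: no subsetting on w != 0
fun.new  <- function(x, w) { sum(x*w)/sum(w) }
# Mirrors stats:::weighted.mean.default(): drop zero-weight terms
fun.orig <- function(x, w) { sum((x*w)[w != 0])/sum(w) }
stats <- microbenchmark(
  weightedMean(x, w),
  weighted.mean(x, w),
  ORIGFN = fun.orig(x, w),
  NEWFN  = fun.new(x, w),
  times = 1000
)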
Unit: microseconds
                expr   min    lq  mean median    uq    max neval
  weightedMean(x, w)  28.7  31.7  33.4   32.9  33.8   81.7  1000
 weighted.mean(x, w) 129.6 141.6 149.6  143.7 147.1 2332.9  1000
              ORIGFN 205.7 222.0 235.0  225.4 231.4 2655.8  1000
               NEWFN  38.9  42.3  44.3   42.8  43.6  385.8  1000
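To put a number on "a few times faster", the medians from the table can be divided by the weightedMean() median. This is only arithmetic on the reported values; the variable names are my own.

```r
# Median times (microseconds) copied from the benchmark table above.
medians <- c(weightedMean = 32.9, weighted.mean = 143.7,
             ORIGFN = 225.4, NEWFN = 42.8)

# Slowdown of each expression relative to weightedMean(x, w).
slowdown <- medians / medians[["weightedMean"]]
round(slowdown, 1)
```

This gives about 4.4 for weighted.mean(), 6.9 for ORIGFN and 1.3 for NEWFN at n = 5000.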

Relative performance will vary with n = length(x).
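One rough way to see how the ratio moves with n uses only base R's system.time(), so nothing needs to be installed. time_variants() is a hypothetical helper, and the n values, the repetition count and the runif()-based weights (about 10% zeros, similar to the setup above) are assumptions. system.time() is much coarser than microbenchmark, so treat the numbers as indicative only; with matrixStats installed, a weightedMean() column could be added the same way.

```r
# Hypothetical helper: average per-call time (seconds) of weighted.mean()
# and of sum(x * w) / sum(w) for several vector lengths n.
time_variants <- function(ns, reps = 20L) {
  rows <- lapply(ns, function(n) {
    x <- sample(500, n, replace = TRUE)
    # Weights in (0, 1) with roughly 10% set to zero (assumed setup)
    w <- runif(n) * (runif(n) > 0.1)
    t_mean <- system.time(for (i in seq_len(reps)) weighted.mean(x, w))[["elapsed"]]
    t_new  <- system.time(for (i in seq_len(reps)) sum(x * w) / sum(w))[["elapsed"]]
    data.frame(n = n, weighted.mean = t_mean / reps, sum_xw = t_new / reps)
  })
  do.call(rbind, rows)
}

set.seed(42)
timings <- time_variants(c(1e2, 1e3, 1e5))
timings
```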

The weightedMean() function also handles Inf values that have zero weight:
[1] 1
[1] NaN
[1] 1
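The calls that produced the three values above aren't shown. As a minimal sketch of the underlying issue, assuming an input such as x = c(Inf, 1) with w = c(0, 1) (chosen here for illustration), the snippet compares the plain sum(x*w)/sum(w) formula with the zero-weight-dropping expression used by stats:::weighted.mean.default(). weightedMean() itself is not called, so the sketch runs without matrixStats.

```r
# Assumed inputs (not from the post): one Inf with weight 0.
x <- c(Inf, 1)
w <- c(0, 1)

# Naive formula: Inf * 0 is NaN in IEEE arithmetic, and NaN
# propagates through sum(), so the result is NaN.
naive <- sum(x * w) / sum(w)

# Dropping zero-weight terms after multiplying discards that NaN term.
dropped <- sum((x * w)[w != 0]) / sum(w)

naive    # [1] NaN
dropped  # [1] 1
```

Dropping the zero-weight terms is what avoids the NaN here, which is why a faster sum(x*w)/sum(w) is not a drop-in replacement on its own.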

You'll find more benchmark results comparing weightedMean() with
weighted.mean() at
https://github.com/HenrikBengtsson/matrixStats/wiki/weightedMean

/Henrik
On Thu, Mar 5, 2015 at 9:49 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote: