Skip to content

Running median

2 messages · David Brahm, Martin Maechler

#
I have a Date x Stock (223 x 520) matrix of "trading volume".  I can calculate
a 5-day (past) average in about 1 second using: 

R> apply(vol, 1, filter, filter=c(0, rep(1/5,5)), sides=1)

I would like to do the same with a 5-day median, e.g.:

R> mymed <- function(x, n=5) {
R>   r <- rep(NA, length(x))
R>   for (i in (n+1):length(x)) r[i] <- median(x[i-(1:n)])
R>   return(r)
R> }
R> apply(vol, 1, mymed)

only faster (the above takes 65 seconds).  Is there already a function (or some
C code) to do this?  Any clever way to vectorize it?

smooth() in package "eda" with kind="3" calculates a running median of 3
values, so I may start with the code in library/eda/src/smooth.c, but it
doesn't generalize easily to N values.  Also, decmedian() in package "pastecs"
may be relevant, but doesn't seem any faster than my naive code.

Thanks!
#
DavidB> I have a Date x Stock (223 x 520) matrix of "trading
    DavidB> volume".  I can calculate a 5-day (past) average in
    DavidB> about 1 second using:

    R> apply(vol, 1, filter, filter=c(0, rep(1/5,5)), sides=1)

    DavidB> I would like to do the same with a 5-day median,
    DavidB> e.g.:


   R> mymed <- function(x, n=5) {
   R>   r <- rep(NA, length(x))
   R>   for (i in (n+1):length(x)) r[i] <- median(x[i-(1:n)])
   R>   return(r)
   R> }

   R> apply(vol, 1, mymed)

    DavidB> only faster (the above takes 65 seconds).  Is there
    DavidB> already a function (or some C code) to do this?  Any
    DavidB> clever way to vectorize it?

    DavidB> smooth() in package "eda" with kind="3" calculates a
    DavidB> running median of 3 values, so I may start with the
    DavidB> code in library/eda/src/smooth.c, but it doesn't
    DavidB> generalize easily to N values.  Also, decmedian() in
    DavidB> package "pastecs" may be relevant, but doesn't seem
    DavidB> any faster than my naive code.

(yes).

On the "R Developer page", there are TODO lists of several R core members.  
Mine (http://developer.R-project.org/TODO-MM.html) contains an item

  MM> Running Medians for library(modreg) : 
  MM>   We need a fast robust smoother; lowess() is not robust.  
  MM> 	(a non-release of a package "runmed" is available; 
  MM> 	 need to write a short paper on what I found ....)

There two big steps making it much faster :
1) move to compiled code
2) use a smarter algorithm -- I have two versions :
   a. There is a first idea of making it fast which dates back to
      Friedman & Stuetzle's work on projection pursuit in the 80s.
      
   b. Computing `running median of k' for larger `k' :
      To make this really fast (asymptotically optimal), 
      Haerdle & Steiner published a paper on how to do this
      and Berwin Turlach has programmed it (or just translated
      their program).

Yes, as I say above, this should eventually become part of the
"modreg" package.