
Suggestion: Adding quick rowMin and rowMax functions to base package

2 messages · Sebastian Kranz, Henrik Bengtsson

#
Hi,

I wonder whether, similar to the very fast rowSums and colSums
functions in the base package, one could add fast functions that
calculate the min or max over the rows / cols of a matrix. While
apply(x,1,min) works, profiling a program of mine showed that it
is rather slow for matrices with a very large number of rows. Fast
functionality already seems to exist in the functions pmax and pmin,
but it is rather cumbersome to apply them to all columns of a matrix (if
one does not know in advance how many columns the matrix has). Below is
some code with a very inelegant implementation that illustrates the
possible speed gains if apply could be avoided:

rowMin = function(x) {
    # Build the call pmin(x[,1],x[,2],...,x[,NCOL(x)]) as text, then evaluate it
    code = paste("x[,", seq_len(NCOL(x)), "]", sep="", collapse=",")
    code = paste("pmin(", code, ")")
    return(eval(parse(text=code)))
}

# Speed comparison: taking rowMin of a 1,000,000 x 10 matrix
x = matrix(rnorm(1e7), 1e6, 10)

# The traditional apply method
y = apply(x, 1, min) # Runtime ca. 12 seconds

# My inelegant rowMin function
z = rowMin(x) # Runtime ca. 0.5 seconds

Of course, the way the function rowMin is constructed is highly
inefficient if the matrix x has many columns, but maybe there is a
simple way to adapt the code of pmin and pmax to create fast rowMin,
rowMax, ... functions. I don't know whether it is worth the effort, but I
guess taking minima and maxima over rows is a common task.
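One way to avoid building a code string is to hand the columns to pmin() as a list via do.call(). The sketch below assumes a numeric matrix; the name rowMinPmin is hypothetical and not part of base R. It extracts every column once (so it temporarily copies the data), but it works for any number of columns without parse().

```r
# Hypothetical helper (not in base R): row-wise minimum via pmin(),
# passing the columns as a list instead of parsing a generated call.
rowMinPmin <- function(x, na.rm = FALSE) {
  x <- as.matrix(x)
  # pmin() needs at least one argument; min() of an empty row is Inf
  if (ncol(x) == 0L) return(rep(Inf, nrow(x)))
  # One vector per column: list(x[, 1], x[, 2], ..., x[, ncol(x)])
  cols <- lapply(seq_len(ncol(x)), function(j) x[, j])
  do.call(pmin, c(cols, na.rm = na.rm))
}

x <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2)
rowMinPmin(x)  # [1] 3 1
```

Note that na.rm follows pmin() semantics, so a row that is entirely NA gives NA with na.rm = TRUE, whereas min() would return Inf.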

Best wishes,
Sebastian
#
See rowMins(), rowMaxs() and rowRanges() in matrixStats (on CRAN).

The matrixStats package was created for the purpose of providing such
row*/col*() methods.  First the functionality is provided, then the
methods are optimized for speed and memory, e.g. vectorizing,
implementing in native code, and utilizing other fast existing
functions.  Some methods have already been optimized this way.  When
mature, these may be suggested to be part of the default R
distribution.
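A minimal usage sketch of the functions named above, assuming a current matrixStats release installed from CRAN (argument details may differ from the 2010 version). The block is skipped when the package is not available, and it cross-checks the results against apply().

```r
# Assumes matrixStats (CRAN) is installed; the block is skipped otherwise.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  x <- matrix(rnorm(1e6), nrow = 1e5, ncol = 10)
  mins <- matrixStats::rowMins(x)     # one minimum per row
  maxs <- matrixStats::rowMaxs(x)     # one maximum per row
  rng  <- matrixStats::rowRanges(x)   # nrow(x) x 2 matrix: minima, maxima
  # Cross-check against the slower apply() versions
  stopifnot(isTRUE(all.equal(mins, apply(x, 1, min))),
            isTRUE(all.equal(maxs, apply(x, 1, max))),
            isTRUE(all.equal(rng[, 1], mins)),
            isTRUE(all.equal(rng[, 2], maxs)))
}
```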

Benchmark reports, contributions of code and redundancy are
welcome. Testing the code under many different conditions is
critical, e.g. missing values or not, infinite values or not, zero,
one or many columns/rows, ...
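A minimal sketch of that kind of edge-case testing, comparing a candidate row-minimum function with the apply(x, 1, min) reference on matrices with NA, Inf/-Inf, and zero, one, or many rows and columns. The helper checkRowMin and the candidate (the do.call(pmin, ...) idea) are hypothetical illustrations, not matrixStats code.

```r
# Hypothetical test helper: compares a candidate row-minimum function with
# the apply(x, 1, min) reference on a set of edge-case matrices.
checkRowMin <- function(candidate) {
  cases <- list(
    many_cols = matrix(rnorm(50), nrow = 5, ncol = 10),
    with_na   = matrix(c(1, NA, 3, 4, 5, 6), nrow = 2),
    with_inf  = matrix(c(Inf, -Inf, 2, 3), nrow = 2),
    one_col   = matrix(c(2, 7, 1), ncol = 1),
    one_row   = matrix(c(4, 2, 8), nrow = 1),
    zero_rows = matrix(numeric(0), nrow = 0, ncol = 3),
    zero_cols = matrix(numeric(0), nrow = 3, ncol = 0)
  )
  vapply(cases, function(x) {
    # min() of an empty row warns and returns Inf; silence that in the reference
    ref <- suppressWarnings(apply(x, 1, min))
    isTRUE(all.equal(candidate(x), ref))
  }, logical(1))
}

# Hypothetical candidate: pmin() over the columns, with an explicit zero-column case
candidate <- function(x) {
  if (ncol(x) == 0L) return(rep(Inf, nrow(x)))
  do.call(pmin, lapply(seq_len(ncol(x)), function(j) x[, j]))
}
checkRowMin(candidate)
```

The result is a named logical vector, so a failing case is identified by name rather than by a single TRUE/FALSE.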

/Henrik

PS. rowMaxs() etc. do not utilize pmax(); I didn't know of it.
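For comparison, a column-at-a-time sketch that folds pmax() over the columns, so only the running result and one column are held as extra vectors instead of a list of all columns. The name rowMaxLoop is hypothetical, this is not how matrixStats implements rowMaxs(), and it assumes at least one column.

```r
# Hypothetical helper: row-wise maximum by folding pmax() over the columns.
# Requires ncol(x) >= 1.
rowMaxLoop <- function(x, na.rm = FALSE) {
  x <- as.matrix(x)
  stopifnot(ncol(x) >= 1L)
  ans <- x[, 1]
  # Update the running maximum with one column at a time
  for (j in seq_len(ncol(x))[-1L]) {
    ans <- pmax(ans, x[, j], na.rm = na.rm)
  }
  ans
}

x <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2)
rowMaxLoop(x)  # [1] 5 9
```

One edge case worth testing: with na.rm = TRUE a row that is entirely NA stays NA here, whereas max(..., na.rm = TRUE) on that row returns -Inf with a warning.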
On Mon, Mar 29, 2010 at 9:34 PM, Sebastian Kranz <skranz at uni-bonn.de> wrote: