Timings of function execution in R [was Re: R in Industry]
Hi Profs. Ripley and Bates,

I also recollect from a Tom Lumley email that when he profiled an MCMC computation, he found that pmin/pmax was the bottleneck. That is why he suggested the function that I called fast.pmax. I think that it would be nice to have restricted alternative functions dealing exclusively with numeric mode.

Best,
Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Douglas Bates
Sent: Friday, February 09, 2007 10:05 AM
To: Prof Brian Ripley
Cc: R-Help
Subject: Re: [R] Timings of function execution in R [was Re: R in Industry]
On 2/9/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
The other reason why pmin/pmax are preferable to your functions is that they are fully generic. It is not easy to write C code which takes into account that <, [, [<- and is.na are all generic. That is not to say that it is not worth having faster restricted alternatives, as indeed we do with rep.int and seq.int. Anything that uses arithmetic is making strong assumptions about the inputs. It ought to be possible to write a fast C version that worked for atomic vectors (logical, integer, real and character), but is there any evidence of profiled real problems where speed is an issue?
Yes. I don't have the profiled timings available now and one would need to go back to earlier versions of R to reproduce them but I did encounter a situation where the bottleneck in a practical computation was pmin/pmax. The binomial and poisson families for generalized linear models used pmin and pmax to avoid boundary conditions when evaluating the inverse link and other functions. When I profiled the execution of some generalized linear model and, more importantly for me, generalized linear mixed model fits, these calls to pmin and pmax were the bottleneck. That is why I moved some of the calculations for the binomial and poisson families in the stats package to compiled code. In that case I didn't rewrite the general form of pmin and pmax, I replaced specific calls in the compiled code.
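[Editorial sketch] The kind of boundary clamping described above can be illustrated for a logit inverse link. This is only a minimal sketch under assumptions: the function name linkinv_logit and the choice of threshold are hypothetical illustrations, not the code in the stats package.

```r
# Hypothetical illustration of clamping with pmin()/pmax() in an inverse link.
linkinv_logit <- function(eta) {
  thresh <- -log(.Machine$double.eps)       # assumed threshold, roughly 36
  eta <- pmin(pmax(eta, -thresh), thresh)   # keep the linear predictor in range
  1 / (1 + exp(-eta))                       # fitted probability
}

p <- linkinv_logit(c(-1000, 0, 1000))
print(p)
print(all(p > 0 & p < 1))   # the clamp keeps probabilities off the boundary
```

Each evaluation calls pmin() and pmax() on the full vector of linear predictors, so in an iterative fit these calls are made many times, which is consistent with them showing up in a profile.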
On Fri, 9 Feb 2007, Martin Maechler wrote:
"Ravi" == Ravi Varadhan <rvaradhan at jhmi.edu>
on Thu, 8 Feb 2007 18:41:38 -0500 writes:
Ravi> Hi,
Ravi> "greaterOf" is indeed an interesting function. It is much faster than the
Ravi> equivalent R function, "pmax", because pmax does a lot of checking for
Ravi> missing data and for recycling. Tom Lumley suggested a simple function to
Ravi> replace pmax, without these checks, that is analogous to greaterOf, which I
Ravi> call fast.pmax.
Ravi> fast.pmax <- function(x,y) {i<- x<y; x[i]<-y[i]; x}
Ravi> Interestingly, greaterOf is even faster than fast.pmax, although you have to
Ravi> be dealing with very large vectors (O(10^6)) to see any real difference.
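[Editorial sketch] A rough way to compare fast.pmax with pmax, assuming two numeric vectors of equal length with no missing values (the checks that fast.pmax skips). The seed, vector length and repetition count are arbitrary choices, and the timings will differ by machine and R version.

```r
# fast.pmax as quoted above; assumes equal-length numeric vectors without NAs
fast.pmax <- function(x, y) {i <- x < y; x[i] <- y[i]; x}

set.seed(1)                 # arbitrary seed, for reproducibility only
n <- 1e6                    # vector length of the order mentioned above
x <- rnorm(n)
y <- rnorm(n)

# Same values as pmax() on this NA-free, equal-length input
same <- identical(fast.pmax(x, y), pmax(x, y))
print(same)

# Rough timings over 10 repetitions; no particular ratio is implied
print(system.time(for (r in 1:10) pmax(x, y)))
print(system.time(for (r in 1:10) fast.pmax(x, y)))
```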
Yes. Indeed, I have a file, first version dated from 1992, where I explore the "slowness" of pmin() and pmax() (in S-plus 3.2 then). I have since added quite a few experiments and versions to that file. As a consequence, in the robustbase CRAN package (which is only a bit more than a year old though), there's a file, available as https://svn.r-project.org/R-packages/robustbase/R/Auxiliaries.R with the very simple content {note line 3 !}:
-------------------------------------------------------------------------
### Fast versions of pmin() and pmax() for 2 arguments only:
### FIXME: should rather add these to R
pmin2 <- function(k,x) (x+k - abs(x-k))/2
pmax2 <- function(k,x) (x+k + abs(x-k))/2
-------------------------------------------------------------------------
{the "funny" argument name 'k' comes from the use of these to
compute Huber's psi() fast :
psiHuber <- function(x,k) pmin2(k, pmax2(- k, x))
curve(psiHuber(x, 1.35), -3,3, asp = 1)
}
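[Editorial sketch] A small check that the arithmetic versions reproduce the pmin()/pmax() form of Huber's psi up to floating-point tolerance. The name psiHuberRef and the grid of x values are assumptions made for this illustration.

```r
pmin2 <- function(k, x) (x + k - abs(x - k)) / 2
pmax2 <- function(k, x) (x + k + abs(x - k)) / 2

psiHuber    <- function(x, k) pmin2(k, pmax2(-k, x))
psiHuberRef <- function(x, k) pmin(k, pmax(-k, x))   # hypothetical reference version

x <- seq(-3, 3, by = 0.25)

# all.equal() compares with a numerical tolerance, not bit for bit
ok <- isTRUE(all.equal(psiHuber(x, 1.35), psiHuberRef(x, 1.35)))
print(ok)
```

all.equal() is used on purpose here: it tests agreement within a tolerance, which is a weaker statement than identical() would make.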
One point *is* that I think proper function names would be pmin2() and
pmax2() since they work with exactly 2 arguments,
whereas IIRC the feature to work with '...' is exactly the
reason that pmax() and pmin() are so much slower.
I haven't checked if Gabor's
pmax2.G <- function(x,y) {z <- x > y; z * (x-y) + y}
is even faster than the abs() using one.
It may have the advantage of giving *identical* results (to the
last bit!) to pmax() which my version does not --- IIRC the
only reason I did not follow my own 'FIXME' above.
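[Editorial sketch] The "identical to the last bit" question can be probed with a few hand-picked inputs. The inputs below are assumptions chosen to stress floating-point rounding, not cases taken from the thread; they suggest that both arithmetic versions can differ from pmax() when the two arguments have very different magnitudes or are not finite.

```r
pmax2   <- function(k, x) (x + k + abs(x - k)) / 2
pmax2.G <- function(x, y) {z <- x > y; z * (x - y) + y}

tiny <- 1e-20   # far below the rounding unit of numbers near 1

# pmax() returns the larger input unchanged
ref <- pmax(tiny, -1)
print(ref == tiny)

# Both arithmetic versions round the tiny value away
print(pmax2(-1, tiny))
print(pmax2.G(tiny, -1))

# With an infinite second argument, 0 * (-Inf) yields NaN in pmax2.G
print(pmax2.G(1, Inf))
print(pmax(1, Inf))
```

For inputs of comparable magnitude the versions typically agree to within rounding error, so whether this matters depends on the data.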
I had then planned to implement pmin2() and pmax2() in C code, trivially, and hence get identical (to the last bit!) behavior as pmin()/pmax(); but I now tend to think that the proper approach is to code pmin() and pmax() via .Internal() and hence C code ...
[Not before DSC and my vacations though!!]

Martin Maechler, ETH Zurich
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595