Hi,
I've got a version of rowMedians(x, na.rm=FALSE) for matrices that
handles missing values implemented in C. It has been optimized for
memory and speed. To avoid coercing integers to doubles, and hence
allocate an additional 200% memory, there is one C function for
integers and one for doubles.
The rowMedians() implementation currently sits in my non-CRAN
package R.native, which can be installed with:
source("http://www.braju.com/R/hbLite.R")
hbLite("R.native")
library(R.native)
example(rowMedians)
The source code package is available at:
http://www.braju.com/R/repos/R.native_0.1.2.tar.gz
Before I submit a package to CRAN consisting of pretty much just
rowMedians(), would it make more sense for it to go into one of the
core packages? If so, how should I proceed?
/Henrik
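
For readers without R.native installed, the behaviour described above can be approximated in plain R. This is only a sketch: rowMediansRef() is a hypothetical name, it is not the C code from the package, and it assumes that rowMedians(x, na.rm) is meant to agree with applying median() to each row. The last lines illustrate the memory argument for keeping separate integer and double code paths.

```r
# Hypothetical pure-R reference, NOT the C implementation in R.native.
rowMediansRef <- function(x, na.rm = FALSE) {
  stopifnot(is.matrix(x), is.numeric(x))
  # median() returns NA for a row containing NA unless na.rm = TRUE
  apply(x, 1, median, na.rm = na.rm)
}

# Integer matrix, filled column-wise; rows are (1,2,3), (4,5,6), (NA,8,9)
x <- matrix(c(1L, 4L, NA, 2L, 5L, 8L, 3L, 6L, 9L), nrow = 3)
rowMediansRef(x)                 # third row is NA
rowMediansRef(x, na.rm = TRUE)   # third row is the median of 8 and 9

# Integers take 4 bytes each and doubles 8, so coercing an integer
# matrix to double allocates a copy twice the size of the original.
xi <- matrix(1:100000, ncol = 100)
xd <- xi
storage.mode(xd) <- "double"
ratio <- as.numeric(object.size(xd)) / as.numeric(object.size(xi))
ratio                            # close to 2
```

The apply() version is convenient as a correctness check, but it is exactly the slow, copy-heavy path that a native implementation is meant to avoid.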
Native implementation of rowMedians()
9 messages · Henrik Bengtsson, Brian Ripley, Martin Maechler +2 more
Hi Henrik,
"HenrikB" == Henrik Bengtsson <hb at stat.berkeley.edu>
on Sun, 13 May 2007 21:14:24 -0700 writes:
HenrikB> [...]
As they say: You have to convince at least one member of R-core
that ``it's worth it''. {Then he may have to bear the battle
with dissenting core members ;- }
I'm quite interested, but really you have to do the work of unbundling
it from all the R.oo stuff before I have another longer look.
Martin
On Mon, 14 May 2007, Martin Maechler wrote:
[...]
Also, the 'a version of rowMedians' made me wonder what other version there was, and it seems there is one in Biobase which looks a more natural home.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On 5/14/07, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
[...]
Great. The R code without R.oo (that is all it does) will be

  rowMedians.matrix <- function(x, na.rm=FALSE, ...) {
    .Call("rowMedians", x, as.logical(na.rm), PACKAGE="R.native")
  }

  rowMedians <- function(x, ...) UseMethod("rowMedians")

and the C code is in src/rowMedians.c, and the doc in
man/rowMedians.matrix.R (all in
http://www.braju.com/R/repos/R.native_0.1.2.tar.gz).
Thanks
Henrik
PS. I'll be off email until Friday (US time). DS.
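
The generic/method pair above can be tried without compiling anything by swapping the .Call() for a pure-R body. This is a sketch only: the apply()-based body is a stand-in assumed to give the same results as the C routine, which is not shown in this thread.

```r
# Generic: dispatches on the (implicit) class of its first argument.
rowMedians <- function(x, ...) UseMethod("rowMedians")

# Method for matrices; a pure-R stand-in replaces the .Call() into C.
rowMedians.matrix <- function(x, na.rm = FALSE, ...) {
  apply(x, 1, median, na.rm = as.logical(na.rm))
}

m <- matrix(c(3, 1, 2,
              6, 5, 4), nrow = 2, byrow = TRUE)
res <- rowMedians(m)   # dispatches to rowMedians.matrix
res
```

Because dispatch is on the class of x, calling rowMedians() on an object with no matching method (a data frame, say) fails with a "no applicable method" error rather than silently coercing.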
On 5/14/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
[...]
Also, the 'a version of rowMedians' made me wonder what other version there was, and it seems there is one in Biobase which looks a more natural home.
The rowMedians() in Biobase uses rowQ() from the same package. I actually started off by adding support for missing values to rowQ(), resulting in the method rowQuantiles(), for which there are also internal functions for both integer and double matrices. rowQuantiles() is in R.native too, but since it has much less CPU mileage I wanted to wait with that. The rowMedians() is developed from my rowQuantiles(), optimized for the 50% quantile.

Why do you think it is more natural to host rowMedians() in Biobase than in one of the core R packages? Biobase comes with a lot of overhead for people not in the Bio-world.

/Henrik
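
The relation between row quantiles and row medians can be checked in plain R. The sketch below uses a hypothetical helper, rowQuantilesRef(), built on stats::quantile() with its default definition (type 7); whether the rowQuantiles() in R.native uses the same quantile definition is not stated in this thread, so treat that as an assumption.

```r
# Hypothetical pure-R stand-in for a row-wise quantile function.
rowQuantilesRef <- function(x, probs = 0.5, na.rm = FALSE) {
  stopifnot(is.matrix(x), is.numeric(x))
  # quantile() signals an error on NA input unless na.rm = TRUE
  q <- apply(x, 1, quantile, probs = probs, na.rm = na.rm, names = FALSE)
  # apply() returns one column per row when length(probs) > 1
  if (length(probs) > 1) t(q) else q
}

set.seed(1)
x <- matrix(sample(1:100, 40), nrow = 4)   # 4 rows, 10 columns
med <- apply(x, 1, median)

# With the default quantile type, the 50% quantile is the median
same <- all.equal(rowQuantilesRef(x, probs = 0.5), med)
same

# Several quantiles at once: one row per input row, one column per prob
qs <- rowQuantilesRef(x, probs = c(0.25, 0.75))
dim(qs)
```

A dedicated median routine can skip the general interpolation step and only needs the one or two middle order statistics of each row, which is presumably where the speed-up for the 50% case comes from.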
On Mon, 14 May 2007, Henrik Bengtsson wrote:
[...]
Why do you think it is more natural to host rowMedians() in Biobase than in one of the core R packages? Biobase comes with a lot of overhead for people not in the Bio-world.
Because that is where there seems to be a need for it, and having multiple functions of the same name in different packages is not ideal (and even with namespaces can cause confusion).
"BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:
BDR> On Mon, 14 May 2007, Henrik Bengtsson wrote:
BDR> [...]
>> Why do you think it is more natural to host rowMedians() in Biobase
>> than in one of the core R packages? Biobase comes with a lot of
>> overhead for people not in the Bio-world.
BDR> Because that is where there seems to be a need for it, and having multiple
BDR> functions of the same name in different packages is not ideal (and even
BDR> with namespaces can cause confusion).
That's correct, of course.
However, I still think that quantiles (and statistics derived
from them) in general and medians in particular are under-used
by many user groups. For some useRs, speed can be an important
reason and for that I had made a big effort to provide runmed()
in R, and I think it would be worthwhile to provide fast rowwise
medians and quantiles, here as well.
Also, BTW, I think it will be worthwhile to provide (R<->C) API
versions of median() and quantile() {with fewer options than the
R functions, most probably!!},
such that we'd hopefully see less re-invention of the wheel
happening in every package that needs such quantiles in its C code.
Biobase is in quite active maintenance, and I'd assume its
maintainers will remove rowMedians() from there (or first
replace it with a wrapper in order to deal with the namespace
issue you mentioned) as soon as R has its own function
with the same (or better) functionality.
In order to facilitate the transition, we'd have to make sure
that such a 'stats' function does behave " >= " to the Biobase
one.
Martin
We did think about this a lot, and decided that something like rowQ, which simply returns the requested order statistics and lets the user manipulate them on return to build their own version of the median or other quantiles, was the better approach.

I would be happy to have this in R itself, if there is sufficient interest, and we can remove the one in Biobase (without the need for deprecation/defunct as long as the args are compatible). But if the decision is to return a particular estimate of a quantile, then we would probably want to keep our function around, with its current name.

best wishes
Robert
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
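
The design Robert describes, returning order statistics and leaving the choice of estimate to the caller, can be illustrated with a small pure-R sketch. rowOrderStat() is a hypothetical helper written for this example; it is not Biobase's rowQ(), and it makes no attempt at missing-value handling.

```r
# Hypothetical stand-in for the idea behind rowQ(): return the k-th
# smallest value (the k-th order statistic) of every row.
rowOrderStat <- function(x, k) {
  stopifnot(is.matrix(x), k >= 1, k <= ncol(x))
  apply(x, 1, function(row) sort(row)[k])
}

x <- matrix(c(9, 1, 5, 3,
              2, 8, 4, 6), nrow = 2, byrow = TRUE)
n <- ncol(x)   # even number of columns

# One caller-defined median: average the two middle order statistics
userMedian <- (rowOrderStat(x, n / 2) + rowOrderStat(x, n / 2 + 1)) / 2

# Another choice from the same building block: the "low median"
lowMedian <- rowOrderStat(x, n / 2)

userMedian
lowMedian
```

The trade-off is visible here: a function that returns order statistics never commits to one quantile estimate, whereas a rowMedians() or rowQuantiles() that returns a single number has to pick a definition for the even-length case.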
On 14 May 2007 at 14:31, Martin Maechler wrote:
| However, I still think that quantiles (and statistics derived
| from them) in general and medians in particular are under-used
| by many user groups. For some useRs, speed can be an important
| reason and for that I had made a big effort to provide runmed()
| in R, and I think it would be worthwhile to provide fast rowwise
| medians and quantiles, here as well.

Seconded. I don't see anything particularly 'bio' about that.

Dirk

Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison