[Bioc-devel] any interest in a BiocMatrix core package?
As Peter points out, the 'matrixStats' package provides an API with plain functions - not generic functions. This is intentional; the main purpose is to keep the overhead at an absolute minimum. This is also in line with the overall philosophy of 'matrixStats', where speed is maximized and memory usage is minimized to the point where you could not do much better by writing native code yourself. The user should be able to call the same matrixStats function thousands of times, even on rather small matrices, without getting killed by overhead due to dispatching or internal copies, e.g. [toy example] resampling 'cols' B=10,000 times in calls such as `matrixStats::rowMeans2(X, cols = cols)`. You can find extensive benchmark reports at https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports.
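A minimal sketch of the toy resampling example mentioned above. The matrix dimensions, the seed, the reduced B, and the helper name `row_means_subset` are assumptions made for illustration; the base-R fallback is only there so the sketch also runs where 'matrixStats' is not installed.

```r
set.seed(42)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)
B <- 1000L  # the text mentions B = 10,000; a smaller value keeps the sketch quick

# Hypothetical helper: use matrixStats::rowMeans2() when available,
# otherwise fall back to base R, which has to copy the column subset first.
row_means_subset <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  function(x, cols) matrixStats::rowMeans2(x, cols = cols)
} else {
  function(x, cols) rowMeans(x[, cols, drop = FALSE])
}

# Resample the columns with replacement B times and collect the row means.
# Each call works on a small matrix, so per-call overhead dominates the cost.
boot <- vapply(seq_len(B), function(b) {
  cols <- sample.int(ncol(X), replace = TRUE)
  row_means_subset(X, cols)
}, numeric(nrow(X)))

# Bootstrap standard error of each row mean
boot_se <- apply(boot, 1L, sd)
```

With the `cols` argument the subset does not need to be materialized as `X[, cols]` before the summary is computed, which is the kind of internal copy the paragraph above refers to.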
From my perspective, the role of 'matrixStats' in a software stack is a rather low-level one, where it can serve higher-level APIs that either replicate its API or reuse it internally, e.g. those that dispatch on S3 and S4 etc. Peter's 'DelayedMatrixStats' is one example.

On Thu, Nov 2, 2017 at 2:00 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
[...]
Honestly, I (as co-maintainer of Matrix, principal maintainer
for several years now)
had been a bit surprised and frustrated that the 'matrixStats'
initiative had started w/o any contact with the Matrix package
maintainers and initially has not ever tried to use Matrix
package classes or functionality
(and this is still the case now AFAICS).
Oh no, I'm sorry that I/we've caused frustration with 'matrixStats'. I'm not sure I understand though - the overlap in API and functionality between 'matrixStats' and 'Matrix' is basically zero(?). I think of 'Matrix' as a higher-level package. Do my comments above put it in a different light? Or are you saying that what's in 'matrixStats' should really have been in 'Matrix'?

All the best,
Henrik

On Fri, Nov 3, 2017 at 7:16 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 11/02/2017 06:20 PM, Peter Hickey wrote:
As Michael notes, I think the scope here is broader than considering S4 generics for functions in base R. To summarise, I think we would be looking to have S4 generics for the following:

- All(?) the row*/col* functions in matrixStats (NB: matrixStats uses plain old functions with no S3 or S4, which I believe was to avoid any overhead of method dispatch since it is explicitly targeting ordinary matrix objects as input)
- Potentially new row*/col* summaries (i.e. that don't currently exist in matrixStats)
- Perhaps moving from BiocGenerics the S4 generics defined in R/matrix-summary.R?
- Perhaps apply() (E.g., DelayedArray defines an S4 generic for this)

Having these as part of base R or in a recommended package would be great, but of course comes with its own challenges. The alternative is a lightweight package, likely better hosted on CRAN than BioC to assist with wider adoption and integration with Matrix, matrixStats, and other non-BioC packages.

As Michael notes, getting the generic signature 'right' will be important and there are undoubtedly other challenges ahead (I've started a TODO). Might Bioconductor open up a GitHub repo (MatrixGenerics?) where this can be discussed with accompanying code? I've made the skeleton of a MatrixGenerics package that I could upload to kick things off, along with adding my TODOs as Issues on GitHub for further discussion.
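To make the proposal concrete, here is a rough sketch of what an S4 generic for one row* summary could look like, with dispatch restricted to `x`. It is not the code of any existing package: the generic is defined from scratch, the class `TransposedMatrix` is hypothetical, and the `apply()`-based methods are deliberately naive stand-ins for an optimized implementation.

```r
library(methods)

# Illustrative generic only. The name matches matrixStats::rowVars(), but
# nothing here comes from matrixStats. signature = "x" keeps na.rm out of
# dispatch, so methods are selected on the class of 'x' alone.
setGeneric("rowVars",
           function(x, na.rm = FALSE, ...) standardGeneric("rowVars"),
           signature = "x")

# Method for ordinary matrices (naive: one var() call per row)
setMethod("rowVars", "matrix", function(x, na.rm = FALSE, ...) {
  apply(x, 1L, stats::var, na.rm = na.rm)
})

# Hypothetical matrix-like class that stores its data transposed
setClass("TransposedMatrix", representation(t_data = "matrix"))

# Rows of the logical matrix are columns of the stored data
setMethod("rowVars", "TransposedMatrix", function(x, na.rm = FALSE, ...) {
  apply(x@t_data, 2L, stats::var, na.rm = na.rm)
})

X <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 2, byrow = TRUE)
v_plain   <- rowVars(X)
v_wrapped <- rowVars(new("TransposedMatrix", t_data = t(X)))
```

Both calls go through the same generic, so a package defining a new matrix-like class only has to supply a method. Every call on an ordinary matrix now pays for S4 dispatch, though, which is the overhead the plain functions in matrixStats avoid.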
I did start this repository as a place to develop more concrete ideas; I think that a Bioconductor MatrixGenerics solution would not be optimal, so I think of this repository as a place to develop ideas rather than a precursor to an actual package. I invited Pete as a Collaborator with 'Admin' privileges, so I think he should be able to extend Collaborator invites to other interested parties.

Martin

Cheers,
Pete

On Thu, 2 Nov 2017 at 13:10 Michael Lawrence <lawrence.michael at gene.com> wrote:
I'm pretty sure we're also considering generics for functions that do not exist in base R. Like rowVars() and colVars(). This sort of suggests that matrixStats should be part of base R. As an aside, we should think about the signature on those implicit generics. Should they really include na.rm and dims? The simpler the signature, the easier to understand the API.

On Thu, Nov 2, 2017 at 10:38 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Martin Morgan <martin.morgan at roswellpark.org>
on Thu, 2 Nov 2017 06:17:19 -0400 writes:
> On 11/02/2017 05:00 AM, Martin Maechler wrote:
>>>>>>> "ML" == Michael Lawrence <lawrence.michael at gene.com>
>>>>>>> on Wed, 1 Nov 2017 14:13:54 -0700 writes:
>>
>> > Probably way easier to add the generics to the Matrix
>> > package and everyone just depends on that.
>>
>> Yes! It is 'Recommended' and comes with every R
>> installation, and has had many such matrix S4 methods in
>> place for > 10 years, notably for dealing with (large)
>> sparse matrices.
>>
>> Honestly, I (as co-maintainer of Matrix, principal
>> maintainer for several years now) had been a bit
>> surprised and frustrated that the 'matrixStats'
>> initiative had started w/o any contact with the Matrix
>> package maintainers and initially has not ever tried to
>> use Matrix package classes or functionality (and this is
>> still the case now AFAICS).
>>
>> I'm happy to coordinate with maintainers of bioc packages
>> about which generics (and classes !) to use and export,
>> etc.
> One issue is that Matrix is a relatively large package
> (well, I wonder if that's a reasonable statement, given
> the Bioc dependencies and data involved, but perhaps in
> general...) and hence 'overkill' to obtain a collection of
> generics. Is there any prospect for factoring out the
> definition of the generics from implementation of the
> methods? Re-purposing stats4 ?
> Martin Morgan
Hmm.. we have quite a few setGenericImplicit() statements in the methods package already, notably for 'colSums' and friends, and so other decent citizen packages do *NOT* setGeneric() at all on these ... and of course, Matrix _is_ a decent citizen in the R package universe.

Instead of turning to stats4, I'm pretty sure we should only consider what functions should be added to the implicit generics already provided by the 'methods' package itself.

Could it be that (some of) you are not properly aware of implicit generics? If you start 'R --vanilla' you can say
implicitGeneric("colSums")
standardGeneric for "colSums" defined from package "base"
function (x, na.rm = FALSE, dims = 1, ...)
standardGeneric("colSums")
<bytecode: 0x6cb4798>
<environment: 0x6cab560>
Methods may be defined for arguments: x, na.rm, dims
Use showMethods("colSums") for currently available ones.
---------
so I think it is clear how *any* decent package has to define
methods for colSums(), and if they do, there should not be any
conflicts.
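As a rough illustration of that point, the sketch below defines a colSums() method without ever calling setGeneric(), so the generic is derived from the implicit generic shown above. The class `ScaledMatrix` and its slots are hypothetical and exist only for this example.

```r
library(methods)

# Hypothetical toy class: a plain matrix plus a scalar multiplier
setClass("ScaledMatrix", representation(data = "matrix", scale = "numeric"))

# No setGeneric() here: setMethod() on the base function creates the generic
# from the implicit generic, so every package doing the same ends up with an
# identical generic definition.
setMethod("colSums", "ScaledMatrix",
          function(x, na.rm = FALSE, dims = 1, ...) {
            x@scale * base::colSums(x@data, na.rm = na.rm, dims = dims)
          })

m <- new("ScaledMatrix", data = matrix(1:6, nrow = 2), scale = 10)
res   <- colSums(m)                      # dispatches to the method above
plain <- colSums(matrix(1:6, nrow = 2))  # ordinary matrices use base::colSums
```

The method's formal arguments match the implicit generic's, including the trailing `...`, which is what lets methods from different packages coexist on the same generic.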
I think the problem is with S3 methods, not with S4 ones, where
the implicit generics, as I understand it, were made for dealing
with several packages writing methods for the same generic without
one of the packages taking precedence.
Martin Mächler
>>
>> Best, Martin Maechler ETH Zurich (and R core team)
>>
>>
>>
>> > On Wed, Nov 1, 2017 at 1:59 PM, Hervé Pagès
>> > <hpages at fredhutch.org> wrote:
>>
>> >> That's probably a good idea but a clean solution would
>> >> need to involve all players, including the Matrix
>> >> package. Right now there are conflicts for some S4
>> >> generics defined in Matrix and in BiocGenerics
>> >> (e.g. rowSums). I'm not sure that moving rowSums from
>> >> BiocGenerics to a new MatrixGenerics package would
>> >> address this. Unless MatrixGenerics is on CRAN and
>> >> Matrix depends on it ;-)
>> >>
>> >> How likely is this to happen?
>> >>
>> >> H.
>> >>
>> >>
>> [............]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
> This email message may contain legally privileged and/or
> confidential information. If you are not the intended
> recipient(s), or the employee or agent responsible for the
> delivery of this message to the intended recipient(s), you
> are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited.
> If you have received this message in error, please notify
> the sender immediately by e-mail and delete this email
> message from your computer. Thank you.