Skip to content
Back to formatted view

Raw Message

Message-ID: <1549805803.3935.11.camel@gmail.com>
Date: 2019-02-10T13:36:43Z
From: Aaron Lun
Subject: [Bioc-devel] Pushing towards a better home for matrix generics
In-Reply-To: <d69da125-eac5-5579-b8e5-78f4cbdd922f@fredhutch.org>

Returning to this topic:?

It's good to hear some of the rationale behind the current state of
affairs. That said, the set-up we have now is quite difficult to work
with; as mentioned before, I've had to hack around it like:

# Example from "BiocSingular", https://github.com/LTLA/BiocSingular
.safe_colSums <- function(x) {
????if (is(x, "Matrix")) {
????????Matrix::colSums(x)
????} else {
????????colSums(x)
????}????
}

... which is ugly, and even worse, still incorrect, e.g., for non-
Matrix classes that have methods for the implicit colSums generic. This
situation is not sustainable for further package development.

Is there a path forward that is palatable to everyone? Or perhaps these
conversations are already happening on R-devel?

-A

On Tue, 2019-01-29 at 18:46 +0000, Pages, Herve wrote:
> Yes the help system could enforce the full signature for the aliases
> but?
> that means the end user then will have to always do?
> ?`colSums,SomeClass,ANY,ANY-method`, which feels unnecessary
> complicated?
> and confusing in the case of a generic where dispatching on the 2nd
> and?
> 3rd arguments hardly makes sense.
> 
> Or are you saying that the help system should enforce an alias that?
> strictly matches the signature explicitly used in the setMethod?
> statement? Problem with this is that then there is no easy way for
> the?
> end user to know a priori which form to use to access the man page.
> Is?
> it ?`colSums,dgCMatrix,ANY,ANY-method` or is it?
> ?`colSums,dgCMatrix-method`. Right now when you type colSums<ENTERN>?
> (after loading the Matrix package), you get this:
> 
> ?? > library(Matrix)
> ?? > colSums
> ?? standardGeneric for "colSums" defined from package "base"
> 
> ?? function (x, na.rm = FALSE, dims = 1, ...)
> standardGeneric("colSums")
> ?? <bytecode: 0x591c7d0>
> ?? <environment: 0x591a408>
> ?? Methods may be defined for arguments: x, na.rm, dims
> ?? Use? showMethods("colSums")? for currently available ones.
> 
> This suggests that the correct form is ?`colSums,dgCMatrix,ANY,ANY-
> method`.
> 
> All this confusion can be avoided by specifying signature="x" in the?
> definition of the implicit generic. It formalizes where dispatch
> really?
> happens and sets expectations upfront. No loose ends.
> 
> Hope this makes sense,
> 
> H.
> 
> On 1/29/19 09:43, Martin Maechler wrote:
> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Pages, Herve
> > > > > > > ?????on Tue, 29 Jan 2019 16:44:47 +0000 writes:
> > ?????> Hi Martin.??Speed is not the concern: I just did some
> > ?????> quick benchmarking and didn't observe any significant
> > ?????> difference in method dispatch performance after doing
> > ?????> setGeneric("toto", function(x, a=0, b=0, c=0)
> > ?????> standardGeneric("toto")) vs doing setGeneric("toto",
> > ?????> signature="x", function(x, a=0, b=0, c=0)
> > ?????> standardGeneric("toto")).
> > 
> > ?????> Here is the real concern to me:
> > 
> > ?????> Aliases of the form
> > ?????> \alias{colSums,dgCMatrix,ANY,ANY-method} are a real pain
> > ?????> to maintain. It's also a pain for the end user to have to
> > ?????> do ?`colSums,dgCMatrix,ANY,ANY-method` to access the man
> > ?????> page for a particular method. I know Matrix uses "short"
> > ?????> aliases i.e. aliases of the form
> > ?????> \alias{colSums,dgCMatrix-method} so the user only needs to
> > ?????> do ?`colSums,dgCMatrix-method` but there is a lot of
> > ?????> fragility to the situation.
> > 
> > ?????> Here is why: The exact form that needs to be used for
> > ?????> these aliases can change anytime depending on what happens
> > ?????> in one of the upstream packages (not a concern for the
> > ?????> Matrix package because no upstream package defines colSums
> > ?????> methods). More precisely: If all the colSums methods
> > ?????> defined in the upstream packages and in my own package are
> > ?????> defined with setMethod statements of the form:
> > 
> > ?????>???? setMethod("colSums", signature(x="SomeClass"), ...)
> > 
> > ?????> then the aliases in the man pages must be of the form
> > ?????> \alias{colSums,SomeClass-method} and the end user can just
> > ?????> do ?`colSums,SomeClass-method`, which is good. But if
> > ?????> **one** upstream package decides to use a setMethod
> > ?????> statement of the form:
> > 
> > ?????>??? setMethod("colSums", signature(x="SomeClass",
> > ?????> na.rm="ANY", dims="ANY"), ...)
> > 
> > ?????> then the aliases for **all** the colSums methods in
> > ?????> **all** the downstream packages now must be of the form
> > ?????> \alias{colSums,SomeOtherClass,ANY,ANY-method}, even if the
> > ?????> method for SomeOtherClass is defined with
> > ?????> signature(x="SomeOtherClass").
> > 
> > Hmm... but to me, the behavior you describe in the above paragraph
> > seems rather an implementation "infelicity" in R's help /
> > documentation system,
> > than an intrinsic necessity.??Or have you thought more about
> > this and discussed it with other S4 experts (John Chambers,
> > Michael L., Martin Morgan, ...)??and came to a different
> > conclusion?
> > 
> > Very generally:
> > 
> > ???Just because the documentation (help system)
> > ???rules are implemented as they are should *NOT* influence "the
> > ???best way" to program things in R.
> > 
> > and particularly for something such as S4 which has been adapted
> > and tuned for a long time ...
> > 
> > So, rather the documentation "setup" should adapt to what seems
> > best from an R coding point of view.
> > 
> > More specifically, if we are allowed to use short signatures in R
> > code, i.e.,
> > ???????signature(x=<someClass>)
> > short for
> > ???????signature(x=<someClass>, na.m="ANY", dims="ANY")
> > 
> > then the documentation \alias{} should allow to use the same
> > principle, as the documentation / help "keys" which \alias{.}
> > constructs
> > will be similarly uniquely determined
> > (at least as long as other packages do not describe methods for
> > ? "my" <someClass>)
> > 
> > So, the help / documentation (and "R CMD check" checks) should
> > have been changed long ago, if you had sent patches to do so,
> > n'est-ce pas????:-) ;-)??[[yes, half jokingly]].
> > 
> > Martin
> > 
> > ?????> Also, as a consequence, now
> > ?????> the end user has to use the long syntax to access the man
> > ?????> pages for these methods. And if later the author of the
> > ?????> upstream package decides to switch back to the
> > ?????> setMethod("colSums", signature(x="SomeClass"), ...) form,
> > ?????> then I have to switch back all the aliases in all my
> > ?????> downstream packages to the short form again!
> > 
> > ?????> This fragility of the alias syntax was one of the
> > ?????> motivations for me to put many setGeneric statements of
> > ?????> the form setGeneric("someGeneric", signature="x") in
> > ?????> BiocGenerics over the years. So I don't have many dozens
> > ?????> of aliases that suddenly break for mysterious reasons ('R
> > ?????> CMD check' would suddenly starts reporting warnings for
> > ?????> these aliases despite no changes in my package or in R).
> > 
> > ?????> Best,
> > 
> > ?????> H.
> > 
> > ?????> On 1/29/19 03:16, Martin Maechler wrote:
> > ?????>>>>>>> Michael Lawrence on Mon, 28 Jan 2019 20:47:58 -0800
> > ?????>>>>>>> writes:
> > ?????>> > That will have some consequences; for example, >
> > ?????>> documentation aliases will need to change. Not sure how >
> > ?????>> many packages will need to be fixed outside of Matrix,
> > ?????>> but > it's not an isolated change. Martin might comment
> > ?????>> on the > rationale for the full signature, since he
> > ?????>> defined those > generics.
> > ?????>>
> > ?????>> > On Mon, Jan 28, 2019 at 7:21 PM Pages, Herve >
> > ?????>> <hpages at fredhutch.org> wrote:
> > ?????>> >>
> > ?????>> >> Actually there is a 4th solution which is to modify
> > ?????>> the >> definition of the implicit generics in the methods
> > ?????>> >> package (file makeBasicFunsList.R) to make them
> > ?????>> dispatch >> on their 1st arg only. Should be easy. Then
> > ?????>> no package >> will need to use a setGeneric statement >>
> > ?????>> anymore. Everybody will automatically get a clean >>
> > ?????>> implicit generic. Including the Matrix package which >>
> > ?????>> shouldn't need to be touched (except maybe for some >>
> > ?????>> aliases in its man page that might need to be changed >>
> > ?????>> from \alias{colSums,dgCMatrix,ANY,ANY-method} to >>
> > ?????>> \alias{colSums,dgCMatrix-method}).
> > ?????>> >>
> > ?????>> >> Anybody wants to try to make a patch for this?
> > ?????>>
> > ?????>> >> H.
> > ?????>>
> > ?????>> I've already replied without having read the above two
> > ?????>> messages.??In my reply I had indeed more or less argued
> > ?????>> as Herv? does above.
> > ?????>>
> > ?????>> Michael, Herv?, .. : Why is it really so much better to
> > ?????>> disallow dispatch for the other compulsory arguments?
> > ?????>> Dispatch there allows to use methods for class "missing"
> > ?????>> which is nicer in my eyes than the traditional default
> > ?????>> argument + missing() "tricks".
> > ?????>>
> > ?????>> Is it mainly speed you are concerned about.??If yes, do
> > ?????>> we have data (and data analysis) about performance here?
> > ?????>>
> > ?????>> Martin
> > ?????>>
> > ?????>> >>
> > ?????>> >> On 1/28/19 19:00, Michael Lawrence wrote: > I agree
> > ?????>> (2) >> is a good compromise. CC'ing Martin for his
> > ?????>> perspective.
> > ?????>> >> >
> > ?????>> >> > Michael
> > ?????>> >> >
> > ?????>> >> > On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve >>
> > ?????>> <hpages at fredhutch.org> wrote: >> Hi Aaron,
> > ?????>> >> >>
> > ?????>> >> >> The 4 matrix summarization generics currently
> > ?????>> defined >> in BiocGenerics >> are defined as followed:
> > ?????>> >> >>
> > ?????>> >> >> setGeneric("rowSums", signature="x") >> >>
> > ?????>> setGeneric("colSums", signature="x") >> >>
> > ?????>> setGeneric("rowMeans", signature="x") >> >>
> > ?????>> setGeneric("colMeans", signature="x")
> > ?????>> >> >>
> > ?????>> >> >> The only reason for having these definitions in >>
> > ?????>> BiocGenerics is to >> restrict dispatch the first >>
> > ?????>> argument. This is cleaner than what we would >> get with
> > ?????>> >> the implicit generics where dispatch is on all
> > ?????>> arguments >> (it >> doesn't really make sense to dispatch
> > ?????>> on toggles >> like 'na.rm' or >> 'dims'). Sticking to
> > ?????>> simple dispatch >> when possible makes life easier for >>
> > ?????>> the developer >> (especially in times of troubleshooting)
> > ?????>> and for the user >> >> (methods are easier to discover
> > ?????>> and their man pages >> easier to access).
> > ?????>> >> >>
> > ?????>> >> >> However, the 4 statements above create new generics
> > ?????>> >> that mask the >> implicit generics defined in the
> > ?????>> Matrix >> package (Matrix doesn't contain >> any
> > ?????>> setGeneric >> statements for these generics, only
> > ?????>> setMethod >> >> statements). This is a very unsatisfying
> > ?????>> situation and it >> has hit me >> repeatedly over the
> > ?????>> last couple of years.
> > ?????>> >> >>
> > ?????>> >> >> We have basically 3 ways to go. From simpler to
> > ?????>> more >> complicated:
> > ?????>> >> >>
> > ?????>> >> >> 1) Give up on single dispatch for these
> > ?????>> generics. That >> is, we remove the >> 4 statements above
> > ?????>> from >> BiocGenerics. Then we use setMethod() in package
> > ?????>> >> code >> like Matrix does.
> > ?????>> >> >>
> > ?????>> >> >> 2) Convince the Matrix folks to put the 4
> > ?????>> statements >> above in Matrix.??>> Then any BioC package
> > ?????>> that needs to >> define methods for these generics >>
> > ?????>> would just need to >> import them from the Matrix
> > ?????>> package. Maybe we could >> >> even push this one step
> > ?????>> further by having BiocGenerics >> import and >> re-export
> > ?????>> these generics. This would make >> them "available" in
> > ?????>> BioC as >> soon as the BiocGenerics >> is loaded (and any
> > ?????>> package that needs to define >> >> methods on them would
> > ?????>> just need to import them from >> BiocGenerics).
> > ?????>> >> >>
> > ?????>> >> >> 3) Put the 4 statements above in a MatrixGenerics
> > ?????>> >> package. Then convince >> the Matrix folks to define
> > ?????>> >> methods on the generics defined in >> >>
> > ?????>> MatrixGenerics. Very unlikely to happen!
> > ?????>> >> >>
> > ?????>> >> >> IMO 2) is the best compromise. Want to give it a
> > ?????>> shot?
> > ?????>> >> >>
> > ?????>> >> >> H.
> > ?????>> >> >>
> > ?????>> >> >>
> > ?????>> >> >> On 1/27/19 13:45, Aaron Lun wrote: >>> This is a >>
> > ?????>> resurrection of some old threads:
> > ?????>> >> >>>
> > ?????>> >> >>>
> > ?????>> >>
> > ?????>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.e
> > thz.ch_pipermail_bioc-2Ddevel_2017-
> > 2DNovember_012273.html&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeA
> > vimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9y
> > T3fc2X5hOsKNJk&s=pcpUyjpkQe6U79lZ_n2SANNp6Zj_s6i1Sq2yZx2NSjw&e=
> > ?????>> >> >>>
> > ?????>> >> >>>
> > ?????>> >>
> > ?????>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github
> > .com_Bioconductor_MatrixGenerics_issues&d=DwIDaQ&c=eRAMFD45gAfqt84V
> > tBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3X
> > RwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=NrmcVnmvgkDp3p64J-
> > izZU9VD5nFsFCWOTI-TsnzCpY&e=
> > ?????>> >> >>>
> > ?????>> >> >>> For those who are unfamiliar with this, the basic
> > ?????>> >> issue is that various >>> Matrix and BiocGenerics >>
> > ?????>> functions mask each other. This is mildly >>> frustrating
> > ?????>> >> in interactive sessions:
> > ?????>> >> >>>
> > ?????>> >> >>>> library(Matrix) >>>> library(DelayedArray) >>>> x
> > ?????>> <- >> rsparsematrix(10, 10, 0.1) >>>> colSums(x) # fails
> > ?????>> >>>> >> Matrix::colSums(x) # okay >>> ... but quite
> > ?????>> annoying >> during package development, requiring code
> > ?????>> like >>> this:
> > ?????>> >> >>>
> > ?????>> >> >>> if (is(x, "Matrix")) { >>> z <- Matrix::colSums(x)
> > ?????>> >> >>> } else { >>> z <- colSums(x) # assuming
> > ?????>> DelayedArray >> does the masking.??>>> }
> > ?????>> >> >>>
> > ?????>> >> >>> ... which defeats the purpose of using S4 dispatch
> > ?????>> in >> the first place.
> > ?????>> >> >>>
> > ?????>> >> >>> I have been encountering this issue with
> > ?????>> increasing >> frequency in my >>> packages, as a lot of
> > ?????>> my code base >> needs to be able to interface with >>>
> > ?????>> both Matrix and >> Bioconductor objects (e.g.,
> > ?????>> DelayedMatrices) at the >>> >> same time. What needs to
> > ?????>> happen so that I can just write:
> > ?????>> >> >>>
> > ?????>> >> >>> z <- colSums(x)
> > ?????>> >> >>>
> > ?????>> >> >>> ... and everything will work for both Matrix and
> > ?????>> >> Bioconductor classes???>>> It seems that many of these
> > ?????>> >> function names are implicit generics >>> anyway, can
> > ?????>> >> BiocGenerics take advantage of that for the time
> > ?????>> being?
> > ?????>> >> >>>
> > ?????>> >> >>> Best,
> > ?????>> >> >>>
> > ?????>> >> >>> Aaron
> > ?????>> >> >>>
> > ?????>> >> >>> _______________________________________________
> > ?????>> >>> >> Bioc-devel at r-project.org mailing list >>> >>
> > ?????>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.e
> > thz.ch_mailman_listinfo_bioc-
> > 2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYb
> > W0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&
> > s=JtgGBnaZJH44fV8OUp-SwnHxhD_i_mdVkqoMfUoA5tM&e=
> > ?????>> >> >> --
> > ?????>> >> >> Herv? Pag?s
> > ?????>> >> >>
> > ?????>> >> >> Program in Computational Biology >> Division of
> > ?????>> Public >> Health Sciences >> Fred Hutchinson Cancer
> > ?????>> Research Center >> >> 1100 Fairview Ave. N, M1-B514 >>
> > ?????>> P.O. Box 19024 >> >> Seattle, WA 98109-1024
> > ?????>> >> >>
> > ?????>> >> >> E-mail: hpages at fredhutch.org >> Phone: (206)
> > ?????>> 667-5791 >> >> Fax: (206) 667-1319
> > ?????>> >> >>
> > ?????>> >> >> _______________________________________________ >>
> > ?????>> >> Bioc-devel at r-project.org mailing list >> >>
> > ?????>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.e
> > thz.ch_mailman_listinfo_bioc-
> > 2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYb
> > W0WYiZvSXAJJKaaPhzWA&m=c-
> > Mmi30ouubEEHC5W9_X6DIwxblt1nQlIfgCaK8uCJU&s=U8Hu1kzglD_RP7t_eR5w_nY
> > AIaupBgrEKx11geSZwVg&e=
> > ?????>> >>
> > ?????>> >> --
> > ?????>> >> Herv? Pag?s
> > ?????>> >>
> > ?????>> >> Program in Computational Biology Division of Public >>
> > ?????>> Health Sciences Fred Hutchinson Cancer Research Center >>
> > ?????>> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
> > ?????>> >> 98109-1024
> > ?????>> >>
> > ?????>> >> E-mail: hpages at fredhutch.org Phone: (206) 667-5791
> > ?????>> Fax: >> (206) 667-1319
> > ?????>> >>
> > 
> > ?????> --
> > ?????> Herv? Pag?s
> > 
> > ?????> Program in Computational Biology Division of Public Health
> > ?????> Sciences Fred Hutchinson Cancer Research Center 1100
> > ?????> Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
> > ?????> 98109-1024
> > 
> > ?????> E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax:
> > ?????> (206) 667-1319
> > 
> --?
> Herv? Pag?s
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fredhutch.org
> Phone:??(206) 667-5791
> Fax:????(206) 667-1319
>