Hi,
I don't see how it needs more functions (as long as you can get developers
to agree). Suppose that someone can define a reusable PCA class. This
will contain a single "plot" generic function, defined once and reused by
other classes. The existing "plotPCA" interface can also be implemented
just once, in this class, as
plotPCA <- function(object, ...) plot(as.PCA(object), ...)
This can be exposed to users of your class through namespaces. Then the
only thing a developer needs to implement in his own class is the single
"as.PCA" function. And he/she would have already been required to implement
this as part of the old "plotPCA" function. So it can be extracted from
that, and the developer doesn't have to reimplement the visualization code
from the PCA class.
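
A minimal sketch of that division of labor, using S3 for brevity. Everything here is an assumption made for illustration: the "PCA" class layout, the "newPCA" constructor, and the "myCounts" class are invented, and a real shared class would have to be agreed on by the package maintainers.

```r
# Hypothetical shared "PCA" class; these names do not come from an existing package.
newPCA <- function(scores, sdev) {
  structure(list(scores = scores, sdev = sdev), class = "PCA")
}

# The single generic that each package would supply a method for.
as.PCA <- function(object, ...) UseMethod("as.PCA")

# Default method: treat the input as a numeric features-by-samples matrix.
as.PCA.default <- function(object, ...) {
  fit <- stats::prcomp(t(as.matrix(object)), ...)
  newPCA(scores = fit$x, sdev = fit$sdev)
}

# A package-specific method only extracts and transforms its own data.
# "myCounts" is an invented class that stores a count matrix in $counts.
as.PCA.myCounts <- function(object, ...) {
  as.PCA(log2(object$counts + 1), ...)
}

# The visualization code lives in exactly one place.
plot.PCA <- function(x, ...) {
  graphics::plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
  invisible(x)
}

# The legacy entry point, written once and shared.
plotPCA <- function(object, ...) plot(as.PCA(object), ...)

# Usage with simulated counts: 20 features by 10 samples.
set.seed(1)
obj <- structure(list(counts = matrix(rpois(200, 50), nrow = 20)),
                 class = "myCounts")
pca <- as.PCA(obj)
if (interactive()) plotPCA(obj, col = "blue")
```

In this sketch the author of "myCounts" writes only the three-line "as.PCA.myCounts" method; the plotting code is inherited unchanged.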
Best,
Kevin
On 10/20/2014 5:15 PM, davide risso wrote:
Hi Kevin,
I see your points and I agree (especially for the specific case of
plotPCA that involves some non trivial computations).
On the other hand, having a wrapper function that starting from the
"raw" data gives you a pretty picture (with virtually zero effort by the
user) using a sensible choice of parameters that are more or less OK for
RNA-seq data is useful for practitioners that just want to look for
patterns in the data.
I guess it would be the same to have a PCA method for each of the
objects and then using the plot method on those new objects, but that would
just create a lot more objects and functions than the current approach
(like Mike was saying).
Your "as.pca" or "performPCA" approach would be definitely better if all
the different methods would create objects of the *same* PCA class, but
since we are talking about different packages, I don't know how easy it
would be to coordinate. But perhaps this is the way we should go.
Best,
davide
On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes <kevin.r.coombes at gmail.com>
wrote:
Hi,
It depends.
The "traditional" R approach to these matters is that you (a) first
perform some sort of an analysis and save the results as an object and then
(b) show or plot what you got. It is part (b) that tends to be really
generic, and (in my opinion) should have really generic names -- like
"show" or "plot" or "hist" or "image".
With PCA in particular, you usually have to perform a bunch of
computations in order to get the principal components from some part of the
data. As I understand it now, these computations are performed along the
way as part of the various "plotPCA" functions. The "R way" to do this
would be something like
pca <- performPCA(mySpecialObject) # or as.PCA(mySpecialObject)
plot(pca) # to get the scatter plot
This approach has the user-friendly advantage that you can tweak the plot
(in terms of colors, symbols, ranges, titles, etc) without having to
recompute the principal components every time. (I often find myself
re-plotting the same PCA several times, with different colors or symbols
for different factors associated with the samples.) In addition, you could
then also do something like
screeplot(pca)
to get a plot of the percentages of variance explained.
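
As a rough illustration of the compute-once, plot-many-times workflow, here is a sketch that uses base R's prcomp() as a stand-in for the hypothetical performPCA(). The data and the "batch" and "treatment" factors are simulated assumptions. Note that plot() on a prcomp object draws the scree plot, so the scatter plot is drawn from pca$x here.

```r
# Simulated data standing in for mySpecialObject: 100 features by 12 samples.
set.seed(42)
expr <- matrix(rnorm(100 * 12), nrow = 100)
batch <- factor(rep(c("A", "B"), each = 6))
treatment <- factor(rep(c("ctrl", "trt"), times = 6))

# (a) Do the analysis once; prcomp() expects samples in rows.
pca <- prcomp(t(expr))

# (b) Re-plot as often as needed without recomputing the components.
if (interactive()) {
  plot(pca$x[, 1:2], col = as.integer(batch), pch = 16,
       main = "Colored by batch")
  plot(pca$x[, 1:2], col = as.integer(treatment), pch = 17,
       main = "Colored by treatment")
  screeplot(pca)  # variances of the components
}

# Proportion of variance explained by each component.
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
```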
My own feeling is that if the object doesn't know what to do when you
tell it to "plot" itself, then you haven't got the right abstraction.
You may still end up needing generics for each kind of computation you
want to perform (PCA, RLE, MA, etc), which is why I suggested an "as.PCA"
function. After all, "as" is already pretty generic. In the long run,
this would help BioConductor developers, since they wouldn't all have to
reimplement the visualization code; they would just have to figure out how
to convert their own object into a PCA or RLE or MA object.
And I know that this "plotWhatever" approach is used elsewhere in
BioConductor, and it has always bothered me. It just seemed that a post
suggesting a new generic function provided a reasonable opportunity to
point out that there might be a better way.
Best,
Kevin
PS: My own "ClassDiscovery" package, which is available from R-Forge via
install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
includes a "SamplePCA" class that does something roughly similar to this
for microarrays.
PPS (off-topic): The worst offender in base R -- because it doesn't use
this "typical" approach -- is the "heatmap" function. Having tried to teach
this function in several different classes, I have come to the conclusion
that it is basically unusable by mortals. And I think the problem is that
it tries to combine too many steps -- clustering rows, clustering columns,
scaling, visualization -- all in a single function.
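
For comparison, a sketch of the same steps kept separate, using only base R. The data are simulated, and this is not meant to reproduce heatmap()'s defaults (for one thing, heatmap() clusters before it scales); it only shows each step as its own inspectable object.

```r
set.seed(7)
x <- matrix(rnorm(8 * 5), nrow = 8,
            dimnames = list(paste0("g", 1:8), paste0("s", 1:5)))

# 1. Scaling: center and scale each row.
scaled <- t(scale(t(x)))

# 2. Clustering rows, and 3. clustering columns, as separate objects.
rowClust <- hclust(dist(scaled))
colClust <- hclust(dist(t(scaled)))

# 4. Visualization: reorder by the clusterings, then draw.
ordered <- scaled[rowClust$order, colClust$order]
if (interactive()) {
  image(t(ordered), axes = FALSE, main = "Clustered, row-scaled data")
}
```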
On 10/20/2014 3:47 PM, davide risso wrote:
Hi Kevin,
I don't agree. In the case of EDASeq (as I suppose it is the case for
DESeq/DESeq2) plotting the principal components of the count matrix is only
one of possible exploratory plots (RLE plots, MA plots, etc.).
So, in my opinion, it makes more sense from an object oriented point of
view to have multiple plotting methods for a single "RNA-seq experiment"
object.
In addition, this is the same strategy adopted elsewhere in
Bioconductor, e.g., for the plotMA method.
Just my two cents.
Best,
davide
On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes <
kevin.r.coombes at gmail.com> wrote:
I understand that breaking code is a problem, and that is admittedly
the main reason not to immediately adopt my suggestion.
But as a purely logical exercise, creating a "PCA" object X or something
similar and using either
plot(X)
or
plot(as.PCA(mySpecialObject))
is a much more sensible use of object-oriented programming/design. This
requires no new generics (to write or to learn).
And you could use it to transition away from the current system by
convincing the various package maintainers to re-implement plotPCA as
follows:
plotPCA <- function(object, ...) {
plot(as.PCA(object), ...)
}
This would be relatively easy to eventually deprecate and teach users to
switch to the alternative.
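
One way the eventual deprecation step might look, as a sketch only. The "PCA" class and the as.PCA methods are hypothetical and are redefined here so the example runs on its own; .Deprecated() is the base R function for signalling a deprecation warning.

```r
# Hypothetical shared pieces (assumptions, not an existing API).
as.PCA <- function(object, ...) UseMethod("as.PCA")
as.PCA.default <- function(object, ...) {
  fit <- stats::prcomp(t(as.matrix(object)), ...)
  structure(list(scores = fit$x, sdev = fit$sdev), class = "PCA")
}
plot.PCA <- function(x, ...) {
  graphics::plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
  invisible(x)
}

# The old entry point keeps working but tells users where to go instead.
plotPCA <- function(object, ...) {
  .Deprecated(msg = "plotPCA() is deprecated; use plot(as.PCA(object)) instead.")
  plot(as.PCA(object), ...)
}

# Simulated features-by-samples matrix for trying it out.
set.seed(3)
m <- matrix(rnorm(60), nrow = 10)
if (interactive()) plotPCA(m)  # warns, then draws the plot
```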
On 10/20/2014 1:07 PM, Michael Love wrote:
hi Kevin,
that would imply there is only one way to plot an object of a given
class. Additionally, it would break a lot of code.
best,
Mike
On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes <
kevin.r.coombes at gmail.com> wrote:
But shouldn't they all really just be named "plot" for the appropriate
objects? In which case, there would already be a perfectly good generic....
On Oct 20, 2014 10:27 AM, "Michael Love" <michaelisaiahlove at gmail.com>
wrote:
I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2,
DESeq,
affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT (maybe
more).
Sounds like a case for BiocGenerics.
best,
Mike