[Bioc-devel] plotPCA for BiocGenerics

Just to bring the discussion back to the fact that there is a need to do
/something/. A function plotPCA is defined in packages EDASeq, DESeq2,
DESeq, affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT,
with a real potential for needless user confusion. And BiocGenerics already
defines the generics plotMA and plotDispEsts.

The need for BiocGenerics in the first place is a consequence of the S4 /
Dylan / Common LISP object system and the fact that our project releases
more than one package. We should not confuse that with the other issues
that came up in the thread.

To what extent functions that do related things should have the same name
seems a matter of taste. Reducing the number of function names that are
around, but increasing the number of classes, seems pretty much a zero-sum
game to me. <irony> We could have a "compute" generic, for all functions
that compute something? Might make things easier for some users. Until some
authors start using its argument "what" to say what it should compute if
it's not already clear from the class of its argument(s). </irony>

I think there are real benefits to having a general "plot" abstraction. For
example, a reporting framework or GUI could use it to render a graphical
representation of an object. That doesn't preclude specific functions for
particular plot variants. It would just be nice to have a default
visualization of an object, in the same way we can call print to produce a
textual representation at the console. They're complementary.
I second Mike's suggestion & Kasper's points.

Best wishes
        Wolfgang

On 1 Nov 2014, at 19:46, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:

I see the argument for separating plotting and computation.

I don't see the argument for changing plotPCA to plot.  Base R has things
that work either way; we all know hist(), boxplot() etc etc.  And for this
specific case there are (good) arguments for the fact that one could
envision several plots on a PCA object.

But while I see the argument, by having a common class which all packages
should use, it becomes pretty hard to have package specific customization
(colors, phenodata etc etc), or it will at least require some thinking.

Best,
Kasper

On Sat, Nov 1, 2014 at 2:21 PM, Michael Love
<michaelisaiahlove at gmail.com> wrote:

On Nov 1, 2014 1:29 PM, "Michael Love" <michaelisaiahlove at gmail.com>
wrote:
As far as the proposal of using the plot() function for all plots, I
think for the biologists who are struggling already to get R going,
and to figure out what kinds of plots are possible, plotMA (and
knowing that the help is available at ?plotMA) is just so much simpler
than the alternative (isn't it ?"plot,MA-method" for S4?).

Scratch that... I forgot that finding help has to be ugly either way.

On Fri, Oct 31, 2014 at 9:10 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
Sure, the ggplot model (returning an abstract representation of a plot,
and then rendering it when requested, i.e., printed) is preferable to the
side effects of base graphics. Unfortunately, plot() implies the side
effect, which motivated the introduction of autoplot() in ggbio, and in
fact we used Steve's type= parameter idea in many of the autoplot methods.
While I agree that plotScree() could be preferable to plot(type="scree"),
it's still beneficial to have the abstraction, if only for convenience and
to support generic code. Btw, a (S3) pca object already exists: see
?princomp.

Michael

On Fri, Oct 31, 2014 at 3:53 PM, Ryan C. Thompson
<rct at thompsonclan.org> wrote:

I'd just like to chime in that regardless of what approach is chosen, I
definitely would appreciate a way to get the plot data without actually
making the plot. I often end up reimplementing plots in ggplot so that I
can easily customize some aspect of them, so in such cases I need a way to
just get the plot data/coordinates.

For example, if I have an edgeR DGEList and I want to get the X and Y
coordinates for the MDS plot, I need to do something like:

dev.new()
mds.coords <- plotMDS(dge)
dev.off()

which is kind of unfortunate.

So I guess this is more a reminder to people implementing plots to also
implement a way to get the plot data.
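
A minimal sketch of the separation Ryan is asking for, using base R only. The helper names mdsCoords() and plotCoords() are hypothetical and do not come from edgeR or limma; classical MDS via cmdscale() stands in for whatever a package computes internally.

```r
# Hypothetical helpers (not part of any package): compute first, draw later.

# Compute 2-D MDS coordinates for the samples (columns) of a matrix.
# No graphics device is touched here.
mdsCoords <- function(x, k = 2) {
  coords <- cmdscale(dist(t(x)), k = k)
  colnames(coords) <- paste0("dim", seq_len(k))
  coords
}

# Rendering is a separate step that takes the coordinates as input.
plotCoords <- function(coords, ...) {
  plot(coords[, 1], coords[, 2], xlab = "dim1", ylab = "dim2", ...)
  invisible(coords)
}

# Toy data: 20 features in rows, 10 samples in columns.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10,
            dimnames = list(NULL, paste0("sample", 1:10)))

coords <- mdsCoords(x)   # plain matrix, ready for ggplot or anything else
```

With this split, a custom ggplot version only needs as.data.frame(coords), and the stock picture is still one call away via plotCoords(coords).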

-Ryan

On Fri 31 Oct 2014 03:43:04 PM PDT, Steve Lianoglou wrote:

Hi,

On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
<thomasp85 at gmail.com> wrote:

With regards to abstraction - I would personally much rather read and
write code that contained plotScores() and plotScree() etc. where the
intent of the code is clearly communicated, instead of relying on a plot()
function whose result is only known from experience. Trying to squeeze
every kind of visual output into the same plot generic seems artificial
and constrained to me. I totally agree on the plotPCA critique on the
other hand...

If we've bought a ticket to ride on Kevin's and Michael's (and whoever
else) train of thought, wouldn't plot(pca(x), type='scree') or
plot(pca(x), type='scores') be the preferred way to go ... for some
definition of "preferable"?

-steve

Thomas

On 31 Oct 2014, at 22:09, Michael Lawrence
<lawrence.michael at gene.com> wrote:

I strongly agree with Kevin's position. plotPCA() represents two
separate concerns in its very name: the computation and the rendering.
Those need to be separated, at least behind the scenes. The syntax of
plot(pca(x)) is preferable to plotPCA, because the structure of the
operation is represented in the expression itself, not just in a
non-computable function name.

With regard to how a plot,PCA should behave: there is always a tension
between high-level and low-level APIs. In the end, we need multiple levels
of abstraction.  While high-level APIs sacrifice flexibility, we need them
because they communicate the high-level *intent* of the user in the code
itself (self-documenting code), and they enable reusability, which not only
reduces redundant effort but also ensures consistency. Once our brains no
longer need to parse low-level code, we can focus our mental power on
correctness and efficiency. To design a high-level API, one needs to
carefully analyze user requirements, i.e., the use cases. To choose the
default behavior, one needs to rate the use cases by their prevalence, and
by how closely they match the intuition-based expectations of the user.
The fact that at least 9 packages are performing such a similar task
seems to indicate that a common abstraction is warranted, but I am not sure
if BiocGenerics is the appropriate place.

Michael

On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen
<thomasp85 at gmail.com> wrote:

While I tend to agree with you that PCA is too big an operation to be
hidden within a plotting function (MDS is an edge-case I would say), I
can't see how we can ever reach a point where there is only one generic
plot function. In the case of PCA there are a number of different
plot types that can all lay claim to the plot function of a PCA class, for
instance scoreplot, scatterplot matrix of all scores, biplot, screeplot,
accumulated R^2 barplot, leverage vs. distance-to-model... (you get the
idea). So even with some very well-thought-out classes for very common
result types such as PCA, this class would still need a lot of different
plot methods such as plotScores, plotScree etc (or
plot(..., type='score'), but I don't find that very appealing). Expanding
beyond PCA only muddies the water even more - there are very few
interesting data structures that only have one visual representation
to-rule-them-all...

just my 2c

best
Thomas

Date: Mon, 20 Oct 2014 18:50:48 -0400
From: Kevin Coombes <kevin.r.coombes at gmail.com>

Well. I have two responses to that.

First, I think it would be a lot better/easier for users if (most)
developers could make use of the same plot function for "basic" classes
like PCA.

Second, if you think the basic PCA plotting routine needs enhancements,
you still have two options.  On the one hand, you could (as you said)
try to convince the maintainer of PCA to add what you want.  If it's
generally valuable, then he'd probably do it --- and other classes that
use it would benefit.  On the other hand, if it really is a special
enhancement that only makes sense for your class, then you can derive a
class from the basic PCA class
    setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
and implement your own version of the "plot" generic for this class.
And you could tweak the "as.PCA" function so it returns an object of the
mySpecialPCA class. And the user could still just "plot" the result
without having to care what's happening behind the scenes.
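
One way the derived-class idea could look in S4, as a sketch only. No shared "PCA" class exists in the thread, so the class definition, its slot names (scores, sdev) and the extra group slot are all assumptions made for illustration; prcomp() and the built-in USArrests data are used just to have something to plot.

```r
library(methods)

# Hypothetical shared base class; slot names are assumptions.
setClass("PCA", representation(scores = "matrix", sdev = "numeric"))

# Default visualization: first two components. The plotted coordinates
# are returned invisibly so callers can reuse them.
setMethod("plot", signature(x = "PCA", y = "missing"),
  function(x, y, ...) {
    xy <- x@scores[, 1:2, drop = FALSE]
    plot(xy[, 1], xy[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(xy)
  })

# Package-specific subclass carrying extra information (here a grouping).
setClass("mySpecialPCA", contains = "PCA",
         representation(group = "factor"))

# Specialized method: color points by group, then reuse the PCA method
# by upcasting the object.
setMethod("plot", signature(x = "mySpecialPCA", y = "missing"),
  function(x, y, ...) {
    plot(as(x, "PCA"), col = as.integer(x@group), ...)
  })

# Usage: the user just calls plot(); dispatch picks the subclass method.
pc  <- prcomp(USArrests, scale. = TRUE)
obj <- new("mySpecialPCA", scores = pc$x, sdev = pc$sdev,
           group = factor(rep(c("a", "b"), 25)))
xy  <- plot(obj)
```

The subclass adds only what is specific to it; the drawing code lives once, in the method for the base class.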

On 10/20/2014 5:59 PM, Michael Love wrote:

Ah, I see now. Personally, I don't think Bioconductor developers
should have to agree on single plotting functions for basic classes
like 'PCA' (because this logic applies equally to the situation of all
Bioconductor developers agreeing on a single MA-plot, a single
variance-mean plot, etc). I think letting developers define their
plotPCA makes contributions easier (I don't have to ask the owner of
plot.PCA to incorporate something), even though it means we have a
growing list of generics.

Still you have a good point about splitting computation and plotting.
In practice, we subset the rows so PCA is not laborious.

On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
<kevin.r.coombes at gmail.com> wrote:

   Hi,

   I don't see how it needs more functions (as long as you can get
   developers to agree).  Suppose that someone can define a reusable
   PCA class.  This will contain a single "plot" generic function,
   defined once and reused by other classes. The existing "plotPCA"
   interface can also be implemented just once, in this class, as
       plotPCA <- function(object, ...) plot(as.PCA(object), ...)
   This can be exposed to users of your class through namespaces.
   Then the only thing a developer needs to implement in his own
   class is the single "as.PCA" function.  And he/she would have
   already been required to implement this as part of the old
   "plotPCA" function.  So it can be extracted from that, and the
   developer doesn't have to reimplement the visualization code from
   the PCA class.

   Best,
     Kevin

   On 10/20/2014 5:15 PM, davide risso wrote:

   Hi Kevin,

   I see your points and I agree (especially for the specific case
   of plotPCA that involves some non-trivial computations).

   On the other hand, having a wrapper function that, starting from
   the "raw" data, gives you a pretty picture (with virtually zero
   effort by the user) using a sensible choice of parameters that
   are more or less OK for RNA-seq data is useful for practitioners
   that just want to look for patterns in the data.

   I guess it would be the same to have a PCA method for each of the
   objects and then use the plot method on those new objects, but
   that would just create a lot more objects and functions than the
   current approach (like Mike was saying).

   Your "as.pca" or "performPCA" approach would be definitely better
   if all the different methods would create objects of the *same*
   PCA class, but since we are talking about different packages, I
   don't know how easy it would be to coordinate. But perhaps this
   is the way we should go.

   Best,
   davide

   On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
   <kevin.r.coombes at gmail.com> wrote:

       Hi,

       It depends.

       The "traditional" R approach to these matters is that you (a)
       first perform some sort of an analysis and save the results
       as an object and then (b) show or plot what you got.  It is
       part (b) that tends to be really generic, and (in my opinion)
       should have really generic names -- like "show" or "plot" or
       "hist" or "image".

       With PCA in particular, you usually have to perform a bunch
       of computations in order to get the principal components from
       some part of the data.  As I understand it now, these
       computations are performed along the way as part of the
       various "plotPCA" functions.  The "R way" to do this would be
       something like
           pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
           plot(pca) # to get the scatter plot
       This approach has the user-friendly advantage that you can
       tweak the plot (in terms of colors, symbols, ranges, titles,
       etc) without having to recompute the principal components
       every time. (I often find myself re-plotting the same PCA
       several times, with different colors or symbols for different
       factors associated with the samples.) In addition, you could
       then also do something like
           screeplot(pca)
       to get a plot of the percentages of variance explained.
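
The compute-once, plot-many workflow described here can be tried with base R alone. This is only an illustration: prcomp() stands in for the hypothetical performPCA()/as.PCA(), and the toy data and the batch/sex factors are made up. Note that for a prcomp object, plot(pca) itself draws the scree plot, so the scatter plot below is drawn from the stored scores in pca$x.

```r
# Toy data: 12 samples in rows, 50 features in columns (illustrative only).
set.seed(42)
mat <- matrix(rnorm(12 * 50), nrow = 12,
              dimnames = list(paste0("s", 1:12), NULL))
batch <- factor(rep(c("A", "B"), each = 6))
sex   <- factor(rep(c("F", "M"), times = 6))

# (a) compute once and keep the result as an object
pca <- prcomp(mat)

# (b) re-plot as often as needed without recomputing anything
plot(pca$x[, 1:2], col = as.integer(batch), pch = 19)
plot(pca$x[, 1:2], col = as.integer(sex),   pch = 17)

# Scree plot of the component variances
screeplot(pca)

# Proportion of variance explained by each component
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
```

Only the two plot() calls change between re-colorings; the principal components are computed a single time.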

       My own feeling is that if the object doesn't know what to do
       when you tell it to "plot" itself, then you haven't got the
       right abstraction.

       You may still end up needing generics for each kind of
       computation you want to perform (PCA, RLE, MA, etc), which is
       why I suggested an "as.PCA" function.  After all, "as" is
       already pretty generic.  In the long run, this would help
       Bioconductor developers, since they wouldn't all have to
       reimplement the visualization code; they would just have to
       figure out how to convert their own object into a PCA or RLE
       or MA object.

       And I know that this "plotWhatever" approach is used
       elsewhere in Bioconductor, and it has always bothered me. It
       just seemed that a post suggesting a new generic function
       provided a reasonable opportunity to point out that there
       might be a better way.

       Best,
         Kevin

       PS: My own "ClassDiscovery" package, which is available from
       R-Forge via
           install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
       includes a "SamplePCA" class that does something roughly
       similar to this for microarrays.

       PPS (off-topic): The worst offender in base R -- because it
       doesn't use this "typical" approach -- is the "heatmap"
       function.  Having tried to teach this function in several
       different classes, I have come to the conclusion that it is
       basically unusable by mortals.  And I think the problem is
       that it tries to combine too many steps -- clustering rows,
       clustering columns, scaling, visualization -- all in a single
       function.

       On 10/20/2014 3:47 PM, davide risso wrote:

       Hi Kevin,

       I don't agree. In the case of EDASeq (as I suppose is the
       case for DESeq/DESeq2) plotting the principal components of
       the count matrix is only one of several possible exploratory
       plots (RLE plots, MA plots, etc.).
       So, in my opinion, it makes more sense from an object
       oriented point of view to have multiple plotting methods for
       a single "RNA-seq experiment" object.

       In addition, this is the same strategy adopted elsewhere in
       Bioconductor, e.g., for the plotMA method.

       Just my two cents.

       Best,
       davide

       On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
       <kevin.r.coombes at gmail.com> wrote:

           I understand that breaking code is a problem, and that
           is admittedly the main reason not to immediately adopt
           my suggestion.

           But as a purely logical exercise, creating a "PCA"
           object X or something similar and using either
               plot(X)
           or
               plot(as.PCA(mySpecialObject))
           is a much more sensible use of object-oriented
           programming/design. This requires no new generics (to
           write or to learn).

           And you could use it to transition away from the current
           system by convincing the various package maintainers to
           re-implement plotPCA as follows:

           plotPCA <- function(object, ...) {
             plot(as.PCA(object), ...)
           }

           This would be relatively easy to eventually deprecate
           and teach users to switch to the alternative.
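
A sketch of how that transition might be wired up, under stated assumptions: the "PCA" class, its slots, and the as.PCA() generic are hypothetical (nothing like them is defined in the thread), and a plain matrix stands in for a package's own data class. Base R's .Deprecated() is one possible way to steer users to the new spelling once the wrapper is ready to be retired.

```r
library(methods)

# Hypothetical shared class and coercion generic (not an existing API).
setClass("PCA", representation(scores = "matrix", sdev = "numeric"))
setGeneric("as.PCA", function(object, ...) standardGeneric("as.PCA"))

# Each package would supply only this method for its own class.
# A plain matrix (samples in rows) stands in for such a class here.
setMethod("as.PCA", "matrix", function(object, ...) {
  pc <- prcomp(object)
  new("PCA", scores = pc$x, sdev = pc$sdev)
})

# The visualization code is written once, for the shared class.
setMethod("plot", signature(x = "PCA", y = "missing"),
  function(x, y, ...) {
    plot(x@scores[, 1], x@scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(x)
  })

# Transitional wrapper: the old name keeps working but emits a warning.
plotPCA <- function(object, ...) {
  .Deprecated(msg = "plotPCA() is deprecated; use plot(as.PCA(object)).")
  plot(as.PCA(object), ...)
}

# Old-style call still works during the transition.
res <- suppressWarnings(plotPCA(as.matrix(USArrests)))
```

Existing scripts that call plotPCA() keep running, while the warning tells users which call to switch to.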

           On 10/20/2014 1:07 PM, Michael Love wrote:

           hi Kevin,

           that would imply there is only one way to plot an
           object of a given class. Additionally, it would break a
           lot of code.

           best,

           Mike

           On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
           <kevin.r.coombes at gmail.com> wrote:

               But shouldn't they all really just be named "plot"
               for the appropriate objects?  In which case, there
               would already be a perfectly good generic....

               On Oct 20, 2014 10:27 AM, "Michael Love"
               <michaelisaiahlove at gmail.com> wrote:

                   I noticed that 'plotPCA' functions are defined
                   in EDASeq, DESeq2, DESeq, affycoretools, Rcade,
                   facopy, CopyNumber450k, netresponse, MAIT (maybe
                   more).

                   Sounds like a case for BiocGenerics.

                   best,

                   Mike

                   _______________________________________________
                   Bioc-devel at r-project.org mailing list
                   https://stat.ethz.ch/mailman/listinfo/bioc-devel

       --
       Davide Risso, PhD
       Post Doctoral Scholar
       Division of Biostatistics
       School of Public Health
       University of California, Berkeley
       344 Li Ka Shing Center, #3370
       Berkeley, CA 94720-3370
       E-mail: davide.risso at berkeley.edu

End of Bioc-devel Digest, Vol 127, Issue 43
*******************************************

