
[Bioc-devel] plotPCA for BiocGenerics

8 messages · Michael Lawrence, Thomas Lin Pedersen, Steve Lianoglou +2 more

#
While I tend to agree with you that PCA is too big an operation to be hidden within a plotting function (MDS is an edge case, I would say), I can't see how we could ever reach a point where there is only one generic plot function. In the case of PCA there are a number of different plot types that can all lay claim to the plot function of a PCA class: for instance, scoreplot, scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 barplot, leverage vs. distance-to-model... (you get the idea). So even with some very well-thought-out classes for very common result types such as PCA, such a class would still need a lot of different plot methods, such as plotScores, plotScree, etc. (or plot(..., type="score"), but I don't find that very appealing). Expanding beyond PCA only muddies the waters even more: there are very few interesting data structures that have only one visual representation to rule them all...

just my 2c

best
Thomas
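To make Thomas's point concrete: even base R's stats::prcomp() result already supports several of the views he lists. This is a minimal sketch using the built-in USArrests data (an arbitrary choice, not from the thread); only some of the listed views are shown.

```r
# Several views of one PCA result, base R only (USArrests ships with R).
p <- prcomp(USArrests, scale. = TRUE)

# Proportion and cumulative proportion of variance explained per component
var.expl <- p$sdev^2 / sum(p$sdev^2)
cum.expl <- cumsum(var.expl)

pdf(NULL)  # null device: nothing is written, so this also runs headless
plot(p$x[, 1:2], main = "Score plot (PC1 vs PC2)")
pairs(p$x, main = "Scatterplot matrix of all scores")
biplot(p)                           # scores and loadings together
screeplot(p, main = "Scree plot")   # variance of each component
barplot(cum.expl, names.arg = colnames(p$x),
        ylab = "Cumulative proportion of variance")
invisible(dev.off())
```

Five plots, one fitted object: whichever of these a single plot() method picks as its default, the others still need a name or a switch.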
10 days later
#
I strongly agree with Kevin's position. plotPCA() represents two separate
concerns in its very name: the computation and the rendering. Those need to
be separated, at least behind the scenes. The syntax of plot(pca(x)) is
preferable to plotPCA, because the structure of the operation is
represented in the expression itself, not just in a non-computable
function name.
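A minimal S3 sketch of what "plot(pca(x))" separates, assuming a hypothetical pca() constructor (wrapping stats::prcomp) and a hypothetical plot.pca() method; neither is an existing Bioconductor API.

```r
# Hypothetical computation step: returns a classed result and draws nothing.
pca <- function(x, ...) {
  fit <- prcomp(x, ...)
  structure(list(scores = fit$x,
                 var.expl = fit$sdev^2 / sum(fit$sdev^2)),
            class = "pca")
}

# Hypothetical rendering step: an S3 method for the existing plot() generic.
plot.pca <- function(x, dims = 1:2, ...) {
  pct <- round(100 * x$var.expl[dims])
  plot(x$scores[, dims],
       xlab = sprintf("PC%d (%d%%)", dims[1], pct[1]),
       ylab = sprintf("PC%d (%d%%)", dims[2], pct[2]), ...)
  invisible(x)
}

# The expression now carries the structure of the operation:
res <- pca(USArrests, scale. = TRUE)  # compute once
pdf(NULL)
plot(res)                             # render; can be repeated without recomputing
invisible(dev.off())
```

A plotPCA()-style shortcut could still exist on top of this as a one-line composition, which is the "at least behind the scenes" separation.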

With regard to how a plot,PCA should behave: there is always a tension
between high-level and low-level APIs. In the end, we need multiple levels
of abstraction.  While high-level APIs sacrifice flexibility, we need them
because they communicate the high-level *intent* of the user in the code
itself (self-documenting code), and they enable reusability, which not only
reduces redundant effort but also ensures consistency. Once our brains no
longer need to parse low-level code, we can focus our mental power on
correctness and efficiency. To design a high-level API, one needs to
carefully analyze user requirements, i.e., the use cases. To choose the
default behavior, one needs to rate the use cases by their prevalence, and
by how closely they match the intuition-based expectations of the user.

The fact that at least 9 packages are performing such a similar task seems
to indicate that a common abstraction is warranted, but I am not sure if
BiocGenerics is the appropriate place.

Michael

On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen <
thomasp85 at gmail.com> wrote:

#
With regard to abstraction, I would personally much rather read and write code that contains plotScores(), plotScree(), etc., where the intent of the code is clearly communicated, than rely on a plot() function whose result is only known from experience. Trying to squeeze every kind of visual output into the same plot generic seems artificial and constraining to me. On the other hand, I totally agree with the plotPCA critique.

Thomas
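A sketch of the dedicated-function style Thomas prefers. plotScores() and plotScree() here are hypothetical functions written against stats::prcomp() output, not the API of pcaMethods or any other package.

```r
# Hypothetical dedicated functions: the name states what gets drawn.
# Both take a stats::prcomp() fit and return the plotted data invisibly.
plotScores <- function(fit, dims = 1:2, ...) {
  xy <- fit$x[, dims]
  plot(xy, ...)
  invisible(xy)
}

plotScree <- function(fit, ...) {
  prop <- fit$sdev^2 / sum(fit$sdev^2)
  names(prop) <- colnames(fit$x)
  barplot(prop, ylab = "Proportion of variance", ...)
  invisible(prop)
}

fit <- prcomp(USArrests, scale. = TRUE)
pdf(NULL)
plotScores(fit)
plotScree(fit)
invisible(dev.off())
```

Returning the plotted data invisibly keeps each function useful even when the caller wants to redraw it elsewhere.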

#
Hi,

On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
<thomasp85 at gmail.com> wrote:
If we've bought a ticket to ride on Kevin's and Michael's (and whoever
else) train of thought, wouldn't plot(pca(x), type='scree') or
plot(pca(x), type='scores') be the preferred way to go ... for some
definition of "preferable"?

-steve
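Steve's type= variant, sketched with a hypothetical "pcaResult" class layered on top of stats::prcomp(); the class name and the accepted type values are illustrative assumptions.

```r
# Hypothetical: one plot() method whose output is chosen via type=.
as.pcaResult <- function(fit) {
  stopifnot(inherits(fit, "prcomp"))
  class(fit) <- c("pcaResult", class(fit))  # dispatch here before "prcomp"
  fit
}

plot.pcaResult <- function(x, type = c("scores", "scree"), dims = 1:2, ...) {
  type <- match.arg(type)  # first choice is the default; unknown types error
  switch(type,
    scores = plot(x$x[, dims], ...),
    scree  = barplot(x$sdev^2 / sum(x$sdev^2),
                     names.arg = colnames(x$x), ...)
  )
  invisible(x)
}

res <- as.pcaResult(prcomp(USArrests, scale. = TRUE))
pdf(NULL)
plot(res, type = "scores")
plot(res, type = "scree")
invisible(dev.off())
```

match.arg() makes the default explicit (the first listed type) and rejects misspelled types instead of silently falling through.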

#
I'd just like to chime in that regardless of what approach is chosen, I 
definitely would appreciate a way to get the plot data without actually 
making the plot. I often end up reimplementing plots in ggplot so that 
I can easily customize some aspect of them, so in such cases I need a 
way to just get the plot data/coordinates.

For example, if I have an edgeR DGEList and I want to get the X and Y 
coordinates for the MDS plot, I need to do something like:

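library(edgeR)              # plotMDS() method for DGEList objects
dev.new()                   # open a throwaway graphics device
mds.coords <- plotMDS(dge)  # draws the plot; the return value holds the coordinates
dev.off()                   # close the device, discarding the plot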

which is kind of unfortunate.

So I guess this is more a reminder to people implementing plots to also 
implement a way to get the plot data.

-Ryan
On Fri 31 Oct 2014 03:43:04 PM PDT, Steve Lianoglou wrote:
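A sketch of the pattern Ryan asks for: coordinates computed by their own function, and a plotting wrapper that always returns them and draws only when asked. mdsCoords() and mdsPlot() are hypothetical, and classical MDS via stats::cmdscale() on Euclidean distances is a stand-in, not the distance edgeR/limma's plotMDS() uses.

```r
# Hypothetical helper pair: computing coordinates is its own function,
# and the plotting wrapper always returns them, drawing only if asked.
mdsCoords <- function(x) {
  # Classical MDS on Euclidean distances between the columns (samples) of x.
  # A stand-in only: NOT the distance used by edgeR/limma's plotMDS().
  xy <- cmdscale(dist(t(x)), k = 2)
  data.frame(sample = colnames(x), dim1 = xy[, 1], dim2 = xy[, 2],
             row.names = NULL)
}

mdsPlot <- function(x, plot = TRUE, ...) {
  coords <- mdsCoords(x)
  if (plot) {
    plot(coords$dim1, coords$dim2, type = "n",
         xlab = "Dimension 1", ylab = "Dimension 2", ...)
    text(coords$dim1, coords$dim2, labels = coords$sample)
  }
  invisible(coords)  # returned whether or not anything was drawn
}

# Simulated data: 100 features x 6 samples (made up for the example)
set.seed(1)
counts <- matrix(rpois(600, lambda = 20), nrow = 100,
                 dimnames = list(NULL, paste0("S", 1:6)))
coords <- mdsPlot(log2(counts + 1), plot = FALSE)  # no device opened
```

The returned data frame can go straight into ggplot2 or any other renderer, with no dev.new()/dev.off() dance.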
#
I could personally live with that, though it requires more verbose typing from the user. But I still miss a sensible explanation of why everything that ends up on a graphics device should go through one plot generic.

To get back to the plotPCA example that started it all: at least some of us agree that these are two operations that should be separated. What is not apparent is how we should go about setting up a framework for this. The easy route could be to rely on a package such as pcaMethods that already has a defined pca class and a lot of plotting methods, but this only solves the problem for PCA, and I imagine PCA is not the only example out there...

Thomas

#
A drag, since hist(foo, plot=FALSE) acts as expected. I emulate this behavior for most of my code nowadays, not least since I often want to return something useful, whether plotted or not. 

--t
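For reference, the base-R behaviour being cited: hist() with plot = FALSE computes and returns a "histogram" object without drawing, and that object can be rendered later with plot(). The random data here is only for illustration.

```r
set.seed(42)
foo <- rnorm(1000)

h <- hist(foo, plot = FALSE)  # compute breaks and counts, draw nothing
class(h)                      # "histogram"
sum(h$counts)                 # 1000: every value falls in some bin

pdf(NULL)
plot(h)                       # render later via the plot() method for "histogram"
invisible(dev.off())
```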
#
Sure, the ggplot model (returning an abstract representation of a plot, and
then rendering it when requested, i.e., printed) is preferable to the side
effects of base graphics. Unfortunately, plot() implies the side effect,
which motivated the introduction of autoplot() in ggbio, and in fact we
used Steve's type= parameter idea in many of the autoplot methods. While I
agree that plotScree() could be preferable to plot(type="scree"), it's
still beneficial to have the abstraction, if only for convenience and to
support generic code. Btw, a (S3) pca object already exists: see ?princomp.

Michael

On Fri, Oct 31, 2014 at 3:53 PM, Ryan C. Thompson <rct at thompsonclan.org>
wrote:
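A base-graphics sketch of the "return an object, render when printed" model Michael describes, applied to the princomp class he points to. scorePlot() and its print method are hypothetical; ggplot2 and ggbio's autoplot() work on the same principle but are not used here.

```r
# Hypothetical: build a plot *object*; drawing happens only when it is printed,
# mimicking the ggplot2 model with base graphics.
scorePlot <- function(fit, dims = 1:2) {
  stopifnot(inherits(fit, "princomp"))
  structure(list(data = as.data.frame(fit$scores[, dims])),
            class = "scorePlot")
}

print.scorePlot <- function(x, ...) {
  d <- x$data
  plot(d[[1]], d[[2]], xlab = names(d)[1], ylab = names(d)[2], ...)
  invisible(x)
}

fit <- princomp(USArrests, cor = TRUE)  # base R's S3 PCA class
p <- scorePlot(fit)  # nothing drawn yet; p$data holds the coordinates
pdf(NULL)
print(p)             # rendering happens here (auto-printing does the same)
invisible(dev.off())
```

Because the object carries its data, the "get the plot data without plotting" request falls out for free: p$data is available before anything is drawn.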