[Bioc-devel] transitioning scater/scran to SingleCellExperiment
I guess this would be a question for the
SummarizedExperiment developers, though personally, I never liked
ExpressionSet's inclination to slap names on everything.
Too bad we're bound to SummarizedExperiment's "rows" and "cols". Since
they always refer to features and samples, respectively: Why not name
them that?
There are already too many APIs in too many programming languages that
confusingly have one or the other convention. If we know which is
which, why not name them after that knowledge?
*shrug* + *meh*. As I said, I'm the wrong person to complain to about this, though I don't have particularly strong feelings either way.
It probably wouldn't be a good idea to store distances as expression
matrices. However, if there is a need for it, we can add a new slot
for distance matrices. I think SC3 has a similar requirement, so
perhaps this would be more generally useful than I first thought.
You can post an issue on the github repository to remind Davide or
me to do it.
Distance matrices (cell×cell) don't only come from cell×gene matrices.
You can e.g. use dynamic time warping to create them from cell×gene×time
arrays.
I don't think there's direct support for >2-dimensional arrays in SE objects. You might be able to put them in, but I don't know how well it will interact with the subsetting machinery. One solution is to split it up by the third dimension and store each matrix as a separate assay. In any case, a distance matrix calculated from such an array would be fine, as long as the dimensions are equal to the number of cells. The question is whether it is needed by enough packages to warrant a slot in the base SCE class; I will discuss this with Davide and Vlad.
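As a rough illustration of the split-by-third-dimension idea, here is a minimal sketch using a plain SummarizedExperiment. The toy array, its dimension names and the assay names t1 to t3 are made up for the example, and nothing here is specific to the SCE class.

```r
library(SummarizedExperiment)

# Hypothetical toy data: 5 genes x 4 cells x 3 time points.
set.seed(1)
arr <- array(rpois(5 * 4 * 3, lambda = 5), dim = c(5, 4, 3),
             dimnames = list(paste0("Gene", 1:5),
                             paste0("Cell", 1:4),
                             paste0("t", 1:3)))

# Split along the third dimension: one genes x cells matrix per time point.
slices <- lapply(seq_len(dim(arr)[3]), function(k) arr[, , k])
names(slices) <- dimnames(arr)[[3]]

# Each slice becomes its own assay.
se <- SummarizedExperiment(assays = slices)

# Subsetting the object subsets every assay consistently.
sub <- se[1:2, c(1, 3)]
```

With this layout the usual subsetting machinery keeps all time points in sync, at the cost of having to loop over assays whenever the full array is needed.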
Finally, I'm not sure what advantages those ergonomics provide.
Indeed, if every package defines its own plot() S4 method for
SingleCellExperiment, they will clobber each other in the dispatch
table, resulting in some interesting results dependent on package
loading order. If you have destiny-specific data and methods, best
to keep them separate rather than stuffing them into the SCE object.
I wrote that I could e.g. create a plot_dm method, which plots a
diffusion map stored in a SCE.
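A package-specific function of this kind could look roughly like the sketch below. It is only an assumption about how such a helper might be written: plot_dm, its arguments and the "DiffusionMap" entry name are hypothetical, the coordinates are random numbers standing in for a real diffusion map, and storing them via reducedDim() is one possible choice rather than something settled in this thread.

```r
library(SingleCellExperiment)

# Hypothetical helper: a plain function with its own name,
# not an S4 method registered on plot().
plot_dm <- function(sce, dims = 1:2, name = "DiffusionMap", ...) {
  coords <- reducedDim(sce, name)[, dims, drop = FALSE]
  plot(coords[, 1], coords[, 2],
       xlab = paste0("DC", dims[1]), ylab = paste0("DC", dims[2]), ...)
  invisible(coords)
}

# Toy data: 3 genes x 6 cells.
set.seed(1)
m <- matrix(rpois(18, 5), nrow = 3,
            dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:6)))
sce <- SingleCellExperiment(assays = list(counts = m))

# Made-up coordinates standing in for a diffusion map (one row per cell).
dm <- matrix(rnorm(12), ncol = 2,
             dimnames = list(colnames(sce), c("DC1", "DC2")))
reducedDim(sce, "DiffusionMap") <- dm

coords <- plot_dm(sce)
```

Because the function has its own name, it does not depend on which package was loaded last.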
Also, by ergonomics I didn't mean the plot method. I meant |fortify|,
|names|, |$|, and |[[|. Those would be very useful, as you could just do
things like the following, and have autocompletion:
    sce$Predicate1 <- sce$SampleMeta1 > 40
    # `$` accesses counts (by gene) and colData. `$<-` sets colData

    qplot(Gene1, Gene2, colour = Predicate1, data = sce)
    # fortify creates a data.frame containing cbind(t(counts), colData)
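For what it's worth, the data.frame described above can be approximated today with a small wrapper. The sketch below is an assumption about what such a fortify-like step might do: sce_to_df is a hypothetical helper, not an existing method, and the gene and metadata names are toy values. Sample-level metadata lives in colData, which `$` and `$<-` on a SummarizedExperiment already read and write.

```r
library(SingleCellExperiment)

# Hypothetical helper, not part of any package:
# one row per cell, one column per gene, plus the colData columns.
sce_to_df <- function(sce) {
  expr <- as.data.frame(t(as.matrix(counts(sce))))
  meta <- as.data.frame(colData(sce))
  cbind(expr, meta)
}

# Toy data: 2 genes x 6 cells with one sample-level covariate.
set.seed(1)
m <- matrix(rpois(12, 5), nrow = 2,
            dimnames = list(c("Gene1", "Gene2"), paste0("Cell", 1:6)))
sce <- SingleCellExperiment(
  assays = list(counts = m),
  colData = DataFrame(SampleMeta1 = c(10, 20, 30, 50, 60, 70),
                      row.names = colnames(m))
)

# `$` reads from colData and `$<-` writes to it.
sce$Predicate1 <- sce$SampleMeta1 > 40

df <- sce_to_df(sce)
# The result can be handed to ggplot2, e.g.:
# ggplot2::qplot(Gene1, Gene2, colour = Predicate1, data = df)
```

Note that this only covers colData; making `$` also look up genes in the counts matrix would need a new method and is not shown here.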
The SingleCellExperiment package takes no position on whether downstream users or packages use the tidyverse or ggplot2. It simply provides the minimal class and methods; convenience wrappers are left to the discretion of each package developer. scater, for example, implements a few dplyr verbs for SCE objects.

Cheers,
Aaron