
[Bioc-devel] transitioning scater/scran to SingleCellExperiment

Yes, this would be quite interesting. As I mentioned, scater has some 
support for HDF5 serialization, so that's one place to start.
I guess this would be a question for the SummarizedExperiment 
developers, though personally, I never liked ExpressionSet's inclination 
to slap names on everything.
It probably wouldn't be a good idea to store distances as expression 
matrices. However, if there is a need for it, we can add a new slot for 
distance matrices. I think SC3 has a similar requirement, so perhaps 
this would be more generally useful than I first thought. You can post 
an issue on the GitHub repository to remind Davide or me to do it.
I have thought about putting in a set of recommended assay names, along 
with various methods for them:

- counts: counts, duh
- norm_counts: "normalized" values on the same scale as the counts
- log_counts: log-normalized counts (plus pseudo-count).
- cpm, tpm, fpkm: what it says

The idea is to encourage developers to store assay entries that will 
have a reasonably consistent interpretation across packages. For this 
reason, I'm not putting in "exprs", which could mean anything really.
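As a rough illustration of the naming scheme above, the sketch below stores several assays under the proposed names using the generic assay<- setter from SummarizedExperiment. The toy data, the library-size scaling and the pseudo-count of 1 are assumptions made for the example; the names themselves are the ones suggested in this message and may not match what any released package uses.

```r
library(SingleCellExperiment)

# Toy data: 5 genes x 4 cells of simulated counts (illustrative only).
set.seed(1)
counts <- matrix(rpois(20, lambda = 10), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

sce <- SingleCellExperiment(assays = list(counts = counts))

# Library-size factors scaled to a mean of 1 (one simple choice among many).
lib_sizes <- colSums(counts)
sf <- lib_sizes / mean(lib_sizes)

# "norm_counts": normalized values on the same scale as the counts.
assay(sce, "norm_counts") <- sweep(counts, 2, sf, "/")

# "log_counts": log-normalized counts; the pseudo-count of 1 is an assumption.
assay(sce, "log_counts") <- log2(assay(sce, "norm_counts") + 1)

# "cpm": counts per million.
assay(sce, "cpm") <- sweep(counts, 2, lib_sizes / 1e6, "/")

assayNames(sce)
```

A downstream package could then ask for assay(sce, "log_counts") and know what it is getting, which is the point of agreeing on the names.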
Not everything needs to be a SCE object. In fact, I would argue that it 
doesn't really make sense for the DiffusionMap() function to return a 
SingleCellExperiment object, as this would seem to conceptually limit 
the DiffusionMap() function to single-cell data. (By comparison, it does 
make sense to accept a SCE class - amongst others - as input, given that 
destiny is often used for this type of data.)

From a user perspective, if the DiffusionMap() function vomits out a 
lot of metadata fields, that might not be desirable if only the final 
diffusion coordinates are of interest. In such cases, I would find it 
easier to just extract the coordinates and store them with reducedDim<- 
manually. Whether this is done from a DiffusionMap or 
SingleCellExperiment output makes little difference to me.
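A minimal sketch of storing coordinates manually with reducedDim<- might look like the following. The coordinate matrix here is a random stand-in, not real destiny output; in practice it would be extracted from the DiffusionMap object. The slot name "DiffusionMap" and the column names are arbitrary choices for the example.

```r
library(SingleCellExperiment)

# Toy data: 20 genes x 10 cells of simulated counts (illustrative only).
set.seed(2)
counts <- matrix(rpois(200, lambda = 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# Stand-in for diffusion coordinates: one row per cell, one column per
# component. With destiny, this matrix would come from the DiffusionMap output.
coords <- matrix(rnorm(10 * 2), nrow = 10,
                 dimnames = list(colnames(sce), c("DC1", "DC2")))

# Store only the coordinates; the rest of the method-specific object stays
# outside the SCE.
reducedDim(sce, "DiffusionMap") <- coords

reducedDimNames(sce)
head(reducedDim(sce, "DiffusionMap"))
```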

Finally, I'm not sure what advantages those ergonomics provide. Indeed, 
if every package defines its own plot() S4 method for 
SingleCellExperiment, they will clobber each other in the dispatch 
table, so that the method actually called depends on package 
loading order. If you have destiny-specific data and methods, best to 
keep them separate rather than stuffing them into the SCE object.
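The clobbering can be reproduced in miniature within a single R session. The sketch below uses a toy class and a toy generic (ToyExperiment and describe(), both hypothetical names) in place of SingleCellExperiment and plot(), and it simplifies away package namespaces; it only shows that registering a second method for the same signature replaces the first.

```r
library(methods)

# Toy class and generic standing in for SingleCellExperiment and plot().
setClass("ToyExperiment", representation(id = "character"))
setGeneric("describe", function(x) standardGeneric("describe"))

# "Package A" registers its method first...
setMethod("describe", "ToyExperiment", function(x) "method from package A")

# ...then "package B" registers a method for the identical signature.
setMethod("describe", "ToyExperiment", function(x) "method from package B")

# Only the most recently registered method remains in the dispatch table.
describe(new("ToyExperiment", id = "x"))
# [1] "method from package B"
```

Swapping the order of the two setMethod() calls swaps the result, which is the loading-order dependence described above.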

Our vision for the SCE class is to coordinate inputs into many packages 
across a long, long workflow. A little detour into destiny's classes for 
a small portion of the workflow doesn't pose much trouble, as long as 
any relevant statistics can be extracted and stored in the SCE object 
when it moves to the next stage of the workflow.

-Aaron