[Bioc-devel] Reading and storing single cell ATAC data

1 message · Hervé Pagès

Hi,

Yes, I would also encourage you to explore the SummarizedExperiment
route rather than the ExpressionSet one. To be more precise,
RangedSummarizedExperiment is probably what you need: you can put
a GRanges object along the 1st dimension to describe your peaks and
a DataFrame object along the 2nd dimension to describe your samples.
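
To make the layout concrete, here is a minimal sketch of such an object. The peak coordinates, cell names, and the "batch" column are made-up toy values, not anything from the original question:

```r
library(SummarizedExperiment)

# Hypothetical toy data: 4 peaks (rows) x 3 cells (columns).
peaks <- GRanges(
    seqnames = c("chr1", "chr1", "chr2", "chr2"),
    ranges = IRanges(start = c(100, 500, 200, 900), width = 150)
)
cells <- DataFrame(
    batch = c("A", "A", "B"),
    row.names = c("cell1", "cell2", "cell3")
)

# Ordinary in-memory matrix of counts, mostly zeros.
counts <- matrix(0L, nrow = 4, ncol = 3)
counts[cbind(c(1, 2, 4, 3), c(1, 1, 2, 3))] <- c(2L, 1L, 1L, 3L)

# Supplying rowRanges makes the result a RangedSummarizedExperiment.
se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = peaks,
    colData = cells
)

# Range-based subsetting: keep only the peaks that fall on chr1.
chr1 <- subsetByOverlaps(se, GRanges("chr1", IRanges(1, 1000)))
```

rowRanges(se) then gives back the peaks and colData(se) the per-cell annotations.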

For the matrix-like object that you use to store the assay data, it
can be anything (e.g. sparse matrix, on-disk matrix, etc.) as long
as it supports dim, dimnames, and 2D-style subsetting. That's the
bare minimum, I think. It can support more (e.g. cbind, rbind), in
which case you'll be able to cbind and/or rbind 2 SummarizedExperiment
objects together. We've tried to keep the requirements for what can
be used to store the assay data as minimal as possible.
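
As an illustration, the sketch below uses a sparse matrix from the Matrix package as the assay. It is only one possible choice of container, and make_se() is a hypothetical helper written for this example:

```r
library(SummarizedExperiment)
library(Matrix)

# Hypothetical helper: build a small SummarizedExperiment whose assay is
# a sparse matrix with 4 peaks and one column per cell name.
make_se <- function(cell_names) {
    n <- length(cell_names)
    counts <- sparseMatrix(
        i = seq_len(n), j = seq_len(n), x = rep(1, n),
        dims = c(4, n),
        dimnames = list(paste0("peak", 1:4), cell_names)
    )
    SummarizedExperiment(assays = list(counts = counts))
}

se1 <- make_se(c("cell1", "cell2"))
se2 <- make_se(c("cell3", "cell4", "cell5"))

# The bare minimum: dim, dimnames, and 2D-style subsetting.
dim(se1)
dimnames(se1)
sub <- se1[1:2, "cell2"]

# Sparse matrices also support cbind, so the two objects can be combined.
both <- cbind(se1, se2)
```

The assay of the combined object is still sparse, so nothing is expanded to a dense matrix along the way.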

One matrix-like container that you might want to consider is
DelayedArray defined in the HDF5Array package. Right now it uses
HDF5 for on-disk storage but other backends could be implemented
(something I'm planning to work on after the upcoming release).
You can stick a huge DelayedArray object that wouldn't fit in memory
in a SummarizedExperiment and manipulate it *almost* like you would
do with a regular SummarizedExperiment object. See ?HDF5Array for
an example of such an hdf5-backed SummarizedExperiment.
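
For reference, a minimal sketch of an HDF5-backed SummarizedExperiment follows. It assumes a current Bioconductor installation, where DelayedArray lives in its own package and is loaded along with HDF5Array. The tiny matrix and cell names are toy values standing in for data that would not fit in memory:

```r
library(SummarizedExperiment)
library(HDF5Array)

# Toy stand-in for a large count matrix: 5 peaks x 4 cells.
m <- matrix(1:20, nrow = 5)

# Write the data to a temporary HDF5 file; the result is an on-disk,
# DelayedArray-based object rather than an in-memory matrix.
h5 <- writeHDF5Array(m)

cells <- DataFrame(row.names = paste0("cell", 1:4))
se <- SummarizedExperiment(assays = list(counts = h5), colData = cells)

# Subsetting is delayed: no data is read from disk at this point.
sub <- se[1:3, c("cell1", "cell3")]

# Data is only loaded when the assay is realized in memory.
realized <- as.matrix(assay(sub))
```

Only the 3 x 2 block requested at the end is ever pulled into memory.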

Hope this helps,
H.
On 09/23/2016 03:46 PM, Andrew McDavid wrote: