[Bioc-devel] XVector: abstraction

Hi Michael,

The OnDiskXRaw virtual class (if this is what you're referring to)
is still a very early work-in-progress. The idea is to experiment
with on-disk representation of atomic vectors and direct random access
to subsequences of the vector. The exact storage mode is implemented by
concrete subclasses (currently only DirectRaw and SerializedRaw).
OnDiskXRaw is actually analogous to SharedRaw, except that with the
latter the "shared" sequence of bytes resides in memory.
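
To make "direct random access to subsequences" concrete, here is a
minimal sketch of the idea. It is written in Python purely for
illustration and is not how XVector implements it; the class name
OnDiskBytes, the subseq() method and the flat-file layout are all
hypothetical.

```python
import os
import tempfile


class OnDiskBytes:
    """Hypothetical stand-in for a 'direct' on-disk raw vector (not XVector's API)."""

    def __init__(self, path):
        self.path = path
        self.length = os.path.getsize(path)

    def __len__(self):
        return self.length

    def subseq(self, start, width):
        """Return `width` bytes starting at 0-based offset `start`."""
        if start < 0 or width < 0 or start + width > self.length:
            raise IndexError("subsequence out of bounds")
        with open(self.path, "rb") as fh:
            fh.seek(start)          # jump straight to the requested offset
            return fh.read(width)   # read only the bytes that were asked for


# Write a small "chromosome" to disk, then fetch a slice without loading the rest.
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    tmp.write(b"ACGTACGTTTGACCA")
    demo_path = tmp.name

vec = OnDiskBytes(demo_path)
piece = vec.subseq(4, 6)
os.remove(demo_path)
```

Only the requested bytes are read here. A serialized file would
additionally need the offset of the payload inside the serialization
format, which this sketch does not attempt.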

If we had "on-disk" support for all atomic vectors, it sounds like it
would then be easy to support "on-disk" versions of higher-level
objects like IRanges or GRanges. They would be defined as their
"in-memory" counterpart except that the slots that are atomic vectors
in the "in-memory" version would just need to be replaced by "on-disk"
atomic vectors. "On-disk" versions of DNAString (and even DNAStringSet)
objects could also easily be implemented e.g. by just making the
"shared" slot an OnDiskXRaw object instead of a SharedRaw object.
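
As a rough sketch of that "swap the slots" idea (again in Python for
illustration only; OnDiskInts, write_ints and Ranges are hypothetical
names, and the 4-byte little-endian file layout is an assumption), the
same higher-level container can sit on either kind of vector as long as
both support length and element access:

```python
import os
import struct
import tempfile


class OnDiskInts:
    """Hypothetical on-disk integer vector: 4-byte little-endian values in a flat file."""

    ITEM = struct.Struct("<i")

    def __init__(self, path):
        self.path = path

    def __len__(self):
        return os.path.getsize(self.path) // self.ITEM.size

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        with open(self.path, "rb") as fh:
            fh.seek(i * self.ITEM.size)   # direct access to element i
            return self.ITEM.unpack(fh.read(self.ITEM.size))[0]


def write_ints(values):
    """Demo helper: dump integers to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".int") as tmp:
        for v in values:
            tmp.write(OnDiskInts.ITEM.pack(v))
        return tmp.name


class Ranges:
    """IRanges-like container; `start` and `width` may be lists or OnDiskInts."""

    def __init__(self, start, width):
        if len(start) != len(width):
            raise ValueError("start and width must have the same length")
        self.start, self.width = start, width

    def end(self, i):
        return self.start[i] + self.width[i] - 1


# Same container, two backings: plain lists versus files on disk.
in_memory = Ranges([1, 10, 100], [5, 5, 20])
start_path, width_path = write_ints([1, 10, 100]), write_ints([5, 5, 20])
on_disk = Ranges(OnDiskInts(start_path), OnDiskInts(width_path))
ends_match = [in_memory.end(i) for i in range(3)] == [on_disk.end(i) for i in range(3)]
os.remove(start_path)
os.remove(width_path)
```

The container code is identical in both cases, but every element access
on the on-disk version costs a file read.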

Putting SharedRaw and OnDiskXRaw under the same umbrella (i.e. under
a virtual class) and using that virtual class to specify the slot of
higher-level objects like DNAString is tempting, but realistically we
don't operate on an on-disk object the way we do on an in-memory object.

Having an "on-disk" version of DNAString with direct random access was
in fact the initial motivation for OnDiskXRaw. The use case for this
was to support direct random access in BSgenome objects without having
to change the way the chromosomes are stored on disk (they're stored
as serialized raw vectors). I've finally implemented this feature (it
will soon be pushed to BioC devel), but I changed the storage and didn't
use OnDiskXRaw in the end.

H.
On 12/05/2013 06:43 AM, Michael Lawrence wrote: