[Bioc-devel] Long-form documentation for DelayedArray?

6 messages · Hervé Pagès, Peter Hickey, Gmail

#
Hi all,

Packages submitted to Bioconductor are required to include at least
one vignette. However, this rule does not seem to hold for some
core packages, such as HDF5Array and DelayedArray. Is there any
special reason for this?

In particular, I'd like to read more about how to create a backend for
DelayedArray. Is there any documentation available beyond the
reference manual?

Thank you very much,
Francesco.
#
Hi Francesco,
On 10/29/2017 10:10 AM, Francesco Napolitano wrote:
Infrastructure packages are not strictly required to have a vignette.
However, that doesn't mean they shouldn't have one ;-)
I'm guilty. I plan to remedy this ASAP. In the meantime I'll be glad
to help. Note that other people are already working (or planning to
work) on other backends:

Backend for remote HDF5 data:

   https://github.com/vjcitn/RemoteArray

   See issues #1, #2, #3 for some discussion about this.

Backend for GDS files:

   https://github.com/Bioconductor/VariantExperiment/issues/1

Cheers,
H.
#
FYI, I also began a project to support an additional backend:
https://github.com/PeteHaitch/matterArray. It's incomplete and may not work
with the current version of DelayedArray (it's ~3 months old and I was
naughtily using some internal functions of DelayedArray). I hope to return
to this soon, and I have plans for 1-2 other backends, so I would also
appreciate some additional documentation :)
On Mon, 30 Oct 2017 at 08:01 Hervé Pagès <hpages at fredhutch.org> wrote:
#
On 29/10/2017 22:45, Hervé Pagès wrote:

In particular, I'd like to read more about how to create a backend for
DelayedArray. Is there any documentation available beyond the
reference manual?


I'm guilty. I plan to remedy this ASAP. In the meantime I'll be glad
to help. Note that other people are already working (or planning to
work) on other backends:

Backend for remote HDF5 data:

  https://github.com/vjcitn/RemoteArray

  See issues #1, #2, #3 for some discussion about this.

Backend for GDS files:

  https://github.com/Bioconductor/VariantExperiment/issues/1


Thank you, Hervé. Maybe I could use some help then! The discussion in
issue #2 seems useful for my case. I have a specific question: when I
create a DelayedArray with my backend, subset_seed_as_array() seems to be
called, which is not really intuitive. Is this normal? What is this first
call supposed to do? In my case it fails with dimensionality problems (the
array is empty).

I did look into HDF5Array code, but I'm not sure I understand what the
following is doing:

.subset_HDF5ArraySeed_as_array <- function(seed, index)
{
    ans_dim <- DelayedArray:::get_Nindex_lengths(index, dim(seed))
    if (any(ans_dim == 0L)) {
        ans <- seed@first_val[0]
        dim(ans) <- ans_dim
    } else {
        ans <- h5read2(seed@file, seed@name, index)
    }
    ans
}

In particular, I'm not sure how to interpret the index variable, which
seems to be a list. Is each element i a vector of indices for the i-th
dimension? And what does seed@first_val[0] do? More generally, what is the
first_val slot for?

Thank you very much for your help,
francesco
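
To make the question above concrete, here is a minimal base-R sketch of
one plausible reading of the quoted function. It assumes that index is a
list with one element per dimension and that NULL means "keep the whole
dimension"; that convention is an assumption, not something confirmed in
this thread. The names index_lengths() and subset_as_array() are
hypothetical stand-ins, not DelayedArray functions, and an in-memory
matrix plays the role of the file.

```r
# Stand-in for on-disk data; a real seed would read from a file instead.
data <- matrix(1:12, nrow = 3, ncol = 4)

# Assumed convention: one list element per dimension, NULL = "keep all".
index <- list(c(1L, 3L), NULL)

# Hypothetical helper: extent of the result along each dimension.
index_lengths <- function(index, dims) {
    ans <- lengths(index)
    keep_all <- vapply(index, is.null, logical(1))
    ans[keep_all] <- dims[keep_all]
    ans
}

# Hypothetical extractor mirroring the two branches of the quoted code.
subset_as_array <- function(data, index) {
    ans_dim <- index_lengths(index, dim(data))
    if (any(ans_dim == 0L)) {
        # data[1][0] is a zero-length vector that keeps the element type,
        # which seems to be the role of first_val[0] in the quoted code.
        ans <- data[1][0]
        dim(ans) <- ans_dim
        return(ans)
    }
    # Replace each NULL with the full range, then subset without dropping.
    subscripts <- lapply(seq_along(index), function(i) {
        if (is.null(index[[i]])) seq_len(dim(data)[i]) else index[[i]]
    })
    do.call(`[`, c(list(data), subscripts, list(drop = FALSE)))
}

full  <- subset_as_array(data, index)                   # rows 1 and 3
empty <- subset_as_array(data, list(integer(0), NULL))  # 0 x 4 result
```

Under these assumptions, an empty selection still returns an array with
the right dimensions and element type, so a backend that cannot handle a
zero-length subscript would fail in the way described above.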
#
Oh, I forgot an important point. Does the seed class need to contain
"Array"? What about "array"? I just remembered that I changed it to
"array" because I have no "Array" in my namespace.

On Mon, Oct 30, 2017 at 11:24 AM, Francesco Napolitano
<franapoli at gmail.com> wrote:
#
Thank you, Peter, I will have a look!
On Sun, Oct 29, 2017 at 11:08 PM, Peter Hickey <peter.hickey at gmail.com> wrote: