
[Bioc-devel] gds nodes dimensions are inconsistent

3 messages · Liu, Qian, Xiuwen Zheng

#
Dear Dr. Zheng & SeqArray maintainer,

I have a Bioconductor package called "GDSArray" that interfaces GDS file nodes as DelayedArray instances. In the current Bioc devel version (3.11), the package fails on all platforms. Debugging shows that different SeqArray / gdsfmt functions report inconsistent dimensions. The reproducible code below shows that the "annotation/info/AA" node has a different dimension from "annotation/info/AC" and from the overall "num.variant" reported by seqSummary(). It works fine in Bioc 3.10 (where the dimension of AA is 1348). Thanks!

Best,
Qian


```{r}
library(SeqArray)
file <- seqExampleFileName("gds")
f <- seqOpen(file)
objdesp.gdsn(index.gdsn(f, "annotation/info/AA"))$dim
## [1] 1328
objdesp.gdsn(index.gdsn(f, "annotation/info/AC"))$dim
## [1] 1348
seqSummary(f, verbose=FALSE)$num.variant
## [1] 1348
seqClose(f)

> sessionInfo()
 R Under development (unstable) (2020-01-07 r77631)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.04.3 LTS

 Matrix products: default
 BLAS:   /home/qian/miniconda3/envs/r-devel/lib/R/lib/libRblas.so
 LAPACK: /home/qian/miniconda3/envs/r-devel/lib/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] SeqArray_1.27.8 gdsfmt_1.23.5

 loaded via a namespace (and not attached):
  [1] IRanges_2.21.2         Biostrings_2.55.4      crayon_1.3.4
  [4] bitops_1.0-6           GenomeInfoDb_1.23.1    stats4_4.0.0
  [7] zlibbioc_1.33.1        XVector_0.27.0         S4Vectors_0.25.11
 [10] tools_4.0.0            RCurl_1.95-4.12        parallel_4.0.0
 [13] compiler_4.0.0         BiocGenerics_0.33.0    GenomicRanges_1.39.1
 [16] GenomeInfoDbData_1.2.2
```





#
Hi Qian,

I have modified the GDS file in the SeqArray package.
"annotation/info/AA" should have fewer values than "annotation/info/AC",
since it is a variable-length vector in the new GDS file.
VCF format allows storing variable-length data, so SeqArray also allows
variable-length data.

seqGetData() has been updated in SeqArray_1.27.8, with a new option
'.padNA' for padding the returned array with NA where possible.
Please revise your package according to the new function in SeqArray.
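
To make the variable-length layout concrete, here is a minimal base-R sketch of what padding with NA amounts to. It assumes a variable-length node can be described by a per-variant count vector plus a concatenated data vector; the helper `pad_na` and the toy values are hypothetical illustrations and are not part of SeqArray or gdsfmt.

```r
# Hypothetical helper (not a SeqArray function): expand a variable-length
# annotation into one value per variant, padding missing entries with NA.
#   len  : number of values stored for each variant (0 or 1 in this sketch)
#   data : the stored values, concatenated in variant order
pad_na <- function(len, data) {
    stopifnot(all(len %in% c(0L, 1L)), sum(len) == length(data))
    out <- rep(NA, length(len))
    storage.mode(out) <- storage.mode(data)   # keep the type of `data`
    out[len == 1L] <- data
    out
}

# Toy example: 5 variants, only 3 of which carry an AA value,
# so the stored node is shorter than the number of variants.
len  <- c(1L, 0L, 1L, 1L, 0L)
data <- c("T", "C", "G")
aa <- pad_na(len, data)
aa
length(aa) == length(len)   # one entry per variant again
```

Padding is only well defined when no variant carries more than one value, which is why the sketch checks `len` first. For the real file, the `.padNA` option of `seqGetData()` mentioned above is the place to look; `?seqGetData` in SeqArray >= 1.27.8 should be consulted for its exact behaviour and default.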

Best wishes,

Xiuwen
On Tue, Feb 11, 2020 at 12:45 PM Liu, Qian <Qian.Liu at roswellpark.org> wrote:

            

  
  
1 day later
#
Hi Xiuwen,

Thank you very much for the prompt response. I'll look at the new functions and make necessary changes in my packages.

Best,
Qian