Dear Dr. Zheng & SeqArray maintainer,
I have a Bioconductor package called "GDSArray" that exposes GDS file nodes as DelayedArray instances. In the new Bioc devel version (3.11), this package fails on all platforms. Debugging shows inconsistent dimensions reported by different SeqArray / gdsfmt functions. Below is some reproducible code showing that the "annotation/info/AA" node has a different dimension from "annotation/info/AC" and from the overall "num.variant" reported by seqSummary(). It works fine in Bioc 3.10 (where the dimension of AA is 1348). Thanks!
Best,
Qian
```{r}
library(SeqArray)
file <- seqExampleFileName("gds")
f <- seqOpen(file)
objdesp.gdsn(index.gdsn(f, "annotation/info/AA"))$dim
## [1] 1328
objdesp.gdsn(index.gdsn(f, "annotation/info/AC"))$dim
## [1] 1348
seqSummary(f, verbose=FALSE)$num.variant
## [1] 1348
seqClose(f)
> sessionInfo()
R Under development (unstable) (2020-01-07 r77631)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /home/qian/miniconda3/envs/r-devel/lib/R/lib/libRblas.so
LAPACK: /home/qian/miniconda3/envs/r-devel/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeqArray_1.27.8 gdsfmt_1.23.5
loaded via a namespace (and not attached):
[1] IRanges_2.21.2 Biostrings_2.55.4 crayon_1.3.4
[4] bitops_1.0-6 GenomeInfoDb_1.23.1 stats4_4.0.0
[7] zlibbioc_1.33.1 XVector_0.27.0 S4Vectors_0.25.11
[10] tools_4.0.0 RCurl_1.95-4.12 parallel_4.0.0
[13] compiler_4.0.0 BiocGenerics_0.33.0 GenomicRanges_1.39.1
[16] GenomeInfoDbData_1.2.2
```
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
[Bioc-devel] gds nodes dimensions are inconsistent
3 messages · Liu, Qian, Xiuwen Zheng
Hi Qian,
I have modified the GDS file in the SeqArray package.
"annotation/info/AA" should have fewer values than "annotation/info/AC", since it is a variable-length vector in the new GDS file.
The VCF format allows storing variable-length data, so SeqArray also allows variable-length data.
seqGetData() has been updated in SeqArray_1.27.8, with a new option '.padNA' for padding the array with NA if possible.
Please revise your package according to the new function in SeqArray.
Best wishes,
Xiuwen
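For readers following along, here is a rough sketch of what the padding amounts to. The helper `pad_to_variants()` and the toy input are hypothetical and not part of SeqArray; they assume a variable-length field arrives as a list with `length` (values per variant) and `data` (the values), with at most one value per variant. The last part calls seqGetData() with the '.padNA' option mentioned above and runs only if SeqArray is installed; the exact return value may depend on the SeqArray version.

```r
# Hypothetical helper (not part of SeqArray): expand a variable-length field,
# given as list(length =, data =) with at most one value per variant, to one
# entry per variant, using NA where a variant has no value.
pad_to_variants <- function(x) {
  stopifnot(all(x$length %in% c(0L, 1L)), sum(x$length) == length(x$data))
  idx <- cumsum(x$length)       # position of each variant's value in x$data
  idx[x$length == 0L] <- NA     # variants without a value index as NA
  x$data[idx]
}

# Toy input, made up for illustration: variants 2 and 5 have no value.
toy <- list(length = c(1L, 0L, 1L, 1L, 0L), data = c("A", "G", "T"))
padded <- pad_to_variants(toy)

# With SeqArray installed, compare the padded AA field against num.variant.
if (requireNamespace("SeqArray", quietly = TRUE)) {
  library(SeqArray)
  f <- seqOpen(seqExampleFileName("gds"))
  aa <- seqGetData(f, "annotation/info/AA", .padNA = TRUE)
  n_variant <- seqSummary(f, verbose = FALSE)$num.variant
  print(c(length_aa = length(aa), n_variant = n_variant))
  seqClose(f)
}
```

A package that needs one entry per variant could compare the two printed numbers before building an array on top of the node, rather than relying on the stored node dimension.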
On Tue, Feb 11, 2020 at 12:45 PM Liu, Qian <Qian.Liu at roswellpark.org> wrote:
1 day later
Hi Xiuwen, Thank you very much for the prompt response. I'll look at the new functions and make necessary changes in my packages. Best, Qian
From: Xiuwen Zheng <zhengx at u.washington.edu>
Sent: Tuesday, February 11, 2020 3:58 PM
To: Liu, Qian <Qian.Liu at RoswellPark.org>
Cc: bioc-devel at r-project.org <bioc-devel at r-project.org>
Subject: Re: gds nodes dimensions are inconsistent