[Bioc-devel] Printing DataFrame with nested data.frame/DataFrame/DataFrameList
Hi Hervé,

Thanks for addressing it so quickly. I will check it when the new version is available via biocLite().

Thanks!
Jialin
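A minimal sketch of how one might check whether an installed S4Vectors already carries the fix, using the version numbers given in the quoted reply below (0.14.5 for release, 0.15.10 for devel). The helper name `has_show_fix` is hypothetical, and treating every version from 0.15.0 upward as the devel series is an assumption made only for this illustration.

```r
# Versions carrying the fix, taken from the quoted reply in this thread.
fixed_release <- as.package_version("0.14.5")
fixed_devel   <- as.package_version("0.15.10")

# Hypothetical helper: TRUE if version `v` should include the showAsCell() fix.
# Assumption: versions >= 0.15.0 belong to the devel series.
has_show_fix <- function(v) {
  v <- as.package_version(v)
  if (v >= as.package_version("0.15.0")) {
    v >= fixed_devel
  } else {
    v >= fixed_release
  }
}

has_show_fix("0.15.8")   # the version listed in the session info of this thread
has_show_fix("0.14.5")
```

With S4Vectors installed, the same check could be run as `has_show_fix(packageVersion("S4Vectors"))`.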

On Thu, 2017-09-28 at 13:47 -0700, Hervé Pagès wrote:
Hi Jialin,

Thanks for the excellent report. These "show" methods, like many others in Bioconductor, rely on the low-level helper showAsCell(), which was not working properly on data-frame-like or array-like objects with a single column, or on SplitDataFrameList objects.

This should now be addressed. The fix is in S4Vectors 0.14.5 (release) and 0.15.10 (devel). Both should become available via biocLite() in about 24 hours.

Let us know if you still see "show" problems after you update.

Thanks,
H.

On 09/28/2017 01:19 AM, Jialin Ma wrote:
Dear all,

I have a package under review at https://github.com/Bioconductor/Contributions/issues/487, in which I would like to use a GRanges with a nested data.frame or DataFrameList to represent the track data internally. However, the default show method does not seem to work well with such structures.

Here is an example of a GRanges in which one meta-column is a one-column data frame:

    gr <- GRanges("chr21", IRanges(1:5, width = 1))
    gr$df <- data.frame(x = 1:5)
    show(gr)

    GRanges object with 5 ranges and 1 metadata column:
    Error in .Method(..., deparse.level = deparse.level) :
      number of rows of matrices must match (see arg 3)

However, if the nested data frame has two columns, it is printed correctly:

    gr <- GRanges("chr21", IRanges(1:5, width = 1))
    gr$df <- data.frame(x = 1:5, y = 11:15)
    show(gr)

    GRanges object with 5 ranges and 1 metadata column:
          seqnames    ranges strand |           df
             <Rle> <IRanges>  <Rle> | <data.frame>
      [1]    chr21    [1, 1]      * |         1:11
      [2]    chr21    [2, 2]      * |         2:12
      [3]    chr21    [3, 3]      * |         3:13
      [4]    chr21    [4, 4]      * |         4:14
      [5]    chr21    [5, 5]      * |         5:15
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths

In some cases the object is printed with a warning message, but the format is wrong:

    gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
    gr$df <- data.frame(x = 1:5)
    show(gr)
    # The nested df is not printed with the correct format; there is only
    # one column in the nested df.

    GRanges object with 5 ranges and 2 metadata columns:
          seqnames    ranges strand |       emm           df
             <Rle> <IRanges>  <Rle> | <integer> <data.frame>
      [1]    chr21    [1, 1]      * |         6    1,2,3,...
      [2]    chr21    [2, 2]      * |         7    1,2,3,...
      [3]    chr21    [3, 3]      * |         8    1,2,3,...
      [4]    chr21    [4, 4]      * |         9    1,2,3,...
      [5]    chr21    [5, 5]      * |        10    1,2,3,...
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    Warning message:
    In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
      row names were found from a short variable and have been discarded

A nested DataFrameList cannot be printed:

    DF <- DataFrame(x = 1:2)
    DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
    show(DF)

    DataFrame with 2 rows and 2 columns
    Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
      invalid first argument

    class(DF$split)
    [1] "CompressedSplitDataFrameList"
    attr(,"package")
    [1] "IRanges"

In the case above, I understand that it is hard to create a short string representation of the nested structure, but I think printing the dimensions of each nested element may be sufficient.

Any comments?

Best,
Jialin

-----------
Session Info:

R version 3.4.1 (2017-06-30)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Biobase_2.37.2        GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
[4] IRanges_2.11.17       S4Vectors_0.15.8      BiocGenerics_0.23.1
[7] magrittr_1.5

loaded via a namespace (and not attached):
[1] zlibbioc_1.23.0         compiler_3.4.1          XVector_0.17.1
[4] tools_3.4.1             GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
[7] ulimit_0.0-3            bitops_1.0-6

r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)
r$> DF
DataFrame with 2 rows and 2 columns
          x  split
  <integer> <list>
1         1    1,2
2         2    3,4
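
The suggestion above, printing the dimensions of each nested element instead of its contents, might look roughly like the following base-R sketch. It is only an illustration of the idea, not how S4Vectors implements showAsCell(): a plain list of data frames stands in for the SplitDataFrameList, and `summarize_cell` is a hypothetical helper name.

```r
# Base-R stand-in for the nested column: a list of data frames, one per outer row.
# (The thread uses a SplitDataFrameList; the plain list is an assumption here.)
nested <- split(data.frame(aa = 1:4), c(1, 1, 2, 2))

# Hypothetical helper: describe one nested element by its dimensions and class.
summarize_cell <- function(x) {
  sprintf("<%d x %d %s>", nrow(x), ncol(x), class(x)[1])
}

# One short string per outer row, suitable for display in a single table cell.
cells <- vapply(nested, summarize_cell, character(1))

outer <- data.frame(x = 1:2, split = unname(cells), stringsAsFactors = FALSE)
print(outer)
```

Each cell then has a fixed, short width no matter how large the nested element is, which keeps the outer table readable.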

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel