hi,
when i load the package 'GenomicScores' in a clean session i see through
the 'sessionInfo()' that the package 'Matrix' is listed under "loaded
via a namespace (and not attached)".
i'd like to know which dependency of 'GenomicScores' ends up
requiring the package 'Matrix'.
i've tried using the package 'pkgDepTools' without success, because the
dependency graph does not list any path from 'GenomicScores' to 'Matrix'.
i've been manually browsing the Bioc website and, unless i've overlooked
something, the only association with 'Matrix' i could find is that
'S4Vectors' and 'GenomicRanges', which are required by 'GenomicScores',
list 'Matrix' in the 'Suggests' field, but my understanding is that
those packages are not required and should not be loaded.
so, is there any way in which i can figure out which of the
'GenomicScores' dependencies leads to loading the package 'Matrix'?
here are the Depends, Imports and Suggests fields from 'GenomicScores':
Depends: R (>= 3.5), S4Vectors (>= 0.7.21), GenomicRanges, methods,
BiocGenerics (>= 0.13.8)
Imports: utils, XML, Biobase, IRanges (>= 2.3.23), Biostrings,
BSgenome, GenomeInfoDb, AnnotationHub, shiny, shinyjs,
DT, shinycustomloader, rtracklayer, data.table, shinythemes
Suggests: BiocStyle, knitr, rmarkdown, BSgenome.Hsapiens.UCSC.hg19,
phastCons100way.UCSC.hg19, MafDb.1Kgenomes.phase1.hs37d5,
SNPlocs.Hsapiens.dbSNP144.GRCh37, VariantAnnotation,
TxDb.Hsapiens.UCSC.hg19.knownGene, gwascat, RColorBrewer
and here is the session information from a fresh R-devel session after
loading the package 'GenomicScores':
R Under development (unstable) (2020-01-29 r77745)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
[7] LC_PAPER=en_US.UTF8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicScores_1.11.4 GenomicRanges_1.39.2 GenomeInfoDb_1.23.10
[4] IRanges_2.21.3 S4Vectors_0.25.12 BiocGenerics_0.33.0
[7] colorout_1.2-2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 lattice_0.20-38
[3] shinycustomloader_0.9.0 Rsamtools_2.3.3
[5] Biostrings_2.55.4 assertthat_0.2.1
[7] digest_0.6.23 mime_0.9
[9] BiocFileCache_1.11.4 R6_2.4.1
[11] RSQLite_2.2.0 httr_1.4.1
[13] pillar_1.4.3 zlibbioc_1.33.1
[15] rlang_0.4.4 curl_4.3
[17] data.table_1.12.8 blob_1.2.1
[19] DT_0.12 Matrix_1.2-18
[21] shinythemes_1.1.2 shinyjs_1.1
[23] BiocParallel_1.21.2 AnnotationHub_2.19.7
[25] htmlwidgets_1.5.1 RCurl_1.98-1.1
[27] bit_1.1-15.1 shiny_1.4.0
[29] DelayedArray_0.13.3 compiler_4.0.0
[31] httpuv_1.5.2 rtracklayer_1.47.0
[33] pkgconfig_2.0.3 htmltools_0.4.0
[35] tidyselect_1.0.0 SummarizedExperiment_1.17.1
[37] tibble_2.1.3 GenomeInfoDbData_1.2.2
[39] interactiveDisplayBase_1.25.0 matrixStats_0.55.0
[41] XML_3.99-0.3 crayon_1.3.4
[43] dplyr_0.8.4 dbplyr_1.4.2
[45] later_1.0.0 GenomicAlignments_1.23.1
[47] bitops_1.0-6 rappdirs_0.3.1
[49] grid_4.0.0 xtable_1.8-4
[51] DBI_1.1.0 magrittr_1.5
[53] XVector_0.27.0 promises_1.1.0
[55] vctrs_0.2.2 tools_4.0.0
[57] bit64_0.9-7 BSgenome_1.55.3
[59] Biobase_2.47.2 glue_1.3.1
[61] purrr_0.3.3 BiocVersion_3.11.1
[63] fastmap_1.0.1 yaml_2.2.1
[65] AnnotationDbi_1.49.1 BiocManager_1.30.10
[67] memoise_1.1.0
thanks!!
robert.
[Bioc-devel] how to trace 'Matrix' as package dependency for 'GenomicScores'
10 messages · Martin Morgan, Sean Davis, Robert Castelo +1 more
The first thing is to get the correct repositories
repos = BiocManager::repositories()
(maybe trim the experiment and annotation repos from this). I also tried pkgDepTools::makeDepGraph() but it took so long that I moved on... it has an option 'keep.builtin' which might include Matrix.
There is also BiocPkgTools::buildPkgDependencyDataFrame() & friends, but this seems to build dependencies within a single repository...
The building block for a solution is `tools::package_dependencies()`, and I can confirm that "Matrix" _is_ a dependency
db = available.packages(repos = BiocManager::repositories())
revdeps <- tools::package_dependencies("GenomicScores", db, recursive = TRUE)
"Matrix" %in% revdeps[[1]]
## [1] TRUE
so I'll leave the clever recursive or graph-based algorithm up to you, to report back to the mailing list?
For what it's worth I think the last time this came up Martin Maechler pointed to a function in base R (probably the tools package) that implements this, too...?
Martin Morgan
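A session-level check can complement the repository-level query above. The following is only a sketch: `importers_of()` is a hypothetical helper name, and it relies on base R's `loadedNamespaces()` and `getNamespaceImports()`. It reports which loaded namespaces import from a given package, so it would miss a package that is used only through `::` without a NAMESPACE import.

```r
## Sketch: list the loaded namespaces that import from a given package.
## `importers_of()` is a hypothetical helper, not part of base R.
importers_of <- function(pkg, loaded = loadedNamespaces()) {
    ## getNamespaceImports() returns a list named by the packages imported
    ## from (and NULL for the base namespace)
    imports <- lapply(loaded, function(ns) unique(names(getNamespaceImports(ns))))
    names(imports) <- loaded
    names(Filter(function(x) pkg %in% x, imports))
}

## After library(GenomicScores) one would call importers_of("Matrix").
## "grDevices" is used here only because it ships with every R installation.
loadNamespace("graphics")
importers_of("grDevices")
```

This only names the immediate importers among the loaded namespaces; tracing the chain back to 'GenomicScores' still needs the dependency graph.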
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
hi Martin,
thanks for the hint!! i wasn't aware of 'tools::package_dependencies()'.
adding a bit of graph sorcery i get the result i was looking for:
repos <- BiocManager::repositories()[c(1,5)]
repos
BioCsoft
"https://bioconductor.org/packages/3.11/bioc"
CRAN
"https://cran.rstudio.com"
db <- available.packages(repos=repos)
deps <- tools::package_dependencies("GenomicScores", db,
recursive=TRUE)[[1]]
deps <- tools::package_dependencies(c("GenomicScores", deps), db)
g <- graph::graphNEL(nodes=names(deps), edgeL=deps, edgemode="directed")
RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
detail=TRUE)[[1]]$path_detail
[1] "GenomicScores" "rtracklayer" "GenomicAlignments"
[4] "SummarizedExperiment" "Matrix"
so, it was the rtracklayer dependency that leads to Matrix through
GenomicAlignments and SummarizedExperiment.
maybe the BioC package 'pkgDepTools' should be deprecated if its
functionality is part of 'tools' and it is neither as fast nor as
correct as 'tools'.
cheers,
robert.
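The shortest-path step can also be sketched in base R, without the 'graph' and 'RBGL' packages. This is a minimal breadth-first search, not the method used in the thread: `shortest_dep_path()` is a hypothetical helper, and `edges` is a hand-written toy subset of the dependency edges reported in this thread, standing in for the named list that `tools::package_dependencies(pkgs, db)` returns.

```r
## Toy direct-dependency list (assumption: a small subset of the real graph,
## in the same shape as the output of tools::package_dependencies(pkgs, db)).
edges <- list(
    GenomicScores        = c("BSgenome", "DT", "rtracklayer"),
    BSgenome             = "rtracklayer",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    Matrix               = character(0)
)

## Breadth-first search; returns one shortest path or NULL if none exists.
shortest_dep_path <- function(edges, from, to) {
    parent <- setNames(NA_character_, from)   # visited node -> predecessor
    queue <- from
    while (length(queue) > 0L) {
        node <- queue[1L]
        queue <- queue[-1L]
        if (identical(node, to)) {
            ## walk the predecessors back to the start node
            path <- to
            while (!is.na(parent[[path[1L]]]))
                path <- c(parent[[path[1L]]], path)
            return(path)
        }
        nxt <- setdiff(edges[[node]], names(parent))  # unvisited neighbours
        parent[nxt] <- node
        queue <- c(queue, nxt)
    }
    NULL
}

path <- shortest_dep_path(edges, "GenomicScores", "Matrix")
print(path)
```

On the real data one would pass the `deps` list built above instead of the toy `edges`.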
Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 fax: +34.933.160.550
Excellent! I think there are other, independent, paths between your immediate dependents...
RBGL::sp.between(g, start="DT", finish="Matrix", detail=TRUE)[[1]]$path_detail
[1] "DT" "crosstalk" "ggplot2" "mgcv" "Matrix"
??
Martin
true, i was just searching for the shortest path, we can search for all
simple (i.e., without repeating "vertices") paths and there are up to
five routes from "GenomicScores" to "Matrix"
igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
from="GenomicScores", to="Matrix", mode="out")
[[1]]
+ 7/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment DelayedArray
[7] Matrix
[[2]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment Matrix
[[3]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores DT crosstalk ggplot2 mgcv
[6] Matrix
[[4]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment DelayedArray Matrix
[[5]]
+ 5/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment Matrix
this is interesting, because it means that if i wanted to get rid of the
"Matrix" dependence i'd need to get rid not only of the "rtracklayer"
dependence but also of "BSgenome" and "DT".
robert.
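The enumeration of all simple paths can likewise be sketched in base R with a depth-first search. This is an illustration under assumptions, not the igraph implementation: `all_dep_paths()` is a hypothetical helper, and `edges` is a toy subset of the dependency edges mentioned in this thread rather than the full 117-vertex graph.

```r
## Toy direct-dependency list (assumption: a small subset of the real graph).
edges <- list(
    GenomicScores        = c("BSgenome", "DT", "rtracklayer"),
    BSgenome             = "rtracklayer",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    Matrix               = character(0)
)

## Depth-first enumeration of all simple paths (no vertex visited twice).
all_dep_paths <- function(edges, from, to, visited = character(0)) {
    visited <- c(visited, from)
    if (identical(from, to))
        return(list(visited))
    paths <- list()
    for (nxt in setdiff(edges[[from]], visited))
        paths <- c(paths, all_dep_paths(edges, nxt, to, visited))
    paths
}

paths <- all_dep_paths(edges, "GenomicScores", "Matrix")
## the direct dependencies through which some path reaches "Matrix"
first_hops <- unique(vapply(paths, function(p) p[2L], character(1)))
print(length(paths))
print(first_hops)
```

The `first_hops` vector is the set of direct dependencies that would all have to go before 'Matrix' stops being reachable.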
1 day later
I find it quite interesting to identify formal strategies for removing dependencies, but also a little outside my domain of expertise. This code
library(tools)
library(dplyr)
## 'db' as computed earlier in the thread
db <- available.packages(repos = BiocManager::repositories())
## non-base packages the user requires for GenomicScores
deps <- package_dependencies("GenomicScores", db, recursive=TRUE)[[1]]
deps <- intersect(deps, rownames(db))
## only need the 'universe' of GenomicScores dependencies
db1 <- db[c("GenomicScores", deps),]
## sub-graph of packages between each dependency and GenomicScores
revdeps <- package_dependencies(deps, db1, recursive = TRUE, reverse = TRUE)
## n_remove: number of packages that must be dropped to lose this dependency
tibble(
package = names(revdeps),
n_remove = lengths(revdeps)
) %>%
arrange(n_remove)
produces a tibble
# A tibble: 106 x 2
package n_remove
<chr> <int>
1 BSgenome 1
2 AnnotationHub 1
3 shinyjs 1
4 DT 1
5 shinycustomloader 1
6 data.table 1
7 shinythemes 1
8 rtracklayer 2
9 BiocFileCache 2
10 BiocManager 2
# … with 96 more rows
which shows me, via n_remove, that I can remove the dependency on AnnotationHub by removing the dependency on just one package (AnnotationHub!), but to remove BiocFileCache I'd also have to remove another package (AnnotationHub, I'd guess). So this provides some measure of the ease with which a package can be removed.
I'd like a 'benefit' column, too -- if I were to remove AnnotationHub, how many additional packages would I also be able to remove, because they are present only to satisfy the dependency on AnnotationHub? More generally, perhaps there is a dependency of AnnotationHub that is only used by AnnotationHub and BSgenome. So removing AnnotationHub as a dependency would make it easier to remove BSgenome, etc. I guess this is a graph optimization problem.
Probably also worth mentioning the itdepends package (https://github.com/r-lib/itdepends), which I think tries primarily to determine the relationship between package dependencies and lines of code, which seems like complementary information.
Martin
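One possible reading of the 'benefit' column can be sketched in base R. The definition used here is an assumption, not something settled in the thread: the benefit of a direct dependency is the number of packages (counting the dependency itself) that are no longer reachable from the root once that single direct dependency is dropped. `reachable()` and `benefit()` are hypothetical helpers, and `edges` is a toy subset of the real graph.

```r
## Toy direct-dependency list (assumption: a small subset of the real graph).
edges <- list(
    GenomicScores        = c("BSgenome", "DT", "rtracklayer"),
    BSgenome             = "rtracklayer",
    DT                   = "crosstalk",
    crosstalk            = "ggplot2",
    ggplot2              = "mgcv",
    mgcv                 = "Matrix",
    rtracklayer          = "GenomicAlignments",
    GenomicAlignments    = "SummarizedExperiment",
    SummarizedExperiment = c("DelayedArray", "Matrix"),
    DelayedArray         = "Matrix",
    Matrix               = character(0)
)

## All nodes reachable from `start` (including `start` itself).
reachable <- function(edges, start) {
    seen <- character(0)
    todo <- start
    while (length(todo) > 0L) {
        node <- todo[1L]
        todo <- todo[-1L]
        if (node %in% seen) next
        seen <- c(seen, node)
        todo <- c(todo, setdiff(edges[[node]], seen))
    }
    seen
}

## For each direct dependency of `root`: how many packages drop out of the
## dependency closure if only that direct dependency is removed.
benefit <- function(edges, root) {
    all_deps <- setdiff(reachable(edges, root), root)
    direct <- edges[[root]]
    vapply(direct, function(d) {
        pruned <- edges
        pruned[[root]] <- setdiff(direct, d)
        length(setdiff(all_deps, reachable(pruned, root)))
    }, integer(1))
}

gain <- benefit(edges, "GenomicScores")
print(gain)
```

In the toy graph, dropping 'rtracklayer' alone gains nothing because 'BSgenome' still pulls it in, which matches the observation earlier in the thread that several direct dependencies have to go together.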
?On 2/6/20, 12:29 PM, "Robert Castelo" <robert.castelo at upf.edu> wrote:
true, i was just searching for the shortest path, we can search for all
simple (i.e., without repeating "vertices") paths and there are up to
five routes from "GenomicScores" to "Matrix"
igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
from="GenomicScores", to="Matrix", mode="out")
[[1]]
+ 7/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment DelayedArray
[7] Matrix
[[2]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment Matrix
[[3]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores DT crosstalk ggplot2 mgcv
[6] Matrix
[[4]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment DelayedArray Matrix
[[5]]
+ 5/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment Matrix
this is interesting, because it means that if i wanted to get rid of the
"Matrix" dependence i'd need to get rid not only of the "rtracklayer"
dependence but also of "BSgenome" and "DT".
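The observation above, that every direct dependency with some path to the target has to go, can be sketched without enumerating all paths. This is a minimal base-R illustration on a made-up graph; the package names and the helpers reaches() and culprits() are hypothetical and not part of igraph, RBGL, or 'tools'.

```r
## Toy graph (hypothetical): direct dependencies of each package.
deps <- list(
    pkgA   = c("tracks", "genome", "tables", "themes"),
    tracks = "align",
    genome = "tracks",
    tables = "plots",
    themes = character(0),
    align  = "target",
    plots  = "target",
    target = character(0)
)

## TRUE if 'to' can be reached from 'from' by following dependency edges.
reaches <- function(from, to, deps) {
    todo <- from
    seen <- character(0)
    while (length(todo) > 0) {
        pkg <- todo[1]
        todo <- todo[-1]
        if (pkg == to) return(TRUE)
        if (pkg %in% seen) next
        seen <- c(seen, pkg)
        todo <- c(todo, deps[[pkg]])
    }
    FALSE
}

## Direct dependencies of 'root' that (transitively) pull in 'to'.
culprits <- function(root, to, deps) {
    direct <- deps[[root]]
    direct[vapply(direct, reaches, logical(1), to = to, deps = deps)]
}

culprits("pkgA", "target", deps)
```

Here three of the four direct dependencies lead to "target", mirroring the situation with "rtracklayer", "BSgenome" and "DT" in the real graph.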
robert.
On 2/6/20 5:41 PM, Martin Morgan wrote:
> Excellent! I think there are other, independent, paths between your immediate dependents...
>
> RBGL::sp.between(g, start="DT", finish="Matrix", detail=TRUE)[[1]]$path_detail
> [1] "DT" "crosstalk" "ggplot2" "mgcv" "Matrix"
>
> Martin
>
> On 2/6/20, 10:47 AM, "Robert Castelo" <robert.castelo at upf.edu> wrote:
>
> hi Martin,
>
> thanks for the hint!! i wasn't aware of 'tools::package_dependencies()'.
> adding a bit of graph sorcery i get the result i was looking for:
>
> repos <- BiocManager::repositories()[c(1,5)]
> repos
> BioCsoft
> "https://bioconductor.org/packages/3.11/bioc"
> CRAN
> "https://cran.rstudio.com"
>
> db <- available.packages(repos=repos)
>
> deps <- tools::package_dependencies("GenomicScores", db,
> recursive=TRUE)[[1]]
>
> deps <- tools::package_dependencies(c("GenomicScores", deps), db)
>
> g <- graph::graphNEL(nodes=names(deps), edgeL=deps, edgemode="directed")
>
> RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
> detail=TRUE)[[1]]$path_detail
> [1] "GenomicScores" "rtracklayer" "GenomicAlignments"
> [4] "SummarizedExperiment" "Matrix"
>
> so, it was the rtracklayer dependency that leads to Matrix through
> GenomicAlignments and SummarizedExperiment.
>
> maybe the BioC package 'pkgDepTools' should be deprecated if its
> functionality is part of 'tools' and it does not even work as fast and
> as correctly as 'tools'.
>
> cheers,
>
> robert.
>
>
> On 2/6/20 2:51 PM, Martin Morgan wrote:
> > The first thing is to get the correct repositories
> >
> > repos = BiocManager::repositories()
> >
> > (maybe trim the experiment and annotation repos from this). I also tried pkgDepTools::makeDepGraph() but it took so long that I moved on... it has an option 'keep.builtin' which might include Matrix.
> >
> > There is also BiocPkgTools::buildPkgDependencyDataFrame() & friends, but this seems to build dependencies within a single repository...
> >
> > The building block for a solution is `tools::package_dependencies()`, and I can confirm that "Matrix" _is_ a dependency
> >
> > db = available.packages(repos = BiocManager::repositories())
> > revdeps <- tools::package_dependencies("GenomicScores", db, recursive = TRUE)
> > "Matrix" %in% revdeps[[1]]
> > ## [1] TRUE
> >
> > so I'll leave the clever recursive or graph-based algorithm up to you, to report back to the mailing list?
> >
> > For what it's worth I think the last time this came up Martin Maechler pointed to a function in base R (probably the tools package) that implements this, too...?
> >
> > Martin Morgan
> >
> > On 2/6/20, 6:40 AM, "Bioc-devel on behalf of Robert Castelo" <bioc-devel-bounces at r-project.org on behalf of robert.castelo at upf.edu> wrote:
> >
> > hi,
> >
> > when i load the package 'GenomicScores' in a clean session i see through
> > the 'sessionInfo()' that the package 'Matrix' is listed under "loaded
> > via a namespace (and not attached)".
> >
> > i'd like to know what is the dependency that 'GenomicScores' has that
> > ends up requiring the package 'Matrix'.
> >
> > i've tried using the package 'pkgDepTools' without success, because the
> > dependency graph does not list any path from 'GenomicScores' to 'Matrix'.
> >
> > i've been manually browsing the Bioc website and, unless i've overlooked
> > something, the only association with 'Matrix' i could find is that
> > 'S4Vectors' and 'GenomicRanges', which are required by 'GenomicScores',
> > list 'Matrix' in the 'Suggests' field, but my understanding is that
> > those packages are not required and should not be loaded.
> >
> > so, is there any way in which i can figure out what of the
> > 'GenomicScores' dependencies leads to loading the package 'Matrix'?
> >
> > here are the depends, import and suggests fields from 'GenomicScores':
> >
> > Depends: R (>= 3.5), S4Vectors (>= 0.7.21), GenomicRanges, methods,
> > BiocGenerics (>= 0.13.8)
> > Imports: utils, XML, Biobase, IRanges (>= 2.3.23), Biostrings,
> > BSgenome, GenomeInfoDb, AnnotationHub, shiny, shinyjs,
> > DT, shinycustomloader, rtracklayer, data.table, shinythemes
> > Suggests: BiocStyle, knitr, rmarkdown, BSgenome.Hsapiens.UCSC.hg19,
> > phastCons100way.UCSC.hg19, MafDb.1Kgenomes.phase1.hs37d5,
> > SNPlocs.Hsapiens.dbSNP144.GRCh37, VariantAnnotation,
> > TxDb.Hsapiens.UCSC.hg19.knownGene, gwascat, RColorBrewer
> >
> > and here a session information in a fresh R-devel session after loading
> > the package 'GenomicScores':
> >
> > R Under development (unstable) (2020-01-29 r77745)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: CentOS Linux 7 (Core)
> >
> > Matrix products: default
> > BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
> > LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
> > [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
> > [7] LC_PAPER=en_US.UTF8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel stats4 stats graphics grDevices utils datasets
> > [8] methods base
> >
> > other attached packages:
> > [1] GenomicScores_1.11.4 GenomicRanges_1.39.2 GenomeInfoDb_1.23.10
> > [4] IRanges_2.21.3 S4Vectors_0.25.12 BiocGenerics_0.33.0
> > [7] colorout_1.2-2
> >
> > loaded via a namespace (and not attached):
> > [1] Rcpp_1.0.3 lattice_0.20-38
> > [3] shinycustomloader_0.9.0 Rsamtools_2.3.3
> > [5] Biostrings_2.55.4 assertthat_0.2.1
> > [7] digest_0.6.23 mime_0.9
> > [9] BiocFileCache_1.11.4 R6_2.4.1
> > [11] RSQLite_2.2.0 httr_1.4.1
> > [13] pillar_1.4.3 zlibbioc_1.33.1
> > [15] rlang_0.4.4 curl_4.3
> > [17] data.table_1.12.8 blob_1.2.1
> > [19] DT_0.12 Matrix_1.2-18
> > [21] shinythemes_1.1.2 shinyjs_1.1
> > [23] BiocParallel_1.21.2 AnnotationHub_2.19.7
> > [25] htmlwidgets_1.5.1 RCurl_1.98-1.1
> > [27] bit_1.1-15.1 shiny_1.4.0
> > [29] DelayedArray_0.13.3 compiler_4.0.0
> > [31] httpuv_1.5.2 rtracklayer_1.47.0
> > [33] pkgconfig_2.0.3 htmltools_0.4.0
> > [35] tidyselect_1.0.0 SummarizedExperiment_1.17.1
> > [37] tibble_2.1.3 GenomeInfoDbData_1.2.2
> > [39] interactiveDisplayBase_1.25.0 matrixStats_0.55.0
> > [41] XML_3.99-0.3 crayon_1.3.4
> > [43] dplyr_0.8.4 dbplyr_1.4.2
> > [45] later_1.0.0 GenomicAlignments_1.23.1
> > [47] bitops_1.0-6 rappdirs_0.3.1
> > [49] grid_4.0.0 xtable_1.8-4
> > [51] DBI_1.1.0 magrittr_1.5
> > [53] XVector_0.27.0 promises_1.1.0
> > [55] vctrs_0.2.2 tools_4.0.0
> > [57] bit64_0.9-7 BSgenome_1.55.3
> > [59] Biobase_2.47.2 glue_1.3.1
> > [61] purrr_0.3.3 BiocVersion_3.11.1
> > [63] fastmap_1.0.1 yaml_2.2.1
> > [65] AnnotationDbi_1.49.1 BiocManager_1.30.10
> > [67] memoise_1.1.0
> >
> >
> >
> > thanks!!
> >
> > robert.
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
>
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Experimental and Health Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514
> fax: +34.933.160.550
>
>
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
On Sat, Feb 8, 2020 at 12:02 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:
> I find it quite interesting to identify formal strategies for removing dependencies, but also a little outside my domain of expertise. This code
It would be nice to collect the ideas in this thread into some recommendations. The themes I am thinking of are "how developers can make their packages robust to loss of external packages" and "how can the Bioc ecosystem best deal with departures of packages from itself and from CRAN?" A good and well-adopted solution to the first one makes the second one moot. Two CRAN-related events I know of that required some effort are (temporary) loss of ashr and (recently) archiving of Seurat.
There are some good ideas here that would enhance BiocPkgTools. I don't have the bandwidth to incorporate them right now, but filing issues or a pull request with a skeleton would be helpful to keep track.
Sean
On Sun, Feb 9, 2020 at 7:31 AM Vincent Carey <stvjc at channing.harvard.edu> wrote:
> On Sat, Feb 8, 2020 at 12:02 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:
> > I find it quite interesting to identify formal strategies for removing dependencies, but also a little outside my domain of expertise. This code
> It would be nice to collect the ideas in this thread into some recommendations. The themes I am thinking of are "how developers can make their packages robust to loss of external packages" and "how can the Bioc ecosystem best deal with departures of packages from itself and from CRAN?" A good and well-adopted solution to the first one makes the second one moot. Two CRAN-related events I know of that required some effort are (temporary) loss of ashr and (recently) archiving of Seurat.
3 days later
Martin, Vince, Sean,

thank you very much for your comments and suggestions. i've looked at the package 'itdepends' from Jim Hester, this was a great suggestion. i actually found a talk he gave about it at rstudio::conf 2019, here:

https://resources.rstudio.com/rstudio-conf-2019/it-depends-a-dialog-about-dependencies

i recommend watching it to anyone interested in this thread, i think it pretty much tackles the most important issues that concern us as developers regarding dependencies.

ironically, the package 'itdepends' doesn't seem to be actively developed: it's not on CRAN, the GitHub repo hasn't been updated in the last 5 months, it has 10 open issues for 5 closed ones and i've experienced that some functions break in the current R-devel.

i also didn't know about 'BiocPkgTools' and this seems to be the right home for adding the kind of functionality we're talking about, although i would think the same of 'itdepends' if it were pushed to CRAN at some point.

i've invested some time in developing something that covers, at the moment, my own needs on this subject. in case this is useful to anyone i've made a GitHub gist available here:

https://gist.github.com/rcastelo/7429d05178ddb57a38bd42093c2ddfe2

i haven't attempted to integrate this into 'BiocPkgTools' and do a pull request for two reasons:

1. if i try to fetch the dependencies from CRAN as well as from BioC (BioC being the only default), i get an error:

    library(BiocPkgTools)
    df <- buildPkgDependencyDataFrame(repo=c("BioCsoft", "CRAN"))
    Error in url(viewsFileUrl) : invalid 'description' argument

2. because some of the calls to 'itdepends' break in R-devel, this would also break 'BiocPkgTools' in R-devel. i'm also not sure how feasible it is for a BioC package to have a package dependency outside CRAN and BioC.

my initial motivation for all this was that the installation of 'GenomicScores' was breaking on one of our servers because of compilation problems with the package 'Matrix'. this was surprising to me because i wasn't expecting to have that dependency. after the first exchange of messages in this thread, using the code we wrote, i identified that only a few lines in the source of 'GenomicScores' were leading to that dependency upstream. i could replace them and get rid of that dependency, and actually of other ones too.

i've tried to provide a first attempt at a general approach to this situation. first we should source the gist:

    devtools::source_gist("rcastelo/depburden.R")

then build a database of dependency information:

    repos <- BiocManager::repositories()[c("BioCsoft", "CRAN")]
    db <- utils::available.packages(repos=repos)

and now the important part consists of the following three steps:

1. identify the burden of dependencies of a package, e.g., "GenomicScores":

    pkgDepMetrics("GenomicScores", db)
                  ImportedBy Exported     Usage DepOverlap
    Biobase                1      128  0.781250     0.0250
    BSgenome               1       93  1.075269     0.3625
    XML                    2      175  1.142857     0.0125
    IRanges                4      254  1.574803     0.0375
    BiocGenerics           5      139  3.597122     0.0125
    GenomicRanges          4      104  3.846154     0.1125
    S4Vectors             11      262  4.198473     0.0250
    GenomeInfoDb           5       53  9.433962     0.0750
    AnnotationHub          4       33 12.121212     0.6875
    Biostrings            NA      240        NA     0.0750

following Jim's recommendations in his talk, concretely those around minute 16, this function reports the number of functions imported from a dependency ('ImportedBy') and the number of functions exported by that dependency ('Exported'). the column 'Usage' is the percentage of the functionality exported by the dependency that is imported, i.e., 100 x ImportedBy / Exported. for instance, if i wanted to get rid of 'AnnotationHub' i'd have to implement in my package about 12% of the functionality exported by 'AnnotationHub'.

the column 'DepOverlap' shows the overlap between the dependency graph of the analyzed package and the dependency graph of the dependency in that row. this is calculated as a Jaccard index (intersection of vertices divided by the union), where 0 would correspond to disjoint graphs and 1 to identical ones.

from these numbers i can see that, for instance, i'm importing just one function from 'BSgenome' (about 1% of its functionality), while the dependency burden of 'BSgenome' overlaps more than 1/3 of the total burden of the package. this is to me a good candidate to explore in the following two steps.

2. let's say we want to investigate what function calls are responsible for the dependency on "BSgenome":

    funCalls2Dep("GenomicScores", "BSgenome", db)
    # A tibble: 1 x 3
    # Groups:   pkg [1]
      pkg      fun                 n
      <chr>    <chr>           <int>
    1 BSgenome referenceGenome     4

so i'm using a function or method called "referenceGenome" imported from "BSgenome".

3. we now want to see what lines in our code contain those function calls (assuming we're in the source path of the package "GenomicScores"):

    lines <- funCalls2Dep("GenomicScores", "BSgenome", db, ".", "R")
    head(lines, 2)
    [[1]]
    R/makeGScoresPackage.R:60:68: warning: BSgenome::referenceGenome
        organism(gsco), providerVersion(referenceGenome(gsco))),
                                        ^~~~~~~~~~~~~~~

    [[2]]
    R/makeGScoresPackage.R:69:49: warning: BSgenome::referenceGenome
        GENOMEVERSION=providerVersion(referenceGenome(gsco)),
                                      ^~~~~~~~~~~~~~~

here i'm using the release version of R because otherwise, as i said before, some of the function calls to the 'itdepends' package break.

i'd be happy to pull-request this code, with the necessary adaptations, wherever the community feels is more appropriate, but i'd say that the problem with 'itdepends' and R-devel should be fixed first, and then we can decide if this is something we want to incorporate into an API and from what package.

cheers,

robert.
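To make the two metrics above concrete, here is a minimal sketch in base R of how 'Usage' and 'DepOverlap' can be computed. It is not the code from the gist: the helper names `usage_pct` and `jaccard` and the package names `depA` to `depE` are made up for illustration, and whether the gist counts the packages themselves among the graph vertices is not stated in the message, so the sets below are an assumption.

```r
## Illustrative only: helper names and package names are hypothetical,
## they are not taken from the gist.

## 'Usage': imported functions as a percentage of what the dependency exports
usage_pct <- function(n_imported, n_exported) 100 * n_imported / n_exported

## 'DepOverlap': Jaccard index between two sets of graph vertices
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

## made-up recursive dependency sets of a package and of one of its imports
pkg_deps  <- c("depA", "depB", "depC", "depD")
depA_deps <- c("depB", "depC", "depE")

usage_pct(1, 93)              # the BSgenome row of the table: 1 of 93 exports
jaccard(pkg_deps, depA_deps)  # 2 shared vertices out of 5 in the union
```

With the numbers from the table, `usage_pct(1, 93)` reproduces the 1.075269 reported for 'BSgenome', which suggests that 'Usage' is indeed 100 x ImportedBy / Exported.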
On 2/9/20 5:01 PM, Sean Davis wrote:
There are some good ideas here that would provide enhancement to BiocPkgTools. I don't have the bandwidth to incorporate right now, but filing issues or a pull request with a skeleton would be helpful to keep track.

Sean

On Sun, Feb 9, 2020 at 7:31 AM Vincent Carey <stvjc at channing.harvard.edu> wrote:
It would be nice to collect the ideas in this thread into some recommendations. The themes I am thinking of are "how developers can make their packages robust to loss of external packages" and "how can the Bioc ecosystem best deal with departures of packages from itself and from CRAN?" A good and well-adopted solution to the first one makes the second one moot. Two CRAN-related events I know of that required some effort are (temporary) loss of ashr and (recently) archiving of Seurat.

On Sat, Feb 8, 2020 at 12:02 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:
I find it quite interesting to identify formal strategies for removing dependencies, but also a little outside my domain of expertise. This code
library(tools)
library(dplyr)
## 'db' is the available.packages() matrix built earlier in the thread
## non-base packages the user requires for GenomicScores
deps <- package_dependencies("GenomicScores", db, recursive=TRUE)[[1]]
deps <- intersect(deps, rownames(db))
## only need the 'universe' of GenomicScores dependencies
db1 <- db[c("GenomicScores", deps),]
## sub-graph of packages between each dependency and GenomicScores
revdeps <- package_dependencies(deps, db1, recursive = TRUE, reverse = TRUE)
## one row per dependency: how many packages in the universe rely on it
tibble(
    package = names(revdeps),
    n_remove = lengths(revdeps),
) %>%
    arrange(n_remove)
produces a tibble

# A tibble: 106 x 2
   package           n_remove
   <chr>                <int>
 1 BSgenome                 1
 2 AnnotationHub            1
 3 shinyjs                  1
 4 DT                       1
 5 shinycustomloader        1
 6 data.table               1
 7 shinythemes              1
 8 rtracklayer              2
 9 BiocFileCache            2
10 BiocManager              2
# … with 96 more rows
and shows me, via n_remove, that I can remove the dependency on AnnotationHub by removing the dependency on just one package (AnnotationHub!), but to remove BiocFileCache I'd also have to remove another package (AnnotationHub, I'd guess). So this provides some measure of the ease with which a package can be removed.

I'd like a 'benefit' column, too -- if I were to remove AnnotationHub, how many additional packages would I also be able to remove, because they are present only to satisfy the dependency on AnnotationHub? More generally, perhaps there is a dependency of AnnotationHub that is only used by AnnotationHub and BSgenome. So removing AnnotationHub as a dependency would make it easier to remove BSgenome, etc. I guess this is a graph optimization problem.

Probably also worth mentioning the itdepends package (https://github.com/r-lib/itdepends), which I think tries primarily to determine the relationship between package dependencies and lines of code, which seems like complementary information.

Martin
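One possible reading of the 'benefit' column asked for above is: the set of packages that stop being reachable from the top-level package once a single direct dependency is dropped. The following is a minimal sketch of that reading in base R, on a made-up dependency list. The package names (`myPkg`, `hubPkg`, ...) and the helpers `reachable` and `benefit` are hypothetical; with real data the list could come from `tools::package_dependencies()`.

```r
## Hypothetical dependency lists: names are packages, values are their
## direct dependencies. None of these are real packages.
deps <- list(
  myPkg     = c("hubPkg", "genomePkg", "utilPkg"),
  hubPkg    = c("cachePkg", "utilPkg"),
  genomePkg = c("utilPkg", "ioPkg"),
  cachePkg  = character(0),
  utilPkg   = character(0),
  ioPkg     = character(0)
)

## All packages reachable from 'root', optionally ignoring the direct
## edge from 'root' to 'drop'.
reachable <- function(deps, root, drop = NULL) {
  seen <- character(0)
  todo <- setdiff(deps[[root]], drop)
  while (length(todo) > 0) {
    p <- todo[1]
    todo <- todo[-1]
    if (!(p %in% seen)) {
      seen <- c(seen, p)
      todo <- c(todo, deps[[p]])
    }
  }
  seen
}

## Packages no longer needed by 'root' once the direct dependency
## 'drop' is removed.
benefit <- function(deps, root, drop) {
  setdiff(reachable(deps, root), reachable(deps, root, drop))
}

benefit(deps, "myPkg", "hubPkg")   # hubPkg and the cachePkg it alone pulls in
benefit(deps, "myPkg", "utilPkg")  # empty: utilPkg is still needed by others
```

The second call illustrates the point made in the message: dropping a direct dependency brings no benefit while other dependencies still require it.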
On 2/6/20, 12:29 PM, "Robert Castelo" <robert.castelo at upf.edu> wrote:
true, i was just searching for the shortest path. we can search for all simple (i.e., without repeating "vertices") paths and there are up to five routes from "GenomicScores" to "Matrix":
igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
from="GenomicScores", to="Matrix", mode="out")
[[1]]
+ 7/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment DelayedArray
[7] Matrix
[[2]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment Matrix
[[3]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores DT crosstalk ggplot2 mgcv
[6] Matrix
[[4]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment DelayedArray Matrix
[[5]]
+ 5/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment Matrix
this is interesting, because it means that if i wanted to get rid of the "Matrix" dependence i'd need to get rid not only of the "rtracklayer" dependence but also of "BSgenome" and "DT".
robert.
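The reasoning above (every distinct first hop on a path to the target has to be dealt with) can be sketched without igraph using a small recursive search. This is only an illustration on a made-up dependency list: the package names and the helper `simple_paths` are hypothetical, and for real dependency graphs `igraph::all_simple_paths()` as used above is the more practical choice.

```r
## Hypothetical dependency lists; 'sparsePkg' plays the role of the
## unwanted dependency. None of these are real packages.
deps <- list(
  myPkg     = c("trackPkg", "genomePkg", "tablePkg", "utilPkg"),
  genomePkg = c("trackPkg"),
  trackPkg  = c("exptPkg"),
  exptPkg   = c("sparsePkg"),
  tablePkg  = c("plotPkg"),
  plotPkg   = c("sparsePkg"),
  utilPkg   = character(0),
  sparsePkg = character(0)
)

## Depth-first enumeration of all simple paths (no repeated vertices)
## from 'from' to 'to'; returns a list of character vectors.
simple_paths <- function(deps, from, to, visited = character(0)) {
  visited <- c(visited, from)
  if (from == to) return(list(visited))
  paths <- list()
  for (nxt in setdiff(deps[[from]], visited)) {
    paths <- c(paths, simple_paths(deps, nxt, to, visited))
  }
  paths
}

paths <- simple_paths(deps, "myPkg", "sparsePkg")

## The direct dependencies through which the target is reached
first_hops <- unique(vapply(paths, function(p) p[2], character(1)))
first_hops
```

In this toy graph three direct dependencies lead to the target, so removing only one of them would not be enough, which mirrors the 'rtracklayer', 'BSgenome' and 'DT' situation described above.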
On 2/6/20 5:41 PM, Martin Morgan wrote:
> Excellent! I think there are other, independent, paths between your
> immediate dependents...
>
> RBGL::sp.between(g, start="DT", finish="Matrix",
>                  detail=TRUE)[[1]]$path_detail
> [1] "DT" "crosstalk" "ggplot2" "mgcv" "Matrix"
>
> Martin
>
> On 2/6/20, 10:47 AM, "Robert Castelo" <robert.castelo at upf.edu> wrote:
>
> hi Martin,
>
> thanks for the hint!! i wasn't aware of 'tools::package_dependencies()',
> adding a bit of graph sorcery i get the result i was looking for:
>
> repos <- BiocManager::repositories()[c(1,5)]
> repos
>                                      BioCsoft
> "https://bioconductor.org/packages/3.11/bioc"
>                                          CRAN
>                    "https://cran.rstudio.com"
>
> db <- available.packages(repos=repos)
>
> deps <- tools::package_dependencies("GenomicScores", db,
>                                     recursive=TRUE)[[1]]
>
> deps <- tools::package_dependencies(c("GenomicScores", deps), db)
>
> g <- graph::graphNEL(nodes=names(deps), edgeL=deps, edgemode="directed")
>
> RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
>                  detail=TRUE)[[1]]$path_detail
> [1] "GenomicScores"        "rtracklayer"          "GenomicAlignments"
> [4] "SummarizedExperiment" "Matrix"
>
> so, it was the rtracklayer dependency that leads to Matrix through
> GenomicAlignments and SummarizedExperiment.
>
> maybe the BioC package 'pkgDepTools' should be deprecated if its
> functionality is part of 'tools' and it does not even work as fast and
> correctly as 'tools'.
>
> cheers,
>
> robert.
>
>
> On 2/6/20 2:51 PM, Martin Morgan wrote:
> > The first thing is to get the correct repositories
> >
> > repos = BiocManager::repositories()
> >
> > (maybe trim the experiment and annotation repos from this). I also tried
> > pkgDepTools::makeDepGraph() but it took so long that I moved on... it has
> > an option 'keep.builtin' which might include Matrix.
> >
> > There is also BiocPkgTools::buildPkgDependencyDataFrame() & friends, but
> > this seems to build dependencies within a single repository...
> >
> > The building block for a solution is `tools::package_dependencies()`, and
> > I can confirm that "Matrix" _is_ a dependency
> >
> > db = available.packages(repos = BiocManager::repositories())
> > revdeps <- tools::package_dependencies("GenomicScores", db, recursive = TRUE)
> > "Matrix" %in% revdeps[[1]]
> > ## [1] TRUE
> >
> > so I'll leave the clever recursive or graph-based algorithm up to you, to
> > report back to the mailing list?
> >
> > For what it's worth I think the last time this came up Martin Maechler
> > pointed to a function in base R (probably the tools package) that
> > implements this, too...?
> >
> > Martin Morgan
> >
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
1 day later
i think this is an interesting analysis. i have not viewed the talk, but it seems to me this could be a nice R Journal paper. querying Jim H. on the fate of itdepends seems in order. the topic is central to the robustness of the ecosystem, so i hope some tools can come out of this.

On Wed, Feb 12, 2020 at 12:13 PM Robert Castelo <robert.castelo at upf.edu> wrote: