For some stupid reason I agreed to take over maintenance of the
now-orphaned `gtools` package ("how much trouble could that be, after
all?"), which has 203 strong reverse dependencies.
CRAN is reporting that exactly **one** of them, dnapath, is
failing reverse dependency checks ("changed to worse").
I am having trouble replicating this (I'm running R-devel, can't
install Bioconductor packages on R-devel, working on doing everything in
a Docker container, blah blah blah ...).
In any case I am mystified as to how my new version of the package
could be breaking something in the downstream package, since AFAICT the
*only* functionality that `dnapath` is using from `gtools` is `permute()`,
which (1) is a one-line wrapper for `sample()` and (2) hasn't changed in
many years.
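For reference, a minimal sketch of what such a wrapper amounts to, assuming it does nothing beyond a full-length call to `sample()` without replacement. `permute_sketch` is a hypothetical name used for illustration, not the `gtools` source:

```r
# Sketch only: mirrors the "one-line wrapper for sample()" description.
# permute_sketch is a hypothetical name, not the gtools function itself.
permute_sketch <- function(x) sample(x, size = length(x), replace = FALSE)

set.seed(42)
ids <- c("a", "b", "c", "d")
shuffled <- permute_sketch(ids)

# The result holds the same elements with the same type; only the order
# may differ, so nothing here touches data-frame column types.
print(shuffled)
```

Nothing in a function of this shape would change the type of a join key, which is why a failure inside a `dplyr` join looks unrelated.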
I will try to figure out how much more time to invest in this rabbit
hole before continuing the conversation with the CRAN maintainers, but
in the meantime any insights or advice would be welcome.
The problem is in vignette rebuilding, with errors of this form in both of
the package vignettes:

  Can't join on `x$entrezgene_id` x `y$entrezgene_id` because of
  incompatible types.
  ℹ `x$entrezgene_id` is of type <double>>.
  ℹ `y$entrezgene_id` is of type <character>>.

References:
https://win-builder.r-project.org/incoming_pretest/gtools_3.9.3_20220709_010826/reverseDependencies/summary.txt
https://github.com/r-gregmisc/gtools
cheers
Ben Bolker
[R-pkg-devel] help/advice on debugging
2 messages · Ben Bolker, Ivan Krylov
On Sat, 9 Jul 2022 16:29:57 -0400
Ben Bolker <bbolker at gmail.com> wrote:
> The problem is in vignette rebuilding, errors of this form in both of
> the package vignettes:
>
>   Can't join on `x$entrezgene_id` x `y$entrezgene_id` because of
>   incompatible types.
>   ℹ `x$entrezgene_id` is of type <double>>.
>   ℹ `y$entrezgene_id` is of type <character>>.

I'd hazard a guess that both vignettes crash in a call to
entrez_to_symbol (a direct one or via rename_genes). Specifically, its
first argument (`x`) is converted to numeric, then the following
happens:

  df <- data.frame(entrezgene_id = x)
  df <- dplyr::left_join(df, gene_info, by = "entrezgene_id")

gene_info is obtained above, using the following:

  gene_info <- get_biomart_mapping(species, symbol_name, dir_save,
                                   verbose) %>%
    dplyr::group_by(entrezgene_id) %>%
    dplyr::summarise(dplyr::across(dplyr::everything(), dplyr::first))

get_biomart_mapping accesses the Internet using biomaRt::getBM if it
can, but otherwise uses a copy of the information for the human genome
cached inside the package.
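
One way to check this guess would be to compare the key column types on both sides just before the join and list the mapping values that don't convert to numbers. The following is only a sketch in base R: the data frames are made-up stand-ins for `df` and `gene_info`, and the `"not_a_number"` entry is a hypothetical bad value, not something observed in the real mapping.

```r
# Hypothetical stand-ins for the two sides of the join; values are made up.
x <- c("7157", "672")
df <- data.frame(entrezgene_id = as.numeric(x))
gene_info <- data.frame(
  entrezgene_id = c("7157", "672", "not_a_number"),  # character column
  symbol = c("TP53", "BRCA1", "GENE3"),
  stringsAsFactors = FALSE
)

# Compare the key column types before joining.
key_types <- c(x = typeof(df$entrezgene_id),
               y = typeof(gene_info$entrezgene_id))
print(key_types)

# Which mapping values would not survive conversion to numeric?
ids_num <- suppressWarnings(as.numeric(gene_info$entrezgene_id))
bad_ids <- gene_info$entrezgene_id[is.na(ids_num) &
                                   !is.na(gene_info$entrezgene_id)]
print(bad_ids)
```

If `bad_ids` turns out non-empty on the real mapping, that would point at the data source rather than at anything in gtools.
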
There doesn't seem to be any mention of special cases for
"entrezgene_id" in the code of the biomaRt package. biomaRt::getBM
POSTs XML queries to ensembl.org/biomart/martservice?... and parses the
resulting tab-separated values using read.table.
My guess is, ensembl.org started returning something that isn't a
number in the entrezgene_id column, and you were the first one to
rebuild the vignette and notice that.
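
To illustrate the mechanism I have in mind (a sketch, not the actual biomaRt code path): `read.table` infers column types from the values it sees, so a single non-numeric entry is enough to turn the whole column into character. The TSV strings below are invented for illustration, and `parse_tsv` is a hypothetical helper.

```r
# Hypothetical TSV responses; the contents are made up for illustration.
clean <- "entrezgene_id\thgnc_symbol\n7157\tTP53\n672\tBRCA1\n"
dirty <- "entrezgene_id\thgnc_symbol\n7157\tTP53\nN/A\tBRCA1\n"

# parse_tsv is a hypothetical helper: plain read.table on tab-separated text.
parse_tsv <- function(txt) {
  read.table(text = txt, sep = "\t", header = TRUE,
             stringsAsFactors = FALSE)
}

clean_type <- class(parse_tsv(clean)$entrezgene_id)
dirty_type <- class(parse_tsv(dirty)$entrezgene_id)

# All-numeric input is read as a number; one stray "N/A" (which is not
# among the default na.strings) makes the column character.
print(c(clean = clean_type, dirty = dirty_type))
```
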
Best regards, Ivan