[R-pkg-devel] help/advice on debugging

2 messages · Ben Bolker, Ivan Krylov

#
For some stupid reason I agreed to take over maintenance of the 
now-orphaned `gtools` package ("how much trouble could that be, after 
all?"), which has 203 strong reverse dependencies.

   CRAN is reporting that exactly **one** of them, dnapath, is
failing reverse dependency checks ("changed to worse").

   I am having trouble replicating this (I'm running R-devel, can't 
install Bioconductor packages on R-devel, am working on doing everything 
in a Docker container, blah blah blah ...)

   In any case I am mystified as to how my new version of the package 
could be breaking something in the downstream package, since AFAICT the 
*only* functionality that `dnapath` is using from gtools is `permute()`, 
which is (1) a one-line wrapper for sample() and (2) hasn't changed in 
many years.
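
For readers who don't know the function, here is a minimal sketch of what "a one-line wrapper for sample()" means. The name `permute_sketch` is hypothetical and the body is written from the description above, not copied from the gtools source:

```r
# Hypothetical stand-in for gtools::permute(), based only on the
# description above: return the elements of x in random order.
permute_sketch <- function(x) {
  sample(x, size = length(x), replace = FALSE)
}

set.seed(1)                      # only to make the shuffle repeatable
shuffled <- permute_sketch(1:10)

# Same elements as the input, possibly in a different order
stopifnot(setequal(shuffled, 1:10), length(shuffled) == 10)
```

Nothing in a function of this shape touches column types, which is why a join error downstream is so surprising.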

   I will try to figure out how much more time to invest in this rabbit 
hole before continuing the conversation with the CRAN maintainers, but 
in the meantime any insights or advice would be welcome.

   The problem is in vignette rebuilding, errors of this form in both of 
the package vignettes:

   Can't join on `x$entrezgene_id` x `y$entrezgene_id` because of
   incompatible types.
   ? `x$entrezgene_id` is of type <double>>.
   ? `y$entrezgene_id` is of type <character>>.

References:

https://win-builder.r-project.org/incoming_pretest/gtools_3.9.3_20220709_010826/reverseDependencies/summary.txt

https://github.com/r-gregmisc/gtools

  cheers
    Ben Bolker
#
On Sat, 9 Jul 2022 16:29:57 -0400
Ben Bolker <bbolker at gmail.com> wrote:

I'd hazard a guess that both vignettes crash in a call to
entrez_to_symbol (a direct one or via rename_genes). Specifically, its
first argument (`x`) is converted to numeric, then the following
happens:

    df <- data.frame(entrezgene_id = x)
    df <- dplyr::left_join(df, gene_info, by = "entrezgene_id")

gene_info is obtained above, using the following:

  gene_info <- get_biomart_mapping(species, symbol_name, dir_save,
                                   verbose) %>%
    dplyr::group_by(entrezgene_id) %>%
    dplyr::summarise(dplyr::across(dplyr::everything(), dplyr::first))

get_biomart_mapping accesses the Internet using biomaRt::getBM if it
can, but otherwise falls back to a copy of the information for the human
genome cached inside the package.

There doesn't seem to be any mention of special cases for
"entrezgene_id" in the code of the biomaRt package. biomaRt::getBM
POSTs XML queries to ensembl.org/biomart/martservice?... and parses the
resulting tab-separated values using read.table.
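
To illustrate why the parsing step matters, here is a small base-R sketch of how read.table picks a column type from the data it sees. The two tab-separated strings are made up for illustration (they are not real BioMart responses), and `parse_tsv` is a hypothetical helper, not the call biomaRt actually makes:

```r
# Made-up tab-separated "responses"; the second one contains a single
# value in entrezgene_id that is not a number.
clean_tsv <- "entrezgene_id\thgnc_symbol\n1\tA1BG\n2\tA2M\n"
dirty_tsv <- "entrezgene_id\thgnc_symbol\n1\tA1BG\nnot_a_number\tA2M\n"

# Hypothetical helper: parse a tab-separated string with a header row.
parse_tsv <- function(txt) {
  read.table(text = txt, sep = "\t", header = TRUE,
             stringsAsFactors = FALSE)
}

clean <- parse_tsv(clean_tsv)
dirty <- parse_tsv(dirty_tsv)

# All-numeric input is read as integer; one stray string turns the
# whole column into character.
class(clean$entrezgene_id)
class(dirty$entrezgene_id)
```

A single unexpected value is therefore enough to change the type of the entire column, which would match the <character> side of the join error.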

My guess is, ensembl.org started returning something that isn't a
number in the entrezgene_id column, and you were the first one to
rebuild the vignette and notice that.
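
If that guess is right, one way to check it is to look for values in the key column that do not survive conversion to numeric. The sketch below uses a made-up `gene_info` table (only the column name entrezgene_id comes from the code above; the rows and the second column are assumptions) and base R only:

```r
# Hypothetical stand-in for the gene_info table, with the key column
# already read in as character.
gene_info <- data.frame(
  entrezgene_id = c("1", "2", "not_a_number", NA),
  hgnc_symbol   = c("A1BG", "A2M", "FOO", "BAR"),
  stringsAsFactors = FALSE
)

ids    <- gene_info$entrezgene_id
as_num <- suppressWarnings(as.numeric(ids))

# Rows whose ID was present but could not be parsed as a number
bad <- !is.na(ids) & is.na(as_num)
gene_info[bad, ]

# Coerce the key so its type matches a numeric `x` before joining;
# unparseable IDs become NA and simply fail to match.
gene_info$entrezgene_id <- as_num
```

Running the same check on the real output of get_biomart_mapping should show whether anything non-numeric is coming back, and the final coercion is one possible workaround on the dnapath side rather than anything gtools needs to change.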