
[Bioc-devel] persistent build errors in bioconductor package

4 messages · Hillary Koch, Martin Morgan, Dario Righelli +1 more

#
Hi again,

I recently wrote about some build errors my bioconductor package is
experiencing. When I check the error online, it says:

"001E# service=git-upload-pack

00000041ERR FATAL: unknown git/gitolite command: 'packages/powerTCR'
FATAL: unknown git/gitolite command: 'packages/powerTCR'
001E# service=git-upload-pack

0041ERR FATAL: unknown git/gitolite command: 'packages/powerTCR'"

I wrote about this a couple of days ago and received the response

"

Martin Morgan <mtmorgan.bioc at gmail.com>
Sep 9, 2018, 12:47 PM (3 days ago)
to me, bioc-devel
I'd guess that your remote is incorrect; it should use ssh, and the
separation between the host 'git.bioconductor.org' and repository path
'packages/powerTCR' should be a colon, e.g.,

powerTCR master$ git remote -v
origin  git@git.bioconductor.org:packages/powerTCR (fetch)
origin  git@git.bioconductor.org:packages/powerTCR (push)

This could be corrected in various ways, one is

   git remote remove origin
   git remote add origin git@git.bioconductor.org:packages/powerTCR"
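A small check that could be run before pushing, sketched in POSIX sh. The function name `is_bioc_ssh_remote` is hypothetical (it is not part of git or any Bioconductor tool); it only tests that a remote URL has the SSH shape described above, `git@git.bioconductor.org:packages/<name>`, with a colon between host and repository path.

```shell
# Hypothetical helper: succeed (exit 0) only if the URL uses the SSH form
# git@git.bioconductor.org:packages/<name>, i.e. a colon (not a slash)
# separates the host from the repository path.
is_bioc_ssh_remote() {
    case "$1" in
        git@git.bioconductor.org:packages/?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a package checkout it could be called as `is_bioc_ssh_remote "$(git remote get-url origin)"`; if it fails, `git remote set-url origin git@git.bioconductor.org:packages/powerTCR` is a one-step alternative to the remove/add pair. It does not test whether the SSH key is accepted; that still needs a real fetch or push.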

I have looked into all of this again: I have an SSH key associated with
the package, and Bioconductor is aware of it. When I run "git remote -v" on
master I see exactly
origin  git@git.bioconductor.org:packages/powerTCR (fetch)
origin  git@git.bioconductor.org:packages/powerTCR (push)
plus the fetch and push entries for the upstream remote that I created some
time ago.
I am not sure how to proceed here. Thanks in advance for the help!

Hillary
#
I see, when you say

 > experiencing. When I check the error online, it says:
 >
 > "001E# service=git-upload-pack
 >
 > 00000041ERR FATAL: unknown git/gitolite command: 'packages/powerTCR'
 > FATAL: unknown git/gitolite command: 'packages/powerTCR'
 > 001E# service=git-upload-pack
 >
 > 0041ERR FATAL: unknown git/gitolite command: 'packages/powerTCR'"

you mean that you point your browser to

   https://git.bioconductor.org/packages/powerTCR

I see that message too; it is currently not possible to navigate to 
package sources in this way on our system.

It looks like the recent commits are

powerTCR master$ git log -n 2
commit 6ba40e640fb779371cbe8b3a232f8b0b3d549d12
Author: LiNk-NY <marcel.ramosperez at roswellpark.org>
Date:   Thu Sep 6 15:52:35 2018 +0000

     additional updates from BiocInstaller to BiocManager

commit 8af9bbf7f788defc73021b0621efa29b5f2d2361
Merge: 97dadf9 f5e4d5f
Author: Hillary Koch <hillary.koch01 at gmail.com>
Date:   Thu Sep 6 09:37:03 2018 -0400

     merge upstream

and from our 'build report'

   http://bioconductor.org/checkResults/devel/bioc-LATEST/

clicking on the 'ERROR' for your package, e.g., on Linux

   http://bioconductor.org/checkResults/devel/bioc-LATEST/powerTCR/malbec1-buildsrc.html

the page reports

Snapshot Date: 2018-09-11 16:46:14 -0400 (Tue, 11 Sep 2018)
URL: https://git.bioconductor.org/packages/powerTCR
Branch: master
Last Commit: 6ba40e6
Last Changed Date: 2018-09-06 11:52:35 -0400 (Thu, 06 Sep 2018)

so the checkout on the build machine is consistent (compare the 'Last 
Commit:' field with the hash on the commit in the repository) with the 
most recent commit.
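The hash comparison can be scripted. A minimal sketch in POSIX sh, assuming the abbreviated value from the report's 'Last Commit:' field is pasted in by hand; `commit_matches` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the abbreviated hash shown in the build
# report's "Last Commit:" field is a prefix of the given full commit hash.
commit_matches() {
    reported="$1"   # e.g. the short hash from the build report
    full="$2"       # e.g. the output of: git rev-parse HEAD
    [ -n "$reported" ] || return 1
    case "$full" in
        "$reported"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, after `git fetch origin`, `commit_matches 6ba40e6 "$(git rev-parse origin/master)"` succeeds when the build machine checked out the latest pushed commit.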

It looks like you've pushed changes, but they have not fixed the problem.

In your vignettes directory you have

/powerTCR/vignettes$ ls
powerTCR_cache        powerTCR_files  powerTCR.html  powerTCR.tex
powerTCR.fdb_latexmk  powerTCR.fls    powerTCR.Rmd

but you should have only the powerTCR.Rmd file committed, the others 
should be removed.
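To find such files, here is a sketch in POSIX sh that filters a list of paths and keeps everything that is not a vignette source. It assumes `.Rmd` (and `.Rnw`) are the only source extensions in use; `non_source_vignette_files` is a hypothetical name.

```shell
# Hypothetical helper: read file paths (one per line) on stdin and print
# those that are NOT vignette sources (.Rmd or .Rnw), e.g. generated
# .html/.tex files and anything under *_cache/ or *_files/.
non_source_vignette_files() {
    # grep exits 1 when nothing is printed; treat that as success.
    grep -Ev '\.(Rmd|Rnw)$' || true
}
```

`git ls-files vignettes | non_source_vignette_files` then lists the tracked files that are candidates for `git rm`; adding them to `.gitignore` would help keep them from being committed again.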

Note that the log shows a commit from the core team to address use of 
BiocManager (replacing BiocInstaller) so the right sequence of commands 
will be along the lines of

   git pull  # update to current version
   git rm -r vignettes/powerTCR_cache vignettes/powerTCR_files \
       vignettes/powerTCR.html vignettes/powerTCR.tex \
       vignettes/powerTCR.fdb_latexmk vignettes/powerTCR.fls
   git commit

after the git commit command, it can be helpful to change to a new 
directory and make a local clone of your powerTCR package to make sure 
it builds, e.g.,

   cd /tmp
   git clone /path/to/original/powerTCR
   R CMD build powerTCR
   R CMD check powerTCR_1.1.3.tar.gz

if that works out then change back to the original repository and git push.
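The tarball name in the last command depends on the Version field, which changes with every version bump. A sketch in POSIX sh that derives it from DESCRIPTION instead, assuming R CMD build's usual `<Package>_<Version>.tar.gz` naming and single-line Package/Version fields; `tarball_name` is hypothetical.

```shell
# Hypothetical helper: print the tarball name that R CMD build produces,
# <Package>_<Version>.tar.gz, using the fields of a DESCRIPTION file.
tarball_name() {
    desc="${1:-DESCRIPTION}"
    # Take the first Package:/Version: line and drop surrounding whitespace.
    pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc" | head -n 1 | tr -d '[:space:]')
    ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc" | head -n 1 | tr -d '[:space:]')
    [ -n "$pkg" ] && [ -n "$ver" ] || return 1
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}
```

In the throwaway clone, `R CMD build powerTCR && R CMD check "$(tarball_name powerTCR/DESCRIPTION)"` avoids hard-coding the version.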

Martin
On 09/12/2018 08:30 PM, Hillary Koch wrote:
#
Hello everyone,

In the DEScan2 package I'm using the GenomeInfoDb::Seqinfo function with genome="mm10".

Sometimes it fails with this error message:

"cannot open the connection to 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.20_GRCm38/GCF_000001635.20_GRCm38_assembly_report.txt'"

This happens even though the file is reachable.

I noticed it because I received an ERROR report from the Bioconductor test bot:
one of my package's unit tests doesn't pass on Linux, but it works on other machines.

Searching the Internet, this seems to be an old (solved) problem.

What do you suggest I do?

thanks,
dario
14 days later
#
Hi Dario,
On 09/13/2018 09:18 AM, Dario Righelli wrote:
I cannot reproduce this, not too surprisingly...

This kind of intermittent internet access problem is not uncommon
and typically hard to reproduce. GenomeInfoDb::Seqinfo() was trying
to download a file from ftp.ncbi.nlm.nih.gov and failed for some
reason. It could be because NCBI's FTP site was temporarily unavailable
or because of any other network problem between NCBI and the machine
where GenomeInfoDb::Seqinfo() was called. Unfortunately there is not
much we can do about these transient connectivity issues in general.

However, we can mitigate them:

- One way would be to use a caching mechanism, e.g. BiocFileCache,
   to store the data downloaded by
   GenomeInfoDb::Seqinfo(genome="some_genome") locally the first time
   it's downloaded for a particular genome.

- Another way would be to have this data already included in
   GenomeInfoDb (or GenomeInfoDbData) for the most frequently used
   genomes. In addition, the caching mechanism could still be used
   for the other genomes.

- Another way would perhaps be to have
   GenomeInfoDb::Seqinfo(genome="some_genome") retry the download
   a couple of times (waiting 1 or 2 seconds between attempts)
   before giving up. This could be done in combination with the
   features above. The retry feature could even be integrated into
   BiocFileCache.
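The caching and retry ideas can be illustrated outside R. This is only a POSIX sh sketch of the two ideas using curl; it is not how GenomeInfoDb or BiocFileCache work, and `retry` and `cached_fetch` are hypothetical names. The retry count (3) and delay (2 s) are assumptions chosen to match the "couple of times, 1 or 2 seconds" suggestion.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; return the exit status of the last attempt.
retry() {
    tries="$1"; delay="$2"; shift 2
    n=1
    while :; do
        "$@" && return 0
        status=$?
        [ "$n" -ge "$tries" ] && return "$status"
        n=$((n + 1))
        sleep "$delay"
    done
}

# Hypothetical helper: print the local path of URL's file inside CACHE_DIR,
# downloading it with curl (3 attempts, 2 s apart) only if not cached yet.
cached_fetch() {
    url="$1"; cache_dir="$2"
    dest="$cache_dir/$(basename "$url")"
    if [ ! -s "$dest" ]; then
        mkdir -p "$cache_dir" || return 1
        # Download to a temporary name so a failed transfer never leaves
        # a partial file that would later look like a valid cache entry.
        retry 3 2 curl -fsS -o "$dest.part" "$url" || { rm -f "$dest.part"; return 1; }
        mv "$dest.part" "$dest" || return 1
    fi
    printf '%s\n' "$dest"
}
```

With the assembly-report URL from Dario's error message, only the first call would touch NCBI's FTP site; later calls return the cached path. curl's own `--retry` and `--retry-delay` options cover similar ground for errors curl considers transient; the explicit loop just makes the policy visible.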

For now, though, my feeling is that this issue is probably not enough
of an annoyance to justify putting these new developments high on
the TODO list.

Just throwing some random thoughts here. Don't know what others
think about this.
You mentioned that this looks like an old (solved) problem: would you
mind sharing a link to this information? Thanks!

Cheers,
H.