[Bioc-devel] Updates to BiocFileCache, AnnotationHub, and ExperimentHub

10 messages · Shepherd, Lori, Aaron Lun, Vincent Carey +1 more

#
We are in the process of making some major updates to the caching in BiocFileCache, AnnotationHub, and ExperimentHub. Namely, the default caching location will change from using rappdirs::user_cache_dir to using tools::R_user_dir, eventually removing the dependency on rappdirs. To avoid conflicting default caches, anyone who has used an old default caching directory will get an error asking them to decide how to deal with the old location before proceeding; the vignettes document how to resolve it. So far I have updated BiocFileCache; the changes were just pushed to the devel branch and should propagate tonight. I plan on doing the same for both AnnotationHub and ExperimentHub within the next few days. We appreciate any feedback or questions regarding these updates.

This is only relevant when using the default cache location. If a user manually specified a unique location, used environment variables, or created a package-specific cache, the code/location is not affected. Anyone using package-specific caching that relies on rappdirs is also encouraged to consider changing package code to use the function now available in tools.
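
For a package-specific cache, the switch might look roughly like the sketch below. The package name `myPackage` is a placeholder and does not come from this thread; `tools::R_user_dir` is available in R >= 4.0.0.

```r
# Hypothetical package name, used only for illustration.
pkg <- "myPackage"

# Old style (adds a dependency on rappdirs):
#   cache_dir <- rappdirs::user_cache_dir(appname = pkg)

# New style: base R only, since tools ships with R.
cache_dir <- tools::R_user_dir(pkg, which = "cache")

# R_user_dir() only returns the path; the directory still has to be
# created on first use, e.g.:
#   dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
print(cache_dir)
```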

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


1 day later
#
rebook and basilisk are also currently using rappdirs. I would be 
interested in the motivation behind the switch for the Hubs and whether 
that is applicable to those two packages as well.

-A
On 4/5/21 6:41 AM, Kern, Lori wrote:
#
Mostly to lighten the dependency tree: tools is built into R, so using it would remove one additional dependency. Also for clarity: the tools directory adds an R folder to the path, marking the files as belonging to R packages, so a user who ever investigates would have a better idea of where those files came from.



#
Woah, I missed the part where you said that there would be an error.

This does not sound good. Users are going to flip out, especially when
EHub and AHub are not visible dependencies (e.g., scRNAseq, celldex).
It also sounds completely unnecessary for EHub and AHub given that the
new cache can just be populated by fresh downloads. Similarly,
BiocFileCache::bfcrpath should not be affected, and people using that
shouldn't be getting an error.

Why not just move the old default cache into the new location
automatically? This seems like the simplest solution given that
everyone accessing BFC resources should be doing so through the BFC
API. And most files are not position-dependent, unless people are
putting shared libraries in there.

But even if you can't, an error is just too much. We use BiocFileCache
a lot in our company infrastructure and the brown stuff will hit the
fan if we have to find every old default cache and delete it. The
package should handle this for us.

-A
On Wed, Apr 7, 2021 at 4:46 AM Kern, Lori <Lori.Shepherd at roswellpark.org> wrote:
#
There is no guarantee we would be running as the right user with permissions to move the cache automatically, and we would not want to leave it in a broken state.

We could start a fresh cache in the new location, but there would be no way to combine an old cache and a new cache, and no way to warn people before starting the new cache to give them an opportunity to move the old cache to the new location.

This should not affect any cache that is explicitly given a different name in the constructor or set through environment variables; it applies only to a bare BiocFileCache() call. Most package-specific caches create their own cache in the constructor, so they should not trigger the ERROR.


#
For convenience here are relevant sections of the new vignette.   Give it a
try and let us know.  This is "devel".

4 Default Caching Location Update

As of BiocFileCache version > 1.15.1, the default caching location has
changed. The default cache is now controlled by the function
tools::R_user_dir instead of rappdirs::user_cache_dir. Users who have
used the default BiocFileCache must, to continue using the existing cache,
move the cache and its files to the new default location, or else delete
the old cache and redownload any previous files.

4.1 Option 1: Moving Files

The following steps can be used to move the files to the new location:

   1. Determine the old location by running the following in R:
      rappdirs::user_cache_dir(appname="BiocFileCache")
   2. Determine the new location by running the following in R:
      tools::R_user_dir("BiocFileCache", which="cache")
   3. Move the files to the new location. You can do this manually or do the
      following steps in R. Remember that if you have a lot of cached files,
      this may take a while.

    olddir <- path.expand(rappdirs::user_cache_dir(appname="BiocFileCache"))
    newdir <- tools::R_user_dir("BiocFileCache", which="cache")
    dir.create(path=newdir, recursive=TRUE)
    files <- list.files(olddir, full.names =TRUE)
    moveres <- vapply(files,
        FUN=function(fl){
          filename = basename(fl)
          newname = file.path(newdir, filename)
          file.rename(fl, newname)
        },
        FUN.VALUE = logical(1))
    if(all(moveres)) unlink(olddir, recursive=TRUE)
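
file.rename() can return FALSE, for instance when the old and new locations sit on different file systems. The sketch below is one possible fallback, not part of the vignette: `move_entry` is a hypothetical helper that tries a rename first and copies and deletes otherwise. The demo uses temporary directories rather than a real cache.

```r
# Hypothetical helper: move one file or directory into to_dir.
move_entry <- function(from, to_dir) {
  to <- file.path(to_dir, basename(from))
  if (suppressWarnings(file.rename(from, to))) return(TRUE)
  # Rename failed; try copying into to_dir and removing the original.
  ok <- file.copy(from, to_dir, recursive = TRUE)
  if (ok) unlink(from, recursive = TRUE)
  ok
}

# Demo on throwaway directories standing in for the old and new cache.
olddir <- tempfile("old-cache-")
newdir <- tempfile("new-cache-")
dir.create(olddir)
dir.create(newdir)
writeLines("cached content", file.path(olddir, "a.txt"))

moveres <- vapply(list.files(olddir, full.names = TRUE),
                  move_entry, logical(1), to_dir = newdir)
# Only remove the old directory when every entry was moved.
if (all(moveres)) unlink(olddir, recursive = TRUE)
```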

4.2 Option 2: Specify a Cache Location Explicitly

Users may always specify a unique caching location by providing the
cache argument to the BiocFileCache constructor; however, users must
specify this location every time, as it will not be recognized by default
in subsequent runs.

Alternatively, the default caching location may also be controlled by a
user-wide or system-wide environment variable. Users may set the
environment variable BFC_CACHE to the old location to continue using it as
the default location.
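
A rough sketch of both alternatives follows. The path is the Linux location that shows up later in this thread and is only an assumption for other systems. Whether BFC_CACHE has to be set before BiocFileCache is loaded is also an assumption; putting it in ~/.Renviron sidesteps the question.

```r
# Assumed old default location on Linux; confirm the real path with
# rappdirs::user_cache_dir(appname = "BiocFileCache") on your system.
old_cache <- path.expand("~/.cache/BiocFileCache")

# Alternative 1: environment variable named in the vignette text.
# A line BFC_CACHE=<path> in ~/.Renviron makes this persist across sessions.
Sys.setenv(BFC_CACHE = old_cache)

# Alternative 2: pass the location explicitly on every call.
# Guarded so nothing is created when the package or directory is absent.
if (requireNamespace("BiocFileCache", quietly = TRUE) && dir.exists(old_cache)) {
  bfc <- BiocFileCache::BiocFileCache(cache = old_cache, ask = FALSE)
}
```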

On Wed, Apr 7, 2021 at 11:51 AM Kern, Lori <Lori.Shepherd at roswellpark.org>
wrote:

  
    
#
The experience:
Error in AnnotationHub() :
  As of AnnotationHub (>2.23.2), default caching location has changed.
  Problematic cache: /home/stvjc/.cache/AnnotationHub
  To continue with default caching location,
  See AnnotationHub vignette TroubleshootingTheCache section on 'Default
Caching Location Update'

Enter a frame number, or 0 to exit

1: AnnotationHub()

Selection: 0
[1] "~/.cache/BiocFileCache"
[1] "~/.cache/AnnotationHub"   # solution from vignette:  850MB moved very
quickly...
+         FUN=function(fl){
+           filename = basename(fl)
+           newname = file.path(newdir, filename)
+           file.rename(fl, newname)
+         },
+         FUN.VALUE = logical(1))
snapshotDate(): 2021-03-15


On Wed, Apr 7, 2021 at 12:03 PM Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
    
#
Well, can't you try? If people follow your 4.1 instructions and they
don't have permissions, the cache will be broken anyway.

But let's say you can't move it, and your worst-case scenario comes to
pass. EVEN THEN: I would expect a deprecation warning, no error, and
BiocFileCache continuing to pull from the old cache for 6 months.

Every previous non-transparent change to BioC's core infrastructure
has come with a deprecation warning. I don't see why this is any
different. An error is particularly galling given that the package was
working fine before, it's not like you're doing some kind of critical
bugfix.

If Vince's last email is any indication, and calling ExperimentHub()
or AnnotationHub() causes an error... this will be a disaster. I'm
going to get a lot of emails, unnecessary emails, from users wondering
why scRNAseq and celldex don't work anymore. It'll be like our
AWS-China problems multiplied by 10.

Why not just make a new cache and populate it? Well, I don't really
care what you do, as long as I don't get an error.

-A
#
FWIW, I ran into a similar problem when I moved R.cache
(https://cran.r-project.org/package=R.cache) from using ~/.Rcache to
~/.cache/R/R.cache (etc).  I decided on making it 100%-backward
compatible, i.e. if there's already a legacy ~/.Rcache cache folder,
it'll keep using that, otherwise the new standard.  That way nothing
breaks, and it's not a biggie if it keeps writing to the legacy cache
folder.  For now, it's silent, but I'll eventually deprecate
~/.Rcache, e.g. by producing a one-time warning per session and in a
later release be more aggressive, and eventually make it defunct (but
not rushing there).  Here's what I wrote in my NEWS release:

Version: 0.14.0 [2019-12-05]

SIGNIFICANT CHANGES:

 * Now R.cache uses a default cache path that adheres to the standard cache
   location on the current operating system, whereas in previous versions it
   defaulted to ~/.Rcache.  On Unix, the 'XDG Base Directory Specification'
   is followed, which means that the R.cache folder will typically be
   ~/.cache/R/R.cache/.  On macOS, it will be ~/Library/Caches/R/R.cache/.
   On modern versions of Microsoft Windows, environment variables such
   as 'LOCALAPPDATA' will be used, which typically resolves to
   '%USERPROFILE%/AppData/Local', e.g. 'C:/Users/alice/AppData/Local'.
   If R.cache fails to find a proper OS-specific cache folder, it will fall
   back to using ~/.Rcache as previously done.
   Importantly, if ~/.Rcache already exists, then that will be used by
   default.  This is done in order to not lose previously cached files.
   Users with an existing folder who wish to move to the new standard need
   to move or remove the ~/.Rcache folder manually.

/Henrik
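
The backward-compatible rule described above (keep using the legacy folder if it exists, otherwise use the new standard) can be sketched in a few lines. `choose_cache_dir` is a hypothetical name and this is not R.cache's actual code; the demo uses temporary paths in place of ~/.Rcache and ~/.cache/R/R.cache.

```r
# Hypothetical helper: prefer an existing legacy folder, silently.
choose_cache_dir <- function(legacy_dir, new_dir) {
  if (dir.exists(legacy_dir)) legacy_dir else new_dir
}

legacy <- tempfile("legacy-cache-")
modern <- tempfile("modern-cache-")

# No legacy folder yet: the new location is chosen.
before <- choose_cache_dir(legacy, modern)

# Once a legacy folder exists, it keeps being used.
dir.create(legacy)
after <- choose_cache_dir(legacy, modern)
```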

On Wed, Apr 7, 2021 at 9:41 AM Aaron Lun
<infinite.monkeys.with.keyboards at gmail.com> wrote:
#
We agreed a deprecation cycle should be implemented.

We are pushing up changes to BiocFileCache, AnnotationHub, and ExperimentHub so instead of an ERROR and failing out, it will give a warning with deprecation notice and use the old default cache.

After the next release, however, we will reimplement it as an error in devel (3.14?), with the intention of removing the rappdirs dependency completely in (3.15?).
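
The staged plan (warn and keep using the old cache now, error later) could be sketched as below. `resolve_default_cache`, its `stage` argument, and the message text are all hypothetical; this is not the code pushed to BiocFileCache, AnnotationHub, or ExperimentHub.

```r
# Hypothetical helper illustrating a deprecation cycle for the old cache.
resolve_default_cache <- function(old_dir, new_dir, stage = c("warn", "error")) {
  stage <- match.arg(stage)
  # No old cache on disk: nothing to deprecate, use the new location.
  if (!dir.exists(old_dir)) return(new_dir)
  msg <- sprintf("Default caching location has changed. Old cache found at: %s",
                 old_dir)
  if (stage == "error") stop(msg, call. = FALSE)
  # Deprecation stage: warn, but keep working from the old cache.
  warning(msg, call. = FALSE)
  old_dir
}

old <- tempfile("old-cache-")
new <- tempfile("new-cache-")
dir.create(old)

# Current release: a warning is raised and the old cache is still used.
used <- suppressWarnings(resolve_default_cache(old, new, stage = "warn"))

# Later release: the same situation becomes an error.
res <- try(resolve_default_cache(old, new, stage = "error"), silent = TRUE)
failed <- inherits(res, "try-error")
```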



