hi,

I'm writing a function which currently uses BiocFileCache to store a small data.frame and one or more TxDb objects, so that these objects are persistent and available across sessions (or possibly available to multiple users).

In the simplest case, I would call

    bfc <- BiocFileCache()

inside my function, which will check the default location:

    user_cache_dir(appname = "BiocFileCache")

In general, should developers also support the user specifying a specific location for the BiocFileCache? So functions using BiocFileCache should have an argument that overrides the above location?

thanks,
Mike
[Bioc-devel] BiocFileCache for developers
12 messages · Shepherd, Lori, Sean Davis, Martin Morgan +2 more
If you are using it as a helper function, that may be too much exposure, and you may just want it running behind the scenes in the default location; but it could be given as an option to the user. I guess it is a coding preference.

If a user-specified directory is used, they will have to remember to input it each time they use your package, or it will redownload.

There shouldn't be a concern about overwriting files in the default cache location, as files added to the cache get a random identifier to try to avoid overwriting and to allow for essentially duplicate entries.

You can always get the cache location of a bfc object by calling bfccache(bfc), in case a user-specified directory is used.

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
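As a rough sketch of the two options described above (not code from any package in this thread): the hypothetical helper `getCache()` opens the default location unless the caller supplies `cache_dir`, and `bfccache()` reports which directory is in use. The `ask = FALSE` argument assumes a BiocFileCache version that supports it.

```r
library(BiocFileCache)

# Hypothetical helper: open the default cache unless the caller
# supplies a directory of their own.
getCache <- function(cache_dir = NULL) {
    if (is.null(cache_dir)) {
        BiocFileCache()                        # default location
    } else {
        BiocFileCache(cache_dir, ask = FALSE)  # user-specified location
    }
}

# A throwaway directory stands in for a user-specified location here.
my_dir <- file.path(tempdir(), "my-bfc")
bfc <- getCache(my_dir)

# bfccache() returns the directory that a cache object points to.
bfccache(bfc)
```

With this shape, the user-specified directory has to be passed on every call, which is the trade-off mentioned above.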
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Michael Love <michaelisaiahlove at gmail.com>
Sent: Friday, December 1, 2017 10:28:48 AM
To: bioc-devel at r-project.org
Subject: [Bioc-devel] BiocFileCache for developers
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On some systems (such as HPC systems), the user home directory is small or has strict quotas. The default user_cache_dir may not be the best choice there. Sean
Sean Davis, MD, PhD Center for Cancer Research National Cancer Institute National Institutes of Health Bethesda, MD 20892 https://seandavi.github.io/ https://twitter.com/seandavis12
So having a user argument might be best. Or defining a unique cache location for your package would be another option. Lori Shepherd
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Sean Davis <seandavi at gmail.com>
Sent: Friday, December 1, 2017 11:06:39 AM
To: Michael Love
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers
On Fri, Dec 1, 2017 at 11:16 AM, Shepherd, Lori <Lori.Shepherd at roswellpark.org> wrote:
> So having a user argument might be best. Or defining a unique cache location for your package would be another option.
The R package development policies actually has a statement that may be helpful in thinking about this. Your mileage may vary in the interpretation....
> Packages should not write in the users' home filespace, nor anywhere else on the file system apart from the R session's temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system's R installation (e.g., scripts to its bin directory) is not allowed. Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.
https://cran.r-project.org/web/packages/policies.html Sean
Unfortunately I think there are a number of packages that don't necessarily adhere to this. For Bioconductor packages, we try to always make sure any example or vignette code follows this policy. I think an exception may be made if it deals with the main functionality of the package code and if it is noted prominently in the package documentation. Lori Shepherd
From: Sean Davis <seandavi at gmail.com>
Sent: Friday, December 1, 2017 11:23:39 AM
To: Shepherd, Lori
Cc: Michael Love; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers
One solution if a developer really wants to make sure the user knows that the function will store a cache somewhere would be to leave the BiocFileCache location argument without a default value.
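A minimal sketch of that idea, with a hypothetical function name and argument name: because `cache_dir` has no default, calling the function without it fails early, and the error message tells the user that files will be written to disk.

```r
# Hypothetical function: 'cache_dir' has no default on purpose.
useCacheAt <- function(cache_dir) {
    if (missing(cache_dir)) {
        stop("'cache_dir' is required: files will be stored there ",
             "and kept across R sessions")
    }
    stopifnot(is.character(cache_dir), length(cache_dir) == 1L)
    message("caching under: ", cache_dir)
    # A real function would now call BiocFileCache::BiocFileCache(cache_dir).
    invisible(cache_dir)
}

useCacheAt(file.path(tempdir(), "lab-cache"))   # explicit choice, works
# useCacheAt()                                  # stops with an error
```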
On 12/01/2017 11:23 AM, Sean Davis wrote:
> The R package development policies actually has a statement that may be helpful in thinking about this.
> [...]
Actually, CRAN policies.

The CRAN policy is definitely appropriate for vignette and example code, and certainly functions by default should not write to locations where they will potentially overwrite existing resources. The policy makes it impossible to write files that persist across sessions, which is the objective for BiocFileCache.

For the original question, I think there's often a case for

    user_cache_dir(appname="mikes-package-name")

Martin
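For illustration only, this is how a package-specific location could be computed with `rappdirs::user_cache_dir()`. The appname is the placeholder from the message above, and nothing is written to disk by this snippet.

```r
# Compute (but do not create) a per-package cache directory.
pkg_cache <- rappdirs::user_cache_dir(appname = "mikes-package-name")
pkg_cache

# A package could then open its own cache with
# BiocFileCache::BiocFileCache(pkg_cache) instead of sharing the
# default "BiocFileCache" directory with other packages.
```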
> user_cache_dir(appname="mikes-package-name")

wow, how did you guess it?

I'm storing TxDb's for use across sessions with `rname` set to the basename of the GTF file, e.g. "gencode.v27.annotation.gtf.gz". I want to encourage the serendipitous case that there is already a BiocFileCache entry with this `rname` created outside of the use of my package. I can see this happening, especially if I mention this naming pattern in the vignette.

I'm thinking I will encourage the user to pick a good BiocFileCache location by not setting a default value. Potentially multiple users could be sharing the same BiocFileCache location, e.g. a lab space on HPC. And then actively specifying NULL for the location (or something like this) could switch the location to:

    user_cache_dir(appname = "BiocFileCache")
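The lookup described here might look roughly like the following. This is a sketch, not the code of any released package: `cachedPathFor()` and `bfcloc` are hypothetical names, and the `exact` and `ask` arguments assume a BiocFileCache version that provides them.

```r
library(BiocFileCache)

# Hypothetical helper: find the cached file registered under the GTF's
# basename, or reserve a new path if no such 'rname' exists yet.
# 'bfcloc' has no default; an explicit NULL selects the default location.
cachedPathFor <- function(gtf, bfcloc) {
    if (is.null(bfcloc)) {
        bfcloc <- rappdirs::user_cache_dir(appname = "BiocFileCache")
    }
    bfc <- BiocFileCache(bfcloc, ask = FALSE)
    rname <- basename(gtf)
    hit <- bfcquery(bfc, rname, field = "rname", exact = TRUE)
    if (nrow(hit) > 0L) {
        # Reuse an existing entry, whoever created it.
        list(path = unname(bfcrpath(bfc, rids = hit$rid[1])), found = TRUE)
    } else {
        # Reserve a path; the caller would then write the TxDb there,
        # for example with AnnotationDbi::saveDb().
        list(path = unname(bfcnew(bfc, rname = rname, ext = ".sqlite")),
             found = FALSE)
    }
}

shared <- file.path(tempdir(), "shared-bfc")
first <- cachedPathFor("/data/gencode.v27.annotation.gtf.gz", bfcloc = shared)
first$found
```

Because the lookup goes by `rname` only, an entry created by another tool under the same name in the same cache would be picked up as well, which is the serendipitous case mentioned above.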
R.cache (>= 0.6.0) does the following to acquire a persistent cache (root) folder. This behavior was introduced after getting prompted by CRAN not to write to disk by default (because they found "funny" folders on their check servers) and a following email conversation with CRAN (2011-12-29), and getting an "ok with me" from Uwe at CRAN:

1. When loaded (not only attached) it checks for the existence of a cache folder (defaults to ~/.Rcache unless an R option or an env var is set). If it exists, then we're good to go.

2. If the cache folder does not exist, and in a non-interactive session, then a temporary cache folder specific to that R session is used.

3. If the cache folder does not exist, and in an interactive session, then the user will be queried whether they'd like to create ~/.Rcache (the default choice) or whether they'd like to use a temporary folder (just as in the non-interactive case). If accepting ~/.Rcache, then that will be available across sessions (Step 1 above).

The gist is: Make sure to get the user's approval before storing anything permanently, and don't do anything that surprises the user, risks overwriting their files, etc.

Here is a real-world user example on a "fresh" user account:

# Non-interactive sessions or user does not approve

    $ Rscript -e "R.cache::getCacheRootPath()"
    [1] "/tmp/RtmpzIZT4o/.Rcache"

    $ R --vanilla
    > dummy <- loadNamespace("R.cache")
    The R.cache package needs to create a directory that will hold cache files. It is convenient to use one in the user's home directory, because it remains also after restarting R. Do you wish to create the '~/.Rcache/' directory? If not, a temporary directory (/tmp/RtmpMA4LTF/.Rcache) that is specific to this R session will be used. [Y/n]: n
    > R.cache::getCacheRootPath()
    [1] "/tmp/Rtmp0Ic5zQ/.Rcache"
    > quit("no")

    $ R --vanilla
    > R.cache::getCacheRootPath()
    The R.cache package needs to create a directory that will hold cache files. It is convenient to use one in the user's home directory, because it remains also after restarting R. Do you wish to create the '~/.Rcache/' directory? If not, a temporary directory (/tmp/RtmpzSJd3d/.Rcache) that is specific to this R session will be used. [Y/n]: n
    [1] "/tmp/RtmpzSJd3d/.Rcache"
    > quit("no")

    $ Rscript -e "R.cache::getCacheRootPath()"
    [1] "/tmp/Rtmpq1nx0H/.Rcache"

# User approves or already approved

    $ R --vanilla
    > dummy <- loadNamespace("R.cache")
    The R.cache package needs to create a directory that will hold cache files. It is convenient to use one in the user's home directory, because it remains also after restarting R. Do you wish to create the '~/.Rcache/' directory? If not, a temporary directory (/tmp/RtmpMA4LTF/.Rcache) that is specific to this R session will be used. [Y/n]: Y
    > R.cache::getCacheRootPath()
    [1] "~/.Rcache/"
    > quit("no")

    $ Rscript -e "R.cache::getCacheRootPath()"
    [1] "~/.Rcache/"

    $ R --vanilla
    > dummy <- loadNamespace("R.cache")
    > R.cache::getCacheRootPath()
    [1] "~/.Rcache/"
The same applies when using library("R.cache") as well as when the
R.cache namespace is imported by another package.
This behavior also plays well with 'R CMD check' and 'R CMD check
--as-cran' where the cache folder will default to a temporary folder.
It will also prevent run-time errors since there will always be a
cache folder available (although it'll only survive the current
session). R.cache works the same on all OSes. To further lower the
risk for "what is this ~/.Rcache folder doing here?", R.cache also
adds a ~/.Rcache/README.txt file explaining what that folder is and
what created it.
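The three steps above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not R.cache's own code; the function name, its arguments, and the README wording are all made up. The `ask` argument is injectable so that the prompt can be replaced in scripts and tests.

```r
# Hypothetical sketch of the "approve before persisting" logic.
chooseCacheDir <- function(persistent_dir,
                           is_interactive = interactive(),
                           ask = function(msg) readline(msg)) {
    # Step 1: an existing folder means the user already approved it.
    if (dir.exists(persistent_dir)) {
        return(persistent_dir)
    }
    temp_dir <- file.path(tempdir(), basename(persistent_dir))
    # Step 3: only an interactive user can approve a persistent folder.
    if (is_interactive) {
        msg <- sprintf(
            "Create '%s'? If not, '%s' is used for this session. [Y/n]: ",
            persistent_dir, temp_dir
        )
        answer <- tolower(trimws(ask(msg)))
        if (answer %in% c("", "y", "yes")) {
            dir.create(persistent_dir, recursive = TRUE)
            # Leave a note explaining what the folder is and what created it.
            writeLines("Cache folder created with the user's approval.",
                       file.path(persistent_dir, "README.txt"))
            return(persistent_dir)
        }
    }
    # Step 2 (and a declined prompt): fall back to a per-session folder.
    dir.create(temp_dir, recursive = TRUE, showWarnings = FALSE)
    temp_dir
}
```

Since the function always returns a usable directory, callers never hit a "no cache folder" error at run time; the only thing that varies is whether the folder outlives the session.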
About what the default location should be:
On Fri, Dec 1, 2017 at 8:06 AM, Sean Davis <seandavi at gmail.com> wrote:
> [...]
> On some systems, the user home directory is not large (such as on HPC systems) or has strong quotas. The default user_cache_dir may not be the best choice there.
I agree with this but it's hard to find a solid simple alternative to the user's home folder. However, and on my todo list to investigate, https://cran.r-project.org/package=rappdirs may provide a better approach because it follows OS-specific recommendations.

Back to writing to user's home folder: in HPC environments with limited home quota, I simply do things like

    ln -s /scratch/$USER/.Rcache ~/.Rcache

/Henrik
8 days later
thanks Henrik,
I like the explicitness of the `R.cache` approach and I copied it for
my current implementation.
For the BiocFileCache location that should be used for this package
I'm developing, `tximeta`, I'm now using the following logic:
* If run non-interactively, `tximeta` uses a temporary directory.
* If run interactively, and a location has not been previously saved,
the user is prompted if she wants to use (1) the default directory or
(2) a temporary directory.
- If (1), then use the default directory, and save this choice.
- If (2), then use a temporary directory for the rest of this R
session, and ask again next R session.
* The prompt above also mentions that a specific function can be used
to manually set the directory at any time point, and this choice is
saved.
* The default directory is given by `rappdirs::user_cache_dir("BiocFileCache")`.
* The choice itself of the BiocFileCache directory that `tximeta`
should use is saved in a JSON file here
`rappdirs::user_cache_dir("tximeta")`.
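Saving and restoring the choice could be sketched like this. It is not `tximeta`'s actual code: the helper names, the file name `bfcloc.json`, and the field name `bfcloc` are assumptions made for the example, and `jsonlite` is assumed as the JSON library.

```r
# Hypothetical helpers for persisting the chosen cache location as JSON.
saveCacheChoice <- function(bfcloc, config_dir) {
    dir.create(config_dir, recursive = TRUE, showWarnings = FALSE)
    jsonlite::write_json(list(bfcloc = bfcloc),
                         file.path(config_dir, "bfcloc.json"),
                         auto_unbox = TRUE)
    invisible(bfcloc)
}

loadCacheChoice <- function(config_dir) {
    json_file <- file.path(config_dir, "bfcloc.json")
    if (!file.exists(json_file)) {
        return(NULL)   # nothing saved yet: the caller should prompt
    }
    jsonlite::read_json(json_file)$bfcloc
}

# In a package, config_dir would be rappdirs::user_cache_dir("tximeta");
# a temporary directory stands in for it here.
config_dir <- file.path(tempdir(), "tximeta-config")
loadCacheChoice(config_dir)    # NULL before any choice is saved
saveCacheChoice("/path/to/shared/cache", config_dir)
loadCacheChoice(config_dir)    # the saved location
```

Keeping the small JSON file separate from the cache itself means the cache can live on scratch or shared lab space while the pointer to it stays with the user.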
12 days later
BiocFileCache has been updated to follow this type of behavior:
- if the location exists, use it without prompting (default user_cache_dir())
- if it doesn't exist
  - prompt the user to create it
  - if the response is N, or the session is not interactive, use a temporary directory
This is reflected in devel version 1.3.8.
Lori Shepherd
From: Michael Love <michaelisaiahlove at gmail.com>
Sent: Saturday, December 9, 2017 5:18:08 PM
To: Henrik Bengtsson
Cc: Shepherd, Lori; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers