[Bioc-devel] new package for accessing some chemical and biological databases
I've lost track of whether the infrastructure is actually used, but certainly some packages have a 'longtests' folder, e.g. https://github.com/LTLA/beachmat

On Fri, 13 Sep 2019 at 16:02, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:

We used to have (? or at least discussed the possibility of) occasional extensive checking, so that we could have tests and long_tests (names made up).

On Fri, Sep 13, 2019 at 9:50 AM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:

Putting bioc-devel back in the loop.

I think that the straightforward answer to your original query is 'no, git submodules are not supported'.

I think we'd carry on and say 'packages should be self-contained and conform to the Bioconductor size and time constraints', so you cannot have a very large package or a package that takes a long time to check, and you can't download part of the package from some alternative source (except perhaps AnnotationHub or ExperimentHub).

I agree that the hubs are not suitable for regularly updated files, and that they are meant for biologically motivated rather than purely test-related data resources.

While we 'could' make special accommodations on the build systems to support your package, we have found that this is not a fruitful endeavor.

A natural place to put files used in tests would be in the /tests directory; these are not included in the installed package. But it seems likely that including your tests would violate the time and / or space limitations we place on packages.

It seems likely that this leads to the question you pose below, which is: how do you know that you're running on the build system, so that you can perform more modest computations? This is similar to the situation discussed here, where special resources are normally required:

https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015518.html

Herve seems not willing to commit to an easy answer, perhaps because this opens the door to people circumventing even minimal tests of their package...

Martin

On 9/13/19, 7:49 AM, "Shepherd, Lori" <Lori.Shepherd at RoswellPark.org> wrote:

I'm including Martin and Herve for their opinions and to chime in too, since you took this conversation off the mailing list...

Could you please describe what you mean by "works transparently"? We realize there isn't a function to call; we were suggesting that you make a function to call that could be utilized. How does your caching system work? I would also advise looking into BiocFileCache, the Bioconductor suggested package for data caching of files.

The relevant files to look at for the environment calls can be found at https://github.com/Bioconductor/Contributions esp.
Please also be mindful of:
Submission Guidelines
https://bioconductor.org/developers/package-submission/
Package Guidelines
https://bioconductor.org/developers/package-guidelines/
More specifically, on the single package builder we use:

    R CMD BiocCheckGitClone <package>
    R CMD build --keep-empty-dirs --no-resave-data <package>
    R CMD check --no-vignettes --timings <package_tar>
    R CMD BiocCheck --build-output-file=<path to R.out> --new-package <package_tar>

with the environment variables set up as described in the above link.

Special files are not encouraged and, as far as I am aware, not allowed. Herve, who has more experience with the builders, may be able to chime in further here.
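
As an illustration of how package code could react to an environment variable set during a build, here is a minimal sketch in base R. The variable name `BIODB_LIGHT_CHECKS` and both helper functions are hypothetical, chosen only for this example; they are not names defined by the Bioconductor builders, and the variables actually set there are the ones described in the Contributions repository.

```r
# Hypothetical helper: TRUE when the (assumed) variable asks for light checks.
# "BIODB_LIGHT_CHECKS" is an illustrative name, not a Bioconductor variable.
use_light_checks <- function(var = "BIODB_LIGHT_CHECKS") {
    value <- tolower(Sys.getenv(var, unset = ""))
    value %in% c("1", "true", "yes")
}

# Hypothetical helper: number of database entries a test should request.
# A light run asks for a handful of entries, a full run for many more.
n_test_entries <- function(light = use_light_checks()) {
    if (light) 5L else 500L
}
```

A test file could then call `n_test_entries()` to size its requests, so the same tests run everywhere but do less work when the variable is set.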
Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
________________________________________
From: Pierrick Roger <pierrick.roger at cea.fr>
Sent: Friday, September 13, 2019 2:48 AM
To: Shepherd, Lori <Lori.Shepherd at RoswellPark.org>
Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
Thank you for the example. However, I do not think it is relevant. This package has no examples, no tests and just one vignette. The `get` function is part of the interface, so it makes sense to use it inside the vignette. But for my package biodb, there is no function to call; the cache works transparently.

Could you please give me more details about the build process of packages in Bioconductor? Are there some environment variables set during the build, so that a package can know it is being built or checked by Bioconductor? If this is the case, maybe I could write a tweak in my code in order to download the cache when needed.

If not, would it be possible to have them defined, or to have a special file `bioc.yml` defined at the root of the package, in which I could write a `prebuild_step` command for retrieving the cache from my public GitHub repository `biodb-cache`?

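
A download-when-needed step of this kind might look like the following sketch, written in base R only. The function name `fetch_cached_file()` and its arguments are hypothetical, and the `base_url` would have to point at wherever the cache files are publicly served; nothing here is taken from biodb itself.

```r
# Hypothetical helper: return the local path of a cached file, downloading it
# from a public location only if it is not already present locally.
# The `download` argument defaults to utils::download.file and can be
# replaced by a stub, e.g. in tests that must not touch the network.
fetch_cached_file <- function(name, base_url, cache_dir,
                              download = utils::download.file) {
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    dest <- file.path(cache_dir, name)
    if (!file.exists(dest)) {
        status <- download(paste0(base_url, "/", name), dest,
                           mode = "wb", quiet = TRUE)
        if (status != 0L || !file.exists(dest)) {
            unlink(dest)  # do not leave a partial file behind
            stop("could not retrieve ", name, " from ", base_url)
        }
    }
    dest
}
```

Because the file is fetched only when missing, repeated calls cost nothing once the cache is populated. BiocFileCache, suggested elsewhere in this thread, is the maintained Bioconductor package for this kind of file caching.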
On Thu 12 Sep 19 17:12, Shepherd, Lori wrote:
> Please look at SRAdb for an example of how we would recommend keeping the data.
>
> Summary:
> On GitHub, or wherever you would like to host the data and keep it current, please make sure it is publicly accessible. Within your package, have a download function that retrieves the file from the public location.
>
> It's not recommended, but this will be acceptable in this case.
>
> Thank you.
>
> ________________________________
> From: Pierrick Roger <pierrick.roger at cea.fr>
> Sent: Thursday, September 12, 2019 10:48 AM
> To: Shepherd, Lori <Lori.Shepherd at RoswellPark.org>
> Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
>
> Examples can be run without the cache, and vignettes can be built without it too.
> In fact, the cache system is part of the package, and can be used by the user or turned off if not wanted or needed. Using the cache avoids sending too many identical requests to the database servers.
> So yes, users will access the databases directly, and use the cache to speed up their code.
>
> I also use this same cache system while running `R CMD check` on Travis-CI, for instance, in order to avoid spending too much time on requests and getting errors returned by servers. Servers are not always stable, and `R CMD check` will often fail if the cache is not used.
>
> On Thu 12 Sep 19 11:36, Shepherd, Lori wrote:
> > Would the cache not be a subset of data, used for the examples, vignettes, and tests, that could be fairly stable and would not necessarily track the updated databases, or could be updated less frequently? But wouldn't your code, in a user's case, do the longer process of accessing the databases directly? Or was I misunderstanding?
> >
> > ________________________________
> > From: Pierrick Roger <pierrick.roger at cea.fr>
> > Sent: Thursday, September 12, 2019 3:18 AM
> > To: Shepherd, Lori <Lori.Shepherd at RoswellPark.org>
> > Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> >
> > Thank you for your answer.
> > The biodb-cache repository contains 63109 files (484MB).
> > Those files change regularly, since the output of the databases changes from time to time, and I also add new examples, vignettes and tests.
> > Thus it is common that files are removed or updated, or that new files are added. After reading the ExperimentHub description, it seems to me that my usage would not be exactly compatible with its principles and definition. Am I wrong?
> >
> > On Wed 11 Sep 19 11:19, Shepherd, Lori wrote:
> > > No we do not allow such submodules currently in Bioconductor.
> > >
> > > How big is the object? I assume putting the data object in the package increases the package size over the limit?
> > >
> > > If this is the case, we would recommend storing the data in the ExperimentHub. See "Creating An ExperimentHub package".
> > >
> > > ________________________________
> > > From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Pierrick Roger <pierrick.roger at cea.fr>
> > > Sent: Wednesday, September 11, 2019 5:04 AM
> > > To: bioc-devel at r-project.org <bioc-devel at r-project.org>
> > > Subject: [Bioc-devel] new package for accessing some chemical and biological databases
> > >
> > > Dear all,
> > >
> > > I'd like to submit my package biodb (https://github.com/pkrog/biodb) in the near future.
> > > The aim of this package is to provide unified access to diverse databases (ChEBI, KEGG databases, Uniprot, ...).
> > > For running examples, building vignettes and running tests, I use a cache that is stored in another GitHub repository (https://github.com/pkrog/biodb-cache), and registered as a Git submodule of biodb.
> > > This cache is currently necessary, since accessing the databases during "R CMD check" would lead to some connection errors and would take too much time.
> > > I would like to know if this scheme is acceptable for Bioconductor.
> > >
> > > Best regards,
> > > --
> > > Research engineer Pierrick Roger
> > > http://www.cea-tech.fr | https://fr.linkedin.com/in/pkrog
> > > In varietate concordia.
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >
> > >
> > > This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
--
Research engineer Pierrick Roger
http://www.cea-tech.fr | http://workflow4metabolomics.org | http://www.metabohub.fr
https://fr.linkedin.com/in/pkrog | https://github.com/pkrog
In varietate concordia.
--
Best,
Kasper