[Bioc-devel] Acceptable dataset origins besides ExperimentHub

5 messages · Hector Gracia, Sean Davis, Shepherd, Lori +1 more

#
Our package AlpsNMR was recently accepted into Bioconductor:
https://bioconductor.org/packages/devel/bioc/html/AlpsNMR.html

To pass the review process, we had to remove a dataset that was stored in Dropbox and was used in a long tutorial about the package.

Besides creating an ExperimentHub package with this data, are there other ways to include the data in the package tutorial?

I ask because the dataset is publicly available on the MetaboLights FTP server, and I would like to know whether this server can be considered a dedicated server that ensures the longevity of the data:
ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS242/

Or is creating a data package in ExperimentHub the only way to do this?

Thanks

This electronic transmission contains confidential information covered by professional secrecy. Any reproduction, distribution or disclosure of its contents is strictly prohibited. If you are not the intended recipient, you are kindly requested not to disseminate nor to copy this transmission and to notify us as soon as possible by email to dataprotection at ibecbarcelona.eu .
#
Hi, Hector.

While an "unofficial" answer, there are many packages that access data from
public repositories, so using an EBI-supported repository and FTP site
seems perfectly acceptable to me.

Sean

On Tue, Nov 10, 2020 at 5:54 AM Hector Gracia <hgracia at ibecbarcelona.eu> wrote:
#
Including this data should be okay. The benefit of the ExperimentHub package would be to allow broader use of the data, beyond your package, by the Bioconductor community at large.

If you include the data from the site, I would strongly, strongly suggest implementing a caching mechanism (ExperimentHub already does this in the backend) so that, if the public server goes down or has connectivity issues, any previously cached version can be used (I'm biased, but see BiocFileCache).



Lori Shepherd
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
2 days later
#
Thanks Lori for your answer.

I am not sure about the benefits of using BiocFileCache in this case, because the data is only needed to run a more extensive tutorial based on a large dataset. The dataset is not needed for the functionality of the package and is just a one-time download.

Besides that, I have another, related question.
In the review of the package, you (Lori) told me that a static vignette was not recommended because static vignettes have a tendency to get stale, and I agree with that.
I have the R Markdown source of this extended tutorial, and the problem is that executing it can take more than one hour to download and process the data. So, besides a static vignette, is there another way of doing this without exceeding the Bioconductor package build time limit?

Regards




#
Hi Hector --

caching means that the file is downloaded once per computer, so when, for instance, you edit your vignette and need to rebuild it, you don't have to re-download the data.
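
The download-once behaviour described above can be sketched in a few lines. This is only an illustration of the idea, written in Python with the standard library rather than with BiocFileCache. The names `cached_download` and `_fetch_with_urllib`, the cache directory, and the file name are all hypothetical and are not part of AlpsNMR or of any Bioconductor package.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def _fetch_with_urllib(url, dest):
    """Default fetcher (hypothetical helper): stream `url` into the file `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def cached_download(url, cache_dir, filename, fetch=None):
    """Return a local path for `url`, downloading only if it is not cached yet.

    `fetch` is an optional callable (url, dest) used instead of the default
    fetcher, which makes the function easy to exercise without a network.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    # Cache hit: the file was downloaded earlier on this computer.
    if target.exists():
        return target

    fetch = fetch or _fetch_with_urllib

    # Download to a temporary file first, so that an interrupted transfer
    # never leaves a partial file under the final cached name.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        fetch(url, tmp_name)
        os.replace(tmp_name, target)
    finally:
        # Only present if the fetch failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return target
```

Calling `cached_download` twice with the same arguments touches the network only on the first call, which is the property that matters when a vignette is rebuilt repeatedly. A real cache would also need to decide when a cached copy is out of date, which this sketch deliberately leaves out.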

I don't think your static vignette is suitable for Bioconductor -- it sounds like you are trying to provide a full 'reproducible' analysis, maybe in support of a publication or other product of your research. But the computational demands of the full analysis are beyond the scope of what can be supported by our build system. I think you would be better off finding another solution, for instance exploring GitHub Actions and Docker containers to build the vignette, and github.io to make the built vignette available to interested users. I don't know whether the computational demands of your vignette can be satisfied by GitHub Actions, or whether you would run into limitations of time and space there, too -- it would be worth figuring out before embarking on that solution.

Because the Bioconductor build system would not build the vignette, the advice remains that the static vignette should NOT be included in your Bioconductor package.

Martin Morgan
Bioconductor

On 11/13/20, 7:13 AM, "Bioc-devel on behalf of Hector Gracia" <bioc-devel-bounces at r-project.org on behalf of hgracia at ibecbarcelona.eu> wrote:
