Recently our package AlpsNMR was accepted into Bioconductor: https://bioconductor.org/packages/devel/bioc/html/AlpsNMR.html
To pass the review process, we had to remove a dataset that was stored in Dropbox and used in a long tutorial about the package.
Besides creating an ExperimentHub package with this data, are there other ways to include it in the package tutorial?
I ask because the dataset is publicly available on the MetaboLights FTP site, and I would like to know whether this FTP site can be considered a dedicated server that ensures the longevity of the data: ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS242/
Or is the only way to do this to create a data package in ExperimentHub?
Thanks
This electronic transmission contains confidential information covered by professional secrecy. Any reproduction, distribution or disclosure of its contents is strictly prohibited. If you are not the intended recipient, you are kindly requested not to disseminate nor to copy this transmission and to notify us as soon as possible by email to dataprotection at ibecbarcelona.eu .
[Bioc-devel] Acceptable dataset origins besides ExperimentHub
5 messages · Hector Gracia, Sean Davis, Shepherd, Lori +1 more
Hi, Hector. This is an "unofficial" answer, but many packages access data from public repositories, so using an EBI-supported repository and FTP site seems perfectly acceptable to me.
Sean
On Tue, Nov 10, 2020 at 5:54 AM Hector Gracia <hgracia at ibecbarcelona.eu> wrote:
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Sean Davis, MD, PhD Center for Cancer Research National Cancer Institute National Institutes of Health Bethesda, MD 20892 https://seandavi.github.io/ https://twitter.com/seandavis12
Including this data from the public site should be okay. The benefit of an ExperimentHub package would be to make the data available beyond your package, to the Bioconductor community at large. If you do download the data from the site, I would strongly, strongly suggest implementing a caching mechanism (ExperimentHub already does this in the backend), so that if the public server goes down or has connectivity issues, any previously cached copy can still be used (I'm biased, but see BiocFileCache). Lori Shepherd Bioconductor Core Team Roswell Park Comprehensive Cancer Center Department of Biostatistics & Bioinformatics Elm & Carlton Streets Buffalo, New York 14263
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Sean Davis <seandavi at gmail.com>
Sent: Tuesday, November 10, 2020 7:29 AM
To: Hector Gracia <hgracia at ibecbarcelona.eu>
Cc: bioc-devel at r-project.org <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Acceptable dataset origins besides ExperimentHub
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
2 days later
Thanks, Lori, for your answer.
I am not sure about the benefits of using BiocFileCache in this case, because the data is only needed to run a more extensive tutorial based on a large dataset. This dataset is not needed for the functionality of the package and is just a one-time download.
Besides that, I have another, related question.
In the review of the package, you (Lori) told me that a static vignette was not recommended because static vignettes tend to get stale, and I agree with that.
I have the R Markdown source of this extended tutorial, but if I execute it, downloading and processing the data can take more than one hour. So, besides a static vignette, is there another way of doing this without exceeding the Bioconductor package build limit?
Regards
Hi Hector --
Caching means that the file is downloaded once per computer, so when, for instance, you edit your vignette and need to rebuild it, you don't have to re-download the data.
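To make the "downloaded once per computer" idea concrete, here is a minimal, language-neutral sketch in Python. It is not BiocFileCache (the R package Lori recommends; in R, `BiocFileCache::bfcrpath()` plays roughly this role), and the cache location `~/.cache/tutorial-data`, the function `cached_path`, and its `fetch` hook are all hypothetical. The file is fetched only when no cached copy exists, so rebuilding a vignette, or building it while the remote server is temporarily down, reuses the local copy.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical default cache location; adjust as needed.
DEFAULT_CACHE = Path.home() / ".cache" / "tutorial-data"


def _download(url, dest):
    """Stream url into dest (urllib handles http(s) and ftp URLs)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def cached_path(url, cache_dir=DEFAULT_CACHE, fetch=_download):
    """Return a local path for url, downloading it only if not yet cached."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on the full URL so different files never collide.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = url.rstrip("/").rsplit("/", 1)[-1] or "download"
    target = cache_dir / f"{key}-{name}"
    if target.exists():
        # Cache hit: no network access, so this works even if the server is down.
        return target
    # Download under a temporary name first, so an interrupted transfer
    # never leaves a truncated file that looks like a valid cache entry.
    partial = target.parent / (target.name + ".part")
    fetch(url, partial)
    partial.replace(target)
    return target


# Usage (FILE is a placeholder; actual file names in MTBLS242 are not given here):
# path = cached_path("ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS242/FILE")
```

Writing to a `.part` file and renaming it only after the transfer completes matters for long downloads like the one Hector describes: an interrupted run is retried on the next build instead of silently reusing a truncated file.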
I don't think your static vignette is suitable for Bioconductor -- it sounds like you are trying to provide a full 'reproducible' analysis, maybe in support of a publication or other product of your research. But the computational demands of the full analysis are beyond the scope of what our build system can support. I think you would be better off finding another solution, for instance exploring GitHub Actions and Docker containers to build the vignette, and github.io to make the built vignette available to interested users. I don't know whether the computational demands of your vignette can be satisfied by GitHub Actions, or whether you would run into time and space limitations there, too -- it would be worth figuring that out before embarking on that solution.
Because the Bioconductor build system would not build the vignette, the advice remains that the static vignette should NOT be included in your Bioconductor package.
Martin Morgan
Bioconductor
On 11/13/20, 7:13 AM, "Bioc-devel on behalf of Hector Gracia" <bioc-devel-bounces at r-project.org on behalf of hgracia at ibecbarcelona.eu> wrote: