[Bioc-devel] submitting a tiny data package (GmapGenome.Hsapiens.rCRS)

5 messages · Tim Triche, Jr., Obenchain, Valerie, Hervé Pagès +1 more


Tim Triche, Jr.

Thu, Nov 8, 2018 9:25 AM #

What's the best/fastest way to do this? The package (MTseeker) will happily
build and install it for the user via the indexMTgenome() function, but
since I test for its presence prior to running examples, it seems like I
might as well have it available through BioC.

Thanks,

--t

Obenchain, Valerie

Fri, Nov 9, 2018 8:42 AM #

Hi Tim,

The options would be to make a data experiment package or put the data 
in ExperimentHub.

Lori is our resident expert on these things but is out today. She'll be 
back next week and can provide more info.

Valerie

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.

Hervé Pagès

Fri, Nov 9, 2018 11:02 AM #

Hi guys,

OTOH indexMTGenome() only takes 20s on my laptop to create, install and 
load GmapGenome.Hsapiens.rCRS.

Note that the size on disk of GmapGenome.Hsapiens.rCRS is 449M which is 
surprising considering that the input of indexMTGenome() is a tiny 83K 
FASTA file. That's for the source tree. 'R CMD build' then produces a 
source tarball that is only 24M so is much reduced. Seems like the big 
files in GmapGenome.Hsapiens.rCRS are very sparse!
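
The size drop described above (449M source tree, 24M tarball) is consistent with index files that are mostly repeated or zero bytes, since 'R CMD build' writes a gzip-compressed tarball. The thread does not confirm what the index files actually contain, so the following is only a minimal Python sketch of the general effect, using a hypothetical mostly-zero buffer as a stand-in for an index file:

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller = more compressible)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical stand-in for an index file (an assumption, not the real
# GMAP index layout): 1 MiB of zero bytes with one non-zero byte every 4 KiB.
size = 1024 * 1024
sparse = bytearray(size)
for offset in range(0, size, 4096):
    sparse[offset] = 0xFF

ratio = compression_ratio(bytes(sparse))
print(f"original: {size} bytes, gzip ratio: {ratio:.4f}")
```

A buffer like this compresses to a small fraction of its original size, which is the same kind of reduction seen between the source tree and the tarball.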

Anyway this seems to be a reference genome so maybe would best belong to 
AnnotationHub?

Cheers,

H.


Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

4 days later

Lori Shepherd

Wed, Nov 14, 2018 8:41 AM #

Agreed, the file should be in AnnotationHub. I can give you S4 access to upload the file. I will need a metadata.csv file created for the file, as described in

https://bioconductor.org/packages/release/bioc/vignettes/AnnotationHub/inst/doc/CreateAnAnnotationPackage.html

This is an interesting case: you already have an ExperimentData package, but this will use AnnotationHub, so there will be a mixture of biocViews terms. You shouldn't need to do much to the current ExperimentData package besides adding the metadata.csv file and making sure makeAnnotationHubMetadata(<path to package>) runs. If you get any errors related to the biocViews terms, let me know, but I don't think it should be an issue.

I think the package itself should stay a software and a data experiment package, with these particular functions using AnnotationHub in the backend.


Cheers,

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263


Lori Shepherd

Wed, Nov 14, 2018 8:42 AM #

I meant AWS S3 access, sorry. I will be sending credentials for that in a separate email.


Lori Shepherd

