
[Bioc-devel] Data Package Size Issues (.idat and .rda)

8 messages · Nicolas De Jay, Martin Morgan, Sean Davis +1 more

#
Hi,

I am preparing a data package and using the minfiData package as a
reference. The .idat files in extdata and the .rda file in data are
present both in the compressed source tarball and in the installed
package directory (in my case, under ~/R/x86-64.../3.0/minfiData). Isn't
this redundant? Is there a way to have prospective users download
only the .rda files?

Sorry if my question is misguided and thanks in advance for your help.

---
Nicolas De Jay
M.Sc. Student
Department of Human Genetics
Montreal Children's Hospital Research Institute, McGill University Health Centre
4060 Ste Catherine West, PT-239
Montreal, QC H3Z2Z3, Canada
T: (514) 412-4440 | E: nicolas.dejay at mail.mcgill.ca
#
Thanks for the prompt answer. The data set I am packaging closely
resembles that of minfiData, except that it has 52 samples; the IDAT
files together total some 800 MB, whereas the Rda file is closer to
150 MB. It is worth noting that my experiment data package will be
submitted to Bioconductor along with a software package that uses
these samples in its vignette. With this in mind, can I omit the IDAT
files? If this goes against Bioconductor's underlying design, what
would you say is the maximum size of an experiment data package?

---
Nicolas De Jay

On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
#
On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
Hi Nicolas -- Some things to bear in mind.

Files are compressed in package tarballs, so your IDAT files may have a
considerably smaller effective size.
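One way to check the effective size before building is to compare the raw size of a directory with the size of a gzip-compressed tar archive of it. The sketch below is only an approximation under stated assumptions: it assumes the source tarball is a gzip-compressed `.tar.gz` (as produced by `R CMD build`), ignores any files that `.Rbuildignore` would exclude, and uses a hypothetical default path `inst/extdata` plus helper names (`raw_size`, `tarball_size`) invented for illustration. How well IDAT files actually compress depends on their content.

```python
import os
import sys
import tarfile
import tempfile


def raw_size(directory: str) -> int:
    """Sum of the sizes, in bytes, of all regular files under `directory`."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def tarball_size(directory: str) -> int:
    """Size in bytes of a gzip-compressed tar archive of `directory`.

    The archive is written to a temporary directory that is deleted afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "pkg.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(directory, arcname=os.path.basename(directory))
        return os.path.getsize(archive)


def main() -> None:
    # Hypothetical default: the extdata directory of a source package.
    target = sys.argv[1] if len(sys.argv) > 1 else "inst/extdata"
    if not os.path.isdir(target):
        print(f"{target!r} is not a directory; nothing to measure")
        return
    raw = raw_size(target)
    packed = tarball_size(target)
    ratio = packed / raw if raw else float("nan")
    print(f"raw: {raw / 1e6:.1f} MB, tar.gz: {packed / 1e6:.1f} MB, ratio: {ratio:.2f}")


if __name__ == "__main__":
    main()
```

Running it as `python measure.py path/to/inst/extdata` prints both sizes; a ratio well below 1 means the files contribute much less to the download than their on-disk size suggests.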

Generally, original text files are a much better way to store external data than
Rda files. For instance, Rda files require updating when (or if) the class
definition changes, whereas the provenance and content of the original files are
unambiguous.

Experiment data packages are meant to provide reusable examples for pedagogic 
purposes. One would hope that minfiData fulfills this requirement. If not, then 
it would be better to continue the current discussion with Kasper and others in 
the community to identify an appropriately comprehensive data set for use across 
many relevant packages.

There is no formal statement about the maximum size of experiment data packages,
but one would need to make a strong argument for why a gigabyte of experiment
data is necessary (including why existing experiment data packages are
fundamentally inadequate), especially if it is to support a single package.

Martin

#
In that case, I will check whether the public databases have the kind
of data sets I am trying to package, and run the idea by the team
assigned to the project I am developing. Thank you, Martin, Sean,
and Kasper, for your valuable insight!

---
Nicolas De Jay
On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote: