[Bioc-devel] package size

Dear List,

I am creating a package whose purpose is to combine data from different microarray platforms. I have found an NCBI GEO data series with 3 different platforms (1 Affymetrix and 2 Illumina) that works well for illustrating my package's functions. I would like to include this data series as a data object for the examples in the documentation (currently, 4 of the 5 functions use it in their example code). However, the xz-compressed .rda file (3 data frames, one for each data set) is about 5 MB, and the total package size is 6 MB. Is this too big?

There are 2 alternatives:

1) The package includes a function that downloads datasets using the GEOquery package, and it could easily re-create the data frames in my .rda file. The only downside is that downloading all the data takes several minutes. That may be inconvenient, because the example code for 4 of the functions uses this data object.

1a) I could have each function's example code either a) download the data and save it to an .RData image file, or b) load the image file saved in a), if it already exists. This way the investigator would only have to endure the download once, unless they chose not to save the data.
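The download-once logic in 1a amounts to a simple cache check. The following is a minimal sketch in Python, for illustration only. In the R package itself the same pattern would presumably use `file.exists()`, `save()` and `load()`. Here `pickle` stands in for the .RData image file. The names `download_series` and `load_or_download` are hypothetical, and the "download" just returns toy data instead of calling GEOquery.

```python
import os
import pickle
import tempfile


def download_series():
    """Hypothetical stand-in for the package's GEOquery-based download
    function; here it just returns a tiny dict of toy tables."""
    return {"platform_1": [[1.0, 2.0]], "platform_2": [[3.0, 4.0]]}


def load_or_download(cache_path, download=download_series, save=True):
    """Option 1a: if the saved file exists, load it (case b); otherwise
    download the data (case a) and, if save is True, write it to cache_path."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    data = download()
    if save:
        with open(cache_path, "wb") as fh:
            pickle.dump(data, fh)
    return data


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "geo_series.pkl")
    first = load_or_download(cache)   # no file yet: downloads, then saves
    second = load_or_download(cache)  # file exists: only loads it
    print(first == second)
```

With `save=False`, nothing is written, so every example run pays the download cost again. This matches the "unless they chose not to save the data" case.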

2) I could take, say, the first 1000 genes from each platform. I tried this, and the combined data has only 19 probes/probesets (they are mapped by Accession/UniGene IDs, and only the common transcripts are extracted). It would be nice to have a larger example, although it is not necessary. Alternatively, I could choose a better set of 1000 genes (or however many), so that more than 19 are shared.
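The small overlap in option 2 is expected. Truncating each platform before matching keeps only IDs that happen to fall in the first 1000 on every platform at once. The following is a minimal sketch in Python, for illustration only, that contrasts this with intersecting first and then taking a subset of the shared IDs. It assumes each platform's data has already been reduced to a list of mapped Accession/UniGene-style IDs. The toy IDs and function names are hypothetical, and the probe-to-ID mapping is not shown.

```python
# Toy UniGene-style IDs, listed in a different order on each platform
# (illustrative data only, not taken from the GEO series).
toy_platforms = {
    "affymetrix": ["Hs.1", "Hs.2", "Hs.3", "Hs.4", "Hs.5", "Hs.6"],
    "illumina_1": ["Hs.6", "Hs.5", "Hs.4", "Hs.3", "Hs.2", "Hs.1"],
    "illumina_2": ["Hs.3", "Hs.1", "Hs.5", "Hs.2", "Hs.6", "Hs.4"],
}


def common_ids(platforms):
    """Sorted list of IDs present on every platform."""
    return sorted(set.intersection(*(set(ids) for ids in platforms.values())))


def first_n_then_intersect(platforms, n):
    """Option 2 as tried: keep the first n IDs per platform, then intersect."""
    return common_ids({name: ids[:n] for name, ids in platforms.items()})


def intersect_then_first_n(platforms, n):
    """Alternative: intersect the full ID lists first, then keep n shared IDs."""
    return common_ids(platforms)[:n]


if __name__ == "__main__":
    print(first_n_then_intersect(toy_platforms, 3))  # → []
    print(intersect_then_first_n(toy_platforms, 3))  # → ['Hs.1', 'Hs.2', 'Hs.3']
```

When you intersect first, every retained ID is present on all platforms, so the example size is exactly n (up to the size of the full overlap). Which shared IDs to keep, here simply the first n in sorted order, is an arbitrary choice in this sketch.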

Thank you for any assistance,
Peter Bazeley