[Bioc-devel] Zipped Rdata files in windows binaries

4 messages · Tarca, Adi, Robert Gentleman, Nishant Gopalakrishnan

#
Hi all,
I am writing an R package, and at a given point I need to load an RData file from the "data" folder of the installed package; if the file is not there, I try to download it from somewhere else.

I used the following test to check whether the file named in "datload" is NOT there, in which case I need to download it:

if (!(paste(datload, ".RData", sep="") %in% dir(system.file("data", package="SPIA")))) {
  # ... download the file from somewhere else
}

It works fine, except that the Windows binary package created by the Bioconductor build scripts from my source puts all RData files in an Rdata.zip file. Is there a way to list the files in Rdata.zip to see if my file is in there?
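
One possible way to sidestep the zip file is to ask R which data sets a package provides, rather than listing the directory. The sketch below assumes that the listing returned by data(package=) does not depend on how the files are stored on disk; has_dataset is a hypothetical helper name, and the "datasets" package is used only so that the example runs anywhere.

```r
# Hypothetical helper: TRUE if `name` is a data set provided by `package`.
# Assumes the package is installed; data() stops with an error otherwise.
has_dataset <- function(name, package) {
  items <- data(package = package)$results[, "Item"]
  # Entries can look like "object (file)"; keep only the object name.
  items <- sub(" .*$", "", items)
  name %in% items
}

# Illustration with a package that ships with R.
has_dataset("iris", "datasets")
```

For the case above, the call would be something like has_dataset(datload, "SPIA").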

Alternatively, I tried to load the file with the data() function (into a private environment) and, if it does not load, to download it. However, data() does not raise an error when the data set is missing, only a warning.
I tried to use:

ow <- options("warn")
options(warn=2) # turn warnings into errors
errs <- try(data(list=datload, envir=.myDataEnv), silent=TRUE)
options(ow) # restore the previous warning level

if (inherits(errs, "try-error")) {
  # ... download the file from somewhere else
}
 
This works fine, except that a warning is still printed when the function returns.
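
A possible alternative, sketched here rather than taken from the thread, is to catch the warning directly with tryCatch() instead of changing the global "warn" option; a warning caught this way is handled and not printed. The helper name try_load_data is hypothetical, and the "datasets" package stands in for the real one so that the example is runnable.

```r
# Hypothetical helper: try to load data set `name` from `package` into `envir`.
# Returns TRUE on success and FALSE if data() signals a warning
# (for example, "data set ... not found").
try_load_data <- function(name, package, envir) {
  tryCatch({
    data(list = name, package = package, envir = envir)
    TRUE
  }, warning = function(w) FALSE)
}

env <- new.env()
if (!try_load_data("iris", "datasets", env)) {
  # ... download the file from somewhere else
}
```

Because the handler only intercepts warnings, a genuine error (such as a package that is not installed) still propagates to the caller.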

Any ideas would be appreciated.


Thanks,
Adi Laurentiu Tarca
#
Hi,
On Fri, Feb 6, 2009 at 11:05 AM, Tarca, Adi <atarca at med.wayne.edu> wrote:
That does not sound like a good thing to do.  The data folder is
exclusively for data that is stored essentially at package build time
and is not a place to put other files, or to use during a session to
store objects.  Objects there are platform independent and are
accessed using the data command in R.  Please don't try to modify this
behavior.

   If you want/need to have your own data storage type and want to
control it, you should use a different folder.  A common choice is
inst/extdata.  And then you are in control of everything.
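
As a rough illustration of the inst/extdata approach: files placed in inst/extdata of the source package end up in the extdata folder of the installed package, where system.file() can locate them. In the sketch below, find_extdata is a hypothetical helper, and the file name passed to it is an assumption made up for the example.

```r
# Hypothetical helper: full path to a file under extdata, or NULL if absent.
# system.file() returns "" when the file (or the package) cannot be found.
find_extdata <- function(file, package) {
  path <- system.file("extdata", file, package = package)
  if (nzchar(path)) path else NULL
}

# Assumed file and package names, for illustration only.
path <- find_extdata("hsaSPIA.RData", "SPIA")
if (!is.null(path)) {
  env <- new.env()
  load(path, envir = env)  # load the objects into a private environment
}
```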

   Since lots of people use R in cases where they do not have access
to the internet, the idea that they should download something for your
package to work seems problematic.  Why not just use one of
the many platform-independent formats and distribute the data on all
platforms in the same way?

   There are a number of examples in Bioconductor packages (e.g.
simpleaffy or flowCore).

  Best wishes
    Robert

  
    
#
Dear Robert,

Thank you for your advice. I will then include all the data at build time as .RData files. My only concern was whether there is a limit on the disk space these files may occupy. I am talking about roughly 10 MB, but this may double in future releases.
I was not too worried about people needing an internet connection when using my package with a new organism for the first time, since it is the same as trying to use some affy functions on a chip for which you do not have the CDF (except that there you download an additional package rather than a file).

Regards,
Adi  

  


Adi Laurentiu Tarca, PhD
Assistant Professor (Research), 
Bioinformatics and Computational Biology Unit of the NIH Perinatology Research Branch,
Department of Computer Science & Center for Molecular Medicine and Genetics,
Wayne State University, 
3990 John R., Office 4809,
Detroit, Michigan 48201
Tel: 1-313-5775305 
Cell: 1-313-4043116 
http://bioinformaticsprb.med.wayne.edu/tarca/

-----Original Message-----
From: rgentlem at gmail.com [mailto:rgentlem at gmail.com] On Behalf Of Robert Gentleman
Sent: Friday, February 06, 2009 3:56 PM
To: Tarca, Adi
Cc: bioc-devel at stat.math.ethz.ch
Subject: Re: [Bioc-devel] Zipped Rdata files in windows binaries

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
#
Hi Adi

The recommended size for software packages is less than 2 MB on disk
http://wiki.fhcrc.org/bioc/Package_Guidelines#size-requirements

Data files in the range of 10 to 20 MB will need to be built into a
separate data package and would typically go into our experimental data
package repository.
http://bioconductor.org/packages/release/ExperimentData.html

A typical user of your software package would probably not need the
large experiment data files for their application, and hence we would
like to maintain them as separate packages.  I can help you with
creating the data package once you have the data files ready.

Nishant