
[Bioc-devel] include data from non self-sufficient .R files

6 messages · Sean Davis, Steve Lianoglou, Vincent Carey +1 more

#
Dear all,

I have some data to be included in a package. The data needs to be
easily loadable by the users, optimally with data(), and consists of a
set of class instances, classes that are defined in the package
itself.
Something like:
myData1 <- new("aClass",...) ## in myData1.R
myData2 <- new("aClass",...) ## in myData2.R

that I would like to be loadable with

data(myData1)
data(myData2)

Although putting the .R files that generate the data objects directly
in the data directory would be the simplest solution, I cannot do this
because the code is not self-sufficient. I can't figure out how to
easily and automatically include these objects.
What is the suggested way to include this kind of data in a package?

Thank you very much in advance.

Best wishes,

Laurent
#
Hi Sean,
On 12 October 2010 00:15, Sean Davis <sdavis2 at mail.nih.gov> wrote:
Thank you for the advice. I was hoping for an automatic way of
including the objects at installation time, as the code is readily
available in the package. But as the class and these objects are not
likely to change much in the future, adding them once manually is
fine, of course.
By the way, I tried to have an R source file (in inst/scripts/ or R/)
create the instances and save() them in data/, but without success.

Best wishes,

Laurent
#
Hi,
On Tue, Oct 12, 2010 at 3:39 AM, Laurent Gatto <laurent.gatto at gmail.com> wrote:
I'm not sure that I follow (or that you follow :-), but what Sean is
suggesting is that you create the data yourself, manually (not with
some automated R script), and then save() it into an RData file that
you put into your package's data/ directory.
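A minimal sketch of that one-off manual step, assuming a hypothetical two-slot definition of aClass (in practice the script would load the package to get the real class) and a temporary directory standing in for the package source tree. It builds the objects once, saves each into its own data/<name>.RData file, and checks that load() restores an object under the saved name, which is roughly what data() does with an .RData file.

```r
## Sketch of a maintainer-side script (e.g. kept under inst/scripts/ and run
## by hand, not at install time).
library(methods)

## Hypothetical stand-in for the package's own class; a real script would
## load the package instead of redefining the class here.
setClass("aClass", slots = c(id = "character", value = "numeric"))

myData1 <- new("aClass", id = c("a", "b"), value = c(1, 2))
myData2 <- new("aClass", id = "c", value = 3)

## Hypothetical package source directory; a temporary one is used here.
pkgDir <- file.path(tempdir(), "myPkg")
dataDir <- file.path(pkgDir, "data")
dir.create(dataDir, recursive = TRUE, showWarnings = FALSE)

## One .RData file per object, named after the object, so that
## data(myData1) and data(myData2) can load them independently.
for (nm in c("myData1", "myData2")) {
    save(list = nm, file = file.path(dataDir, paste0(nm, ".RData")),
         envir = environment())
}

## Check: loading the file restores an object with the saved name.
chk <- new.env()
loaded <- load(file.path(dataDir, "myData1.RData"), envir = chk)
print(loaded)
print(identical(chk$myData1, myData1))
```

The file has to be regenerated by rerunning the script whenever aClass changes, which is the maintenance cost discussed further down the thread.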

Once the user installs the package, the data will come with the
package ("automatically"). The user can then load that data by using
the data() function calls, as you mentioned.

A call like:

R> data(myData1)

would then work if you have a file called "myData1.RData" in your
package's data/ folder.

See the help in ?data to get a more detailed overview of how data is
searched for and loaded.

Does that help?

-steve
#
If I understand correctly, Laurent wants to avoid serializing S4
objects (in the data folder), probably because he may have to remember to
update the serializations if the class definitions change, even if the
serialized data is not affected by such definition changes.
The SNPlocs.* packages address this concern -- while there could be
nice advantages to having SNP metadata serialized as GRanges
instances, the actual data is well-represented in a SQLite table and
one uses a function to get a GRanges representation when desired.
This avoids the problem of requiring reserialization of a potentially
large object whenever the GRanges class definition changes.
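A rough base-R sketch of the same idea, not the SNPlocs.* code itself: the data is kept as a plain table (a CSV file in a temporary location here, where SNPlocs.* uses a SQLite table) and a hypothetical accessor, getMyData1(), builds the S4 instance only when it is called. The aClass definition is again a made-up stand-in.

```r
library(methods)

## Hypothetical stand-in for the package's S4 class.
setClass("aClass", slots = c(id = "character", value = "numeric"))

## The data itself is stored in a plain, class-independent format. Written to
## a temporary file here; in a package it could live under inst/extdata/.
csvFile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = c("a", "b", "c"), value = c(1.5, 2, 3.25)),
          csvFile, row.names = FALSE)

## Hypothetical accessor that builds the instance on demand. If the class
## definition changes, only this function needs updating; the stored table
## never has to be reserialized.
getMyData1 <- function(file = csvFile) {
    ## In a package the default might instead be something like
    ## system.file("extdata", "myData1.csv", package = "yourPackage").
    raw <- read.csv(file, stringsAsFactors = FALSE)
    new("aClass", id = raw$id, value = raw$value)
}

x <- getMyData1()
x@value
```

The trade-off is that users call a function rather than data(), and building the object costs a little time on each call.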

I doubt there is a completely satisfactory solution to this problem.
Especially when one S4 class extends another with an unstable
definition, serialized S4 instances can become somewhat difficult to
maintain. A number of Bioconductor core packages illustrate how updateObject
methods can be written to simplify aspects of maintenance. The class
versioning discipline defined in Biobase is also useful for
maintenance, but I do not know how broadly it is used in contributed
packages.
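For illustration, a base-R sketch of the update idea, assuming a hypothetical aClass that gains a score slot after an instance was saved. In Bioconductor this logic would normally live in an updateObject method (the generic is provided by BiocGenerics in current releases); the plain function updateAClass() is a made-up name so the example runs without Bioconductor. .hasSlot() from the methods package detects the slot missing from the stale copy.

```r
library(methods)

## Version 1 of a hypothetical class, as it was when an instance was saved.
setClass("aClass", slots = c(id = "character"))
rdsFile <- tempfile(fileext = ".rds")
saveRDS(new("aClass", id = c("a", "b")), rdsFile)

## Later the class gains a slot; instances serialized under the old
## definition do not carry it.
setClass("aClass", slots = c(id = "character", score = "numeric"))
old <- readRDS(rdsFile)

## Hypothetical update function in the spirit of an updateObject() method:
## rebuild the instance under the current definition, filling in a default
## for any slot the stored copy lacks.
updateAClass <- function(object) {
    if (.hasSlot(object, "score")) {
        score <- object@score
    } else {
        score <- rep(NA_real_, length(object@id))
    }
    new("aClass", id = object@id, score = score)
}

updated <- updateAClass(old)
validObject(updated)
```

Recording a class version in the object, as Biobase's Versioned does, can let such an update step branch on the stored version rather than probing for individual slots.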

On Tue, Oct 12, 2010 at 12:12 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
#
Dear Vincent,
On 12 October 2010 18:13, Vincent Carey <stvjc at channing.harvard.edu> wrote:
Thank you for the SNPlocs.* hint; I am looking at it now.
For the record, some of the classes I am developing already contain
Biobase's Versioned and VersionedBiobase classes.

Best wishes,

Laurent