Dear all,
I have some data to be included in a package. The data needs to be
easily loadable by users, ideally with data(), and consists of a set
of instances of classes that are defined in the package itself.
Something like:
myData1 <- new("aClass",...) ## in myData1.R
myData2 <- new("aClass",...) ## in myData2.R
that I would like to be loadable with
data(myData1)
data(myData2)
Although putting the .R files that generate the data objects directly
in the data directory would be the simplest solution, I cannot do
this because the code is not self-sufficient. I can't figure out how
to include these objects easily and automatically.
What is the suggested way to include this kind of data in a package?
Thank you very much in advance.
Best wishes,
Laurent
[Bioc-devel] include data from non self-sufficient .R files
6 messages · Sean Davis, Steve Lianoglou, Vincent Carey +1 more
Hi Sean,
On 12 October 2010 00:15, Sean Davis <sdavis2 at mail.nih.gov> wrote:
On Mon, Oct 11, 2010 at 6:58 PM, Laurent Gatto <laurent.gatto at gmail.com> wrote:
Hi, Laurent. You can save() your data objects and put them in your data directory. That should do it. Of course, you will want to document them, also. You can look at the "Writing R Extensions" manual for more details. Sean
Thank you for the advice. I was rather looking for an automatic way of including the objects at installation time, as the code is readily available in the package. But as the class and these objects are not likely to change much in the future, adding them once manually is fine, of course. By the way, I tried to have an R source file (in inst/scripts/ or R/) create the instances and save() them into data/, but without success. Best wishes, Laurent
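One way to keep the "create once, save() into data/" step reproducible, without running anything at install time, is a maintainer-side script that is run by hand from the package source directory whenever the objects need regenerating. The sketch below assumes such a script lives in inst/scripts/; the class "aClass", its "values" slot, the object contents and the function name makePackageData() are all hypothetical. In the real package you would load the package (e.g. library(yourPackage)) instead of calling setClass() in the script.

```r
library(methods)

## Hypothetical class; in practice it is defined in the package's R/ code
setClass("aClass", slots = c(values = "numeric"))

## Hypothetical helper: build every data object and save each one to
## <dataDir>/<name>.rda, so that data(<name>) can find it after installation
makePackageData <- function(dataDir) {
  dir.create(dataDir, showWarnings = FALSE, recursive = TRUE)
  objs <- new.env()
  assign("myData1", new("aClass", values = c(1, 2, 3)), envir = objs)
  assign("myData2", new("aClass", values = c(10, 20)), envir = objs)
  files <- file.path(dataDir, paste0(ls(objs), ".rda"))
  for (nm in ls(objs)) {
    ## one object per file, file named after the object
    save(list = nm, envir = objs, file = file.path(dataDir, paste0(nm, ".rda")))
  }
  invisible(files)
}
```

Run from the package root, makePackageData("data") would (re)write data/myData1.rda and data/myData2.rda, which are then committed with the package like any hand-saved data file.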
Hi,
On Tue, Oct 12, 2010 at 3:39 AM, Laurent Gatto <laurent.gatto at gmail.com> wrote:
Thank you for the advice. I was rather looking for an automatic way of including the objects at installation time, as the code is readily available in the package. But as the class and these objects are not likely to change much in the future, adding them once manually is fine, of course. By the way, I tried to have an R source file (in inst/scripts/ or R/) create the instances and save() them into data/, but without success.
I'm not sure that I follow (or that you follow :-), but what Sean is
suggesting is that you create the data yourself, manually, not with
some automated R script, and then save() the data into an RData
file that you put into your package's data/ directory.
Once the user installs the package, the data will come with the
package ("automatically"). The user can then load that data by using
the data() function calls, as you mentioned.
A call like:
R> data(myData1)
would then work if you have an RData file called "myData1.RData" in
your package's data/ folder.
See the help in ?data to get a more detailed overview of how data is
searched for and loaded.
Does that help?
-steve
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
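To illustrate Steve's point without building a package, the sketch below relies on a documented fallback of data(): when no package is given, it searches the loaded packages and then the data/ subdirectory of the current working directory. The temporary directory stands in for an installed package; "aClass" is again a hypothetical class.

```r
library(methods)

## Hypothetical class standing in for one defined in the package
setClass("aClass", slots = c(values = "numeric"))

## A temporary directory with a data/ subfolder plays the role of the package
root <- file.path(tempdir(), "dataDemo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)

## Create the object once and save() it under the name data() will look for
myData1 <- new("aClass", values = c(1, 2, 3))
save(myData1, file = file.path(root, "data", "myData1.RData"))
rm(myData1)

## data() without a 'package' argument also looks in ./data of the working
## directory; load into a fresh environment and restore the working directory
e <- new.env()
oldwd <- setwd(root)
tryCatch(data(myData1, envir = e), finally = setwd(oldwd))

e$myData1  # the restored "aClass" instance
```

In an installed package the same data(myData1) call finds the file in the package's own data directory, so no setwd() is involved.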
If I understand correctly, Laurent wants to avoid serializing S4 objects (in the data folder), probably because he may have to remember to update the serializations if the class definitions change, even if the serialized data is not affected by such definition changes.
The SNPlocs.* packages address this concern: while there could be nice advantages to having SNP metadata serialized as GRanges instances, the actual data is well represented in a SQLite table, and one uses a function to get a GRanges representation when desired. This avoids the problem of requiring reserialization of a potentially large object whenever the GRanges class definition changes.
I doubt there is a completely satisfactory solution to this problem. Especially when one S4 class extends another that has an unstable definition, serialized S4 instances can become a bit problematic to maintain. A number of bioc core packages illustrate how updateObject methods can be written to simplify aspects of maintenance. The class versioning discipline defined in Biobase is also useful for maintenance, but I do not know how broadly it is used in contributed packages.
On Tue, Oct 12, 2010 at 12:12 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
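A minimal sketch of the SNPlocs-style approach Vincent describes, transposed to this case: keep the underlying values in a plain, class-independent format and build the S4 instance with a function when it is needed, so nothing has to be re-serialized when the class changes. Here a CSV file stands in for the SQLite table; in a package it could ship in inst/extdata/ and be located with system.file(). The class, its slots, the file name and getMyData1() are hypothetical.

```r
library(methods)

## Hypothetical class; only this definition changes when the class evolves
setClass("aClass", slots = c(id = "character", values = "numeric"))

## Raw data stored without any S4 structure. In a package the file would be
## found with system.file("extdata", "myData1.csv", package = "yourPackage")
csvFile <- file.path(tempdir(), "myData1.csv")
write.csv(data.frame(id = c("a", "b", "c"), values = c(1, 2, 3)),
          csvFile, row.names = FALSE)

## Accessor that builds a fresh instance against the current class definition
getMyData1 <- function(file) {
  df <- read.csv(file, stringsAsFactors = FALSE)
  ## read.csv() returns whole numbers as integer, so coerce for the slot
  new("aClass", id = df$id, values = as.numeric(df$values))
}

obj <- getMyData1(csvFile)
validObject(obj)
```

The trade-off is that users call an accessor such as getMyData1() instead of data(myData1), in exchange for never having a stale serialized instance in the package.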
Dear Vincent,
On 12 October 2010 18:13, Vincent Carey <stvjc at channing.harvard.edu> wrote:
If I understand correctly, Laurent wants to avoid serializing S4 objects (in the data folder), probably because he may have to remember to update the serializations if the class definitions change, even if the serialized data is not affected by such definition changes. The SNPlocs.* packages address this concern: while there could be nice advantages to having SNP metadata serialized as GRanges instances, the actual data is well represented in a SQLite table, and one uses a function to get a GRanges representation when desired. This avoids the problem of requiring reserialization of a potentially large object whenever the GRanges class definition changes.
Thank you for the SNPlocs.* hint. I am looking at it.
I doubt there is a completely satisfactory solution to this problem. Especially when one S4 class extends another that has an unstable definition, serialized S4 instances can become a bit problematic to maintain. A number of bioc core packages illustrate how updateObject methods can be written to simplify aspects of maintenance. The class versioning discipline defined in Biobase is also useful for maintenance, but I do not know how broadly it is used in contributed packages.
For that matter, some of the classes I am developing do contain Biobase's Versioned and VersionedBiobase classes. Best wishes, Laurent
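The maintenance problem Vincent raises can be reproduced in a few lines: an instance saved under one class definition no longer validates once a slot is added to the class. The sketch then rebuilds it with a hand-written update function; the name updateAClass() and the new "label" slot are hypothetical, and in Bioconductor code this logic would more naturally live in an updateObject method, as Vincent suggests.

```r
library(methods)

## Version 1 of the hypothetical class, and an instance serialized with it
setClass("aClass", slots = c(values = "numeric"))
rdaFile <- file.path(tempdir(), "myData1_v1.rda")
myData1 <- new("aClass", values = c(1, 2, 3))
save(myData1, file = rdaFile)
rm(myData1)

## Later the class gains a slot; the serialized instance is now out of date
setClass("aClass", slots = c(values = "numeric", label = "character"))
e <- new.env()
load(rdaFile, envir = e)
stale <- e$myData1
isValid <- tryCatch({ validObject(stale); TRUE }, error = function(err) FALSE)

## Hypothetical update function: copy the old slots into a fresh instance of
## the current definition and give the new slot a default value
updateAClass <- function(object, label = NA_character_) {
  new("aClass", values = object@values, label = label)
}
fresh <- updateAClass(stale)
validObject(fresh)  # succeeds against the new definition
```

Here isValid ends up FALSE, because validObject() reports the "label" slot as present in the class definition but missing from the loaded object.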