Just a quick "developer" question--how stable are the data structures and methods for dealing with eSets likely to be? I have been using eSets much more recently just for data handling, but I think I would probably like to extend them some and don't want to do that (given the availability of older, stable data structures) if they are going to be changing much. Thanks, Sean
[Bioc-devel] Stability of biobase eSets
6 messages · Martin Morgan, Sean Davis, Seth Falcon +1 more
Excellent and timely question!
1. Our plan is to never again make changes at the very end of the development cycle. We also plan to move all our hosted experiment data to ExpressionSet over the next few weeks -- i.e., we think this is the future, and want to encourage feedback.
2. We have one change planned for the underlying data structure, hopefully to make a preview by the end of this week. Our plan is to define a 'Versioned' class in Biobase. This class contains information about the version of Biobase in use when an object is created. If the object is 'serialized' (e.g., stored to disk) and retrieved at a later date, then the version information can be consulted to check that it is current, and can be used to update the instance to the current definition. More details to follow shortly...
3. We plan to introduce a new method, updateClass, that can be used to update instances, either to their current version or to a different representation (e.g., from exprSet to ExpressionSet, but allowing more flexibility than 'setAs' methods might).
4. We do not have definite plans to remove methods, but I suspect there is room for limited housekeeping. Again, any changes at this level will occur sooner rather than later.
5. We are also developing additional classes. These will add to, rather than change, the existing repertoire. For instance, we are looking at an 'EmptyMatrix' class that contains information about dimensions and type, but not actual data. The idea is that these would be convenient placeholders for elements missing from assayData. They would look like matrices for many operations (is, dim, rownames, as, [ to other EmptyMatrix, etc.) but not contain numerical data.
Hope that helps. Look for more information about the Versioned class shortly. Feedback most welcome!
Martin
Sean Davis <sdavis2 at mail.nih.gov> writes:
Just a quick "developer" question--how stable are the data structures and methods for dealing with eSets likely to be? I have been using eSets much more recently just for data handling, but I think I would probably like to extend them some and don't want to do that (given the availability of older, stable data structures) if they are going to be changing much. Thanks, Sean
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
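To make the 'Versioned' idea in point 2 concrete, here is a minimal sketch using only the base methods package. It is not the Biobase implementation: the class names VersionedSketch and MySetSketch, the helper isCurrentSketch, and the version strings are all hypothetical, chosen only to show a version stamp surviving serialization and being checked afterwards.

```r
library(methods)

# Hypothetical stand-in for "the version in use right now".
currentVersion <- "1.2.0"

# Base class that only records the version in use at creation time.
setClass("VersionedSketch", representation(classVersion = "character"))

# A data class that inherits the version stamp.
setClass("MySetSketch",
         contains = "VersionedSketch",
         representation = representation(exprs = "matrix"))

# Constructor that stamps the object with a version (assumed helper).
newMySetSketch <- function(exprs, version = currentVersion) {
  new("MySetSketch", exprs = exprs, classVersion = version)
}

# TRUE when the stored stamp is at least the current version.
isCurrentSketch <- function(object, current = currentVersion) {
  package_version(object@classVersion) >= package_version(current)
}

# Simulate an object created under an older version, then stored to disk.
old <- newMySetSketch(matrix(0, 2, 2), version = "1.0.0")
path <- tempfile(fileext = ".rds")
saveRDS(old, path)

# Retrieve it later and consult the stamp before using it.
restored <- readRDS(path)
restoredIsCurrent <- isCurrentSketch(restored)
freshIsCurrent <- isCurrentSketch(newMySetSketch(matrix(0, 2, 2)))
```

In this sketch the restored object reports that it is out of date, which is the point at which an update step would be triggered.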
Martin Morgan wrote:
Excellent and timely question! 1. Our plan is to never again make changes at the very end of the development cycle. We also plan to move all our hosted experiment data to ExpressionSet over the next few weeks -- i.e., we think this is the future, and want to encourage feedback.
I personally like what I have been seeing. I particularly like the environment storage (with the obvious potential for other types of storage down the road, from looking at the code...).
2. We have one change planned for the underlying data structure, hopefully to make a preview by the end of this week. Our plan is to define a 'Versioned' class in Biobase. This class contains information about the version of Biobase in use when an object is created. If the object is 'serialized' (e.g., stored to disk) and retrieved at a later date, then the version information can be consulted to check that it is current, and can be used to update the instance to the current definition. More details to follow shortly...
This is precisely why I am asking. I just went back to some data from about 1.5 years ago and was unhappy to find that many of the data structures in use at the time were not available at all, resulting in MANY calls looking like ma@....@...
3. We plan to introduce a new method, updateClass, that can be used to update instances, either to their current version or to a different representation (e.g., from exprSet to ExpressionSet, but allowing more flexibility than 'setAs' methods might).
Sounds good. I would think that the annotation packages might benefit from this as well, as often people's analyses were based on a particular set of annotations and switching would result in different results.
4. We do not have definite plans to remove methods, but I suspect there is room for limited housekeeping. Again, any changes at this level will occur sooner rather than later.
5. We are also developing additional classes. These will add to, rather than change, the existing repertoire. For instance, we are looking at an 'EmptyMatrix' class that contains information about dimensions and type, but not actual data. The idea is that these would be convenient placeholders for elements missing from assayData. They would look like matrices for many operations (is, dim, rownames, as, [ to other EmptyMatrix, etc.) but not contain numerical data. Hope that helps. Look for more information about Versioned class shortly. Feedback most welcome!
I'll chime in if I can. Thanks for the extensive answer. Sean
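As a rough illustration of the 'EmptyMatrix' placeholder from point 5, the sketch below defines an S4 class that knows its dimensions and type but holds no values. The name EmptyMatrixSketch and its slots are assumptions for illustration only, and only dim and [ are covered, not the full list of operations mentioned above.

```r
library(methods)

# Placeholder that records shape and storage type, but no data.
# The slot is called 'dims' because 'dim' is a special attribute name.
setClass("EmptyMatrixSketch",
         representation(dims = "integer", type = "character"))

# dim() reports the recorded shape, so nrow() and ncol() work too.
setMethod("dim", "EmptyMatrixSketch", function(x) x@dims)

# Subsetting yields another placeholder with the resulting shape.
setMethod("[", "EmptyMatrixSketch", function(x, i, j, ..., drop = TRUE) {
  nr <- if (missing(i)) x@dims[1] else length(seq_len(x@dims[1])[i])
  nc <- if (missing(j)) x@dims[2] else length(seq_len(x@dims[2])[j])
  new("EmptyMatrixSketch", dims = c(nr, nc), type = x@type)
})

em <- new("EmptyMatrixSketch", dims = c(100L, 4L), type = "double")
sub <- em[1:10, ]
subDim <- dim(sub)
```

Subsetting returns another placeholder rather than numeric data, which matches the "[ to other EmptyMatrix" behaviour described in the message.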
1 day later
Sean Davis <sdavis2 at mail.nih.gov> writes:
3. We plan to introduce a new method, updateClass, that can be used to update instances, either to their current version or to a different representation (e.g., from exprSet to ExpressionSet, but allowing more flexibility than 'setAs' methods might).
Sounds good. I would think that the annotation packages might benefit from this as well, as often people's analyses were based on a particular set of annotations and switching would result in different results.
I'm not sure what you have in mind here. The updateClass idea allows for a way of dealing with the following situation: you have saved an instance X of a given class FOO and later updated the class definition for FOO. When you load X, updateClass will provide for a means to attempt to update the data contained in the instance to the new class definition structure (imagine a case where a slot was renamed as a simple example). Certainly annotation data changes over time and results will change with it, but I don't think there is much we can/want to do about that. If you want the same results, use the same versions of the annotation data. If I've misunderstood, please feel free to explain further. I don't want to discourage suggestions or ways we can make the software more useful. :-) + seth
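The renamed-slot case described above can be sketched with the base methods package alone. This is only an illustration of the situation, not the planned updateClass method: the class FooSketch, its slots, the version strings, and the function updateClassSketch are all hypothetical. The sketch relies on the fact that S4 slots are stored as attributes, so data in a stale instance can still be read with attr().

```r
library(methods)

# Old definition: the data live in a slot called 'exprs'.
setClass("FooSketch",
         representation(exprs = "matrix", version = "character"))
x <- new("FooSketch", exprs = matrix(1:4, 2), version = "1.0.0")

# Later the class is redefined and the slot is renamed to 'assay'.
# The instance 'x' still carries its data under the old name.
setClass("FooSketch",
         representation(assay = "matrix", version = "character"))

# Hypothetical updater: copy data from the old slot into the new layout.
updateClassSketch <- function(object) {
  if (identical(attr(object, "version", exact = TRUE), "2.0.0")) {
    return(object)  # already in the current layout
  }
  new("FooSketch",
      assay = attr(object, "exprs", exact = TRUE),
      version = "2.0.0")
}

y <- updateClassSketch(x)
validObject(y)  # raises an error if 'y' did not match the new definition
```

The data are unchanged by the update; only the container layout moves to the new definition.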
On 5/12/06 12:36 PM, "Seth Falcon" <sfalcon at fhcrc.org> wrote:
Sean Davis <sdavis2 at mail.nih.gov> writes:
3. We plan to introduce a new method, updateClass, that can be used to update instances, either to their current version or to a different representation (e.g., from exprSet to ExpressionSet, but allowing more flexibility than 'setAs' methods might).
Sounds good. I would think that the annotation packages might benefit from this as well, as often people's analyses were based on a particular set of annotations and switching would result in different results.
I'm not sure what you have in mind here. The updateClass idea allows for a way of dealing with the following situation: you have saved an instance X of a given class FOO and later updated the class definition for FOO. When you load X, updateClass will provide for a means to attempt to update the data contained in the instance to the new class definition structure (imagine a case where a slot was renamed as a simple example).
This is exactly the situation I had in mind, yes. Ideally, the data would remain the same and the class structures would be updated to as close as possible to that in use by the current release.
Certainly annotation data changes over time and results will change with it, but I don't think there is much we can/want to do about that. If you want the same results, use the same versions of the annotation data. If I've misunderstood, please feel free to explain further. I don't want to discourage suggestions or ways we can make the software more useful. :-)
I guess my more general point was that there are many (fairly complicated) data structures that undergo change over time besides those that contain the expression data. When possible and when it makes sense, it might be beneficial to think about versioning and updateClass-type ideas for them as well. The annotation packages might be candidates for that treatment. Sean
Certainly annotation data changes over time and results will change with it, but I don't think there is much we can/want to do about that. If you want the same results, use the same versions of the annotation data. If I've misunderstood, please feel free to explain further. I don't want to discourage suggestions or ways we can make the software more useful. :-)
I guess my more general point was that there are many (fairly complicated) data structures that undergo change over time besides those that contain the expression data. When possible and when it makes sense, it might be beneficial to think about versioning and updateClass-type ideas for them as well. The annotation packages might be candidates for that treatment.
Perhaps the distinction between structure and content can help clarify these issues. golubEsets is a package whose content is supposed to be immutable, but the container structure changed in Biobase. This made it hard to use the old objects. We want to simplify restoration of old objects when externally defined structures change, and updateClass is supposed to address this. The annotation packages can change in structure, but the principal aspect of change that we are concerned with is change in content. Such changes arise as biological knowledge changes, and are basically orthogonal to data structure design (bioc classes). It does seem desirable to identify the version of an annotation resource and to propagate that through the reporting workflow. It would be nice to have an object that could be updated with respect to annotation content if new annotation environments came into being after it was created. An Sweave document that scripts the analysis seems to be the closest thing we have to such an updateable object: swap in the new annotation and rerun. Hopefully all the APIs are unchanged; updateClass might even help in this situation if some container structures changed in the meantime, but it is -- under current thinking -- independent of the evolution of annotation content.
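One way to picture "identify the version of an annotation resource and propagate that through the reporting workflow" is to store the package version next to the result it produced. The sketch below is an assumption-laden illustration: runAnalysisSketch is a hypothetical function, the mean is a placeholder for a real analysis, and the always-installed stats package stands in for an annotation package.

```r
# Hypothetical analysis step that records which annotation resource,
# and which version of it, was in use when the result was computed.
runAnalysisSketch <- function(values, annotationPkg) {
  list(
    result = mean(values),  # placeholder for the real analysis
    annotation = annotationPkg,
    annotationVersion = as.character(utils::packageVersion(annotationPkg))
  )
}

# 'stats' is used here only because it is always installed.
res <- runAnalysisSketch(c(1, 2, 3), "stats")

# A report can then state exactly what the result was based on.
reportLine <- sprintf("Computed with %s version %s",
                      res$annotation, res$annotationVersion)
```

Rerunning the same script after swapping in a newer annotation package would then yield a result stamped with the new version, which is close to the Sweave workflow described above.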