
[Bioc-devel] Stability of biobase eSets

6 messages · Martin Morgan, Sean Davis, Seth Falcon +1 more

#
Just a quick "developer" question--how stable are the data structures and
methods for dealing with eSets likely to be?  I have been using eSets much
more recently just for data handling, but I think I would probably like to
extend them some and don't want to do that (given the availability of older,
stable data structures) if they are going to be changing much.

Thanks,
Sean
#
Excellent and timely question!

1. Our plan is to never again make changes at the very end of the
development cycle. We also plan to move all our hosted experiment data
to ExpressionSet over the next few weeks -- i.e., we think this is the
future, and want to encourage feedback.

2. We have one change planned for the underlying data structure, and
hope to have a preview ready by the end of this week.  Our plan is to
define a 'Versioned' class in Biobase. This class contains information
about the version of Biobase in use when an object is created. If the
object is 'serialized' (e.g., stored to disk) and retrieved at a later
date, then the version information can be consulted to check that it
is current, and can be used to update the instance to the current
definition. More details to follow shortly...

3. We plan to introduce a new method, updateClass, that can be
used to update instances, either to their current version or to a
different representation (e.g., from exprSet to ExpressionSet, but
allowing more flexibility than 'setAs' methods might).

4. We do not have definite plans to remove methods, but I suspect
there is room for limited housekeeping. Again, any changes at this
level will occur sooner rather than later.

5. We are also developing additional classes. These will add to,
rather than change, the existing repertoire. For instance, we are
looking at an 'EmptyMatrix' class that contains information about
dimensions and type, but not actual data. The idea is that these would
be convenient placeholders for elements missing from assayData. They
would look like matrices for many operations (is, dim, rownames, as,
subsetting with [ to give another EmptyMatrix, etc.) but not contain
numerical data.
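The EmptyMatrix idea in item 5 can be illustrated with a small sketch. Biobase classes are R/S4, so this Python version is only an analogy under stated assumptions: the class name follows the message, but its fields (`nrow`, `ncol`, `dtype`, `rownames`, `colnames`), the `dim()` method, and the helpers `_resolve` and `_pick` are hypothetical and are not Biobase code. The placeholder records shape, element type, and names, answers shape queries, and returns another placeholder when subset, without ever holding values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union

# A row or column selector: a single position, a slice, or a list of positions.
Index = Union[int, slice, Sequence[int]]


@dataclass(frozen=True)
class EmptyMatrix:
    """Hypothetical placeholder: knows its shape, type and names, holds no data."""
    nrow: int
    ncol: int
    dtype: str = "double"  # assumed label for the element type
    rownames: Optional[Tuple[str, ...]] = None
    colnames: Optional[Tuple[str, ...]] = None

    def __post_init__(self) -> None:
        # Names, when present, must agree with the recorded dimensions.
        if self.rownames is not None and len(self.rownames) != self.nrow:
            raise ValueError("rownames length does not match nrow")
        if self.colnames is not None and len(self.colnames) != self.ncol:
            raise ValueError("colnames length does not match ncol")

    def dim(self) -> Tuple[int, int]:
        return (self.nrow, self.ncol)

    def __getitem__(self, key: Tuple[Index, Index]) -> "EmptyMatrix":
        # Subsetting yields another EmptyMatrix with the selected shape and names.
        rows = _resolve(key[0], self.nrow)
        cols = _resolve(key[1], self.ncol)
        return EmptyMatrix(
            nrow=len(rows),
            ncol=len(cols),
            dtype=self.dtype,
            rownames=_pick(self.rownames, rows),
            colnames=_pick(self.colnames, cols),
        )


def _resolve(sel: Index, n: int) -> list:
    """Turn a selector into explicit positions; range() checks bounds."""
    positions = range(n)
    if isinstance(sel, slice):
        return list(positions[sel])
    if isinstance(sel, int):
        return [positions[sel]]
    return [positions[i] for i in sel]


def _pick(names: Optional[Tuple[str, ...]], idx: list) -> Optional[Tuple[str, ...]]:
    return None if names is None else tuple(names[i] for i in idx)
```

For example, `EmptyMatrix(3, 2)[0:2, [1]].dim()` gives `(2, 1)`; the subset carries its own shape and names, so code that only inspects dimensions can treat it like a real matrix.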

Hope that helps. Look for more information about Versioned class
shortly. Feedback most welcome!

Martin



Sean Davis <sdavis2 at mail.nih.gov> writes:
#
Martin Morgan wrote:
I personally like what I have been seeing.  I particularly like the
environment storage (with the obvious potential for other types of
storage down the road, from looking at the code...).

This is precisely why I am asking.  I just went back to some data from
about 1.5 years ago and was unhappy to find that many of the data
structures in use at the time were not available at all, resulting in
MANY calls looking like ma@....@...

Sounds good.  I would think that the annotation packages might benefit
from this as well, as people's analyses were often based on a particular
set of annotations, and switching would produce different results.

I'll chime in if I can.  Thanks for the extensive answer.

Sean
1 day later
#
Sean Davis <sdavis2 at mail.nih.gov> writes:
I'm not sure what you have in mind here.  The updateClass idea allows
for a way of dealing with the following situation: you have saved an
instance X of a given class FOO and later updated the class definition
for FOO.  When you load X, updateClass will provide for a means to
attempt to update the data contained in the instance to the new class
definition structure (imagine a case where a slot was renamed as a
simple example).

Certainly annotation data changes over time and results will change
with it, but I don't think there is much we can or want to do about
that.  If you want the same results, use the same versions of the
annotation data.  If I've misunderstood, please feel free to explain
further.  I don't want to discourage suggestions or ways we can make
the software more useful. :-)

+ seth
#
On 5/12/06 12:36 PM, "Seth Falcon" <sfalcon at fhcrc.org> wrote:

            
This is exactly the situation I had in mind, yes.  Ideally, the data would
remain the same and the class structures would be updated to match, as
closely as possible, those in use by the current release.
I guess my more general point was that there are many (fairly complicated)
data structures that undergo change over time besides those that contain the
expression data.  When possible and when it makes sense, it might be
beneficial to think about versioning and updateClass-type ideas for them as
well.  The annotation packages might be candidates for that treatment.

Sean
#
Perhaps the distinction between structure and content can help
clarify these issues.  golubEsets is a package whose content is supposed
to be immutable, but the container structure changed in Biobase.  This
made it hard to use the old objects.  We want to simplify restoration
of old objects when externally defined structures change, and updateClass
is supposed to address this.

The annotation packages can change in structure, but the principal
aspect of change that we are concerned with is change in content.
Such changes arise as biological knowledge changes, and are basically
orthogonal to data structure design (bioc classes).

It does seem desirable to identify the version of an annotation resource
and to propagate that through the reporting workflow.  It would
be nice to have an object that could be updated with respect to
annotation content if new annotation environments came into being
after it was created.  A Sweave document that scripts the analysis
seems to be the closest thing we have to such an updateable object:
swap in the new annotation and rerun.  Hopefully all the APIs are
unchanged; updateClass might even help in this situation if some
container structures changed in the meantime.  But it is, under
current thinking, independent of the evolution of annotation content.