[Bioc-devel] BioC 2.5: Added scanDates slot to Biobase's eSet class

2 messages · Patrick Aboyoun, Vincent Carey

#
Laurent,
One of the subtle requirements I have gotten out of Martin's design is
the notion of standard and additional columns in a data table, which
in this case takes the form of an AnnotatedDataFrame. As a programmer,
objects like data.frames and AnnotatedDataFrames give me no end of
headaches, because the methods I write have to contain tedious
data-checking code to ensure that what they are operating on is what
they expect.

This discussion can have much wider implications. If done right, we
can create a scheme that others can leverage to create eSet subclasses
that have specialized phenoData, featureData, and potentially new
arrayData/covariateData/experimentData slots, with standard and
additional columns enforced by a yet-to-be-defined class. (I was
curious whether anybody had expanded on the AnnotatedDataFrame class to
include the notion of standard and additional columns. The only
package I found that creates subclasses of AnnotatedDataFrame is
ShortRead, and those classes don't touch on this topic.)

I agree with you that without a formal data model, this discussion can  
devolve into semantic hair splitting. If, however, we create a  
lightweight, flexible data model that can be adapted to different  
situations, we can provide benefits to both developers and end-users  
who can assume the standard data columns exist and use defined methods  
to access them.
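
A minimal sketch of what such a lightweight data model could look like,
using only the base methods package. The class name StandardColumnsFrame
and the accessors standardData() and additionalData() are hypothetical
names made up for illustration; nothing here is Biobase API.

```r
library(methods)

# Hypothetical class: a data frame plus the names of the columns that
# every instance is required to carry (the "standard" columns).
setClass("StandardColumnsFrame",
         slots = c(data = "data.frame", standard = "character"),
         validity = function(object) {
           absent <- setdiff(object@standard, names(object@data))
           if (length(absent) > 0L)
             paste("missing standard column(s):",
                   paste(absent, collapse = ", "))
           else
             TRUE
         })

# Accessors that let callers rely on the standard columns being there.
setGeneric("standardData",
           function(object) standardGeneric("standardData"))
setMethod("standardData", "StandardColumnsFrame",
          function(object) object@data[, object@standard, drop = FALSE])

setGeneric("additionalData",
           function(object) standardGeneric("additionalData"))
setMethod("additionalData", "StandardColumnsFrame",
          function(object) {
            extra <- setdiff(names(object@data), object@standard)
            object@data[, extra, drop = FALSE]
          })

# Usage: "sample" and "scanDate" are standard, "batch" is additional.
pd <- new("StandardColumnsFrame",
          data = data.frame(sample = c("a", "b"),
                            scanDate = c("2009-06-18", "2009-06-19"),
                            batch = c(1, 2)),
          standard = c("sample", "scanDate"))
names(standardData(pd))
names(additionalData(pd))
```

With this arrangement the column check lives in one place (the validity
function) instead of being repeated in every method.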


Patrick



Quoting Laurent Gautier <laurent at cbs.dtu.dk>:
#
On Fri, Jun 19, 2009 at 11:25 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
Is new infrastructure required for this?  You can extend a given class
and write a validity method that defines the requirements by denying
construction if they are not met.  It seems to me that we have this in
eSet as it stands.
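
As a rough illustration of the extend-and-validate approach, here is a
sketch with toy classes. ToyExperiment stands in for an existing
container and is not Biobase's eSet; ScannedExperiment and its slots are
likewise assumptions made up for this example.

```r
library(methods)

# Stand-in for an existing container class (NOT Biobase's eSet).
setClass("ToyExperiment", slots = c(sampleNames = "character"))

# The subclass adds a slot and states its requirement in a validity
# method; new() refuses to build an object that fails the check.
setClass("ScannedExperiment",
         contains = "ToyExperiment",
         slots = c(scanDates = "character"),
         validity = function(object) {
           if (length(object@scanDates) != length(object@sampleNames))
             "scanDates must have one entry per sample"
           else
             TRUE
         })

# Construction succeeds when the requirement is met ...
ok <- new("ScannedExperiment",
          sampleNames = c("s1", "s2"),
          scanDates = c("2009-06-18", "2009-06-19"))

# ... and is denied otherwise; here the error message is captured
# instead of stopping the session.
bad <- tryCatch(
  new("ScannedExperiment",
      sampleNames = c("s1", "s2"),
      scanDates = "2009-06-18"),
  error = function(e) conditionMessage(e))
```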

I noticed that scanDates is character.  So we will have to do some programming
to figure out what is in there.  Is POSIXt possibly a suitable class for
this information?
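
To make the POSIXt question concrete, a small sketch of converting
character scan dates to POSIXct. The example strings and the timestamp
layout are assumptions; real scanner output may use a different format
and would need a different format string.

```r
# Hypothetical scan dates stored as character; the layout
# "YYYY-MM-DD HH:MM:SS" is an assumption for this example.
scan_chr <- c("2009-06-18 14:02:11", "2009-06-19 09:30:00", "not recorded")

# Entries that do not match the format become NA rather than raising
# an error, so they can be found afterwards.
scan_time <- as.POSIXct(scan_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

inherits(scan_time, "POSIXt")  # POSIXct objects also inherit from POSIXt
is.na(scan_time)               # flags the entries that failed to parse
scan_time[2] - scan_time[1]    # date arithmetic now works directly
```

Once the values are date-times rather than strings, sorting samples by
scan time or grouping them by day needs no further string handling.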

I have no problem with adding some slots and validity checking possibilities
to eSet, and I think the discussion is important.  I note that we already
have an experimentData structure and that it is supposed to hold all relevant
MIAME information.  We did not elaborate it carefully.  It may not be
sensible to put scanDates in the MIAME class definition -- I don't know.  If
we did this, all structures that use experimentData would be able to hold
scanDates in this way.  The internal representation shouldn't be that
important -- what is important is that we give people reasonable ways of
working with eSet instances through a scanDates method.  So my proposal would
be that we figure out a way of putting scanDates in experimentData as a guide
and as something that satisfies an emergent requirement. [I am not saying that
we need to change anything that has been done, merely that if there were
further decision-making to be done, this is how I would contribute to the
process.]
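
A sketch of the accessor idea: callers go through a scanDates() method
and never see where the dates are stored. ToyMIAME and ToyESet are
made-up stand-ins, the generic is defined locally for this example, and
keeping the dates in a list slot named "other" as "YYYY-MM-DD" strings
is purely an assumption.

```r
library(methods)

# Toy stand-ins for the experiment description and its container.
setClass("ToyMIAME", slots = c(other = "list"))
setClass("ToyESet", slots = c(experimentData = "ToyMIAME"))

# Users call scanDates(); the storage location stays an internal detail
# that can change without breaking their code.
setGeneric("scanDates", function(object) standardGeneric("scanDates"))
setMethod("scanDates", "ToyESet", function(object) {
  raw <- object@experimentData@other[["scanDates"]]
  if (is.null(raw)) raw <- character(0)   # nothing recorded
  as.POSIXct(raw, format = "%Y-%m-%d", tz = "UTC")
})

es <- new("ToyESet",
          experimentData = new("ToyMIAME",
                               other = list(scanDates = c("2009-06-18",
                                                          "2009-06-19"))))
scanDates(es)

# An instance without recorded dates yields a zero-length result.
empty <- new("ToyESet", experimentData = new("ToyMIAME"))
length(scanDates(empty))
```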

If some people decide down the road that they also need hybDates or IVTdates or
other metadata, I would say we don't need that in the infrastructure -- those
who require this information can decide if they want to extend experimentData
to include these items with suitable validity checking support.

[PS -- Are we going to have to run updateObject on all serialized eSet
instances after this change?  This seems to me to be an important
consideration regarding how we address this issue.  Regarding phenoData as a
container for all sample-specific information seems to me a very
cost-effective solution to the problem under discussion -- it is not ideal
but it is extremely safe.  Perhaps the class versioning infrastructure will
minimize the need for reserialization ... I have not studied it.  But this
issue should be on the radar screen.  Class version mismatch errors are, in
my case, a major trauma related to Java programming that has influenced my
approach to software development.]