
[Bioc-devel] BioC 2.5: Added scanDates slot to Biobase's eSet class

7 messages · Patrick Aboyoun, Kasper Daniel Hansen, Henrik Bengtsson +3 more

#
Laurent,
The scan dates were singled out originally because we have encountered 
data sets at the Hutch that appear to have a scan date effect and wanted 
a location to store this information so it can be included in the 
analysis. As you mentioned, there are other variables that could be 
important as well and shouldn't be ignored.

Given that you have been actively working towards a solution of managing 
array metadata, you can help create a design that can be implemented in 
the Biobase package. Martin Morgan is currently leading this effort and 
we can start a dialog off-list (so as not to spam the rest of the 
developers with minutiae) with those who are interested to hammer out a 
solution to this problem. I think once the requirements are formally 
expressed, we can easily put together a design that meets users' needs.


Patrick
Laurent Gautier wrote:
#
I am adding my support to Laurent: I think scanDate is simply another  
column in the phenotype info, indeed something I always put in, if I  
have it available (well, actually I am usually more interested in prep  
date). Putting in a new slot seems counterintuitive to me.

Kasper
On Jun 18, 2009, at 12:07, Patrick Aboyoun wrote:
#
FYI, be careful not to blindly interpret the timestamps in the Affymetrix
CEL file headers (originating from the DAT header) as always being a
timestamp of the scan, although the Affymetrix file format labels this
DAT header field as 'Date and time of scan (padded with spaces).', cf.

  http://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/dat.html

These timestamps can be set/modified by other steps in the analysis
process, e.g.

 http://groups.google.com/group/dchip-software/browse_thread/thread/a643680fb8e8ada8

Another issue is that the header doesn't specify the time zone.
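
To make the time zone point concrete, here is a minimal sketch in Python
(not R, and not code from any Affymetrix or Bioconductor package). The
"MM/DD/YY HH:MM:SS" layout, the sample string, and the helper name
parse_scan_timestamp are assumptions for illustration only; check the
layout against your own files. The sketch keeps the parsed value
zone-less and shows how far apart two guesses at the zone end up.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of the timestamp text; verify against real headers.
ASSUMED_FORMAT = "%m/%d/%y %H:%M:%S"


def parse_scan_timestamp(raw):
    """Parse a space-padded timestamp into a naive (zone-less) datetime."""
    return datetime.strptime(raw.strip(), ASSUMED_FORMAT)


# Hypothetical header value, padded with spaces as the format describes.
stamp = parse_scan_timestamp("06/18/09 12:07:00   ")

# The header carries no zone, so the parsed value stays naive.
is_naive = stamp.tzinfo is None

# The same wall-clock reading denotes different instants depending on
# which zone one guesses (UTC versus a fixed UTC-7 offset here).
as_utc = stamp.replace(tzinfo=timezone.utc)
as_utc_minus_7 = stamp.replace(tzinfo=timezone(timedelta(hours=-7)))
gap = as_utc_minus_7 - as_utc
```

Storing the value as naive, together with a note on where it came from,
avoids baking a guessed zone into downstream analysis.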

/Henrik
On Thu, Jun 18, 2009 at 12:07 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
#
If by phenoData we want to mean 'Any random information that may or may not be phenotypic in nature', then scan date should certainly go there. However, it seems to me that up to this time we have been very careful about what goes where precisely because we didn't want to stuff random information in odd places. 

To me, the idea of having different slots with names like phenoData and assayData and featureData implies to the end user what sort of data are in there.

If we are to store non-phenotypic, non-biological data somewhere, I think it makes sense to have another slot. All the slots we have in the eSet class right now are for data that are conceptually quite different from things like 'who ran these chips' or 'what day they were run' or whatever. So putting this sort of data in with phenotypic data makes no sense to me at all.

Jim
#
Hi Jim, Laurent, Kasper, Henrik --

Quoting James MacDonald <jmacdon at med.umich.edu>:
For the second iteration of these ideas, we are aiming for a slot,
say arrayData, that addresses Jim's point, i.e., data about arrays and  
not phenotypes. Our current vision is for an abstract base class  
ArrayData, and initially a derived class with slot scanDate (it's  
difficult to know how literally to interpret this, as Henrik points  
out; we really don't want to have reserved column names in an  
AnnotatedDataFrame, no matter how mangled) and a slot for  
AnnotatedDataFrame for less structured data. There would be the  
expected subset and accessor functionalities.
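
The shape of that design can be sketched roughly as follows. This is a
Python stand-in for what would be S4 classes in Biobase, not the actual
implementation: only the names ArrayData and scanDate come from the
description above, while ScanArrayData, extra, and subset are
hypothetical, and a plain dict of columns stands in for the
AnnotatedDataFrame.

```python
from abc import ABC, abstractmethod


class ArrayData(ABC):
    """Abstract base: per-array (not per-phenotype) metadata."""

    @abstractmethod
    def __len__(self):
        """Number of arrays described."""

    @abstractmethod
    def subset(self, indices):
        """Return a new object restricted to the given array indices."""


class ScanArrayData(ArrayData):
    """Hypothetical derived class: a dedicated scan-date field plus a
    table of less structured columns (stand-in for AnnotatedDataFrame)."""

    def __init__(self, scan_date, extra=None):
        self._scan_date = list(scan_date)
        columns = {} if extra is None else extra
        self._extra = {name: list(col) for name, col in columns.items()}
        # Every extra column must have one entry per array.
        for name, col in self._extra.items():
            if len(col) != len(self._scan_date):
                raise ValueError("column %r has wrong length" % name)

    def __len__(self):
        return len(self._scan_date)

    @property
    def scan_date(self):
        # Accessor for the promoted field; no reserved column name needed.
        return list(self._scan_date)

    def extra(self, name):
        # Accessor for a less structured column.
        return list(self._extra[name])

    def subset(self, indices):
        idx = list(indices)
        return ScanArrayData(
            [self._scan_date[i] for i in idx],
            {name: [col[i] for i in idx] for name, col in self._extra.items()},
        )


arrays = ScanArrayData(
    ["2009-06-01", "2009-06-02", "2009-06-02"],
    {"operator": ["A", "B", "A"]},
)
kept = arrays.subset([0, 2])
```

Subsetting carries the scan dates and the extra columns along together,
which is the behaviour one would expect when subsetting an eSet by sample.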

As Laurent points out, we're taking a bit of a step down a slippery  
slope of additional complexity; we know we want to keep the data as  
simple as possible. The motivation for 'promoting' scanDate to a full  
slot rather than a name-mangled column in an ADF is that we think that  
we can reliably (again, modulo Henrik's observation) incorporate this  
at an early stage from the main platforms that end up at ExpressionSet  
and friends.


Martin
#
To generalize this above the level of biology etc, some annotation
data maps naturally *along the dimension* of arrays, not to each cell,
e.g. rownames() and colnames() of a matrix.  One timestamp per array
is such an attribute.  On June 7, 2009 I sent the message '[Rd]
Suggestion: Dimension-sensitive attributes' to R-devel;

  http://tolstoy.newcastle.edu.au/R/e6/devel/09/06/2043.html

suggesting a generic dimattr(<obj>, <name>) "getter" and "setter".
See the example in that message.  Maybe such a design pattern
helps here too?
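
As a rough illustration of the idea (in Python rather than R, and only
loosely modeled on the proposal), the sketch below attaches named
attributes to one dimension of a small matrix and carries them along
when columns are selected. The class AnnotatedMatrix, the explicit axis
argument, and select_columns are assumptions for this sketch; the
R-devel message defines the actual dimattr() interface.

```python
class AnnotatedMatrix:
    """Tiny matrix whose attributes can be tied to rows (axis 0)
    or columns (axis 1) and follow them through subsetting."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self._dimattrs = {0: {}, 1: {}}

    @property
    def shape(self):
        ncol = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), ncol)

    def set_dimattr(self, name, axis, values):
        # "Setter": one value per row or per column.
        values = list(values)
        if len(values) != self.shape[axis]:
            raise ValueError("need one value per element along the axis")
        self._dimattrs[axis][name] = values

    def dimattr(self, name, axis):
        # "Getter" for a dimension-sensitive attribute.
        return list(self._dimattrs[axis][name])

    def select_columns(self, cols):
        cols = list(cols)
        out = AnnotatedMatrix([[r[j] for j in cols] for r in self.rows])
        # Row attributes are unaffected by a column selection.
        for name, values in self._dimattrs[0].items():
            out._dimattrs[0][name] = list(values)
        # Column attributes are subset in step with the data.
        for name, values in self._dimattrs[1].items():
            out._dimattrs[1][name] = [values[j] for j in cols]
        return out


# Two features measured on three arrays, one timestamp per array.
m = AnnotatedMatrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
m.set_dimattr("timestamp", 1, ["t1", "t2", "t3"])
sub = m.select_columns([2, 0])
```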

Also, the MAGE people should already have spent a lot of time thinking
and designing this kind of stuff.  Maybe something there?

/Henrik

On Thu, Jun 18, 2009 at 2:57 PM, Kevin R. Coombes <krcoombes at mdacc.tmc.edu> wrote: