[Bioc-devel] RFC: eSet with two color data
Wolfgang Huber <huber at ebi.ac.uk> writes:
How can we best represent preprocessed, normalised data from a set of two- (or n-) colour arrays in an eSet-like structure? I would like to keep the intensity information of each channel, and not reduce to M-values, since that loses information. I see two options:

A) in an ExpressionSet derivative called e.g. "ExpressionSetWithColors", with ncol = n times the number of arrays, and with mandatory phenoData columns named e.g. "arrayID" and "dye".

B) in an eSet derivative with ncol = the number of arrays, and n congruent matrices in the assayData slot.

Currently I prefer A, because
- most of the infrastructure is already there and the additional work is little
- in B, the interpretation of the phenoData columns gets mushy because some columns will refer to the arrays, others to one particular sample of the n hybridised to each array, and we need additional infrastructure to resolve that.
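To make the two layouts concrete, here is a minimal sketch in plain Python (not Bioconductor code). Everything in it is an assumption made for illustration: the names `pheno_a`, `exprs_a`, `arrays_b`, `assay_b` and `channel_from_a`, the dye labels "Cy3" and "Cy5", and the toy intensities. It models two arrays, two dyes and three features, and checks that both options carry the same information.

```python
# Toy data: 3 features, 2 arrays, 2 dyes. All names and values are illustrative.

# Option A: one wide matrix with ncol = n_dyes * n_arrays, plus
# per-column phenoData carrying the mandatory "arrayID" and "dye" columns.
pheno_a = [
    {"arrayID": "a1", "dye": "Cy3"},
    {"arrayID": "a1", "dye": "Cy5"},
    {"arrayID": "a2", "dye": "Cy3"},
    {"arrayID": "a2", "dye": "Cy5"},
]
exprs_a = [  # rows = features, columns follow the order of pheno_a
    [100.0, 200.0, 110.0, 220.0],
    [50.0, 25.0, 60.0, 30.0],
    [80.0, 80.0, 90.0, 90.0],
]

# Option B: n congruent matrices (one per dye), each with ncol = n_arrays.
arrays_b = ["a1", "a2"]
assay_b = {
    "Cy3": [[100.0, 110.0], [50.0, 60.0], [80.0, 90.0]],
    "Cy5": [[200.0, 220.0], [25.0, 30.0], [80.0, 90.0]],
}


def channel_from_a(exprs, pheno, dye):
    """Extract one dye's feature-by-array matrix from the option A layout."""
    cols = [j for j, p in enumerate(pheno) if p["dye"] == dye]
    return [[row[j] for j in cols] for row in exprs]


# Both layouts hold the same intensities; only the organisation differs.
same = all(channel_from_a(exprs_a, pheno_a, d) == assay_b[d] for d in assay_b)
print(same)
```

In option A a per-dye matrix has to be assembled by selecting columns through phenoData, whereas in option B it is already stored as a unit.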
What are typical actions with such an object? I'm particularly interested in access patterns for subsetting. Is getting a matrix for each color a common thing to do?

I think the data organization of the expression values in option B (congruent matrices in assayData, one for each color) has some advantages in terms of accessing a given color efficiently. Ratios of colors are easily vectorized and fast to compute. With option A, neither operation is quite as straightforward, I think.

It is true that option B would require some amount of coding. Martin Morgan and I discussed this a bit, and we realized that one could have phenoData exactly the same as in option A. The phenoData table would have a special column (label/dye/color/colour) whose values would correspond to named matrices in assayData. The eSet extension would then handle subsetting (this is the infrastructure that would need coding).

I suspect that the efficiency difference in obtaining an expression matrix for a particular dye will make option B worth the effort.

+ seth
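The option B variant described in the reply (phenoData as in option A, with a dye column whose values name the matrices in assayData) might look roughly like the following. This is a sketch in plain Python under stated assumptions, not the eSet API: the class `TwoColorSet`, its methods `channel`, `subset_arrays` and `log_ratio`, the dye labels and the toy values are all hypothetical. In R or an array library the ratio would be a single vectorized expression; nested lists are used here only to keep the example dependency-free.

```python
import math


class TwoColorSet:
    """Hypothetical option B container: one matrix per dye, shared phenoData."""

    def __init__(self, assay, arrays, pheno):
        # The dye column of phenoData must name exactly the assayData matrices.
        if {p["dye"] for p in pheno} != set(assay):
            raise ValueError("phenoData dye values must name the assayData matrices")
        self.assay, self.arrays, self.pheno = assay, list(arrays), list(pheno)

    def channel(self, dye):
        # Direct lookup: no column selection is needed to get one dye's matrix.
        return self.assay[dye]

    def subset_arrays(self, keep):
        # Subset every congruent matrix and the phenoData rows together.
        cols = [self.arrays.index(a) for a in keep]
        assay = {d: [[row[j] for j in cols] for row in m]
                 for d, m in self.assay.items()}
        pheno = [p for p in self.pheno if p["arrayID"] in keep]
        return TwoColorSet(assay, keep, pheno)

    def log_ratio(self, num, den):
        # Element-wise log2 ratio between two congruent matrices.
        return [[math.log2(x / y) for x, y in zip(row_n, row_d)]
                for row_n, row_d in zip(self.assay[num], self.assay[den])]


# Toy data: 2 features, 2 arrays, 2 dyes (values are illustrative only).
pheno = [{"arrayID": a, "dye": d} for a in ("a1", "a2") for d in ("Cy3", "Cy5")]
assay = {"Cy3": [[100.0, 110.0], [50.0, 60.0]],
         "Cy5": [[200.0, 220.0], [25.0, 30.0]]}
es = TwoColorSet(assay, ["a1", "a2"], pheno)

sub = es.subset_arrays(["a2"])           # keeps both channels of array a2
m_values = es.log_ratio("Cy5", "Cy3")    # derived on demand; channels are kept
```

The per-channel intensities stay in the object and ratios are computed only when asked for, so nothing is lost by reducing to M-values up front.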
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org