[Bioc-devel] rownames in SummerizedExperiments

Martin Morgan · 2014-04-07T01:22:56Z

On 04/06/2014 04:21 PM, Michael Lawrence wrote:

Empirically, the row names can be duplicated, but the column names cannot.

The lack of constraint on row names is enabled by the rowData GenomicRanges, 
while the constraint on column names is introduced by the (rownames of the) 
colData DataFrame. So the lack of symmetry in the class leads to lack of 
symmetry for dimnames. The use of GenomicRanges for rows has been the subject of 
previous discussion.
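
As a rough illustration of the asymmetry, here is a base R analogue. It does not use SummarizedExperiment, GenomicRanges or DataFrame at all; it only assumes that DataFrame follows the data.frame convention for row names, as stated earlier in the thread. The variable names are made up for the example.

```r
# Base R analogue only; no Bioconductor classes are involved.

# Names on a plain vector may repeat without complaint.
x <- c(gene_A = 1, gene_A = 2, gene_B = 3)
vector_has_dups <- anyDuplicated(names(x)) > 0

# Row names on a data.frame must be unique, so construction fails.
df_result <- tryCatch(
  data.frame(value = 1:3, row.names = c("gene_A", "gene_A", "gene_B")),
  error = function(e) "error"
)
df_rejected <- identical(df_result, "error")
```

Here vector_has_dups and df_rejected both end up TRUE: the vector accepts the repeated name, the data.frame refuses it.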

It wouldn't be inconceivable to impose a uniqueness constraint on row names in 
SummarizedExperiment and set use.names=TRUE by default, or to redefine mcols(se) 
to use.names=!any(duplicated(rownames(se))). There would be performance 
consequences (how much?) and an mcols inconsistency. I think this is part of the 
same discussion as

   https://stat.ethz.ch/pipermail/bioc-devel/2014-March/005409.html

which I have not yet followed through on.
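
The "names only when unique" idea could be sketched in base R roughly as follows. attach_names_if_unique is a hypothetical helper written for this example, not an existing function, and it works on a plain vector rather than on mcols(se).

```r
# Hypothetical helper, base R only: attach names to `values` only when
# the candidate names contain no duplicates.
attach_names_if_unique <- function(values, nms) {
  # anyDuplicated() returns 0 when every name is distinct
  if (anyDuplicated(nms) == 0L) {
    names(values) <- nms
  }
  values
}

named   <- attach_names_if_unique(1:3, c("gene_A", "gene_B", "gene_C"))
unnamed <- attach_names_if_unique(1:3, c("gene_A", "gene_A", "gene_B"))
```

The duplicate check has to scan all of the names on every call, which is one place the performance cost mentioned above could come from.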

Syntax-wise, there is also

   mcols(se)[rownames(se) == "gene_D", "yellowness"]

This is more efficient (and more error-prone) than either use.names or Michael's 
suggestion.
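
A small base R sketch of how the comparison idiom behaves when names repeat. The row_names vector and the meta data.frame are invented stand-ins for rownames(se) and mcols(se), and the misspelled-name case is only one possible reading of "error-prone".

```r
# Invented stand-ins for rownames(se) and mcols(se); base R only.
row_names <- c("gene_A", "gene_D", "gene_B", "gene_D")
meta <- data.frame(yellowness = c(0.1, 0.7, 0.3, 0.9))

# Comparing names returns every row whose name matches.
all_hits <- meta[row_names == "gene_D", "yellowness"]

# A first-match lookup returns only the first such row.
first_hit <- meta[match("gene_D", row_names), "yellowness"]

# A misspelled name gives an empty result instead of an error.
typo_hits <- meta[row_names == "gene_d", "yellowness"]
```

With these values all_hits is c(0.7, 0.9), first_hit is 0.7, and typo_hits has length zero.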

Martin

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
