[Bioc-devel] read.AnnotatedDataFrame
Hi Seth,

internal representation is one part of the story, and I agree that row names are the way to go there. Another point, however, is how the user gets the information into R. At some point we need to match sample names with the sample metadata, and IMO this should already happen at the level of the text file. The closest to the row names idea is probably to take the first column in the file as the sample identifier, but this imposes a pretty strict layout on the files (maybe for some users the first column is already the row numbering...).

As far as I understand the current implementation, the default is to take the first column, and you can pass row.names=x to read.AnnotatedDataFrame, but there is also the additional sampleNames parameter, and I find this pretty confusing. So currently you can do almost everything with the function, which is good in one sense, but it can also cause mix-ups and confuse the user. If the mapping is already clear at the level of the text file, we can sit back and tell people to check their files when something isn't showing up as they expect, but currently you can do pretty stupid stuff just by setting a wrong argument, without even realizing it.

I had the impression at the Bressanone courses that for the average user the biggest obstacle is getting all the necessary data from files somewhere on the hard disk into R, and that it is important to provide a straightforward default way of doing that.

Best,
Florian

Seth Falcon wrote:
Florian Hahne <f.hahne at dkfz-heidelberg.de> writes:
I'm not sure about having the sample names as row.names, though. I think
there used to be a mandatory column "name" to store them, which I
personally liked better (in many spreadsheet programs the concept of row
names is somewhat vague...).
Interesting. The row names are special since they must often be aligned with other objects and can be used for subsetting. I have no problem with a "name" column being recognized by an import tool (aside from issues of name collisions -- what if I have a variable named "name"?). But I think performance concerns will move us away from having such a column in the actual representation of the object. What we are moving towards is a setup where the row names are stored in a separate slot and may eventually be an external vector that can be shared among other objects that need to align on that vector.

+ seth
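To illustrate the point about row names serving both alignment and subsetting, here is a minimal sketch in base R. It does not use Biobase or any AnnotatedDataFrame internals; the objects `pheno` and `expr`, the sample identifiers, and the variable names are all made up for the example.

```r
# Hypothetical sample metadata, keyed by row names rather than a "name" column.
pheno <- data.frame(
  treatment = c("ctrl", "drug", "ctrl"),
  batch     = c(1L, 1L, 2L),
  row.names = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Hypothetical expression matrix whose columns are in a different order.
set.seed(1)
expr <- matrix(rnorm(6), nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("S3", "S1", "S2")))

# Align the metadata to the matrix columns by row name.
aligned <- pheno[colnames(expr), , drop = FALSE]

# Subset both objects with the same vector of identifiers.
keep      <- c("S1", "S3")
pheno_sub <- pheno[keep, , drop = FALSE]
expr_sub  <- expr[, keep, drop = FALSE]
```

Because the identifiers live in the row names, a metadata variable that happens to be called "name" would not clash with them.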
_______________________________________________
Bioc-devel at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Florian Hahne
Abt. Molekulare Genomanalyse (B050)
Deutsches Krebsforschungszentrum (DKFZ)
Im Neuenheimer Feld 580
D-69120 Heidelberg
phone: 0049 6221 424764
fax: 0049 6221 423454
web: www.dkfz.de/mga
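The "first column is the sample identifier" default discussed in this thread can be sketched with base R's `read.table`, which accepts `row.names = 1` for exactly this purpose. This is only an illustration of the file-level mapping, not the behaviour of read.AnnotatedDataFrame itself; the file contents, the `expected` vector, and the helper variable names are assumptions made up for the example.

```r
# Hypothetical tab-delimited metadata; the first column holds sample identifiers.
txt <- "sample\ttreatment\tbatch
S1\tctrl\t1
S2\tdrug\t1
S3\tctrl\t2"

# Read the text and use the first column as row names.
pheno <- read.table(text = txt, header = TRUE, sep = "\t",
                    row.names = 1, stringsAsFactors = FALSE)

# Hypothetical identifiers expected from the assay data,
# e.g. the column names of an expression matrix.
expected <- c("S1", "S2", "S4")

# Report mismatches explicitly instead of silently mis-aligning samples.
missing_meta <- setdiff(expected, rownames(pheno))  # samples without metadata
unused_meta  <- setdiff(rownames(pheno), expected)  # metadata without samples
```

A check of this kind makes a mismatch between file and data visible right at import time, so the user can be pointed back to the text file.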