rownames, colnames, and date and time

Thu, Mar 30, 2006 1:28 AM

I haven't been following all of this thread, but
it reminds me of a bug that was in S-PLUS not
too long ago where dimnames could sometimes
be numeric.  This caused some problems that
were very hard to track down because there were
no visual clues of what was really wrong.

I've been pleased not to encounter that in R and
hope it continues.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Prof Brian Ripley wrote:

Looking at the code it occurs to me that there is another case you have 
not considered, namely dimnames().

rownames<- and colnames<- are just wrappers for dimnames<-, so consistency 
does mean that all three should behave the same.
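
As a minimal sketch of that wrapper relationship (the names m1 and m2 are illustrative, not from the thread), setting row and column names one at a time on a matrix should give the same object as a single dimnames<- call:

```r
# Two identical matrices, named in two different ways.
m1 <- matrix(1:4, nrow = 2)
m2 <- m1

# Route 1: the convenience wrappers.
rownames(m1) <- c("a", "b")
colnames(m1) <- c("x", "y")

# Route 2: one direct dimnames<- assignment.
dimnames(m2) <- list(c("a", "b"), c("x", "y"))

# Both routes end up in dimnames<-, so the results match.
print(identical(m1, m2))
```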

For arrays (including matrices), dimnames<- is primitive.  It coerces 
factors to character, and says in the C code

    /* if (isObject(val1)) dispatch on as.character.foo, but we don't
       have the context at this point to do so */

so someone considered this before now.

For data frames, dimnames<-.data.frame is used.  That calls row.names<- 
and names<-, and the first has a data.frame method.  Only the row.names<- 
method is documented to coerce its value to character, and I think it _is_ 
all quite consistent.  The basic rule is that all these functions coerce 
for data frames, and none do for arrays.
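
A small sketch of that rule, assuming a reasonably recent R (the exact strings printed may differ between versions, so no output is shown). The objects df, m and dates are illustrative, and tz = "UTC" is fixed only to keep the result independent of the local time zone:

```r
dates <- as.POSIXct(c("2001-01-24", "2005-12-25"), tz = "UTC")
df <- data.frame(a = 1:2, b = 3:4)
m  <- matrix(1:4, nrow = 2)

# Data frame: the row.names<- method coerces with as.character(),
# so the class-specific conversion of POSIXct is used.
row.names(df) <- dates
print(row.names(df))

# Matrix: dimnames<- does not dispatch on as.character(), so the
# names come from the underlying numbers, not from formatted dates.
rownames(m) <- dates
print(rownames(m))

# The two sets of names therefore differ.
print(identical(row.names(df), rownames(m)))
```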

However, there was a problematic assumption in the row.names<-.data.frame 
and dimnames<-.data.frame methods, which tested the length of 'value' 
before coercion.  That sounds reasonable, but in unusual cases such as 
POSIXlt, coercion changes the length, and I have swapped the lines around.
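
To illustrate why the order of the length test and the coercion matters, here is a sketch using an illustrative object lt. A POSIXlt value is stored as a list of components (seconds, minutes, hours, ...), so its underlying length is the number of components, not the number of date-times. Recent versions of R appear to define a length method for POSIXlt, which is why unclass() is used here to look at the stored list:

```r
lt <- as.POSIXlt(c("2001-01-24", "2005-12-25"), tz = "UTC")

# Length of the underlying list: one entry per component
# (the exact count depends on the R version).
n_components <- length(unclass(lt))
print(n_components)

# Length after coercion: one string per date-time.
n_values <- length(as.character(lt))
print(n_values)

# A length check made before coercion on the raw list would compare
# the wrong number against the number of rows.
print(n_components == n_values)
```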

What you expected was that dimnames<-() would coerce to character, 
although I can find no support for that expectation in the documentation. 
If it were not a primitive function that would be easy to achieve, but as 
it is, it would need an expert in the internal code to change.  There is 
also the risk of inconsistency, since as the comment says, the C code is 
used in places where the context is not known.  I think this is probably 
best left alone.


On Wed, 29 Mar 2006, Prof Brian Ripley wrote:

Yet again, this is the wrong list for suggesting changes to R.  Please do use 
R-devel for that purpose (and I have moved this).

If this bothers you (it all works as documented, so why not use it as 
documented?), please supply a suitable patch to the current R-devel sources 
and it will be considered.

And BTW, row.names is the canonical accessor function for data frames,
and its 'value' argument is documented differently from that for rownames for 
an array.  Cf:

Details:

   The extractor functions try to do something sensible for any
   matrix-like object 'x'.  If the object has 'dimnames' the first
   component is used as the row names, and the second component (if
   any) is used for the col names.  For a data frame, 'rownames' and
   'colnames' are equivalent to 'row.names' and 'names' respectively.

Note:

   'row.names' is similar to 'rownames' for arrays, and it has a
   method that calls 'rownames' for an array argument.

I am not sure why R decided to add rownames for the same purpose as 
row.names: eventually they were made equivalent.


On Tue, 21 Mar 2006, Erich Neuwirth wrote:

I noticed something surprising (in R 2.2.1 on WinXP).
According to the documentation, rownames and colnames are character 
vectors.
Assigning a vector of class POSIXct or POSIXlt as rownames or colnames
therefore is not strictly according to the rules.
In some cases, R performs a reasonable typecast, but in some other cases
where the same typecast also would be possible, it does not.

Assigning a vector of class POSIXct to the rownames or names of a
dataframe creates a reasonable string representation of the dates (and
possibly times).
Assigning such a vector to the rownames or colnames of a matrix produces
rownames or colnames consisting of the integer representation of the
date-time value.
Trying to assign a vector of class POSIXlt produces an error in all cases
(data frames and matrices; rownames, colnames, names).

Demonstration code is given below.

This is somewhat inconsistent.
Perhaps a reasonable solution could be that the typecast
used for POSIXct and dataframes is used in all the other cases also.

Code:

mymat<-matrix(1:4,nrow=2,ncol=2)
mydf<-data.frame(mymat)
mydates<-as.POSIXct(c("2001-1-24","2005-12-25"))

rownames(mydf)<-mydates
names(mydf)<-mydates
rownames(mymat)<-mydates
colnames(mymat)<-mydates

print(deparse(mydates))
print(deparse(rownames(mydf)))
print(deparse(names(mydf)))
print(deparse(rownames(mymat)))
print(deparse(colnames(mymat)))

mydates1<-as.POSIXlt(mydates)

# the following lines will not work and
# produce errors

rownames(mydf)<-mydates1
names(mydf)<-mydates1
rownames(mymat)<-mydates1
colnames(mymat)<-mydates1
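
One way to sidestep the differences, sketched here as a suggestion and not something proposed in the thread, is to convert the date-times to character explicitly before assigning them, so that the assignment only ever sees a character vector. The format string and tz = "UTC" are assumptions chosen for the example:

```r
mymat    <- matrix(1:4, nrow = 2, ncol = 2)
mydates  <- as.POSIXct(c("2001-01-24", "2005-12-25"), tz = "UTC")
mydates1 <- as.POSIXlt(mydates)

# Explicit conversion works the same way for POSIXct and POSIXlt.
rownames(mymat) <- format(mydates,  "%Y-%m-%d")
colnames(mymat) <- format(mydates1, "%Y-%m-%d")

print(mymat)
```

The same explicit format() call can be used for the row names and names of a data frame, which keeps the chosen date format under the user's control.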
