Non-unique column names in data frames
On Sun, 1 Apr 2007, John Fox wrote:
Dear r-devel members,

It's just been brought to my attention that R permits non-unique column names in data frames -- e.g., via assignment to names() or colnames(). This behaviour is consistent with the help files (as I discovered), but it's not consistent with the behaviour of rownames() and row.names().

?? matrices and data frames are different, but rownames() and row.names() do the same on each class.

For example,
row.names(airquality) <- rep("a", nrow(airquality))
generates an error, but
as does rownames().
names(airquality) <- rep("a", ncol(airquality))
or even
names(airquality) <- rep("", ncol(airquality))
do not.
I figure that there must be some rationale for this difference, but I can't
think of what it might be. Any thoughts?
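
The contrast described above can be reproduced with a short sketch. It works on a copy of airquality; the names `aq1`, `aq2`, `aq3` and the helper `signals_error()` are made up for this illustration and do not come from the original post. It assumes current R behaviour matches what the post reports.

```r
# Copies of the built-in dataset, so airquality itself is never modified.
aq1 <- airquality
aq2 <- airquality
aq3 <- airquality

# Hypothetical helper: TRUE if evaluating `expr` signals an error.
# suppressWarnings() hides the accompanying "non-unique value" warning.
signals_error <- function(expr) {
  tryCatch({
    suppressWarnings(expr)
    FALSE
  }, error = function(e) TRUE)
}

# Duplicate row names are rejected by both replacement functions.
row_names_fail <- signals_error(row.names(aq1) <- rep("a", nrow(aq1)))
rownames_fail  <- signals_error(rownames(aq1) <- rep("a", nrow(aq1)))

# Duplicate column names, and even empty ones, are accepted.
names_dup_fail   <- signals_error(names(aq2) <- rep("a", ncol(aq2)))
names_empty_fail <- signals_error(names(aq3) <- rep("", ncol(aq3)))

print(c(row.names = row_names_fail, rownames = rownames_fail,
        names_dup = names_dup_fail, names_empty = names_empty_fail))
```

The failed row-name assignments leave `aq1` unchanged, while `aq2` and `aq3` end up with duplicated and empty column names respectively.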
It's part of the definition of a data frame, from long ago (White Book p.60). Think of the row names as a 'primary key' in the sense of a DBMS/SQL.

Why the names are not also required to be non-empty and unique is something for the designer (and John Chambers has not (yet) replied), but it is clearly deliberate as data.frame(check.names=FALSE) is allowed.

One possible issue is that there are many ways to set names of a data frame, e.g. DF$name <- value can add a column, and checking them all could be tedious. OTOH, setting row names is centralized (it is done inside attr<-()).
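
As a rough illustration of the check.names point, the sketch below compares the default behaviour of data.frame() with check.names=FALSE and shows one way to make names unique afterwards. The object names (`d1`, `d2`, `kept`, `first_a`) are invented for the example, and make.unique() is only one possible repair, not something the post prescribes.

```r
# Default: data.frame() repairs duplicated names (check.names = TRUE).
d1 <- data.frame(a = 1:2, a = 3:4)
names(d1)                       # "a" "a.1"

# check.names = FALSE keeps the duplicated names exactly as given.
d2 <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
kept <- names(d2)
kept                            # "a" "a"

# With duplicated names, $ picks the first matching column.
first_a <- d2$a
first_a                         # 1 2

# Names can be made unique after the fact if a caller needs that.
names(d2) <- make.unique(names(d2))
names(d2)                       # "a" "a.1"
```

Code that relies on selecting columns by name may therefore want to check anyDuplicated(names(DF)) before proceeding.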
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595