Non-unique column names in data frames

Brian Ripley · 2007-04-03T07:21:31Z

On Sun, 1 Apr 2007, John Fox wrote:

> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,

?? matrices and data frames are different, but rownames() and row.names()
do the same on each class.

> > row.names(airquality)

as does rownames().

It's part of the definition of a data frame, from long ago (White Book 
p.60).  Think of the row names as a 'primary key' in the sense of a 
DBMS/SQL.
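A minimal R sketch of the asymmetry under discussion, assuming a current R session (the exact warning and error text may differ from the 2007 releases). Duplicate column names are accepted by names<-, while duplicate row names are refused, as a duplicate primary key would be. The toy data frame `df` and its values are made up for illustration.

```r
# Toy data frame; the values are arbitrary.
df <- data.frame(x = 1:3, y = 4:6)

# Duplicate column names are accepted (colnames<- on a data frame
# goes through names<- and behaves the same way).
names(df) <- c("x", "x")
print(names(df))

# Duplicate row names are refused, like a duplicate primary key.
# suppressWarnings() hides any accompanying "non-unique value" warning.
res <- tryCatch(
  suppressWarnings({
    row.names(df) <- c("r1", "r1", "r2")
    "accepted"
  }),
  error = function(e) "rejected"
)
print(res)            # "rejected"
print(row.names(df))  # unchanged automatic row names "1" "2" "3"
```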

Why the column names are not also required to be non-empty and unique
is a question for the designer (and John Chambers has not (yet) replied),
but it is clearly deliberate, as data.frame(check.names=FALSE) is allowed.
One possible issue is that there are many ways to set the names of a data
frame (e.g. DF$name <- value can add a column), and checking them all could
be tedious.  OTOH, setting row names is centralized (it is done inside
attr<-()).
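A minimal R sketch of the points made above, assuming a current R session. By default data.frame() repairs duplicate names via make.names(unique = TRUE), but keeps them when check.names = FALSE, and cbind() on data frames keeps them as well. The helper `assert_unique_names()` is hypothetical, not part of base R; it only illustrates the kind of check that would have to be re-run after every name-setting route.

```r
# Default check.names = TRUE: data.frame() repairs the names with
# make.names(unique = TRUE), so the second "a" becomes "a.1".
d1 <- data.frame(a = 1:2, a = 3:4)
print(names(d1))

# check.names = FALSE keeps the duplicated names exactly as supplied.
d2 <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
print(names(d2))

# Hypothetical helper (not in base R): stop unless every column name
# is non-missing, non-empty and unique.
assert_unique_names <- function(df) {
  nms <- names(df)
  if (any(is.na(nms) | nms == "") || anyDuplicated(nms) > 0L)
    stop("column names must be non-empty and unique")
  invisible(df)
}

df <- data.frame(x = 1:3)
df$y <- 4:6                # `$<-` with a new name appends a column "y"
assert_unique_names(df)    # passes

# cbind() on data frames does not repair names, so "x" and "y" repeat;
# the check has to be re-run after this step as well.
wide <- cbind(df, df)
ok <- tryCatch({ assert_unique_names(wide); TRUE },
               error = function(e) FALSE)
print(ok)                  # FALSE
```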

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
