Unexpected alteration of data frame column names
On Mon, 2007-05-14 at 23:59 -0700, Herve Pages wrote:
Hi,

I'm using data.frame(..., check.names=FALSE), because I want to create a data frame with duplicated column names (in the real life you can get such data frame as the result of an SQL query):

> df <- data.frame(aa=1:5, aa=9:5, check.names=FALSE)
> df
  aa aa
1  1  9
2  2  8
3  3  7
4  4  6
5  5  5

Why is [.data.frame changing my column names?

> df[1:3, ]
  aa aa.1
1  1    9
2  2    8
3  3    7

How can this be avoided? Thanks!

H.
Herve,
I had not seen a reply to your post, but you can review the code for
"[.data.frame" by using:
getAnywhere("[.data.frame")
and see where there are checks for duplicate column names in the
function.
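
As a rough sketch of what to look for: in the R sources I am familiar with, the renaming appears to be done with make.unique(), though that is an assumption about your installed version, so check the printed code yourself. The snippet below searches the deparsed function body for that call and shows what make.unique() does to a duplicated name.

```r
# Deparse the subsetting method into a character vector of source lines
src <- deparse(get("[.data.frame"))

# Lines that mention make.unique (empty if your R version does it differently)
hits <- grep("make.unique", src, fixed = TRUE, value = TRUE)
print(hits)

# What make.unique() does to a duplicated name
uniq <- make.unique(c("aa", "aa"))
print(uniq)
```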
That is going to be the default behavior for data frame
subsetting/extraction and in fact is noted in the 'ONEWS' file for R
version 1.8.0:
- Subsetting a data frame can no longer produce duplicate
column names.
So it has been around for some time (October of 2003).
In terms of avoiding it, I suspect that you would have to create your
own version of the function, perhaps with an additional argument that
enables/disables the duplicate column name check.
I have not, however, considered the broader functional implications of
doing so, so be vewwy vewwy careful here.
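
A lighter-weight alternative to rewriting "[.data.frame" might be to subset as usual and then put the original names back. This is only a sketch: subset_rows_keep_names() is a hypothetical helper, not part of R, and it assumes that all columns are kept, so the original names line up one-to-one with the result.

```r
# Hypothetical helper (not part of base R): subset rows, then restore
# the original, possibly duplicated, column names.
subset_rows_keep_names <- function(x, i) {
  out <- x[i, , drop = FALSE]  # "[.data.frame" de-duplicates the names here
  names(out) <- names(x)       # assigning names directly does not check for duplicates
  out
}

df <- data.frame(aa = 1:5, aa = 9:5, check.names = FALSE)
kept <- subset_rows_keep_names(df, 1:3)
print(names(kept))
```

Any later row subsetting of the result will de-duplicate the names again, so the names have to be restored after each such step.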
HTH,
Marc Schwartz