
merge bug fix in R 2.15.0

7 messages · Matt Dowle, Uwe Ligges, Steve Lianoglou +1 more

#
Is it intended that the first suffix can no longer be blank? Seems to be
caused by a bug fix to merge in R 2.15.0.

$Rdevel --vanilla
DF1 = data.frame(a=1:3,b=4:6)
DF2 = data.frame(a=1:3,b=7:9)
merge(DF1,DF2,by="a",suffixes=c("",".1"))
Error in merge.data.frame(DF1, DF2, by = "a", suffixes = c("", ".1")) :
  there is already a column named 'b'

$R --vanilla
R version 2.14.2 (2012-02-29)
  a b b.1
1 1 4   7
2 2 5   8
3 3 6   9
Matthew
1 day later
#
Anyone?
1 day later
#
On 15.03.2012 22:48, Matthew Dowle wrote:
Right, the user is now protected against confusing himself by using 
names that were not unique before the merge.

Uwe Ligges
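On an R build that rejects a blank first suffix, one possible workaround is to merge with a placeholder suffix and then strip it. This is a sketch, not an official API: `merge_blank_left` and the placeholder `_TMPLEFT_` are hypothetical names, and it assumes no existing column name already ends in the placeholder. It keeps the intent of the stricter check by refusing to return duplicated column names.

```r
# Hypothetical helper, not part of base R: emulate suffixes = c("", suffix_y)
# by merging with a temporary left suffix and stripping it afterwards.
merge_blank_left <- function(x, y, by, suffix_y = ".y", tmp = "_TMPLEFT_") {
  # Assumption: no existing column name already ends in 'tmp'. The default
  # contains no regex metacharacters, so it is safe in the pattern below.
  out <- merge(x, y, by = by, suffixes = c(tmp, suffix_y))
  names(out) <- sub(paste0(tmp, "$"), "", names(out))
  # Keep the protection the stricter check is meant to give:
  # never return duplicated column names.
  dups <- unique(names(out)[duplicated(names(out))])
  if (length(dups) > 0L)
    stop("duplicated column name(s) after merge: ", paste(dups, collapse = ", "))
  out
}

DF1 <- data.frame(a = 1:3, b = 4:6)
DF2 <- data.frame(a = 1:3, b = 7:9)
merge_blank_left(DF1, DF2, by = "a", suffix_y = ".1")
```

For `DF1` and `DF2` this gives columns `a`, `b`, `b.1`, matching the R 2.14.2 output above.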
1 day later
#
Hi Uwe,

2012/3/17 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
... now I'm confused :-)

If the user explicitly asks for a NULL/0/empty/whatever suffix,
they're not really going to be confusing themselves, right?

I actually feel like I do this often, where "this" is explicitly
asking to not add a suffix to one group of columns ... I do confuse
myself every now and again, but not in this context, yet.

I can see that *this* confusing case is now handled by this change
(it wasn't before):

## I'm using R-devel compiled back in November, 2011 (r57571)
R> d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
R> d2 <- data.frame(a=letters[1:10], b=101:110)
R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
   a         b.x b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
... ## Let's call this "exhibit A"

But if I do this:
R> merge(d1, d2, by='a', suffixes=c("", ".y"))

I totally expect:

   a           b b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
## Let's call this "exhibit B"
...

and not (using R-2.15.0 beta, for the exhibit B call):

Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
  there is already a column named 'b'

I can take a crack at a patch that keeps the "rescue user from surprises"
behavior outlined in "exhibit A" while also letting the user accomplish
"exhibit B", if there is consensus on this particular
world view.
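The proposed rule can be stated as: reject a suffix pair only when the merged result would contain duplicated column names. The R sketch below illustrates that under simplifying assumptions (the `by` columns share names in both inputs, and result columns are ordered as `by`, then the other x columns, then the other y columns). `merged_names`, `suffixes_ok` and `naive_ok` are hypothetical names; this is not the code in `merge.data.frame`. `naive_ok` is only a guess at the kind of check that would produce the beta's "there is already a column named 'b'" error.

```r
# Hypothetical helpers illustrating the proposed rule; not R's actual source.
# Simplifications (assumptions): the 'by' columns have the same names in x and
# y, and result columns are ordered as: by, other x columns, other y columns.
merged_names <- function(x_names, y_names, by, suffixes) {
  nx <- x_names[!x_names %in% by]
  ny <- y_names[!y_names %in% by]
  common <- intersect(nx, ny)
  # Suffixes are applied only to non-'by' names that occur in both inputs.
  nx[nx %in% common] <- paste0(nx[nx %in% common], suffixes[1])
  ny[ny %in% common] <- paste0(ny[ny %in% common], suffixes[2])
  c(by, nx, ny)
}

# Proposed rule: reject a suffix pair only if the result would have duplicated names.
suffixes_ok <- function(x_names, y_names, by, suffixes) {
  anyDuplicated(merged_names(x_names, y_names, by, suffixes)) == 0L
}

# For contrast, a guessed naive rule: reject whenever a suffixed x name is
# already one of x's column names. A blank suffix always fails this test.
naive_ok <- function(x_names, y_names, by, suffixes) {
  nx <- x_names[!x_names %in% by]
  common <- nx[nx %in% y_names]
  !any(paste0(common, suffixes[1]) %in% x_names)
}

xa <- c("a", "b", "b.x")  # names(d1)
ya <- c("a", "b")         # names(d2)
suffixes_ok(xa, ya, "a", c(".x", ".y"))  # exhibit A: two columns named "b.x"
suffixes_ok(xa, ya, "a", c("", ".y"))    # exhibit B: a, b, b.x, b.y
naive_ok(xa, ya, "a", c("", ".y"))
```

Under these assumptions the first call returns `FALSE` and the second `TRUE`, so the proposed rule still catches exhibit A while allowing exhibit B; the naive rule returns `FALSE` for exhibit B.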

-steve
#
Hi,

I'm not sure I follow ... I think we're in total agreement, but it
sounds like you're suggesting we aren't.

On Sun, Mar 18, 2012 at 4:40 PM, Peter Meilstrup
<peter.meilstrup at gmail.com> wrote:
[snip]
I agree, that is confusing -- where did this happen?
Total agreement here.
But it didn't "still use '.x'" ... it didn't do anything.

There was a column name in the original table that ended with '.x' and
it wasn't changed since the call to merge asked for a blank suffix.
These were the two data.frames, for reference:

d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
d2 <- data.frame(a=letters[1:10], b=101:110)

If you had those two data.frames, and you did this:

merge(d1, d2, by='a', suffixes=c("", ".y"))

How is the following result surprising?

   a           b b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
...
I agree that the rule should be simple. I'm not sure why asking for a
blank ("") suffix somehow isn't simple.
What was "the surprising name change" you are referring to?

-steve