unexpected sort order with merge

3 messages · Johann Hibschman, Brad Patrick Schneid

#
`merge` returns rows sorted as if by character, not by the actual class of the
by-columns.
levels=c("b","a")),
                          x=1:5),
               data.frame(f=ordered(c("a","b"),
                                    levels=c("b","a")),
                          y=c(10,20)))
  f x  y
1 a 1 10
2 a 4 10
3 b 2 20
4 b 3 20
5 b 5 20

  f x  y
3 b 2 20
4 b 3 20
5 b 5 20
1 a 1 10
2 a 4 10

I expected the second order, not the first.
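
For anyone wanting to reproduce this, here is a minimal sketch. The two data frames are an assumption reconstructed from the printed output above (the names `left` and `right` are made up), and the explicit `order()` call is just one possible way to get the second ordering, not necessarily how it was produced originally.

```r
# Data reconstructed from the printed output; an assumption, not the original code.
left  <- data.frame(f = ordered(c("a", "b", "b", "a", "b"), levels = c("b", "a")),
                    x = 1:5)
right <- data.frame(f = ordered(c("a", "b"), levels = c("b", "a")),
                    y = c(10, 20))

# Ordering by the factor itself follows the declared levels (b before a) ...
by_levels <- order(left$f)
# ... while ordering by its character form is alphabetical (a before b).
by_labels <- order(as.character(left$f))

# Merge without relying on merge()'s own sort, then order by the key's class.
m <- merge(left, right, by = "f", sort = FALSE)
m <- m[order(m$f, m$x), ]
rownames(m) <- NULL
print(m)
```

Sorting explicitly after the merge keeps the result independent of whatever `merge` does internally with the key column.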

I actually ran into this issue when merging zoo yearmon columns, but
that adds a package dependency.  In that context, I observed different
behavior depending on whether I had one key or two:
      date icpn foo bar
1 Apr 2000  500   4  40
2 Feb 2000  500   2  20
3 Jan 2000  500   1  10
4 Jun 2000  500   6  60
5 Mar 2000  500   3  30
6 May 2000  500   5  50

      date foo bar
1 Jan 2000   1  10
2 Feb 2000   2  20
3 Mar 2000   3  30
4 Apr 2000   4  40
5 May 2000   5  50
6 Jun 2000   6  60

The first example appears to sort by the name of the date, not by the
actual date value.
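
The difference between the two orderings can be illustrated without zoo. This is only a sketch: plain `Date` values and `month.abb` labels stand in for the yearmon column (an assumption, so that nothing beyond base R is needed), and the variable names are made up.

```r
# Stand-in for the yearmon column: real dates plus their "Mon YYYY" labels.
dates  <- as.Date(sprintf("2000-%02d-01", 1:6))
labels <- paste(month.abb[1:6], "2000")

# Ordering by the date values is chronological: Jan, Feb, Mar, Apr, May, Jun.
print(labels[order(dates)])

# Ordering by the printed labels is alphabetical: Apr, Feb, Jan, Jun, Mar, May.
print(labels[order(labels)])
```

The alphabetical result matches the row order of the two-key merge output.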

The documentation of `merge` says the sort is "lexicographic", but I
assumed that was in the cartesian-product sense, not in some
convert-everything-to-character sense.
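
To make the two readings of "lexicographic" concrete, here is a small sketch with made-up keys (`key1`, `key2` are illustrative names only). It compares `order()` applied to the columns as they are with `order()` applied to the keys pasted into one string; it does not claim to show what `merge` does internally.

```r
key1 <- ordered(c("a", "b", "a", "b"), levels = c("b", "a"))
key2 <- c(10, 9, 2, 1)

# Tuple-wise: compare key1 by its levels, break ties with key2 as a number.
tuple_order <- order(key1, key2)

# Character-wise: paste the keys into one string and compare the strings.
# method = "radix" makes the string comparison locale-independent (C locale).
string_order <- order(paste(key1, key2), method = "radix")

print(data.frame(key1, key2)[tuple_order, ])
print(data.frame(key1, key2)[string_order, ])
```

In the tuple-wise result the "b" rows come first and 2 sorts before 10; in the character-wise result the "a" rows come first and "a 10" sorts before "a 2".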

Is this behavior expected?

Thanks,
Johann


P.S.
R version 2.10.1 (2009-12-14) 
x86_64-unknown-linux-gnu 

locale:
[1] C

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] ggplot2_0.8.8   reshape_0.8.3   Rauto_1.0       plyr_1.1       
[5] zoo_1.6-4       Hmisc_3.7-0     survival_2.35-8 ascii_0.7      
[9] proto_0.3-8    

loaded via a namespace (and not attached):
[1] cluster_1.12.1  digest_0.4.2    lattice_0.17-26 tools_2.10.1
#
That is odd. I noticed some weird sorting with merge() a while back too, and
I am always careful with it now.  Fortunately, sort=FALSE seems to work the
way one would expect most of the time.

The following results seem weird too, though: adding by="date" avoids the
odd sorting, regardless of whether sort is TRUE or FALSE.
      date icpn.x foo icpn.y bar
1 Jan 2000    500   1    500  10
2 Feb 2000    500   2    500  20
3 Mar 2000    500   3    500  30
4 Apr 2000    500   4    500  40
5 May 2000    500   5    500  50
6 Jun 2000    500   6    500  60

      date icpn.x foo icpn.y bar
1 Jan 2000    500   1    500  10
2 Feb 2000    500   2    500  20
3 Mar 2000    500   3    500  30
4 Apr 2000    500   4    500  40
5 May 2000    500   5    500  50
6 Jun 2000    500   6    500  60

      date icpn.x foo icpn.y bar
1 Jan 2000    500   1    500  10
2 Feb 2000    500   2    500  20
3 Mar 2000    500   3    500  30
4 Apr 2000    500   4    500  40
5 May 2000    500   5    500  50
6 Jun 2000    500   6    500  60

      date icpn foo bar
1 Apr 2000  500   4  40
2 Feb 2000  500   2  20
3 Jan 2000  500   1  10
4 Jun 2000  500   6  60
5 Mar 2000  500   3  30
6 May 2000  500   5  50



#
B77S <bps0002 at auburn.edu> writes:
Thanks for checking.  Is this on a more recent version of R than 2.10.1?
(I'm half-hoping this is something fixed in a newer R, so I can use it
as an excuse to demand an upgrade.)
[...]
I think this is equivalent to the "single column" version.  For yearmon
objects, when `merge` has only one column to sort by, it seems to do the
right thing.  It only falls back to alphabetical order when there is more
than one column.  For ordered factors, though, even the single-column merge
was giving me strange sorts.
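
One way to sidestep the question for any number of key columns is to merge unsorted and then sort explicitly. The following is a minimal sketch under stated assumptions: `merge_sorted` is a hypothetical helper, not part of base R or zoo, and an ordered-factor month column stands in for yearmon so that the sketch runs without zoo.

```r
# Hypothetical helper: merge, then sort by the key columns using their own
# classes (via order()) instead of relying on merge()'s sort.
merge_sorted <- function(x, y, by) {
  m <- merge(x, y, by = by, sort = FALSE)
  keys <- unname(as.list(m[by]))          # key columns, in the order given
  m <- m[do.call(order, keys), , drop = FALSE]
  rownames(m) <- NULL
  m
}

# Stand-in data: months as an ordered factor in calendar order.
mon <- function(i) factor(month.abb[i], levels = month.abb[1:6], ordered = TRUE)
a <- data.frame(date = mon(c(3, 1, 2, 6, 5, 4)), icpn = 500,
                foo = c(3, 1, 2, 6, 5, 4))
b <- data.frame(date = mon(1:6), icpn = 500, bar = (1:6) * 10)

res <- merge_sorted(a, b, by = c("date", "icpn"))
print(res)
```

Because the final ordering comes from `order()` on the merged key columns, it follows the factor levels (or, for a numeric-backed class, the underlying values) regardless of how many keys are used.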

-Johann


P.S. Just in case people have bad threading on their mail/news reader, here's
the "bad sort" example: