Sort problem with merge (again)

3 messages · Bruce LaZerte, Gabor Grothendieck, Brian Ripley

#
# R version 2.3.1 (2006-06-01) Debian Linux "testing"

# Is the following behaviour a bug, feature or just a lack of
# understanding on my part? I see that this was discussed here
# last March with no apparent resolution.

d <- as.factor(c("1970-04-04","1970-08-11","1970-10-18"))
x <- c(9,10,11)
ch <- data.frame(Date=d,X=x)

d <- as.factor(c("1970-06-04","1970-08-11","1970-08-18"))
y <- c(109,110,111)
sp <- data.frame(Date=d,Y=y)

df <- merge(ch,sp,all=TRUE,by="Date")
# the rows with dates missing all ch vars are tacked on the end.
# the rows with dates missing all sp vars are sorted in with
# the row with a date with vars from both ch and sp
# is.ordered(df$Date) returns FALSE

# The rows of df are not sorted as they should be as sort=TRUE
# is the default. Adding sort=TRUE does nothing.
# So try this:
# dd <- df[order(df$Date),]
# But that doesn't work.
# Nor does sort(df$Date)
# But sort(as.vector(df$Date)) does work.
# As does order(as.vector(df$Date)), so this works:
dd <- df[order(as.vector(df$Date)),]
# ?????
#
If you want it to act like a date, store it as a Date:

dx <- as.Date(c("1970-04-04","1970-08-11","1970-10-18")) ###
x <- c(9,10,11)
ch <- data.frame(Date=dx,X=x)

dy <- as.Date(c("1970-06-04","1970-08-11","1970-08-18")) ###
y <- c(109,110,111)
sp <- data.frame(Date=dy,Y=y)

merge(ch, sp, all = TRUE)

By the way, you might consider using zoo objects here:

library(zoo)
chz <- zoo(x, dx)
spz <- zoo(y, dy)
merge(chz, spz)

See:
vignette("zoo")
On 9/25/06, Bruce LaZerte <bdl at fwr.on.ca> wrote:
#
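On Mon, 25 Sep 2006, Bruce LaZerte wrote:

> # Is the following behaviour a bug, feature or just a lack of
> # understanding on my part? I see that this was discussed here
> # last March with no apparent resolution.
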
Reference?  It is the third alternative.  A factor is sorted by its codes: 
consider
[1] 1 2 3
Levels: 3 2 1
[1] 3 2 1
Levels: 3 2 1

and that is what is happening here: for your example the levels of df$Date 
are
[1] "1970-04-04" "1970-08-11" "1970-10-18" "1970-06-04" "1970-08-18"

so the result is sorted correctly.
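
The input lines that produced the two printed factors above do not appear in the archived message. A minimal sketch that reproduces that output, and the ordering seen in the merged data, might look like this. It assumes a factor built with reversed levels; the names x, lev and d are illustrative and not from the thread:

```r
# Assumed reconstruction: a factor whose levels are listed in reverse.
x <- factor(1:3, levels = 3:1)
x                # values 1 2 3, levels 3 2 1
as.integer(x)    # the internal codes: 3 2 1
sort(x)          # sorted by code, so the values come out as 3 2 1

# A factor with the same level order as df$Date in the example.
lev <- c("1970-04-04", "1970-08-11", "1970-10-18",
         "1970-06-04", "1970-08-18")
d <- factor(lev, levels = lev)

order(d)             # 1 2 3 4 5: follows the level order
order(as.vector(d))  # 1 4 2 5 3: lexicographic order of the strings
```

This is why order(as.vector(df$Date)) gave the expected row order in the original post while order(df$Date) did not: the first compares the strings, the second compares the integer codes.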

If you want to sort a character column in lexicographic order, don't make 
it into a factor. Similarly for a date column: use class "Date".
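
As a sketch of that advice, a column that is already a factor can be converted before sorting. This assumes the factor labels are dates written as yyyy-mm-dd, as in the example; lev, d, chr and dat are illustrative names:

```r
lev <- c("1970-04-04", "1970-08-11", "1970-10-18",
         "1970-06-04", "1970-08-18")
d <- factor(lev, levels = lev)

# Go through character so the labels, not the codes, are used.
chr <- as.character(d)
dat <- as.Date(chr)    # yyyy-mm-dd strings are parsed by default

sort(chr)              # lexicographic order of the strings
sort(dat)              # chronological order
order(dat)             # an index that can be used to reorder data frame rows
```

For yyyy-mm-dd strings the lexicographic and chronological orders coincide; for other date formats only the Date version would sort chronologically.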