[... ]
Well, sure but that is because it happens to be a list with each element
having length one. In which case, it really should not have been a list at
all, and the fact that it was seems a deeper problem that should likely be
resolved instead of treating the symptom, in my opinion.
I wouldn't mind it explicitly failing on the ground that you don't join a
list column on a character column, and I wouldn't mind it succeeding
either, because it's consistent with `c("a", "b") == list("a", "b")` and
`c("a", "b") %in% list("a", "b")` returning `c(TRUE, TRUE)`. But I feel
strongly that it shouldn't behave differently depending on which data frame
is provided first to the function, and I do think that if we do make it an
error, it is worth making it understandable.
> df1 <- data.frame(a=1)
df2 <- data.frame(b=2)
df1$id <- "ID"
df2$id <- list(c("ID", "ID2"))
merge(df1, df2)
[1] id a b
<0 rows> (or 0-length row.names)
Thats probably not what you wanted it to do, right? Or maybe it is, it
depends, right?. And therein lies the rub.
I have to be honest, as a developer, I really wish this, even in your
example case, threw an error. Anything else just looks to me like a
debugging nightmare looming in the wings waiting to strike.
What I did wrong in my real case, to provide context, is compute `df2$id <-
lapply(x, fun)`, which was a mistake, but looked ok when printing, `vapply`
solved the issue, `sapply` would still have been problematic because
`df2$id` would be an emply list for a `x` of length 0.
After correcting my mistake I tried to isolate the error and had trouble
reproducing it with my simple case because I was inverting both data frames
argument. This is how the inconsistency + cryptic message caused me more
trouble than I think it should have.
Imagine that I can have production code work for years with `merge(df1,
df2)`, maybe not written by me, I change it to `merge(df2, df1)` for some
reason and all breaks loose with `Error in sort.list(bx[m$xi]): 'x' must be
atomic for 'sort.list', method "shell" and "quick"`. If I'm not familiar
with list columns and that they can print just like character columns I
might have a rough day.
Here's another oddity that I think is worth fixing :
df1 <- data.frame(a=1, id = "ID")
df3 <- data.frame(c=character(), id = list())
merge(df3, df1)
#> [1] x[FALSE, ] a id
#> <0 lignes> (ou 'row.names' de longueur nulle)
merge(df1, df3)
#> [1] a id y[FALSE, ]
#> <0 lignes> (ou 'row.names' de longueur nulle)
[...]
There's no reason (in principle) you wouldn't be able to join by a list
column, they should just both have to be list columns, in my ideal (but
admittedly unlikely) world. Id rather the atomic-vector/list mismatch case
throw an error, myself.
The doc does say that "This is intended to work with data frames with
vector-like columns" in a note at the bottom, so anything we do is
consistent with the doc, and fine by me if it fails (that's how {dplyr}
joins work), but let the order of the data frames not matter. A warning is
another option.
Now I kind of doubt we can change the behavior that works now, but as Avi
points out, I think this is something that is complicated and case specific
enough that it really ought to be your job as the coder to take care of
what should happen when you try to merge on columns that are fundamentally
different types.
Well yes, one can always say it's the developer's fault, but we all
appreciate a software that guides us toward the light. List columns are not
a rare thing at all anymore and using an `lapply` call instad of `sapply`
or `vapply` is probably not a rare mistake. And again, the inconsistency is
wrong in any case.
I'll read other answers when I get the digest.
Thanks,
Antoine