
problems when merging two data sets

I quite agree with Jeff Newmiller and Bert Gunter.

The error you get ("'by' must specify a uniquely valid column") is a
very common one when the merge function is misused, typically because a
name given in `by` (or `by.x`/`by.y`) does not match exactly one column
of the corresponding data set. Nevertheless, merge is the right function
for the job. Have you read its help page, which you can open with
`?merge`? That is always a good place to start.
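A minimal way to reproduce the error, using two made-up data frames `df1`
and `df2` (hypothetical names) whose key columns are spelled differently
(`id` vs `ID`). Since `by = "id"` names a column that exists in df1 but
not in df2, merge() stops:

```r
# Hypothetical data frames: the key is "id" in df1 but "ID" in df2
df1 <- data.frame(id = 1:3, a = c("x", "y", "z"))
df2 <- data.frame(ID = 2:4, b = c(TRUE, FALSE, TRUE))

# by = "id" is looked up in both data frames; it is missing from df2
# (column names are case-sensitive), so merge() signals an error
msg <- tryCatch(merge(df1, df2, by = "id"),
                error = function(e) conditionMessage(e))
print(msg)
```

So the first thing to check is `names(dataset1)` and `names(dataset2)`
against the names you pass to merge.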

Here is the usage of the data frame method, as shown on that help page:

`merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by,
all = FALSE, all.x = all, all.y = all, sort = TRUE,
suffixes = c(".x", ".y"), no.dups = TRUE, incomparables = NULL, ...)`
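Note the default `by = intersect(names(x), names(y))`: if you omit `by`,
merge joins on every column name the two data frames share. A small
sketch with made-up data frames `people` and `scores` (hypothetical) that
share only the column `id`:

```r
# Hypothetical data: the only column name common to both is "id"
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

# With 'by' omitted, merge() uses intersect(names(people), names(scores)),
# i.e. "id", and (all = FALSE by default) keeps only ids found in both
res <- merge(people, scores)
print(res)
```

If the data sets happen to share other column names too, they are all
used as keys, which is rarely what you want; giving `by` (or `by.x` and
`by.y`) explicitly avoids that surprise.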

For your problem, you probably need only four arguments:

`merge(x = dataset1, y = dataset2, by.x = "key1", by.y = "key2")`

In this example, "key1" is the name of the column in dataset1 whose
values should be matched against those of column "key2" in dataset2;
likewise, "key2" is the name of the matching column in dataset2.
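A minimal sketch of that call with made-up contents for dataset1 and
dataset2 (the data are hypothetical; only the column names follow the
example):

```r
# Hypothetical data sets: the key is "key1" in dataset1, "key2" in dataset2
dataset1 <- data.frame(key1 = c("a", "b", "c"), value1 = c(1, 2, 3))
dataset2 <- data.frame(key2 = c("b", "c", "d"), value2 = c(20, 30, 40))

# Rows are paired where dataset1$key1 equals dataset2$key2; only keys
# present in both ("b" and "c") are kept, and the key column in the
# result takes its name from x ("key1")
res <- merge(x = dataset1, y = dataset2, by.x = "key1", by.y = "key2")
print(res)
```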

Again, read the help page to understand the other arguments. I would
especially advise you to look at suffixes, all.x and all.y, which will
help you do exactly what you want.
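For instance, a sketch with made-up data in which both data sets also
carry a non-key column called `value` (hypothetical). `all.x = TRUE`
keeps every row of dataset1 and fills the missing dataset2 values with
NA, while `suffixes` controls how the clashing `value` columns are
renamed (the suffix strings ".ds1"/".ds2" are arbitrary):

```r
# Hypothetical data: both data sets have a non-key column named "value"
dataset1 <- data.frame(key1 = c("a", "b", "c"), value = c(1, 2, 3))
dataset2 <- data.frame(key2 = c("b", "c", "d"), value = c(20, 30, 40))

# all.x = TRUE: keep all rows of dataset1 (key "a" has no match -> NA);
# suffixes: rename the duplicated "value" columns from x and y
res <- merge(dataset1, dataset2, by.x = "key1", by.y = "key2",
             all.x = TRUE, suffixes = c(".ds1", ".ds2"))
print(res)
```

Setting `all.y = TRUE` instead keeps every row of dataset2, and
`all = TRUE` keeps unmatched rows from both.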

Cheers,

Francois COLLIN
On 05/02/2019 19:49, Bert Gunter wrote: