Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations:
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result
id item1.x item2.x item1.y item2.y
1 1 NA 1 NA NA
2 2 NA NA 2 NA
3 3 3 NA NA 3
4 4 4 4 4 4
5 5 5 5 5 5
merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result
id item1 item2
1 4 4 4
2 5 5 5
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
id item1 item2
1 1 NA 1
2 1 NA NA
3 2 2 NA
4 2 NA NA
5 3 3 NA
6 3 NA 3
7 4 4 4
8 5 5 5
9 6 6 NA
In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated.
Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital
merge data frames with same column names of different lengths and missing values
4 messages · Steven Lubitz, Dieter Menne, Domenico Vistocco +1 more
Steven Lubitz <slubitz1 <at> yahoo.com> writes:
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5) y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
....
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated
and the NA values are
retained - I instead want one row per ID id item1 item2 1 1 NA 1 2 1 NA NA 3 2 2 NA 4 2 NA NA 5 3 3 NA 6 3 NA 3 7 4 4 4 8 5 5 5 9 6 6 NA
I think you only got the wrong (too complex) function. Try rbind(x,y) Dieter
Steven Lubitz wrote:
Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations:
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result
id item1.x item2.x item1.y item2.y
1 1 NA 1 NA NA
2 2 NA NA 2 NA
3 3 3 NA NA 3
4 4 4 4 4 4
5 5 5 5 5 5
merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result
id item1 item2
1 4 4 4
2 5 5 5
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID
id item1 item2
1 1 NA 1
2 1 NA NA
3 2 2 NA
4 2 NA NA
5 3 3 NA
6 3 NA 3
7 4 4 4
8 5 5 5
9 6 6 NA
You should obtain the desired solution using:
merge(y, x, by=c("id","item1","item2"), all=TRUE)
In database terminology all=TRUE corresponds to the full outer join,
all.x to the left outer join and all.y to the right outer join.
Ciao,
domenico
In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated. Steve Lubitz Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090307/410492a4/attachment-0002.pl>