compare two data frames of different dimensions and only keep unique rows
No, but I tried your way too. In fact, the only three unique rows are these ones: Product Price Nbr.Lots Cocoa 2440 5 Cocoa 2450 1 Cocoa 2440 6 Here is a dirty working trick I found :
df<-merge(exportfile,reported,all.y=T) df1<-merge(exportfile,reported) dff1<-do.call(paste,df) dff<-do.call(paste,df) dff1<-do.call(paste,df1) df[!dff %in% dff1,]
Product Price Nbr.Lots 3 Cocoa 2440 5 4 Cocoa 2450 1 ? My two problems are : I do think it is not so a clean code, then I won't know by advance which of my two df will have the greates dimension (I can add some lines to deal with it, but again, seems very heavy). I hoped I could find a better solution. A2CT2 Ltd. -----Original Message----- From: jim holtman [mailto:jholtman at gmail.com] Sent: lundi 27 f?vrier 2012 18:42 To: Arnaud Gaboury Cc: r-help at r-project.org Subject: Re: [R] compare two data frames of different dimensions and only keep unique rows is this what you want:
v <- rbind(reported, exportfile) v[!duplicated(v), ]
Product Price Nbr.Lots 1 Cocoa 2331.00 -61 2 Cocoa 2356.00 -61 3 Cocoa 2440.00 5 4 Cocoa 2450.00 1 6 Coffee C 204.55 40 7 Coffee C 205.45 40 5 GC 17792.00 -1 10 Sugar No 11 24.81 -1 8 ZS 1273.50 -1 9 ZS 1276.25 1 13 Cocoa 2440.00 6
On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury <arnaud.gaboury at a2ct2.com> wrote:
Dear list,
I am still struggling with something that should be easy: I compare two data frames with a lot of common rows and want to keep only rows that are NOT in both data frames, unique.
Here are an example of these data frame.
reported <-
structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L,
5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"),
class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45,
17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L,
40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"),
row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class =
"data.frame")
exportfile <-
structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C",
"Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356,
2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots =
c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price",
"Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")
I can rbind() them, thus resulting in one data frame with duplicated
row, but I have no idea how to delete duplicated rows. I have tried
plyaing with unique(), duplicated with no success
v<-rbind(exportfile,reported)
v <-
structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C",
"Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa",
"Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS",
"ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81,
1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81,
1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1,
-61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product",
"Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "11", "21", "31", "41", "61", "71", "51", "10", "81", "91"),
class = "data.frame")
TY for your help
Arnaud Gaboury
A2CT2 Ltd.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.