
difference of two data frames

It would help if both data frames were indexed by a unique identifier, 
for example in their rownames.
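
If such an identifier exists, the difference reduces to a set operation 
on the rownames. A minimal sketch, assuming DF1 and DF2 look like the 
example data used further down and that V1 happens to be unique (the 
"id" prefix and the name only_in_DF1 are made up for illustration):

```r
# Hypothetical data mirroring the example in this post.
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6], stringsAsFactors = FALSE)
DF2 <- DF1[1:3, ]

# Assumption: V1 is a unique identifier, so it can serve as the row name.
rownames(DF1) <- paste0("id", DF1$V1)
rownames(DF2) <- paste0("id", DF2$V1)

# Rows of DF1 whose identifier does not occur in DF2.
only_in_DF1 <- DF1[setdiff(rownames(DF1), rownames(DF2)), ]
only_in_DF1
```

Indexing by the result of setdiff() works here only because the 
rownames themselves are the identifiers.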

Without such an identifier, you could try the same approach that 
duplicated() uses, "pasting together a character representation of 
rows", with "|" (or any other separator that does not occur in the 
data) between the fields.

    keys1 <- apply(DF1, 1, paste, collapse="|")
    keys1
    [1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
    duplicated(keys1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE

    keys2 <- apply(DF2, 1, paste, collapse="|")
    keys2
    [1] "1|a" "2|b" "3|c"
    duplicated(keys2)
    [1] FALSE FALSE FALSE

The duplicated() check is necessary to make sure that the generated keys 
are truly unique within each data frame. You might want to experiment 
and see whether a few columns are enough to create a unique key.
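
One way to run that experiment is a small helper that builds the key 
from a chosen set of columns and reports whether it is unique. This is 
only a sketch; is_unique_key is a made-up name, and DF1 is assumed to 
look like the example data:

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6], stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if the chosen columns identify every row uniquely.
is_unique_key <- function(df, cols, sep = "|") {
  # Paste the selected columns together row by row.
  keys <- do.call(paste, c(unname(as.list(df[cols])), sep = sep))
  # anyDuplicated() returns 0 when no key occurs more than once.
  anyDuplicated(keys) == 0L
}

is_unique_key(DF1, "V1")
is_unique_key(DF1, c("V1", "V2"))
```

If a single column already gives TRUE, there is no need to paste the 
remaining columns into the key.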


    keys1 %in% keys2
    [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

    # logical index: rows of DF1 whose key does not occur in DF2
    w <- !( keys1 %in% keys2 )
    DF1[ w, ]
       V1 V2
    4  4  d
    5  5  e
    6  6  f
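
The steps above could be wrapped into a small function. The sketch below 
builds the keys with do.call(paste, ...) instead of apply(), because 
apply() first converts the data frame to a matrix and, as far as I know, 
that can pad numeric columns to a common width, so that keys from two 
data frames of different size no longer match. The names row_keys and 
rows_only_in_first are made up for illustration:

```r
# Hypothetical helper: one character key per row, fields joined by `sep`.
row_keys <- function(df, sep = "|") {
  do.call(paste, c(unname(as.list(df)), sep = sep))
}

# Hypothetical helper: rows of `a` whose key does not occur in `b`.
# Assumes both data frames have the same columns in the same order.
rows_only_in_first <- function(a, b) {
  a[!(row_keys(a) %in% row_keys(b)), , drop = FALSE]
}

DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6], stringsAsFactors = FALSE)
DF2 <- DF1[1:3, ]
rows_only_in_first(DF1, DF2)
```

As before, the separator should be one that does not occur in the data, 
otherwise different rows could produce the same key.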

Regards, Adai
joseph wrote: