Message-ID: <541199.45654.qm@web56004.mail.re3.yahoo.com>
Date: 2009-05-29T18:48:49Z
From: Jason Rupert
Subject: Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
I believe I am using the improved version of setdiff(...) that handles data.frames, so I expected some odd behavior, but this case escapes me.
It appears that the addition of duplicate entries is not caught by setdiff(...). Is this expected behavior?
If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
Thanks in advance for any feedback.
Test1_DF<-data.frame(HouseSize=c(1:100))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
integer(0)
setdiff(Test2_DF, Test1_DF)
integer(0)
However,
Test3_DF<-data.frame(HouseSize=c(1:25))
setdiff(Test1_DF, Test3_DF)
[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
[17] 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
[33] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[49] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
[65] 90 91 92 93 94 95 96 97 98 99 100
setdiff(Test3_DF, Test1_DF)
integer(0)
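
For what it is worth, here is a rough sketch of one way duplicate rows might be picked up, since set operations such as base R's setdiff(...) work on unique values and so would not be expected to notice repeats. It uses only base R (duplicated, table, factor); the names dup_rows, vals, count1, count2 and extra are made up for illustration, and I have not checked how the "improved" setdiff(...) mentioned above behaves.

```r
# Rebuild the data frames from the example above.
Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)

# duplicated() flags every row that repeats an earlier row,
# so this keeps only the second and later copies.
dup_rows <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]
nrow(dup_rows)

# Alternative: compare how often each value occurs in the two data frames.
# Using a shared set of factor levels keeps the two count tables aligned.
vals   <- sort(unique(c(Test1_DF$HouseSize, Test2_DF$HouseSize)))
count1 <- table(factor(Test1_DF$HouseSize, levels = vals))
count2 <- table(factor(Test2_DF$HouseSize, levels = vals))

# Values that appear more often in Test2_DF than in Test1_DF.
extra <- vals[as.vector(count2) > as.vector(count1)]
length(extra)
```

The first approach only looks within a single data frame; the count comparison is the one that reports values occurring more often in one data frame than in the other.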