Message-ID: <541199.45654.qm@web56004.mail.re3.yahoo.com>
Date: 2009-05-29T18:48:49Z
From: Jason Rupert
Subject: Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified

I think I am using the improved version of setdiff(...) that handles data.frames, so I expected some odd behavior, but this one is escaping me.

It appears that the addition of duplicate entries is not caught by setdiff(...).  Is this expected behavior?

If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames? 

Thanks in advance for any feedback. 

Test1_DF<-data.frame(HouseSize=c(1:100))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
integer(0)
setdiff(Test2_DF, Test1_DF)
integer(0)

However, 
Test3_DF<-data.frame(HouseSize=c(1:25))
setdiff(Test1_DF, Test3_DF)
 [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
[17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
[33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
[49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
[65]  90  91  92  93  94  95  96  97  98  99 100

setdiff(Test3_DF, Test1_DF)
integer(0)
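
For reference, a minimal sketch of one way duplicate rows might be picked up, assuming only base R. Set operations compare distinct values, so repeated rows would not show up in a set difference; duplicated() and a per-row count do see them. The helper row_key is a hypothetical name introduced here for illustration and is not part of the setdiff(...) version used above.

```r
# Rebuild the data frames from the example above.
Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)

# duplicated() flags every row that repeats an earlier row.
dup_rows <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]
nrow(dup_rows)  # 100

# Hypothetical helper: collapse each row into a single string key
# so that rows can be counted with table().
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Count how often each row occurs in each data frame, then compare.
counts1 <- table(row_key(Test1_DF))
counts2 <- table(row_key(Test2_DF))
extra <- counts2[names(counts1)] - counts1  # extra copies in Test2_DF
all(extra == 1)  # TRUE
```

Comparing counts rather than sets is the design choice here: it reports how many extra copies of each row the second data frame holds, which a set difference cannot express.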