
Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified

2 messages · Jason Rupert, G. Jay Kerns

#
I think I am using the improved version of setdiff(...) that handles data frames, so I expected some odd behavior, but this one escapes me.

It appears that the addition of duplicate entries is not caught by setdiff(...). Is this expected behavior?

If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames? 

Thanks in advance for any feedback. 

Test1_DF <- data.frame(HouseSize = c(1:100))
Test2_DF <- rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
# integer(0)
setdiff(Test2_DF, Test1_DF)
# integer(0)

However, with a data frame holding only the first 25 values, the missing values are reported:
Test3_DF <- data.frame(HouseSize = c(1:25))
setdiff(Test1_DF, Test3_DF)
#  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
# [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
# [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
# [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
# [65]  90  91  92  93  94  95  96  97  98  99 100

setdiff(Test3_DF, Test1_DF)
# integer(0)
#
Dear Jason,
On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
[snip]
You didn't explicitly say which "improved version" of setdiff() that
you are using, so I can only presume that you are using the
setdiff.data.frame in the prob package.

The behaviour you are observing is expected and matches the
base:::setdiff behaviour in the case of vectors;  cf.

x1 <- c(1:100)
x2 <- c(x1,x1)

setdiff(x1, x2)  # integer(0)
setdiff(x2, x1)  # integer(0)

x3 <- c(1:25)
setdiff(x1, x3)  # 26:100
setdiff(x3, x1)  # integer(0)

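The reason is that base setdiff() has set semantics: it returns the unique elements of its first argument that are absent from the second, so how many times a value occurs is never compared. A small base-R illustration of this, with duplicated() (which flags every occurrence after the first) shown as one way to see the repeats that set operations ignore:

```r
x1 <- c(1:100)
x2 <- c(x1, x1)

# setdiff() de-duplicates its result, so repeated values never show up
setdiff(c(5, 5, 7), 3)              # returns c(5, 7)

# duplicated() flags every occurrence after the first one
sum(duplicated(x2))                 # 100
identical(x2[duplicated(x2)], x1)   # TRUE: the second copy of 1:100
```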
The R-help archives are chock full of every possible variant of
questions (and answers) about this, and you haven't said _exactly_
what you are looking for. In the absence of an already posted
solution, please specify exactly what you want and I'll wager an R
Ninja could dispatch it in moments.
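If "identify duplicate row entries between two data frames" means a difference that respects multiplicity (a bag or multiset difference, rather than a set difference), one possible base-R sketch is below. The helper multiset_setdiff_rows() is hypothetical, not part of base R or the prob package; it assumes every row can be turned into a string key unambiguously (no value contains "\r") and cancels rows of x against identical rows of y one-for-one.

```r
# Hypothetical helper (not base R, not prob): rows of x left over after
# cancelling them one-for-one against identical rows of y.
multiset_setdiff_rows <- function(x, y) {
  # One string key per row; assumes no value contains "\r"
  row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
  kx <- row_key(x)
  ky <- row_key(y)
  # Number repeated keys within each data frame: 1st copy, 2nd copy, ...
  ox <- ave(seq_along(kx), kx, FUN = seq_along)
  oy <- ave(seq_along(ky), ky, FUN = seq_along)
  # A row of x is extra if its (key, copy number) pair has no match in y
  keep <- !(paste(kx, ox, sep = "\r") %in% paste(ky, oy, sep = "\r"))
  x[keep, , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = c(1:100))
Test2_DF <- rbind(Test1_DF, Test1_DF)
Test3_DF <- data.frame(HouseSize = c(1:25))

nrow(multiset_setdiff_rows(Test2_DF, Test1_DF))      # 100: the second copies
nrow(multiset_setdiff_rows(Test1_DF, Test2_DF))      # 0
multiset_setdiff_rows(Test1_DF, Test3_DF)$HouseSize  # 26:100
```

For spotting repeated rows within a single data frame, duplicated() also has a data.frame method, so sum(duplicated(Test2_DF)) counts the 100 repeated rows directly.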

Regards,
Jay

***************************************************
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gkerns at ysu.edu
http://www.cc.ysu.edu/~gjkerns/