Message-ID: <a695148b0905291321v220b4ac4o921a39af210bbee1@mail.gmail.com>
Date: 2009-05-29T20:21:45Z
From: G. Jay Kerns
Subject: Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
In-Reply-To: <541199.45654.qm@web56004.mail.re3.yahoo.com>
Dear Jason,
On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
>
> I think I am using the improved version of setdiff(...) that handles data.frames, so I think some odd behavior was expected but this one is escaping me.
>
> It appears that the addition of duplicate entries is not caught by setdiff(...). Is this expected behavior?
[snip]
> Thanks in advance for any feedback.
>
> Test1_DF<-data.frame(HouseSize=c(1:100))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
> integer(0)
> setdiff(Test2_DF, Test1_DF)
> integer(0)
>
> However,
> Test3_DF<-data.frame(HouseSize=c(1:25))
> setdiff(Test1_DF, Test3_DF)
>  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
> [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
> [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
> [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
> [65]  90  91  92  93  94  95  96  97  98  99 100
>
> setdiff(Test3_DF, Test1_DF)
> integer(0)
You didn't explicitly say which "improved version" of setdiff() you
are using, so I can only presume that you are using
setdiff.data.frame in the prob package.
The behaviour you are observing is expected and matches the
base::setdiff behaviour in the case of vectors, which treats its
arguments as sets and therefore ignores repeated values; cf.
x1 <- c(1:100)
x2 <- c(x1,x1)
setdiff(x1, x2) # integer(0)
setdiff(x2, x1) # integer(0)
x3 <- c(1:25)
setdiff(x1, x3) # 26:100
setdiff(x3, x1) # integer(0)
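Put another way, setdiff() only asks whether a value occurs in the
other argument at all, not how many times. A rough sketch of the same
computation with %in% and unique() (x1 and x2 as above; "keep" is just
an illustrative name of mine):

```r
x1 <- c(1:100)
x2 <- c(x1, x1)

# Elements of x2 that do not occur anywhere in x1. Every value of x2
# is found in x1, however often it is repeated, so nothing is kept.
keep <- x2[!(x2 %in% x1)]
unique(keep)            # same result as setdiff(x2, x1)

# The repeats only show up if you ask about them directly.
sum(duplicated(x2))     # number of entries that repeat an earlier one
```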
>
> If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
>
The R-help archives are chock full of every possible variant of
questions (and answers) about this, and you haven't said _exactly_
what you are looking for. In the absence of an already posted
solution, please specify exactly what you want and I'll wager an R
Ninja could dispatch it in moments.
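In the meantime, here is a guess at two common readings of the
question, using base R only: which rows of a data frame repeat an
earlier row, and how often each distinct row occurs. The data frames
are the ones from your post; the names dup_rows, counts and the helper
column n are mine and purely illustrative, so treat this as a sketch
rather than the answer.

```r
Test1_DF <- data.frame(HouseSize = c(1:100))
Test2_DF <- rbind(Test1_DF, Test1_DF)

# Rows of Test2_DF that repeat an earlier row of Test2_DF
dup_rows <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]
nrow(dup_rows)

# Number of occurrences of each distinct row: add a column of ones
# and sum it within each value of HouseSize
counts <- aggregate(n ~ HouseSize, data = cbind(Test2_DF, n = 1), FUN = sum)
head(counts)
```

With more than one column, duplicated() compares whole rows, and the
formula in aggregate() would need the other columns on its right-hand
side.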
Regards,
Jay
***************************************************
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gkerns at ysu.edu
http://www.cc.ysu.edu/~gjkerns/