Message-ID: <a695148b0905301519r4accb96aof6ee4daec3ea4bbe@mail.gmail.com>
Date: 2009-05-30T22:19:19Z
From: G. Jay Kerns
Subject: setdiff bizarre (was: odd behavior out of setdiff)
In-Reply-To: <157742.14855.qm@web56005.mail.re3.yahoo.com>
Jason,
(moved back to R-help)
On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
>
> Jay,
>
>
> I really appreciate all your help.
>
> I posted to Nabble an R file and input CSV files more accurately demonstrating what I am seeing and the output I desire to achieve when I difference two dataframes.
> http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
>
>
> It may be that "setdiff", as implemented in base R and in "prob", was never intended to provide the type of result I desire. If that is the case, then I will need to ask the "Ninjas" for help to produce the outcome I seek.
>
> That is, when I difference the data within RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I desire to get the result shown in RDesired.csv.
>
> Note that it would not be enough to just remove duplicate "CostPerSquareFoot" values, since that variable is tied to "EntryDate" and "HouseNumber".
>
> Any further help and insights are much appreciated.
>
> Thanks again,
> Jason
>
From your description, something like the following should work:
Let A = your RSetDiffEntry
Let B = your RSetDuplicatesRemoved...
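library(prob)            # provides a setdiff() method for data frames
C <- setdiff(A, B)       # rows of A that do not appear in B
D <- rbind(A, C)         # stack A on top of those rows
E <- D[duplicated(D), ]  # keep the second and later occurrences of repeated rows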
E should then equal your RDesired.
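To try the steps above without the CSV files, here is a minimal sketch on toy data. The data frames, their values, and the helper key() are made up for illustration, and the paste-based row comparison is only a base-R stand-in for the row-wise setdiff() from prob (an assumption about its behavior, not a copy of its code):

```r
# Toy data (hypothetical; not the contents of the posted CSV files)
A <- data.frame(HouseNumber = c(1, 2, 3),
                CostPerSquareFoot = c(100, 110, 120))
B <- data.frame(HouseNumber = c(1, 2),
                CostPerSquareFoot = c(100, 110))

# Hypothetical helper: collapse each row into a single string key
key <- function(df) do.call(paste, c(df, sep = "\r"))

# Base-R stand-in for a row-wise setdiff(A, B): rows of A not found in B
C <- unique(A[!(key(A) %in% key(B)), , drop = FALSE])

D <- rbind(A, C)         # stack A on top of those rows
E <- D[duplicated(D), ]  # rows of D that repeat an earlier row
print(E)
```

With real data, check the result against the expected file by eye, since rows that are already repeated within A will also show up in E.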
Hope this helps,
Jay
P.S. I notice your row number 7 in "RSetDuplicatesRemoved" is
duplicated by the following row. That's a typo, yes? If so, then E
should have one more row than your "RDesired."