Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
Jason,
On Fri, May 29, 2009 at 5:58 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
Jay, Thanks much for the reply. I think you are right about the prob. Unfortunately, I was not able to find the old emails I had discussing the use of the more powerful setdiff that essentially inherits from the base class R setdiff functionality but extends that functionality by now working with data.frames instead of just a simple array of values. Love this functionality.
Your previous post is here [1] http://tolstoy.newcastle.edu.au/R/e6/help/09/03/7781.html and my earlier post is here: [2] https://stat.ethz.ch/pipermail/r-devel/2007-December/047706.html (please note that the link in [1] referring to [2] is now broken). As mentioned in [2], the notions of "set" and "element" are ambiguous in the data frame case... what is an element...? a row, a column, or a single entry?
However, for the following example,
Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"))
Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"), Price = c("Low"))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)
setdiff(Test2_DF, Test1_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)
I was hoping that for this example one of the setdiff calls would have returned essentially Test1_DF, since it is duplicated and that is what is different between the two data frames. So, I guess I am trying to figure out a way to truly diff the data frames, i.e. determine when two data.frames are different from one another and then receive the output of the results. Does this capability exist in a function within a current R package, or is there a commonly used pattern for creating this functionality? Thanks again for any feedback you can provide.
Your question speaks to the ambiguity above. For instance, your 2nd example would be solved by a setdiff for data frames that operates column-wise. If that is all you want, then IIRC there are at least 3 independent solutions in [2] to the row-wise problem. It should be easy enough to tweak one of them to operate on columns instead. For an efficient setdiff() for data frames that can decipher on-the-fly which of row/column/entry is desired, I am going to have to defer to the aforementioned Ninjas. :-)
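For what it's worth, if "element" is taken to mean a whole row and duplicates are meant to count (a multiset difference rather than a set difference), something along these lines might be a starting point. This is only a rough sketch in base R, not one of the solutions from [2]: the name row_multiset_diff is made up for illustration, and it assumes both data frames have the same column names and that no value contains the "\r" character used as a separator.

```r
# Hypothetical helper (not from [1] or [2]): row-wise difference that counts duplicates.
# Assumes identical column names and that no value contains "\r".
row_multiset_diff <- function(x, y) {
  stopifnot(identical(names(x), names(y)))
  # One character key per row, built by pasting the columns together
  key <- function(df) do.call(paste, c(unname(lapply(df, as.character)), sep = "\r"))
  # Occurrence number of each key among identical keys (1st copy, 2nd copy, ...)
  occ <- function(k) ave(seq_along(k), k, FUN = seq_along)
  kx <- key(x)
  ky <- key(y)
  # A row of x is kept when its (key, occurrence) pair has no partner in y
  keep <- !(paste(kx, occ(kx), sep = "\r") %in% paste(ky, occ(ky), sep = "\r"))
  x[keep, , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)

nrow(row_multiset_diff(Test1_DF, Test2_DF))  # 0: every row of Test1_DF has a partner
nrow(row_multiset_diff(Test2_DF, Test1_DF))  # 100: the second copy of each row is left over
```

Pairing each row with its occurrence number is what lets the second copy of a duplicated row survive; a plain set-style comparison would discard it, which is the behavior you ran into.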
Also, I tried to determine my Session Info and the packages I have loaded, but I received the following:
sessionInfo()
Error in x$Priority : $ operator is invalid for atomic vectors
Ninjas.
Hope this helps,
Jay