Compare two data sets
<amarkey at uiuc.edu> wrote in news:20080325101909.BDK93111 at expms2.cites.uiuc.edu:
I would like to compare two data sets saved as text files (example below) to determine whether both sets are identical (or whether dat2 is missing information that is included in dat1) and, if they are not identical, to list what information differs between the two sets (i.e., output "a1", "a3" as the differing information). The overall purpose would be to remove "a1" and "a3" from dat1 so that dat1 and dat2 are the same. My R abilities are somewhat limited, so any suggestions are greatly appreciated.
I do not understand what it would mean to remove elements so "they would look the same". Why wouldn't you just use the smaller set?
Alysta

dat1
a1 a2 a3 a4 a5 a6

dat2
a2 a4 a5 a6
You might want to look at the %in% function. These examples were created
with neither dat1 nor dat2 being a proper subset of the other.
dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')
dat1
[1] "a1" "a2" "a3" "a4" "a5" "a6"
dat2
[1] "a2" "a4" "a5" "a6" "a8" "a9" "a10"
dat2 %in% dat1
#[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
dat1 %in% dat2
#[1] FALSE TRUE FALSE TRUE TRUE TRUE
### And then use the logical vectors as index arguments
### to first get the common elements
dat1[dat1 %in% dat2]
[1] "a2" "a4" "a5" "a6"
dat2[dat2 %in% dat1]
[1] "a2" "a4" "a5" "a6"
### And then to find the non-shared elements
dat2[!(dat2 %in% dat1)]
[1] "a8" "a9" "a10"
dat1[!(dat1 %in% dat2)]
[1] "a1" "a3"
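
Since the original data live in text files, here is a minimal sketch of the whole round trip. It is not from the original question: the file names are hypothetical (temporary files stand in for the real ones), and it assumes the entries are separated by whitespace. It also uses base R's set functions (setequal, setdiff, intersect) as an alternative to %in%. Note that those set functions drop duplicate entries, whereas indexing with %in% keeps them.

```r
# Hypothetical input files: tempfile() stands in for the real text files
# so that the sketch runs anywhere.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("a1", "a2", "a3", "a4", "a5", "a6"), f1)
writeLines(c("a2", "a4", "a5", "a6"), f2)

# Read whitespace-separated entries as character vectors
# (assumption: the files contain nothing but the entries themselves).
dat1 <- scan(f1, what = character(), quiet = TRUE)
dat2 <- scan(f2, what = character(), quiet = TRUE)

# Are the two sets identical?
same <- setequal(dat1, dat2)

# Entries present in one set but not the other
only_in_dat1 <- setdiff(dat1, dat2)
only_in_dat2 <- setdiff(dat2, dat1)

# Entries present in both
common <- intersect(dat1, dat2)

# Drop from dat1 whatever is absent from dat2
dat1_trimmed <- dat1[dat1 %in% dat2]
```

With the example data from the question, only_in_dat1 holds the two entries that would have to be removed, and dat1_trimmed ends up with the same entries as dat2.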
David Winsemius