Combining Overlapping Data
Hi:
This doesn't sort the data by strain level, but I think it does what
you're after. It helps if strain is either a factor or character
vector in each data frame.
h <- function(x, y) {
    # Count rows per strain in each data frame
    tbx <- table(x$strain)
    tby <- table(y$strain)
    # Select the strains that have at least one member in each data
    # frame (for a factor, table() also lists unused levels with count 0)
    mgrps <- intersect(names(tbx[tbx > 0]),
                       names(tby[tby > 0]))
    # Concatenate the rows for the common strains
    rbind(subset(x, strain %in% mgrps),
          subset(y, strain %in% mgrps))
}
# Result:
dc <- h(x, y)
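
If you want to check the idea on something small first, here is a minimal sketch on made-up data. It assumes strain is a character column; the strain names other than Chab1405 and Chab1999 and the value column are invented for illustration. It uses plain indexing with %in% rather than table(), and the last line adds an optional sort by strain.

```r
# Toy data: "Chab2000", "Chab3000" and the 'value' column are hypothetical
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999", "Chab2000"),
                value  = c(1.1, 1.2, 2.1, 3.1),
                stringsAsFactors = FALSE)
y <- data.frame(strain = c("Chab1405", "Chab2000", "Chab3000"),
                value  = c(1.3, 3.2, 4.1),
                stringsAsFactors = FALSE)

# Strains that occur in both data frames
common <- intersect(as.character(x$strain), as.character(y$strain))

# Keep every row, from either data frame, whose strain is in the common set
dc <- rbind(x[x$strain %in% common, ],
            y[y$strain %in% common, ])

# Optional: sort the combined rows by strain
dc <- dc[order(dc$strain), ]
```

Here Chab1999 (only in x) and Chab3000 (only in y) are dropped, and all rows for the shared strains are kept.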
HTH,
Dennis
On Fri, Nov 11, 2011 at 1:07 PM, kickout <kyle.kocak at gmail.com> wrote:
I've scoured the archives but have found no concrete answer to my question.

Problem: two data sets
1st data set (x) = 20,000 rows
2nd data set (y) = 5,000 rows

Both have the same column names; the column of interest to me is a variable called strain. For example, a strain named "Chab1405" appears in x 150 times and in y 25 times, while strain "Chab1999" appears 200 times in x and not at all in y (so I don't want that retained).

I want to create a new data frame that has all 175 measurements for "Chab1405" and for any other strain that appears in both data sets, but not strains that appear in only one data set. So I want the intersection of the two data sets (maybe?). I've tried x %in% y, but that only gives TRUE/FALSE.

--
View this message in context: http://r.789695.n4.nabble.com/Combining-Overlapping-Data-tp4032719p4032719.html