Filtering a dataset's columns by another dataset's column names
On 02/27/2009 11:27 AM, Josh B wrote:
Hello all,

I hope some of you can come to my rescue, yet again. I have two genetic datasets, and I want one of the datasets to have only the columns that are in common with the other dataset. Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:
Individual SNP1 SNP2 SNP3 SNP4 SNP5
1          A    G    T    C    A
2          T    C    A    G    T
3          A    C    T    C    A

Dataset 2:
Individual SNP1 SNP3 SNP5 SNP6 SNP7
4          A    T    T    G    C
5          T    A    A    G    G
6          A    A    T    C    G

I want Dataset 1 to have only the columns that are also present in Dataset 2, i.e., I want to generate a new Dataset 3 that looks like this:

Dataset 3:
Individual SNP1 SNP3 SNP5
1          A    T    A
2          T    A    T
3          A    T    A

Does anyone know how I could do this? Keep in mind that this is not a simple merge, as in the "merge" function.

Thanks very much for your help, everyone.
Josh B.
Same.Cols <- intersect(names(DF1), names(DF2))
Same.Cols
[1] "Individual" "SNP1" "SNP3" "SNP5"
rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T

See ?intersect, which returns the column names the two data frames have in common. You can then use those names to subset each data frame before combining the rows with rbind().

HTH,

Marc Schwartz
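The reply stacks both datasets, whereas the question asked only for Dataset 1 restricted to the shared columns. The minimal, self-contained R sketch below shows both. It assumes the toy data are loaded as data frames named DF1 and DF2 with character columns. These names and the data.frame() construction are illustrative and do not come from the original posts.

```r
# Toy data from the question. The names DF1/DF2 and this construction
# are assumptions chosen to match the reply's code.
DF1 <- data.frame(
  Individual = 1:3,
  SNP1 = c("A", "T", "A"),
  SNP2 = c("G", "C", "C"),
  SNP3 = c("T", "A", "T"),
  SNP4 = c("C", "G", "C"),
  SNP5 = c("A", "T", "A"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Individual = 4:6,
  SNP1 = c("A", "T", "A"),
  SNP3 = c("T", "A", "A"),
  SNP5 = c("T", "A", "T"),
  SNP6 = c("G", "G", "C"),
  SNP7 = c("C", "G", "G"),
  stringsAsFactors = FALSE
)

# Column names present in both data frames, kept in DF1's order
Same.Cols <- intersect(names(DF1), names(DF2))

# Dataset 3 as asked: DF1's rows, restricted to the shared columns.
# drop = FALSE keeps a data frame even if only one column is shared.
DF3 <- DF1[, Same.Cols, drop = FALSE]
print(DF3)

# The reply's variant: stack both data frames on the shared columns
Combined <- rbind(DF1[, Same.Cols, drop = FALSE],
                  DF2[, Same.Cols, drop = FALSE])
print(Combined)
```

The drop = FALSE argument is a defensive choice. Without it, DF1[, cols] returns a plain vector rather than a data frame when only a single column name is shared.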