Filtering a dataset's columns by another dataset's column names

6 messages · Josh B, Rowe, Brian Lee Yung (Portfolio Analytics), Marc Schwartz +3 more

#
Try this:

d1[,intersect(names(d1),names(d2))]
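
A minimal sketch of the one-liner above, run on the toy data from the question. The data frames d1 and d2 are typed in by hand from the example tables, and drop = FALSE is an addition that is not in the original reply: it keeps the result a data frame even when only one column is shared.

```r
# Toy data copied from the question (assumption: SNP columns are character)
d1 <- data.frame(Individual = 1:3,
                 SNP1 = c("A", "T", "A"),
                 SNP2 = c("G", "C", "C"),
                 SNP3 = c("T", "A", "T"),
                 SNP4 = c("C", "G", "C"),
                 SNP5 = c("A", "T", "A"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Individual = 4:6,
                 SNP1 = c("A", "T", "A"),
                 SNP3 = c("T", "A", "A"),
                 SNP5 = c("T", "A", "T"),
                 SNP6 = c("G", "G", "C"),
                 SNP7 = c("C", "G", "G"),
                 stringsAsFactors = FALSE)

# Column names present in both data frames, in the order they appear in d1
common <- intersect(names(d1), names(d2))

# drop = FALSE keeps a data frame even if only one column is shared
d3 <- d1[, common, drop = FALSE]
print(d3)
```

Without drop = FALSE, a single shared column would come back as a plain vector rather than a one-column data frame.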

HTH, Brian

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Josh B
Sent: Friday, February 27, 2009 12:28 PM
To: R Help
Subject: [R] Filtering a dataset's columns by another dataset's column
names


Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only
the columns that are in common with the other dataset. 
Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual    SNP1    SNP2    SNP3    SNP4    SNP5
1    A    G    T    C    A
2    T    C    A    G    T
3    A    C    T    C    A

Dataset 2:

Individual    SNP1    SNP3    SNP5    SNP6    SNP7
4    A    T    T    G    C
5    T    A    A    G    G
6    A    A    T    C    G

I want Dataset1 to have only columns that are also represented in
Dataset 2, i.e., I want to generate a new Dataset 3 that looks like
this:

Individual    SNP1    SNP3    SNP5
1    A    T    A
2    T    A    T
3    A    T    A

Does anyone know how I could do this? Keep in mind that this is not a
simple merge, as in the "merge" function.

Thanks very much for your help everyone.
Josh B.



      

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

#
on 02/27/2009 11:27 AM Josh B wrote:
> Same.Cols <- intersect(names(DF1), names(DF2))

> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"

> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T


See ?intersect, which gives you the common column names, which you can
then use in rbind().
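
A self-contained sketch of the approach just described: find the shared column names with intersect(), then stack both data frames with rbind(). The use of read.table(text = ...), the explicit colClasses, and the setdiff() report of dropped columns are additions for illustration and are not part of the original reply.

```r
# Toy data from the question. colClasses is set explicitly because a SNP
# column containing only "T" would otherwise be read as logical TRUE.
cls <- c("integer", rep("character", 5))
DF1 <- read.table(text = "Individual SNP1 SNP2 SNP3 SNP4 SNP5
1 A G T C A
2 T C A G T
3 A C T C A", header = TRUE, colClasses = cls)
DF2 <- read.table(text = "Individual SNP1 SNP3 SNP5 SNP6 SNP7
4 A T T G C
5 T A A G G
6 A A T C G", header = TRUE, colClasses = cls)

# Column names found in both data frames
Same.Cols <- intersect(names(DF1), names(DF2))

# Columns each data frame loses (added here only as a sanity check)
dropped1 <- setdiff(names(DF1), Same.Cols)
dropped2 <- setdiff(names(DF2), Same.Cols)

# Stack the two data frames on their shared columns
combined <- rbind(DF1[, Same.Cols, drop = FALSE],
                  DF2[, Same.Cols, drop = FALSE])
print(combined)
```

If only Dataset 1 is wanted, as in the original question, DF1[, Same.Cols, drop = FALSE] on its own is enough and the rbind() step can be skipped.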

HTH,

Marc Schwartz
#
So you want the data in Dataset 1, but only the columns whose names
also appear in Dataset 2.

How about:

  subset(DS1, select = names(DS1) %in% names(DS2))

 > DS1 <- read.table(textConnection("Individual    SNP1    SNP2    SNP3    SNP4    SNP5
+ 1    A    G    T    C    A
+ 2    T    C    A    G    T
+ 3    A    C    T    C    A"), header=TRUE)
 > DS2 <- read.table(textConnection("Individual    SNP1    SNP3    SNP5    SNP6    SNP7
+ 4    A    T    T    G    C
+ 5    T    A    A    G    G
+ 6    A    A    T    C    G"), header=TRUE)

 > subset(DS1, select= names(DS1) %in% names(DS2) )
   Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A

Tested!
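
The same %in% test can also be used with plain bracket indexing, which may be easier to reuse inside functions than subset(). The sketch below wraps it in a small helper; keep_shared_cols is a hypothetical name introduced here, not something from the thread.

```r
# Hypothetical helper (not from the thread): keep only the columns of `x`
# whose names also occur in `ref`, preserving the column order of `x`.
keep_shared_cols <- function(x, ref) {
  keep <- names(x) %in% names(ref)   # logical mask, one entry per column of x
  x[, keep, drop = FALSE]            # drop = FALSE: always return a data frame
}

# Toy data copied from the question
DS1 <- data.frame(Individual = 1:3,
                  SNP1 = c("A", "T", "A"), SNP2 = c("G", "C", "C"),
                  SNP3 = c("T", "A", "T"), SNP4 = c("C", "G", "C"),
                  SNP5 = c("A", "T", "A"), stringsAsFactors = FALSE)
DS2 <- data.frame(Individual = 4:6,
                  SNP1 = c("A", "T", "A"), SNP3 = c("T", "A", "A"),
                  SNP5 = c("T", "A", "T"), SNP6 = c("G", "G", "C"),
                  SNP7 = c("C", "G", "G"), stringsAsFactors = FALSE)

res <- keep_shared_cols(DS1, DS2)
print(res)
```

Because the mask is computed over names(x), the result keeps the rows and column order of the first argument; swapping the arguments filters Dataset 2 instead.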
#
Hi Josh B,

This looks like homework to me. Please follow the posting guide: provide
self-contained code and examples, and show the point at which you are
stuck.

To solve your problem, you need the "which" and "names" functions as well
as the %in% operator. Once you have worked out which column names the two
datasets have in common, it is easy to rbind them. Please try on your own
first and report back, with self-contained code, if and where you get
stuck. If this is indeed homework, please ask your professor or teacher.

Example for two simulated datasets:

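# Two simulated 5 x 6 data frames; they share the column names "a", "b" and "x"
x <- rnorm(30)
dim(x) <- c(5, 6)
x <- data.frame(x)
names(x) <- c("a", "b", "c", "x", "y", "z")

y <- rnorm(30)
dim(y) <- c(5, 6)
y <- data.frame(y)
names(y) <- c("a", "b", "d", "v", "w", "x")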
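
One way the hint above could be carried through on the simulated data, using which(), names() and %in% before rbind(). This is a sketch rather than the intended model answer; the seed and the variable names idx, common and stacked are assumptions added for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Two simulated 5 x 6 data frames with partly overlapping column names
x <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(x) <- c("a", "b", "c", "x", "y", "z")
y <- data.frame(matrix(rnorm(30), nrow = 5, ncol = 6))
names(y) <- c("a", "b", "d", "v", "w", "x")

# Positions of the columns of x whose names also occur in y
idx <- which(names(x) %in% names(y))
common <- names(x)[idx]

# Restrict both data frames to the shared columns, then stack them
stacked <- rbind(x[, common, drop = FALSE], y[, common, drop = FALSE])
print(dim(stacked))
```

Indexing both data frames by name rather than by position matters here, because the shared column "x" sits in a different position in each of them.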

Daniel


-------------------------
cuncta stricte discussurus
-------------------------
