Hi,

I have 4 files with 10000 individuals in each file and 10 columns each. One of the columns, say C1, may have elements in common with the C1 columns of the other files.

If I have only 2 files, I can do this check with the commands:

data1[data1 %in% data2]
data2[data2 %in% data1]

How do I find the elements that are common to column C1 across all 4 files?

Thanks,

---------------------------------------------
Silvano Cesar da Costa
Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística
Fone: (43) 3371-4346
Common elements in columns
5 messages · Silvano, arun, jim holtman
Hi,

Maybe this will help:

set.seed(1)
df1 <- data.frame(C1 = sample(LETTERS[1:25], 20, replace = FALSE), value = sample(50, 20, replace = FALSE))
set.seed(15)
df2 <- data.frame(C1 = sample(LETTERS[1:25], 15, replace = FALSE), C2 = 1:15)
set.seed(3)
df3 <- data.frame(C1 = sample(LETTERS[1:10], 10, replace = FALSE), B1 = rnorm(10, 3))
set.seed(5)
df4 <- data.frame(C1 = sample(LETTERS[1:15], 10, replace = FALSE), A2 = rnorm(10, 15))

df1$C1[df1$C1 %in% df2$C1[df2$C1 %in% df3$C1[df3$C1 %in% df4$C1]]]
#[1] G E H J

A.K.

----- Original Message -----
From: Silvano Cesar da Costa <silvano at uel.br>
To: r-help at r-project.org
Cc:
Sent: Sunday, September 2, 2012 7:05 PM
Subject: [R] Common elements in columns

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
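A note on the nested %in% filter in the reply above: because it subsets df1$C1 rather than computing a set, it keeps the order of df1$C1 and any values repeated there. The sketch below uses small made-up vectors (a, b, c4 and d are hypothetical stand-ins for the four C1 columns, not the data from the thread) to illustrate that behaviour.

```r
# Hypothetical stand-ins for the C1 columns of the four files
a  <- c("G", "E", "G", "X", "H")
b  <- c("H", "G", "E", "Q")
c4 <- c("E", "G", "H", "Z")
d  <- c("G", "H", "E")

# Nested %in%: keep the elements of a that also occur in b, c4 and d
common <- a[a %in% b[b %in% c4[c4 %in% d]]]
print(common)         # repeated values in a are retained, in a's order

# unique() collapses the result to one entry per shared value
common_unique <- unique(common)
print(common_unique)
```

If the identifiers in C1 can repeat within a file, wrapping the result in unique() is one way to get each shared value only once.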
Hi Arun, it's exactly what I wanted. Thanks a lot,
Another way of solving the problem:
set.seed(1)
df1 <- data.frame(C1 = sample(LETTERS[1:25], 20, replace = FALSE), value = sample(50, 20, replace = FALSE))
set.seed(15)
df2 <- data.frame(C1 = sample(LETTERS[1:25], 15, replace = FALSE), C2 = 1:15)
set.seed(3)
df3 <- data.frame(C1 = sample(LETTERS[1:10], 10, replace = FALSE), B1 = rnorm(10, 3))
set.seed(5)
df4 <- data.frame(C1 = sample(LETTERS[1:15], 10, replace = FALSE), A2 = rnorm(10, 15))

x <- list(df1$C1, df2$C1, df3$C1, df4$C1)
Reduce(intersect, x)
[1] "G" "E" "H" "J"
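The Reduce(intersect, x) call above works for any number of columns, since Reduce applies intersect pairwise along the list. As a rough sketch, assuming the files have already been read into a list of data frames (the list name files and the column v are made up for illustration), the shared C1 values can be collected and then used to pull the matching rows:

```r
# Hypothetical data frames standing in for the four files
files <- list(
  data.frame(C1 = c("G", "E", "G", "X", "H"), v = 1:5, stringsAsFactors = FALSE),
  data.frame(C1 = c("H", "G", "E", "Q"), v = 1:4, stringsAsFactors = FALSE),
  data.frame(C1 = c("E", "G", "H", "Z"), v = 1:4, stringsAsFactors = FALSE),
  data.frame(C1 = c("G", "H", "E"), v = 1:3, stringsAsFactors = FALSE)
)

# Pull C1 out of every data frame as character, then intersect pairwise
ids <- lapply(files, function(d) as.character(d$C1))
common <- Reduce(intersect, ids)
print(common)

# Keep only the rows of the first file whose C1 is shared by all files
shared_rows <- files[[1]][files[[1]]$C1 %in% common, ]
print(shared_rows)
```

Unlike the nested %in% filter, intersect returns each shared value once, so repeated identifiers within a file do not appear twice in common. Converting with as.character() is a precaution in case C1 was read in as a factor.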
On Mon, Sep 3, 2012 at 7:01 AM, Silvano Cesar da Costa <silvano at uel.br> wrote:
Hi Arun, it's exactly what I wanted. Thanks a lot,
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Hi Jim. It was a very elegant way of solving the problem. Thank you,