An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130610/996bfce2/attachment.pl>
Combining CSV data
7 messages · jim holtman, arun, Shreya Rawal
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130610/fadcd92b/attachment.pl>
Hi,
Try this:
dat1<-read.table(text="
Row_ID_CR,? Data1,??? Data2,??? Data3
1,????????????????? aa,????????? bb,????????? cc
2,????????????????? dd,????????? ee,????????? ff
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat2<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 1
4a,????????????? 1,????????????????? This is comment 3
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
dat2$DataN1<-str_trim(dat2$DataN1)
res<- merge(dat1,dat2,by.x=1,by.y=2)
?res1<-res[,-5]
library(plyr)
?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
?res2
?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3
#1???????? 1?????????????????? aa?????????? bb?????????? cc
#2???????? 2?????????????????? dd?????????? ee?????????? ff
#?????????????????????????????????????????????????? DataN1
#1 This is comment 1, This is comment 2, This is comment 3
#2?????????????????????????????????????? This is comment 1
res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
res3
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2????????????? <NA>????????????? <NA>
A.K.
----- Original Message -----
From: Shreya Rawal <rawal.shreya at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Monday, June 10, 2013 4:38 PM
Subject: [R] Combining CSV data
Hello R community,
I am trying to combine two CSV files that look like this:
File A
Row_ID_CR,? Data1,? ? Data2,? ? Data3
1,? ? ? ? ? ? ? ? ? aa,? ? ? ? ? bb,? ? ? ? ? cc
2,? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ? ? ff
File B
Row_ID_N,? Src_Row_ID,? DataN1
1a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 1
2a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 2
3a,? ? ? ? ? ? ? 2,? ? ? ? ? ? ? ? ? This is comment 1
4a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 3
And the output I am looking for is, comparing the values of Row_ID_CR and
Src_Row_ID
Output
ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1,
DataComment2,? ? ? ? ? DataComment3
1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ? bb,? ? ? ? cc,? ? ? ? This is
comment1,? ? This is comment2,? ? This is comment 3
2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ? ff,? ? ? ? ? This is
comment1
I am a novice R user, I am able to replicate a left join but I need a bit
more in the final result.
Thanks!!
??? [[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
HI,
I am not sure about your DataN1 column.? If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
? My previous solution is not helpful in situations like these:
dat2<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 2
4a,????????????? 1,????????????????? This is comment 3
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat3<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 3
4a,????????????? 1,????????????????? This is comment 3
5a,??? ??? ?2,????????????????? This is comment 2
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
library(plyr)
fun1<- function(data1,data2){
??? data2$DataN1<- str_trim(data2$DataN1)???
??????? res<- merge(data1,data2,by.x=1,by.y=2)
??? res1<- res[,-5]
??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
??? Mx1<- max(sapply(res2[,5],length))
??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
??? ??? ??? ??? ??? ??? ??? ??? ? indx<- as.numeric(gsub("[[:alpha:]]","",x))
??? ??? ??? ??? ??? ??? ??? ??? ? x[match(seq(Mx1),indx)]
??? ??? ??? ??? ??? ??? ??? ??? ? })),stringsAsFactors=FALSE)
??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
??? res3
??? }??? ??? ??
fun1(dat1,dat2)
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2 This is comment 2????????????? <NA>
?fun1(dat1,dat3)
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2 This is comment 2 This is comment 3
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Shreya Rawal <rawal.shreya at gmail.com>
Cc: R help <r-help at r-project.org>
Sent: Monday, June 10, 2013 6:41 PM
Subject: Re: [R] Combining CSV data
Hi,
Try this:
dat1<-read.table(text="
Row_ID_CR,? Data1,??? Data2,??? Data3
1,????????????????? aa,????????? bb,????????? cc
2,????????????????? dd,????????? ee,????????? ff
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat2<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 1
4a,????????????? 1,????????????????? This is comment 3
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
dat2$DataN1<-str_trim(dat2$DataN1)
res<- merge(dat1,dat2,by.x=1,by.y=2)
?res1<-res[,-5]
library(plyr)
?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
?res2
?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3
#1???????? 1?????????????????? aa?????????? bb?????????? cc
#2???????? 2?????????????????? dd?????????? ee?????????? ff
#?????????????????????????????????????????????????? DataN1
#1 This is comment 1, This is comment 2, This is comment 3
#2?????????????????????????????????????? This is comment 1
res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
res3
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2????????????? <NA>????????????? <NA>
A.K.
----- Original Message -----
From: Shreya Rawal <rawal.shreya at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Monday, June 10, 2013 4:38 PM
Subject: [R] Combining CSV data
Hello R community,
I am trying to combine two CSV files that look like this:
File A
Row_ID_CR,?? Data1,? ? Data2,? ? Data3
1,? ? ? ? ? ? ? ? ?? aa,? ? ? ? ? bb,? ? ? ? ? cc
2,? ? ? ? ? ? ? ? ?? dd,? ? ? ? ? ee,? ? ? ? ? ff
File B
Row_ID_N,?? Src_Row_ID,?? DataN1
1a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 1
2a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 2
3a,? ? ? ? ? ? ?? 2,? ? ? ? ? ? ? ? ?? This is comment 1
4a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 3
And the output I am looking for is, comparing the values of Row_ID_CR and
Src_Row_ID
Output
ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1,
DataComment2,? ? ? ? ? DataComment3
1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ?? bb,? ? ? ?? cc,? ? ? ? This is
comment1,? ? This is comment2,? ?? This is comment 3
2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ?? ff,? ? ? ? ? This is
comment1
I am a novice R user, I am able to replicate a left join but I need a bit
more in the final result.
Thanks!!
??? [[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130611/d3263588/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130611/c0737298/attachment.pl>
Hi,
If the dataset is like this with the comments in the order:
dat2<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 1
4a,????????????? 1,????????????????? This is comment 3
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat3<-read.table(text="
Row_ID_N,? Src_Row_ID,? DataN1
1a,????????????? 1,????????????????? This is comment 1
2a,????????????? 1,????????????????? This is comment 2
3a,????????????? 2,????????????????? This is comment 1?? #
4a,????????????? 1,????????????????? This is comment 3
5a,???????? 2,????????????????? This is comment 2? #
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
library(plyr)
fun1<- function(data1,data2){
??? data2$DataN1<- str_trim(data2$DataN1)??
??????? res<- merge(data1,data2,by.x=1,by.y=2)
??? res1<- res[,-5]
??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
??? Mx1<- max(sapply(res2[,5],length))
??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
????????????????????????????????? c(x,rep(NA,Mx1-length(x)))
????????????????????????????????? })),stringsAsFactors=FALSE)
??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
??? res3
??? }???
?????
fun1(dat1,dat2)
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2????????????? <NA>????????????? <NA>
?fun1(dat1,dat3)
#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
?# ???? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2 This is comment 2????????????? <NA>
Otherwise, you need to provide an example that matches the real dataset.
A.K.
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Tuesday, June 11, 2013 12:22 PM
Subject: Re: [R] Combining CSV data
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Tuesday, June 11, 2013 12:22 PM
Subject: Re: [R] Combining CSV data
Hi Arun,
Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
Thanks for your help, I appreciate! ??
On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
HI,
>I am not sure about your DataN1 column.? If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>? My previous solution is not helpful in situations like these:
>
>dat2<-read.table(text="
>Row_ID_N,? Src_Row_ID,? DataN1
>1a,????????????? 1,????????????????? This is comment 1
>2a,????????????? 1,????????????????? This is comment 2
>3a,????????????? 2,????????????????? This is comment 2
>4a,????????????? 1,????????????????? This is comment 3
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>dat3<-read.table(text="
>
>Row_ID_N,? Src_Row_ID,? DataN1
>1a,????????????? 1,????????????????? This is comment 1
>2a,????????????? 1,????????????????? This is comment 2
>3a,????????????? 2,????????????????? This is comment 3
>4a,????????????? 1,????????????????? This is comment 3
>5a,??? ??? ?2,????????????????? This is comment 2
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>
>library(stringr)
>library(plyr)
>fun1<- function(data1,data2){
>??? data2$DataN1<- str_trim(data2$DataN1)???
>??????? res<- merge(data1,data2,by.x=1,by.y=2)
>??? res1<- res[,-5]
>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>??? Mx1<- max(sapply(res2[,5],length))
>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>??? ??? ??? ??? ??? ??? ??? ??? ? indx<- as.numeric(gsub("[[:alpha:]]","",x))
>??? ??? ??? ??? ??? ??? ??? ??? ? x[match(seq(Mx1),indx)]
>??? ??? ??? ??? ??? ??? ??? ??? ? })),stringsAsFactors=FALSE)
>
>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>??? res3
>??? }??? ??? ??
>fun1(dat1,dat2)
>
>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>
>#?????? DataComment2????? DataComment3
>#1 This is comment 2 This is comment 3
>#2 This is comment 2????????????? <NA>
>?fun1(dat1,dat3)
>
>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>
>#?????? DataComment2????? DataComment3
>#1 This is comment 2 This is comment 3
>#2 This is comment 2 This is comment 3
>
>
>
>A.K.
>
>
>----- Original Message -----
>
>From: arun <smartpink111 at yahoo.com>
>To: Shreya Rawal <rawal.shreya at gmail.com>
>Cc: R help <r-help at r-project.org>
>Sent: Monday, June 10, 2013 6:41 PM
>Subject: Re: [R] Combining CSV data
>
>Hi,
>Try this:
>
>dat1<-read.table(text="
>Row_ID_CR,? Data1,??? Data2,??? Data3
>1,????????????????? aa,????????? bb,????????? cc
>2,????????????????? dd,????????? ee,????????? ff
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>dat2<-read.table(text="
>Row_ID_N,? Src_Row_ID,? DataN1
>1a,????????????? 1,????????????????? This is comment 1
>2a,????????????? 1,????????????????? This is comment 2
>3a,????????????? 2,????????????????? This is comment 1
>4a,????????????? 1,????????????????? This is comment 3
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>library(stringr)
>dat2$DataN1<-str_trim(dat2$DataN1)
>res<- merge(dat1,dat2,by.x=1,by.y=2)
>?res1<-res[,-5]
>library(plyr)
>?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>?res2
>?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3
>#1???????? 1?????????????????? aa?????????? bb?????????? cc
>#2???????? 2?????????????????? dd?????????? ee?????????? ff
>#?????????????????????????????????????????????????? DataN1
>#1 This is comment 1, This is comment 2, This is comment 3
>#2?????????????????????????????????????? This is comment 1
>
>
>
>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>res3
>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>#?????? DataComment2????? DataComment3
>#1 This is comment 2 This is comment 3
>#2????????????? <NA>????????????? <NA>
>
>A.K.
>
>
>----- Original Message -----
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Monday, June 10, 2013 4:38 PM
>Subject: [R] Combining CSV data
>
>Hello R community,
>
>I am trying to combine two CSV files that look like this:
>
>File A
>
>Row_ID_CR,?? Data1,? ? Data2,? ? Data3
>1,? ? ? ? ? ? ? ? ?? aa,? ? ? ? ? bb,? ? ? ? ? cc
>2,? ? ? ? ? ? ? ? ?? dd,? ? ? ? ? ee,? ? ? ? ? ff
>
>
>File B
>
>Row_ID_N,?? Src_Row_ID,?? DataN1
>1a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 1
>2a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 2
>3a,? ? ? ? ? ? ?? 2,? ? ? ? ? ? ? ? ?? This is comment 1
>4a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 3
>
>And the output I am looking for is, comparing the values of Row_ID_CR and
>Src_Row_ID
>
>Output
>
>ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1,
>DataComment2,? ? ? ? ? DataComment3
>1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ?? bb,? ? ? ?? cc,? ? ? ? This is
>comment1,? ? This is comment2,? ?? This is comment 3
>2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ?? ff,? ? ? ? ? This is
>comment1
>
>
>I am a novice R user, I am able to replicate a left join but I need a bit
>more in the final result.
>
>
>Thanks!!
>
>??? [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>