HI,
You could use:
result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
colnames(result3)[5:7]<- paste0("DataComment",1:3)
A.K.
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, June 11, 2013 4:22 PM
Subject: Re: [R] Combining CSV data
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, June 11, 2013 4:22 PM
Subject: Re: [R] Combining CSV data
Hey Arun,
I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
So with these commands I am able to get the data in this format:
> fileA <- read.csv(text = "Row_ID_CR, ? Data1, ? ?Data2, ? ?Data3
+ 1, ? ? ? ? ? ? ? ? ? aa, ? ? ? ? ?bb, ? ? ? ? ?cc
+ 2, ? ? ? ? ? ? ? ? ? dd, ? ? ? ? ?ee, ? ? ? ? ?ff",?as.is?= TRUE)
>?
> fileB <- read.csv(text = "Row_ID_N, ? Src_Row_ID, ? DataN1
+ 1a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 1
+ 2a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 2
+ 3a, ? ? ? ? ? ? ? 2, ? ? ? ? ? ? ? ? ? This is comment 1
+ 4a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 3",?as.is?= TRUE)
>?
> # get rid of leading/trailing blanks on comments
> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
>?
> # merge together
> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
>?
> # now partition by Row_ID_CR and aggregate the comments
> result2 <- do.call(rbind,?
+ ? ? lapply(split(result, result$Row_ID_CR), function(.grp){
+ ? ? ? ? cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
+ ? ? })
+ )
Row_ID_CR ? ? ? ? ? ? ? ? Data1 ? ? ? ?Data2 ? ? ? ?Data3 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? comment
1 ? ? ? ? 1 ? ? ? ? ? ? ? ? ? ?aa ? ? ? ? ? bb ? ? ? ? ? cc ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is comment 2| This is comment 3
2 ? ? ? ? 2 ? ? ? ? ? ? ? ? ? ?dd ? ? ? ? ? ee ? ? ? ? ? ff ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is Comment 2
I can even split the last column by this:?strsplit(as.character(result2$comment), split='\\|')
[[1]]
[1] "This is comment 1" "This is comment 2" " This is comment 3"
[[2]]
[1] "This is comment 1" "This is comment 2"
but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
Thanks!!
Shreya
On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
>If the dataset is like this with the comments in the order:
>
>dat2<-read.table(text="
>Row_ID_N,? Src_Row_ID,? DataN1
>1a,????????????? 1,????????????????? This is comment 1
>2a,????????????? 1,????????????????? This is comment 2
>3a,????????????? 2,????????????????? This is comment 1
>4a,????????????? 1,????????????????? This is comment 3
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>dat3<-read.table(text="
>Row_ID_N,? Src_Row_ID,? DataN1
>1a,????????????? 1,????????????????? This is comment 1
>2a,????????????? 1,????????????????? This is comment 2
>3a,????????????? 2,????????????????? This is comment 1?? #
>
>4a,????????????? 1,????????????????? This is comment 3
>5a,???????? 2,????????????????? This is comment 2? #
>
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>
>library(stringr)
>library(plyr)
>fun1<- function(data1,data2){
>??? data2$DataN1<- str_trim(data2$DataN1)??
>??????? res<- merge(data1,data2,by.x=1,by.y=2)
>??? res1<- res[,-5]
>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>??? Mx1<- max(sapply(res2[,5],length))
>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>????????????????????????????????? c(x,rep(NA,Mx1-length(x)))
>
>????????????????????????????????? })),stringsAsFactors=FALSE)
>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>??? res3
>??? }???
>
>?????
>fun1(dat1,dat2)
>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>
>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>#?????? DataComment2????? DataComment3
>#1 This is comment 2 This is comment 3
>#2????????????? <NA>????????????? <NA>
>
>?fun1(dat1,dat3)
>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>
>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>?# ???? DataComment2????? DataComment3
>#1 This is comment 2 This is comment 3
>
>#2 This is comment 2????????????? <NA>
>
>
>Otherwise, you need to provide an example that matches the real dataset.
>A.K.
>
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Cc: R help <r-help at r-project.org>
>Sent: Tuesday, June 11, 2013 12:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hi Arun,
>
>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>
>Thanks for your help, I appreciate! ??
>
>
>
>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>
>HI,
>>I am not sure about your DataN1 column.? If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>>? My previous solution is not helpful in situations like these:
>>
>>dat2<-read.table(text="
>>Row_ID_N,? Src_Row_ID,? DataN1
>>1a,????????????? 1,????????????????? This is comment 1
>>2a,????????????? 1,????????????????? This is comment 2
>>3a,????????????? 2,????????????????? This is comment 2
>>4a,????????????? 1,????????????????? This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>dat3<-read.table(text="
>>
>>Row_ID_N,? Src_Row_ID,? DataN1
>>1a,????????????? 1,????????????????? This is comment 1
>>2a,????????????? 1,????????????????? This is comment 2
>>3a,????????????? 2,????????????????? This is comment 3
>>4a,????????????? 1,????????????????? This is comment 3
>>5a,??? ??? ?2,????????????????? This is comment 2
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>>??? data2$DataN1<- str_trim(data2$DataN1)???
>>??????? res<- merge(data1,data2,by.x=1,by.y=2)
>>??? res1<- res[,-5]
>>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>??? Mx1<- max(sapply(res2[,5],length))
>>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>??? ??? ??? ??? ??? ??? ??? ??? ? indx<- as.numeric(gsub("[[:alpha:]]","",x))
>>??? ??? ??? ??? ??? ??? ??? ??? ? x[match(seq(Mx1),indx)]
>>??? ??? ??? ??? ??? ??? ??? ??? ? })),stringsAsFactors=FALSE)
>>
>>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>??? res3
>>??? }??? ??? ??
>>fun1(dat1,dat2)
>>
>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>>
>>#?????? DataComment2????? DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2????????????? <NA>
>>?fun1(dat1,dat3)
>>
>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>>
>>#?????? DataComment2????? DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2 This is comment 3
>>
>>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>
>>From: arun <smartpink111 at yahoo.com>
>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Monday, June 10, 2013 6:41 PM
>>Subject: Re: [R] Combining CSV data
>>
>>Hi,
>>Try this:
>>
>>dat1<-read.table(text="
>>Row_ID_CR,? Data1,??? Data2,??? Data3
>>1,????????????????? aa,????????? bb,????????? cc
>>2,????????????????? dd,????????? ee,????????? ff
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat2<-read.table(text="
>>Row_ID_N,? Src_Row_ID,? DataN1
>>1a,????????????? 1,????????????????? This is comment 1
>>2a,????????????? 1,????????????????? This is comment 2
>>3a,????????????? 2,????????????????? This is comment 1
>>4a,????????????? 1,????????????????? This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>library(stringr)
>>dat2$DataN1<-str_trim(dat2$DataN1)
>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>>?res1<-res[,-5]
>>library(plyr)
>>?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>>?res2
>>?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff
>>#?????????????????????????????????????????????????? DataN1
>>#1 This is comment 1, This is comment 2, This is comment 3
>>#2?????????????????????????????????????? This is comment 1
>>
>>
>>
>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>>?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>res3
>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>>#?????? DataComment2????? DataComment3
>>#1 This is comment 2 This is comment 3
>>#2????????????? <NA>????????????? <NA>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: r-help at r-project.org
>>Cc:
>>Sent: Monday, June 10, 2013 4:38 PM
>>Subject: [R] Combining CSV data
>>
>>Hello R community,
>>
>>I am trying to combine two CSV files that look like this:
>>
>>File A
>>
>>Row_ID_CR,?? Data1,? ? Data2,? ? Data3
>>1,? ? ? ? ? ? ? ? ?? aa,? ? ? ? ? bb,? ? ? ? ? cc
>>2,? ? ? ? ? ? ? ? ?? dd,? ? ? ? ? ee,? ? ? ? ? ff
>>
>>
>>File B
>>
>>Row_ID_N,?? Src_Row_ID,?? DataN1
>>1a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 1
>>2a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 2
>>3a,? ? ? ? ? ? ?? 2,? ? ? ? ? ? ? ? ?? This is comment 1
>>4a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 3
>>
>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>Src_Row_ID
>>
>>Output
>>
>>ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1,
>>DataComment2,? ? ? ? ? DataComment3
>>1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ?? bb,? ? ? ?? cc,? ? ? ? This is
>>comment1,? ? This is comment2,? ?? This is comment 3
>>2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ?? ff,? ? ? ? ? This is
>>comment1
>>
>>
>>I am a novice R user, I am able to replicate a left join but I need a bit
>>more in the final result.
>>
>>
>>Thanks!!
>>
>>??? [[alternative HTML version deleted]]
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>>
>