Message-ID: <1371042629.5707.YahooMailNeo@web142603.mail.bf1.yahoo.com>
Date: 2013-06-12T13:10:29Z
From: arun
Subject: Combining CSV data
In-Reply-To: <CALFKK3zXwO3L6vcTj+=FC+dOJkXLHknXuReCs14phb77fQ3Utg@mail.gmail.com>
HI Shreya,
#Looks like you run the two line code as a single line.
result3<-
data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
colnames(result3)[5:7]<- paste0("DataComment",1:3)
?result3
#? Row_ID_CR???????????????? Data1??????? Data2??????? Data3????? DataComment1
#1???????? 1??????????????????? aa?????????? bb?????????? cc This is comment 1
#2???????? 2??????????????????? dd?????????? ee?????????? ff This is comment 1
#?????? DataComment2????? DataComment3
#1 This is comment 2 This is comment 3
#2????????????? <NA>????????????? <NA>
A.K.
________________________________
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; jim holtman <jholtman at gmail.com>
Sent: Wednesday, June 12, 2013 8:58 AM
Subject: Re: [R] Combining CSV data
Great, thanks Arun, but I seem to be running into this error. Not sure what did I miss.
> result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
Error: unexpected symbol in "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"
On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink111 at yahoo.com> wrote:
>
>
>HI,
>You could use:
>result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
>colnames(result3)[5:7]<- paste0("DataComment",1:3)
>
>A.K.
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, June 11, 2013 4:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hey Arun,
>
>I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
>
>So with these commands I am able to get the data in this format:
>
>> fileA <- read.csv(text = "Row_ID_CR, ? Data1, ? ?Data2, ? ?Data3
>+ 1, ? ? ? ? ? ? ? ? ? aa, ? ? ? ? ?bb, ? ? ? ? ?cc
>+ 2, ? ? ? ? ? ? ? ? ? dd, ? ? ? ? ?ee, ? ? ? ? ?ff",?as.is?= TRUE)
>>?
>> fileB <- read.csv(text = "Row_ID_N, ? Src_Row_ID, ? DataN1
>+ 1a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 1
>+ 2a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 2
>+ 3a, ? ? ? ? ? ? ? 2, ? ? ? ? ? ? ? ? ? This is comment 1
>+ 4a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 3",?as.is?= TRUE)
>>?
>> # get rid of leading/trailing blanks on comments
>> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
>>?
>> # merge together
>> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
>>?
>> # now partition by Row_ID_CR and aggregate the comments
>> result2 <- do.call(rbind,?
>+ ? ? lapply(split(result, result$Row_ID_CR), function(.grp){
>+ ? ? ? ? cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
>+ ? ? })
>+ )
>
>Row_ID_CR ? ? ? ? ? ? ? ? Data1 ? ? ? ?Data2 ? ? ? ?Data3 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? comment
>1 ? ? ? ? 1 ? ? ? ? ? ? ? ? ? ?aa ? ? ? ? ? bb ? ? ? ? ? cc ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is comment 2| This is comment 3
>2 ? ? ? ? 2 ? ? ? ? ? ? ? ? ? ?dd ? ? ? ? ? ee ? ? ? ? ? ff ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is Comment 2
>
>I can even split the last column by this:?strsplit(as.character(result2$comment), split='\\|')
>
>[[1]]
>[1] "This is comment 1" "This is comment 2" " This is comment 3"
>
>[[2]]
>[1] "This is comment 1" "This is comment 2"
>
>
>but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
>
>Thanks!!
>Shreya
>
>
>
>
>
>
>On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi,
>>If the dataset is like this with the comments in the order:
>>
>>dat2<-read.table(text="
>>Row_ID_N,? Src_Row_ID,? DataN1
>>1a,????????????? 1,????????????????? This is comment 1
>>2a,????????????? 1,????????????????? This is comment 2
>>3a,????????????? 2,????????????????? This is comment 1
>>4a,????????????? 1,????????????????? This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat3<-read.table(text="
>>Row_ID_N,? Src_Row_ID,? DataN1
>>1a,????????????? 1,????????????????? This is comment 1
>>2a,????????????? 1,????????????????? This is comment 2
>>3a,????????????? 2,????????????????? This is comment 1?? #
>>
>>4a,????????????? 1,????????????????? This is comment 3
>>5a,???????? 2,????????????????? This is comment 2? #
>>
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>>??? data2$DataN1<- str_trim(data2$DataN1)??
>>??????? res<- merge(data1,data2,by.x=1,by.y=2)
>>??? res1<- res[,-5]
>>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>??? Mx1<- max(sapply(res2[,5],length))
>>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>????????????????????????????????? c(x,rep(NA,Mx1-length(x)))
>>
>>????????????????????????????????? })),stringsAsFactors=FALSE)
>>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>??? res3
>>??? }???
>>
>>?????
>>fun1(dat1,dat2)
>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>>#?????? DataComment2????? DataComment3
>>#1 This is comment 2 This is comment 3
>>#2????????????? <NA>????????????? <NA>
>>
>>?fun1(dat1,dat3)
>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>
>>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>>?# ???? DataComment2????? DataComment3
>>#1 This is comment 2 This is comment 3
>>
>>#2 This is comment 2????????????? <NA>
>>
>>
>>Otherwise, you need to provide an example that matches the real dataset.
>>A.K.
>>
>>________________________________
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Tuesday, June 11, 2013 12:22 PM
>>
>>Subject: Re: [R] Combining CSV data
>>
>>
>>
>>Hi Arun,
>>
>>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>>
>>Thanks for your help, I appreciate! ??
>>
>>
>>
>>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>HI,
>>>I am not sure about your DataN1 column.? If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>>>? My previous solution is not helpful in situations like these:
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N,? Src_Row_ID,? DataN1
>>>1a,????????????? 1,????????????????? This is comment 1
>>>2a,????????????? 1,????????????????? This is comment 2
>>>3a,????????????? 2,????????????????? This is comment 2
>>>4a,????????????? 1,????????????????? This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>dat3<-read.table(text="
>>>
>>>Row_ID_N,? Src_Row_ID,? DataN1
>>>1a,????????????? 1,????????????????? This is comment 1
>>>2a,????????????? 1,????????????????? This is comment 2
>>>3a,????????????? 2,????????????????? This is comment 3
>>>4a,????????????? 1,????????????????? This is comment 3
>>>5a,??? ??? ?2,????????????????? This is comment 2
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>
>>>library(stringr)
>>>library(plyr)
>>>fun1<- function(data1,data2){
>>>??? data2$DataN1<- str_trim(data2$DataN1)???
>>>??????? res<- merge(data1,data2,by.x=1,by.y=2)
>>>??? res1<- res[,-5]
>>>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>>??? Mx1<- max(sapply(res2[,5],length))
>>>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>>??? ??? ??? ??? ??? ??? ??? ??? ? indx<- as.numeric(gsub("[[:alpha:]]","",x))
>>>??? ??? ??? ??? ??? ??? ??? ??? ? x[match(seq(Mx1),indx)]
>>>??? ??? ??? ??? ??? ??? ??? ??? ? })),stringsAsFactors=FALSE)
>>>
>>>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>>??? res3
>>>??? }??? ??? ??
>>>fun1(dat1,dat2)
>>>
>>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>>>
>>>#?????? DataComment2????? DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2????????????? <NA>
>>>?fun1(dat1,dat3)
>>>
>>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA>
>>>
>>>#?????? DataComment2????? DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2 This is comment 3
>>>
>>>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Monday, June 10, 2013 6:41 PM
>>>Subject: Re: [R] Combining CSV data
>>>
>>>Hi,
>>>Try this:
>>>
>>>dat1<-read.table(text="
>>>Row_ID_CR,? Data1,??? Data2,??? Data3
>>>1,????????????????? aa,????????? bb,????????? cc
>>>2,????????????????? dd,????????? ee,????????? ff
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N,? Src_Row_ID,? DataN1
>>>1a,????????????? 1,????????????????? This is comment 1
>>>2a,????????????? 1,????????????????? This is comment 2
>>>3a,????????????? 2,????????????????? This is comment 1
>>>4a,????????????? 1,????????????????? This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>library(stringr)
>>>dat2$DataN1<-str_trim(dat2$DataN1)
>>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>>>?res1<-res[,-5]
>>>library(plyr)
>>>?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>>>?res2
>>>?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3
>>>#1???????? 1?????????????????? aa?????????? bb?????????? cc
>>>#2???????? 2?????????????????? dd?????????? ee?????????? ff
>>>#?????????????????????????????????????????????????? DataN1
>>>#1 This is comment 1, This is comment 2, This is comment 3
>>>#2?????????????????????????????????????? This is comment 1
>>>
>>>
>>>
>>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>>>?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>>res3
>>>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1
>>>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1
>>>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1
>>>#?????? DataComment2????? DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2????????????? <NA>????????????? <NA>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>>To: r-help at r-project.org
>>>Cc:
>>>Sent: Monday, June 10, 2013 4:38 PM
>>>Subject: [R] Combining CSV data
>>>
>>>Hello R community,
>>>
>>>I am trying to combine two CSV files that look like this:
>>>
>>>File A
>>>
>>>Row_ID_CR,?? Data1,? ? Data2,? ? Data3
>>>1,? ? ? ? ? ? ? ? ?? aa,? ? ? ? ? bb,? ? ? ? ? cc
>>>2,? ? ? ? ? ? ? ? ?? dd,? ? ? ? ? ee,? ? ? ? ? ff
>>>
>>>
>>>File B
>>>
>>>Row_ID_N,?? Src_Row_ID,?? DataN1
>>>1a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 1
>>>2a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 2
>>>3a,? ? ? ? ? ? ?? 2,? ? ? ? ? ? ? ? ?? This is comment 1
>>>4a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 3
>>>
>>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>>Src_Row_ID
>>>
>>>Output
>>>
>>>ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1,
>>>DataComment2,? ? ? ? ? DataComment3
>>>1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ?? bb,? ? ? ?? cc,? ? ? ? This is
>>>comment1,? ? This is comment2,? ?? This is comment 3
>>>2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ?? ff,? ? ? ? ? This is
>>>comment1
>>>
>>>
>>>I am a novice R user, I am able to replicate a left join but I need a bit
>>>more in the final result.
>>>
>>>
>>>Thanks!!
>>>
>>>??? [[alternative HTML version deleted]]
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>