create unique ID for each group
Hi,
Try this:
dat1<- read.table(text="
ObsNumber???? ID????????? Weight
???? 1???????????????? 0001???????? 12
???? 2???????????????? 0001????????? 13
???? 3???????????????? 0001?????????? 14
???? 4????????????????? 0002???????? 16
????? 5???????????????? 0002???????? 17
???? 6?????????????????? N/A????????? 18??
???? 7?????????????????? 0003????????? 19
???? 8?????????????????? N/A?????????? 20
???? 9?????????????????? 0003????????? 21
",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
dat2<- read.table(text="
ID?????????????? Height
0001??????????? 3.2
0001???????????? 2.6
0001???????????? 3.2
0002???????????? 2.2
0002????????????? 2.6
",sep="",header=TRUE,colClass=c("character","numeric"))
dat1[!is.na(dat1$ID),"UniqueID"]<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
library(plyr)
join(dat1,dat2,by="UniqueID",type="left")
?# ObsNumber?? ID Weight UniqueID?? ID Height
#1???????? 1 0001???? 12?? 0001_1 0001??? 3.2
#2???????? 2 0001???? 13?? 0001_2 0001??? 2.6
#3???????? 3 0001???? 14?? 0001_3 0001??? 3.2
#4???????? 4 0002???? 16?? 0002_1 0002??? 2.2
#5???????? 5 0002???? 17?? 0002_2 0002??? 2.6
#6???????? 6 <NA>???? 18???? <NA> <NA>???? NA
#7???????? 7 0003???? 19?? 0003_1 <NA>???? NA
#8???????? 8 <NA>???? 20???? <NA> <NA>???? NA
#9???????? 9 0003???? 21?? 0003_2 <NA>???? NA
A.K.
From: Ye Lin <yelin at lbl.gov>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, May 7, 2013 4:05 PM
Subject: Re: [R] create unique ID for each group
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, May 7, 2013 4:05 PM
Subject: Re: [R] create unique ID for each group
Yes, I need to keep the N/A records.
On Tue, May 7, 2013 at 1:00 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi
>Do you need an output like this?
>?merge(dat1,dat2,by="UniqueID",all.x=TRUE)
>? UniqueID ObsNumber ID.x Weight ID.y Height
>1?? 0001_1???????? 1 0001???? 12 0001??? 3.2
>2?? 0001_2???????? 2 0001???? 13 0001??? 2.6
>3?? 0001_3???????? 3 0001???? 14 0001??? 3.2
>4?? 0002_1???????? 4 0002???? 16 0002??? 2.2
>5?? 0002_2???????? 5 0002???? 17 0002??? 2.6
>6???? <NA>???????? 6 <NA>???? 18 <NA>???? NA
>
>when you use:
>
>
>?dat1
>? ObsNumber?? ID Weight UniqueID
>1???????? 1 0001???? 12?? 0001_1
>2???????? 2 0001???? 13?? 0001_2
>3???????? 3 0001???? 14?? 0001_3
>4???????? 4 0002???? 16?? 0002_1
>5???????? 5 0002???? 17?? 0002_2
>6???????? 6 <NA>???? 18???? <NA>
>
>?dat2
>??? ID Height UniqueID
>1 0001??? 3.2?? 0001_1
>2 0001??? 2.6?? 0001_2
>3 0001??? 3.2?? 0001_3
>4 0002??? 2.2?? 0002_1
>5 0002??? 2.6?? 0002_2
>
>
>
>
>________________________________
>From: Ye Lin <yelin at lbl.gov>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, May 7, 2013 3:41 PM
>
>Subject: Re: [R] create unique ID for each group
>
>
>
>If the ID="N/A" then when merge, there would be any match and can return N/A
>
>I use merge(dat1, dat2, by="UniqueID", all.x=TRUE),then an extra row will be added to the output for each case in?dat1?that has no matching cases in?dat2
>
>I just have to leave the records in dat1 even ID=N/A
>
>
>
>
>On Tue, May 7, 2013 at 12:38 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Also, another problem might be where do you assign those rows with missing ID.? It could be the missing value for any ID.
>>For example in this case:
>>
>>dat1<- read.table(text="
>>?ObsNumber???? ID????????? Weight
>>????? 1???????????????? 0001???????? 12
>>????? 2???????????????? 0001????????? 13
>>????? 3???????????????? 0001?????????? 14
>>????? 4????????????????? 0002???????? 16
>>?????? 5???????????????? 0002???????? 17
>>????? 6?????????????????? N/A????????? 18??
>>???? 7?????????????????? 0003???????? 19
>>???? 8?????????????????? 0003????????? 20
>>
>>?",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>>?dat1
>>? ObsNumber?? ID Weight
>>1???????? 1 0001???? 12
>>2???????? 2 0001???? 13
>>3???????? 3 0001???? 14
>>4???????? 4 0002???? 16
>>5???????? 5 0002???? 17
>>6???????? 6 <NA>???? 18
>>7???????? 7 0003???? 19
>>8???????? 8 0003???? 20
>>
>>
>>?The missing ID could be either "0002" or "0003".?
>>
>>
>>
>>
>>
>>
>>----- Original Message -----
>>From: arun <smartpink111 at yahoo.com>
>>To: Ye Lin <yelin at lbl.gov>
>>Cc:
>>
>>Sent: Tuesday, May 7, 2013 3:32 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>If you modify with na.strings="N/A", IDs with missing values will be read correctly.? Otherwise, it is just a character string.? BTW, if you need rows with NAs, then what will be your UniqueIDs you expect for those rows?
>>
>>
>>
>>
>>
>>
>>
>>________________________________
>>From: Ye Lin <yelin at lbl.gov>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, May 7, 2013 3:25 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>
>>
>>I do need rows with "NA". I already read the data in R, so do I assume I need to modify dat1 first with na.strings="N/A" ?
>>
>>
>>
>>On Tue, May 7, 2013 at 12:16 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>Do you need that row with "N/A".? The code I sent will remove that row.? If you don't use "na.strings="N/A", then it is not read NA, but some other character.? That is the reason you got results like:
>>>
>>>? unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2" "N/A_1"
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Ye Lin <yelin at lbl.gov>
>>>Cc: R help <r-help at r-project.org>
>>>
>>>Sent: Tuesday, May 7, 2013 3:13 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>HI Ye,
>>>
>>>For the NA in ID column,
>>>
>>>
>>>
>>>Hi
>>>dat1<- read.table(text="
>>>ObsNumber???? ID????????? Weight
>>>???? 1???????????????? 0001???????? 12
>>>???? 2???????????????? 0001????????? 13
>>>???? 3???????????????? 0001?????????? 14
>>>???? 4????????????????? 0002???????? 16
>>>????? 5???????????????? 0002???????? 17
>>>???? 6?????????????????? N/A????????? 18??
>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>>>?unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
>>>A.K.
>>>________________________________
>>>From: Ye Lin <yelin at lbl.gov>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Tuesday, May 7, 2013 2:54 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>
>>>
>>>Thanks A.K. But I have "NA" in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like "N/A_1" in order as well?
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Tue, May 7, 2013 at 11:17 AM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>H,
>>>>Sorry, a mistake:
>>>>dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>dat1
>>>>?# ObsNumber?? ID Weight UniqueID
>>>>#1???????? 1 0001???? 12?? 0001_1
>>>>#2???????? 2 0001???? 13?? 0001_2
>>>>#3???????? 3 0001???? 14?? 0001_3
>>>>#4???????? 4 0002???? 16?? 0002_1
>>>>#5???????? 5 0002???? 17?? 0002_2
>>>>
>>>>dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>
>>>>From: arun <smartpink111 at yahoo.com>
>>>>To: Ye Lin <yelin at lbl.gov>
>>>>Cc: R help <r-help at r-project.org>
>>>>Sent: Tuesday, May 7, 2013 2:10 PM
>>>>Subject: Re: [R] create unique ID for each group
>>>>
>>>>
>>>>
>>>>Hi,
>>>>
>>>>Try this:
>>>>dat1<- read.table(text="
>>>>ObsNumber???? ID????????? Weight
>>>>???? 1???????????????? 0001???????? 12
>>>>???? 2???????????????? 0001????????? 13
>>>>???? 3???????????????? 0001?????????? 14
>>>>???? 4????????????????? 0002???????? 16
>>>>????? 5???????????????? 0002???????? 17
>>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"))
>>>>dat2<- read.table(text="
>>>>ID?????????????? Height
>>>>0001??????????? 3.2
>>>>0001???????????? 2.6
>>>>0001???????????? 3.2
>>>>0002???????????? 2.2
>>>>0002????????????? 2.6
>>>>",sep="",header=TRUE,colClass=c("character","numeric"))
>>>>dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
>>>>?dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
>>>>?dat2
>>>>#??? ID Height UniqueID
>>>>#1 0001??? 3.2?? 0001_1
>>>>#2 0001??? 2.6?? 0001_2
>>>>#3 0001??? 3.2?? 0001_3
>>>>#4 0002??? 2.2?? 0002_4
>>>>#5 0002??? 2.6?? 0002_5
>>>>A.K.
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Ye Lin <yelin at lbl.gov>
>>>>To: R help <r-help at r-project.org>
>>>>Cc:
>>>>Sent: Tuesday, May 7, 2013 1:54 PM
>>>>Subject: [R] create unique ID for each group
>>>>
>>>>Hey All,
>>>>
>>>>I have a dataset(dat1) like this:
>>>>
>>>>ObsNumber? ?? ID? ? ? ? ? Weight
>>>>? ?? 1? ? ? ? ? ? ? ?? 0001? ? ? ?? 12
>>>>? ?? 2? ? ? ? ? ? ? ?? 0001? ? ? ? ? 13
>>>>? ?? 3? ? ? ? ? ? ? ?? 0001? ? ? ? ?? 14
>>>>? ?? 4? ? ? ? ? ? ? ? ? 0002? ? ? ?? 16
>>>>? ? ? 5? ? ? ? ? ? ? ?? 0002? ? ? ?? 17
>>>>
>>>>And another dataset(dat2) like this:
>>>>
>>>>ID? ? ? ? ? ? ?? Height
>>>>0001? ? ? ? ? ? 3.2
>>>>0001? ? ? ? ? ?? 2.6
>>>>0001? ? ? ? ? ?? 3.2
>>>>0002? ? ? ? ? ?? 2.2
>>>>0002? ? ? ? ? ? ? 2.6
>>>>
>>>>I want to merge dat1 and dat2 based on "ID" in order, I know "match" only
>>>>returns the first match it finds. So I am thinking create unique ID col in
>>>>dat2 and dat2, then merge. But I dont know how to do that so it can be like
>>>>this:
>>>>
>>>>dat1:
>>>>
>>>>ObsNumber? ?? ID? ? ? ? ? Weight? UniqueID
>>>>? ?? 1? ? ? ? ? ? ? ?? 0001? ? ? ?? 12? ? ? ?? 0001_1
>>>>? ?? 2? ? ? ? ? ? ? ?? 0001? ? ? ? ? 13? ? ? ? 0001_2
>>>>? ?? 3? ? ? ? ? ? ? ?? 0001? ? ? ? ?? 14? ? ?? 0001_3
>>>>? ?? 4? ? ? ? ? ? ? ? ? 0002? ? ? ?? 16? ? ? ?? 0002_1
>>>>? ? ? 5? ? ? ? ? ? ? ?? 0002? ? ? ?? 17? ? ? ?? 0002_1
>>>>
>>>>dat2:
>>>>
>>>>ID? ? ? ? ? ? ?? Height?? UniqueID
>>>>0001? ? ? ? ? ? 3.2? ? ? ? ? 0001_1
>>>>0001? ? ? ? ? ?? 2.6? ? ? ?? 0001_2
>>>>0001? ? ? ? ? ?? 3.2? ? ? ?? 0001_3
>>>>0002? ? ? ? ? ?? 2.2? ? ? ?? 0002_1
>>>>0002? ? ? ? ? ? ? 2.6? ? ? ? 0002_2
>>>>
>>>>Or if it is possible to merge dat1 and dat2 by matching "ID" but return the
>>>>match in order that would be great!
>>>>
>>>>Thanks for your help!
>>>>
>>>>??? [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>