Skip to content
Prev 257398 / 398506 Next

Matching Problem: Want to match to data.frame with inexact matching identifier (one identifier has to be in the range of the other).

Hi together,

I found a solution to my problem that works for me and performs
reasonable well in my opinion. Instead of looping through both
datasets, I decided to replicate each row in d2 from Min_Month to
Max_Month and then use the match-function. Here is the code (I changed
the original example to be able to compare the performance of the two
solutions):

#Example for performance test
nobs <- 100 ?#Observation per company [26 companies]
d1 <- data.frame(cbind("ID"=rep(letters,each=nobs,1),"Month"=rep(1:nobs,26)),stringsAsFactors=FALSE)
d1$Month <- as.integer(d1$Month)
d1[,"Unique_ID"] <- mapply(paste,d1$ID,d1$Month,sep="_")

d2 <- data.frame(cbind("ID"=rep(letters,each=3,1),"Min_Month"=rep(c(1,as.integer(nobs/3),as.integer(2*nobs/3)),26),
?? ? ?"Max_Month"=rep(c(as.integer(nobs/3)-1,as.integer(2*nobs/3)-1,nobs),26),"Value1"=rep(1:3,26),
"Value2"=rep(4:6,26)),stringsAsFactors=FALSE)
d2$Min_Month <- as.integer(d2$Min_Month)
d2$Max_Month <- as.integer(d2$Max_Month)

#Possible solution; is there are more elegant one?
proc1 <- proc.time()
for (i in 1:nrow(d1)){
??for (j in 1:nrow(d2)){
?? ?if ((d1[i,"ID"] == d2[j,"ID"]) & (d1[i,"Month"]>= d2[j,"Min_Month"]) &
?? ?(d1[i,"Month"]<= d2[j,"Max_Month"])) {break}
??}
??d1[i,"Value1"] <- d2[j,"Value1"]
??d1[i,"Value2"] <- d2[j,"Value2"]
}
proc1 <- proc.time() - proc1

#New Solution; Create rows first, then match it
proc2 <- proc.time()
createRows <- function(df){
??dates <- seq(df$Min_Month,df$Max_Month,by=1)
??rows <- data.frame(ID = df$ID,date=dates, df[,c("Value1","Value2")])
??rows
}
library(plyr)
d2 <- ddply(d2,.(ID,Min_Month),createRows)
d2[,"Unique_ID"] <- mapply(paste,d2$ID,d2$date,sep="_")
d1[,c("Value1_Alt", "Value2_Alt")] <-
d2[match(d1$Unique_ID,d2$Unique_ID),c("Value1","Value2")]
proc2 <- proc.time() - proc2

Hope that helps. If there is still a better solution, I would still be
more than happy for an advice!

Christoph

On Sat, Apr 16, 2011 at 4:35 PM, Christoph J?ckel
<christoph.jaeckel at wi.tum.de> wrote: