Bus stop sequence matching problem
Adam Lawrence <alaw005 <at> gmail.com> writes:
I am hoping someone can help me with a bus stop sequencing problem in R, where I need to match counts of people getting on and off a bus to the correct stop in the bus route stop sequence. I have looked online and in forums for sequence matching, but the results seem to refer to numeric sequences or DNA matching and are over my head. I am after a simple example, if anyone can please help.
Adam, Yet another way... See inline code. BTW, you should have mentioned that you are a transit planner or included a signature block so folks would know this is not a homework question. As others have noted/hinted, there are some unstated assumptions, so you need to try some test cases to be sure any solution always works. You only have one outbound/inbound cycle in stop_onoff, right?? If not, I think almost any approach can fail given the right sequence of 'seq's.
I have two data series as per below (from a database) that I want to
combine. In this example 'stop_sequence' holds the sequence (seq) of bus
stops and 'stop_onoff' is a count of people getting on and off at certain
stops (there is no entry if no one gets on or off).
stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
ref=c('A','B','C','D','B','A'))
## seq ref
## 1 10 A
## 2 20 B
## 3 30 C
## 4 40 D
## 5 50 B
## 6 60 A
stop_onoff <-
data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
## ref on off
## 1 A 5 0
## 2 D 0 2
## 3 B 10 2
## 4 A 0 6
I need to match the stop_onoff numbers to the right stop in the sequence, with the
correctly matched output as follows (load is the cumulative count of on minus
off)
desired_output <- data.frame(seq=c(10,20,30,40,50,60),
ref=c('A','B','C','D','B','A'),
on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
## seq ref on off load
## 1 10 A 5 0 5
## 2 20 B - - 0
## 3 30 C - - 0
## 4 40 D 0 2 3
## 5 50 B 10 2 11
## 6 60 A 0 6 5
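The matching rule implied by this desired output can be read as a plain loop: take the on/off records in order and give each one the first stop with the same ref that comes after the stop matched before it. The sketch below is only an illustration of that reading, not code from the thread. The helper name match_forward() is hypothetical, and choosing the earliest remaining candidate is an assumption that the question does not state.

```r
# Data as given in the question.
stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"),
                            stringsAsFactors = FALSE)
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6),
                         stringsAsFactors = FALSE)

# match_forward() is a hypothetical helper, not part of the original post.
# Assumption: each record belongs to the earliest stop with the same ref
# that lies after the previously matched stop.
match_forward <- function(route, onoff) {
  matched <- rep(NA_real_, nrow(onoff))
  last <- -Inf
  for (i in seq_len(nrow(onoff))) {
    candidates <- route$seq[route$ref == onoff$ref[i] & route$seq > last]
    if (length(candidates) == 0) stop("no remaining stop matches record ", i)
    matched[i] <- min(candidates)
    last <- matched[i]
  }
  matched
}

matched_seq <- match_forward(stop_sequence, stop_onoff)
matched_seq
```

For the example data this assigns the four records to seq 10, 40, 50 and 60. A record list such as A, B, A would be ambiguous under any rule, since B could be either stop 20 or stop 50, so it is worth trying a few such cases against whatever solution is used.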
Start here:
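# Running passenger load after each recorded stop
stop_onoff$load <- with(stop_onoff, cumsum(on) - cumsum(off))
# Candidate seq values for each ref, e.g. A -> 10, 60
split.ref <- with(stop_sequence, split(seq, ref))
split.ref.onoff <- split.ref[as.character(stop_onoff$ref)]
# One column per on/off record; a single candidate is repeated to fill both rows
stop.mat <- sapply(split.ref.onoff, rep, length.out = 2)
# Flag each candidate that exceeds the same-row candidate of the previous record
inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
stop_onoff$seq <- head(stop.mat[inout], -1)
merge(stop_sequence[c("ref", "seq")], stop_onoff[-1], by = "seq", all.x = TRUE)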
##   seq ref on off load
## 1  10   A  5   0    5
## 2  20   B NA  NA   NA
## 3  30   C NA  NA   NA
## 4  40   D  0   2    3
## 5  50   B 10   2   11
## 6  60   A  0   6    5
You can take care of turning the NA's to zeroes or '-'s, I think.
HTH,
Chuck
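For the last step mentioned above, here is a minimal sketch of turning the NA's into '-' (for on and off) and 0 (for load), as in the desired output from the question. The data frame names result and display are made up for this illustration, and result is typed in by hand from the printed merge output so the snippet runs on its own.

```r
# Merged result as printed above, rebuilt by hand (illustrative name).
result <- data.frame(seq  = c(10, 20, 30, 40, 50, 60),
                     ref  = c("A", "B", "C", "D", "B", "A"),
                     on   = c(5, NA, NA, 0, 10, 0),
                     off  = c(0, NA, NA, 2, 2, 6),
                     load = c(5, NA, NA, 3, 11, 5),
                     stringsAsFactors = FALSE)

display <- result
# '-' where no one was recorded; note this turns on/off into character columns.
display$on  <- ifelse(is.na(result$on),  "-", as.character(result$on))
display$off <- ifelse(is.na(result$off), "-", as.character(result$off))
# The question's desired output shows load as 0 at stops with no record.
display$load[is.na(display$load)] <- 0
display
```

Because '-' forces on and off to character, it is probably best to do this only for display and keep the numeric version for any further arithmetic.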