Hi Jim,
Thanks a lot. As you already mentioned, this is a clunky method. This code
runs fine with the sample data I provided here. But my real data is much
larger (more than 50 observations for df1) and this code does not work. I
am getting this error:
Error in `*tmp*`[[j]] : subscript out of bounds
Do you have any solution for that? You have already done a lot. So, I
really appreciate your effort. Thanks
------------------------------
*From:* Jim Lemon <drjimlemon at gmail.com>
*Sent:* Thursday, February 2, 2017 2:07 PM
*To:* Md Sami Bin Shokrana; r-help mailing list
*Subject:* Re: [R] sub-setting rows based on dates in R
Hi Md,
What I have done is to use the most recent intervening date between the
last set of dates if any are there, otherwise the last set of dates. That
is what I understand from your description.
Remember that this is a very clunky way to do something like this by
adding rows to a data frame, and it is likely to scale up to large data
sets badly.
df1<-read.table(text="Date Rainfall_Duration
6/14/2016 10
6/15/2016 20
6/17/2016 10
8/16/2016 30
8/19/2016 40
8/21/2016 20
9/4/2016 10",
header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df2$Date to Date values
df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
df2<-read.table(text="Date Removal.Rate
6/17/2016 64.7
6/30/2016 22.63
7/14/2016 18.18
8/19/2016 27.87
8/30/2016 23.45
9/2/2016 17.2",
header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df2$Date to Date values
df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
df3row<-0
for(i in 1:dim(df2)[1]) {
rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
# if there are no dates in df1 within the prior 7 days
if(!length(rdrows)) {
# first check if at least one date in df1 is less than the df2
# date and is not included in the last set of df1 dates
checkrows<-which(df2$Date[i] >= df1$Date)
# use the last date greater than the maximum in lastrows
if(any(checkrows > lastrows))
rdrows<-max(checkrows[checkrows > lastrows])
# otherwise use the last set
else rdrows<-lastrows
}
# save the current set of dates
lastrows<-rdrows
# get the number of new rows
nrows<-length(rdrows)
for(row in 1:nrows) {
# set the values in each row
df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
}
# keep count of the current number of rows
df3row<-df3row+nrows
}
names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3
Jim
On Thu, Feb 2, 2017 at 4:58 AM, Md Sami Bin Shokrana <samimist at live.com>
wrote:
Hi Jim,
Thank you so much for your help. Your code works great. Could you please
explain your code a bit? One more thing, I am so sorry that I forgot to
mention one more criteria in my post. If it is not much trouble, could you
please help me out with that? I have added a couple more observations
(the bold ones) to each of my data frames which are shown below:
The main concept is,
(i) For a specific date in df2, if no matching dates are available in df1
within the 7 days range, the code will keep on looking for the latest
available date in df1 with a "Rainfall_Duration" data. For example, in df2,
for *8/30/2016*, there is no "Rainfall_Duration" data available in
df1 within the prior 7 days range. So, I want the code to keep on looking
for dates in df1 until there is an available data for "Rainfall_Duration"
in df1 (in this case which is * 8/21/2016)* .
(ii) Additionally, for* 9/2/2016 *(df2), there is no date available in
df1 with a "Rainfall_Duration" data within prior 7 days range. The latest
available data for "Rainfall_Duration" is *8/21/2016*. So, the code will
extract the same result we had for *8/30/2016* in df2.
In simpler words, i just want the code to keep on looking for data with
"Rainfall_Duraiton" in df1 if there is none available within the prior 7
days range. Sorry for not mentioning it before.
df1 <-
Date Rainfall_Duration
6/14/2016 10
6/15/2016 20
6/17/2016 10
8/16/2016 30
8/19/2016 40
*8/21/2016* *20* *9/4/2016 10*
df2 <-
Date Removal.Rate
6/17/2016 64.7
6/30/2016 22.63
7/14/2016 18.18
8/19/2016 27.87
*8/30/2016* *23.45* *9/2/2016 17.2*
Expected output:
df3 <-
Rate.Removal.Date Date Rainfall_Duration
6/17/2016 6/14/2016 10
6/17/2016 6/15/2016 20
6/17/2016 6/17/2016 10
6/30/2016 6/14/2016 10
6/30/2016 6/15/2016 20
6/30/2016 6/17/2016 10
7/14/2016 6/14/2016 10
7/14/2016 6/15/2016 20
7/14/2016 6/17/2016 10
8/19/2016 8/16/2016 30
8/19/2016 8/19/2016 40
*8/30/2016* *8/21/2016* *20* *9/2/2016 8/21/2016
20*
Thanks in advance.
------------------------------
*From:* Jim Lemon <drjimlemon at gmail.com>
*Sent:* Wednesday, February 1, 2017 1:18 PM
*To:* Md Sami Bin Shokrana; r-help mailing list
*Subject:* Re: [R] sub-setting rows based on dates in R
Hi Md,
This kind of clunky, but it might do what you want.
df1<-read.table(text="Date Rainfall_Duration
6/14/2016 10
6/15/2016 20
6/17/2016 10
8/16/2016 30
8/19/2016 40",
header=TRUE,stringsAsFactors=FALSE)
df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
df2<-read.table(text="Date Removal.Rate
6/17/2016 64.7
6/30/2016 22.63
7/14/2016 18.18
8/19/2016 27.87",
header=TRUE,stringsAsFactors=FALSE)
df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
df3row<-0
for(i in 1:dim(df2)[1]) {
rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
if(!length(rdrows)) rdrows<-lastrows
lastrows<-rdrows
nrows<-length(rdrows)
for(row in 1:nrows) {
df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
}
df3row<-df3row+nrows
}
names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3
Jim
On Wed, Feb 1, 2017 at 3:48 AM, Md Sami Bin Shokrana <samimist at live.com>
wrote:
Hello guys, I am trying to solve a problem in R. I have 2 data frames
df1 <-
Date Rainfall_Duration
6/14/2016 10
6/15/2016 20
6/17/2016 10
8/16/2016 30
8/19/2016 40
df2 <-
Date Removal.Rate
6/17/2016 64.7
6/30/2016 22.63
7/14/2016 18.18
8/19/2016 27.87
I want to look up the dates from df2 in df1 and their corresponding
Rainfall_Duration data. For example, I want to look for the 1st date of df2
in df1 and subset rows in df1 for that specific date and 7 days prior to
that. additionally, for example: for 6/30/2016 (in df2) there is no dates
available in df1 within it's 7 days range. So, in this case I just want to
extract the results same as it's previous date (6/17/2016) in df2. Same
logic goes for 7/14/2016(df2).
The output should look like this:
df3<-
Rate.Removal.Date Date Rainfall_Duration
6/17/2016 6/14/2016 10
6/17/2016 6/15/2016 20
6/17/2016 6/17/2016 10
6/30/2016 6/14/2016 10
6/30/2016 6/15/2016 20
6/30/2016 6/17/2016 10
7/14/2016 6/14/2016 10
7/14/2016 6/15/2016 20
7/14/2016 6/17/2016 10
8/19/2016 8/16/2016 30
8/19/2016 8/19/2016 40
I could subset data for the 7 days range. But could not do it when no
dates are available in that range. I have the following code:
library(plyr)
library (dplyr)
df1$Date <- as.Date(df1$Date,format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date,format = "%m/%d/%Y")
df3 <- lapply(df2$Date, function(x){
filter(df1, between(Date, x-7, x))
})
names(df3) <- as.character(df2$Date)
bind_rows(df3, .id = "Rate.Removal.Date")
df3 <- ldply (df3, data.frame, .id = "Rate.Removal.Date")
I hope I could explain my problem properly. I would highly appreciate
if someone can help me out with this code or a new one. Thanks in advance.
[[alternative HTML version deleted]]