sub-setting rows based on dates in R

1 message · Jim Lemon

Hi Md,
What I have done is this: if a removal date has no rainfall dates in the
window before it, I use the most recent rainfall date that falls after the
last set of dates used, if there is one; otherwise I reuse the last set of
dates. That is what I understand from your description.

Remember that adding rows to a data frame one at a time is a very clunky
way to do this, and it is likely to scale badly to large data sets.

df1<-read.table(text="Date    Rainfall_Duration
 6/14/2016       10
 6/15/2016       20
 6/17/2016       10
 8/16/2016       30
 8/19/2016       40
 8/21/2016       20
 9/4/2016        10",
 header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df1$Date to Date values
df1$Date<-as.Date(df1$Date,"%m/%d/%Y")

df2<-read.table(text="Date    Removal.Rate
 6/17/2016    64.7
 6/30/2016    22.63
 7/14/2016    18.18
 8/19/2016    27.87
 8/30/2016    23.45
 9/2/2016     17.2",
 header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df2$Date to Date values
df2$Date<-as.Date(df2$Date,"%m/%d/%Y")

df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)

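df3row<-0
# no previous set of df1 rows yet; 0 is below every valid row index, so
# the loop still works if the first df2 date has no df1 date in its window
lastrows<-0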

for(i in 1:dim(df2)[1]) {
 rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
 # if there are no dates in df1 from 0 to 8 days before the df2 date
 # (the "+ 8" above admits a gap of up to 8 days)
 if(!length(rdrows)) {
  # first check if at least one date in df1 is less than the df2
  # date and is not included in the last set of df1 dates
  checkrows<-which(df2$Date[i] >= df1$Date)
  # use the last date greater than the maximum in lastrows
  # (max() avoids recycling lastrows against checkrows element by element)
  if(any(checkrows > max(lastrows)))
   rdrows<-max(checkrows[checkrows > max(lastrows)])
  # otherwise use the last set
  else rdrows<-lastrows
 }
 # save the current set of dates
 lastrows<-rdrows
 # get the number of new rows
 nrows<-length(rdrows)
 for(row in seq_len(nrows)) {
  # set the values in each row
  df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
  df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
  df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
 }
 # keep count of the current number of rows
 df3row<-df3row+nrows
}

names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3
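
One possible way around the row-by-row growth mentioned at the top is to build a small data frame for each removal date, keep them in a preallocated list, and bind them once at the end. The following is only a sketch under assumptions: the names window, gap, newer and pieces are made up for illustration, window is set to 8 to match the "+ 8" bound in the loop above, and the fallback rule is my reading of that loop rather than anything confirmed by the original question.

```r
# Same sample data as above, repeated so the sketch runs on its own
df1 <- read.table(text="Date Rainfall_Duration
6/14/2016 10
6/15/2016 20
6/17/2016 10
8/16/2016 30
8/19/2016 40
8/21/2016 20
9/4/2016 10", header=TRUE, stringsAsFactors=FALSE)
df1$Date <- as.Date(df1$Date, "%m/%d/%Y")

df2 <- read.table(text="Date Removal.Rate
6/17/2016 64.7
6/30/2016 22.63
7/14/2016 18.18
8/19/2016 27.87
8/30/2016 23.45
9/2/2016 17.2", header=TRUE, stringsAsFactors=FALSE)
df2$Date <- as.Date(df2$Date, "%m/%d/%Y")

window <- 8                 # assumption: same bound as the "+ 8" in the loop
lastrows <- integer(0)      # no previous set of rainfall rows yet
pieces <- vector("list", nrow(df2))   # one slot per removal date

for (i in seq_len(nrow(df2))) {
  # days from each rainfall date to this removal date (negative = later)
  gap <- as.numeric(df2$Date[i] - df1$Date)
  rdrows <- which(gap >= 0 & gap <= window)
  if (!length(rdrows)) {
    # fall back to the most recent earlier date not already used,
    # otherwise reuse the previous set
    checkrows <- which(gap >= 0)
    newer <- checkrows[checkrows > max(lastrows, 0)]
    rdrows <- if (length(newer)) max(newer) else lastrows
  }
  lastrows <- rdrows
  # build all rows for this removal date in one step
  pieces[[i]] <- data.frame(
    Rate.Removal.Date = rep(format(df2$Date[i], "%m/%d/%Y"), length(rdrows)),
    Date = format(df1$Date[rdrows], "%m/%d/%Y"),
    Rainfall_Duration = df1$Rainfall_Duration[rdrows],
    stringsAsFactors = FALSE)
}

# bind everything once instead of growing df3 inside the loop
df3 <- do.call(rbind, pieces)
df3
```

On the sample data this should select the same rainfall rows as the loop above; for large inputs a merge or a rolling join would probably be a better fit, but that is beyond this sketch.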

Jim


On Thu, Feb 2, 2017 at 4:58 AM, Md Sami Bin Shokrana <samimist at live.com>
wrote: