
Creating a new by variable in a dataframe

Suppose your data frame is
d <- data.frame(
     stringsAsFactors = FALSE,
     transaction = c("T01", "T02", "T03", "T04", "T05", "T06", 
        "T07", "T08", "T09", "T10"),
     date = c("2012-10-19", "2012-10-19", "2012-10-19", 
        "2012-10-19", "2012-10-22", "2012-10-23", 
        "2012-10-23", "2012-10-23", "2012-10-23", 
        "2012-10-23"),
     time = c("08:00", "09:00", "10:00", "11:00", "12:00", 
        "13:00", "14:00", "15:00", "16:00", "17:00"
        ))
(Convert the date and time to your favorite classes; it doesn't matter here.)
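
If you do want to convert them, here is a rough sketch of one way to do it. The choice of the Date and POSIXct classes, the UTC time zone, and the small data frame ex (a hypothetical stand-in for d) are all assumptions for illustration.

```r
# Hypothetical stand-in for d: same column layout, fewer rows
ex <- data.frame(
  stringsAsFactors = FALSE,
  date = c("2012-10-19", "2012-10-22"),
  time = c("08:00", "12:00")
)

# Calendar day as a Date
ex$date <- as.Date(ex$date, format = "%Y-%m-%d")

# Combined timestamp as POSIXct; tz = "UTC" is an assumption, use your own zone
ex$datetime <- as.POSIXct(paste(ex$date, ex$time),
                          format = "%Y-%m-%d %H:%M", tz = "UTC")
```

Grouping by the Date column should work just as well as grouping by the character column.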

A general way to flag whether an item is the last of its group is:
  isLastInGroup <- function(...) {
    # within each group, TRUE only for the final element
    ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
  }
  is_last_of_dayA <- with(d, isLastInGroup(date))
If you know your data is sorted by date, you can save a little time on large
datasets by comparing each element only with the one after it:
  isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
  is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
  > cbind(d, is_last_of_dayA, is_last_of_dayB)
     transaction       date  time is_last_of_dayA is_last_of_dayB
  1          T01 2012-10-19 08:00           FALSE           FALSE
  2          T02 2012-10-19 09:00           FALSE           FALSE
  3          T03 2012-10-19 10:00           FALSE           FALSE
  4          T04 2012-10-19 11:00            TRUE            TRUE
  5          T05 2012-10-22 12:00            TRUE            TRUE
  6          T06 2012-10-23 13:00           FALSE           FALSE
  7          T07 2012-10-23 14:00           FALSE           FALSE
  8          T08 2012-10-23 15:00           FALSE           FALSE
  9          T09 2012-10-23 16:00           FALSE           FALSE
  10         T10 2012-10-23 17:00            TRUE            TRUE
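
On unsorted data the two approaches can disagree, because isLastInRun only looks at neighboring elements. The sketch below uses a small hypothetical vector of dates (not taken from d) to show the difference, plus one possible way to reuse isLastInRun by sorting first. That last step assumes order() keeps ties in their original order, which is what its documentation describes.

```r
isLastInGroup <- function(...) {
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
}
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical unsorted dates: "2012-10-19" shows up in two separate runs
dates <- c("2012-10-19", "2012-10-23", "2012-10-19", "2012-10-23", "2012-10-23")

byGroup <- isLastInGroup(dates)  # last occurrence of each distinct date
byRun   <- isLastInRun(dates)    # last element of each consecutive run

# Sort, flag the runs, then put the flags back in the original row order
o <- order(dates)
bySortedRun <- logical(length(dates))
bySortedRun[o] <- isLastInRun(dates[o])

print(data.frame(dates, byGroup, byRun, bySortedRun))
```

Here byGroup and bySortedRun agree, while byRun also flags the first two elements because each of them ends a run of length one.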


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com