Suppose your data frame is
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
                  "T07", "T08", "T09", "T10"),
  date = c("2012-10-19", "2012-10-19", "2012-10-19",
           "2012-10-19", "2012-10-22", "2012-10-23",
           "2012-10-23", "2012-10-23", "2012-10-23",
           "2012-10-23"),
  time = c("08:00", "09:00", "10:00", "11:00", "12:00",
           "13:00", "14:00", "15:00", "16:00", "17:00")
)
(Convert the date and time to your favorite classes; it doesn't matter here.)
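If you do want proper date and time classes, one possible conversion is sketched below. The stand-in data frame d2, the combined column name 'when', and the UTC time zone are my assumptions for illustration; nothing in the question fixes them.

```r
# Small stand-in for d, with the same column names and string formats.
d2 <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03"),
  date = c("2012-10-19", "2012-10-19", "2012-10-22"),
  time = c("08:00", "09:00", "12:00")
)

# ISO-formatted strings (yyyy-mm-dd) parse without an explicit format.
d2$date <- as.Date(d2$date)

# Combine date and time into a single timestamp.
# The time zone is an assumption; use whatever applies to your data.
d2$when <- as.POSIXct(paste(d2$date, d2$time),
                      format = "%Y-%m-%d %H:%M", tz = "UTC")

str(d2)
```

Grouping on a Date column behaves the same as grouping on the original strings, so the flagging code is unaffected by the conversion.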
A general way to say if an item is the last of its group is:
# Flag the last element, in row order, of each group defined by the vectors in '...'.
isLastInGroup <- function(...)
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
is_last_of_dayA <- with(d, isLastInGroup(date))
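Because the grouping vectors are passed through to ave(), the same function should also handle rows that are not sorted and more than one grouping variable. A small sketch follows; the 'sales' data frame and its 'store' column are made up for illustration and are not part of the original question.

```r
isLastInGroup <- function(...)
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))

# Hypothetical data: rows are not sorted by date, and there are two grouping columns.
sales <- data.frame(
  stringsAsFactors = FALSE,
  store = c("A", "B", "A", "B", "A"),
  date  = c("2012-10-19", "2012-10-19", "2012-10-22", "2012-10-19", "2012-10-19")
)

# Last row of each date, wherever that row sits in the data frame.
with(sales, isLastInGroup(date))
# FALSE FALSE TRUE FALSE TRUE

# Last row of each store/date combination.
with(sales, isLastInGroup(store, date))
# FALSE FALSE TRUE TRUE TRUE
```

Note that "last" here means last in row order within the group, not latest in time.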
If you know your data is sorted by date (so that equal dates are adjacent), you
can save a little time on large datasets by using
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
> cbind(d, is_last_of_dayA, is_last_of_dayB)
   transaction       date  time is_last_of_dayA is_last_of_dayB
1          T01 2012-10-19 08:00           FALSE           FALSE
2          T02 2012-10-19 09:00           FALSE           FALSE
3          T03 2012-10-19 10:00           FALSE           FALSE
4          T04 2012-10-19 11:00            TRUE            TRUE
5          T05 2012-10-22 12:00            TRUE            TRUE
6          T06 2012-10-23 13:00           FALSE           FALSE
7          T07 2012-10-23 14:00           FALSE           FALSE
8          T08 2012-10-23 15:00           FALSE           FALSE
9          T09 2012-10-23 16:00           FALSE           FALSE
10         T10 2012-10-23 17:00            TRUE            TRUE
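If the data is not already sorted, the two approaches can disagree, and neither flags the latest transaction of the day unless you sort first, as the PROC SORT step does in the quoted SAS code. A minimal sketch, assuming a made-up unsorted data frame 'u' and borrowing the column name last_trans from the question:

```r
isLastInGroup <- function(...)
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical unsorted data: rows with the same date are not adjacent.
u <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04"),
  date = c("2012-10-19", "2012-10-22", "2012-10-19", "2012-10-22"),
  time = c("09:00", "12:00", "08:00", "13:00")
)

# Every row ends a run of equal dates, so the run-based flag is TRUE throughout.
isLastInRun(u$date)

# The group-based flag marks the last row of each date in row order,
# which for 2012-10-19 is T03 (08:00), not the latest transaction.
isLastInGroup(u$date)

# Sort by date, then time. These zero-padded strings sort chronologically.
s <- u[order(u$date, u$time), ]
s$last_trans <- isLastInRun(s$date)
s
```

After sorting, last_trans is TRUE for T01 (09:00 on 2012-10-19) and T04 (13:00 on 2012-10-22).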
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of ramoss
Sent: Friday, October 19, 2012 10:52 AM
To: r-help at r-project.org
Subject: [R] Creating a new by variable in a dataframe
Hello,
I have a dataframe w/ 3 variables of interest: transaction, date (tdate) &
time (event_tim).
How could I create a 4th variable (last_trans) that would flag the last
transaction of the day for each day?
In SAS I use:
proc sort data=all6;
  by tdate event_tim;
run;
/*Create last transaction flag per day*/
data all6;
  set all6;
  by tdate event_tim;
  last_trans=last.tdate;
run;
Thanks ahead for any suggestions.
--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
Sent from the R help mailing list archive at Nabble.com.