Hello,
I have a dataframe w/ 3 variables of interest: transaction, date (tdate) &
time (event_tim).
How could I create a 4th variable (last_trans) that would flag the last
transaction of the day for each day?
In SAS I use:
proc sort data=all6;
  by tdate event_tim;
run;
/* Create last transaction flag per day */
data all6;
  set all6;
  by tdate event_tim;
  last_trans=last.tdate;
run;
Thanks ahead for any suggestions.
--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
Sent from the R help mailing list archive at Nabble.com.
Creating a new by variable in a dataframe
Suppose your data frame is
d <- data.frame(
    stringsAsFactors = FALSE,
    transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
        "T07", "T08", "T09", "T10"),
    date = c("2012-10-19", "2012-10-19", "2012-10-19",
        "2012-10-19", "2012-10-22", "2012-10-23",
        "2012-10-23", "2012-10-23", "2012-10-23",
        "2012-10-23"),
    time = c("08:00", "09:00", "10:00", "11:00", "12:00",
        "13:00", "14:00", "15:00", "16:00", "17:00"
        ))
(Convert the date and time to your favorite classes, it doesn't matter here.)
A general way to say if an item is the last of its group is:
isLastInGroup <- function(...) ave(logical(length(..1)), ..., FUN=function(x)seq_along(x)==length(x))
is_last_of_dayA <- with(d, isLastInGroup(date))
If you know your data is sorted by date you could save a little time for large
datasets by using
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
> cbind(d, is_last_of_dayA, is_last_of_dayB)
   transaction       date  time is_last_of_dayA is_last_of_dayB
1          T01 2012-10-19 08:00           FALSE           FALSE
2          T02 2012-10-19 09:00           FALSE           FALSE
3          T03 2012-10-19 10:00           FALSE           FALSE
4          T04 2012-10-19 11:00            TRUE            TRUE
5          T05 2012-10-22 12:00            TRUE            TRUE
6          T06 2012-10-23 13:00           FALSE           FALSE
7          T07 2012-10-23 14:00           FALSE           FALSE
8          T08 2012-10-23 15:00           FALSE           FALSE
9          T09 2012-10-23 16:00           FALSE           FALSE
10         T10 2012-10-23 17:00            TRUE            TRUE
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
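The difference between the two helpers above only shows up when the rows are not sorted. The following is a minimal sketch on a made-up, deliberately shuffled data frame (d_unsorted and its values are assumptions for illustration, not data from the thread). It suggests that isLastInGroup() flags the last row of each date in row order, that isLastInRun() flags the end of every run, and that sorting by date and time first is what makes either flag mean "last transaction of the day".

```r
# Hypothetical shuffled data: two dates, rows deliberately out of order
d_unsorted <- data.frame(
  date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19"),
  time = c("17:00", "08:00", "13:00", "11:00"),
  stringsAsFactors = FALSE
)

# The two helpers from the message above
isLastInGroup <- function(...) {
  ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
}
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Last row of each date in row order (row 3 is 13:00, not the latest time)
by_group <- isLastInGroup(d_unsorted$date)

# Every row ends a run here, so every row is flagged
by_run <- isLastInRun(d_unsorted$date)

# Sorting by date and time first gives the intended flag
d_sorted <- d_unsorted[order(d_unsorted$date, d_unsorted$time), ]
d_sorted$last_trans <- isLastInRun(d_sorted$date)
```

Zero-padded "HH:MM" strings sort correctly as text; times stored in another format would need converting before the order() call.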
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,
Maybe this helps you:
dat1<-read.table(text="
tdate event_tim transaction
1/10/2012 2 14
1/10/2012 4 28
1/10/2012 6 42
1/10/2012 8 14
2/10/2012 6 46
2/10/2012 9 64
2/10/2012 8 71
3/10/2012 3 85
3/10/2012 1 14
3/10/2012 4 28
9/10/2012 5 51
9/10/2012 9 66
9/20/2012 12 84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
dat2$tdate<-as.Date(dat2$tdate,format="%m/%d/%Y")
dat3<-dat2
dat3$last_trans<-NA
library(plyr)
dat4<-merge(dat3,ddply(dat2,.(tdate),tail,1))
dat4$last_trans<-dat4$transaction
res<-merge(dat4,dat2,all=TRUE)
res
#        tdate event_tim transaction last_trans
#1  2012-01-10         2          14         NA
#2  2012-01-10         4          28         NA
#3  2012-01-10         6          42         NA
#4  2012-01-10         8          14         14
#5  2012-02-10         6          46         NA
#6  2012-02-10         8          71         NA
#7  2012-02-10         9          64         64
#8  2012-03-10         1          14         NA
#9  2012-03-10         3          85         NA
#10 2012-03-10         4          28         28
#11 2012-09-10         5          51         NA
#12 2012-09-10         9          66         66
#13 2012-09-20        12          84         84
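For comparison, the same kind of flag can probably be had without any merge. The sketch below uses a shortened copy of the example data above and base R's ave() to compare each event_tim with the maximum for its date. The column name last_trans follows the original question; making the flag logical rather than a copy of the transaction value is a choice made here, not something from the message above.

```r
# Shortened copy of the example data (first two dates only)
dat1 <- data.frame(
  tdate = c("1/10/2012", "1/10/2012", "2/10/2012", "2/10/2012", "2/10/2012"),
  event_tim = c(2, 8, 6, 9, 8),
  transaction = c(14, 14, 46, 64, 71),
  stringsAsFactors = FALSE
)

# ave() returns the per-date maximum repeated for every row of that date,
# so the comparison is TRUE exactly on the latest event of each date
dat1$last_trans <- with(dat1, event_tim == ave(event_tim, tdate, FUN = max))
```

This does not require the rows to be sorted and keeps them in their original order. If two rows share the latest time on a date, both are flagged.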
Thanks for all the help guys.
This worked for me:
library(plyr)  # provides arrange() and ddply()
all6 <- arrange(all6, tdate, event_tim)
lt <- ddply(all6, .(tdate), tail, 1)
lt$last_trans <- 'Y'
all6 <- merge(all6, lt, by.x=c("tdate","event_tim"),
              by.y=c("tdate","event_tim"), all.x=TRUE)
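A closer base R analogue of the SAS sort followed by last.tdate might be order() plus duplicated(..., fromLast = TRUE). This is only a sketch: the small all6 below is made up for illustration (the real all6 is not shown in the thread), and a logical flag is used instead of 'Y'.

```r
# Hypothetical stand-in for all6, with rows out of order
all6 <- data.frame(
  tdate = as.Date(c("2012-10-19", "2012-10-22", "2012-10-19",
                    "2012-10-22", "2012-10-19")),
  event_tim = c("11:00", "09:30", "08:00", "16:45", "10:00"),
  transaction = c("T03", "T04", "T01", "T05", "T02"),
  stringsAsFactors = FALSE
)

# Equivalent of: proc sort; by tdate event_tim;
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# Equivalent of last.tdate: a row is the last of its date when no later
# row has the same date, which is what fromLast = TRUE tests
all6$last_trans <- !duplicated(all6$tdate, fromLast = TRUE)
```

No merge is needed, so no duplicated columns appear, and exactly one row per date is flagged even when several rows share the latest time.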
An embedded text attachment with no character set specified was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121019/dbcd63d8/attachment.pl>
Hi,
In addition to merge(), you can also use join():
dat1<-read.table(text="
tdate event_tim transaction
1/10/2012 2 14
1/10/2012 4 28
1/10/2012 6 42
1/10/2012 8 14
2/10/2012 6 46
2/10/2012 9 64
2/10/2012 8 71
3/10/2012 3 85
3/10/2012 1 14
3/10/2012 4 28
9/10/2012 5 51
9/10/2012 9 66
9/20/2012 12 84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
aggres<-aggregate(dat2[,-1],by=list(tdate=dat2$tdate),tail,1)
aggres$last_trans<-"Y"
library(plyr)
join(dat2,aggres,by=intersect(names(dat2),names(aggres)),type="full")
#       tdate event_tim transaction last_trans
#1  1/10/2012         2          14       <NA>
#2  1/10/2012         4          28       <NA>
#3  1/10/2012         6          42       <NA>
#4  1/10/2012         8          14          Y
#5  2/10/2012         6          46       <NA>
#6  2/10/2012         8          71       <NA>
#7  2/10/2012         9          64          Y
#8  3/10/2012         1          14       <NA>
#9  3/10/2012         3          85       <NA>
#10 3/10/2012         4          28          Y
#11 9/10/2012         5          51       <NA>
#12 9/10/2012         9          66          Y
#13 9/20/2012        12          84          Y
A.K.
Hi,
Here is a version without "ifelse()", on the same example dataset.
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))
d$date <- as.Date(d$date,format="%Y-%m-%d")
d$time<-strptime(d$time,format="%H:%M")$hour
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
d1<-d[,c(1,5,4)]
d1
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE
str(d1)
#'data.frame':    10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
A.K.
----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe
I think I have a better solution.
## Example data.frame
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))
## Date transformation
d$date <- as.Date(d$date)
d$time <- strptime(d$time, format='%H')
library(reshape)
## Create factor to split the data
fdate <- factor(format(d$date, '%D'))
## Create a list with logical TRUE when it is the last transaction
ex <- sapply(split(d, fdate), function(x)
ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))
## Coerce to logical vector
flag <- unlist(rbind(ex))
## Add the flag column with transform() (a base R function, so reshape is not strictly needed for it)
d <- transform(d, flag = flag)
Regards,
Flávio Barros
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
I think that line is unnecessarily complicated. lapply() returns a list and rbind applied to one argument, L, mainly adds dimensions c(1, length(L)) to it (it also turns its names into column names). unlist doesn't care about the dimensions, so you may as well leave out the rbind. The only difference in the results with and without calling rbind is that the rbind version omits the names from flag. Use the more direct unname() on split's output or unlist's output if that concerns you.

Also, if you are interested in saving time and memory when the input, d, is large, you will be better off applying split() to just the column of the data.frame that you want split instead of to the entire data.frame.
  d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)

You ought to check that the data is sorted by date: otherwise these give the wrong answer.

What result do you want when there are several transactions at the last time in the day?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
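The warning about sorting can be illustrated with a small sketch. The data frame below is made up and deliberately unsorted (its name d_demo and the numeric hour column are assumptions for illustration). It suggests that unlist() over split() pieces returns values in group order rather than row order, and that base R's unsplit() is one way to put them back in row order. The first lines also check what rbind() does to a list.

```r
# rbind() on a single list gives a one-row list-matrix; unlist() ignores the dims
L <- list(a = c(FALSE, TRUE), b = TRUE)
m <- rbind(L)
dims <- dim(m)   # one row, one column per list element

# Hypothetical unsorted data: dates interleaved
d_demo <- data.frame(
  date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19"),
  hour = c(17, 8, 13, 11),
  stringsAsFactors = FALSE
)

# One logical vector per date, TRUE where the hour is that date's maximum
pieces <- lapply(split(d_demo$hour, d_demo$date), function(x) x == max(x))

# Group order (all 2012-10-19 rows first): does not line up with the rows
in_group_order <- unlist(pieces, use.names = FALSE)

# unsplit() reverses split(), so each value returns to its original row
in_row_order <- unsplit(pieces, d_demo$date)
```

On data already sorted by date the two results coincide, which is why the one-liner in the message works on the example d.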
Hi Bill,
Thanks for the reply. It was unnecessarily complicated.
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
#or
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
should have done the same job.
str(d)
#'data.frame':    10 obs. of  4 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ date       : Date, format: "2012-10-19" "2012-10-19" ...
# $ time       : int  8 9 10 11 12 13 14 15 16 17
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
I am getting error messages with:
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
Error in match.fun(FUN) : argument "FUN" is missing, with no default
A.K.
HI Bill, I figured it out. ?d$flag2<-unlist(lapply(unname(split(d[[3]],d$date)),function(x) x==max(x))) # [1] FALSE FALSE FALSE? TRUE? TRUE FALSE FALSE FALSE FALSE? TRUE ")" created the error. A.K. ----- Original Message ----- From: William Dunlap <wdunlap at tibco.com> To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com> Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org> Sent: Saturday, October 20, 2012 12:04 PM Subject: RE: [R] Creating a new by variable in a dataframe
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
I think that line is unnecessarily complicated. lapply() returns a list, and rbind() applied to one argument, L, mainly adds dimensions c(length(L), 1) to it (it also changes its names to rownames). unlist() doesn't care about the dimensions, so you may as well leave out the rbind(). The only difference in the results with and without calling rbind() is that the rbind() version omits the names from flag. Use the more direct unname() on split's output or unlist's output if that concerns you.
Also, if you are interested in saving time and memory when the input, d, is large, you will be better off applying split() to just the column of the data.frame that you want split instead of to the entire data.frame.
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)
You ought to check that the data is sorted by date: otherwise these give the wrong answer.
What result do you want when there are several transactions at the last time in the day?
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
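The two caveats above (sort order and ties) can be illustrated with a small sketch. The rows below are made up for the illustration, with time stored as a plain number, and the ave() variant is offered only as one possible alternative, not as the method proposed in this thread.

```r
# Hypothetical unsorted data with a tie at the last time on 2012-10-23.
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04", "T05"),
  date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19", "2012-10-23"),
  time = c(17, 9, 13, 11, 17)
)

# split() + unlist() returns the values grouped by date, so on unsorted
# data the flags no longer line up with the rows of d.
flag_split <- unlist(lapply(unname(split(d$time, d$date)),
                            function(x) x == max(x)))

# ave() writes each group's result back to that group's original positions,
# so it does not depend on the row order. It returns numeric 0/1 here,
# hence the as.logical().
flag_ave <- as.logical(ave(d$time, d$date, FUN = function(x) x == max(x)))

cbind(d, flag_split, flag_ave)
```

In this sketch both T01 and T05 are flagged by flag_ave, because x == max(x) marks every row that ties for the latest time. Whether that is the desired result is exactly the open question about ties.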
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of arun
Sent: Friday, October 19, 2012 7:49 PM
To: Flavio Barros
Cc: R help; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe
HI,
Without using "ifelse()" on the same example dataset.
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))
d$date <- as.Date(d$date,format="%Y-%m-%d")
d$time<-strptime(d$time,format="%H:%M")$hour
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
d1<-d[,c(1,5,4)]
d1
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE
str(d1)
#'data.frame':    10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
A.K.
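One thing to watch in the code above: keeping only $hour drops the minutes, so two transactions in the same hour would both compare equal to the maximum. Below is a minimal sketch that compares full timestamps instead. The rows are made up, the times are assumed to be "HH:MM" strings, and tz = "UTC" is chosen only to keep the example reproducible.

```r
# Hypothetical rows: two transactions fall within the same hour on 2012-10-19.
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03"),
  date = c("2012-10-19", "2012-10-19", "2012-10-22"),
  time = c("11:05", "11:40", "12:00")
)

# Combine date and time into one POSIXct value so minutes are kept.
d$datetime <- as.POSIXct(paste(d$date, d$time),
                         format = "%Y-%m-%d %H:%M", tz = "UTC")

# Flag the row(s) holding the latest timestamp within each date.
d$flag <- as.logical(ave(as.numeric(d$datetime), d$date,
                         FUN = function(x) x == max(x)))
d
```

With hour-only values, T01 and T02 would both be flagged. With full timestamps only T02 is.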
----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss
<ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe
I think I have a better solution.
## Example data.frame
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))
## Date transformation
d$date <- as.Date(d$date)
d$time <- strptime(d$time, format='%H')
library(reshape)
## Create factor to split the data
fdate <- factor(format(d$date, '%D'))
## Create a list with logical TRUE when it is the last transaction
ex <- sapply(split(d, fdate), function(x)
ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))
## Coerce to logical vector
flag <- unlist(rbind(ex))
## With reshape we have the transform function and can add the flag column
d <- transform(d, flag = flag)
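For comparison with the SAS step in the original question, here is a rough base-R sketch of the same idea: sort by date and time, then flag the last row of each date. The sample rows are hypothetical. It assumes date and time are zero-padded strings (or proper date/time classes), so that order() sorts them chronologically.

```r
# Hypothetical unsorted input.
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04"),
  date = c("2012-10-23", "2012-10-19", "2012-10-19", "2012-10-23"),
  time = c("13:00", "11:00", "08:00", "17:00")
)

# Equivalent of: proc sort; by tdate event_tim;
d <- d[order(d$date, d$time), ]

# Equivalent of last.tdate: a row is last in its day when no later row
# has the same date. duplicated(fromLast = TRUE) marks every row except
# the final one for each date.
d$last_trans <- !duplicated(d$date, fromLast = TRUE)
d
```

Unlike the x == max(x) approaches, this flags exactly one row per day even when several transactions share the latest time.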
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
I'm sorry, I stuck the unname() in the mail but did not run it. Its closing parenthesis should be after split's closing parenthesis, not at the end.
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x)x==max(x)))
identical(d$flag , d$flag2)
[1] TRUE
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
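To see why the misplaced parenthesis produces the "FUN is missing" error rather than a syntax error, here is a small sketch on a made-up list (x is hypothetical, not the thread's data). With the parenthesis at the end, the anonymous function is handed to unname() as its second argument, and lapply() is left with no function to apply.

```r
x <- list(a = c(1, 3), b = c(2, 5))

# Misplaced parenthesis: the function becomes unname()'s second argument,
# so lapply() receives only one argument and complains that FUN is missing.
msg <- tryCatch(
  lapply(unname(x, function(v) v == max(v))),
  error = function(e) conditionMessage(e)
)
msg

# Correct placement: unname() wraps only the list, and the function
# is passed to lapply().
ok <- unlist(lapply(unname(x), function(v) v == max(v)))
ok
```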
-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Saturday, October 20, 2012 9:29 AM
To: William Dunlap
Cc: R help; Flavio Barros; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe
HI Bill,
Thanks for the reply. It was unnecessarily complicated.
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
#or
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
should have done the same job.
str(d)
#'data.frame':    10 obs. of  4 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ date       : Date, format: "2012-10-19" "2012-10-19" ...
# $ time       : int  8 9 10 11 12 13 14 15 16 17
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
I am getting error messages with:
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
Error in match.fun(FUN) : argument "FUN" is missing, with no default
A.K.
----- Original Message -----
From: William Dunlap <wdunlap at tibco.com>
To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com>
Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Saturday, October 20, 2012 12:04 PM
Subject: RE: [R] Creating a new by variable in a dataframe