Creating a new by variable in a dataframe - R-help

ramoss

Fri, Oct 19, 2012 10:51 AM #

Hello,

I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
time(event_tim).
How could I create a 4th variable (last_trans) that would flag the last
transaction of the day for each day?
In SAS I use:
proc sort data=all6;
by tdate event_tim;
run;
         /*Create last transaction flag per day*/
data all6;
  set all6;
  by tdate event_tim;
  last_trans=last.tdate;

Thanks ahead for any suggestions.



--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
Sent from the R help mailing list archive at Nabble.com.

William Dunlap

Fri, Oct 19, 2012 11:51 AM #

Suppose your data frame is
d <- data.frame(
     stringsAsFactors = FALSE,
     transaction = c("T01", "T02", "T03", "T04", "T05", "T06", 
        "T07", "T08", "T09", "T10"),
     date = c("2012-10-19", "2012-10-19", "2012-10-19", 
        "2012-10-19", "2012-10-22", "2012-10-23", 
        "2012-10-23", "2012-10-23", "2012-10-23", 
        "2012-10-23"),
     time = c("08:00", "09:00", "10:00", "11:00", "12:00", 
        "13:00", "14:00", "15:00", "16:00", "17:00"
        ))
(Convert the date and time to your favorite classes; it doesn't matter here.)

A general way to say if an item is the last of its group is:
  isLastInGroup <- function(...)  ave(logical(length(..1)), ..., FUN=function(x)seq_along(x)==length(x))
  is_last_of_dayA <- with(d, isLastInGroup(date))
If you know your data is sorted by date you could save a little time for large
datasets by using
  isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
  is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
  > cbind(d, is_last_of_dayA, is_last_of_dayB)
     transaction       date  time is_last_of_dayA is_last_of_dayB
  1          T01 2012-10-19 08:00           FALSE           FALSE
  2          T02 2012-10-19 09:00           FALSE           FALSE
  3          T03 2012-10-19 10:00           FALSE           FALSE
  4          T04 2012-10-19 11:00            TRUE            TRUE
  5          T05 2012-10-22 12:00            TRUE            TRUE
  6          T06 2012-10-23 13:00           FALSE           FALSE
  7          T07 2012-10-23 14:00           FALSE           FALSE
  8          T08 2012-10-23 15:00           FALSE           FALSE
  9          T09 2012-10-23 16:00           FALSE           FALSE
  10         T10 2012-10-23 17:00            TRUE            TRUE
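
To see where the two helpers part ways, here is a small sketch on a made-up, unsorted vector of dates (the data are hypothetical, not from the original post). isLastInRun() only compares neighbours, so it flags the end of every consecutive run, while isLastInGroup() flags the last occurrence of each value wherever it sits.

```r
# Helper definitions copied from the message above.
isLastInGroup <- function(...) ave(logical(length(..1)), ..., FUN = function(x) seq_along(x) == length(x))
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical unsorted dates: "2012-10-19" appears in two separate runs.
dates <- c("2012-10-19", "2012-10-22", "2012-10-19", "2012-10-19")

in_group <- isLastInGroup(dates)  # last occurrence of each distinct date
in_run   <- isLastInRun(dates)    # last element of each consecutive run

in_group
in_run
```

On unsorted input the two vectors differ, which is why the run-based shortcut is only safe after sorting by date.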


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

Fri, Oct 19, 2012 12:33 PM #

Hi,

Maybe this helps:

dat1<-read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012  3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
dat2$tdate<-as.Date(dat2$tdate,format="%m/%d/%Y")
dat3<-dat2
dat3$last_trans<-NA
library(plyr)
dat4<-merge(dat3,ddply(dat2,.(tdate),tail,1))
dat4$last_trans<-dat4$transaction
res<-merge(dat4,dat2,all=TRUE)
res
#        tdate event_tim transaction last_trans
#1  2012-01-10         2          14         NA
#2  2012-01-10         4          28         NA
#3  2012-01-10         6          42         NA
#4  2012-01-10         8          14         14
#5  2012-02-10         6          46         NA
#6  2012-02-10         8          71         NA
#7  2012-02-10         9          64         64
#8  2012-03-10         1          14         NA
#9  2012-03-10         3          85         NA
#10 2012-03-10         4          28         28
#11 2012-09-10         5          51         NA
#12 2012-09-10         9          66         66
#13 2012-09-20        12          84         84
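
One caveat about the example above: order() is applied while tdate is still a character column, so the rows are sorted as text. That happens to work for this data, but it may not in general. A minimal sketch with two made-up dates (hypothetical, chosen so that text order and calendar order disagree):

```r
# Hypothetical month/day/year strings: October 1 comes after September 20
# on the calendar, but "10/..." sorts before "9/..." as text.
x <- c("9/20/2012", "10/1/2012")

text_order <- x[order(x)]                                # sorted as strings
date_order <- x[order(as.Date(x, format = "%m/%d/%Y"))]  # sorted as dates

text_order
date_order
```

Converting to Date before calling order() avoids the problem.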






ramoss

Fri, Oct 19, 2012 1:09 PM #

Thanks for all the help guys.

This worked for me:

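library(plyr)  # provides arrange() and ddply()
all6 <- arrange(all6, tdate, event_tim)
lt <- ddply(all6, .(tdate), tail, 1)
lt$last_trans <- 'Y'
# merge() keys on tdate and event_tim only; any other column present in
# both data frames (e.g. transaction) comes back with .x/.y suffixes
all6 <- merge(all6, lt, by.x=c("tdate","event_tim"),
              by.y=c("tdate","event_tim"), all.x=TRUE)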
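
For comparison, a merge-free sketch in base R that mirrors what last.tdate does in SAS. This is an alternative, not something posted in the thread; the data frame below is a hypothetical stand-in for all6 and only the column names are taken from the question.

```r
# Hypothetical stand-in for all6; column names follow the thread.
all6 <- data.frame(
  tdate       = as.Date(c("2012-01-10", "2012-01-10", "2012-02-10",
                          "2012-02-10", "2012-02-10")),
  event_tim   = c(8, 2, 6, 9, 8),
  transaction = c(14, 14, 46, 64, 71)
)

# Sort by day, then by time within the day (the job of PROC SORT).
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# A row is the last of its day when no later row repeats its date.
all6$last_trans <- !duplicated(all6$tdate, fromLast = TRUE)

all6
```

The flag comes out as a logical column, so all6[all6$last_trans, ] picks the last transaction of each day directly.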





Flavio Barros

Fri, Oct 19, 2012 1:24 PM #

An embedded text with no specified character set was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121019/dbcd63d8/attachment.pl>

arun

Fri, Oct 19, 2012 1:41 PM #

Hi,

In addition to merge(), you can also use join()
dat1<-read.table(text="
tdate  event_tim  transaction
1/10/2012   2   14
1/10/2012   4   28
1/10/2012   6   42
1/10/2012   8   14
2/10/2012   6   46
2/10/2012   9   64
2/10/2012   8   71
3/10/2012  3   85
3/10/2012   1   14
3/10/2012   4   28
9/10/2012   5   51
9/10/2012   9   66
9/20/2012  12   84
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[with(dat1,order(tdate,event_tim)),]
aggres<-aggregate(dat2[,-1],by=list(tdate=dat2$tdate),tail,1)
aggres$last_trans<-"Y"
library(plyr)

join(dat2,aggres,by=intersect(names(dat2),names(aggres)),type="full")
#       tdate event_tim transaction last_trans
#1  1/10/2012         2          14       <NA>
#2  1/10/2012         4          28       <NA>
#3  1/10/2012         6          42       <NA>
#4  1/10/2012         8          14          Y
#5  2/10/2012         6          46       <NA>
#6  2/10/2012         8          71       <NA>
#7  2/10/2012         9          64          Y
#8  3/10/2012         1          14       <NA>
#9  3/10/2012         3          85       <NA>
#10 3/10/2012         4          28          Y
#11 9/10/2012         5          51       <NA>
#12 9/10/2012         9          66          Y
#13 9/20/2012        12          84          Y
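
If a TRUE/FALSE flag is more convenient than "Y"/NA, the column can be converted afterwards. A minimal sketch on a made-up vector in the same "Y"-or-NA form (the values are hypothetical):

```r
# Hypothetical flag column in the form produced above: "Y" or NA.
last_trans <- c(NA, NA, NA, "Y", NA, NA, "Y")

# !is.na() guards the comparison so that NA rows become FALSE, not NA.
is_last <- !is.na(last_trans) & last_trans == "Y"

which(is_last)  # positions of the flagged rows
```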


A.K.


arun

Fri, Oct 19, 2012 7:49 PM #

HI,
Without using "ifelse()" on the same example dataset.
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

d$date <- as.Date(d$date,format="%Y-%m-%d")
d$time<-strptime(d$time,format="%H:%M")$hour
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
d$datetime<-as.POSIXct(paste(d$date,d$time),format="%Y-%m-%d %H")
d1<-d[,c(1,5,4)]
d1
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE

str(d1)
#'data.frame':    10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
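
The split()/unlist() line above hands the flags back in group order, so they line up with the rows only when d is already sorted by date. A minimal sketch of an order-independent variant using ave() from base R, on a made-up and deliberately unsorted data frame (the data are hypothetical):

```r
# Hypothetical, deliberately unsorted data.
d <- data.frame(
  date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19"),
  time = c(13L, 11L, 17L, 8L),
  stringsAsFactors = FALSE
)

# ave() returns each day's maximum time aligned with the original rows,
# so the comparison is correct whatever the row order.
d$flag <- d$time == ave(d$time, d$date, FUN = max)

# For contrast: split() + unlist() returns flags grouped by date, which
# no longer matches the row order of this unsorted data frame.
naive <- unlist(lapply(split(d$time, d$date), function(x) x == max(x)),
                use.names = FALSE)

d$flag
naive
```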

A.K.


----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe

I think I have a better solution

## Example data.frame
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

## Convert to date and time classes
d$date <- as.Date(d$date)
d$time <- strptime(d$time, format='%H')

## Create factor to split the data
fdate <- factor(format(d$date, '%D'))

## Create a list with logical TRUE when it is the last transaction
ex <- sapply(split(d, fdate), function(x)
ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))

## Coerce to logical vector
flag <- unlist(rbind(ex))

## transform() is in base R, so no extra package is needed to add the flag column
d <- transform(d, flag = flag)


Regards,

Flávio Barros


William Dunlap

Sat, Oct 20, 2012 9:04 AM #

I think that line (the one computing d$flag with unlist(rbind(lapply(...))))
is unnecessarily complicated. lapply() returns a list
and rbind applied to one argument, L, mainly adds dimensions c(length(L),1)
to it (it also changes its names to rownames).  unlist doesn't care about
the dimensions, so you may as well leave out the rbind.  The only difference
in the results with and without calling rbind is that the rbind version omits
the names from flag.  Use the more direct unname() on split's output or
unlist's output if that concerns you.

Also, if you are interested in saving time and memory when the input, d, is large,
you will be better off applying split() to just the column of the data.frame
that you want split instead of to the entire data.frame.
   d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)

You ought to check that the data is sorted by date: otherwise these give the
wrong answer.

What result do you want when there are several transactions at the last time
in the day?
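
The question about ties matters because the two styles of test used in this thread answer it differently. A minimal sketch for a single made-up day with two transactions at the latest time (the values are hypothetical):

```r
# Hypothetical day with two transactions at the latest time (17).
time <- c(9L, 17L, 17L)

by_max      <- time == max(time)                # every row at the latest time
by_position <- seq_along(time) == length(time)  # only the final row

by_max
by_position
```

The value-based test flags both tied rows, while the position-based test (the one that matches SAS's last.tdate on sorted data) flags exactly one row per day.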

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


arun

Sat, Oct 20, 2012 9:28 AM #

HI Bill,

Thanks for the reply.
It was unnecessarily complicated.
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
#or
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
should have done the same job.
str(d)
#'data.frame':    10 obs. of  4 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ date       : Date, format: "2012-10-19" "2012-10-19" ...
# $ time       : int  8 9 10 11 12 13 14 15 16 17
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...

I am getting error messages with:
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
Error in match.fun(FUN) : argument "FUN" is missing, with no default


A.K.






arun

Sat, Oct 20, 2012 9:37 AM #

Hi Bill,

I figured it out.
d$flag2<-unlist(lapply(unname(split(d[[3]],d$date)),function(x) x==max(x)))
# [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE

")" created the error.

A.K.




----- Original Message -----
From: William Dunlap <wdunlap at tibco.com>
To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com>
Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Saturday, October 20, 2012 12:04 PM
Subject: RE: [R] Creating a new by variable in a dataframe

d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))

I think that line is unnecessarily complicated. lapply() returns a list
and rbind applied to one argument, L, mainly adds dimensions c(1,length(L))
to it (it also changes its names to column names).  unlist doesn't care about
the dimensions, so you may as well leave out the rbind.  The only difference
in the results with and without calling rbind is that the rbind version omits
the names from flag.  Use the more direct unname() on split's output or
unlist's output if that concerns you.
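
To make the point about rbind() concrete, here is a minimal sketch. The list L is a made-up stand-in for the lapply() result, not data from this thread; it only shows that rbind() attaches a dim attribute and that unlist() returns the same values either way.

```r
# Hypothetical two-group list standing in for the lapply() result.
L <- list(a = c(FALSE, TRUE), b = TRUE)

m <- rbind(L)                 # still a list, but now with a dim attribute
is.null(dim(L))               # TRUE: a plain list has no dimensions
is.null(dim(m))               # FALSE: rbind() added them

# unlist() flattens both to the same values; only the names differ.
with_rbind    <- unlist(m)
without_rbind <- unlist(L)
all(with_rbind == without_rbind)
names(without_rbind)          # "a1" "a2" "b"
```

If the names are unwanted, unlist(L, use.names = FALSE) drops them without the detour through rbind().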

Also, if you are interested in saving time and memory when the input, d, is large,
you will be better off applying split() to just the column of the data.frame
that you want split instead of to the entire data.frame.
   d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)

You ought to check that the data is sorted by date: otherwise these give the
wrong answer.
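
A small sketch of why the sort order matters, using a made-up four-row data frame (not the thread's d) with interleaved dates. split() collects rows by date, so unlist() hands the flags back in group order rather than row order; ave() is shown as one way to put each result back on its own row.

```r
# Hypothetical unsorted data: the two dates are interleaved.
d <- data.frame(
  date = c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19"),
  time = c(13L, 8L, 17L, 11L),
  stringsAsFactors = FALSE
)

is.unsorted(d$date)   # TRUE: a cheap check before trusting split/unlist

# Flags come back grouped by date, so they no longer line up with the rows.
flag_split <- unlist(lapply(unname(split(d$time, d$date)),
                            function(x) x == max(x)))

# ave() writes each group's result back to the rows it came from.
flag_ave <- as.logical(ave(d$time, d$date, FUN = function(x) x == max(x)))

cbind(d, flag_split, flag_ave)
```

Here flag_split marks rows 2 and 4, while the rows that really hold the latest time of each day are 3 and 4, which is what flag_ave reports.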

What result do you want when there are several transactions at the last time
in the day?
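
The tie question can be illustrated with a short sketch on made-up values (one day, two transactions sharing the latest time). Comparing against max() flags every tied row, whereas the positional test from isLastInGroup flags only the final row of the group.

```r
# Hypothetical single day with a tie at the last time.
date <- rep("2012-10-23", 4)
time <- c(8, 9, 17, 17)

# Value-based: every row equal to the day's maximum is flagged.
flag_max <- as.logical(ave(time, date, FUN = function(x) x == max(x)))

# Position-based: only the last row of each day is flagged.
flag_last <- ave(logical(length(date)), date,
                 FUN = function(x) seq_along(x) == length(x))

flag_max    # FALSE FALSE  TRUE  TRUE
flag_last   # FALSE FALSE FALSE  TRUE
```

Which of the two is right depends on what "last transaction" should mean when times collide; the position-based version assumes the rows are already in the intended order.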

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
Sent: Friday, October 19, 2012 7:49 PM
To: Flavio Barros
Cc: R help; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe



HI,
Without using "ifelse()" on the same example dataset.
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

d$date <- as.Date(d$date,format="%Y-%m-%d")
d$time<-strptime(d$time,format="%H:%M")$hour
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
d1<-d[,c(1,5,4)]
d1
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE

str(d1)
#'data.frame':    10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...

A.K.


----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss
<ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe

I think I have a better solution.

*## Example data.frame*
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

*## Date transformation*
d$date <- as.Date(d$date)
d$time <- strptime(d$time, format='%H')

library(reshape)

*## Create factor to split the data*
fdate <- factor(format(d$date, '%D'))

*## Create a list with logical TRUE where it is the last transaction*
ex <- sapply(split(d, fdate), function(x)
ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))

*## Coerce to logical vector*
flag <- unlist(rbind(ex))

*## With reshape we have the transform function; we can add the flag column*
d <- transform(d, flag = flag)
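
As a side note, transform() is also part of base R, so the last step does not depend on loading a package. The sketch below uses a made-up three-row data frame with an integer hour column (an assumption, not the thread's d) and builds the flag with ave() rather than the sapply/ifelse route above.

```r
# Hypothetical data; 'hour' stands in for the converted time column.
d <- data.frame(
  transaction = c("T01", "T02", "T03"),
  date = c("2012-10-19", "2012-10-19", "2012-10-22"),
  hour = c(8L, 11L, 12L),
  stringsAsFactors = FALSE
)

# TRUE where the hour equals the latest hour of that date.
flag <- as.logical(ave(d$hour, d$date, FUN = function(x) x == max(x)))

# Base R transform(): no library() call needed for this step.
d <- transform(d, flag = flag)
d
```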

On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdunlap at tibco.com> wrote:

Suppose your data frame is
d <- data.frame(
     stringsAsFactors = FALSE,
     transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
        "T07", "T08", "T09", "T10"),
     date = c("2012-10-19", "2012-10-19", "2012-10-19",
        "2012-10-19", "2012-10-22", "2012-10-23",
        "2012-10-23", "2012-10-23", "2012-10-23",
        "2012-10-23"),
     time = c("08:00", "09:00", "10:00", "11:00", "12:00",
        "13:00", "14:00", "15:00", "16:00", "17:00"
        ))
(Convert the date and time to your favorite classes, it doesn't matter here.)

A general way to say if an item is the last of its group is:
  isLastInGroup <- function(...)  ave(logical(length(..1)), ..., FUN=function(x)seq_along(x)==length(x))
  is_last_of_dayA <- with(d, isLastInGroup(date))
If you know your data is sorted by date you could save a little time for large
datasets by using
  isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
  is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date so you get the same results for both:
  > cbind(d, is_last_of_dayA, is_last_of_dayB)
     transaction       date  time is_last_of_dayA is_last_of_dayB
  1          T01 2012-10-19 08:00           FALSE           FALSE
  2          T02 2012-10-19 09:00           FALSE           FALSE
  3          T03 2012-10-19 10:00           FALSE           FALSE
  4          T04 2012-10-19 11:00            TRUE            TRUE
  5          T05 2012-10-22 12:00            TRUE            TRUE
  6          T06 2012-10-23 13:00           FALSE           FALSE
  7          T07 2012-10-23 14:00           FALSE           FALSE
  8          T08 2012-10-23 15:00           FALSE           FALSE
  9          T09 2012-10-23 16:00           FALSE           FALSE
  10         T10 2012-10-23 17:00            TRUE            TRUE
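
For readers coming from the SAS code in the question (sort by tdate and event_tim, then take last.tdate), a rough R equivalent is to order the rows first and then apply isLastInRun. This is only a sketch on made-up rows; the column names tdate, event_tim and last_trans follow the question, and the times are assumed to be zero-padded "HH:MM" strings so that they sort correctly as text.

```r
# Hypothetical unsorted transactions.
d <- data.frame(
  transaction = c("T03", "T01", "T04", "T02"),
  tdate = c("2012-10-22", "2012-10-19", "2012-10-22", "2012-10-19"),
  event_tim = c("12:00", "08:00", "15:30", "11:00"),
  stringsAsFactors = FALSE
)

# Step 1 (proc sort): order by date, then by time within date.
d <- d[order(d$tdate, d$event_tim), ]

# Step 2 (last.tdate): a row is last if the next row has a different date.
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
d$last_trans <- isLastInRun(d$tdate)
d
```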


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


William Dunlap

Sat, Oct 20, 2012 10:53 AM #

I'm sorry, I stuck in the unname() in the mail but did not run it - its closing
parenthesis should be after split's closing parenthesis, not at the end.

[1] TRUE
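
A sketch of the corrected parenthesization, on a made-up five-row data frame rather than the thread's d: unname() is closed right after split(), so the anonymous function is passed to lapply() as intended. It is compared with the use.names = FALSE variant from earlier in the thread.

```r
# Hypothetical data, already sorted by date.
d <- data.frame(
  date = c("2012-10-19", "2012-10-19", "2012-10-22",
           "2012-10-23", "2012-10-23"),
  time = c(8L, 11L, 12L, 13L, 17L),
  stringsAsFactors = FALSE
)

# Names dropped at the end, via unlist().
flag <- unlist(lapply(split(d$time, d$date), function(x) x == max(x)),
               use.names = FALSE)

# Names dropped up front: unname() wraps only the split() call.
flag2 <- unlist(lapply(unname(split(d$time, d$date)),
                       function(x) x == max(x)))

identical(flag, flag2)
```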

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Saturday, October 20, 2012 9:29 AM
To: William Dunlap
Cc: R help; Flavio Barros; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe

HI Bill,

Thanks for the reply.
It was unnecessarily complicated.
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
#or
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
should have done the same job.
str(d)
#'data.frame':    10 obs. of  4 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ date       : Date, format: "2012-10-19" "2012-10-19" ...
# $ time       : int  8 9 10 11 12 13 14 15 16 17
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...

I am getting error messages with:
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
Error in match.fun(FUN) : argument "FUN" is missing, with no default


A.K.




