Message-ID: <E66794E69CFDE04D9A70842786030B9337990C@PA-MBX04.na.tibco.com>
Date: 2012-10-20T17:53:23Z
From: William Dunlap
Subject: Creating a new by variable in a dataframe
In-Reply-To: <1350750522.21619.YahooMailNeo@web142602.mail.bf1.yahoo.com>
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
I'm sorry, I stuck in the unname() in the mail but did not run it: its closing
parenthesis should be after split's closing parenthesis, not at the end.
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x)x==max(x)))
> identical(d$flag , d$flag2)
[1] TRUE
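Why the misplaced parenthesis gives that particular error: the anonymous function ends up as the second argument of unname() (its `force` argument), so lapply() is called with no FUN at all. Below is a small self-contained sketch of both versions. It uses a hypothetical `time`/`date` pair, not the thread's `d`.

```r
# Hypothetical data: hours grouped by calendar date
time <- c(8, 9, 10, 11, 12, 13, 14)
date <- as.Date(c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
                  "2012-10-22", "2012-10-23", "2012-10-23"))

# Misplaced parenthesis: the function goes to unname(), lapply() gets no FUN
bad <- tryCatch(
  unlist(lapply(unname(split(time, date), function(x) x == max(x)))),
  error = function(e) conditionMessage(e))
print(bad)

# Corrected: unname() wraps only split(), lapply() receives the function as FUN
good <- unlist(lapply(unname(split(time, date)), function(x) x == max(x)))
print(good)
```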
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: arun [mailto:smartpink111 at yahoo.com]
> Sent: Saturday, October 20, 2012 9:29 AM
> To: William Dunlap
> Cc: R help; Flavio Barros; ramoss
> Subject: Re: [R] Creating a new by variable in a dataframe
>
> Hi Bill,
>
> Thanks for the reply.
> It was unnecessarily complicated.
> d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
> #or
> d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
> should have done the same job.
> str(d)
> #'data.frame':    10 obs. of  4 variables:
> # $ transaction: chr  "T01" "T02" "T03" "T04" ...
> # $ date       : Date, format: "2012-10-19" "2012-10-19" ...
> # $ time       : int  8 9 10 11 12 13 14 15 16 17
> # $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
>
> I am getting error messages with:
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
> Error in match.fun(FUN) : argument "FUN" is missing, with no default
>
>
> A.K.
>
>
>
>
>
> ----- Original Message -----
> From: William Dunlap <wdunlap at tibco.com>
> To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com>
> Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
> Sent: Saturday, October 20, 2012 12:04 PM
> Subject: RE: [R] Creating a new by variable in a dataframe
>
> > d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
>
> I think that line is unnecessarily complicated. lapply() returns a list,
> and rbind applied to one argument, L, mainly adds dimensions c(1,length(L))
> to it (it also turns its names into column names).  unlist doesn't care about
> the dimensions, so you may as well leave out the rbind.  The only difference
> in the results with and without calling rbind is that the rbind version omits
> the names from flag.  Use the more direct unname() on split's output or
> unlist's output if that concerns you.
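A quick way to check what rbind() does to a single list. This is a sketch with a hypothetical named list `L`. Passing `deparse.level = 0` stops rbind() from making a row name out of the symbol `L`; the unnamed lapply() call in the original line gets no row name either.

```r
# Hypothetical named list, shaped like the output of lapply(split(...), ...)
L <- list(a = c(FALSE, TRUE), b = TRUE, c = c(FALSE, FALSE, TRUE))

# A single argument to rbind() yields a 1 x length(L) list-matrix;
# the list's names become its column names
R <- rbind(L, deparse.level = 0)
print(dim(R))
print(colnames(R))

# unlist() ignores the dim attribute; the only change is that names are gone
with_rbind <- unlist(R)
print(names(unlist(L)))   # names built from the list names
print(identical(with_rbind, unlist(L, use.names = FALSE)))
```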
>
> Also, if you are interested in saving time and memory when the input, d, is large,
> you will be better off applying split() to just the column of the data.frame
> that you want split instead of to the entire data.frame.
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
> (I used d[[3]] instead of the more readable d$time to follow your original more closely.)
>
> You ought to check that the data is sorted by date: otherwise these will give the
> wrong answer, because unlist() returns the flags grouped by date rather than in row order.
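Here is a sketch of why the sort order matters, using a hypothetical unsorted four-row `d`. The split()/unlist() combination returns the flags grouped by date. ave(), by contrast, writes each group's result back into the original row positions. ave() returns a vector of the same type as its first argument, so the logical result comes back as 0/1 and needs as.logical().

```r
# Hypothetical unsorted data: the two dates are interleaved
d <- data.frame(date = as.Date(c("2012-10-23", "2012-10-19",
                                 "2012-10-23", "2012-10-19")),
                time = c(13L, 8L, 14L, 9L))

# Flags come back grouped by date (2012-10-19 first), not in row order
flag_split <- unlist(lapply(unname(split(d$time, d$date)),
                            function(x) x == max(x)))

# ave() keeps row order; convert its integer 0/1 result back to logical
flag_ave <- as.logical(ave(d$time, d$date, FUN = function(x) x == max(x)))

print(flag_split)  # misaligned with the rows of d
print(flag_ave)    # aligned with the rows of d
```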
>
> What result do you want when there are several transactions at the last time
> in the day?
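On ties: x == max(x) flags every transaction at the latest time. Comparing positions instead flags exactly one row per day. The sketch below uses one hypothetical day whose last two transactions share the same time.

```r
# One hypothetical day; the last two transactions share the time 17
time <- c(15L, 16L, 17L, 17L)

# Value-based: every row tied at the latest time is flagged
flag_all_ties <- time == max(time)

# Position-based: only the final row, which is what SAS last.tdate
# marks once the data are sorted
flag_last_row <- seq_along(time) == length(time)

# which.max() picks the first of the tied maxima instead
flag_first_max <- seq_along(time) == which.max(time)

print(flag_all_ties)
print(flag_last_row)
print(flag_first_max)
```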
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > Of arun
> > Sent: Friday, October 19, 2012 7:49 PM
> > To: Flavio Barros
> > Cc: R help; ramoss
> > Subject: Re: [R] Creating a new by variable in a dataframe
> >
> >
> >
> > Hi,
> > Here is a version without "ifelse()", on the same example dataset:
> > d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
> > "T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
> > c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
> > "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
> > = c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
> > "16:00", "17:00"))
> >
> > d$date <- as.Date(d$date,format="%Y-%m-%d")
> > d$time<-strptime(d$time,format="%H:%M")$hour
> > d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
> > d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
> > d1<-d[,c(1,5,4)]
> > d1
> > #   transaction            datetime  flag
> > #1          T01 2012-10-19 08:00:00 FALSE
> > #2          T02 2012-10-19 09:00:00 FALSE
> > #3          T03 2012-10-19 10:00:00 FALSE
> > #4          T04 2012-10-19 11:00:00  TRUE
> > #5          T05 2012-10-22 12:00:00  TRUE
> > #6          T06 2012-10-23 13:00:00 FALSE
> > #7          T07 2012-10-23 14:00:00 FALSE
> > #8          T08 2012-10-23 15:00:00 FALSE
> > #9          T09 2012-10-23 16:00:00 FALSE
> > #10         T10 2012-10-23 17:00:00  TRUE
> >
> > str(d1)
> > #'data.frame':    10 obs. of  3 variables:
> > # $ transaction: chr  "T01" "T02" "T03" "T04" ...
> > # $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
> > # $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
> >
> > A.K.
> >
> >
> > ----- Original Message -----
> > From: Flavio Barros <flaviomargarito at gmail.com>
> > To: William Dunlap <wdunlap at tibco.com>
> > Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss
> > <ramine.mossadegh at finra.org>
> > Sent: Friday, October 19, 2012 4:24 PM
> > Subject: Re: [R] Creating a new by variable in a dataframe
> >
> > I think I have a better solution:
> >
> > ## Example data.frame
> > d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
> > "T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
> > c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
> > "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
> > = c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
> > "16:00", "17:00"))
> >
> > ## Date transformation
> > d$date <- as.Date(d$date)
> > d$time <- strptime(d$time, format='%H')
> >
> > library(reshape)
> >
> > ## Create a factor to split the data by
> > fdate <- factor(format(d$date, '%D'))
> >
> > ## Create a list that is TRUE for the last transaction of each day
> > ex <- sapply(split(d, fdate), function(x)
> > ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))
> >
> > ## Coerce to a logical vector
> > flag <- unlist(rbind(ex))
> >
> > ## Add the flag column with transform() (a base R function; it does not come from reshape)
> > d <- transform(d, flag = flag)
> >
> > On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
> >
> > > Suppose your data frame is
> > > d <- data.frame(
> > >     stringsAsFactors = FALSE,
> > >     transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
> > >         "T07", "T08", "T09", "T10"),
> > >     date = c("2012-10-19", "2012-10-19", "2012-10-19",
> > >         "2012-10-19", "2012-10-22", "2012-10-23",
> > >         "2012-10-23", "2012-10-23", "2012-10-23",
> > >         "2012-10-23"),
> > >     time = c("08:00", "09:00", "10:00", "11:00", "12:00",
> > >         "13:00", "14:00", "15:00", "16:00", "17:00"
> > >         ))
> > > (Convert the date and time to your favorite classes; it doesn't matter
> > > here.)
> > >
> > > A general way to say if an item is the last of its group is:
> > >   isLastInGroup <- function(...) ave(logical(length(..1)), ...,
> > >       FUN=function(x)seq_along(x)==length(x))
> > >   is_last_of_dayA <- with(d, isLastInGroup(date))
> > > If you know your data is sorted by date, you could save a little time for
> > > large datasets by using
> > >   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> > >   is_last_of_dayB <- isLastInRun(d$date)
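isLastInRun() compares each element with its successor. Element i ends a run when it differs from element i+1, and the last element always ends one. As a sanity check, rle() gives the same positions as the cumulative sums of its run lengths. The sketch below uses a hypothetical numeric vector shaped like the dates in d.

```r
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical sorted values with runs of lengths 4, 1 and 5
x <- c(1, 1, 1, 1, 2, 3, 3, 3, 3, 3)

# x[-1] is x shifted left by one, so each element is compared to the next
last_pos <- which(isLastInRun(x))

# Cross-check: run ends are the cumulative sums of the run lengths
run_ends <- cumsum(rle(x)$lengths)

print(last_pos)
print(run_ends)
```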
> > > The above d is sorted by date so you get the same results for both:
> > >   > cbind(d, is_last_of_dayA, is_last_of_dayB)
> > >      transaction       date  time is_last_of_dayA is_last_of_dayB
> > >    1         T01 2012-10-19 08:00           FALSE           FALSE
> > >    2         T02 2012-10-19 09:00           FALSE           FALSE
> > >    3         T03 2012-10-19 10:00           FALSE           FALSE
> > >    4         T04 2012-10-19 11:00            TRUE            TRUE
> > >    5         T05 2012-10-22 12:00            TRUE            TRUE
> > >    6         T06 2012-10-23 13:00           FALSE           FALSE
> > >    7         T07 2012-10-23 14:00           FALSE           FALSE
> > >    8         T08 2012-10-23 15:00           FALSE           FALSE
> > >    9         T09 2012-10-23 16:00           FALSE           FALSE
> > >   10         T10 2012-10-23 17:00            TRUE            TRUE
> > >
> > >
> > > Bill Dunlap
> > > Spotfire, TIBCO Software
> > > wdunlap tibco.com
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > > > Of ramoss
> > > > Sent: Friday, October 19, 2012 10:52 AM
> > > > To: r-help at r-project.org
> > > > Subject: [R] Creating a new by variable in a dataframe
> > > >
> > > > Hello,
> > > >
> > > > I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
> > > > time(event_tim).
> > > > How could I create a 4th variable (last_trans) that would flag the last
> > > > transaction of the day for each day?
> > > > In SAS I use:
> > > > proc sort data=all6;
> > > > by tdate event_tim;
> > > > run;
> > > >          /*Create last transaction flag per day*/
> > > > data all6;
> > > >   set all6;
> > > >   by tdate event_tim;
> > > >   last_trans=last.tdate;
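Here is a sketch of a close R analogue of that SAS step. It uses hypothetical data with the question's column names. The data are sorted with order(), and then duplicated(fromLast = TRUE) does the flagging: after the sort, a row is the last of its day when no later row has the same tdate, which is what last.tdate means.

```r
# Hypothetical unsorted data using the column names from the question
all6 <- data.frame(
  transaction = c("T03", "T01", "T05", "T02", "T04"),
  tdate = as.Date(c("2012-10-19", "2012-10-19", "2012-10-22",
                    "2012-10-19", "2012-10-19")),
  event_tim = c(10L, 8L, 12L, 9L, 11L),
  stringsAsFactors = FALSE)

# proc sort data=all6; by tdate event_tim;
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# last_trans=last.tdate: TRUE when no later row shares this tdate
all6$last_trans <- !duplicated(all6$tdate, fromLast = TRUE)
print(all6)
```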
> > > >
> > > > Thanks in advance for any suggestions.
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
> > > > Sent from the R help mailing list archive at Nabble.com.
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> >
> > --
> > Att,
> >
> > Flávio Barros
> >