-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Saturday, October 20, 2012 9:29 AM
To: William Dunlap
Cc: R help; Flavio Barros; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe
Hi Bill,
Thanks for the reply.
It was unnecessarily complicated.
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
#or
d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
should have done the same job.
str(d)
#'data.frame':   10 obs. of  4 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ date       : Date, format: "2012-10-19" "2012-10-19" ...
# $ time       : int  8 9 10 11 12 13 14 15 16 17
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
I am getting error messages with:
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
Error in match.fun(FUN) : argument "FUN" is missing, with no default
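The error comes from a misplaced parenthesis. In the quoted call the anonymous function ends up as a second argument to unname() instead of to lapply(), so lapply() is called with no FUN. A sketch of the call with the parentheses moved, run on a cut-down, hypothetical copy of the thread's data (time already converted to an integer hour and stored as column 3 so that d[[3]] still refers to it):

```r
# Hypothetical cut-down data; column 3 is the integer hour, rows sorted by date
d <- data.frame(
  transaction = c("T01", "T04", "T05", "T06", "T10"),
  date = as.Date(c("2012-10-19", "2012-10-19", "2012-10-22", "2012-10-23", "2012-10-23")),
  time = c(8L, 11L, 12L, 13L, 17L),
  stringsAsFactors = FALSE
)
# unname() now wraps only split()'s result, and the function is lapply()'s FUN
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x) x == max(x)))
d$flag2
```

Like the other split()-based versions, this still relies on d being sorted by date.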
A.K.
----- Original Message -----
From: William Dunlap <wdunlap at tibco.com>
To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com>
Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Saturday, October 20, 2012 12:04 PM
Subject: RE: [R] Creating a new by variable in a dataframe
d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
I think that line is unnecessarily complicated. lapply() returns a list,
and rbind() applied to one argument, L, mainly adds dimensions c(1, length(L))
to it (it also turns its names into column names). unlist() doesn't care about
the dimensions, so you may as well leave out the rbind(). The only difference
in the results with and without calling rbind() is that the rbind() version omits
the names from flag. Use the more direct unname() on split()'s output or
unlist()'s output if that concerns you.
Also, if you are interested in saving time and memory when the input, d, is large,
you will be better off applying split() to just the column of the data.frame
that you want split instead of to the entire data.frame.
  d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)
You ought to check that the data is sorted by date: otherwise these give the
wrong answer.
What result do you want when there are several transactions at the last time
in the day?
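One way to avoid the sorting requirement is ave(), which writes each group's result back into that group's original row positions. A minimal sketch, assuming an integer hour column as in the thread; the rows are hypothetical and deliberately unsorted, and T11 is an invented transaction tied with T10 at the day's last time. With x == max(x), every transaction at the latest time of the day is flagged, which is one possible answer to the question about ties:

```r
d <- data.frame(
  transaction = c("T10", "T01", "T06", "T04", "T11"),
  date = as.Date(c("2012-10-23", "2012-10-19", "2012-10-23", "2012-10-19", "2012-10-23")),
  time = c(17L, 8L, 13L, 11L, 17L),
  stringsAsFactors = FALSE
)
# ave() returns a vector of the same type as its first argument (integer here),
# so the 0/1 result is converted back to logical.
d$flag <- as.logical(ave(d$time, d$date, FUN = function(x) x == max(x)))
d$flag
```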
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
Sent: Friday, October 19, 2012 7:49 PM
To: Flavio Barros
Cc: R help; ramoss
Subject: Re: [R] Creating a new by variable in a dataframe
Hi,
Here is a version without ifelse(), using the same example dataset:
d <- data.frame(stringsAsFactors = FALSE,
                transaction = c("T01", "T02", "T03", "T04", "T05",
                                "T06", "T07", "T08", "T09", "T10"),
                date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
                         "2012-10-22", "2012-10-23", "2012-10-23", "2012-10-23",
                         "2012-10-23", "2012-10-23"),
                time = c("08:00", "09:00", "10:00", "11:00", "12:00",
                         "13:00", "14:00", "15:00", "16:00", "17:00"))
d$date <- as.Date(d$date, format = "%Y-%m-%d")
d$time <- strptime(d$time, format = "%H:%M")$hour  # keep only the hour (integer)
# TRUE where time is the day's maximum; assumes d is sorted by date
d$flag <- unlist(rbind(lapply(split(d, d$date), function(x) x[3] == max(x[3]))))
# combine date and hour into a single date-time column
d$datetime <- as.POSIXct(paste(d$date, d$time), format = "%Y-%m-%d %H")
d1 <- d[, c(1, 5, 4)]  # transaction, datetime, flag
d1
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE
str(d1)
#'data.frame':   10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
A.K.
----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss
<ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe
I think I have a better solution:
## Example data.frame
d <- data.frame(stringsAsFactors = FALSE,
                transaction = c("T01", "T02", "T03", "T04", "T05",
                                "T06", "T07", "T08", "T09", "T10"),
                date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
                         "2012-10-22", "2012-10-23", "2012-10-23", "2012-10-23",
                         "2012-10-23", "2012-10-23"),
                time = c("08:00", "09:00", "10:00", "11:00", "12:00",
                         "13:00", "14:00", "15:00", "16:00", "17:00"))
## Date transformation
d$date <- as.Date(d$date)
## Store the time of day as POSIXct (today's date is filled in by strptime)
d$time <- as.POSIXct(strptime(d$time, format = '%H:%M'))
## Create factor to split the data; factor() on a Date sorts levels chronologically
fdate <- factor(d$date)
## Create a list with logical TRUE when it is the last transaction of the day
ex <- lapply(split(d, fdate), function(x)
  ifelse(as.numeric(x[, 'time']) == max(as.numeric(x[, 'time'])), TRUE, FALSE))
## Coerce to logical vector
flag <- unlist(rbind(ex))
## transform() is part of base R, so no extra package is needed to add the flag column
d <- transform(d, flag = flag)
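sapply() and lapply() differ in a way that matters for this kind of split/apply/unlist code: when every group has the same number of rows, sapply() simplifies the per-group results into a matrix, while lapply() always returns a list. A small sketch with hypothetical, equal-sized groups:

```r
d <- data.frame(
  date = as.Date(c("2012-10-19", "2012-10-19", "2012-10-22", "2012-10-22")),
  time = c(8L, 11L, 9L, 12L)
)
s <- sapply(split(d$time, d$date), function(x) x == max(x))
l <- lapply(split(d$time, d$date), function(x) x == max(x))
dim(s)                        # 2 2: one column per date instead of a list
is.list(l)                    # TRUE
unlist(l, use.names = FALSE)  # FALSE TRUE FALSE TRUE
```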
On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
Suppose your data frame is
d <- data.frame(
      stringsAsFactors = FALSE,
      transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
         "T07", "T08", "T09", "T10"),
      date = c("2012-10-19", "2012-10-19", "2012-10-19",
         "2012-10-19", "2012-10-22", "2012-10-23",
         "2012-10-23", "2012-10-23", "2012-10-23",
         "2012-10-23"),
      time = c("08:00", "09:00", "10:00", "11:00", "12:00",
         "13:00", "14:00", "15:00", "16:00", "17:00"
         ))
(Convert the date and time to your favorite classes; it doesn't matter here.)
A general way to say if an item is the last of its group is:
   isLastInGroup <- function(...) ave(logical(length(..1)), ...,
       FUN = function(x) seq_along(x) == length(x))
   is_last_of_dayA <- with(d, isLastInGroup(date))
If you know your data is sorted by date, you could save a little time for
large datasets by using
   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
   is_last_of_dayB <- isLastInRun(d$date)
The above d is sorted by date, so you get the same results for both:
   > cbind(d, is_last_of_dayA, is_last_of_dayB)
      transaction       date  time is_last_of_dayA is_last_of_dayB
   1          T01 2012-10-19 08:00           FALSE           FALSE
   2          T02 2012-10-19 09:00           FALSE           FALSE
   3          T03 2012-10-19 10:00           FALSE           FALSE
   4          T04 2012-10-19 11:00            TRUE            TRUE
   5          T05 2012-10-22 12:00            TRUE            TRUE
   6          T06 2012-10-23 13:00           FALSE           FALSE
   7          T07 2012-10-23 14:00           FALSE           FALSE
   8          T08 2012-10-23 15:00           FALSE           FALSE
   9          T09 2012-10-23 16:00           FALSE           FALSE
   10         T10 2012-10-23 17:00            TRUE            TRUE
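When the rows are not sorted by date, the two functions disagree. isLastInGroup() still marks the last row of each date (in row order), while isLastInRun() only compares neighboring rows. A sketch with a hypothetical interleaved date vector:

```r
isLastInGroup <- function(...) ave(logical(length(..1)), ...,
                                   FUN = function(x) seq_along(x) == length(x))
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
# Hypothetical dates that alternate instead of being grouped together
dates <- c("2012-10-19", "2012-10-23", "2012-10-19", "2012-10-23")
isLastInGroup(dates)  # FALSE FALSE TRUE TRUE
isLastInRun(dates)    # TRUE TRUE TRUE TRUE: every row differs from the next one
```

Both still assume that, within a date, the rows are in time order.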
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
Of ramoss
Sent: Friday, October 19, 2012 10:52 AM
To: r-help at r-project.org
Subject: [R] Creating a new by variable in a dataframe
Hello,
I have a data frame with 3 variables of interest: transaction, date (tdate) and
time (event_tim).
How could I create a 4th variable (last_trans) that would flag the last
transaction of the day for each day?
In SAS I use:
proc sort data=all6;
   by tdate event_tim;
run;
/* Create last transaction flag per day */
data all6;
   set all6;
   by tdate event_tim;
   last_trans = last.tdate;
run;
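For comparison, a minimal R sketch of the same sort-then-flag pattern, assuming all6 has a Date column tdate and a sortable event_tim (an integer hour here); the rows are hypothetical. After sorting, the last row of each tdate is the one whose tdate does not occur again further down, which duplicated(..., fromLast = TRUE) detects, playing the role of last.tdate:

```r
all6 <- data.frame(
  transaction = c("T03", "T01", "T05", "T02", "T04"),
  tdate = as.Date(c("2012-10-19", "2012-10-19", "2012-10-22", "2012-10-19", "2012-10-19")),
  event_tim = c(10L, 8L, 12L, 9L, 11L),
  stringsAsFactors = FALSE
)
# Equivalent of proc sort ... by tdate event_tim
all6 <- all6[order(all6$tdate, all6$event_tim), ]
# TRUE for the last row of each tdate in the sorted data
all6$last_trans <- !duplicated(all6$tdate, fromLast = TRUE)
all6
```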
Thanks in advance for any suggestions.
--
Sent from the R help mailing list archive at Nabble.com.