
Creating a new by variable in a dataframe

Hi Bill,

I figured it out.
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x) x == max(x)))
# [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE

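A misplaced ")" created the error: in your line, function(x) x==max(x) ended up as an argument to unname() instead of lapply().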
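A minimal runnable sketch of the fix, assuming a small made-up data frame d whose third column is time (the column names and values are hypothetical, not taken from the thread). It runs the corrected line and shows that the misplaced-parenthesis version fails because lapply() never receives its FUN argument.

```r
# Hypothetical sample data, already sorted by date; column 3 is `time`.
d <- data.frame(
  id   = 1:6,
  date = rep(c("2012-10-18", "2012-10-19"), each = 3),
  time = c(9.5, 11.0, 15.25, 10.0, 16.0, 12.5),
  stringsAsFactors = FALSE
)

# Corrected call: the anonymous function is lapply()'s second argument.
d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)),
                         function(x) x == max(x)))
print(d$flag2)

# Misplaced ")": the function is passed to unname() instead, so lapply()
# is called without FUN and signals an error, which we capture here.
bad <- tryCatch(
  unlist(lapply(unname(split(d[[3]], d$date), function(x) x == max(x)))),
  error = function(e) e
)
print(conditionMessage(bad))
```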

A.K.




----- Original Message -----
From: William Dunlap <wdunlap at tibco.com>
To: arun <smartpink111 at yahoo.com>; Flavio Barros <flaviomargarito at gmail.com>
Cc: R help <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Saturday, October 20, 2012 12:04 PM
Subject: RE: [R] Creating a new by variable in a dataframe
I think that line is unnecessarily complicated. lapply() returns a list,
and rbind applied to one argument, L, mainly adds dimensions c(1,length(L))
to it (it also moves its names into the colnames). unlist doesn't care about
the dimensions, so you may as well leave out the rbind. The only difference
in the results with and without calling rbind is that the rbind version omits
the names from flag. Use the more direct unname() on split's output or
unlist's output if that concerns you.

Also, if you are interested in saving time and memory when the input, d, is large,
you will be better off applying split() to just the column of the data.frame
that you want split instead of to the entire data.frame.
  d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
(I used d[[3]] instead of the more readable d$time to follow your original more closely.)
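A sketch comparing the two approaches on hypothetical data. Splitting the whole data.frame (a guess at the shape of the earlier version, which is not quoted here) hands each group to the function as a small data.frame, while splitting only d[[3]] hands it a plain numeric vector; the flags come out the same, but the second form avoids building a data.frame per group.

```r
# Hypothetical sample data, sorted by date; column 3 is `time`.
d <- data.frame(
  id   = 1:6,
  date = rep(c("2012-10-18", "2012-10-19"), each = 3),
  time = c(9.5, 11.0, 15.25, 10.0, 16.0, 12.5),
  stringsAsFactors = FALSE
)

# Split every column: each group is a data.frame.
by_frame <- unlist(lapply(unname(split(d, d$date)),
                          function(g) g[[3]] == max(g[[3]])))

# Split only the needed column: each group is a numeric vector.
by_column <- unlist(lapply(unname(split(d[[3]], d$date)),
                           function(x) x == max(x)))

print(identical(by_frame, by_column))
```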

You ought to check that the data is sorted by date: otherwise these give the
wrong answer, because unlist() returns the flags grouped by date rather than
in the original row order.
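To see why the ordering matters, here is a sketch with hypothetical, unsorted rows. The unlist(split(...)) result comes back grouped by date, so it no longer lines up with the rows. One order-preserving alternative (a swapped-in technique, not one proposed in this thread) is base R's ave(), which writes each group's result back into the original row positions; since ave() keeps the type of its first argument, the 0/1 result is converted back with as.logical().

```r
# Hypothetical data, deliberately NOT sorted by date.
d <- data.frame(
  date = c("2012-10-19", "2012-10-18", "2012-10-19", "2012-10-18"),
  time = c(10.0, 9.5, 16.0, 15.25),
  stringsAsFactors = FALSE
)

# Grouped by date, not by row: misaligned with d for unsorted data.
naive <- unlist(lapply(unname(split(d$time, d$date)), function(x) x == max(x)))

# ave() puts each group's values back where the rows came from.
d$flag <- as.logical(ave(d$time, d$date, FUN = function(x) x == max(x)))

print(naive)
print(d$flag)
```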

What result do you want when there are several transactions at the last time
in the day?
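The question matters because x == max(x) flags every row tied at the latest time. A sketch of two alternatives that flag exactly one row per day, assuming the rows are sorted by date; "first" and "last" refer to row order within each day. The data and the helper flag_by_date() are hypothetical, introduced only for this illustration.

```r
# Hypothetical sorted data with a tie at the latest time on 2012-10-18.
d <- data.frame(
  date = c(rep("2012-10-18", 3), rep("2012-10-19", 2)),
  time = c(9.5, 15.25, 15.25, 10.0, 16.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply f to each day's times and flatten the result.
flag_by_date <- function(f) unlist(lapply(unname(split(d$time, d$date)), f))

# Every row tied at the maximum is flagged.
all_ties  <- flag_by_date(function(x) x == max(x))
# Only the first row reaching the maximum (which.max() returns the first).
first_max <- flag_by_date(function(x) seq_along(x) == which.max(x))
# Only the last row reaching the maximum.
last_max  <- flag_by_date(function(x) seq_along(x) == max(which(x == max(x))))

print(all_ties)
print(first_max)
print(last_max)
```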

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com