Discretising intra-day data using zoo?
On Sun, Nov 8, 2009 at 7:58 AM, Ajay Shah <ajayshah at mayin.org> wrote:
library(zoo)
print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
options("digits.secs"=6)
head(demo)
tail(demo)
On Sun, Nov 08, 2009 at 07:20:02AM -0500, Gabor Grothendieck wrote:
See the aggregate.zoo example in vignette("zoo-quickref") but round up
to the next 4 seconds instead of next Friday:
to4sec <- function(x) as.POSIXct(4*ceiling(as.numeric(x)/4), origin = "1970-01-01")
aggregate(demo, to4sec, tail, 1)

                    spread    ltp
2009-02-16 05:00:04 0.0050 48.715
2009-02-16 05:00:08 0.0025 48.715
2009-02-16 05:00:12 0.0025 48.715
2009-02-16 05:00:16 0.0025 48.715
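For anyone without access to demo.rda, the rounding idea above can be tried on a small made-up series. This is only a sketch: the timestamps and values below are hypothetical, and tz = "UTC" is an added assumption so that the result does not depend on the local time zone.

```r
library(zoo)

# Hypothetical irregular timestamps (not the data from this thread)
t0  <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
idx <- t0 + c(0.5, 1.2, 3.9, 4.0, 4.1, 7.7, 9.3)
z   <- zoo(seq_along(idx), idx)

# Round each timestamp up to the next multiple of 4 seconds since the epoch;
# a timestamp that is already on a boundary stays in that bucket
to4sec <- function(x) {
  as.POSIXct(4 * ceiling(as.numeric(x) / 4), origin = "1970-01-01", tz = "UTC")
}

# Keep the last observation that falls into each 4-second bucket
ag <- aggregate(z, to4sec, tail, 1)
ag
```

Here the first four observations share the 05:00:04 bucket, the next two share 05:00:08, and the last one lands in 05:00:12, so the aggregate keeps one value per bucket.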
Gabor, thanks! I am not as fluent with as.POSIXct() as I should be. And, to continue with my original question:
Suppose there is not a single record in the raw data from 10:30:04 to 10:30:09. Despite this, the resulting object should contain a record for 10:30:08 with NA values (which can then be filled out e.g. using na.locf()). How would we do this? This problem is not present in this data, where records are plentiful. But discretisation code should be general and handle this case right.
To illustrate:

demo2 <- demo[-300:-700,]
plot(index(demo2), 1:599, type="l")   # we see that 5th to 10th
                                      # second is zapped out.
to5sec <- function(x) as.POSIXct(5*ceiling(as.numeric(x)/5), origin = "1970-01-01")

Now:
aggregate(demo, to5sec, tail, 1)
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10 0.0025 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
aggregate(demo2, to5sec, tail, 1)
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715

We should get:

                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10     NA     NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
The trick is that converting to ts makes the series regular (a regular series is the only thing ts can represent), so just convert it to ts and then back to zoo. Since ts cannot represent POSIXct, the time index you get back will not have its POSIXct class attribute set, so set it yourself.
# aggregate to 5 seconds
ag <- aggregate(demo2, to5sec, tail, 1)
# make regular (this will strip class from time)
ag.fill <- as.zoo(as.ts(ag))
# put class back on time
time(ag.fill) <- structure(time(ag.fill), class = class(time(ag)))
ag.fill

                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10     NA     NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
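A different route to the same NA-padded result, not the ts round trip used above, is to merge the aggregate with a zero-width zoo object on a regular grid. The sketch below uses a small made-up series rather than demo2. The values, the names grid and ag.locf, and tz = "UTC" are assumptions for illustration only. It also shows the na.locf() fill mentioned in the question.

```r
library(zoo)

# Hypothetical 5-second aggregate in which the 05:00:10 bucket is absent
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
ag <- zoo(c(48.715, 48.720, 48.725), t0 + c(5, 15, 20))

# Regular 5-second grid spanning the aggregate (by is in seconds)
grid <- seq(start(ag), end(ag), by = 5)

# Merging with a zero-width zoo on the grid adds NA rows at the absent
# times and keeps the POSIXct index, so no class needs to be restored
ag.fill <- merge(ag, zoo(, grid))

# Carry the last observation forward into the gap
ag.locf <- na.locf(ag.fill)
ag.locf
```

The merge keeps the index as POSIXct throughout, which may be convenient when the index class matters. The ts round trip is shorter when it does not.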