Discretising intra-day data using zoo?
library(zoo)
print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
options("digits.secs"=6)
head(demo)
tail(demo)
On Sun, Nov 08, 2009 at 07:20:02AM -0500, Gabor Grothendieck wrote:
See the aggregate.zoo example in vignette("zoo-quickref") but round up
to the next 4 seconds instead of next Friday:
to4sec <- function(x) as.POSIXct(4*ceiling(as.numeric(x)/4), origin = "1970-01-01")
aggregate(demo, to4sec, tail, 1)
spread ltp
2009-02-16 05:00:04 0.0050 48.715
2009-02-16 05:00:08 0.0025 48.715
2009-02-16 05:00:12 0.0025 48.715
2009-02-16 05:00:16 0.0025 48.715
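For readers less familiar with the as.POSIXct() arithmetic, here is a minimal sketch of what the rounding function does to individual timestamps. The timestamps and the names t0, x and bins are made up for illustration, and tz = "UTC" is an added assumption so that the result does not depend on the local time zone.

```r
# Hypothetical timestamps, chosen only to show the rounding behaviour.
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
x  <- t0 + c(0.25, 3.999, 4, 4.001, 7.5)

# Same idea as to4sec above: convert to seconds since the epoch,
# round up to the next multiple of 4, and convert back to POSIXct.
to4sec <- function(x) as.POSIXct(4 * ceiling(as.numeric(x) / 4),
                                 origin = "1970-01-01", tz = "UTC")

bins <- to4sec(x)

# Offsets of each bin label from t0, in seconds.
as.numeric(bins) - as.numeric(t0)
```

Each timestamp is labelled with the end of its 4-second interval, and a timestamp that falls exactly on a boundary keeps that boundary as its label, so aggregate(..., tail, 1) picks the last observation at or before each grid point.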
Gabor, thanks! I am not as fluent with as.POSIXct() as I should be. And, to continue with my original question:
Suppose the raw data contain no records at all between 10:30:04 and 10:30:09. The resulting object should still contain a record for 10:30:08, with NA values (which can then be filled in, e.g. using na.locf()). This problem does not arise in this data, where records are plentiful, but discretisation code should be general and handle this case correctly.
How would we do this? To illustrate:
demo2 <- demo[-300:-700,]
plot(index(demo2), seq_along(index(demo2)), type="l") # we see that the 5th to 10th
# second is zapped out.
to5sec <- function(x) as.POSIXct(5*ceiling(as.numeric(x)/5), origin = "1970-01-01")
Now:
aggregate(demo, to5sec, tail, 1)
spread ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10 0.0025 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
aggregate(demo2, to5sec, tail, 1)
spread ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
We should get:
spread ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10 NA NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
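One possible way to get the NA row, offered as a sketch and not as the thread's answer: aggregate as before, then merge the result with a zero-width zoo series indexed on the full regular grid, so that grid points with no observations appear as NA. The data below are synthetic (the object ticks and its values are a hypothetical stand-in for demo2), and tz = "UTC" is an added assumption for reproducibility.

```r
library(zoo)

# Synthetic irregular ticks (hypothetical stand-in for demo2): there are
# no observations in the interval that rounds up to 05:00:10.
t0    <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
secs  <- c(0.5, 1.2, 3.9, 4.4, 10.3, 12.8, 14.1, 16.7, 19.2)
ticks <- zoo(cbind(spread = c(0.0050, 0.0050, 0.0025, 0.0050, 0.0025,
                              0.0025, 0.0025, 0.0025, 0.0025),
                   ltp    = 48.715),
             t0 + secs)

to5sec <- function(x) as.POSIXct(5 * ceiling(as.numeric(x) / 5),
                                 origin = "1970-01-01", tz = "UTC")

# Step 1: last observation in each 5-second bin; empty bins are absent.
agg <- aggregate(ticks, to5sec, tail, 1)

# Step 2: build the complete 5-second grid and merge with a zero-width
# zoo series on that grid; missing grid points become NA rows.
grid <- seq(start(agg), end(agg), by = 5)
full <- merge(agg, zoo(, grid))

# Step 3 (optional): carry the last observation forward into the gaps.
filled <- na.locf(full)

full
filled
```

Here the grid runs from the first to the last bin that actually occurs; if the grid should instead span fixed session times, the seq() call could be given those endpoints directly.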
Ajay Shah http://www.mayin.org/ajayshah ajayshah at mayin.org http://ajayshahblog.blogspot.com <*(:-? - wizard who doesn't know the answer.