Discretising intra-day data -- how to get by with less memory?
The three functions that can be found in xts to help here are:
(1) align.time: (as Brian alluded to)
This rounds every timestamp up to the next multiple of n seconds.
e.g. align.time(x, n=300) # 5 minutes
(2) endpoints:
Locates the row of the last observation in each block of "k" periods of type "on".
e.g. endpoints(x, on="minutes", k=5) # 5 minutes
(3) merge.xts with a regular time index.
e.g. merge(x, xts(, timeBasedSeq('2009-11-01 08:30/2009-11-01 15:00')))
A complete example:
library(xts)
x <- xts(1:10, Sys.time()+1:10*rnorm(10)*60)
x

                    [,1]
2009-11-27 08:48:18    9
2009-11-27 08:51:03    7
2009-11-27 08:52:13    8
2009-11-27 08:53:10   10
2009-11-27 08:55:25    6
2009-11-27 08:55:56    1
2009-11-27 08:56:02    4
2009-11-27 08:56:44    3
2009-11-27 08:59:24    2
2009-11-27 09:02:46    5
xa <- align.time(x,60) # align to end of minutes
xa

                    [,1]
2009-11-27 08:49:00    9
2009-11-27 08:52:00    7
2009-11-27 08:53:00    8
2009-11-27 08:54:00   10
2009-11-27 08:56:00    6
2009-11-27 08:56:00    1
2009-11-27 08:57:00    4
2009-11-27 08:57:00    3
2009-11-27 09:00:00    2
2009-11-27 09:03:00    5
xa[endpoints(xa,'minutes')] # get last obs with unique timestamp

                    [,1]
2009-11-27 08:49:00    9
2009-11-27 08:52:00    7
2009-11-27 08:53:00    8
2009-11-27 08:54:00   10
2009-11-27 08:56:00    1
2009-11-27 08:57:00    3
2009-11-27 09:00:00    2
2009-11-27 09:03:00    5
# fill in 'regular' time series
merge(xa[endpoints(xa,'minutes')], xts( ,seq(start(xa),end(xa),by="mins")))

                    xa.endpoints.xa...minutes...
2009-11-27 08:49:00                            9
2009-11-27 08:50:00                           NA
2009-11-27 08:51:00                           NA
2009-11-27 08:52:00                            7
2009-11-27 08:53:00                            8
2009-11-27 08:54:00                           10
2009-11-27 08:55:00                           NA
2009-11-27 08:56:00                            1
2009-11-27 08:57:00                            3
2009-11-27 08:58:00                           NA
2009-11-27 08:59:00                           NA
2009-11-27 09:00:00                            2
2009-11-27 09:01:00                           NA
2009-11-27 09:02:00                           NA
2009-11-27 09:03:00                            5
# optional fill=na.locf will carry forward the last observation (last trade?)
merge(xa[endpoints(xa,'minutes')], xts( ,seq(start(xa),end(xa),by="mins")),fill=na.locf)

                    xa.endpoints.xa...minutes...
2009-11-27 08:49:00                            9
2009-11-27 08:50:00                            9
2009-11-27 08:51:00                            9
2009-11-27 08:52:00                            7
2009-11-27 08:53:00                            8
2009-11-27 08:54:00                           10
2009-11-27 08:55:00                           10
2009-11-27 08:56:00                            1
2009-11-27 08:57:00                            3
2009-11-27 08:58:00                            3
2009-11-27 08:59:00                            3
2009-11-27 09:00:00                            2
2009-11-27 09:01:00                            2
2009-11-27 09:02:00                            2
2009-11-27 09:03:00                            5

I didn't test against your solution(s), but this should be very fast and use as little memory as possible. endpoints, align.time and merge.xts have all been heavily optimized for speed and memory.

HTH
Jeff
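The steps in the example above can be collected into one function. What follows is only a sketch: discretise_xts and its locf argument are names made up for illustration (they are not part of xts), the timestamps are the ones printed in the example, and UTC is assumed so that the result does not depend on the local time zone.

```r
library(xts)

# Hypothetical helper (not part of xts): put an irregular intraday xts
# series on a regular grid of n seconds, keeping the last observation
# that falls in each interval.
discretise_xts <- function(x, n = 60, locf = FALSE) {
  xa <- align.time(x, n)                      # round stamps up to multiples of n
  ep <- endpoints(xa, on = "seconds", k = n)  # last row for each aligned stamp
  lastobs <- xa[ep]
  grid <- seq(start(xa), end(xa), by = n)     # regular index, n seconds apart
  out <- merge(lastobs, xts(, grid))          # empty intervals become NA
  if (locf) out <- na.locf(out)               # optionally carry values forward
  out
}

# Timestamps and values copied from the example; tz = "UTC" is an assumption.
idx <- as.POSIXct(paste("2009-11-27",
                        c("08:48:18", "08:51:03", "08:52:13", "08:53:10",
                          "08:55:25", "08:55:56", "08:56:02", "08:56:44",
                          "08:59:24", "09:02:46")), tz = "UTC")
x <- xts(c(9, 7, 8, 10, 6, 1, 4, 3, 2, 5), order.by = idx)

res      <- discretise_xts(x, n = 60)
res_locf <- discretise_xts(x, n = 60, locf = TRUE)
```

With locf = FALSE the result should have NA for the minutes with no observation, as in the first merge output; with locf = TRUE those gaps carry the previous value forward.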
On Fri, Nov 27, 2009 at 7:00 AM, Brian G. Peterson <brian at braverock.com> wrote:
Brian G. Peterson wrote:
Ajay Shah wrote:
I'm using this function to convert intra-day data into a grid with an observation each N seconds:

  # This function consumes "z" a zoo object where timestamps are intraday
  # and a period for discretisation Nseconds.
  # The key ideas are from this thread:
  #    https://stat.ethz.ch/pipermail/r-sig-finance/2009q4/005144.html
  intraday.discretise <- function(z, Nseconds) {
    toNsec <- function(x) as.POSIXct(Nseconds*ceiling(as.numeric(x)/Nseconds),
                                     origin = "1970-01-01")
    d <- aggregate(z, toNsec, tail, 1)
    # At this point there is one problem: NA records are not created
    # for blocks of time in which there were no records.
    # To solve this:
    dreg <- as.zoo(as.ts(d))
    class(time(dreg)) <- class(time(d))
    dreg
  }

This works correctly but it's incredibly memory-intensive. I'm running out of core in running this for some problems. Is there a way to write this which would use less RAM?
Jeff Ryan, Abe Winter, and I came up with an align.time function a few
months back:
align.time <- function(x, n=30) {
  # class vector in the order current R expects: POSIXct first, then POSIXt
  structure(unclass(x) + (n - unclass(x) %% n),
            class=c("POSIXct","POSIXt"))
}
x is the POSIXct time index of the xts data, e.g. index(x)
n is the number of seconds to align to
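To see what the arithmetic does, here is a small sketch. The function is renamed align_up (a made-up name, so that it does not mask xts::align.time), the class vector is written in the order current R uses, and the two UTC timestamps are invented for illustration.

```r
# Same arithmetic as the function quoted above, under a hypothetical name.
align_up <- function(x, n = 30) {
  # unclass() exposes the seconds since 1970-01-01; %% gives the remainder
  # within the current n-second block, so n minus that remainder is the
  # distance to the next block boundary.
  structure(unclass(x) + (n - unclass(x) %% n),
            class = c("POSIXct", "POSIXt"))
}

# Illustrative stamps (assumed UTC): one mid-minute, one exactly on a minute.
stamps  <- as.POSIXct(c("2009-11-27 08:48:18", "2009-11-27 08:49:00"),
                      tz = "UTC")
aligned <- align_up(stamps, n = 60)
format(aligned, "%H:%M:%S")
```

A stamp that already sits on a boundary has remainder 0, so it is moved forward by a full n seconds rather than left in place.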
Regards,
  - Brian
Or, an earlier, slower version:
This works well enough to generate a new index on the output of to.period:
# stamp is POSIXct object, like index(x) of an xts object
# n is number of seconds to round to, so n=k in to.period
even_seconds = function(stamp,n=60)
{
  tzone = attr(stamp,"tzone")
  if (is.null(tzone)) { tzone = "" }
  base = as.POSIXct(strptime( format(stamp,"%Y%m%d"), "%Y%m%d" ),tz=tzone)
  i = as.numeric(stamp) - as.numeric(base)
  i = base + n*ceiling(i/n)
  i
}
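A short usage sketch for even_seconds follows. The function body is repeated from the message above only so that the block runs on its own; the three UTC timestamps and n=300 are invented for illustration.

```r
# even_seconds() as quoted above, repeated so this block is self-contained.
even_seconds <- function(stamp, n = 60) {
  tzone <- attr(stamp, "tzone")
  if (is.null(tzone)) tzone <- ""
  # midnight of each stamp's calendar day, in the stamp's own time zone
  base <- as.POSIXct(strptime(format(stamp, "%Y%m%d"), "%Y%m%d"), tz = tzone)
  i <- as.numeric(stamp) - as.numeric(base)   # seconds since that midnight
  base + n * ceiling(i / n)                   # round up to a multiple of n
}

# Illustrative stamps (assumed UTC); the second is already on a 5-minute mark.
stamps  <- as.POSIXct(c("2009-11-27 08:48:18", "2009-11-27 08:50:00",
                        "2009-11-27 08:56:44"), tz = "UTC")
rounded <- even_seconds(stamps, n = 300)
format(rounded, "%H:%M:%S")
```

Because this version uses ceiling(), a stamp already on a boundary stays where it is, whereas the modulo-based align.time quoted earlier moves such a stamp forward by n seconds. The rounding is also counted from midnight in the stamp's own time zone, not from the epoch.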
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com