Discretising intra-day data -- how to get by with less memory?
Brian G. Peterson wrote:
Ajay Shah wrote:
I'm using this function to convert intra-day data into a grid with an
observation each N seconds:

# This function consumes "z", a zoo object whose timestamps are intraday,
# and a period for discretisation, Nseconds.
# The key ideas are from this thread:
# https://stat.ethz.ch/pipermail/r-sig-finance/2009q4/005144.html
intraday.discretise <- function(z, Nseconds) {
  toNsec <- function(x)
    as.POSIXct(Nseconds * ceiling(as.numeric(x) / Nseconds),
               origin = "1970-01-01")
  d <- aggregate(z, toNsec, tail, 1)
  # At this point there is one problem: NA records are not created
  # for blocks of time in which there were no records.
  # To solve this:
  dreg <- as.zoo(as.ts(d))
  class(time(dreg)) <- class(time(d))
  dreg
}

This works correctly, but it is incredibly memory-intensive: I'm running
out of core on some problems. Is there a way to write this that would
use less RAM?
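For illustration, the ceiling-based bucketing that toNsec performs can be checked on plain POSIXct values in base R, without zoo; the timestamps and Nseconds = 30 below are invented for the example:

```r
# Map each timestamp up to the end of its Nseconds bucket,
# the same arithmetic toNsec uses inside intraday.discretise.
Nseconds <- 30
toNsec <- function(x)
  as.POSIXct(Nseconds * ceiling(as.numeric(x) / Nseconds),
             origin = "1970-01-01", tz = "UTC")

stamps <- as.POSIXct(c(5, 30, 31), origin = "1970-01-01", tz = "UTC")
as.numeric(toNsec(stamps))   # 30 30 60
```

Note that a timestamp falling exactly on a boundary (30 here) stays in its own bucket, since ceiling of an integer is that integer.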
Jeff Ryan, Abe Winter, and I came up with an align.time function a few
months back:
align.time <- function(x, n = 30) {
  structure(unclass(x) + (n - unclass(x) %% n),
            class = c("POSIXt", "POSIXct"))
}
x is a POSIXct time index (e.g. index(x) of an xts object)
n is the alignment period in seconds
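A quick base-R check of align.time's behaviour; the numeric timestamps below are invented for illustration:

```r
align.time <- function(x, n = 30) {
  structure(unclass(x) + (n - unclass(x) %% n),
            class = c("POSIXt", "POSIXct"))
}

# 95 and 100 round up to 120; 120, already on a boundary,
# is pushed forward to the *next* boundary, 150.
x <- as.POSIXct(c(95, 100, 120), origin = "1970-01-01", tz = "UTC")
as.numeric(align.time(x, 30))   # 120 120 150
```

Because of the `n - x %% n` form, a timestamp exactly on a boundary moves to the next period, which is usually what you want when stamping the end of a completed bar.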
Regards,
- Brian
Or, an earlier, slower version, which works well enough to generate a new index on the output of to.period:
# stamp is a POSIXct object, like index(x) of an xts object
# n is the number of seconds to round to, so n = k in to.period
even_seconds <- function(stamp, n = 60) {
  tzone <- attr(stamp, "tzone")
  if (is.null(tzone)) tzone <- ""
  # midnight of each stamp's day, in the stamp's own time zone
  base <- as.POSIXct(strptime(format(stamp, "%Y%m%d"), "%Y%m%d"), tz = tzone)
  # seconds since midnight, rounded up to the next n-second boundary
  i <- as.numeric(stamp) - as.numeric(base)
  base + n * ceiling(i / n)
}
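A small check of even_seconds (timestamps invented for illustration). Unlike align.time, a stamp already on a boundary stays put rather than moving to the next period:

```r
even_seconds <- function(stamp, n = 60) {
  tzone <- attr(stamp, "tzone")
  if (is.null(tzone)) tzone <- ""
  # midnight of each stamp's day, then round seconds-since-midnight
  # up to the next n-second boundary
  base <- as.POSIXct(strptime(format(stamp, "%Y%m%d"), "%Y%m%d"), tz = tzone)
  i <- as.numeric(stamp) - as.numeric(base)
  base + n * ceiling(i / n)
}

s <- as.POSIXct(c("2009-01-01 10:00:10", "2009-01-01 10:01:00"), tz = "UTC")
format(even_seconds(s, 60), "%H:%M:%S")   # "10:01:00" "10:01:00"
```

The 10:00:10 stamp rounds up to 10:01:00, while 10:01:00 maps to itself, since ceiling leaves exact multiples of n unchanged.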
Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock