How can I do this better (handling "realtime" macroeconomic data)?
Try this:
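fetch2 <- function(d, as.of.date = Inf) {
  # Keep only the records that were already known on as.of.date.
  d <- d[d$infodate <= as.of.date, ]
  # Within each date, keep the row with the latest infodate.
  do.call(rbind, by(d, d$date, function(x) x[which.max(x$infodate), ]))
}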
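If the per-date by() call turns out to be slow on a long series, a fully vectorised variant is also possible. The following is only a sketch: fetch3 is a name made up here, the data frame is the example from the original post typed out with explicit dates, and it assumes the date column sorts correctly as text (true for ISO "YYYY-MM-DD" strings). It returns a named vector, like fetch.ts, and not a data frame.

```r
# Example data in the layout from the original post: date, infodate, value.
a <- data.frame(
  date = c("2007-04-01", "2007-04-01", "2007-05-01", "2007-04-01", "2007-05-01",
           "2007-06-01", "2007-05-01", "2007-06-01", "2007-07-01"),
  infodate = as.Date(c("2007-05-01", "2007-06-01", "2007-06-01", "2007-07-01",
                       "2007-07-01", "2007-07-01", "2007-08-01", "2007-08-01",
                       "2007-08-01")),
  value = c(42L, 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L),
  stringsAsFactors = FALSE
)

# fetch3 is a hypothetical name, not something from the thread.
fetch3 <- function(d, as.of.date = NULL) {
  if (!is.null(as.of.date)) {
    # Drop records that were not yet published on as.of.date.
    d <- d[d$infodate <= as.Date(as.of.date), , drop = FALSE]
  }
  # Sort by date, newest infodate first within each date.
  d <- d[order(d$date, -as.numeric(d$infodate)), , drop = FALSE]
  # duplicated() marks every row after the first per date, i.e. older vintages.
  d <- d[!duplicated(d$date), , drop = FALSE]
  setNames(d$value, d$date)
}

fetch3(a)                 # latest vintage of every observation
fetch3(a, "2007-07-01")   # the series as it was known on 2007-07-01
```

The idea is to sort once and then drop the older vintages with duplicated(), so there is no explicit loop and no per-group function call.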
On Tue, Jun 30, 2009 at 11:19 PM, Ajay Shah<ajayshah at mayin.org> wrote:
Folks,
One problem faced with macroeconomic data is that of data revision. On
date t1 you're told a value x, and then on date t2 you're told that
same value has been changed to y. It is often fine to merely ignore
older values. But in some problems, it becomes important to track the
time-series as observed at past dates. In order to address this
problem, we need to store not just the time-series as a set of
(date,value) pairs, but also an additional field "infodate" which is
the date on which a given record was observed.
Here's an example of this data representation:
  a <- structure(list(date = c("2007-04-01", "2007-04-01", "2007-05-01",
    "2007-04-01", "2007-05-01", "2007-06-01", "2007-05-01", "2007-06-01",
    "2007-07-01"), infodate = structure(c(13634, 13665, 13665, 13695,
    13695, 13695, 13726, 13726, 13726), class = "Date"), value = c(42L,
    43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L)), .Names = c("date",
    "infodate", "value"), row.names = c(NA, -9L), class = "data.frame")
  a
  str(a)
So this is a dataset containing date (a string), infodate (a Date) and value.
Using this representation, I wrote a function which queries the
dataset and reports the time series as seen on a given date. If a
value for ondate is supplied, only records with infodate <= ondate are
utilised.
  fetch.ts <- function(d, ondate=NULL) {
    if (!is.null(ondate)) {
      d <- subset(d, d$infodate <= ondate)
    }
    # Now we walk through the series, and every time a new value for
    # a given date shows up, we overwrite the previous version.
    # Start from the first row of d (not the global a).
    x <- d$value[1]; names(x)[1] <- d$date[1]
    # seq_len(...)[-1] is empty when d has a single row, unlike 2:nrow(d).
    for (i in seq_len(nrow(d))[-1]) {
      x[d$date[i]] <- d$value[i]
    }
    x
  }
This seems to work okay:
  fetch.ts(a)
  all.equal(fetch.ts(a), structure(c(49L, 56L, 67L, 77L),
                                   .Names = c("2007-04-01",
                                     "2007-05-01", "2007-06-01", "2007-07-01")))
  fetch.ts(a, "2007-07-01")
  all.equal(fetch.ts(a, "2007-07-01"),
            structure(c(49L, 55L, 66L),
                      .Names = c("2007-04-01", "2007-05-01", "2007-06-01")))
but I'm not happy with my loop-intensive solution. Also, the use of
associative arrays (using the names in R) might be quite
expensive. How would you improve on this?
--
Ajay Shah                                      http://www.mayin.org/ajayshah
ajayshah at mayin.org                           http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.