Aggregating Statistics By Time Interval
Try producing it in "long" format using aggregate and then reshaping
it into "wide" format using xtabs, reshape or the reshape package:
twas <- function(x) {
y <- data.frame(timediff = diff(x$time), head(x, -1))
aggregate(100 * y[1]/sum(y[1]), y[c("hour", "spread")], sum)
}
tmp2 <- cbind(tmp, hour = fmt(tmp$time))
long <- do.call("rbind", by(tmp2, tmp2["hour"], twas))
# any one of these three:
xtabs(timediff ~., long)
reshape(long, dir = "wide", timevar = "spread", idvar = "hour")
library(reshape)
cast(melt(long, id = 1:2), hour ~ spread)
On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi
Sorry, I'm not sure what happened with that last one. Here is a fully
contained example (sorry about the line length if this doesnt wrap).
tmp <- data.frame(
time=c(1185882786,1185882790,1185882791,1185882791,1185882792,1185882795,1185882796,1185882797,1185882797,1185882798,1185882799,1185882800,1185882806,1185882807,1185882809,1185882810,1185882810,1185882811,1185882845,1185882846,1185882906,1185882918,1185882950,1185882951,1185882951,1185882952,1185882953,1185882954,1185882955,1185882956,1185882991,1185882991,1185882995,1185882996,1185882997,1185882997,1185882998,1185882998,1185882999,1185883003,1185883004,1185883006,1185883007,1185883025,1185883026,1185883086,1185883129,1185883129,1185883133,1185883133,1185883137,1185883137,1185883144,1185883145,1185883145,1185883148,1185883148,1185883149,1185883150,1185883151,1185883152,1185883154,1185883154,1185883155,1185883155,1185883175,1185883176,1185883179,1185883179,1185883180,1185883181,1185883181,1185883182,1185883186,1185883187,1185883191,1185883191,1185883200,1185883200,1185883211,1185883212,1185883214,1185883214,1185883215,1185883217,1185883218,1185883219,1185883279,1185883307,1185883307,1185883365,1185883366,1185883366,1185883367,1185883368,1185883368,1185883368,1185883369,1185883373,1185883376),
spread=c(1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,1e-04,1e-04,2e-04,1e-04,1e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,1e-04,2e-04,1e-04,2e-04,1e-04,2e-04,3e-04,2e-04)
)
twas <- function (dat)
{
data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
Cheers
Rory
On 8/3/07, Gabor Grothendieck < ggrothendieck at gmail.com> wrote:
I still get no warning. Please provide complete self contained input and output.
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791,
1185882791,
+ 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04, + 2e-04, 3e-04))
twas <-
+ function(dat) {
+ data.frame(tapply(diff(dat$time), head(dat$spread, -1),
+ sum)/sum(diff(dat$time)) * 100.0)
+ }
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
1e-04 2e-04 07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)" On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi I have figured out what causes the warning (and recycling), but I am not sure how I can fix it. After seeing that it seemed to work for you, I
went
back and tried working with different subsets of the data. I eventually found where it occurs - when we get a third unique spread value. To reproduce, just change the definition of tmp to be: tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791,
1185882791,
1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 3e-04)) <== Added 3e-04 i.e. I have just changed one of the spread values to be a third value -
this
seems to trigger the warning "Warning message:number of columns of
result
is not a multiple of vector length (arg 3) in: rbind", and the
recycling. I
tried this on R 2.5.0 and 2.5.1 Can anyone see what I am doing wrong here? Cheers Rory On 8/3/07, Gabor Grothendieck < ggrothendieck at gmail.com> wrote:
Can you provide a reproducible example that exhibits the warning. Redoing it in a more easily reproducible way and using the data in your post gives me no warning
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791,
1185882791,
+ 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04, + 2e-04, 1e-04))
twas <-
+ function(dat) {
+ data.frame(tapply(diff(dat$time), head(dat$spread, -1),
+ sum)/sum(diff(dat$time)) * 100.0)
+ }
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
1e-04 2e-04 07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)" Here is input: tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791,
1185882791,
1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
2e-04, 1e-04))
twas <-
function(dat) {
data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
R.version.string # XP
On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi I've been wrestling with this a little bit, using the example in the
that Gabor pointed me to as a reference, and I think I have almost
got
what
I want...however its still not quite right. I have a variable, tmp, with two dimensions: time and spread:
head(tmp$time)
[1] 1185882786 1185882790 1185882791 1185882791 1185882792
1185882795
head(tmp$spread)
[1] 1e-04 1e-04 2e-04 1e-04 2e-04 1e-04
I also have a function that calculates the time-weighted average
spread:
twas
function(dat) {
data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
I can combine them using as rbind() and by():
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
(epoch is just an instance of ISOdatetime)
This gives me a warning:
Warning message:
number of columns of result
is not a multiple of vector length (arg 3) in: rbind(1, "12"
= c(
91.99207541277 , 8.00792458723005), "13" = c(90.1884966797708, The output from the above command is almost exactly what I need,
apart
from
the recycling:
1e-04 2e-04 3e-04 4e-04
12 91.99208 8.007925 91.9920754 8.007924587 <== recycled values
13 90.18850 9.337448 0.4218405 0.052214551
14 90.59640 9.171417 0.2321811 90.596401668
15 89.55771 10.194291 0.2343418 0.013661453
...
I can just pass this into a barplot() and get a nice visual
breakdown of
hourly weighted spreads, *but* I dont know how to get these results
without
the recycling. Looking at rbind(), it seems that this will
automatically
recycle. Does anyone know of a function I could use to get these
results
without this problem? Cheers Rory On 8/1/07, Gabor Grothendieck < ggrothendieck at gmail.com > wrote:
Something similar was just discussed this morning:
On 8/1/07, Rory Winston <rory.winston at gmail.com > wrote:
Hi all I have a question about aggegating statistics by time intervals.
I
have
a
data set with 3 columns : time, bid, and ask. Time is specified
as a
millisecond timestamp since epoch. I would like to compute
summary
statistics for the data set on an hourly basis. Here is what I
have
tried so
far: # Data is in pricedata t <- ISODatetime(1970, 1, 1, 0, 0, 0) + pricedata$time agg <- aggregate(pricedata$spread, list(byhour=format(t, "%Y-%m
%H")),
mean)
This seems to do what I want - however, what really want to do
is
more
specific: I would like to be able to extract a subset of the
data
frame
pricedata, and not just the aggregated entries - for instance,
instead
of
just extracting pricedata$spread by hour, I would like to
extract a
slice of
columns, e.g. pricedata$spread and pricedata$time on an hourly
basis,
and
pass these into a function that can compute a time-weighted
average
spread,
for instance. Does anyone know an elegant way to do this? I have
a
feeling
zoo may do what I want, but I'm new to zoo ...
Cheers
Rory
[[alternative HTML version deleted]]
_______________________________________________ R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. [[alternative HTML version deleted]] _______________________________________________ R-SIG-Finance at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first.