Aggregating Statistics By Time Interval
I still get no warning. Please provide complete, self-contained input and output.
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
    1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
    2e-04, 3e-04))
twas <-
function(dat) {
    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
        sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
      1e-04    2e-04
07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"
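One possible reading of why the modified tmp above still prints only two columns, offered as a sketch rather than a confirmed diagnosis: diff() produces n - 1 intervals, and twas() pairs them with head(dat$spread, -1), so a spread value that appears only in the last row never reaches tapply(). The names intervals and held below are illustrative and do not come from the thread.

```r
tmp <- data.frame(
  time   = c(1185882786, 1185882790, 1185882791, 1185882791,
             1185882792, 1185882795),
  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 3e-04)
)

# diff() returns one interval fewer than there are rows
intervals <- diff(tmp$time)

# the spread in force during each interval; the final row has no interval
held <- head(tmp$spread, -1)

# only two distinct spreads take part in the grouping
unique(held)

# same computation as the body of twas(), without the data.frame() wrapper
shares <- tapply(intervals, held, sum) / sum(intervals) * 100
shares
```

If the third value sits in any row other than the last, it enters the grouping and that hour gains a third entry.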
On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi,
I have figured out what causes the warning (and the recycling), but I am not sure how I can fix it. After seeing that it seemed to work for you, I went back and tried working with different subsets of the data. I eventually found where it occurs: when we get a third unique spread value. To reproduce, just change the definition of tmp to be:
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
    1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
    2e-04, 3e-04)) # <== Added 3e-04
i.e. I have just changed one of the spread values to be a third value. This seems to trigger the warning "Warning message: number of columns of result is not a multiple of vector length (arg 3) in: rbind", and the recycling. I tried this on R 2.5.0 and 2.5.1.
Can anyone see what I am doing wrong here?
Cheers
Rory
On 8/3/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Can you provide a reproducible example that exhibits the warning? Redoing it in a more easily reproducible way and using the data in your post gives me no warning:
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
    1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
    2e-04, 1e-04))
twas <-
function(dat) {
    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
        sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
      1e-04    2e-04
07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"
Here is input:
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
2e-04, 1e-04))
twas <-
function(dat) {
data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
R.version.string # XP
On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi,
I've been wrestling with this a little bit, using the example that Gabor pointed me to as a reference, and I think I have almost got what I want. However, it's still not quite right.
I have a variable, tmp, with two columns, time and spread:
head(tmp$time)
[1] 1185882786 1185882790 1185882791 1185882791 1185882792 1185882795
head(tmp$spread)
[1] 1e-04 1e-04 2e-04 1e-04 2e-04 1e-04
I also have a function that calculates the time-weighted average spread:
twas
function(dat) {
data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
I can combine them using rbind() and by():
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
(epoch is just an instance of ISOdatetime)
This gives me a warning:
Warning message:
number of columns of result
is not a multiple of vector length (arg 3) in: rbind(1, "12" = c(
91.99207541277, 8.00792458723005), "13" = c(90.1884966797708,
The output from the above command is almost exactly what I need, apart from the recycling:
      1e-04     2e-04      3e-04        4e-04
12 91.99208  8.007925 91.9920754  8.007924587  <== recycled values
13 90.18850  9.337448  0.4218405  0.052214551
14 90.59640  9.171417  0.2321811 90.596401668
15 89.55771 10.194291  0.2343418  0.013661453
...
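The recycling in the table above can be reproduced in isolation. The sketch below uses made-up vectors (the names quiet, noisy and msg are illustrative) to show how rbind() treats vectors of unequal length: shorter ones are recycled to the width of the longest, and a warning is raised only when the width is not a multiple of the shorter length. That would be consistent with row 12 above repeating its two values silently.

```r
# length 2 into 4 columns: recycled silently, because 4 is a multiple of 2
quiet <- rbind(short = c(1, 2), long = c(10, 20, 30, 40))
quiet

# length 2 into 3 columns: recycled with a warning, captured here in msg
msg <- NULL
noisy <- withCallingHandlers(
  rbind(short = c(1, 2), long = c(10, 20, 30)),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
noisy
msg
```

Either way the short row is padded with repeats of its own values rather than with NA or zero, so the extra columns cannot be trusted.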
I can just pass this into barplot() and get a nice visual breakdown of hourly weighted spreads, *but* I don't know how to get these results without the recycling. Looking at rbind(), it seems that this will automatically recycle. Does anyone know of a function I could use to get these results without this problem?
Cheers
Rory
On 8/1/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Something similar was just discussed this morning:
On 8/1/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi all,
I have a question about aggregating statistics by time intervals. I have a data set with 3 columns: time, bid, and ask. Time is specified as a millisecond timestamp since epoch. I would like to compute summary statistics for the data set on an hourly basis. Here is what I have tried so far:
# Data is in pricedata
t <- ISOdatetime(1970, 1, 1, 0, 0, 0) + pricedata$time
agg <- aggregate(pricedata$spread, list(byhour=format(t, "%Y-%m %H")), mean)
This seems to do what I want. However, what I really want to do is more specific: I would like to be able to extract a subset of the data frame pricedata, and not just the aggregated entries. For instance, instead of just extracting pricedata$spread by hour, I would like to extract a slice of columns, e.g. pricedata$spread and pricedata$time, on an hourly basis, and pass these into a function that can compute a time-weighted average spread. Does anyone know an elegant way to do this? I have a feeling zoo may do what I want, but I'm new to zoo ...
Cheers
Rory
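A minimal sketch of one way to do this in base R, assuming the timestamps are seconds since the epoch (as in the sample values elsewhere in the thread) and that hours are taken in UTC. split() hands each hourly slice of the data frame to a function, and fixing the factor levels up front makes every hour return a vector of the same length, so rbind() has nothing to recycle. The names twas_fixed, lev and hour, and the three rows in the second hour, are hypothetical additions for illustration.

```r
tmp <- data.frame(
  time   = c(1185882786, 1185882790, 1185882791, 1185882791,
             1185882792, 1185882795,
             1185886400, 1185886402, 1185886406),  # last three rows are made up
  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 3e-04,
             1e-04, 3e-04, 1e-04)
)

# every spread value seen anywhere in the data, used as a fixed set of columns
lev <- sort(unique(tmp$spread))

# percentage of elapsed time spent at each spread within one slice
twas_fixed <- function(dat, lev) {
  dt  <- diff(dat$time)
  f   <- factor(head(dat$spread, -1), levels = lev)
  out <- setNames(as.numeric(tapply(dt, f, sum)), levels(f))
  out[is.na(out)] <- 0          # spreads not observed in this slice
  out / sum(dt) * 100
}

# hour of day for each row, assuming seconds since 1970-01-01 UTC
hour <- format(as.POSIXct(tmp$time, origin = "1970-01-01", tz = "UTC"), "%H")

# one row per hour, one column per spread value
z <- do.call(rbind, lapply(split(tmp, hour), twas_fixed, lev = lev))
z
```

Each row of z sums to 100, and a spread that does not occur in a given hour shows up as 0 instead of a recycled value. An hour containing a single observation has no intervals and would yield NaN, so such slices may need to be filtered out first.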
_______________________________________________ R-SIG-Finance at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first.