Aggregating Statistics By Time Interval

I still get no warning. Please provide complete, self-contained input
and output.
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
  1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
  2e-04, 3e-04))
twas <-
  function(dat) {
    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
      sum)/sum(diff(dat$time)) * 100.0)
  }
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
      1e-04    2e-04
07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"
Hi

I have figured out what causes the warning (and recycling), but I am not
sure how I can fix it. After seeing that it seemed to work for you, I went
back and tried working with different subsets of the data. I eventually
found where it occurs - when we get a third unique spread value. To
reproduce, just change the definition of tmp to be:

tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
 2e-04, 3e-04))  # <== Added 3e-04

i.e. I have just changed one of the spread values to a third value. This
seems to trigger the warning "Warning message: number of columns of result
is not a multiple of vector length (arg 3) in: rbind", and the recycling. I
tried this on R 2.5.0 and 2.5.1.
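
The recycling described above can be reproduced without by() at all. The
following is a minimal sketch, assuming two hourly results of different
lengths; the vectors h12 and h13 and their numbers are made up for
illustration and do not come from the real data set:

```r
# Hypothetical hourly results: hour 12 saw two distinct spreads,
# hour 13 saw three, so the vectors have different lengths.
h12 <- c("1e-04" = 92, "2e-04" = 8)
h13 <- c("1e-04" = 90, "2e-04" = 9.5, "3e-04" = 0.5)

# rbind() on plain vectors sizes the result to the longest vector,
# fills the shorter row by recycling its values, and issues a warning.
res <- rbind("12" = h12, "13" = h13)
print(res)
```

In this sketch the third cell of row "12" repeats the first value of h12
instead of being zero, which looks like the pattern in the reported output.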

Can anyone see what I am doing wrong here?

Cheers
Rory

On 8/3/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Can you provide a reproducible example that exhibits the warning?
Redoing it in a more easily reproducible way and using the data
in your post gives me no warning:

tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
  1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
  2e-04, 1e-04))
twas <-
  function(dat) {
    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
      sum)/sum(diff(dat$time)) * 100.0)
  }
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
      1e-04    2e-04
07 66.66667 33.33333
R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"

Here is input:

tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
2e-04, 1e-04))
twas <-
function(dat) {
   data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
R.version.string # XP

On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi

I've been wrestling with this a little bit, using the example in the email
that Gabor pointed me to as a reference, and I think I have almost got what
I want. However, it's still not quite right.

I have a variable, tmp, with two dimensions: time and spread:

head(tmp$time)
[1] 1185882786 1185882790 1185882791 1185882791 1185882792 1185882795

head(tmp$spread)
[1] 1e-04 1e-04 2e-04 1e-04 2e-04 1e-04

I also have a function that calculates the time-weighted average spread:

twas
function(dat) {
  data.frame(tapply(diff(dat$time), head(dat$spread, -1),
sum)/sum(diff(dat$time)) * 100.0)
}
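
As a worked illustration of what twas() computes, here is a step-by-step
sketch on the six sample rows shown above. The intermediate names dt, held,
secs and pct are only for illustration:

```r
time   <- c(1185882786, 1185882790, 1185882791, 1185882791,
            1185882792, 1185882795)
spread <- c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 1e-04)

dt   <- diff(time)            # seconds until the next quote: 4 1 0 1 3
held <- head(spread, -1)      # spread in force during each interval
secs <- tapply(dt, held, sum) # total seconds per spread: 6 and 3
pct  <- secs / sum(dt) * 100  # share of time per spread: 66.67 and 33.33
print(pct)
```

The last spread value has no following quote, so head(spread, -1) drops it
and it never contributes to the weights.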

I can combine them using rbind() and by():

z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))

(epoch is just a date-time created with ISOdatetime())

This gives me a warning:

Warning message:
number of columns of result is not a multiple of vector length (arg 3) in:
rbind(1, "12" = c(91.99207541277, 8.00792458723005), "13" = c(90.1884966797708,

The output from the above command is almost exactly what I need, apart from
the recycling:

     1e-04     2e-04      3e-04        4e-04
12 91.99208  8.007925 91.9920754  8.007924587 <== recycled values
13 90.18850  9.337448  0.4218405  0.052214551
14 90.59640  9.171417  0.2321811 90.596401668
15 89.55771 10.194291  0.2343418  0.013661453
...

I can just pass this into a barplot() and get a nice visual breakdown of
hourly weighted spreads, *but* I don't know how to get these results without
the recycling. Looking at rbind(), it seems that this will automatically
recycle. Does anyone know of a function I could use to get these results
without this problem?
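
One possible way around the recycling, offered as a sketch rather than the
resolution reached in this thread, is to make every hour return a vector
over the same set of spread levels, with zero for spreads that never
occurred in that hour. The function name twas_fixed, the lev argument, the
UTC time zone and the three extra rows in the second hour are all
assumptions made for illustration:

```r
# Sample data: the six rows from the thread plus three made-up rows that
# fall in the following hour.
tmp <- data.frame(
  time   = c(1185882786, 1185882790, 1185882791, 1185882791,
             1185882792, 1185882795,
             1185886400, 1185886404, 1185886410),
  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 3e-04,
             1e-04, 3e-04, 1e-04)
)

# Hypothetical variant of twas(): a factor with fixed levels makes
# tapply() return one cell per level, NA where a level is absent.
twas_fixed <- function(dat, lev) {
  dt <- diff(dat$time)
  f  <- factor(head(dat$spread, -1), levels = lev)
  w  <- tapply(dt, f, sum)
  out <- as.vector(w)
  names(out) <- names(w)
  out[is.na(out)] <- 0      # spread not seen in this hour
  out / sum(dt) * 100
}

lev  <- sort(unique(tmp$spread))
hour <- format(as.POSIXct(tmp$time, origin = "1970-01-01", tz = "UTC"), "%H")

# Every element now has length(lev) entries, so rbind() has nothing to recycle.
z <- do.call("rbind", lapply(split(tmp, hour), twas_fixed, lev = lev))
print(z)
```

Each row of z sums to 100, so a matrix like this could be passed to
barplot() (transposed with t() if one bar per hour is wanted).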

Cheers
Rory

On 8/1/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
Something similar was just discussed this morning:

https://www.stat.math.ethz.ch/pipermail/r-help/2007-August/137695.html

On 8/1/07, Rory Winston <rory.winston at gmail.com> wrote:
Hi all

I have a question about aggregating statistics by time intervals. I have a
data set with 3 columns: time, bid, and ask. Time is specified as a
millisecond timestamp since epoch. I would like to compute summary
statistics for the data set on an hourly basis. Here is what I have tried so
far:

# Data is in pricedata

t <- ISOdatetime(1970, 1, 1, 0, 0, 0) + pricedata$time
agg <- aggregate(pricedata$spread, list(byhour=format(t, "%Y-%m %H")),
mean)

This seems to do what I want. However, what I really want to do is more
specific: I would like to be able to extract a subset of the data frame
pricedata, and not just the aggregated entries. For instance, instead of
just extracting pricedata$spread by hour, I would like to extract a slice of
columns, e.g. pricedata$spread and pricedata$time on an hourly basis, and
pass these into a function that can compute a time-weighted average spread,
for instance. Does anyone know an elegant way to do this? I have a feeling
zoo may do what I want, but I'm new to zoo ...
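
A base-R sketch of the kind of slicing asked about here, assuming time is in
seconds since the epoch and using made-up values: split() hands each hour's
rows, with both columns, to an arbitrary function. The summary function
shown is a placeholder, not the time-weighted average itself:

```r
# Made-up sample; two rows in each of two consecutive hours (UTC assumed).
pricedata <- data.frame(
  time   = c(1185882786, 1185882790, 1185886400, 1185886404),
  spread = c(1e-04, 2e-04, 1e-04, 3e-04)
)

t <- as.POSIXct(pricedata$time, origin = "1970-01-01", tz = "UTC")

# One data frame per hour, each keeping the time and spread columns.
chunks <- split(pricedata[, c("time", "spread")], format(t, "%Y-%m-%d %H"))

# Any function of a per-hour data frame can be applied to the chunks.
res <- sapply(chunks, function(d) c(n = nrow(d), mean_spread = mean(d$spread)))
print(res)
```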

Cheers
Rory

_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.
