Interpolating/comparing two irregular time/price sequences?

Thanks everyone for their extremely helpful comments on this issue.
Eric, that is a very interesting point you have raised. Did Peter
publish a paper on this topic? If so, do you happen to know the
title? I feel intuitively that the previous tick method should be more
reliable than interpolation for high-frequency data, although it would
be nice to see some research on this topic confirming this to be the
case.

Thanks
Rory

On Nov 8, 2007 9:33 PM, Eric Zivot <ezivot at u.washington.edu> wrote:

Just a few quick comments on this issue.

The Olsen group's book, An Introduction to High-Frequency Finance, discusses
various interpolation schemes for aligning multiple irregularly spaced series.
For realized variance modeling, Peter Hansen at Stanford showed that one should
use the "previous tick" method for aligning data to a common time clock and
not linear interpolation between neighboring ticks. Linear interpolation
leads to degenerate results as you sample more frequently, since the
quadratic variation of a straight line is zero: the realized variance of the
interpolated path shrinks toward zero as the sampling grid gets finer.
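
To illustrate the point about quadratic variation, here is a minimal sketch
in base R. It is not taken from Hansen's work: the simulated random walk, the
grid sizes and the helper name realized_var are all assumptions made for
illustration. It aligns the same irregular ticks to finer and finer regular
grids and compares the realized variance under linear and previous-tick
alignment.

```r
set.seed(1)

# Simulated irregular ticks: 200 tick times in [0, 1] and a random-walk price.
n_ticks   <- 200
tick_time <- sort(runif(n_ticks))
price     <- cumsum(rnorm(n_ticks, sd = 0.01))

# Realized variance (sum of squared changes) after aligning the ticks to a
# regular grid. method = "linear" interpolates between neighbouring ticks;
# method = "constant" with f = 0 carries the previous tick forward.
realized_var <- function(n_grid, method) {
  grid <- seq(0, 1, length.out = n_grid + 1)
  p <- approx(tick_time, price, xout = grid, method = method, f = 0, rule = 2)$y
  sum(diff(p)^2)
}

n_grid <- c(100, 1000, 10000, 100000)
rv <- data.frame(
  n_grid        = n_grid,
  linear        = sapply(n_grid, realized_var, method = "linear"),
  previous_tick = sapply(n_grid, realized_var, method = "constant")
)
print(rv)
```

Under these assumptions the linear column should keep shrinking as the grid
gets finer, while the previous_tick column should settle near the sum of
squared tick-to-tick changes, sum(diff(price)^2).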

The type of alignment discussed below is handled in the timeSeries class in
S-PLUS using the align() function. Diethelm Wuertz implemented a subset of
this class in R and I think the align() function is there too.

 ________________________________
 From: r-sig-finance-bounces at stat.math.ethz.ch
[mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Adrian
Trapletti
Sent: Thursday, November 08, 2007 3:37 AM
To: rory.winston at gmail.com
Cc: R-Finance
Subject: Re: [R-SIG-Finance] Interpolating/comparing two irregular time/price
sequences?

Rory,

There is no best method for synchronizing high-frequency data; it depends on
the application. One of the pioneers of high-frequency financial data
modelling was Olsen (http://www.olsen.ch). In the 1990s they published
articles in which they used interpolation schemes to model irregularly spaced
high-frequency data with standard discrete time series methods; some of these
are available on their website. Currently there is a lot of work on realized
variance/volatility, and for multivariate applications you may find some
methods by searching for "realized covariance":
http://www.google.ch/search?hl=en&q=%22realized+covariance%22&btnG=Search&meta=

Best regards
Adrian

Message: 1
Date: Wed, 7 Nov 2007 18:31:16 +0000
From: "Rory Winston" <rory.winston at gmail.com>
Subject: [R-SIG-Finance] Interpolating/comparing two irregular
 time/price sequences?
To: r-sig-finance at stat.math.ethz.ch
Message-ID:
 <3f446aa30711071031j37936e36i933be63c90f9ce4c at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi all

I have two data frames, that both look like the following:

head(series1)

     timestamp     mid   spread
1 1.194438e+12 2.10011 0.000260
2 1.194438e+12 2.10010 0.000290
...

These two time sequences are sampled on price ticks, so the interval
between ticks is stochastic and irregular. The sequences also differ in
length, e.g. one may have 8 hours' worth of data and the other 4. I want
to compare the two series for similarity: they should be producing almost
exactly the same data, although potentially at slightly different
timestamps (hence the sampling irregularity). I can subset the data so
that they span roughly the same time interval, but the number of ticks in
each series will still differ.

Basically, what I am trying to achieve is some sort of constant
interpolation based on a time index. If series A starts at 08:01,
contains 10,000 ticks, and ends at 16:05, and series B starts at 08:00,
contains 7,000 ticks, and ends at 16:06, I would like to be able to index
from series A into series B at, say, each timestamp in A. As a simple
example, take the following series A and B:

A:
time tick
16:01 2.05
16:02 2.06

B:
time tick
16:00 2.04
16:02 2.06

I would like to be able to index from A into B at each tick from A, so
I would get an output series that was the value of B at each time A
ticked:

C:
time tick
16:01 2.04 <--- constant interpolation from value of B @ 16:00
16:02 2.06
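
The lookup described above can be sketched in base R with findInterval().
This is only one possible approach, not something proposed in the thread;
the calendar date and the data frame names A, B and C are assumptions made
to mirror the toy example.

```r
# Toy data from the example above; the calendar date is arbitrary.
t0 <- as.POSIXct("2007-11-07 16:00:00", tz = "UTC")
A <- data.frame(time = t0 + c(60, 120), tick = c(2.05, 2.06))
B <- data.frame(time = t0 + c(0, 120),  tick = c(2.04, 2.06))

# For each timestamp in A, find the position of the last tick in B at or
# before it. findInterval() needs B$time sorted in increasing order and
# returns 0 when A ticks before B's first tick.
idx <- findInterval(as.numeric(A$time), as.numeric(B$time))
idx[idx == 0] <- NA   # no previous B tick available yet

# Value of B at each time A ticked (previous-tick / constant interpolation).
C <- data.frame(time = A$time, tick = B$tick[idx])
print(C)
```

Because findInterval() uses a binary search, the same lookup should scale to
series with many thousands of ticks, provided B is sorted by time.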

Has anyone done anything like this before? I'm looking at the zoo
package to see if it can help me, but I haven't quite figured out how
to do this kind of thing yet. Is this even a good way to check
whether series B is very similar to series A at the discrete tick
intervals? Are there better methods? (I guess another way might be to align
the two subsetted series exactly and just take differences.)

Thanks
Rory

--
Adrian Trapletti
Wildsbergstrasse 31
8610 Uster
Switzerland

Phone : +41 (0) 44 9945630
Mobile : +41 (0) 76 3705631

Email : a.trapletti at swissonline.ch

You can also get the previous tick by using

xy <- na.locf(xy)

in place of the na.approx line in Achim's code earlier in this thread.
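
Achim's code is not reproduced in this part of the thread, so the following
is only a guess at the general pattern, assuming the zoo package is
installed. The toy series, the calendar date and the variable names
(a, b, a_ticked, at_a) are made up for illustration.

```r
library(zoo)

# Toy series from the original question; the calendar date is arbitrary.
t0 <- as.POSIXct("2007-11-07 16:00:00", tz = "UTC")
a <- zoo(c(2.05, 2.06), t0 + c(60, 120))
b <- zoo(c(2.04, 2.06), t0 + c(0, 120))

# Merge on the union of timestamps; NA marks times where a series did not tick.
xy <- merge(a, b)
a_ticked <- !is.na(coredata(xy)[, "a"])   # remember where A really ticked

# Previous-tick fill: carry the last observation forward.
# na.rm = FALSE keeps leading NAs instead of dropping those rows.
xy <- na.locf(xy, na.rm = FALSE)

# Compare the two series at the times A ticked.
at_a <- xy[a_ticked, ]
d <- coredata(at_a)[, "a"] - coredata(at_a)[, "b"]
print(at_a)
print(d)
```

Replacing na.locf() with na.approx() in this sketch would give linear
interpolation instead of the previous tick.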

Which methods are appropriate really depends on the application.
Possibilities include interpolation schemes (linear, previous tick,
other), modelling in a framework that allows missing values (e.g. state
space models and the Kalman filter), and modelling prices/changes jointly
with the time increments (e.g. Rob Engle published some work in this area).

For example, when important macro announcements are released, liquid
instruments are traded immediately and their prices adjust very quickly
(in less than 1/10th of a second) to the new information. A less liquid
instrument may not trade for a longer time (several seconds up to several
minutes). However, that does not mean that the price of the less liquid
instrument did not update (you can no longer trade at the last observed
price). It just means that there is no observation. Previous tick
interpolation would lead to wrong conclusions (spurious lead/lag) in
this example.
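
One way to make this problem visible is to track how old each
carried-forward value is. The base R sketch below is only an illustration:
the tick times and prices are invented, and max_age is a hypothetical
threshold that would have to be chosen per instrument.

```r
# Invented example: seconds relative to an announcement at time 0.
a_time  <- c(0.0, 0.1, 0.2, 5.0, 60.0)   # liquid instrument ticks
b_time  <- c(-1.0, 45.0)                 # less liquid instrument ticks
b_price <- c(100.0, 101.5)

# Previous-tick lookup of B at each A timestamp.
idx <- findInterval(a_time, b_time)
idx[idx == 0] <- NA                      # A ticked before B's first tick

# Age of the carried-forward B observation at each A timestamp.
age <- a_time - b_time[idx]

# Treat B as unobserved when its last tick is older than max_age seconds
# (hypothetical threshold), rather than pretending the price is unchanged.
max_age <- 2
b_at_a  <- ifelse(age <= max_age, b_price[idx], NA)

print(data.frame(a_time, age, b_at_a))
```

Masking stale values does not recover the unobserved price; it only avoids
treating an old observation as current when looking for lead/lag effects.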

There is a lot of research on high-frequency finance in the hedge fund
and investment bank industry (e.g. algorithmic trading, automatic market
making). However, due to the nature of the business, most of it is
proprietary.

Best regards
Adrian
