high frequency data analysis in R

Fri, May 22, 2009 6:59 AM

Having looked a lot at high-frequency data, let me make a few 
corrections here.

1) True, you have no transaction prices outside of when trades happen.  
You could impute prices, but any imputation would also need to account 
for the downward bias when an instrument has not traded for a while.  
Also, you cannot fully trust trade timestamps; trades are published 
with some delay.  (The usual reasoning is that market makers need time 
to hedge after getting hit/lifted.)

2) You do always have the NBBO and maybe even the book.  Better still, 
you can probably trust quote timestamps.

3) Using these two data streams together requires caution.  Trades are 
published with a delay, and while that delay might be small in clock 
time, it is not small relative to the number of quote changes that 
occur within it.  So you have NBBOs and you have trade prices, but 
matching them up is not straightforward.  Worse, endogeneity can creep 
in:  a trade may induce a change in quotes, so comparing the trade 
price to quotes from after the trade occurred will bias any comparison.
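
To make the matching problem in point 3 concrete, here is a minimal 
sketch (in Python, standard library only) of pairing a trade with the 
last quote in force before it, after shifting the reported trade time 
back by an assumed publishing delay.  The function name, the sample 
data, and the 1-second delay are all hypothetical; a fixed delay is a 
simplification for illustration, not a recommendation.

```python
from bisect import bisect_right

def quote_before_trade(quote_times, quotes, trade_time, delay=0.0):
    """Return the last quote at or before (trade_time - delay), or None.

    quote_times must be sorted ascending.  `delay` is an ASSUMED
    publishing lag on the trade report, in the same units as the times.
    """
    i = bisect_right(quote_times, trade_time - delay)
    return quotes[i - 1] if i > 0 else None

# Hypothetical data: times in seconds, quotes as (bid, ask).
quote_times = [0.0, 1.0, 2.5, 3.0]
quotes = [(10.00, 10.02), (10.01, 10.03), (10.02, 10.04), (10.03, 10.05)]

# Naive match: uses the reported trade time as if it were exact.
naive = quote_before_trade(quote_times, quotes, trade_time=3.1)

# Lagged match: assumes the trade really happened about 1 second earlier.
lagged = quote_before_trade(quote_times, quotes, trade_time=3.1, delay=1.0)
```

With a delay of zero this picks up quotes that may already have reacted 
to the trade, which is the endogeneity concern above; a positive delay 
steps back to quotes that were plausibly in force when the trade 
occurred.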

If you want to read up on a way to handle that delay, look at sections 3 
and 4 of my trade direction paper at 
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1032701.  The basic 
idea: you might want to average the quotes before the trade, weighting 
them by something like a gamma distribution for the delay, so that 
\hat{q}_t = \int_0^T f(s) q_{t-s} ds, where f is the gamma density of 
the delay s and T is the longest delay considered.
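
As a rough illustration of that averaging (not the code from the paper), 
the sketch below approximates the integral with a midpoint rule in 
Python, treating the quote series as a step function and weighting each 
lag s by a gamma density.  The shape and scale values, the horizon T, 
the number of steps, and the renormalization of the truncated weights 
are all assumptions made for the example.

```python
import math
from bisect import bisect_right

def gamma_pdf(s, shape, scale):
    """Density of a gamma(shape, scale) delay at s; zero for s <= 0."""
    if s <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(s) - s / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def delay_weighted_quote(quote_times, mids, t, shape=2.0, scale=0.5,
                         horizon=5.0, steps=500):
    """Approximate the gamma-weighted average of q_{t-s} for s in (0, horizon).

    quote_times must be sorted ascending; mids[i] is the quote in force
    from quote_times[i] until the next update.  Parameter defaults are
    ASSUMED for illustration only.
    """
    ds = horizon / steps
    num = den = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds                      # midpoint of the k-th lag bin
        i = bisect_right(quote_times, t - s)    # quote in force at time t - s
        if i == 0:
            continue                            # no quote existed yet
        w = gamma_pdf(s, shape, scale)
        num += w * mids[i - 1]
        den += w
    # Renormalize so the truncated weights sum to one (an assumption).
    return num / den if den > 0 else None

# Hypothetical series: the quote moves from 10.0 to 11.0 at time 9.5.
estimate = delay_weighted_quote([0.0, 9.5], [10.0, 11.0], t=10.0)
```

Because most of the assumed delay mass sits further back than half a 
second, the estimate for a trade reported at t = 10 stays much closer 
to the older quote than to the one posted just before the report.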

You can also find code to do this at http://tigger.uic.edu/~daler/code.html

Just a few thoughts since we seem to be drifting toward mixing quotes 
and trades indiscriminately.

Dale

Dale W.R. Rosenthal
Assistant Professor, Department of Finance
University of Illinois at Chicago
http://tigger.uic.edu/~daler
SSRN: http://ssrn.com/author=906862