high frequency data analysis in R
Having looked a lot at high-frequency data, let me make a few corrections here.

1) True, you have no transaction prices outside of when trades happen. You could impute prices, but that would also need to account for the downward bias when an instrument has not traded for a while. Also, you cannot trust trade timestamps 100%; there is some publishing delay. (The usual reasoning is that market makers need time to hedge after getting hit/lifted.)

2) You do always have the NBBO and maybe even the book. Better still, you can probably trust quote timestamps.

3) Using these two data streams together requires caution. Trades are published with delay, and while the delay might be small, it is not small relative to the number of quote changes. Therefore, you have NBBOs and you have trade prices, but matching them up is not straightforward. Worse, endogeneity can creep in: a trade may induce a change in quotes, so comparing the trade price to quotes from after the trade occurred will bias any comparison.

If you want to read up on a way to handle that delay, look at sections 3 and 4 of my trade direction paper at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1032701. The basic idea: you might want to average quotes before the trade using something like a gamma distribution for the delay -- so you have

\hat{q}_t = \int_0^T GammaCDF(s) q_{t-s} ds.

You can also find code to do this at http://tigger.uic.edu/~daler/code.html

Just a few thoughts since we seem to be drifting toward mixing quotes and trades indiscriminately.

Dale
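For readers who want to experiment with the delay-averaging idea, here is a minimal sketch in Python (standard library only). It is not the code from the paper or from the page linked above. It assumes the quote is a step function that holds between updates, that the reporting delay follows a gamma distribution with made-up shape and scale values, and that each lag is weighted by the gamma density with the weights normalized to sum to one. That is only one reading of the formula; the paper's exact estimator may differ. The names gamma_pdf, quote_at and delay_weighted_quote are hypothetical.

```python
import bisect
import math


def gamma_pdf(s, shape, scale):
    """Density of a Gamma(shape, scale) delay evaluated at lag s (seconds)."""
    if s <= 0.0:
        return 0.0
    return (s ** (shape - 1.0)) * math.exp(-s / scale) / (
        math.gamma(shape) * scale ** shape
    )


def quote_at(times, values, t):
    """Quote in force at time t: the last update at or before t, else None."""
    i = bisect.bisect_right(times, t) - 1
    return values[i] if i >= 0 else None


def delay_weighted_quote(times, values, trade_time,
                         shape=2.0, scale=0.5, horizon=5.0, step=0.01):
    """Average the quote over lags in (0, horizon] before the reported trade
    time, weighting each lag by the assumed gamma delay density.

    shape, scale, horizon and step are illustrative assumptions, not
    calibrated values.
    """
    num = 0.0
    den = 0.0
    n_steps = int(round(horizon / step))
    for k in range(n_steps):
        s = (k + 0.5) * step              # midpoint of the k-th lag bucket
        q = quote_at(times, values, trade_time - s)
        if q is None:                     # no quote that far back
            continue
        w = gamma_pdf(s, shape, scale)
        num += w * q
        den += w
    return num / den if den > 0.0 else None


# Hypothetical midquote updates (seconds, price) and a trade reported at t = 10.
quote_times = [0.0, 9.0, 9.8]
midquotes = [100.00, 100.10, 100.20]
q_hat = delay_weighted_quote(quote_times, midquotes, trade_time=10.0)
print(q_hat)
```

Because only quotes from before the reported trade time enter the average, quote changes induced by the trade itself are kept out of the comparison.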
Message: 14
Date: Thu, 21 May 2009 13:48:45 -0400
From: Eugene Tyurin <etyurin at skipstonellc.com>
Subject: Re: [R-SIG-Finance] high frequency data analysis in R
To: Michael <comtech.usa at gmail.com>, r-sig-finance at stat.math.ethz.ch

High-frequency is not my specialty either, but a quote caught my attention:

On Thu, May 21, 2009 at 11:38 AM, Michael <comtech.usa at gmail.com> wrote:
> My data are price change arrivals, irregularly spaced. But when there is no price change, the price stays constant. Therefore, in fact, at any time instant, you give me a time, I can give you the price at that very instant of time. So irregularly spaced data can be easily sampled to be regularly spaced data.
From a trader's perspective, you do not have "the price" at any time outside of the instant a trade took place - you have NBBO (and market depth). Last trade's price may or may not be transactable again on either long or short side. You can alternatively say that you have an instantaneous "mid-market price" and a bid/ask spread to work with.

Correct me if I'm wrong - I'd like to know how people in HF really look at their data.

-- ET.
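To make the two views in this exchange concrete, here is a rough sketch in Python (standard library only). It samples irregularly spaced quote updates onto a regular grid by carrying the last observation forward, then reports a mid-market price and spread at each grid point instead of a single "price". The NBBO values, the one-second grid and the function names are all made up for illustration.

```python
import bisect


def previous_tick(times, values, grid):
    """Value in force at each grid time (last observation carried forward).

    Grid points earlier than the first observation get None.
    """
    out = []
    for t in grid:
        i = bisect.bisect_right(times, t) - 1
        out.append(values[i] if i >= 0 else None)
    return out


def mid_and_spread(bid, ask):
    """Mid-market price and bid/ask spread, or (None, None) if no quote yet."""
    if bid is None or ask is None:
        return None, None
    return (bid + ask) / 2.0, ask - bid


# Hypothetical NBBO updates at irregular times (seconds).
quote_times = [0.4, 2.7, 3.1]
bids = [99.98, 99.99, 100.01]
asks = [100.02, 100.03, 100.03]

# Regular one-second sampling grid.
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
grid_bids = previous_tick(quote_times, bids, grid)
grid_asks = previous_tick(quote_times, asks, grid)
sampled = [mid_and_spread(b, a) for b, a in zip(grid_bids, grid_asks)]

for t, (mid, spread) in zip(grid, sampled):
    print(t, mid, spread)
```

Carrying both sides of the quote forward keeps the spread visible, so the regular series does not pretend that a last trade price is still transactable.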
Dale W.R. Rosenthal
Assistant Professor, Department of Finance
University of Illinois at Chicago
http://tigger.uic.edu/~daler
SSRN: http://ssrn.com/author=906862