high frequency data analysis in R
If there is a way to call R functions from within C++, that should solve the large-data-set problem, right? Alternatively, couldn't you just split the data into smaller chunks first, for example using SAS?
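A rough sketch of the chunking idea, written in Python only because it needs no extra packages. The file layout (columns `time,type,price,size`) and the `TRADE`/`QUOTE` labels are assumptions made for illustration, not the format of any real feed. The file is streamed a chunk at a time and only the trade records are kept, so the full day never has to sit in memory.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from an iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def filter_trades(lines, chunk_size=100_000):
    """Stream CSV lines chunk by chunk and keep only trade records.

    Assumes a hypothetical layout with columns time,type,price,size
    where type is either "TRADE" or "QUOTE".
    """
    reader = csv.DictReader(lines)
    trades = []
    for chunk in iter_chunks(reader, chunk_size):
        # Only the filtered rows are retained; the chunk itself is discarded.
        trades.extend(row for row in chunk if row["type"] == "TRADE")
    return trades


# Tiny in-memory stand-in for one day's file.
sample = io.StringIO(
    "time,type,price,size\n"
    "09:30:00.100,QUOTE,101.50,10\n"
    "09:30:00.250,TRADE,101.51,2\n"
    "09:30:00.900,QUOTE,101.52,5\n"
    "09:30:01.300,TRADE,101.52,1\n"
)
trades = filter_trades(sample, chunk_size=2)
print(len(trades), [t["price"] for t in trades])
```

The reduced output could then be written to a text file and read into R as usual.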
On Thu, May 21, 2009 at 9:13 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
I think in general you would need some sort of pre-processing before using R.

You can use periodic sampling of prices, but you may be throwing away a lot of information. This method used to be recommended more than 5 years ago in order to mitigate the effect of market noise, at least in the context of volatility estimation.

Here is my experience with tick data:

I used FX data to calculate estimated daily volatility using TSRV (Zhang et al 2005 http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the time series of estimated daily volatilities, I forecasted volatilities for 1 day up to 1 year ahead. The tick data was in a Quantitative Analytics database. I used their C++ API to query daily data, computed the TSRV estimator in C++ and saved the result in a text file. Then I used R to read the estimated volatilities and used FARIMA to forecast volatility. An interesting thing about this type of series is that the fractional coefficient is approximately 0.4 in many instances. Bollerslev has a paper commenting on this fact.

In another project, I had treasury futures market depth data. The data came in plain text format, with one file per day. Each day had more than 1 million entries. I don't think I could handle this with R. To get started I decided to use only actual trades. I used Python to filter out the trades, which brought it down to ~60K entries per day. This I could handle with R. I used to.period from the xts package to aggregate the data.

In order to handle market depth data, we need some efficient way to access (query) this huge database. I looked a little bit into kdb, but you have to pay ~25K to buy the software for one processor. I haven't been able to look more into this for now.

Haky

On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
Not my domain, but you will more than likely have to aggregate to some sort of regular/homogeneous type of series for most traditional tools to work.

xts has to.period to aggregate up to a lower frequency from tick-level data. Coupled with something like na.locf you can make yourself some high frequency 'regular' data from 'irregular' data.

Regular and irregular of course depend on what you are looking at (weekends missing in daily data can still be 'regular').

I'd be interested in hearing thoughts from those who actually tread in the high-freq domain...

A wealth of information can be found here:

http://www.olsen.ch/publications/working-papers/

Jeff

On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
Hi all,

I am wondering if there are some special toolboxes to handle high frequency data in R?

I have some high frequency data and was wondering what meaningful experiments I can run on it.

I am not sure whether the normal (low frequency) financial time series textbook analysis tools will work for high frequency data.

Let's say I run a correlation between two stocks using the high frequency data, or run an ARMA model on one stock: will the results be meaningful?

Could anybody point me to some classroom-type treatment or lab-tutorial-type document which shows what meaningful experiments/tests I can run on high frequency data?

Thanks a lot!
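For readers following the thread, here is a minimal sketch of what turning 'irregular' ticks into a 'regular' series means. It is plain Python, not xts code, and it only illustrates the idea behind aggregating to a fixed period and then carrying the last observation forward; it keeps closing prices only. The function name `to_regular_bars` and the sample ticks are made up for illustration.

```python
def to_regular_bars(ticks, start, end, step):
    """Aggregate irregular (time, price) ticks into regular closing prices.

    ticks: list of (seconds, price) pairs sorted by time.
    Returns one (bar_end, close) pair for each interval ending at
    start + step, start + 2 * step, ..., up to end. An interval with
    no ticks carries the previous close forward; the close is None
    if nothing has been observed yet.
    """
    bars = []
    last = None
    i = 0
    t = start + step
    while t <= end:
        # Consume every tick that falls at or before this bar's end time.
        while i < len(ticks) and ticks[i][0] <= t:
            last = ticks[i][1]
            i += 1
        bars.append((t, last))
        t += step
    return bars


# Four irregularly spaced ticks over 20 seconds, sampled every 5 seconds.
ticks = [(1, 100.0), (4, 100.5), (5, 100.4), (17, 101.0)]
bars = to_regular_bars(ticks, start=0, end=20, step=5)
print(bars)
# [(5, 100.4), (10, 100.4), (15, 100.4), (20, 101.0)]
```

The bars ending at 10 and 15 seconds contain no ticks, so they repeat the last observed price. This is the convenience of regular sampling and also its cost: three of the four ticks in the first 5 seconds collapse into a single value.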
_______________________________________________ R-SIG-Finance at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first.
-- Jeffrey Ryan jeffrey.ryan at insightalgo.com ia: insight algorithmics www.insightalgo.com