
high frequency data analysis in R

Michael

Thu, May 21, 2009 9:16 AM

If there is a way to call R functions from within C++, that should
solve the large-data-set problem, right?
On the other hand, maybe you only need to split the data into smaller
chunks, for example, using SAS?
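
A minimal sketch of the chunking idea, written in Python rather than SAS. The "timestamp,price" record layout and the function names are assumptions made for illustration; nothing in the thread specifies them. It shows that a summary statistic can be accumulated one chunk at a time, so the full data set never has to sit in memory.

```python
import io
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size items from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean_price(lines, chunk_size=100_000):
    """Mean of the price column, accumulated one chunk at a time."""
    total, count = 0.0, 0
    for chunk in iter_chunks(lines, chunk_size):
        for line in chunk:
            _timestamp, price = line.strip().split(",")
            total += float(price)
            count += 1
    return total / count


# Hypothetical records; an open file handle can be passed in the same way.
sample = io.StringIO("09:30:00.1,100.0\n09:30:00.4,100.5\n09:30:01.2,101.0\n")
mean_price = chunked_mean_price(sample, chunk_size=2)
```

Only statistics that can be updated incrementally (sums, counts, minima, maxima) fit this pattern directly; estimators that need the whole series at once do not.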

On Thu, May 21, 2009 at 9:13 AM, Hae Kyung Im <hakyim at gmail.com> wrote:

I think in general you would need some sort of pre-processing before using R.

You can use periodic sampling of prices, but you may be throwing away
a lot of information. This method used to be recommended more than
5 years ago in order to mitigate the effect of market noise, at least
in the context of volatility estimation.

Here is my experience with tick data:

I used FX data to calculate estimated daily volatility using TSRV
(Zhang et al 2005
http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the
time series of estimated daily volatilities, I forecasted volatilities
for 1 day up to 1 year ahead. The tick data was in a Quantitative
Analytics database. I used their C++ API to query daily data, computed
the TSRV estimator in C++ and saved the result in a text file. Then I
used R to read the estimated volatilities and used FARIMA to forecast
volatility. An interesting thing about this type of series is that the
fractional differencing coefficient is approximately 0.4 in many
instances. Bollerslev has a paper commenting on this fact.
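
For readers unfamiliar with TSRV, here is a minimal Python sketch of the two-scales estimator from Zhang et al (2005). It is not the C++ code described above. It omits the small-sample adjustment, and the subsampling step K = 300 and the simulated noise level are arbitrary assumptions chosen only for illustration. The estimator averages realized variance over K sparse subgrids and subtracts a scaled all-tick realized variance to remove the bias from microstructure noise.

```python
import math
import random


def realized_variance(log_prices):
    """Sum of squared consecutive log returns (the 'fast' scale)."""
    return sum((b - a) ** 2 for a, b in zip(log_prices, log_prices[1:]))


def tsrv(log_prices, K):
    """Two-scales realized variance, without small-sample adjustment.

    log_prices: tick-level log prices for one day (n + 1 observations).
    K: number of subgrids, i.e. the step of the 'slow' scale.
    """
    n = len(log_prices) - 1
    if not 1 < K <= n:
        raise ValueError("K must satisfy 1 < K <= number of returns")
    rv_all = realized_variance(log_prices)
    # Average over the K subgrids of the realized variance of K-step returns.
    rv_avg = sum(
        (log_prices[i + K] - log_prices[i]) ** 2 for i in range(n - K + 1)
    ) / K
    n_bar = (n - K + 1) / K  # average number of returns per subgrid
    return rv_avg - (n_bar / n) * rv_all


# Simulated day: a random walk observed with additive noise (all assumed values).
random.seed(0)
n = 23400                        # one tick per second over 6.5 hours
true_var = 1e-4                  # variance of the efficient log price over the day
step_sd = math.sqrt(true_var / n)
noise_sd = 5e-4                  # standard deviation of the microstructure noise
efficient = [0.0]
for _ in range(n):
    efficient.append(efficient[-1] + random.gauss(0.0, step_sd))
observed = [p + random.gauss(0.0, noise_sd) for p in efficient]

naive = realized_variance(observed)   # dominated by the noise term
two_scale = tsrv(observed, K=300)     # should land much closer to true_var
```

One such estimate per day, written to a text file, gives the daily series that is then read into R for the FARIMA step.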

In another project, I had treasury futures market depth data. The data
came in plain text format, with one file per day. Each day had more
than 1 million entries. I don't think I could handle this with R. To
get started I decided to use only actual trades. I used Python to
extract the trades, which brought it down to ~60K entries per day. This
I could handle with R. I used to.period from the xts package to
aggregate the data.
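
A rough sketch of that kind of trade filter. The file layout is not given in the thread, so the column names and the "T"/"Q" record codes below are purely hypothetical. The point is only that the rows are streamed one at a time, which keeps memory use flat even for files with millions of entries.

```python
import csv
import io


def filter_trades(rows, type_field="type", trade_code="T"):
    """Yield only the rows flagged as trades, one at a time."""
    for row in rows:
        if row[type_field] == trade_code:
            yield row


# Hypothetical depth file: "Q" marks a book update, "T" marks a trade.
sample = io.StringIO(
    "timestamp,type,price,size\n"
    "2009-05-21 09:30:00.120,Q,110.50,25\n"
    "2009-05-21 09:30:00.340,T,110.52,3\n"
    "2009-05-21 09:30:00.910,Q,110.53,40\n"
    "2009-05-21 09:30:01.005,T,110.53,1\n"
)
trades = list(filter_trades(csv.DictReader(sample)))
```

In practice the filtered rows would be written to a smaller file, for example with csv.DictWriter, and that file read into R.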

In order to handle market depth data, we need some efficient way to
access (query) this huge database. I looked a little bit into kdb but
you have to pay ~25K to buy the software for one processor. I haven't
been able to look more into this for now.

Haky




On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:

Not my domain, but you will more than likely have to aggregate to some
sort of regular/homogeneous type of series for most traditional tools
to work.

xts has to.period to aggregate up to a lower frequency from tick-level
data. Coupled with something like na.locf, you can make yourself some
high frequency 'regular' data from 'irregular'.
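
To make the idea concrete outside of R, here is a plain-Python sketch of the same two steps: bucket irregular ticks into fixed-width open/high/low/close bars, then carry the last close forward into empty intervals. It illustrates the concept only and is not how xts or na.locf are implemented. The function names and the (seconds, price) tick format are assumptions.

```python
def to_bars(ticks, width):
    """Aggregate time-sorted (seconds, price) ticks into OHLC bars.

    Returns a dict mapping bar index (seconds // width) to
    [open, high, low, close]. Intervals without ticks are absent.
    """
    bars = {}
    for t, price in ticks:
        key = int(t // width)
        if key not in bars:
            bars[key] = [price, price, price, price]
        else:
            bar = bars[key]
            bar[1] = max(bar[1], price)   # high
            bar[2] = min(bar[2], price)   # low
            bar[3] = price                # close is the latest tick seen
    return bars


def fill_forward(bars):
    """Return a gap-free list of (index, open, high, low, close).

    Empty intervals become flat bars at the previous close, which is the
    'last observation carried forward' rule.
    """
    if not bars:
        return []
    filled = []
    last_close = None
    for key in range(min(bars), max(bars) + 1):
        if key in bars:
            o, h, l, c = bars[key]
        else:
            o = h = l = c = last_close
        filled.append((key, o, h, l, c))
        last_close = c
    return filled


# Irregular ticks: seconds since the open, price. No tick falls in minute 2.
ticks = [(1.0, 100.0), (20.5, 100.2), (59.9, 99.9), (61.0, 100.1), (185.0, 100.4)]
regular = fill_forward(to_bars(ticks, width=60))
```

Carrying the last price forward produces zero returns in quiet intervals, which is worth remembering when the regular series is later fed into correlation or ARMA estimates.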

Regular and irregular of course depend on what you are looking at
(weekends missing in daily data can still be 'regular').

I'd be interested in hearing thoughts from those who actually tread in
the high-freq domain...

A wealth of information can be found here:

http://www.olsen.ch/publications/working-papers/

Jeff

On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:

Hi all,

I am wondering if there are some special toolboxes to handle high
frequency data in R?

I have some high frequency data and was wondering what meaningful
experiments I can run on it.

Not sure if normal (low frequency) financial time series textbook data
analysis tools will work for high frequency data?

Let's say I run a correlation between two stocks using the high
frequency data, or run an ARMA model on one stock, will the results be
meaningful?

Could anybody point me to a classroom-style treatment or a lab
tutorial type of document that shows what meaningful
experiments/tests I can run on high frequency data?

Thanks a lot!

_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.

