Keeping persistent data collections

3 messages · Dino Veritas, Brian G. Peterson, G See

#
On Sun, 2011-11-06 at 22:43 -0500, Dino Veritas wrote:
I store only tick data, as I can easily get to any other frequency from
tick.  I've considered also storing daily data, but in the end I decided
it was too much trouble to (additionally) manage, and just store tick.
You are correct that the 'indexing' package is very powerful.  It is
also not done yet.  

As I said, I store tick data.  The way I do this is with one file per
symbol per day of data, pre-parsed into xts objects and saved to disk
(using 'save') in one directory per symbol.
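A minimal sketch of that layout, for illustration only.  The helper name
save_by_day and the yyyy.mm.dd.SYMBOL.rda file naming are assumptions made
for this example, not necessarily what is used in production:

```r
library(xts)

# Hypothetical helper: write one .rda file per calendar day under
# <base_dir>/<symbol>/, each holding an xts object named after the symbol.
save_by_day <- function(x, symbol, base_dir) {
  sym_dir <- file.path(base_dir, symbol)
  dir.create(sym_dir, recursive = TRUE, showWarnings = FALSE)
  days <- split(x, f = "days")            # list of xts objects, one per day
  paths <- vapply(days, function(day) {
    day_label <- format(index(day)[1], "%Y.%m.%d")
    path <- file.path(sym_dir, paste(day_label, symbol, "rda", sep = "."))
    e <- new.env()
    assign(symbol, day, envir = e)        # store under the symbol's name
    save(list = symbol, envir = e, file = path)
    path
  }, character(1))
  unname(paths)
}

# Four made-up ticks that straddle midnight UTC, so two files are written.
times <- as.POSIXct("2011-10-31 23:58:00", tz = "UTC") + 60 * (0:3)
AAA   <- xts(c(0.1, 0.2, 0.3, 0.4), order.by = times)
files <- save_by_day(AAA, "AAA", tempfile("tickstore"))
basename(files)
```

Keeping each day in its own file means a daily batch job only ever writes
new files, and a date-range load only has to touch the files it needs.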

I then use FinancialInstrument to keep track of all the instrument
metadata, and getSymbols to load the data into R when I need it (and
over the time-frames that I require).  We currently download tick data
for about 2500 tradeable instruments per day, and maintain archives
going back several years.  We have the .instrument environment stored on
the same file server as the data, and every .Rprofile in the firm points
to this so that everyone has access to getInstrument and getSymbols.

I know someone who works in the hedge fund industry, mostly with monthly
data, with some daily data sprinkled in.  He uses the same approach I
have outlined of storing the metadata in FinancialInstrument, and
getSymbols to access the data.  He typically stores one consolidated CSV
file per instrument, because CSV files are easy to add on to with a
batch process.  
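The append step of such a batch process might look roughly like this in
base R.  The helper name append_rows, the FUND1 symbol, and the Date/Close
columns are made up for the example:

```r
# Hypothetical helper: append new rows to one consolidated CSV per
# instrument, writing the header only when the file is first created.
append_rows <- function(new_rows, symbol, base_dir) {
  path   <- file.path(base_dir, paste0(symbol, ".csv"))
  is_new <- !file.exists(path)
  write.table(new_rows, file = path, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = is_new, append = !is_new)
  invisible(path)
}

store <- tempfile("csvstore")
dir.create(store)

# Two successive batch runs, each adding one month-end observation.
append_rows(data.frame(Date = as.Date("2011-09-30"), Close = 101.2), "FUND1", store)
append_rows(data.frame(Date = as.Date("2011-10-31"), Close = 102.7), "FUND1", store)

monthly <- read.csv(file.path(store, "FUND1.csv"))
monthly
```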

For lower frequency data (let's say daily or lower) a database is
certainly an option, and there are getSymbols wrappers that could be
adapted to whatever schema you decide to use.  There are also tick
data database providers such as OneTick and kdb; if you have both the
problem and the resources that call for this type of solution, you
probably already know that you are in this camp, and that these
providers have R interfaces of varying quality.

The FinancialInstrument package has a 'parsers' directory included in
the 'inst' directory of the package with many examples of download and
parse routines for regular loading of data from a variety of free or
subscription providers.  This should give you a lot of material to begin
working with your own data providers.
We wrote and use FinancialInstrument for this purpose.

As I said earlier, I see no value in storing different periodicities,
and store only tick.
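To illustrate why tick alone is enough, here is a small sketch that builds
5-minute OHLC bars from a tick series with xts::to.period.  The tick data
is synthetic and the name "AAA" is only a label for the output columns:

```r
library(xts)

# Synthetic ticks: one every 10 seconds for 10 minutes, starting 09:00 UTC.
times <- as.POSIXct("2011-10-31 09:00:00", tz = "UTC") + 10 * (0:59)
price <- 50 + cumsum(rep(c(0.02, -0.01, 0.01), 20))
ticks <- xts(price, order.by = times)

# Aggregate to 5-minute Open/High/Low/Close bars.
bars <- to.period(ticks, period = "minutes", k = 5, name = "AAA")
bars
```

The same call with a different period and k gives hourly or daily bars,
so nothing is lost by not storing them separately.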

One of the reasons that I chose to write a getSymbols wrapper for
retrieving our tick data stores is that resources like this list have
extensive experience with getSymbols, and it is therefore easy
for people at our firm to become familiar with using the data.

Also, I am reasonably confident that as the indexing package matures,
there will be a getSymbols method for it as well, and if appropriate I
can easily convert all my data in one batch pass and it will be
transparent to my users.

I made what I now realize was a mistake at a previous firm by
writing a data retrieval function that was not compatible with
getSymbols.  It was harder to teach people to use, and less
compatible with the huge amount of publicly available code.

quantmod and FinancialInstrument contain examples of various getSymbols
methods that may meet your needs, or that could serve as templates for
your custom in-house data source.
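As a rough template: quantmod's getSymbols dispatches on its src argument
to a function named getSymbols.<src>, so a custom source is mostly a matter
of writing one such function.  The sketch below is an assumption-laden
example, not code from either package: the source name "mystore" and the
one-file-per-symbol layout are invented, and a data.frame stands in for an
xts object so that the example runs in base R.

```r
# Hypothetical in-house source: one <symbol>.rda per symbol under `dir`,
# each holding an object named after the symbol.
getSymbols.mystore <- function(Symbols, env = parent.frame(), dir = ".",
                               auto.assign = TRUE, ...) {
  loaded <- lapply(Symbols, function(sym) {
    tmp <- new.env()
    load(file.path(dir, paste0(sym, ".rda")), envir = tmp)
    get(sym, envir = tmp)
  })
  names(loaded) <- Symbols
  if (!auto.assign) {
    # Return the data itself rather than assigning it anywhere.
    return(if (length(loaded) == 1) loaded[[1]] else loaded)
  }
  for (sym in Symbols) assign(sym, loaded[[sym]], envir = env)
  Symbols                                 # names of the objects created
}

# Build a tiny store and load from it by calling the method directly.
store <- tempfile("mystore")
dir.create(store)
AAA <- data.frame(Close = c(50.1, 50.4))  # stand-in for an xts object
save(AAA, file = file.path(store, "AAA.rda"))
rm(AAA)

data_env <- new.env()
getSymbols.mystore("AAA", env = data_env, dir = store)
ls(data_env)
```

With quantmod attached, getSymbols("AAA", src = "mystore", dir = store)
should reach the same function, which is what keeps the in-house store
usable from code written against getSymbols.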
Regards,

    - Brian
#
I do what Brian described, and I use a couple of functions from
FinancialInstrument to do it.

library(FinancialInstrument)
?saveSymbols.days
?saveSymbols.common
?getSymbols.FI

(I just noticed that those 2 saveSymbols.* functions do not allow for a
file extension other than the old .rda.  I will probably update that today.)

I put together a little example, which I'll attach as well as paste below.

This is how I do it, but I certainly encourage suggestions for improvement.

HTH,
Garrett
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
from='2011-10-31')
[1] "AAA"
[1] "DDD"
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
[1] "DDD"
split_method='days'))
[1] "AAA"
[1] "AAA"

        
On Mon, Nov 7, 2011 at 3:20 AM, Brian G. Peterson <brian at braverock.com> wrote:

            
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rdatasaving.R
Type: text/x-r-source
Size: 1208 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20111107/dc8ee49e/attachment.bin>