Keeping persistent data collections
I do what Brian described, and I use a couple of functions from FinancialInstrument to do it:

library(FinancialInstrument)
?saveSymbols.days
?saveSymbols.common
?getSymbols.FI

(I just noticed that those two saveSymbols.* functions do not allow for a data file extension other than the old .rda. I will probably update that today.)

I put together a little example, which I'll attach as well as paste below. This is how I do it, but I certainly encourage suggestions for improvement.

HTH,
Garrett
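For readers without FinancialInstrument at hand, the split_method='days' idea can be pictured as one file per calendar day inside a per-symbol directory, so that a loader given a from= date only needs to read the files on or after that date. The base-R sketch below is hypothetical throughout: save_days() and load_days() are not FinancialInstrument functions, they work on a plain data.frame with a POSIXct 'time' column rather than an xts object, and the <base_dir>/<symbol>/<YYYY-MM-DD>.rda naming is an assumption, not the package's actual file format.

```r
# Hypothetical sketch of a one-file-per-day, one-directory-per-symbol layout:
#   <base_dir>/<symbol>/<YYYY-MM-DD>.rda
# The naming scheme is an assumption for illustration only.

save_days <- function(x, symbol, base_dir = tempdir()) {
  # x: data.frame with a POSIXct column named 'time'
  sym_dir <- file.path(base_dir, symbol)
  dir.create(sym_dir, showWarnings = FALSE, recursive = TRUE)
  days <- format(x$time, "%Y-%m-%d")   # calendar day in the column's time zone
  for (d in unique(days)) {
    day_data <- x[days == d, , drop = FALSE]
    save(day_data, file = file.path(sym_dir, paste0(d, ".rda")))
  }
  invisible(unique(days))
}

load_days <- function(symbol, base_dir = tempdir(), from = NULL, to = NULL) {
  sym_dir <- file.path(base_dir, symbol)
  files <- list.files(sym_dir, pattern = "\\.rda$")
  dates <- as.Date(sub("\\.rda$", "", files))
  keep <- rep(TRUE, length(files))
  if (!is.null(from)) keep <- keep & dates >= as.Date(from)
  if (!is.null(to))   keep <- keep & dates <= as.Date(to)
  # only files inside the requested date range are read from disk
  pieces <- lapply(file.path(sym_dir, files[keep]), function(f) {
    e <- new.env()
    load(f, envir = e)
    e$day_data
  })
  if (length(pieces) == 0) return(NULL)
  out <- do.call(rbind, pieces)
  out <- out[order(out$time), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# usage: 72 hourly observations spanning four calendar days (UTC)
set.seed(1)
ticks <- data.frame(
  time  = as.POSIXct("2011-10-31 09:00:00", tz = "UTC") + 3600 * (0:71),
  price = 100 + cumsum(rnorm(72))
)
base <- file.path(tempdir(), "fi_sketch")
save_days(ticks, "AAA", base)
aaa <- load_days("AAA", base, from = "2011-11-01")   # skips the 2011-10-31 file
```

The same trade-off Brian describes applies here: many small files make appending a new day trivial and keep partial loads cheap, at the cost of more files on disk.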
library(FinancialInstrument)
# object with daily periodicity
data(sample_matrix)
DDD <- as.xts(sample_matrix)
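# object with minute periodicity: 10000 random values, one per minute,
# ending one minute before now
AAA <- xts(rnorm(10000), Sys.time() - 60 * (1:10000))
AAA <- align.time(AAA)  # round timestamps up to the next whole minute
colnames(AAA) <- "AAA"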
# look at the objects we're going to store
head(AAA)
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344
head(DDD)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
mydir <- getwd()
saveSymbols.days("AAA", base_dir=mydir)
saveSymbols.common("DDD", base_dir=mydir)
# now that they are on disk,
# remove them from workspace
rm("AAA", "DDD")
# get from disk
getSymbols("AAA", src='FI', dir=mydir, split_method='days',
           from='2011-10-31')
[1] "AAA"
getSymbols("DDD", src='FI', dir=mydir, split_method='common')
[1] "DDD"
head(AAA)
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344
head(DDD)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
#--------
# You can use setSymbolLookup so that getSymbols will know where
# to look. There are two ways to set it: explicitly,
# or by setting the "src" field of an instrument.
# explicitly
setSymbolLookup(DDD=list(src='FI', dir=mydir, split_method='common'))
getSymbols("DDD")
[1] "DDD"
# by using the "src" field of an instrument
stock("AAA", currency("USD"), src=list(src='FI', dir=mydir,
      split_method='days'))
[1] "AAA"
getSymbols("AAA", from='2011-10-31')
[1] "AAA"
# cleanup
rm("AAA", "DDD")
unlink("AAA", recursive=TRUE)
unlink("DDD", recursive=TRUE)
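The setSymbolLookup step near the end of the script lets getSymbols find a symbol's source without repeating src, dir and split_method on every call. Conceptually that is a small per-symbol registry of default arguments, with per-call arguments such as from= layered on top. The toy below only pictures that idea in base R: set_lookup(), get_symbol() and fake_loader() are hypothetical names, and quantmod's real setSymbolLookup is not reproduced here.

```r
# Toy per-symbol registry of default loader arguments (hypothetical names).
.lookup <- new.env()

set_lookup <- function(...) {
  specs <- list(...)   # e.g. DDD = list(loader = f, dir = "...")
  for (sym in names(specs)) assign(sym, specs[[sym]], envir = .lookup)
  invisible(names(specs))
}

get_symbol <- function(sym, env = parent.frame(), ...) {
  if (!exists(sym, envir = .lookup, inherits = FALSE))
    stop("no lookup registered for ", sym)
  spec <- get(sym, envir = .lookup)
  loader <- spec$loader
  spec$loader <- NULL
  # registered defaults first; arguments given at call time override them
  args <- modifyList(spec, list(...))
  data <- do.call(loader, c(list(sym), args))
  assign(sym, data, envir = env)   # assign into the caller's environment
  invisible(sym)
}

# a trivial loader standing in for "read files from dir"
fake_loader <- function(sym, dir, n = 3) {
  data.frame(symbol = sym, dir = dir, i = seq_len(n))
}

set_lookup(DDD = list(loader = fake_loader, dir = "/data/daily"))
get_symbol("DDD")          # only the symbol name is needed now
get_symbol("DDD", n = 5)   # a per-call argument overrides the default
```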
On Mon, Nov 7, 2011 at 3:20 AM, Brian G. Peterson <brian at braverock.com> wrote:
On Sun, 2011-11-06 at 22:43 -0500, Dino Veritas wrote:
Hello, I recently found this list and have been reading the archives in depth. I am wondering how people here maintain their collections of data for easy use in R. I am wondering a few things: 1) How do members of this list deal with keeping persistent data collections with R? I was thinking of individual xts objects by asset and frequency (such as AAPL daily, AAPL minute, AAPL 60m, etc.). While I can store and maintain these xts objects on disk and load them into R as needed, I am wondering if there is a better solution.
I store only tick data, as I can easily get to any other frequency from tick. I've considered also storing daily data, but in the end I decided it was too much trouble to manage in addition, and just store tick.
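Getting from tick to a lower frequency is a single aggregation pass. xts users would normally reach for to.period() or to.minutes(); to keep this sketch dependency-free, here is the same idea in base R. It assumes ticks arrive as a data.frame with POSIXct 'time' and numeric 'price' columns; those column names and the to_bars() helper are illustrative assumptions.

```r
# Aggregate tick data into OHLC bars of 'secs' seconds (base R only).
to_bars <- function(ticks, secs = 60) {
  ticks <- ticks[order(ticks$time), , drop = FALSE]
  # bar start = tick time floored to a multiple of 'secs' since the epoch
  bar <- floor(as.numeric(ticks$time) / secs) * secs
  groups <- split(ticks$price, bar)   # groups come back in time order
  data.frame(
    time  = as.POSIXct(as.numeric(names(groups)), origin = "1970-01-01", tz = "UTC"),
    Open  = vapply(groups, function(p) p[1], numeric(1)),
    High  = vapply(groups, max, numeric(1)),
    Low   = vapply(groups, min, numeric(1)),
    Close = vapply(groups, function(p) p[length(p)], numeric(1)),
    row.names = NULL
  )
}

# usage: six ticks falling into three one-minute bars
ticks <- data.frame(
  time  = as.POSIXct("2011-11-07 09:30:00", tz = "UTC") + c(0, 10, 50, 61, 90, 125),
  price = c(10, 12, 9, 11, 13, 12)
)
bars <- to_bars(ticks, secs = 60)
```

Flooring against the epoch works for intraday bars; daily bars aligned to an exchange session would need explicit time-zone and session handling, which this sketch leaves out.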
2) Coming from that, I have been looking into the indexing package for my needs. It seems very useful for managing a lot of large data sets in memory, but I am not sure it is a good method for maintaining persistent data; I have had trouble adding information to existing data that is indexed on disk. Do posters here use indexing for this purpose? I did find an old post or two touching on that, with no specifics. I would like to combine the ability of indexing to keep many large data sets available in memory with persistent storage of data. Has anyone had experience doing this?
You are correct that the 'indexing' package is very powerful. It is also not done yet.

As I said, I store tick data. The way I do this is with a single file per day of data per symbol, pre-parsed into xts objects and stored to disk in one directory per symbol (using 'save'). I then use FinancialInstrument to keep track of all the instrument metadata, and getSymbols to load the data into R when I need it (and over the time frames that I require). We currently download tick data for about 2500 tradeable instruments per day, and maintain archives going back several years. We have the .instrument environment stored on the same file server as the data, and every .Rprofile in the firm points to it, so that everyone has access to getInstrument and getSymbols.

I know someone who works in the hedge fund industry, mostly with monthly data, with some daily data sprinkled in. He uses the same approach I have outlined: storing the metadata in FinancialInstrument, and using getSymbols to access the data. He typically stores one consolidated CSV file per instrument, because CSV files are easy to add on to with a batch process.

For lower-frequency data (let's say daily or lower), a database is certainly an option, and there are getSymbols wrappers that could be adapted to whatever schema you decide to use.

Obviously, there are tick data database providers such as OneTick and kdb. If you have this problem and the resources this type of solution requires, you probably already know that you are in this camp, and know that these providers have R interfaces of varying quality.

The FinancialInstrument package has a 'parsers' directory, included in the 'inst' directory of the package, with many examples of download and parse routines for regular loading of data from a variety of free or subscription providers. This should give you a lot of material to begin working with your own data providers.
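The "one consolidated CSV per instrument" approach works because a nightly batch job only has to append the new rows. A minimal base-R sketch, assuming one file per instrument named <dir>/<symbol>.csv with a Date column; append_csv() and read_instrument() are hypothetical helpers, not part of FinancialInstrument or quantmod.

```r
# Append new rows to <dir>/<symbol>.csv, creating the file on first use.
append_csv <- function(new_rows, symbol, dir = tempdir()) {
  f <- file.path(dir, paste0(symbol, ".csv"))
  existing <- file.exists(f)
  if (existing) {
    # drop rows already on disk so re-running the batch job is harmless
    # (this re-reads the whole file; fine for daily data, not for ticks)
    last <- max(as.Date(read.csv(f)$Date))
    new_rows <- new_rows[as.Date(new_rows$Date) > last, , drop = FALSE]
  }
  if (nrow(new_rows) > 0)
    write.table(new_rows, f, sep = ",", row.names = FALSE,
                col.names = !existing, append = existing)
  invisible(nrow(new_rows))
}

read_instrument <- function(symbol, dir = tempdir()) {
  x <- read.csv(file.path(dir, paste0(symbol, ".csv")))
  x$Date <- as.Date(x$Date)
  x
}

# usage: the second batch overlaps the first by one day
csv_dir <- file.path(tempdir(), "csv_sketch")
dir.create(csv_dir, showWarnings = FALSE)
batch1 <- data.frame(Date = as.Date(c("2011-11-01", "2011-11-02")), Close = c(50.1, 50.4))
batch2 <- data.frame(Date = as.Date(c("2011-11-02", "2011-11-03")), Close = c(50.4, 50.3))
append_csv(batch1, "XYZ", csv_dir)
append_csv(batch2, "XYZ", csv_dir)   # only 2011-11-03 is written
xyz <- read_instrument("XYZ", csv_dir)
```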
3) How do people keep track of all the data sets within R? Are there any useful packages for keeping track of multiple sets of financial data and the information about them?
We wrote and use FinancialInstrument for this purpose. As I said earlier, I see no value in storing different periodicities, and store only tick.

One of the reasons I chose to write a getSymbols wrapper for retrieving our tick data stores is that resources like this list have extensive experience with getSymbols, so it is easy for people at our firm to become familiar with using the data. Also, I am reasonably confident that as the indexing package matures there will be a getSymbols method for it as well; if appropriate, I can then convert all my data in one batch pass, and the change will be transparent to my users.

At a previous firm I made what I now realize was a mistake: I wrote a data retrieval function that was not compatible with getSymbols. It was more complex to teach people how to use, and less compatible with huge amounts of other publicly available code.

quantmod and FinancialInstrument contain examples of various getSymbols methods that may meet your needs, or that could serve as templates for your custom in-house data source.
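What makes a loader "getSymbols-compatible" is largely convention: it takes a vector of symbol names, assigns each loaded object into an environment, and returns the names invisibly (or returns the object itself when auto.assign = FALSE). quantmod dispatches getSymbols(src = "xyz") to a function named getSymbols.xyz, which is how getSymbols.FI plugs in; check the quantmod documentation for the exact arguments it passes. The base-R sketch below shows only the convention: getSymbols.mystore and its one-.rds-file-per-symbol layout are hypothetical.

```r
# Hypothetical getSymbols-style method for an in-house store laid out as
# <dir>/<Symbol>.rds. With quantmod loaded, getSymbols("AAA", src = "mystore",
# dir = store) should dispatch here; extra arguments land in '...'.
getSymbols.mystore <- function(Symbols, env = parent.frame(), dir = tempdir(),
                               auto.assign = TRUE, ...) {
  loaded <- lapply(Symbols, function(s) readRDS(file.path(dir, paste0(s, ".rds"))))
  names(loaded) <- Symbols
  if (!auto.assign) {
    # like getSymbols(auto.assign = FALSE): return the single object itself
    if (length(Symbols) != 1) stop("auto.assign = FALSE needs exactly one symbol")
    return(loaded[[1]])
  }
  for (s in Symbols) assign(s, loaded[[s]], envir = env)
  invisible(Symbols)
}

# usage: two symbols written to a temporary store, then loaded back
store <- file.path(tempdir(), "mystore")
dir.create(store, showWarnings = FALSE)
saveRDS(data.frame(Close = c(1, 2, 3)), file.path(store, "AAA.rds"))
saveRDS(data.frame(Close = c(4, 5)),    file.path(store, "BBB.rds"))
getSymbols.mystore(c("AAA", "BBB"), dir = store)   # creates AAA and BBB
bbb <- getSymbols.mystore("BBB", dir = store, auto.assign = FALSE)
```

Keeping to this shape is what lets users switch the storage backend later without relearning how to load data, which is the point made above.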
4) Any other pointers? I know many here are well versed in managing large data sets with R. Any tips you have, or even simply pointing me in a helpful direction toward useful packages you use, would be great. This list is a great help to me, and I am still browsing old threads!
Regards,

- Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.