An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20111104/dd65bda0/attachment.pl>
DBI solution
3 messages · Ben Nachtrieb, Gabor Grothendieck, Paul Gilbert
On Fri, Nov 4, 2011 at 7:08 PM, Ben Nachtrieb <ben.nachtrieb at gmail.com> wrote:
Hello, I'm building (from the ground up) a PostgreSQL time series database for use with R. I'd like to get some preliminary guidance (help me get pointed in the correct direction). Here are some details: I am PC/Windows centric. I have data of all frequencies equal to or greater than Daily (no inter-day data). We have prices and technical data, macro data, company descriptive, and financial data. I don't have to do any 'DBA' stuff per-say (no messy stuff like split adjustments, ticker changes, etc.) as that is done for me; however, I will have to adjust for look-ahead, create new factor values, transform, etc. the data that we have and store it in database form. Can someone point me to the best solution/packages for this given that I have to stay in the R and PostgreSQL world? I see RpgSQL, RposgreSQL, TSPostgreSQL, DBI, TSdbi, and much more... I am hoping someone can narrow things down for me. Thanks so much!
RpgSQL and RPostgreSQL are DBI-based drivers for R and the others are higher level packages. I can only address my own package but RpgSQL was developed for use with sqldf. It uses RJDBC (which uses JDBC and Java) and DBI. Feedback is that RpgSQL is easier to install on Windows than alternatives. The use of RJDBC/JDBC may slow it down relative to alternatives although speed may not be material if you are just importing a bunch of time series and then working with them in R as opposed to constantly going back to the database. sqldf is a package that lets you use R data frames with several database back-ends including RpgSQL. You would not use sqldf with your own database but might use it with R data frames to play around with PostgreSQL. Resources are - the Installation info in this link: http://cran.r-project.org/web/packages/RpgSQL/index.html - ?pgSQL help page (see example at end of page) - http://sqldf.googlecode.com most of which is about sqldf and sqlite although much of it applies and there is a bit about using PostgreSQL with sqldf.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
5 days later
Ben (Possibly too late to be preliminary guidance, but ...) TSdbi is the base package in a group of packages, including TSPostgreSQL, which try to provide a common interface (API) to time series databases. That is, you specify the connection, and after that all of your R code syntax can be the same, and does not depend on the specifics of the underlying mechanism. These packages are almost all wrappers for other packages (eg RPostgreSQL), so the main benefits to using them rather than the underlying packages are that they provide a common interface, and a mechanism for returning a specified time series representation. (For example, the fame package returns tis series, but TSfame handles conversion and allows the possibility of returning other representations like zoo series.) The SQL variants (TSPostgreSQL, TSMySQL, TSSQLite, TSodbc, and untested TSOracle) include table structure definitions for the database. The database does not need to be built with R, but the table structure needs to be respected for the TS* SQL variants to work. The non-SQL variants (TSfame, TSxls, TSgetSymbol, TShistQuote, ...) pull data from other sources, mostly the web except for TSfame. With the SQL tables I believe series of daily frequency and lower are handled fairly well. (I work mostly with monthly and quarterly data, but also use daily and weekly data.) In theory, tick data (time stamped series) are also handled, but I have never work with that kind of data, so it is not well tested. The tables also provide a mechanism for storing meta data descriptions of series, so you can store company descriptions, but there is no SQL structure within the description. That is, you could not do a SQL query to select certain types of businesses based on the description. (This would probably not be too difficult to implement, but it is not in the structure provided.) There is a not very well tested mechanism for handling series name changes (by an alias). I believe this could be used for ticker changes, but I'm not sure. Being an economist rather than a financial person, I'm not exactly sure what you mean by "adjust for look-ahead, create new factor values, transform, etc." The database stores the time series data, but these things sound like the sort of thing I would do in R rather than in the database. If you already have a backend SQL database, and are just building the interface not building the database, then the TSdbi package has a function TSquery that may be useful. I use this to construct time series from a relational SQL database that was built for purposes other than storing time series. I do not work much in Windows, but the TS* packages should work without problem, as long as the underlying packages work in Windows. Thus, you need the PostgreSQL drivers to install RPostgreSQL, and then everything should work. HTH, Paul
-----Original Message----- From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance- bounces at r-project.org] On Behalf Of Ben Nachtrieb Sent: November 4, 2011 7:08 PM To: r-sig-finance at r-project.org Subject: [R-SIG-Finance] DBI solution Hello, I'm building (from the ground up) a PostgreSQL time series database for use with R. I'd like to get some preliminary guidance (help me get pointed in the correct direction). Here are some details: I am PC/Windows centric. I have data of all frequencies equal to or greater than Daily (no inter-day data). We have prices and technical data, macro data, company descriptive, and financial data. I don't have to do any 'DBA' stuff per-say (no messy stuff like split adjustments, ticker changes, etc.) as that is done for me; however, I will have to adjust for look-ahead, create new factor values, transform, etc. the data that we have and store it in database form. Can someone point me to the best solution/packages for this given that I have to stay in the R and PostgreSQL world? I see RpgSQL, RposgreSQL, TSPostgreSQL, DBI, TSdbi, and much more... I am hoping someone can narrow things down for me. Thanks so much! -- Ben [[alternative HTML version deleted]]
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
==================================================================================== La version fran?aise suit le texte anglais. ------------------------------------------------------------------------------------ This email may contain privileged and/or confidential information, and the Bank of Canada does not waive any related rights. Any distribution, use, or copying of this email or the information it contains by other than the intended recipient is unauthorized. If you received this email in error please delete it immediately from your system and notify the sender promptly by email that you have done so. ------------------------------------------------------------------------------------ Le pr?sent courriel peut contenir de l'information privil?gi?e ou confidentielle. La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion, utilisation ou copie de ce courriel ou des renseignements qu'il contient par une personne autre que le ou les destinataires d?sign?s est interdite. Si vous recevez ce courriel par erreur, veuillez le supprimer imm?diatement et envoyer sans d?lai ? l'exp?diteur un message ?lectronique pour l'aviser que vous avez ?limin? de votre ordinateur toute copie du courriel re?u.