general status query on DBI etc
Vincent J. Carey, Jr. wrote:
Here's my understanding of the R-RDBMS interface situation, after poking around CRAN and the SIG page/mailing list. 1) DBI is available on CRAN but it is not clear if any compliant drivers are on hand; of particular interest are postgres and oracle drivers
Correct. As of today no driver is compliant.
2) Rdbi is available on sourceforge; Rdbi.PgSQL is also available there and the latter provides the postgres driver for the Rdbi (not DBI) API. Tim Keitt indicates that all the C code that would be necessary for a postgres driver for DBI is present in Rdbi.PgSQL, but DBI-compliant R code needs to be written 3) ROracle is available on CRAN, and 0.3-3 DESCRIPTION identifies it as transitional; the next version will satisfy the DBI API.
Yes, those plans are current.
4) RODBC is at CRAN devel (not on the main package distribution page) and the RS-DBI.pdf document suggests that unix ODBC is inadequately developed. Based on my limited web searching, ODBC does not seem a viable approach for unix at present.
Hmm, that was written sometime ago. The unixODBC project has been making quite a bit of progress, and the driver manager seems to be getting quite good (although I haven't tested it too much). My main concern back then was the lack of ODBC drivers. Currently the open source DBMS (at least PostsgreSQL and MySQL) seem to provide quite decent ODBC drivers. The availability of free/open source drivers on Linux/Unix/MacOS for Oracle, MS SQL server, and others is still an issue, AFAIK; (there are good commercial drivers for Unix, though). My feeling is that an R-ODBC interface is critical -- certainly on Windows, but also on Unix and probably on Mac (I'm not sure how Mac deals w. DBMSs).
Note: Bioconductor has made substantial use of RPgSQL for databasing genomic annotation data. The fact that RPgSQL has been abandoned by its maintainer is a source of concern. We are starting to strategize on the problem of storing and analyzing large quantities of expression data in RDBMS and we look to the DB-SIG for guidance on resources related to this problem. Questions 1) How far off is the DBI-compliant ROracle? Are there risks that code developed using the transitional version will require substantial reworking when the new version emerges?
The current ROracle (same for RMySQL and, of less interest, RSQLite) is fairly close to the DBI. All the functionality in the DBI is available in these other packages. From the user's point of view, the only difference is the function names (close(con) vs dbDisconnect(con), etc.). These do not require very extensive work to have them compliant with the DBI. At the programming level the issue is also straightforward -- the current implementations are done using S3 style classes and probably should be migrated to S4 classes. (Somewhat ironically, both the Oracle and MySQL interfaces were originally implemented in S4, ported backwards to S3 style classes for R compatibility and now they can finally be fully implemented with S4 classes.) In terms of programming, I think we're talking about a week's effort, or less.
2) Is anyone working on the postgres driver for DBI? Apparently most of the C code is available.
When implementing a driver, the C portion is probably where the most work is required. I'm not very familiar with this code, but perhaps moving to the DBI wouldn't be too difficult. Last december I wrote DBI interfaces on top of both the existing RPgSQL and RODBC, but I thought (and Tim agreed with me) that the resulting layering of S4 on top of S3 classes wasn't ideal, so these DBI.RpSQL and DBI.RODBC packages were not made public.
3) RS-DBI.pdf suggests a number of alternative architectures (e.g., ODBC, JDBC). Is the slow emergence of drivers for DBI ascribable to uncertainty about the long-term viability of the DBI approach? Has RSJava matured to the point where one might prefer a JDBC-centered approach?
Re: ODBC, see my comments above. Re: R/JDBC, I'm not sure -- I have little experience with Java, so perhaps others can comment. The slow emergence of drivers for the DBI, in my opinion, is the lack of volunteers. Even when we were drafting the DBI the participation was not exactly overwhelming, as you probably noticed when you looked at the r-sig-db archives. But I feel that it is important to have a common interface to DBMS, one reason being to abstract out (at least as far as R is concerned) the details of getting your data into your analysis. Thus in the long run we should be deciding what DBMS to use based on their merits and not on whether the R API to DBMS "A" is well thought out but not for "B" (asumming they provide more or less similar functionality). Moreover, I think that Perl's DBI/DBD, Java's JDBC, ODBC, and Python's DB-API have proved the viability of this approach. Of course, any flaws in the R/S DBI should be fixed -- but that's just implementation:-)
Thanks -- --- Vince Carey, PhD Ass't Prof Med (Biostatistics) Harvard Medical School Channing Laboratory - ph 6175252265 fa 6177311541 cell 8572126768 181 Longwood Ave Boston MA 02115 USA stvjc at channing.harvard.edu
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch http://www.stat.math.ethz.ch/mailman/listinfo/r-sig-db
David A. James Statistics Research, Room 2C-253 Phone: (908) 582-3082 Bell Labs, Lucent Technologies Fax: (908) 582-3340 Murray Hill, NJ 09794-0636