Skip to content

RJDBC to OpenOffice Calc as RODBC to MS Excel

4 messages · Metz, Thomas (IRRI), Brian Ripley, Marc Schwartz +1 more

#
Under Windows, I have used RODBC to connect to Excel spreadsheets as per
the example below: 

library(RODBC);
connect = odbcConnectExcel("testdata.xls");
query = "SELECT [data$.ethn], [data$.sex], [data$.age], 
                [data$.height], [data$.weight], 
                [label$.label]
         FROM [data$], [label$] 
         WHERE [data$.ethn] = [label$.ethn];"
data = sqlQuery(connect, query);
odbcClose(connect);

[data$] and [label$] are two named sheets in the Excel spreadsheet
testdata.xls. [.ethn], [.sex], [.age], [.height], [.weight], and
[.label] are cloumn names that appear in the first row in the sheets. I
can also have UNION queries that allow me to overcome the spreadsheet
row limitation of a single sheet. The idea is to allow normalization of
data in a spreadsheet and leveraging the power of SQL, without using a
database. 

Can the same be done under Windows (Linux?) with OpenOffice Calc using
RJDBC? Are there ODBC drivers for OpenOffice Calc? 

I know that the right solution would be to use a database, but this is
outside the comfort zone of many users who rely mainly on spreadsheets
to collect, manipulate and analyze their data.

Thomas Metz 
International Rice Research Institute
Philippines
#
On Fri, 14 Dec 2007, Metz, Thomas (IRRI) wrote:

            
An awful lot of that is Microsoft warts on SQL, so it will not be 
portable.  But in a more standard syntax (drop the [] and $) it should be 
doable over any connection that supports SQL queries.

The question is whether OO calc has suitable drivers as an ODBC/JDBC 
server.  Not a question for this list!  (I suspect the answer is no: 
Microsoft's drivers effectively use the Access engine to work with 
spreadsheet files and even plain text.  I don't even see drivers for OO 
base.)

  
    
#
On Fri, 2007-12-14 at 07:31 +0000, Prof Brian Ripley wrote:
There are none to my knowledge.  Most of OO.org's ODBC/JDBC integration
is one-way. That is, it can connect within the suite and to external
sources, but does not appear to provide connectivity to enable external
applications to acquire data stored within OO.org's apps.

Base is an embedded version of HSQLDB (http://www.hsqldb.org), which is
a java based application. In theory, it would support a JDBC interface,
but I have seen none and when the subject comes up on the OO.org lists,
no solutions are forthcoming. It's too bad, at least within this
context, that the OO.org folks elected to use HSQLDB rather than SQLite,
which was the the other option under consideration. Perhaps Sun's
influence, vis-a-vis Java, won the day here.

Bearing in mind that OO.org's Write and Calc documents are just 'zipped'
XML files, it would be possible to parse the data stored within such
documents. I suspect Max Kuhn has spent much time on this for odfWeave.

There are Perl modules that can provide a level of interaction here. For
example, OpenOffice::Parse::SXC
(http://search.cpan.org/~dclee/OpenOffice-Parse-SXC-0.03/SXC.pm)
provides the means to parse a Calc file directly, without needing the
OO.org API.

One could wrap that module in an R function via a system() call and then
interact with a Calc file directly.

HTH,

Marc Schwartz
#
On Dec 14, 2007 8:28 AM, Marc Schwartz <marc_schwartz at comcast.net> wrote:
I haven't played much with Calc, but it if the Calc file has formulas
or anything other than raw data, this would probably be a mess to
parse.

The first thing that I would try is to use command line tools (like
ooconvert or jodconverter) to convert the file to csv and then read it
in. These tools were written for Write documents, but they may also be
good at converting other types of od* documents.

Good luck,

Max