Skip to content

excel files and R

8 messages · Simon Fear, Brian Ripley, (Ted Harding) +5 more

#
Many many thanks. I did look at this but I have absolutely no idea of
the
background so got completely lost. Can you recommend a gentle
introduction/overview to this area, based on the assumption that my
current
knowledge equals zero? Indeed, could you make an argument that I should
ever
*want* to run R from within Excel or vice versa? I think I just want to
get
the data from Excel ('cos that's how it nearly always comes), but I
don't
want to process it in Excel, when I have R ...

I guess all that I and apparently others really want is that "foreign"
might
include read.excel, like it has read.sas and read.spss. Which is
essentially
what Bernhard Pfaff's recent post offers - thanks again Bernhard - but
using
RODBC instead of foreign.


-----Original Message-----
From: Erich Neuwirth [mailto:erich.neuwirth at univie.ac.at]
Sent: 25 June 2003 10:42
To: Morrison, Gordon; r-help at stat.math.ethz.ch
Subject: Re: [R] excel files and R

And there is Thomas Baier's and my RCOM package
which would allow to run R from within Excel
or exchange data between Excel and R with R
as the main interface.
Morrison, Gordon wrote:
better
Temple
Excel.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com
 
Number of attachments included with this message: 0
 
This message (and any associated files) is confidential and\...{{dropped}}
#
On Wed, 25 Jun 2003, Simon Fear wrote:

            
The Excel .xls format is poorly documented, probably deliberately
obfuscated.  A direct interface is on the TODO list: it should be quite
easy in Windows and possible in other OSes (there is code in Gnumeric, for
example).  In any case, this is a lot harder than the interfaces currently 
in foreign.
#
On 25-Jun-03 Simon Fear wrote:
In that case there is a very simple solution (asuming you have access
to Excel).

Open the Excel file in Excel, and save it out as a comma-separated file
(.csv). You may need to clean this up a bit (depending on how sloppily
it was created -- a lot of people create very messy Excel files), but
usually you don't. (For the same reason, it can be best to do the export
to CSV yourself, rather than asking the sender to send you a CSV file,
unless you trust their competence.)

Then, in R, do something like

  X <- read.csv("excelfile.csv")

You will then have a dataframe X (with variables named as in the column
names in the Excel file).

This is what I always do when I get Excel files; it has always worked.

See ?read.csv for options.

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-Jun-03                                       Time: 12:20:54
------------------------------ XFMail ------------------------------
#
On Wed, 25 Jun 2003, Simon Fear wrote:

            
It would be nice, but it's quite hard to read Excel off Windows.

The formats in foreign are either documented by the vendor (accurately in
the case of Stata and Epi Info, with some omissions for SAS XPORT) or that
have been reverse-engineered by someone else (read.spss is based on
PSPP, an attempt at an SPSS clone by Ben Pfaaf, and I think Duncan
Murdoch did read.S).


While it isn't usual to say nice things about commercial vendors on these
lists I would like to note that Stata not only documents its file format
in its manuals (with some helpful C snippets for the trickier parts), but
made available the file format for their `large data set' version 7/SE,
which I didn't buy.


	-thomas
#
On Wed, 2003-06-25 at 09:20, Thomas Lumley wrote:
Simon,

To add to Thomas' comments and respond to your thoughts, if one were so
inclined, given that R is a volunteer effort, I suspect that an addition
to 'foreign' for Excel would indeed be appreciated by many users.

One resource, with appropriate attribution given, would be the source
code for OpenOffice.org's (OOo) Calc. Since Calc can read and write
Excel formats without using Windows/Office DLL's, it seems reasonable to
presume that OOo has reverse engineered the native Excel file structure.
Since OOo's source is available under the GPL, this could provide the
basis for a "read.excel" function.

Yet another would be Gnumeric, which like Calc is GPL'd and can read and
write native Excel file formats.

More information is available at:

http://www.openoffice.org/dev_docs/source/1.0.3/source.html

http://www.gnome.org/projects/gnumeric/


Food for thought...   :-)

Regards,

Marc Schwartz
#
On Wed, 25 Jun 2003 11:51:44 +0100 (BST), you wrote in message
<Pine.LNX.4.44.0306251149030.2122-100000 at gannet.stats>:
The OpenOffice web site has reasonably complete documentation on this
web page:  <http://sc.openoffice.org>.  It skips some of the more
obscure features, but those probably wouldn't be of interest to R
either.

It looks to me as though it would be relatively easy to write (in R,
using the streams code) a reader that could read strings and numbers.
Interpreting formulas would be a lot harder.  (It's possible the last
value of a formula is stored in the .xls file, in which case it would
be easily read too.)

One thing I'm not sure about:  recent .xls versions store strings in
Unicode.  Does R have cross-platform Unicode support?

Duncan
#
On Wed, Jun 25, 2003 at 10:02:07AM -0500, Marc Schwartz wrote:
The Gretl econometrics package (http://gretl.sf.net) also has a small
library for reading Excel files (in the file plugin/excel_import.c), its
code goes back to

/*
  Based on xls2csv (David Rysdam, 1998), as distributed in the 
  "catdoc" package by Vitus Wagner, with help from the Gnumeric
  excel plugin by Michael Meeks.
 */
      
I have meant to muck with this for some time now, but this is a very low
priority item for me and other things keep popping up.

Dirk
Fan
#
For loading Excel data and many others file formats,
one possibility is to use the free conversion utility: DataLoad.

See: http://www.vsn-intl.com/genstat/downloads/datald.htm
(there're probably also other links)

It should be easy to create R wrappers to use that utility.

Cheers
--
Fan
Marc Schwartz wrote: