xlsReadWrite Pro and embedding objects and files in Excel worksheets

16 messages · James W. MacDonald, Hin-Tak Leung, Mark Kimpel +5 more

#
Hans-Peter and other R developers,

How are you? Have you made any progress with embedding URLs in Excel?

Well, I have been busy thinking of more things for you to do;)

My colleagues in the lab are not R literate, and some are barely 
computer literate, so I give them everything in Excel workbooks. I have 
gradually evolved a system such that these workbooks have become 
compendia of my data, output, and methods. That, in fact, is why I 
bought the Pro version of xlsReadWritePro. I have been saving graphics 
as PDF files, then inserting them as objects in Excel sheets.

What I would like to be able to do is to embed objects (files) in sheets 
of a workbook directly from within R. I would also like to be able to 
save my current R workspace as an object embedded in a sheet so that in 
the future, if packages change, I could go back and recreate the 
analysis. I do not need to be able to manipulate files that R has not 
created, like a PDF file from another user. I would, however, like to be 
able to save my graphics as PDF files inside a worksheet, even if it 
meant creating a  temp file or something.

Before people begin talking about how MySQL or some other database could 
handle all that archiving, let me say that that is not what my 
colleagues want. They want a nice Excel file that they can take home on 
their laptops. One thing I like about worksheets is that they themselves 
can contain many embedded files, so it keeps our virtual desks neater 
and less confusing.

Hans, if you could do this, it would be of tremendous benefit to me and 
hopefully a lot of people. R developers tend to think that all 
scientists are running Linux on 64-bit computers, but most biomedical 
researchers still store data in Excel files. This won't solve everybody's 
needs, but it could be a start.

Well, let me know what you think. I am cc'ing R-devel to see if any of 
those guys have ideas as well.

Thanks,
Mark
#
I don't know of any native xls read/write facility in R, either
in core or as an add-on (I could be wrong), but if you want some source
code to scavenge in order to build an R package, there are two
Perl modules, Spreadsheet::ParseExcel and Spreadsheet::WriteExcel,
which are small enough to "read from front cover to back cover",
so to speak, and might be useful as a reference or to steal code from.

The other open-source packages which can read/write Excel files
are Gnumeric and OpenOffice, but they are probably too big for one to find
one's way around the source code to steal from there :-).

Good luck.

HTL
Mark W Kimpel wrote:
#
Have you looked at RDCOMClient? I would imagine you could do what you 
want with this package.

http://www.omegahat.org/RDCOMClient/

Best,

Jim
Hin-Tak Leung wrote:

  
    
#
James W. MacDonald wrote:
Interesting point. But the DCOM client would be Windows-specific?
(Those I mentioned - the Perl modules and OpenOffice - run on Windows
as well as Unix boxes; not sure about Gnumeric :-).)

In fact there is a very perverse way of doing it - ActiveState provides
a PerlCom product for hooking up dcom with activestate perl. i.e. you 
can go via R -> RDComClient -> PerlCom -> ActiveState Perl -> 
SpreadSheet::* . Just so that it does not require Excel installed
or an MS Office license...

In that sense, probably technology based on bridging over odbc
is also acceptable?

HTL
#
Thanks for the useful suggestions. I am not as CS savvy as some of you, 
so maybe Hans-Peter could reply? I haven't checked into it, but does his 
  package write to files that are OpenOffice compliant? Would that 
satisfy more users?

Mark
Hin-Tak Leung wrote:

  
    
#
It's not entirely clear to me what it is that you are looking
for.  Maybe you want to create an Excel spreadsheet with a hyperlink
to a web page?  This R code will do that.  It requires a Windows machine that
has Excel running on it.


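library(RDCOMClient)

# Start Excel through COM and make its window visible
xl <- COMCreate("Excel.Application")
xl[["Visible"]] <- TRUE
wkbk <- xl$Workbooks()$Add()

sh <- xl$ActiveSheet()

# Put a HYPERLINK formula into cell B3
cell <- sh$Range("B3")
cell[["Formula"]] <- '=HYPERLINK("http://www.r-project.org")'

# Save the workbook, then shut Excel down
wkbk$SaveAs("\\test-url.xls")
xl$Quit()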
On 2/8/07, Mark W Kimpel <mwkimpel at gmail.com> wrote:
#
I meant that the machine has Excel on it.  Excel does not have to be running
prior to running the R code as the R code will start up and shut
down Excel itself.
On 2/8/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
The gdata package (or another of the g* packages; I cannot remember which
it is) contains a read.xls function which reads an Excel file by means
of a Perl script. I have used it for small stuff and for that it
worked fine. I don't think it contains a write module though.

Kasper
On Feb 8, 2007, at 3:16 AM, Hin-Tak Leung wrote:

            
#
Another option for creating XLS files is to write out HTML instead. Excel can
read html files just fine, and a useful trick is giving the html file a .xls
extension. So, from the user's point of view, it is an excel file even
though it's just an html file. 

Using html works great for embedding links and formatted tables. You can use
the R2HTML package to generate HTML files, including formatting for a large
number of R objects.
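
A minimal base R sketch of this trick, without R2HTML: it writes a data frame as an HTML table into a file whose name ends in .xls, turning one column into links. The helper name `df_to_html_xls`, the `link_col` argument, and the example gene names are made up for illustration; how Excel treats such a file depends on the Excel version.

```r
# Hypothetical helper: write a data frame as an HTML table to `path`.
# If `link_col` names a column, its values are rendered as hyperlinks.
df_to_html_xls <- function(df, path, link_col = NULL) {
  # Escape the characters that are special in HTML
  esc <- function(x) {
    x <- gsub("&", "&amp;", as.character(x), fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
  }
  # One character vector of <td> cells per column
  cells <- lapply(names(df), function(nm) {
    txt <- esc(df[[nm]])
    if (identical(nm, link_col)) {
      txt <- sprintf('<a href="%s">%s</a>', txt, txt)
    }
    sprintf("<td>%s</td>", txt)
  })
  rows <- do.call(paste0, cells)  # paste the columns together row by row
  header <- paste0(sprintf("<th>%s</th>", esc(names(df))), collapse = "")
  html <- c("<html><body><table border=\"1\">",
            sprintf("<tr>%s</tr>", header),
            sprintf("<tr>%s</tr>", rows),
            "</table></body></html>")
  writeLines(html, path)
  invisible(path)
}

# Illustrative data only
genes <- data.frame(
  gene = c("TP53", "BRCA1"),
  url  = c("http://www.r-project.org", "http://www.omegahat.org/RDCOMClient/"),
  stringsAsFactors = FALSE
)
out <- df_to_html_xls(genes, file.path(tempdir(), "report.xls"), link_col = "url")
```

The result is plain HTML under an .xls name, so the recipient can open it by double-clicking, but it is not a native Excel file.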

One thing you can't do with this approach is include graphics. In theory you
could do that by extending this approach. Excel can read in *.mhtml files,
which are multipart mime-encoded bundles that include the html file plus
mime-encoded graphics files that go with it. You could generate png files in
R to include. Excel will also happily read in mhtml files with a .xls
extension. The following links could help you get started:

http://en.wikipedia.org/wiki/MHTML
http://finzi.psych.upenn.edu/R/library/caTools/html/base64.html

I don't know of an R package that has a function to encode files as a
multipart mime, but the link above is a good start.
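
To make the multipart idea concrete, here is a rough base R sketch that bundles an HTML body and one PNG file into a single multipart/related file. Everything in it is an assumption for illustration: the helper names `base64_encode` and `write_mhtml`, the boundary string, and the `Content-Location` values. The base64 encoder is hand-written only to avoid a package dependency. Whether Excel actually displays the image from a bundle laid out this way has not been verified.

```r
# Hypothetical helper: base64-encode a raw vector using base R only.
base64_encode <- function(bytes) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  pad <- (3 - length(bytes) %% 3) %% 3
  # Three input bytes per column, zero-padded to a multiple of three
  m <- matrix(c(as.integer(bytes), rep(0L, pad)), nrow = 3)
  # Split each 24-bit group into four 6-bit indices
  idx <- rbind(m[1, ] %/% 4L,
               (m[1, ] %% 4L) * 16L + m[2, ] %/% 16L,
               (m[2, ] %% 16L) * 4L + m[3, ] %/% 64L,
               m[3, ] %% 64L)
  out <- alphabet[idx + 1L]
  if (pad > 0) out[seq(length(out) - pad + 1, length(out))] <- "="
  paste(out, collapse = "")
}

# Hypothetical helper: write an HTML part and a PNG part into one MIME bundle.
write_mhtml <- function(html_body, png_path, out_path,
                        boundary = "----=_R_sketch_boundary") {
  bytes <- readBin(png_path, what = "raw", n = file.size(png_path))
  stopifnot(length(bytes) > 0)
  b64 <- base64_encode(bytes)
  starts <- seq(1, nchar(b64), by = 76)  # wrap base64 text at 76 characters
  b64_lines <- substring(b64, starts, pmin(starts + 75, nchar(b64)))
  lines <- c(
    "MIME-Version: 1.0",
    sprintf('Content-Type: multipart/related; boundary="%s"; type="text/html"',
            boundary),
    "",
    paste0("--", boundary),
    'Content-Type: text/html; charset="utf-8"',
    "Content-Location: file:///C:/report/index.htm",
    "",
    html_body,
    "",
    paste0("--", boundary),
    "Content-Type: image/png",
    "Content-Transfer-Encoding: base64",
    sprintf("Content-Location: file:///C:/report/%s", basename(png_path)),
    "",
    b64_lines,
    "",
    paste0("--", boundary, "--")
  )
  writeLines(lines, out_path, sep = "\r\n")
  invisible(out_path)
}

# Stand-in bytes so the sketch runs anywhere; in practice the PNG would come
# from png(png_file); plot(1:10); dev.off()
png_file <- file.path(tempdir(), "plot.png")
writeBin(as.raw(c(0x89, 0x50, 0x4e, 0x47)), png_file)
bundle <- write_mhtml('<html><body><img src="plot.png"></body></html>',
                      png_file, file.path(tempdir(), "report-bundle.xls"))
```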

- Tom

Tom Short
EPRI
Mark W Kimpel wrote:

  
    
#
On Thursday 08 February 2007 2:09 pm, tshort wrote:

            
Tcllib has a MIME encoding module; one could use it from within R with 
.Tcl("package require tcllib")

                    best

                        Vladimir Dergachev
#
Gabor,

What I want is a bit more than hyperlinks, although I did ask the 
package developer about that too. My idea is, from within R, place things 
like pdf files and .Rdata directly into an Excel spreadsheet. As a 
practical matter, I want to create a report with some data that someone 
else can manipulate as a "regular" spreadsheet (e.g. sort gene lists) and 
then have other sheets that contain PDF output files of graphs I make 
within R. I would also like to archive my R workspace at the time of 
analysis so that I could, if I had to, redo the analysis. As I and 
others are constantly tweaking what functions do, it is sometimes 
impossible for me to go back and figure out what versions of what 
functions I was using. sessionInfo won't do what I want.
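
The archiving step described here could be sketched in base R roughly as follows: save the objects of an environment to an .RData file, dump the source of its functions to a .R file, and record sessionInfo() alongside. The helper name `archive_session` and the file naming scheme are assumptions for illustration. It captures only the objects and functions in the given environment, not the code of installed packages, and getting the resulting files into a workbook is a separate step.

```r
# Hypothetical helper: snapshot objects, function source, and session info.
archive_session <- function(dir = tempdir(), env = globalenv()) {
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  rdata <- file.path(dir, paste0("workspace-", stamp, ".RData"))
  src   <- file.path(dir, paste0("functions-", stamp, ".R"))
  info  <- file.path(dir, paste0("sessionInfo-", stamp, ".txt"))

  objs <- ls(envir = env, all.names = TRUE)
  save(list = objs, file = rdata, envir = env)   # all objects, binary form

  # Function definitions again as readable source text
  is_fun <- vapply(objs, function(nm) is.function(get(nm, envir = env)),
                   logical(1))
  dump(objs[is_fun], file = src, envir = env)

  writeLines(capture.output(sessionInfo()), info)
  c(workspace = rdata, functions = src, session = info)
}

# Example with a small stand-in environment instead of the real workspace
e <- new.env()
assign("my_filter", function(x, cutoff = 0.05) x[x < cutoff], envir = e)
assign("pvals", c(0.01, 0.2, 0.03), envir = e)
files <- archive_session(env = e)
```

Loading the .RData file later restores the objects as they were, while the dumped .R file lets one read exactly which version of each function was in use.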

Since Hans-Peter came up with his really nice package, I thought I would 
throw this out as an idea. I have been doing this manually for some time 
  and my boss likes it because he only has to get one file from me, not 
10. I include worksheets with the values of parameters passed to 
functions, abbreviations, etc. Then, if 5 months from now he wants me to 
explain the sheet to him, everything is in one place.

In a way, I want to treat an Excel spreadsheet as a list (the workbook) 
that can contain different kinds of objects (spreadsheets, pdfs, Rdata, 
etc.). The Excel file acts as a binder for these different files. My boss 
doesn't even want to deal with zipped files because when they are 
unzipped he ends up with tons of files.

I know this might not make a lot of sense to UNIX users who mostly 
interact with other programmers, but for those of us who deal with the 
computer-barely-literate biologists who run Windows, it could be a nice 
way of keeping things together.

BTW, I only mention Excel and Windows because that is what I use. I 
think it would be great to come up with a common format that Linux, Mac, 
and UNIX users could use. Could OpenOffice serve that purpose?

Thanks for your input.

Mark
Gabor Grothendieck wrote:

  
    
#
If Excel has the capability to do it then by controlling Excel from R
using RDCOMClient or rcom packages you can do it (in Windows).
For example, the code below creates a plot in R and then creates an Excel
spreadsheet and inserts it.

Get up to speed on VBA and then use the Macro recorder in
Excel while you do it manually and look at the macro source that it
generates to find out what VBA commands it uses for a particular task.

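# Draw a plot and save it as a Windows metafile
plot(1:10)
savePlot("c:\\myplot", "wmf")

library(RDCOMClient)

# Start Excel through COM and add a new workbook
xl <- COMCreate("Excel.Application")
xl[["Visible"]] <- TRUE
wkbk <- xl$Workbooks()$Add()

sh <- xl$ActiveSheet()

# Insert the saved plot into the active sheet
sh$Pictures()$Insert("C:\\myplot.wmf")

# Save the workbook, then quit Excel
wkbk$SaveAs("\\test-pic.xls")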
xl$Quit()
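
The example above inserts a picture. For embedding an arbitrary file (a PDF or an .RData file) as an object, the Excel object model offers `OLEObjects.Add`, which is presumably what the manual menu command does behind the scenes. The following is an untested sketch along the same lines: `embed_file_in_excel` is a hypothetical helper, and passing the `Filename`, `Link` and `DisplayAsIcon` arguments by name through RDCOMClient is an assumption to check on a Windows machine with Excel installed.

```r
# Hypothetical helper; needs Windows, Excel, and the RDCOMClient package.
embed_file_in_excel <- function(file_to_embed, xls_path) {
  if (!requireNamespace("RDCOMClient", quietly = TRUE)) {
    stop("RDCOMClient is not installed")
  }
  xl <- RDCOMClient::COMCreate("Excel.Application")
  on.exit(xl$Quit(), add = TRUE)        # always shut Excel down again
  wkbk <- xl$Workbooks()$Add()
  sh <- xl$ActiveSheet()

  # Assumption: OLEObjects()$Add() accepts these named arguments via COM.
  # Link = FALSE asks Excel to store a copy of the file inside the workbook.
  sh$OLEObjects()$Add(Filename = normalizePath(file_to_embed),
                      Link = FALSE, DisplayAsIcon = TRUE)

  wkbk$SaveAs(xls_path)                 # xls_path should be a Windows path
  invisible(xls_path)
}

# Intended use (not run here):
# embed_file_in_excel("C:\\myplot.pdf", "C:\\test-embed.xls")
```

Recording the manual steps with the macro recorder, as suggested above, is a good way to confirm the exact arguments Excel expects.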
On 2/9/07, Mark W Kimpel <mwkimpel at gmail.com> wrote:
2 days later
#
Hi Mark,

Sorry to not reply earlier, I was away this week.
Yes, but I started with the update of the free version and got delayed
there (not that I didn't know that paying customers should be
preferred...) - I'll update the pro version right now and you should
have it by Wednesday, maybe tomorrow. Another person asked to write
formulas directly; that will be included as well.
Good idea.

Regarding your post about your needs (report generation, container, ...):

I think the suggestion from Gabor with controlling Excel from within R
is an excellent way. xlsReadWritePro could be an option which has some
advantages and some disadvantages.

ActiveX/RDCOMClient:
- you can do everything that Excel supports
- this comes at a price: the interface from R is a bit technical. (the
suggestion with VBA and Macro recorder is a good one imho. Reading
about the Excel Object Model may also help)
- it's more free (what a statement for an excel-based dependency...)
- dependencies (installed Excel, RDCOMClient (probably a non-issue at
your situation))
- more Excel versions are supported

xlsReadWritePro:
- you can only do what is currently implemented
- the interface (should) let you program on a higher level and tries
to shield you from technical details. One of the goals was to give an
easy to use, well tested and well documented interface that feels
R-ish.
- no dependencies (Perl/DCOM/Excel/Java)
- it is native, i.e. works directly on the file.
- (at least potentially) it could be ported to Linux
- currently only Excel v97 - 2003. Excel 2007 is planned to follow
(~end of 07, no promises).

It depends on your situation. ActiveX was not an option for us as we
needed to work on the plain file. Otherwise we have practically the
same requirements as you (just with other data).

I could potentially implement almost the whole Excel object model
functionality within xlsReadWritePro. But to be honest, it is costly
and I don't think many people would need that.

We basically implemented in the pro version what we needed ourselves
internally. Upon request I try to add features if they fit well in the
existing interface and if I have time (or if it is paid for) but I
cannot give any promises.

Best regards,
Hans-Peter
#
Hans-Peter,

Welcome back, I hope you had a good time away :)

I got what I thought were some answers that misinterpreted what my 
intent was, so let me ask again and try to be more clear.

I would like to be able to use a single Excel spreadsheet as an archive 
for any output I generate in a single R session, including pdf files of 
graphics and possibly the R history or even the R workspace itself. I 
would envision writing the files to be inserted first to a temp file and 
then inserting them into Excel using whatever commands (Visual Basic?) 
that would do that. I know it can be done from the menu, so I am pretty 
sure that it can be done with VB. I am unsure, however, exactly how you 
are generating the Excel files. For my own edification, are you using VB 
or something similar?

Also, to make this not so Windows specific, would these files be 
compatible with OpenOffice or some other open-source spreadsheet program 
that would be compatible with the other OS's that R users employ? That 
might make it more broadly appealing.

I know that efforts have been made to generate R-compendia that include 
code and data. I think that is great for archiving analyses for the use 
of other R users, but what I am seeking would be much more friendly to 
my end users, who are biologists and psychologists. They don't want 
zip archives that generate a bunch of files, they want just one file 
with everything neat and tidy.

For example, one file with these sheets:

1. .Rworkspace generated at time the last sheet was written. This would 
be akin to an R compendium but mostly designed for archiving, not 
data-abstraction.
2. methods page. I am working on an auto-generating methods page that 
could be copy and pasted into a paper with minimal editing. I am even 
including references that vary depending on the p.adjust method I use.
3. parameters that are passed to major functions like filters. I list 
what filter I used and how many probesets remained after the filter was 
applied.
4. graphs
5. matrix output.
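
As an illustration of item 3, a parameters sheet could be built up as a data frame while the analysis runs, one row per filter. This is only a sketch: the helper name `log_filter`, the column names, and the simulated expression matrix are all made up for the example.

```r
# Hypothetical helper: append one row describing a filter step to a log.
log_filter <- function(log, name, params, n_before, n_after) {
  rbind(log, data.frame(
    filter = name,
    parameters = paste(names(params), unlist(params), sep = "=",
                       collapse = "; "),
    probesets_before = n_before,
    probesets_remaining = n_after,
    stringsAsFactors = FALSE
  ))
}

# Simulated data standing in for a real expression matrix
set.seed(1)
expr <- matrix(rnorm(1000 * 4), nrow = 1000)

# A simple variance filter; the cutoff is arbitrary
min_var <- 0.5
keep <- apply(expr, 1, var) > min_var

param_log <- log_filter(NULL, "variance", list(min_var = min_var),
                        nrow(expr), sum(keep))
```

The resulting `param_log` is an ordinary data frame, so it can be written to its own worksheet with whatever Excel writer is in use.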

I am interested in what others think of this idea in general. I am not 
just trying to get something for myself. If someone has a better idea of 
how to package analyses with graphics for the end-user, I'd love to hear it.

I would also be interested in feedback from other developers as to what 
they think of my general idea. Is it worth pursuing? Would it be worthy 
of a simple package?

Thanks for your help and hard work, Hans-Peter, and I look forward to 
hearing how things are going,

Mark
Hans-Peter wrote:

  
    
#
Hi Mark,
What we do is:
- assemble all generated files (xls/png/txt) in a specific *folder*
- the pictures currently end in zip files
- Excel currently only holds the data (no images etc.)
- we don't save any workspace or .Rdata. Each calculation starts from
scratch (sometimes with cached data, but this is transparent)
- we have some packages with the most important/stable source code
- other code we source each time before a calculation
- all code is in a subversion repository and therefore has its full history.
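
A folder-per-run layout of this kind can be sketched in a few lines of base R. This is not the actual setup described above: the folder name and file names are invented, and CSV and PDF are swapped in for xls and png so that the sketch runs without extra packages or a screen device.

```r
# One folder per calculation; every output file of the run goes into it
run_dir <- file.path(tempdir(), "analysis-run")
dir.create(run_dir, showWarnings = FALSE)

# Data output (CSV here as a stand-in for an xls file)
results <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
write.csv(results, file.path(run_dir, "results.csv"), row.names = FALSE)

# Graphics output (PDF here as a stand-in for png)
pdf(file.path(run_dir, "plot.pdf"))
plot(results$id, results$value)
dev.off()

# Plain-text notes about the run
writeLines(c("Run started from scratch, no saved workspace",
             paste("R version:", R.version.string)),
           file.path(run_dir, "notes.txt"))

produced <- list.files(run_dir)
```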

This works very nicely for us. It was not so easy, i.e. we needed some
time to find a lean and flexible setup. R still feels difficult for
me.
I use Delphi (Object Pascal). The hard work is done by a 3rd party
library which I bought (Flexcel/tmssoftware.com). You can download the
source of the free xlsReadWrite to see how it was done. (To compile
you would have to buy the flexcel library which - unfortunately - is
not open source and I am not allowed to distribute it.)
No, it's pure Excel format. IIRC there is an ODF Toolkit Project which
could be used to generate the files (maybe odfSweave does this
already?). The ODF Toolkit Project is certainly appealing, not least
because of the license situation.
About (small) original data, methods, parameters and matrix output I
agree. Not so sure about graphs. I prefer to keep things separate.

You can do this already today by controlling Excel from R (see tips
from Gabor) or (in part) with xlsReadWritePro.
2 days later
#
Hi Mark et al.

2007/2/12, Hans-Peter <gchappi at gmail.com>:
I have now uploaded the new version 1.1.0 which runs all my tests
fine. It will be publicly linked at the website as soon as the free
version is also ready and after some more testing and beautifying of
the help text.

Downloads:
- Program: http://treetron.googlepages.com/xlsReadWritePro_1.1.0.zip
- Testscripts (e.g. formula and link):
http://treetron.googlepages.com/xlsReadWritePro_TestData_1.1.0.zip
- Update-msg pro:
http://treetron.googlepages.com/UpdateMsg_xlsReadWritePro_1.1.0.txt
- Update-msg free version:
http://treetron.googlepages.com/UpdateMsg_xlsReadWrite_1.3.0.txt
(draft, I have to finish some details)

Regards,
Hans-Peter


PS: I bcc the email to some people who made suggestions. Unfortunately
the update of the free version took longer than planned. I have to be
more prudent with giving time indications...