Read in from multiple Excel worksheets

4 messages · dM/, Jonathan Daily, Ben Bolker +1 more

dM/
#
I've got an Excel workbook with about 30 worksheets. Each worksheet
has 10,000 rows of data over 30 columns.

I'd like to read the data from each worksheet into a dataframe or
matrix in R for processing. Normally, I use read.csv when interacting
with Excel, but I'd rather manipulate a multi-sheet workbook directly
than set about splitting the original workbook and saving each
part out as a csv.
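[Editor's note: a minimal sketch of the multi-sheet read described above, assuming the xlsx package is installed and the workbook sits in the working directory; loadWorkbook and getSheets are xlsx-package functions used here only to discover the sheet names.]

```r
library(xlsx)

# Open the workbook once and list its worksheet names
wb     <- loadWorkbook("test_file.xlsx")
sheets <- names(getSheets(wb))

# Read every worksheet into a named list of data frames
all_data <- lapply(sheets, function(s)
  read.xlsx("test_file.xlsx", sheetName = s, header = FALSE))
names(all_data) <- sheets
```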

So far, I've tried using read.xlsx from the xlsx package. This works
fine for small test files. For example, suppose I'm trying to read
from the test_file workbook on my desktop. The following code extracts
rows 1 and 2 from the worksheet "johnny".

setwd("C:\\Documents and Settings\\dmenezes\\Desktop")
info <- read.xlsx("test_file.xlsx", sheetName = "johnny",
                  rowIndex = 1:2, header = FALSE)
info

However, when I try to apply this to my real, large workbook, things
go wrong, with the following error message.  Any ideas/workarounds?

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl,  :
  java.lang.OutOfMemoryError: Java heap space
#
Java uses "heap space" when creating new objects. My guess is that
since the default size is 128 Mb iirc, you are reading in an object
larger than this. I don't know the guts of the xlsx package or if
there is a way to increase the heap, but you may get by if you can
divide up your data import.
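[Editor's note: Jon's "divide up your data import" might be sketched as below, reading the sheet in row blocks via read.xlsx's rowIndex argument; the file name, sheet name, and block size are illustrative. One caveat: read.xlsx still parses the whole workbook on the Java side on each call, so this keeps the R-side objects small but may not by itself cure the heap error.]

```r
library(xlsx)

# Read a 10,000-row sheet in blocks of 1,000 rows,
# then stitch the pieces back together
block_starts <- seq(1, 10000, by = 1000)
chunks <- lapply(block_starts, function(start)
  read.xlsx("test_file.xlsx", sheetName = "johnny",
            rowIndex = start:(start + 999), header = FALSE))
big_df <- do.call(rbind, chunks)
```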

HTH,
Jon
On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:

#
dM/ <david.n.menezes <at> gmail.com> writes:
[snip]
Try read.xls from the gdata package?
  Or see http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
for more suggestions on I/O between R and Excel.
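[Editor's note: the gdata suggestion, sketched. read.xls converts the file via Perl rather than Java, so the Java heap limit does not apply; its sheet argument accepts either a name or a position.]

```r
library(gdata)

# sheetNames() lists the worksheets; read.xls pulls one of them in
sheets <- sheetNames("test_file.xlsx")
johnny <- read.xls("test_file.xlsx", sheet = "johnny", header = FALSE)
```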
#
On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:
Try increasing the heap size.  Restart R and:

options(java.parameters = "-Xmx1024m")  # must run before rJava is loaded
library(...)
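[Editor's note: the java.parameters option only takes effect if it is set before rJava initializes the JVM, which happens the first time an rJava-dependent package such as xlsx is loaded. A complete sketch of the fix in a fresh session, with the original file and sheet names assumed:]

```r
# Fresh R session: raise the Java heap to 1 GB *before* loading xlsx
options(java.parameters = "-Xmx1024m")
library(xlsx)

info <- read.xlsx("test_file.xlsx", sheetName = "johnny", header = FALSE)
```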