I?ve got an Excel workbook with about 30 worksheets. Each worksheet
has 10000 rows of data over 30 columns.
I?d like to read the data from each worksheet into a dataframe or
matrix in R for processing. Normally, I use read.csv when interacting
with Excel but I?d rather manipulate a multisheet workbook directly
than set about splitting the original workbook and saving down each
part as a csv.
So far, I?ve tried using read.xlsx from the xlsx package. This works
fine for small test files ? e.g. suppose I?m trying to read from the
test_file workbook on my desktop. The following code extracts rows 1
and 2 from worksheet = ?johnny?.
setwd("C:\\Documents and Settings\\dmenezes\\Desktop")
info<-
read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE)
info
However, when I try to apply this to my real, large workbook, things
go wrong, with the following error message. Any ideas/workarounds?
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, :
java.lang.OutOfMemoryError: Java heap space
Read in from multiple Excel wksheets
4 messages · dM/, Jonathan Daily, Ben Bolker +1 more
Java uses "heap space" when creating new objects. My guess is that since the default size is 128 Mb iirc, you are reading in an object larger than this. I don't know the guts of the xlsx package or if there is a way to increase the heap, but you may get by if you can divide up your data import. HTH, Jon
On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:
I?ve got an Excel workbook with about 30 worksheets. ?Each worksheet
has 10000 rows of data over 30 columns.
I?d like to read the data from each worksheet into a dataframe or
matrix in R for processing. ?Normally, I use read.csv when interacting
with Excel but I?d rather manipulate a multisheet workbook directly
than set about splitting the original workbook and saving down each
part as a csv.
So far, I?ve tried using read.xlsx from the xlsx package. ?This works
fine for small test files ? e.g. suppose I?m trying to read from the
test_file workbook on my desktop. ?The following code extracts rows 1
and 2 from worksheet = ?johnny?.
setwd("C:\\Documents and Settings\\dmenezes\\Desktop")
info<-
read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE)
info
However, when I try to apply this to my real, large workbook, things
go wrong, with the following error message. ?Any ideas/workarounds?
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, ?:
?java.lang.OutOfMemoryError: Java heap space
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
=============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me.
dM/ <david.n.menezes <at> gmail.com> writes:
I?ve got an Excel workbook with about 30 worksheets. Each worksheet has 10000 rows of data over 30 columns. I?d like to read the data from each worksheet into a dataframe or matrix in R for processing. Normally, I use read.csv when interacting with Excel but I?d rather manipulate a multisheet workbook directly than set about splitting the original workbook and saving down each part as a csv.
[snip]
However, when I try to apply this to my real, large workbook, things go wrong, with the following error message. Any ideas/workarounds?
Try read.xls from the gdata package? Or see http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel for more suggestions on I/O between R and Excel.
On Tue, Jun 14, 2011 at 12:06 PM, dM/ <david.n.menezes at gmail.com> wrote:
I?ve got an Excel workbook with about 30 worksheets. ?Each worksheet
has 10000 rows of data over 30 columns.
I?d like to read the data from each worksheet into a dataframe or
matrix in R for processing. ?Normally, I use read.csv when interacting
with Excel but I?d rather manipulate a multisheet workbook directly
than set about splitting the original workbook and saving down each
part as a csv.
So far, I?ve tried using read.xlsx from the xlsx package. ?This works
fine for small test files ? e.g. suppose I?m trying to read from the
test_file workbook on my desktop. ?The following code extracts rows 1
and 2 from worksheet = ?johnny?.
setwd("C:\\Documents and Settings\\dmenezes\\Desktop")
info<-
read.xlsx("test_file.xlsx",sheetName="johnny",rowIndex=1:2,header=FALSE)
info
However, when I try to apply this to my real, large workbook, things
go wrong, with the following error message. ?Any ideas/workarounds?
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, ?:
?java.lang.OutOfMemoryError: Java heap space
Try increasing the heap size. Restart R and: options(java.parameters="-Xmx1024m") library(...)
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com