
Memory issues in R

6 messages · Neotropical bat risk assessments, David Winsemius, Stefan Grosse +3 more

#
How do people deal with R and memory issues?
   I have tried using gc() to see how much memory is used at each step.
   I have scanned Crawley's R Book, all the other R books I have
   available, and the FAQ on-line, but found no real help.
   I am running WinXP Pro (32-bit) with 4 GB RAM.
   One SATA drive pair is in a RAID 0 configuration with 10000 MB
   allocated as virtual memory.
   I do have another machine set up with Ubuntu, but it only has 2 GB RAM
   and I have not been able to get R installed on that system.
   I can run smaller sample data sets without problems and everything
   plots as needed. However, I need to review large data sets.
   I am using the latest R version, 2.9.0 (2009-04-17).
   My data is in CSV format with a header row; it is a big data set with
   1,200,240 rows! E.g. below:
   Dur,TBC,Fmax,Fmin,Fmean,Fc,S1,Sc,
   9.81,0,28.78,24.54,26.49,25.81,48.84,14.78,
   4.79,1838.47,37.21,29.41,31.76,29.52,241.77,62.83,
   4.21,5.42,28.99,26.23,27.53,27.4,76.03,11.44,
   10.69,193.48,30.53,25.4,27.69,25.4,-208.19,26.05,
   15.5,248.18,30.77,24.32,26.57,24.92,-202.76,18.64,
   14.85,217.47,31.25,24.62,26.93,25.56,-88.4,10.32,
   11.86,158.01,33.61,25.24,27.66,25.32,83.32,17.62,
   14.05,229.74,30.65,24.24,26.76,25.24,61.87,14.06,
   8.71,264.02,31.01,25.72,27.56,25.72,253.18,19.2,
   3.91,10.3,25.32,24.02,24.55,24.02,-71.67,16.83,
   16.11,242.21,29.85,24.02,26.07,24.62,79.45,19.11,
   16.81,246.48,28.57,23.05,25.46,23.81,-179.82,15.95,
   16.93,255.09,28.78,23.19,25.75,24.1,-112.21,16.38,
   5.12,107.16,32,29.41,30.46,29.41,134.45,20.88,
   16.7,150.49,27.97,22.92,24.91,23.95,42.96,16.81
   .... etc
   I am getting the following warning/error message:
   Error: cannot allocate vector of size 228.9 Mb
   Complete listing from R console below:
   > library(batcalls)
   Loading required package: ggplot2
   Loading required package: proto
   Loading required package: grid
   Loading required package: reshape
   Loading required package: plyr
   Attaching package: 'ggplot2'
           The following object(s) are masked from package:grid :
            nullGrob
   > gc()
            used (Mb) gc trigger (Mb) max used (Mb)
   Ncells 186251  5.0     407500 10.9   350000  9.4
   Vcells  98245  0.8     786432  6.0   358194  2.8
   > BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
   > gc()
             used (Mb) gc trigger  (Mb) max used  (Mb)
   Ncells  188034  5.1     667722  17.9   378266  10.2
   Vcells 9733249 74.3   20547202 156.8 20535538 156.7
   > attach(BR)
   > library(ggplot2)
   > library(MASS)
   > library(batcalls)
   > BRC<-kde2d(Sc,Fc)
   Error: cannot allocate vector of size 228.9 Mb
   > gc()
              used  (Mb) gc trigger  (Mb)  max used  (Mb)
   Ncells   198547   5.4     667722  17.9    378266  10.2
   Vcells 19339695 147.6  106768803 814.6 124960863 953.4
   >
   Thanks for any insight,
   Bruce
#
On Apr 26, 2009, at 11:20 AM, Neotropical bat risk assessments wrote:

You should read the R-FAQ and the Windows FAQ, as you say you have:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021
On the basis of my Windows experience, this may not be enough
information. (The drive information is fairly irrelevant.)
The R-Win-FAQ suggests:

?Memory
?memory.size    # "for information about memory usage. The limit can
                #  be raised by calling memory.limit"

Although you read the FAQs, have you zeroed in on the relevant
sections? What does memory.size report? And what happens when you run
R "alone" in WinXP and alter the default settings with memory.limit?
Your data set is long, but not particularly wide. Last year I was
getting satisfactory work done on a 990K-row by 50-60 column dataset
within a memory constraint of 4 GB on a different OS. Your constraint
is in the 2.5-3.0 GB area, but your dataframe is only a third of that
size.
So you got the data into memory, and that does not appear to exceed
the capacity of your hardware setup if you address the options offered
above.
It looks like you need to use memory.limit(<some bigger number>).
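
A minimal sketch of what that might look like (memory.size and
memory.limit are Windows-only calls, and the 3000 Mb figure is an
assumed value near the practical 32-bit ceiling, not from the thread):

    > memory.size()              # Mb currently in use by this session
    > memory.size(max = TRUE)    # maximum Mb obtained from the OS so far
    > memory.limit()             # current allocation ceiling, in Mb
    > memory.limit(size = 3000)  # raise the ceiling (assumed value, Mb)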
#
On Sun, 26 Apr 2009 09:20:12 -0600 Neotropical bat risk assessments
<neotropical.bats at gmail.com> wrote:
NBRA> 
NBRA>    How do people deal with R and memory issues?
NBRA>    I have tried using gc() to see how much memory is used at each
NBRA> step. Scanned Crawley R-Book and all other R books I have
NBRA> available and the FAQ on-line but no help really found.
NBRA>    Running WinXP Pro (32 bit) with 4 GB RAM.

There is a limit on Windows; read the FAQ:
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

So either use a (64-bit) Linux system with enough memory, or use
packages or an SQL solution that can deal with huge datasets
(biglm, for example).
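
As a rough sketch of the biglm route, assuming an arbitrary
200,000-row chunk size and an illustrative formula (neither is from
the thread): fit on the first chunk, then feed the rest of the file
through update() so only one chunk is ever in memory at a time.

    library(biglm)
    con <- file("C:/R-Stats/Bat calls/Reduced bats.csv", open = "r")
    invisible(readLines(con, n = 1))           # consume the header row
    cols <- c("Dur","TBC","Fmax","Fmin","Fmean","Fc","S1","Sc")
    chunk <- read.csv(con, header = FALSE, nrows = 200000)
    names(chunk)[1:8] <- cols
    fit <- biglm(Fc ~ Sc, data = chunk)        # assumed formula
    repeat {
      chunk <- try(read.csv(con, header = FALSE, nrows = 200000),
                   silent = TRUE)
      if (inherits(chunk, "try-error")) break  # no rows left
      names(chunk)[1:8] <- cols
      fit <- update(fit, chunk)                # fold in the next chunk
    }
    close(con)
    summary(fit)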

Stefan
#
Neotropical bat risk assessments wrote:
Maybe not the general solution you're looking for, but would you get
reasonable results by either (1) subsampling the data or (2) reading
the data file in chunks and averaging the kernel densities you get
from each chunk? A sketch of option (1) follows.
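
The failure also makes sense: kde2d builds a grid-by-n matrix
internally, and with its default 25-point grid, 25 x 1,200,240 doubles
is almost exactly the 228.9 Mb allocation that failed. A minimal
sketch of the subsampling option, assuming an arbitrary subsample of
100,000 rows:

    library(MASS)
    set.seed(1)                      # make the subsample reproducible
    idx <- sample(nrow(BR), 100000)  # assumed subsample size
    BRC <- kde2d(BR$Sc[idx], BR$Fc[idx])
    contour(BRC)                     # or pass the grid to ggplot2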
1 day later
#
Others may have mentioned this, but you might try loading your data
into a small database like MySQL and then pulling smaller portions of
your data in via a package like RMySQL or RODBC.

One approach might be to split the data file into smaller pieces
outside of R, then read the smaller pieces into R one at a time,
creating aggregations (counts and sums of your data fields) as you
go. From these aggregations you can build an "aggregated" dataset
that is smaller and more pithy, which you can ultimately graph with
ggplot2 or other libraries of your choice; a sketch follows.
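
A minimal sketch of that chunk-and-aggregate idea, assuming the same
file path as above, an arbitrary 200,000-row chunk size, and an
illustrative 1-kHz binning on Fc (the chunks could equally come from
files split outside R, as suggested):

    con <- file("C:/R-Stats/Bat calls/Reduced bats.csv", open = "r")
    invisible(readLines(con, n = 1))           # skip the header row
    cols <- c("Dur","TBC","Fmax","Fmin","Fmean","Fc","S1","Sc")
    pieces <- list()
    repeat {
      chunk <- try(read.csv(con, header = FALSE, nrows = 200000),
                   silent = TRUE)
      if (inherits(chunk, "try-error")) break  # end of file
      names(chunk)[1:8] <- cols
      ## per-chunk counts and sums within 1-kHz Fc bins
      pieces[[length(pieces) + 1]] <-
        aggregate(cbind(n = 1, Sc.sum = chunk$Sc),
                  by = list(Fc.bin = floor(chunk$Fc)), FUN = sum)
    }
    close(con)
    ## collapse the per-chunk summaries into one small data frame
    agg <- do.call(rbind, pieces)
    agg <- aggregate(agg[c("n", "Sc.sum")], by = agg["Fc.bin"], FUN = sum)
    agg$Sc.mean <- agg$Sc.sum / agg$n          # small enough for ggplot2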

-Avram
On Apr 26, 2009, at 8:20 AM, Neotropical bat risk assessments wrote:

#
If by "review" you mean reading in summary information, then sqldf
can do that, using the SQLite database, in two lines of code. You
don't have to install, set up, or define the database at all; sqldf
and the underlying RSQLite will do all that for you. See example 6b
on the home page:

http://code.google.com/p/sqldf/#Example_6._File_Input
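
Modeled on that example, a sketch for this file might look like the
following (the particular summary query is just an illustration): the
file goes straight into a temporary SQLite database on disk, so the
full table never has to fit in R's memory.

    library(sqldf)
    f <- file("C:/R-Stats/Bat calls/Reduced bats.csv")
    summ <- sqldf("select count(*) n, avg(Fc) meanFc, avg(Sc) meanSc from f",
                  dbname = tempfile(),
                  file.format = list(header = TRUE, row.names = FALSE))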

On Sun, Apr 26, 2009 at 11:20 AM, Neotropical bat risk assessments
<neotropical.bats at gmail.com> wrote: