cannot allocate vector of size in merge (PR#765)

6 messages · viktorm@pdf.com, Saikat DebRoy, Viktor Moravetski +3 more

#
Full_Name: Viktor Moravetski
Version: Version 1.2.0 (2000-12-13)
OS: Win-NT 4.0 SP5
Submission from: (NULL) (209.128.81.199)


I've started R (v.1.20) with command:
rgui --vsize 450M --nsize 40M

Then at the command prompt:

> gc()
          used (Mb) gc trigger (Mb)
Ncells  358534  9.6   41943040 1120
Vcells 3469306 26.5   58982400  450
> df <- data.frame(x=1:30000,y=2,z=3)
> merge(df,df)
Error: vector memory exhausted (limit reached?)


In S-Plus it worked fine, no problems.
It looks like R cannot merge data frames with
more than 30K rows. It has enough memory, so what limit
was reached, and what should I do?

Thanks.



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
  viktorm> Full_Name: Viktor Moravetski
  viktorm> Version: Version 1.2.0 (2000-12-13)
  viktorm> OS: Win-NT 4.0 SP5
  viktorm> Submission from: (NULL) (209.128.81.199)


  viktorm> I've started R (v.1.20) with command:
  viktorm> rgui --vsize 450M --nsize 40M

  viktorm> Then at the command prompt:
  >> gc()
  viktorm>           used (Mb) gc trigger (Mb)
  viktorm> Ncells  358534  9.6   41943040 1120
  viktorm> Vcells 3469306 26.5   58982400  450

  >> df <- data.frame(x=1:30000,y=2,z=3)
  >> merge(df,df)
  viktorm> Error: vector memory exhausted (limit reached?)

Do you really need such a large number of Ncells? I think not. Try
starting R without specifying --nsize (and maybe even --vsize). In
1.2.0, R would automatically allocate more memory if the initial value
is not enough. But (as far as I know) it would not decrease the amount
of memory below the initial amount.

  viktorm> In S-Plus it worked fine, no problems.
  viktorm> It looks like R cannot merge data frames with
  viktorm> more than 30K rows. It has enough memory, so what limit
  viktorm> was reached, and what should I do?
#
Hi Saikat,
Yes, I don't need to specify nsize and vsize for version 1.2.0.
But the error is the same even if I start R with the default parameters.
It grows in memory and then gives an error.
See the output below:
          used (Mb) gc trigger (Mb)
Ncells  358534  9.6     597831 16.0
Vcells 3330352 25.5    3736477 28.6
Error: cannot allocate vector of size 3515625 Kb

        
Saikat DebRoy wrote:
--
#
On Thu, 14 Dec 2000, Viktor Moravetski wrote:

            
> Error: cannot allocate vector of size 3515625 Kb
                                        ^^^^^^^^^^!!

This is the problem. A merge of n rows takes n^2 space, because each row
of the first data frame is compared to each row of the second. You are
trying to allocate about 3.5Gb, which is almost certainly more memory than
you have: (30000^2) * 4 bytes = 3515625 Kb.
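The arithmetic can be checked directly (a sketch; the 4 bytes per cell assumes integer storage):

```r
## The intermediate object merge() tries to build compares every row of
## the first data frame with every row of the second: n^2 cells.
n <- 30000
cells <- n^2          # 9e8 pairwise comparisons
bytes <- cells * 4    # 4-byte integer cells
kb <- bytes / 1024
kb                    # 3515625 -- exactly the size in the error message
```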

	-thomas
Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

#
Viktor Moravetski <viktorm@pdf.com> writes:
I think you're just experiencing the fact that the merge() function in
R is not implemented very efficiently (to say the least). For many
practical purposes, this can be worked around using e.g. match().

Contributions are welcome...
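[A sketch of the match()-based workaround, using a hypothetical single unique key column; merge() in the original example joined on all columns:]

```r
## match() locates each key of x in y directly, so the intermediate
## storage is O(n) rather than the O(n^2) comparison matrix that
## merge() builds.
x <- data.frame(key = 1:30000, a = 2)
y <- data.frame(key = 1:30000, b = 3)

idx <- match(x$key, y$key)        # row of y matching each row of x
joined <- cbind(x, b = y$b[idx])  # append the matching y column
nrow(joined)                      # 30000 rows, no quadratic blow-up
```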
#
On Fri, 15 Dec 2000 viktorm@pdf.com wrote:

            
Um.  What's `v.1.20' and where did you get it from?  In particular, how did
you compile it, and which run-time are you using?  You clearly have not read
the documentation on the command-line flags for version 1.2.0, or even the
top item in NEWS.
How do you know it has enough memory, when it has just told you it has not?
I think you are using an unreleased version and not reading the
documentation on the changes.  The CHANGES file says

  New command-line option --max-mem-size to set the maximum memory
  allocation: it has a minimum allowed value of 10M.  This is intended
  to catch attempts to allocate excessive amounts of memory which may
  cause other processes to run out of resources.  The default is the
  smaller of the amount of physical RAM in the machine and 256Mb.

and NEWS says (first item)

    o   There is a new memory management system using a generational
        garbage collector.  This improves performance, sometimes
        marginally but sometimes by double or more.  The workspace is
        no longer statically sized and both the vector heap and the
        number of nodes can grow as needed.  (They can shrink again,
        but never below the initially allocated sizes.)  See ?Memory
        for a longer description, including the new command-line
        options to manage the settings.


Beyond that, R's merge uses a flexible but memory-intensive algorithm. If
you want to do merges on this scale we recommend that you use one of the
RDBMS interfaces to a tool optimized for the job.
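[In present-day R this advice maps onto the DBI interface; a minimal sketch assuming the RSQLite package, which did not exist at the time of this thread, is installed:]

```r
library(DBI)

## Let the database do the equi-join; a join in SQLite does not need
## anything like the n^2 working memory of R's merge().
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "x", data.frame(key = 1:30000, a = 2))
dbWriteTable(con, "y", data.frame(key = 1:30000, b = 3))
res <- dbGetQuery(con,
  "SELECT x.key, x.a, y.b FROM x JOIN y ON x.key = y.key")
dbDisconnect(con)
nrow(res)   # 30000
```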


I really don't understand why you filed a bug report about your own failure
to read the documentation: please see the section on reporting bugs in the
FAQ.