Defragmentation of memory
On Mon, 5 Sep 2016, Måns Magnusson wrote:
Dear all developers,

I'm working with a lot of textual data in R and need to process it batch by batch. I read in batches of 10,000 documents and do some calculations on each batch (computing unigram, 2-gram, and 3-gram counts) that produce objects consuming quite a lot of memory. Each iteration creates a new object of roughly 500 MB (I can't control the size, so a new object must be created every iteration). The computation slows down with each iteration: the first iteration takes 7 seconds, but after 30 iterations each one takes 20-30 minutes. I think I have localized the problem to R's memory handling: my approach seems to be fragmenting the memory. If I drive the batch processing from Bash instead, starting a new R session for each batch, every batch takes ~7 seconds, so the problem is not with the individual batches. The garbage collector does not seem to handle this (potential) fragmentation.

Can memory fragmentation explain the poor performance after a few iterations? If so, is there a way to handle this within R, such as defragmenting memory or restarting R from within R?
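The Bash-driven workaround the poster describes can be sketched roughly as follows. The per-batch script name `process_batch.R` and the batch/output file names are hypothetical; the point is that each batch runs in a fresh R process, so all of its memory is returned to the OS when `Rscript` exits. The sketch below only prints the commands as a dry run rather than executing them:

```shell
#!/bin/sh
# Hypothetical driver: one independent R session per batch of 10,000
# documents, so no allocations carry over between iterations.
cmds=""
for i in $(seq 1 30); do
    # process_batch.R would read one batch file and write its n-gram counts.
    cmds="${cmds}Rscript process_batch.R batch_${i}.rds ngrams_${i}.rds
"
done
# Dry run: print the 30 commands instead of running them.
printf '%s' "$cmds"
```

Replacing the `printf` dry run with `eval` or a plain command in the loop body would actually launch the R sessions, at the cost of paying R's startup time once per batch.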
Highly unlikely. Fragmentation is rarely an issue on a 64-bit OS, and the symptoms would be different. To get help with what is actually happening, please post a minimal reproducible example, and please not in HTML. Best, luke
With kind regards, Måns Magnusson, PhD Student, Statistics, Linköping University.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
Email: luke-tierney at uiowa.edu
WWW: http://www.stat.uiowa.edu