Determining the maximum memory usage of a function

2 messages · Jonathan Greenberg, Ramon Diaz-Uriarte

Folks:

I apologize for the cross-posting between r-help and r-sig-hpc, but I
figured this question was relevant to both lists.  I'm writing a
function to be applied to an input dataset that will be broken up into
chunks for memory management reasons and for parallel execution.  I am
trying to determine, for a given function, its *maximum* memory usage
during execution (the peak may occur not at the beginning or the end of
the function, but somewhere in the middle), so I can "plan" the chunk
size (e.g. build a table of chunk size vs. max memory usage).

Is there a trick for determining this?

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
Dear Jonathan,

You mention parallel execution, so I assume that you want to find out the
max memory consumed by the sum of all of your R processes. One option
would be to call gc() at the end of each of your processes: gc() reports
the maximum memory used, so you can return those values and add them. I am
not sure this will work well if you use forking, though: since many pages
may be shared between processes, the sum can overestimate the real memory
consumed by a large margin.
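A minimal sketch of that per-process approach (assuming Rscript is on the
PATH; the allocation is just a stand-in workload):

```shell
Rscript --vanilla -e '
  invisible(gc(reset = TRUE))   # reset the "max used" high-water marks
  x <- rnorm(1e6); rm(x)        # stand-in workload (~8 MB of doubles)
  print(gc())                   # the "max used" column holds the peak since reset
'
```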


What I do instead is measure from the shell. Since summing the memory used
by different processes without double counting (i.e., properly accounting
for shared memory pages, dynamically loaded code, etc.) is not easy, I do
it the other way around: I assume (well, I try to ensure) that no other
processes will start running and consuming a lot of memory (or, conversely,
quit and free a lot of memory) while the ones I am benchmarking run. Then I
periodically record the total amount of free memory (as reported by free,
adding buffers and cached) and, at the end, subtract the minimum from the
maximum (one might as well subtract the minimum amount of free memory from
the initial amount).



So there are two main pieces:

a) Launching an infinite loop that periodically calls "free" and stores
that somewhere (in this case, the file "free.RAM.txt").  Something like

while true; do free | grep 'buffers/cache' | awk '{print $4}' >> free.RAM.txt; sleep 0.5; done &


b) Killing that as soon as your R process is done, and getting the max,
the min, and the difference (or the min, the initial, and the difference,
which should be fairly similar). I do that from the shell too, calling
R. For instance:


Rscript --vanilla -e 'tmp <- scan("free.RAM.txt"); usage <- (max(tmp) - min(tmp))/(1024^2); cat(usage)'
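If R is not available for this post-processing step, the same
max-minus-min computation can be sketched in awk alone. The sample values
and the file name sample.free.txt are hypothetical stand-ins for the
contents of free.RAM.txt (free memory in kB, one sample per line):

```shell
# Hypothetical sample data standing in for free.RAM.txt (free kB, one per line)
printf '%s\n' 8000000 7400000 6900000 7100000 > sample.free.txt

# (max - min) / 1024^2 = drop in free memory, in GB
awk 'NR==1{max=min=$1} {if($1>max)max=$1; if($1<min)min=$1}
     END{printf "%.3f\n", (max-min)/(1024^2)}' sample.free.txt
```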



Here is a shell script I use that puts the above together. It takes as
input the name of the R script whose memory usage I want to measure (and
produces some output at the end).



#############################

RBIN=~/mysources/R-3.0.1-B/bin/R ## wherever the R you are timing lives

SCRIPT=$1
POST=$(date +"%H-%M_%m-%d-%Y")

rm -f free.RAM.txt  ## -f so the first run does not fail when the file is absent

## Decrease the value after sleep if you are concerned about
## missing a peak

while true; do free | grep 'buffers/cache' | awk '{print $4}' >> free.RAM.txt; sleep 0.5; done &
FREE_RAM_PID=$!

$RBIN --vanilla < $SCRIPT > $SCRIPT.$POST.Rout 

kill $FREE_RAM_PID

TOT_RAM_USAGE=$(Rscript --vanilla -e 'tmp <- scan("free.RAM.txt"); usage <- (max(tmp) - min(tmp))/(1024^2); cat(usage)')
mv free.RAM.txt free.RAM.txt.$SCRIPT.$POST

echo
echo
echo $TOT_RAM_USAGE

echo "Total RAM usage = " $TOT_RAM_USAGE >> $SCRIPT.$POST.summary

#########################

Best,

R.


P.S. I've removed r-help from the addresses to avoid cross-posting.
On Thu, 20 Jun 2013 09:45:39 -0500, Jonathan Greenberg <jgrn at illinois.edu> wrote: