[Bioc-devel] [bithelp] Memory issues with BiocParallel::SnowParam()

Hi Valerie,

I met with our cluster admin (Mark) today to clarify some things. He
will give a better reply about mem_free and h_vmem in our cluster. The
short answer is that we use both, with the recommendation to set h_vmem
1 GB higher than mem_free. And I believe that we don't have
cgroups enabled. Also check Harris' off-list message.


The data from the tests also show the difference between
MulticoreParam() and SnowParam(). Previously, I was told that memory
use with MulticoreParam() should be as low as with SnowParam(), and
Martin hinted
that the tools used for memory reporting in my cluster might be
misleading me (see https://support.bioconductor.org/p/62551/#62880).
So I ran the generic example test with 10 times more data but the same
number of cores and Mark inspected the system manually while the jobs
were running. In more detail, he ssh'ed into the nodes running the
jobs and looked at the output from "top" among other things. I added
his comments to the repo and annotated them at
https://github.com/lcolladotor/SnowParam-memory/commit/714f8ef7bb9c2295e9f34cf137853426b6dccdd9

The logs and updated email info are available on the longerTest branch
https://github.com/lcolladotor/SnowParam-memory/tree/longerTest which
I'll likely merge onto the gh-pages branch unless you prefer to keep
them separate.


I've been thinking about your "Can you output the actual used?"
question, and the short answer is that the maximum vmem reported on
the emails is the maximum used memory, not reserved. This is what Mark
told me. However, we have a "qmem" bash script available in our
cluster which runs "qstat" and parses the output. One such line looks
like this:

6516139 lcollado node=060 vmem=4.4G, maxvmem=6.7G elapsed=01:23:14 serial-3.1.x

So we can see the memory under use at that moment, and the maximum
memory used so far.  I ran a 6th set of tests (now with both examples,
unlike the 5th one, which used only the generic example) and recorded
the memory use. I recorded this information at 2-second intervals and made
a few plots which are available at
http://lcolladotor.github.io/SnowParam-memory/#Longer_tests
MulticoreParam() and SnowParam() use similar amounts of memory most of
the time, except for a huge peak at the beginning. We can also
see the differences between R 3.1.x and 3.2.x.
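
For anyone who wants to repeat the recording, here is a minimal sketch (in
Python, not the script I actually used) of how a line like the qmem one
above could be parsed into numbers for plotting. The function name
parse_qmem_line, the field names, and the assumption that sizes always
carry a K, M or G suffix are illustrative guesses based only on that one
example line; qmem is specific to our cluster, so the format may differ
elsewhere.

```python
import re

# Pattern for one line shaped like the qmem example quoted in this email.
# Assumption: fields are job id, user, node, vmem, maxvmem, elapsed, job name.
LINE_RE = re.compile(
    r"^(?P<job_id>\d+)\s+(?P<user>\S+)\s+node=(?P<node>\S+)\s+"
    r"vmem=(?P<vmem>[\d.]+)(?P<vmem_unit>[KMG]),?\s+"
    r"maxvmem=(?P<maxvmem>[\d.]+)(?P<maxvmem_unit>[KMG]),?\s+"
    r"elapsed=(?P<elapsed>\d+:\d{2}:\d{2})\s+(?P<name>\S+)$"
)

# Assumption: K/M/G are binary multiples; everything is converted to GB.
UNIT_TO_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}


def parse_qmem_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hours, minutes, seconds = (int(x) for x in m.group("elapsed").split(":"))
    return {
        "job_id": m.group("job_id"),
        "user": m.group("user"),
        "node": m.group("node"),
        # Memory in use at the moment qstat was queried.
        "vmem_gb": float(m.group("vmem")) * UNIT_TO_GB[m.group("vmem_unit")],
        # Maximum memory used so far by the job.
        "maxvmem_gb": float(m.group("maxvmem"))
        * UNIT_TO_GB[m.group("maxvmem_unit")],
        "elapsed_s": hours * 3600 + minutes * 60 + seconds,
        "name": m.group("name"),
    }


example = (
    "6516139 lcollado node=060 vmem=4.4G, maxvmem=6.7G "
    "elapsed=01:23:14 serial-3.1.x"
)
record = parse_qmem_line(example)
```

Calling this on each line captured every 2 seconds would give one
(elapsed_s, vmem_gb, maxvmem_gb) point per job per sample, which is
enough to draw memory-over-time curves per job.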

Hopefully, this detailed info will answer your question and give you
more tools to work with.


Best,
Leo



On Wed, Jul 15, 2015 at 11:56 PM, Valerie Obenchain
<vobencha at fredhutch.org> wrote:
Ok!
memory used by the job.
Our cluster admin group has encouraged us to specify both mem_free and
h_vmem (normally just a bit higher than mem_free).
our cluster.

Mark told me that our cluster uses Open Grid Engine. In particular,
OGS/Grid Engine 2011.11.
=)