Calls to Boost interprocess / big.matrix

Ritchie,

It sounds like you have already tested the code on an Ubuntu cluster and
see the behavior you expect there: faster runtimes with an increasing
number of cores, and so on (as opposed to what you are seeing on the
RedHat cluster)?

One caveat: foreach with doMC can leverage shared memory, but it is designed
for a single node of a cluster (as you probably know, doSNOW would be more
elegant for distributing jobs across a cluster, but may not always be
possible).  A memory-mapped file provides a means of "sharing" a single
object across nodes, and is kind of like "poor man's shared memory".  It
sounds like you are using a job submission system to distribute the work,
and then foreach/doMC within nodes.  This is fine and will work with
bigmemory/foreach/doMC.
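
To make the "poor man's shared memory" idea concrete, here is a minimal
sketch in Python rather than R.  It is not how bigmemory is implemented and
uses none of its API; it only illustrates the mechanism.  Several separate
processes map the same backing file and each updates its own element in
place.  The names WORKER and run_demo, the element layout (native doubles),
and the temporary file are all assumptions made for the illustration.

```python
import mmap
import os
import struct
import subprocess
import sys
import tempfile

# Hypothetical worker script, run in a separate process: it maps the
# backing file and doubles element i in place (8 bytes per double).
WORKER = """
import mmap, struct, sys
path, i = sys.argv[1], int(sys.argv[2])
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
    (x,) = struct.unpack_from("d", mm, i * 8)
    struct.pack_into("d", mm, i * 8, 2.0 * x)
"""

def run_demo(n=4):
    """Create a file of n doubles (1.0 .. n), let n processes update it."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack(f"{n}d", *[float(i + 1) for i in range(n)]))
        # One process per element; each touches a distinct slot, so no
        # locking is needed in this toy case.
        procs = [
            subprocess.Popen([sys.executable, "-c", WORKER, path, str(i)])
            for i in range(n)
        ]
        for p in procs:
            if p.wait() != 0:
                raise RuntimeError("worker process failed")
        with open(path, "rb") as f:
            return list(struct.unpack(f"{n}d", f.read()))
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(run_demo())  # → [2.0, 4.0, 6.0, 8.0]
```

Because every worker writes to a different element, there is no race here.
If workers could touch the same element, some form of locking would be
needed.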

But be careful in your testing to distinguish performance using cores on
a single node from performance on a cluster with multiple nodes.

As for the slowdown itself, here's some speculation: it may have to do with
the filesystem.  In early testing, we tried the "newest and greatest"
high-performance parallel filesystem on one of our clusters (I don't even
remember the specific details).  Performance plummeted.  The reason was that
the mmap driver implemented for the filesystem was obsessed with maintaining
coherency.  Imagine: one node does some work and changes something; that
change needs to be reflected in the memory-mapped file and then in RAM on
any other machines that have cached that element.  It's pretty
darn important (and a reason to consider a locking strategy via package
synchronicity if you run concurrency risks in your algorithm).  In any
event, we think that the OS was checking coherency even upon _reads_ and
not just _writes_.  Huge traffic jams and extra work.

To help solve the puzzle, we used an old-school NFS partition on the same
machine, and were back up to full speed in no time.  You might give that a
try if possible.
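
If it helps with that comparison, here is a rough sketch (again in Python,
purely as an illustration and not part of bigmemory) of a probe that times
read-only access through a memory map for a scratch file created in a given
directory.  The function name time_mmap_reads, the 8 MB file size, and the
4096-byte stride are assumptions.  A freshly written file will usually sit
in the page cache, so on a local disk this mostly measures mapping overhead.
The point is only to see whether one filesystem is dramatically slower than
another for mapped reads.

```python
import mmap
import os
import sys
import tempfile
import time

def time_mmap_reads(directory, size_bytes=8 * 1024 * 1024, stride=4096):
    """Touch one byte per stride of a mapped scratch file in `directory`.

    Returns (touched, elapsed_seconds); `touched` is the number of bytes
    read, since the file is filled with bytes of value 1.
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".mmaptest")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x01" * size_bytes)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                start = time.perf_counter()
                touched = 0
                for off in range(0, size_bytes, stride):
                    touched += mm[off]  # read-only access through the map
                elapsed = time.perf_counter() - start
        return touched, elapsed
    finally:
        os.remove(path)

if __name__ == "__main__":
    # Usage: python probe.py /path/on/parallel/fs /path/on/nfs
    for d in sys.argv[1:]:
        touched, elapsed = time_mmap_reads(d)
        print(f"{d}: touched {touched} pages in {elapsed:.4f} s")
```

Running it from several nodes at once against the same directory would be
closer to the situation described above, since the coherency traffic only
shows up when more than one machine has the file mapped.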

Jay