Unreproducable crashes of R-instances on cluster running Torque

On Mon, May 13, 2013 at 4:08 AM, Rainer M. Krug <Rainer at krugs.de> wrote:
Torque is a batch system.  The underlying OS (typically Linux) is
responsible for memory management.
Some clusters do have something in place to try to do this, but it is
not a simple task to implement well since Torque is not really
"responsible" for memory management once a job is running.
You will probably need to talk to your cluster admins, but on our
cluster, I simply log in to a node and run "top".  Other clusters have
dedicated monitoring tools.  Finally, some clusters have configured a
job postscript that reports on job resource usage.  All of these
issues are best dealt with by talking with cluster administrators,
since each cluster (even one running Torque) is unique in some
ways.
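
If logging in to a node is allowed on your cluster, a rough scripted
alternative to watching "top" is to read the resident memory of a
process from /proc.  The following is only a sketch under assumptions:
it assumes a Linux node, the function names are made up for
illustration, and it is not something Torque itself provides.

```python
import os


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) found in the text of a
    /proc/<pid>/status file, or None if there is no VmRSS line."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def resident_memory_kib(pid):
    """Read /proc/<pid>/status (Linux only) and return the resident
    memory of that process in KiB."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kib(fh.read())


if __name__ == "__main__":
    # Example: report the memory of this Python process itself.
    # On a compute node you would pass the PID of the R process instead.
    print(resident_memory_kib(os.getpid()))
```

Calling this periodically and logging the result gives a crude memory
trace for a job, but your admins may already have a better tool.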

Sean