
Unreproducible crashes of R instances on cluster running Torque

Dear Sean,
Thanks for your suggestions in spite of my obscure descriptions. I'll try
to clarify some points:
I get things like
	Error: cannot allocate vector of size 304.6 Mb
However, the jobs are started with the Torque option
	#PBS -l mem=3gb
When I submit this job alone, everything works like a charm, so 3 GB seems
to suffice, right? With 20 or more jobs running, I get the memory error. I
assumed Torque would only start a job once the requested resources are
available; is that a misconception?
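One thing that might help narrow this down (a diagnostic sketch, assuming the nodes run Linux so /proc/meminfo exists): have each job log the node's memory state and R's own usage right at startup, so the situation a lone job sees can be compared with what 20 concurrent jobs see.

```r
## Diagnostic sketch: log node and R memory state at job start.
## Assumes Linux; /proc/meminfo is not available on other platforms.
meminfo <- readLines("/proc/meminfo", n = 3)   # MemTotal, MemFree, ...
cat(meminfo, sep = "\n")

g <- gc()                                      # 2-row matrix: Ncells, Vcells
cat("R memory currently in use (Mb):", sum(g[, 2]), "\n")
```

If the free-memory figure is already low before any data are loaded, the node is oversubscribed regardless of what the scheduler promised.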
I cannot tell more precisely; the admin just told me he had to reboot this  
node. Before that, the entire queue-handling of Torque seemed to have come  
to a halt.
If being frugal means removing unused objects and preallocating matrices,
I've tried my best. Adding some calls to gc() seemed to improve the
situation only slightly.
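For reference, the frugal pattern I have in mind looks roughly like this (a minimal sketch, not the actual analysis code):

```r
## Minimal sketch of the frugal pattern: preallocate, then release.
n <- 1e6
x <- numeric(n)                  # one allocation up front, no growing
for (i in seq_len(n)) x[i] <- i * 0.5

## ... use x ...

rm(x)                            # drop the reference
invisible(gc())                  # ask R to return freed pages to the OS
```

Growing a vector inside the loop instead would repeatedly copy it, transiently holding both old and new copies in memory.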
For the memory issue, the message above is thrown. Other jobs simply
terminate without any further output, just after having read some large
input files.
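If those large inputs happen to be read with read.table(), one mitigation I'm aware of is declaring colClasses up front, since type guessing can transiently need a multiple of the file's size (a sketch with a hypothetical two-column stand-in for the real input):

```r
## Sketch: declare column types so read.table() need not guess them.
## The temporary file is a hypothetical stand-in for the real input.
tf <- tempfile()
writeLines(c("id value", "a 1.5", "b 2.5"), tf)

dat <- read.table(tf, header = TRUE,
                  colClasses = c("character", "numeric"),
                  comment.char = "")        # also skip comment scanning
```

Supplying nrows as well (when the row count is known) lets read.table() allocate the result in one go.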
I agree that this is unlikely to be an R memory leak; however, I am trying
to find out what I can still do on my side, or whether I can point the
admin at a Torque configuration problem, which is what I suspect.
Has anyone observed similar behaviour and knows a fix?

Thanks in advance,
Till


R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices datasets  stats     utils     methods   base

other attached packages:
[1] Rmpi_0.5-9