I ran some tests on my Linux machine, and I was able to
undo a CPU affinity restriction from within an R session using:
> system(sprintf('taskset -p 0xffffffff %d', Sys.getpid()))
I was also able to start an unrestricted R session from a
restricted shell session using:
$ taskset 0xffffffff R
or
$ numactl -C 0-3 R
I don't know how your R sessions are becoming restricted, so
I can't say whether this will work, but it's worth a try.
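Before widening the mask, it can help to confirm that a shell really is restricted. Here is a minimal sketch, assuming GNU coreutils `nproc` and `env`. `nproc` honours the calling process's affinity mask, while `nproc --all` counts every installed processor. A smaller first number therefore means the shell, and anything started from it (including R), is pinned. The helper name `affinity_report` is made up for illustration.

```shell
#!/usr/bin/env bash
# Compare the CPUs this shell may use with the CPUs the machine has.
# nproc respects the affinity mask (and also OMP_NUM_THREADS / OMP_THREAD_LIMIT,
# which are removed here with env -u); nproc --all ignores the mask.
affinity_report() {
  local usable total
  usable=$(env -u OMP_NUM_THREADS -u OMP_THREAD_LIMIT nproc)
  total=$(nproc --all)
  echo "usable=${usable} total=${total}"
  if (( usable < total )); then
    # Suggest a taskset call that names every CPU explicitly.
    echo "restricted: try  taskset -c 0-$(( total - 1 )) R"
  fi
}

affinity_report
```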
- Steve
On Thu, Nov 24, 2011 at 8:32 AM, Claudia Beleites
<claudia.beleites at ipht-jena.de> wrote:
This is really a shot in the dark, but you could try executing:
clusterEvalQ(cl, readLines(sprintf('/proc/%d/status', Sys.getpid())))
and look for the lines that mention "Cpus_allowed". It's
conceivable that your snow workers have been restricted to
execute on a subset of the node's cores. But that seems
rather unlikely since you're using a socket cluster.
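The same check can also be run from a shell on the node, without going through snow. This is a minimal bash sketch. The helper name `show_affinity` is invented, and `pgrep` is assumed to be installed. For a single process, `taskset -p PID` reports the affinity as a hex mask and `taskset -cp PID` reports it as a CPU list.

```shell
#!/usr/bin/env bash
# Print the Cpus_allowed line(s) of /proc/<pid>/status for one process.
# Newer kernels add Cpus_allowed_list (CPU ranges); the CentOS output quoted
# in this thread shows only the hex mask.
show_affinity() {
  local pid=${1:-$$}
  grep '^Cpus_allowed' "/proc/${pid}/status"
}

# Check every process whose command name is exactly "R" (the master session
# and, presumably, the snow workers). Does nothing if pgrep is missing.
for pid in $(pgrep -x R 2>/dev/null); do
  echo "PID ${pid}:"
  show_affinity "$pid"
done
```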
It's getting even weirder: I don't seem to be able to reproduce the
behaviour...
Twice (but not reproducibly) I was able to get it working the way I want:
I opened a terminal (xfce4-term) on my desktop, logged into the server with
ssh -X -C claudia@172.17.42.86
and started an R session there.
All worked well: 2 workers using 6 cores each for the multiplication.
proc status output:
cat (readLines(sprintf('/proc/%d/status', Sys.getpid())), sep = "\n")
Name: R
State: R (running)
SleepAVG: 98%
Tgid: 31571
Pid: 31571
PPid: 7664
TracerPid: 0
Uid: 508 508 508 508
Gid: 509 509 509 509
FDSize: 256
Groups: 509
VmPeak: 560904 kB
VmSize: 560904 kB
VmLck: 0 kB
VmHWM: 276500 kB
VmRSS: 276496 kB
VmData: 405528 kB
VmStk: 140 kB
VmExe: 2856 kB
VmLib: 18248 kB
VmPTE: 916 kB
StaBrk: 19587000 kB
Brk: 22682000 kB
StaStk: 7fffabd50130 kB
Threads: 6
SigQ: 1/79872
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180001e4a
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00ffffff
Mems_allowed: 00000000,00000001
For the other sessions, started via any of
* terminal -> ssh -> R
* terminal -> emacs -> ess -> R
* terminal -> ssh -> xfce4-panel -> terminal -> R
* terminal -> ssh -> xfce4-panel -> emacs -> ess -> R
the results vary. Sometimes the CPU graph applet shows 2 cores working in
parallel. Sometimes it shows only one core busy, and the snow timing plot
shows that both workers ran at the same time but each took twice as long as
the system.time of the matrix multiplication.
The proc status is different for those:
cat (readLines(sprintf('/proc/%d/status', Sys.getpid())), sep = "\n")
Name: R
State: R (running)
SleepAVG: 98%
Tgid: 2983
Pid: 2983
PPid: 2956
TracerPid: 0
Uid: 508 508 508 508
Gid: 509 509 509 509
FDSize: 256
Groups: 509
VmPeak: 560768 kB
VmSize: 545148 kB
VmLck: 0 kB
VmHWM: 342960 kB
VmRSS: 327336 kB
VmData: 389768 kB
VmStk: 144 kB
VmExe: 2856 kB
VmLib: 18248 kB
VmPTE: 1012 kB
StaBrk: 121ab000 kB
Brk: 1a342000 kB
StaStk: 7fffa9d4aa10 kB
Threads: 6
SigQ: 2/79872
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000004
SigCgt: 0000000180001e4a
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed: 00000000,00000001
So the last word of Cpus_allowed is 00000001 instead of 00ffffff.
What exactly does that tell me? The man page was not particularly
enlightening...
How can I change that restriction?
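Cpus_allowed is a hexadecimal bitmask. It is written as comma-separated 32-bit words, most significant word first. If bit k is set, the process may run on CPU k.

So `00ffffff` allows CPUs 0-23. That is 24 logical CPUs; on a 2 x 6 core machine this suggests hyperthreading is enabled, though that is an inference. By contrast, `00000001` allows CPU 0 only. According to sched_setaffinity(2), children inherit the mask across fork and exec. Every thread of such a session, and every snow worker it starts, would therefore share one core. That would fit the "twice as long" observation above.

`taskset -cp 0-23 PID` widens the mask of a running process. If a cgroup cpuset is in force, the kernel limits the result to that cpuset's CPUs. This bash decoder is a sketch; the name `decode_cpumask` is invented.

```shell
#!/usr/bin/env bash
# Decode a Cpus_allowed hex mask (e.g. "00000000,00ffffff") into CPU numbers.
# Bit k, counted from the rightmost hex digit, set means CPU k is allowed.
decode_cpumask() {
  local mask=${1//,/}          # drop the commas between 32-bit words
  local cpus=() cpu=0 i digit bit
  # Walk hex digits from least significant (right) to most significant (left).
  for (( i = ${#mask} - 1; i >= 0; i-- )); do
    digit=$(( 16#${mask:i:1} ))
    for (( bit = 0; bit < 4; bit++ )); do
      if (( (digit >> bit) & 1 )); then
        cpus+=("$cpu")
      fi
      cpu=$(( cpu + 1 ))
    done
  done
  echo "${cpus[*]}"
}

decode_cpumask "00000000,00000001"   # prints: 0
decode_cpumask "00000000,00ffffff"   # prints 0 through 23, space-separated
```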
Thanks a lot for your help,
Claudia
PS: I have to leave soon for a seminar over the weekend, so I won't be able
to try out things again before Monday.
- Steve
On Wed, Nov 23, 2011 at 11:09 AM, Claudia Beleites
<claudia.beleites at ipht-jena.de> wrote:
You don't say how you set GOTO_NUM_THREADS to 6.
Sorry, I forgot to tell you all:
I used
export GOTO_NUM_THREADS=6
in the shell before starting R,
and I checked with
clusterEvalQ(cl, system ("echo $GOTO_NUM_THREADS"))
which gave me 6 for both workers.
So does:
clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
[[1]]
[1] "6"
[[2]]
[1] "6"
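Checking inside the workers is the right test, because only exported variables are copied into the environment of child processes. This assumes snow starts its localhost socket workers as children of the master R session, which the identical results above suggest. Under that assumption, a variable exported in the shell before starting R reaches the workers, whereas a plain shell assignment would not. A minimal bash demonstration:

```shell
#!/usr/bin/env bash
# Only exported variables are copied into the environment of child processes.
unset GOTO_NUM_THREADS

GOTO_NUM_THREADS=6        # plain shell variable, not exported yet
sh -c 'echo "child sees: [${GOTO_NUM_THREADS}]"'   # child sees: []

export GOTO_NUM_THREADS   # now part of the environment
sh -c 'echo "child sees: [${GOTO_NUM_THREADS}]"'   # child sees: [6]
```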
I did not know the Sys.getenv/Sys.setenv functions, though.
Thanks,
Claudia
You might want to verify that it did get set in each of the snow worker
processes by using the command:
clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
If it returns any empty strings in the resulting list, then the
environment variable is not set in the corresponding worker.
You probably should set this variable through an appropriate
shell startup file, but you could at least temporarily use:
clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS=6))
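One caveat on Sys.setenv(): a variable set that way is visible to R and to processes started afterwards. A BLAS that reads its thread count only when the library is loaded would not notice the change, though. I believe GotoBLAS treats GOTO_NUM_THREADS that way, but that is an assumption. Setting the variable before the workers start is therefore the safer route.

To see what a running worker was actually started with, read /proc/<pid>/environ from a shell (same user only). Per proc(5), this file is not updated when the process later modifies its environment. This is a sketch; `env_of_pid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print NAME=value for one variable from another process's start-up environment.
# /proc/<pid>/environ is NUL-separated and is not updated by later setenv() calls.
env_of_pid() {
  local pid=$1 name=$2
  tr '\0' '\n' < "/proc/${pid}/environ" | grep "^${name}="
}
```

Usage: `env_of_pid <worker-pid> GOTO_NUM_THREADS` prints the assignment, or prints nothing and returns non-zero if the variable was not in the start-up environment.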
- Steve
On Wed, Nov 23, 2011 at 9:44 AM, Claudia Beleites
<claudia.beleites at ipht-jena.de> wrote:
Dear all,
I'm just taking my first steps with parallelized calculations and got quite
confused.
Here's what I want, what I have and what I did:
- I want to parallelize calculations on a CentOS server with 2 x 6 cores and
8 GB RAM (it is actually part of a cluster, but I have access only to this
node, and the other nodes do not (yet) have R installed).
- My data is too large to work with in one piece, but it comes in separate
files of suitable size: I can work nicely with 2 to 3 samples in memory at
the same time.
- So my idea was to start up a snow socket cluster with 2 or 3 workers.
- In addition, I want to use an optimized BLAS. Linear algebra is only a
small part of the analysis, so it makes sense to have the socket cluster
with as many workers as possible and have the linear algebra parts use up
to n / nworkers cores.
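The "n / nworkers" rule is plain integer division. With the 2 x 6 = 12 physical cores of this node, it gives 6 threads for 2 workers and 4 for 3. This bash sketch uses an invented function name.

The physical core count is passed explicitly on purpose. The Cpus_allowed mask quoted above (00ffffff) lists 24 CPUs, which suggests the OS sees hyperthreads, so `nproc` would likely report 24 rather than 12.

```shell
#!/usr/bin/env bash
# Split a fixed number of cores evenly between snow workers (integer division).
threads_per_worker() {
  local cores=$1 workers=$2
  local n=$(( cores / workers ))
  (( n >= 1 )) || n=1                 # never ask for zero threads
  echo "$n"
}

threads_per_worker 12 2   # 6
threads_per_worker 12 3   # 4

# Hypothetical use before starting R (12 = physical cores on this node):
# export GOTO_NUM_THREADS=$(threads_per_worker 12 2)
```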
So I built R 2.14.0 using GotoBLAS2 and set $GOTO_NUM_THREADS to 6. Matrix
multiplication in a fresh R session is now much faster, and CPU usage shows
the expected 6 cores working:
system.time ({m<- matrix (1:9e6, 3e3); m%*%m; NULL})
   user  system elapsed
  5.219   0.126   1.111
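A quick way to read such timings: user CPU time divided by elapsed time approximates how many cores were busy on average. Here 5.219 / 1.111 is about 4.7, reasonably close to the 6 GotoBLAS threads. This is a one-line awk helper with an invented name. `LC_ALL=C` keeps awk from expecting a decimal comma under a German locale.

```shell
#!/usr/bin/env bash
# Average number of busy cores ~= user CPU seconds / elapsed (wall-clock) seconds.
parallelism() {
  LC_ALL=C awk -v user="$1" -v elapsed="$2" 'BEGIN { printf "%.1f\n", user / elapsed }'
}

parallelism 5.219 1.111   # 4.7
```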
However, the socket cluster workers do not seem to use GOTO_NUM_THREADS:
library (snow)
cl<- makeCluster(2,type="SOCK")
tm<- snow.time(clusterEvalQ(cl, {m<- matrix (1:9e6, 3e3); m%*%m; NULL}))
tm
elapsed send receive node 1 node 2
9.553 0.001 0.010 9.510 9.543
[[1]]
send_start send_end recv_start recv_end exec
[1,] 0 0.001 9.511 9.512 9.51
[[2]]
send_start send_end recv_start recv_end exec
[1,] 0.001 0.001 9.544 9.553 9.543
CPU usage shows 2 cores working, and the times correspond to that.
What do I need to configure to make the BLAS use more threads in the worker
processes? Is there anything else I should do differently?
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] snow_0.3-8
Additional questions:
- Is there a command like sessionInfo () that reports information about
the BLAS (particularly NUM_THREADS)?
- Is there a command I can use to tell the BLAS how many threads to use
during an R session? Can I set environment variables from within R?
(Searching didn't help, as I only got information about R environments...)
Would that actually help here?
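From outside R, the kernel can at least show how many threads an R process has and where they run. Both status dumps quoted above report Threads: 6. That is consistent with GotoBLAS having started its threads in both kinds of session. With Cpus_allowed = 00000001, however, those threads can only take turns on CPU 0.

This bash sketch reads the `processor` field (field 39) of /proc/<pid>/task/<tid>/stat as documented in proc(5). The helper name `thread_cpus` is invented.

```shell
#!/usr/bin/env bash
# Show a process's thread count and the CPU each thread last ran on.
# Parses /proc/<pid>/task/<tid>/stat, whose field 39 is "processor" (proc(5)).
thread_cpus() {
  local pid=${1:-$$} stat line rest tid
  local -a f
  grep '^Threads:' "/proc/${pid}/status"
  for stat in /proc/"${pid}"/task/*/stat; do
    line=$(<"$stat")
    tid=${stat%/stat}
    tid=${tid##*/}
    # Drop "pid (comm) " so the remaining fields start at field 3 (state);
    # field 39 then sits at array index 39 - 3 = 36.
    rest=${line##*") "}
    read -ra f <<< "$rest"
    echo "tid ${tid}: last ran on CPU ${f[36]}"
  done
}

thread_cpus "$$"
```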
Thanks a lot for your help.
Claudia
--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399