
Parallel loop

10 messages · Scott Raynaud, rsparapa at mcw.edu, Simon Urbanek +4 more

#
I'm trying to convert a loop in a simulation program to a parallel
process using multicore. It looks like this:

simus=1000
set.seed(12345)
for(iter in 1:simus){

#bunch of code setting up random draws from various
#distributions for X matrix

#Computation of Y

(fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))

estbeta<-fixef(fitmodel)
sdebeta<-sqrt(diag(vcov(fitmodel)))
for(l in 1:betasize)
{
  cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
  if(beta[l]*cibeta>0) powaprox[[l]]<-powaprox[[l]]+1
  sdepower[l,iter]<-as.numeric(sdebeta[l])
}
##------------------------------------------------------------------------##
} ## iteration ends here
The variable fitmodel is defined elsewhere and consists of
fixed and random parts. Betasize is the length of the vector of
fixed effects. Near the bottom there's a counter that increments
if the slope estimate doesn't trap 0. The next line records the
standard deviation. It's these two items that I need to recover.

So I'm thinking of recoding this way to run on
a machine that has 4 Xeon CPUs:
library(multicore)
options(cores=4)
simus=1000
sim.base.fun<-function(iter){  #replaces for(iter in 1:simus){

#bunch of code setting up random draws from various
#distributions for X matrix

#Computation of Y

(fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))

estbeta<-fixef(fitmodel)
sdebeta<-sqrt(diag(vcov(fitmodel)))
for(l in 1:betasize)
{
  cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
  if(beta[l]*cibeta>0)
    return(powaprox[[1]]<-powaprox[[1]]+1)  #replaces powaprox[[l]]<-powaprox[[l]]+1
  return(sdepower[1,iter]<-as.numeric(sdebeta[1]))  #replaces sdepower[l,iter]<-as.numeric(sdebeta[l])
}
}  #end of sim.base.fun
sim.fun<-lapply(1:simus,sim.base.fun)  #replaces the conventional loop
Do I need a collect statement? How do I kill the worker
processes? One thing I'm not accounting for is the random
number generating process. I've seen a couple of ways of
doing this, one using mclapply and another using rlecuyer,
but I'm not sure how to make them work. Suggestions?
Is there a way to replicate the same random numbers used
in the conventional loop in the parallel loop?

My machine has 4 Intel Xeon Processor X5650 CPUs.
When I type:
library(multicore)
multicore:::detectCores()
it returns 4. However, the Intel cut sheet says each of these processors
has 6 cores. How can I access them? I know there's a point of diminishing
returns, but I can't figure out where it is unless I test it.
#
Scott,
On Mar 6, 2012, at 4:19 PM, Scott Raynaud wrote:

            
No, the above won't work (all typos aside) since you are still using assignments. Inherently the above is not parallelizable as-is because of
powaprox[[l]]<-powaprox[[l]]+1
but that is trivially removed. Also the inner for loop is entirely superfluous.

Since your code was incomplete this is just a suggestion as we can't test anything (I'm not even checking if what you're doing makes any sense), but it should give you an idea of what to do:

sim.base <- function(iter) { 
  ## your unstated code goes here ...
  fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2)
  estbeta <- fixef(fitmodel)
  sdebeta <- sqrt(diag(vcov(fitmodel)))
  list(powf = beta * (estbeta - sgnbeta*z1score*sdebeta) > 0,
       sdepower = sdebeta)
}

res <- lapply(seq.int(simus), sim.base)
powf <- sapply(res, function(x) x$powf)
sdepower <- sapply(res, function(x) x$sdepower)
powapprox <- apply(powf, 1, sum)


To parallelize, replace lapply above with mclapply.

> Do I need a collect statement?

No, if you use mclapply it's all done for you.

> How do I kill the worker processes?

No need to - they go away once they're done computing.

> Is there a way to replicate the same random numbers used in the conventional loop in the parallel loop?

There is, but you may need to read up on that - you have to use a generator that produces a streaming sequence such that you can skip forward (also see the recent discussion here and on R-devel). For the above you may get away without it by simply setting different seeds in each iteration.

> The Intel cut sheet says each of these processors has 6 cores. How can I access them?

You can set any number of cores you want using the mc.cores=... argument - the detected cores are just a default if you don't specify anything.
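Putting those answers together, here is a minimal self-contained sketch of the pattern. It uses a toy sim.base as a stand-in (the real model-fitting code isn't shown in the thread) and the parallel package that ships with R >= 2.14.0, whose mclapply is the successor to multicore's:

```r
library(parallel)

## L'Ecuyer-CMRG is a streaming generator: each child process gets its own
## reproducible substream derived from the master seed
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)

simus <- 100

## toy stand-in for the real per-iteration simulation
sim.base <- function(iter) {
  x <- rnorm(25)
  list(powf = mean(x) > 0,   # did this iteration's estimate land above 0?
       sdepower = sd(x))
}

res <- mclapply(seq.int(simus), sim.base, mc.cores = 2)
powf <- sapply(res, function(x) x$powf)
sdepower <- sapply(res, function(x) x$sdepower)
powapprox <- sum(powf)  # replaces the powaprox counter from the serial loop
```

Re-running the script with the same seed and the same mc.cores reproduces the draws, and mclapply reaps its worker processes by itself, so no explicit collect or kill is needed.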

What OS/distro is this? It is very unusual to see the number of cores reported wrongly by the system ...

Cheers,
Simon
1 day later
#
Thanks for the feedback. Part of my problem is that I need the most
recent copy of R. My IS team told me they can only get an earlier
copy using apt-get, but I think there must be a way, so I've tasked them
with figuring it out.

My OS is Kubuntu. I just thought it odd that it could only detect the
number of CPUs rather than the number of cores. I'm completely
new to parallel processing, but it seems that something is not right
in the core detection.

One question I still have regards child processes. I understand they
finish on their own, but what if I need to kill those processes because
of an obvious problem? How can I do that?

----- Original Message -----
From: Simon Urbanek <simon.urbanek at r-project.org>
To: Scott Raynaud <scott.raynaud at yahoo.com>
Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
Sent: Tuesday, March 6, 2012 8:27 PM
Subject: Re: [R-sig-hpc] Parallel loop

#
Scott Raynaud wrote:
If you kill the parent, then all of the children should die.  On Ubuntu,
r-base and r-base-dev are the packages; see
http://cran.r-project.org/bin/linux/ubuntu/README 
You don't need the most recent.  Just 2.14.0 or higher.
#
Scott,
On Mar 8, 2012, at 11:37 AM, Scott Raynaud wrote:

            
I don't quite understand how this is related to your question ... The version of R plays no role here ...
The core detection in multicore on Linux simply looks at /proc/cpuinfo so if it's not right, then your OS is reporting something odd. Note that the detection is just a fall-back if you don't specify anything, so it's really up to you how many parallel processes you want to use.
If you interrupt the master process it automatically kills all child processes and cleans up - at least for all high-level functions like mclapply. If you use low-level functions, you can always use kill(children()); collect() 
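For the low-level route, a hedged sketch of the kill-and-reap sequence. It uses the parallel package's names, mcparallel/mccollect, rather than multicore's parallel/collect, and tools::pskill in place of kill, since those are the calls available in current R:

```r
library(parallel)

## fork a child that, for the sake of the example, would run far too long
job <- mcparallel(Sys.sleep(60))

## ... we spot an obvious problem and decide to abort ...
tools::pskill(job$pid)  # send SIGTERM to the child process
mccollect(job)          # reap the dead child so no zombie is left behind
```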

Cheers,
Simon
6 days later
#
So what I meant by the latest version was that I think what I want to do would be
easier in package parallel, but I'm a noob at this so I don't know for sure. Anyway,
I need 2.14.1 to get to that package.

My IS people insist that the latest version of R available via apt-get is
2.13.1. Anything later, they claim, will have to be compiled. True?

----- Original Message -----
From: Simon Urbanek <simon.urbanek at r-project.org>
To: Scott Raynaud <scott.raynaud at yahoo.com>
Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
Sent: Thursday, March 8, 2012 8:30 PM
Subject: Re: [R-sig-hpc] Parallel loop

#
On 15 March 2012 at 09:43, Scott Raynaud wrote:
| So what I meant by the latest version was that I think what I want to do would be
| easier in package parallel, but I'm a noob at this so I don't know for sure. Anyway,
| I need 2.14.1 to get to that package.
|
| My IS people insist that the latest version of R available via apt-get is
| 2.13.1. Anything later, they claim, will have to be compiled. True?

False. See the R FAQ.

Dirk
#
(Apologies if this has already been discussed.) I've found it unexpectedly painless to compile R from source on a Linux machine, if you'd rather not wait for your IS people. From memory, I believe the necessary commands are:

./configure --without-X
make

-----Original Message-----
From: r-sig-hpc-bounces at r-project.org [mailto:r-sig-hpc-bounces at r-project.org] On Behalf Of Scott Raynaud
Sent: Thursday, March 15, 2012 12:43 PM
To: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Parallel loop

My IS people insist that the latest version of R available via apt-get is 2.13.1. Anything later, they claim, will have to be compiled. True?
#
On Thu, Mar 15, 2012 at 05:27:23PM +0000, Michael Spiegel wrote:

            
One must also have some sort of FORTRAN compiler available.  Many
systems don't.

Norm Matloff
#
On 12-03-15 02:02 PM, Norm Matloff wrote:
This strikes me as a pretty severe constraint on a system that is for
HPC. Don't you find that limits the local tuning you might do, the
availability of packages, etc.? Is there a reason for not installing a
Fortran compiler? Do you have a C compiler? (I am still trying to
understand the various pieces of parallel computing, and the sorts of
environments people use, but this caught me by surprise.)

Paul