
Parallel loop

10 messages · Scott Raynaud, rsparapa at mcw.edu, Simon Urbanek +4 more

#
I'm trying to convert a loop in a simulation program to a parallel
process using multicore. It looks like this:

simus=1000
set.seed(12345)
for(iter in 1:simus){

#bunch of code setting up random draws from various
#distributions for X matrix

#Computation of Y

(fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))

estbeta<-fixef(fitmodel)
sdebeta<-sqrt(diag(vcov(fitmodel)))
for(l in 1:betasize)
{
  cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
  if(beta[l]*cibeta>0) powaprox[[l]]<-powaprox[[l]]+1
  sdepower[l,iter]<-as.numeric(sdebeta[l])
}
##------------------------------------------------------------------------##
} ## iteration ends here
The variable fitmodel is defined elsewhere and consists of
fixed and random parts. Betasize is the length of the vector of
fixed effects. Near the bottom there's a counter that increments
if the slope estimate doesn't trap 0. The next line records the
standard deviation. It's these two items that I need to recover.

So I'm thinking of recoding this way to run on
a machine that has 4 Xeon CPUs:
library(multicore)
options(cores=4)
simus=1000
sim.base.fun<-function(iter){  #replaces for(iter in 1:simus){

#bunch of code setting up random draws from various
#distributions for X matrix

#Computation of Y

(fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))

estbeta<-fixef(fitmodel)
sdebeta<-sqrt(diag(vcov(fitmodel)))
for(l in 1:betasize)
{
  cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
  if(beta[l]*cibeta>0)
    return(powaprox[[1]]<-powaprox[[1]]+1)  #replaces powaprox[[l]]<-powaprox[[l]]+1
  return(sdepower[1,iter]<-as.numeric(sdebeta[1]))  #replaces sdepower[l,iter]<-as.numeric(sdebeta[l])
}
}  #end of sim.base.fun
sim.fun<-lapply(1:simus,sim.base.fun)  #replaces the conventional loop
Do I need a collect statement? How do I kill the worker
processes? One thing I'm not accounting for is the random
number generating process. I've seen a couple of ways of
doing this, one using mclapply and another using rlecuyer,
but I'm not sure how to make them work. Suggestions?
Is there a way to replicate the same random numbers used
in the conventional loop in the parallel loop?

My machine has 4 Intel Xeon Processor X5650 CPUs.
When I type:
library(multicore)
multicore:::detectCores()
it returns 4. However, the Intel cut sheet says each of these processors
has 6 cores. How can I access them? I know there's a point of diminishing
returns, but I can't figure out where it is unless I test it.
#
Scott,
On Mar 6, 2012, at 4:19 PM, Scott Raynaud wrote:

            
No, the above won't work (all typos aside) since you are still using assignments. Inherently the above is not parallelizable as-is because of
powaprox[[l]]<-powaprox[[l]]+1
but that is trivially removed. Also the inner for loop is entirely superfluous.

Since your code was incomplete this is just a suggestion as we can't test anything (I'm not even checking if what you're doing makes any sense), but it should give you an idea of what to do:

sim.base <- function(iter) { 
  ## your unstated code goes here ...
  fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2)
  estbeta <- fixef(fitmodel)
  sdebeta <- sqrt(diag(vcov(fitmodel)))
  list(powf = beta * (estbeta - sgnbeta*z1score*sdebeta) > 0,
       sdepower = sdebeta)
}

res <- lapply(seq.int(simus), sim.base)
powf <- sapply(res, function(x) x$powf)
sdepower <- sapply(res, function(x) x$sdepower)
powapprox <- apply(powf, 1, sum)


To parallelize, replace lapply above with mclapply.

> Do I need a collect statement?

No, if you use mclapply it's all done for you.

> How do I kill the worker processes?

No need to - they go away once they're done computing.

> Is there a way to replicate the same random numbers used in the conventional loop in the parallel loop?

There is, but you may need to read up on that - you have to use a generator that produces a streaming sequence such that you can skip forward (also see the recent discussion here and on R-devel). For the above you may get away without it by simply setting different seeds in each iteration.

> The Intel cut sheet says each of these processors has 6 cores. How can I access them?

You can set any number of cores you want using the mc.cores=... argument - the detected cores are just a default if you don't specify anything.
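Putting those answers together, here is a minimal self-contained sketch of the pattern. It uses a toy sim.base as a stand-in (the real model-fitting code isn't shown in the thread) and the parallel package that ships with R >= 2.14.0, whose mclapply is the successor to multicore's:

```r
library(parallel)

## L'Ecuyer-CMRG is a streaming generator: each child process gets its own
## reproducible substream derived from the master seed
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)

simus <- 100

## toy stand-in for the real per-iteration simulation
sim.base <- function(iter) {
  x <- rnorm(25)
  list(powf = mean(x) > 0,   # did this iteration's estimate land above 0?
       sdepower = sd(x))
}

res <- mclapply(seq.int(simus), sim.base, mc.cores = 2)
powf <- sapply(res, function(x) x$powf)
sdepower <- sapply(res, function(x) x$sdepower)
powapprox <- sum(powf)  # replaces the powaprox counter from the serial loop
```

Re-running the script with the same seed and the same mc.cores reproduces the draws, and mclapply reaps its worker processes by itself, so no explicit collect or kill is needed.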

What OS/distro is this? It is very unusual to see the number of cores reported wrongly by the system ...

Cheers,
Simon
1 day later
#
Thanks for the feedback. Part of my problem is that I need the most
recent copy of R. My IS team told me they can only get an earlier
copy using apt-get, but I think there must be a way, so I've tasked them
with figuring it out.

My OS is Kubuntu. I just thought it odd that it could only detect the
number of CPUs rather than the number of cores. I'm completely
new to parallel processing, but it seems that something is not right
in the core detection.

One question I still have regards child processes. I understand they
finish on their own, but what if I need to kill those processes because
of an obvious problem? How can I do that?

----- Original Message -----
From: Simon Urbanek <simon.urbanek at r-project.org>
To: Scott Raynaud <scott.raynaud at yahoo.com>
Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
Sent: Tuesday, March 6, 2012 8:27 PM
Subject: Re: [R-sig-hpc] Parallel loop

#
Scott Raynaud wrote:
If you kill the parent, then all of the children should die.  On Ubuntu,
r-base and r-base-dev are the packages; see
http://cran.r-project.org/bin/linux/ubuntu/README 
You don't need the most recent.  Just 2.14.0 or higher.
#
Scott,
On Mar 8, 2012, at 11:37 AM, Scott Raynaud wrote:

            
I don't quite understand how this is related to your question ... The version of R plays no role here ...
The core detection in multicore on Linux simply looks at /proc/cpuinfo so if it's not right, then your OS is reporting something odd. Note that the detection is just a fall-back if you don't specify anything, so it's really up to you how many parallel processes you want to use.
If you interrupt the master process it automatically kills all child processes and cleans up - at least for all high-level functions like mclapply. If you use low-level functions, you can always use kill(children()); collect() 
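For the low-level route, a hedged sketch of the kill-and-reap sequence. It uses the parallel package's names, mcparallel/mccollect, rather than multicore's parallel/collect, and tools::pskill in place of kill, since those are the calls available in current R:

```r
library(parallel)

## fork a child that, for the sake of the example, would run far too long
job <- mcparallel(Sys.sleep(60))

## ... we spot an obvious problem and decide to abort ...
tools::pskill(job$pid)  # send SIGTERM to the child process
mccollect(job)          # reap the dead child so no zombie is left behind
```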

Cheers,
Simon
6 days later
#
So what I meant by the latest version was that I think what I want to do would be
easier in package parallel, but I'm a noob at this so I don't know for sure. Anyway,
I need 2.14.1 to get to that package.

My IS people insist that the latest version of R available via apt-get is
2.13.1. Anything later, they claim, will have to be compiled. True?

----- Original Message -----
From: Simon Urbanek <simon.urbanek at r-project.org>
To: Scott Raynaud <scott.raynaud at yahoo.com>
Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
Sent: Thursday, March 8, 2012 8:30 PM
Subject: Re: [R-sig-hpc] Parallel loop

#
On 15 March 2012 at 09:43, Scott Raynaud wrote:
| So what I meant by the latest version was that I think what I want to do would be
| easier in package parallel, but I'm a noob at this so I don't know for sure. Anyway,
| I need 2.14.1 to get to that package.
|
| My IS people insist that the latest version of R available via apt-get is
| 2.13.1. Anything later, they claim, will have to be compiled. True?

False. See the R FAQ.

Dirk
#
(Apologies if this has already been discussed.) I've found it unexpectedly painless to compile R from source on a Linux machine, if you'd rather not wait for your IS people. From memory, I believe the necessary commands are:

./configure --without-X
make

-----Original Message-----
From: r-sig-hpc-bounces at r-project.org [mailto:r-sig-hpc-bounces at r-project.org] On Behalf Of Scott Raynaud
Sent: Thursday, March 15, 2012 12:43 PM
To: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Parallel loop

My IS people insist that the latest version of R available via apt-get is 2.13.1. Anything later, they claim, will have to be compiled. True?
#
On Thu, Mar 15, 2012 at 05:27:23PM +0000, Michael Spiegel wrote:

            
One must also have some sort of FORTRAN compiler available.  Many
systems don't.

Norm Matloff
#
On 12-03-15 02:02 PM, Norm Matloff wrote:
This strikes me as a pretty severe constraint on a system that is for
HPC. Don't you find that limits the local tuning you might do, the
availability of packages, etc.? Is there a reason for not installing a
Fortran compiler? Do you have a C compiler? (I am still trying to
understand the various pieces of parallel computing, and the sorts of
environments people use, but this caught me by surprise.)

Paul