
help with parallel processing code

9 messages · 1Rnwb, Dennis Murphy, Max Kuhn

#
Hello R gurus,

I have the code below, for which I need help and pointers to make it run in
parallel on a dual-core Win7 computer with R 2.13.x, using foreach,
iterators and doMC.
library(scatterplot3d) # Loads 3D library.
library(fields)
library(MASS)
library(ROCR)
library(verification)
library(caret)
library(gregmisc)

##simulated data
d=replicate(9, rnorm(40)+10) 
colnames(d)<-c("LEPTIN","SAA","PTH","sEGFR","IGFBP6","MMP2","OPG","IGFBP3","PDGFAABB")
mols=c("LEPTIN","SAA","PTH","sEGFR","IGFBP6","MMP2","OPG","IGFBP3","PDGFAABB")

####Name of the results output
file1="AUCvalues_3plex.csv"
temp1=c('protein comb', 'AUC')
pdf('ROC Charts-3plex.pdf')

#generate combinations
pc3 = combinations(n=length(mols),r=3)
 
#running the combinations
for (len in 1:dim(pc3)[1])
{
  prs = pc3[len,]

  ## new data mat
  samples <- mols[prs]
  mat <- data[, c(samples, 'Self_T1D')]
  mat <- mat[complete.cases(mat),]

  ##### LDA #########
  rows <- c(1:nrow(mat))

  scores <- c()
  labels <- c()
  for (itr in 1:1000)
  {
    train <- sample(rows, length(rows)-1)
    label = 0; if (mat$Self_T1D[-train] == "N") label = 1  #need the value for this line, should it be 'N' or 'Y'
    z <- lda(Self_T1D ~ ., mat, subset = train)
    score = predict(z, mat[-train, ])$pos[1]
    scores <- c(scores, score)
    labels <- c(labels, label)
  }

  ########## ROC #########

  pred <- prediction(scores, labels)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, colorize = F)
  # plot a ROC curve for a single prediction run
  # with CI by bootstrapping and fitted curve
  #roc.plot(labels, scores, xlab = "False positive rate",
  #  ylab = "True positive rate", main = NULL, CI = T, n.boot = 100, plot = "both", binormal = TRUE)
  auc <- as.numeric(performance(pred, measure = "auc", x.measure = "cutoff")@y.values)
  auc = round(auc, 3)
  text(.9, 0, paste("  AUC=", auc, sep=" "), cex=1)
  names1 = paste(samples, collapse="+")
  text(.8, .05, names1, cex=0.75)
  temp = c(names1, auc)
  temp1 = rbind(temp1, temp)
  print(paste(3, ' ', 'done len=', len, " ", names1, ' ', date(), sep=''))
}

dev.off()
write.csv(temp1 , file=file1)

Thanks
sharad


--
View this message in context: http://r.789695.n4.nabble.com/help-with-parallel-processing-code-tp3944303p3944303.html
Sent from the R help mailing list archive at Nabble.com.
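A minimal sketch of a backend that works on Windows, assuming the foreach and doParallel packages are installed. doMC forks its workers through the multicore package, and Windows has no fork(), so on a Win7 machine a socket-cluster backend is needed instead. doParallel builds on the parallel package, which first shipped with R 2.14.0; on R 2.13.x the closest equivalent is doSNOW (makeCluster(2, type = "SOCK") followed by registerDoSNOW(cl)).

```r
library(parallel)     # makeCluster(), stopCluster()
library(foreach)
library(doParallel)   # assumed installed; registers a socket cluster for %dopar%

cl <- makeCluster(2)      # one worker process per core on a dual-core machine
registerDoParallel(cl)    # %dopar% now sends tasks to these workers
getDoParWorkers()         # reports 2 once the cluster is registered

# Sanity check: each task runs on a worker; .combine = c collects the results
res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

stopCluster(cl)           # shut the workers down when finished
```

Once this runs, a loop body can be moved inside %dopar%, listing any packages it needs (MASS, ROCR, ...) in the .packages argument, since socket workers start as fresh R sessions.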
#
Sorry for the noise;
the simulated data should be like this:

d=data.frame(replicate(9, rnorm(40)+10),rep(c('y','n'),20))
colnames(d)<-c("LEPTIN","SAA","PTH","sEGFR","IGFBP6","MMP2","OPG","IGFBP3","PDGFAABB","group") 
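On the 'N' or 'Y' question in the original loop: predict(z, ...)$pos[1] partially matches the posterior matrix and returns the probability of the first factor level, and ROCR treats label 1 as the positive class, so label = 1 must mark the class whose probability is the score. Assuming Self_T1D has exactly the levels N and Y, the first level is N, which makes label = 1 for "N" the consistent choice. The serial sketch below uses the corrected simulated data above and avoids depending on level order by selecting the posterior column by name. It uses 200 resamples instead of 1000 and computes the AUC with the Mann-Whitney rank formula (the same area ROCR's performance(pred, "auc") measures) so that it needs only MASS.

```r
library(MASS)  # provides lda()

set.seed(1)
# Corrected simulated data from the post above: 9 markers plus a y/n group column
d <- data.frame(replicate(9, rnorm(40) + 10), rep(c("y", "n"), 20))
colnames(d) <- c("LEPTIN", "SAA", "PTH", "sEGFR", "IGFBP6", "MMP2",
                 "OPG", "IGFBP3", "PDGFAABB", "group")
d$group <- factor(d$group)            # levels are "n", "y" (alphabetical)

mat <- d[, c("LEPTIN", "SAA", "PTH", "group")]

n_iter <- 200                         # fewer than the original 1000, for speed
scores <- numeric(n_iter)
labels <- numeric(n_iter)
for (itr in seq_len(n_iter)) {
  test <- sample(nrow(mat), 1)                      # hold out one random row
  fit  <- lda(group ~ ., data = mat[-test, ])
  post <- predict(fit, mat[test, ])$posterior       # 1 x 2 matrix, columns "n", "y"
  scores[itr] <- post[1, "y"]                       # score = P(group == "y")
  labels[itr] <- as.numeric(mat$group[test] == "y") # 1 exactly when the truth is "y"
}

# AUC via the Mann-Whitney rank formula (tied scores count one half)
pos <- labels == 1
r   <- rank(scores)
auc <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) / (sum(pos) * sum(!pos))
```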

#
My modification gives me an error:
+  {
+  train <- sample(rows, length(rows)-1)
+  label =0 ; if (mat$Self_T1D[-train] == "N") label = 1  #need the value for this line, should it be 'N' or 'Y'
+  z <- lda(mat[train,4] ~ mat[train,1:3])
+  score = predict(z, mat[-train, ])$pos[1]
+  scores <- c(scores, score)
+  labels<- c(labels, label)
+  }
Error in { : task 1 failed - "could not find function "lda""
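The error most likely means lda() was not visible where the task ran: lda() lives in MASS, and with a socket-style backend each worker is a fresh R session that does not inherit packages attached in the master. A minimal sketch of the inner resampling loop under %dopar%, assuming doParallel as the backend and toy data with a Self_T1D factor coded N/Y standing in for the real data. .packages = "MASS" attaches MASS on every worker, and the model keeps the original formula-plus-data form so that predict() can find the predictors by name in the held-out row. Each task returns a (score, label) pair, and .combine = rbind stacks them into a matrix.

```r
library(parallel)
library(foreach)
library(doParallel)   # assumed backend; any registered %dopar% backend works the same way
library(MASS)         # lda() on the master; workers attach it via .packages

set.seed(1)
# Toy stand-in for the real data: three markers and a Self_T1D factor (N/Y)
mat <- data.frame(replicate(3, rnorm(40) + 10), Self_T1D = rep(c("Y", "N"), 20))
mat$Self_T1D <- factor(mat$Self_T1D)
rows <- seq_len(nrow(mat))

cl <- makeCluster(2)
registerDoParallel(cl)
clusterSetRNGStream(cl, 123)   # independent random-number streams per worker

# Each task holds out one row and returns c(score, label); rbind stacks them.
res <- foreach(itr = 1:100, .combine = rbind, .packages = "MASS") %dopar% {
  train <- sample(rows, length(rows) - 1)
  z     <- lda(Self_T1D ~ ., mat, subset = train)
  score <- as.numeric(predict(z, mat[-train, ])$posterior[1, "N"])  # P(Self_T1D == "N")
  label <- as.numeric(mat$Self_T1D[-train] == "N")                  # 1 when truth is "N"
  c(score = score, label = label)
}

stopCluster(cl)
scores <- res[, "score"]
labels <- res[, "label"]
```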

#
Did you load the class package before calling lda()?

Dennis
On Thu, Oct 27, 2011 at 10:14 AM, 1Rnwb <sbpurohit at gmail.com> wrote:
#
I have had issues with some parallel backends not finding functions
within a namespace for packages listed in the ".packages" argument or
explicitly loaded in the body of the foreach loop. This has occurred
with MPI but not with multicore. I can get around this to some extent
by calling the functions using the namespace (e.g. foo:::bar), but this
is pretty kludgy.
R version 2.13.2 (2011-09-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doMPI_0.1-5     Rmpi_0.5-9      doMC_1.2.3      multicore_0.1-7
foreach_1.3.2   codetools_0.2-8 iterators_1.0.5
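The namespace workaround can be sketched as follows, assuming doParallel as the backend and a small hypothetical data frame named toy. Because lda() is exported from MASS, the double-colon form MASS::lda is enough; the triple colon (foo:::bar) is only needed for unexported objects. Calling MASS::lda on a worker loads the MASS namespace there on first use (MASS must be installed on that machine) without attaching it, and loading the namespace also registers its S3 methods, so predict() still dispatches to the lda method.

```r
library(parallel)
library(foreach)
library(doParallel)   # assumed backend

set.seed(2)
# Hypothetical toy data: two predictors and a two-level group
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30), g = factor(rep(c("a", "b"), 15)))

cl <- makeCluster(2)
registerDoParallel(cl)

# No .packages argument: MASS::lda loads the MASS namespace on each worker
# on first use (without attaching it), which also registers predict.lda.
res <- foreach(k = 1:4, .combine = c) %dopar% {
  fit <- MASS::lda(g ~ x1 + x2, data = toy[-k, ])
  as.character(predict(fit, toy[k, ])$class)   # predicted group for held-out row k
}

stopCluster(cl)
```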

Max
On Thu, Oct 27, 2011 at 4:30 PM, 1Rnwb <sbpurohit at gmail.com> wrote:

  
    
#
The part of the question that has dawned on me now is: should I try to
parallelize the full code, or only the iteration part? If it is the full
code, then I am at the complete mercy of the R-help community, or I give up
on this and let the computation run serially, which it has been doing since
last Saturday.
Sharad
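One reasonable split, offered as a sketch rather than the thread's answer: parallelize the outer loop over the 84 marker combinations (choose(9, 3)) and keep each combination's resampling loop serial, so every task is a sizeable chunk of work and communication overhead stays small. Plotting and write.csv() stay on the master after the loop, because a pdf() device opened in the master session is not shared with worker processes. Assumptions: foreach with doParallel, the corrected simulated data with its y/n group column, base R's combn() standing in for gregmisc's combinations() (both should list the triples in the same lexicographic order), 50 resamples per combination instead of 1000 to keep the run short, and the AUC computed with the Mann-Whitney rank formula instead of ROCR.

```r
library(parallel)
library(foreach)
library(doParallel)   # assumed backend; any registered %dopar% backend would do

set.seed(3)
mols <- c("LEPTIN", "SAA", "PTH", "sEGFR", "IGFBP6", "MMP2", "OPG", "IGFBP3", "PDGFAABB")
d <- data.frame(replicate(9, rnorm(40) + 10), rep(c("y", "n"), 20))
colnames(d) <- c(mols, "group")
d$group <- factor(d$group)

pc3 <- t(combn(length(mols), 3))   # stand-in for combinations(n = 9, r = 3): 84 rows

cl <- makeCluster(2)
registerDoParallel(cl)
clusterSetRNGStream(cl, 123)       # independent random-number streams per worker

# One task per combination; the resampling loop runs serially inside each task.
res <- foreach(len = seq_len(nrow(pc3)), .combine = rbind,
               .packages = "MASS") %dopar% {
  samples <- mols[pc3[len, ]]
  mat <- d[, c(samples, "group")]
  mat <- mat[complete.cases(mat), ]
  n_iter <- 50                                   # the post uses 1000
  scores <- numeric(n_iter)
  labels <- numeric(n_iter)
  for (itr in seq_len(n_iter)) {
    test <- sample(nrow(mat), 1)                 # leave one random row out
    fit  <- lda(group ~ ., data = mat[-test, ])
    scores[itr] <- predict(fit, mat[test, ])$posterior[1, "y"]
    labels[itr] <- as.numeric(mat$group[test] == "y")
  }
  # Mann-Whitney form of the AUC (tied scores count one half)
  pos <- labels == 1
  r   <- rank(scores)
  auc <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) / (sum(pos) * sum(!pos))
  data.frame(comb = paste(samples, collapse = "+"), AUC = round(auc, 3),
             stringsAsFactors = FALSE)
}
stopCluster(cl)

# Back on the master: write.csv(res, ...) and any plotting happen serially here.
```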

3 days later
#
I'm not sure what you mean by full code or the iteration. This uses
foreach to parallelize the loops over different tuning parameters and
resampled data sets.

The only way I could see to split up the parallelism is if you are
fitting different models to the same data. In that case, you could
launch separate jobs for each model. If the data is large and quickly
read from disk, that might be better than storing it in memory and
sequentially running models in the same script. We have decent-sized
machines here, so we launch different jobs per model and then
parallelize each (even if each job uses only 2-3 cores, it helps).

Thanks,

Max
On Fri, Oct 28, 2011 at 10:49 AM, 1Rnwb <sbpurohit at gmail.com> wrote: