Message-ID: <6429e494e198487fb4fb6c5bcf368a6d@exch1-mel.nexus.csiro.au>
Date: 2016-08-04T23:29:01Z
From: Alexander.Herr at csiro.au
Subject: foreach {parallel} nested with for loop to update data.frame column

Hiya,
Yes, that would work, and so would aggregate() with merge(), but I would really like to make this a parallel calculation: farm out the first loop to different workers and then put the output back together into the data.frame as additional columns. This would speed up work on very large files and avoid running out of memory.
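The aggregate()-plus-merge() route mentioned above can be sketched on a small hypothetical data frame (the toy values are made up for illustration). One detail worth noting: merge() sorts its result by the key columns by default, so the sketch records the original row order in a helper column `row` (a hypothetical name) and restores it afterwards.

```r
# Hypothetical toy data: rows 1-2 and rows 3-4 share an (x, y) pair
xyz <- data.frame(x = c(1, 1, 2, 2, 1),
                  y = c(5, 5, 7, 7, 6),
                  z = c(3, -1, 4, 10, 2))

# Remember the original row order; merge() sorts its result by the key columns
xyz$row <- seq_len(nrow(xyz))

# aggregate(): one row per (x, y) group holding the minimum z
grp_min <- aggregate(z ~ x + y, data = xyz, FUN = min)
names(grp_min)[names(grp_min) == "z"] <- "mins"

# merge(): attach each group's minimum to every row of that group
out <- merge(xyz, grp_min, by = c("x", "y"))

# Restore the original order and drop the helper column
out <- out[order(out$row), c("x", "y", "z", "mins")]
rownames(out) <- NULL
print(out)
```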

Cheers
H
XXXXXXXXXXXXXXXXXXXXXX Petr wrote XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Hi 

I may be completely wrong, but isn't this a job for ave()? With your example I get:

> fac<-interaction(xyz[,1], xyz[,2], drop=TRUE) 
> xyz[,4]<-ave(xyz$z, fac, FUN= min) 
> head(xyz) 
   x  y     z  mins
1 13 15  1.97 -2.91
2 17  9 14.90 -2.81
3  9 10 34.68 -1.97
4 17  6  4.26 -2.63
5  3 12  0.12  0.12
6 19 11  7.91  7.91
> 
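To make the mechanics of the ave() call above explicit, here is the same idea on a tiny hypothetical data frame: interaction() turns the (x, y) pairs into a single grouping factor, and ave() computes min(z) per group and recycles each group's result back onto its rows.

```r
# Hypothetical toy data
xyz <- data.frame(x = c(1, 1, 2, 2, 1),
                  y = c(5, 5, 7, 7, 6),
                  z = c(3, -1, 4, 10, 2))

# One factor level per observed (x, y) combination;
# drop = TRUE discards combinations that never occur
fac <- interaction(xyz$x, xyz$y, drop = TRUE)

# ave() applies FUN within each level and writes the result back to every
# row of that level, so the output keeps the length and order of xyz$z
xyz$mins <- ave(xyz$z, fac, FUN = min)
print(xyz)
```

Because ave() returns its result in the original row order, the new column can be assigned directly without any re-sorting.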

Cheers 
Petr 

> -----Original Message----- 
> From: R-help [mailto:[hidden email]] On Behalf Of 
> [hidden email] 
> Sent: Thursday, August 4, 2016 3:32 AM 
> To: [hidden email] 
> Subject: [R] foreach {parallel} nested with for loop to update data.frame 
> column 
> 
> Hi List, 
> 
> I am trying to update a data.frame column from within a for loop nested inside foreach.
> 
> ### trial data 
> set.seed(666) 
> xyz<-as.data.frame(cbind(x=rep(rpois(5000,10),2)+1, 
> y=rep(rpois(5000,10),2)+1,z=round(runif(10000, min=-3, max=40),2))) 
> xyz$mins<-rep(NA, nrow(xyz)) 
> 
> library(doParallel)  # also attaches foreach and parallel
> cl<-makeCluster(16)  # adjust to your number of cores
> registerDoParallel(cl) 
> 
> counter=0 
> foreach(i=unique(xyz[,1]), .combine=data.frame, .verbose=T) %dopar% { 
>         for( j in unique(xyz[,2])) {
>                 xyz[xyz[,2] == j ,4]<-min(xyz[xyz[,2] == j ,2])
>         }
> 
> } 
> 
> stopCluster(cl) 
> 
> This is obviously not working. Any hints? 
> 
> Thanx 
> Herry
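A hint for the original question: inside `%dopar%` each worker modifies its own copy of `xyz`, and what foreach collects is the value of the loop body, which here is the `for` loop's `NULL`, so nothing ever reaches the master's `xyz`. (The inner assignment also takes the minimum of column 2, `y`, rather than `z`, and never uses `i`.) Below is a minimal split-apply-combine sketch under stated assumptions: it swaps foreach for base R's `parallel::parLapply()`, assumes 2 workers, and `n_workers`, `pieces` and `grp_min` are hypothetical names. The same pattern should carry over to foreach by making the loop body return the `aggregate()` result and using `.combine = rbind`.

```r
library(parallel)

# Trial data from the original post (same seed and generator calls)
set.seed(666)
xyz <- data.frame(x = rep(rpois(5000, 10), 2) + 1,
                  y = rep(rpois(5000, 10), 2) + 1,
                  z = round(runif(10000, min = -3, max = 40), 2))

n_workers <- 2                      # assumption: adjust to your machine
cl <- makeCluster(n_workers)

# Farm out the "first loop": deal the distinct x values round-robin into
# one chunk per worker, and send each worker only the rows it needs
ux <- unique(xyz$x)
chunks <- split(ux, seq_along(ux) %% n_workers)
pieces <- lapply(chunks, function(xs) xyz[xyz$x %in% xs, c("x", "y", "z")])

# Each worker RETURNS its (x, y) group minima instead of assigning into
# xyz; an assignment made on a worker only changes that worker's copy
res <- parLapply(cl, pieces, function(d) aggregate(z ~ x + y, data = d, FUN = min))
stopCluster(cl)

# Put the pieces together again; the split is by x, so every (x, y)
# group lies in exactly one chunk and there are no duplicate keys
grp_min <- do.call(rbind, res)

# Add the minima to the data.frame as an extra column, keeping row order
key <- paste(xyz$x, xyz$y)
xyz$mins <- grp_min$z[match(key, paste(grp_min$x, grp_min$y))]
head(xyz)
```

Dealing the x values round-robin is just one simple way to balance the chunks; any split works as long as all rows sharing an x value go to the same worker, since that keeps every (x, y) group inside a single chunk.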