request help with replication and snowFT
Paul,
Could you please explain more about the intended use of performParallel?
The intent is to make parallel computing in R as easy as possible.
I tried, but could not find a convenient way to send functions to the nodes before the main function would run. I could build everything into a package and install that on the nodes, but for development and testing, that makes for a pretty tedious process.
You can just pack the additional functions into the main function. In
your example,
## The main function of interest
myNorm <- function(x) {
    myA <- function(x) {
        2 * x
    }
    myB <- function(x) {
        3 * x
    }
    myC <- function(x, y) {
        x + y
    }
    whew <- myA(x)
    whewyou <- myB(whew)
    whewwho <- myC(whew, whewyou)
    y <- rnorm(whewwho)
    list(x, whew, whewyou, whewwho, y, sum(y))
}
The Sys.info call can be defined in an init function:
nodeinfo <- function() Sys.info()[c("nodename", "machine")]
Then, you can use performParallel:
res1 <- performParallel(cnt, x = myx, fun = myNorm, initfun = nodeinfo, seed = mySeeds)
Would you care to put an example like this in your documentation?
I can do that. Note that snowFT is not an attempt to replace snow - it is an extension of it. It simplifies snow's usage in order to make parallel computing accessible to people who might otherwise feel too intimidated by it (and it offers additional benefits such as reproducibility and fault tolerance). For people who want to explore and dig into the details, the full set of snow functions remains available.
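To make the reproducibility point concrete, here is a minimal sketch (assuming snowFT is installed; the node count, input vector, and toy function are illustrative, and performParallel manages the cluster itself):

```r
## Reproducibility sketch: two independent invocations with the same
## seed should yield identical results, regardless of how the work is
## scheduled across nodes. Guarded so it is a no-op without snowFT.
if (requireNamespace("snowFT", quietly = TRUE)) {
    library(snowFT)
    myNorm <- function(x) sum(rnorm(x))
    seeds <- c(1231, 2323, 43435, 12123, 22442, 634654)
    res.a <- performParallel(2, x = rep(10, 5), fun = myNorm, seed = seeds)
    res.b <- performParallel(2, x = rep(10, 5), fun = myNorm, seed = seeds)
    print(identical(res.a, res.b))
}
```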
And then explain how a user can grab any arbitrary stream and re-run it for interactive investigation of its properties. When we run this thing 1000 times and 2 runs are far off the usual result, we want to dig in and try to see what happened.
There is no function that allows this at the moment. It would be a nice extension, though - let me know if you have a suggestion for an implementation.
Hana
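As a hypothetical workaround (not a snowFT API), one can re-create the k-th L'Ecuyer stream locally and inspect a single replicate interactively. The sketch below uses base R's parallel package and its L'Ecuyer-CMRG streams; snowFT builds its streams with rlecuyer instead, so the actual draws will not match a snowFT run - this only illustrates the idea, and the stream index and seed are made up:

```r
## Sketch: rebuild the k-th independent RNG stream from a base seed,
## so one replicate can be re-run and examined by hand.
library(parallel)

get_stream <- function(k, seed = 1231) {
    RNGkind("L'Ecuyer-CMRG")
    set.seed(seed)
    s <- .Random.seed
    ## advance from stream 1 to stream k
    if (k > 1) for (i in seq_len(k - 1)) s <- nextRNGStream(s)
    s
}

## Restore stream 5 and re-run the replicate of interest:
.Random.seed <- get_stream(5)
rnorm(3)  ## identical draws every time stream 5 is restored
```

Restoring the same stream always reproduces the same draws, which is exactly what is needed to dig into an outlier run.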
Here's my test case:
### r: number of streams. Should be set to the LARGEST number of
### runs (= streams) you could ever want to replicate. It fixes a
### framework of streams that is the same on all nodes. Here I have
### 33 streams but only 10 nodes. snowFT handles creating the 33
### separate streams, so there is one ready for each possible run,
### no matter which node does the work.
r <- 33
### cnt: number of nodes
cnt <- 10
cl <- makeClusterFT(cnt, type = "MPI")
### From snowFT methods:
printClusterInfo(cl)
### Can use SNOW methods as well.
### Testing with SNOW methods: sends function to each system
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
### Some user-written functions involved in a simulation
myA <- function(x) {
    2 * x
}
myB <- function(x) {
    3 * x
}
myC <- function(x, y) {
    x + y
}
## The main function of interest
myNorm <- function(x) {
    whew <- myA(x)
    whewyou <- myB(whew)
    whewwho <- myC(whew, whewyou)
    y <- rnorm(whewwho)
    list(x, whew, whewyou, whewwho, y, sum(y))
}
## Six seed values, as required by the RNGstream generator
mySeeds <- c(1231, 2323, 43435, 12123, 22442, 634654)
## Create the "x" input vector, one element per stream.
myx <- sample(1:8, r, replace = TRUE)
## Send functions to systems with SNOW functions
clusterExport(cl, "myA")
clusterExport(cl, "myB")
clusterExport(cl, "myC")
clusterSetupRNG.FT(cl, type = "RNGstream", streamper = "replicate", n = r,
                   seed = mySeeds)
res1 <- clusterApplyFT(cl, x = myx, fun = myNorm, seed = mySeeds)
print(res1[[1]])