snow and foreach memory issue?
Zhang, Ivan wrote:
Hi everyone,
I have a question regarding RAM usage with snow and foreach.
I am running Windows XP with 4 GB of RAM on an Intel quad-core 2.66 GHz.
I recently tried to implement multicore processing using 'multicore' and
'foreach', until I realized that multicore didn't work well on Windows
and switched to 'snow' and 'foreach', which works nicely.
I hashed out my own method without %dopar%; however, for some reason it
was eating up my RAM really quickly.
I wanted to see if anyone can figure out why -- perhaps there is
something I don't understand about snow, as I just recently started using it.
Suppose X_mat is a set of regressors for Y, and datapoints is a very
large matrix of points.
The pseudo code is as follows:
someFunc = function(cl, ...) {
  for (i in 1:n) {
    model = generateModel(Y[, i], X_mat)
    dp = iter(datapoints, by = 'row',
              chunksize = floor(nrow(datapoints) / NUMCORES))
    # couldn't figure out how to get clusterExport to work within a
    # function -- it only reads from .GlobalEnv?
    assign("model", model, .GlobalEnv)
    clusterExport(cl, "model")
    pred <- do.call(c, clusterApply(cl, as.list(dp), function(x)
      predict(model, x)))
    ...
  }
}

as.list(iter(datapoints, <etc>)) makes a full copy of datapoints, and
clusterApply creates another copy, distributed across the workers. So if
they're on the same machine you now use 3 * sizeof(datapoints) of memory,
even before any calculations are done on the workers. Ouch.

A first approach might be iter(datapoints, chunksize = nrow(datapoints) / N)
coupled with clusterApplyLB -- there will be N chunks (assuming
N > NUMCORES), and clusterApplyLB will only ever have in play a portion
NUMCORES / N of datapoints (clusterApply would divide the N chunks into
two groups, again forwarding the entire data to the workers!), so the
memory use will be (2 + NUMCORES / N) * sizeof(datapoints).

A better approach is to avoid the duplication implied by
as.list(iter(<etc>)). This would require an implementation like
snow::dynamicClusterApply, where the first NUMCORES chunks of iter() are
forwarded to the workers, and then a loop is entered in which the manager
receives one result and forwards the next chunk to the worker that
provided the result. Memory use would then be
(1 + NUMCORES / N) * sizeof(datapoints). Presumably this is the strategy
taken by %dopar%.

multicore should be the winner here, though, since all workers should
have access to the data without copying -- sizeof(datapoints) memory use.
I haven't used multicore extensively, and especially not on Windows. When
you say "it didn't work well" it would be helpful to understand why. My
limited experimentation suggested no problems when used with data sets
that were not too close to the Windows memory limits. Perhaps you are
really just running out of memory, and multicore is not reporting this as
nicely as it could? I'm sure the multicore author would appreciate
something more precise in terms of user experience.

A final consideration is that calculations on the workers are likely to
duplicate a subset of datapoints, so actual memory use will include an
additional component that scales approximately linearly with NUMCORES.
If the worker computations are memory intensive, then you'll quickly find
yourself in trouble again.

Hope that helps,

Martin
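The memory accounting above can be put into concrete numbers; a quick sketch, using hypothetical values NUMCORES = 4 and N = 16:

```r
## Relative memory use (multiples of sizeof(datapoints)) under the three
## strategies described above, for hypothetical NUMCORES = 4, N = 16:
NUMCORES <- 4   # workers on the machine
N        <- 16  # number of chunks produced by iter()

naive     <- 3                 # manager copy + as.list() copy + worker copies
chunkedLB <- 2 + NUMCORES / N  # clusterApplyLB keeps NUMCORES/N in flight
streaming <- 1 + NUMCORES / N  # dynamicClusterApply-style manager loop

cat(sprintf("naive: %.2fx  clusterApplyLB: %.2fx  streaming: %.2fx\n",
            naive, chunkedLB, streaming))
```

With these numbers the three strategies come out at 3.00x, 2.25x, and 1.25x the size of datapoints, which is why the scheduling strategy matters so much for a large matrix.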
When I ran this code, my RAM usage would climb from the initial 2.5 GB in
increments of roughly 500 MB after each run until it ate up all 4 GB. So
for n >= 3, the computer would throttle.
When I found out about the new registerDoSNOW, it improved my performance
(props to Stephen Weston). Here's the pseudocode for the equivalent of
the above:
someFunc = function(cl, ...) {
  registerDoSNOW(cl)
  for (i in 1:n) {
    model = generateModel(Y[, i], X_mat)
    pred <- foreach(dp = iter(datapoints, by = 'row',
                              chunksize = floor(nrow(datapoints) / NUMCORES)),
                    .combine = c, .verbose = TRUE) %dopar% {
      predict(model, dp)
    }
  }
}
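On the clusterExport-inside-a-function problem from the first example: snow's clusterExport looks names up in .GlobalEnv, which is why the assign() workaround was needed. parallel::clusterExport accepts an envir argument, so something like clusterExport(cl, "model", envir = environment()) can export the function-local copy directly. The lookup mechanics can be mocked without a cluster; exportTo and worker below are hypothetical stand-ins for the real send-to-worker machinery:

```r
## Mock of clusterExport's variable lookup (no cluster needed).
## exportTo() copies named variables from a given environment into a toy
## "worker" environment, the way clusterExport(..., envir = ...) does on
## real workers.
exportTo <- function(worker, names, envir) {
  for (nm in names)
    assign(nm, get(nm, envir = envir), envir = worker)
}

someFunc <- function() {
  model  <- "fitted-model"  # exists only in this call frame, not .GlobalEnv
  worker <- new.env()       # stands in for a snow worker's global env
  exportTo(worker, "model", environment())
  get("model", envir = worker)
}

someFunc()  # the worker environment received the local 'model'
```

A default-.GlobalEnv lookup would miss the function-local variable entirely, which is the behavior the assign() workaround papers over.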
Aside from slightly shorter lines, the performance was more stable: RAM
went from 2.5 GB up to about 3.2 GB and stayed there, and it performed
better because it wouldn't run out of cache.
However, I just want to understand what is different between the two
treatments that it would make such a large difference, and whether I am
doing something wrong in the first example.
Thanks,
-Ivan
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793