Parallel linear model
On Aug 22, 2012, at 7:18 PM, Norm Matloff wrote:
On Wed, Aug 22, 2012 at 06:03:36PM -0500, Paul Johnson wrote:
This is a great example and I would like to use it in class. But I think I don't understand the implications of the system.time output you get. I have a question about this below. Would you share your thoughts?...
Paul is bringing up a very important point here. There are various OS dependencies that can really change things. A notable example is that if one calls something like mclapply(), the time actually spent by the child R processes probably will NOT be counted in the User time.
That is actually wrong. It is true for snow where the processes are separate, but most systems do account for child user time in mclapply:

# Linux
system.time(mclapply(1:32, function(x) for(i in 1:1e6) x+x, mc.cores=32))
   user  system elapsed
 27.330   1.468   0.944
system.time((function(x) for(i in 1:1e6) x+x)(1))
   user  system elapsed
  0.736   0.000   0.734

# OS X
system.time(mclapply(1:16, function(x) for(i in 1:1e6) x+x, mc.cores=16))
   user  system elapsed
  9.386   0.357   0.876
system.time((function(x) for(i in 1:1e6) x+x)(1))
   user  system elapsed
  0.425   0.004   0.428

Cheers,
Simon
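For anyone who wants to check where the user time in such a measurement comes from, here is a minimal sketch. It relies on the fact that the object returned by system.time() carries five components, with the child-process times stored separately, while the printed summary folds them into the three numbers shown. The task count and mc.cores = 2 are arbitrary choices for illustration, and the sketch assumes a Unix-alike system, since mclapply() cannot fork on Windows.

```r
library(parallel)  # provides mclapply(); forking requires a Unix-alike OS

# Same toy workload as in the timings above
work <- function(x) for (i in 1:1e6) x + x

# 4 tasks on 2 cores: both numbers are arbitrary, for illustration only
st <- system.time(mclapply(1:4, work, mc.cores = 2))

# The printed summary shows three numbers, but the object holds five
print(names(st))

# CPU time charged to the parent R process itself
parent_user <- st[["user.self"]]
# CPU time charged to child processes (may be NA where the OS does not report it)
child_user <- st[["user.child"]]

cat("parent user:", parent_user, " child user:", child_user, "\n")
```

If the child component is close to zero or NA on a given platform, that would suggest the children are not being accounted for there, and only the elapsed figure can be trusted.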
The latter will likely just measure how much time the parent process spends in parceling out the work to the children, and in collecting together the results. You have the same problem on a cluster, where the worker processes set up by clusterApply() or whatever aren't counted. You could on the other hand have the opposite problem in some OSes, where one gets the SUM of the times of the children. Using Elapsed time might be a little crude, but generally good enough.

Norm
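As a rough illustration of comparing on Elapsed time only, the sketch below times the same toy workload serially and with mclapply(), then takes the ratio of the two elapsed figures. The names n_tasks and n_cores and their values are hypothetical, chosen only for the example, and a Unix-alike system is assumed because mclapply() relies on forking.

```r
library(parallel)  # provides mclapply(); forking requires a Unix-alike OS

# Toy workload, same shape as the one used earlier in the thread
work <- function(x) for (i in 1:1e6) x + x

n_tasks <- 4  # hypothetical number of tasks
n_cores <- 2  # hypothetical number of worker processes

# Wall-clock time for the serial run
serial_elapsed <- system.time(
  lapply(seq_len(n_tasks), work)
)[["elapsed"]]

# Wall-clock time for the forked run
parallel_elapsed <- system.time(
  mclapply(seq_len(n_tasks), work, mc.cores = n_cores)
)[["elapsed"]]

# Ratio of elapsed times; independent of how the OS accounts for child CPU time
speedup <- serial_elapsed / parallel_elapsed
cat("serial:", serial_elapsed, " parallel:", parallel_elapsed,
    " speedup:", round(speedup, 2), "\n")
```

Elapsed time is sensitive to whatever else the machine is doing, so repeating the measurement a few times and looking at the spread is probably wise before drawing conclusions.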
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc