Parallel linear model
On 22 August 2012 at 23:22, Norm Matloff wrote:
| In rereading your posting now, Dirk, I suddenly realized that there is
| one aspect of this that I'd forgotten about: an ordinary call to
| system.time() does not display all the information returned by that
| function!
|
| That odd statement is of course due to the fact that the print
| method for objects of class proc_time displays only 3 of the 5 numbers.
| If one actually looks at the 5 numbers individually, you can separate
| the time of the parent process from the sum of the child times. That
| separation is apparently what rbenchmark gives you, right?
|
| As I said earlier, the quick-and-dirty way to handle this is to use the
| elapsed time, typically good enough (say, on a dedicated machine). After
| all, if we are trying to develop a fast parallel algorithm, what the
| potential users of the algorithm care about is essentially the elapsed
| time.

That seems fair in most cases.

| But at the other extreme, a very fine timing goal might be to try to
| compute what is called the makespan, which in this case would be the
| maximum of all the child times, rather than the sum of the child times.
| I say "try" because I don't see any systems way to accomplish this,
| short of inserting calls to something like clock_gettime() inside each
| thread.

Maybe you could look at what microbenchmark does [as it covers all the
OS-level dirty work] and see if it generalizes to multiple machines?

Dirk

| Norm
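Both points can be sketched in a few lines (this example is not from the thread; Sys.sleep(1) stands in for real work, and mclapply's forked children require a Unix-alike):

```r
library(parallel)

## The proc_time print method shows only 3 of the 5 numbers;
## unclass() (or summary()) also exposes user.child and sys.child,
## separating the parent's time from the sum of the child times.
tt <- system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8))
print(unclass(tt))

## Approximating the makespan: take a wall-clock timestamp inside
## each child and report the maximum of the per-task durations.
child_secs <- mclapply(1:8, function(x) {
  t0 <- Sys.time()
  Sys.sleep(1)  # the actual work would go here
  as.numeric(difftime(Sys.time(), t0, units = "secs"))
}, mc.cores = 8)
max(unlist(child_secs))  # makespan estimate
```

This only approximates the makespan (Sys.time() has coarser resolution than clock_gettime() and includes scheduling jitter), but it avoids dropping into C from each worker.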
| On Wed, Aug 22, 2012 at 07:53:02PM -0500, Dirk Eddelbuettel wrote:
| > The difference between user and elapsed is old hat. Here is a great
| > example (and IIRC first shown here by Simon) with no compute time:
| >
| > R> system.time(mclapply(1:8, function(x) Sys.sleep(1))) ## 2 cores by default
| >    user  system elapsed
| >   0.000   0.012   4.014
| > R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
| >    user  system elapsed
| >   0.012   0.020   1.039
| > R>
| >
| > so elapsed time is effectively the one second a Sys.sleep(1) takes, plus
| > overhead, if we allow for all eight (hyperthreaded) cores here. By Brian
| > Ripley's choice a default of two is baked in, so clueless users only get a
| > small gain. "User time" is roughly the actual system load _summed over all
| > processes / threads_.
| >
| > With that, could I ask any of the participants in the thread to re-try with a
| > proper benchmarking package such as rbenchmark or microbenchmark? Either one
| > beats the socks off system.time:
| >
| > R> library(rbenchmark)
| > R> benchmark(mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), replications=1)
| >                                                    test replications elapsed relative user.self sys.self user.child sys.child
| > 1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
| > 2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
| > R>
| >
| > and
| >
| > R> library(microbenchmark)
| > R> microbenchmark(mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), times=1)
| > Unit: seconds
| >                                                    expr     min      lq  median      uq     max
| > 1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
| > 2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
| > R>
| >
| > (and you normally want to run either with 10 or 100 or ... replications /
| > times).
| >
| > Dirk
| >
| > --
| > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
| >
| > _______________________________________________
| > R-sig-hpc mailing list
| > R-sig-hpc at r-project.org
| > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com