parallel::mclapply() dummy function on Windows?
9 messages · Tim Triche, Jr., Brian Ripley, Martin Morgan +1 more
On Thu, 6 Oct 2011, Tim Triche, Jr. wrote:
Hi all, Would it be possible to have the new 'parallel' library export a dummy function, something akin to if(Windows) mclapply <- lapply to paper over the lack of fork() support on said platform? This may not be the world's greatest idea, but it would make it easier for me to maintain my package and still offer most users good parallel support. Plus, I can't
Why would it make it easier? And how could using a dummy for 'most users' (who are on Windows) offer them 'good parallel support'?
really see where it would cause problems, but then I don't develop R, myself.
Take a look at e.g. package 'boot' to see how to offer alternatives. (A version that uses 'parallel' is pending on CRAN, or see http://www.stats.ox.ac.uk/pub/R/boot_1.3-3.tar.gz .) Package 'parallel' may in future offer a higher-level abstraction layer that offers such a choice, but as the 'boot' code shows, deciding what to send to the workers in a snow-style cluster is not simple. Note that it is not just Windows that lacks fork support: some front-ends (notably RStudio) do not work with forking at present. And some parts of parallel (and multicore/snow) do not work reliably on some OSes (e.g. Solaris).
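As a rough illustration of "offering alternatives", here is a minimal sketch of a helper that runs serially by default, forks on Unix-alikes, and falls back to a snow-style socket cluster elsewhere. The name portable_lapply and its cores argument are hypothetical and are not taken from 'boot' or 'parallel'. The operating-system test is only an assumption for the sketch: as noted above, some front-ends and platforms do not fork reliably either.

```r
library(parallel)

# Hypothetical helper (name and arguments are illustrative only):
# serial when cores <= 1, forked workers on Unix-alikes, and a
# snow-style socket cluster elsewhere (e.g. Windows).
portable_lapply <- function(X, FUN, ..., cores = 1L) {
  if (cores <= 1L) {
    return(lapply(X, FUN, ...))
  }
  if (.Platform$OS.type != "windows") {
    return(mclapply(X, FUN, ..., mc.cores = cores))
  }
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  # In this branch FUN runs in fresh R processes: anything it needs
  # beyond its arguments must be sent explicitly (e.g. clusterExport()).
  parLapply(cl, X, FUN, ...)
}

# Default is serial, so this call behaves the same on every platform.
squares <- portable_lapply(1:4, function(i) i^2)
```

The cluster branch is where the difficulty mentioned above shows up: a forked worker sees the parent's workspace, whereas a socket worker starts empty, so the caller has to decide which objects and packages to ship.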
Thanks for any thoughts on the matter. [[alternative HTML version deleted]]
Please do follow the posting guide: no HTML and use the signature block for your real name and credentials.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On Fri, 7 Oct 2011, Tim Triche, Jr. wrote:
On Thu, Oct 6, 2011 at 11:25 PM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
Why would it make it easier? And how could using a dummy for 'most users' (who are on Windows) offer them 'good parallel support'?
Good point. Most of my users are on unix, because my use of mclapply() is primarily to expedite processing of raw scanner data. Only a handful of users for the packages that call mclapply() are on Windows. Right now, I default to having parallel=FALSE flags all over the place, but I'd prefer for the default to be "go as fast as practical in the common case", i.e., Unix. It would have been more accurate for me to say "I would like to parallelize by default, without having the methods fail on Windows in the default configuration" than to claim that I want "good parallel support" for Windows. When I have tried using the foreach/doMC combination in the past, it has not worked out satisfactorily, so I don't know how well I can support Windows users... period.
Take a look at e.g. package 'boot' to see how to offer alternatives. (A version that uses 'parallel' is pending on CRAN, or see http://www.stats.ox.ac.uk/pub/R/boot_1.3-3.tar.gz .) Package 'parallel' may in future offer a higher-level abstraction layer that offers such a choice, but as the 'boot' code shows, deciding what to send to the workers in a snow-style cluster is not simple.
It seems similar to what I do (off topic: why do you use the file extension '.q' for all of the R/S code files?): pass flags around. I suppose I was
*I* don't: the author did. It was the recommendation for S long ago: .S is taken and .q stood for qpe, at one time (mid 1980s, AFAIR) the proposed name for 'new S'.
just being lazy, but I would love to default to "go as fast as possible" without having Windows users get left out in the cold (unless they add flags to their function calls). Thank you for your suggestions, I will look into this further. -- Tim Triche, Jr. USC Biostatistics
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Dear Tim,
Why don't you just apply the approach you initially suggested in your own package, defining mclapply() the way you want it? I hope this helps, John
On 10/07/2011 06:03 PM, John Fox wrote:
Dear Tim,
Why don't you just apply the approach you initially suggested in your own package, defining mclapply() the way you want it?
Hi John et al.,
Individual packages will become littered with ad hoc solutions,
constructed without, for instance, the wisdom and experience of Prof.
Ripley about platforms or environments in which it is appropriate to use
mclapply. For instance, Tim's pseudo-code if (Windows) ... translated as
if (.Platform$OS.type == "windows") doesn't sound like it's the correct
test; at least
    exists("mclapply", getNamespace("parallel"))
but probably more. Also, doesn't parallel's namespace differ between
platforms, requiring the package author to import(parallel) rather than
the better practice of importFrom(parallel, mclapply)?
Martin
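The existence test suggested above can be sketched as follows. This is only an illustration: has_mclapply and apply_fun are hypothetical names, and, as the message says, a real test would probably need more, since finding the function says nothing about whether forking is actually usable in the current session.

```r
# Does the installed 'parallel' namespace define mclapply at all?
has_mclapply <- exists("mclapply", envir = getNamespace("parallel"),
                       inherits = FALSE)

# Hypothetical choice of apply function based on that check alone.
apply_fun <- if (has_mclapply) {
  get("mclapply", envir = getNamespace("parallel"))
} else {
  lapply
}
```

Looking the function up at run time in this way also avoids naming mclapply in an importFrom() directive, which is the namespace concern raised in the same message.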
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
Dear Martin, I don't have an opinion about whether what Tim wants to do is a good idea, but was responding to his comment that he would need "parallel=FALSE flags all over the place." Why could he not simply define mclapply <- if (.Platform$OS.type == "windows") base::lapply else parallel::mclapply in his package? Best, John
On Sat, 8 Oct 2011, John Fox wrote:
Dear Martin, I don't have an opinion about whether what Tim wants to do is a good idea, but was responding to his comment that he would need "parallel=FALSE flags all over the place." Why could he not simply define mclapply <- if (.Platform$OS.type == "windows") base::lapply else parallel::mclapply in his package?
Because mclapply has additional arguments that would be passed by lapply to FUN as part of ... .
We are contemplating having wrappers of mclapply and pvec on Windows equivalent to the behaviour with mc.cores = 1 on Unix. But that is nothing to do with the original specious claim to which I responded: if you want good parallel performance for most users you need also to support both parLapply and mclapply (or at least, parLapply with a fork cluster).
I think the import issue is a red herring: these functions are not called often enough for parallel::mclapply to be inefficient. And really importFrom is only better practice for things that will always be used, since it moves the computation from as-needed to every time the package is loaded.
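The point about the additional arguments, made at the start of this reply, can be seen in a small sketch. The name mclapply_alias is hypothetical and stands in for the plain alias proposed in the quoted message. The example runs on any platform because it only ever calls base::lapply.

```r
# A plain alias, as in the quoted proposal (hypothetical name).
mclapply_alias <- base::lapply

# Fine while only X, FUN and FUN's own arguments are supplied:
ok <- mclapply_alias(1:3, function(i, k) i + k, k = 1L)

# An mc.* argument is not consumed by lapply(); it travels through ...
# to FUN, which does not accept it, so the call fails.
res <- tryCatch(
  mclapply_alias(1:3, function(i) i + 1L, mc.cores = 2L),
  error = function(e) conditionMessage(e)
)
```

Here res holds the error message rather than a list of results. If FUN itself had a ... argument, the stray mc.cores would instead be swallowed silently, which is arguably harder to notice.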
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Dear Brian,
-----Original Message----- From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] Sent: October-08-11 9:57 AM To: John Fox Cc: 'Martin Morgan'; ttriche at usc.edu; 'r-devel' Subject: RE: [Rd] parallel::mclapply() dummy function on Windows? On Sat, 8 Oct 2011, John Fox wrote:
Dear Martin, I don't have an opinion about whether what Tim wants to do is a good idea, but was responding to his comment that he would need "parallel=FALSE flags all over the place." Why could he not simply define mclapply <- if (.Platform$OS.type == "windows") base::lapply else parallel::mclapply in his package?
Because mclapply has additional arguments that would be passed by lapply to FUN as part of ... .
I did think of that and took a look at mclapply() before I responded. All of
the additional arguments occur after ... and have defaults. I assumed from
the original posting that Tim Triche is using the defaults (otherwise I
don't think he would have made his original suggestion), but even if he is
not, he could define mclapply() in his package as something like

    # Use the real mclapply() except on Windows; there, accept and ignore
    # the mc.* arguments so they are not forwarded to FUN through ...
    mclapply <- if (.Platform$OS.type != "windows") {
        parallel::mclapply
    } else {
        function(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE,
                 mc.silent = FALSE, mc.cores = getOption("mc.cores", 2L),
                 mc.cleanup = TRUE, mc.allow.recursive = TRUE)
            base::lapply(X, FUN, ...)
    }

As I said, I won't pretend that I know whether his general approach is
sound.
Best,
John