multicore by(), like mclapply?

Mon, Oct 10, 2011 1:24 PM

This is the sort of thing that should be measured rather than
speculated about, but if you're using multicore, all those subsets can
be made at the same time rather than sequentially, so together they
add up to a copy of the whole data. Using a data.table rather than a
data.frame would help, of course.
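
As a rough way to measure this rather than guess, here is a minimal
base-R sketch that compares the size of a data frame with the combined
size of its per-group subsets. The data and the names dat, pieces,
whole_bytes and pieces_bytes are made up for illustration, and
object.size() only estimates memory use, so treat the numbers as
indicative.

```r
# Hypothetical data: 1000 rows in 4 groups (names are illustrative only).
set.seed(1)
dat <- data.frame(g = rep(1:4, each = 250), x = rnorm(1000))

# One subset per group, much as by() would create internally.
pieces <- split(dat, dat$g)

# Estimated memory of the whole data versus all subsets taken together.
whole_bytes  <- as.numeric(object.size(dat))
pieces_bytes <- sum(vapply(pieces,
                           function(p) as.numeric(object.size(p)),
                           numeric(1)))

print(c(whole = whole_bytes, pieces = pieces_bytes))
```

If the subsets all exist at once, the second number is roughly what
gets held in memory on top of the original data.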

I would guess that splitting, garbage collecting, and then forking
would be most efficient -- reducing the chance that all the separate
processes end up separately garbage collecting the results of the
split.
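
The split, garbage collect, then fork order could be sketched as
follows. This is an untested guess at the pattern, not a benchmark. It
assumes parallel::mclapply as a stand-in for multicore's mclapply, and
the data, the per-group function and the core count are hypothetical.

```r
library(parallel)  # assumption: parallel::mclapply in place of multicore::mclapply

# Hypothetical data: a grouping column and a numeric column.
set.seed(1)
dat <- data.frame(g = rep(letters[1:4], each = 250), x = rnorm(1000))

# 1. Split once in the parent process, before any fork.
pieces <- split(dat, dat$g)

# 2. Collect garbage in the parent so the children do not each
#    have to clean up the leftovers of the split.
invisible(gc())

# 3. Fork: each child works on pieces that already exist.
#    Forking is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(pieces, function(d) mean(d$x), mc.cores = cores)
```

Here res is a list with one element per group, named after the group
levels, which is close to what by() would return for the same function.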

It's a pity that forking messes up the profilers; it makes these
things harder to measure.

    -thomas

On Tue, Oct 11, 2011 at 9:14 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:

Thomas Lumley
Professor of Biostatistics
University of Auckland
