Message-ID: <CAJ55+dL_=oMbszz8KbxS3P=nhDYZR1432nbNhgqadvg3rWT39w@mail.gmail.com>
Date: 2011-10-10T19:19:30Z
From: Thomas Lumley
Subject: multicore by(), like mclapply?
In-Reply-To: <CAPr7RtVYX99gLN6+POw2R5kijRBqXz69w_7o=xtqwHKJzo+zRg@mail.gmail.com>
On Tue, Oct 11, 2011 at 7:54 AM, ivo welch <ivo.welch at gmail.com> wrote:
> hi josh---thx. I had a different version of this, and discarded it
> because I think it was very slow. the reason is that on each
> application, your version has to scan my (very long) data vector. (I
> have many thousand different cases, too.) I presume that by() has one
> scan through the vector that makes all splits.
by.data.frame() is basically a wrapper for tapply(), and the key line
in tapply() is
ans <- lapply(split(X, group), FUN, ...)
which should be easy to adapt for mclapply.
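The following is a minimal sketch of that adaptation. It assumes the parallel package, which ships with R 2.14.0 and later (before that, mclapply() came from the multicore package). mclapply() forks worker processes, so mc.cores > 1 only works on Unix-alikes, not on Windows. The helper name mcby() is hypothetical and is not part of R. Like tapply(), it calls split() once, so the long data vector is scanned a single time. It returns a plain named list rather than a "by" object.

```r
library(parallel)  # provides mclapply(); uses forking, so Unix-alikes only for mc.cores > 1

# Hypothetical helper, not part of base R: split the data once (as tapply() does),
# then apply FUN to each group in parallel with mclapply() instead of lapply().
mcby <- function(data, INDICES, FUN, ..., mc.cores = getOption("mc.cores", 2L)) {
  pieces <- split(data, INDICES)  # a single pass over the data builds every group
  mclapply(pieces, FUN, ..., mc.cores = mc.cores)  # result is a list named by group
}

# Usage: mean of column x within each level of g
df <- data.frame(g = rep(c("a", "b", "c"), each = 4), x = 1:12)
res <- mcby(df, df$g, function(d) mean(d$x), mc.cores = 2)
```

With many thousands of small groups, the default mc.preschedule = TRUE in mclapply() hands each worker a batch of groups rather than forking once per group, which keeps the overhead down.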
--
Thomas Lumley
Professor of Biostatistics
University of Auckland