multicore by(), like mclapply?
hi josh---thanks. I had a different version of this, and discarded it because I think it was very slow. the reason is that on each application, your version has to scan my (very long) data vector. (I have many thousands of different cases, too.) I presume that by() makes one scan through the vector that produces all the splits.
regards,
/iaw
----
Ivo Welch (ivo.welch at gmail.com)
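To illustrate the single-scan idea raised above, here is a minimal sketch, not anyone's actual code from this thread. It assumes the data are a value vector `v` with a parallel grouping vector `g` (both hypothetical names with made-up data), and that mclapply() comes from the parallel package, as in current R. split() is applied to the row indices once, and each worker then only indexes its own group.

```r
library(parallel)  # provides mclapply(); it forks, so it is not parallel on Windows

# Hypothetical data: a value vector and a parallel grouping vector
set.seed(1)
g <- sample(c("a", "b", "c"), 1000, replace = TRUE)
v <- rnorm(1000)

# Build the row indices for every group in a single split() call
idx <- split(seq_along(v), g)

# Fall back to one core on Windows, where forking is unavailable
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each call sees only the indices of its own group
res <- mclapply(idx, function(i) mean(v[i]), mc.cores = cores)
```

The per-group function here is mean(), chosen only as a placeholder; any function of `v[i]` could be substituted.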
On Mon, Oct 10, 2011 at 11:07 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
Hi Ivo,
My suggestion would be to only pass lapply (or mclapply) the indices.
That should be fast, subsetting with data table should also be fast,
and then you do whatever computations you will. For example:
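require(data.table)
# Small example table; x is the grouping column
DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
setkey(DT, x)  # key on x so that DT[i] is a keyed lookup
# Loop over the group labels only; each call pulls out one group's rows
lapply(as.character(unique(DT[,x])), function(i) DT[i])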
The DT[i] object is the subset of the data table you want. You can
pass this to whatever function you need for your computations.
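As a rough sketch of how the same idea might look with mclapply(), assuming the parallel package (where mclapply() lives in current R) and a made-up per-group computation, sum of `v`:

```r
library(data.table)
library(parallel)  # mclapply() forks, so it is not parallel on Windows

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)

# Only the group labels are handed to the workers
keys <- as.character(unique(DT[, x]))

# Fall back to one core on Windows, where forking is unavailable
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical per-group computation: sum of v within each group
res <- mclapply(keys, function(k) DT[k, sum(v)], mc.cores = cores)
names(res) <- keys
```

Each element of `res` holds the result for one value of `x`; replace `sum(v)` with whatever computation is needed.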
Hope this helps,
Josh
On Mon, Oct 10, 2011 at 10:41 AM, ivo welch <ivo.welch at gmail.com> wrote:
dear R experts---Is there a multicore equivalent of by(), just like mclapply() is the multicore equivalent of lapply()? if not, is there a fast way to convert a data.table into a list, based on a column, that lapply and mclapply can consume? advice appreciated...as always.
regards,
/iaw
----
Ivo Welch (ivo.welch at gmail.com)
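For the second part of the question, one possible approach in more recent data.table versions is the split() method for data.tables, which takes a `by` argument naming the grouping column. The sketch below assumes such a version is installed and uses a small made-up table; it is an illustration, not something proposed in this thread.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# split() has a data.table method; by = names the grouping column
pieces <- split(DT, by = "x")

# pieces is a named list of data.tables, one per value of x
# (hypothetical per-group computation: sum of v)
res <- lapply(pieces, function(d) sum(d$v))
```

Because `pieces` is an ordinary list, the lapply() call could be swapped for parallel::mclapply() with an `mc.cores` argument on platforms that support forking.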
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/