Parallel linear model
Martin: This is a great example and I would like to use it in class. But I don't think I understand the implications of the system.time output you get. I have a question about this below; would you share your thoughts?
On Wed, Aug 22, 2012 at 4:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 08/22/2012 12:47 AM, Patrik Waldmann wrote:
Hello,
I wonder whether someone has experience with efficient ways of implicitly
parallelizing the execution of (repeated) linear models, as in the
non-parallel example below? Any suggestions on which way to go?
Patrik Waldmann
pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])
    pval[i] <- summary(mod)$coefficients[2, 4]
}
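For readers who want to run the loop, here is a minimal self-contained sketch; the data dimensions and distributions are assumed, since the original post does not show how x and y were generated. Row 2, column 4 of the coefficient table is the slope's Pr(>|t|).

```r
## Assumed example data (not from the original post)
set.seed(1)
m <- 100                               # observations
n <- 10                                # predictors (columns of x)
x <- matrix(rnorm(m * n), m, n)
y <- rnorm(m)

pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])
    ## coefficient table: row 2 = slope, column 4 = Pr(>|t|)
    pval[i] <- summary(mod)$coefficients[2, 4]
}
```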
As a different tack, the design matrix is the same across all regressions,
and if your data are consistently structured it may pay to re-calculate only
the fit itself. Here's a loosely-tested version that uses a template from a
full fit, augmented by the fit of individual columns to the same model:
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}
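As a quick sanity check (on assumed data), looselm should reproduce the p-value a fresh lm() fit gives, because summary.lm() reads exactly the components (coefficients, residuals, qr, rank, df.residual, ...) that lm.fit() replaces in the template:

```r
## looselm as defined above, repeated so this sketch is self-contained
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}

## Assumed data: compare the template-based fit with a fresh lm() fit
set.seed(2)
y  <- rnorm(50)
xi <- rnorm(50)
tmpl    <- lm(y ~ xi)                  # full fit used as the template
p.full  <- summary(lm(y ~ xi))$coefficients[2, 4]
p.loose <- summary(looselm(y, xi, tmpl))$coefficients[2, 4]
all.equal(p.full, p.loose)
```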
This is used in f2 and f3 below.
## mclapply() in f1 and f3 is from the 'parallel' package
f0 <- function(x, y)
    lapply(seq_len(ncol(x)),
           function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
           x, y)

f1 <- function(x, y, mc.cores = 8L)
    mclapply(seq_len(ncol(x)),
             function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
             x, y, mc.cores = mc.cores)

f2 <- function(x, y) {
    tmpl <- lm(y ~ x[, 1])
    lapply(seq_len(ncol(x)),
           function(i, x, y, tmpl) {
               summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
           }, x, y, tmpl)
}

f3 <- function(x, y, mc.cores = 8L) {
    tmpl <- lm(y ~ x[, 1])
    mclapply(seq_len(ncol(x)),
             function(i, x, y, tmpl) {
                 summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
             }, x, y, tmpl, mc.cores = mc.cores)
}
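The timings below assume a 1000 x 1000 x; a possible setup is sketched here (the data generation is assumed, as the original post does not show it):

```r
## Assumed setup for the 1000 x 1000 timings
library(parallel)                      # for mclapply() in f1 and f3
set.seed(123)
x <- matrix(rnorm(1000 * 1000), 1000, 1000)
y <- rnorm(1000)
## then e.g.: system.time(ans2 <- f2(x, y))
```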
with timings (for 1000 x 1000)

> system.time(ans0 <- f0(x, y))
   user  system elapsed
 23.865   1.160  25.120
> system.time(ans1 <- f1(x, y, 8L))
   user  system elapsed
 31.902   6.705   6.708
> system.time(ans2 <- f2(x, y))
   user  system elapsed
  5.285   0.296   5.596
> system.time(ans3 <- f3(x, y, 8L))
   user  system elapsed
 10.256   4.092   2.322
The ans2 version has user 5.285 and system 0.296, which are much better than ans3. Ordinarily I'd focus on the "user" part, and I'd think f2 (ordinary lapply) is much faster. However, the "elapsed" value for ans3 is half that of ans2.

How can elapsed be smaller for ans3? I'm guessing that a larger amount of work is divided among 8 cores? When the multicore functionality is called, the "user" and "system" numbers double because _____? But the total elapsed time is smaller because a larger amount of compute time is divided among more cores?

In a system with many users logged in at the same time, it appears the reasonable thing is to tell them to use lapply, as in ans2, because the aggregate amount of computational power used is one-half of the multicore amount in ans3. I mean, if we have a finite amount of computation that can take place, multicore requires twice as much aggregate work. While that one user benefits from a smaller elapsed time, the aggregate amount of work the system does is doubled. Or am I just thinking of this like a Soviet Communist of the 1950s....

pj
Martin
Paul E. Johnson
Professor, Political Science          Assoc. Director
1541 Lilac Lane, Room 504             Center for Research Methods
University of Kansas                  University of Kansas
http://pj.freefaculty.org             http://quant.ku.edu