Parallel linear model
Martin: This is a great example and I would like to use it in class. But I don't think I understand the implications of the system.time output you get. I have a question about this below; would you share your thoughts?
On Wed, Aug 22, 2012 at 4:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 08/22/2012 12:47 AM, Patrik Waldmann wrote:
Hello,
I wonder whether someone has experience with efficient ways of implicitly
parallelizing the execution of (repeated) linear models, as in the
non-parallel example below? Any suggestions on which way to go?
Patrik Waldmann
pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])
    pval[i] <- summary(mod)$coefficients[2, 4]
}
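For readers who want to run the loop, here is a minimal self-contained sketch; the data dimensions and distributions are assumed, since the original post does not show how x and y were generated. Row 2, column 4 of the coefficient table is the slope's Pr(>|t|).

```r
## Assumed example data (not from the original post)
set.seed(1)
m <- 100                               # observations
n <- 10                                # predictors (columns of x)
x <- matrix(rnorm(m * n), m, n)
y <- rnorm(m)

pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])
    ## coefficient table: row 2 = slope, column 4 = Pr(>|t|)
    pval[i] <- summary(mod)$coefficients[2, 4]
}
```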
As a different tack, the design matrix is the same across all regressions,
and if your data are consistently structured it may pay to re-calculate only
the fit itself. Here's a loosely-tested version that uses a template from a
full fit, augmented by the fit of individual columns to the same model:
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}
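As a quick sanity check (on assumed data), looselm should reproduce the p-value a fresh lm() fit gives, because summary.lm() reads exactly the components (coefficients, residuals, qr, rank, df.residual, ...) that lm.fit() replaces in the template:

```r
## looselm as defined above, repeated so this sketch is self-contained
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}

## Assumed data: compare the template-based fit with a fresh lm() fit
set.seed(2)
y  <- rnorm(50)
xi <- rnorm(50)
tmpl    <- lm(y ~ xi)                  # full fit used as the template
p.full  <- summary(lm(y ~ xi))$coefficients[2, 4]
p.loose <- summary(looselm(y, xi, tmpl))$coefficients[2, 4]
all.equal(p.full, p.loose)
```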
This is used in f2 and f3 below.
## mclapply() in f1 and f3 is from the 'parallel' package
f0 <- function(x, y)
    lapply(seq_len(ncol(x)),
           function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
           x, y)

f1 <- function(x, y, mc.cores = 8L)
    mclapply(seq_len(ncol(x)),
             function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
             x, y, mc.cores = mc.cores)

f2 <- function(x, y) {
    tmpl <- lm(y ~ x[, 1])
    lapply(seq_len(ncol(x)),
           function(i, x, y, tmpl) {
               summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
           }, x, y, tmpl)
}

f3 <- function(x, y, mc.cores = 8L) {
    tmpl <- lm(y ~ x[, 1])
    mclapply(seq_len(ncol(x)),
             function(i, x, y, tmpl) {
                 summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
             }, x, y, tmpl, mc.cores = mc.cores)
}
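The timings below assume a 1000 x 1000 x; a possible setup is sketched here (the data generation is assumed, as the original post does not show it):

```r
## Assumed setup for the 1000 x 1000 timings
library(parallel)                      # for mclapply() in f1 and f3
set.seed(123)
x <- matrix(rnorm(1000 * 1000), 1000, 1000)
y <- rnorm(1000)
## then e.g.: system.time(ans2 <- f2(x, y))
```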
with timings (for 1000 x 1000)

> system.time(ans0 <- f0(x, y))
   user  system elapsed
 23.865   1.160  25.120
> system.time(ans1 <- f1(x, y, 8L))
   user  system elapsed
 31.902   6.705   6.708
> system.time(ans2 <- f2(x, y))
   user  system elapsed
  5.285   0.296   5.596
> system.time(ans3 <- f3(x, y, 8L))
   user  system elapsed
 10.256   4.092   2.322
The ans2 version has user 5.285 and system 0.296, which are much better than ans3. Ordinarily I'd focus on the "user" part, and I'd think f2 (ordinary lapply) is much faster. However, the "elapsed" value for ans3 is half that of ans2.

How can elapsed be smaller for ans3? I'm guessing that a larger amount of work is divided among 8 cores? When the multicore functionality is called, the "user" and "system" numbers double because _____? But the total elapsed time is smaller because a larger amount of compute time is divided among more cores?

In a system with many users logged in at the same time, it appears the reasonable thing is to tell them to use lapply, as in ans2, because the aggregate amount of computational power used is one-half of the multicore amount in ans3. I mean, if we have a finite amount of computation that can take place, multicore requires twice as much aggregate work. While that one user benefits from a smaller elapsed time, the aggregate amount of work the system does is doubled. Or am I just thinking of this like a Soviet Communist of the 1950s....

pj
Martin
Paul E. Johnson
Professor, Political Science          Assoc. Director
1541 Lilac Lane, Room 504             Center for Research Methods
University of Kansas                  University of Kansas
http://pj.freefaculty.org             http://quant.ku.edu