Parallel linear model
Hi Paul --
On 08/22/2012 04:03 PM, Paul Johnson wrote:
Martin: This is a great example and I would like to use it in class, but I think I don't understand the implications of the system.time output you get. I have a question about this below. Would you share your thoughts?

On Wed, Aug 22, 2012 at 4:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 08/22/2012 12:47 AM, Patrik Waldmann wrote:
Hello,
I wonder if someone has experience with efficient ways of implicit
parallel execution of (repeated) linear models (as in the non-parallel
example below)? Any suggestions on which way to go?
Patrik Waldmann
pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])
    pval[i] <- summary(mod)$coefficients[2, 4]
}
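To make the snippet above runnable, here is a minimal setup with simulated data; the dimensions and the variables n, x, y are my own assumptions, since the original post did not show them.

```r
## Hypothetical setup (not in the original post): simulate data so the
## loop above can be run end to end.
set.seed(1)
n    <- 10L                      # number of candidate predictors
nobs <- 100L                     # observations
x <- matrix(rnorm(nobs * n), nobs, n)
y <- rnorm(nobs)

pval <- numeric(n)
for (i in 1:n) {
    mod <- lm(y ~ x[, i])                       # one simple regression per column
    pval[i] <- summary(mod)$coefficients[2, 4]  # p-value of the slope
}
```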
As a different tack: the design matrix has the same structure across all regressions, so if your data are consistently structured it may pay to re-calculate only the fit. Here's a loosely-tested version that uses a template from a full fit, augmented by the fit of individual columns to the same model; it's pretty fragile, relying on a design matrix where only a single column differs between runs.
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z    # overwrite only the fitted components of the template
    tmpl
}
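As a quick sanity check (mine, with simulated data, not from the thread), the template trick should reproduce lm()'s p-value exactly for a given column:

```r
## looselm as above: refit via lm.fit() and splice the result into a
## template lm object so summary() still works on it.
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)` = 1, xi = xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)

tmpl <- lm(y ~ x[, 1])                # full fit once, used as template
p_loose <- summary(looselm(y, x[, 2], tmpl))$coefficients[2, 4]
p_full  <- summary(lm(y ~ x[, 2]))$coefficients[2, 4]
all.equal(p_loose, p_full)            # the two routes should agree
```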
This is used in f2 and f3 below:
f0 <- function(x, y)
    lapply(seq_len(ncol(x)),
           function(i, x, y) summary(lm(y ~ x[,i]))$coefficients[2, 4],
           x, y)

f1 <- function(x, y, mc.cores=8L)    # mclapply() is in the parallel package
    mclapply(seq_len(ncol(x)),
             function(i, x, y) summary(lm(y ~ x[,i]))$coefficients[2, 4],
             x, y, mc.cores=mc.cores)

f2 <- function(x, y) {
    tmpl <- lm(y ~ x[,1])
    lapply(seq_len(ncol(x)),
           function(i, x, y, tmpl) {
               summary(looselm(y, x[,i], tmpl))$coefficients[2, 4]
           }, x, y, tmpl)
}

f3 <- function(x, y, mc.cores=8L) {
    tmpl <- lm(y ~ x[,1])
    mclapply(seq_len(ncol(x)),
             function(i, x, y, tmpl) {
                 summary(looselm(y, x[,i], tmpl))$coefficients[2, 4]
             }, x, y, tmpl, mc.cores=mc.cores)
}
with timings (for 1000 x 1000):

> system.time(ans0 <- f0(x, y))
   user  system elapsed
 23.865   1.160  25.120
> system.time(ans1 <- f1(x, y, 8L))
   user  system elapsed
 31.902   6.705   6.708
> system.time(ans2 <- f2(x, y))
   user  system elapsed
  5.285   0.296   5.596
> system.time(ans3 <- f3(x, y, 8L))
   user  system elapsed
 10.256   4.092   2.322
The ans2 version has user 5.285 and system 0.296, both much better than ans3. Ordinarily I'd focus on the "user" part and conclude that f2 (ordinary lapply) is much faster. However, the "elapsed" value for ans3 is less than half that of ans2. How can elapsed be smaller for ans3? I'm guessing that a larger total amount of work is divided among 8 cores?
Norm's answer made me look first at system.time then at print.proc_time,
discovering that the column labelled 'user' is the sum of the parent and
child process user times, and likewise for 'system' (the times for
parent versus children are available in the system.time() return value).
'elapsed' is wall time.
> t1 <- system.time(ans1 <- f1(x, y, 8L))
> unclass(t1)
user.self  sys.self   elapsed user.child sys.child
    0.040     0.032     4.791     24.777     2.892
> t3 <- system.time(ans3 <- f3(x, y, 8L))
> unclass(t3)
user.self  sys.self   elapsed user.child sys.child
    0.060     0.016     1.190     14.009     1.852
(times are clearly varying between replicates, so perhaps rbenchmark is
in order).
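In the same spirit as the rbenchmark suggestion, here is a minimal base-R stand-in (my sketch, not from the thread): run the computation several times and report the spread of elapsed times, since a single system.time() call can be misleading.

```r
## Time a computation several times and summarize the elapsed times.
## Takes a zero-argument function rather than an expression, to keep
## the evaluation semantics simple.
time_reps <- function(f, reps = 5L) {
    elapsed <- vapply(seq_len(reps), function(i)
        system.time(f())[["elapsed"]],
        numeric(1))
    c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

tt <- time_reps(function() sum(rnorm(1e5)))
tt    # named vector: min, median, max elapsed seconds
```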
There is additional work associated with forked processes, e.g.,
serializing results to return them to the parent; I'm also not sure who
(if anyone) gets dinged when processes are waiting to talk with one another.
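A rough illustration of that serialization cost (mine, with simulated data): each forked worker must serialize its result back to the parent, so returning a single p-value is far cheaper than returning a whole fitted model.

```r
## Compare the serialized size of a full lm object against a single
## extracted p-value -- the payload each mclapply() worker would return.
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
fit <- lm(y ~ x)

bytes_fit  <- length(serialize(fit, NULL))   # whole fitted model
bytes_pval <- length(serialize(summary(fit)$coefficients[2, 4], NULL))  # one number

c(model = bytes_fit, pvalue = bytes_pval)
```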
When the multicore functionality is called, the "user" and "system" numbers double because _____?
Probably the 2x is a property of this example, reflecting the particular way computation versus communication and other overhead costs accrue. If the computation tasks were bigger, the communication and other overhead costs would become less important.
But the total elapsed time is smaller because a larger amount of compute time is divided among more cores? In a system with many users logged in at the same time, it appears the reasonable thing is to tell them to use lapply, as in ans2, because the aggregate amount of computational power used is about half of the multicore amount in ans3. I mean, if we have a finite amount of computation that can take place, multicore requires twice as much aggregate work. While that one user benefits from a smaller elapsed time, the aggregate amount of work done by the system is doubled. Or am I just thinking of this like a Soviet Communist of the 1950s...
For processing large data the tasks are probably big enough that it makes economic sense to consume as many resources as the system will allow. For exploratory, interactive jobs time is precious and it makes emotional sense to consume as many resources as the system will allow. Even in the best case, though, elapsed time only drops in proportion to 1 / N cores.

Martin
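As a back-of-envelope check (my arithmetic, using the elapsed times quoted earlier in the thread), the observed speed-ups fall well short of the 8x that 8 cores might suggest:

```r
## Elapsed times from the timings above (1000 x 1000 case).
elapsed <- c(f0 = 25.120, f1 = 6.708, f2 = 5.596, f3 = 2.322)

## Speed-up of each variant relative to the naive serial loop f0.
speedup_vs_f0 <- elapsed[["f0"]] / elapsed
round(speedup_vs_f0, 1)

## mclapply over 8 cores vs its serial counterpart:
elapsed[["f0"]] / elapsed[["f1"]]   # f1 vs f0: ~3.7x, not 8x
elapsed[["f2"]] / elapsed[["f3"]]   # f3 vs f2: ~2.4x, not 8x
```

Note that the algorithmic improvement (f0 to f2) and the parallelism (f2 to f3) compound: together they give roughly a 10x reduction in elapsed time.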
pj
Martin
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793