
Speeding up build-from-source

On 04/27/2013 09:10 AM, Martin Morgan wrote:
Hi Martin,
	Thanks for the reply -- but I'm afraid the question you've answered 
isn't the question that I intended to ask.

	Based on your response, I think the answer to my question is likely 
"no."  But let me try rephrasing anyway, just in case:

	I'm certainly quite aware of "-j" as a make argument; if I weren't, the 
bottleneck would not be the byte-compilation, and the build would take 
rather more than 5 minutes :-)  That was the very first thing I tried. 
I don't believe that parallel make is as parallel as it theoretically 
could be.  (In fact, I see almost no parallelism between libraries on my 
system; individual .c files are parallelized nicely but only one library 
at a time.  This mostly matters at the byte-compilation step, since 
that's the biggest serial operation per library.)  My question is, has 
anyone thought about what it would take to parallelize the build further?

	I'm not sure that this can be done with just the makefiles.  But the 
following comment makes me at least a little suspicious:

""" src/library/Makefile
## FIXME: do some of this in parallel?
"""

	Surely some of the 'for' loops there could be unwound into proper make 
targets with dependency information?  I'm not sure whether the dependency 
information would effectively force serial compilation anyway, though.
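	To make the dependency question concrete, here is a minimal sketch 
(Python 3.9+, using the standard-library graphlib module) that groups 
libraries into "waves": every library in a wave depends only on libraries 
in earlier waves, so all members of one wave could in principle be built 
at the same time.  The dependency graph and the names core, libA..libF 
are hypothetical placeholders, not R's actual package dependencies.

```python
from graphlib import TopologicalSorter


def build_waves(deps):
    """Group libraries into waves of mutually independent builds.

    deps maps each library to the libraries it needs built first.
    Libraries within one returned wave have no dependencies on each other.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Hypothetical dependency graph (illustrative only).
deps = {
    "core": [],
    "libA": ["core"],
    "libB": ["core"],
    "libC": ["core"],
    "libD": ["libA"],
    "libE": ["libB"],
    "libF": ["libD", "libE"],
}

for i, wave in enumerate(build_waves(deps)):
    print(i, wave)
```

	Dependency information only forces serial compilation when the graph 
is a chain (each library needing the previous one); any wave with more 
than one member is parallelism a make-based build could exploit.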

	Another approach, if the above is hard for some reason:  What I'm 
seeing is that the byte-compilation is largely serial, but as you note, 
byte-compilation is optional.  Could the makefiles just defer it, skipping 
it up front and then byte-compiling all of the packages concurrently 
afterwards?  From a very cursory read of the code, the relevant code 
seems to be in src/library/tools/R/makeLazyLoad.R, and that file doesn't 
immediately look like it's doing anything that fundamentally couldn't be 
parallelized (i.e., running multiple R processes at once, one per 
library; at a glance the logic looks nicely per-library).
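	The "defer, then run one process per library" idea could look roughly 
like the driver sketched below.  It is only a sketch under assumptions: 
make_command is a hypothetical hook, and the actual R invocation that 
byte-compiles a single library is deliberately left unspecified.  The 
demo uses a trivial Python process in place of R.  Threads are enough 
here because the real work happens in the child processes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_library(libraries, make_command, max_workers=4):
    """Run one external process per library, at most max_workers at once.

    make_command(lib) is a hypothetical hook returning the argv list for
    the process that handles one library. Returns {library: exit code}.
    """
    def run_one(lib):
        result = subprocess.run(make_command(lib), capture_output=True, text=True)
        return lib, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, libraries))


# Stand-in command: a trivial Python process per library instead of R.
def demo(lib):
    return [sys.executable, "-c", f"print('compiled {lib}')"]


print(run_per_library(["libA", "libB", "libC"], demo))
```

	This ignores dependencies between libraries; if byte-compiling one 
library requires another to be compiled first, the driver would need to 
run the libraries in dependency-ordered waves rather than all at once.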

	A third approach could be to try to parallelize the logic in 
makeLazyLoad.R.  I would expect that to be at best much more difficult, 
though.

	Anyway, there are lots of things that look like they could in theory be 
done here.  And I know just enough at this point to be dangerous; not 
enough to contribute :-)  Hence my asking, has anyone thought about 
this?  If not, I assume the best thing for me to do would be to poke at 
it and try to figure out on my own how this works and what's most 
feasible.  But if anyone has any pointers, that would likely save me a 
bunch of time.  And if this is something that you prefer to keep serial 
for some reason, that would be good to know too, so I don't spend time 
on it.

Thanks,
Adam