
[Bioc-devel] R cmd check time limits for BioConductor

5 messages · Hervé Pagès, Vincent Carey, Kevin R. Coombes +2 more

#
Hi,
Robert Gentleman wrote:
[...]
Just to clarify:

- We build BioC devel *and* BioC release every day.

- Some build machines are running both builds (devel and release) so at most
   12 hours can be spent on each build (the devel builds run from noon to midnight
   and the release builds from midnight to noon, Seattle time).

- The builds are parallelized i.e. up to 4 'R CMD check' processes can run
   simultaneously on the same build machine at any given time. As a consequence,
   an entire build run (250-270 packages) takes between 6 and 11 hours
   on each build machine (64-bit Linux machines such as wilson1-2 are the fastest).
   Parallelization is the only way an entire build run can be done in less
   than 12 hours on all the machines.

- Note that 'R CMD check' is not the only command that is executed for each
   package. The build stages are: (a) install the dependencies, (b) run 'R CMD build',
   (c) run 'R CMD check' and (d) build the binary package (on Windows and Mac OS X
   only).
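A back-of-the-envelope sketch of why parallelization is needed, using only the figures above (about 270 packages, up to 4 concurrent processes, a 12-hour window). It is a simplification: it assumes packages of similar length and treats the budget as covering all four stages (a)-(d), not only 'R CMD check'. The function name is illustrative.

```python
# Average wall-clock minutes each package may occupy one build slot if a
# whole run must fit in the window. A simplification: it assumes similar
# package durations and counts all stages (a)-(d), not just 'R CMD check'.

def per_package_budget_minutes(n_packages: int, slots: int, window_hours: float) -> float:
    if n_packages <= 0 or slots <= 0:
        raise ValueError("n_packages and slots must be positive")
    return slots * window_hours * 60 / n_packages


if __name__ == "__main__":
    # Figures taken from the message: ~270 packages, a 12-hour window.
    for slots in (1, 4):
        budget = per_package_budget_minutes(270, slots, 12)
        print(f"{slots} slot(s): {budget:.1f} min per package")
```

With a single slot the average package would get about 2.7 minutes for all four stages combined; with 4 slots that rises to roughly 10.7 minutes.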

During the same build run, a lot of CPU cycles are wasted because the same
thing can be computed several times. For example each vignette is tested twice:
the 1st time by 'R CMD build' and the 2nd time by 'R CMD check'. We could easily
avoid this by running 'R CMD check --no-vignettes': that would probably make
the builds 10%-30% faster without compromising the current testing paradigm.
Other things are done several times (like installing the exact same package 2
or 3 times, even 4 times in some rare situations) but trying to avoid this
would be more complicated.
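A minimal sketch of the ordering proposed above: build first (which runs the vignettes once), then check the resulting tarball with '--no-vignettes', the flag named in the message, so they are not run a second time. The helper name and the package name/version in the example are hypothetical; it only assembles command lines and leaves stages (a) and (d) out.

```python
from typing import List


def build_and_check_commands(pkg_dir: str, pkg: str, version: str,
                             skip_vignettes_in_check: bool = True) -> List[List[str]]:
    """Command lines for stages (b) and (c) of one package (hypothetical helper)."""
    tarball = f"{pkg}_{version}.tar.gz"  # name of the source tarball 'R CMD build' writes
    check = ["R", "CMD", "check"]
    if skip_vignettes_in_check:
        # Vignettes were already run once by 'R CMD build'.
        check.append("--no-vignettes")
    check.append(tarball)
    return [["R", "CMD", "build", pkg_dir], check]


if __name__ == "__main__":
    # 'myPkg' and '1.0.0' are placeholders, not a real package.
    for cmd in build_and_check_commands("myPkg", "myPkg", "1.0.0"):
        print(" ".join(cmd))
```

Each list could then be handed to subprocess.run() on a machine where R is on the PATH.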

Cheers,
H.
#
This is an important topic.  I believe that the building/checking
process is a key source of value added by the Bioc project for developers.

A worthwhile thought experiment: What would be done if we had unlimited
resources?  It seems to me that a primary resource that a developer
lacks access to is the range of platforms -- hardware, OS, compilers --
that we want users to be able to work with reliably.  Thus a very
high priority for the build/check system is coverage of the main
varieties of "system" that users are likely to use Bioconductor on.

The devel branch is very important because it indicates how ongoing
changes to R affect performance/accuracy/interactions of package code.  Most
developers aren't going to be updating R and all other packages approximately
daily.  Thus, without the devel build system, many developers would
find themselves working hard when a new R release became imminent, to
port abruptly to the new release.  The devel build system allows this
to occur somewhat more smoothly, and affords the possibility of feedback
to R core when tentative changes to R are problematic.

I recognize that the points just made are well-known, and these remarks
are not made defensively.  Instead I am trying to come up with some
a priori limits to testing requirements falling to bioconductor, as
opposed to requirements that lie uniquely with the developer.  So the
first two priorities are, briefly:

1) Cover the platforms
2) Track performance relative to evolving R

Now we know full well that our resources require that package testing
be limited to around five minutes.  Is this in itself a value or an
obstacle to ensuring project/package reliability?
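One way a build driver might enforce the five-minute figure is a hard timeout around each check process. This is a sketch under assumptions: the driver is written in Python, the limit applies to the whole command, and a timeout is reported as its own status rather than as an ordinary failure. The constant and function names are hypothetical.

```python
import subprocess
import sys
from typing import List

CHECK_TIME_LIMIT_SECONDS = 5 * 60  # "around five minutes", per the message


def run_with_limit(cmd: List[str], limit_seconds: float = CHECK_TIME_LIMIT_SECONDS) -> str:
    """Run cmd and classify the outcome as 'OK', 'ERROR' or 'TIMEOUT'."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the direct child at this point.
        return "TIMEOUT"
    return "OK" if result.returncode == 0 else "ERROR"


if __name__ == "__main__":
    # A Python child that sleeps stands in for a slow 'R CMD check'.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_with_limit(slow, limit_seconds=0.5))
```

subprocess.run() kills only the direct child when the timeout expires, so a production driver would also have to clean up any processes that R itself started.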

My sense is that this constraint might be a value -- I understand that
a very multifaceted package may have many essential tests that can finish
rapidly but the sum of testing times exceeds our limits.  Perhaps that
package should be broken up... Perhaps it should get an exception...
I don't know.  Other things being equal, I think it is good for the software
that there be legitimate tests that run quickly.  It means
users can demonstrate the functionality in short order, that the tests
can be varied and examined without long delay, and that, probably, we
have capacity to do more tests of different facets of the package.
[...] concerned about most involve tests that would indicate problems with
respect to priorities 1) and 2).  Portability problems, or dependencies
on ephemeral features of R, might only crop up with certain very long
tests ... but I think that would be exceptional.  Thus the deep tests
should be done "at home" and the light ones left for the project
system.

Finally, the complexity of the project test/build system has to be
kept very manageable -- we have {release, devel} X platform X {software,
experiment, annotation} and introducing a short/long testing stream
may be feasible but may not pay off.  The point is that even if we did
have a lot more resources I am not sure it would be sound to allow
indefinite test return times.
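To make the size of that matrix concrete, a small enumeration. The platform list is hypothetical (the message does not name the platforms); the point is only that adding a short/long stream doubles the number of build configurations.

```python
from itertools import product

BRANCHES = ["release", "devel"]
REPOSITORIES = ["software", "experiment", "annotation"]
PLATFORMS = ["linux", "windows", "mac"]  # hypothetical list, for illustration only


def build_matrix(platforms, tiers=None):
    """Every (branch, platform, repository[, tier]) combination to build."""
    axes = [BRANCHES, platforms, REPOSITORIES]
    if tiers is not None:
        axes.append(tiers)
    return list(product(*axes))


if __name__ == "__main__":
    print(len(build_matrix(PLATFORMS)))                     # 2 * 3 * 3 = 18
    print(len(build_matrix(PLATFORMS, ["short", "long"])))  # doubled: 36
```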

I am open to correction or criticism on any point made above.  I am
trying to articulate some points about testing in the project that
are probably quite superficial from the perspective of serious software
development and engineering -- yet the breadth of testing
accomplished and the necessity of release/devel branches is not very
widely appreciated among some of my contacts in other domains ...
so I have taken this opportunity to articulate these views to the devel
group.

#
Hi Vince,

I agree with everything you said. The conclusion I draw from it (which 
is, of course, compatible with my prior distribution on the issue ...) 
is that we really need two levels of testing. One would cover your 
issues (1) and (2) and, because of the need to complete checking and 
building every day, would have to run quickly. A more elaborate level of 
systematic "deep" regression testing should also be available which (i) 
would not be run by default but (ii) could be optionally included by "R 
CMD check" without forcing everyone who wants this kind of testing to 
have to reinvent the wheel. Given that conclusion, I'm going to move 
over to R-devel and try to convince them to add this feature to R....
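A rough sketch of the opt-in behaviour described here, written in Python for illustration: quick tests always run, deep tests run only when the developer explicitly asks. 'RUN_DEEP_TESTS' is a hypothetical variable, not an existing 'R CMD check' option, and the test names are made up.

```python
import os


def select_tests(tests, env=None):
    """Names of the tests to run: 'quick' always, 'deep' only on request.

    tests: iterable of (name, tier) pairs, tier being 'quick' or 'deep'.
    RUN_DEEP_TESTS is a hypothetical opt-in switch.
    """
    env = os.environ if env is None else env
    run_deep = env.get("RUN_DEEP_TESTS", "").lower() in ("1", "true", "yes")
    return [name for name, tier in tests
            if tier == "quick" or (tier == "deep" and run_deep)]


if __name__ == "__main__":
    # Hypothetical test names.
    suite = [("small_example", "quick"), ("genome_wide_fit", "deep")]
    print(select_tests(suite))
```

The daily build would simply never set the switch, so its run time depends only on the quick tier.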

Thanks,
	Kevin
Vincent Carey 525-2265 wrote:
#
This is just a follow-up to Vince's question, "What would we want
with unlimited resources?" I agree with others that a package could
often have two tiers of tests. Small quick tests are
(perhaps) sensible to run daily, but longer tests would only have to  
be run once in a while. I however strongly feel that the opportunity  
to get these tests run on a variety of platforms can prove  
indispensable for catching bugs you only see when moving across OS's  
and compilers. Perhaps I have been unlucky but the majority of the  
nasty bugs I have been involved with are of this type.

So to wrap it up: I am fine with only allowing 5 minutes per run, but  
in my dream world it would be possible for developers to request to  
get the results of a longer test. That will be very nice when (1) you  
are working hard on changing core things in the package and (2)  
nearing a new release when you are wondering about the accumulated  
changes in all packages. By making it request-only, we would
spend relatively few CPU cycles on it compared to a daily check.

Of course, I have no great idea as to how such a system could be  
implemented. I assume granting ssh access is out of the question.
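Purely as an illustration, the request-only idea could be planned along these lines: every package gets the quick check on every platform each night, and the long run is added only for packages whose developers requested it. The function, package and platform names are hypothetical, and how requests would be submitted is left open, as in the message.

```python
def plan_night(packages, platforms, long_requests):
    """One night's jobs as (package, platform, tier) tuples.

    Every package gets a quick check on every platform; the long run is
    added only for packages listed in long_requests.
    """
    jobs = [(pkg, plat, "quick") for pkg in packages for plat in platforms]
    jobs += [(pkg, plat, "long")
             for pkg in packages if pkg in long_requests
             for plat in platforms]
    return jobs


if __name__ == "__main__":
    # Hypothetical package and platform names.
    for job in plan_night(["pkgA", "pkgB"], ["linux", "windows"], {"pkgB"}):
        print(job)
```

Because long runs are scheduled only on request, their CPU cost grows with the number of requests rather than with the number of packages, while still covering every platform.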

Kasper
On Jun 10, 2008, at 12:03 PM, Vincent Carey 525-2265 wrote: