[R-pkg-devel] Run-times of examples in vignettes

5 messages · Helmut Schütz, Dirk Eddelbuettel, Ben Bolker +1 more

#
Dear all,

is there somewhere an official statement about the maximum run-times of 
examples in vignettes?

Currently our seven vignettes take 7.9 minutes on my six-year-old 
machine. One of them (containing a lot of simulation code) takes 7.1 
minutes. Hence, in the submission we asked for --no-vignettes (on 
Windows only). Interestingly, on some flavours and operating systems this 
was also observed, whereas on others it was not. What I don't understand: 
some zip files (e.g. Windows r-oldrel) whose result flags state 
--no-vignettes contain not only the *.Rmd (which I would expect) but also 
the *.R and *.html.
For a while now we have been submitting the time-critical vignette 
pre-compiled, which should bring the total execution time below one 
minute. Is that fine for all flavours & OSs?

Helmut
#
On 27 October 2020 at 12:21, Helmut Schütz wrote:
| is there somewhere an official statement about the maximum run-times of 
| examples in vignettes?

Seven minutes is excessive. I have (long) gone by the rule of "about one
minute" each for tests and examples. Rcpp is slightly above [1], especially
on Windows. I also tone down tests when on CRAN, using a scheme devised
almost a decade ago and described a few times here and in other places: an
extra fourth digit in the package's microrelease, such as the .4 currently
in 1.0.5.4, signals 'full tests', whereas a three-digit release number (such
as 1.0.5 on CRAN) signals reduced tests.  Works for me and on all CI
instances, requires no user input and no magic environment variable.
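
A minimal sketch of how such a version-digit switch could look in R. This is not taken from Rcpp's actual test setup; the helper name `run_full_tests` and the package name `mypkg` are hypothetical and only illustrate the idea of counting version components.

```r
# Hypothetical helper: TRUE when a version string has four or more
# components (e.g. "1.0.5.4"), FALSE for a three-part release ("1.0.5").
run_full_tests <- function(version) {
  parts <- strsplit(as.character(version), "[.-]")[[1]]
  length(parts) >= 4
}

run_full_tests("1.0.5.4")  # development version: full tests
run_full_tests("1.0.5")    # CRAN-style release: reduced tests

# In a test script one might then guard the slow parts like this
# ("mypkg" is a placeholder, so the lines are left commented out):
# if (run_full_tests(utils::packageVersion("mypkg"))) {
#   # ... time-consuming tests ...
# }
```

Because the decision depends only on the installed package's own version, nothing has to be set on the machine running the checks.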

Now, we have some (partial) empirics on this as my reverse depends checks for Rcpp time
the runs (and the success/failure) in a small SQLite db (all on GH) [2]. I
also explicitly skip some packages taking too much time. I should probably
automate looking at the times and updating the list of skipped packages
automatically...

Dirk

[1] https://cloud.r-project.org/web/checks/check_results_Rcpp.html

[2] This is from a machine that is loaded by the reverse depends running six
    tests in parallel over four cores. This would be quicker on a nicer machine
    but ... this is what we have, and what we very much appreciate having access
    to. It is also 'total time' for 'R CMD check'.
    

    sqlite> select package, runtime/60 as timeInMins from results order by timeInMins desc limit 10;
    package     timeInMins      
    ----------  ----------------
    cbq         54.2629543542862
    rstanarm    51.9673245469729
    OpenMx      44.4183359424273
    survHE      40.4267136971156
    pcFactorSt  36.404546391964 
    metaBMA     35.6222194115321
    trialr      35.2238613526026
    emIRT       34.6994982639949
    bmgarch     34.3939555843671
    bsem        33.4894991954168
    sqlite>

    So the first four or five clearly are candidates for skipping.
#
Hi Dirk,

Dirk Eddelbuettel wrote on 2020-10-27 13:32:
Sure. The one vignette contains simulation code which needs 1E5 to 1E6 
sims to get a stable result. Fewer sims are simply not meaningful.
Since we use a pre-compiled vignette now, the execution time is 
essentially zero.
The others take 45 seconds in total.
If we pre-compiled the second slowest as well, we would be down to 
12 seconds for the remaining four.
OK. Do you know of any reference for this "rule"?

Helmut
#
My general solution is to run time-consuming computations in advance 
and store the results in (e.g.) inst/vignette_data, whence they can be 
retrieved via system.file("vignette_data", "output.rda", 
package="my_pkg"). (I might also include the R script required to 
generate the file so that I could automatically re-make those outputs as 
required ...)
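
A rough sketch of that workflow, under stated assumptions: the simulation is a cheap stand-in, the object name `sim_means` is made up, and a temporary directory stands in for the package's inst/vignette_data so that the snippet runs on its own.

```r
# Step 1: one-off script, run by hand rather than during R CMD check.
# In a real package out_dir would be "inst/vignette_data".
out_dir <- file.path(tempdir(), "inst", "vignette_data")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

set.seed(42)
sim_means <- replicate(1000, mean(rnorm(50)))  # stand-in for the slow simulation
save(sim_means, file = file.path(out_dir, "output.rda"))

# Step 2: in the vignette, load the stored result instead of recomputing.
# With an installed package the path would come from
#   system.file("vignette_data", "output.rda", package = "my_pkg")
rda_path <- file.path(out_dir, "output.rda")
loaded <- new.env()
load(rda_path, envir = loaded)  # load into its own environment to avoid name clashes
summary(loaded$sim_means)
```

Keeping the generating script in the package sources makes it possible to rebuild the stored file whenever the simulation code changes.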
On 10/27/20 8:53 AM, Helmut Schütz wrote:
#
My solution is similar to Ben's, except that the code for creating
the data is in the vignette. The chunk only runs when the data is not
available. The trick is to pass code to the eval argument instead of a
fixed TRUE or FALSE. See
https://github.com/ropensci/git2rdata/blob/bad8a4cf42049faa72a04b202c5a4dfc233b4046/vignettes/efficiency.Rmd#L293
for an example.
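
The idea can be sketched in plain R as follows. This is not a copy of the linked vignette; the cache file name and the stand-in computation are assumptions, and the chunk header appears only as a comment.

```r
# Hypothetical cache file; in a vignette it would typically live next to the .Rmd.
cache_file <- file.path(tempdir(), "slow_result.rds")

# This is the kind of expression handed to the chunk option, for example
#   {r slow-sim, eval = !file.exists(cache_file)}
# so knitr evaluates the chunk only when the cached data is missing.
needs_run <- !file.exists(cache_file)

if (needs_run) {
  set.seed(1)
  slow_result <- colMeans(matrix(rnorm(1e4), ncol = 10))  # stand-in for slow code
  saveRDS(slow_result, cache_file)
}

# A later chunk always reads the cached result, whether or not it was just created.
slow_result <- readRDS(cache_file)
length(slow_result)
```

On a machine where the cached file ships with the package, the expensive chunk is skipped; deleting the file forces it to be regenerated.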

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

On Tue, 27 Oct 2020 at 16:29, Ben Bolker <bbolker at gmail.com> wrote: