[R-pkg-devel] Run-times of examples in vignettes - R-package-devel

Tue, Oct 27, 2020 4:21 AM

Dear all,

is there somewhere an official statement about the maximum run-times of 
examples in vignettes?

Currently our seven vignettes take 7.9 minutes in total on my
six-year-old machine. One of them (containing a lot of simulation code)
takes 7.1 minutes. Hence, in the submission we asked for --no-vignettes
(on Windows only). Interestingly, on some flavours and operating systems
this was also observed, whereas on others it was not. What I don't
understand: some zip files (e.g. Windows r-oldrel) whose result flags
state --no-vignettes contain not only the *.Rmd (which I would expect)
but also the *.R and *.html.
For a while now we have submitted the time-critical vignette
pre-compiled, which should bring the total execution time below one
minute. Is that fine for all flavours and OSs?

Helmut

Ing. Helmut Schütz
BEBAC - Consultancy Services for
Bioequivalence and Bioavailability Studies
Neubaugasse 36/11
1070 Vienna, Austria
E helmut.schuetz at bebac.at
W https://bebac.at/
F https://forum.bebac.at/

Dirk Eddelbuettel

Tue, Oct 27, 2020 5:32 AM

On 27 October 2020 at 12:21, Helmut Schütz wrote:

| is there somewhere an official statement about the maximum run-times of
| examples in vignettes?

Seven minutes is excessive. I have (long) gone by the rule of "about one
minute" each for tests and examples. Rcpp is slightly above that [1],
especially on Windows. I also tone down tests when on CRAN, using a scheme
devised almost a decade ago and described a few times here and in other
places: an extra fourth digit in the package version, such as the .4
currently in 1.0.5.4, signals 'full tests', whereas a three-digit release
number (such as 1.0.5 on CRAN) signals reduced tests. Works for me and on
all CI instances, requires no user input and no magic environment variable.
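
A minimal sketch of how such a version-based switch could look in R. The helper name run_full_tests() is hypothetical and this is not taken from Rcpp's own test setup; it only assumes that "full tests" means the version string has at least four components.

```r
# Hypothetical helper: decide whether to run the full test suite based on
# the number of components in a version string.
run_full_tests <- function(version) {
  # package_version() parses "1.0.5.4" into integer components;
  # unclass(...)[[1]] extracts them as a plain integer vector.
  parts <- unclass(package_version(version))[[1]]
  length(parts) >= 4
}

# Inside a package's test runner one might pass
# as.character(utils::packageVersion("mypkg")), where "mypkg" is a placeholder.
run_full_tests("1.0.5.4")  # TRUE:  development version, full tests
run_full_tests("1.0.5")    # FALSE: release version, reduced tests
```

The appeal of this design is that the decision travels with the package itself, so it behaves the same locally, on CI and on CRAN.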

Now, we have some (partial) empirics on this as my reverse depends checks for Rcpp time
the runs (and the success/failure) in a small SQLite db (all on GH) [2]. I
also explicitly skip some packages taking too much time. I should probably
automate looking at the times and updating the list of skipped packages
automatically...

Dirk

[1] https://cloud.r-project.org/web/checks/check_results_Rcpp.html

[2] This is from a machine that is loaded by the reverse depends running six
tests in parallel over four cores. This would be quicker on a nicer machine
but ... this is what we have, and what we very much appreciate having access
to. It is also 'total time' for 'R CMD check'.

sqlite> select package, runtime/60 as timeInMins from results order by timeInMins desc limit 10;
package     timeInMins
----------  ----------------
cbq         54.2629543542862
rstanarm    51.9673245469729
OpenMx      44.4183359424273
survHE      40.4267136971156
pcFactorSt  36.404546391964
metaBMA     35.6222194115321
trialr      35.2238613526026
emIRT       34.6994982639949
bmgarch     34.3939555843671
bsem        33.4894991954168
sqlite>

So the first four or five clearly are candidates for skipping.
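
Picking skip candidates from such timings could be automated along these lines. This is only a sketch: the data frame reproduces the first five rows of the listing above (rounded, names as printed), and the 40-minute cutoff is an arbitrary assumption for illustration, not a value used in the actual reverse-depends setup.

```r
# Timings copied (rounded) from the first five rows of the sqlite output;
# package names are as printed there and may be truncated by the column width.
timings <- data.frame(
  package = c("cbq", "rstanarm", "OpenMx", "survHE", "pcFactorSt"),
  minutes = c(54.26, 51.97, 44.42, 40.43, 36.40),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: skip anything whose total check time exceeds this.
threshold_minutes <- 40

skip <- timings$package[timings$minutes > threshold_minutes]
print(skip)
```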

https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

Helmut Schütz

Tue, Oct 27, 2020 5:53 AM

Hi Dirk,

Dirk Eddelbuettel wrote on 2020-10-27 13:32:

Sure. The one vignette contains simulation code which needs 1E5 to 1E6 
sims to get a stable result. Fewer sims are simply not meaningful.
Since we use a pre-compiled vignette now, the execution time is 
essentially zero.
The others take 45 seconds in total.
If we pre-compiled the second slowest as well, the remaining four 
would be down to 12 seconds.

OK. Do you know of any reference for this "rule"?

Helmut


Ben Bolker

Tue, Oct 27, 2020 8:29 AM

My general solution is to run time-consuming computations in advance 
and store the results in (e.g.) inst/vignette_data, whence they can be 
retrieved via system.file("vignette_data", "output.rda", 
package="my_pkg"). (I might also include the R script required to 
generate the file so that I could automatically re-make those outputs as 
required ...)
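
A rough sketch of that retrieval step, assuming the results were saved with save() into an .rda file. The helper load_vignette_data() and its argument names are hypothetical; "output.rda" and "my_pkg" are the placeholders from the message above.

```r
# Hypothetical helper: load precomputed results that a package ships in
# inst/vignette_data (installed as <pkg>/vignette_data).
load_vignette_data <- function(file, package, dir = "vignette_data") {
  # system.file() returns "" when the file or the package cannot be found.
  path <- system.file(dir, file, package = package)
  if (!nzchar(path)) {
    stop("precomputed file not found: ", file, " in package ", package)
  }
  env <- new.env()
  load(path, envir = env)  # an .rda file restores its objects by name
  as.list(env)             # return them as a named list
}

# In a vignette chunk one might then write (placeholders as above):
# results <- load_vignette_data("output.rda", package = "my_pkg")
```

Loading into a separate environment keeps the restored objects from silently overwriting anything already defined in the vignette.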

On 10/27/20 8:53 AM, Helmut Schütz wrote:

Thierry Onkelinx

Tue, Oct 27, 2020 10:50 AM

My solution is similar to Ben's, except that the code for creating the
data is in the vignette. The chunk only runs when the data is not
available. The trick is to pass code to the eval argument instead of a
fixed TRUE or FALSE. See
https://github.com/ropensci/git2rdata/blob/bad8a4cf42049faa72a04b202c5a4dfc233b4046/vignettes/efficiency.Rmd#L293
for an example.
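
The logic behind this can be sketched in plain R as follows. The cache location and the toy simulation are assumptions made for illustration and are not taken from the linked vignette; in an R Markdown chunk header the same condition would be passed as eval = !file.exists(cache_file).

```r
# Hypothetical cache location; a real vignette would point at a file that is
# shipped with the package rather than at the session's temporary directory.
cache_file <- file.path(tempdir(), "simulation_results.rds")

# Run the expensive step only when no cached result is available.
if (!file.exists(cache_file)) {
  set.seed(1)
  # Stand-in for the slow simulation: 1000 means of 10 normal draws each.
  results <- replicate(1000, mean(rnorm(10)))
  saveRDS(results, cache_file)
}

# Either way, the rest of the vignette reads the stored result.
results <- readRDS(cache_file)
```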

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////



On Tue, 27 Oct 2020 at 16:29, Ben Bolker <bbolker at gmail.com> wrote:

______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel