
[R-pkg-devel] Unusually long execution time for R.utils::gzip on r-devel-windows

5 messages · Stefan Mayer, Henrik Bengtsson, Kevin Ushey

#
Dear list,

I tried to submit an update to my R package imagefluency, but the update does not automatically pass the incoming checks. The problem is that one of the examples takes too long to execute, but only under Windows with the development version of R (R Under development (unstable) (2024-02-15 r85925 ucrt)).

I was able to pin the problem down to R.utils::gzip(). I created a test package that illustrates the problem: https://github.com/stm/ziptest
The package has two functions that compress a file given its path. When using R.utils::gzip() (function `gzipit()` in the test package), I get a NOTE when checking the package with devtools::check_win_devel():

* checking examples ... [55s] NOTE
Examples with CPU (user + system) or elapsed time > 10s
       user system elapsed
gzipit 6.91  47.24   54.84

There is no issue with utils::zip() (function `zipit()` in the test package). Is this somehow a bug in R.utils::gzip(), or is there an issue with the combination of Windows and r-devel?
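For readers without the test package at hand, the two functions might look roughly like the sketch below. This is a hypothetical reconstruction, not the code in stm/ziptest: the argument choices (`remove = FALSE`, `overwrite = TRUE`, and the `-j9X` zip flags) are assumptions. Note that utils::zip() calls an external zip program, taken from the R_ZIPCMD environment variable or otherwise `zip` on the PATH.

```r
# Hypothetical reconstruction of the two test functions (not the actual
# code from stm/ziptest).

# Compress `path` with R.utils::gzip(), writing "<path>.gz".
# remove = FALSE keeps the input file; overwrite = TRUE replaces old output.
gzipit <- function(path) {
  dest <- paste0(path, ".gz")
  R.utils::gzip(path, destname = dest, remove = FALSE, overwrite = TRUE)
  invisible(dest)
}

# Compress `path` with utils::zip(), writing "<path>.zip".
# "-j" stores the file without its directory part; this needs an
# external zip program (R_ZIPCMD or `zip` on the PATH).
zipit <- function(path) {
  zipfile <- paste0(path, ".zip")
  utils::zip(zipfile, files = path, flags = "-j9X")
  invisible(zipfile)
}

# Usage: time both on a small temporary file.
f <- tempfile(fileext = ".txt")
writeLines(rep("hello", 100), f)
if (requireNamespace("R.utils", quietly = TRUE)) print(system.time(gzipit(f)))
if (nzchar(Sys.which(Sys.getenv("R_ZIPCMD", "zip")))) print(system.time(zipit(f)))
```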

Best, Stefan
#
Author of R.utils here. I happen to be investigating this right now too,
because of the extremely slow win-builder performance of the R.rsp
checks; R.rsp in turn depends on R.utils.

It's not obvious to me why this happens on win-builder. I've noticed
slower and slower win-builder/cran-incoming checks over the years,
despite the code not changing. Right now, I'm investigating a piece of
code that calls shell("dir") as a fallback to figure out whether a file
is a symbolic link or not; it could be that this takes a very long time
on win-builder.

So, stay tuned ... I'll report back when I find something out.

/Henrik

On Fri, Feb 16, 2024 at 4:43 PM Stefan Mayer
<stefan.mayer at uni-tuebingen.de> wrote:
#
I can confirm that this has to be fixed in R.utils. The gist is that
R.utils does a lot of validation of read/write permissions, and deep
down it relies on system("dir") as a fallback method. If this is done
on dirname(tempdir()), then it finds a lot of files, e.g.

    [1] " Datenträger in Laufwerk D: ist Daten"
    [2] " Volumeseriennummer: 1826-A193"
    [3] ""
    [4] " Verzeichnis von D:\\temp"
    [5] ""
    [6] "17.02.2024  09:06    <DIR>          ."
    [7] "14.02.2024  03:36                 0 cc6H4Sp5"
    [8] "15.02.2024  03:46                 0 cc6PwKb4"
    [9] "15.02.2024  16:25                 0 cc6RH27v"
   [10] "16.02.2024  01:50                 0 ccafzzMl"
   ...
[99997] "09.02.2024  04:48    <DIR>          RtmpURWDbA"
[99998] "14.02.2024  04:35    <DIR>          RtmpURWeVC"
[99999] "15.02.2024  04:00    <DIR>          RtmpUrwhHU"
 [ reached getOption("max.print") -- omitted 17165 entries ]
Time difference of 18.67841 secs

So, yeah, wow!  I'll look into fixing this, probably by removing this
fallback approach, which is very rarely needed; it was added way back
when Sys.readlink() didn't cover all cases.
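The slowdown described above can be approximated with a rough sketch like the following. It assumes, as the output suggests, that the expensive step is listing `dirname(tempdir())`; it times a plain `dir` there via shell(), which exists only on Windows, and falls back to `ls -la` elsewhere purely so the sketch also runs on other platforms. It does not call R.utils itself.

```r
# Rough reproducer (an assumption, not R.utils code): time a directory
# listing of the parent of tempdir().
root <- normalizePath(dirname(tempdir()), mustWork = TRUE)

timing <- system.time({
  entries <- if (.Platform$OS.type == "windows") {
    # cmd.exe's `dir`, similar to the fallback discussed in the thread
    shell(paste("dir", shQuote(root, type = "cmd")), intern = TRUE)
  } else {
    # Non-Windows stand-in so the sketch runs anywhere
    system2("ls", c("-la", shQuote(root)), stdout = TRUE)
  }
})

cat("Lines of output:", length(entries), "\n")
cat("Elapsed seconds:", timing[["elapsed"]], "\n")
```

On a machine whose temporary directory holds tens of thousands of leftover entries, both the line count and the elapsed time grow accordingly.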

/Henrik

On Fri, Feb 16, 2024 at 9:24 PM Henrik Bengtsson
<henrik.bengtsson at gmail.com> wrote:
#
Thanks for looking into this, Henrik! Feel free to let me know if I can help with anything.

- Stefan
#
FWIW, as far as I can tell, Sys.readlink() still doesn't handle
symlinks (or junction points) on Windows. Were you thinking of
normalizePath()? That does now resolve both symlinks and junction
points on Windows (courtesy of a lot of work from Tomas), although I
don't recall the exact versions in which support was introduced. But
that would still give you a more efficient way of detecting such files
on Windows.
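The normalizePath() approach can be sketched as a small helper. `is_link()` is hypothetical (it is not part of base R or R.utils) and assumes an R version in which normalizePath() resolves symlinks and junction points on Windows; the thread does not state which version that is. On Windows it compares the fully resolved path with the path obtained by resolving only the parent directory; elsewhere it uses Sys.readlink(). It is a heuristic: a path given with an 8.3 short name, or ending in `.`, could be misreported.

```r
# Hypothetical helper (not part of base R or R.utils): does `path`
# appear to be a symbolic link, or on Windows a junction point?
is_link <- function(path) {
  stopifnot(is.character(path), length(path) == 1L, file.exists(path))
  if (.Platform$OS.type != "windows") {
    # On Unix-alikes, Sys.readlink() returns "" for anything that is
    # not a symbolic link, and NA on error.
    target <- Sys.readlink(path)
    return(!is.na(target) && nzchar(target))
  }
  # On Windows, assume normalizePath() resolves symlinks and junction
  # points (only true for sufficiently recent R versions).
  full <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # Resolve only the parent directory, then re-attach the last component;
  # trailing slashes are stripped so a drive root "C:/" does not give "C://x".
  parent <- sub("/+$", "",
                normalizePath(dirname(path), winslash = "/", mustWork = TRUE))
  expected <- paste(parent, basename(path), sep = "/")
  # Windows paths are case-insensitive, so compare in lower case.
  !identical(tolower(full), tolower(expected))
}
```

This avoids spawning a shell and listing a whole directory just to classify a single file.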

Kevin

On Sat, Feb 17, 2024 at 3:21 AM Stefan Mayer
<stefan.mayer at uni-tuebingen.de> wrote: