[R-pkg-devel] Recurrent link timeout for common license at R CMD check

6 messages · Dirk Eddelbuettel, Michael Chirico, Jeff Newmiller +1 more

#
Most of the README.md files for my package list the license I chose, and most
do so via a 'badge' showing the license and a link to the 'upstream' source
of the license.  So far, so good.

As I happen to prefer GPL licenses, I link to the fsf.org website. And
several recent package uploads of mine were held up and moved to 'Inspect'
state forcing poor overworked Uwe Ligges to manually look at the log file
to conclude 'yep, spurious, all good here' because of a mere timeout.

Same this morning: even after fiddling with the URL I use, testing several
times from here and noticing that 'oh dear this is apparently simply random'
I got one pass (Dortmund, Windows) and one fail (Vienna, Linux) so back to
'Inspect' and wasting Uwe's time it is.
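
The 'testing several times from here' step can be sketched as a small probe that requests the same URL repeatedly and records each outcome. This is only a minimal illustration in Python, not what R CMD check itself does: the names `probe_url` and `is_flaky`, the five attempts, and the ten-second timeout are all assumptions made for the example.

```python
import urllib.error
import urllib.request


def probe_url(url, attempts=5, timeout=10.0, opener=urllib.request.urlopen):
    """Request `url` several times and record the outcome of each attempt.

    Returns a list of strings: the HTTP status code as text when the server
    answers, or the exception class name (for example 'TimeoutError') when
    the request fails before any answer arrives.
    """
    outcomes = []
    for _ in range(attempts):
        # HEAD is enough here: we only care whether the server responds.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request, timeout=timeout) as response:
                outcomes.append(str(response.status))
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (404, 503, ...).
            outcomes.append(str(err.code))
        except OSError as err:
            # Covers URLError, timeouts, and refused or reset connections.
            outcomes.append(type(err).__name__)
    return outcomes


def is_flaky(outcomes):
    """True when the attempts disagree with each other."""
    return len(set(outcomes)) > 1
```

Calling `probe_url("https://www.r-project.org/Licenses/")` and passing the result to `is_flaky` gives a rough local impression of whether a link answers consistently. A clean local run still says nothing about how the same link behaves from the CRAN check machines.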

I would rather skip that step and take advantage of automation at CRAN and
not create extra work. I am not quite sure what the best way forward is. I
can think of saying 'ok, folks in Boston cannot run a server' and link to
the Wikipedia page of the GPL. Seems wrong though as we like to show the
original text. I notice that the R website does the same by providing GPL-2
via a local copy: https://www.r-project.org/COPYING   Now, for the package I
was working on this morning I actually needed GPL-3 and not GPL-2 so no luck
there. 

Short of giving up and creating a GitHub Pages hosted copy of the licenses I
may need, is there another good source ... without the server timing out?
https://choosealicense.com/licenses/ is pretty good but doesn't of course
provide GPL-2 so no luck for me there for most of my 'GPL (>= 2)' packages.

Anybody have a better fix or idea? Maybe use the R sources (!!) and rely on
GitHub (most likely via a CDN) serving the licenses in 

  https://github.com/r-devel/r-svn/tree/main/share/licenses 
  https://github.com/r-devel/r-svn/blob/main/share/licenses/license.db

where via the .db (ascii text) file one sees that all these licenses _are_
in fact served via

  https://www.r-project.org/Licenses/

which even acts as a 'pretty' landing page (which I think I once knew
existed, looked for but could not locate via links from either the top-level
www.r-project.org or cran.r-project.org). 

So should we all link to that?

Or not because it puts yet more load on the poor main r-project.org server
(or should we maybe CDN that or parts of it via cloudflare.com ?)

Cheers, Dirk
#
is the SPDX site any more reliable?

https://spdx.org/licenses/
On Thu, Sep 25, 2025, 7:47 AM Dirk Eddelbuettel <edd at debian.org> wrote:
#
On 25 September 2025 at 08:21, Michael Chirico wrote:
| is the SPDX site any more reliable?
| 
| https://spdx.org/licenses/

Great suggestion especially as it even differentiates between GPL "2 exactly"
and "2 or later" (as many of us here do, following R itself).

The look and feel of that site is a little more 'ahem' but they make up for
that by being comprehensive. And it seems to be a Linux Foundation
initiative so it may have proper hosting.

Dirk
#
Isn't this the same old trade-off between static linking and dynamic linking or package vendoring vs. dependency that plagues all software? 

A world in which data/functionality are only a link away is amazingly compressible and featureful, but if you are on a plane or otherwise disconnected then all you have is the link. If you 'remember' (cache) the meaning of that link you (the consumer) can at least pretend you know what it would have delivered (I don't need to actually go to <https://www.r-project.org/COPYING> if I think I already know what it says), but of course so many things like that have implicit semantics (what if the license there is changed to inform the reader that a court case invalidated some of its terms? or there is a usage counter that must be incremented by the retrieval to comply with the full terms? Should such time-varying information even be allowed to be part of the package release?).

The fact that CRAN cannot follow the link is a reminder that your users may not see the information that you intended to convey at the time of release when they need to look at it... and CRAN has a policy that if your contribution is incomplete that they don't want to accept it. There is a fundamental divide between the point of view that you have the right to put some of your content (license in this case) outside the package and their philosophy that you should be providing at the very least a link that is valid at the time they check it. But even that can never address the plight of the offline user with a local copy of CRAN not being able to evaluate the terms of usage. 

I am not clear how far CRAN should be bending for this issue... relying on Internet caching by some for-profit corporation doesn't really "solve" the fundamental issue that the package is incomplete as it stands... relying on the Internet is absolutely great for efficiency, but it is not really clear to me that doing so can consistently deliver a complete representation of the software and its terms of use. GPL2 allows separation of the deliverable and the source if there is a demonstrably reliable way to retrieve the complete source, but in this case the source is being delivered separately from the license and we seem to be finding that https links may not be passing a minimum bar set by CRAN. I don't happen to think moving that bar toward caching is a good idea... I have been offline before and probably will be again, and am also worried about the content at the end of that link changing the TOU later.

So, IMO this is just another incarnation of dependency hell.
#
Dirk,
I think a local copy of the license is best.  There is some complexity in
linking to third party sites.  Do they guarantee to serve the content to
you and with what service level?  This seems to be the problem that you
have.  Do they undertake to maintain the URL for the length of time that
you need it?  How long will you need it for?  How will they tell you that
they are stopping supporting the URL?  How do you update the links to the
page that are in someone else's installation of your package when the URL
changes? As with the time check on package submissions, it is preferable to
link, if you link, to something that is under the control of the publishing
organisation itself, not a third party, unless there are some undertakings
in place, so I'd suggest the R project site if you absolutely must have a
remote copy.  Picking a site that you have no agreement with, either
implicitly or explicitly, does not cover those issues and so does not
minimise the risk of that third party dependency to this business process
(in this case the package check).

SPDX and the FSF Reuse tool do not appear to use the spdx website for
remotely accessed copies of licenses.  The license text referenced by those
tools lives in GitHub and access is described in
https://github.com/spdx/license-list-data/blob/main/accessingLicenses.md
and in https://github.com/spdx/license-list-data/blob/main/README.md.  The
SPDX software BOM metadata standard appears to include the text of the
licenses that it references instead of using external references.  Similarly
the FSF Reuse tool copies licenses locally.

Jeff Newmiller's point about caching is a good one.  It is preferable to
deliver, as part of the package, the text of the license that you intend
people to read, not something under someone else's control, and this is
what the SPDX and FSF tools do.  By relying on a link you own whatever it
is that the third party serves, whether that's a license text, a connection
failure, a redirect, a truncated or hacked copy, or an ad.

Greg
#
On 26 September 2025 at 15:34, Greg Hunt wrote:
| It is preferable to deliver, as part of the package, the text of the
| license that you intend people to read, not something under someone else's
| control, and this is what the SPDX and FSF tools do.

You answer something that was never in any way part of my question. Per
Writing R Extensions, Section 1.1.2:

     Whereas you should feel free to include a license file in your
  _source_ distribution, please do not arrange to _install_ yet another
  copy of the GNU ‘COPYING’ or ‘COPYING.LIB’ files but refer to the copies
  on <https://www.R-project.org/Licenses/> and included in the R
  distribution (in directory ‘share/licenses’).

My question was *explicitly* about what URL *link* would be recommended to
avoid timing out under R CMD check (also see Subject: of this thread).

I think I may just link to the R-project site.

Dirk