CRAN policies

32 messages · Gabor Grothendieck, Paul Gilbert, Uwe Ligges +12 more

#
CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to CRAN at r-project.org with the package
name and version on the subject line.  Emails sent to individual members 
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it.  Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)
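
The check step above can be sketched as a small shell helper. Only `R CMD build` and `R CMD check --as-cran` come from the policy text; the function name `precheck` and the example package and tarball names are hypothetical.

```shell
# Sketch only: build a source tarball, then run the CRAN-style checks on it.
# precheck, mypkg and mypkg_0.1-0.tar.gz are hypothetical names.
precheck() {
  src_dir="$1"   # package source directory
  tarball="$2"   # tarball that R CMD build is expected to produce
  # Only run the check if the build succeeded.
  R CMD build "$src_dir" && R CMD check --as-cran "$tarball"
}

# Example call (commented out so that sourcing this file runs nothing):
# precheck mypkg mypkg_0.1-0.tar.gz
```

Checking the built tarball rather than the source directory matches the wording "on the tarball before you submit it".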

Also, please note that CRAN has a very heavy workload (186 packages were 
published last week) and to remain viable needs package maintainers to 
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley
#
One of the things I have noticed with the R 2.15.0 RC and --as-cran is
that I have to bump the version number of the working copy of my
packages immediately after putting a version on CRAN, or I get a
message about version suitability. This is probably a good thing for
packages that I have changed, compared with my old habit of bumping the 
version number at arbitrary times, although the mechanics are a nuisance 
because I do not actually want to commit to the next version number at 
that point. For packages that I have not changed it is a bit worse, 
because I have to change the version number even though I have not yet 
made any changes to the package. This will mean, for example, that on 
R-forge it will look like there is a slightly newer version, even though 
there is not really.

I am curious how other developers approach this. Is it better to not 
specify --as-cran most of the time?  My feeling is that it is better to 
specify it all of the time so that I catch errors sooner rather than 
later, but maybe there is a better solution?

Paul
On 12-03-27 07:52 AM, Prof Brian Ripley wrote:
#
On 27.03.2012 16:17, Paul Gilbert wrote:
--as-cran is modelled rather closely after the CRAN incoming checks. 
CRAN checks if a new version has a new version number. Of course, you 
can ignore its result if you do not want to submit. The idea of using 
--as-cran is to apply it before you actually submit. Some parts require 
network connection etc.

Uwe
#
On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
Regarding the part about "warnings or significant notes" in that page,
it's impossible to know which notes are significant and which ones are
not significant except by trial and error.
#
On 12-03-27 10:59 AM, Uwe Ligges wrote:
Yes but, for example, will R-forge run checks with --as-cran, and thus 
give warnings for any package unchanged from the one on CRAN, or run 
without --as-cran, and thus not give a true indication of whether the 
package is good to submit?

(No doubt R-forge will customise more, but I am trying to work out a 
strategy for my own automatic testing.)

Paul
#
On 27.03.2012 17:22, Paul Gilbert wrote:
This is a question for the R-forge maintainer. I would not expect it
to run checks with --as-cran, but I do not know.

Best,
Uwe
#
On 27.03.2012 17:09, Gabor Grothendieck wrote:
Right, it needs human inspection to identify false positives. We believe
most package maintainers are able to see whether they have been hit by
such a false positive.

Uwe Ligges
#
On 27/03/2012 15:17, Paul Gilbert wrote:
Yes.  It is only recommended for use just before submission.  It is not 
used by the CRAN daily checks, for example.

All it does is set some environment variables that you can also set in
~/.R/check.Renviron, scripts ... and that is what the CRAN team do.  We 
introduced --as-cran to make it easier to explain to submitters how to 
get the check results we reported [*].

As for what the set is, read 'R Internals' or the code (it will vary by 
R version).

Given that we get several submissions per week with the same version 
number or name as a package already on CRAN, we do need submitters to 
run the 'incoming' check before submission.

[*] Since answering several emails a day about why their results were 
different was taking up far too much time.
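
As a rough illustration of the environment-variable route described above: `_R_CHECK_CRAN_INCOMING_` is one of the check variables documented in 'R Internals', but the full set selected by `--as-cran` varies by R version, so this single variable is an assumption-laden stand-in and not an equivalent of the option. The helper name `check_incoming` is hypothetical.

```shell
# Sketch only: request the CRAN 'incoming' check via an environment variable
# instead of --as-cran. check_incoming is a hypothetical helper name, and
# --as-cran sets more than this one variable.
check_incoming() {
  # The assignment prefix applies the variable to this R invocation.
  _R_CHECK_CRAN_INCOMING_=TRUE R CMD check "$1"
}

# Example call (commented out so that sourcing this file runs nothing):
# check_incoming mypkg_0.1-0.tar.gz
```

The same `NAME=value` line could instead be placed in ~/.R/check.Renviron, as the message mentions, so that every check picks it up.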

  
    
#
2012/3/27 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
The problem is that a note is generated and the note is correct. It's
not a false positive.  But that does not tell you whether it's
"significant" or not.  There is no way to know.  One can either try to
remove all notes (which may not be feasible) or just upload it and by
trial and error find out if it's accepted or not.
#
Is there a distinction as to NOTE vs. WARNING that is documented?  I've
always assumed (wrongly?) that NOTES weren't an issue with publishing on
CRAN, but that they may change to WARNINGS at some point.

Is the process by which this happens documented somewhere?

Jeff
On 3/27/12 11:09 AM, "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:

            
#
On 27.03.2012 19:10, Jeffrey Ryan wrote:
We won't kick packages off CRAN for Notes (but we will if Warnings are 
not fixed), but we may not accept new submissions with significant Notes.

Best,
Uwe Ligges
#
Thanks Uwe for the clarification on what goes and what stays.

Still fuzzy on the notion of "significant" though.  Do you have an example
or two for the list?

Jeff

P.S.
I meant to also thank all of CRAN volunteers for the momentous efforts
involved, and it is nice to see some explanation of how we can help, as
well as a peek into what goes on 'behind the curtain' ;-)
On 3/27/12 1:19 PM, "Uwe Ligges" <ligges at statistik.tu-dortmund.de> wrote:

            
#
2012/3/27 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
Yes, I understand that, but it does not really address the problem
that one has no idea whether a Note is significant or not, so the
only way to determine its significance is to submit your package and
see if it's accepted or not.
#
An associated problem, for the wish list, is that it would be nice for 
package developers to have a way to automatically distinguish between 
NOTEs that can usually be ignored (e.g. a package suggests a package 
that is not available for cross reference checks - I have several cases
where the suggested package depends on the package being built, so this 
NOTE occurs all the time), and NOTEs that are really pre-WARNINGS, so 
that one can flag these and spend time fixing them before they become a 
WARNING or ERROR. Perhaps two different kinds of notes?
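
Until something like that exists, one workaround is a personal ignore list applied to the check log. The sketch below assumes that status lines in the log (for example mypkg.Rcheck/00check.log) end in `... NOTE`; the function name `notes_to_review` and the ignore-file format are hypothetical, and the list reflects only the developer's own judgement, not any CRAN rule.

```shell
# Sketch only: list NOTE lines from a check log, hiding those that match a
# personal ignore list. notes_to_review and the ignore file are hypothetical.
notes_to_review() {
  log="$1"      # e.g. mypkg.Rcheck/00check.log
  ignore="$2"   # one fixed string per line; avoid blank lines, which match all
  # Keep status lines ending in "... NOTE", then drop the ignored ones.
  grep -E '\.\.\. NOTE$' "$log" | grep -v -F -f "$ignore"
}

# Example call (commented out so that sourcing this file runs nothing):
# notes_to_review mypkg.Rcheck/00check.log ignore.txt
```

An ignore file containing the line `checking package dependencies` would hide the recurring suggested-package NOTE while still showing any new ones.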

(And, BTW, having been responsible for a certain amount of the
   >[*] Since answering several emails a day about why their
   >results were different was taking up far too much time.
I think --as-cran is great.)

Paul
On 12-03-27 02:19 PM, Uwe Ligges wrote:
#
I have been wondering if it is possible to automate the checking
process to reduce human efforts, e.g. automatically check the packages
submitted to FTP, and send the package maintainer an email in case of
warnings or errors (otherwise just move it to CRAN); package
maintainers can appeal for a manual check by CRAN maintainers in case
of false positives. As a package author, I really hate to bother CRAN
maintainers each time I upload a new version and it passes R CMD check
successfully, in which case I should have received an automatic email
instead of Kurt's "hand-writing" "thanks, on CRAN now". Frankly
speaking, it makes me feel guilty sometimes to update my packages,
thinking of the other 3700 packages on CRAN and how much time you CRAN
maintainers are spending on checking the packages.

I do not know how many package authors actually read this mailing
list, so these policies may not really reach some authors at all.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
#
I've started using win-builder before submitting to CRAN.  This often
picks up problems that I don't see locally.

Hadley
#
On Tue, Mar 27, 2012 at 6:52 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
Thanks for the pointer - I did not know that this page existed. In
general, is there some easy way to track changes to this page and the
R extension manual over time?  It is difficult to keep track of the
best practices.

I'd also like to get clarification on "Packages should not write in
the users' home filespace, nor anywhere else on the file system apart
from the R session's temporary directory (or during installation in
the location pointed to by TMPDIR: and such usage should be cleaned
up)." - what is recommended practice for packages to maintain state
across instances?  Operating systems have standards for where
applications can store settings (e.g. as described in
http://pypi.python.org/pypi/appdirs/1.2.0).  Is it acceptable for
packages to follow these conventions?

Hadley
#
Lots of very sensible policies here.  I have one request as someone
who has in several cases had to involve company lawyers over
intellectual property issues with packages on CRAN -- the first bullet
point on ownership of copyright and intellectual property rights could
be strengthened further.

To the existing text "The ownership of copyright and intellectual
property rights of all components of the package must be clear and
unambiguous (including from the authors specification in the
DESCRIPTION file). Where code is copied (or derived) from the work of
others (including from R itself), care must be taken that any
copyright statements are preserved and authorship is not
misrepresented.
Trademarks must be respected."

I would add a few additional points :

1. The text of the license itself should be included in the package in
a LICENSE or COPYING file, as most of these licenses have things that
need to be filled in with names and other data, and just referencing a
license name in the DESCRIPTION file is not really a great way to deal
with licensing metadata when used exclusively (it's a great complement
to a full, filled-out license in the package itself).

2. Per file copyright comment headers can help immensely with ensuring
compliance and the accidental incorporation of files under a different
license.  Comment header blocks with the author name and terms of
distribution could be recommended for all source files.

               - Murray

On Tue, Mar 27, 2012 at 4:52 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
#
On 28.03.2012 00:07, Hadley Wickham wrote:
The policy is meant to stop packages overwriting user data or generating
loads of temporary files from examples that pollute, e.g., the working
directory.

Uwe Ligges
#
On 27.03.2012 20:33, Jeffrey Ryan wrote:
We have to look at those notes again and again in order to find if 
something important is noted, hence please always try to avoid all notes 
unless the effect is really intended!


Consider the Note "No visible binding for global variable".
We cannot know if your code intends to use such a global variable (which
is undesirable in most cases), hence we would let it pass if it seems to
be sensible.

Another Note such as "empty section" or "partial argument match" can 
quickly be fixed, hence just do it and don't waste our time.

Best,
Uwe Ligges
#
On 27.03.2012 20:36, Gabor Grothendieck wrote:
We have to look at those notes again and again in order to find if 
something important is noted, hence please always try to avoid all notes 
unless the effect is really intended!


Consider the Note "No visible binding for global variable".
We cannot know if your code intends to use such a global variable (which
is undesirable in most cases), hence we would let it pass if it seems to
be sensible.

Another Note such as "empty section" or "partial argument match" can 
quickly be fixed, hence just do it and don't waste our time.

Best,
Uwe Ligges
#
2012/3/28 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
What is the point of notes vs warnings if you have to get rid of both
of them?  Furthermore, if there are notes that you don't have to get
rid of, it's not fair that package developers should have to waste their
time on things that are actually acceptable.  Finally, it makes the
whole system arbitrary since packages can be rejected based on
undefined rules.

Either divide notes into significant notes and ordinary notes and
clearly label them as such in the output of   R CMD check   or else
make the significant notes warnings so one can know in advance whether
the package passes R CMD check or not.
#
On 28.03.2012 16:30, Gabor Grothendieck wrote:
I tried to make clear that we cannot decide that automatically: it
takes human inspection and thought to judge whether a Note is
significant or not. That is why we have not made them Warnings, which
are for cases where we are sure things have to be fixed.

Please always try to avoid all notes unless the effect is really 
intended! How hard can it be? If Notes could be completely ignored, they 
would not be Notes.

Uwe
#
On Thu, Mar 29, 2012 at 3:30 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
The "notes" are precisely the things for which clear rules can't be
written.  They are reported by CMD check because they are usually
signs of coding errors, but are not warnings because their use is
sometimes justified.

The "No visible binding for global variable" Note is a good example.  This
found some bugs in my 'survey' package, which I removed. There is
still one note of this type, which arises when I have to handle two
different versions of the hexbin package with different internal
structures.  The note is a false positive because the use is guarded
by an if(), but  CMD check can't tell this.   So, it's a good idea to
remove all Notes that can be removed without introducing other code
problems, which is nearly all of them, but occasionally there may be a
good reason for code that produces a Note.

But if you want a simple, unambiguous, mechanical rule for *your*
packages, just eliminate all Notes.
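
That mechanical rule can be sketched as a gate for automated testing. It assumes that status lines in the check log (for example mypkg.Rcheck/00check.log) end in `... NOTE`, `... WARNING` or `... ERROR`; the function name `check_is_clean` is hypothetical.

```shell
# Sketch only: succeed only if the check log has no NOTE, WARNING or ERROR.
# check_is_clean is a hypothetical helper name.
check_is_clean() {
  log="$1"   # e.g. mypkg.Rcheck/00check.log
  # grep prints any offending status lines and succeeds if it found one.
  if grep -E '\.\.\. (NOTE|WARNING|ERROR)$' "$log"; then
    return 1
  fi
  return 0
}

# Example call (commented out so that sourcing this file runs nothing):
# check_is_clean mypkg.Rcheck/00check.log || echo "fix the lines listed above"
```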

   -thomas