Skip to content

codetools wrongly complains about lazy evaluation in S4 methods

14 messages · Serguei Sokol, Gabriel Becker, Mikael Jagan +6 more

#
In a package, I define a method for not-yet-generic function 'qr.X' like so:

     > setOldClass("qr")
     > setMethod("qr.X", signature(qr = "qr"), function(qr, complete, ncol) NULL)

The formals of the newly generic 'qr.X' are inherited from the non-generic
function in the base namespace.  Notably, the inherited default value of
formal argument 'ncol' relies on lazy evaluation:

     > formals(qr.X)[["ncol"]]
     if (complete) nrow(R) else min(dim(R))

where 'R' must be defined in the body of any method that might evaluate 'ncol'.
To my surprise, tools:::.check_code_usage_in_package() complains about the
undefined symbol:

     qr.X: no visible binding for global variable 'R'
     qr.X,qr: no visible binding for global variable 'R'
     Undefined global functions or variables:
       R

I claim that it should _not_ complain, given that lazy evaluation is a really
a feature of the language _and_ given that it already does not complain about
the formals of functions that are not S4 methods.

Having said that, it is not obvious to me what in codetools would need to change
here.  Any ideas?

I've attached a script that creates and installs a test package and reproduces
the check output by calling tools:::.check_code_usage_in_package().  Hope it
gets through.

Mikael
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TestPackage.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20230603/20664345/attachment.txt>
3 days later
#
Le 03/06/2023 ? 17:50, Mikael Jagan a ?crit?:
I think this issue is similar to the complaints about non defined 
variables in expressions involving non standard evaluation, e.g. column 
names in a data frame which are used as unquoted symbols. One of 
workarounds is simply to declare them somewhere in your code. In your 
case, it could be something as simple as:

 ? R=NULL

Best,
Serguei.
#
The API supported workaround is to call globalVariables, which,
essentially, declares the variables without defining them (a distinction R
does not usually make).

The issue with this approach, of course, is that its a very blunt
instrument. It will cause false negatives if you accidentally use the same
symbol in a standard evaluation context elsewhere in your code.
Nonetheless, that's the intended approach as far as i know.

Best,
~G



On Wed, Jun 7, 2023 at 1:07?AM Serguei Sokol via R-devel <
r-devel at r-project.org> wrote:

            

  
  
5 days later
#
Thanks both.  Yes, I was aware of globalVariables, etc.  I guess I was hoping
to be pointed to the right place in the source code, in case the issue could
be addressed properly, notably as it seems to have already been addressed for
functions that are not S4 methods, i.e., codetools is apparently not bothered
by

     def <- function(x = y) { y <- 0; x }

but still complains about

     setMethod("someGeneric", "someClass", def)

...

Mikael
On 2023-06-07 5:13 am, Gabriel Becker wrote:
#
Most of the errors, warnings and notes generated by R CMD check are 
generated by code in the tools package, usually in the tools/R/QC.R 
source file.  Search that file for the error message, then backtrack to 
find the code that causes it to be triggered.

If I recall correctly, it works on the evaluated source rather than the 
actual source, so it will only see the result of evaluating `setMethod` 
in your example.  I don't know the methods package well enough to know 
exactly what that does, but presumably it produces a function and hides 
it somewhere so that the S4 dispatch can find it when it needs to.

Duncan Murdoch
On 12/06/2023 2:03 p.m., Mikael Jagan wrote:
#
On Sat, 3 Jun 2023 11:50:59 -0400
Mikael Jagan <jaganmn2 at gmail.com> wrote:

            
In other words, codetools::checkUsage(base::qr.X) says nothing while
codetools::checkUsage(TestPackage::qr.X) complains. I think the
difference is that codetools::findFuncLocals sees an assignment to `R`
in the body of base::qr.X:

codetools::findFuncLocals(formals(base::qr.X), body(base::qr.X))
# [1] "cmplx"   "cn"      "ip"      "p"       "pivoted" "R"       "res"
# [8] "tmp"    

The problem, then, is that an S4 generic shouldn't be having such
assignments in its body. One way to fix this would be to modify
codetools::checkUsage to immediately return if inherits(fun,
'standardGeneric'), but I don't know enough about S4 to say whether
this is safe. (A more comprehensive fix would be to check every
encountered method against the formals of the generic, but that sounds
complicated.) Arguably, static analysis will always be wrong about
something, so we're trading a false positive for potential false
negatives.
#
On Sat, Jun 3, 2023 at 11:51?AM Mikael Jagan <jaganmn2 at gmail.com> wrote:

            
Perhaps I am misunderstanding something, but I think Mikael's expectations
about the scoping rules of R are wrong.  The enclosing environment of ncol
is where it was _defined_ not where it is _called_ (apologies if I am
messing up the computer science terminology here).

This suggests to me that codetools is right.  But a more extended example
would be useful. Perhaps there is something special with setOldClass()
which I am no aware of.

Also, Bioconductor has 100s of packages with S4 where codetools works well.

Kasper
#
I agree that this is not an R issue, but rather user error of not defining a proper generic so the check is right. Obviously, defining a generic with implementation-specific ncol default makes no sense at all, it should only be part of the method implementation. If one was to implement the same default behavior in the generic itself (not necessarily a good idea) the default would be ncol = if (complete) nrow(qr.R(qr, TRUE)) else min(dim(qr.R(qr, TRUE))) to not rely on the internals of the implementation.

Cheers,
Simon
1 day later
#
Thanks all - yes, I think that Simon's diagnosis ("user error") is correct:
in this situation one should define a reasonable generic function explicitly,
with a call to setGeneric, and not rely on the call inside of setMethod ...

But it is still not clear what the way forward should be (for package Matrix,
where we would like to export a method for 'qr.X').  If we do nothing, then
there is the note, already mentioned:

     * checking R code for possible problems ... NOTE
     qr.X: no visible binding for global variable ?R?
     Undefined global functions or variables:
       R

If we add the following to our R/AllGenerics.R :

     setGeneric("qr.X",
                function(qr, complete = FALSE, ncol, ...)
                    standardGeneric("qr.X"),
                useAsDefault = function(qr, complete = FALSE, ncol, ...) {
                    if(missing(ncol))
                        base::qr.X(qr, complete = complete)
                    else base::qr.X(qr, complete = complete, ncol = ncol)
                },
                signature = "qr")

then we get a startup message, which would be quite disruptive on CRAN :

     The following object is masked from 'package:base':

         qr.X

and if we further add setGenericImplicit("qr.X", restore = (TRUE|FALSE))
to our R/zzz.R, then for either value of 'restore' we encounter :

     ** testing if installed package can be loaded from temporary location
     Error: package or namespace load failed for 'Matrix':
      Function found when exporting methods from the namespace 'Matrix' which is 
not S4 generic: 'qr.X'

Are there possibilities that I have missed?

It seems to me that the best option might be to define an implicit generic
'qr.X' in methods via '.initImplicitGenerics' in methods/R/makeBasicFunsList.R,
where I see that an implicit generic 'qr.R' is already defined ... ?

The patch pasted below "solves everything", though we'd still have to think
about how to work for versions of R without the patch ...

Mikael

Index: src/library/methods/R/makeBasicFunsList.R
===================================================================
--- src/library/methods/R/makeBasicFunsList.R	(revision 84541)
+++ src/library/methods/R/makeBasicFunsList.R	(working copy)
@@ -263,6 +263,17 @@
  	       signature = "qr", where = where)
      setGenericImplicit("qr.R", where, FALSE)

+    setGeneric("qr.X",
+               function(qr, complete = FALSE, ncol, ...)
+                   standardGeneric("qr.X"),
+               useAsDefault = function(qr, complete = FALSE, ncol, ...) {
+                   if(missing(ncol))
+                       base::qr.X(qr, complete = complete)
+                   else base::qr.X(qr, complete = complete, ncol = ncol)
+               },
+               signature = "qr", where = where)
+    setGenericImplicit("qr.X", where, FALSE)
+
      ## our toeplitz() only has 'x'; want the generic "here" rather than "out 
there"
      setGeneric("toeplitz", function(x, ...) standardGeneric("toeplitz"),
  	       useAsDefault= function(x, ...) stats::toeplitz(x),
On 2023-06-13 8:01 pm, Simon Urbanek wrote:
#
I'd argue that at the root of the problem is that your qr.X() generic 
dispatches on all its arguments, including the 'ncol' argument which I 
think the dispatch mechanism needs to evaluate **before** dispatch can 
actually happen.

So yes lazy evaluation is a real feature but it does not play well for 
arguments of a generic that are involved in the dispatch.

If you explicitly defined your generic with:

 ?? setGeneric("qr.X", signature="qr")

you should be fine.

More generally speaking, it's a good idea to restrict the signature of a 
generic to the arguments "that make sense". For unary operations this is 
usually the 1st argument, for binary operations the first two arguments 
etc... Additional arguments that control the operation like modiflers, 
toggles, flags, rng seed, and other parameters, usually have not place 
in the signature of the generic.

H.
On 6/14/23 20:57, Mikael Jagan wrote:

  
    
#
Oh but I see now that you've already tried this in your R/AllGenerics.R, 
sorry for missing that, but that you worry about the following message 
being disruptive on CRAN:

 ??? The following object is masked from 'package:base':

 ??????? qr.X

Why would that be? As long as you only define methods for objects that 
**you** control everything is fine. In other words you're not allowed to 
define a method for "qr" objects because that method would override 
base::qr.X(). But the generic itself and the method that you define for 
your objects don't override anything so should not disrupt anything.

H.
On 6/15/23 13:51, Herv? Pag?s wrote:

  
    
#
On 2023-06-15 5:25 pm, Herv? Pag?s wrote:
Yes, maybe it would be fine in principle, and of course many popular
packages emit startup messages.  Still, in practice, I think that
people are quite accustomed to library(Matrix) being "silent",
and probably a nontrivial fraction of our reverse dependencies would
encounter new NOTEs about differences between *.Rout and *.Rout.save,
etc.

The fraction of users who will ever call this method for qr.X is very
very small compared to the fraction who will be confused or annoyed
by the message.  Hence my hope that an implicit generic qr.X could
become part of package methods, notably as an implicit generic qr.R
already lives there ...

Or maybe there is a way for Matrix to define qr.X as an implicit generic
without creating other problems, but my experiments with setGenericImplicit
were not promising ...

Mikael
#

        
> On 2023-06-15 5:25 pm, Herv? Pag?s wrote:
>> Oh but I see now that you've already tried this in your
    >> R/AllGenerics.R, sorry for missing that, 

yes, this one:

  setGeneric("qr.X",
	     function(qr, complete = FALSE, ncol, ...)
		 standardGeneric("qr.X"),
	     useAsDefault = function(qr, complete = FALSE, ncol, ...) {
		 if(missing(ncol))
		     base::qr.X(qr, complete = complete)
		 else base::qr.X(qr, complete = complete, ncol = ncol)
	     },
	     signature = "qr")

where the 'signature = "qr"' is perfect (and as Herv? suggests).

but the complication in useAsDefault = ..  is really a bit ugly
... notably if one compares with the nice original  base :: qr.X ()

    >> but that you worry about the following message being 
    >> disruptive on CRAN:
    >> 
    >> ??? The following object is masked from 'package:base':
    >> 
    >> ??????? qr.X
    >> 
    >> Why would that be? As long as you only define methods for
    >> objects that **you** control everything is fine. In other
    >> words you're not allowed to define a method for "qr"
    >> objects because that method would override
    >> base::qr.X(). But the generic itself and the method that
    >> you define for your objects don't override anything so
    >> should not disrupt anything.

    > Yes, maybe it would be fine in principle, and of course
    > many popular packages emit startup messages.  Still, in
    > practice, I think that people are quite accustomed to
    > library(Matrix) being "silent", and probably a nontrivial
    > fraction of our reverse dependencies would encounter new
    > NOTEs about differences between *.Rout and *.Rout.save,
    > etc.

I tend to agree with Herv? that the  "masked" warning here is
a false positive.

I also agree with Mikael  that we would not want this for every
default use of   library(Matrix)

I believe we should fix it by using conflictRules(), and from
reading ?conflictRules I understand we could do that by setting

  options(conflict.policy = list(mask.ok = "qr.X"))

possibly even in Matrix package's load or attach hook
[[and if really really necessary even as hack inside library()'s checks]]

    > The fraction of users who will ever call this method for
    > qr.X is very very small compared to the fraction who will
    > be confused or annoyed by the message.  Hence my hope that
    > an implicit generic qr.X could become part of package
    > methods, notably as an implicit generic qr.R already lives
    > there ...

    > Or maybe there is a way for Matrix to define qr.X as an
    > implicit generic without creating other problems, but my
    > experiments with setGenericImplicit were not promising ...

In principle, I'd say that setGenericImplicit() would be a good
/ "the correct" approach,  but as you already tried
unsuccessfully, I'm not claiming the principle.

Martin


    > Mikael

    >> H.
>> On 6/15/23 13:51, Herv? Pag?s wrote:
>>> 
    >>> I'd argue that at the root of the problem is that your
    >>> qr.X() generic dispatches on all its arguments,
    >>> including the 'ncol' argument which I think the dispatch
    >>> mechanism needs to evaluate **before** dispatch can
    >>> actually happen.
    >>> 
    >>> So yes lazy evaluation is a real feature but it does not
    >>> play well for arguments of a generic that are involved
    >>> in the dispatch.
    >>> 
    >>> If you explicitly defined your generic with:
    >>> 
    >>> ?? setGeneric("qr.X", signature="qr")
    >>> 
    >>> you should be fine.
    >>> 
    >>> More generally speaking, it's a good idea to restrict
    >>> the signature of a generic to the arguments "that make
    >>> sense". For unary operations this is usually the 1st
    >>> argument, for binary operations the first two arguments
    >>> etc... Additional arguments that control the operation
    >>> like modiflers, toggles, flags, rng seed, and other
    >>> parameters, usually have not place in the signature of
    >>> the generic.
    >>> 
    >>> H.
    >>>
>>> On 6/14/23 20:57, Mikael Jagan wrote:
>>>> Thanks all - yes, I think that Simon's diagnosis ("user
    >>>> error") is correct: in this situation one should define
    >>>> a reasonable generic function explicitly, with a call
    >>>> to setGeneric, and not rely on the call inside of
    >>>> setMethod ...
    >>>> 
    >>>> But it is still not clear what the way forward should
    >>>> be (for package Matrix, where we would like to export a
    >>>> method for 'qr.X').? If we do nothing, then there is
    >>>> the note, already mentioned:
    >>>> 
    >>>> ??? * checking R code for possible problems ... NOTE
    >>>> ??? qr.X: no visible binding for global variable ?R?
    >>>> ??? Undefined global functions or variables: ????? R
    >>>> 
    >>>> If we add the following to our R/AllGenerics.R :
    >>>> 
    >>>> ??? setGeneric("qr.X", ?????????????? function(qr,
    >>>> complete = FALSE, ncol, ...)  ??????????????????
    >>>> standardGeneric("qr.X"), ?????????????? useAsDefault =
    >>>> function(qr, complete = FALSE, ncol, ...) {
    >>>> ?????????????????? if(missing(ncol))
    >>>> ?????????????????????? base::qr.X(qr, complete =
    >>>> complete) ?????????????????? else base::qr.X(qr,
    >>>> complete = complete, ncol = ncol) ?????????????? },
    >>>> ?????????????? signature = "qr")
    >>>> 
    >>>> then we get a startup message, which would be quite
    >>>> disruptive on CRAN :
    >>>> 
    >>>> ??? The following object is masked from 'package:base':
    >>>> 
    >>>> ??????? qr.X
    >>>> 
    >>>> and if we further add setGenericImplicit("qr.X",
    >>>> restore = (TRUE|FALSE)) to our R/zzz.R, then for either
    >>>> value of 'restore' we encounter :
    >>>> 
    >>>> ??? ** testing if installed package can be loaded from
    >>>> temporary location ??? Error: package or namespace load
    >>>> failed for 'Matrix': ???? Function found when exporting
    >>>> methods from the namespace 'Matrix' which is not S4
    >>>> generic: 'qr.X'
    >>>> 
    >>>> Are there possibilities that I have missed?
    >>>> 
    >>>> It seems to me that the best option might be to define
    >>>> an implicit generic 'qr.X' in methods via
    >>>> '.initImplicitGenerics' in
    >>>> methods/R/makeBasicFunsList.R, where I see that an
    >>>> implicit generic 'qr.R' is already defined ... ?
    >>>> 
    >>>> The patch pasted below "solves everything", though we'd
    >>>> still have to think about how to work for versions of R
    >>>> without the patch ...
    >>>> 
    >>>> Mikael
    >>>> 
    >>>> Index: src/library/methods/R/makeBasicFunsList.R
    >>>> ===================================================================
    >>>> --- src/library/methods/R/makeBasicFunsList.R???
    >>>> (revision 84541) +++
    >>>> src/library/methods/R/makeBasicFunsList.R??? (working
    >>>> copy) @@ -263,6 +263,17 @@ ??????????? signature =
    >>>> "qr", where = where) ???? setGenericImplicit("qr.R",
    >>>> where, FALSE)
    >>>> 
    >>>> +??? setGeneric("qr.X", +?????????????? function(qr,
    >>>> complete = FALSE, ncol, ...)  +??????????????????
    >>>> standardGeneric("qr.X"), +?????????????? useAsDefault =
    >>>> function(qr, complete = FALSE, ncol, ...) {
    >>>> +?????????????????? if(missing(ncol))
    >>>> +?????????????????????? base::qr.X(qr, complete =
    >>>> complete) +?????????????????? else base::qr.X(qr,
    >>>> complete = complete, ncol = ncol) +?????????????? },
    >>>> +?????????????? signature = "qr", where = where) +???
    >>>> setGenericImplicit("qr.X", where, FALSE) + ???? ## our
    >>>> toeplitz() only has 'x'; want the generic "here" rather
    >>>> than "out there" ???? setGeneric("toeplitz",
    >>>> function(x, ...)  standardGeneric("toeplitz"),
    >>>> ??????????? useAsDefault= function(x, ...)
    >>>> stats::toeplitz(x),
    >>>>
>>>> On 2023-06-13 8:01 pm, Simon Urbanek wrote:
>>>>> I agree that this is not an R issue, but rather user
    >>>>> error of not defining a proper generic so the check is
    >>>>> right. Obviously, defining a generic with
    >>>>> implementation-specific ncol default makes no sense at
    >>>>> all, it should only be part of the method
    >>>>> implementation. If one was to implement the same
    >>>>> default behavior in the generic itself (not
    >>>>> necessarily a good idea) the default would be ncol =
    >>>>> if (complete) nrow(qr.R(qr, TRUE)) else
    >>>>> min(dim(qr.R(qr, TRUE))) to not rely on the internals
    >>>>> of the implementation.
    >>>>> 
    >>>>> Cheers, Simon
    >>>>> 
    >>>>>
>>>>>>
>>>>>> <jaganmn2 at gmail.com>
>>>>>> 
    >>>>>>> The formals of the newly generic 'qr.X' are
    >>>>>>> inherited from the non-generic function in the base
    >>>>>>> namespace.? Notably, the inherited default value of
    >>>>>>> formal argument 'ncol' relies on lazy evaluation:
    >>>>>>> 
    >>>>>>>> formals(qr.X)[["ncol"]]
    >>>>>>> ???? if (complete) nrow(R) else min(dim(R))
    >>>>>>> 
    >>>>>>> where 'R' must be defined in the body of any method
    >>>>>>> that might evaluate 'ncol'.
    >>>>>>> 
    >>>>>>
>>>>>> Mikael's
>>>>>> environment
>>>>>> (apologies if I am
>>>>>>
>>>>>> extended
>>>>>> setOldClass()
>>>>>>
>>>>>> codetools
>>>>>>
>>>>>>
>>>>>>
>>>>>> 
    >>>>> 
    >>>> 
    >>>> ______________________________________________
    >>>> R-devel at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>> -- 
    >>> Herv? Pag?s
    >>> 
    >>> Bioconductor Core Team hpages.on.github at gmail.com
    >>
5 days later
#

        
>> On 2023-06-15 5:25 pm, Herv? Pag?s wrote:
>>> Oh but I see now that you've already tried this in your
    >>> R/AllGenerics.R, sorry for missing that,

    > yes, this one:
In spite of the "ugliness" (above) and of the "conflict
rules"-resolution I've talked about below,
I've now committed the above + implicit generic definition to
R-devel's {methods} package.

After all, it provides a clean situation also for other S4-using
developers.

I'm still musing if this is a bug fix to be back ported to the
R-4-3-branch eventually.

Best,
Martin