
Wish there were a "strict mode" for the R interpreter. What about you?

21 messages · Paul Johnson, (Ted Harding), Hadley Wickham +8 more

#
Years ago, I did lots of Perl programming. Perl will let you be lazy
and write functions that refer to undefined variables (like R does),
but there is also a strict mode so the interpreter will block anything
when a variable is mentioned that has not been defined. I wish there
were a strict mode for checking R functions.

Here's why. We have a lot of students writing R functions around here,
and they run into trouble because they use the same names for things
inside and outside of functions. When they call functions that contain
mistaken or undefined references to names they also use elsewhere,
variables that happen to exist in the environment get used by
accident. Know what I mean?

dat <- whatever

someNewFunction <- function(z, w){
   #do something with z and w and create a new "dat"
   # but forget to name it "dat"
   lm (y, x, data=dat)
   # lm just used wrong data
}

I wish R had a strict mode to return an error in that case. Users
don't realize they are getting nonsense because R finds things to fill
in for their mistakes.
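A runnable version of the situation Paul describes, as a sketch: the data values are made up, and the call is written as lm(y ~ x, data = dat) so that it actually runs (the lm(y, x, ...) form above has a separate bug of its own).

```r
# Made-up global data, left over from earlier work in the session.
dat <- data.frame(x = 1:10, y = (1:10)^2)

someNewFunction <- function(z, w) {
  newdat <- data.frame(x = z, y = w)  # the data the student meant to use...
  lm(y ~ x, data = dat)               # ...but 'dat' is a free variable here,
                                      # so R silently finds the global 'dat'
}

fit <- someNewFunction(z = 1:5, w = c(2.1, 3.9, 6.2, 8.1, 9.8))
nrow(model.frame(fit))  # 10, not 5: the model was fitted to the global data
```

No error or warning is raised; the only symptom is a model fitted to the wrong rows.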

Is this possible?  Does anybody agree it would be good?
#
On 11-04-09 3:51 PM, Paul Johnson wrote:
It would be really bad, unless done carefully.

In your function the free (undefined) variables are dat and lm.  You 
want to be warned about dat, but you don't want to be warned about lm. 
What rule should R use to determine that?
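One way to make the question concrete: codetools (shipped with R as a recommended package) can list a function's free names with findGlobals(), and a rule can then be applied to that list. The sketch below uses one possible rule, "a free name is acceptable only if it is defined in one of a few named package namespaces", through a hypothetical helper suspiciousGlobals(); it is an illustration, not an existing R feature.

```r
library(codetools)

# Hypothetical helper: return the free names of `f` that are not defined
# in any of the namespaces listed in `pkgs`.
suspiciousGlobals <- function(f, pkgs = c("base", "stats")) {
  free <- findGlobals(f, merge = TRUE)  # free functions and variables together
  known <- vapply(free, function(v) {
    any(vapply(pkgs, function(p) exists(v, envir = asNamespace(p), inherits = FALSE),
               logical(1)))
  }, logical(1))
  free[!known]
}

someNewFunction <- function(z, w) {
  lm(y, x, data = dat)
}

# Should flag dat, x and y; lm is found in the stats namespace.
suspiciousGlobals(someNewFunction)
```

The hard part Duncan points to remains: the list in `pkgs` has to be chosen by someone, which is exactly what a package's declared dependencies do.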

(One possible rule would work in a package with a namespace.  In that 
case all variables must be found in declared dependencies, so the search 
could stop before it got to globalenv().  But it seems unlikely that 
your students are writing packages with namespaces.)
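The "stop before globalenv()" idea can be approximated for an ordinary function by giving it an enclosing environment whose lookup chain ends at baseenv(). A minimal sketch, assuming the hypothetical helper strictEnv() below, which copies the exports of the named packages into a fresh environment whose parent is baseenv():

```r
# Hypothetical helper: an environment containing the exports of `pkgs`,
# whose parent is baseenv(). Lookups from it never reach globalenv().
strictEnv <- function(pkgs = "stats") {
  env <- new.env(parent = baseenv())
  for (p in pkgs) {
    ns <- asNamespace(p)
    for (nm in getNamespaceExports(ns)) {
      assign(nm, getExportedValue(ns, nm), envir = env)
    }
  }
  env
}

dat <- data.frame(x = 1:10, y = (1:10)^2)  # a global that should NOT be found

f <- function(z, w) {
  lm(y ~ x, data = dat)  # 'dat' is never defined inside f
}
environment(f) <- strictEnv("stats")

# An "object 'dat' not found" error message instead of a silent fit.
res <- tryCatch(f(1:5, 1:5), error = conditionMessage)
res
```

Inside f, lm is still found because it was copied in from stats, while the global dat is invisible. The cost is that every package a function relies on has to be listed explicitly.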

Duncan Murdoch
#
On 09-Apr-11 20:37:28, Duncan Murdoch wrote:
I'm with Duncan on this one! On the other hand, I can understand the
issues that Paul's students might encounter.

I think the right thing to do is to introduce the students to the
basics of scoping, early in the process of learning R.

Thus, when there is a variable (such as 'lm' in the example) which
you *expect* to already be out there (since 'lm' is in 'stats'
which is pre-loaded by default), then you can go ahead and use it.

But when your function uses a variable (e.g. 'dat') which just
*happened* to be out there when you first wrote the function,
then when you re-use the same function definition in a different
context things are likely to go wrong. So teach them that variables
which occur in functions, which might have any meaning in whatever
the context of use may be, should either be named arguments in
the argument list, or should be specifically defined within the
function, and not assumed to already exist unless that is already
guaranteed in every context in which the function would be used.

This is basic good practice which, once routinely adopted, should
ensure that the right thing is done every time!
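The advice can be shown with the same made-up example: either pass the data in as an argument or create it inside the function, so the name never has to be looked up outside. fitFromArgs() and fitFromLocal() are illustrative names only.

```r
dat <- data.frame(x = 1:10, y = (1:10)^2)  # unrelated global with the same name

# Option 1: the data is a named argument.
fitFromArgs <- function(dat) {
  lm(y ~ x, data = dat)  # 'dat' is the argument, never the global
}

# Option 2: the data is created inside the function.
fitFromLocal <- function(z, w) {
  dat <- data.frame(x = z, y = w)  # local 'dat' shadows the global one
  lm(y ~ x, data = dat)
}

newdat <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))
nrow(model.frame(fitFromArgs(newdat)))               # 5
nrow(model.frame(fitFromLocal(newdat$x, newdat$y)))  # 5
```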

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Apr-11                                       Time: 22:08:10
------------------------------ XFMail ------------------------------
#
On Sat, Apr 9, 2011 at 2:51 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
<anonymous>: no visible binding for global variable ‘y’
<anonymous>: no visible binding for global variable ‘x’
<anonymous>: no visible binding for global variable ‘dat’

Which also picks up another bug in your function ;)

Hadley
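Messages of exactly this form are produced by checkUsage() from codetools, the checker R CMD check relies on. A sketch, assuming codetools is installed (it ships with R as a recommended package) and that y, x and dat are not defined in the session:

```r
library(codetools)

someNewFunction <- function(z, w) {
  lm(y, x, data = dat)
}

# Prints one "no visible binding for global variable" line for each free
# variable that cannot be found from the function's environment.
checkUsage(someNewFunction)
```

Because the check looks names up from the function's environment at check time, a dat that happens to exist in the workspace would not be reported, which is one reason R CMD check runs in a clean session.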
#
On 4/9/2011 2:31 PM, Hadley Wickham wrote:
Is this run by "R CMD check"?  I've seen this message.


       "R CMD check" will give this message sometimes when I don't feel 
it's appropriate.  For example, I define a data object ETB in a package, 
then give that as the default in a function like 
f <- function(data. = ETB) { if (missing(data.)) data(ETB); data. }.  When I run 
"R CMD check", I get "no visible binding for global variable 'ETB'", even 
though the function is tested and works during R CMD check.


       Spencer

  
    
#
On 11-04-09 7:02 PM, Spencer Graves wrote:
What is ETB?  Your code is looking for a global variable by that name, 
and that's what codetools is telling you.

Duncan Murdoch
#
On 4/9/2011 6:12 PM, Duncan Murdoch wrote:
Duncan:  Thanks for the question.


ETB is a data object in my package.  codetools can't find it because 
data(ETB) is needed before ETB becomes available.  codetools is not 
smart enough to check to see if ETB is a data object in the package.


Spencer

  
    
#
On 11-04-09 9:22 PM, Spencer Graves wrote:
Okay, I understand what you are trying to do.  Yes, you have fooled 
codetools in this instance.

Duncan Murdoch
#
On 4/10/2011 6:10 AM, Duncan Murdoch wrote:
I'm sorry:  I did not intend to fool codetools.  ;-)


       I just wanted to provide sensible defaults in a way that seemed 
obvious to me.


       Thanks again for all your work on Rtools and the R project more 
generally.  Spencer

  
    
#
On Apr 10, 2011, at 15:10 , Duncan Murdoch wrote:

            
....
...but notice that the codetools warning is just that: It _is_ acknowledged that these things occasionally happen by design. There are a couple of cases in base R too:

* checking R code for possible problems ... NOTE
glm.fit: no visible binding for global variable ‘n’
quantile.ecdf: no visible binding for global variable ‘y’

I can't seem to spot the 'n' just now, though...
#
Are you sure that's not a bug?  There's:

aic.model <- aic(y, n, mu, weights, dev) + 2 * rank

and n.ok is defined, but n isn't defined anywhere.
I wonder why it warns on y, but not nobs.

Hadley
#
On Sun, 10 Apr 2011, Hadley Wickham wrote:

            
It is (or should be) defined by the call to

         eval(family$initialize)
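The mechanism can be seen in a small sketch: a family's initialize component is an unevaluated expression, and eval() runs it in the calling frame, where it assigns n (among other things). codetools only reads the function's own code, so it cannot see that assignment. poisson() is used here because its initialize expression is short; initLike() is a hypothetical function, not part of glm.fit().

```r
library(codetools)

initLike <- function(y) {
  nobs <- length(y)           # poisson()$initialize reads y and nobs...
  eval(poisson()$initialize)  # ...and assigns n and mustart in this frame
  sum(n)                      # 'n' exists at run time
}

initLike(c(0, 2, 5))  # 3: n is rep.int(1, nobs)

# Static analysis sees 'n' only as a free variable:
"n" %in% findGlobals(initLike, merge = FALSE)$variables  # TRUE
```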
It does when run on stats:::quantile.ecdf directly:
<anonymous>: no visible binding for global variable ‘nobs’
<anonymous>: no visible binding for global variable ‘y’

Maybe in the context where you saw this nobs is defined in an
enclosing environment.

luke

  
    
#
On Apr 10, 2011, at 19:54 , <luke-tierney at uiowa.edu> wrote:

            
...iff actually used by family$aic. And, it is a different n from n.ok (a vector, the per-element size parameter of the binomial)
It came from make check-devel, so I suspect that it picks up stats:::nobs() (which would be horribly wrong, but, well...)

  
    
#
On Sat, Apr 9, 2011 at 10:08 PM, Ted Harding <ted.harding at wlandres.net> wrote:

            
Would that be before or after you introduce them to the basics of
testing? Hint: AFTER!

Barry
#
On 2011-04-09, at 2:08 PM, Ted Harding wrote:

            
I know the basics of scoping perfectly well, but that doesn't stop me from occasionally misspelling a variable name that only causes an error much later.

OTOH, I think with Perl you can start declaring your variables "local" and keep the interpreter happy. But in R's context you would then also have to start declaring what you expect to inherit from parent environments, and pretty soon the code is so encrusted with annotation barnacles that it loses the simplicity that makes R so nice in the interactive mode.

What would be really nice is if we had a smart R editor/IDE that would "DWIM" and put a red underline under a misspelled name, but leave it alone when, as Duncan said, it's in the environment. 

Davor
#
On Apr 11, 2011, at 11:28 AM, Davor Cubranic wrote:

            
... which is, of course, impossible since the editor has no idea what environment you will evaluate the function in ... It can make assumptions, but they may be as wrong as the spurious warnings discussed, so people will complain either way ;)

Cheers,
Simon
#
On 4/11/2011 8:46 AM, Simon Urbanek wrote:
For the record, my "complaint" stemmed from my inability to see a 
way to get rid of that message in that context.  In most cases, I've 
found that message to be very valuable in identifying latent bugs in 
code.  In that context, however, the message seemed inappropriate.  
Duncan privately suggested I add "LazyData:  yes" to the package 
DESCRIPTION file.  I did that, and the offending message disappeared!


       Thanks again to Duncan.


       Best Wishes,
       Spencer

  
    
#
Another example:


plot.landsurveydata: no visible binding for global variable 'value'
plot.landsurveydata: no visible binding for global variable 'variable'


plot.landsurveydata <- function(...){
# ...
qplot(time., value, data=X, color=variable, ...)
# where value and variable are columns of the data.frame X
}


       Is there a way to tell "R CMD check" that qplot looks for time., 
value and variable as columns of X?
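One workaround (not one proposed in this thread) is to give the column names dummy local bindings, so codetools sees them as local variables while qplot() evaluates them with the columns of X taking precedence. A sketch, assuming ggplot2 is installed; plotSketch() is a made-up stand-in for plot.landsurveydata.

```r
plotSketch <- function(X, ...) {
  # Dummy local bindings: codetools now treats these names as locals, so
  # R CMD check stops reporting "no visible binding" for them. qplot()
  # looks them up in the columns of X first, so the NULLs are not used.
  time. <- value <- variable <- NULL
  ggplot2::qplot(time., value, data = X, color = variable, ...)
}

# Quick check: time., value and variable should not be listed as globals.
codetools::findGlobals(plotSketch, merge = FALSE)$variables
```

Since R 2.15.1 a package can instead declare such names once, at top level, with utils::globalVariables(c("time.", "value", "variable")), which R CMD check takes into account.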


       Thanks,
       Spencer
On 4/11/2011 9:04 AM, Spencer Graves wrote:

  
    
#
On 4/11/11 11:04 AM, "Spencer Graves" <spencer.graves at prodsyse.com> wrote:

            
So would it be possible to have something akin to lint comment directives
to allow specific "errors" to be ignored by codetools?
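Short of patching codetools, something directive-like can be layered on top of it, because checkUsage() accepts a character vector of names in its suppressUndefined argument. In the sketch below the "directive" is stored as a function attribute; both the attribute name codetools.ignore and the wrapper checkWithDirectives() are hypothetical, not part of codetools.

```r
library(codetools)

# Hypothetical wrapper: names listed in the function's (made-up)
# "codetools.ignore" attribute are not reported as undefined globals.
checkWithDirectives <- function(f, name = deparse(substitute(f))) {
  ignore <- attr(f, "codetools.ignore")
  if (is.null(ignore)) {
    checkUsage(f, name = name)
  } else {
    # Note: a character vector here replaces codetools' default
    # suppression list rather than adding to it.
    checkUsage(f, name = name, suppressUndefined = ignore)
  }
}

g <- function() value + variable  # 'value' and 'variable' are free
attr(g, "codetools.ignore") <- c("value", "variable")

checkUsage(g)           # reports both names
checkWithDirectives(g)  # reports nothing for them
```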
#
On 11/04/2011 3:41 PM, Roebuck,Paul L wrote:
Of course it would.  Codetools is GPL licensed, so just do it.

Duncan Murdoch