R CMD check returns NOTE about package data set as global variable

16 messages · Brian Ripley, Hadley Wickham, Hervé Pagès +5 more

#
I'm developing a package that comes with a data set called RutgersMapB36. One of the package's functions requires this data frame. A toy example is:

test<-function() {
  data(RutgersMapB36)
  return(RutgersMapB36[,1])
}


R CMD check returns a NOTE:

test: no visible binding for global variable 'RutgersMapB36'

Is there any way to avoid this NOTE?

Thanks,

Brad
---
Brad McNeney
Statistics and Actuarial Science
Simon Fraser University
#
On 06/04/2012 19:46, Brad McNeney wrote:
Use data("RutgersMapB36"), which many think is good practice in code.

  
    
#
OK, thanks for the tip on good coding practice. I'm still getting the NOTE though when I make the suggested change.

In case it matters, I'm check'ing with

R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

Brad

#
Is the dataset something that package users will need, or just your
package's functions?
Hadley
On Fri, Apr 6, 2012 at 1:46 PM, Brad McNeney <mcneney at sfu.ca> wrote:

  
    
#
On Fri, 6 Apr 2012, Brad McNeney wrote:

            
Yes, you will:  data() is a function with side effects, which is 
contrary to the functional programming model being checked.  So there 
is no way to avoid all notes and use data().

If you want to make your code more understandable, consider using 
LazyData (see 'Writing R Extensions').  My view is that data() is a 
kludge from long ago when R had much less powerful memory management, 
except perhaps for very large datasets (at least 100MBs) when you may 
want to control when they are loaded into memory.
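
With LazyData: true in the package's DESCRIPTION file (the field is described in 'Writing R Extensions'), a dataset is loaded on first use and the function no longer needs data() at all. A minimal sketch of what that looks like from the calling side, using mtcars from the datasets package (itself lazy-loaded) as a stand-in for RutgersMapB36; the function name first_column_lazy is made up for illustration:

```r
# Sketch: a lazy-loaded dataset is reachable without calling data().
# datasets::mtcars is a stand-in for the package's own dataset.
first_column_lazy <- function() {
  datasets::mtcars[, 1]
}

mpg <- first_column_lazy()

# Nothing was copied into the user's workspace along the way.
in_workspace <- exists("mtcars", envir = globalenv(), inherits = FALSE)
```

Inside the package's own code the dataset would normally be referred to by its bare name; the explicit datasets:: prefix is only there so the sketch runs outside a package.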

  
    
#
On Apr 6, 2012, at 21:33 , Brad McNeney wrote:

            
Hm? It's not like Brian to get such things wrong, did you check properly?

Perhaps the code checker is not smart enough to know that data() creates global variables. (That would be heuristic at best. You can't actually be sure that data() creates objects with the name given as the argument -- in fact, several objects might be created, possibly none named as the argument). 
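
A small illustration of that last point, assuming the standard datasets package: its "beavers" data set should define two objects, beaver1 and beaver2, and none called beavers. The names env, requested and created are illustrative only.

```r
# Load the "beavers" data set into a private environment
# and list the objects that data() actually created there.
env <- new.env()
requested <- "beavers"
data(list = requested, package = "datasets", envir = env)

created <- sort(ls(env))
print(created)

# Is there an object named after the argument?
print(requested %in% created)
```

A checker that assumed data("beavers") defines a variable called beavers would be wrong here, which is why any such rule could only be a heuristic.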

You are not using LazyData, right?  You might consider doing that and forgetting about data() entirely.

  
    
#
On 04/06/2012 12:33 PM, Brad McNeney wrote:
Because when you do return(RutgersMapB36[,1]), the code checker has no
way to know that the RutgersMapB36 variable is actually defined.
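
To see what the checker sees, one can ask the codetools package (which, as far as I know, is what R CMD check relies on for this analysis) for the free variables of the function. This is a sketch that assumes codetools is installed; the function is only inspected, never called, so the dataset itself is not needed.

```r
# Sketch: list the global variables that static analysis finds in test().
# Assumes the recommended 'codetools' package is installed.
test <- function() {
  data("RutgersMapB36")
  return(RutgersMapB36[, 1])
}

globals <- codetools::findGlobals(test, merge = FALSE)
print(globals$variables)
```

RutgersMapB36 shows up as a variable with no visible definition, because nothing in the function body assigns to it.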

Try this:

test<-function() {
    RutgersMapB36 <- NULL
    data(RutgersMapB36)
    return(RutgersMapB36[,1])
}

Cheers,
H.

  
    
#
On Apr 6, 2012, at 22:23 , Hervé Pagès wrote:

            
That might remove the NOTE, but as far as I can see, it also breaks the code...
#
Package users should have access.

Brad 

#
Thanks (to all), using LazyData removes the note.

Brad

#
On 04/06/2012 01:33 PM, peter dalgaard wrote:
oops, right...

This should remove the NOTE and work (hopefully):

test<-function() {
    data("RutgersMapB36")  # loads RutgersMapB36 in .GlobalEnv
    RutgersMapB36 <- get("RutgersMapB36", envir=.GlobalEnv)
    return(RutgersMapB36[,1])
}

Cheers,
H.
#
On Fri, 2012-04-06 at 13:23 -0700, Hervé Pagès wrote:
That won't work, but this should:

RutgersMapB36 <- NULL
test<-function() {
    data(RutgersMapB36)
    return(RutgersMapB36[,1])
}

Honestly, this is just another example of a non-helpful 'global
variable' NOTE.  I've removed many of these from our packages, often by
resorting to useless workarounds like this one, but I have never once
gotten a valid NOTE out of this message.  We provided other examples
earlier in a different thread.
#
On Fri, Apr 6, 2012 at 1:33 PM, peter dalgaard <pdalgd at gmail.com> wrote:
For data() per se, which by default clutters up the global environment,
you can do:

test<-function() {
  env <- new.env()
  data("RutgersMapB36", envir=env)
  env$RutgersMapB36[,1]
}

That is more explicit, and I do believe you won't get a NOTE about it.
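
A runnable variant of the same idea, with mtcars from the datasets package standing in for RutgersMapB36 (which is not available outside the package). The names first_column_private and leaked are illustrative only.

```r
# Sketch: load a dataset into a private environment instead of the workspace.
first_column_private <- function() {
  env <- new.env()
  data("mtcars", package = "datasets", envir = env)  # loaded into env only
  env$mtcars[, 1]
}

mpg <- first_column_private()

# Check that the call left no copy of the dataset in the global environment.
leaked <- exists("mtcars", envir = globalenv(), inherits = FALSE)
print(leaked)
```

Because the dataset is only ever reached through env$, the function body contains no free variable for the checker to complain about.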

Other than that, one can also use the following style (which still
seems to do the trick) for data(), attach(), load() et al., if you
have to use them:

test<-function() {
  # To avoid NOTEs by R CMD check
  RutgersMapB36 <- NULL; rm(RutgersMapB36);

  data(RutgersMapB36)
  return(RutgersMapB36[,1])
}


/Henrik
#
Hi Brian,
On 04/06/2012 02:04 PM, Brian G. Peterson wrote:
Other people might have a different experience. I've personally seen a
lot of true positive "no visible binding for global variable" notes in
the Bioconductor check results.

In that sense 'R CMD check' is no different from other code checking
tools like e.g. gcc -Wall. There are sometimes false positives, it's
unavoidable. Personally I can live with that.

Cheers,
H.
#
On Apr 6, 2012, at 23:04 , Brian G. Peterson wrote:

            
Actually, this one is perfectly valid. It is saying that you are messing with global variables, which you might not want to do in package code. It is admittedly rather unlikely that the user has a variable called "RutgersMapB36" lying around for you to clobber, but suppose that it was "x" or "mydata"...
3 days later
#
On 4/6/12 4:04 PM, "Brian G. Peterson" <brian at braverock.com> wrote:

            
While I have seen a couple of valid ones, it gets really old having to explain
these NOTEs to the user community. It would really be nice to have something
equivalent to LINT comment directives (i.e., NOTREACHED, ARGSUSED, etc.)
that could be used to suppress "noise" messages.