Bounty on Error Checking
On 04.01.2013 15:22, Duncan Murdoch wrote:
On 04/01/2013 10:15 AM, Matthew Dowle wrote:
On 04.01.2013 14:56, Duncan Murdoch wrote:
On 04/01/2013 9:51 AM, Matthew Dowle wrote:
On 04.01.2013 14:03, Duncan Murdoch wrote:
On 13-01-04 8:32 AM, Matthew Dowle wrote:
On Fri, Jan 3, 2013, Bert Gunter wrote
Well... On Thu, Jan 3, 2013 at 10:00 AM, ivo welch <ivo.welch <at> anderson.ucla.edu> wrote:
Dear R developers---I just spent half a day debugging an R program, which had two bugs---I selected the wrongly named variable, which turns out to have been a scalar, which then happily multiplied as if it was a matrix; and another wrongly named variable from a data frame, that triggered no error when used as a[["name"]] or a$name. There should be an option to turn on that throws an error inside R when one does this. I cannot imagine that there is much code that wants to reference non-existing columns in data frames.
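For readers following the thread, the two silent behaviours described above can be reproduced with a minimal sketch. The data frame `df`, the column name `price`, and the other variable names are made up for illustration; they are not from the original program.

```r
# A data frame that has columns "a" and "b" only (illustrative data).
df <- data.frame(a = 1:3, b = 4:6)

# Selecting a column that does not exist returns NULL, with no error.
x1 <- df$price
x2 <- df[["price"]]

# Arithmetic on NULL gives a zero-length result, again silently.
y <- x1 * 2

# A scalar used where a matrix was intended is recycled over every element,
# so the product has the same shape as the matrix and nothing looks wrong.
m <- matrix(1:4, nrow = 2)
s <- 10
z <- m * s
```

Neither case signals a warning, which is why the mistake can go unnoticed until the results are inspected.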
But I can -- and do it all the time: To add a new variable, "d" to a data frame, df, containing only "a" and "b" (with 10 rows, say):
df[["d"]] <- 1:10
Yes but that's `[[<-`. Ivo was talking about `[[` and `$`; i.e., select only not assign, if I understood correctly.
Trying to outguess documentation to create error triggers is a very bad idea.
Why exactly is it a very bad idea? (I don't necessarily disagree, just asking for more colour.)
R already has plenty of debugging tools -- and there is even a "debug" package. Perhaps you need a better programming editor/IDE. There are several listed on CRAN, RStudio, etc.
True, but that relies on you knowing there's a bug to hunt for. What if you don't know you're getting incorrect results, silently? In a similar way that options(warn=2) turns known warnings into errors, to enable you to be more strict if you wish,
I would say the point of options(warn=2) is rather to let you find the location of the warning more easily, because it will abort the evaluation.
True, but as well as that, I sometimes like to run production systems with options(warn=2). I'd prefer some tasks to halt at the slightest hint of trouble than write a warning silently to a log file that may not be looked at. I think of that as being more strict, more robust, since options(warn=2) is set even when there is no warning, to catch one if it arises in future, not just to find it more easily once you know there is a warning.
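A minimal sketch of the behaviour being discussed: with options(warn=2) a warning is converted to an error, so the task halts instead of logging and carrying on. The coercion call and the "halted" marker are chosen purely for illustration.

```r
# Promote every warning to an error, remembering the previous setting.
old <- options(warn = 2)

# as.integer() on a non-numeric string normally returns NA with a warning.
# Under warn = 2 the warning becomes an error, which tryCatch() intercepts here.
result <- tryCatch(
  as.integer("not a number"),
  error = function(e) "halted"
)

# Restore the previous warning level.
options(old)
```

In a production script one would leave out the tryCatch() so that the error stops the run; it is used here only so the example completes.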
I would not recommend using code that issues warnings.
Not sure what you mean here.
I just meant that I consider warnings to be a problem (as you do), so they should all be fixed.
I see now, good.
an option to turn on warnings from `[[` and `$` if the column is missing (select only, not assign) doesn't seem like a bad option to have. Maybe it would reveal some previously silent bugs.
I agree that this would sometimes be useful, but a very common convention is to do something like
if (is.null(obj$element)) { do something }
These would all have to be re-written to something like
if (missing.field(obj, "element")) { do something }
There are several hundred examples of the first usage in base R; I imagine thousands more in contributed packages.
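To make the two forms concrete, here is a small sketch. Note that missing.field() is hypothetical: it is not a base R function, and the definition below is only one plausible way it might be written.

```r
# Hypothetical helper (not part of base R): TRUE if 'name' is not among
# the names of 'obj'.
missing.field <- function(obj, name) {
  !(name %in% names(obj))
}

obj <- list(a = 1)

# The common idiom: relies on `$` returning NULL for an absent element.
idiom_result <- is.null(obj$element)

# The explicit form: never selects the element, so a strict `$` would not fire.
explicit_absent  <- missing.field(obj, "element")
explicit_present <- missing.field(obj, "a")
```

Both tests agree for this object; the difference is that only the first depends on a missing element being selected silently.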
Yes but Ivo doesn't seem to be writing that if() in his code. We're only talking about an option that users can turn on for their own code, iiuc. Not anything that would affect or break thousands of packages. That's why I referred to the fact that all packages now have namespaces, in the earlier post.
I don't think the benefit of the change is worth all the work that would be necessary to implement it.
It doesn't seem to be a lot of work. I already posted a working straw man, for example, as a first step.
I understood the proposal to be that evaluating "obj$element" would issue a warning if element didn't exist. If that were the case, then the common test is.null(obj$element) would issue a warning in the cases where it now returns TRUE.
Yes, but only for obj$element appearing in Ivo's own code. Not if a package does that (including base). That's why I thought masking "[[" and "$" in .GlobalEnv might achieve that without affecting packages or base, although I don't know how such an option could be made available by R. Maybe options(strictselect=TRUE) would create those masks in .GlobalEnv, and options(strictselect=FALSE) would remove them. A package maintainer might choose to set that in their package to make it stricter (which would create those masks in the package's namespace too). Or users could just create those masks themselves, since it's only a few lines. Without affecting packages or base.
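A rough sketch of what such user-defined masks might look like follows. This is an illustration under stated assumptions, not the straw man posted earlier in the thread: it only tightens selection on data frames, raises an error instead of a warning, and defers to the base definitions for everything else. The error wording is made up.

```r
# Strict `$` for code evaluated in the global environment (illustrative sketch).
`$` <- function(x, name) {
  # `$` does not evaluate its second argument, so capture it as a string.
  name <- as.character(substitute(name))
  if (is.data.frame(x) && !(name %in% names(x))) {
    stop("column '", name, "' not found in data frame", call. = FALSE)
  }
  # Defer to the base definition, passing the name as a string.
  do.call(base::`$`, list(x, name))
}

# Strict `[[` for selection by a single column name (illustrative sketch).
`[[` <- function(x, i, ...) {
  if (is.data.frame(x) && is.character(i) && length(i) == 1L &&
      !(i %in% names(x))) {
    stop("column '", i, "' not found in data frame", call. = FALSE)
  }
  base::`[[`(x, i, ...)
}

df <- data.frame(a = 1:3)
ok <- df$a                                   # existing column: unchanged
bad <- tryCatch(df$zz, error = function(e) conditionMessage(e))

# Assignment goes through `[[<-`, which is not masked, so adding a column
# still works as before.
df[["d"]] <- 4:6
```

Running rm(list = c("$", "[[")) in the global environment removes the masks again. Code inside packages resolves `$` and `[[` through its namespace and base before reaching the global environment, so it should not see these definitions.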
options() are global
I realise that. I was thinking that inside the options() function it could see if strictselect was being changed and then create the masks in .GlobalEnv. But I can see that is ugly, was just thinking out loud. Wasn't suggesting that "[[" would look at the value of strictselect.
but a package could change the meaning of $ or [[. It could even export those new definitions so that people who wanted the strict usage could use it. It would be hard to get the same performance as the base definitions, but for debugging purposes that might not matter.
So in principle this would be a (small) good idea then? Is it an option that R could provide? i.e. something for which a patch file for R would be considered by R core?
Matthew