Unexpected argument-matching when some are missing

11 messages · Emil Bode, Ista Zahn, S Ellison +1 more

#
When trying out some variations with `[.data.frame` I noticed some (to me) odd behaviour, which turned out to have nothing to do with `[.data.frame`, but rather with the way arguments are matched when mixing named/unnamed and missing/non-missing arguments. Consider the following example:

 

myfun <- function(x,y,z) {
  print(match.call())
  cat('x=',if(missing(x)) 'missing' else x, '\n')
  cat('y=',if(missing(y)) 'missing' else y, '\n')
  cat('z=',if(missing(z)) 'missing' else z, '\n')
}

myfun(x=, y=, "z's value")

gives:

# myfun(x = "z's value")
# x= z's value
# y= missing
# z= missing

 

This seems very counterintuitive to me: I expect the arguments x and y to be missing, and z to get "z's value".

When I call myfun(,y=,"z's value"), x is missing, and y gets "z's value".

Are my expectations wrong or is this a bug? And if my expectations are wrong, where can I find more information on argument-matching?

My gut feeling says to call this a bug, but then I'm surprised no-one else has encountered it before.

And I don't have multiple installations to work from, so could somebody else confirm this (if it's not my expectations that are wrong) for R-devel/other R versions/other platforms?

My setup: R 3.5.1, macOS 10.13.6, both RStudio 1.1.453 and R --vanilla from Bash

 

Best regards, 

Emil Bode
#
On Thu, Nov 29, 2018 at 5:09 AM Emil Bode <emil.bode at dans.knaw.nl> wrote:
Interesting. I would expect it to throw an error, since "x=" is not
syntactically complete. What does "x=" mean anyway? It looks like R
interprets it as "x was not set to anything, i.e., is missing". That
seems reasonable, though I think the example itself is pathological
and would prefer that it produced an error.

--Ista
#
Not just in 'myfun' ...

plot(x=1:10, y=)
plot(x=1:10, y=, 10:1)

In both cases, 'y=' is ignored. In the first, the plot is for y=NULL (so not a 'missing' y).
In the second case, 10:1 is positionally matched to y despite the intervening 'missing' 'y='.

So it isn't just 'missing'; it's 'not there at all'
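A minimal sketch that reproduces both calls without opening a graphics device. It assumes a hypothetical stand-in, show_args(), whose first three formals copy those of plot.default (x, y = NULL, type = "p"); it only returns what y and type ended up as, so it approximates plot()'s behaviour rather than replaying its S3 dispatch.

```r
# show_args() is a hypothetical stand-in for plot.default: same first
# three formals, but it returns the values it received instead of plotting.
show_args <- function(x, y = NULL, type = "p") {
  list(y = y, type = type)
}

# Empty 'y =': y falls back to its default, NULL.
first <- show_args(x = 1:10, y = )
str(first)

# Empty 'y =' plus an unnamed value: 10:1 is matched positionally to y,
# and type keeps its default "p".
second <- show_args(x = 1:10, y = , 10:1)
str(second)
```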

Steve E
*******************************************************************
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If 
you have received this message in error, please notify the sender 
immediately via +44(0)20 8943 7000 or notify postmaster at lgcgroup.com 
and delete this message and any copies from your computer and network. 
LGC Limited. Registered in England 2991879. 
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK
#
On Thu, Nov 29, 2018 at 10:51 AM S Ellison <S.Ellison at lgcgroup.com> wrote:
What exactly is the difference between "missing" and "not there at all"?

--Ista
#
Well, I did mean it as "missing".
To me, it felt just as natural as providing an empty index for subsetting (e.g. some.data.frame[,,drop=FALSE])
I can't think of many uses other than subsetting, but I think this issue matters mostly when you're not entirely sure what a call is going to end up as: when passing along arguments, or when calling an unknown function (as in variants of the apply family, where you provide a function as an argument).
Or what happens if I use do.call(FUN, args=MyNamedList)? I have a bit more extensive example further down where you can more clearly see the unexpected output.

But the problem is that R does NOT treat it as simply "missing". That would have been reasonable, but instead, as in the example in my previous mail, 
myfun(x=, y=, "z's value") means x is assigned "z's value", and y and z are seen as missing. Which is not at all what I was expecting.

It is also not consistent with other behaviour: myfun(,,"z's value") and myfun(x=, y=, z="z's value") do work as expected (at least, as I was expecting).
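A sketch comparing the three spellings, plus the do.call() route asked about above. The helper report() is hypothetical; it returns each argument's value, or "<missing>" when missing() is TRUE. alist() is used to build a named list that keeps `x =` and `y =` as empty arguments.

```r
# Hypothetical helper: the value of each argument, or "<missing>".
report <- function(x, y, z) {
  c(x = if (missing(x)) "<missing>" else x,
    y = if (missing(y)) "<missing>" else y,
    z = if (missing(z)) "<missing>" else z)
}

unnamed_empties <- report(, , "z's value")             # z gets the value
all_named       <- report(x = , y = , z = "z's value") # z gets the value
named_empties   <- report(x = , y = , "z's value")     # x gets the value

# do.call() with a named list containing empty arguments builds the same
# call as named_empties, so it behaves the same way.
via_do_call <- do.call(report, alist(x = , y = , "z's value"))

print(rbind(unnamed_empties, all_named, named_empties, via_do_call))
```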

The extensive example:
Suppose I want to write a function that selects data from some external source. In order to do this, we put the data in its own environment, where we look for variables called "df", "rows", "cols" and "drop", and use these to make a selection. I write this function:

doselect <- function(env) {
  do.call(`[.data.frame`,
          list(env$df,
               if(!is.null(env$rows)) env$rows,
               if(!is.null(env$cols)) env$cols,
               drop=if(!is.null(env$drop)) env$drop))
}

It works for this code:
myenv <- new.env()
assign('df', data.frame(a=1:2, b=3:4), myenv, inherits=FALSE)
assign('rows', 1, myenv, inherits=FALSE) # Code breaks if we don't have this line
assign('cols', 1, myenv, inherits=FALSE) # Code breaks if we don't have this line
assign('drop', FALSE, myenv, inherits=FALSE)
doselect(myenv)

But if we don't assign "rows" and/or "cols", the variable "drop" is inserted in the place of the first unnamed variable, so the result is the same as if calling
df[FALSE,,]:
[1] a b
<0 rows> (or 0-length row.names)

What I did expect was the same result as df[,,FALSE], i.e. the full data.frame. Of course I can rewrite the function "doselect", but I think my current call is how most people would write it (even though I admit the example in its entirety is far-fetched).
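One possible workaround, sketched under the assumption that "select everything" can be expressed as the logical index TRUE rather than as an empty argument: always supply both indices, so no empty argument ever reaches argument matching. doselect_named() is a hypothetical name, and this is only one way to restructure doselect().

```r
# Hypothetical variant of doselect(): an absent rows/cols entry becomes
# TRUE ("all rows" / "all columns"), so both index positions are always
# filled and drop can only ever be matched by its name.
doselect_named <- function(env) {
  rows <- if (!is.null(env$rows)) env$rows else TRUE
  cols <- if (!is.null(env$cols)) env$cols else TRUE
  if (is.null(env$drop)) {
    env$df[rows, cols]
  } else {
    env$df[rows, cols, drop = env$drop]
  }
}

myenv <- new.env()
assign("df", data.frame(a = 1:2, b = 3:4), envir = myenv)
assign("drop", FALSE, envir = myenv)
full <- doselect_named(myenv)     # no rows/cols: the whole data frame
print(full)

assign("cols", 1, envir = myenv)
one_col <- doselect_named(myenv)  # drop = FALSE keeps a data frame
print(one_col)
```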


Best regards, 
Emil Bode
    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
A "missing argument" in R means that an argument with no default value was omitted from the call, and that is what I meant by "missing".
But that is not what is happening here. I was talking about "y=" apparently being treated as not present in the call, rather than the argument y being treated as a missing argument.  

In these examples, plot.default has a default value for y (NULL) so y can never be "missing" in the sense of the 'missing argument' error (compare what happens with plot(y=1:10), which reports x as 'missing'). 
In the first example, y was (from the plot behaviour) taken as NULL - the default - so was not considered a missing argument. In the second, it was taken as 10:1 - again, non-missing, despite 10:1 being in the normal position for the (character) argument "type".
But neither call did anything at all with "y=". Instead, the behaviour is consistent with what would have happened if 'y=' were "not present at all" when counting position or named argument list, rather than if 'y' were an absent required argument. 
It _looks_ as if the initial call parsing silently ignored the malformed expression "y=" before any argument matching - positional or by name - takes place.
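One way to test the parsing hypothesis is to quote the call, which parses it without evaluating it or matching any arguments, and look at what the parser produced. A sketch; plot() is never actually called here.

```r
# quote() parses the call but does not evaluate it or match arguments.
cl <- quote(plot(x = 1:10, y = , 10:1))

length(cl)  # the function name plus three arguments
names(cl)   # "" for plot, then the argument tags
```

If the parser had silently dropped 'y=', the call would have only three elements. Since the empty y = is still present here, this suggests it is discarded later, when arguments are matched.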

But I'm thinking that it'll take an R-core guru to explain what's going on here, so I was going to wait and see.

Steve Ellison



#
On Thu, Nov 29, 2018 at 1:10 PM S Ellison <S.Ellison at lgcgroup.com> wrote:
Yes, I think all of that is correct. But y _is_ missing in this sense:
debugging in: plot(1:10, y = )
debug: UseMethod("plot")
Browse[2]> missing(y)
[1] TRUE

though this does not explain the behavior since
debugging in: plot(, , "l")
debug: UseMethod("plot")
Browse[2]> missing(y)
[1] TRUE

--Ista
#
It looks like you're right that somewhere in (presumably) match.call, the named, empty arguments are removed, such that the call plot(x=1:10, y=, 10:1) is translated to plot(x=1:10, 10:1).
But I would have expected it to be the same as plot(x=1:10, , 10:1) (note the ", ,"), which gives an error (10:1 is not a valid plot type). In this case you get an error straightaway; I find this more interesting:
Both are valid (no errors), albeit strange, calls, but I'd say the first call is better code: it's clearer that you intend not to give any value for y. But exactly this one gives unexpected results: it tries to plot at position (1, 'p'), or (1, NA).

And the behaviour as it is gives rise to some strange inconsistencies. I have gathered some examples below (at the very bottom of the thread, as it got quite extensive), where some variations are surprisingly different from each other.
There are also some issues when using data.frame(...)[i=, j=,...], but at least here you are warned about naming i and j.
But basically, it means any function where calls like fun(,,) are a valid possibility should throw the same warning, e.g. any R-code replacement of [.matrix or [.array, or, as in my examples, for data.table (and related structures).
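Since match.call() reports how R actually matched a call, a sketch comparing the two spellings may help locate where 'y=' goes. matched() is a hypothetical stand-in with plot.default's first three formals; it returns match.call() instead of plotting.

```r
# Hypothetical stand-in: plot.default's first three formals, returning
# the matched call rather than drawing anything.
matched <- function(x, y = NULL, type = "p") match.call()

m_named   <- matched(x = 1:10, y = , 10:1)  # named, empty y
m_unnamed <- matched(x = 1:10, , 10:1)      # unnamed, empty second slot

print(m_named)    # 10:1 ends up matched to y
print(m_unnamed)  # 10:1 ends up matched to type
```

In both cases the empty argument is absent from the matched call; what differs is which formal the unnamed 10:1 is matched to.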

Examples of what I (Emil) found odd:
---------------------------------------------------------------------------------------------------------------------------------------
a b
1: 1 3
a
1: 1
2: 2
a b
1: 1 3
a V1
1: 1  1
2: 2  1
a
1: 1
Error in `[.data.table`(data.table(a = 1:2, b = 3:4), i = , 1, by = "a") : 
  'by' or 'keyby' is supplied but not j
+   print(match.call())
+   cat('nargs: ', nargs(), '\n')
+   cat('x=',if(missing(x)) 'missing' else x, '\n')
+   cat('y=',if(missing(y)) 'missing' else y, '\n')
+   cat('z=',if(missing(z)) 'missing' else z, '\n')
+ }
myfun(z = "z's value")
nargs:  5 
x= missing 
y= missing 
z= z's value
Error in myfun(x = , y = , , , "z's value", , ) : 
  unused arguments (alist(, ))
Error in myfun(x2 = , y = , "z's value") : unused argument (alist(x2 = ))
Error in myfun(x = , x = , , "z's value") : 
  formal argument "x" matched by multiple actual arguments
myfun(x = 3, z = "z's value")
nargs:  4 
x= 3 
y= missing 
z= z's value
Error in myfun(y = rlang::missing_arg(), , "z's value", x = 3) : 
  unused argument ("z's value")
myfun(x = 3, y = rlang::missing_arg(), z = "z's value")
nargs:  4 
x= 3 
y=  
z= z's value
myfun(x = 3, y = rlang::missing_arg(), z = "z's value")
nargs:  3 
x= 3 
y=  
z= z's value
#
Although I said what I meant by 'missing' vs 'not present', it wasn't exactly what missing() means. My bad.
missing() returns TRUE if an argument is not specified in the call _whether or not_ it has a default, hence the behaviour of missing(y) in debug(plot).

But we can easily find out whether a default has been assigned:
plot(1:10, y=, type=)
Browse[2]> y
NULL
Browse[2]> type
"p"

... which is consistent with silent omission of 'y=' and 'type=' 
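A small sketch of the distinction described here: missing() is TRUE when y was not given (or was given as an empty 'y ='), even though y has a default that is then used. probe() is a hypothetical helper.

```r
# Hypothetical helper: y has a default, and we report both missing(y)
# and whether the value seen inside the function is the default NULL.
probe <- function(y = NULL) c(missing = missing(y), is_null = is.null(y))

omitted     <- probe()        # y not mentioned at all
named_empty <- probe(y = )    # y named but left empty
supplied    <- probe(y = 1)   # y given a value

print(rbind(omitted, named_empty, supplied))
```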


Still waiting for a guru...

Steve E



#
But the main point is where arguments are mixed together:
...
Browse[2]> missing(y)
[1] FALSE
Browse[2]> y
[1] "l"
Browse[2]> type
[1] "p"

I think that's what I trip over most: that named, empty arguments behave entirely differently from omitting them (", ,").

And I definitely agree we need a guru to explain it all to us.

Cheers, Emil Bode
#
Argument matching is by name first, then the still missing arguments
are filled positionally. Unnamed missing arguments are thus left
missing. Does that help?
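Michael's description can be turned into a small model. This is a simplified sketch, not R's actual implementation: it ignores partial matching, `...`, and the errors R raises for unused or duplicated arguments, and it uses a hypothetical sentinel EMPTY to stand for an empty argument such as `x =`. The key step, as the description implies, is that a formal matched by name to an empty argument still counts as unfilled when positional matching starts.

```r
# EMPTY is a hypothetical sentinel standing in for an empty argument
# such as `x = ` (real empty arguments are awkward to store in variables).
EMPTY <- structure(list(), class = "empty_arg")
is_empty <- function(v) inherits(v, "empty_arg")

# Simplified model of argument matching: exact names first, then
# positions. Ignores partial matching, `...`, and matching errors.
match_model <- function(formal_names, supplied) {
  tags <- names(supplied)
  if (is.null(tags)) tags <- rep("", length(supplied))

  # Every formal starts out unfilled.
  actual <- rep(list(EMPTY), length(formal_names))
  names(actual) <- formal_names
  used <- rep(FALSE, length(supplied))

  # Pass 1: exact name matching. Matching a formal to an empty argument
  # leaves it looking exactly like an unfilled formal.
  for (i in seq_along(supplied)) {
    if (tags[i] != "" && tags[i] %in% formal_names) {
      actual[[tags[i]]] <- supplied[[i]]
      used[i] <- TRUE
    }
  }

  # Pass 2: match the remaining unnamed arguments, in order, to every
  # formal that still looks unfilled.
  pos <- which(!used & tags == "")
  for (f in formal_names) {
    if (length(pos) == 0) break
    if (is_empty(actual[[f]])) {
      actual[[f]] <- supplied[[pos[1]]]
      pos <- pos[-1]
    }
  }
  actual
}

# Like myfun(x = , y = , "z's value"): the value lands in x.
res_named <- match_model(c("x", "y", "z"), list(x = EMPTY, y = EMPTY, "z's value"))
# Like myfun(, , "z's value"): the empty slots consume x and y first.
res_unnamed <- match_model(c("x", "y", "z"), list(EMPTY, EMPTY, "z's value"))
str(res_named)
str(res_unnamed)
```

Under these assumptions the model reproduces what the thread reports for myfun(): "z's value" ends up in x when the empty arguments are named, and in z when they are left unnamed.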

Michael
On Fri, Nov 30, 2018 at 8:18 AM Emil Bode <emil.bode at dans.knaw.nl> wrote: