Subscripting fails if name of element is "" (PR#8161)

Dear Thomas,
This looks deliberate (there is a function NonNullStringMatch that does 
the matching).  I assume this is because there is no other way to 
indicate that an element has no name.
If so, it is a documentation bug -- help(names) and FAQ 7.14 should 
specify this behaviour.  Too late for 2.2.0, unfortunately.

I respectfully disagree: the element has a name, it's an empty string. Of
course "" is a doubtful name for an element, but as long as we allow this
name when assigning with names()<-, we should also handle it like a name in
subscripting. The alternative would be to disallow "" in names altogether.
However, both alternatives look like code changes rather than mere
documentation changes.

Best regards

Jens Oehlschlägel

I think Thomas is right as to how S interprets this: "" is no name on 
assignment, whereas NA as a name is a different thing (there probably is a 
name, we just do not know what it is).

Here is the crux of the example.

> p <- c(a=1, 2)
> names(p)
[1] "a" ""
> p
a
1 2
> p2 <- c(1,2)
> names(p2) <- c("a", "")
> identical(p, p2)
[1] TRUE

so giving the name "" really is the same as giving no name.

`Error 1' is said to be
> p[""]
<NA>
   NA

You haven't given a name, so I think that is right.  S (which has no 
character NAs) uses "" as the name, but here there may be a name or not.
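
To make the "no name" reading concrete, here is a minimal C sketch of a by-name lookup in which "" never matches. It is an illustration only, not R's source: find_by_name and its signature are hypothetical, and plain C strings stand in for R's character vectors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not R's actual code: return the zero-based position
 * of `key` in `names`, or -1 when nothing matches.  An empty string on
 * either side is treated as "no name" and therefore never matches. */
int find_by_name(const char *const *names, size_t n, const char *key)
{
    size_t i;

    if (key[0] == '\0')
        return -1;                      /* "" is not a usable subscript */
    for (i = 0; i < n; i++) {
        if (names[i][0] != '\0' && strcmp(names[i], key) == 0)
            return (int) i;
    }
    return -1;
}
```

With the names {"a", ""}, looking up "a" gives position 0, while looking up "" gives -1, which plays the role of the NA result for p[""] shown above.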

> P <- list(a=1, 2)

I think Jens then meant as `error 2' that

> P
$a
[1] 1

[[2]]
[1] 2

shows no name for the second element, and that seems right to me (although 
S shows "" here).

Finally (`error 3')
> P[""]
$"NA"
NULL

is a length-one list with name character-NA.  (S has no name here.)  That 
seems the right answer but if so is printed inconsistently.

I would say that

> Q <- list(1, 2)
> names(Q) <- c("a", NA)
> Q
$a
[1] 1

$"NA"
[1] 2

was the only bug here (the name should be printed as <NA>).  Now that
comes from this bit of code

    if( isValidName(CHAR(PRINTNAME(TAG(s)))) )
        sprintf(ptag, "$%s", CHAR(PRINTNAME(TAG(s))));
    else
        sprintf(ptag, "$\"%s\"", CHAR(PRINTNAME(TAG(s))));

so non-syntactic names are printed surrounded by "".  Nowadays I think we 
would prefer ``, as in

> A <- list("a+b"=1)
> A
$"a+b"
[1] 1

> A$"a+b"
[1] 1
> A$`a+b`
[1] 1

but NA needs to be a special case as in

> A <- list(1, 2)
> names(A) <- c("NA", NA)
> A
$"NA"
[1] 1

$"NA"
[1] 2

> is.na(names(A))
[1] FALSE  TRUE

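A rough C sketch of what such special-casing might look like follows. It is not the actual printing code: format_tag and looks_syntactic are hypothetical names, looks_syntactic is only a crude ASCII stand-in for isValidName() with a handful of reserved words, and the missing name is passed as a flag rather than as R's NA string.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Crude stand-in for isValidName(): a letter or '.' first, then letters,
 * digits, '.' or '_', and not one of a few reserved words. */
static int looks_syntactic(const char *s)
{
    static const char *const reserved[] = { "NA", "NULL", "TRUE", "FALSE" };
    size_t i;

    if (!isalpha((unsigned char) s[0]) && s[0] != '.')
        return 0;                       /* also rejects the empty string */
    for (i = 0; s[i] != '\0'; i++)
        if (!isalnum((unsigned char) s[i]) && s[i] != '.' && s[i] != '_')
            return 0;
    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(s, reserved[i]) == 0)
            return 0;
    return 1;
}

/* Write the tag for a list element into buf.  `name` is not read when
 * is_na is non-zero, so the missing name and the string "NA" differ. */
void format_tag(char *buf, size_t size, const char *name, int is_na)
{
    if (is_na)
        snprintf(buf, size, "$<NA>");
    else if (looks_syntactic(name))
        snprintf(buf, size, "$%s", name);
    else
        snprintf(buf, size, "$`%s`", name);     /* backticks, not "" */
}
```

Under these assumptions the name "a" is written as $a, "a+b" as $`a+b`, the string "NA" as $`NA`, and a missing name as $<NA>, so the two elements of A above would no longer print identically.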
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
I haven't been following this conversation in order, but I think there's 
another bug here besides the one(s?) you identified:

Jens had this example:

 > x <- 1:4
 > names(x) <- c(NA, "NA", "a", "")
 > x[names(x)]
<NA> <NA>    a <NA>
    1    1    3   NA

Shouldn't the second entry in the result be 2, with name "NA"?  It seems 
the string "NA" has been converted to <NA> here.

Duncan Murdoch
On Thu, 6 Oct 2005, "Jens Oehlschlägel" wrote:

[...]

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Yes, but I don't see it in PR#8161 where there is no name "NA" that I can
see.  (In other words it is not an instance of the subject line.)

The issue is that <NA> is matching "NA", and it should not.  As in the
code

Rboolean NonNullStringMatch(SEXP s, SEXP t)
{
    if (CHAR(s)[0] && CHAR(t)[0] && strcmp(CHAR(s), CHAR(t)) == 0)
        return TRUE;
    else
        return FALSE;
}

and there are more instances around.
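
For illustration, a self-contained C sketch of a comparison that keeps a missing name apart from the string "NA" is given below. It is not a patch to R: struct name_elt and non_null_non_na_match are hypothetical, R's CHARSXP is not modelled, and the choice that a missing value matches nothing (not even another missing value) is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one character element: the text plus a flag
 * saying whether the value is missing. */
struct name_elt {
    const char *text;   /* not inspected when is_na is true */
    bool is_na;
};

/* Like the comparison above, but a missing value is checked first, so it
 * can never be equal to the literal string "NA". */
bool non_null_non_na_match(struct name_elt s, struct name_elt t)
{
    if (s.is_na || t.is_na)
        return false;
    return s.text[0] != '\0' && t.text[0] != '\0'
        && strcmp(s.text, t.text) == 0;
}
```

Here a missing element compared with the string "NA" does not match, "NA" compared with "NA" does, and "" still matches nothing.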

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595