bug in partial matching of attribute names
Prof Brian Ripley wrote:
On Tue, 13 Feb 2007, Tony Plate wrote:
Ok, thanks for the news of a fix in progress.
BTW, your suggested fix is incorrect. Consider having an exact match after two partial matches in the list of attributes.
oops. yes.
On the topic of the "names" attribute being treated specially, I
wonder if the do_attr() function might treat it a little too
specially. As far as I can tell, the loop in the first large block
code in do_attr() (attrib.c), which begins
/* try to find a match among the attributes list */
for (alist = ATTRIB(s); alist != R_NilValue; alist = CDR(alist)) {
will find a full or partial match for a "names" attribute (at least
for ordinary lists and vectors).
Then the large block of code after that, beginning:
/* unless a full match has been found, check for a "names"
attribute */
if (match != FULL && ! strncmp(CHAR(PRINTNAME(R_NamesSymbol)), str,
n)) {
seems unnecessary because a names attribute has already been checked
for. In the case of a partial match on the "names" attribute this code
will behave as though there is an ambiguous partial match, and
(incorrectly) return Nil. Is this second block of code specific to
the "names" attribute possibly a hangover from an earlier day when the
first loop didn't detect a "names" attribute? Or am I missing
something? Are there some other objects for which the first loop
doesn't include a "names" attribute?
Yes: I pointed you at the 'R internals' manual, but this is also on the
help page. 1D arrays and pairlists have names stored elsewhere. It
needs to be changed to be
- else if (match == PARTIAL) {
+ else if (match == PARTIAL && strcmp(CHAR(PRINTNAME(tag)),
"names")) {
Wouldn't it be equivalent (but clearer & a minute fraction more
computationally efficient) to put that test
'strcmp(CHAR(PRINTNAME(tag)), "names")' right at the start of that
block, i.e., as an additional condition in
if (match != FULL && ! strncmp(CHAR(PRINTNAME(R_NamesSymbol)), str,
n)) {
?
-- Tony Plate
-- Tony Plate Prof Brian Ripley wrote:
It happens that I was looking at this yesterday (in connection with encodings on CHARSXPs) and have a fix in testing across CRAN right now. As for "names", as you will know from reading 'R Internals' the names can be stored in more than one place, which is why it has to be treated specially. On Mon, 12 Feb 2007, Tony Plate wrote:
There looks to be a bug in do_attr() (src/main/attrib.c): incorrect partial matches of attribute names can be returned when there are an odd number of partial matches. E.g.:
x <- c(a=1,b=2) attr(x, "abcdef") <- 99 attr(x, "ab")
[1] 99
attr(x, "abc") <- 100 attr(x, "ab") # correctly returns NULL because of ambig partial match
NULL
attr(x, "abcd") <- 101 attr(x, "ab") # incorrectly returns non-NULL for ambig partial match
[1] 101
names(attributes(x))
[1] "names" "abcdef" "abc" "abcd"
The problem in do_attr() looks to be that after match is set to
PARTIAL2, it can be set back to PARTIAL again. I think a simple fix is
to add a "break" in this block in do_attr():
else if (match == PARTIAL) {
/* this match is partial and we already have a partial match,
so the query is ambiguous and we return R_NilValue */
match = PARTIAL2;
break; /* <---- ADD BREAK HERE */
} else {
However, if this is indeed a bug, would this be a good opportunity to
get rid of partial matching on attribute names -- it was broken anyway
-- so toss it out? :-) Does anyone depend on partial matching for
attribute names? My view is that it's one of those things like partial
matching of list and vector element names that seemed like a good idea
at first, but turns out to be more trouble than it's worth.
On a related topic, partial matching does not seem to work for the
"names" attribute (which I would regard as a good thing :-). However,
I'm puzzled why it doesn't work, because the code in do_attr() seems to
try hard to make it work. Can anybody explain why?
E.g.:
attr(x, "names")
[1] "a" "b"
attr(x, "nam")
NULL
sessionInfo()
R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base"
-- Tony Plate
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel