Hello,
First the version info:
platform       powerpc-apple-darwin8.6.0
arch           powerpc
os             darwin8.6.0
system         powerpc, darwin8.6.0
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)
I have encountered some unusual behavior when trying to create new
columns in a data frame that have names that would generate a partial
match with an existing column with a longer name. It is my
understanding that replacement operations shouldn't have partial
matching, but it is not clear to me whether this applies only when
the named column exists and not for new assignments.
The first example:
> D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=I(sprintf("ZZ%02d", 1:13)),ABCD=13:1)
> D
       M    V ABCD
1   TRUE ZZ01   13
2   TRUE ZZ02   12
3  FALSE ZZ03   11
4  FALSE ZZ04   10
5  FALSE ZZ05    9
6   TRUE ZZ06    8
7  FALSE ZZ07    7
8   TRUE ZZ08    6
9  FALSE ZZ09    5
10 FALSE ZZ10    4
11  TRUE ZZ11    3
12  TRUE ZZ12    2
13  TRUE ZZ13    1
> D$CBA[D$M] = D$V[D$M]
> D
       M    V ABCD  CBA
1   TRUE ZZ01   13 ZZ01
2   TRUE ZZ02   12 ZZ02
3  FALSE ZZ03   11 <NA>
4  FALSE ZZ04   10 <NA>
5  FALSE ZZ05    9 <NA>
6   TRUE ZZ06    8 ZZ06
7  FALSE ZZ07    7 <NA>
8   TRUE ZZ08    6 ZZ08
9  FALSE ZZ09    5 <NA>
10 FALSE ZZ10    4 <NA>
11  TRUE ZZ11    3 ZZ11
12  TRUE ZZ12    2 ZZ12
13  TRUE ZZ13    1 ZZ13
> D$ABC[D$M] = D$V[D$M]
> D
       M    V ABCD  CBA  ABC
1   TRUE ZZ01   13 ZZ01 ZZ01
2   TRUE ZZ02   12 ZZ02 ZZ02
3  FALSE ZZ03   11 <NA>   11
4  FALSE ZZ04   10 <NA>   10
5  FALSE ZZ05    9 <NA>    9
6   TRUE ZZ06    8 ZZ06 ZZ06
7  FALSE ZZ07    7 <NA>    7
8   TRUE ZZ08    6 ZZ08 ZZ08
9  FALSE ZZ09    5 <NA>    5
10 FALSE ZZ10    4 <NA>    4
11  TRUE ZZ11    3 ZZ11 ZZ11
12  TRUE ZZ12    2 ZZ12 ZZ12
13  TRUE ZZ13    1 ZZ13 ZZ13
I expected ABC to equal CBA with NA values in rows not assigned, but
instead it appears that an extraction from D$ABCD and coercion to
string is being performed in the process of creating D$ABC.
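Anil's reading of the first example can be checked with a small sketch. It assumes (as Thomas Lumley explains later in the thread) that the "get" part of `D$ABC[D$M] = D$V[D$M]` partially matches `D$ABCD`, so the new column starts as a copy of the integer vector 13:1; assigning character values into some of its elements then coerces the whole vector to character. The variable names `M`, `V` and `seed` are illustrative only.

```r
# Logical row selector and character values from the first example
M <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
       TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
V <- sprintf("ZZ%02d", 1:13)

# Assumed starting value of the new column: a copy of ABCD (13:1),
# obtained through the partial match of D$ABC to D$ABCD
seed <- 13:1

# Assigning character values into an integer vector coerces the whole
# vector to character, so the untouched rows become "11", "10", "9", ...
seed[M] <- V[M]
print(seed)
```

The result matches the ABC column printed above, including the unassigned rows that show ABCD's values as strings.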
Here is something I believe is definitely a bug:
> D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=1:13,ABCD=13:1)
> D
       M  V ABCD
1   TRUE  1   13
2   TRUE  2   12
3  FALSE  3   11
4  FALSE  4   10
5  FALSE  5    9
6   TRUE  6    8
7  FALSE  7    7
8   TRUE  8    6
9  FALSE  9    5
10 FALSE 10    4
11  TRUE 11    3
12  TRUE 12    2
13  TRUE 13    1
> D$CBA[D$M] = D$V[D$M]
> D
       M  V ABCD CBA
1   TRUE  1   13   1
2   TRUE  2   12   2
3  FALSE  3   11  NA
4  FALSE  4   10  NA
5  FALSE  5    9  NA
6   TRUE  6    8   6
7  FALSE  7    7  NA
8   TRUE  8    6   8
9  FALSE  9    5  NA
10 FALSE 10    4  NA
11  TRUE 11    3  11
12  TRUE 12    2  12
13  TRUE 13    1  13
> D$ABC[D$M] = D$V[D$M]
> D
       M  V ABCD CBA ABC
1   TRUE  1    1   1   1
2   TRUE  2    2   2   2
3  FALSE  3   11  NA  11
4  FALSE  4   10  NA  10
5  FALSE  5    9  NA   9
6   TRUE  6    6   6   6
7  FALSE  7    7  NA   7
8   TRUE  8    8   8   8
9  FALSE  9    5  NA   5
10 FALSE 10    4  NA   4
11  TRUE 11   11  11  11
12  TRUE 12   12  12  12
13  TRUE 13   13  13  13
ABC is created as before with values from ABCD in the unassigned
rows, but ABCD is being modified as well. The only difference from
the previous example is that V is now just a numeric column.
Anil Maliyekkel
Bugs with partial name matching during partial replacement (PR#9202)
8 messages · amaliy1 at uic.edu, Thomas Lumley, Anil Maliyekkel +3 more
The partial matching is fairly deeply built in to complex assignment, another example being
> x<-list(ab=1:2)
> names(x$ab)=c("A","B")
> names(x$a)=c("a","b")
> x
$ab
A B
1 2

$a
a b
1 2

because as evalseq works through the nested calls on the LHS the code being called doesn't know it is in an assignment call.

The bug is a bug. It isn't specific to data frames or to replacing only some elements of a vector:

> x<-list(ab=1:2)
> x$a[]<-2:1
> x
$ab
[1] 2 1

$a
[1] 2 1

It also happens when $ is replaced by [[. It looks like a failure to duplicate. A workaround would be not to modify list elements or database columns that don't exist ;).

-thomas
On Tue, 5 Sep 2006, Thomas Lumley wrote:
The partial matching is fairly deeply built in to complex assignment,
<snip>
because as evalseq works through the nested calls on the LHS the code being called doesn't know it is in an assignment call.
The problem in

> D = list(ABCD=2:1)
> D$ABC[]<-3:4
> D
$ABCD
[1] 3 4

$ABC
[1] 3 4

is that eval.c:evalseq ends up with a reference to D$ABCD from evaluating D$ABC with partial matching. Since evalseq doesn't (and shouldn't) increase NAMED on these partially evaluated calls, NAMED is still 1 for D$ABCD. When evalseq's D$ABC has 3:4 assigned into it the vector is changed directly, since NAMED=1, and both D$ABC and D$ABCD change.

The minimal fix would appear to be the horrible hack of incrementing NAMED whenever a list element is even looked at with partial matching. Otherwise evalseq would have to be taught to recognize aliasing from partial matching.

-thomas
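The get/modify/set sequence described here can be sketched by expanding the complex assignment by hand, in the style of the `*tmp*` expansion given in the R Language Definition. This is only an approximation of what evalseq does internally, and the names `old` and `new` are illustrative. Because every intermediate value is bound to a name, copy-on-modify applies and the aliasing bug cannot occur in this sketch; it shows only where the partial match enters.

```r
D <- list(ABCD = 2:1)

# Get: `$` on a list partially matches, so D$ABC returns the value of D$ABCD
old <- D$ABC
stopifnot(identical(old, 2:1))

# Modify: the `[]<-` step. Because `old` is bound to a name, modifying
# `new` forces a copy and D$ABCD is not touched.
new <- old
new[] <- 3:4

# Set: `$<-` matches names exactly, so this adds a new element ABC
D$ABC <- new
str(D)
```

In the R 2.3.1 internals the copy was skipped, because NAMED on the partially matched element was still 1, so both D$ABC and D$ABCD ended up as 3 4.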
On Sep 5, 2006, at 5:54 PM, Thomas Lumley wrote:
<snip>
This problem does not appear when the following is done:

> D = list(ABCD=2:1)
> D$ABC[]=c(3,4)
> D
$ABCD
[1] 2 1

$ABC
[1] 3 4

Or when this is done:

> D = list(ABCD=2:1)
> D[["ABC"]][]=3:4
> D
$ABCD
[1] 2 1

$ABC
[1] 3 4

But it does appear when the following is done:

> D = list(ABCD=2:1)
> X = 3:4
> D$ABC[]=X
> D
$ABCD
[1] 3 4

$ABC
[1] 3 4

But not when the following is done:

> D = list(ABCD=2:1)
> X = 3:4
> X[1] = 1
> D$ABC[]=X
> D
$ABCD
[1] 2 1

$ABC
[1] 1 4

It appears to be a sequence-specific bug for the $ operator, which might explain why it did not occur with my original examples where I had a character data column, but did appear where I had a numeric data column that was a sequence.

Going back to the original partial replacement problem, is there any way to turn off partial matching, or to selectively apply exact matching when trying to access a single element? I can get the desired single-element match using the convoluted syntax D["ABC"][[1]] and perform partial replacement operations on a new column. However, it would be nice to have a single-element access operator that does exact matching.

Anil
"Anil" == Anil Maliyekkel <amaliy1 at uic.edu>
on Tue, 5 Sep 2006 22:50:30 -0500 writes:
[...................]
[...................]
Anil> Going back to the original partial replacement
Anil> problem, is there anyway to turn off partial matching,
One way would be to use S4 classes (and '@') instead of lists
(and '$').
That has other advantages {?validObject; methods,..}, but also can be
a drawback to you since you need to know in advance
what "components" (called "slots" for S4 objects) your object
will be allowed to contain.
Anil> or to selectively apply exact matching when trying to
Anil> access a single element? I can get the desired single
Anil> element match using the convoluted syntax D["ABC"]
Anil> [[1]] and perform partial replacement operations on a
Anil> new column. However, it would be nice to have single
Anil> element access operator that does exact matching.
Anil> Anil
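The S4 suggestion above can be sketched as follows. The class name `Cols` and its two integer slots are hypothetical; the point is that `@` matches slot names exactly, and that every slot has to be declared in advance.

```r
library(methods)

# Hypothetical class: every "component" must be declared as a slot up front
setClass("Cols", slots = c(ABCD = "integer", ABC = "integer"))

obj <- new("Cols", ABCD = 13:1)

# `@` never partially matches: ABC is its own (still empty) slot, not ABCD
print(obj@ABC)

# Partial replacement on the declared slot leaves ABCD alone
obj@ABC[1:2] <- 3:4

# An undeclared name is an error rather than a silent partial match
res <- tryCatch(obj@AB, error = function(e) "no such slot")
print(res)
```

The trade-off mentioned above is visible here: a new column such as `CBA` could not be added without redefining the class.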
On Tue, 5 Sep 2006, Anil Maliyekkel wrote:
This problem does not appear when the following is done
> D = list(ABCD=2:1)
> D$ABC[]=c(3,4)
> D
$ABCD
[1] 2 1

$ABC
[1] 3 4
<other examples snipped>
It appears to be a sequence specific bug for the $ operator, which might explain why it did not occur with my original examples where I had a character data column, but did appear where I had a numeric data column that was a sequence.
Appearances can be deceptive. The point is that 2:1 and 3:4 are integer vectors but c(3,4) is a double precision vector. Assigning values of a different type into a vector requires copying and so masks the bug.
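The point about types can be checked directly with `typeof()`. The same check also accounts for the last of Anil's examples, where `X[1] = 1` assigns a double and so turns `X` into a double vector before it is used on the right-hand side.

```r
typeof(2:1)      # "integer": `:` produces integers
typeof(c(3, 4))  # "double": numeric literals are doubles

# Assigning doubles into an integer vector produces a new, coerced vector,
# which is why the aliasing bug was masked in the c(3,4) case
x <- 2:1
x[] <- c(3, 4)
typeof(x)        # "double"

# Anil's fourth example: X starts as integer, but X[1] = 1 makes it double
X <- 3:4
X[1] <- 1
typeof(X)        # "double"
```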
Going back to the original partial replacement problem, is there anyway to turn off partial matching, or to selectively apply exact matching when trying to access a single element?
No.
I can get the desired single element match using the convoluted syntax D["ABC"] [[1]] and perform partial replacement operations on a new column. However, it would be nice to have single element access operator that does exact matching.
It wouldn't fix the bug, and we are running low on symbols that could be used.

-thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
I can get the desired single element match using the convoluted syntax D["ABC"] [[1]] and perform partial replacement operations on a new column. However, it would be nice to have single element access operator that does exact matching.
It wouldn't fix the bug, and we are running low on symbols that could be used.
SV4 and Splus have a function called elNamed() (and a corresponding elNamed<-) that does exact matching. E.g.,
> zlist<-list(abc=1, xyz=2)
> elNamed(zlist, "a")
NULL
> elNamed(zlist, "a", mustfind=TRUE) # error should say "element", not "slot"
Problem in elNamed(zlist, "a", mustfind = TRUE): Failed to find required slot "a"
Use traceback() to see the call stack
> elNamed(zlist, "abc")
[1] 1
> elNamed(zlist, "ab") <- 3
> dput(zlist)
list("abc" = 1
, "xyz" = 2
, "ab" = 3
)
It works on lists and atomic vectors and I think it
is not intended to be a generic function. There is
a similar el(x, i) function for numeric subsetting
that is not intended to be generic (that is for
speed and safety when writing methods for [).
Its only documentation is some self-doc:
> elNamed
# extract or (when used on left of assignment) replace the element of `object' associated
# with `name'. Differs from `$' in using only exact matching. Set `mustfind=T' if you
# want an error to occur when there is no such named element. NOT to be used for slots.
function(object, name, mustfind = F)
.Internal(elNamed(object, name, mustfind), "S_el_named", T, 0)

It isn't used much and has some surprises. E.g., elNamed(zlist, "ab")<-NULL sets $ab to NULL; it doesn't remove it as zlist$ab<-NULL would.

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

"All statements in this message represent the opinions of the author and do
not necessarily reflect Insightful Corporation policy or position."
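R has no `elNamed()`; what follows is a minimal, hypothetical base-R sketch of an exact-matching accessor and replacement function in the same spirit, restricted to lists. The names `el_named` and `el_named<-` are made up for this example. It relies on `match()`, which never does partial matching, and, like the S-PLUS version described above, assigning NULL stores a NULL element rather than removing it.

```r
# Hypothetical exact-matching accessor, loosely modelled on S-PLUS elNamed()
el_named <- function(object, name, mustfind = FALSE) {
  i <- match(name, names(object))          # exact match only
  if (is.na(i)) {
    if (mustfind) stop(sprintf("failed to find required element \"%s\"", name))
    return(NULL)
  }
  object[[i]]
}

# Hypothetical replacement form, used as el_named(x, "ab") <- value
`el_named<-` <- function(object, name, value) {
  stopifnot(is.list(object))
  i <- match(name, names(object))
  if (is.na(i)) i <- length(object) + 1L   # append a new element
  object[i] <- list(value)                 # list(value) keeps NULL as an element
  names(object)[i] <- name
  object
}

zlist <- list(abc = 1, xyz = 2)
el_named(zlist, "a")           # NULL, whereas zlist$a partially matches abc
el_named(zlist, "abc")         # 1
el_named(zlist, "ab") <- 3     # adds "ab"; "abc" is left alone
el_named(zlist, "ab")[2] <- 4  # partial replacement on the exact element
```

Because R expands `el_named(zlist, "ab")[2] <- 4` into a call to `el_named` followed by `el_named<-`, the "get" part of the complex assignment uses exact matching too.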
Anil Maliyekkel wrote:
[snip] Going back to the original partial replacement problem, is there anyway to turn off partial matching, or to selectively apply exact matching when trying to access a single element? I can get the desired single element match using the convoluted syntax D["ABC"] [[1]] and perform partial replacement operations on a new column. However, it would be nice to have single element access operator that does exact matching.
The easiest thing to do is probably to always supply full names, and if you want to create a new component, then create it in its entirety (not by assigning to a subset of a non-existent component).

A couple of years ago I proposed (with code) an operator '$$' that did only exact matching, but that proposal didn't gather any interest at the time. It might actually make more sense to have the roles switched, so that the ordinary '$' required exact matches, while the special '$$' allowed partial matching (to allow for convenient interactive use). But that's probably a bigger change than the R code base & community could bear.

Then again, what about the following as a way forward to eliminating partial matching on names for "$" and "[[":

(1) Announce that partial matching for "$" and "[[" is deprecated.
(1a) (optional) Introduce a "$$" operator with partial matching, intended solely for interactive use, with QA checks to ensure that it is not used in packages.
(2) Introduce warnings upon use of partial matching with "$" and "[[", with an option() to turn them off. Initially these warnings are off by default, but QA tools turn them on, and package maintainers see the warnings.
(3) After a year or two (assuming most packages no longer use partial matching), change the default for warnings about partial matching to "on".
(4) After another year, eliminate partial matching with "$" and "[[".

Opinions?

-- Tony Plate
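The first suggestion above, creating the new column in its entirety under its full name before assigning into part of it, can be sketched for the numeric example from the start of the thread. Pre-filling with `NA_integer_` is an assumption chosen to match the integer column `V`; `NA_character_` would play the same role for the character example.

```r
D <- data.frame(M = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
                      TRUE, FALSE, FALSE, TRUE, TRUE, TRUE),
                V = 1:13, ABCD = 13:1)

# Create the whole column first; `$<-` recycles the single NA to all rows
D$ABC <- NA_integer_

# D$ABC now matches exactly, so the get part no longer falls back to ABCD
D$ABC[D$M] <- D$V[D$M]
print(D)
```

ABC then holds V in the selected rows and NA elsewhere, the result Anil originally expected, and ABCD keeps its values 13:1.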
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel