Unexpected behavior when giving a value to a new variable based on the value of another variable

7 messages · Angel Rodriguez, jim holtman, John McKown +3 more

#
Dear subscribers,

I've found that if a data frame already contains a variable whose name is very similar to that of a new variable, R does not assign the correct values to the new variable based on the values of a third variable:
+                class = "data.frame")
age sample
1  67      1
2  62     NA
3  74      1
4  61     NA
5  60     NA
6  55     NA
7  60     NA
8  59     NA
9  58     NA
+                     .Names = c("age","samplem"), row.names = c(NA, -9L), class = "data.frame")
age samplem sample
1  67      NA      1
2  62       1      1
3  74       1      1
4  61       1      1
5  60       1      1
6  55       1      1
7  60       1      1
8  59       1      1
9  58      NA     NA



Any clue for this behavior?



My specifications:

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreign_0.8-61

loaded via a namespace (and not attached):
[1] tools_3.1.1




Thank you very much.

Angel Rodriguez-Laso
Research project manager
Matia Instituto Gerontologico
#
You are being bitten by the "partial matching" of the "$" operator
(see ?"$" for a better explanation).  Here is a solution that works:


**original**
+                     .Names = c("age","samplem"), row.names = c(NA,
-9L), class = "data.frame")
age samplem sample
1  67      NA      1
2  62       1      1
3  74       1      1
4  61       1      1
5  60       1      1
6  55       1      1
7  60       1      1
8  59       1      1
9  58      NA     NA
+                     .Names = c("age","samplem"), row.names = c(NA,
-9L), class = "data.frame")
age samplem sample
1  67      NA      1
2  62       1     NA
3  74       1      1
4  61       1     NA
5  60       1     NA
6  55       1     NA
7  60       1     NA
8  59       1     NA
9  58      NA     NA
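
The code that produced the two outputs above did not survive in the archive. As a minimal sketch (an assumption, not necessarily the code Jim ran), one way to get the second result is to create the new column explicitly before filling it, so that `$` finds an exact match and never falls back to "samplem". The data values are taken from the printed output; the names `bad` and `good` are made up for illustration.

```r
# Data frame with a column whose name starts with "sample"
# (values copied from the printed output above)
N <- data.frame(age     = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# Problem: "sample" does not exist yet, so the read of bad$sample
# partially matches "samplem" and its values leak into the new column
bad <- N
bad$sample[bad$age >= 65] <- 1

# Possible fix: create the column first, then fill it
good <- N
good$sample <- NA
good$sample[good$age >= 65] <- 1

print(bad)
print(good)
```

Writing `good[["sample"]]` or `good[good$age >= 65, "sample"] <- 1` would avoid `$` altogether, since `[[` matches exactly by default and `[` never matches partially.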

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Aug 29, 2014 at 4:53 AM, Angel Rodriguez
<angel.rodriguez at matiainstituto.net> wrote:
#
On Fri, Aug 29, 2014 at 3:53 AM, Angel Rodriguez
<angel.rodriguez at matiainstituto.net> wrote:
<snip>
<snip>
That is unusual, but appears to be documented in a section from

?`[`

<quote>
Character indices

Character indices can in some circumstances be partially matched (see
pmatch) to the names or dimnames of the object being subsetted (but
never for subassignment). Unlike S (Becker et al p. 358)), R never
uses partial matching when extracting by [, and partial matching is
not by default used by [[ (see argument exact).

Thus the default behaviour is to use partial matching only when
extracting from recursive objects (except environments) by $. Even in
that case, warnings can be switched on by
options(warnPartialMatchDollar = TRUE).

Neither empty ("") nor NA indices match any names, not even empty nor
missing names. If any object has no names or appropriate dimnames,
they are taken as all "" and so match nothing.
</quote>

Note the comment about "partial matching" in the middle paragraph in
the quote above.
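
As a small sketch of the option mentioned in the quote, the following turns the warning on, triggers a partial match, and captures the warning text. The tiny data frame and the name `msg` are invented for the example, and the exact wording of the warning may differ between R versions.

```r
# Switch the warning on, remembering the previous setting
old <- options(warnPartialMatchDollar = TRUE)

N <- data.frame(age = c(67, 62), samplem = c(NA, 1))

# N$sample has no exact match, so `$` falls back to "samplem";
# with the option set, that fallback raises a warning we can catch
msg <- tryCatch(N$sample, warning = function(w) conditionMessage(w))
print(msg)

# Restore the previous setting
options(old)
```

Setting the option in a startup file such as .Rprofile would make this kind of silent match visible in every session.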
#
One clue is the help file for "$"...

?"$"

In particular, see the discussion there of character indices and the "exact" argument.

You can also find this discussed in the Introduction to R document that comes with the software.
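
To make the role of the "exact" argument concrete, here is a short sketch comparing `[[` and `$` on a data frame that has "samplem" but no "sample" column. The data and the variable names are made up for illustration.

```r
N <- data.frame(age = c(67, 62, 74), samplem = c(NA, 1, 1))

# `[[` matches exactly by default: there is no "sample" column, so NULL
exact_hit <- N[["sample"]]

# Partial matching has to be requested explicitly with exact = FALSE
partial_hit <- N[["sample", exact = FALSE]]

# `$` partially matches by default and silently returns "samplem"
dollar_hit <- N$sample

print(exact_hit)
print(partial_hit)
print(dollar_hit)
```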
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
On August 29, 2014 1:53:47 AM PDT, Angel Rodriguez <angel.rodriguez at matiainstituto.net> wrote:
#
On Fri, 29 Aug 2014 06:33:01 -0700 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote
<...snip...>
<...snip...>
<...snip...>

Having seen all the responses about partial matching I almost understand. I've
also replicated the behaviour on R 2.11.1 so it's been around a while. This
tells me it ain't a bug - so if any of the cognoscenti have the time and
inclination can someone give me a brief (and hopefully simple) explanation of
what is going on under the hood?

It looks (to me) like N$sample[N$age >= 65] <- 1 copies N$samplem to N$sample
and then does the assignment. If partial matching is the problem (which it
clearly is) my expectation is that  the  output should look like

   age samplem
1   67       1
2   62       1
3   74       1
4   61       1
5   60       1
6   55       1
7   60       1
8   59       1
9   58      NA
That is - no new column.
(and I just hate it when the world doesn't live up to my expectations!)

Bewildered and confused,
DMcP

#
On Aug 29, 2014, at 8:54 PM, David McPearson wrote:

            
Not sure what you are seeing. I am seeing what you expected:

 > test <- data.frame(age=1:10, sample=1)
 > test$sample[test$age<5] <- 2
 > test
    age sample
1    1      2
2    2      2
3    3      2
4    4      2
5    5      1
6    6      1
7    7      1
8    8      1
9    9      1
10  10      1
#
On Aug 30, 2014, at 7:38 PM, David Winsemius wrote:

            
I realized later that I had not constructed a test of your behavior.
When I did, I saw the creation of a third column. The answer is to
read the help page:

?`[<-`

"Character indices can in some circumstances be partially matched (see  
pmatch) to the names or dimnames of the object being subsetted (but  
never for subassignment). "

Note the caveat in parentheses.
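
For the "under the hood" question asked earlier in the thread, here is a rough sketch of what `N$sample[N$age >= 65] <- 1` amounts to. It follows the general description of nested replacement in the R Language Definition (extract, modify the copy, assign back) and is written out by hand, so it is an approximation rather than the interpreter's literal code. The name `current` is invented for the example.

```r
N <- data.frame(age     = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# Step 1, extraction: `$` may partially match, so this reads "samplem"
current <- N$sample

# Step 2, modify the extracted copy
current[N$age >= 65] <- 1

# Step 3, assign back: `$<-` never partially matches,
# so a brand-new column named "sample" is created
N$sample <- current

print(N)
```

This is why a third column appears instead of "samplem" being modified: the read side matched partially, the write side did not.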