
extraction of sub-matrix by name

2 messages · Yurii Aulchenko, Tony Plate

#
Dear all,

sorry to bother you with a potentially known issue --

we have noticed that if we select data frame rows by rownames, we get
some results back if the match can be done unambiguously, even though
the match is not exact (see example), e.g. x["2",] will return a row
if there is a unique row whose name starts with "2" (but that name may
be "2375745"!)

is that planned behavior of R that will be maintained? for us it
was a bit unexpected...

Yurii

-----------------------------
 > a <- data.frame(x=1:3, y=1:3)
 > rownames(a) <- c("2535","59617","555")
 > a
       x y
2535  1 1
59617 2 2
555   3 3
 > a["5",]
     x  y
NA NA NA
 > a["555",]
     x y
555 3 3
 > a["2",]
      x y
2535 1 1
 > version
                _
platform       i386-apple-darwin9.8.0
arch           i386
os             darwin9.8.0
system         i386, darwin9.8.0
status         Under development (unstable)
major          2
minor          11.0
year           2009
month          12
day            07
svn rev        50688
language       R
version.string R version 2.11.0 Under development (unstable) (2009-12-07 r50688)

-----------------------------------------------------------
Yurii Aulchenko
Erasmus MC Rotterdam
Department of Epidemiology, Ee 2200
Postbus 2040, 3000 CA Rotterdam
The Netherlands

phone: +31107043486
fax: +31107044657
#
This is documented behavior. From ?"[.data.frame":

Both ‘[’ and ‘[[’ extraction methods partially match row
names. By default neither partially match column names, but
‘[[’ will unless ‘exact=TRUE’. If you want to do exact
matching on row names use ‘match’ as in the examples.

In the history of S, S-PLUS and R partial matching for subscripts and 
indices has been used in many places, but over time it has been removed 
from some. Some methods for "[[" have an "exact" argument that allows 
specification of whether you want exact or prefix matches. However, as 
the docs indicate, this doesn't apply to "[" for rownames in data 
frames. At this point, the behavior is unlikely to change -- it risks 
breaking too much old code (that's just my opinion based on observation 
of the evolution of R -- I have no control over that evolution). Follow 
the suggestion in the help for [.data.frame to get exact matching.
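
For illustration, here is a minimal sketch of that suggestion applied to the data frame from the original post. It is not copied from the help page; the %in% variant at the end is an additional option I am assuming is acceptable when you would rather get zero rows than an all-NA row for a missing name.

```r
# Rebuild the example data frame from the original post.
a <- data.frame(x = 1:3, y = 1:3)
rownames(a) <- c("2535", "59617", "555")

# Partial matching: "2" is a unique prefix of "2535", so that row comes back.
a["2", ]

# Exact matching via match(): a name that is not present gives NA,
# so the result is an all-NA row rather than a partially matched one.
a[match("2", rownames(a)), ]

# An exact name still selects its row as usual.
a[match("555", rownames(a)), ]

# Alternative (assumption, not from the help page): logical indexing
# with %in% returns a zero-row data frame when nothing matches exactly.
a[rownames(a) %in% "2", ]
```

With match(), the row name has to be identical to the requested string, so "2" no longer picks up "2535".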

-- Tony Plate