
Subset using grepl

4 messages · Kang Min, Brian Ripley

#
Hi all,

I would like to subset a dataframe by using part of the level name.

x <- rep(LETTERS[1:20],3)
y <- rep(1:3, 20)
z <- paste(x,y, sep="")
random.data <- rnorm(60)
data <- as.data.frame(cbind(z, random.data))

I need rows that contain the letters A to J, so I tried:

subset(data, grepl(LETTERS[1:10], z)) # got only rows with A
subset(data, z %in% LETTERS[1:10]) # got no rows

I think I'm getting close to the solution but need a little bit of
help here, thanks in advance.

Kang Min
#
The grep condition is "[A-J]".
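
For reference, here is a minimal sketch of why the two attempts in the question fail, using the question's own data. grepl() takes a single pattern, so given LETTERS[1:10] it uses only "A" and warns about the rest. %in% tests whole-string equality, so "A1" never equals "A". The variable name hits is only illustrative.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")

# grepl() uses only the first element of a vector pattern ("A") and
# warns that the other elements are ignored; the warning is silenced here.
hits <- suppressWarnings(grepl(LETTERS[1:10], z))
unique(substr(z[hits], 1, 1))   # "A"

# %in% compares whole strings: "A1" is not "A", so nothing matches.
sum(z %in% LETTERS[1:10])       # 0
```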

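BTW, why are there so many unnecessary steps here, including the use of
cbind() and subset()? Note that cbind() builds a character matrix, so
random.data is no longer numeric. This is all you need: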

x <- rep(LETTERS[1:20],3)
y <- rep(1:3, 20)
z <- paste(x,y, sep="")
random.data <- rnorm(60)
data <- data.frame(z, random.data)   # random.data stays numeric
data[grepl("[A-J]", z), ]            # rows whose z contains a letter A to J
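
The %in% idea from the question can also work without any regular expression, provided only the first character of each z is compared. The sketch below assumes, as in this example, that the letter is always the first character; first.letter and sub are illustrative names. It should select the same 30 rows (10 letters, each appearing 3 times) as the grepl() version.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")
random.data <- rnorm(60)
data <- data.frame(z, random.data)

# Take the leading character so %in% compares "A" with "A", not "A1" with "A".
# as.character() also covers older R, where data.frame() made z a factor.
first.letter <- substr(as.character(data$z), 1, 1)
sub <- data[first.letter %in% LETTERS[1:10], ]
nrow(sub)   # 30
```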

Now (for the paranoid, and not needed in this example): in general the
effect of a range such as "[A-Z]" depends on the locale, so you could
write out "[ABCDEFGHIJ]" or create it by

cond <- paste("[", paste(LETTERS[1:10], collapse=""), "]", sep="")

Or use repl("[A-J]", z, perl=TRUE).
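
The locale-safe options can be compared on the example data with a short sketch. It assumes the repl() above is meant to be grepl() (as clarified later in this thread); by.range, by.class and by.perl are illustrative names. For this data all three select the same 30 rows; the difference only matters in locales where a bracket range is not interpreted in ASCII order.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")

# Explicit class "[ABCDEFGHIJ]": no range, so no dependence on collation order.
cond <- paste("[", paste(LETTERS[1:10], collapse = ""), "]", sep = "")

by.range <- grepl("[A-J]", z)               # range; meaning may depend on locale
by.class <- grepl(cond, z)                  # letters written out explicitly
by.perl  <- grepl("[A-J]", z, perl = TRUE)  # PCRE range, ASCII order

cond                           # "[ABCDEFGHIJ]"
identical(by.class, by.perl)   # TRUE
```
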
On Sat, 29 Jan 2011, Kang Min wrote:

#
Thanks Prof Ripley, the condition worked!
BTW, I tried ?repl but found no documentation for it. Is it in a
non-base package?
On Jan 29, 6:54 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
#
On Sat, 29 Jan 2011, Kang Min wrote:

I meant grepl: the edit got garbled (though not on my screen, as
sometimes happens when working remotely). The point is that 'perl=TRUE'
guarantees that [A-J] is interpreted in ASCII order.