problem with pattern matching

10 messages · David Winsemius, jim holtman, Don MacQueen +3 more

#
dear all,

I have a problem with pattern matching using grep. I extracted a list of
character strings from a data frame and tried to match this list against
a column of another data frame. I got only one match in return, but
there should be far more. Any ideas what has gone wrong?

Thanks
#
On Aug 4, 2009, at 11:16 AM, Rnewbie wrote:

            
In general this falls into the category of a request to "read my
mind". One way, out of a probably infinite number, to get such a
result is to use if() when you needed ifelse().
I cannot even assign a semantic meaning to that one. What are "non-initial
parts of the elements of another table"?


******************************************************************
******************************************************************
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
I wanted to extract the rows I am interested in from a dataframe. I used:

grep(list$ID, dataframe$ID, value=T)  # list contains the IDs I am interested in

I got one match in return, which is the very first ID in list. It seems the
matching process just stopped, once the first match was found.
David Winsemius wrote:

  
    
#
I think you want to use either 'match' or '%in%'

x <- dataframe$ID %in% list$ID  # TRUE if it is in list
On Wed, Aug 5, 2009 at 5:36 AM, Rnewbie<xuancj at yahoo.com> wrote:

  
    
#
The problem in my case is that some of the cells in dataframe$ID contain
multiple IDs, while in list$ID there is only one ID in each cell, so some of
the IDs cannot be matched using the function 'match' or '%in%'.
jholtman wrote:

  
    
#
Perhaps
   intersect()
or
   merge()
will help. But, like others, I find it difficult to understand 
exactly what you want. I'd suggest providing a short example with 
actual ID values.

-Don
At 2:36 AM -0700 8/5/09, Rnewbie wrote:

  
    
#
I have a list of IDs like this:

AB1234
AB4567
AB8901

In my dataset, there are IDs like this:

AB5555
AB7777 /// AB1234
AB4567 /// AB8901 /// AB6666

I used grep(list$ID, dataset$ID, value=T)
It returned only one match, which was the very first match AB7777 ///
AB1234. It seems once the first match was found, the matching procedure just
stopped.
Don MacQueen wrote:

  
    
#
Hi,

I don't think grep can handle a vector of patterns.
[1] 4

This call is equivalent to:
grep( "foo1", c("fffoo5", "fffoo6", "fffoo2", "fffoo1") )
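
A minimal sketch of that behavior, using the same target strings as above. The names `patterns` and `targets` are only illustrative, and the two-element pattern vector is an assumption, since the original call is not shown. Recent R versions also warn when the pattern has length > 1, so the call is wrapped in suppressWarnings() here. The last two lines show one possible way to try every pattern, with one grepl() call per pattern.

```r
# Illustrative data: two patterns, four target strings
patterns <- c("foo1", "foo2")
targets  <- c("fffoo5", "fffoo6", "fffoo2", "fffoo1")

# grep() uses only the first element of a pattern vector
first_only <- suppressWarnings(grep(patterns, targets))
same_as    <- grep(patterns[1], targets)

# One grepl() call per pattern gives a logical matrix
# (rows = targets, columns = patterns); fixed = TRUE matches literally
hits    <- sapply(patterns, grepl, x = targets, fixed = TRUE)
any_hit <- which(rowSums(hits) > 0)  # targets matched by at least one pattern
```

Here `first_only` and `same_as` hold the same index, while `any_hit` also picks up the target that contains the second pattern.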


Maybe you could use the plyr package. I am only speculating, but something like this might work:

ddply( list, .(ID), function(x) dataframe[ grep(x$ID[[1]], dataframe$ID) , ] )

ddply splits "list" by ID in smaller dataframes. Assuming each ID is unique in list, you have dataframes of 1 line ("x" in the code line). So you take the ID and grep for it in dataframe. Then you return the corresponding line of dataframe (assuming there is always 1 and only 1 line or it might fail, not sure)

Maybe someone can come up with a more efficient way of doing it. The whole trick is to use grep with a vector of patterns.

Xavier

#
grep(pattern,text) expects that pattern is a scalar string.
It, like quite a few other R functions, will not alert you if
you pass it several strings: it silently ignores all but the
first.  S+'s grep() will throw an error if length(pattern)!=1.
E.g.,

RS> grep(pattern=c("a+", "b+"), c("cat","dog","bear"), value=TRUE)
S+: Problem in regexpr(pattern, text): pattern should be a single character string, length is 2
S+: Use traceback() to see the call stack
R : [1] "cat"  "bear"

I think it would be better if this error were caught at runtime.
R does catch the 0-length argument, but gives a pretty generic
error message (perhaps to make translations easier):

RS> grep(pattern=character(), c("cat","dog","beet"))
S+: Problem in regexpr(pattern, text): pattern should be a single character string, length is 0
S+: Use traceback() to see the call stack
R : Error in grep(pattern = character(), c("cat", "dog", "beet")) :
R :   invalid argument
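
Until such a check exists in grep() itself, one could wrap it. The following is only a sketch: `strict_grep` is a hypothetical helper, not part of R or S+, and its error message is borrowed from the S+ output above.

```r
# Hypothetical wrapper: refuse anything but a single pattern string
strict_grep <- function(pattern, x, ...) {
  if (!is.character(pattern) || length(pattern) != 1L) {
    stop("pattern should be a single character string, length is ",
         length(pattern))
  }
  grep(pattern, x, ...)
}

# A single pattern behaves exactly like grep()
ok <- strict_grep("a+", c("cat", "dog", "bear"), value = TRUE)

# A vector of patterns now fails loudly instead of being truncated
err <- tryCatch(
  strict_grep(c("a+", "b+"), c("cat", "dog", "bear")),
  error = function(e) conditionMessage(e)
)
```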

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
#
Thank you very much for the answer. I just solved the problem by writing a
loop for grep(), so that R runs through the ID list one by one. Probably it
is not the ideal method, but still I can now proceed to further work.
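
The loop itself was not posted, so what follows is only a guess at its shape, using the example IDs given earlier in the thread. The object names `id_list` and `dataset` are assumptions. The second half sketches an alternative that splits each cell on " /// " and compares whole IDs, which avoids accidental substring matches (for example, AB1234 inside a longer ID).

```r
# Example data taken from the IDs posted earlier in the thread
id_list <- data.frame(ID = c("AB1234", "AB4567", "AB8901"),
                      stringsAsFactors = FALSE)
dataset <- data.frame(ID = c("AB5555",
                             "AB7777 /// AB1234",
                             "AB4567 /// AB8901 /// AB6666"),
                      stringsAsFactors = FALSE)

# Loop over the IDs one by one, accumulating rows that match any of them
matched <- logical(nrow(dataset))
for (id in id_list$ID) {
  matched <- matched | grepl(id, dataset$ID, fixed = TRUE)
}
loop_rows <- dataset[matched, , drop = FALSE]

# Alternative: split multi-ID cells and compare whole IDs with %in%
parts <- strsplit(dataset$ID, " /// ", fixed = TRUE)
exact <- vapply(parts, function(p) any(p %in% id_list$ID), logical(1))
```

With this data both `matched` and `exact` flag the second and third rows of `dataset`.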
William Dunlap wrote: