Searching for a pattern within a vector

4 messages · Apoorva Gupta, Petr Savicky, Berend Hasselman +1 more

#
Dear R users,

I have a data.frame as follows

     a b c d e
 [1,] 1 1 1 0 0
 [2,] 1 1 0 0 0
 [3,] 1 1 0 0 0
 [4,] 0 1 1 1 1
 [5,] 0 1 1 1 1
 [6,] 1 1 1 1 1
 [7,] 1 1 1 0 1
 [8,] 1 1 1 0 1
 [9,] 1 1 1 0 0
[10,] 1 1 1 0 0

Within these 5 vectors, I want to choose those vectors for which I
have the pattern (0,0,1,1,1,1) occurring anywhere in the vector.
This means I want vectors a, c, e and not b and d.

Is there any grep feature which can help me do this?
Thank you
Apoorva
#
On Fri, Feb 24, 2012 at 01:00:00PM +0530, Apoorva Gupta wrote:
Hi.

A related thread was 

  [R] matching a sequence in a vector?

which started at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303608.html
  https://stat.ethz.ch/pipermail/r-help/attachments/20120215/989a2e88/attachment.pl

and a summary of suggested solutions was at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html

Try the following, where any of the functions occur* described there
may be used instead of occur1. The original function returned the
vector "candidate" of the indices where an occurrence of "patrn"
in "exmpl" starts. For your purposes, the function has to be modified
in two ways.

  1. The output is the condition length(candidate) != 0 instead of "candidate".
  2. The argument "exmpl" is the first argument.
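
For reference, the unmodified function presumably looked roughly like the sketch below. It is reconstructed only from the two changes listed above (pattern as the first argument, indices as the return value), so treat the exact body as an assumption rather than a copy of the code in the linked thread.

```r
# Sketch of the index-returning version (argument order and body are assumed)
occur1 <- function(patrn, exmpl)
{
  m <- length(patrn)
  n <- length(exmpl)
  # every position where a full-length match could start
  candidate <- seq.int(length.out = max(n - m + 1, 0))
  for (i in seq.int(length.out = m)) {
    # drop the starts whose i-th element disagrees with the pattern
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  candidate
}

# column "a" from the question
a <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)
starts <- occur1(c(0, 0, 1, 1, 1, 1), a)
starts   # 4: the pattern begins at the fourth element
```

An empty result, integer(0), means the pattern does not occur, which is exactly what the test length(candidate) != 0 checks.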

  # your data frame
  df <- structure(list(a = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), 
    b = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), c = c(1L, 0L, 0L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L), d = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
    e = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L)), .Names = c("a", "b", "c",
    "d", "e"), class = "data.frame", row.names = c(NA, -10L))

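  # modified function occur1: TRUE if "patrn" occurs anywhere in "exmpl"
  testoccur1 <- function(exmpl, patrn)
  {
    m <- length(patrn)
    n <- length(exmpl)
    # all possible start positions (none if exmpl is shorter than patrn)
    candidate <- seq.int(length.out=max(n-m+1, 0))
    for (i in seq.int(length.out=m)) {
        # keep only the starts whose i-th element matches patrn[i]
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    length(candidate) != 0
  }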

  selection <- unlist(lapply(df, testoccur1, patrn=c(0,0,1,1,1,1)))
  selection 

      a     b     c     d     e 
   TRUE FALSE  TRUE FALSE  TRUE 

  df[, selection]

     a c e
  1  1 1 0
  2  1 0 0
  3  1 0 0
  4  0 1 1
  5  0 1 1
  6  1 1 1
  7  1 1 1
  8  1 1 1
  9  1 1 0
  10 1 1 0
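
Since the question asked about grep: for single-digit values such as these 0/1 columns, a string-based check is also possible. This is only a sketch. The helper name has_pattern_str is made up here, and pasting values together would give false matches for multi-digit values (1 followed by 11 looks the same as 11 followed by 1).

```r
# data from the question
df <- data.frame(
  a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
  b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
  e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

# hypothetical helper: collapse both vectors to strings and search literally
has_pattern_str <- function(x, patrn)
{
  grepl(paste(patrn, collapse = ""), paste(x, collapse = ""), fixed = TRUE)
}

sel_str <- vapply(df, has_pattern_str, logical(1), patrn = c(0, 0, 1, 1, 1, 1))
df[, sel_str, drop = FALSE]   # keeps columns a, c and e
```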

In your post, you printed not a data frame, but a matrix. If your
structure is a matrix, try the following

  # your matrix
  mat <- as.matrix(df)
  mat

        a b c d e
   [1,] 1 1 1 0 0
   [2,] 1 1 0 0 0
   [3,] 1 1 0 0 0
   [4,] 0 1 1 1 1
   [5,] 0 1 1 1 1
   [6,] 1 1 1 1 1
   [7,] 1 1 1 0 1
   [8,] 1 1 1 0 1
   [9,] 1 1 1 0 0
  [10,] 1 1 1 0 0

  # selection of columns
  sel <- apply(mat, 2, testoccur1, patrn=c(0,0,1,1,1,1))
  mat[, sel]

        a c e
   [1,] 1 1 0
   [2,] 1 0 0
   [3,] 1 0 0
   [4,] 0 1 1
   [5,] 0 1 1
   [6,] 1 1 1
   [7,] 1 1 1
   [8,] 1 1 1
   [9,] 1 1 0
  [10,] 1 1 0
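
One caveat with mat[, sel]: if only a single column matches, R drops the result to a plain vector. The sketch below shows how drop = FALSE keeps it a matrix. The pattern used here is chosen only so that exactly one column (d) matches; it is not from the question.

```r
mat <- cbind(
  a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
  b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
  e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

# same test function as above, repeated so the example runs on its own
testoccur1 <- function(exmpl, patrn)
{
  m <- length(patrn)
  n <- length(exmpl)
  candidate <- seq.int(length.out = max(n - m + 1, 0))
  for (i in seq.int(length.out = m)) {
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  length(candidate) != 0
}

# illustrative pattern that occurs only in column d
sel1 <- apply(mat, 2, testoccur1, patrn = c(0, 0, 0, 1, 1, 1, 0))

is.matrix(mat[, sel1])                 # FALSE: dropped to a vector
is.matrix(mat[, sel1, drop = FALSE])   # TRUE: still a 10 x 1 matrix
```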

Hope this helps.

Petr Savicky.
#
On 24-02-2012, at 08:30, Apoorva Gupta wrote:

            
See this thread "[R] matching a sequence in a vector?" :

http://tolstoy.newcastle.edu.au/R/e17/help/12/02/4201.html

Berend
#
A different approach to this problem is via convolutional filtering.
  f <- function (x, pattern, tolerance = 1e-05) 
  {
      # x is a numeric matrix (or vector or data.frame) of data, pattern is
      # a vector.  This returns the number of times the pattern is found
      # in each column of x.
      # tolerance is just to account for floating point inaccuracy.
      m <- mean(pattern)
      centeredPattern <- pattern - m
      # stats::filter, in case another package (e.g. dplyr) masks filter()
      xp <- stats::filter(as.matrix(x) - m, rev(centeredPattern))
      colSums(abs(xp - sum(centeredPattern^2)) < tolerance, na.rm = TRUE)
  }

E.g., with your example data
  z <- data.frame(
       a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
       b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
       c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
       d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
       e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))
  pattern <- c(0, 0, 1, 1, 1, 1)
we get
  > f(z, pattern)
  [1] 1 0 1 0 1
  > f(z, pattern) > 0 # what you asked for
  [1]  TRUE FALSE  TRUE FALSE  TRUE
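
If the positions of the matches are wanted rather than just a count, the same idea can be sketched with a one-sided filter (sides = 1), where each output value depends only on the current and preceding elements, so a hit marks the last element of a match. The function name match_starts is made up for this illustration. Like f, it assumes that only a matching window reaches the peak value, which holds for 0/1 data and a pattern that contains both values, as here.

```r
match_starts <- function(x, pattern, tolerance = 1e-05)
{
  m <- mean(pattern)
  centeredPattern <- pattern - m
  # sides = 1: the value at position i uses x[i], x[i-1], ...,
  # back to x[i - length(pattern) + 1]
  xp <- as.numeric(stats::filter(x - m, rev(centeredPattern), sides = 1))
  # a hit marks the last element of a match; shift back to its first element
  ends <- which(abs(xp - sum(centeredPattern^2)) < tolerance)
  ends - length(pattern) + 1L
}

pattern <- c(0, 0, 1, 1, 1, 1)
match_starts(c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), pattern)   # column a: 4
match_starts(c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0), pattern)   # column e: 2
match_starts(c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0), pattern)   # column d: no match
```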

(I've been showing kids on our local FIRST robotics team how
signal/image processing can be done and your example
reminded me of that.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com