Dear R users,
I have a data.frame as follows
      a b c d e
 [1,] 1 1 1 0 0
 [2,] 1 1 0 0 0
 [3,] 1 1 0 0 0
 [4,] 0 1 1 1 1
 [5,] 0 1 1 1 1
 [6,] 1 1 1 1 1
 [7,] 1 1 1 0 1
 [8,] 1 1 1 0 1
 [9,] 1 1 1 0 0
[10,] 1 1 1 0 0
Within these 5 vectors (the columns a to e), I want to choose those in
which the pattern (0,0,1,1,1,1) occurs anywhere in the vector.
This means I want vectors a,c,e and not b and d.
Is there any grep feature which can help me do this?
Thank you
Apoorva
Apoorva Gupta
Consultant
National Institute of Public Finance and Policy
On Fri, Feb 24, 2012 at 01:00:00PM +0530, Apoorva Gupta wrote:
Hi.
A related thread was
[R] matching a sequence in a vector?
which started at
https://stat.ethz.ch/pipermail/r-help/2012-February/303608.html
and a summary of suggested solutions was at
https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html
Try the following, where any of the occur* functions described there
may be used instead of occur1. The original function returned the
vector "candidate" of the indices at which an occurrence of "patrn"
in "exmpl" starts. For your purposes, the function has to be modified
in two ways.
1. The output is the condition length(candidate) != 0 instead of "candidate".
2. "exmpl" becomes the first argument, so that lapply() or apply() can pass each column to it.
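For comparison, the unmodified function described above might look roughly like the sketch below. This is a reconstruction from the description in this message, not a copy of the code in the linked thread; the argument order and the use of seq_len() are assumptions.

```r
# Hypothetical reconstruction of occur1 (not copied from the linked thread):
# returns the indices at which patrn starts in exmpl.
occur1 <- function(patrn, exmpl) {
  m <- length(patrn)
  n <- length(exmpl)
  # every position where a complete match could start
  candidate <- seq_len(max(n - m + 1, 0))
  for (i in seq_len(m)) {
    # keep only the starts whose i-th element agrees with the pattern
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  candidate
}

a <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)    # column a of the example data
starts_a <- occur1(c(0, 0, 1, 1, 1, 1), a)
starts_a                                 # the pattern starts at position 4

b <- rep(1, 10)                          # column b: no match
starts_b <- occur1(c(0, 0, 1, 1, 1, 1), b)
length(starts_b) != 0                    # the condition used by the modified version
```

An empty result means the pattern does not occur, which is why testing length(candidate) != 0 is enough to select columns.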
# your data frame
df <- structure(list(
    a = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
    b = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    c = c(1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    d = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
    e = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L)),
  .Names = c("a", "b", "c", "d", "e"),
  class = "data.frame", row.names = c(NA, -10L))
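# modified function occur1: TRUE if patrn occurs anywhere in exmpl
testoccur1 <- function(exmpl, patrn)
{
    m <- length(patrn)
    n <- length(exmpl)
    # all positions where a complete match could start
    # (none if exmpl is shorter than patrn)
    candidate <- seq_len(max(n - m + 1, 0))
    for (i in seq_len(m)) {
        # keep the starts whose i-th element agrees with the pattern
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    length(candidate) != 0
}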
selection <- unlist(lapply(df, testoccur1, patrn=c(0,0,1,1,1,1)))
selection
    a     b     c     d     e
 TRUE FALSE  TRUE FALSE  TRUE
df[, selection]
   a c e
1  1 1 0
2  1 0 0
3  1 0 0
4  0 1 1
5  0 1 1
6  1 1 1
7  1 1 1
8  1 1 1
9  1 1 0
10 1 1 0
What you printed in your post is not a data frame but a matrix. If your
structure is a matrix, try the following.
# your matrix
mat <- as.matrix(df)
mat
      a b c d e
 [1,] 1 1 1 0 0
 [2,] 1 1 0 0 0
 [3,] 1 1 0 0 0
 [4,] 0 1 1 1 1
 [5,] 0 1 1 1 1
 [6,] 1 1 1 1 1
 [7,] 1 1 1 0 1
 [8,] 1 1 1 0 1
 [9,] 1 1 1 0 0
[10,] 1 1 1 0 0
# selection of columns
sel <- apply(mat, 2, testoccur1, patrn=c(0,0,1,1,1,1))
mat[, sel]
      a c e
 [1,] 1 1 0
 [2,] 1 0 0
 [3,] 1 0 0
 [4,] 0 1 1
 [5,] 0 1 1
 [6,] 1 1 1
 [7,] 1 1 1
 [8,] 1 1 1
 [9,] 1 1 0
[10,] 1 1 0
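Since the question asked about grep: when every entry is a single character wide (as with the 0/1 values here), one possible alternative is to collapse each column into a string and search it with grepl(). The sketch below is only an illustration under that assumption; the names has_pattern and dat are made up for this example.

```r
# Assumption: each value prints as exactly one character (0 or 1 here).
# With multi-digit values the collapsed strings become ambiguous.
has_pattern <- function(x, pattern) {
  grepl(paste(pattern, collapse = ""),
        paste(x, collapse = ""),
        fixed = TRUE)   # literal substring search, no regular expression
}

dat <- data.frame(
  a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
  b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
  e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

# one logical per column, named after the columns
sel_grep <- vapply(dat, has_pattern, logical(1),
                   pattern = c(0, 0, 1, 1, 1, 1))
dat[, sel_grep, drop = FALSE]   # keeps columns a, c and e
```

For values that are not single characters, the index-based function above is the safer choice, because it compares elements rather than text.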
Hope this helps.
Petr Savicky.
A different approach to this problem is via convolutional filtering.
f <- function (x, pattern, tolerance = 1e-05)
{
    # x is a numeric matrix (or vector or data.frame) of data, pattern is
    # a vector. This returns the number of times the pattern is found
    # in each column of x.
    # tolerance is just to account for floating point inaccuracy.
    m <- mean(pattern)
    centeredPattern <- pattern - m
    # stats::filter is named explicitly in case another package masks filter()
    xp <- stats::filter(as.matrix(x) - m, rev(centeredPattern))
    colSums(abs(xp - sum(centeredPattern^2)) < tolerance, na.rm = TRUE)
}
E.g., with your example data
z <- data.frame(
    a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
    b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
    d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
    e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))
pattern <- c(0, 0, 1, 1, 1, 1)
we get
> f(z, pattern)
[1] 1 0 1 0 1
> f(z, pattern) > 0 # what you asked for
[1] TRUE FALSE TRUE FALSE TRUE
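To see what the filter computes, the same scores can be worked out window by window for a single column. The sketch below uses embed() instead of filter() and is meant only as an illustration; the names cp, windows, scores, target and hits are introduced here and are not part of f.

```r
pattern <- c(0, 0, 1, 1, 1, 1)
a <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)   # column a of the example data

k  <- length(pattern)
m  <- mean(pattern)
cp <- pattern - m                       # centered pattern

# embed() returns each window in reverse order, so flip the columns:
# row i of `windows` is a[i:(i + k - 1)]
windows <- embed(a, k)[, k:1, drop = FALSE]

# correlation of each centered window with the centered pattern
scores <- as.vector((windows - m) %*% cp)

# a window equal to the pattern scores exactly sum(cp^2)
target <- sum(cp^2)
hits <- which(abs(scores - target) < 1e-5)
hits                                    # window starting at position 4
```

For 0/1 data, a window reaches the target score only when it equals the pattern. For general numeric data a different window could in principle produce the same score, so the count should be read with that in mind.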
(I've been showing kids on our local FIRST robotics team how
signal/image processing can be done and your example
reminded me of that.)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Berend Hasselman
Sent: Friday, February 24, 2012 12:25 AM
To: Apoorva Gupta
Cc: r-help at r-project.org
Subject: Re: [R] Searching for a pattern within a vector