An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120215/989a2e88/attachment.pl>
matching a sequence in a vector?
16 messages · Redding, Matthew, chuck.01, Rainer M Krug +5 more
this is ugly, but...
l <-length(patrn)
l2 <-length(exmpl)
out <- vector("list")
for(i in 1:(l2-l+1))
{
exmpl[i:(i+l-1)]
patrn==exmpl[i:(i+l-1)]
if(all(patrn==exmpl[i:(i+l-1)]))
{ out[[i]] <- i } else { out[[i]] <- "NA"}
}
out <- do.call(c, out)
as.numeric(out[which(out!="NA")])
## Cheers and HTH
Redding, Matthew-2 wrote
Hi All,
I've been trawling through the documentation and listserv archives on this
topic -- but
as yet have not found a solution. I'm sure this is pretty simple with R,
but I cannot work out how without
resorting to ugly nested loops.
As far as I can tell, grep, match, and %in% are not the correct tools.
Question:
given these vectors --
patrn <- c(1,2,3,4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
how do I get the desired answer by finding the occurence of the pattern
and returning the starting indices:
6, 13, 23
Suggestions very much appreciated!
Kind regards,
Matt Redding, Ph.D.
Principal Scientist
Geochemist/Soil Chemist
Queensland Primary Industries & Fisheries
DEEDI
PO Box 102, Toowoomba, 4350, Qld
ph: 0746 881372
fax: 0746 881192
********************************DISCLAIMER**************...{{dropped:15}}
______________________________________________ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- View this message in context: http://r.789695.n4.nabble.com/matching-a-sequence-in-a-vector-tp4389523p4389560.html Sent from the R help mailing list archive at Nabble.com.
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
Hi.
If the pattern is not too long, try
m <- length(patrn)
n <- length(exmpl)
ind <- seq.int(length=n-m+1)
occur <- rep(TRUE, times=n-m+1)
for (i in seq.int(length=m)) {
occur <- occur & (patrn[i] == exmpl[ind + i - 1])
}
which(occur)
[1] 6 13 23
Hope this helps.
Petr Savicky.
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
Hi.
A more efficient version of the previous suggestion
is as follows.
m <- length(patrn)
n <- length(exmpl)
candidate <- seq.int(length=n-m+1)
for (i in seq.int(length=m)) {
candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
}
candidate
[1] 6 13 23
In this solution, the set of candidate indices decreases. If
the prefixes of the searched pattern are rare, the set of
candidates is reduced in a few iterations and the remaining
iterations become faster.
Hope this helps.
Petr Savicky.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 15/02/12 05:17, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops.
No actual solution - but this sounds to me like a "moving window" statistic. I googled for "moving window R" and found, among others, the following: http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html Maybe this can give you some additional ideas - for your question, you would not calculate the e.g. mean of the moving window, but check if the sequence in the window is equal to the one you are looking for. Cheers, Rainer
As far as I can tell, grep, match, and %in% are not the correct
tools.
Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <-
c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
how do I get the desired answer by finding the occurence of the
pattern and returning the starting indices: 6, 13, 23
Suggestions very much appreciated!
Kind regards,
Matt Redding, Ph.D. Principal Scientist Geochemist/Soil Chemist
Queensland Primary Industries & Fisheries DEEDI PO Box 102,
Toowoomba, 4350, Qld ph: 0746 881372 fax: 0746 881192
********************************DISCLAIMER**************...{{dropped:15}}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
- -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D): +49 - (0)3 21 21 25 22 44 email: Rainer at krugs.de Skype: RMkrug -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk87b3wACgkQoYgNqgF2egpqxACeNIMFFIDM6oqyejLR5yewNz2W R2AAn1elVRr0zqbADRFyZupWnMirAuZy =BXd9 -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 15/02/12 05:17, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops.
Just another idea: what about converting the vector to a character vector exmplstr <- paste(exmpl, collapse="") patrnstr <- paste(patrn, collapse="") and then search for patrnstr in exmplstr? Rainer
As far as I can tell, grep, match, and %in% are not the correct
tools.
Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <-
c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
how do I get the desired answer by finding the occurence of the
pattern and returning the starting indices: 6, 13, 23
Suggestions very much appreciated!
Kind regards,
Matt Redding, Ph.D. Principal Scientist Geochemist/Soil Chemist
Queensland Primary Industries & Fisheries DEEDI PO Box 102,
Toowoomba, 4350, Qld ph: 0746 881372 fax: 0746 881192
********************************DISCLAIMER**************...{{dropped:15}}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
- -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D): +49 - (0)3 21 21 25 22 44 email: Rainer at krugs.de Skype: RMkrug -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk87dJ4ACgkQoYgNqgF2egojXACcDPXGfqB+8+Cmz80z+plX376y FX4An1+PWTr3OJceYVCHYxz4y02FZ/ei =ru9x -----END PGP SIGNATURE-----
On 15-02-2012, at 05:17, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
patrn.rev <- rev(patrn) w <- embed(exmpl,length(patrn)) w.pos <- apply(w,1,function(r) all(r == patrn.rev)) which(w.pos) You can substitute the last three lines to get a oneliner. Berend
On Wed, Feb 15, 2012 at 10:26:44AM +0100, Berend Hasselman wrote:
On 15-02-2012, at 05:17, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
patrn.rev <- rev(patrn) w <- embed(exmpl,length(patrn)) w.pos <- apply(w,1,function(r) all(r == patrn.rev)) which(w.pos)
Hi. If the speed is an issue and exmpl is long, the following modification may be faster. patrn.rev <- rev(patrn) w <- embed(exmpl,length(patrn)) which(rowSums(w == rep(patrn.rev, each=nrow(w))) == ncol(w)) [1] 6 13 23 For length(patrn) = 11 and length(exmpl) = 10000, i obtained a speed up by a factor of 10. Hope this helps. How large are the vectors "patrn" and "exmpl" in your application? Petr Savicky.
On 02/14/2012 11:45 PM, Petr Savicky wrote:
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn<- c(1,2,3,4) exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
match(exmpl, patrn) returns indexes that differ by 1 if the sequence patrn occurs n = length(patrn) r = rle(diff(match(exmpl, patrn)) == 1) we're looking for a run of TRUE's of length 3, and can find their ends (of the runs of diffs) as cumsum(r$length) cumsum(r$length)[r$values & r$length == (n - 1)] - (n - 2) Seems like there could be edge cases that I'm missing... Martin
Hi.
If the pattern is not too long, try
m<- length(patrn)
n<- length(exmpl)
ind<- seq.int(length=n-m+1)
occur<- rep(TRUE, times=n-m+1)
for (i in seq.int(length=m)) {
occur<- occur& (patrn[i] == exmpl[ind + i - 1])
}
which(occur)
[1] 6 13 23
Hope this helps.
Petr Savicky.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
On Wed, Feb 15, 2012 at 06:27:01AM -0800, Martin Morgan wrote:
On 02/14/2012 11:45 PM, Petr Savicky wrote:
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn<- c(1,2,3,4) exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
match(exmpl, patrn) returns indexes that differ by 1 if the sequence patrn occurs n = length(patrn) r = rle(diff(match(exmpl, patrn)) == 1) we're looking for a run of TRUE's of length 3, and can find their ends (of the runs of diffs) as cumsum(r$length) cumsum(r$length)[r$values & r$length == (n - 1)] - (n - 2) Seems like there could be edge cases that I'm missing...
Hi Martin: This is a nice solution. In my opinion, it works, whenever "patrn" does not contain duplicates. Petr.
On 15-02-2012, at 15:27, Martin Morgan wrote:
On 02/14/2012 11:45 PM, Petr Savicky wrote:
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn<- c(1,2,3,4) exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
match(exmpl, patrn) returns indexes that differ by 1 if the sequence patrn occurs n = length(patrn) r = rle(diff(match(exmpl, patrn)) == 1) we're looking for a run of TRUE's of length 3, and can find their ends (of the runs of diffs) as cumsum(r$length) cumsum(r$length)[r$values & r$length == (n - 1)] - (n - 2) Seems like there could be edge cases that I'm missing...
Clever. However it is quite slow. Berend
Thankyou all for your great, and creative solutions. There is definately more than one way to skin a cat. A colleague alerted me to another solution: seq.strt <- which( sapply( 1:(length(exmpl)-length(patrn)+1), function(i) isTRUE( all.equal( patrn, exmpl[ i + 0:(length(patrn)-1) ] ) ) ) ) Apparently this came from a post in the help archive that my searches missed. I think the solutions you have put up are more readable. Kind regards Matt
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Berend Hasselman Sent: Thursday, 16 February 2012 1:35 AM To: Martin Morgan Cc: r-help at r-project.org Subject: Re: [R] matching a sequence in a vector? On 15-02-2012, at 15:27, Martin Morgan wrote:
On 02/14/2012 11:45 PM, Petr Savicky wrote:
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how
without resorting to ugly nested loops.
As far as I can tell, grep, match, and %in% are not the
correct tools.
Question: given these vectors -- patrn<- c(1,2,3,4) exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence
of the pattern and returning the starting indices:
6, 13, 23
match(exmpl, patrn) returns indexes that differ by 1 if the sequence patrn occurs n = length(patrn) r = rle(diff(match(exmpl, patrn)) == 1) we're looking for a run of TRUE's of length 3, and can find
their ends
(of the runs of diffs) as cumsum(r$length) cumsum(r$length)[r$values & r$length == (n - 1)] - (n - 2) Seems like there could be edge cases that I'm missing...
Clever. However it is quite slow. Berend
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
********************************DISCLAIMER**************************** The information contained in the above e-mail message or messages (which includes any attachments) is confidential and may be legally privileged. It is intended only for the use of the person or entity to which it is addressed. If you are not the addressee any form of disclosure, copying, modification, distribution or any action taken or omitted in reliance on the information is unauthorised. Opinions contained in the message(s) do not necessarily reflect the opinions of the Queensland Government and its authorities. If you received this communication in error, please notify the sender immediately and delete it from your computer system network.
yet another solution (I think)
patrn<- c(1,2,3,4)
exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
indx <- embed(rev(seq_along(exmpl)), length(patrn))
matches <- apply(indx, 1, function(.indx){
+ all(exmpl[.indx] == patrn) + })
indx[matches, 1L]
[1] 23 13 6
On Wed, Feb 15, 2012 at 6:32 PM, Redding, Matthew
<Matthew.Redding at deedi.qld.gov.au> wrote:
Thankyou all for your great, and creative solutions. There is definately more than one way to skin a cat. A colleague alerted me to another solution: seq.strt <- which( sapply( 1:(length(exmpl)-length(patrn)+1), function(i) isTRUE( all.equal( patrn, exmpl[ i + 0:(length(patrn)-1) ] ) ) ) ) Apparently this came from a post in the help archive that my searches missed. I think the solutions you have put up are more readable. Kind regards Matt
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Berend Hasselman Sent: Thursday, 16 February 2012 1:35 AM To: Martin Morgan Cc: r-help at r-project.org Subject: Re: [R] matching a sequence in a vector? On 15-02-2012, at 15:27, Martin Morgan wrote:
On 02/14/2012 11:45 PM, Petr Savicky wrote:
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
Hi All, I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. ?I'm sure this is pretty simple with R, but I cannot work out how
without resorting to ugly nested loops.
As far as I can tell, grep, match, and %in% are not the
correct tools.
Question: given these vectors -- patrn<- c(1,2,3,4) exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence
of the pattern and returning the starting indices:
6, 13, 23
match(exmpl, patrn) returns indexes that differ by 1 if the sequence patrn occurs ?n = length(patrn) ?r = rle(diff(match(exmpl, patrn)) == 1) we're looking for a run of TRUE's of length 3, and can find
their ends
(of the runs of diffs) as cumsum(r$length) ?cumsum(r$length)[r$values & r$length == (n - 1)] - (n - 2) Seems like there could be edge cases that I'm missing...
Clever. However it is quite slow. Berend
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
********************************DISCLAIMER**************************** The information contained in the above e-mail message or messages (which includes any attachments) is confidential and may be legally privileged. ?It is intended only for the use of the person or entity to which it is addressed. ?If you are not the addressee any form of disclosure, copying, modification, distribution or any action taken or omitted in reliance on the information is unauthorised. ?Opinions contained in the message(s) do not necessarily reflect the opinions of the Queensland Government and its authorities. ?If you received this communication in error, please notify the sender immediately and delete it from your computer system network.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
On Tue, Feb 14, 2012 at 11:17 PM, Redding, Matthew
<Matthew.Redding at deedi.qld.gov.au> wrote:
I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. ?I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
Here is a one-liner: library(zoo) which(rollapply(exmpl, 4, identical, patrn, fill = FALSE, align = "left"))
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Wed, Feb 15, 2012 at 08:12:32PM -0500, Gabor Grothendieck wrote:
On Tue, Feb 14, 2012 at 11:17 PM, Redding, Matthew <Matthew.Redding at deedi.qld.gov.au> wrote:
I've been trawling through the documentation and listserv archives on this topic -- but as yet have not found a solution. ?I'm sure this is pretty simple with R, but I cannot work out how without resorting to ugly nested loops. As far as I can tell, grep, match, and %in% are not the correct tools. Question: given these vectors -- patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) how do I get the desired answer by finding the occurence of the pattern and returning the starting indices: 6, 13, 23
Here is a one-liner: library(zoo) which(rollapply(exmpl, 4, identical, patrn, fill = FALSE, align = "left"))
Hi.
There were several solutions in this thread. Their speed differs
quite significantly. Here is a comparison.
patrn <- 1:4
exmpl <- sample(1:4, 10000, replace=TRUE)
occur1 <- function(patrn, exmpl)
{
m <- length(patrn)
n <- length(exmpl)
candidate <- seq.int(length=n-m+1)
for (i in seq.int(length=m)) {
candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
}
candidate
}
occur2 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(apply(w,1,function(r) all(r == patrn.rev)))
}
occur3 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(rowSums(w == rep(rev(patrn), each=nrow(w))) == ncol(w))
}
occur4 <- function(patrn, exmpl)
{
# requires patrn without duplicates
n = length(patrn)
r = rle(diff(match(exmpl, patrn)) == 1L)
cumsum(r$length)[r$values & r$length == (n - 1L)] - (n - 2L)
}
occur5 <- function(patrn, exmpl)
{
which( sapply( 1:(length(exmpl)-length(patrn)+1), function(i) isTRUE( all.equal( patrn, exmpl[i + 0:(length(patrn)-1) ] ) ) ) )
}
occur6 <- function(patrn, exmpl)
{
indx <- embed(rev(seq_along(exmpl)), length(patrn))
matches <- apply(indx, 1, function(.indx){
all(exmpl[.indx] == patrn)
})
rev(indx[matches, 1L])
}
occur7 <- function(patrn, exmpl)
{
which(rollapply(exmpl, length(patrn), identical, patrn, fill = FALSE, align = "left"))
}
library(zoo)
t1 <- system.time( out1 <- occur1(patrn, exmpl) )
t2 <- system.time( out2 <- occur2(patrn, exmpl) )
t3 <- system.time( out3 <- occur3(patrn, exmpl) )
t4 <- system.time( out4 <- occur4(patrn, exmpl) )
t5 <- system.time( out5 <- occur5(patrn, exmpl) )
t6 <- system.time( out6 <- occur6(patrn, exmpl) )
t7 <- system.time( out7 <- occur7(patrn, exmpl) )
print(identical(out1, out2))
print(identical(out1, out3))
print(identical(out1, out4))
print(identical(out1, out5))
print(identical(out1, out6))
print(identical(out1, out7))
print(rbind(t1, t2, t3, t4, t5, t6, t7))
The output was
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
user.self sys.self elapsed user.child sys.child
t1 0.001 0 0.001 0 0
t2 0.062 0 0.061 0 0
t3 0.002 0 0.002 0 0
t4 0.001 0 0.001 0 0
t5 1.749 0 1.749 0 0
t6 0.068 0 0.068 0 0
t7 0.172 0 0.172 0 0
Petr Savicky.
On 16-02-2012, at 09:01, Petr Savicky wrote:
Hi.
There were several solutions in this thread. Their speed differs
quite significantly. Here is a comparison.
patrn <- 1:4
exmpl <- sample(1:4, 10000, replace=TRUE)
occur1 <- function(patrn, exmpl)
{
m <- length(patrn)
n <- length(exmpl)
candidate <- seq.int(length=n-m+1)
for (i in seq.int(length=m)) {
candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
}
candidate
}
occur2 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(apply(w,1,function(r) all(r == patrn.rev)))
}
occur3 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(rowSums(w == rep(rev(patrn), each=nrow(w))) == ncol(w))
}
occur4 <- function(patrn, exmpl)
{
# requires patrn without duplicates
n = length(patrn)
r = rle(diff(match(exmpl, patrn)) == 1L)
cumsum(r$length)[r$values & r$length == (n - 1L)] - (n - 2L)
}
occur5 <- function(patrn, exmpl)
{
which( sapply( 1:(length(exmpl)-length(patrn)+1), function(i) isTRUE( all.equal( patrn, exmpl[i + 0:(length(patrn)-1) ] ) ) ) )
}
occur6 <- function(patrn, exmpl)
{
indx <- embed(rev(seq_along(exmpl)), length(patrn))
matches <- apply(indx, 1, function(.indx){
all(exmpl[.indx] == patrn)
})
rev(indx[matches, 1L])
}
occur7 <- function(patrn, exmpl)
{
which(rollapply(exmpl, length(patrn), identical, patrn, fill = FALSE, align = "left"))
}
library(zoo)
t1 <- system.time( out1 <- occur1(patrn, exmpl) )
t2 <- system.time( out2 <- occur2(patrn, exmpl) )
t3 <- system.time( out3 <- occur3(patrn, exmpl) )
t4 <- system.time( out4 <- occur4(patrn, exmpl) )
t5 <- system.time( out5 <- occur5(patrn, exmpl) )
t6 <- system.time( out6 <- occur6(patrn, exmpl) )
t7 <- system.time( out7 <- occur7(patrn, exmpl) )
print(identical(out1, out2))
print(identical(out1, out3))
print(identical(out1, out4))
print(identical(out1, out5))
print(identical(out1, out6))
print(identical(out1, out7))
print(rbind(t1, t2, t3, t4, t5, t6, t7))
The output was
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
user.self sys.self elapsed user.child sys.child
t1 0.001 0 0.001 0 0
t2 0.062 0 0.061 0 0
t3 0.002 0 0.002 0 0
t4 0.001 0 0.001 0 0
t5 1.749 0 1.749 0 0
t6 0.068 0 0.068 0 0
t7 0.172 0 0.172 0 0
And by modifying occur5 to this
occur5 <- function(patrn, exmpl)
{
which( sapply( 1:(length(exmpl)-length(patrn)+1),
function(i) identical( patrn, exmpl[i + 0:(length(patrn)-1) ] ) ) )
}
occur5 can be made a lot faster.
On my computer instead of
user.self sys.self elapsed user.child sys.child
t1 0.001 0.000 0.001 0 0
t2 0.061 0.007 0.068 0 0
t3 0.002 0.001 0.002 0 0
t4 0.001 0.000 0.002 0 0
t5 1.640 0.037 1.677 0 0
t6 0.079 0.004 0.084 0 0
t7 0.256 0.004 0.260 0 0
I got
user.self sys.self elapsed user.child sys.child
t1 0.000 0.000 0.001 0 0
t2 0.060 0.004 0.065 0 0
t3 0.002 0.001 0.003 0 0
t4 0.001 0.000 0.002 0 0
t5 0.070 0.002 0.071 0 0
t6 0.076 0.000 0.077 0 0
t7 0.246 0.006 0.252 0 0
Berend