Matching a pattern of vector of character strings in another vector of character strings
6 messages · Jing Liu, Liviu Andronic, Marc Schwartz +3 more
On Fri, Dec 17, 2010 at 2:34 PM, Jing Liu <quiet_jing0920 at hotmail.com> wrote:
M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M)<- c("2006","2007","2008","2009","2010")
M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
pattern<- c("0","1")
I would like to find, for each row, whether it contains exactly the two-string pattern beginning with "0" and followed by "1", i.e., exactly "0" "1". If it does, at which year? E.g., it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.
I could only think of this
apply(M, 1, function(z) grep('01', paste(z, collapse='')))
[1] 1 1 1
apply(M, 1, function(z) grepl('01', paste(z, collapse='')))
[1] TRUE TRUE TRUE
But it doesn't return the position of the matched string, so this isn't what you wanted.
Regards,
Liviu
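A short sketch of why these calls cannot give the position: grep() and grepl() report which elements of a character vector match, and each call above sees a single pasted string, so the answer can only be 1 or TRUE. regexpr(), by contrast, reports the character position of the first match inside each element. Pasting all rows first (the variable names rows and pos are only for illustration) makes the difference visible:

```r
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")

# One string per row: "011*0" "00011" "1101*"
rows <- apply(M, 1, paste, collapse = "")

# grep() returns the indices of the matching elements, not where the match is
grep("01", rows)              # 1 2 3

# grepl() returns one TRUE/FALSE per element
grepl("01", rows)             # TRUE TRUE TRUE

# regexpr() returns the character position of the first match in each element
pos <- as.vector(regexpr("01", rows))
pos                           # 1 3 3
```

Because every cell here is exactly one character long, the position returned by regexpr() is also the column index, which is the idea the replies below build on.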
As far as I know, the grep family of functions cannot search for a pattern that spans two or more elements of a character vector. I could do it with a loop, but I am looking for a more efficient way. How should I do it? I'd really appreciate your help!
Best regards,
Jing Liu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Dec 17, 2010, at 7:58 AM, Liviu Andronic wrote:
On Fri, Dec 17, 2010 at 2:34 PM, Jing Liu <quiet_jing0920 at hotmail.com> wrote:
M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M)<- c("2006","2007","2008","2009","2010")
M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
pattern<- c("0","1")
I would like to find, for each row, whether it contains exactly the two-string pattern beginning with "0" and followed by "1", i.e., exactly "0" "1". If it does, at which year? E.g., it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.
I could only think of this
apply(M, 1, function(z) grep('01', paste(z, collapse='')))
[1] 1 1 1
apply(M, 1, function(z) grepl('01', paste(z, collapse='')))
[1] TRUE TRUE TRUE
But it doesn't return the position of the matched string, so this isn't what you wanted.
Regards,
Liviu
As far as I know, the grep family of functions cannot search for a pattern that spans two or more elements of a character vector. I could do it with a loop, but I am looking for a more efficient way. How should I do it? I'd really appreciate your help!
Best regards,
Jing Liu
Try this:
colnames(M)[regexpr("01", apply(M, 1, paste, collapse = ""))]
[1] "2006" "2008" "2008"
See ?regexpr for more info.
HTH,
Marc Schwartz
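One caveat with indexing colnames() directly by the regexpr() result: regexpr() returns -1 for an element with no match, and a negative index means "drop this element" in R. The sketch below adds a hypothetical fourth row with no "0" "1" pair (it is not part of the original data) to show both ways this can go wrong:

```r
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")

# Hypothetical extra row with no "0" "1" pair, added only for illustration
M3 <- rbind(M, c("1", "1", "1", "*", "0"))
pos <- regexpr("01", apply(M3, 1, paste, collapse = ""))
as.vector(pos)   # 1 3 3 -1  (-1 means "no match")

# Mixing positive positions with -1 is an error in R
res <- tryCatch(colnames(M3)[pos], error = function(e) e)
inherits(res, "error")   # TRUE

# If no row matches, every index is -1 and column 1 is silently dropped
only_miss <- M3[4, , drop = FALSE]
pos2 <- regexpr("01", apply(only_miss, 1, paste, collapse = ""))
colnames(only_miss)[pos2]   # "2007" "2008" "2009" "2010" instead of NA
```

The second case is the more dangerous one, because it returns plausible-looking years without any warning; this is the situation the NA_integer_ variation later in the thread guards against.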
On Fri, Dec 17, 2010 at 09:34:57PM +0800, Jing Liu wrote:
Dear all,
My question is illustrated by the following example. I have a matrix M:
M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M)<- c("2006","2007","2008","2009","2010")
M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
pattern<- c("0","1")
I would like to find, for each row, whether it contains exactly the two-string pattern beginning with "0" and followed by "1", i.e., exactly "0" "1". If it does, at which year? E.g., it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.
If the pattern is always c("0","1"), the number of rows is large
and the number of years is relatively small, then this may also be
computed using matrix calculations. For example:
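M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M) <- c("2006","2007","2008","2009","2010")
year <- colnames(M)
status <- rep(NA_character_, times=nrow(M))  # NA = no "0" "1" pair found
# Each pass compares column i with column i+1 for all rows at once;
# a later match overwrites an earlier one, so this records the LAST pair
for (i in seq_len(length(year) - 1)) {
    status[M[, i] == "0" & M[, i+1] == "1"] <- year[i]
}
status # [1] "2006" "2008" "2008"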
Petr Savicky.
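Petr's loop only iterates over the years; the same idea can be pushed fully into matrix operations by comparing the matrix without its last column against the matrix without its first column. A sketch, assuming only the fixed pattern "0" "1" (the names hits and first are illustrative):

```r
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")

# hits[i, j] is TRUE when M[i, j] == "0" and M[i, j + 1] == "1";
# drop = FALSE keeps a matrix even when M has a single row
hits <- M[, -ncol(M), drop = FALSE] == "0" & M[, -1, drop = FALSE] == "1"

# Column of the first TRUE in each row (hits * 1 turns TRUE/FALSE into 1/0)
first <- max.col(hits * 1, ties.method = "first")

# A row without any pair is all 0s, which max.col() would report as column 1,
# so mark those rows as NA instead
first[rowSums(hits) == 0] <- NA

colnames(M)[first]   # "2006" "2008" "2008"
```

One design difference: max.col(..., ties.method = "first") picks the first pair in each row, whereas the loop above keeps the last one, because later assignments overwrite earlier ones. The two agree on this example because each row contains only one "0" "1" pair.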
On Dec 17, 2010, at 8:34 AM, Jing Liu wrote:
Dear all,
My question is illustrated by the following example. I have a matrix M:
M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M)<- c("2006","2007","2008","2009","2010")
M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
pattern<- c("0","1")
I would like to find, for each row, whether it contains exactly the two-string pattern beginning with "0" and followed by "1", i.e., exactly "0" "1". If it does, at which year? E.g., it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3. As far as I know, the grep family of functions cannot search for a pattern that spans two or more elements of a character vector. I could do it with a loop, but I am looking for a more efficient way. How should I do it? I'd really appreciate your help!
You can just paste() each row together with collapse="_" and then use the grep-ish
functions as you were hoping to.
> m2 <- apply(M, 1, paste, collapse="_")
> colnames(M)[(regexpr("0_1", m2)+1)/2]  # assumes every element is a single character
[1] "2006" "2008" "2008"
David Winsemius, MD West Hartford, CT
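David's arithmetic relies on every cell being one character wide: cell k then starts at character 2k - 1 of the pasted string. If cells can be longer (say "10"), (position + 1)/2 no longer maps to a column, and an unanchored "0_1" can even match inside a cell such as "10". A sketch of one way around this, assuming a single-character separator that never occurs inside a cell; first_pair_col() is a hypothetical helper, not something from base R:

```r
# Hypothetical helper (not part of base R). Assumes `sep` is a single
# character that never appears inside a cell.
first_pair_col <- function(M, a = "0", b = "1", sep = "_") {
  # Wrap each row in separators: "_0_1_1_*_0_" for the first example row
  s <- paste0(sep, apply(M, 1, paste, collapse = sep), sep)
  # fixed = TRUE matches the pattern literally; the leading and trailing
  # sep force the match to cover whole cells only
  pos <- regexpr(paste0(sep, a, sep, b, sep), s, fixed = TRUE)
  # Number of separators up to and including the match start = column of `a`
  prefix <- substr(s, 1, pos)
  col <- nchar(prefix) - nchar(gsub(sep, "", prefix, fixed = TRUE))
  col[pos < 0] <- NA   # regexpr() returns -1 when there is no match
  colnames(M)[col]
}

M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")
first_pair_col(M)    # "2006" "2008" "2008"

# Made-up matrix with multi-character cells
M5 <- matrix(c("10", "0", "1", "1", "1", "11"), nrow = 2,
             dimnames = list(NULL, c("2006", "2007", "2008")))
first_pair_col(M5)   # NA "2006"
```

In the made-up M5, the first row is "10" "1" "1", which contains no "0" "1" pair of cells, so the anchored pattern correctly reports NA, whereas an unanchored "0_1" would match inside "10_1".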
On Fri, Dec 17, 2010 at 9:10 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
On Dec 17, 2010, at 7:58 AM, Liviu Andronic wrote:
On Fri, Dec 17, 2010 at 2:34 PM, Jing Liu <quiet_jing0920 at hotmail.com> wrote:
M<- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"),nrow=3)
colnames(M)<- c("2006","2007","2008","2009","2010")
M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
pattern<- c("0","1")
I would like to find, for each row, whether it contains exactly the two-string pattern beginning with "0" and followed by "1", i.e., exactly "0" "1". If it does, at which year? E.g., it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.
I could only think of this
apply(M, 1, function(z) grep('01', paste(z, collapse='')))
[1] 1 1 1
apply(M, 1, function(z) grepl('01', paste(z, collapse='')))
[1] TRUE TRUE TRUE
But it doesn't return the position of the matched string, so this isn't what you wanted.
Regards,
Liviu
As far as I know, the grep family of functions cannot search for a pattern that spans two or more elements of a character vector. I could do it with a loop, but I am looking for a more efficient way. How should I do it? I'd really appreciate your help!
Best regards,
Jing Liu
Try this:
colnames(M)[regexpr("01", apply(M, 1, paste, collapse = ""))]
[1] "2006" "2008" "2008"
See ?regexpr for more info.
Here is a slight variation, which is only needed if it is possible
that a row contains no 01 at all:
ix <- regexpr("01", apply(M, 1, paste, collapse = ""))
colnames(M)[ ifelse(ix > 0, ix, NA_integer_) ]
Note that we must use NA_integer_ and not NA if we want it to not only
work in the case where some rows have no 01 but also work in the case
that there are no 01's in any row at all.
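To see why NA_integer_ matters, here is a small made-up matrix in which no row contains 01. When every test is FALSE, ifelse(ix > 0, ix, NA) returns a logical vector, and a logical index is recycled over all the column names instead of picking one entry per row:

```r
# Made-up example: neither row contains "01" ("110" and "000")
M2 <- matrix(c("1", "0", "1", "0", "0", "0"), nrow = 2,
             dimnames = list(NULL, c("2006", "2007", "2008")))
ix <- regexpr("01", apply(M2, 1, paste, collapse = ""))   # -1 -1

bad  <- colnames(M2)[ifelse(ix > 0, ix, NA)]           # logical NA index
good <- colnames(M2)[ifelse(ix > 0, ix, NA_integer_)]  # integer NA index

length(bad)    # 3: the logical index is recycled over all three column names
length(good)   # 2: one NA per row, as intended
```

When at least one row matches, ifelse() fills in the integer positions from ix, so the result is integer either way; the difference only shows up when nothing matches.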
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com