test if elements of a character vector contain letters
On Aug 6, 2012, at 12:06 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
Perhaps I am missing something, but why use sapply() when grepl() is already vectorized?
is.letter <- function(x) grepl("[:alpha:]", x)
is.number <- function(x) grepl("[:digit:]", x)
Sorry, typos in the above from my copy and paste. Should be:
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)
Marc
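A minimal sketch of Marc's corrected helpers applied to a few elements of the vector from the original question. Unlike the sapply() versions further down, grepl() returns an unnamed logical vector with exactly one value per element. Note that [[:alpha:]] depends on the locale, so in a UTF-8 locale it may also match accented letters; the sample used here is plain ASCII.

```r
# Marc's corrected helpers: a POSIX class such as [:alpha:] only works
# inside a bracket expression, hence the doubled brackets.
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

# A few elements in the style of the vector from the question
x <- c("a10", "b7", "k", "z", "1", "26")

is.letter(x)                 # TRUE TRUE TRUE TRUE FALSE FALSE
is.number(x)                 # TRUE TRUE FALSE FALSE TRUE TRUE
is.letter(x) & is.number(x)  # TRUE TRUE FALSE FALSE FALSE FALSE (both kinds)
```

Combining the two results with & or | gives "letters and digits" or "letters or digits" without any further loop.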
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)
str(x)
chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
system.time(is.letter(x))
   user  system elapsed
  0.011   0.000   0.010
system.time(is.number(x))
   user  system elapsed
  0.010   0.000   0.011

Regards,

Marc Schwartz

On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,
Fun as an exercise in vectorization: about 30 times faster. Don't look, guess.
Given up? OK, here it is.
is_letter <- function(x, pattern = c(letters, LETTERS)){
  sapply(x, function(y){
    any(sapply(pattern, function(z) grepl(z, y, fixed = TRUE)))
  })
}
# Test ASCII codes instead: just one loop.
has_letter <- function(x){
  sapply(x, function(y){
    y <- as.integer(charToRaw(y))  # byte codes of the string
    # 65-90 are "A"-"Z", 97-122 are "a"-"z"
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  })
}
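The same ASCII-code idea can be written with vapply(), which fixes the result type to one logical per element and, with USE.NAMES = FALSE, drops the names that sapply() attaches. This is only a sketch: has_letter_vapply is a hypothetical name, and it is still an element-by-element loop, so it should not be expected to match grepl() for speed. Bytes of non-ASCII characters fall outside 65-90 and 97-122, so such characters are never counted as letters.

```r
# Hypothetical variant of Rui's has_letter() using vapply()
has_letter_vapply <- function(x) {
  vapply(x, function(y) {
    codes <- as.integer(charToRaw(y))  # byte codes of one string
    # 65-90 are the ASCII codes of "A"-"Z", 97-122 those of "a"-"z"
    any((codes >= 65L & codes <= 90L) | (codes >= 97L & codes <= 122L))
  }, FUN.VALUE = logical(1), USE.NAMES = FALSE)
}

x <- c("a10", "b7", "k", "1", "26")
has_letter_vapply(x)  # TRUE TRUE TRUE FALSE FALSE
```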
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)
t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
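A small reproducible harness for comparing the three approaches in this thread, assuming a shorter vector (520 elements instead of 52,000) so that the slowest version finishes quickly. It first checks that all three return the same logical values, after stripping the names that sapply() adds, and then times each one. The timings vary by machine and are deliberately not shown here.

```r
# Liviu's/Rui's two loop-based versions and Marc's vectorized one
is_letter <- function(x, pattern = c(letters, LETTERS)) {
  sapply(x, function(y) any(sapply(pattern, function(z) grepl(z, y, fixed = TRUE))))
}
has_letter <- function(x) {
  sapply(x, function(y) {
    y <- as.integer(charToRaw(y))
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  })
}
is.letter <- function(x) grepl("[[:alpha:]]", x)

set.seed(1)  # make sample() reproducible
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep = "")
x <- rep(x, 10)  # 520 elements: small enough for the slowest version

# sapply() names its result by the input strings; grepl() does not,
# so strip the names before comparing.
r1 <- unname(is_letter(x))
r2 <- unname(has_letter(x))
r3 <- is.letter(x)
stopifnot(identical(r1, r2), identical(r2, r3))

# user.self, sys.self and elapsed for each approach
rbind(is_letter  = system.time(is_letter(x))[1:3],
      has_letter = system.time(has_letter(x))[1:3],
      grepl      = system.time(is.letter(x))[1:3])
```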
On 06-08-2012 17:25, Liviu Andronic wrote:
Dear all,
I'm pretty sure I'm approaching the problem the wrong way. Suppose the following character vector:
x <- c(letters, 1:26)
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
How do you test whether the elements of the vector contain at least one letter (or at least one digit) and obtain a logical vector of the same length? I came up with the following awkward function:
is_letter <- function(x, pattern = c(letters, LETTERS)){
  sapply(x, function(y){
    any(sapply(pattern, function(z) grepl(z, y, fixed = TRUE)))
  })
}
is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is_letter(x, 0:9) ## function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Is there a nicer way to do this?

Regards,
Liviu
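One way to keep the interface of the original is_letter(), including passing 0:9 to test for digits, while making a single vectorized call is to collapse the pattern vector into one bracket expression. This is a sketch with a hypothetical name, is_letter_vec, and it assumes the pattern contains only plain letters or digits; characters such as ']', '^', '-' or a backslash would need escaping inside the brackets.

```r
# Hypothetical vectorized drop-in for is_letter(): c(letters, LETTERS)
# becomes "[abc...zABC...Z]", and 0:9 becomes "[0123456789]".
# Assumes pattern holds only plain letters/digits (no regex metacharacters).
is_letter_vec <- function(x, pattern = c(letters, LETTERS)) {
  grepl(paste0("[", paste(pattern, collapse = ""), "]"), x)
}

x <- c("a10", "b7", "k", "1", "26")
is_letter_vec(x)       # TRUE TRUE TRUE FALSE FALSE
is_letter_vec(x, 0:9)  # TRUE TRUE FALSE TRUE TRUE
```

Because the whole vector goes through one grepl() call, there is no per-element or per-pattern loop at the R level.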
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.