please comment on my function
You can always convert to lower case afterwards, probably on a shorter vector. You did not indicate that you needed that conversion; it looked as though you did it only for the regular expression.
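A minimal sketch of that idea, under one reading of "afterwards": strip the region suffix first, then lower-case only the elements that pass the checks. The name canonicalize.language2 and the explicit test for an upper-case "C" are assumptions made for illustration, not part of the original function, and the input is assumed to contain no NA values. Whether this is actually faster would need to be timed on the real data.

```r
# Hypothetical variant: lower-case at the end, and only the surviving elements.
canonicalize.language2 <- function (s) {
  # Strip a two-letter region suffix such as "_US" or "-gb" from 5-character codes.
  long <- nchar(s) == 5
  s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", s[long])
  # Keep two-character codes and the "C" locale in either case (assumption:
  # an upper-case "C" should be accepted, as it would be after tolower()).
  ok <- nchar(s) == 2 | s == "c" | s == "C"
  s[!ok] <- "unknown"
  # tolower() now runs only on the kept elements, not on the whole input.
  s[ok] <- tolower(s[ok])
  s
}

x <- c("EN", "en_US", "EN-gb", "C", "c", "english", "fr")
canonicalize.language2(x)
# [1] "en"      "en"      "en"      "c"       "c"       "unknown" "fr"
```

If the input has many repeated values, applying the function to unique(s) and mapping back with match() is another way to shorten the vector that tolower has to process.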
On Fri, Sep 14, 2012 at 3:13 PM, Sam Steingold <sds at gnu.org> wrote:
* jim holtman <wubygzna at tznvy.pbz> [2012-09-14 13:10:37 -0400]:
> more than half the time is in 'tolower' and 'nchar', so it is not all 'sub's problem.
aha, thanks!
This version runs a little faster since it does not need the 'tolower':
canonicalize.language <- function (s) {
  # s <- tolower(s)
  long <- nchar(s) == 5
  s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", s[long])
  s[nchar(s) != 2 & s != "c"] <- "unknown"
  s
}
but it does not convert "EN" to "en", so it is not good for my purposes.

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.