
Message-ID: <CAAxdm-5y7PLfZ1D48EQXQzEaUJi=XgsEnT6amidJz6V5CrVjWQ@mail.gmail.com>
Date: 2012-09-15T00:06:33Z
From: jim holtman
Subject: please comment on my function
In-Reply-To: <874nn04dui.fsf@gnu.org>

You can always convert to lower case afterwards, probably on a shorter
vector.  You did not indicate that you needed that conversion; it looked
like you did it only for the regular expression.
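
A minimal sketch of that idea, not a tested replacement: the same logic as
the function quoted below, with tolower() moved to the end so that it only
touches the elements that survive as language codes.  The 'keep' variable
and the decision to accept both "C" and "c" are assumptions for
illustration, and the input is assumed to contain no NA values.

```r
canonicalize.language <- function (s) {
  long <- nchar(s) == 5
  # [[:alpha:]] matches both cases, so no lower-casing is needed before sub()
  s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", s[long])
  # assumption: "C" and "c" both denote the C locale and should be kept
  keep <- nchar(s) == 2 | s %in% c("c", "C")
  # lower-case only the surviving codes instead of the whole input
  s[keep] <- tolower(s[keep])
  s[!keep] <- "unknown"
  s
}

canonicalize.language(c("EN", "en_US", "pt-BR", "C", "english"))
# [1] "en"      "en"      "pt"      "c"       "unknown"
```

Whether this is actually faster depends on how many elements end up as
"unknown"; it would need to be timed on the real data.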

On Fri, Sep 14, 2012 at 3:13 PM, Sam Steingold <sds at gnu.org> wrote:
>> * jim holtman <wubygzna at tznvy.pbz> [2012-09-14 13:10:37 -0400]:
>>
>> more than half the time is in 'tolower' and 'nchar', so it is not all
>> 'sub's problem.
>
> aha, thanks!
>
>> This version runs a little faster since it does not need the 'tolower':
>>
>> canonicalize.language <- function (s) {
>>   # s <- tolower(s)
>>   long <- nchar(s) == 5
>>   s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$","\\1",s[long])
>>   s[nchar(s) != 2 & s != "c"] <- "unknown"
>>   s
>> }
>
> but it does not convert "EN" to "en", so it is not good for my purposes.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
> http://iris.org.il http://honestreporting.com http://memri.org
> Life is like Tetris: failures accumulate, successes fade.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.