grep with fixed=TRUE and ignore.case=TRUE

Brian Ripley · 2007-05-17T09:28:55Z

On Thu, 17 May 2007, Petr Savicky wrote:
>> strncasecmp is not standard C (not even C99), but R does have a substitute
>> for it. Unfortunately strncasecmp is not usable with multibyte charsets:
>> Linux systems have wcsncasecmp but that is not portable. In these days of
>> widespread use of UTF-8 that is a blocking issue, I am afraid.
>
> What could help are the functions mbrtowc and towctrans and simple
> long integer comparison. Are the functions mbrtowc and towctrans
> available under Win

On Thu, 17 May 2007, Petr Savicky wrote:

I don't see it in Rdll.hide.  It is a C99 function (see your unix man 
page).

UTF-8 is not usable under Windows, but tolower works in Windows DBCS (in 
so far as that makes sense: Chinese chars do not have 'case').

Rmbrtowc reflects an attempt to add UTF-8 support on Windows, but that is 
not currently active.
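
For readers following the thread, the approach suggested in the quoted question (decode with mbrtowc, map case with towctrans, compare the resulting integers) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from R: mb_casecmp is a hypothetical name, the strings are assumed to be in the charset of the current locale, and a per-character lower-case mapping is assumed to be an acceptable meaning of 'ignore case'.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of R): compare two NUL-terminated
   multibyte strings, ignoring case, in the current locale's charset.
   Returns 0 if equal, -1 or 1 if they differ, and -2 if either
   string contains an invalid or incomplete multibyte sequence. */
int mb_casecmp(const char *a, const char *b)
{
    mbstate_t sa, sb;
    wctrans_t lower = wctrans("tolower");

    /* Start both decoders in the initial shift state. */
    memset(&sa, 0, sizeof sa);
    memset(&sb, 0, sizeof sb);

    for (;;) {
        wchar_t wa, wb;
        size_t na = mbrtowc(&wa, a, MB_CUR_MAX, &sa);
        size_t nb = mbrtowc(&wb, b, MB_CUR_MAX, &sb);

        if (na == (size_t)-1 || na == (size_t)-2 ||
            nb == (size_t)-1 || nb == (size_t)-2)
            return -2;

        /* Map each decoded character to lower case, then compare
           the two values as plain integers. */
        wint_t la = towctrans((wint_t)wa, lower);
        wint_t lb = towctrans((wint_t)wb, lower);
        if (la != lb)
            return la < lb ? -1 : 1;

        /* mbrtowc returns 0 at the terminating NUL; the values are
           equal here, so both strings ended together. */
        if (na == 0)
            return 0;

        a += na;
        b += nb;
    }
}
```

With no setlocale call the program runs in the "C" locale, where only ASCII input is safe to assume; for example, mb_casecmp("Hello", "hELLO") returns 0. Whether these functions are available and behave usefully on a given platform is exactly the point under discussion in this thread.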

He may, but that is not what 'ignore case' means, more like 'case 
honouring'.

Yes, there is a comment on the help page to that effect.  But these are 
highly atypical uses. Try perl=TRUE, and be aware that the locale matters 
a lot in such tests (via the charset).

No one is attempting to make R a fast string-processing language, and so 
developers' resources are spent on performance where it matters to more 
typical usage.  (E.g. reducing duplication in as.double and friends speeds 
up just about every R session, and speeds up some numerical sessions 
dramatically.)

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
