
Help with locale on OS X

2 messages · Don MacQueen, Simon Urbanek

#
For several years I've been running scripts with this expression

   substr(tmp.un[substr(tmp.un,1,1)=='u'],1,1) <- '\265'

whose purpose is to replace the letter 'u' with a 
Greek mu, to change, for example, 'ug/L' to 
'µg/L'. It has been working over several versions 
of R.

Recently, I started (sometimes?) getting error messages from the expression:
Error in "substr<-"(`*tmp*`, 1, 1, value = "<b5>") :
         invalid multibyte string

In the same R session, the locale is reported as:
[1] "C/en_US.UTF-8/C/C/C/C"

Yet in another R session on the same machine (in a different directory), the locale is reported as:
[1] "C"

If I go back to the first session, that had the problem, and do
    > Sys.setlocale('LC_ALL','C')
then the error goes away. This is good.
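
For anyone comparing two sessions like this, here is a minimal sketch of inspecting the locale from inside R. It uses only the base functions Sys.getlocale() and l10n_info(); the variable names are illustrative and not part of the original scripts.

```r
# Report the locale settings that matter for byte escapes such as '\265'.
ctype <- Sys.getlocale("LC_CTYPE")   # character-type category only
info  <- l10n_info()                 # list with MBCS, UTF-8 and Latin-1 flags

cat("LC_CTYPE:        ", ctype, "\n")
cat("multibyte locale:", info$MBCS, "\n")
cat("UTF-8 locale:    ", info[["UTF-8"]], "\n")
```

If the UTF-8 flag is TRUE, a lone byte such as '\265' is not a valid character in that session, which would be consistent with the "invalid multibyte string" error above.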

But I have no idea why the locale is apparently 
getting set wrong; I don't believe I ever 
explicitly wrote a "set locale" expression of any 
sort anywhere in the scripts for this project. 
Maybe it's a side effect of something, but I 
don't know what.

Any suggestions would be most appreciated.

Thanks
-Don
R version 2.2.1, 2005-12-26, powerpc-apple-darwin7.9.0

attached base packages:
[1] "stats"   "utils"   "methods" "base"

other attached packages:
  xtable   rmacq ROracle     DBI
"1.3-0"   "1.0" "0.5-5" "0.1-9"

(and I will move to 2.3.0 soon)
#
On Apr 24, 2006, at 6:02 PM, Don MacQueen wrote:

            
This is very likely due to a UTF-8 locale being in use. You can  
check that easily by looking for
"Natural language support but running in an English locale"
in the greeting.

Starting with R 2.3.0 (due to updated gettext) the locale is  
determined from the system settings. Previously, the system setting  
was always ignored. Right now it is used if no other locale setting  
is set, such as LANG, LC_ALL etc. If you want to force C locale, you  
should run your scripts using something like
LANG=C R
and this is what most scripts do if they want to force a no-locale  
environment. Note, however, that UTF-8 is required by the system  
utilities, so forcing a non-UTF-8 locale will prevent you from using  
non-ASCII characters in file names, etc. On Mac OS X it is usually  
safer to write everything in UTF-8, including your scripts.
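
As a rough sketch of what "writing it in UTF-8" could look like for the replacement in question: the snippet below assumes a version of R that accepts \u escapes in string literals, and it uses sub() in place of the original substr<- assignment. The data in tmp.un are made up for illustration.

```r
# Hypothetical example data standing in for the original tmp.un vector.
tmp.un <- c("ug/L", "mg/L", "ug/kg")

# "\u00b5" is the Unicode escape for the micro sign, the character that
# the single byte '\265' (0xB5) encodes in Latin-1.
micro <- "\u00b5"

# Replace a leading 'u' with the micro sign. sub() leaves strings that do
# not start with 'u' unchanged, so no logical subsetting is needed.
tmp.fixed <- sub("^u", micro, tmp.un)
```

Because the replacement is given as a Unicode escape and not as a raw byte, the result should not depend on whether the session runs in a UTF-8 or a C locale, although how it is displayed still depends on the terminal.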

Cheers,
Simon