changing names with different character sets
On Feb 19, 2012, at 08:49 , Prof Brian Ripley wrote:
On 19/02/2012 07:30, Erin Hodgess wrote:
Dear R People: I'm trying to replicate something that I saw on an R blog. The first step is to load in the .rda file, which is fine. However, some of the names of the columns in the data frame have special characters, accents, and such.
Most of the world think characters with accents are normal, not special. The difference for R is going to be whether they are alphanumeric or not.
How do I get around this on a basic keyboard, please?
Copy-and-paste from names(dataframe) may work. But without an example, or knowing your OS or your locale (though I remember you are in the US), it is hard to tell. The main issue is that what R regards as a valid name (aka symbol) depends on the locale, and so, strictly, in a US locale no non-ASCII characters are valid in names. In practice US locales tend to be set up either for a Western European character set (Latin-1, cp1252) or so that all alphanumeric Unicode characters in a human language are regarded as alphanumeric.
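To see how the current session treats such names, a sketch along these lines may help. The two column names are illustrative and written with \u escapes so the script itself stays plain ASCII; the check relies on make.names() returning syntactically valid names unchanged. All object names here are made up for the example.

```r
# Illustrative names containing non-ASCII letters (assumed, not from the .rda file)
nms <- c("\u00c6blefl\u00f8de", "Bl\u00e5b\u00e6rgr\u00f8d")
d <- data.frame(1:2, 3:4)
names(d) <- nms

# Report the character-type locale that R is running in
print(Sys.getlocale("LC_CTYPE"))

# make.names() leaves valid names alone, so TRUE here means the name
# is accepted as a symbol in this locale; the result is locale-dependent
valid <- make.names(nms) == nms
print(valid)

# Regardless of the locale, columns can be reached by position or by string,
# which avoids typing the accented characters at all
first_col  <- d[[1]]
second_col <- d[[nms[2]]]
```

Indexing by position or via names(d) is the programmatic equivalent of the copy-and-paste suggestion: the name never has to be entered from the keyboard.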
You could consider a strategy like this:
    > d <- data.frame(Æblefløde=1:2, Blåbærgrød=3:4)
    > d
      Æblefløde Blåbærgrød
    1         1          3
    2         2          4
    > names(d)
    [1] "Æblefløde"  "Blåbærgrød"
    > iconv(names(d),to="ASCII//TRANSLIT")
    [1] "AEbleflode"  "Blabaergrod"
    > names(d) <- iconv(names(d),to="ASCII//TRANSLIT")
    > d
      AEbleflode Blabaergrod
    1          1           3
    2          2           4

(If the characters don't display correctly to begin with, you may need to figure out the appropriate from= argument to iconv() as well.)
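As a rough sketch of the from= point: if the names arrive as Latin-1 bytes, they can be declared as such when converting. The helper ascii_names() below is hypothetical (not part of R), and the exact result of "ASCII//TRANSLIT" depends on the platform's iconv implementation, so the transliterated spellings may differ from those shown in this thread.

```r
# A name stored as raw Latin-1 bytes (assumed input, for illustration only)
raw_latin1 <- "Bl\xe5b\xe6rgr\xf8d"

# Declare the source encoding explicitly so the bytes are interpreted correctly
fixed <- iconv(raw_latin1, from = "latin1", to = "UTF-8")

# Hypothetical helper: transliterate to ASCII, then force valid, unique names
ascii_names <- function(x, from = "") {
  out <- iconv(x, from = from, to = "ASCII//TRANSLIT")
  # iconv() returns NA where conversion fails; keep the original name there
  bad <- is.na(out)
  out[bad] <- x[bad]
  # make.names() replaces any remaining invalid characters with "."
  make.names(out, unique = TRUE)
}

nms <- c("\u00c6blefl\u00f8de", "Bl\u00e5b\u00e6rgr\u00f8d")
res <- ascii_names(nms, from = "UTF-8")
print(res)
```

Running the result through make.names() is a precaution: some iconv implementations substitute "?" for characters they cannot transliterate, and that would not be a valid name either.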
Thanks, Erin
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com