Using unicode symbol has unexpected results in levels of factor object
2 messages · Wyatt, Kristin M, Peter Dalgaard
On Aug 9, 2012, at 06:53, Wyatt, Kristin M wrote:
Dear all,
When I use a Unicode symbol in the labels for a factor object, the corresponding level does not display as expected when the factor is printed. However, using levels() on the factor returns the desired output. I noticed the discrepancy when the legend labels from a call to ggplot() did not display the desired symbol, but an explicitly built legend using the same labels did.
Example (I am trying to get the less-than-or-equal-to symbol):
.df <- data.frame(afp = c(0,0,1,1), time=c(0,2,0,1), surv=c(1, 0.5, 1, 0.4))
afpLabels <- c("AFP \u2264 16", "AFP > 16")
afpStrata <- factor(.df$afp, labels=afpLabels)
afpStrata
[1] AFP ≤ 16 AFP ≤ 16 AFP > 16 AFP > 16
Levels: AFP = 16 AFP > 16
The first level is reported as "AFP = 16".
levels(afpStrata)
[1] "AFP ≤ 16" "AFP > 16"
The desired result is produced with levels(). The code below shows this issue in context through calls to ggplot() if you don't mind loading all the libraries.
library(ggplot2)
library(gridExtra)
library(plyr)
ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0)
ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0) +
  scale_colour_hue(breaks=afpLabels, labels=afpLabels)
I am running a pre-compiled version of R on Windows 7 (64-bit).
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
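One way to confirm that the stored level is intact, and that only its display is affected, is to inspect the string without relying on how the console renders it. The following is a minimal sketch using the labels from the example above; the variable names level_enc, code_points and has_le_sign are introduced here purely for illustration.

```r
# Rebuild the factor from the example above
afpLabels <- c("AFP \u2264 16", "AFP > 16")
afpStrata <- factor(c(0, 0, 1, 1), labels = afpLabels)

# Declared encoding of each level ("unknown" is what R reports for plain ASCII)
level_enc <- Encoding(levels(afpStrata))

# Unicode code points of the first level; these do not depend on the console
code_points <- utf8ToInt(enc2utf8(levels(afpStrata)[1]))

# TRUE if U+2264 (less-than-or-equal-to) is still stored in the level
has_le_sign <- 0x2264 %in% code_points

print(level_enc)
print(has_le_sign)
```

If has_le_sign is TRUE, the factor holds the intended character and the problem lies in how the level is converted for printing.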
For whatever it is worth, this works fine (both examples) under OSX Snow Leopard.
Looking at the code for print.factor, I would strongly suspect that the culprit is the line
n <- length(lev <- encodeString(levels(x), quote = ifelse(quote,
"\"", "")))
which figures since you are in a .1252 locale, not .utf8 (or UTF-8 or ...).
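A rough way to check this suspicion on a given machine is to look at the session's character-type locale and at what encodeString() returns for the label. This is only a diagnostic sketch; the names ctype, is_utf8, encoded and unchanged are made up for illustration, and the result depends on the platform and R version.

```r
# Character-type locale of the current session
ctype <- Sys.getlocale("LC_CTYPE")

# TRUE when the session runs in a UTF-8 locale
is_utf8 <- l10n_info()[["UTF-8"]]

# The label from the example, passed through the same function
# that print.factor applies to the levels
lab <- "AFP \u2264 16"
encoded <- encodeString(lab)

# FALSE here would mean encodeString() altered the string for this locale
unchanged <- identical(encoded, lab)

print(ctype)
print(is_utf8)
print(unchanged)
```

In a non-UTF-8 locale the encoded string may differ from the input, which would be consistent with the altered "Levels:" line.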
Over to the Windows/locale/charset experts...
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com