This has already been corrected in R-devel. It was wrong to set the encoding to that of the element of 'x': gsub will have changed it (to native or UTF-8).
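For anyone wanting to check the behaviour on their own build, here is a minimal sketch. It assumes a latin1-marked input built with a hex escape; the names y_default, y_perl, y_fixed and report are only illustrative. The declared encoding of the results depends on the R version and the locale, so no output is shown.

```r
# Build "ab" + byte 0xE4 + "z" and declare it as latin1.
# (Illustrative input; 0xE4 is a-umlaut in latin1.)
x <- "ab\xe4z"
Encoding(x) <- "latin1"

# Remove the ASCII "z" so the non-ASCII byte survives in the result.
y_default <- gsub("z", "", x)
y_perl    <- gsub("z", "", x, perl = TRUE)
y_fixed   <- gsub("z", "", x, fixed = TRUE)

# Hypothetical helper: show the declared encoding and the raw bytes,
# which together tell whether the mark matches the actual content.
report <- function(s) list(encoding = Encoding(s), bytes = charToRaw(s))

print(report(x))
print(report(y_default))
print(report(y_perl))
print(report(y_fixed))
```

If gsub has converted the string internally, the bytes of the result will differ from those of the input, which is why copying the input's mark onto the result is not safe.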
On Mon, 17 Mar 2008, christian.buchta at wu-wien.ac.at wrote:
Hi,

Could this be an oversight?

R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
x <- "ab?"
Encoding(x)
[1] "latin1"
Encoding(gsub("?","", x))
[1] "unknown"
Encoding(gsub("?","", x, perl = TRUE))
[1] "latin1"

The code in src/main/pcre.c (see also do_tolower and do_strsplit in src/main/character.c) suggests a patch along the lines of the one attached. With the patch applied:
x <- "ab?"
Encoding(gsub("?","", x))
[1] "latin1"

Happy Easter
Christian

-- 
Christian Buchta
Institute for Tourism and Leisure Studies
Vienna University of Economics and Business Administration
Vienna, Austria, Europe
Visit us on http://www.wu-wien.ac.at/itf/

Attachment: patch_44783

Index: src/main/character.c
===================================================================
--- src/main/character.c (revision 44783)
+++ src/main/character.c (working copy)
@@ -1281,7 +1281,7 @@
                     strcat(u, t);
                 } while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
                 strcat(u, s);
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
             }
         } else {
@@ -1337,7 +1337,7 @@
                 for (j = offset ; s[j] ; j++)
                     *u++ = s[j];
                 *u = '\0';
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
             }
         }
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595