An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20080515/6e11dbdf/attachment.pl>
basename/dirname produce incorrect results
4 messages · ronggui, Brian Ripley
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20080515/d3b222fe/attachment.pl>
1 day later
I'm sorry, but that example makes no sense to me -- you need to mark the encoding (and don't send HTML that will get stripped). This is presumably Windows, given the name.
On Thu, 15 May 2008, ronggui wrote:
The incorrect result occurs when the file path contains Chinese characters. It seems that dirname/basename operate on bytes instead of characters, so the result in the following example is half of what is expected.
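The distinction the report draws between bytes and characters can be checked directly. The following is a minimal sketch, not a statement about how dirname/basename work internally. It assumes the path from the report, rewritten with \u escapes so the script stays pure ASCII, and with forward slashes (an assumption made here so the path splits the same way on every platform).

```r
# Path from the report, written with \u escapes so it survives plain-ASCII mail.
# Forward slashes are an assumption made here for portability.
g <- "d:/\u5982\u679c\u542b\u6709\u4e2d\u6587/\u5982\u679c\u542b\u6709\u4e2d\u6587.txt"

Encoding(g)                # strings built from \u escapes are marked "UTF-8"
nchar(g, type = "chars")   # number of characters in the path
nchar(g, type = "bytes")   # number of bytes in the stored (UTF-8) form

# The two functions from the report; in a fixed R each result should keep
# all six Chinese characters of its path component.
dirname(g)
basename(g)
```

Here the path has 20 characters but 44 bytes in UTF-8, so any code that confuses the two counts would cut the string short.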
No, it works in widechars, that is UCS-2. I have a suspicion of what the problem is (it is related to attempts to handle embedded nuls), so please try tomorrow's R-patched to see if I have fixed it. If that does not work, we need a reproducible example, and that means a message in a known encoding. (One way to do so is to attach a plain text message, and tell us in the body of the message the encoding you used.)
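One way to follow the advice about a known encoding is to write the example to a plain text file as UTF-8 and state that encoding in the message body. The sketch below assumes a temporary file and a shortened, hypothetical path; neither comes from the original report.

```r
# Hypothetical example path; \u escapes keep this script pure ASCII.
g <- "d:/\u5982\u679c/\u4e2d\u6587.txt"

# Write the string as UTF-8 bytes, bypassing any locale re-encoding.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "w")
writeLines(enc2utf8(g), con, useBytes = TRUE)
close(con)

# Read it back, declaring the encoding the file was written in.
back <- readLines(tmp, encoding = "UTF-8")
identical(back, g)   # the round trip should preserve the string
Encoding(back)       # the declared encoding is carried on the string
```

The file in `tmp` can then be attached to the message, with "UTF-8" named in the body as the encoding used.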
g <- "d:\\如果含有中文\\如果含有中文.txt"
dirname(g)
[1] "d:/如果含"
basename(g)
[1] "如果含有"
--
HUANG Ronggui, Wincent
http://ronggui.huang.googlepages.com/
Bachelor of Social Work, Fudan University, China
Master of sociology, Fudan University, China
Ph.D. Candidate, CityU of HK.
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
1 day later
I have realized the problem with reporting non-English characters this way; I just don't know a better way to do it. Sorry for that. Now, the problem reported has been fixed (in version (2008-05-15 r45703)). Thanks for your work. Best
HUANG Ronggui, Wincent http://ronggui.huang.googlepages.com/ Bachelor of Social Work, Fudan University, China Master of sociology, Fudan University, China Ph.D. Candidate, CityU of HK.