Hi there,

I tried to export the names of countries to a CSV file with write.csv(). In the resulting file, Åland was converted to <c5>land. Is there any way to prevent this from happening? Thanks!

> abc
[1] "Åland"
> write.table(abc, file = "")
"x"
"1" "<c5>land"

Best,
Jinsong
write.csv converts Åland to <c5>land
10 messages · Dr Eberhard W Lisse, Jinsong Zhao, John Kane +2 more
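Typing Å directly into a script ties the example to the encoding of the script file. Below is a minimal self-contained version of the example from the first post. It assumes the name is entered with a `\u` escape instead, and it uses a data frame `abc` with a `name` column, mirroring a later message in the thread. It prints what R actually stores before writing. The output of the final `write.csv()` call depends on the session's locale.

```r
# Enter the A-ring (U+00C5) as a Unicode escape so the script itself stays plain ASCII
abc <- data.frame(name = c("\u00c5land", "Afghanistan"), stringsAsFactors = FALSE)

# What does R hold? The declared encoding of each element, and the raw bytes of the first
print(Encoding(abc$name))
print(charToRaw(abc$name[1]))

# The call from the post; how the name comes out depends on the session locale
write.csv(abc, file = "")
```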
?file.write() look for fileEncoding? el
On 20/10/2020 11:13, Jinsong Zhao wrote:
[...]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Dr. Eberhard W. Lisse, Obstetrician & Gynaecologist
el at lisse.NA, Telephone: +264 81 124 6733 (cell)
PO Box 8421 Bachbrecht, 10007, Namibia
If this email is signed with GPG/PGP, Sect 20 of Act No. 4 of 2019 may apply
On 2020/10/20 17:23, Dr Eberhard W Lisse wrote:
?file.write() look for fileEncoding? el
There is no file.write(). I have tried fileEncoding = "utf8" and "latin1" in write.csv(). However, it has no effect. The output is <U+00C5>land or <c5>land.

Best,
Jinsong
On 20/10/2020 11:13, Jinsong Zhao wrote:
[...]
Apologies, I meant ?write.table() el
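One way to check whether `fileEncoding` helps is to write to a temporary file and inspect the bytes directly, bypassing the console. The sketch below assumes UTF-8 as the target encoding. In a UTF-8 locale the file should contain the two bytes C3 85 for Å. In a locale whose code page lacks Å, as reported in this thread, R may already have substituted the character before the connection re-encodes anything.

```r
abc <- data.frame(name = c("\u00c5land", "Afghanistan"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# fileEncoding makes write.csv() open a connection that re-encodes its output
write.csv(abc, path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back as raw bytes so no further re-encoding can hide the result
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)
```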
On 20/10/2020 12:38, Jinsong Zhao wrote:
[...]
Perhaps ?readr::write_delim() el
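Eberhard's readr suggestion can be tested the same way. This is a minimal sketch that assumes readr is installed; the code skips the write otherwise. readr documents its writers as producing UTF-8 output regardless of the locale. Calling `write_delim()` with `delim = ","` gives a CSV without row names and without quotes around fields that do not need them.

```r
abc <- data.frame(name = c("\u00c5land", "Afghanistan"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

if (requireNamespace("readr", quietly = TRUE)) {
  # readr's writers encode their output as UTF-8; delim = "," gives CSV
  readr::write_delim(abc, path, delim = ",")
  # Inspect the bytes on disk rather than trusting the console display
  bytes <- readBin(path, what = "raw", n = file.size(path))
  print(bytes)
}
```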
On 20/10/2020 12:45, Dr Eberhard W Lisse wrote:
[...]
Hi there,
Why is the same string displayed in different forms?
> abc[,1]
[1] "Åland"       "Afghanistan"
> abc
name
1 <c5>land
2 Afghanistan
And more...
> dput(abc, "aa.txt")
> dget("aa.txt")
name
1 <c5>land
2 Afghanistan
> dget("aa.txt")[,1]
[1] "<c5>land" "Afghanistan"
Best,
Jinsong
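The `<c5>` in that output is the single byte 0xC5, which is how Latin-1 stores Å. The diagnostic below is a hedged sketch. It assumes the `name` column holds a Latin-1 string, which is a guess from the `<c5>` and is not stated in the thread. `Encoding()` and `charToRaw()` show what R actually stores. Re-marking the column as UTF-8 with `enc2utf8()` is a common first step before writing, though whether the character survives output still depends on the writer and the locale.

```r
# Hypothetical stand-in for the poster's data: U+00C5 stored as the Latin-1 byte 0xC5
x <- iconv("\u00c5land", from = "UTF-8", to = "latin1")
abc <- data.frame(name = c(x, "Afghanistan"), stringsAsFactors = FALSE)

# Declared encoding of each element ("latin1" for the first, "unknown" for plain ASCII)
print(Encoding(abc$name))
# Raw bytes of the first element; the leading byte is the c5 seen in <c5>land
print(charToRaw(abc$name[1]))

# Convert to UTF-8; the non-ASCII element is now declared "UTF-8"
abc$name <- enc2utf8(abc$name)
print(Encoding(abc$name))
```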
On 2020/10/20 17:13, Jinsong Zhao wrote:
[...]
It looks like an encoding problem. It works fine for me with R's encoding set to UTF-8. Here is part of my sessionInfo() results:

[1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8

I would suggest issuing the command

sessionInfo()

and seeing what your encoding is.
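Besides `sessionInfo()`, two base functions report just the character-handling part of the locale. `Sys.getlocale("LC_CTYPE")` gives the locale name; on the poster's Windows machine this would be the `China.936` entry shown later in the thread. `l10n_info()` reports whether R treats that locale as UTF-8. The sketch only prints these values; their contents vary by machine.

```r
# Name of the locale used for character handling, e.g. "en_CA.UTF-8"
print(Sys.getlocale("LC_CTYPE"))

# l10n_info() reports whether R treats that locale as UTF-8 (and as multibyte / Latin-1)
info <- l10n_info()
print(info[["UTF-8"]])
```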
On Tue, 20 Oct 2020 at 08:22, Jinsong Zhao <jszhao at yeah.net> wrote:
[...]
John Kane
Kingston ON Canada
You don't say, but I'd guess you're using Windows. In your code page, the character Å is probably not representable. At some point in the sequence of operations involved in printing the dataframe, R puts the string into the native encoding, and since that's impossible on your system, it substitutes the <c5> instead. The fact that you can sometimes display it is because internally R uses UTF-8 as much as it can, and it can represent the character.

One fix for this is to switch from Windows to some other OS. The others all have proper support for UTF-8. You might have luck changing your Windows code page to one that includes the Å, but then there'll be some other characters that are missed.

You should definitely investigate Eberhard's advice, and test non-base packages like readr. They are all written much more recently than the base functions, and might have proper support for out-of-code-page characters.

Duncan Murdoch
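`iconv()` offers the same kinds of substitution described above, so it can make the mechanism concrete. The sketch uses ASCII as a stand-in for a code page without Å; this is an assumption for portability, since the poster's code page was CP936. Converting the Latin-1 form with `sub = "byte"` should give `<c5>land`, the text in the original post. Converting the UTF-8 form with `sub = "Unicode"` should give `<U+00C5>land`, the other form the poster reported. This illustrates the substitution only. It is not necessarily the exact code path that `print()` or `write.csv()` take.

```r
utf8_name   <- "\u00c5land"
latin1_name <- iconv(utf8_name, from = "UTF-8", to = "latin1")

# ASCII stands in for a native code page that has no U+00C5.
# sub = "byte": each unconvertible byte is written as <xx> in hex
print(iconv(latin1_name, from = "latin1", to = "ASCII", sub = "byte"))
# sub = "Unicode": the unconvertible character is written as <U+xxxx>
print(iconv(utf8_name, from = "UTF-8", to = "ASCII", sub = "Unicode"))
# With no sub, an element that cannot be converted becomes NA
print(iconv(utf8_name, from = "UTF-8", to = "ASCII"))
```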
On 20/10/2020 8:20 a.m., Jinsong Zhao wrote:
[...]
Thank you very much for the hint. I tried it on a FreeBSD machine with the locale set to en_US.UTF-8, and it works fine. However, on my Windows machine:

> Sys.getlocale()
[1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"

it behaves exactly as I posted.

BTW, I cannot understand why a string can be displayed differently as a vector and as a data frame.

Best,
Jinsong
On 2020/10/20 21:56, John Kane wrote:
[...]
Hi,

One additional option that you might want to look at is to use ?writeLines with 'useBytes = TRUE', where the default is FALSE.

Windows, as Duncan notes, is problematic with extended encodings, and you can actually get conflicting encodings of text, depending on what is used within R versus the local system encoding set by the OS.

There is an added step of complexity with writeLines(): you have to pre-format the line(s) to be output to conform to the required CSV formatting. So you would need to paste() together each output line first, using field delimiters, double quotes, etc., prior to output. Essentially, mimic the default formatting of write.csv() on a line-by-line basis, and then output the resulting object to a text file with a single call to writeLines().

Regards,

Marc Schwartz
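The writeLines() approach can be sketched as follows. This is a minimal version that assumes UTF-8 output is wanted. It roughly mimics the default write.csv() formatting: character and factor fields are double-quoted, embedded quotes are doubled, and NA is left bare. Unlike write.csv(), it writes no row names. The helper names `csv_quote` and `write_csv_utf8` are hypothetical. Every string is converted with `enc2utf8()` first. `useBytes = TRUE` then writes those bytes as they are, instead of re-encoding them to the native code page. Numbers are formatted with `as.character()`, which may differ from write.csv() in edge cases.

```r
# Hypothetical helper: quote a character vector the way write.csv() does by
# default (wrap in double quotes, double any embedded quotes, leave NA bare)
csv_quote <- function(x) {
  ifelse(is.na(x), "NA", paste0('"', gsub('"', '""', x, fixed = TRUE), '"'))
}

# Hypothetical helper: write a data frame as UTF-8 CSV without row names
write_csv_utf8 <- function(df, path) {
  fields <- lapply(df, function(col) {
    if (is.character(col) || is.factor(col)) {
      csv_quote(enc2utf8(as.character(col)))
    } else {
      as.character(col)
    }
  })
  header <- paste(csv_quote(enc2utf8(names(df))), collapse = ",")
  # Paste the columns together row by row; unname() keeps column names
  # from being mistaken for arguments of paste()
  body <- do.call(paste, c(unname(fields), sep = ","))
  # useBytes = TRUE writes the UTF-8 bytes unchanged
  writeLines(enc2utf8(c(header, body)), path, useBytes = TRUE)
}

abc <- data.frame(name = c("\u00c5land", "Afghanistan"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write_csv_utf8(abc, path)
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)
```

On Windows, a file connection in text mode writes CRLF line endings, so the bytes there will differ in the line terminators.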
On Oct 20, 2020, at 10:28 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
[...]