
Please guide -- UTF-8 locale setting fails on Windows on writing

Milan,
OK, let me take the case of Facebook. I used the Rfacebook package
to get posts (getPost()), which returns a list() of data frames (post,
comments, likes).

Let me demonstrate two cases, one write and one read, as you suggested.
Case 1:
Let's say one of the Facebook comments has the string value below, in
Japanese:
"?????? - ???????? ?????"

At the R console I assign the above string to a variable:
x <- "?????? - ???????? ?????"
and write it out as below:
write.csv(x, file='x.csv', row.names=F, fileEncoding='UTF-8')
I get this string in the file:
""<U+4E16><U+754C><U+9910><U+798F><U+4E8B><U+5DE5> -
<U+9910><U+5EF3><U+54E1><U+5DE5><U+6C92><U+7CBE><U+6253><U+91C7> "
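
For comparison, here is a minimal sketch of one way to get the raw UTF-8 bytes onto disk without going through the native Windows encoding. It assumes the <U+XXXX> escapes appear because the text is converted to the native code page before writing, which I have not confirmed for this exact case. The string is rebuilt from the first six code points shown above, and the temporary file is only a placeholder.

```r
# Rebuild the first six characters from the code points in the output above
x <- intToUtf8(c(0x4E16, 0x754C, 0x9910, 0x798F, 0x4E8B, 0x5DE5))

# Placeholder output file (assumption: any writable path works the same way)
out_file <- tempfile(fileext = ".txt")

# Open in binary mode and write the UTF-8 bytes untouched (useBytes = TRUE
# asks writeLines not to re-encode the string to the native encoding)
con <- file(out_file, open = "wb")
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read the file back, declaring the bytes to be UTF-8
y <- readLines(out_file, encoding = "UTF-8")
same <- identical(utf8ToInt(x), utf8ToInt(y))
```

If `same` is TRUE, the characters made the round trip as UTF-8 and the file can be checked in an editor that understands UTF-8.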

Case 2:
I create a file 'x.txt' in Notepad, save the Japanese string "?????? - ???????? ?????"
in it, and read it as below:
read.table('x.txt', fileEncoding='UTF-8')
I get the output below:

  V1
1  ?
Warning messages:
1: In read.table("x.txt", fileEncoding = "UTF-8") :
  invalid input found on input connection 'x.txt'
2: In read.table("x.txt", fileEncoding = "UTF-8") :
  incomplete final line found by readTableHeader on 'x.txt'
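
A rough sketch of a read that only declares the bytes as UTF-8 instead of converting them to the native encoding is below. It assumes Notepad saved the file as UTF-8 with a leading byte-order mark, which I have not verified for my file. The file is simulated here with a placeholder temporary path and two of the characters from Case 1.

```r
# Simulate a Notepad-style file: BOM + UTF-8 text + newline (assumed layout)
txt <- intToUtf8(c(0x4E16, 0x754C))
in_file <- tempfile(fileext = ".txt")   # placeholder path
con <- file(in_file, open = "wb")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(txt), as.raw(0x0A)), con)
close(con)

# Read the lines and mark them as UTF-8; no conversion to the locale happens
lines <- readLines(in_file, encoding = "UTF-8")

# Drop a leading byte-order mark (U+FEFF) if one is present
bom <- intToUtf8(0xFEFF)
lines[1] <- sub(paste0("^", bom), "", lines[1])

code_points <- utf8ToInt(lines[1])
```

Checking `code_points` rather than the printed string avoids depending on what the Windows console is able to display.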

The above was only for demonstration. I'm in fact reading extracted
social media data, which ultimately comes through the httr package
somewhere and is returned as data frames.
I'm not sure how I should get this handled on Windows, as I don't observe
this behavior on a Mac, where the system locale is set to 'en_US.UTF-8'.
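
To narrow down the difference between the two machines, a small sketch like the following may help. It reports the locale of the session and shows how a string that arrives as plain UTF-8 bytes can be marked as UTF-8. The byte values are a stand-in for downloaded text and are an assumption, not actual output from httr or Rfacebook.

```r
# What the session thinks its character set is
ctype <- Sys.getlocale("LC_CTYPE")
is_utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])

# Stand-in for text that arrived as raw UTF-8 bytes (U+4E16 U+754C)
bytes <- as.raw(c(0xE4, 0xB8, 0x96, 0xE7, 0x95, 0x8C))
s <- rawToChar(bytes)
before <- Encoding(s)      # no encoding mark yet

# Declare the bytes to be UTF-8 without changing them
Encoding(s) <- "UTF-8"
after <- Encoding(s)
code_points <- utf8ToInt(s)
```

On the Mac I would expect `is_utf8_locale` to be TRUE, while a Windows session with a non-UTF-8 code page would report FALSE, which might explain why the same code behaves differently.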

Regards,
Sunny
On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote: