
how to read a website with Chinese Character

Thanks a lot. 

y <- iconv(x, "gb2312", "utf-8") does not work but

y <- iconv(x, "gb2312", "UTF8") works on my machine. Thank you for pointing me in the right direction.
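Since the accepted spelling of the target encoding seems to differ between machines, one option is to try several spellings and keep the first that converts. This is only a sketch: try_iconv is a hypothetical helper, not part of base R, and it assumes a failed conversion shows up either as an error from iconv() or as new NA values in the result.

```r
# Hypothetical helper: try each candidate target encoding name in turn
# and return the first conversion that succeeds.
try_iconv <- function(x, from, to_candidates) {
  for (to in to_candidates) {
    # iconv() raises an error when the from/to pair is not supported.
    y <- tryCatch(iconv(x, from, to), error = function(e) NULL)
    # Accept the result only if the conversion introduced no new NAs.
    if (!is.null(y) && !any(is.na(y) & !is.na(x))) {
      return(y)
    }
  }
  stop("none of the candidate target encodings worked")
}

# Small demonstration with a deliberately bogus first candidate.
demo <- try_iconv("abc", "latin1", c("no-such-encoding", "UTF-8"))

# For the page in this thread the call might look like:
# y <- try_iconv(x, "gb2312", c("utf-8", "UTF-8", "UTF8"))
```

iconvlist() may also help here; where available it lists the encoding names that the local iconv knows about.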


-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com] 
Sent: Wednesday, January 23, 2013 6:16 PM
To: Hui Du
Cc: r-help at r-project.org
Subject: Re: [R] how to read a website with Chinese Character
On 13-01-23 8:19 PM, Hui Du wrote:
If you look at the first few lines of x you'll see this:

 > head(x)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[2] "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
[3] "<head>"
[4] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"

At the end of line 4 it shows "charset=gb2312".  I didn't think that was 
an encoding, but this seems to do the conversion:

y <- iconv(x, "gb2312", "utf-8")
y

(I don't know if that will display properly on your Windows machine; it 
doesn't work on mine, because I don't have the fonts installed.  But it 
does work on my Mac.)
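The manual step above (reading the charset off the meta tag, then passing it to iconv) could be sketched roughly as follows. detect_charset is a hypothetical helper and sample_lines is made-up stand-in data modelled on the head(x) output; the regular expression assumes the charset appears as charset=NAME somewhere in the lines, which will not hold for every page.

```r
# Hypothetical helper: return the first charset=NAME value found in a
# character vector of HTML lines, or NA if none is declared.
detect_charset <- function(lines) {
  hits <- regmatches(
    lines,
    regexpr("charset=[\"']?[A-Za-z0-9_-]+", lines, ignore.case = TRUE)
  )
  if (length(hits) == 0) {
    return(NA_character_)
  }
  # Strip the "charset=" prefix (and an optional opening quote).
  sub("charset=[\"']?", "", hits[1], ignore.case = TRUE)
}

# Stand-in for the first lines of the downloaded page.
sample_lines <- c(
  "<head>",
  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"
)

from_enc <- detect_charset(sample_lines)

# With the real page in x, the conversion would then be something like:
# y <- iconv(x, detect_charset(x), "UTF-8")
```

Whether the detected name is accepted as-is by iconv depends on the platform, so it may still need adjusting by hand.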

Duncan Murdoch