How to read a website with Chinese characters
On 13-01-23 8:19 PM, Hui Du wrote:
Hi all,

I am planning to parse some information on a website which includes lots of Chinese characters. Does someone know how to read/display Chinese in R? Thanks.

    url = "http://www.teec.org.cn/html/renwujieshao/"
    x = readLines(url)
If you look at the first few lines of x you'll see this:

    > head(x)
    [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
    [2] "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
    [3] "<head>"
    [4] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"

At the end of line 4 it shows "charset=gb2312". I didn't think that was an encoding, but this seems to do the conversion:

    y <- iconv(x, "gb2312", "utf-8")
    y

(I don't know if that will display properly on your Windows machine; it doesn't work on mine, because I don't have the fonts installed. But it does work on my Mac.)

Duncan Murdoch
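The step described in the reply (spot the charset in the page header, then pass it to iconv) can be sketched without network access. This is only an illustration: detect_charset() is a hypothetical helper, not a base R function, the header lines are copied from the head(x) output above, and the four bytes stand in for one line of the real page.

```r
# Sample header lines copied from the head(x) output in the thread.
head_lines <- c(
  "<head>",
  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"
)

# detect_charset() is a hypothetical helper for illustration only.
# It returns the first charset name found in the lines, or NA if none.
detect_charset <- function(lines) {
  pattern <- "charset=[\"']?[A-Za-z0-9_-]+"
  hits <- regmatches(lines, regexpr(pattern, lines, ignore.case = TRUE))
  if (length(hits) == 0) return(NA_character_)
  sub("charset=[\"']?", "", hits[1], ignore.case = TRUE)
}

charset <- detect_charset(head_lines)

# Stand-in for one line of the page: two Chinese characters as GB2312 bytes.
gb_line <- rawToChar(as.raw(c(0xD6, 0xD0, 0xCE, 0xC4)))

# Convert from the detected charset to UTF-8, as in the reply.
utf8_line <- iconv(gb_line, from = charset, to = "UTF-8")
```

With the real page, x from readLines(url) would take the place of gb_line. Whether the result displays correctly still depends on the fonts and locale of the machine, as noted in the reply.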
I tried encoding = 'UTF-8' already but it didn't help. My R version is

    $platform
    [1] "i386-pc-mingw32"
    $arch
    [1] "i386"
    $os
    [1] "mingw32"
    $system
    [1] "i386, mingw32"
    $status
    [1] ""
    $major
    [1] "2"
    $minor
    [1] "15.0"

HXD
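A likely reason encoding = 'UTF-8' did not help: according to the readLines documentation, that argument only marks the strings as being in the given encoding and does not re-encode the input. The sketch below imitates that offline by setting the encoding mark by hand on GB2312 bytes; the byte values are an illustrative stand-in for page content, and validUTF8() assumes R 3.3.0 or later.

```r
# Two Chinese characters as GB2312 bytes, standing in for a line of the page.
gb_text <- rawToChar(as.raw(c(0xD6, 0xD0, 0xCE, 0xC4)))

# Marking: declare the string to be UTF-8 without touching its bytes.
# This imitates the effect of readLines(url, encoding = "UTF-8").
marked <- gb_text
Encoding(marked) <- "UTF-8"

# Converting: actually translate the bytes from GB2312 to UTF-8.
converted <- iconv(gb_text, from = "GB2312", to = "UTF-8")

same_bytes   <- identical(charToRaw(marked), charToRaw(gb_text))  # bytes unchanged by marking
marked_ok    <- validUTF8(marked)     # the mislabelled bytes are not valid UTF-8
converted_ok <- validUTF8(converted)  # the converted string is valid UTF-8
```

Re-encoding can also be requested on the connection itself, for example url(u, encoding = "GB2312"), but that converts to the session's native encoding, so it may not help in a locale that cannot represent Chinese characters.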
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.