Query about Text Preprocessing (Encoding)

3 messages · Khadija Shakeel, Duncan Murdoch, Michael Dewey


Khadija Shakeel

Sun, May 29, 2016 12:20 AM

I want to work with the Urdu language, but R only displays Urdu text and
can't process it. Specifically, I want to apply the preprocessing steps of
text mining, but R does not respond to this text.
How can I handle this problem?

Here are some pictures of word clouds of the Urdu text.

Khadija Shakeel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cloud.PNG
Type: image/png
Size: 8455 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20160529/930a0413/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CLUSTPLOT_as.matrix_d.PNG
Type: image/png
Size: 24780 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20160529/930a0413/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wordCloud.PNG
Type: image/png
Size: 6824 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20160529/930a0413/attachment-0002.png>

Duncan Murdoch

Sun, May 29, 2016 7:45 AM

On 29/05/2016 3:20 AM, Khadija Shakeel wrote:

R doesn't currently have a translation team (see 
translation.r-project.org) for Urdu, so it may be hard for you to get 
Urdu-specific support.  However, I would guess the problems you are 
having are common to other languages that use non-Roman alphabets, and 
you may get some advice from the translation teams for one of them.

The general issues that I know of are:

  - R needs to know your encoding.  On Unix-alikes the best support is 
for UTF-8; Windows support is weaker, because Windows tends to use 
UTF-16 or other multibyte encodings, and R's support for those is mixed.

  - You need to make sure your graphics device supports your alphabet. 
Not all graphics devices have character support for all languages.
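As a rough illustration of the encoding point above, here is a minimal base-R sketch of common preprocessing steps (punctuation removal, tokenization, stop-word removal, term counts) applied to Urdu text. It assumes the text is UTF-8. The sample sentence, the file name `urdu_corpus.txt` and the two-word stop list are made up for illustration. A real pipeline might instead use a text-mining package such as tm together with a proper Urdu stop-word list.

```r
# Minimal sketch, assuming UTF-8 text. Sample words, file name and
# stop list are hypothetical, for illustration only.

# TRUE if the session's native encoding is UTF-8; if FALSE, R relies on
# strings being explicitly marked as UTF-8.
if (!isTRUE(l10n_info()[["UTF-8"]])) {
  message("Native locale is not UTF-8; keep Urdu strings marked as UTF-8.")
}

# For a real corpus, declare the encoding when reading, e.g.
#   txt <- readLines("urdu_corpus.txt", encoding = "UTF-8")  # hypothetical file
# Here the sample is built from Unicode escapes so this script stays ASCII.
urdu      <- "\u0627\u0631\u062f\u0648"  # "Urdu"
zaban     <- "\u0632\u0628\u0627\u0646"  # "language"
hai       <- "\u06c1\u06d2"              # "is"
aur       <- "\u0627\u0648\u0631"        # "and"
full_stop <- "\u06d4"                    # Urdu full stop (U+06D4)
txt <- paste0(urdu, " ", zaban, " ", hai, full_stop, " ",
              urdu, " ", aur, " ", zaban, full_stop)

txt <- enc2utf8(txt)          # make sure the string is held as UTF-8
stopifnot(validUTF8(txt))

# Characters vs bytes: each Arabic-script letter takes 2 bytes in UTF-8.
nchar(urdu)                   # 4 characters
nchar(urdu, type = "bytes")   # 8 bytes

# 1. Replace punctuation (including Arabic-script marks such as U+06D4 and
#    U+060C) with spaces, using the Unicode property \p{P} via PCRE.
clean <- gsub("\\p{P}", " ", txt, perl = TRUE)

# 2. Tokenize on whitespace and drop any empty strings.
tokens <- strsplit(clean, "\\s+", perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]

# 3. Remove stop words (tiny illustrative list, not a real Urdu stop list).
stop_ur <- c(hai, aur)
tokens <- tokens[!tokens %in% stop_ur]

# 4. Term frequencies, e.g. as input for a word cloud.
freq <- table(tokens)
print(freq)

# Plotting freq additionally needs a graphics device and font that can
# render Arabic script (the second point above); that step is not shown.
```

Keeping every step in UTF-8 means the tokens compare correctly against the stop list regardless of how the console happens to display them.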

Duncan Murdoch

Michael Dewey

Sun, May 29, 2016 8:33 AM

Would it be a good idea to mention Urdu in the subject line, since other
people who deal with Urdu, but not specifically text mining, may be able
to help? I have added it to my reply.

On 29/05/2016 08:20, Khadija Shakeel wrote:

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Michael
http://www.dewey.myzen.co.uk/home.html