Message-ID: <a346979f-22ef-29b1-38d3-d688872c1df3@gmail.com>
Date: 2016-05-29T14:45:53Z
From: Duncan Murdoch
Subject: Query about Text Preprocessing (Encoding)
In-Reply-To: <CADFbrKpQz=EaVDYdzGy9_0J1R6QkM8_GW1_WZNLsBG4PXiaMyg@mail.gmail.com>

On 29/05/2016 3:20 AM, Khadija Shakeel wrote:
> I want to work with the Urdu language, but R only displays Urdu text and
> can't process it. I want to apply the preprocessing steps of text
> mining, but R does not respond to this text.
> How can I handle this problem?
>
> Here are some pictures of a word cloud of Urdu text.
>

R doesn't currently have a translation team (see 
translation.r-project.org) for Urdu, so it may be hard for you to get 
Urdu-specific support.  However, I would guess the problems you are 
having are common to other languages that use non-Roman alphabets, and 
you may get some advice from the translation teams for one of them.

The general issues that I know of are:

  - R needs to know your encoding.  On Unix-alikes the best support is 
for UTF-8; Windows support is weaker, because Windows tends to use 
UTF-16 or other multibyte encodings, and R's support for those is mixed.

  - You need to make sure your graphics device supports your alphabet. 
Not all graphics devices have character support for all languages.
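
As a rough illustration of the two points above, here is a minimal sketch 
in base R. It assumes a short sample string (the word "Urdu" written in 
Urdu, entered with Unicode escapes) and a temporary file; neither comes 
from the original question. It only shows how one might check the session 
encoding, round-trip UTF-8 text through a file, and see whether 
cairo-based graphics devices are available. It is not a complete 
preprocessing pipeline.

```r
# Hypothetical sample text: the word "Urdu" in Urdu script, written with
# \u escapes so this script stays pure ASCII on any platform.
urdu <- "\u0627\u0631\u062f\u0648"

# Show what the current session reports about its locale encoding.
print(l10n_info())

# Show how R has marked the encoding of the string.
print(Encoding(urdu))

# Characters versus bytes: each of these 4 letters takes 2 bytes in UTF-8.
print(nchar(urdu, type = "chars"))
print(nchar(urdu, type = "bytes"))

# Write the text as raw UTF-8 bytes, then read it back while declaring
# the encoding explicitly instead of relying on the locale default.
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(urdu), tmp, useBytes = TRUE)
back <- readLines(tmp, encoding = "UTF-8")
print(enc2utf8(back) == urdu)

# A simple preprocessing step: split a line into tokens on whitespace.
line <- paste(urdu, urdu)
tokens <- strsplit(line, "\\s+")[[1]]
print(length(tokens))

# Check whether cairo-based devices (e.g. cairo_pdf) are available in
# this build of R. The chosen font must still contain the glyphs.
print(capabilities("cairo"))
```

If the round-trip comparison prints FALSE, or the character count looks 
like a byte count, the text was probably read under the wrong encoding, 
and that is worth fixing before trying any text-mining package.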

Duncan Murdoch