
Help retrieving only Portuguese words from a file

On Tue, May 28, 2013 at 5:02 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
Is there any structure to the text? If it has complete paragraphs in
one of the three languages then you can probably make a better guess
of the language of the paragraph from the presence of key words. I
wonder if some of the code for detecting spam can help you here...
Train it on some known Portuguese, Spanish, and English text...
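
As a rough illustration of the spam-filter idea, here is a minimal sketch of a word-based naive Bayes guesser in Python (not R, and not code from this thread). The three training sentences, the names TRAINING_TEXT, tokenize, train and guess_language, and the add-one smoothing are all assumptions made for the example; a real classifier would need far more known text per language.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny training samples (assumption: real use
# needs a much larger body of known text for each language).
TRAINING_TEXT = {
    "portuguese": "o gato não está em casa porque ele foi para a rua com você e uma amiga",
    "spanish": "el gato no está en casa porque él se fue a la calle con usted y una amiga",
    "english": "the cat is not at home because he went to the street with you and a friend",
}


def tokenize(text):
    """Lower-case the text and split it into runs of letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def train(samples):
    """Count word frequencies per language and collect the shared vocabulary."""
    counts = {lang: Counter(tokenize(text)) for lang, text in samples.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def guess_language(paragraph, counts, vocab):
    """Return the language whose word counts give the highest log-likelihood."""
    words = tokenize(paragraph)
    scores = {}
    for lang, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one smoothing so an unseen word does not zero out the score.
        scores[lang] = sum(
            math.log((word_counts[w] + 1) / (total + len(vocab) + 1))
            for w in words
        )
    return max(scores, key=scores.get)


counts, vocab = train(TRAINING_TEXT)
print(guess_language("o gato foi para a rua", counts, vocab))
```

The guess is made per paragraph rather than per word, which is why this only helps when the text has some structure: a single isolated word gives the scores very little to work with.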

If it's just a stream of words from the three languages in random
order, then it is difficult or impossible.

Barry
Message-ID: <CANVKczNchA7cH9MAVCuHBuogrSfummbqfWCphOb_X=m7-upRBg@mail.gmail.com>
In-Reply-To: <a0a55aa75bb44b4b819242f092e6e1a3@EX-0-HT0.lancs.local>