Message-ID: <1338303383.17767.1.camel@milan>
Date: 2012-05-29T14:56:23Z
From: Milan Bouchet-Valat
Subject: package tm: reading XML files
In-Reply-To: <4FC482EE.2090608@uu.nl>

On Tuesday 29 May 2012 at 10:03 +0200, Ad Feelders wrote:
> Dear fellow R users,
> 
> I'm using the package tm for text mining, and have a problem with 
> reading in a corpus from XML files.
> When I copy the example from "Introduction to the tm package" of the 
> small reuters subset "crude", everything goes well, and I get a corpus 
> with the required meta data.
> When I read in the entire reuters21578 corpus in XML format however (or 
> a self-created subset thereof) the meta data is lost, and the files are 
> interpreted as plain text.
> I use the following command, where the indicated directory contains all 
> reuters 21578 documents as separate XML files:
> 
>  > reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"), 
> readerContol=list(reader=readReut21578XML))
You have a typo in that command: "readerContol" should be "readerControl".
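
For what it's worth, here is a rough sketch of why the misspelling gives plain text rather than an error. It assumes Corpus() accepts extra arguments through `...`, which the absence of an "unused argument" error suggests; I have not checked the tm source. make_corpus() below is a hypothetical stand-in, not tm's actual code. The corrected call is shown as a comment because it needs tm and your local files.

```r
# Corrected call (needs the tm package and the poster's files, so commented out):
# library(tm)
# reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),
#                        readerControl = list(reader = readReut21578XML))

# Hypothetical stand-in for a constructor with a default reader and `...`.
# This is an illustration only, NOT tm's implementation.
make_corpus <- function(source, readerControl = list(reader = "plain"), ...) {
  # Report which reader would be used.
  readerControl$reader
}

# "readerContol" is not a prefix of "readerControl", so R's partial matching
# does not apply; the argument is absorbed by `...` and the default is kept.
misspelled <- make_corpus("dir", readerContol = list(reader = "xml"))

# With the correct spelling the supplied reader is used.
correct <- make_corpus("dir", readerControl = list(reader = "xml"))

print(misspelled)
print(correct)
```

If tm behaves like this stand-in, the misspelled argument is dropped silently and the default plain-text reader is used, which would match what you are seeing.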


My two cents