parsing pdf files

[copied to list for posterity...]


Sorry, I was completely wrong. I've been using iText to split PDFs, fill
in forms and recombine them, so I assumed (wrongly) that text extraction
was possible too.

In fact, reading the mailing lists is quite informative - clearly PDF
is not designed for this.

Try this

http://pdfbox.apache.org/commandlineutilities/ExtractText

It can be run from the command line, so it could potentially be automated.
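
As a rough idea of what that automation might look like, here is a minimal
sketch in Python that calls ExtractText once per PDF in a directory. It is
untested against any real setup and makes several assumptions: that java is
on the PATH, that the PDFBox "app" jar has been downloaded to the
hypothetical path pdfbox-app.jar, and that the jar accepts the older
"ExtractText <input> <output>" form of the command (newer PDFBox releases
may spell this differently, so check the page above for your version). The
function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: hypothetical location of the PDFBox "app" jar; adjust as needed.
PDFBOX_JAR = Path("pdfbox-app.jar")


def build_extract_command(pdf_path, txt_path, jar_path=PDFBOX_JAR):
    """Build the ExtractText command line for a single PDF.

    Assumes the "java -jar <jar> ExtractText <input> <output>" form.
    """
    return [
        "java", "-jar", str(jar_path),
        "ExtractText", str(pdf_path), str(txt_path),
    ]


def extract_directory(pdf_dir, jar_path=PDFBOX_JAR):
    """Run ExtractText on every *.pdf in pdf_dir, writing a .txt beside each."""
    outputs = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        # check=True raises CalledProcessError if the java process fails.
        subprocess.run(
            build_extract_command(pdf_path, txt_path, jar_path), check=True
        )
        outputs.append(txt_path)
    return outputs
```

Calling extract_directory("invoices") would then leave one .txt file next to
each PDF in that (hypothetical) directory. How usable the extracted text is
will still depend on how each PDF was produced.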

Mark

2010/1/10 Mark Wardle <mark at wardle.org>: