PDF Reader
Thanks Brian and Adrian for your helpful suggestions. pdf2txt looks like it might do the trick (especially with that great wrapper you put on it, Adrian). I've found many hedge fund managers reluctant to give data out in forms other than PDF because they feel PDFs help them prevent redistribution... maybe I should be pushing harder.

Thanks again,
Ben

-----Original Message-----
From: Brian G. Peterson [mailto:brian at braverock.com]
Sent: Friday, July 10, 2009 1:57 PM
To: Chiquoine, Ben
Cc: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] PDF Reader

Ben,

I wouldn't really consider this the appropriate forum for your query, but I'll answer it anyway, with emphasis on the finance-specific bits.

There has existed for many years a utility called "pdf2txt". Note that this will extract text from a PDF, but may not do a great job of maintaining the column structure. In the past, I have had to resort to perl, php, or python to use regular expression matching to put the data into a tabular format that would be suitable for analysis in R or some other processing environment.

Also, most fund managers, trustees, administrators, markets, brokerages, etc. do have better data formats available for their investors/clients. Call them up and tell them that you need the data in machine-readable form, whether CSV, fixed width, Excel, whatever. Almost all of your sources should be able to provide this, though it may take some work. You may not get to choose the format, but any machine-readable format should be coercible into R or other analysis environments.

Regards,

- Brian
Chiquoine, Ben wrote:
Hi, First let me apologize if this is the wrong venue for this question...
I work for a small financial company and we often receive statements that are in PDF form. Pulling the data from these can be quite time-consuming, and I'm wondering if anyone on the list knows of a way to read a PDF in as text in R. I know that Google has come out with a few tools that allow you to search the text of PDFs, which has given me hope that something along these lines may be possible, but I've been unable to find any R documentation on inputting data from PDFs. Any thoughts/suggestions would be much appreciated.
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
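
For anyone landing on this thread later, the regular-expression step Brian describes might look roughly like the Python sketch below. It assumes the text has already been pulled out of the PDF by some extraction tool. The statement layout, the column names, and the helpers parse_statement and rows_to_csv are all made up for illustration; a real statement will almost certainly need its own pattern.

```python
import csv
import io
import re

# Hypothetical extracted text; real statements will be laid out differently.
SAMPLE_TEXT = """\
Monthly Statement - Example Fund LP
Date         Description           Amount
2009-06-30   Management fee        -1,250.00
2009-06-30   Net asset value    1,004,312.55
Page 1 of 1
"""

# Assumed row layout: ISO date, free-text description, signed amount
# with optional thousands separators and two decimal places.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s*$"
)


def parse_statement(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # titles, column headers, page footers, blank lines
        rows.append({
            "date": match.group("date"),
            "description": match.group("description"),
            "amount": float(match.group("amount").replace(",", "")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["date", "description", "amount"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_statement(SAMPLE_TEXT)))
```

Lines that do not match the pattern are silently skipped, which is convenient for headers and footers but also means a slightly malformed data row would be dropped, so it is worth checking row counts against the source. The resulting CSV can then be loaded in R with read.csv.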