PDF Reader

Thanks Brian and Adrian for your helpful suggestions.  pdf2txt looks
like it might do the trick (especially with that great wrapper you put
on it, Adrian).  I've found many hedge fund managers reluctant to give
data out in forms other than PDF because they feel PDFs help them to
prevent redistribution... maybe I should be pushing harder.

Thanks again,

Ben

-----Original Message-----
From: Brian G. Peterson [mailto:brian at braverock.com] 
Sent: Friday, July 10, 2009 1:57 PM
To: Chiquoine, Ben
Cc: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] PDF Reader

Ben,

I wouldn't really consider this the appropriate forum for your query, 
but I'll answer it anyway, with emphasis on the finance-specific bits.

There has existed for many years a utility called "pdf2txt".  Note that 
this will extract text from a PDF, but may not do a great job with 
maintaining the column structure.  In the past, I have had to resort to 
Perl, PHP, or Python to use regular expression matching to put the data 
into a tabular format that would be suitable for analysis in R or some 
other processing environment.
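
As a rough illustration of that regular-expression step, here is a minimal
Python sketch.  Everything specific in it is an assumption made for the
example: the sample text, the fund names and figures, the two-column
percentage layout, and the helper names (ROW, to_number, parse_rows,
to_csv) are all hypothetical.  Real extracted text will need its own
pattern.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF-to-text tool.
# The fund names and numbers are made up for illustration.
SAMPLE = """\
Monthly Performance Summary
Fund Name            Jun-09    YTD
Alpha Capital LP      1.25%   4.10%
Beta Macro Fund      -0.40%  (2.35%)
Page 1 of 3
"""

# Assumed layout: a name, a gap of two or more spaces, then two
# percentage columns.  Parentheses or a minus sign mark negatives.
ROW = re.compile(
    r"^(?P<name>.+?)\s{2,}"
    r"(?P<month>\(?-?\d+\.\d+%\)?)\s+"
    r"(?P<ytd>\(?-?\d+\.\d+%\)?)\s*$"
)


def to_number(cell):
    """Convert '1.25%', '-0.40%' or '(2.35%)' to a signed float."""
    negative = cell.startswith("(") or cell.startswith("-")
    value = float(cell.strip("()%-"))
    return -value if negative else value


def parse_rows(text):
    """Keep only lines that match ROW; titles and page footers drop out."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((
                match["name"].strip(),
                to_number(match["month"]),
                to_number(match["ytd"]),
            ))
    return rows


def to_csv(rows):
    """Write the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["fund", "month_pct", "ytd_pct"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_rows(SAMPLE)))
```

The CSV text this produces can be saved to a file and loaded in R with
read.csv.  Lines that do not fit the pattern are silently skipped, so it is
worth comparing the number of parsed rows against the source document.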

Also, most fund managers, trustees, administrators, markets, brokerages, 
etc. do have better data formats available for their investors/clients.  
Call them up and tell them that you need the data in machine-readable 
form, whether CSV, fixed width, Excel, whatever.  Almost all of your 
sources should be able to provide this, though it may take some work.  
You may not get to choose the format, but any machine-readable format 
should be coercible into R or other analysis environments.

Regards,

  - Brian
Chiquoine, Ben wrote: