
Extracting the first currency value from PDF files

PDF files are actually "programs" that place graphic symbols on pages, and the order in which those symbols are placed (which is the order in which most PDF-to-text conversions return characters) may have nothing to do with how they appear visually. There is not even a guarantee that those symbols are represented as characters in the file... they could be part of embedded bitmaps.
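To illustrate the difference between drawing order and visual order, here is a minimal sketch. It assumes your extraction step can give you text fragments together with their page coordinates (the `Fragment` class, the top-down `y` convention and the `line_tolerance` parameter are hypothetical, chosen for illustration). Sorting by position recovers a top-to-bottom, left-to-right reading order that may differ completely from the order the fragments were drawn:

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float    # assumed: distance from the left edge of the page
    y: float    # assumed: distance from the TOP edge, so larger y = lower on the page
    text: str


def visual_order(fragments, line_tolerance=2.0):
    """Return fragment texts top-to-bottom, then left-to-right.

    Fragments whose y differs from the first fragment of the current line
    by at most line_tolerance are treated as being on the same line.
    """
    lines = []  # each entry: [y of first fragment on the line, fragments on that line]
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    ordered = []
    for _, frags in lines:
        ordered.extend(f.text for f in sorted(frags, key=lambda f: f.x))
    return ordered


# Hypothetical fragments in the order a PDF "program" might draw them:
# the total at the bottom of the page is drawn first.
drawn = [
    Fragment(400, 700, "Total: $1,250.00"),
    Fragment(50, 40, "Invoice 1234"),
    Fragment(400, 120, "$99.95"),
    Fragment(50, 120, "Widget"),
]

print([f.text for f in drawn])  # drawing order: the total comes first
print(visual_order(drawn))      # reading order: the total comes last
```

Real-world layouts (multiple columns, rotated text, tables) need more than a simple sort, and none of this helps if the amounts are embedded in bitmaps.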

In summary, you need to review what your "pdf_text" function actually extracts from your files before applying any filtering... it may or may not be consistent enough to allow you to do what you want... and we certainly have no idea what it is able to extract from your files.
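One way to do that review is to list every currency-like match in the raw text, with its position, before deciding which one is "first". The sketch below is hypothetical: it assumes amounts are written with a `$` sign (other currencies or formats such as `EUR 12,50` would need a different pattern), and the sample `raw` string simply stands in for whatever your extraction step returns for one page. Note that "first" here only means first in extraction order, which is not necessarily first on the printed page:

```python
import re

# Hypothetical pattern: "$" amounts such as $99.95, $1250.00 or $1,250.00.
CURRENCY = re.compile(r"\$\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def currency_candidates(page_text):
    """Return (offset, matched_text) for every currency-like match, in extraction order."""
    return [(m.start(), m.group()) for m in CURRENCY.finditer(page_text)]


def first_currency(page_text):
    """Return the first match in extraction order, or None if there is none."""
    m = CURRENCY.search(page_text)
    return m.group() if m else None


# Hypothetical extracted text: the total comes out before the line item,
# even though it may be printed lower on the page.
raw = "Total: $1,250.00\nInvoice 1234\nWidget    $99.95\n"

for offset, value in currency_candidates(raw):
    print(offset, value)
print(first_currency(raw))
```

If the candidate list differs in order or content from file to file, no simple "take the first match" rule will be reliable, and that is exactly what inspecting the unfiltered output is meant to reveal.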
On May 13, 2020 6:33:03 AM PDT, Manish Mukherjee <manishmukherjee at hotmail.com> wrote: