Limitations of 'pdftotext' in Linux...

Well, as you know, sometimes people find fancy fonts they like, and then they want to use them.

One approach is to extract / list the fonts in the PDF files and log them.

Over time, the log will show which fonts are the offenders (assuming fonts really are the cause).
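
As a rough sketch of that logging idea, the Python below shells out to pdffonts (the font-listing tool that ships alongside pdftotext in Poppler's utilities) and tallies font names across a batch of files. The function names are made up for illustration, and treating a font with no ToUnicode map ("no" in the uni column) as a suspect is only a heuristic I am assuming here, not a confirmed diagnosis of your problem.

```python
import subprocess
from collections import Counter


def parse_pdffonts(output):
    """Turn the table printed by `pdffonts` into a list of dicts."""
    fonts = []
    seen_ruler = False
    for line in output.splitlines():
        if not seen_ruler:
            # Everything up to and including the dashed ruler is header text.
            seen_ruler = line.startswith("---")
            continue
        parts = line.split()
        if len(parts) < 8:
            continue  # not a font row
        name = parts[0]
        # Subset fonts carry a six-letter tag such as "ABCDEF+"; drop it.
        if len(name) > 7 and name[6] == "+":
            name = name[7:]
        fonts.append({
            "name": name,
            "type": " ".join(parts[1:-6]),  # e.g. "Type 1C", "CID TrueType"
            "encoding": parts[-6],
            "embedded": parts[-5] == "yes",
            "to_unicode": parts[-3] == "yes",
        })
    return fonts


def list_fonts(pdf_path):
    """Run pdffonts on one file (assumes pdffonts is on the PATH)."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)


def tally_suspects(pdf_paths):
    """Count fonts lacking a ToUnicode map (assumed heuristic for 'offending')."""
    suspects = Counter()
    for path in pdf_paths:
        for font in list_fonts(path):
            if not font["to_unicode"]:
                suspects[font["name"]] += 1
    return suspects
```

Calling tally_suspects on the list of PDFs that extracted badly and printing the result of most_common() would give a ranked list of candidates. The parser assumes font names contain no spaces, which is usual but not guaranteed.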

Then you can find a way to preprocess the PDFs to strip, replace, or remove the fonts that pdftotext cannot handle.

Or you can get the source code for pdftotext and try to recompile it with support for these font families.

Naturally, the first step toward solving any problem is knowing what the problem is, and it sounds like you may have isolated it to fonts that pdftotext does not support.

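FYI, here is a GitHub link to pdftotext source code (note that this repository is a Python wrapper built on Poppler; the command-line pdftotext itself ships with Poppler's utilities):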

https://github.com/jalan/pdftotext

Enjoy.

Agreed. Okay, thanks again for the help. I appreciate the time and I like the utility that you suggested (PDF_Checker) and I will keep it in mind for future issues. I'm going to do a bit more homework and then report back to the customer on what I have found. I would also like to donate to the site. Is there a link for me to do that? Thanks again.

Thanks for the kind thoughts, Ken.

It is not necessary to donate for this small help on our part, but I appreciate the kind thought and am glad you found our site helpful.

It was my pleasure to help you.

Post back with any updates or start a new discussion if you have other issues.