PDF text extraction

howardpc:
Can anyone point me to a Pascal package or library that extracts text from an existing PDF file (images, fonts, etc. are not required) and does not depend on any commercial libraries?

Mick:
If it doesn't have to be a native Pascal implementation, you can try MuPDF.
There is even a Lazarus wrapper/binding by Blestan (a user of this forum):
https://github.com/blestan/lazmupdf
Note that this wrapper is a bit outdated.
I'm not sure whether the text functions were included in the Pascal wrapper/binding.
I successfully built a proof of concept (POC) based on it a few months ago.
However, my POC did PDF rendering, not text extraction.
But I know that text extraction is possible with MuPDF.
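
If the binding turns out not to expose the text functions, one possible workaround is to skip the binding and call MuPDF's command-line tool from Free Pascal. The sketch below is only an illustration, not something taken from lazmupdf. It assumes that mutool is installed and on the PATH, and that "mutool draw -F txt" prints the extracted text to standard output when no -o option is given. ExtractPdfText is a hypothetical helper name; RunCommand comes from the FPC Process unit.

```pascal
program pdftext;

{$mode objfpc}{$H+}

uses
  SysUtils, Process;

// Hypothetical helper: runs MuPDF's mutool and captures its output.
// Assumption: "mutool" is installed and can be found on the PATH.
function ExtractPdfText(const PdfFile: string; out PdfText: string): Boolean;
begin
  // Assumption: with -F txt and no -o option, mutool writes plain text to stdout.
  Result := RunCommand('mutool', ['draw', '-F', 'txt', PdfFile], PdfText);
end;

var
  Extracted: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: pdftext file.pdf');
    Halt(1);
  end;
  if ExtractPdfText(ParamStr(1), Extracted) then
    Write(Extracted)
  else
  begin
    // RunCommand returns False if the tool could not be started or exited with an error.
    WriteLn(StdErr, 'mutool failed or was not found');
    Halt(2);
  end;
end.
```

Run it as "pdftext somefile.pdf". This keeps the Pascal side free of any binding code, at the cost of shipping the mutool executable alongside the application.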

howardpc:
Thanks :)
