
pdf text extraction


Can anyone point me to a Pascal package or library that can extract text from an existing PDF (images, fonts, etc. are not required) and that does not depend on any commercial libraries?

If it doesn't have to be a native Pascal implementation, you can try MuPDF.
There is even a Lazarus wrapper/binding for it, made by Blestan (a user of this forum):
Note that this wrapper is a bit outdated.
I'm not sure whether the text functions were included in the Pascal wrapper/binding.
I successfully built a proof of concept (POC) based on it a few months ago.
However, my POC did PDF rendering, not text extraction.
But I know that text extraction is possible with MuPDF.
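
If the wrapper turns out not to expose the text functions, one possible workaround is to skip the binding and call MuPDF's command-line tool from Pascal. The sketch below is only an illustration under some assumptions: that the `mutool` executable from MuPDF is installed and on the PATH, and that its `draw` command with `-F txt -o -` writes the extracted text to standard output. The program name and the `ExtractText` helper are made up for this example and are not part of Blestan's wrapper.

```pascal
program ExtractPdfText;

{$mode objfpc}{$H+}

uses
  SysUtils, Process;

// Hypothetical helper: runs "mutool draw -F txt -o - <file>" and captures
// whatever the tool writes to standard output.
// Assumption: mutool (shipped with MuPDF) is installed and on the PATH.
// Returns True only if the process could be started and exited with status 0.
function ExtractText(const PdfPath: string; out Extracted: string): Boolean;
begin
  Result := RunCommand('mutool',
    ['draw', '-F', 'txt', '-o', '-', PdfPath], Extracted);
end;

var
  Extracted: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ExtractPdfText <file.pdf>');
    Halt(1);
  end;

  if ExtractText(ParamStr(1), Extracted) then
    Write(Extracted)
  else
  begin
    WriteLn(StdErr, 'mutool failed or was not found on the PATH');
    Halt(2);
  end;
end.
```

This keeps the Pascal side free of any binding code, at the cost of depending on an external executable instead of a linked library. Keep in mind that MuPDF is distributed under the AGPL or a commercial license, so check whether that fits your project.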

Thanks :)

