Lazarus


Title: pdf text extraction
Post by: howardpc on December 15, 2017, 02:37:20 pm
Can anyone point me to a Pascal package or library that extracts text from an existing .pdf file (images, fonts, etc. are not required) and does not depend on any commercial libraries?
Title: Re: pdf text extraction
Post by: Mick on December 16, 2017, 03:00:21 am
If it doesn't have to be a native Pascal implementation, you can try MuPDF.
There is even a Lazarus wrapper/binding written by Blestan (a user of this forum):
https://github.com/blestan/lazmupdf
Note that this wrapper is a bit outdated.
I'm not sure whether the text functions were included in the Pascal wrapper/binding.
I successfully built a proof of concept (POC) based on it a few months ago.
However, my POC did PDF rendering, not text extraction.
But I know that text extraction is possible with MuPDF.
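If the binding turns out not to expose the text functions, one possible workaround is to skip the binding and call MuPDF's command-line tool instead. The following is only a rough sketch, assuming that mutool (shipped with MuPDF) is installed and on the PATH, and that its "draw -F txt" mode prints the extracted text to stdout. ExtractPdfText is a hypothetical helper name, not something provided by lazmupdf or MuPDF.

```pascal
program PdfToText;

{$mode objfpc}{$H+}

uses
  SysUtils, Process;

// Hypothetical helper (not part of lazmupdf or MuPDF): runs
// "mutool draw -F txt <file>" and captures what the tool writes to stdout.
function ExtractPdfText(const MuToolExe, PdfFile: string;
  out ExtractedText: string): Boolean;
begin
  // RunCommand returns False if the tool cannot be started
  // or if it exits with a non-zero status.
  Result := RunCommand(MuToolExe, ['draw', '-F', 'txt', PdfFile], ExtractedText);
end;

var
  PdfText: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: pdftotext <file.pdf>');
    Halt(1);
  end;

  // Assumption: the executable is called "mutool" and is found via PATH.
  if ExtractPdfText('mutool', ParamStr(1), PdfText) then
    Write(PdfText)
  else
  begin
    WriteLn(StdErr, 'mutool failed; is MuPDF installed and on the PATH?');
    Halt(2);
  end;
end.
```

This means shipping or requiring the mutool executable rather than linking the library, so check that MuPDF's licence (AGPL, or a commercial licence) fits your project.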
Title: Re: pdf text extraction
Post by: howardpc on December 17, 2017, 10:48:11 am
Thanks :)