
Author Topic: pdf text extraction  (Read 4730 times)

howardpc

  • Hero Member
  • *****
  • Posts: 2832
pdf text extraction
« on: December 15, 2017, 02:37:20 pm »
Can anyone point me to a Pascal package or library that extracts text from an existing .pdf file (images, fonts, etc. are not required) and does not depend on any commercial libraries?

Mick

  • Jr. Member
  • **
  • Posts: 51
Re: pdf text extraction
« Reply #1 on: December 16, 2017, 03:00:21 am »
If it doesn't have to be a native Pascal implementation, you can try MuPDF.
There is even a Lazarus wrapper/binding written by Blestan (a user of this forum):
https://github.com/blestan/lazmupdf
Note that this wrapper is a bit outdated.
I'm not sure whether the text functions were included in the Pascal wrapper/binding.
I successfully built a proof of concept (POC) based on it a few months ago.
However, my POC did PDF rendering, not text extraction.
But I know that text extraction is possible with MuPDF.

howardpc

  • Hero Member
  • *****
  • Posts: 2832
Re: pdf text extraction
« Reply #2 on: December 17, 2017, 10:48:11 am »
Thanks :)