Topic: REST Call to TIKA using INDY

majlumbo

« on: February 20, 2020, 10:10:43 pm »
I am trying to call Apache-TIKA via their REST API.

I have successfully been able to upload a PDF document and return the document's text via CURL
Code:
curl -X PUT --data-binary @<filename>.pdf http://localhost:9998/tika --header "Content-type: application/pdf"
That translated to INDY like so:
Code:
uses
  IdHTTP, IdMultipartFormData;

function GetPDFText(const FileName: String): String;
var
  IdHTTP:  TIdHTTP;
  Params: TIdMultiPartFormDataStream;
begin
  IdHTTP := TIdHTTP.Create;
  try
    Params := TIdMultiPartFormDataStream.Create;
    try
      Params.AddFile('file', FileName, 'application/pdf');
      Result := IdHTTP.Put('http://localhost:9998/tika', Params);
    finally
      Params.Free;
    end;
  finally
    IdHTTP.Free;
  end;
end;
Now I want to upload a Word document (.docx). I assumed that all I would need to do is change the Content-type when I add the file to Params, but that doesn't seem to produce any results, although I get no error reported back. I was able to get the following CURL command to work correctly:
Code:
CURL -T <myDOCXfile>.docx http://localhost:9998/tika --header "Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document"
Any ideas on how to modify the INDY code?
« Last Edit: February 20, 2020, 10:12:24 pm by majlumbo »

Remy Lebeau

« Reply #1 on: February 21, 2020, 09:41:30 pm »
Quote from: majlumbo
I have successfully been able to upload a PDF document and return the document's text via CURL
Code:
curl -X PUT --data-binary @<filename>.pdf http://localhost:9998/tika --header "Content-type: application/pdf"
That translated to INDY like so:

Your translation is wrong.  That CURL command does not send the file in 'multipart/form-data' format, it sends the file as-is with no wrapping around it at all.  That would translate to the following in Indy:

Code:
uses
  Classes, SysUtils, IdHTTP;

function GetPDFText(const FileName: String): String;
var
  IdHTTP:  TIdHTTP;
  FS: TFileStream;
begin
  IdHTTP := TIdHTTP.Create;
  try
    // The raw file bytes become the request body, like curl --data-binary
    FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      IdHTTP.Request.ContentType := 'application/pdf';
      Result := IdHTTP.Put('http://localhost:9998/tika', FS);
    finally
      FS.Free;
    end;
  finally
    IdHTTP.Free;
  end;
end;

If you want to send data in 'multipart/form-data' format, you need to use TIdHTTP.Post() instead of TIdHTTP.Put(), and you need to send it to 'http://localhost:9998/tika/form' instead of to 'http://localhost:9998/tika'.  Read the CURL and Apache-Tika documentation more carefully.
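For completeness, here is a rough sketch of that 'multipart/form-data' route: TIdHTTP.Post() with a TIdMultiPartFormDataStream, sent to /tika/form. The program name, the GetTextViaForm function and the 'file' field name are hypothetical, and I'm assuming Tika's form endpoint takes the uploaded part regardless of its field name; check the Tika server documentation before relying on that.

```pascal
program TikaForm;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  IdHTTP, IdMultipartFormData;

// Hypothetical helper: uploads FileName wrapped in multipart/form-data.
// Multipart bodies go to /tika/form via POST, not to /tika via PUT.
function GetTextViaForm(const FileName, ContentType: String): String;
var
  IdHTTP: TIdHTTP;
  Params: TIdMultiPartFormDataStream;
begin
  IdHTTP := TIdHTTP.Create;
  try
    Params := TIdMultiPartFormDataStream.Create;
    try
      // 'file' is an assumed field name; AddFile reads the file from disk
      // and tags the part with the given Content-Type.
      Params.AddFile('file', FileName, ContentType);
      // This Post() overload sets the request's multipart Content-Type
      // (including the boundary) from the stream, so it is not set by hand.
      Result := IdHTTP.Post('http://localhost:9998/tika/form', Params);
    finally
      Params.Free;
    end;
  finally
    IdHTTP.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: TikaForm <file.pdf>');
    Halt(1);
  end;
  WriteLn(GetTextViaForm(ParamStr(1), 'application/pdf'));
end.
```

For a .docx you would pass 'application/vnd.openxmlformats-officedocument.wordprocessingml.document' as the second argument instead.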

Quote from: majlumbo
when I add the file to Params, but that doesn't seem to produce any results, although I get no error reported back.

Because you are not sending the file correctly to begin with.

Quote from: majlumbo
I was able to get the following CURL command to work correctly
Code:
CURL -T <myDOCXfile>.docx http://localhost:9998/tika --header "Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document"
Any ideas on how to modify the INDY code?

That CURL command also uses PUT to upload the file as-is, not in 'multipart/form-data' format.  For an HTTP URL, the '-T <filename>' parameter is similar to the '-X PUT --data-binary @<filename>' parameters, so the Indy code is the same as above except for the Content-Type, e.g.:

Code:
uses
  Classes, SysUtils, IdHTTP;

function GetDocxText(const FileName: String): String;
var
  IdHTTP:  TIdHTTP;
  FS: TFileStream;
begin
  IdHTTP := TIdHTTP.Create;
  try
    FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      // Only the Content-Type differs from the PDF version
      IdHTTP.Request.ContentType := 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
      Result := IdHTTP.Put('http://localhost:9998/tika', FS);
    finally
      FS.Free;
    end;
  finally
    IdHTTP.Free;
  end;
end;
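Since the PDF and DOCX versions differ only in the Content-Type, one option is to pick the type from the file extension. A minimal sketch, assuming the only types you need are the two used in this thread; ContentTypeForFile, GetDocumentText and the TikaPut program are hypothetical names, not part of Indy or Tika:

```pascal
program TikaPut;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils, IdHTTP;

// Hypothetical helper: maps the two file types discussed in this thread to
// the Content-Type values used in the curl commands. Extend as needed.
function ContentTypeForFile(const FileName: String): String;
var
  Ext: String;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if Ext = '.pdf' then
    Result := 'application/pdf'
  else if Ext = '.docx' then
    Result := 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  else
    raise Exception.CreateFmt('No Content-Type known for extension "%s"', [Ext]);
end;

// Same raw PUT to /tika as above, with the Content-Type chosen per file.
function GetDocumentText(const FileName: String): String;
var
  IdHTTP: TIdHTTP;
  FS: TFileStream;
begin
  IdHTTP := TIdHTTP.Create;
  try
    FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      IdHTTP.Request.ContentType := ContentTypeForFile(FileName);
      Result := IdHTTP.Put('http://localhost:9998/tika', FS);
    finally
      FS.Free;
    end;
  finally
    IdHTTP.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: TikaPut <file.pdf|file.docx>');
    Halt(1);
  end;
  WriteLn(GetDocumentText(ParamStr(1)));
end.
```

Run it as `TikaPut mydoc.docx`. Any other extension raises an exception instead of sending a guessed type, which keeps a wrong Content-Type from silently producing empty output.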
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

Remy Lebeau

« Reply #2 on: February 21, 2020, 09:50:26 pm »
It seems your same question on StackOverflow got a more detailed answer.  You really shouldn't post the same question to multiple forums at the same time.

Thaddy

« Reply #3 on: February 21, 2020, 10:00:03 pm »
This is the best introduction, btw:
https://medium.com/@marcusfernstrm/create-rest-apis-with-freepascal-441e4aa447b7
What's neat is that it only uses the standard libraries that come with Free Pascal.
The downside is that it's on medium.com, but that doesn't really matter, because the content is spot-on.

If you understand that, you can write your own rest api with ease.
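The same "standard libraries only" idea also works on the client side. As a sketch, assuming Free Pascal's fphttpclient unit (TFPHTTPClient) instead of Indy, the raw PUT from the original question could look like this; GetPDFTextFP and the TikaFPHttp program are hypothetical names:

```pascal
program TikaFPHttp;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, fphttpclient;

// Hypothetical helper: PUTs the raw PDF bytes to Tika, like
// curl -X PUT --data-binary @file.pdf ... does.
function GetPDFTextFP(const FileName: String): String;
var
  Client: TFPHTTPClient;
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Client := TFPHTTPClient.Create(nil);
    try
      // The file stream is sent as-is as the request body
      Client.RequestBody := FS;
      Client.AddHeader('Content-Type', 'application/pdf');
      Result := Client.Put('http://localhost:9998/tika');
    finally
      // Detach the stream so it is freed exactly once, by us
      Client.RequestBody := nil;
      Client.Free;
    end;
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: TikaFPHttp <file.pdf>');
    Halt(1);
  end;
  WriteLn(GetPDFTextFP(ParamStr(1)));
end.
```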

Compliments to the forum member who wrote that.
Shouldn't this actually be in the wiki?
« Last Edit: February 21, 2020, 10:06:24 pm by Thaddy »
I am more like donkey than shrek