Topic: [Solved] fcl-pdf: different behaviour cli/gui application with Unicode characters

tennis

I have encountered a problem with the fcl-pdf package: Unicode characters
are handled differently in a CLI application and a GUI application.

The cli test application generates a correct PDF:
Code: Pascal
program pdfcli;
{$mode objfpc}{$H+}
uses
  classes,
  sysutils,
  fppdf,
  ShellApi;

var
  FDoc: TPDFDocument;
  FPage: TPDFPage;
  FtBold: Integer;
  Section: TPDFSection;
  F: TFileStream;
  FileName : String;

begin
  FDoc := TPDFDocument.Create(Nil);
  try
    FDoc.Options := [poPageOriginAtTop, poUseImageTransparency, poMetadataEntry];
    FDoc.StartDocument;
    Section := FDoc.Sections.AddSection;
    FPage := FDoc.Pages.AddPage;
    FPage.PaperType := ptA4;
    FPage.UnitOfMeasure := uomMillimeters;
    Section.AddPage(FPage);
    FtBold := FDoc.AddFont('Helvetica-Bold');

    FPage.SetFont(FtBold, 16);
    FPage.SetColor(clBlack, False);
    FPage.WriteText(150, 150, 'æøå');

    FileName := GetTempFileName('', 'invoice');
    FileName := FileName + '.pdf';

    F := TFileStream.Create(FileName,fmCreate);
    try
      FDoc.SaveToStream(F);
    finally
      F.Free;
    end;

    ShellExecute(0, 'open', PChar(FileName), Nil, PChar(ExtractFilePath(FileName)),1);
  finally
    FreeAndNil(FDoc);
  end;
end.

The same code as a gui application generates garbage:
Code: Pascal
program pdfgui;
{$mode delphi}{$H+}
{$codepage UTF8}
uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes, SysUtils, Interfaces, Forms, StdCtrls,
  fppdf, ShellApi;

type
  TMyForm = class(TForm)
  public
    MyButton: TButton;
    procedure ButtonClick(ASender: TObject);
    constructor Create(AOwner: TComponent); override;
  end;

procedure TMyForm.ButtonClick(ASender:TObject);
var
  FDoc: TPDFDocument;
  FPage: TPDFPage;
  FtBold: Integer;
  Section: TPDFSection;
  F: TFileStream;
  FileName : String;
begin
  FDoc := TPDFDocument.Create(Nil);
  try
    FDoc.Options := [poPageOriginAtTop, poUseImageTransparency, poMetadataEntry];
    FDoc.StartDocument;
    Section := FDoc.Sections.AddSection;
    FPage := FDoc.Pages.AddPage;
    FPage.PaperType := ptA4;
    FPage.UnitOfMeasure := uomMillimeters;
    Section.AddPage(FPage);
    FtBold := FDoc.AddFont('Helvetica-Bold');

    FPage.SetFont(FtBold, 16);
    FPage.SetColor(clBlack, False);
    FPage.WriteText(150, 150, 'æøå');

    FileName := GetTempFileName('', 'invoice');
    FileName := FileName + '.pdf';

    F := TFileStream.Create(FileName,fmCreate);
    try
      FDoc.SaveToStream(F);
    finally
      F.Free;
    end;

    ShellExecute(0, 'open', PChar(FileName), Nil, PChar(ExtractFilePath(FileName)),1);
  finally
    FreeAndNil(FDoc);
  end;

  Close;
end;

constructor TMyForm.Create(AOwner: TComponent);
begin
  inherited Create(AOwner);
  Position := poScreenCenter;
  Height := 400;
  Width := 400;

  VertScrollBar.Visible := False;
  HorzScrollBar.Visible := False;

  MyButton := TButton.Create(Self);
  with MyButton do
  begin
    Height := 30;
    Left := 100;
    Top := 100;
    Width := 100;
    Caption := 'Close';
    OnClick := ButtonClick;
    Parent := Self;
  end;
end;

var
  MyForm : TMyForm;

begin
  Application.Scaled:=True;
  Application.Initialize;
  Application.CreateForm(TMyForm, MyForm);
  Application.Run;
end.

I see that the fcl-pdf package uses UTF8String in the WriteText procedure:
Code: Pascal
Procedure WriteText(X, Y: TPDFFloat; AText : UTF8String; const ADegrees: single = 0.0; const AUnderline: boolean = false; const AStrikethrough: boolean = false); overload;

Is this causing the problem?

I have tried inserting the various conversion routines available, but without any luck.

The test is performed on Windows 10 with Lazarus IDE v.2.0.10 r63526.
« Last Edit: October 15, 2020, 01:12:16 am by tennis »

ASerge

In the second example you use {$codepage UTF8}, but not in the first. Read String Literals.

This doesn't work for me anyway, because for the standard fonts fcl-pdf assumes ANSI characters only; for other characters, a font needs to be embedded. For example, to make it work I needed to do this:
Code: Pascal
FDoc.AddFont('c:\Windows\Fonts\segoeui.ttf', 'Segoe UI');
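
A minimal sketch of a complete program built around that call. It assumes that segoeui.ttf exists at that path and contains the needed glyphs, and that the source file is saved as UTF-8. The program name and the output file name unicode-test.pdf are made up for illustration:

```pascal
program pdfttf;
{$mode objfpc}{$H+}
{$codepage UTF8}  // treat string literals in this file as UTF-8
uses
  Classes, SysUtils, fppdf;

var
  Doc: TPDFDocument;
  Page: TPDFPage;
  Section: TPDFSection;
  FontIdx: Integer;
  F: TFileStream;

begin
  Doc := TPDFDocument.Create(nil);
  try
    Doc.Options := [poPageOriginAtTop];
    Doc.StartDocument;
    Section := Doc.Sections.AddSection;
    Page := Doc.Pages.AddPage;
    Page.PaperType := ptA4;
    Page.UnitOfMeasure := uomMillimeters;
    Section.AddPage(Page);

    // Assumption: the font file is present on this machine.
    // A TrueType font is registered instead of a standard PDF font.
    FontIdx := Doc.AddFont('c:\Windows\Fonts\segoeui.ttf', 'Segoe UI');

    Page.SetFont(FontIdx, 16);
    Page.SetColor(clBlack, False);
    Page.WriteText(20, 20, 'æøå');

    // Hypothetical output name, written to the current directory.
    F := TFileStream.Create('unicode-test.pdf', fmCreate);
    try
      Doc.SaveToStream(F);
    finally
      F.Free;
    end;
  finally
    Doc.Free;
  end;
end.
```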

tennis

Looking at the generated PDF file with the CLI application:
Code: Text
stream
/F0 16 Tf
0 0 0 rg
BT
425.20 416.80 TD
(æøå) Tj
ET

endstream
endobj

8 0 obj
<<
/Type /Font
/Subtype /Type1
/Encoding /WinAnsiEncoding
/FirstChar 32
/LastChar 255
/BaseFont /Helvetica-Bold
/Name /F0
>>
endobj

The content appears to be written in ANSI: (æøå)

With the GUI code:
Code: Text
stream
/F0 16 Tf
0 0 0 rg
BT
425.20 416.80 TD
(XE6XF8XE5) Tj
ET

endstream
endobj

8 0 obj
<<
/Type /Font
/Subtype /Type1
/Encoding /WinAnsiEncoding
/FirstChar 32
/LastChar 255
/BaseFont /Helvetica-Bold
/Name /F0
>>
endobj

The content looks to be written in UTF8 : (I recognize the sequence XE6XF8XE5)

This is probably the reason the CLI application works but
the GUI application fails: the CLI application writes the file
in the Windows code page CP1252, and the GUI application
writes the file in UTF8.

The font then fails to render the UTF8 content correctly,
as it has WinAnsiEncoding.
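
For reference, the two encodings can be told apart by their bytes: 'æøå' is C3 A6 C3 B8 C3 A5 in UTF-8 and E6 F8 E5 in CP1252. Below is a small sketch that prints both forms, assuming the code page aware strings of FPC 3.x. TCP1252String and HexBytes are names made up for this example, and on Unix the cwstring unit is assumed to supply the conversion:

```pascal
program encodingbytes;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // code page conversion support on Unix
  SysUtils;

type
  // Hypothetical helper type: an AnsiString declared as code page 1252
  TCP1252String = type AnsiString(1252);

const
  // UTF-8 bytes of 'æøå', spelled out so the result does not depend on
  // how this source file is saved
  Utf8Bytes: array[0..5] of Byte = ($C3, $A6, $C3, $B8, $C3, $A5);

// Returns the raw bytes of S as space-separated hex pairs
function HexBytes(const S: RawByteString): String;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 2) + ' ';
  Result := Trim(Result);
end;

var
  U: UTF8String;
  A: TCP1252String;

begin
  // Copy the bytes in unchanged; U is declared as UTF-8
  SetLength(U, Length(Utf8Bytes));
  Move(Utf8Bytes[0], U[1], Length(Utf8Bytes));

  // Assigning between differently declared code pages converts the text
  A := U;

  WriteLn('UTF-8 : ', HexBytes(U));
  WriteLn('CP1252: ', HexBytes(A));
end.
```

Comparing this output with a hex dump of the text between the parentheses in the PDF stream should show which encoding each application actually wrote.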

I expected something along the lines of:
Code: Pascal
FPage.WriteText(150, 150, RawBytestring(UTF8ToCP1252('æøå')));

to work, but it does not. (I can hand-edit the PDF file generated
by the GUI application, change the code page to CP1252 and type the
characters. The PDF then renders correctly.)

I will look into adding proper Unicode fonts to the application.

Thanks for the help.

tennis

I got it to work with the following code:
Code: Pascal
procedure TMyForm.ButtonClick(ASender:TObject);
var
  FDoc: TPDFDocument;
  FPage: TPDFPage;
  FtBold: Integer;
  Section: TPDFSection;
  F: TFileStream;
  FM : TMemoryStream;
  FileName: String;
  SR : RawByteString;
  SA : AnsiString;
begin
  FDoc := TPDFDocument.Create(Nil);
  try
    FDoc.Options := [poPageOriginAtTop, poUseImageTransparency, poMetadataEntry];
    FDoc.StartDocument;
    Section := FDoc.Sections.AddSection;
    FPage := FDoc.Pages.AddPage;
    FPage.PaperType := ptA4;
    FPage.UnitOfMeasure := uomMillimeters;
    Section.AddPage(FPage);
    FtBold := FDoc.AddFont('Helvetica-Bold');

    FPage.SetFont(FtBold, 16);
    FPage.SetColor(clBlack, False);

    FPage.WriteText(150, 150, 'æøå');

    FileName := GetTempFileName('', 'invoice');
    FileName := FileName + '.pdf';

    F := TFileStream.Create(FileName,fmCreate);
    FM := TMemoryStream.Create();
    try
      // Render the PDF into memory first
      FDoc.SaveToStream(FM);
      // Copy the whole PDF into a string and mark it as UTF-8
      SetString(SA, PAnsiChar(FM.Memory), FM.Size);
      System.SetCodePage(RawByteString(SA), CP_UTF8 ,True);
      // Convert the UTF-8 content to ISO-8859-1 and write that to disk
      SR := UTF8ToISO_8859_1(SA);
      F.Write(SR[1], Length(SR));
    finally
      FM.Free;
      F.Free;
    end;

    ShellExecute(0, 'open', PChar(FileName), Nil, PChar(ExtractFilePath(FileName)),1);
  finally
    FreeAndNil(FDoc);
  end;
  Close;
end;

I read the content of the PDF into a string and use the function
UTF8ToISO_8859_1 to convert it to an ANSI code page, which
is then written to disk.

I assume this solution is OK, as I expect the standard PDF fonts to
be the same everywhere.
« Last Edit: October 15, 2020, 01:15:09 am by tennis »

tennis

Researching the subject a bit, it seems the standard PDF fonts
have their own encodings, called StandardEncoding, MacRomanEncoding,
WinAnsiEncoding and PDFDocEncoding.

WinAnsiEncoding partly overlaps with the CP1252 and ISO-8859-1
code pages.

The fcl-pdf package should write text content that uses the standard
fonts in ANSI, and check whether each UTF8 code point can be
translated to the PDF standard encoding without loss of information.

This is probably involved, as none of the standard code pages can
be used directly if the standard is to be followed.
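
The loss check described above could be sketched as a round trip: convert the UTF8 text to a single-byte code page and back, and see whether it comes back unchanged. This sketch uses CP1252 as a stand-in for WinAnsiEncoding, which is only an approximation since the two are not identical. FitsCP1252, Utf8FromBytes and TCP1252String are names made up here and are not part of fcl-pdf; on Unix the cwstring unit is assumed to supply the conversion:

```pascal
program checkcp1252;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // code page conversion support on Unix
  SysUtils;

type
  // Hypothetical helper type: an AnsiString declared as code page 1252
  TCP1252String = type AnsiString(1252);

// Hypothetical helper: True if S survives UTF-8 -> CP1252 -> UTF-8 unchanged
function FitsCP1252(const S: UTF8String): Boolean;
var
  Narrow: TCP1252String;
  Back: UTF8String;
begin
  Narrow := S;     // converts; unmappable characters get substituted
  Back := Narrow;  // converts back to UTF-8
  Result := (Back = S);
end;

// Builds a UTF8String from explicit bytes, so the test data does not
// depend on how this source file is saved
function Utf8FromBytes(const Bytes: array of Byte): UTF8String;
begin
  Result := '';
  SetLength(Result, Length(Bytes));
  if Length(Bytes) > 0 then
    Move(Bytes[0], Result[1], Length(Bytes));
end;

begin
  // 'æøå': every character has a CP1252 code
  WriteLn(FitsCP1252(Utf8FromBytes([$C3, $A6, $C3, $B8, $C3, $A5])));
  // U+4F60, a CJK character with no CP1252 code
  WriteLn(FitsCP1252(Utf8FromBytes([$E4, $BD, $A0])));
end.
```

A check like this would let the caller fall back to an embedded TrueType font whenever the text does not fit the single-byte encoding.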

clerfayt

@tennis: thank you so much for your research and providing a solution! This saved me hours :D

for anyone wondering where to find the function UTF8ToISO_8859_1 - it's defined in LConvEncoding

 
