
Topic: [SOLVED] How to detect if an UTF8-char is not contained in a Font?

Hartmut

  • Hero Member
  • Posts: 891
My program uses TCanvas.TextOut() to draw UTF-8 characters (see screenshot). Some UTF-8 characters, like $C280..$C29F (U+0080..U+009F), are not contained in my example font (Arial); for those, a rectangle with a hex number in it is drawn instead. How can I detect the UTF-8 characters that are not contained in the current font? Is there any way?

I use Lazarus 2.0.10 with Windows and Linux Ubuntu. I am looking for a solution for both OSes (it need not be the same solution). Thanks in advance.
« Last Edit: December 16, 2023, 01:27:14 pm by Hartmut »

dsiders

  • Hero Member
  • Posts: 1330
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #1 on: December 10, 2023, 08:20:43 pm »
Good question.

Apparently TFont returns a default glyph for unsupported characters. TFreeTypeFont has access to the real glyph information, and even has TFreeTypeFont.CharWidthFromUnicode() to get the real glyph width. A width of 0 would indicate a missing glyph.

HTH...
Preview the next Lazarus documentation release at: https://dsiders.gitlab.io/lazdocsnext


jamie

  • Hero Member
  • Posts: 6794
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #3 on: December 11, 2023, 12:03:46 am »
I put together something to look at using GetFontUnicodeRanges etc.

It's just a short, simple test; you can build on it.

The only true wisdom is knowing you know nothing

jamie

  • Hero Member
  • Posts: 6794
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #4 on: December 11, 2023, 12:21:32 am »
Looking at that again, I think these are UnicodeString values (WCHAR, i.e. UTF-16), not UTF-8.
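Because GetFontUnicodeRanges reports ranges of WCHAR (UTF-16) values, a UTF-8 character first has to be decoded to its code point before it can be compared with those ranges. LazUTF8.UTF8CodepointToUnicode does this in Lazarus; the program below is only a minimal, self-contained sketch of the same bit arithmetic, used to check the sample characters from this thread. DecodeFirstUtf8 and Show are hypothetical helper names, not part of any Lazarus unit, and the decoder deliberately skips checks for overlong forms and surrogates.

```pascal
program Utf8ToCodepoint;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Decodes the first UTF-8 sequence in B into a Unicode code point.
  Returns $FFFD (replacement character) for an invalid or truncated sequence.
  Sketch only: overlong forms and surrogate code points are not rejected. }
function DecodeFirstUtf8(const B: array of Byte): Cardinal;
var
  n, i: Integer;
begin
  if Length(B) = 0 then
    Exit($FFFD);
  if B[0] < $80 then
    Exit(B[0]);                                  // 0xxxxxxx: plain ASCII
  if (B[0] and $E0) = $C0 then
  begin
    n := 1; Result := B[0] and $1F;              // 110xxxxx: 2-byte sequence
  end
  else if (B[0] and $F0) = $E0 then
  begin
    n := 2; Result := B[0] and $0F;              // 1110xxxx: 3-byte sequence
  end
  else if (B[0] and $F8) = $F0 then
  begin
    n := 3; Result := B[0] and $07;              // 11110xxx: 4-byte sequence
  end
  else
    Exit($FFFD);                                 // invalid lead byte
  if Length(B) < n + 1 then
    Exit($FFFD);                                 // truncated sequence
  for i := 1 to n do
  begin
    if (B[i] and $C0) <> $80 then
      Exit($FFFD);                               // continuation bytes are 10xxxxxx
    Result := (Result shl 6) or (B[i] and $3F);  // append 6 payload bits
  end;
end;

{ Prints the bytes and the decoded code point, e.g. "$C287 = U+0087". }
procedure Show(const B: array of Byte);
var
  i: Integer;
  hex: string;
begin
  hex := '';
  for i := 0 to High(B) do
    hex := hex + IntToHex(B[i], 2);
  WriteLn('$', hex, ' = U+', IntToHex(Int64(DecodeFirstUtf8(B)), 4));
end;

begin
  Show([$C2, $A7]);       // $C2A7 = U+00A7
  Show([$C2, $87]);       // $C287 = U+0087
  Show([$E2, $86, $91]);  // $E28691 = U+2191
  Show([$CD, $B7]);       // $CDB7 = U+0377
  Show([$CD, $B8]);       // $CDB8 = U+0378
end.
```

The byte arrays avoid any source-codepage conversion of string literals. All of these code points are below U+FFFF, so each fits in a single WCHAR and can be compared directly with the wcLow/cGlyphs ranges returned by GetFontUnicodeRanges.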

Hartmut

  • Hero Member
  • Posts: 891
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #5 on: December 11, 2023, 10:39:48 am »
Thank you all for your posts and suggestions. I will investigate them one by one.

dsiders' suggestion (TFreeTypeFont.CharWidthFromUnicode(), where a width of 0 would indicate a missing glyph) sounds *very* interesting, because it would be a solution for Windows *and* Linux. But I did not manage to get it working in reasonable time. This is my code:

Code: Pascal
uses EasyLazFreeType, {for class 'TFreeTypeFont'}
     LazUTF8;         {for 'UTF8CodepointToUnicode'}

type CharUTF8 = string[4]; {string for 1 UTF8-char}

procedure check_UTF8char_contained(s: CharUTF8);
   {checks if an UTF8-character is contained in a Font}
   var FTF: TFreeTypeFont;
       sa: ansistring;
       len: single;
       uc: cardinal;
       charlen: longint;
   begin
   sa:=s; {local copy, so PChar(sa) stays valid for the whole call}
   uc:=UTF8CodepointToUnicode(PChar(sa),charlen); {convert UTF8 to Unicode}
   writeln('uc=', uc);

   FTF:=TFreeTypeFont.Create;
// FTF.SetNameAndStyle('Arial',[]);
   FTF.Name:='Arial';

   len:=FTF.CharWidthFromUnicode(uc); {=> Access Violation}
   writeln('len=', len);
   FTF.Free;
   end;

procedure Test_UTF8char_contained;
   begin
   check_UTF8char_contained(#$C2#$A7);
   check_UTF8char_contained(#$C2#$87);
   check_UTF8char_contained(#$E2#$86#$91);
   end;

Because my characters are UTF-8 encoded (like #$C2#$87 or #$E2#$86#$91) and function 'CharWidthFromUnicode' needs a Unicode code point, I tried to convert them via 'UTF8CodepointToUnicode'. I hope this is correct (my knowledge about Unicode is near zero).

The above code throws an Access Violation in procedure TFreeTypeFont.LoadFace in unit EasyLazFreeType:
Code: Pascal
procedure TFreeTypeFont.LoadFace;
var errorNum: TT_Error;
    familyItem: TCustomFamilyCollectionItem;
    fontItem: TCustomFontCollectionItem;
begin
  DiscardFace;
  if FStream <> nil then
  begin
    errorNum := TT_Open_Face(FStream,False,FFace);
    if errorNum <> TT_Err_Ok then
      raise exception.Create('Cannot open font (TT_Error ' + intToStr(errorNum)+') <Stream>');
  end else
  begin
    if Pos(PathDelim, FName) <> 0 then
    begin
      errorNum := TT_Open_Face(FName,FFace);
      if errorNum <> TT_Err_Ok then
        raise exception.Create('Cannot open font (TT_Error ' + intToStr(errorNum)+') "'+FName+'"');
    end else
    begin
      familyItem := Collection.Family[FName]; // Access Violation here
      if familyItem = nil then
        raise exception.Create('Font family not found ("'+FName+'")');
      fontItem := familyItem.GetFont(FStyleStr);
      if fontItem = nil then
        raise exception.Create('Font style not found ("'+FStyleStr+'")');
      FFace := fontItem.QueryFace(FontCollectionItemDestroyListener(self,@OnDestroyFontItem));
      FFaceItem := fontItem;
    end;
  end;

  FFaceLoaded:= true;
  UpdateInstance;
end;

I could not find documentation for unit EasyLazFreeType in reasonable time.

All I have is a font name from 'Screen.Fonts'. I don't know where the font file resides (on Linux there are many folders containing fonts).

Please help. Meanwhile I will investigate the other suggestions.

Hartmut

  • Hero Member
  • Posts: 891
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #6 on: December 11, 2023, 06:22:15 pm »
Meanwhile I tested the suggestion from Martin_fr and jamie to use the Windows function 'GetFontUnicodeRanges'. With jamie's demo (thank you for that) it was easy to get it working. I added the possibility to select a font:
Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
// needs unit 'Windows' in the uses clause (GetFontUnicodeRanges, LPGLYPHSET)
Var
  R:LPGLYPHSET;
  C,C2,I:Integer;
  S:String;
begin
// Canvas.Font.Name:='Arial';
   Canvas.Font.Name:='Courier New';

   C := GetFontUnicodeRanges(Canvas.Handle,nil); // Get count first.
   If C<>0 Then
     Begin
       GetMem(R,C);
       C2 := GetFontUnicodeRanges(canvas.Handle,R);
       Caption := R^.cGlyphsSupported.ToString;
       For I := 0 To R^.cRanges-1 do
        Begin
          // each range covers wcLow .. wcLow+cGlyphs-1
          S:='$' + IntToHex(R^.Ranges[I].wcLow,4) + '-';
          S:=S + IntToHex(R^.Ranges[I].wcLow + R^.Ranges[I].cGlyphs-1,4);
          Combobox1.Items.Add(S);
        end;
       FreeMem(R,C);
     end;
end;

For font 'Courier New' on Windows 7 I got this list of 181 Unicode ranges:
Code: Text
$0020-007E
$00A0-01FF
$0200-0220
$0221-0236 => 0237 missing (correct)
$0238-0259
$025A-02AD
$02AE-02E4 => 02E5..02EB missing (but does exist!)
$02EC-0323
$0324-034F
$0350-036F => 0370..0373 missing (correct)
$0374-0375 => 0376..0379 missing (correct)
$037A-037E => 037F..0383 missing (correct)
$0384-038A => 038B missing (correct)
$038C-038C => 038D missing (correct)
$038E-03A1
...

Most of this information is correct, but range U+02E5..U+02EB is reported as missing (see the marked line above) even though it does exist (marked red in the screenshot). This is not the only discrepancy I found, so this information does not seem to be 100% reliable...

One more reason why I hope for help getting TFreeTypeFont.CharWidthFromUnicode() to work (see reply #5), which would be a solution for Linux too.
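To turn the ranges from the GetFontUnicodeRanges code above into a per-character test, the only logic needed is a range lookup. The sketch below is an illustration under stated assumptions: TWcRange, MakeRange and IsInRanges are hypothetical names, and TWcRange is a portable stand-in for the Windows WCRANGE record so that the lookup compiles without the Windows unit. On Windows the array would be filled from R^.Ranges[I].wcLow and R^.Ranges[I].cGlyphs for I := 0 to R^.cRanges-1, as in the loop above. Given the discrepancy reported above, a "not in ranges" result only says what the font itself reports, not what TextOut will finally display.

```pascal
program GlyphRangeLookup;
{$mode objfpc}{$H+}

type
  { Hypothetical portable stand-in for the Windows WCRANGE record. }
  TWcRange = record
    wcLow: Word;    // first code point of the range
    cGlyphs: Word;  // number of consecutive code points in the range
  end;

{ Builds a range from inclusive bounds, as printed in the list above. }
function MakeRange(ALow, AHigh: Word): TWcRange;
begin
  Result.wcLow := ALow;
  Result.cGlyphs := AHigh - ALow + 1;
end;

{ True if code point CP lies in one of the ranges. The last code point of a
  range is wcLow + cGlyphs - 1, the same bound the ComboBox code prints.
  LongInt arithmetic avoids Word overflow near $FFFF. }
function IsInRanges(const Ranges: array of TWcRange; CP: Word): Boolean;
var
  i: Integer;
begin
  for i := 0 to High(Ranges) do
    if (CP >= Ranges[i].wcLow) and
       (LongInt(CP) < LongInt(Ranges[i].wcLow) + LongInt(Ranges[i].cGlyphs)) then
      Exit(True);
  Result := False;
end;

var
  R: array[0..2] of TWcRange;
begin
  { a few of the Courier New ranges listed above (Windows 7) }
  R[0] := MakeRange($0020, $007E);
  R[1] := MakeRange($02AE, $02E4);
  R[2] := MakeRange($02EC, $0323);
  WriteLn('U+0041 in ranges: ', IsInRanges(R, $0041));
  WriteLn('U+02E5 in ranges: ', IsInRanges(R, $02E5)); // falls in the reported gap
  WriteLn('U+02EC in ranges: ', IsInRanges(R, $02EC));
end.
```

A linear scan is enough for about 180 ranges; since the ranges come back sorted, a binary search would be the obvious refinement if many characters have to be checked.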

dsiders

  • Hero Member
  • Posts: 1330
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #7 on: December 12, 2023, 01:46:26 am »
There is a unit in TAChart that demonstrates how to initialize the FreeType font collection and get a font with a specific name:
$(LazarusDir)/components/tachart/tafonts.pas. Pretty handy stuff.

nanobit

  • Full Member
  • Posts: 165
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #8 on: December 12, 2023, 07:52:48 am »
GetFontUnicodeRanges() might be telling the truth for U+02E5..U+02EB, whereas the text drawing (in your grid) uses fallback fonts.

Hartmut

  • Hero Member
  • Posts: 891
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #9 on: December 12, 2023, 03:03:07 pm »
nanobit's point about fallback fonts may well be the root cause.

Thank you very much, dsiders, for the hint about tafonts.pas. Using unit 'TAFonts' I got my code to run. Unfortunately the results are wrong:

Code: Pascal
{ requires units LazUTF8, EasyLazFreeType and TAFonts in the uses clause }
type CharUTF8 = string[4]; {for 1 UTF8-char}

function hexString(s: string): string;
   {converts the chars of 's' to a hex string}
   var z: string;
       i: integer;
   begin
   z:=''; for i:=1 to length(s) do  z:=z + HexStr(ord(s[i]),2);
   exit(z);
   end;

const FontName = 'Arial'; {choose a Font}

procedure check_UTF8char_contained(s: CharUTF8);
   {checks if UTF8-character 's' is contained in the Font 'FontName'}
   var FTF: TFreeTypeFont;
       sa: ansistring;
       len: single;
       uc: cardinal;
       charlen: longint;
   begin
   sa:=s; {local copy, so PChar(sa) stays valid for the whole call}
   uc:=LazUTF8.UTF8CodepointToUnicode(PChar(sa), charlen); {convert UTF8 to Unicode}

   TAFonts.InitFonts(''); {is only executed the 1st time}
   FTF:=TAFonts.LoadFont(FontName,[]); {load Font and create 'FTF'}

   if FTF=nil then len:=-999
      else len:=FTF.CharWidthFromUnicode(uc); {get glyph width}

   Form1.Memo1.Lines.Add('$' + hexString(s) + ' = U+' + HexStr(uc,4) + ' = "' +
                         s + '" len=' + IntToStr(round(len)));
   FTF.Free; {Free is safe even when FTF=nil}
   end;

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
   begin
   Memo1.Clear;
   Memo1.Font.Name:=FontName;
   Memo1.Font.Height:=20;
   check_UTF8char_contained(#$C2#$A7); {exists}
   check_UTF8char_contained(#$C2#$87); {not exists}
   check_UTF8char_contained(#$CD#$B7); {exists on Linux only}
   check_UTF8char_contained(#$CD#$B8); {not exists}
   end;

As you can see in the attached screenshots for Linux and Windows, a width (len) of zero is never returned, even for non-existing characters. Did I do something wrong? Which OS did you test on?

I attached my code as a compilable project.

Hartmut

  • Hero Member
  • Posts: 891
Re: How to detect if an UTF8-char is not contained in a Font?
« Reply #10 on: December 16, 2023, 01:26:47 pm »
I closed this topic because a solution for Windows was found. For Linux I opened a new topic in the Linux forum: https://forum.lazarus.freepascal.org/index.php/topic,65554.0.html
Please post anything Linux-related there. Thanks.
