
Topic: Chinese Fonts in fpPDF (Read 3640 times)

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Chinese Fonts in fpPDF
« on: September 26, 2024, 08:41:08 am »
I believe that it's not possible to display Chinese characters in a PDF using fpPDF (included in the FCL). Further, it seems the font cache in fpttf cannot work with TrueType fonts that have been 'collected' into a .ttc file. That's sad, as more and more (most?) fonts in Linux and MacOS are .ttc and Windows seems to be catching up too.

So, can interested parties have a look at my 'simple' demos and see if I have missed something? I honestly know very little about Chinese fonts! For the record, I don't think it's a UTF-8 3-byte issue; other character sets using 3 bytes seem to work OK. The TTF fonts I use do seem happy to display Chinese characters on screen, just not in the PDF.

I have attached two demo source files, both derived partly from the examples provided with the fpPDF unit. One is perhaps the simplest code to make a PDF; the downside is that it requires hardwired paths to font files, as it does not use gTTFontCache, which can make that easy.

The second demo source file uses the cache and is capable of finding the fonts you request all by itself, but it is a bit harder to understand at a quick (or even slow) glance.

Both files are purely command-line driven and compile with just a simple fpc command line. I suggest at least FPC-Fixes; quite a lot of work seems to have gone into the caches since 3.2.2, particularly for Windows (thanks!).

I am working with FPC-Fixes on Linux (Debian 12) and Windows 11.

I have included only the FreeSans font in the example zip because the specialist Chinese ones are too big.

David
« Last Edit: September 26, 2024, 10:03:21 am by dbannon »
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

Thaddy

  • Hero Member
  • Posts: 16179
  • Censorship about opinions does not belong here.
Re: Chinese Fonts in fpPDF
« Reply #1 on: September 26, 2024, 02:15:28 pm »
Quote from: dbannon
I have included only the FreeSans font in the example zip because the specialist Chinese ones are too big.
From experience I know that it can bite you if you use the wrong font. fppdf assumes you use a correct font, although there might be some issues with font families where the wrong font substitution is made.
My font of choice is NSimSun, which should work with fppdf. (I use that font for all my Chinese-language related answers.)
Let me know if that works. And yes, it is big...
Note that it covers not only Simplified Chinese but also Japanese, plus the Cyrillic and Latin alphabets.
https://learn.microsoft.com/en-us/typography/font-list/simsun
This font can also be used on Linux.
Note that my knowledge of Chinese is very, very limited, but not absent.
« Last Edit: September 26, 2024, 02:28:39 pm by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

rvk

  • Hero Member
  • Posts: 6584
Re: Chinese Fonts in fpPDF
« Reply #2 on: September 26, 2024, 05:07:32 pm »
The problem might be that SimSun-ExtB is a .ttf file and SimSun/NSimSun is a .ttc file.
See https://learn.microsoft.com/nl-nl/typography/font-list/simsun

And in fpTTF.pp only .ttf and .otf files are read when building the FontCache.

Code: Pascal
        if (lowercase(ExtractFileExt(s)) = '.ttf') or
           (lowercase(ExtractFileExt(s)) = '.otf') then
        begin

So... .ttc files are not read into the FontCache and you can't use them with Doc.AddFont.

Of course you could try
Code: Pascal
    FontCh := Doc.AddFont('c:\windows\fonts\simsun.ttc', 'NSimSun');
but my guess is that fpPDF doesn't handle .ttc files at all !!

From TPDFDocument.LoadFont()
Code: Pascal
  if FileExists(lFName) then
  begin
    s := LowerCase(ExtractFileExt(lFName));
    Result := (s = '.ttf') or (s = '.otf');
  end
  else
    Raise EPDF.CreateFmt(rsErrReportFontFileMissing, [lFName]);
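Both checks quoted above key on the file extension alone. As a rough sketch (this is not what fpTTF or fpPDF actually do), a font file can instead be classified by its first four bytes, which the OpenType specification defines as 0x00010000 (or 'true') for TrueType outlines, 'OTTO' for CFF outlines and 'ttcf' for a collection. The names TFontFileKind, ClassifyFontTag and ClassifyFontFile are hypothetical.

```pascal
program sniffsfnt;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  // Hypothetical classification of a font file by its first four bytes
  TFontFileKind = (ffkUnknown, ffkTrueType, ffkOpenTypeCFF, ffkCollection);

// Classify the 4-byte sfnt version (or TTC tag) at the start of a font file.
function ClassifyFontTag(const B: array of Byte): TFontFileKind;
var
  Tag: LongWord;
begin
  Result := ffkUnknown;
  if Length(B) < 4 then
    Exit;
  // Font data is big-endian
  Tag := (LongWord(B[0]) shl 24) or (LongWord(B[1]) shl 16) or
         (LongWord(B[2]) shl 8) or LongWord(B[3]);
  case Tag of
    $00010000, $74727565: Result := ffkTrueType;     // 0x00010000 or 'true'
    $4F54544F:            Result := ffkOpenTypeCFF;  // 'OTTO': CFF outlines
    $74746366:            Result := ffkCollection;   // 'ttcf': .ttc collection
  end;
end;

// Read the first four bytes of AFileName and classify them.
function ClassifyFontFile(const AFileName: string): TFontFileKind;
var
  FS: TFileStream;
  Head: array[0..3] of Byte;
begin
  Result := ffkUnknown;
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    if FS.Read(Head, SizeOf(Head)) = SizeOf(Head) then
      Result := ClassifyFontTag(Head);
  finally
    FS.Free;
  end;
end;

const
  KindNames: array[TFontFileKind] of string =
    ('unknown', 'TrueType outlines', 'OpenType (CFF) outlines', 'font collection (.ttc)');

var
  i: Integer;
begin
  // Usage: sniffsfnt font1 [font2 ...]
  for i := 1 to ParamCount do
    if FileExists(ParamStr(i)) then
      WriteLn(ParamStr(i), ': ', KindNames[ClassifyFontFile(ParamStr(i))])
    else
      WriteLn(ParamStr(i), ': not found');
end.
```

This would flag a .ttc file regardless of its name, and also show when a file named .ttf actually holds CFF ('OTTO') outlines.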

Quote from: Thaddy
My font of choice is NSimSun which should work with fppdf. (I use that font for all my Chinese language related answers.)
Let me know if that works. And yes, it is big...
So Thaddy, can you show us a short simple example where you use NSimSun in fpPDF ??


(BTW. I think mORMot2 does have support for SimSun and ttc fonts)
« Last Edit: September 26, 2024, 05:10:49 pm by rvk »

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Chinese Fonts in fpPDF
« Reply #3 on: September 27, 2024, 02:58:55 am »
Thanks Thaddy, RVK.

Quote from: rvk
The problem might be that SimSun-ExtB is a .ttf file and SimSun/NSimSun is a .ttc file.

Indeed rvk, the font cache (fpttf) will not load .ttc files; it does not understand the format. And fpPDF will not load .ttc files either. So .ttc files are ruled out.

The font cache does load .otf fonts and seems to understand them, but when it passes the font file name to fpPDF, it fails. fpPDF actually seems to load the font but crashes when the document is being finalised. The same thing happens if you supply the font file directly to fpPDF.

That's why I included a demo that does NOT use the font cache. The font name and path are "hard wired" in the call to Doc.AddFont().

One problem is that fpPDF can work with ONLY .ttf files. Another is that even with a .ttf file (that does have Chinese characters) it does not display them.

The issues with the font cache are separate and, perhaps, incidental: it does not work with .ttc files. They are forecast to be the future of fonts, being space-efficient, and I guess far better on a platform like Windows with poor disk I/O performance.

I read in one place that .ttf fonts differentiate between "screen fonts" and "printer fonts" and that .otf fonts do not have that problem, but I could not find any further details on this claim. And as fpPDF is not interested in .otf fonts anyway...

Quote
So Thaddy, can you show us a short simple example where you use NSimSun in fpPDF ??
Indeed, that would be useful. My Windows install does not have NSimSun but it would be trivial, on a system with it, to modify the simpleFPpdf.pas demo I provided to test it.

Quote
(BTW. I think mORMot2 does have support for SimSun and ttc fonts)

Even my app, tomboy-ng, will display Chinese characters on screen and when printed, just using the default desktop font, Cantarell, and (just tested) using Nimbus Sans, a .ttc font (Linux, Debian 12). No special action is taken for either screen display or printing; it just requests that font.

Thanks for your interest.

Davo

Thaddy

  • Hero Member
  • Posts: 16179
  • Censorship about opinions does not belong here.
Re: Chinese Fonts in fpPDF
« Reply #4 on: September 27, 2024, 07:21:16 am »
You could also try the noto sans font.
The ttf version of simsun is simsunb.ttf (huge)
You can use a tool like fontforge to extract ttf's from ttc files. The version in the ttc file is smaller, but still big.
If the font is ttf it will work with fppdf.
I have no example available, but it should always work.
« Last Edit: September 27, 2024, 07:32:22 am by Thaddy »

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Chinese Fonts in fpPDF
« Reply #5 on: September 27, 2024, 08:41:00 am »
Quote from: Thaddy
You could also try the noto sans font.
I tried using Noto Sans CJK SC; it's part of a .ttc collection and does not work. So I tried using FontForge to extract (actually reconstruct) the .ttf file, and that did not work either.
Quote from: Thaddy
The ttf version of simsun is simsunb.ttf (huge)
I understand that is a Windows (proprietary) font. Yes, of course it's huge; there are something like 90,000 pictographs in the base character set. Do bold and italic versions of that and, wow! A couple of days ago I did try getting the .ttf file from some open source CJK font collections. They were just as big to start with and came down to a relatively small .ttf file, but it made no difference.

Quote from: Thaddy
If the font is ttf it will work with fppdf.

Sigh...
Thaddy, please read my long-winded story above. I tried all these approaches several days ago, and they did not work. I extracted the "Noto Sans CJK SC" .ttf file and used it directly, without the font cache. It did not work. I tried simsun-extB on Windows and it did not work. I have other fonts here that include Chinese characters; they display on screen and print to my printer from my app. But they do not produce the goods when used with fpPDF.
Quote from: Thaddy

I have no example available, but it should always work.
Thaddy, my first post in this thread contains a zip file with two source files with a .pas extension. The one called simpleFPpdf.pas is quite short. Around line #71, replace my reference to "fonts/simsunb.ttf" with a full or relative reference to the font you believe does work, and please try it. If it cannot be made to work like that, either there is something wrong with my code (what?) or it just does not work.
 
Davo

paweld

  • Hero Member
  • Posts: 1268
Re: Chinese Fonts in fpPDF
« Reply #6 on: September 27, 2024, 01:30:18 pm »
Attached is a working example; I checked it on Windows 10 and Debian 12.
Place one of these TTF files in the directory with the program (you only need one): NotoSerifCJKsc-VF.ttf [ https://files.brudnopis.ovh/file/SlUi3ZVso1en7Lc4/aWCITfFAib6AODWI/NotoSerifCJKsc-VF.7z ~22MB] or NotoSansCJKsc-VF.ttf [ https://files.brudnopis.ovh/file/SlUi3ZVso1en7Lc4/R0OEY1sXE5j1hjBG/NotoSansCJKsc-VF.7z ~13MB]. The whole packages can also be downloaded from the author's repository: https://github.com/notofonts/noto-cjk/releases

I am working in Lazarus trunk with FPC 3.2-fixes, but I checked and it also works with Lazarus 3.2 and FPC 3.2.2 without any problems.
Best regards / Pozdrawiam
paweld

rvk

  • Hero Member
  • Posts: 6584
Re: Chinese Fonts in fpPDF
« Reply #7 on: September 27, 2024, 02:48:16 pm »
I also think it is possible to convert simsun.ttc to simsun.ttf and use that.
You probably need to embed the font (or a subset) in that case.

Thaddy

  • Hero Member
  • Posts: 16179
  • Censorship about opinions does not belong here.
Re: Chinese Fonts in fpPDF
« Reply #8 on: September 27, 2024, 06:01:58 pm »
Note that the ttc format is quite easy too: it is just a header and a directory to a series of ttf fonts.
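To illustrate the point, here is a minimal sketch of reading that header, assuming the layout given in the OpenType specification: the tag 'ttcf', a 16-bit major and minor version, a 32-bit font count, then one 32-bit offset per member font, all big-endian. Version 2 headers add DSIG fields after the offsets, which this ignores. ParseTtcHeader and the other names are hypothetical, and the whole file is read into memory for simplicity.

```pascal
program ttcheader;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  TOffsetArray = array of LongWord;

// Font data is big-endian
function ReadU32BE(const B: array of Byte; Pos: SizeInt): LongWord;
begin
  Result := (LongWord(B[Pos]) shl 24) or (LongWord(B[Pos + 1]) shl 16) or
            (LongWord(B[Pos + 2]) shl 8) or LongWord(B[Pos + 3]);
end;

// Hypothetical parser: 'ttcf' tag, major/minor version (not checked here),
// uint32 numFonts, then numFonts uint32 offsets, each pointing at one member
// font's table directory, measured from the start of the .ttc file.
function ParseTtcHeader(const B: array of Byte): TOffsetArray;
var
  NumFonts: LongWord;
  i: Integer;
begin
  if (Length(B) < 12) or (ReadU32BE(B, 0) <> $74746366) then   // 'ttcf'
    raise Exception.Create('Not a font collection (no ttcf tag)');
  NumFonts := ReadU32BE(B, 8);
  if 12 + 4 * Int64(NumFonts) > Length(B) then
    raise Exception.Create('Truncated TTC header');
  SetLength(Result, NumFonts);
  for i := 0 to Integer(NumFonts) - 1 do
    Result[i] := ReadU32BE(B, 12 + 4 * i);
end;

function LoadFileBytes(const AFileName: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := nil;
    SetLength(Result, FS.Size);
    if Length(Result) > 0 then
      FS.ReadBuffer(Result[0], Length(Result));
  finally
    FS.Free;
  end;
end;

var
  Offsets: TOffsetArray;
  i: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ttcheader file.ttc');
    Exit;
  end;
  Offsets := ParseTtcHeader(LoadFileBytes(ParamStr(1)));
  WriteLn(Length(Offsets), ' member font(s)');
  for i := 0 to High(Offsets) do
    WriteLn('  font ', i, ': table directory at byte offset ', Offsets[i]);
end.
```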

rvk

  • Hero Member
  • Posts: 6584
Re: Chinese Fonts in fpPDF
« Reply #9 on: September 27, 2024, 06:11:04 pm »
Yeah, I see it. Just a SimSun-01.ttf and NSimSun-02.ttf (last one should be monospaced).
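As a rough sketch of what that splitting involves (hypothetical names, assuming a well-formed collection already loaded into memory): table offsets inside a .ttc are measured from the start of the collection file, so extracting member N means copying its 12-byte offset table, then copying each table and rewriting its directory entry with a new offset, padding each table to a 4-byte boundary. It does not recompute the whole-font checkSumAdjustment in the 'head' table, so strict validators may complain; a dedicated tool such as FontForge is the safer route. Check the font's licence before redistributing an extracted file.

```pascal
program ttcsplit;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Font data is big-endian
function ReadU32BE(const B: array of Byte; Pos: SizeInt): LongWord;
begin
  Result := (LongWord(B[Pos]) shl 24) or (LongWord(B[Pos + 1]) shl 16) or
            (LongWord(B[Pos + 2]) shl 8) or LongWord(B[Pos + 3]);
end;

procedure WriteU32BE(var B: TBytes; Pos: SizeInt; V: LongWord);
begin
  B[Pos] := Byte(V shr 24);
  B[Pos + 1] := Byte(V shr 16);
  B[Pos + 2] := Byte(V shr 8);
  B[Pos + 3] := Byte(V);
end;

// Tables are padded to a 4-byte boundary
function Pad4(N: LongWord): LongWord;
begin
  Result := (N + 3) and not LongWord(3);
end;

// Hypothetical: build a standalone sfnt (.ttf/.otf) from member AIndex of a TTC image.
function ExtractFromTtc(const Ttc: array of Byte; AIndex: Integer): TBytes;
var
  NumTables, i: Integer;
  FontOff, Rec, SrcOff, Len, DstOff, Total: LongWord;
begin
  if (Length(Ttc) < 12) or (ReadU32BE(Ttc, 0) <> $74746366) then   // 'ttcf'
    raise Exception.Create('Not a TTC file');
  if (AIndex < 0) or (LongWord(AIndex) >= ReadU32BE(Ttc, 8)) then
    raise Exception.CreateFmt('Font index %d out of range', [AIndex]);
  FontOff := ReadU32BE(Ttc, 12 + 4 * AIndex);
  if Int64(FontOff) + 12 > Length(Ttc) then
    raise Exception.Create('Bad table directory offset');
  NumTables := (Ttc[FontOff + 4] shl 8) or Ttc[FontOff + 5];
  if Int64(FontOff) + 12 + 16 * NumTables > Length(Ttc) then
    raise Exception.Create('Truncated table directory');
  // Output size: 12-byte offset table + 16-byte record per table + padded data
  Total := 12 + 16 * NumTables;
  for i := 0 to NumTables - 1 do
    Total := Total + Pad4(ReadU32BE(Ttc, FontOff + 12 + 16 * i + 12));
  SetLength(Result, Total);
  FillChar(Result[0], Total, 0);                 // padding bytes must be zero
  Move(Ttc[FontOff], Result[0], 12);             // sfntVersion..rangeShift unchanged
  DstOff := 12 + 16 * NumTables;
  for i := 0 to NumTables - 1 do
  begin
    Rec := FontOff + 12 + 16 * i;                // this table's record in the TTC
    SrcOff := ReadU32BE(Ttc, Rec + 8);
    Len := ReadU32BE(Ttc, Rec + 12);
    if Int64(SrcOff) + Len > Length(Ttc) then
      raise Exception.Create('Table data outside file');
    Move(Ttc[Rec], Result[12 + 16 * i], 8);      // tag and checksum copied as-is
    WriteU32BE(Result, 12 + 16 * i + 8, DstOff); // new offset, from start of new file
    WriteU32BE(Result, 12 + 16 * i + 12, Len);
    if Len > 0 then
      Move(Ttc[SrcOff], Result[DstOff], Len);
    DstOff := DstOff + Pad4(Len);
  end;
end;

var
  FS: TFileStream;
  Src, Dst: TBytes;
begin
  if ParamCount < 3 then
  begin
    // Name the output .otf if that member has CFF ('OTTO') outlines
    WriteLn('Usage: ttcsplit collection.ttc index out.ttf');
    Exit;
  end;
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Src, FS.Size);
    FS.ReadBuffer(Src[0], Length(Src));
  finally
    FS.Free;
  end;
  Dst := ExtractFromTtc(Src, StrToInt(ParamStr(2)));
  FS := TFileStream.Create(ParamStr(3), fmCreate);
  try
    FS.WriteBuffer(Dst[0], Length(Dst));
  finally
    FS.Free;
  end;
end.
```

Tables shared between members of the collection are simply copied into each extracted font.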

fpPDF could have easily done that  ;)

dsiders

  • Hero Member
  • Posts: 1282
Re: Chinese Fonts in fpPDF
« Reply #10 on: September 27, 2024, 09:44:18 pm »
Quote from: rvk
Yeah, I see it. Just a SimSun-01.ttf and NSimSun-02.ttf (last one should be monospaced).

fpPDF could have easily done that  ;)

I did not even know about .ttc collections. Submit a feature request...
Preview the next Lazarus documentation release at: https://dsiders.gitlab.io/lazdocsnext

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Chinese Fonts in fpPDF
« Reply #11 on: September 28, 2024, 02:41:47 am »
Quote from: paweld
Attached is a working example -
....

Wow, yep, your example works for me too. The difference between your code and mine is poSubsetFont as an option to the (PDF) doc. That appears to be incompatible with the Chinese fonts (but OK with all the others). Turning it off does not prevent font embedding (of a .ttf font), so, now that I am no longer trying to use .ttc files, I'll leave it off and do some more testing.
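A minimal sketch of that fix, assuming FPC 3.2.x fcl-pdf and a CJK .ttf such as the NotoSansCJKsc-VF.ttf linked above, passed on the command line: poSubsetFont is removed from the document options and the font is still embedded (which, as far as I know, is the default as long as poNoEmbeddedFonts is not set). The helper names CJKOptions and WriteChinesePdf are hypothetical; the fpPDF calls follow the pattern of the examples shipped with the unit. The Chinese text is built from code points with UTF8Encode so the result does not depend on the source file's encoding.

```pascal
program cjkpdf;
{$mode objfpc}{$H+}

uses
  SysUtils, fpPDF;

// Hypothetical helper: keep whatever options are set, but make sure
// poSubsetFont is off, since that is what broke the Chinese output here.
function CJKOptions(const ABase: TPDFOptions): TPDFOptions;
begin
  Result := ABase + [poPageOriginAtTop] - [poSubsetFont];
end;

procedure WriteChinesePdf(const AFontFile, AOutFile: string);
var
  Doc: TPDFDocument;
  Sec: TPDFSection;
  Page: TPDFPage;
  FontIdx: Integer;
begin
  Doc := TPDFDocument.Create(nil);
  try
    Doc.Options := CJKOptions(Doc.Options);
    Doc.StartDocument;
    Sec := Doc.Sections.AddSection;
    Page := Doc.Pages.AddPage;
    Page.PaperType := ptA4;
    Page.UnitOfMeasure := uomMillimeters;
    Sec.AddPage(Page);
    // Full path to a .ttf containing CJK glyphs, loaded without the font cache
    FontIdx := Doc.AddFont(AFontFile, 'CJKFont');
    Page.SetFont(FontIdx, 16);
    // U+4F60 U+597D U+FF0C U+4E16 U+754C = "你好，世界" ("Hello, world"),
    // encoded to UTF-8 explicitly so the source file encoding does not matter
    Page.WriteText(20, 30, UTF8Encode(UnicodeString(#$4F60#$597D#$FF0C#$4E16#$754C)));
    Doc.SaveToFile(AOutFile);
  finally
    Doc.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: cjkpdf /path/to/NotoSansCJKsc-VF.ttf [out.pdf]');
    Exit;
  end;
  if ParamCount >= 2 then
    WriteChinesePdf(ParamStr(1), ParamStr(2))
  else
    WriteChinesePdf(ParamStr(1), 'chinese.pdf');
end.
```

Keeping the option change in one small function makes it easy to apply the same rule to every document the app creates.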

I am somewhat confronted by font files compressed with 7z; that's a new one on me! But it seems to work. And I will need to experiment with the .ttf files I extracted using FontForge from Noto Sans CJK SC (rather than from your web link). I expect mine will work too, now that I have poSubsetFont turned off. Sigh...

Thanks heaps paweld, for restoring my faith in fcl-pdf, I was convinced it was a far more serious problem !

Davo

edit: clarify need for poSubsetFont

« Last Edit: September 28, 2024, 05:50:11 am by dbannon »

jianwt

  • Full Member
  • Posts: 125
Re: Chinese Fonts in fpPDF
« Reply #12 on: September 28, 2024, 02:52:15 am »
I hope there will be a complete solution for Chinese. I use mORMot2 to generate PDF files, and it also reports an error when it encounters a Chinese font!

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Chinese Fonts in fpPDF
« Reply #13 on: September 28, 2024, 03:14:55 am »
Quote from: rvk
I also think it is possible to convert simsun.ttc to simsun.ttf and use that.
You probably need to embed the font (or a subset) in that case.

That's why I had poSubsetFont turned on in my example, and that was the cause of my problems.

SimSun is a Microsoft proprietary font, so the license may not permit such deconstruction.

Further, and I am on seriously shaky ground here, I suspect the .ttc fonts I was looking at are sometimes collections of .otf fonts rather than .ttf. That's based on superficial observations of what FontForge was showing me. To make it more difficult, apparently it's not unusual to see the extensions .otf and .ttf used interchangeably, which is nearly right but not correct.
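One way to check that suspicion, sketched here under the same OpenType-spec assumptions (hypothetical names): each member of a collection starts with its own sfnt version tag, so reading the 4 bytes at every member's offset shows whether it has TrueType outlines (0x00010000) or CFF outlines ('OTTO', which is what people usually mean by .otf), whatever the file extension says.

```pascal
program ttcmembers;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Font data is big-endian
function ReadU32BE(const B: array of Byte; Pos: SizeInt): LongWord;
begin
  Result := (LongWord(B[Pos]) shl 24) or (LongWord(B[Pos + 1]) shl 16) or
            (LongWord(B[Pos + 2]) shl 8) or LongWord(B[Pos + 3]);
end;

// Hypothetical: describe the outline type of every member of a TTC image by
// reading the sfnt version tag at the start of each member's table directory.
function DescribeTtcMembers(const B: array of Byte): TStringArray;
var
  NumFonts, Off: LongWord;
  i: Integer;
begin
  if (Length(B) < 12) or (ReadU32BE(B, 0) <> $74746366) then   // 'ttcf'
    raise Exception.Create('Not a font collection (no ttcf tag)');
  NumFonts := ReadU32BE(B, 8);
  if 12 + 4 * Int64(NumFonts) > Length(B) then
    raise Exception.Create('Truncated TTC header');
  SetLength(Result, NumFonts);
  for i := 0 to Integer(NumFonts) - 1 do
  begin
    Off := ReadU32BE(B, 12 + 4 * i);
    if Int64(Off) + 4 > Length(B) then
      Result[i] := 'bad offset'
    else
      case ReadU32BE(B, Off) of
        $00010000, $74727565: Result[i] := 'TrueType outlines';          // 0x00010000 / 'true'
        $4F54544F:            Result[i] := 'CFF outlines (.otf style)';  // 'OTTO'
      else
        Result[i] := 'unknown sfnt version';
      end;
  end;
end;

function LoadFileBytes(const AFileName: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := nil;
    SetLength(Result, FS.Size);
    if Length(Result) > 0 then
      FS.ReadBuffer(Result[0], Length(Result));
  finally
    FS.Free;
  end;
end;

var
  Members: TStringArray;
  i: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ttcmembers file.ttc');
    Exit;
  end;
  Members := DescribeTtcMembers(LoadFileBytes(ParamStr(1)));
  for i := 0 to High(Members) do
    WriteLn('  font ', i, ': ', Members[i]);
end.
```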

And I still believe that .otf fonts don't work in fpPDF (but I wonder if that was also a poSubsetFont issue?). More research is indicated.

Davo

dbannon

  • Hero Member
  • Posts: 3156
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Chinese Fonts in fpPDF
« Reply #14 on: September 28, 2024, 08:23:49 am »
OK, here is a summary of what I have found. Firstly, just about everything I have said further up this thread is either wrong or dodgy! And for that I apologize!

The reason why paweld's code worked (and mine did not) is primarily the fpPDF doc option poSubsetFont; setting it totally messes with Chinese .ttf fonts and .otf fonts. (I suspect that some Chinese .ttf fonts are, in fact, .otf.) Further rambling notes:

  • fpPDF cannot use .ttc font files.
  • fpPDF and the font cache can use .otf fonts (if poSubsetFont is not set).
  • On Linux, e.g., "Noto Sans CJK SC" and friends are preinstalled as .ttc fonts. I extracted .otf fonts from the .ttc collection using FontForge and it works fine. (I did this a week ago; that's why paweld's code worked even though the fonts he recommended were still in a 7z archive, sigh...)
  • simsun-extB does not work even though it appears to be .ttf. SimSun is a .ttc font collection and license-constrained.
  • "Source Han Serif SC VF", downloaded from GitHub, works; get https://github.com/adobe-fonts/source-han-serif/raw/release/Variable/TTF/SourceHanSerifSC-VF.ttf, but it's probably better to get Noto.
  • Noto CJK in a range of weights, styles and regional variants is available from https://github.com/notofonts/noto-cjk; that's a better option than using FontForge, and the only option for Windows users.
  • Noto CJK SC is also able to show the whole (?) basic Latin character set but not many of the extended characters used in European languages. As such, it's perhaps usable as a default font for a Chinese speaker who occasionally lapses into English.
  • I have not been able to test the full range of Windows Chinese fonts; Microsoft wants me to install an apparently huge bundle and I have Windows on a quite small disk partition.


My model in my application will be to offer a window that allows the user to specify a font they would rather use in the PDF. If they do so, I will check that the font can be found (by the cache) and, if so, that it is a .ttf or .otf.
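A sketch of that check, assuming the fpTTF unit's gTTFontCache with its SearchPath, BuildFontCache and Find(FamilyName, Bold, Italic) members; the search directories and the UsableFontFile name are assumptions. Since the cache only indexes .ttf and .otf files anyway, the extension test mostly mirrors fpPDF's own LoadFont check so the two stay in step.

```pascal
program pickfont;
{$mode objfpc}{$H+}

uses
  SysUtils, fpTTF;

// Hypothetical check: return the font file for AFamily if the cache knows it
// and the file is .ttf or .otf (the extensions fpPDF's LoadFont accepts);
// otherwise return an empty string.
function UsableFontFile(const AFamily: string; ABold, AItalic: Boolean): string;
var
  Item: TFPFontCacheItem;
  Ext: string;
begin
  Result := '';
  Item := gTTFontCache.Find(AFamily, ABold, AItalic);
  if Item = nil then
    Exit;
  Ext := LowerCase(ExtractFileExt(Item.FileName));
  if (Ext = '.ttf') or (Ext = '.otf') then
    Result := Item.FileName;
end;

var
  FileName: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: pickfont "Font Family Name"');
    Exit;
  end;
  // Assumed font locations; adjust for your system
  {$ifdef windows}
  gTTFontCache.SearchPath.Add(GetEnvironmentVariable('WINDIR') + '\Fonts');
  {$else}
  gTTFontCache.SearchPath.Add('/usr/share/fonts');
  gTTFontCache.SearchPath.Add(GetUserDir + '.fonts');
  {$endif}
  gTTFontCache.BuildFontCache;
  FileName := UsableFontFile(ParamStr(1), False, False);
  if FileName = '' then
    WriteLn('Not usable in fpPDF (not found, or not .ttf/.otf): ', ParamStr(1))
  else
    WriteLn('Use: ', FileName);
end.
```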

As an alternative, I could look at each character, one by one, testing first to see if it's a three-byte character, then that it's in the CJK range, then applying the above test to see if we can make it usable. This approach would then be able to handle extended European and CJK in the same document, but it would be substantially slower and require more code. PDF is quite incidental to my app.
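A sketch of the per-character scan, with hypothetical names: it decodes UTF-8 and tests the code point against a few CJK blocks (an assumed, non-exhaustive list). Checking the decoded value rather than the byte count matters, because 3-byte UTF-8 also covers many non-CJK characters such as '€' (U+20AC), while CJK Extension B, the range SimSun-ExtB is for, sits in plane 2 and takes 4 bytes.

```pascal
program cjkscan;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Assumed, non-exhaustive list of CJK-related blocks: CJK Symbols and
// Punctuation, Extension A, Unified Ideographs, Compatibility Ideographs,
// Halfwidth/Fullwidth Forms, and plane 2 (Extension B onwards).
function IsCJKCodePoint(CP: LongWord): Boolean;
begin
  Result := ((CP >= $3000) and (CP <= $303F)) or
            ((CP >= $3400) and (CP <= $4DBF)) or
            ((CP >= $4E00) and (CP <= $9FFF)) or
            ((CP >= $F900) and (CP <= $FAFF)) or
            ((CP >= $FF00) and (CP <= $FFEF)) or
            ((CP >= $20000) and (CP <= $2FFFF));
end;

// Hypothetical helper: decode UTF-8 one code point at a time and stop at the
// first CJK one. Invalid lead bytes are skipped; continuation bytes are not
// validated (a sketch, not a full UTF-8 validator).
function ContainsCJK(const S: RawByteString): Boolean;
var
  i, n, k: Integer;
  b: Byte;
  CP: LongWord;
begin
  Result := False;
  i := 1;
  while i <= Length(S) do
  begin
    b := Ord(S[i]);
    if b < $80 then
    begin CP := b; n := 1; end                      // ASCII
    else if (b and $E0) = $C0 then
    begin CP := b and $1F; n := 2; end              // 2-byte sequence
    else if (b and $F0) = $E0 then
    begin CP := b and $0F; n := 3; end              // 3-byte sequence
    else if (b and $F8) = $F0 then
    begin CP := b and $07; n := 4; end              // 4-byte sequence
    else
    begin Inc(i); Continue; end;                    // stray or invalid byte
    if i + n - 1 > Length(S) then
      Exit;                                         // truncated at end of string
    for k := 1 to n - 1 do
      CP := (CP shl 6) or (Ord(S[i + k]) and $3F);
    if IsCJKCodePoint(CP) then
      Exit(True);
    Inc(i, n);
  end;
end;

var
  i: Integer;
begin
  // Usage: cjkscan "some text" ...
  for i := 1 to ParamCount do
    WriteLn(ParamStr(i), ' -> ', BoolToStr(ContainsCJK(ParamStr(i)), 'has CJK', 'no CJK'));
end.
```

Checking a whole note once, rather than switching fonts per character, keeps the cost to a single pass over the text.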

Thanks folks for the very positive help you have provided here, especially paweld who produced that magic code !

I will get a lot of this information on to the wiki as it becomes more proven. Will post here when done.

Davo





« Last Edit: September 28, 2024, 11:47:00 am by dbannon »

 
