
Author Topic: Example how to decode GB-2312-HZ Chinese text?  (Read 4491 times)

AlexTP

  • Hero Member
  • *****
  • Posts: 2557
    • UVviewsoft
Example how to decode GB-2312-HZ Chinese text?
« on: December 31, 2022, 07:26:41 pm »
I need example code for reading HZ-encoded files, i.e. for converting the contents of such a file to a UTF8String. I attach a small example file. The solution should use libiconv, since that is the main conversion method on Unix.

HZ files encode Chinese characters as runs of ASCII characters enclosed between ~{ and ~}.

I would prefer a cross-platform way using the SetCodePage() RTL procedure.
« Last Edit: December 31, 2022, 07:31:30 pm by AlexTP »
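To make the ~{ ... ~} format concrete, here is a minimal sketch of an HZ decoder written from the rules in RFC 1843. It is in Python rather than Pascal, only as an illustration, and the function name `hz_to_text` is hypothetical. It assumes the whole input fits in memory, rewrites each two-byte pair inside ~{ ... ~} as EUC-CN by setting the high bit of both bytes, and then decodes the result as GB 2312.

```python
def hz_to_text(data: bytes) -> str:
    """Decode HZ bytes by rewriting them as EUC-CN, then decoding as GB 2312."""
    euc = bytearray()   # EUC-CN bytes built up from the HZ input
    gb_mode = False     # False: ASCII mode; True: inside a ~{ ... ~} run
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if not gb_mode and b != 0x7E:
            if b >= 0x80:
                raise ValueError(f"non-ASCII byte at offset {i}")
            euc.append(b)                # plain ASCII passes through unchanged
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("truncated HZ sequence at end of input")
        nxt = data[i + 1]
        if not gb_mode:                  # b is '~': an escape in ASCII mode
            if nxt == 0x7E:              # "~~" stands for a literal "~"
                euc.append(0x7E)
            elif nxt == 0x7B:            # "~{" switches to GB mode
                gb_mode = True
            elif nxt != 0x0A:            # "~\n" is a line continuation; drop it
                raise ValueError(f"unknown escape at offset {i}")
        elif b == 0x7E and nxt == 0x7D:  # "~}" switches back to ASCII mode
            gb_mode = False
        elif 0x21 <= b <= 0x77 and 0x21 <= nxt <= 0x7E:
            euc += bytes((b | 0x80, nxt | 0x80))  # set the high bit of both bytes
        else:
            raise ValueError(f"invalid GB byte pair at offset {i}")
        i += 2
    return euc.decode("gb2312")
```

For example, `hz_to_text(b"~{VPND~}")` returns the two characters U+4E2D U+6587, because `VP` and `ND` become the EUC-CN pairs D6 D0 and CE C4. The same state machine should port to Pascal with little change, provided a GB 2312 to UTF-8 conversion is available.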

paweld

  • Hero Member
  • *****
  • Posts: 1351
Best regards / Pozdrawiam
paweld

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #2 on: December 31, 2022, 08:03:14 pm »
No! See the code: 'Chinese, essentially the same as GB 2312'. But GB 2312-HZ is a different encoding!

tetrastes

  • Hero Member
  • *****
  • Posts: 640
Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #3 on: January 04, 2023, 11:21:56 pm »
It seems that iconv doesn't support GB 2312-HZ, at least on Ubuntu 22.04.1, which ships a fairly recent glibc:
Code: Bash
  $ iconv -V
  iconv (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35
  Copyright (C) 2022 Free Software Foundation, Inc.

  $ iconv -l | grep 2312
  CSGB2312//
  GB2312//

For comparison:
Code: Bash
  $ uconv -l | grep 2312
  ibm-1383_P110-1999 ibm-1383 GB2312 csGB2312 cp1383 1383 EUC-CN ibm-eucCN hp15CN ibm-1383_VPUA
  ibm-5478_P100-1995 ibm-5478 GB_2312-80 chinese iso-ir-58 csISO58GB231280 gb2312-1980 GB2312.1980-0
  HZ HZ-GB-2312

So I'm afraid you will have to use libicu. The good news is that it is portable, and Windows 10 already includes it: https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

Just in case, I attach the result of:
Code: Bash
  $ uconv -f HZ -t UTF-8 -o hz2ut8.txt test-gb2312-hz.txt
Is it correct?
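One way to cross-check a conversion like this is to run the same file through a second, independent decoder and compare the two outputs. The sketch below uses Python's standard-library `hz` codec for that purpose. The function name `convert_hz_file` and the output file name in the usage note are hypothetical, and the code assumes the file is small enough to read in one go.

```python
from pathlib import Path


def convert_hz_file(src: str, dst: str) -> None:
    """Read an HZ-encoded file and write the same text as UTF-8."""
    raw = Path(src).read_bytes()
    text = raw.decode("hz")  # raises UnicodeDecodeError on malformed input
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `convert_hz_file("test-gb2312-hz.txt", "hz2utf8-python.txt")` and then comparing the result byte for byte with the uconv output (for instance with `cmp`) would show whether the two decoders agree. Agreement is a good sign, but it does not prove that the source file itself was valid HZ.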

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #4 on: January 05, 2023, 02:24:18 pm »
So I would need to use ICU on Windows, but for my cross-platform app CudaText that is not a good solution.

zen010101

  • Newbie
  • Posts: 5
Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #5 on: February 01, 2025, 05:49:39 am »
@AlexTP

You can use code page 52936 with SetCodePage, from the FPC RTL, to transcode to/from the HZ-GB2312 charset. This works on both Windows and Linux.

BTW: I'm a fan of CudaText, and I hope it can support the GB18030 charset, which is China's latest and most comprehensive character set. The same approach works there: the code page for GB18030 is 54936.
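To illustrate the idea of selecting an encoding by its numeric code page identifier, here is a small sketch in Python. It is not FPC code and says nothing about how the RTL implements SetCodePage. The table `CODE_PAGE_CODECS` and the function `transcode` are hypothetical names. The table maps only the two identifiers mentioned above, plus 65001 for UTF-8, to the codec names in Python's standard library.

```python
# Windows code page identifier -> Python codec name (a small, partial table).
CODE_PAGE_CODECS = {
    52936: "hz",       # HZ-GB2312
    54936: "gb18030",  # GB18030
    65001: "utf-8",    # UTF-8
}


def transcode(data: bytes, from_cp: int, to_cp: int) -> bytes:
    """Convert bytes from one code page to another via Unicode text."""
    try:
        src = CODE_PAGE_CODECS[from_cp]
        dst = CODE_PAGE_CODECS[to_cp]
    except KeyError as exc:
        raise ValueError(f"unsupported code page: {exc.args[0]}") from None
    return data.decode(src).encode(dst)
```

With this table, `transcode(b"~{VPND~}", 52936, 65001)` yields the UTF-8 bytes for U+4E2D U+6587. An unknown identifier fails with an explicit error instead of silently passing the bytes through.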

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #6 on: February 01, 2025, 06:48:57 am »
I still don't understand where I can find the Pascal code for the 52936 and 54936 code pages. They are embedded in FPC, but where is the code for these code pages?

To add them to CudaText, all we need is to support them here:
https://github.com/Alexey-T/EncConv
Maybe you could submit a patch to EncConv?
« Last Edit: February 01, 2025, 07:25:08 am by AlexTP »

Zoran

  • Hero Member
  • *****
  • Posts: 1919
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #7 on: February 01, 2025, 07:57:11 am »
Quote from: AlexTP
I still don't understand where I can find the Pascal code for the 52936 and 54936 code pages. They are embedded in FPC, but where is the code for these code pages?

I don't know where they are in FPC. Wherever they are, it seems to me that these values are used: https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

I also don't know whether all of these encodings are supported in FPC.
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

TRon

  • Hero Member
  • *****
  • Posts: 4271
Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #8 on: February 01, 2025, 08:04:15 am »
Did you mean charmap and ucmap?
Today is tomorrow's yesterday.

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #9 on: February 01, 2025, 08:43:57 am »
Quote from: TRon
Did you mean charmap and ucmap?

I don't know which part of FPC contains the HZ encoding. I think HZ support will depend on the libiconv version: it may work on one Linux version but not on another. It also may not work on BSD/Solaris/Mac.

 
