
Author Topic: Example how to decode GB-2312-HZ Chinese text?  (Read 2578 times)

AlexTP

  • Hero Member
  • *****
  • Posts: 2386
    • UVviewsoft
Example how to decode GB-2312-HZ Chinese text?
« on: December 31, 2022, 07:26:41 pm »
I need example code for reading HZ-encoded files, i.e. for converting the contents of such a file to a UTF8String. I attach a small example file. It should use libiconv, since that is the main method on Unix.

HZ files encode Chinese characters as plain-ASCII runs of the form ~{....}.
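To make the ~{....} scheme concrete, here is a minimal Python sketch of HZ decoding as described in RFC 1843. It is an illustration only, not the Free Pascal/libiconv solution asked for here. The function name `hz_to_text` is made up for this example, and the code assumes well-formed input (every ~{ is closed by ~} and GB runs contain whole byte pairs). The idea: inside ~{ ... ~} each character is two 7-bit bytes, and setting the high bit on both yields ordinary EUC-CN (GB 2312) bytes.

```python
def hz_to_text(data: bytes) -> str:
    """Decode HZ bytes by rewriting them as EUC-CN, then decoding as GB 2312."""
    out = bytearray()
    gb_mode = False  # HZ text always starts in ASCII mode
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if gb_mode:
            if pair == b"~}":
                gb_mode = False  # escape back to ASCII mode
            elif len(pair) == 2:
                # Two 7-bit bytes form one GB 2312 character;
                # setting the high bit gives its EUC-CN form.
                out += bytes(b | 0x80 for b in pair)
            else:
                raise ValueError("truncated GB byte pair")
            i += 2
        elif pair == b"~{":
            gb_mode = True  # escape into GB mode
            i += 2
        elif pair == b"~~":
            out += b"~"  # an escaped literal tilde
            i += 2
        elif pair == b"~\n":
            i += 2  # line continuation: both bytes are dropped
        else:
            out.append(data[i])  # ordinary ASCII byte
            i += 1
    return out.decode("gb2312")


# Sample bytes following the example sentence in RFC 1843.
print(hz_to_text(b"~{<:Ky2;S{#,NpJ)l6HK!#~}Bye."))
```

Python's standard library also ships an "hz" codec, so `data.decode("hz")` should give the same result for valid input; the hand-written loop is only there to show the mechanics.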

I would prefer a cross-platform way using the SetCodepage() RTL procedure.
« Last Edit: December 31, 2022, 07:31:30 pm by AlexTP »

paweld

  • Hero Member
  • *****
  • Posts: 970
Best regards / Pozdrawiam
paweld

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #2 on: December 31, 2022, 08:03:14 pm »
No! See the code: 'Chinese, essentially the same as GB 2312', but GB 2312-HZ is a different encoding!

tetrastes

  • Sr. Member
  • ****
  • Posts: 473
Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #3 on: January 04, 2023, 11:21:56 pm »
It seems that iconv (here the glibc implementation) doesn't support GB 2312-HZ, at least on Ubuntu 22.04.1, which has a fairly recent glibc:
Code: Bash
  $ iconv -V
  iconv (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35
  Copyright (C) 2022 Free Software Foundation, Inc.

  $ iconv -l | grep 2312
  CSGB2312//
  GB2312//

For comparison:
Code: Bash
  $ uconv -l | grep 2312
  ibm-1383_P110-1999 ibm-1383 GB2312 csGB2312 cp1383 1383 EUC-CN ibm-eucCN hp15CN ibm-1383_VPUA
  ibm-5478_P100-1995 ibm-5478 GB_2312-80 chinese iso-ir-58 csISO58GB231280 gb2312-1980 GB2312.1980-0
  HZ HZ-GB-2312

So I'm afraid you will have to use libicu, but the good news is that it is portable (and Windows 10 already includes it: https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-).

Just in case, I attach the result of:
Code: Bash
  $ uconv -f HZ -t UTF-8 -o hz2ut8.txt test-gb2312-hz.txt
Is it correct?
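One way to cross-check the uconv output independently is Python's built-in "hz" codec. The sketch below is only a verification aid, not a replacement for the ICU route; the function name `convert_hz_file` is made up for this example.

```python
from pathlib import Path


def convert_hz_file(src: str, dst: str) -> None:
    """Read an HZ-encoded file and write its contents as UTF-8."""
    raw = Path(src).read_bytes()
    # "hz" is a standard-library codec; it raises UnicodeDecodeError
    # if the input is not valid HZ.
    text = raw.decode("hz")
    # Write bytes rather than text so line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `convert_hz_file("test-gb2312-hz.txt", "hz2utf8-python.txt")` (the second file name is an arbitrary choice) and comparing the result with hz2ut8.txt byte for byte would show whether both tools agree on the sample file.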

AlexTP

Re: Example how to decode GB-2312-HZ Chinese text?
« Reply #4 on: January 05, 2023, 02:24:18 pm »
So I would need to use ICU on Windows, but for my cross-platform app CudaText that is not a good way.

 
