Author Topic: Test availability of OS encodings, how?  (Read 387 times)

AlexTP

  • Hero Member
  • *****
  • Posts: 2045
    • UVviewsoft
Test availability of OS encodings, how?
« on: December 27, 2022, 12:27:40 pm »
I use these functions to convert text to and from UTF-8. They work well. (It is a Lazarus app, so strings hold UTF-8.)

Code: Pascal
function StrUTF8ToEnc(const S: string; Enc: TSystemCodePage): string;
var
  buf: RawByteString;
begin
  if S='' then exit('');
  buf:= S;
  SetCodePage(buf, Enc, true);
  SetCodePage(buf, CP_UTF8, false);
  Result:= buf;
  if Result='' then
    raise EConvertError.Create('Cannot convert UTF-8 to DBCS code page');
end;

function StrEncToUTF8(const S: string; Enc: TSystemCodePage): string;
var
  buf: RawByteString;
begin
  if S='' then exit('');
  buf:= S;
  SetCodePage(buf, Enc, false);
  SetCodePage(buf, CP_UTF8, true);
  Result:= buf;
  if Result='' then
    raise EConvertError.Create('Cannot convert DBCS code page to UTF-8');
end;

Now I want to check, in a cross-platform way, whether an encoding is available, e.g. EUC-JP (code page 51932), EUC-KR (code page 51949), EUC-TW (code page 51950). I tried checking whether my functions raise an exception for the text 'ABC' (StrUTF8ToEnc works better for this; the other one always reports 'ok'). This detects that EUC-JP is missing on Windows 10. But it reports 'all ok' for EUC-KR and EUC-TW on Windows 10, while the SynWrite editor shows that these two encodings are not supported on Win10. So my method is not reliable.

How can I test this in a cross-platform way?
« Last Edit: December 27, 2022, 12:43:29 pm by AlexTP »

AlexTP

Re: Test availability of OS encodings, how?
« Reply #1 on: December 27, 2022, 12:37:04 pm »
SynWrite checks encoding availability with Windows-only code:

Code: Pascal
  function EncOK(n: integer): boolean;
  const
    p: AnsiString = 'pppp';
  begin
    Result:= MultiByteToWideChar(
      n, 0,
      PAnsiChar(p), Length(p),
      nil, 0) > 0;
  end;
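For reference, the check above could be wrapped so that it still compiles on other platforms. This is only a sketch: the name EncOKGuarded is made up here and is not from SynWrite, and the extra call to the Win32 function IsValidCodePage is an assumption about what might be a useful pre-check, not something SynWrite is known to do. On non-Windows targets it simply returns False, which means "unknown" rather than "unavailable".

```pascal
program encok_demo;
{$mode objfpc}{$H+}

{$ifdef windows}
uses Windows;
{$endif}

// Hypothetical wrapper (the name is not from SynWrite). On Windows it returns
// True only when the OS reports the code page as valid and a probe conversion
// of a short ASCII string succeeds.
function EncOKGuarded(CP: Cardinal): Boolean;
{$ifdef windows}
const
  Probe: AnsiString = 'pppp';
{$endif}
begin
  {$ifdef windows}
  Result := IsValidCodePage(CP);
  if Result then
    Result := MultiByteToWideChar(CP, 0, PAnsiChar(Probe), Length(Probe), nil, 0) > 0;
  {$else}
  // No Win32 API on this platform: False here means "cannot tell".
  Result := False;
  {$endif}
end;

begin
  WriteLn('51932: ', EncOKGuarded(51932));
  WriteLn('51949: ', EncOKGuarded(51949));
  WriteLn('51950: ', EncOKGuarded(51950));
end.
```

A caller on Linux or macOS would still need a different test, so this only isolates the Windows-specific part.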

KodeZwerg

  • Hero Member
  • *****
  • Posts: 1181
  • Fifty shades of code.
    • Delphi & FreePascal
Re: Test availability of OS encodings, how?
« Reply #2 on: December 27, 2022, 04:51:42 pm »
Hello Alex, maybe this small snippet helps you a little. It was written in Delphi, and I hope Lazarus/FPC handles it the same way. Untested!
Code: Pascal
function TkzFileInfoResource.GetManifest(var Dump: TStringDynArray): Boolean;
var
  rs: TResourceStream;
  h: THandle;
  str: string;
  LBuffer: TBytes;
  LEncoding, DestEncoding: TEncoding;
  LOffset: Integer;
begin
  Result := False;
  if (not fCreated) then
    Exit(False);
  SetLastError(ERROR_SUCCESS);
  // Map the file as data only, so no code from it is executed.
  h := LoadLibraryEx(PChar(fFileName), 0, LOAD_LIBRARY_AS_DATAFILE);
  if ((h = INVALID_HANDLE_VALUE) or (h = 0)) then
    Exit(False);
  if (FindResource(h, MakeIntResource(1), RT_MANIFEST) = 0) then
    begin
      FreeLibrary(h);
      Exit(False);
    end;
  try
    rs := TResourceStream.CreateFromID(h, 1, RT_MANIFEST);
    try
      str := '';
      // Read the raw manifest bytes.
      SetLength(LBuffer, rs.Size);
      rs.ReadBuffer(Pointer(LBuffer)^, Length(LBuffer));
      // Detect the source encoding; LOffset is the number of bytes to skip.
      LEncoding:= nil;
      DestEncoding := TEncoding.Unicode;
      LOffset := TEncoding.GetBufferEncoding(LBuffer, LEncoding);
      // Convert to UTF-16 unless the data already is UTF-16.
      if ((LEncoding <> nil) and (LEncoding.EncodingName <> DestEncoding.EncodingName)) then
        LBuffer := LEncoding.Convert(LEncoding, DestEncoding, LBuffer, LOffset, Length(LBuffer) - LOffset);
      str := TEncoding.Unicode.GetString(LBuffer, 0, Length(LBuffer));
      if Length(str) > 0 then
        begin
          AddDynStr(Dump, str);
          AddDynStr(Dump, '');
          AddDynStr(Dump, Format('(detected %d chars in %d byte of "%s (%s)" encoded data.)', [LEncoding.GetCharCount(LBuffer), rs.Size, LEncoding.EncodingName, LEncoding.MIMEName], fFormatSettings));
          Result := True;
        end;
    finally
      rs.Free;
    end;
  except
    // Any exception is swallowed; Result keeps its current value.
  end;
  FreeLibrary(h);
end;
(Warning: in Delphi the type string is defined as UnicodeString, not AnsiString!)
The method above reads the manifest from a file as raw bytes, detects its encoding, converts everything to Unicode, writes the result to a string array, and appends a line naming the original encoding.
« Last Edit by KodeZwerg »

AlexTP

Re: Test availability of OS encodings, how?
« Reply #3 on: December 27, 2022, 04:56:01 pm »
Sorry, how can it be used to detect that code page 51950 (for example) is available?

KodeZwerg

Re: Test availability of OS encodings, how?
« Reply #4 on: December 27, 2022, 05:10:47 pm »
Yes, after re-reading I see my mistake. I do it as shown above so that I do not need to care what the system supports; my approach was just to convert the data so it can be displayed.
Sorry, I can only speak for Windows, not cross-platform, but maybe this brings you a step forward?
I would use an EnumFontFamiliesEx callback and check the LOGFONT structure for one of the listed lfCharSet values, to see whether the system is able to display the text at all.
« Last Edit by KodeZwerg »

tetrastes

  • Sr. Member
  • ****
  • Posts: 347
Re: Test availability of OS encodings, how?
« Reply #5 on: December 27, 2022, 09:14:38 pm »
You call SetCodePage in the wrong sequence:
Code: Pascal
function StrUTF8ToEnc(const S: string; Enc: TSystemCodePage): string;
....
  SetCodePage(buf, Enc, true);
  SetCodePage(buf, CP_UTF8, false);
....

This code works correctly on Windows, and its result is also confirmed by MultiByteToWideChar:
Code: Pascal
program project1;

{$mode objfpc}

uses windows;

var
    a: AnsiString = 'qwerty';
    s: RawByteString;
    r, i: integer;
    CP: word;

begin
  i := 1;
  repeat
      case i of
        1: CP := 51932;
        2: CP := 51949;
        3: CP := 51950;
      end;
      s := a;
      SetCodePage(s, CP_UTF8, false);   // works without this also
      writeln('CodePage: ', StringCodePage(s));
      writeln('String: ', s);
      SetCodePage(s, CP, true);
      writeln('CodePage: ', StringCodePage(s));
      writeln('String: ', s);

      r := MultiByteToWideChar(
        CP, 0,
        PAnsiChar(a), Length(a),
        nil, 0);
      writeln('Result of MultiByteToWideChar: ', r);
      writeln;
      Inc(i);
  until i=4;

  readln;
end.

Output is:
Code: Bash
CodePage: 65001
String: qwerty
CodePage: 1251
String:
Result of MultiByteToWideChar: 0

CodePage: 65001
String: qwerty
CodePage: 51949
String: qwerty
Result of MultiByteToWideChar: 6

CodePage: 65001
String: qwerty
CodePage: 1251
String:
Result of MultiByteToWideChar: 0

It also matches what the chcp command reports:
Code: Bash
C:\Users\root>chcp 51932
Invalid code page

C:\Users\root>chcp 51949
Active code page: 51949

C:\Users\root>chcp 51950
Invalid code page

I suppose that it will also work on other OSes where SetCodePage is implemented (except for the MultiByteToWideChar part, naturally).
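The SetCodePage behaviour seen in the output above could be turned into a small availability test. This is a minimal sketch, not a verified cross-platform solution: the function name CodePageLooksAvailable is invented for illustration, and it assumes that a failed conversion always leaves an empty string or a different code page tag, which is what the Windows output shows but has not been checked on other systems. On Unix the cwstring unit is pulled in, on the assumption that a widestring manager is needed for any conversion to happen there.

```pascal
program cpprobe;
{$mode objfpc}{$H+}

{$ifdef unix}
uses cwstring; // installs a widestring manager so conversions can work on Unix
{$endif}

// Hypothetical helper (the name is not from the thread): reports whether
// SetCodePage could convert a short ASCII probe string to code page CP.
function CodePageLooksAvailable(CP: TSystemCodePage): Boolean;
var
  s: RawByteString;
begin
  s := 'qwerty';
  SetCodePage(s, CP_UTF8, False);  // label the bytes as UTF-8, no conversion
  SetCodePage(s, CP, True);        // then ask the RTL to convert to CP
  // Assumption taken from the Windows output: on failure the string comes
  // back empty and tagged with the default code page instead of CP.
  Result := (Length(s) > 0) and (StringCodePage(s) = CP);
end;

const
  Candidates: array[0..2] of TSystemCodePage = (51932, 51949, 51950);
var
  i: Integer;
begin
  for i := Low(Candidates) to High(Candidates) do
    WriteLn(Candidates[i], ': ', CodePageLooksAvailable(Candidates[i]));
end.
```

Because the probe text is pure ASCII, this only tells you whether the conversion machinery accepts the code page, not whether every character of a real document would convert.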
« Last Edit: December 27, 2022, 09:40:02 pm by tetrastes »

 
