
Topic: Constant codepage problems (Read 13880 times)

Mr.Madguy

  • Hero Member
  • Posts: 844
Constant codepage problems
« on: March 24, 2017, 12:49:21 pm »
I have a launcher program. All it does is launch the 32-bit or 64-bit version of my program, keeping the same current directory and command-line parameters.

```pascal
program Launcher;

{$mode objfpc}{$H+}

uses Windows, SysUtils;

{$R *.res}

type
  // IsWow64Process writes a 4-byte Win32 BOOL, so BOOL (LongBool) is used
  // instead of the 1-byte Boolean to avoid overwriting the stack.
  TIsWOW64Process = function(hProcess: THandle; var IsWOW64: BOOL): BOOL; stdcall;

const
  FileName = 'MyProgram.exe';
  Dir32 = 'Bin32\';
  Dir64 = 'Bin64\';

// True when this (32-bit) launcher runs under WOW64, i.e. on 64-bit Windows.
function Is64Process: Boolean;
var
  IsWOW64Process: TIsWOW64Process;
  Temp: BOOL;
begin
  Result := False;
  // kernel32 is always loaded, so GetModuleHandle needs no FreeLibrary.
  IsWOW64Process := TIsWOW64Process(GetProcAddress(GetModuleHandle('kernel32'), 'IsWow64Process'));
  if not Assigned(IsWOW64Process) then Exit;
  Temp := False;
  if IsWOW64Process(GetCurrentProcess, Temp) then
    Result := Temp;
end;

var
  DirName, FilePath, Params: String;
  I: Integer;
  StartupInfo: TStartupInfoW;
  ProcessInfo: TProcessInformation;

begin
  if Is64Process then
    DirName := Dir64
  else
    DirName := Dir32;
  FilePath := DirName + FileName;
  // Rebuild the command line: quoted program path followed by our parameters.
  Params := '"' + FilePath + '"';
  for I := 1 to ParamCount do
    Params := Params + ' ' + ParamStr(I);
  GetStartupInfoW(@StartupInfo);
  if CreateProcessW(PWideChar(UTF8Decode(FilePath)),
    PWideChar(UTF8Decode(Params)),
    nil,
    nil,
    False,
    CREATE_UNICODE_ENVIRONMENT,
    nil,
    PWideChar(UTF8Decode(GetCurrentDir)),
    StartupInfo,
    ProcessInfo) then
  begin
    // Wait for the real program to finish, then release the handles.
    WaitForSingleObject(ProcessInfo.hProcess, INFINITE);
    CloseHandle(ProcessInfo.hThread);
    CloseHandle(ProcessInfo.hProcess);
  end;
end.
```

Yeah, I know that I can use the pure WinAPI. I've tried it: I used GetCurrentDirectory and GetCommandLine, parsed the command line myself, etc., but in that case antivirus programs go crazy and block my program as malware, so I decided to leave it as is. When compiled in Delphi, it works perfectly. But when compiled in Lazarus, it has constant codepage problems: if the file name contains English characters only, everything is fine, but if not, the program fails to start.

For example, the following code produces the results shown in the screenshots:
```pascal
MessageBoxW(0, PWideChar(GetCurrentDir), PWideChar('Debug'), 0);
MessageBoxW(0, PWideChar(UTF8Decode(GetCurrentDir)), PWideChar('Debug'), 0);
```

And I can't get it working. The documentation suggests that to add UTF-8 support to the RTL, I should add LazUtils and LazUTF8 to the "uses" clause of my program, and that after that I won't need to call UTF8Decode. I tried it, but then the compiler tells me it can't find these units. I've also tried adding the "-dEnableUTF8RTL" and "-FcUTF8" compiler switches, and even calling SetMultiByteConversionCodePage(CP_UTF8) and SetMultiByteRTLFileSystemCodePage(CP_UTF8) directly. Nothing helps.

How can I solve this problem? I'm OK with using UTF8 instead of Unicode (I guess it's needed for portability), but not when UTF8 causes so many problems.
« Last Edit: March 24, 2017, 12:58:08 pm by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

Thaddy

  • Hero Member
  • Posts: 14157
  • Probably until I exterminate Putin.
Re: Constant codepage problems
« Reply #1 on: March 24, 2017, 02:16:32 pm »
```pascal
{$mode delphiunicode}
```
That's needed because in this mode the default String type becomes UnicodeString, which is not the case in {$mode objfpc}.
Alternatively, you can declare all your current strings as UnicodeString instead of String.

Your code does not rely on Lazarus libraries and it works as you expect with either of these changes.
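A minimal sketch of the second option (UnicodeString variables), assuming FPC 3.x on Windows in {$mode objfpc}. The RTL results are assigned to UnicodeString variables, so FPC converts them using each AnsiString's dynamic code page instead of reinterpreting the bytes, and no UTF8Decode is involved. GetCurrentDir and ParamStr still return AnsiString in this mode, so characters outside the current ANSI code page may be lost before the conversion. The program name, the hard-coded path and the 'Debug' caption are placeholders.

```pascal
program UnicodeStringSketch;

{$mode objfpc}{$H+}

uses
  Windows, SysUtils;

const
  Caption: UnicodeString = 'Debug';  // placeholder caption

var
  CurDir, CmdLine: UnicodeString;
  I: Integer;
begin
  // AnsiString -> UnicodeString assignment converts using the string's
  // dynamic code page (no UTF8Decode, no hard PWideChar cast).
  CurDir := GetCurrentDir;

  // Same idea for the parameters a launcher would forward.
  CmdLine := '"Bin32\MyProgram.exe"';
  for I := 1 to ParamCount do
    CmdLine := CmdLine + ' ' + UnicodeString(ParamStr(I));

  MessageBoxW(0, PWideChar(CurDir), PWideChar(Caption), MB_OK);
  MessageBoxW(0, PWideChar(CmdLine), PWideChar(Caption), MB_OK);
end.
```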
« Last Edit: March 24, 2017, 02:19:59 pm by Thaddy »
Specialize a type, not a var.

Mr.Madguy

Re: Constant codepage problems
« Reply #2 on: March 24, 2017, 02:37:37 pm »
Quote
Your code does not rely on Lazarus libraries and it works as you expect with either of these changes.
Nothing has changed. Still hieroglyphs without UTF8Decode and ???????? with it. I had a similar problem in another project (reading data from the registry) and solved it with WinCPToUTF8/UTF8ToWinCP there, but as I understand it, these routines require LazUTF8, and I can't include it: the compiler doesn't find it (I guess because it's a non-LCL application).

Please understand my situation. In Delphi there are only 3 possible cases:
1) Ansi application on Ansi Windows (deprecated)
2) Ansi application on Unicode Windows (app should be codepage aware, deprecated)
3) Unicode application on Unicode Windows

In Lazarus:
1) Windows is Unicode
2) Strings are UTF-8
3) RTL routines and many libraries still seem to use AnsiStrings, which may also hold UTF-8 (CP_UTF8, as I understand it).

And constant conversions are needed between these 3 formats. For some reason I can't just read a Unicode string directly from the registry, even via the standard TRegistry object. OK, Lazarus uses UTF-8. But a simple UTF-8 <-> Unicode conversion, which should be unambiguous, doesn't work either. How am I supposed to guess that I need WinCPToUTF8 in this case? Is it actually Unicode-aware, or is it still Ansi? None of these conversions are transparent. I don't know why FPC and Lazarus use UTF-8 instead of Unicode, but this choice causes way too many problems.
« Last Edit: March 24, 2017, 03:56:16 pm by Mr.Madguy »

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4458
  • I like bugs.
Re: Constant codepage problems
« Reply #3 on: March 24, 2017, 04:30:54 pm »
Quote
I'm ok with using UTF8 instead of Unicode...
Uhhh! UTF-8 is part of Unicode.
Thaddy's advice may work if you really do not depend on Lazarus libraries / packages anyhow.
Otherwise please use the default UTF-8 system implemented by Lazarus. See:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
There are only 2 things to remember:
 1. Always assign a constant to a variable of type String.
 2. Use type UnicodeString explicitly for API calls that need it.
As an extra bonus, your code is then compatible with Delphi at the source-code level, although the two use different encodings.

There are corner cases where it does not work, for example when library code still uses the old ANSI-codepage version of a Windows API function. The wide (W) API must always be used.
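The sketch below applies these two rules to the launcher from the first post. It is only a sketch under stated assumptions: the project has a dependency on the LazUtils package so that the LazUTF8 unit compiles, the launcher itself is built as a 32-bit executable, and ParamStr and GetCurrentDir return UTF-8 once LazUTF8 is in the uses clause. Everything stays in String (UTF-8); UnicodeString variables appear only right before the CreateProcessW call. RunningUnderWow64 is a hypothetical renaming of the original Is64Process.

```pascal
program Launcher;

{$mode objfpc}{$H+}

uses
  LazUTF8,  // LazUtils package: switches the RTL's default String to UTF-8
  Windows, SysUtils;

const
  FileName = 'MyProgram.exe';
  Dir32 = 'Bin32\';
  Dir64 = 'Bin64\';

type
  TIsWow64Process = function(hProcess: THandle; var Wow64: BOOL): BOOL; stdcall;

// Hypothetical helper: True when a 32-bit launcher runs on 64-bit Windows.
function RunningUnderWow64: Boolean;
var
  Fn: TIsWow64Process;
  Flag: BOOL;
begin
  Result := False;
  Fn := TIsWow64Process(GetProcAddress(GetModuleHandle('kernel32'), 'IsWow64Process'));
  Flag := False;
  if Assigned(Fn) then
    if Fn(GetCurrentProcess, Flag) then
      Result := Flag;
end;

var
  FilePath, Params: String;                 // UTF-8 under LazUTF8
  WFilePath, WParams, WDir: UnicodeString;  // UTF-16 temporaries for the W API
  I: Integer;
  StartupInfo: TStartupInfoW;
  ProcessInfo: TProcessInformation;
begin
  if RunningUnderWow64 then
    FilePath := Dir64 + FileName
  else
    FilePath := Dir32 + FileName;

  Params := '"' + FilePath + '"';
  for I := 1 to ParamCount do
    Params := Params + ' ' + ParamStr(I);

  // Plain assignments convert UTF-8 -> UTF-16; no UTF8Decode needed.
  // WParams is a fresh heap string, so CreateProcessW may write to it.
  WFilePath := FilePath;
  WParams := Params;
  WDir := GetCurrentDir;

  FillChar(StartupInfo, SizeOf(StartupInfo), 0);
  StartupInfo.cb := SizeOf(StartupInfo);
  if CreateProcessW(PWideChar(WFilePath), PWideChar(WParams), nil, nil, False,
    CREATE_UNICODE_ENVIRONMENT, nil, PWideChar(WDir), StartupInfo, ProcessInfo) then
  begin
    WaitForSingleObject(ProcessInfo.hProcess, INFINITE);
    CloseHandle(ProcessInfo.hThread);
    CloseHandle(ProcessInfo.hProcess);
  end;
end.
```

The design choice is that conversion happens once, at the API boundary; data coming from an ANSI-codepage source would still need an explicit conversion before it enters a String.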
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Thaddy

Re: Constant codepage problems
« Reply #4 on: March 24, 2017, 04:51:27 pm »
I tested your code. You may have to start your console in Unicode mode (cmd /U). Windows uses UTF-16, not UTF-8, so your code works in a console that has been switched to Unicode mode.
You may also try chcp 65001 in the console.

Mr.Madguy

Re: Constant codepage problems
« Reply #5 on: March 27, 2017, 09:27:33 am »
Quote
There are corner cases where it does not work, for example when a library code still uses an old ANSI codepage version of Windows API. The Wide-API must be used always.
Yeah, and that's exactly my problem. Windows uses UTF-16. UTF-8 can store exactly the same strings. Pro: it's more compact. Con: it's harder and slower to process. So conversion to and from the internal UTF-8 format should be unambiguous, with no codepages involved. But all of a sudden the UTF-8 -> UTF-16 conversion gives me ?????????. What does that mean? Are some codepages involved? I use the "W" versions of the API. GetCurrentDir seems to return UTF-8, since it shows hieroglyphs without UTF8Decode. Maybe it's not UTF-8? Maybe it's AnsiString? But even if it's AnsiString, it should be codepage-aware, shouldn't it? The same problem happens with TRegistry, whose strings have the String type.

The definition of GetCurrentDir is:
```pascal
Function GetCurrentDir : {$ifdef FPC_UNICODE_RTL}UnicodeString{$else}AnsiString{$endif};
```
And my Lazarus shows that it's AnsiString.
Quote
I tested your code. You may have to start your console in UNICODE mode (cmd /U ). Windows uses UTF16 unicode, not UTF8. Therefor your code works in a unicode console if you change the mode to unicode.
You may also try chcp 65001 in the console.
My program isn't a console application. It's a windowed (GUI) application without a window.

Of course, this code works perfectly:
```pascal
program Launcher;

{$mode objfpc}{$H+}

uses Windows;

{$R *.res}

type
  // IsWow64Process writes a 4-byte Win32 BOOL, so BOOL is used, not Boolean.
  TIsWOW64Process = function(hProcess: THandle; var IsWOW64: BOOL): BOOL; stdcall;

const
  FileName = 'MyProgram.exe';
  Dir32 = 'Bin32\';
  Dir64 = 'Bin64\';

// True when this (32-bit) launcher runs under WOW64, i.e. on 64-bit Windows.
function Is64Process: Boolean;
var
  IsWOW64Process: TIsWOW64Process;
  Temp: BOOL;
begin
  Result := False;
  IsWOW64Process := TIsWOW64Process(GetProcAddress(GetModuleHandle('kernel32'), 'IsWow64Process'));
  if not Assigned(IsWOW64Process) then Exit;
  Temp := False;
  if IsWOW64Process(GetCurrentProcess, Temp) then
    Result := Temp;
end;

// Current directory as UTF-16, read directly with the W API.
function GetCurrentDir: UnicodeString;
var
  Len: DWORD;
begin
  Result := '';
  Len := GetCurrentDirectoryW(0, nil);  // required size, including the #0
  if Len = 0 then Exit;
  SetLength(Result, Len - 1);
  GetCurrentDirectoryW(Len, PWideChar(Result));
end;

type
  TParserState = (psFirstSymbol, psFileNameStart, psQuote, psFileNameEnd, psSpace);

// Returns the part of GetCommandLineW that follows the program name,
// i.e. the original parameters, unchanged.
function ParseCommandLine: UnicodeString;
var
  ParserState: TParserState;
  CmdLine: PWideChar;
begin
  ParserState := psFirstSymbol;
  CmdLine := GetCommandLineW;
  while CmdLine^ <> #0 do
  begin
    case ParserState of
      psFirstSymbol:                 // skip leading spaces
        if CmdLine^ = ' ' then
          Inc(CmdLine)
        else
          ParserState := psFileNameStart;
      psFileNameStart:               // quoted or unquoted program name?
        begin
          if CmdLine^ = '"' then
            ParserState := psQuote
          else
            ParserState := psFileNameEnd;
          Inc(CmdLine);
        end;
      psQuote:                       // inside "...": run to the closing quote
        begin
          if CmdLine^ = '"' then
            ParserState := psSpace;
          Inc(CmdLine);
        end;
      psFileNameEnd:                 // unquoted name ends at the first space
        begin
          if CmdLine^ = ' ' then
            ParserState := psSpace;
          Inc(CmdLine);
        end;
      psSpace:                       // skip spaces before the first parameter
        begin
          if CmdLine^ <> ' ' then
            Break;
          Inc(CmdLine);
        end;
    end;
  end;
  Result := CmdLine;
end;

var
  DirName, FilePath, CmdLine: UnicodeString;
  StartupInfo: TStartupInfoW;
  ProcessInfo: TProcessInformation;

begin
  if Is64Process then
    DirName := Dir64
  else
    DirName := Dir32;
  FilePath := DirName + FileName;
  // A fresh heap string: CreateProcessW is allowed to modify lpCommandLine.
  CmdLine := '"' + FilePath + '" ' + ParseCommandLine;
  GetStartupInfoW(@StartupInfo);
  if CreateProcessW(PWideChar(FilePath),
    PWideChar(CmdLine),
    nil,
    nil,
    False,
    CREATE_UNICODE_ENVIRONMENT,
    nil,
    PWideChar(GetCurrentDir),
    StartupInfo,
    ProcessInfo) then
  begin
    WaitForSingleObject(ProcessInfo.hProcess, INFINITE);
    CloseHandle(ProcessInfo.hThread);
    CloseHandle(ProcessInfo.hProcess);
  end;
end.
```

But then the antivirus problem from my first post comes back, and unfortunately my organization uses Kaspersky Anti-Virus.
« Last Edit: March 27, 2017, 02:55:53 pm by Mr.Madguy »

JuhaManninen

Re: Constant codepage problems
« Reply #6 on: March 27, 2017, 10:21:44 am »
Quote
... But all of a sudden UTF8 -> UTF16 conversion gives me ?????????. What does it mean?
It means the String's dynamic encoding was wrong. The data was marked as UTF-8 but it was not.
Then you must convert it explicitly as early as possible:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
After that everything works thanks to the dynamic encoding and automatic conversion.
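To make "dynamic encoding" concrete, here is a small console sketch, assuming Windows (converting from code page 1251 relies on the OS conversion routines). It builds the Windows-1251 bytes of a Russian word, tags them as UTF-8 the way a mislabelled string would be, and then retags them with SetCodePage(..., False), which changes only the tag and leaves the bytes alone. The word and the code page are illustrative choices, not taken from the thread.

```pascal
program DynamicEncodingDemo;

{$mode objfpc}{$H+}

const
  // "Привет" encoded in Windows-1251 (illustrative data).
  Cp1251Bytes: array[0..5] of Byte = ($CF, $F0, $E8, $E2, $E5, $F2);

var
  Raw: RawByteString;
  Expected, Decoded: UnicodeString;
begin
  // The same word written as UTF-16 code points.
  Expected := #$041F#$0440#$0438#$0432#$0435#$0442;

  SetLength(Raw, Length(Cp1251Bytes));
  Move(Cp1251Bytes[0], Raw[1], Length(Cp1251Bytes));

  // Wrong tag: the bytes claim to be UTF-8, so the conversion produces junk.
  SetCodePage(Raw, CP_UTF8, False);
  Decoded := Raw;
  WriteLn('Tagged as UTF-8, decodes correctly: ', Decoded = Expected);

  // Right tag: only the label changes (Convert = False), the bytes stay.
  SetCodePage(Raw, 1251, False);
  Decoded := Raw;
  WriteLn('Tagged as 1251, decodes correctly:  ', Decoded = Expected);
  if Decoded <> Expected then
    Halt(1);
end.
```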

Quote
Some codepages are involved? I use "W" versions of API. GetCurrentDir seem to be UTF8, as it returns hieroglyphs without UTF8Decode. May be it's not UTF8? May be it's AnsiString?
Exactly. You should not need UTF8Decode or similar in any situation once the dynamic encoding is correct.

Quote
But even if it's AnsiString - it should be codepage aware? Shouldn't it? Same problem happens with TRegistry, where strings have String type.
It is codepage aware but the dynamic encoding does not match with the actual data.
TRegistry is explicitly mentioned in the wiki page as an open issue:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Open_issues
You may have to do some copy/paste programming from FPC's libs (or something).

Quote
Definition of GetCurrentDir is following:
```pascal
Function GetCurrentDir : {$ifdef FPC_UNICODE_RTL}UnicodeString{$else}AnsiString{$endif};
```
And my Lazarus shows, that it's AnsiString.
No need for that.
Just define it as "String". Use UnicodeStrings as temporary variables for API calls that need them. Automatic conversion takes care of the rest.
Note: your code will be compatible with Delphi at source level then. Your current code with IFDEFs and AnsiStrings is not compatible.

BTW, the UTF8 <-> UTF16 conversions are surprisingly fast. You don't need to worry about them unless you have a huge amount of API calls.
« Last Edit: April 02, 2017, 12:49:18 am by JuhaManninen »

Mr.Madguy

Re: Constant codepage problems
« Reply #7 on: March 27, 2017, 10:45:46 am »
Strange, but I've found this code:
```pascal
Procedure getdir(drivenr:byte;Var dir:rawbytestring);
begin
  Do_getdir(drivenr,dir);
  { we should return results in the DefaultRTLFileSystemCodePage -> convert if
    necessary }
  setcodepage(dir,DefaultRTLFileSystemCodePage,true);
end;
```
And the docs say that DefaultRTLFileSystemCodePage can be set via SetMultiByteRTLFileSystemCodePage.

So I tried this code again:
```pascal
SetMultiByteConversionCodePage(CP_UTF8);
SetMultiByteRTLFileSystemCodePage(CP_UTF8);
```

And all of a sudden it worked. Still with UTF8Decode, but worked.
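As a diagnostic (not part of the original post), the console sketch below prints which code page the RTL attaches to GetCurrentDir before and after the two calls above. It assumes FPC 3.x, where these procedures and the DefaultRTLFileSystemCodePage variable live in the System unit. Reading the result into a RawByteString keeps whatever code page the RTL attached instead of converting it.

```pascal
program InspectRtlCodePages;

{$mode objfpc}{$H+}

uses
  SysUtils;

procedure Report(const Stage: String);
var
  Dir: RawByteString;
begin
  Dir := GetCurrentDir;  // a RawByteString keeps the code page the RTL attached
  WriteLn(Stage, ': DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage,
    ', StringCodePage(GetCurrentDir) = ', StringCodePage(Dir));
end;

begin
  Report('Default');
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);
  Report('After switching to UTF-8');
end.
```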

JuhaManninen

Re: Constant codepage problems
« Reply #8 on: March 27, 2017, 11:05:31 am »
Quote
And docs say, that DefaultRTLFileSystemCodePage can be set via SetMultiByteRTLFileSystemCodePage.
So I tried this code again:
```pascal
SetMultiByteConversionCodePage(CP_UTF8);
SetMultiByteRTLFileSystemCodePage(CP_UTF8);
```
And all of a sudden it worked. Still with UTF8Decode, but worked.
The "better unicode support" in Lazarus already does that. Did you read the wiki page at all? Apparently not.
You don't use LCL I guess. Then see:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Using_UTF-8_in_non_LCL_programs
It says: "add dependency for LazUtils package. Then add LazUTF8 unit in the uses section of main program file."
UTF8Decode is not needed. Just convert Ansi-codepage data with WinCPToUTF8() or SetCodePage() when reading from or writing to the outside world.

... and please read the wiki page. That's why it was written. FPC's documentation does not help so much here because our "better unicode support" is a hack from FPC's point of view.
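A sketch of the two conversions mentioned above, under these assumptions: a Windows program that uses LazUTF8 (so DefaultSystemCodePage is CP_UTF8, and WinCPToUTF8 is available as described in this thread), with GetCurrentDirectoryA standing in for any "outside world" source of ANSI-code-page bytes. The SetCodePage route deliberately uses GetACP from the Windows unit: once LazUTF8 is active, CP_ACP should resolve to UTF-8, which would mislabel the bytes again.

```pascal
program FromAnsiCodePage;

{$mode objfpc}{$H+}

uses
  LazUTF8,  // needs the LazUtils package; makes String UTF-8, provides WinCPToUTF8
  Windows, SysUtils;

var
  Buf: array[0..MAX_PATH] of AnsiChar;
  Len: DWORD;
  FromOutside: String;     // holds ANSI bytes but is tagged as UTF-8: the bug
  Relabel: RawByteString;
  ViaWinCP, ViaSetCodePage: String;
begin
  // Illustrative "outside world" data: the A API returns ANSI-code-page bytes.
  Len := GetCurrentDirectoryA(Length(Buf), @Buf[0]);
  if Len >= Length(Buf) then
    Len := 0;  // buffer too small: keep the demo simple and use an empty string
  SetLength(FromOutside, Len);
  if Len > 0 then
    Move(Buf[0], FromOutside[1], Len);

  // Route 1: LazUTF8 helper that treats the bytes as the Windows ANSI code page.
  ViaWinCP := WinCPToUTF8(FromOutside);

  // Route 2: fix the tag (bytes untouched), then let the assignment convert.
  Relabel := FromOutside;
  SetCodePage(Relabel, TSystemCodePage(GetACP), False);
  ViaSetCodePage := Relabel;

  WriteLn('Both routes agree: ', ViaWinCP = ViaSetCodePage);
end.
```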

Mr.Madguy

Re: Constant codepage problems
« Reply #9 on: March 27, 2017, 11:42:32 am »
Quote
The "better unicode support" in Lazarus already does that. Did you read the wiki page at all? Apparently not.
...
It says: "add dependency for LazUtils package. Then add LazUTF8 unit in the uses section of main program file."
I've read it. And yes, the "add dependency for LazUtils package. Then add LazUTF8 unit in the uses section of main program file." part confused me. Now I've figured out how to add a dependency on LazUtils. Indeed, SetMultiByteConversionCodePage and SetMultiByteRTLFileSystemCodePage are no longer needed. But UTF8Decode is still needed, even in DelphiUnicode mode. Maybe it's due to the PWideChar() conversion?

JuhaManninen

Re: Constant codepage problems
« Reply #10 on: April 02, 2017, 12:43:59 am »
Quote
But UTF8Decode is still needed - even in DelphiUnicode mode. May be it's due to PWideChar() conversion?
DelphiUnicode cannot be used with the Lazarus Unicode system. You can use either one or the other but not both.
UTF8Decode can be needed only if a string's encoding is wrong, ie. its data does not match with the dynamic encoding.
Without seeing the code I don't know why it happened.
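One frequent cause, offered as an assumption rather than a diagnosis of this particular code: a hard PWideChar(...) typecast of an 8-bit string, as in the debug snippet from the first post, reinterprets the bytes instead of converting them, while assigning the string to a UnicodeString converts it. The console sketch contrasts the two; on little-endian x86 the cast reads the bytes 'a' and 'b' as one UTF-16 unit (U+6261, a CJK code point), which is how "hieroglyphs" appear.

```pascal
program CastVsConvert;

{$mode objfpc}{$H+}

var
  A: AnsiString;
  U: UnicodeString;
  P: PWideChar;
begin
  A := 'abcd';

  // Hard cast: the 8-bit buffer is reinterpreted, two bytes per "character".
  P := PWideChar(A);
  WriteLn('First WideChar via cast: U+', HexStr(Ord(P^), 4));

  // Assignment: a real conversion, one UTF-16 unit per character here.
  U := A;
  WriteLn('Length after conversion: ', Length(U));
  if (Length(U) <> 4) or (U[1] <> 'a') then
    Halt(1);
end.
```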

Thaddy

Re: Constant codepage problems
« Reply #11 on: April 02, 2017, 02:34:19 pm »
The "better unicode in Lazarus" is just an advertisement. It actually breaks things (over and over again). I am getting used to it now, and I actually like what is done *within the context of Lazarus only!!!* - but boy what a mess. I had to revert to a more neutral editor - Geany on most platforms - to get code running. It should be clearly documented that Lazarus is an unsuitable editor for plain FPC or Delphi code, especially under Windows.
Quote
UTF8Decode can be needed only if a string's encoding is wrong
re-phrase that, plz.. It is only wrong within the context of Lazarus and its default UTF8 string type. Which makes me wonder who is wrong...With a choice for possibly the most memory inefficient unicode type. <Yeah,I am grumpy again... >:D >:D> but it works for Lazarus and is no mean feat. O:-)
« Last Edit: April 02, 2017, 02:48:10 pm by Thaddy »

JuhaManninen

Re: Constant codepage problems
« Reply #12 on: April 02, 2017, 04:27:26 pm »
Quote
The "better unicode in Lazarus" is just an advertisement. It actually breaks things (over and over again). I am getting used to it now, and I actually like what is done *within the context of Lazarus only!!!*
It is better than the old unicode support in LCL with all its UTF-8 specific functions. It does not claim to be better than unicode in Delphi or in other systems.
I think I added the word "better" to the new wiki page because first the improvements were getting documented in the old LCL unicode page and it was getting messy.
A new page was needed for this new system which is ... better. Yes.

Quote
- but boy what a mess. I had to revert to a more neutral editor - Geany on most platforms - to get code running. It should be clearly documented that Lazarus is an unsuitable editor for plain FPC or Delphi code, especially under Windows.
I didn't understand this one.
Most editors including Geany, Lazarus and Delphi on Windows store their text with UTF-8 encoding by default.
They also support other encodings when needed. Why would Lazarus be an unsuitable editor?

Quote
Quote
UTF8Decode can be needed only if a string's encoding is wrong
re-phrase that, plz.. It is only wrong within the context of Lazarus and its default UTF8 string type.
No, it is wrong compared to the dynamic codepage stored in an encoding aware String variable. I tried to clarify it with a sub-sentence:
  ..., ie. its data does not match with the dynamic encoding.
When the dynamic encoding matches with the actual data, all assignments will go right. There is an automatic conversion then which is nice and cool.

taazz

  • Hero Member
  • Posts: 5368
Re: Constant codepage problems
« Reply #13 on: April 02, 2017, 05:54:19 pm »
Sorry guys, a quick off-topic question: Juha, is WideString still OLE-compatible in newer versions of Lazarus?
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

Re: Constant codepage problems
« Reply #14 on: April 02, 2017, 06:07:07 pm »
Quote
Sorry guys a quick off topic, Juha is the widestring still ole compatible in newer versions of lazarus?
OLE strings are 16-bit Unicode by default, so you may run into trouble using UTF-8 at some point, mostly on Windows, though... Juha just doesn't want to see those things. <sigh> Or lies about it, which is worse. On Windows, UTF-8 is an in-between solution that satisfies nobody; on Linux it is great for most common distributions, but not all.
It is still a good effort, but not for Windows; there it is a pain in the ass.
« Last Edit: April 02, 2017, 06:14:32 pm by Thaddy »

 
