
Topic: UpperCase, ToUpper, LowerCase and ToLower bug or my misunderstanding?

avra

  • Hero Member
  • *****
  • Posts: 2514
This program:
Code: Pascal
program testcase;
{$H+}

uses
  {$IFDEF WINDOWS}windows,{$ENDIF} sysutils;

var
  PreviousValue: uint;
  MyStr: string;
begin
  {$IFDEF WINDOWS}
    PreviousValue := GetConsoleOutputCP;
    DefaultSystemCodePage := CP_UTF8;
    SetConsoleOutputCP(CP_UTF8);
  {$ENDIF}

  MyStr := 'Укључено';
  WriteLn('PascalCase Жељко'.ToUpper);
  WriteLn(MyStr.ToUpper);
  WriteLn(UpperCase(MyStr));
  WriteLn(AnsiUpperCase(MyStr));
  WriteLn(UTF8String('Укључено').ToUpper);
  WriteLn(UpperCase(UnicodeString(Utf8ToString('Укључено'))));
  WriteLn(UpperCase(Utf8ToString('Укључено')));

  {$IFDEF WINDOWS}
    SetConsoleOutputCP(PreviousValue);
  {$ENDIF}
end.

produces this output:
Quote
m:\Lazarus\testing>testcase
PASCALCASE Жељко
Укључено
Укључено
Укључено
Укључено
Укључено
Укључено

As you can see, whatever I tried, the output is upper-cased only for the ASCII characters in the string. If 'Укључено' can be shown properly in the output, then I think the upper-cased 'УКЉУЧЕНО' should appear on at least one of the output lines above. Is this a bug or my misunderstanding?

Tested output on Win10 x64 console with trunk 32-bit FPC.
ct2laz - Conversion between Lazarus and CodeTyphon
bithelpers - Bit manipulation for standard types
pasettimino - Siemens S7 PLC lib

Kays

  • Hero Member
  • *****
  • Posts: 576
  • Whasup!?
    • KaiBurghardt.de
[…] As you can see, whatever I tried, the output is upper-cased only for the ASCII characters in the string. […]
And this is “correct” because only this functionality is implemented. The great thing about open source is that you can investigate it on your own: toUpper calls upperCase
Code: Pascal
function TStringHelper.ToUpper: string;
begin
  Result:=UpperCase(Self);
end;
which calls sysUtils.upperCase
Code: Pascal
class function TStringHelper.UpperCase(const S: string): string;
begin
  Result:=sysutils.Uppercase(S);
end;
which only looks at the range 'a'..'z'. The description of upperCase even says so. It is understandable though that just looking at the documentation of toUpper you might expect all Unicode code points to be treated correctly. The behavior of ANSIUpperCase depends on the utilized wide string manager, so this may be different.
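To illustrate why an ASCII-only routine leaves 'Жељко' alone, here is a minimal sketch. AsciiUpper is a hypothetical stand-in written for this illustration; it is not the RTL source. It assumes the source file is saved as UTF-8, so the literal holds UTF-8 bytes.

```pascal
program asciiupper;
{$mode objfpc}{$H+}

// Hypothetical stand-in for an ASCII-only upper-case routine.
// Illustration only; this is not the RTL implementation.
function AsciiUpper(const S: string): string;
var
  I: Integer;
begin
  Result := S;
  for I := 1 to Length(Result) do
    if Result[I] in ['a'..'z'] then
      Result[I] := Chr(Ord(Result[I]) - 32);
end;

begin
  // Every byte of a UTF-8 encoded Cyrillic letter is >= $80, so none of
  // them falls inside 'a'..'z' and those bytes pass through unchanged.
  WriteLn(AsciiUpper('PascalCase Жељко'));
end.
```

Only the Latin letters are touched, which matches the first output line of the original test program.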
« Last Edit: July 15, 2021, 12:48:31 pm by Kays »
Yours Sincerely
Kai Burghardt

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Untested, but try with UTF8UpperCase() or UpperCaseString() from LazUTF8 (LazUtils package); both should work (the latter is the inlined version of the former).
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

avra

@Kays: I must admit that I didn't expect to see that FPC does not have any Upper derivate that goes beyond ASCII. Maybe there is a way to let OS handle it?

Untested, but try with UTF8UpperCase() or UpperCaseString() from LazUTF8 (LazUtils package); both should work (the latter is the inlined version of the former).
Thanks, but these are not available with just the RTL's SysUtils in a console program, which is where I need them at the moment.

The main reason why I consider it as a potential bug is this behaviour:

AnsiUpperCase() works in Lazarus (look at the bithelpers demo from my signature):
Code: Pascal
Memo1.Append(AnsiUpperCase('Укључено')); // output is 'УКЉУЧЕНО' as it should be

but does not work in FPC:
Code: Pascal
WriteLn(AnsiUpperCase('Укључено')); // output is 'Укључено' which is wrong since it's not upper cased

Strange, right?
« Last Edit: July 15, 2021, 02:04:43 pm by avra »

ASerge

  • Hero Member
  • *****
  • Posts: 2249
AnsiUpperCase() works in Lazarus (look at the bithelpers demo from my signature):
Code: Pascal
Memo1.Append(AnsiUpperCase('Укључено')); // output is 'УКЉУЧЕНО' as it should be

but does not work in FPC:
Code: Pascal
WriteLn(AnsiUpperCase('Укључено')); // output is 'Укључено' which is wrong since it's not upper cased
On Windows, SysUtils.AnsiUpperCase calls the native CharUpperBuffA, which only works with ANSI, but the string contains UTF8.
When you add LazUtils, it replaces the AnsiUpperCase call with UTF8UpperCase, which understands UTF8.
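For a console program that uses only the RTL, one possible workaround is to go through UTF-16: decode the UTF-8 bytes to a UnicodeString, upper-case it with UnicodeUpperCase (which delegates to the installed wide string manager), and encode back. This is a sketch under assumptions, not a confirmed fix for the program above: Utf8Upper is a hypothetical helper name, the source file is assumed to be saved as UTF-8, the Windows default manager is assumed to use the wide (W) API, and on Unix cwstring has to be the first unit so that a manager is installed at all. The console setup mirrors the original test program.

```pascal
program utf8upper;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}
  {$IFDEF WINDOWS}windows,{$ENDIF}
  sysutils;

// Hypothetical helper, not part of the RTL.
// Assumes S holds valid UTF-8 bytes.
function Utf8Upper(const S: RawByteString): string;
begin
  // UTF-8 -> UTF-16, upper-case via the wide string manager, then back.
  Result := UTF8Encode(UnicodeUpperCase(UTF8Decode(S)));
end;

begin
  {$IFDEF WINDOWS}
  // Same console setup as in the original test program.
  DefaultSystemCodePage := CP_UTF8;
  SetConsoleOutputCP(CP_UTF8);
  {$ENDIF}
  WriteLn(Utf8Upper('Укључено'));
end.
```

The round trip costs two conversions per call, but it keeps the dependency list to the RTL and leaves the case mapping rules to the platform.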

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11458
  • FPC developer.
Note in the icon/manifest project section there is a tick to use the newer Windows 10 utf8 default option

Kays

@Kays: I must admit that I didn't expect to see that FPC does not have any Upper derivate that goes beyond ASCII. Maybe there is a way to let OS handle it?
OS? No, that ain’t no task of an OS.
[…] Strange, right?
No, because, as mentioned above, the behavior of ANSIUpperCase depends on the utilized wide string manager. The following works on a Linux system:
Code: Pascal
program cyrillicUpcase(input, output, stdErr);
uses
        {$ifDef UNIX}
                CWString,
        {$endIf} sysUtils;
begin
        writeLn(ANSIUpperCase('Укључено'));
end.
« Last Edit: July 15, 2021, 05:44:19 pm by Kays »

avra

When you add LazUtils, it replaces the AnsiUpperCase call with UTF8UpperCase, which understands UTF8.
I did not use LazUtils anywhere in the bithelpers demo, but maybe one of the units it uses pulls it in. Thanks for the info!

Note in the icon/manifest project section there is a tick to use the newer Windows 10 utf8 default option
I was not aware of that. Thanks! That can not be applied to console applications, right?

@Kays: I must admit that I didn't expect to see that FPC does not have any Upper derivate that goes beyond ASCII. Maybe there is a way to let OS handle it?
OS? No, that ain’t no task of an OS.
So what are functions like CharUpperBuffW() or LCMapStringA() doing in Windows API?

behavior of ANSIUpperCase depends on the utilized wide string manager. The following works on a Linux system
Thanks! It's clearer now.

So, it's not a bug after all...

ttomas

  • Full Member
  • ***
  • Posts: 245
@avra, look at LazUtils as an FPC library, not as an LCL dependency. Maybe it would be good to move LazUtils into FPC, or am I wrong?

marcov

Note in the icon/manifest project section there is a tick to use the newer Windows 10 utf8 default option
I was not aware of that. Thanks! That can not be applied to console applications, right?

Afaik it can.

 
