UpperCase, ToUpper, LowerCase and ToLower bug or my misunderstanding?

avra:
This program:

--- Code: Pascal ---
program testcase;
{$H+}

uses
  {$IFDEF WINDOWS}windows,{$ENDIF}
  sysutils;

var
  PreviousValue: uint;
  MyStr: string;
begin
  {$IFDEF WINDOWS}
    PreviousValue := GetConsoleOutputCP;
    DefaultSystemCodePage := CP_UTF8;
    SetConsoleOutputCP(CP_UTF8);
  {$ENDIF}

  MyStr := 'Укључено';
  WriteLn('PascalCase Жељко'.ToUpper);
  WriteLn(MyStr.ToUpper);
  WriteLn(UpperCase(MyStr));
  WriteLn(AnsiUpperCase(MyStr));
  WriteLn(UTF8String('Укључено').ToUpper);
  WriteLn(UpperCase(UnicodeString(Utf8ToString('Укључено'))));
  WriteLn(UpperCase(Utf8ToString('Укључено')));

  {$IFDEF WINDOWS}
    SetConsoleOutputCP(PreviousValue);
  {$ENDIF}
end.

produces this output:

--- Quote ---
m:\Lazarus\testing>testcase
PASCALCASE Жељко
Укључено
Укључено
Укључено
Укључено
Укључено
Укључено

--- End quote ---

As you can see, whatever I tried, the output is upper-cased only for the ASCII characters in a string. If 'Укључено' can be displayed properly in the output, then I think the upper-cased 'УКЉУЧЕНО' should appear on at least one of the output lines above. Is this a bug or my misunderstanding?

Tested output on Win10 x64 console with trunk 32-bit FPC.

Kays:

--- Quote from: avra on July 15, 2021, 12:21:33 pm ---
[…] As you can see, whatever I tried output is upper cased only for ASCII characters in a string. […]
--- End quote ---
And this is “correct” because only this functionality is implemented. The great thing about open source is that you can investigate it on your own: toUpper calls upperCase

--- Code: Pascal ---
function TStringHelper.ToUpper: string;
begin
  Result:=UpperCase(Self);
end;

which calls sysUtils.upperCase

--- Code: Pascal ---
class function TStringHelper.UpperCase(const S: string): string;
begin
  Result:=sysutils.Uppercase(S);
end;

which only looks at the range 'a'..'z'. The description of upperCase even says so. It is understandable, though, that from the documentation of toUpper alone you might expect all Unicode code points to be treated correctly. The behavior of ANSIUpperCase depends on the wide string manager in use, so its result may be different.
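
To illustrate why an 'a'..'z'-only routine leaves the Cyrillic text untouched, here is a minimal sketch. `AsciiOnlyUpper` is a hypothetical helper written for this example, not the RTL source; it assumes the source file is saved as UTF-8 without a codepage directive, so every Cyrillic letter is stored as two bytes that both lie outside 'a'..'z'.

```pascal
program asciiupper;
{$mode objfpc}{$H+}

// Hypothetical helper (not the RTL implementation): upper-cases only the
// bytes in the range 'a'..'z' and copies every other byte unchanged.
function AsciiOnlyUpper(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if Result[i] in ['a'..'z'] then
      Result[i] := Chr(Ord(Result[i]) - 32);  // 'a'..'z' -> 'A'..'Z'
end;

begin
  // The Latin letters change; the multi-byte UTF-8 sequences pass through.
  WriteLn(AsciiOnlyUpper('PascalCase Жељко'));
end.
```

Any byte-wise routine of this kind behaves the same way on UTF-8 input, which matches the first line of the output quoted above.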

lucamar:
Untested, but try with UTF8UpperCase() or UpperCaseString() from LazUTF8 (LazUtils package); both should work (the latter is the inlined version of the former).

avra:
@Kays: I must admit that I didn't expect to find that FPC has no Upper variant that goes beyond ASCII. Maybe there is a way to let the OS handle it?


--- Quote from: lucamar on July 15, 2021, 01:00:22 pm ---
Untested, but try with UTF8UpperCase() or UpperCaseString() from LazUTF8 (LazUtils package); both should work (the last is the inined version of the former).
--- End quote ---
Thanks, but I need this in a console program that uses only the RTL's SysUtils, and these functions are not available there.

The main reason I consider it a potential bug is this behaviour:

AnsiUpperCase() works in Lazarus (look at bithelpers demo from my signature):

--- Code: Pascal ---
Memo1.Append(AnsiUpperCase('Укључено')); // output is 'УКЉУЧЕНО' as it should be

but does not work in FPC:

--- Code: Pascal ---
WriteLn(AnsiUpperCase('Укључено')); // output is 'Укључено' which is wrong since it's not upper cased

Strange, right?

ASerge:

--- Quote from: avra on July 15, 2021, 01:23:29 pm ---
AnsiUpperCase() works in Lazarus (look at bithelpers demo from my signature):

--- Code: Pascal ---
Memo1.Append(AnsiUpperCase('Укључено')); // output is 'УКЉУЧЕНО' as it should be

but does not work in FPC:

--- Code: Pascal ---
WriteLn(AnsiUpperCase('Укључено')); // output is 'Укључено' which is wrong since it's not upper cased
--- End quote ---
On Windows, SysUtils.AnsiUpperCase calls the native CharUpperBuffA, which works only with the ANSI code page, but the string contains UTF-8.
When you add LazUtils, it replaces the AnsiUpperCase call with UTF8UpperCase, which understands UTF-8.
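
For the RTL-only console case raised earlier in the thread, one possible route is to decode the UTF-8 bytes to a UnicodeString, upper-case that, and encode the result back. This is only a sketch and was not tested on the setup from this thread. `Utf8Upper` is a hypothetical helper name; the sketch assumes a UTF-8 source file without a codepage directive, that the platform's wide string manager handles Cyrillic (on Unix this needs the cwstring unit and a suitable locale), and that the console is switched to UTF-8 the same way as in the first post.

```pascal
program utf8upper;
{$mode objfpc}{$H+}

uses
  {$IFDEF WINDOWS}Windows,{$ENDIF}
  {$IFDEF UNIX}cwstring,{$ENDIF}  // installs a wide string manager on Unix
  SysUtils;

// Hypothetical helper: treats S as UTF-8 bytes, upper-cases the decoded
// UnicodeString via the wide string manager, and re-encodes it as UTF-8.
function Utf8Upper(const S: RawByteString): RawByteString;
begin
  Result := UTF8Encode(UnicodeUpperCase(UTF8Decode(S)));
end;

{$IFDEF WINDOWS}
var
  PreviousCP: UINT;
{$ENDIF}
begin
  {$IFDEF WINDOWS}
  // Same console preparation as in the original test program.
  PreviousCP := GetConsoleOutputCP;
  DefaultSystemCodePage := CP_UTF8;
  SetConsoleOutputCP(CP_UTF8);
  {$ENDIF}

  WriteLn(Utf8Upper('Укључено'));

  {$IFDEF WINDOWS}
  SetConsoleOutputCP(PreviousCP);
  {$ENDIF}
end.
```

The point of the detour is that the Unicode routines go through the wide string manager rather than through the ANSI code page, so they do not hit the CharUpperBuffA limitation described above. Whether the result displays correctly still depends on the console configuration.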
