Author Topic: Questions of new Strings in FPC 3.0  (Read 9580 times)

Michl

  • Full Member
  • ***
  • Posts: 194
Questions of new Strings in FPC 3.0
« on: October 29, 2015, 10:21:52 am »
Hi,

I have read a lot about the new strings and understood most of it and the ideas behind it. I've built a lot of test cases for myself and tested a lot of string conversions, but I still have some questions.

Shortstring: The code page of a shortstring is implicitly CP_ACP and hence will always be equal to the current value of DefaultSystemCodePage. So I made a test:
Code: Pascal
program project1;

//{$codepage cp1252}
//{$mode ObjFPC}{$H+}
//{$modeswitch systemcodepage}

const
  StrCP1252 = #$80#$C4#$D6#$8C#$A5;
// CP1252     €   Ä   Ö   Œ   ¥
// CP437      Ç   ─   ╓   î   Ñ
// 1. output  ?   Ä   Ö   O   ¥
// 2. output  Ç   ─   Í   î   Ñ

var
  s: String;

begin
  writeln(DefaultSystemCodePage);
  s := StrCP1252;
  writeln(s);               // expected: €ÄÖŒ¥   get: ?ÄÖO¥
  writeln(StrCP1252);       // expected: €ÄÖŒ¥   get: Ç─ÍîÑ
  writeln(Char(#$80), Char(#$C4), Char(#$D6), Char(#$8C), Char(#$A5));
end.

The output is:
Quote
1252
?ÄÖO¥
Ç─ÍîÑ
Ç─ÍîÑ

My code page is 1252. The ShortString output looks more like code page 437. After assigning the ShortString to a String, I get something that is nearly a CP1252 string.

It doesn't matter whether I define these or not:
Code: Pascal
//{$codepage cp1252}
//{$mode ObjFPC}{$H+}
//{$modeswitch systemcodepage}

I expected that if I define code points (bytes > 127) of a CP1252 string, I would get the characters shown here: https://en.wikipedia.org/wiki/Windows-1252

Why does the output contain characters other than those listed in the wiki (I get ? instead of € and O instead of Œ)?
Why is WriteLn(SomeShortString) not the same as WriteLn(SomeString)?

My system: Windows 7, 64bit, FPC 3.1.1 32bit r32092
Compiler options: -MObjFPC -Scghi -O1 -g -gl -l -vewnhibq -Filib\i386-win32 -Fu. -FUlib\i386-win32
« Last Edit: October 29, 2015, 10:24:36 am by Michl »
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #1 on: October 29, 2015, 01:46:02 pm »
...
Compiler options: -MObjFPC -Scghi -O1 -g -gl -l -vewnhibq -Filib\i386-win32 -Fu. -FUlib\i386-win32

You are compiling your test project via Lazarus, which passes these options to the compiler on the command line (note -MObjFPC and the h in -Scghi). That is why it makes no difference whether you comment out those compiler directives in the source. Try compiling your test program from the command line.

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #2 on: October 29, 2015, 02:46:43 pm »
You are right, I can't deactivate such FPC settings from the source when compiling with Lazarus. If I compile from the command line, the result differs depending on whether {$mode ObjFPC}{$H+} is enabled, because in one case "s" is a ShortString and in the other "s" is a String. Thank you for that hint.

But my questions remain. Can anyone give me a hint:
Why does the output contain characters other than those listed in the wiki (I get ? instead of € and O instead of Œ)?
Why is WriteLn(SomeShortString) not the same as WriteLn(SomeString) with {$mode ObjFPC}{$H+}?
« Last Edit: October 29, 2015, 02:53:20 pm by Michl »

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #3 on: October 29, 2015, 03:05:45 pm »
WriteLn is a compiler intrinsic: at compile time it is replaced by calls to real RTL subroutines. For WriteLn(SomeShortString) the compiler emits a call specific to that type, followed by a call to the subroutine that writes the end-of-line to standard output. You can see this for yourself by examining the disassembly via View->Debug Windows->Assembler in Lazarus.
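
One way to see which code pages are involved in each call is to print them next to the output. The following is a minimal sketch, assuming FPC 3.0 or later; the program name is made up for illustration. The comments describe a likely explanation (an AnsiString is converted to the code page of the text file, while ShortString bytes are passed through unchanged). That explanation fits the outputs reported in this thread, but it is an assumption, not something the thread confirms.

```pascal
program showcodepages;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}

var
  s: String;        // AnsiString because of {$H+}
  s2: ShortString;

begin
  s := #$80#$C4#$D6#$8C#$A5;
  s2 := #$80#$C4#$D6#$8C#$A5;

  // Code page the RTL uses for strings declared as CP_ACP.
  WriteLn('DefaultSystemCodePage:   ', DefaultSystemCodePage);
  // Code page attached to the standard output text file.
  WriteLn('GetTextCodePage(Output): ', GetTextCodePage(Output));
  // Code page recorded in the AnsiString itself.
  WriteLn('StringCodePage(s):       ', StringCodePage(s));

  // Assumption: the AnsiString overload converts s to the code page
  // of Output when the two differ.
  WriteLn(s);
  // Assumption: the ShortString overload writes the bytes as they are.
  WriteLn(s2);
end.
```

If the first two numbers differ, a conversion for the AnsiString case would be a plausible cause of the different console output.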

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #4 on: October 29, 2015, 03:28:56 pm »
OK I understand.

I made a new test:
Code: Pascal
program project1;

//{$codepage cp1252}
{$mode ObjFPC}{$H+}

var
  s: String;
  s2: ShortString;
  f: file of byte;
  ftext: TextFile;

begin
  AssignFile(f, 'test.txt');
  Rewrite(f);
  write(f, $80);
  write(f, $C4);
  write(f, $D6);
  write(f, $8C);
  write(f, $A5);
  CloseFile(f);

  AssignFile(ftext, 'test.txt');
  Reset(ftext);
  Read(ftext, s);
  Reset(ftext);
  Read(ftext, s2);
  CloseFile(ftext);
  writeln(s);   // expected: €ÄÖŒ¥   Console Output: ?ÄÖO¥
  writeln(s2);  // expected: €ÄÖŒ¥   Console Output: Ç─ÍîÑ

  AssignFile(ftext, 'teststring.txt');
  Rewrite(ftext);
  write(ftext, s);
  CloseFile(ftext);

  AssignFile(ftext, 'testshortstring.txt');
  Rewrite(ftext);
  write(ftext, s2);
  CloseFile(ftext);
end.

If I now inspect the three files with Windows Notepad, they all have the same, correct content (€ÄÖŒ¥). So it now seems that there is no difference between Write(SomeShortString) and Write(SomeString), and the same goes for WriteLn.

If I run the program with its output redirected to a file ("project1.exe > testconsole.txt"), the wrong characters are in that file.

So only the console output is wrong (the output for the ShortString and/or the String). Should I report that to the bug tracker, or is it a known problem?
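
To check the bytes of a string without relying on how the console renders them, the bytes can be dumped as hexadecimal digits. This is a minimal sketch; DumpBytes and the program name are hypothetical helpers, not part of the RTL. HexStr comes from the System unit.

```pascal
program dumpbytes;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}

// Print every byte of Data as two hex digits. The output is plain
// ASCII, so it should look the same in any console code page.
procedure DumpBytes(const Title: String; const Data: RawByteString);
var
  i: Integer;
begin
  Write(Title, ':');
  for i := 1 to Length(Data) do
    Write(' ', HexStr(Ord(Data[i]), 2));
  WriteLn;
end;

var
  s: String;
  s2: ShortString;

begin
  s := #$80#$C4#$D6#$8C#$A5;
  s2 := #$80#$C4#$D6#$8C#$A5;
  DumpBytes('String', s);
  DumpBytes('ShortString', s2);
end.
```

If both lines list the same five bytes, the strings themselves are intact and only the way they are written to or shown by the console differs.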


Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #5 on: October 29, 2015, 03:44:19 pm »
You might find the attached source file educational. It demonstrates how to change the code page of standard output in a console application (and also its font) so that characters are shown correctly.

Unfortunately it is Windows only.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3652
  • I like bugs.
Re: Questions of new Strings in FPC 3.0
« Reply #6 on: October 29, 2015, 04:22:19 pm »
You might find the attached source file educational. It demonstrates how to change the code page of standard output in a console application (and also its font) so that characters are shown correctly.

I guess Michl wanted to find a solution using system codepage, without using UTF-8.

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #7 on: October 29, 2015, 04:44:24 pm »
You might find the attached source file educational. It demonstrates how to change the code page of standard output in a console application (and also its font) so that characters are shown correctly.

Unfortunately it is Windows only.
Thank you very much, it is really interesting! With this unit, the output for Strings works now, but it is a UTF-8 string, not a CP1252 one. Maybe the answer is near; I'll play with it a little bit ;)

I guess Michl wanted to find a solution using system codepage, without using UTF-8.
You are right. One solution is to write all data into a file: I can trust the bytes there, but not the output shown in the console.

And hey, that is just the first question, and I have created a lot of test cases ::)

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #8 on: October 29, 2015, 04:45:21 pm »
The system code page is different from the console code page. That is why Michl's demo project fails.

If I run the chcp command in a command prompt, it reports "Active code page: 850".
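
The same values can be queried from inside a program. This is a minimal, Windows-only sketch that uses the Win32 functions GetACP, GetOEMCP and GetConsoleOutputCP as exposed by the FPC Windows unit; the program name is made up for illustration.

```pascal
program showwincp;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}

uses
  Windows;

begin
  // ANSI code page of the system (1252 on Michl's machine).
  WriteLn('GetACP:                ', GetACP);
  // OEM code page of the system.
  WriteLn('GetOEMCP:              ', GetOEMCP);
  // Code page the console currently uses for output (what chcp reports).
  WriteLn('GetConsoleOutputCP:    ', GetConsoleOutputCP);
  // Value the FPC RTL uses for CP_ACP strings.
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```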

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7592
Re: Questions of new Strings in FPC 3.0
« Reply #9 on: October 29, 2015, 04:51:23 pm »
Console is in OEMSTRING:

Code: Pascal
type
  OemString = type AnsiString(CP_OEMCP);

if not already predefined.
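
A minimal sketch of how such a type might be used, assuming FPC 3.0 or later on Windows. The assumption here is that assigning a String to an OemString converts the bytes from the system code page to the OEM code page; the program name is hypothetical. If the RTL already declares OemString, the local declaration can be dropped.

```pascal
program oemdemo;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}

type
  OemString = type AnsiString(CP_OEMCP);

var
  s: AnsiString;
  o: OemString;
  i: Integer;

begin
  s := #$80#$C4#$D6#$8C#$A5;   // bytes meant as system code page text
  o := s;                      // assumption: converted to the OEM code page here

  // Hex dump of the converted bytes; plain ASCII output.
  for i := 1 to Length(o) do
    Write(HexStr(Ord(o[i]), 2), ' ');
  WriteLn;

  WriteLn(o);
end.
```

Characters that do not exist in the OEM code page cannot survive the conversion unchanged, which would be consistent with the ? and O seen earlier in this thread.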

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #10 on: October 29, 2015, 04:59:00 pm »
Check out the attached demo project. It should shed some light on this matter.

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: Questions of new Strings in FPC 3.0
« Reply #11 on: October 29, 2015, 06:07:10 pm »
And I think I managed to solve Michl's problem. The downside is that you need to change the console font every time you run your console program, and maybe restore the default font afterwards.

See attached project for more info.

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #12 on: October 29, 2015, 06:08:13 pm »
Check out the attached demo project. It should shed some light on this matter.
You nailed it! The code page shown in the console is cp850, while for the files it is cp1252.

If I change the project to something like
Code: Pascal
program project1;

{$codepage cp1252}
{$mode ObjFPC}{$H+}

uses windows;

var
  s: String;
  s2: ShortString;

begin
  System.SetTextCodePage(Output, 1252);
  s := #$80#$C4#$D6#$8C#$A5;
  s2 := #$80#$C4#$D6#$8C#$A5;
  writeln(s);   // get: Ç─ÍîÑ
  writeln(s2);  // get: Ç─ÍîÑ
end.

Now I get the same result for both ShortString and String: the bytes are shown as characters of code page 850.

Quite confusing. I must first digest it.



Console is in OEMSTRING:

Code: Pascal
type
  OemString = type AnsiString(CP_OEMCP);

if not already predefined.
Directly assigning a constant to a string of that newly defined type doesn't work; it only works when assigning from another string. But this is a new topic.


For me it was important to understand whether the statement "Shortstring: The code page of a shortstring is implicitly CP_ACP and hence will always be equal to the current value of DefaultSystemCodePage." is correct.
For files it is.
For the console it is too, but the code page actually displayed is not given by DefaultSystemCodePage; you get it with System.GetTextCodePage(Output).
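
A minimal, Windows-only sketch of one way to make the console interpret the bytes as CP1252: switch the console output code page with the Win32 call SetConsoleOutputCP and tell the RTL about it with SetTextCodePage. This is not the attached project from this thread, and the program name is hypothetical. It assumes the console uses a font that contains the characters, which may require changing the font as mentioned above.

```pascal
program consolecp1252;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}

uses
  Windows;

var
  OldCP: UINT;
  s: String;

begin
  // Remember the current console output code page so it can be restored.
  OldCP := GetConsoleOutputCP;

  if SetConsoleOutputCP(1252) then
  begin
    // Keep the RTL's idea of the Output code page in sync with the console.
    SetTextCodePage(Output, 1252);

    s := #$80#$C4#$D6#$8C#$A5;
    WriteLn(s);

    // Restore the previous state.
    SetConsoleOutputCP(OldCP);
    SetTextCodePage(Output, TSystemCodePage(OldCP));
  end
  else
    WriteLn('SetConsoleOutputCP(1252) failed');
end.
```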


Thank you all! For me this topic is solved!

I think, I'll create a new thread for a new topic.

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #13 on: October 29, 2015, 06:23:34 pm »
And I think I managed to solve Michl's problem. The downside is that you need to change the console font every time you run your console program, and maybe restore the default font afterwards.

See attached project for more info.
Very nice!!! The name of the unit isn't correct any more ;)

Maybe it would be worth a feature request that the code page for the console be the same as for files, i.e. the DefaultSystemCodePage?


Michl

  • Full Member
  • ***
  • Posts: 194
Re: Questions of new Strings in FPC 3.0
« Reply #14 on: October 30, 2015, 09:30:27 am »
A new day, new tests.

Your project compiles fine on the command line. If I compile your project with Lazarus, I get runtime error 103 on the line "  WriteLn(StrCP1252);    ". I also understand why: your project file is saved in CP1252, the DefaultSystemCodePage.

If I create a new program with Lazarus, copy the content of your project file into the new project file and compile it, it works, but now the string "  WriteLn('€ÄÖŒ¥'); " is a UTF-8 string. If I compile the newly created project on the command line, the result is the same, because Lazarus saves the project file as a UTF-8 file.

So here we have the issue that with Lazarus it is never possible to build clean CP_ACP projects, because Lazarus saves the project file as UTF-8, so the compiler always interprets the saved strings as UTF-8. Compiling such a UTF-8 project from the command line doesn't help either; the result is the same. Am I right?
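
One possible workaround can be sketched as follows, under several assumptions: the source file is saved as UTF-8, {$codepage utf8} tells the compiler so, and the target variable has an explicitly declared code page so that the literal is converted on assignment. CP1252String and the program name are hypothetical. On non-Windows platforms the conversion would additionally need a widestring manager.

```pascal
program utf8source;   // hypothetical name, for illustration only

{$mode ObjFPC}{$H+}
{$codepage utf8}      // assumption: this file is saved as UTF-8

type
  CP1252String = type AnsiString(1252);   // hypothetical helper type

var
  s: CP1252String;
  i: Integer;

begin
  // Assumption: the UTF-8 literal is converted to CP1252 on assignment.
  s := '€ÄÖŒ¥';

  // Dump the bytes as hex; plain ASCII output, independent of the console.
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end.
```

Comparing the dump with the bytes 80 C4 D6 8C A5 used throughout this thread shows whether the string really ends up in CP1252.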