Title: Accented characters in concatenated strings
Post by: Roland57 on July 19, 2016, 08:58:40 pm
Hello!

I am trying to display accented characters in a Windows console program. When the string is concatenated with another one, the accented characters aren't displayed correctly. Why?

Here is my code. I compiled it with FPC 3.0.0.

Code: Pascal
begin
  WriteLn('r'#130'pertoire');
  WriteLn('r'#130'pertoir' + 'e');
end.
Title: Re: Accented characters in concatenated strings
Post by: engkin on July 19, 2016, 09:10:54 pm
Code: Pascal
{$codepage UTF8}

begin
  WriteLn('répertoire');
  WriteLn('répertoir' + 'e');
end.
Title: Re: Accented characters in concatenated strings
Post by: marcov on July 19, 2016, 09:11:23 pm
Both lines print the same text here (FPC 3.0 on the Win32 console).

Possible reasons:
- settings relating to source code encoding
- some Lazarus feature that changes the default encoding.
Title: Re: Accented characters in concatenated strings
Post by: engkin on July 19, 2016, 10:01:03 pm
The first one is written using fpc_write_text_shortstr, so no codepage conversion is involved. You'll see the correct letter if your console output codepage maps #130 to é.

The second one is written using fpc_Write_Text_AnsiStr, so a codepage conversion happens from DefaultSystemCodePage to TextRec(Output).CodePage:
Code: Pascal
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('TextRec(Output).CodePage: ', TextRec(Output).CodePage);

Depending on the two codepages, this conversion can corrupt the letter.
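
A minimal sketch of one way to sidestep that conversion, assuming the console output codepage really does map #130 to é (as codepage 850 does). It relabels the concatenated string as already being in the output codepage, so the write routine has nothing to convert. The program name is made up for illustration, and this is an untested idea rather than a recommended fix:

```pascal
program relabel;
{$mode objfpc}{$H+}

// Sketch only: assumes the console output codepage (e.g. 850) maps #130 to é.
var
  s: RawByteString;
begin
  // The concatenation produces an AnsiString, not a ShortString.
  s := 'r'#130'pertoir' + 'e';

  // Tag the bytes as being in the output codepage WITHOUT converting them
  // (third parameter Convert = False), so WriteLn passes them through as-is.
  SetCodePage(s, TextRec(Output).CodePage, False);

  WriteLn(s);
end.
```

Using {$codepage UTF8} with real accented letters in the source, as shown earlier in the thread, is likely the more robust route; the sketch only illustrates where the conversion comes from.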
Title: Re: Accented characters in concatenated strings
Post by: Roland57 on July 19, 2016, 10:12:13 pm
@engkin, marcov

Thank you for your answers.

Indeed, with {$codepage UTF8} the result is correct. (The source code encoding is UTF-8 without BOM.)

But I get a warning: Implicit string type conversion from "AnsiString" to "UnicodeString". It isn't really a problem (since the result is correct), but I wonder what I should write to avoid the warning.

Here is my full code:

Code: Pascal
program touchdirectory;
{$codepage UTF8}

uses
  SysUtils, DateUtils, Process;

const
  TOUCH = 'C:\BCC101\bin\touch.exe';
// https://www.embarcadero.com/fr/free-tools/ccompiler

var
  year, month, day, hour, minute, second, millisecond: word;
  stamp, path: string;

begin
  if (ParamCount = 1) and DirectoryExists(ParamStr(1)) then
  begin
    path := ParamStr(1) + '\';
    WriteLn('Traitement du répertoire "' + path + '".');
  end else
  begin
    path := '';
    WriteLn('Traitement du répertoire courant.');
  end;

  DecodeDateTime(Now(), year, month, day, hour, minute, second, millisecond);
  stamp := Format('%0.2d%0.2d%0.2d%0.2d%0.2d', [month, day, hour, 0, year mod 100]);

  with TProcess.Create(nil) do
  begin
    Executable := TOUCH;
    Parameters.Add('-d' + stamp);
    Parameters.Add('-D');
    Parameters.Add('-s');
    Parameters.Add('-v');
    Parameters.Add(path + '*.*');
    Options := Options + [poWaitOnExit];
    Execute;
    Free;
  end;

  Write('Appuyez sur Entrée pour continuer... ');
  ReadLn;
end.
Title: Re: Accented characters in concatenated strings
Post by: Roland57 on July 19, 2016, 10:16:17 pm
Thank you for the explanations and for the code.
Title: Re: Accented characters in concatenated strings
Post by: engkin on July 19, 2016, 10:32:43 pm
Quote from Roland57: "But I get a warning: Implicit string type conversion from "AnsiString" to "UnicodeString". It isn't really a problem (since the result is correct), but I wonder what I should write to avoid the warning."

Assuming you are using Lazarus, I know of two possibilities:

1. Hide the warning in this specific instance with the Lazarus IDE directive {%H-}, placed at the location where the warning is generated. In this case that is immediately to the left of the variable path, with no space in between:
Code: Pascal
    WriteLn('Traitement du répertoire "' + {%H-}path + '".');

2. Hide all instances by passing -vm4104 to the compiler in the project options.

You'll see both if you right-click on that warning.
Title: Re: Accented characters in concatenated strings
Post by: Roland57 on July 19, 2016, 10:45:34 pm
@engkin

I learned many things today! Thank you.  :)
 
Title: Re: Accented characters in concatenated strings
Post by: Tomas Hajny on August 04, 2016, 10:04:02 am
Note that the warning about implicit conversions can also be avoided either by using an explicit conversion (like AnsiString('répertoire')), or by not mixing UnicodeString (your constant) and AnsiString (your variables). Both are probably better solutions than just hiding the warning.
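
As a rough illustration of the explicit-conversion approach, here is a small sketch. It assumes {$codepage UTF8}, under which a constant containing accented letters is treated as a UnicodeString; the program name, the msg variable and the sample path are made up for the example. It casts the AnsiString variable instead of the constant, so every operand of the concatenation is a UnicodeString:

```pascal
program explicitcast;
{$codepage UTF8}

var
  path: AnsiString;     // as in the original program, where path is an AnsiString
  msg: UnicodeString;   // hypothetical helper variable
begin
  path := 'C:\temp\';   // hypothetical directory, pure ASCII

  // The constants are UnicodeString here, so convert path explicitly
  // instead of letting the compiler do it implicitly (which triggers the warning).
  msg := 'Traitement du répertoire "' + UnicodeString(path) + '".';
  WriteLn(msg);
end.
```

The other option mentioned above, not mixing the two types at all, would mean declaring path as a UnicodeString from the start and converting ParamStr(1) explicitly at the point where it is read.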