Topic: Accented characters in concatenated strings

RolandC

  • Guest
Accented characters in concatenated strings
« on: July 19, 2016, 08:58:40 pm »
Hello!

I'm trying to display accented characters in a Windows console program. When the string is concatenated with another one, the accented character isn't displayed correctly. Why?

Here is my code. I compiled it with FPC 3.0.0.

Code: Pascal
  begin
    WriteLn('r'#130'pertoire');
    WriteLn('r'#130'pertoir' + 'e');
  end.

engkin

  • Hero Member
  • Posts: 2513
Re: Accented characters in concatenated strings
« Reply #1 on: July 19, 2016, 09:10:54 pm »
Code: Pascal
  {$codepage UTF8}

  begin
    WriteLn('répertoire');
    WriteLn('répertoir' + 'e');
  end.

marcov

  • Global Moderator
  • Hero Member
  • Posts: 7236
Re: Accented characters in concatenated strings
« Reply #2 on: July 19, 2016, 09:11:23 pm »
It prints the same thing twice here (FPC 3.0 on the Win32 console).

Possible reasons:
- settings relating to source code encoding
- some Lazarus feature that changes the default encoding.
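To illustrate the first point: in a source file saved as UTF-8, é is stored as two bytes, and {$codepage UTF8} tells the compiler to decode them as a single character. The sketch below is not from the thread; it assumes FPC 3.0 or later and that the file itself is saved as UTF-8 (the setup RolandC describes in reply #4). It compares the number of characters in the literal with the number of bytes after re-encoding it with the System unit's UTF8Encode.

```pascal
program sourceencoding;
// Sketch, assuming FPC 3.0+ and that this file is saved as UTF-8.
{$mode objfpc}{$H+}
{$codepage UTF8}

var
  u: UnicodeString;
  b: RawByteString;
begin
  // With {$codepage UTF8}, the two UTF-8 bytes of 'é' are decoded as one character
  u := 'répertoire';
  // Re-encode to UTF-8: 'é' takes two bytes again
  b := UTF8Encode(u);

  WriteLn('Characters: ', Length(u));   // 10
  WriteLn('UTF-8 bytes: ', Length(b));  // 11
end.
```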

engkin

  • Hero Member
  • Posts: 2513
Re: Accented characters in concatenated strings
« Reply #3 on: July 19, 2016, 10:01:03 pm »
The first one is written using fpc_write_text_shortstr, so no codepage conversion is involved. You'll see the correct letter if your console output codepage maps #130 to é (as code pages 437 and 850 do).

The second one is written using fpc_Write_Text_AnsiStr, and a codepage conversion happens from DefaultSystemCodePage to TextRec(Output).CodePage:
Code: Pascal
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('TextRec(Output).CodePage: ', TextRec(Output).CodePage);

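Depending on these two codepages, this conversion can corrupt the letter. For example, if DefaultSystemCodePage is 1252, byte #130 is read as ‚ (a low quotation mark) rather than é, and that is the character the conversion then tries to reproduce in the console codepage.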
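The two WriteLn lines can be dropped into a complete program to see which code pages are involved on a particular machine. The sketch below wraps them and, as an addition not in the original post, also prints StringCodePage of the concatenated constant from the question. It assumes FPC 3.0 or later, where AnsiStrings carry a code page; the printed numbers depend on the Windows locale and console settings, so no output is shown.

```pascal
program showcodepages;
// Minimal diagnostic sketch, assuming FPC 3.0+ (code-page-aware AnsiString).
{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  // Same concatenated constant as in the original question
  s := 'r'#130'pertoir' + 'e';

  // Code page the RTL assumes for CP_ACP strings (e.g. 1252 on Western Windows)
  WriteLn('DefaultSystemCodePage:    ', DefaultSystemCodePage);
  // Code page the RTL converts AnsiStrings to when writing them to Output
  WriteLn('TextRec(Output).CodePage: ', TextRec(Output).CodePage);
  // Code page tag stored with the concatenated string
  WriteLn('StringCodePage(s):        ', StringCodePage(s));
end.
```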

RolandC

  • Guest
Re: Accented characters in concatenated strings
« Reply #4 on: July 19, 2016, 10:12:13 pm »
@engkin, marcov

Thank you for your answers.

Indeed, with {$codepage UTF8} the result is correct. (The source code encoding is UTF-8 without BOM.)

But I get a warning: Implicit string type conversion from "AnsiString" to "UnicodeString". It isn't really a problem (since the result is correct), but I wonder what I should write to avoid the warning.

Here is my full code:

Code: Pascal
  program touchdirectory;
  {$codepage UTF8}

  uses
    SysUtils, DateUtils, Process;

  const
    TOUCH = 'C:\BCC101\bin\touch.exe';
  // https://www.embarcadero.com/fr/free-tools/ccompiler

  var
    year, month, day, hour, minute, second, millisecond: word;
    stamp, path: string;

  begin
    if (ParamCount = 1) and DirectoryExists(ParamStr(1)) then
    begin
      path := ParamStr(1) + '\';
      WriteLn('Traitement du répertoire "' + path + '".');
    end else
    begin
      path := '';
      WriteLn('Traitement du répertoire courant.');
    end;

    DecodeDateTime(Now(), year, month, day, hour, minute, second, millisecond);
    stamp := Format('%0.2d%0.2d%0.2d%0.2d%0.2d', [month, day, hour, 0, year mod 100]);

    with TProcess.Create(nil) do
    begin
      Executable := TOUCH;
      Parameters.Add('-d' + stamp);
      Parameters.Add('-D');
      Parameters.Add('-s');
      Parameters.Add('-v');
      Parameters.Add(path + '*.*');
      Options := Options + [poWaitOnExit];
      Execute;
      Free;
    end;

    Write('Appuyez sur Entrée pour continuer... ');
    ReadLn;
  end.

RolandC

  • Guest
Re: Accented characters in concatenated strings
« Reply #5 on: July 19, 2016, 10:16:17 pm »

Thank you for the explanations and for the code.

engkin

  • Hero Member
  • Posts: 2513
Re: Accented characters in concatenated strings
« Reply #6 on: July 19, 2016, 10:32:43 pm »
About the warning (Implicit string type conversion from "AnsiString" to "UnicodeString"): assuming you are using Lazarus, I know of two ways to hide it:

1- Hide the warning in this specific instance, using the Lazarus IDE directive {%H-} at the location where the warning is generated; in this case, immediately to the left of the variable path, without a space:
Code: Pascal
  WriteLn('Traitement du répertoire "' + {%H-}path + '".');

2- Hide all instances by passing -vm4104 to the compiler in the project options.

You'll see both options if you right-click on that warning.
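Outside the Lazarus IDE, the same -vm4104 option can be passed directly to the compiler. In this sketch, -Mobjfpc and -Sh are assumptions added to mimic a default Lazarus project (ObjFPC mode, with string meaning AnsiString), which is the setup that produces the warning RolandC reported; touchdirectory.pas is the program from reply #4.

```sh
# -vm4104 suppresses message 4104 (implicit string type conversion)
fpc -Mobjfpc -Sh -vm4104 touchdirectory.pas
```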

RolandC

  • Guest
Re: Accented characters in concatenated strings
« Reply #7 on: July 19, 2016, 10:45:34 pm »
@engkin

I learned many things today! Thank you. :)
 

Tomas Hajny

  • Moderator
  • New member
  • Posts: 35
Re: Accented characters in concatenated strings
« Reply #8 on: August 04, 2016, 10:04:02 am »
Note that the implicit-conversion warning can also be avoided, either by using an explicit conversion (like AnsiString('répertoire')) or by not mixing UnicodeString (your constant) and AnsiString (your variables). Both are probably better solutions than just hiding the warning.
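Both suggestions can be sketched in a small program. It assumes FPC 3.0 or later, {$H+} (so string is AnsiString, as in a default Lazarus project) and a source file saved as UTF-8; the value of path is hypothetical. Neither WriteLn should trigger the implicit-conversion warning, but whether the accented letters then display correctly still depends on the codepage conversion engkin described.

```pascal
program noimplicit;
// Sketch, assuming FPC 3.0+, {$H+} (string = AnsiString) and a UTF-8 source file.
{$mode objfpc}{$H+}
{$codepage UTF8}

var
  path: string;          // AnsiString, as in the original program
  upath: UnicodeString;  // used for the second approach
begin
  path := 'C:\temp\';    // hypothetical value, for illustration only

  // 1) Explicit conversion of the UnicodeString constant to AnsiString
  WriteLn(AnsiString('Traitement du répertoire "') + path + '".');

  // 2) Do not mix types: convert explicitly once, then stay in UnicodeString
  upath := UnicodeString(path);
  WriteLn('Traitement du répertoire "' + upath + '".');
end.
```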