Author Topic: [Windows] UTF8 encoding with ReadLn  (Read 1493 times)

bstewart

  • New Member
  • *
  • Posts: 14
[Windows] UTF8 encoding with ReadLn
« on: January 20, 2026, 04:41:22 pm »
Consider this simple source:

Code: Pascal
program test;

{$CODEPAGE UTF8}
{$MODE OBJFPC}
{$MODESWITCH UNICODESTRINGS}

uses
  Windows;

var
  S: string;

begin
  Write('Enter δείγμα: ');
  ReadLn(S);
  WriteLn('S := ' + S);
end.

I run this in a cmd.exe session and enter the command chcp 65001 to set the console input and output encoding to UTF8, run it, and enter a UTF8 string. Output:

Code:
C:\Users\username\FPC\source\test>chcp 65001
Active code page: 65001

C:\Users\username\FPC\source\test>fpc test.pp
Free Pascal Compiler version 3.2.2 [2021/05/15] for i386
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling test.pp
Linking test.exe
17 lines compiled, 0.1 sec, 30480 bytes code, 1348 bytes data

C:\Users\username\FPC\source\test>.\test
Enter δείγμα: δείγμα
S := de??µa

For the ReadLn call, I copy and paste the string δείγμα and hit Enter.

I was expecting the WriteLn call to output the UnicodeString as I entered it.

What would be needed to get ReadLn to read a UTF8 string properly?
« Last Edit: January 20, 2026, 04:51:24 pm by bstewart »

andersonscinfo

  • Full Member
  • ***
  • Posts: 156
Re: [Windows] UTF8 encoding with ReadLn
« Reply #1 on: January 20, 2026, 05:05:14 pm »
The issue you're experiencing is related to how Free Pascal handles console I/O on Windows when dealing with UTF-8 encoded input. Even though you've set the codepage to UTF-8 (65001) and enabled Unicode strings in FPC, the standard ReadLn function doesn't properly handle UTF-8 input from the Windows console.

Here's the solution:

Code: Pascal
program test;
{$CODEPAGE UTF8}
{$MODE OBJFPC}
{$MODESWITCH UNICODESTRINGS}

uses
  Windows;

function ConsoleReadLn: String;
var
  hConsole: THandle;
  Buffer: packed array[0..1023] of WideChar;
  NumRead: DWORD;
  UTF8Str: UTF8String;
begin
  hConsole := GetStdHandle(STD_INPUT_HANDLE);
  NumRead := 0;

  // Read wide characters from console
  ReadConsoleW(hConsole, @Buffer[0], SizeOf(Buffer) div SizeOf(WideChar) - 1, NumRead, nil);

  // Null terminate
  Buffer[NumRead] := #0;

  // Convert the UTF-16 buffer to a UTF8String: the first call (nil output
  // buffer) returns the required byte count, the second call converts
  SetLength(UTF8Str, WideCharToMultiByte(CP_UTF8, 0, @Buffer[0], NumRead, nil, 0, nil, nil));
  WideCharToMultiByte(CP_UTF8, 0, @Buffer[0], NumRead, PAnsiChar(UTF8Str), Length(UTF8Str), nil, nil);

  Result := String(UTF8Str);

  // Remove trailing newline if present
  if (Length(Result) > 0) and (Result[Length(Result)] = #10) then
    Delete(Result, Length(Result), 1);
  if (Length(Result) > 0) and (Result[Length(Result)] = #13) then
    Delete(Result, Length(Result), 1);
end;

var
  S: string;
begin
  Write('Enter δείγμα: ');
  S := ConsoleReadLn;
  WriteLn('S := ' + S);
end.

Alternatively, here is a variant of the same approach that also handles an empty read:

Code: Pascal
program test;
{$CODEPAGE UTF8}
{$MODE OBJFPC}
{$MODESWITCH UNICODESTRINGS}

uses
  Windows, SysUtils;

function ReadUTF8Line: String;
var
  hInput: THandle;
  Buffer: array[0..1023] of WideChar;
  NumRead: DWORD;
  UTF8Str: UTF8String;
begin
  hInput := GetStdHandle(STD_INPUT_HANDLE);
  NumRead := 0;

  ReadConsoleW(hInput, @Buffer[0], SizeOf(Buffer) div SizeOf(WideChar) - 1, NumRead, nil);

  if NumRead > 0 then
  begin
    Buffer[NumRead] := #0;
    // The first call (nil output buffer) returns the required UTF-8 byte count
    SetLength(UTF8Str, WideCharToMultiByte(CP_UTF8, 0, @Buffer[0], NumRead, nil, 0, nil, nil));
    WideCharToMultiByte(CP_UTF8, 0, @Buffer[0], NumRead, PAnsiChar(UTF8Str), Length(UTF8Str), nil, nil);
    Result := String(UTF8Str);

    // Remove the trailing CR/LF pair; the line feed comes last, so strip it first
    if (Length(Result) > 0) and (Result[Length(Result)] = #10) then
      SetLength(Result, Length(Result) - 1);
    if (Length(Result) > 0) and (Result[Length(Result)] = #13) then
      SetLength(Result, Length(Result) - 1);
  end
  else
    Result := '';
end;

var
  S: string;
begin
  Write('Enter δείγμα: ');
  S := ReadUTF8Line;
  WriteLn('S := ' + S);
end.

The problem occurs because the standard ReadLn function in FPC doesn't properly translate UTF-8 input from the Windows console. The Windows console API functions (ReadConsoleW) handle Unicode properly, so using them directly resolves the issue.

The key points are:
1. Using ReadConsoleW to read wide characters from the console
2. Converting the wide character buffer to UTF-8 using WideCharToMultiByte
3. Properly handling the conversion to ensure the UTF-8 string is correctly formed

This approach will properly capture and display your Greek text "δείγμα" as expected.
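As a possible simplification of the key points above (a sketch only, not verified on every Windows version): because {$MODESWITCH UNICODESTRINGS} makes String a UTF-16 UnicodeString, the buffer filled by ReadConsoleW can be copied straight into the result with SetString, skipping the round trip through UTF-8. The name ReadConsoleLine is hypothetical. The sketch assumes standard input is a real console; ReadConsoleW fails when input is redirected from a file or pipe, and the function then returns an empty string.

```pascal
program readwide;
{$MODE OBJFPC}
{$MODESWITCH UNICODESTRINGS}

uses
  Windows;

// Hypothetical helper: reads one line from the console as UTF-16.
function ReadConsoleLine: String;
var
  Buffer: array[0..1023] of WideChar;
  NumRead: DWORD;
begin
  Result := '';
  NumRead := 0;
  // ReadConsoleW needs a real console handle; on failure
  // (for example redirected input) return the empty string.
  if not ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), @Buffer[0],
                      Length(Buffer), NumRead, nil) then
    Exit;
  // String is UnicodeString here, so the UTF-16 data is copied as-is.
  SetString(Result, PWideChar(@Buffer[0]), NumRead);
  // Strip any trailing CR/LF characters.
  while (Length(Result) > 0) and
        ((Result[Length(Result)] = #10) or (Result[Length(Result)] = #13)) do
    SetLength(Result, Length(Result) - 1);
end;

var
  S: String;
begin
  Write('Enter text: ');
  S := ReadConsoleLine;
  WriteLn('S := ' + S);
end.
```

Input longer than the 1024-character buffer would be split across reads in this sketch; a loop around ReadConsoleW would be needed to handle that case.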

jcmontherock

  • Sr. Member
  • ****
  • Posts: 336
Re: [Windows] UTF8 encoding with ReadLn
« Reply #2 on: January 20, 2026, 05:41:45 pm »
With the "Keyboard" component I have never run into encoding problems. It works better than ReadLn...
Windows 11 UTF8-64 - Lazarus 4.4-64 - FPC 3.2.2

bstewart

  • New Member
  • *
  • Posts: 14
Re: [Windows] UTF8 encoding with ReadLn
« Reply #3 on: January 20, 2026, 05:58:06 pm »
I was really looking for more of a confirmation that Read/ReadLn is broken. Seems so. Using the Windows APIs is a non-starter for other platforms (understood that I posted in the Windows OS section; I'm merely pointing it out).

Perhaps the ongoing overhaul of the RTL will address this limitation. ReadLn shouldn't fail in this manner...

andersonscinfo

  • Full Member
  • ***
  • Posts: 156
Re: [Windows] UTF8 encoding with ReadLn
« Reply #4 on: January 20, 2026, 06:08:02 pm »
Try this:

Code: Pascal
program ConsoleUTF8Test;

{$MODE OBJFPC}      // Modern default Object Pascal mode
{$H+}               // Long strings (AnsiString) enabled
{$CODEPAGE UTF8}    // Sets the source file code page to UTF-8

uses
  {$IFDEF UNIX}
  BaseUnix,         // On Linux, the standard ReadLn usually works fine with UTF-8
  {$ENDIF}
  {$IFDEF WINDOWS}
  Windows,          // Needed for the console APIs
  {$ENDIF}
  SysUtils;

// Wrapper function to read UTF-8 from the console in a cross-platform way
function ConsoleReadLn: string;
{$IFDEF WINDOWS}
var
  hStdIn: THandle;
  NumRead: DWORD;
  Buffer: array[0..4095] of WideChar; // Generous buffer for the wide-character input
  ResWide: UnicodeString;
begin
  hStdIn := GetStdHandle(STD_INPUT_HANDLE);

  // Try to read from the console using the Windows Unicode API
  if ReadConsoleW(hStdIn, @Buffer, Length(Buffer), NumRead, nil) then
  begin
    // Size the Unicode string according to what was read
    SetLength(ResWide, NumRead);
    Move(Buffer[0], ResWide[1], NumRead * SizeOf(WideChar));

    // Convert natively from UTF-16 to the default String (UTF-8, as set by the code page)
    Result := UTF8Encode(ResWide);
  end
  else
    Result := ''; // Read failed

  // Classic end-of-line handling (CR/LF)
  while (Length(Result) > 0) and (Result[Length(Result)] in [#13, #10]) do
    SetLength(Result, Length(Result) - 1);
end;
{$ELSE}
begin
  // On Unix/Linux, the file system and console are usually native UTF-8 already
  ReadLn(Result);
end;
{$ENDIF}

var
  Entrada: string;
begin
  // Essential setup for correct OUTPUT on Windows
  {$IFDEF WINDOWS}
  SetConsoleOutputCP(CP_UTF8);
  SetConsoleCP(CP_UTF8);
  // A font change may be needed in the Windows terminal (e.g. Lucida Console)
  {$ENDIF}

  Write('Type something (e.g. ελληνικά / Português): ');

  // Use our wrapper function
  Entrada := ConsoleReadLn;

  WriteLn('You typed: ', Entrada);
  WriteLn('Size in bytes: ', Length(Entrada));
  WriteLn('Size in characters (UTF8Length): ', UTF8Length(Entrada)); // Requires LazUTF8 or a manual implementation in plain FPC

  WriteLn('Press ENTER to exit...');
  ReadLn;
end.

cdbc

  • Hero Member
  • *****
  • Posts: 2612
    • http://www.cdbc.dk
Re: [Windows] UTF8 encoding with ReadLn
« Reply #5 on: January 20, 2026, 06:23:46 pm »
Hi
@bstewart: Try this:
Code: Pascal
program test;

{$MODE OBJFPC}{$H+}

uses sysutils;

var
  S: string;

begin
  Write('Enter δείγμα: ');
  ReadLn(S);
  WriteLn('S := ' + S);
end.
It produces this:
Code: Bash
bc@red bstewart$ fpc test.pp
Free Pascal Compiler version 3.2.2 [2024/07/26] for x86_64
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling test.pp
Linking test
15 lines compiled, 0.3 sec
bc@red bstewart$ ./test
Enter δείγμα: δείγμα
S := δείγμα
bc@red bstewart$
Of course, this was on Linux, but I think you're overcomplicating things...
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

LV

  • Sr. Member
  • ****
  • Posts: 412
Re: [Windows] UTF8 encoding with ReadLn
« Reply #6 on: January 20, 2026, 07:14:09 pm »
Hi! Try this.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12645
  • FPC developer.
Re: [Windows] UTF8 encoding with ReadLn
« Reply #7 on: January 20, 2026, 07:14:17 pm »
Quote
I run this in a cmd.exe session and enter the command chcp 65001 to set the console input and output encoding to UTF8, run it, and enter a UTF8 string. Output:

All too complicated. Note that $codepage only influences the codepage of the strings in the source; it doesn't do much for the rest.

For Windows, go to the application tab in the "project->options", turn the manifest on, and turn on "ANSI codepage is UTF-8, (Windows 10 1903+)" and it will simply work unless you are on an unsupported Windows version.
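
To see which code pages are actually in effect at run time, a small diagnostic along these lines may help. This is only a sketch, assuming FPC 3.x on Windows; the program name showcp is arbitrary. DefaultSystemCodePage is the RTL variable holding the code page assumed for plain AnsiString data, while GetACP, GetConsoleCP and GetConsoleOutputCP are Windows API calls reporting the process ANSI code page and the console input and output code pages.

```pascal
program showcp;
{$MODE OBJFPC}{$H+}
{$CODEPAGE UTF8}   // affects string literals in this source file only

uses
  Windows;

begin
  // Code page the RTL assumes for plain AnsiString data
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // Process ANSI code page
  WriteLn('GetACP:                ', GetACP);
  // Console input and output code pages (what chcp changes)
  WriteLn('GetConsoleCP:          ', GetConsoleCP);
  WriteLn('GetConsoleOutputCP:    ', GetConsoleOutputCP);
end.
```

With the manifest setting enabled, GetACP would be expected to report 65001 on Windows 10 1903 or later. Without it, GetACP typically still reports the legacy ANSI code page even after chcp 65001, since chcp only changes the console code pages.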

bstewart

  • New Member
  • *
  • Posts: 14
Re: [Windows] UTF8 encoding with ReadLn
« Reply #8 on: January 20, 2026, 07:25:47 pm »
Quote
All too complicated. Note that $codepage only influences the codepage of the strings in the source; it doesn't do much for the rest.

Yes, aware of that; I added it to the sample to tell FPC the source is UTF8.

Quote
For Windows, go to the application tab in the "project->options", turn the manifest on, and turn on "ANSI codepage is UTF-8, (Windows 10 1903+)" and it will simply work unless you are on an unsupported Windows version.

I know this is a Lazarus forum, but I'm actually not using Lazarus. How do I do this if compiling from the command line?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12645
  • FPC developer.
Re: [Windows] UTF8 encoding with ReadLn
« Reply #9 on: January 20, 2026, 07:28:35 pm »
Quote
I know this is a Lazarus forum, but I'm actually not using Lazarus. How do I do this if compiling from the command line?

Basically that tab in Lazarus is only a wizard to click together a manifest that is then translated to a .res. You can look it up in MSDN, create the manifest, and then compile it with windres or fpcres.

Then link it to the EXE with $R.

LV

  • Sr. Member
  • ****
  • Posts: 412
Re: [Windows] UTF8 encoding with ReadLn
« Reply #10 on: January 20, 2026, 07:45:48 pm »
Quote
I know this is a Lazarus forum, but I'm actually not using Lazarus. How do I do this if compiling from the command line?

bstewart

  • New Member
  • *
  • Posts: 14
Re: [Windows] UTF8 encoding with ReadLn
« Reply #11 on: January 20, 2026, 07:46:59 pm »
Quote
Then link it to the EXE with $R.

Confirmed working, thanks!

How do we fix ReadLn without a manifest so it runs on older platforms? Or is it not possible?

bstewart

  • New Member
  • *
  • Posts: 14
Re: [Windows] UTF8 encoding with ReadLn
« Reply #12 on: January 20, 2026, 08:09:57 pm »
Regarding the manifest: Here's the MCVE.

File test.rc:
Code:
1 24 "test.manifest"

File test.manifest:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

Compile test.rc to test.res:
Code:
windres -i test.rc -o test.res

Finally, compile using the resource by adding the following to the top of the source file:

Code:
{$R *.res}
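
Putting the pieces together, the original test program with the resource linked in might look like the sketch below. It assumes test.res was built from test.rc as described above and sits next to test.pp, so that {$R *.res} resolves to test.res for this source file. The Windows unit from the original sample is left out here because nothing in the program calls it.

```pascal
program test;

{$CODEPAGE UTF8}
{$MODE OBJFPC}
{$MODESWITCH UNICODESTRINGS}

// Links test.res, which embeds test.manifest (activeCodePage = UTF-8)
{$R *.res}

var
  S: string;

begin
  Write('Enter δείγμα: ');
  ReadLn(S);
  WriteLn('S := ' + S);
end.
```

As noted earlier in the thread, the manifest setting only takes effect on Windows 10 1903 or later, so on older versions this program would presumably behave like the original.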

LV

  • Sr. Member
  • ****
  • Posts: 412
Re: [Windows] UTF8 encoding with ReadLn
« Reply #13 on: January 20, 2026, 08:20:00 pm »
Quote
How do we fix ReadLn without a manifest so it runs on older platforms? Or is it not possible?
😉

bstewart

  • New Member
  • *
  • Posts: 14
Re: [Windows] UTF8 encoding with ReadLn
« Reply #14 on: January 20, 2026, 08:25:49 pm »
@andersonscinfo - beat you to it ;-)

@LV - run your sample without calling the extra Windows APIs and without a resource (*.res) file.
« Last Edit: January 20, 2026, 08:29:08 pm by bstewart »

 
