Author Topic: Malformed UTF-8 string with special characters  (Read 952 times)

itblumi

  • New Member
  • *
  • Posts: 29
Malformed UTF-8 string with special characters
« on: November 27, 2022, 11:20:03 pm »
Why do I get an error from Lazarus (2.2.4) with the following example? The character in the example is in the UTF-8 codepage, and I don't understand why the compiler gives me this message. The same happens with other special characters.

Code: Pascal
var
  AStr: string;
begin
  // this char is in the UTF-8 codepage [228    U+00E4  C3 A4   ä      Latin Small Letter A With Diaeresis]
  AStr := 'ä'; //  Error: Malformed UTF-8 string
end;

UTF-8 Codepage https://www.charset.org/utf-8

Edit: I use Lazarus on Windows 7.
« Last Edit: November 27, 2022, 11:53:11 pm by itblumi »
Jan

Delphi XE6, Lazarus 2.2.4, Visual Studio, Eclipse
Platforms: Ubuntu 22.10, Windows 7, 10
Programming languages: Pascal, C, C++, C#, Java

dsiders

  • Hero Member
  • *****
  • Posts: 1052
Re: Malformed UTF-8 string with special characters
« Reply #1 on: November 28, 2022, 12:38:22 am »
I don't have a working Windows 7 any more. I tried this using 2.2.4 on Windows 8.1 and Windows 11. Both of the following worked for me.

Code: Pascal
var
  AStr: String;
begin
  AStr := 'ä';
  AStr := #$c3#$a4;  // the same character written as its two UTF-8 bytes (C3 A4)
end;

How did you generate the character in the Editor? Using CharMap or some other mechanism?

Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Malformed UTF-8 string with special characters
« Reply #2 on: November 28, 2022, 12:41:28 am »
itblumi, your code works fine on Linux. That says to me that your issue somehow relates to how Windows handles UTF-8 in Lazarus. I am not a Windows user so I cannot say what that is, but I don't take extra precautions to make such code work on Windows, so I suspect you have perhaps deliberately selected a different code page?

There must be a Windows user who can help ...

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

itblumi

Re: Malformed UTF-8 string with special characters
« Reply #3 on: November 28, 2022, 12:48:23 am »
Quote
How did you generate the character in the Editor? Using CharMap or some other mechanism?

I typed it on my keyboard; my default keyboard layout is set to German. I have the same issue with characters like "á". It could be that Lazarus or Windows creates this character in ANSI, but I'm not sure about this. I will look this up for Windows 7.
« Last Edit: November 28, 2022, 12:51:15 am by itblumi »

itblumi

Re: Malformed UTF-8 string with special characters
« Reply #4 on: November 28, 2022, 01:10:41 am »
I think I know the issue. Delphi created the Pascal files with ANSI encoding, and Lazarus then has a problem with them. I will convert the *.pas files to UTF-8 and try them in both Delphi and Lazarus.
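
For anyone who wants to check their own files first, here is a minimal sketch, assuming Free Pascal with the standard Classes and SysUtils units, that reports whether a source file starts with the UTF-8 BOM bytes EF BB BF. The program name and the HasUtf8Bom helper are made up for illustration. Note that a missing BOM does not prove the file is ANSI, since UTF-8 files can also be saved without a BOM.

```pascal
program CheckBom;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: returns True when the file begins with the
// three-byte UTF-8 BOM (EF BB BF).
function HasUtf8Bom(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Head: array[0..2] of Byte;
begin
  Result := False;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Files shorter than three bytes cannot carry a BOM.
    if Stream.Read(Head, SizeOf(Head)) = SizeOf(Head) then
      Result := (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF);
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: checkbom <file.pas>');
    Halt(1);
  end;
  if HasUtf8Bom(ParamStr(1)) then
    WriteLn('UTF-8 BOM found')
  else
    WriteLn('No UTF-8 BOM');
end.
```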

itblumi

Re: Malformed UTF-8 string with special characters
« Reply #5 on: November 28, 2022, 01:22:14 am »
I was able to fix the issue by converting all the *.pas files to UTF-8 with BOM.
Thanks for the help!
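
A rough sketch of such a conversion in Free Pascal, in case it helps someone with the same problem. It assumes the "ANSI" files are Windows-1252, which is typical for a German or Western European Windows but is an assumption here, and it uses the RTL's SetCodePage for the conversion. The program name and the ConvertFile procedure are made up for illustration. Do not run it on a file that is already UTF-8, because the bytes would be encoded a second time.

```pascal
program AnsiToUtf8Bom;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix
  Classes, SysUtils;

const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);

// Hypothetical helper: reads InName as Windows-1252 (assumed) and writes
// OutName as UTF-8 with a leading BOM.
procedure ConvertFile(const InName, OutName: string);
var
  Source, Target: TFileStream;
  Raw: RawByteString;
begin
  Raw := '';
  Source := TFileStream.Create(InName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Source.Size);
    if Length(Raw) > 0 then
      Source.ReadBuffer(Raw[1], Length(Raw));
  finally
    Source.Free;
  end;

  // Label the bytes as Windows-1252 without touching them ...
  SetCodePage(Raw, 1252, False);
  // ... then let the RTL convert them to UTF-8.
  SetCodePage(Raw, CP_UTF8, True);

  Target := TFileStream.Create(OutName, fmCreate);
  try
    Target.WriteBuffer(Utf8Bom, SizeOf(Utf8Bom));
    if Length(Raw) > 0 then
      Target.WriteBuffer(Raw[1], Length(Raw));
  finally
    Target.Free;
  end;
end;

begin
  if ParamCount < 2 then
  begin
    WriteLn('Usage: ansitoutf8bom <input.pas> <output.pas>');
    Halt(1);
  end;
  ConvertFile(ParamStr(1), ParamStr(2));
end.
```

Writing to a separate output file keeps the original intact until the result has been checked in both Delphi and Lazarus.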

dbannon

Re: Malformed UTF-8 string with special characters
« Reply #6 on: November 28, 2022, 01:34:34 am »
Ah, yes, the BOM. What a great idea! Let's hide something at the start of a file that completely changes the nature of the file, and we'll see how long it takes people to find it!

 :D

Glad you solved your problem, itblumi!

itblumi

Re: Malformed UTF-8 string with special characters
« Reply #7 on: November 28, 2022, 01:37:31 am »
I switched to this format because Delphi saves the file with a BOM every time  ;)