Author Topic: {$codepage UTF8} and Unicode  (Read 6433 times)

JD

  • Hero Member
  • *****
  • Posts: 1910
{$codepage UTF8} and Unicode
« on: March 04, 2017, 02:10:18 pm »
Hi there everyone,

I have a problem that cropped up when I upgraded to Lazarus 1.6.4/FPC 3.0.2. When I recompiled some old projects, I noticed that the text is now garbled even with {$codepage UTF8} enabled. See screenshot.

I work mostly in Win-1252 (French) and all the projects were previously compiled in Lazarus 1.6.2/FPC 3.0.

That said, what is the use of {$codepage UTF8}? Do I still need to use it? I thought I read somewhere that Lazarus is now Unicode. If I do need it, must it be placed in every *.pas file in a project, or is putting it in the *.lpr project file enough?

Is it possible to have a situation where strings are Unicode by default and I no longer have to deal with UTF8Encode, UTF8String and all the other conversion functions?

Thanks a lot for your thoughts & contributions.

JD
Linux Mint - Lazarus 4.0/FPC 3.2.2,
Windows - Lazarus 4.0/FPC 3.2.2

mORMot 2, PostgreSQL & MariaDB.

Bart

  • Hero Member
  • *****
  • Posts: 5706
    • Bart en Mariska's Webstek
Re: {$codepage UTF8} and Unicode
« Reply #1 on: March 04, 2017, 10:55:21 pm »
You only need this in a source file that contains string literals with non-ASCII characters (the source file must be saved in UTF-8 encoding of course, as it is by default in the Lazarus IDE).

Bart

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4698
  • I like bugs.
Re: {$codepage UTF8} and Unicode
« Reply #2 on: March 05, 2017, 10:16:29 am »
Quote from: Bart on March 04, 2017, 10:55:21 pm
You only need this in a source file that contains string literals with non-ASCII characters (the source file must be saved in UTF-8 encoding of course, as it is by default in the Lazarus IDE).

Not really. The easier and recommended way is not to use {$codepage UTF8} and then to remember two things:
 1. Always assign a constant to a variable of type String.
 2. Use type UnicodeString explicitly for API calls that need it.
The issue is explained here:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals
It looks more complicated than it is. Just remember the two rules above.
Note that assignment between string variables always works correctly thanks to FPC's dynamic string encoding.
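
The two rules can be illustrated with a minimal sketch. It assumes the file is saved as UTF-8, the project depends on the LazUtils package, and LazUTF8 is in the uses clause so that UTF-8 is the default code page for String. The names Greeting and CallWideApi are hypothetical and only stand in for a real constant and a real API call.

```pascal
program LiteralRules;
{$mode objfpc}{$H+}
// Assumptions: this file is saved as UTF-8, the project depends on LazUtils,
// and there is deliberately no {$codepage UTF8} directive.

uses
  LazUTF8;  // provides UTF8ToUTF16; assumed to make UTF-8 the default String code page

const
  Greeting = 'Déjà vu';  // hypothetical constant with non-ASCII characters

// Hypothetical stand-in for an API that expects UTF-16 text.
procedure CallWideApi(const W: UnicodeString);
begin
  WriteLn('UTF-16 code units: ', Length(W));
end;

var
  S: String;
  U: UnicodeString;
begin
  S := Greeting;         // rule 1: the constant goes into a String variable first
  U := UTF8ToUTF16(S);   // explicit UTF-8 -> UTF-16 conversion of the variable
  CallWideApi(U);        // rule 2: pass an explicit UnicodeString to the API
end.
```

The point of the sketch is that the literal is never handed directly to a UnicodeString parameter; it always passes through a String variable first.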

Code that deals with Windows system codepages is not compatible with earlier Lazarus versions. In that case the encodings must be converted explicitly, which is quite easy:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
However, I recommend switching all your text data to Unicode. It is such a big improvement over Windows codepages.
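
As a rough sketch of such an explicit conversion, assuming the legacy data is Win-1252 and the project depends on LazUtils (which provides the LConvEncoding unit), the bytes can be converted once on input and converted back only where Win-1252 output is required. The variable names and the sample data are made up for illustration.

```pascal
program Cp1252ToUtf8Demo;
{$mode objfpc}{$H+}
// Assumption: the project depends on LazUtils (for LConvEncoding).

uses
  LConvEncoding;

var
  Legacy, Utf8: String;
begin
  // Hypothetical legacy data: 'café' as Win-1252 bytes, where é is the single byte $E9.
  Legacy := 'caf'#$E9;
  // Convert explicitly, as early as possible after reading the data.
  Utf8 := CP1252ToUTF8(Legacy);
  // é takes two bytes in UTF-8, so the converted string is one byte longer.
  WriteLn(Length(Legacy), ' -> ', Length(Utf8));
  // Convert back only at the boundary where Win-1252 output is required.
  Legacy := UTF8ToCP1252(Utf8);
end.
```

Keeping the conversion at the edges of the program means everything in between can treat String as UTF-8.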
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JD

Re: {$codepage UTF8} and Unicode
« Reply #3 on: March 06, 2017, 10:16:02 pm »
Thanks for all your responses. I've decided to keep both Lazarus 1.6.2/FPC 3.0 and Lazarus 1.6.4/FPC 3.0.2

The former will be for production use and I will slowly migrate to the latter as I sort out the codepage issues in my old projects.

JuhaManninen

Re: {$codepage UTF8} and Unicode
« Reply #4 on: March 06, 2017, 11:24:34 pm »
Quote from: JD on March 06, 2017, 10:16:02 pm
Thanks for all your responses. I've decided to keep both Lazarus 1.6.2/FPC 3.0 and Lazarus 1.6.4/FPC 3.0.2
The former will be for production use and I will slowly migrate to the latter as I sort out the codepage issues in my old projects.

Sorry, but you have misunderstood something. The "Better Unicode Support" came with Lazarus 1.6.0. It broke backwards compatibility with Lazarus 1.4.x for certain Windows codepage specific code.
If your code now works with Lazarus 1.6.2, then you can safely switch to 1.6.4.
« Last Edit: March 06, 2017, 11:26:52 pm by JuhaManninen »