
Topic: UTF8 Problems

JD

  • Hero Member
  • Posts: 1848
UTF8 Problems
« on: April 29, 2017, 12:37:09 pm »
Hi there everyone,

I have a problem with a case statement in an old project. {$codepage UTF8} is set near the beginning of the file, and when I compile it I get the following errors:

Code:
  unit1.pas(38,34) Error: Constant and CASE types do not match
  unit1.pas(38,34) Error: String expression expected

The line with the error is shown in the attached screenshot. Does anyone know how I can fix this problem?

Thanks,

JD
Windows - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe),
Linux Mint - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe)

mORMot; Zeos 8; SQLite, PostgreSQL & MariaDB; VirtualTreeView

wp

  • Hero Member
  • Posts: 11916
Re: UTF8 Problems
« Reply #1 on: April 29, 2017, 01:25:22 pm »
Remove the {$codepage utf8} directive. That fixed it when I reproduced your issue, although I don't understand why (Lazarus source files are UTF-8 anyway, aren't they?)
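A minimal sketch of the workaround wp describes, assuming FPC 3.0.x in objfpc mode with the file saved as UTF-8 and no {$codepage} directive, which per wp's test lets a string case compile. The labels are borrowed from JD's IndexStr example further down the thread; the function name ClassifySource is hypothetical.

```pascal
program CaseNoCodepage;

{$mode objfpc}{$H+}
// Deliberately no {$codepage utf8} directive here (wp's workaround).

// Hypothetical helper: maps an answer to its option number via a string case.
function ClassifySource(const S: string): Integer;
begin
  case S of
    'Sondages':         Result := 0;
    'Internet':         Result := 1;
    'Bouche à oreille': Result := 2;
    'Partenaire':       Result := 3;
  else
    Result := -1; // no label matched
  end;
end;

begin
  WriteLn(ClassifySource('Internet'));  // 1
  WriteLn(ClassifySource('Inconnu'));   // -1
end.
```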

Thaddy

  • Hero Member
  • Posts: 14371
  • Censorship about opinions does not belong here.
Re: UTF8 Problems
« Reply #2 on: April 29, 2017, 01:25:33 pm »
case <string> is a bug, er, feature of FPC and, as far as I know, is strictly ANSI. It isn't supposed to work with UTF-8, because there IS no UTF-8 support in the compiler itself.
« Last Edit: April 29, 2017, 01:27:32 pm by Thaddy »
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

wp

  • Hero Member
  • Posts: 11916
Re: UTF8 Problems
« Reply #3 on: April 29, 2017, 03:09:31 pm »
But why does the case statement work once {$codepage utf8} is removed? Notepad++ reports that the file is UTF-8, so the compiler must be able to understand UTF-8.

All this is very confusing. I have always tried to avoid extra directives and "exotic" string types such as RawByteString or the codepage-aware strings declared at compile time. Using plain old "string" without any directives works in 95% (probably even more) of the programs I write.

Thaddy

  • Hero Member
  • Posts: 14371
Re: UTF8 Problems
« Reply #4 on: April 29, 2017, 03:31:01 pm »
No, the compiler uses the codepage of the underlying OS, which is probably CP863. That's why it works, up to a point: for French, for example, but not for all languages.
The same code will probably fail on platforms other than Windows, unless the strings happen to hash the same.

marcov

  • Administrator
  • Hero Member
  • Posts: 11452
  • FPC developer.
Re: UTF8 Problems
« Reply #5 on: April 29, 2017, 05:43:04 pm »
So, adding to what Thaddy said, and to state the obvious:

The compiler itself is compiled with AnsiString(0), so on Windows it uses the default Windows codepage: an ANSI codepage such as windows-125x, not an OEM codepage (e.g. cp852) as Thaddy says, though the two are mostly matched 1:1.

So anything inside the compiler that doesn't get special treatment (as literals do) will use that codepage.

String literals get a special pass-through so that literals in varying codepages can be transferred to the final binary.

The choice of UTF-8 (one-byte code unit) Unicode strings on Windows was doomed from the start and should have been avoided, for exactly these reasons.
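The codepage mechanism marcov describes can be observed in any program built with FPC 3.0 or later. A minimal sketch: DefaultSystemCodePage is what CP_ACP (the 0 in AnsiString(0)) resolves to at run time, and StringCodePage reports the codepage tag stored in a string. Only the UTF8String tag is fixed (CP_UTF8, i.e. 65001); the first line printed depends on the OS and its settings. Utf8Tag is a hypothetical helper.

```pascal
program ShowCodePages;

{$mode objfpc}{$H+}
{$codepage utf8}

// Hypothetical helper: the codepage tag carried by a non-empty UTF8String.
function Utf8Tag: TSystemCodePage;
var
  U: UTF8String;
begin
  U := 'Bouche à oreille';
  Result := StringCodePage(U);
end;

begin
  // The codepage CP_ACP (0) maps to when this program runs; OS dependent.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  // UTF8String is declared as AnsiString(CP_UTF8), so this is always 65001.
  WriteLn('UTF8String tag        = ', Utf8Tag, ' (CP_UTF8 = ', CP_UTF8, ')');
end.
```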

Thaddy

  • Hero Member
  • Posts: 14371
Re: UTF8 Problems
« Reply #6 on: April 29, 2017, 06:53:02 pm »
All these string types, indeed.
cp is probably 1252-1.

Maybe a documentation issue in FPC.

Then again, people should use case of <string> with care. I wish someone would put that genie back in its bottle.

JD

  • Hero Member
  • Posts: 1848
Re: UTF8 Problems
« Reply #7 on: April 29, 2017, 11:23:45 pm »

Thanks a lot, Thaddy. I replaced the case statement with an if statement and was able to proceed. Now I'll do the same with all the other case statements in the project.
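A minimal sketch of the if-statement workaround, assuming the same labels as JD's IndexStr example below: each branch is an ordinary string comparison with =, not a case label, and JD reports that this form compiles with {$codepage UTF8} in place. ClassifySource is a hypothetical name.

```pascal
program IfChainDemo;

{$mode objfpc}{$H+}
{$codepage utf8}

// Hypothetical helper: the string case rewritten as an if/else-if chain.
function ClassifySource(const S: string): Integer;
begin
  if S = 'Sondages' then
    Result := 0
  else if S = 'Internet' then
    Result := 1
  else if S = 'Bouche à oreille' then
    Result := 2
  else if S = 'Partenaire' then
    Result := 3
  else
    Result := -1; // plays the role of the case statement's else branch
end;

begin
  WriteLn(ClassifySource('Partenaire'));  // 3
  WriteLn(ClassifySource('Inconnu'));     // -1
end.
```

The trade-off is that an if chain compares sequentially, whereas the compiler is free to generate something smarter for a case; for a handful of labels the difference is negligible.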

The funny thing is that in another project I used case statements on a variant (case <variant> of) and that worked. I should add that the variant was mORMot's TDocVariant type.

The other strange thing is that this whole problem only surfaced with Lazarus 1.6.4/FPC 3.0.2. The previous version, Lazarus 1.6.2/FPC 3.0, did not complain at all.

Cheers,

JD
« Last Edit: April 29, 2017, 11:50:59 pm by JD »

JD

  • Hero Member
  • Posts: 1848
Re: UTF8 Problems
« Reply #8 on: April 30, 2017, 12:08:03 am »
Using IndexStr, so that the case runs on an integer index instead of a string, also works with {$codepage UTF8}:

Code: Pascal
  procedure TForm1.Button1Click(Sender: TObject);
  begin
    // IndexStr (StrUtils) turns the text into an Integer index (-1 = no match),
    // so the case statement no longer needs string labels
    case IndexStr(Edit1.Text, ['Sondages', 'Internet', 'Bouche à oreille', 'Partenaire']) of
      0 : ShowMessage('First option');
      1 : ShowMessage('Second option');
      2 : ShowMessage('Third option');
      3 : ShowMessage('Fourth option');
    end;
  end;

NB: IndexStr is in StrUtils.pas, so StrUtils must be in the uses clause.
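For testing without a form, a console sketch of the same IndexStr approach, assuming FPC 3.0 or later: IndexStr returns the zero-based position of the matching entry, or -1 when nothing matches, so an else branch catches unexpected input. OptionFor is a hypothetical helper standing in for the Button1Click handler.

```pascal
program IndexStrDemo;

{$mode objfpc}{$H+}
{$codepage utf8}

uses
  StrUtils;  // provides IndexStr

// Hypothetical helper: console version of the Button1Click handler.
function OptionFor(const S: string): string;
begin
  // IndexStr returns the 0-based position of S in the list, or -1 if absent.
  case IndexStr(S, ['Sondages', 'Internet', 'Bouche à oreille', 'Partenaire']) of
    0: Result := 'First option';
    1: Result := 'Second option';
    2: Result := 'Third option';
    3: Result := 'Fourth option';
  else
    Result := 'Unknown option';
  end;
end;

begin
  WriteLn(OptionFor('Internet'));  // Second option
  WriteLn(OptionFor('internet'));  // Unknown option: the comparison is case-sensitive
end.
```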

JD
« Last Edit: April 30, 2017, 12:14:25 am by JD »

 
