Author Topic: Unicode Constants  (Read 2660 times)

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9794
  • Debugger - SynEdit - and more
Re: Unicode Constants
« Reply #15 on: September 18, 2020, 03:18:42 pm »
Quote
The difference is that in the case of #$2267#$0338 the compiler does a compile time conversion
Quote
For the single character FPC will do a runtime conversion

Why the different behaviour? Both are constant values? Both could be done at compile time?

Is this only because "What Delphi does"? How about mode ObjFpc?

Moreover (concluding from compiler warnings), if a char-constant is part of a constant expression that results in a string, then it appears to be converted at runtime?
Code: Pascal
  u := #$2267 + '';
  u := #$2267 + ansistring('');
  u := #$2267 + rawbytestring('');
  u := #$2267 + char(' ');
or is the added string/char first converted to widechar/string, then added, and the result converted back?

PascalDragon

  • Hero Member
  • Posts: 5446
  • Compiler Developer
Re: Unicode Constants
« Reply #16 on: September 18, 2020, 04:56:15 pm »
The difference is that in the case of #$2267#$0338 the compiler does a compile time conversion
Quote
For the single character FPC will do a runtime conversion

Why the different behaviour? Both are constant values? Both could be done at compile time?

Is this only because "What Delphi does"? How about mode ObjFpc?

It's "what Delphi does", because the whole "code page aware string" concept is lifted from and modeled after Delphi.

Though I just noticed the following comment inside the "UnicodeChar -> AnsiString" conversion code:

Code:
  // compiler has different codepage than a system running an application
  // to prevent wrong codepage and data loss we are converting unicode char
  // using a helper routine. This is not delphi compatible behavior.
  // Delphi converts UniocodeChar to ansistring at the compile time

This was added to fix issue 21195.

Moreover (concluding from compiler warnings), if a char-constant is part of a constant expression that results in a string, then it appears to be converted at runtime?
Code: Pascal
  u := #$2267 + '';
  u := #$2267 + ansistring('');
  u := #$2267 + rawbytestring('');
  u := #$2267 + char(' ');
or is the added string/char first converted to widechar/string, then added, and the result converted back?

These are all constant strings, as in the case of #$2267#$0338, and are thus handled at compile time.
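
For readers who want to try the two cases discussed in this thread side by side, here is a minimal sketch. It assumes FPC in mode ObjFpc with AnsiString targets. The program name and the variable names PairStr and OneStr are made up for illustration and do not come from the thread. The comments only repeat what is described above; the compiler may emit conversion warnings, and the resulting bytes depend on the code pages involved, so no output is claimed.

```pascal
program UnicodeConstSketch;

{$mode objfpc}{$H+}

var
  PairStr: AnsiString;  // hypothetical name, not from the thread
  OneStr: AnsiString;   // hypothetical name, not from the thread
begin
  // Two-character constant: described in the thread as a constant string
  // that the compiler converts at compile time.
  PairStr := #$2267#$0338;

  // Single-character constant: described in the thread as converted at
  // run time through a helper routine.
  OneStr := #$2267;

  // Print only the byte lengths; the values depend on the source and
  // run-time code pages, so no particular result is assumed here.
  WriteLn('Length(PairStr) = ', Length(PairStr));
  WriteLn('Length(OneStr)  = ', Length(OneStr));
end.
```

Comparing the two lengths on systems with different code pages is one way to observe whether the conversions differ in practice.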

Remy Lebeau

  • Hero Member
  • Posts: 1312
    • Lebeau Software
Re: Unicode Constants
« Reply #17 on: September 18, 2020, 08:27:18 pm »
Quote
If I remember correctly, this is indeed how it behaves in newer versions of Delphi if the left side is an AnsiString.

The behavior has nothing to do with the type used on the left side of the assignment.  The behavior is controlled by the {$HIGHCHARUNICODE} directive instead:

Quote
When HIGHCHARUNICODE is OFF:

    All decimal #xxx n-digit literals are parsed as AnsiChar.
    All hexadecimal #$xx 2-digit literals are parsed as AnsiChar.
    All hexadecimal #$xxxx 4-digit literals are parsed as WideChar.

When HIGHCHARUNICODE is ON:

    All literals are parsed as WideChar.

Quote
Since String = UnicodeString in Delphi, you won't notice this, because no conversion is necessary.

In FPC, String=AnsiString in {$mode Delphi}, and String=UnicodeString in {$mode DelphiUnicode} and {$modeswitch UnicodeStrings}.
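
A quick way to check which string type a given mode selects is to look at the element size of a String. The sketch below is only an illustration of the statement above; the program and variable names are made up. Under {$mode DelphiUnicode} the element size should be that of a WideChar, and changing the directive to {$mode Delphi} should give the size of an AnsiChar.

```pascal
program StringAliasSketch;

{$mode delphiunicode}  // change to {$mode delphi} to compare

var
  S: String;  // hypothetical variable, only used to inspect the element type
begin
  S := 'abc';
  // SizeOf is applied to one element of the string, so the printed value
  // reflects the character type behind String in the selected mode.
  WriteLn('Bytes per String element: ', SizeOf(S[1]));
end.
```

This only shows what the String alias resolves to in user code; it says nothing about which string type the RTL itself uses.
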
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

Grahame Grieve

  • Sr. Member
  • Posts: 365
Re: Unicode Constants
« Reply #18 on: September 19, 2020, 12:23:01 am »
> Why you expect 100% Delphi-compatibility?

Well, I don't. Clearly there are areas where there isn't. But when it comes to something as fundamental as Unicode, and as ubiquitous as string handling, then I do expect a clearly documented way to write code that delivers source code compatibility, yes.

PascalDragon

  • Hero Member
  • Posts: 5446
  • Compiler Developer
Re: Unicode Constants
« Reply #19 on: September 19, 2020, 04:13:46 pm »
Quote
If I remember correctly, this is indeed how it behaves in newer versions of Delphi if the left side is an AnsiString.

The behavior has nothing to do with the type used on the left side of the assignment.  The behavior is controlled by the {$HIGHCHARUNICODE} directive instead:

Quote
When HIGHCHARUNICODE is OFF:

    All decimal #xxx n-digit literals are parsed as AnsiChar.
    All hexadecimal #$xx 2-digit literals are parsed as AnsiChar.
    All hexadecimal #$xxxx 4-digit literals are parsed as WideChar.

When HIGHCHARUNICODE is ON:

    All literals are parsed as WideChar.

FPC does not support that switch. And please also see what I mentioned further down in my post after I looked at the compiler's code.

Quote
Since String = UnicodeString in Delphi, you won't notice this, because no conversion is necessary.

In FPC, String=AnsiString in {$mode Delphi}, and String=UnicodeString in {$mode DelphiUnicode} and {$modeswitch UnicodeStrings}.

I know that FPC supports that modeswitch, but right now it's essentially a joke and useless. Yes, String might be set to UnicodeString then, but the whole RTL still uses AnsiString. So overriding any virtual methods of RTL classes requires you to explicitly use AnsiString instead of String, which provides even less compatibility with Delphi code than the current solution does.

> Why you expect 100% Delphi-compatibility?

Well, I don't. Clearly there are areas where there isn't. But when it comes to something as fundamental as Unicode, and as ubiquitous as string handling, then I do expect a clearly documented way to write code that delivers source code compatibility, yes.


You're expecting wrong. What we document is how to write code that works, not how to write code that is source-compatible with Delphi. And the Unicode-related behavior is extensively documented here.

 
