Author Topic: Unicode Constants  (Read 2715 times)

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 365
Unicode Constants
« on: September 17, 2020, 02:28:57 pm »
This line of code in Delphi:

    dict.add('&ngE;', #$2267#$0338);

Adds to dict, which is a TDictionary<String, String>, the string pair &ngE; and '≧̸'.

However, compiling the same code in $mode delphi using FPC results in adding the string pair &ngE; and '??'. But this works for other Unicode characters like:

    dict.add('&ne;', #$2260);

which is '≠' in both Delphi and FPC.

A bonus question: I'm somewhat confused by this. My code is behaving as if the mode were delphiunicode, but it's only set to delphi. The project options syntax mode default is ObjFPC, so that's not it, and Use AnsiStrings is on. I suppose that's wrong, but why is simple Unicode working? The documentation is confusing on this. Also, if my strings are Unicode, what's a Char? Is that Unicode too?

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Unicode Constants
« Reply #1 on: September 17, 2020, 02:35:53 pm »
I'm not sure (and might be completely wrong), but could it be that the compiler is converting your "unicode" chars to UTF-8, and while that works OK for single chars, it fails for composed ones (though it shouldn't ...)?
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 365
Re: Unicode Constants
« Reply #2 on: September 17, 2020, 02:40:31 pm »
Well, presumably. I was kind of hoping someone who understands this could tell me how to resolve this one.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9867
  • Debugger - SynEdit - and more
    • wiki
Re: Unicode Constants
« Reply #3 on: September 17, 2020, 03:21:15 pm »
Conversion depends on the target: AnsiString or Utf8String.

AnsiString cannot hold all Unicode chars, so some chars will fail while others will work.
UnicodeString should work.

I do not know how the parameter to "add" is declared. I also do not know whether the constant is converted directly to that parameter's type or goes via the default code page (which you can set somehow).

If it is declared as Utf8String, then maybe Utf8String(#$2267#$0338) will work.

Or you can either insert calls to Utf16ToUtf8 or specify the UTF-8 bytes directly:

https://www.fileformat.info/info/unicode/char/2267/index.htm
https://www.fileformat.info/info/unicode/char/0338/index.htm

#$E2#$89#$A7 + #$CC#$B8
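
As a minimal sketch of the "specify the UTF-8 directly" option, the program below spells out the five bytes and prints them back in hex. It assumes a plain FPC console program; `HexBytes` and the program name are hypothetical helpers made up for illustration, not part of any library.

```pascal
program Utf8Literal;
{$mode delphi}

uses
  SysUtils;

// Hypothetical helper: returns the bytes of S as space-separated hex pairs.
// RawByteString is used so that no code page conversion happens on the way in.
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

begin
  // U+2267 encodes to E2 89 A7 and U+0338 to CC B8 in UTF-8.
  // All char constants are below #$100, so they are stored as raw bytes.
  WriteLn(HexBytes(#$E2#$89#$A7 + #$CC#$B8));
end.
```

Because every constant here is a single byte, the compiler has nothing to convert, which is why this form sidesteps the '??' result.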

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
  • Sensorship about opinions does not belong here.
Re: Unicode Constants
« Reply #4 on: September 17, 2020, 04:45:06 pm »
However compiling the same code in $mode delphi using FPC
The correct mode is {$mode delphiunicode}, which gives the same 16-bit Unicode strings as Delphi.
Note that this mode is not very well supported by Lazarus (yet), since Lazarus is UTF-8.
But in the correct mode the strings should be assignment compatible to a large extent.
« Last Edit: September 17, 2020, 04:48:42 pm by Thaddy »
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 365
Re: Unicode Constants
« Reply #5 on: September 17, 2020, 10:08:10 pm »
Umm, I'm having trouble understanding this. I think this means that in 2020 there's still no way to actually write source that is Unicode capable and consistent between FPC and Delphi?

Because {$mode delphiunicode} means that my strings are not compatible with any system libraries.

Or have I misunderstood?

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Unicode Constants
« Reply #6 on: September 17, 2020, 10:20:00 pm »
TDictionary<String, String> in Delphi is equivalent to TDictionary<UnicodeString, UnicodeString> in FPC?
(TDictionary<String, String> in FPC means TDictionary<AnsiString, AnsiString>.)
Then define the constants explicitly as UnicodeString.

Just an untested suggestion.

Bart
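
A minimal sketch of that suggestion, assuming a plain FPC console program: the constant is typed as UnicodeString so the compiler keeps it as UTF-16, and the conversion to UTF-8 is done explicitly with `UTF8Encode`. The names `NgE` and `HexBytes` are hypothetical and only serve the illustration.

```pascal
program UnicodeConst;
{$mode delphi}

uses
  SysUtils;

const
  // Typed constant: stays UTF-16, so no compile-time AnsiString conversion.
  NgE: UnicodeString = #$2267#$0338;

// Hypothetical helper: returns the bytes of S as space-separated hex pairs.
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

begin
  // Two UTF-16 code units: the base character plus the combining overlay.
  WriteLn('UTF-16 length: ', Length(NgE));
  // Explicit conversion to UTF-8 before handing the value to AnsiString code.
  WriteLn('UTF-8 bytes:   ', HexBytes(UTF8Encode(NgE)));
end.
```

The same typed constant can then be passed to `dict.add`, leaving any conversion to the assignment rather than to the constant itself.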

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 365
Re: Unicode Constants
« Reply #7 on: September 18, 2020, 01:20:11 am »
Yes, well, declaring an intermediate parameter of UnicodeString did solve the problem. Then you can assign to String anyway. So I'm not convinced that it's not a bug, but the String situation is so messy I don't really know.

There also seems to be a bug in the base FPC JSON classes:

    {
      "a": "\u2267\u0338\n"
    }

This will be read as something other than ≧̸, but the debugger support for Unicode is poor and the FPCUnit GUI crashes when I try to copy, so I can't figure out what it actually reads it as.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9867
  • Debugger - SynEdit - and more
    • wiki
Re: Unicode Constants
« Reply #8 on: September 18, 2020, 02:25:14 am »
You can try to set the watch to "memory dump" (does NOT always work).
Add a string (not shortstring) as watch with typecast: ^byte(somestring)^
then go to the watch properties and select "memory dump"

For shortstring it is
  ^byte(@somestring[1])^


For WideString use ^word(somewidestring)^


In "FpDebug" you also need to set "repeat count"
« Last Edit: September 18, 2020, 02:34:18 am by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9867
  • Debugger - SynEdit - and more
    • wiki
Re: Unicode Constants
« Reply #9 on: September 18, 2020, 02:28:40 am »
FPCunit GUI crashes when I try to copy,
Use the right mouse, and copy from the context menu.

jamie

  • Hero Member
  • *****
  • Posts: 6130
Re: Unicode Constants
« Reply #10 on: September 18, 2020, 03:31:51 am »
Code: Pascal
    procedure TForm1.Button1Click(Sender: TObject);
    begin
      Canvas.Font.Size := 30;
      // A space and a tab (#9) separate the two escaped characters
      Canvas.TextOut(0, 0, JsonStringToString('\u2267 ' + #9 + '\u0338'));
    end;

This works. Something strange happens when the two are side by side, and it's not the JSON string function doing it.

 Actually, using a different first Unicode char with the u0338 still produces the error.

 So inserting the space and then a back space displays the correct output. You can also do this with wide string functions directly, using Windows.TextoutW(Canvas.Handle,0,0,wideString(….')..), and it still displays the error.

 So I can't say where the real issue is here. It would appear that the chars need to be printed singly.
The only true wisdom is knowing you know nothing

jamie

  • Hero Member
  • *****
  • Posts: 6130
Re: Unicode Constants
« Reply #11 on: September 18, 2020, 03:38:34 am »
Here is an example of printing them singly.

Code: Pascal
    procedure TForm1.Button1Click(Sender: TObject);
    begin
      Canvas.Font.Size := 30;
      Canvas.TextOut(0, 0, JsonStringToString('\u2267'));
      // Continue from the pen position left by the first call
      Canvas.TextOut(Canvas.PenPos.X, Canvas.PenPos.Y, JsonStringToString('\u0338'));
    end;

That also produces a nice output..
The only true wisdom is knowing you know nothing

kupferstecher

  • Hero Member
  • *****
  • Posts: 583
Re: Unicode Constants
« Reply #12 on: September 18, 2020, 12:22:58 pm »
The project options syntax mode default is ObjFPC, so that's not it, and Use AnsiStrings is on. I suppose that's wrong, but why is simple unicode working?
I'm not sure if this is clear or not: AnsiString doesn't mean the string is limited to ANSI characters; it can also contain Unicode in the form of UTF-8. You shouldn't need WideString/UnicodeString or anything else to use Unicode characters, unless a library requires UTF-16.

umm, I'm having trouble understanding this. I think this means that in 2020 there's still no way to actually write source that is unicode capable and consistent between FPC and delphi?
Why do you expect 100% Delphi compatibility?

This works for me:
Code: Pascal
    Label1.Caption := UTF8Encode(#$2267) + UTF8Encode(#$0338);
But I have to change the label's font to a Unicode one (e.g. "Arial Unicode MS") because of the composed character.

there is some strange happenings when the two of those are side by side and its not json string doing it..

 Actually using an alternate first Unicode char with the u0338 still produces the error.
As I understand it, it's a composed character: #$0338 is combined with #$2267 into one character on the display.

nanobit

  • Full Member
  • ***
  • Posts: 160
Re: Unicode Constants
« Reply #13 on: September 18, 2020, 01:04:57 pm »
This line of code in delphi:
    dict.add('&ngE;', #$2267#$0338);

If you have Unicode constants in your source, you should declare {$codepage utf8} in your unit, which helps with resolution at compile time. Your constant of widechars should ideally work without this, so a bug report is appropriate.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5481
  • Compiler Developer
Re: Unicode Constants
« Reply #14 on: September 18, 2020, 03:04:13 pm »
This line of code in delphi:

    dict.add('&ngE;', #$2267#$0338);

Adds to dict, which is a TDictionary<String, String>, the string pair &ngE; and '≧̸'.

However compiling the same code in $mode delphi using FPC results in adding the string pair &ngE; and '??'.  But this works for other unicode characters like:

    dict.add('&ne;', #$2260);

which is '≠' in both delphi and FPC.

The difference is that in the case of #$2267#$0338 the compiler does a compile time conversion to an AnsiString where the used encoding will be the encoding of the file (by default CP 1252, you can change this with the $CodePage directive). For the single character FPC will do a runtime conversion whereby the selected multibyte conversion codepage influences the result (in case of Lazarus that will be UTF-8).

If I remember correctly, this is indeed how it behaves in newer versions of Delphi if the left side is an AnsiString. Since in Delphi String = UnicodeString, you won't notice this, because no conversion is necessary.
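
One way to observe the difference described above is to dump the bytes that actually end up in the AnsiString, instead of relying on the debugger or a GUI. This is only a diagnostic sketch: `HexBytes` and the program name are hypothetical, and the output depends on the source file's code page, the platform, and the code page the RTL is using at run time, so no expected output is given.

```pascal
program DumpAnsiBytes;
{$mode delphi}
// {$codepage utf8}  // uncomment to declare the source file as UTF-8 and compare

uses
  SysUtils;

// Hypothetical helper: returns the bytes of S as space-separated hex pairs.
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

var
  One, Two: AnsiString;
begin
  One := #$2260;        // single wide char constant
  Two := #$2267#$0338;  // two wide chars in one constant
  WriteLn('default system code page: ', DefaultSystemCodePage);
  WriteLn('one: ', HexBytes(One));
  WriteLn('two: ', HexBytes(Two));
end.
```

If the second line shows 3F 3F, the constant was replaced by two question marks during conversion; E2 89 A7 CC B8 would be the UTF-8 encoding of the intended pair.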
