Author Topic: widechar literals  (Read 1946 times)

440bx

  • Hero Member
  • *****
  • Posts: 5821
widechar literals
« on: April 30, 2022, 12:00:05 pm »
Hello,

I found an interesting comment in one of FPC's API definition files, specifically:
Code: Text
// L'xx' translates to 'xx'#$0000 because that forces a wide literal in FPC.
That got my attention because I didn't know that.  I searched for that "feature" to be documented somewhere and, unless I missed it, it doesn't seem to be documented anywhere.

My question is: is what that comment states true ? (since it does not seem to be documented, I would like to have confirmation)

If someone knows where that detail is documented, I'd appreciate a pointer to it.

Thank you for your help.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

AlexTP

  • Hero Member
  • *****
  • Posts: 2625
    • UVviewsoft
Re: widechar literals
« Reply #1 on: April 30, 2022, 01:29:10 pm »
Strange, the source is L'xx' == 2 chars, but the output is 'xx' plus the null-char == 3 chars.

440bx

Re: widechar literals
« Reply #2 on: April 30, 2022, 02:47:28 pm »
Quote from: AlexTP
Strange, the source is L'xx' == 2 chars, but the output is 'xx' plus the null-char == 3 chars.

Yes, and if the comment is true, then the output/interpretation should be 3 widechars.



marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12537
  • FPC developer.
Re: widechar literals
« Reply #3 on: April 30, 2022, 02:54:56 pm »
Any widechar concatenation is enough; #0000 is nothing special.

e.g.

Code: Pascal
const yy = widechar('x')+'rest';

This has length 5 and prints xrest using a unicode routine:

Code:
call fpc_write_text_unicodestr
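
A minimal sketch for checking the claim above, assuming FPC in objfpc mode. The program name is made up for illustration; the constant is the one from the post. No output is asserted here beyond what the post itself reports (length 5, "xrest").

```pascal
program WideConstLength;

{$mode objfpc}

const
  { Untyped constant: the WideChar operand is what makes the }
  { concatenation a wide string constant, per the post above. }
  yy = WideChar('x') + 'rest';

begin
  WriteLn(Length(yy));  { character count of the constant }
  WriteLn(yy);          { reported to go through fpc_write_text_unicodestr }
end.
```

Replacing WideChar('x') with plain 'x' and comparing the generated assembler is one way to see which write routine the compiler selects in each case.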

440bx

Re: widechar literals
« Reply #4 on: April 30, 2022, 03:06:01 pm »
If the compiler is considering the constant to be made of widechar(s), then I don't see how the following sample program can compile:
Code: Pascal
program _WidecharLiteral;

function Test(p : pchar) : char;
begin
  result := p^;
end;

const
  AWIDECHAR = widechar('x') + 'morecharacters';       { supposedly a widechar literal }

begin
  Test(AWIDECHAR);  { the compiler should refuse this, but it doesn't }
end.

If the constant is truly wide, then the call to "Test" shouldn't compile.

Am I missing something ?


marcov

Re: widechar literals
« Reply #5 on: April 30, 2022, 03:10:12 pm »
Quote from: 440bx
if the compiler is considering the constant to be made of widechar(s) then I don't see how the following sample program can compile:

It internally generates two constants, a widechar and a converted ansichar one.

440bx

Re: widechar literals
« Reply #6 on: April 30, 2022, 03:32:35 pm »
Quote from: marcov
It internally generates two constants, a widechar and a converted ansichar one.

Something doesn't look right. First, what's being passed to the "Test" function is - supposedly - an explicitly widechar version of the constant, which the compiler should refuse.

The other thing that doesn't seem right is that inspection of the executable in hex and in the disassembly shows only one constant, and it is char, not widechar, as the following shows:
Code: Text
0000 ac00 00 00 01 00 00 00 00 00  ff ff ff ff ff ff ff ff  0f 00 00 00 00 00 00 00  78 6d 6f 72 65 63 68 61  ..☺.....        ☼.......xmorecha
0000 ac20 72 61 63 74 65 72 73 00  00 00 00 00 00 00 00 00  39 54 68 69 73 20 62 69  6e 61 72 79 20 68 61 73  racters.........9This binary has
0000 ac40 20 6e 6f 20 73 74 72 69  6e 67 20 63 6f 6e 76 65  72 73 69 6f 6e 20 73 75  70 70 6f 72 74 20 63 6f   no string conversion support co
0000 ac60 6d 70 69 6c 65 64 20 69  6e 2e 00 00 00 00 00 00  67 52 65 63 6f 6d 70 69  6c 65 20 74 68 65 20 61  mpiled in.......gRecompile the a
0000 ac80 70 70 6c 69 63 61 74 69  6f 6e 20 77 69 74 68 20  61 20 75 6e 69 74 20 74  68 61 74 20 69 6e 73 74  pplication with a unit that inst
0000 aca0 61 6c 6c 73 20 61 20 75  6e 69 63 6f 64 65 73 74  72 69 6e 67 20 6d 61 6e  61 67 65 72 20 69 6e 20  alls a unicodestring manager in
0000 acc0 74 68 65 20 70 72 6f 67  72 61 6d 20 75 73 65 73  20 63 6c 61 75 73 65 2e  00 00 00 00 00 00 00 00  the program uses clause.........
0000 ace0 0e 52 75 6e 74 69 6d 65  20 65 72 72 6f 72 20 00  05 20 61 74 20 24 00 00  00 00 00 00 00 00 00 00  ♫Runtime error .♣ at $..........

Code: Text
.rdata:000000010000D018 aXmorecharacter db 'xmorecharacters',0  ; DATA XREF: sub_100001480+Eo
.rdata:000000010000D028 qword_10000D028 dq 0                    ; DATA XREF: sub_100003980+15Co
.rdata:000000010000D028                                         ; sub_100007220+13o ...
.rdata:000000010000D030 a9thisBinaryHas db '9This binary has no string conversion support compiled in.',0
.rdata:000000010000D030                                         ; DATA XREF: sub_100003380:loc_1000033B6o

Note the "db", not "dw" as it would be if the constant were truly made of widechars.

marcov

Re: widechar literals
« Reply #7 on: April 30, 2022, 05:49:58 pm »
For me (3.3.1 two days old on win32) it was as I said. If I look in .s I see:

Code:
.section .rodata.n_.Ld1,"a"
.balign 4
.Ld1$strlab:
.short 0,1
.long -1,15
.Ld1:
.ascii "xmorecharacters\000"

That is the codepage 0 constant, which is followed by the unicode (1200) constant:

Code:
.section .rodata.n_.Ld2,"a"
.balign 4
.Ld2$strlab:
.short 1200,2
.long -1,15
.Ld2:
.short 120,109,111,114,101,99,104,97,114,97,99,116,101,114,115,0


... which uses "short" to encode the string. .Ld1 is passed to the test routine and .Ld2 is passed to writeln.

Maybe you didn't writeln the constant so the wide version gets omitted/smartlinked out?
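
A minimal sketch of a test program that uses the constant both ways, so that neither copy is a candidate for being dropped. It assumes FPC in objfpc mode; the names WideConstBothUses and FirstChar are made up for illustration. Compiling with the -a option keeps the generated assembler file, which can then be checked for two string constants like the .Ld1 and .Ld2 shown above (the label names will likely differ).

```pascal
program WideConstBothUses;

{$mode objfpc}

function FirstChar(p: PChar): Char;
begin
  Result := p^;
end;

const
  { Untyped constant, same shape as in the earlier sample program. }
  AWIDECHAR = WideChar('x') + 'morecharacters';

begin
  { Use as PChar: per the explanation above, the ansi copy is passed. }
  WriteLn(FirstChar(AWIDECHAR));
  { Use as a string: per the explanation above, the unicode copy is passed. }
  WriteLn(AWIDECHAR);
end.
```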
« Last Edit: April 30, 2022, 05:52:44 pm by marcov »

440bx

Re: widechar literals
« Reply #8 on: April 30, 2022, 07:01:31 pm »
Quote from: marcov
Maybe you didn't writeln the constant so the wide version gets omitted/smartlinked out?

It's possible that the widechar version got smartlinked out, since it is true that I did not use it.

What I actually wanted was to define a constant that would be only a widechar literal so that if it was used as a pchar, the compiler would issue an error. The fact that the compiler types the constant depending on how it is used defeats typing the constant.

IOW, there is no difference between

const A = widechar('a') + 'bc';

and

const A = 'abc';

since the compiler will create ansi and unicode versions of the constant depending on how they are used. It would be nice to be able to tell the compiler that some constant is uniquely ansichar or uniquely widechar to prevent its use as the opposite of what it is (i.e., char -> widechar or widechar -> char).

Marco, thank you for taking the time to explain.

 


PascalDragon

  • Hero Member
  • *****
  • Posts: 6195
  • Compiler Developer
Re: widechar literals
« Reply #9 on: May 02, 2022, 01:42:20 pm »
Quote from: 440bx
since the compiler will create ansi and unicode versions of the constant depending on how they are used.  It would be nice to be able to tell the compiler that some constant is uniquely ansichar or uniquely widechar to prevent its use as the opposite of what it is. (i.e, char -> widechar or widechar -> char)

If you want to fix the type of a constant you need to use a typed constant. The compiler can freely change the type of an untyped constant if it's needed (and valid) no matter if there are specific typecasts on the right side.
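
A minimal sketch of the typed-constant approach, assuming FPC in objfpc mode. The names TypedWideConst, FirstChar and WIDE_ONLY are made up for illustration. The {$J-} directive (long form {$WRITEABLECONST OFF}) is used so that the typed constant is not assignable. The commented-out call is the one that should now be rejected, since PWideChar and PChar are distinct pointer types; this has not been verified against a specific compiler version.

```pascal
program TypedWideConst;

{$mode objfpc}
{$J-}  { typed constants are read-only }

function FirstChar(p: PChar): Char;
begin
  Result := p^;
end;

const
  { Typed constant: the declared type PWideChar is fixed. }
  WIDE_ONLY: PWideChar = 'xmorecharacters';

begin
  WriteLn(WIDE_ONLY^);  { first character, written as a WideChar }

  { Uncommenting the next line should produce a type error, }
  { because a PWideChar is passed where a PChar is expected. }
  { WriteLn(FirstChar(WIDE_ONLY)); }
end.
```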

440bx

Re: widechar literals
« Reply #10 on: May 02, 2022, 02:52:11 pm »
Quote from: PascalDragon
If you want to fix the type of a constant you need to use a typed constant.

Thank you for confirming that.

Quote from: PascalDragon
The compiler can freely change the type of an untyped constant if it's needed (and valid) no matter if there are specific typecasts on the right side.

That part surprises me a bit. In that case, the compiler is being explicitly told that the constant is of a specific type, yet it chooses to ignore it. IMO, it would be rather nice if the compiler could be told the type of a constant without having to turn the "constant" into a variable (by forcing the use of a typed "constant" to get it to mind the type).
