Author Topic: FPC Unleashed (inline vars, statement expr, tuples, match, indexed/lazy labels)  (Read 33432 times)

440bx

  • Hero Member
  • *****
  • Posts: 6462
Quote
Once a string literal is parsed it is interpreted as a string, therefore string type casting rules apply as others have already mentioned.

Wrong! The compiler is being explicitly told to interpret that sequence of characters as a DWORD. Again: DWORD('abcd').

It's not hard to understand: D-W-O-R-D = DWORD = 4 bytes; they could be $1, $F, $3, $7 or 'a', 'b', 'c', 'd' or 'abcd'. Why is this so difficult for Pascal programmers to understand? It's always a matter of interpretation, and as long as the compiler is being told how to interpret it, the compiler has no business interpreting it a different way. The compiler is supposed to do what the programmer tells it to do (unless what's being asked of it cannot possibly be done, but that's not the case when it comes to interpreting 4 bytes as a DWORD... there is no problem there... well, I should say there shouldn't be, because if this thread proves one thing, it is that the minds that justify and support writable constants are definitely still around).

Unbelievable!!

ETA:

I forgot that in the previous post I agreed that you people are absolutely right.  Mea culpa... apologies.

FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

hukka

  • Jr. Member
  • **
  • Posts: 62
    • Github
This feature would make parsing file headers much nicer. Human-readable 4-byte IDs are common in many file formats and having to use magic numbers hinders code readability. In 68000 assembly this sort of casting is very common:

cmp.l #'ILBM',(a0)  ; 32-bit value
beq   validheader

Fibonacci

  • Hero Member
  • *****
  • Posts: 929
  • Behold, I bring salvation - FPC Unleashed
Constant string-to-ordinal casts (stringordcast)

Following up on @440bx's request - DWORD('abcd') now works in const, var initializers and inline var under {$modeswitch stringordcast} (on by default in unleashed mode). The compiler folds it to a real compile-time constant when the literal's byte count matches the target ordinal's size.

Two clarifications before the demo.

1. MacPas already has a partial version. Under {$mode macpas}, FPC accepts DWORD('abcd') in const and folds it to an immediate, but only for 4-byte literals (byte('X'), word('HI'), qword('12345678') are rejected) and with big-endian packing regardless of the target.

stringordcast generalizes this: 1/2/4/8-byte literals, and packing uses the target's native endianness. The practical consequence is that the in-memory byte layout of the folded constant matches the source byte order on both LE and BE targets, so signature checks like PDWORD(@buffer)^ = DWORD('RIFF') work uniformly. On x86/x86_64 that means dword('abcd') folds to $64636261 (bytes 61 62 63 64 = 'a' 'b' 'c' 'd' in memory); on a BE target the numerical value would be $61626364 but the memory layout is still 61 62 63 64.
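
For readers on stock FPC, the layout claim can be checked with a run-time equivalent. The sketch below is only an illustration: PackNative4 is a hypothetical helper (not part of FPC or FPC Unleashed), and it copies the bytes at run time, whereas stringordcast folds them at compile time.

```pascal
program native_pack_sketch;

{$mode objfpc}{$H+}

// Hypothetical helper, for illustration only: copies the four bytes of S
// into a DWord in source order, so the in-memory layout is
// S[1] S[2] S[3] S[4] on any target.
function PackNative4(const S: AnsiString): DWord;
begin
  Result := 0;
  if Length(S) <> SizeOf(Result) then
  begin
    WriteLn('length mismatch');
    Halt(1);
  end;
  Move(S[1], Result, SizeOf(Result));
end;

var
  buf: array[0..3] of AnsiChar = ('R', 'I', 'F', 'F');
  sig: DWord;
begin
  sig := PackNative4('RIFF');
  // Numeric value: $46464952 on little-endian targets,
  // $52494646 on big-endian ones.
  WriteLn('value = $', HexStr(sig, 8));
  // Only the in-memory byte order is compared here, so the check does
  // not depend on the target's endianness.
  WriteLn('match = ', PDWord(@buf[0])^ = sig);
end.
```

The numeric value differs between targets, but the comparison against the buffer does not, which is the property the signature check relies on.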

Result type is the type you cast to, including signed variants (LongInt, Int64).

2. I owe a correction. Earlier in this thread I said DWORD('abcd') "already works with inline variables". It parses and prints the right value, but the generated assembly tells a different story - the compiler puts the string literal in rodata and emits mov reg, [label] at runtime. I should have checked the generated code before claiming "already works".

With stringordcast it's a genuine compile-time fold now (x86_64):

Code: ASM
movl  $1684234849, -4(%rbp)   # 1684234849 == $64636261, 'abcd' packed little-endian

No string in rodata, no runtime load, no pointer cast - the bytes are packed into an ordconstn at typecheck time.

Demo

Code: Pascal
program stringordcast_demo;

{$mode unleashed}

const
  // untyped const
  SIG_MZ    = word('MZ');               // $5A4D
  SIG_RIFF  = dword('RIFF');            // $46464952
  MAGIC_8   = qword('DEADBEEF');        // $4645454244414544

  // char-literal variants
  HEX_DWORD = dword(#$DE#$AD#$BE#$EF);  // $EFBEADDE
  MIXED     = dword('AB'#$00#$01);      // $01004241

var
  // global var with initializer
  gSig: dword = dword('abcd');          // $64636261
  gTag: word  = word('OK');             // $4B4F

procedure inline_context;
begin
  // inline var, inferred type
  var a := dword('abcd');
  // inline var, typed
  var b: word := word('HI');
  // signed variant
  var c: int64 := int64('abcdefgh');

  writeln('  inline inferred a  = $', hexstr(a, 8));
  writeln('  inline typed    b  = $', hexstr(b, 4));
  writeln('  inline signed   c  = $', hexstr(c, 16));
end;

procedure signature_check;
var
  buf: array[0..3] of char = ('R', 'I', 'F', 'F');
begin
  // compare the buffer's first four bytes against the folded constant
  if pdword(@buf[0])^ = SIG_RIFF then
    writeln('  RIFF file detected (bytes in memory match SIG_RIFF)');
end;

begin
  writeln('untyped const:');
  writeln('  word(''MZ'')          = $', hexstr(SIG_MZ, 4));
  writeln('  dword(''RIFF'')       = $', hexstr(SIG_RIFF, 8));
  writeln('  qword(''DEADBEEF'')   = $', hexstr(MAGIC_8, 16));
  writeln('  dword(#$DE#$AD...)  = $', hexstr(HEX_DWORD, 8));
  writeln('  dword(''AB''#$00#$01) = $', hexstr(MIXED, 8));
  writeln;

  writeln('typed var initializer:');
  writeln('  gSig = $', hexstr(gSig, 8));
  writeln('  gTag = $', hexstr(gTag, 4));
  writeln;

  writeln('inline var:');
  inline_context;
  writeln;

  writeln('signature use-case:');
  signature_check;
  writeln;

  readln;
end.

Output:

Code: Text
untyped const:
  word('MZ')          = $5A4D
  dword('RIFF')       = $46464952
  qword('DEADBEEF')   = $4645454244414544
  dword(#$DE#$AD...)  = $EFBEADDE
  dword('AB'#$00#$01) = $01004241

typed var initializer:
  gSig = $64636261
  gTag = $4B4F

inline var:
  inline inferred a  = $64636261
  inline typed    b  = $4948
  inline signed   c  = $6867666564636261

signature use-case:
  RIFF file detected (bytes in memory match SIG_RIFF)

Details

Size must match exactly. If it doesn't, you get a specific diagnostic instead of the generic "Illegal expression":

Code: Text
Error: Cannot cast string of length 3 to ordinal type "LongWord" (size 4 bytes)

Works with #N-escaped char literals and mixed forms: dword(#$DE#$AD#$BE#$EF) gives bytes DE AD BE EF in memory, dword('AB'#$00#$01) gives 41 42 00 01.

New modeswitch

This is a new modeswitch stringordcast - off by default in all existing modes, on by default in unleashed mode. In stock Pascal a string literal is a string, not an ordinal, so the cast is formally illegal; the modeswitch makes it explicit opt-in rather than forcing it on. The emitted code for the const-section case is identical to writing the hex value by hand - no storage, no pointer cast, just an immediate.

Implementation

The parser already treats TypeName(expr) as a typecast node, so the fold lives in the constant evaluator: when the cast target is an integer ordinal and the inner node is a cst_conststring whose byte length equals the target size, the bytes are packed in target-native endianness into an ordconstn and the string node is discarded before codegen. No new AST node, no storage path, no change to how non-size-matching casts are diagnosed.
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

ccrause

  • Hero Member
  • *****
  • Posts: 1116
Quote from: hukka
This feature would make parsing file headers much nicer. Human-readable 4-byte IDs are common in many file formats and having to use magic numbers hinders code readability. In 68000 assembly this sort of casting is very common:

cmp.l #'ILBM',(a0)  ; 32-bit value
beq   validheader


You are describing a feature, but do not suggest a syntax compatible with the Pascal language philosophy (or otherwise). The current (Free) Pascal definition of a constant number has four different notations: decimal (no prefix, digits in the range 0..9), hexadecimal (prefixed by $), octal (prefixed by &) and binary (prefixed by %).

Quote from: Fibonacci
Constant string-to-ordinal casts (stringordcast)

Extending the existing number syntax pattern to a string literal without creating yet another syntax or type cast overload can simply be achieved by defining a suitable (as in uniquely identifiable and descriptive) prefix that is recognized by the parser/tokenizer. Once it is tokenized the fun is over, it is just a number.

Potential example using the arbitrary prefix ! (only because it is not an existing special character):
Code:
const x = !'abcd';

440bx

Quote from: Fibonacci
2. I owe a correction. Earlier in this thread I said DWORD('abcd') "already works with inline variables". It parses and prints the right value, but the generated assembly tells a different story - the compiler puts the string literal in rodata and emits mov reg, [label] at runtime. I should have checked the generated code before claiming "already works".

I was tempted to make the correction because that's what FPC v3.2.2 does, but since I didn't know if trunk did the same thing, I figured I'd leave it alone. I'm very pleased you caught that "trick" the compiler pulls to get what should be a compile-time constant. It explains why it worked in the code section but not in the var or const sections.

Curious, how did you implement it? I ask because it is normally implemented as part of a compile-time expression evaluator. Is that the implementation you're using, or something different?

For those who still think 'abcd' is a Pascal string, note in the screenshot that it is not, it is simply a sequence of 4 characters NOT preceded by a count (unlike some of the strings that follow which are used in the writeln statements.)

ETA:

Both snapshots are the same except that the first one has annoying extra whitespace.  I tried getting rid of that attachment but, it remained there.  If a moderator can get rid of that one, that would be great.  Thank you.

Look at the second attachment, not the first one.


« Last Edit: April 18, 2026, 11:43:24 am by 440bx »

Fibonacci

Quote from: 440bx
Quote from: Fibonacci
2. I owe a correction. Earlier in this thread I said DWORD('abcd') "already works with inline variables". It parses and prints the right value, but the generated assembly tells a different story - the compiler puts the string literal in rodata and emits mov reg, [label] at runtime. I should have checked the generated code before claiming "already works".

I was tempted to make the correction because that's what FPC v3.2.2 does, but since I didn't know if trunk did the same thing, I figured I'd leave it alone. I'm very pleased you caught that "trick" the compiler pulls to get what should be a compile-time constant. It explains why it worked in the code section but not in the var or const sections.

Curious, how did you implement it? I ask because it is normally implemented as part of a compile-time expression evaluator. Is that the implementation you're using, or something different?

Exactly that. FPC picks the conversion kind in defcmp.pas (the enum tconverttype - things like tc_int_2_int, tc_char_2_string, and so on), and at typecheck time dispatches through an array of method pointers in ncnv.pas. One of those enum values is tc_cstring_2_int, handled by typecheck_cstring_to_int. The only place in stock FPC that sets this conversion is a small m_mac-gated block - put there specifically for MacPas, hardcoded to 4-byte literals, big-endian, LongWord result. Whole machinery, one user.

So the fold was 90% done. What stringordcast does is widen the defcmp gate to also fire under the new modeswitch, and add a second branch in typecheck_cstring_to_int that accepts 1/2/4/8-byte literals in target-native endianness and uses the cast's target type as the result. The handler returns an ordconstn (plain integer-constant AST node) and the original string subtree is discarded - codegen never sees it. That's why it works identically in const, var initializer, typed constant, inline-var: it's just an integer constant at that point.
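
As a rough illustration of the fold described above (size check, then target-native packing), here is a stock-FPC sketch. It is not the code from ncnv.pas: FoldLiteral and its parameters are hypothetical names, and it runs at run time purely to show the arithmetic the compiler would perform at typecheck time.

```pascal
program fold_sketch;

{$mode objfpc}{$H+}

// Hypothetical stand-in for the fold; the real handler is
// typecheck_cstring_to_int in ncnv.pas and its code is not shown here.
// Returns False when the literal length does not equal the target size.
function FoldLiteral(const S: AnsiString; TargetSize: Integer;
  TargetIsBigEndian: Boolean; out Value: QWord): Boolean;
var
  i: Integer;
begin
  Value := 0;
  Result := (TargetSize in [1, 2, 4, 8]) and (Length(S) = TargetSize);
  if not Result then
    Exit;
  for i := 1 to Length(S) do
    if TargetIsBigEndian then
      // first source byte becomes the most significant byte
      Value := (Value shl 8) or Ord(S[i])
    else
      // first source byte becomes the least significant byte
      Value := Value or (QWord(Ord(S[i])) shl (8 * (i - 1)));
end;

var
  v: QWord;
begin
  if FoldLiteral('abcd', 4, False, v) then
    WriteLn('LE target: $', HexStr(v, 8));   // $64636261
  if FoldLiteral('abcd', 4, True, v) then
    WriteLn('BE target: $', HexStr(v, 8));   // $61626364
  if not FoldLiteral('abc', 4, False, v) then
    WriteLn('rejected: length 3 does not match size 4');
end.
```

Computing the value arithmetically, rather than copying host memory, is what lets a cross-compiler produce the right constant when host and target endianness differ.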

A few one-liners elsewhere to register the new modeswitch and error message, plus two test files (pass + fail). Most of the time went into reading defcmp and ncnv to figure out where to plug in. The whole fold is one extra branch in the existing handler.

440bx

Thank you @Fibonacci, I appreciate the details.

fabiopesaju

  • Jr. Member
  • **
  • Posts: 99
Excuse my ignorance, I'm just a basic pascal user. But I'd like to know, what are the chances of all this making it into the official version? That would be amazing! I believe that a few people will be quite backward regarding these magnificent implementations.

440bx

Quote from: fabiopesaju
Excuse my ignorance, I'm just a basic pascal user. But I'd like to know, what are the chances of all this making it into the official version? That would be amazing! I believe that a few people will be quite backward regarding these magnificent implementations.

Since I am not an FPC developer, I cannot offer a definite answer, but I believe the odds are similar to getting a continuous supply of free Häagen-Dazs in hell.

For the time being, I suggest hoping for the release of FPC v3.2.4, which might happen before hell starts importing Häagen-Dazs.

ccrause

Quote from: ccrause
Extending the existing number syntax pattern to a string literal without creating yet another syntax or type cast overload can simply be achieved by defining a suitable (as in uniquely identifiable and descriptive) prefix that is recognized by the parser/tokenizer. Once it is tokenized the fun is over, it is just a number.

So (unsurprisingly) the actual situation in the compiler is not so simple. Internally the compiler uses Val() to convert a text pattern into a value.  This route would then require formally defining a new number format for ansistring literals as base 256 encoded numbers.
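
To make "base 256 encoded numbers" concrete, here is a minimal sketch. It assumes the most significant digit comes first, as with decimal literals; the post does not specify the digit order, and Base256ToQWord is a hypothetical helper, not the RTL's Val() or code from any branch.

```pascal
program base256_sketch;

{$mode objfpc}{$H+}

// Hypothetical helper: reads S as a base-256 number, most significant
// digit first, the way decimal digits are read. The digit order is an
// assumption. Literals longer than 8 characters overflow silently.
function Base256ToQWord(const S: AnsiString): QWord;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    Result := (Result shl 8) or Ord(S[i]);
end;

begin
  WriteLn('$', HexStr(Base256ToQWord('MZ'), 4));   // $4D5A
end.
```

Under that assumption 'MZ' reads as $4D5A, which a little-endian target stores as bytes 5A 4D. The stringordcast demo earlier in the thread gives word('MZ') = $5A4D on x86, stored as 4D 5A, so the two readings differ in exactly the way a "number" differs from a "byte array".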

xiyi0616

  • Jr. Member
  • **
  • Posts: 64

Quote
Install via fpcupdeluxe. In fpcup.ini, add to the top of [ALIASfpcURL]:

Code: INI
unleashed.git=https://github.com/fpc-unleashed/freepascal.git

For Lazarus Unleashed (with autocomplete support), add to the top of [ALIASlazURL]:

Code: INI
unleashed.git=https://github.com/fpc-unleashed/lazarus.git

Optionally enable docked IDE in Setup+, then hit Install/update FPC+Lazarus.

Does it matter which versions of FPC and Lazarus I choose in fpcupdeluxe?


Fibonacci

Quote from: xiyi0616
Does it matter which versions of FPC and Lazarus I choose in fpcupdeluxe?

Yes, it's crucial. Assuming you edited fpcup.ini, untick GitLab and you'll see unleashed.git at the top of both the FPC and Lazarus version lists. The screenshot in the first post shows this.

ccrause

Quote from: ccrause
Extending the existing number syntax pattern to a string literal without creating yet another syntax or type cast overload can simply be achieved by defining a suitable (as in uniquely identifiable and descriptive) prefix that is recognized by the parser/tokenizer. Once it is tokenized the fun is over, it is just a number.

Potential example using the arbitrary prefix ! (only because it is not an existing special character):
Code:
const x = !'abcd';

In the spirit of this discussion, here is a branch that implements the above syntax for specifying a string literal as a number. This branch accepts a new special character as delimiter for a new numerical prefix (!). Fully implementing the feature required updating Val() to decode the new format, since the compiler itself uses Val to decode patterns from the parser.

This allows specifying a number as a delimited string anywhere in code (where a literal number would make sense).
Code: Pascal
const
  x = !'MZ';

var
  y: word = !'A';

begin
  if x > !'QQ' then

Not saying ! is an intuitive delimiter. Also note that endian issues are at play when storing the number (this is different to the requested functionality of "string as byte array").

Fibonacci

Quote from: ccrause
Extending the existing number syntax pattern to a string literal without creating yet another syntax or type cast overload can simply be achieved by defining a suitable (as in uniquely identifiable and descriptive) prefix that is recognized by the parser/tokenizer. Once it is tokenized the fun is over, it is just a number.

Potential example using the arbitrary prefix ! (only because it is not an existing special character):
Code:
const x = !'abcd';

In the spirit of this discussion, here is a branch that implements the above syntax for specifying a string literal as a number. This branch accepts a new special character as delimiter for a new numerical prefix (!). Fully implementing the feature required updating Val() to decode the new format, since the compiler itself uses Val to decode patterns from the parser.

This allows specifying a number as a delimited string anywhere in code (where a literal number would make sense).
Code: Pascal
const
  x = !'MZ';

var
  y: word = !'A';

begin
  if x > !'QQ' then

Not saying ! is an intuitive delimiter. Also note that endian issues are at play when storing the number (this is different to the requested functionality of "string as byte array").

The feature is already implemented in Unleashed and works everywhere - const, var, typed const, inline vars, and even without any variable at all:
- writeln(DWORD('abcd'))
- writeln(DWORD(#0#0'a'#0))
- if WORD('xy')>0 then
- if WORD('x'#0)>0 then
etc.

The disassembly confirms it's a plain immediate value, not a pointer or a function call. No RTL/Val() changes were needed either. What's wrong with the current implementation? Or do you just really want that ! prefix?

I consider this one done.

440bx

Quote from: Fibonacci
The disassembly confirms it's a plain immediate value, not a pointer or a function call. No RTL/Val() changes were needed either. What's wrong with the current implementation? Or do you just really want that ! prefix?

I consider this one done.

You should consider it done. You implemented a feature the compiler was already supposed to have: it allows casting, therefore it should know how to cast 'abcd', which before your changes it didn't. Essentially, what you've done is correct a bug caused by the originally deficient implementation of the parser's expression evaluator. Proof of that is that it could do it in mode macpas but not in other modes, which was absurd.

Thank you for fixing this FPC problem.
