Topic: FPC Unleashed (inline vars, statement expr, tuples, match, indexed/lazy labels)

440bx

  • Hero Member
  • *****
  • Posts: 6532
Quote
Are we in a "C"-centric world? Very well, now is the time to get rid of C, not to appropriate its semantics.
I'm all for getting rid of C, but C did get a few things right. For instance, its feature set is quite good, though its implementation leaves a great deal to be desired.

Quote
Rather than demanding syntax and compiler capabilities that devalue Pascal, it would be better to do more by enhancing and fixing what's available now.
Believe it or not,
Code: Pascal
while BOOL(var i : integer := <somevalue>) do...

That is an enhancement that corrects two closely related problems in the language, both rooted in the non-locality of the variable that controls the loop.

First, there is no way to guarantee that the while loop variable shares its lifetime with the loop. It has to be declared at function/procedure level, which means any nested function or procedure can modify its value without the programmer realizing it. In effect, within the function/procedure, the while index variable behaves like a global variable, and that is very undesirable.

Second, and related to the first: if the variable that controls the while loop is a genuine global variable (bad programming, but sometimes forced by the language), its value could be altered by another thread of execution. Again, the programmer is unlikely to notice, because the language provides no mechanism to prevent it.

The "while" statement I showed solves both of those problems and tells the programmer, right there in that line, that the variable's life is tied to the while loop: just as local and just as long-lived as it should be, not the mess the language currently allows.
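To make the first problem concrete, here is a minimal sketch in stock FPC. The names RunLoop and Helper are made up for the illustration; the point is only that the loop variable has to live at routine level, so a nested routine can reach it:

```pascal
program LoopVarLeak;
{$mode objfpc}

// Hypothetical demo routine, not taken from any real code base.
function RunLoop: Integer;
var
  I: Integer; // stock FPC: declared for the whole routine, not just the loop

  procedure Helper;
  begin
    I := 100; // a nested routine can reach the loop variable and change it
  end;

begin
  I := 0;
  while I < 10 do
  begin
    if I = 3 then
      Helper; // looks harmless at the call site
    Inc(I);
  end;
  Result := I;
end;

begin
  WriteLn(RunLoop); // prints 101, not 10
end.
```

Nothing in the loop header hints that Helper can touch I, and that is exactly what a loop-scoped declaration would rule out.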

There is one thing I have to give you credit for:
Quote
if you don't know how or don't want to... you simply work with others who do.
That is great advice. Now all you have to do is make sure you are better at following the advice you give to others.


« Last Edit: May 11, 2026, 10:43:06 am by 440bx »
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Fibonacci

  • Hero Member
  • *****
  • Posts: 1000
  • Behold, I bring salvation - FPC Unleashed
@VisualLab: @LeP:

Honestly? All I wanted was inline variables. That's it. One feature.

I went to the FPC core devs. The answer was no. I came back to it over the years - opened a few threads asking about the status, are they coming? - same arguments, same wall. I pointed out that Delphi has inline vars and FPC even ships {$mode delphi} - didn't matter. The "original Pascal" purist crowd cheered every refusal. At one point I even asked half-jokingly whether any core dev would land inline vars in stock FPC for $5000 (@LeP - you replied in that very thread, quoting the $5k figure, you should remember). The response - in every single thread, every single time inline vars came up - was always the same: absolutely not, never. Not in this lifetime, no chance, not up for discussion, end of story. The purist crowd defends this position to the death - inline vars = demon, evil incarnate.

So, fine. Unleashed has inline vars now. And once that door was open, the rest followed.



Where things actually stand today: the syntax side is essentially done. The only remaining language-level item on the list is Composable Records (plus per-record / per-field alignment). That's a serious chunk of work and is dragging because of it. After it lands, there's no real new-feature backlog left - everything else is polishing what's already there. The language already lets you "do anything" - I think that's a fair description.

Oh... and String Interpolation. Then it's done ;)

So your point about enhancing rather than adding - that's exactly the phase I'm moving into.



Right now I'm working on IDE tooling.

1. Unleashed Installer. A small installer that pulls down Unleashed only - think fpcupdeluxe but Unleashed-only and dead simple. Already in good shape, sources will be public soon. Host platforms Win64 and Linux64; cross-compile targets win64/win32, linux64/linux32 and WASM out of the box. More on demand. fpcupdeluxe still works perfectly fine for installing Unleashed (just edit fpcup.ini as in the first post) - this is a convenience layer so a complete novice can click through it. The installer also installs IDE packages - currently the minimap (optional, you can untick).

2. CPUView integration. I reached out to Alex (author of CPUView and its dependency FWHexView) about a license change - we worked it out, both projects are MIT as of yesterday. I'm planning to ship CPUView with the Unleashed IDE. Exact UX is TBD - definitely a toolbar button. If I manage to dig deep enough into the IDE internals, I'd like to drop CPUView straight in place of the current Assembler tab. The debugger UX in CPUView is roughly 100x what stock Lazarus offers; pulling that into the IDE moves the bar significantly.

3. Formatter. About 70% done. It parses Pascal source into a CST (Concrete Syntax Tree) and then formats by rewriting from scratch off that tree - whitespace, line breaks, indentation, all of it - strictly according to the user's settings. No regex hacks, no character-level patching of the original source.

This is actually why I added WASM to the installer's cross-compile target list - the formatter compiles to WASM, and I'll be putting it up online as a standalone web app shortly. I'll start a separate topic for it. The point of the web version is testing: it will have bugs, and I'd rather have people pound on it in a browser and report the corner cases before I wire it into the IDE. So if you're up for it - critical but constructive feedback would be very welcome there.

Once the formatter is solid, it lands inside Unleashed IDE. There'll be a dedicated tab in Tools -> Options for it, with settings for when to format (e.g. on save, on demand, etc.) and the usual style knobs. Exact options still being worked out.



So - I don't actually disagree with the "improve what's there" direction. It's where Unleashed is already heading. The syntax phase is winding down; the next chapters are tooling, IDE, debugger.
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

LeP

  • Sr. Member
  • ****
  • Posts: 337
@Fibonacci
I've always preferred inline variables (for limited purposes) and in Delphi I use them. But in other areas, I don't agree with expanded rules (such as extending C syntax or semantics).

@440bx
I wrote that I was rude, and I don't mean to offend anyone. But the idea that porting code from C requires adapting Pascal to C is a view I don't share and one I abhor.

P.S.: It's not a good idea to publish my beliefs here. I'll refrain from interfering in this thread in the future.
« Last Edit: May 11, 2026, 04:44:58 pm by LeP »
Un Sistema per domarli, un IDE per trovarli, un codice per ghermirli e nel framework incatenarli.
An operating system to tame them, an IDE to find them, a code to catch them and in the framework chain them.

flowCRANE

  • Hero Member
  • *****
  • Posts: 986
I'm very interested in this dialect because I'm tired of the language's archaic syntax. I'm very pleased with the addition of support for inline variables and lazy label declarations - this is a best practice that promotes minimal variable scope, making it easier to design code and catch errors. Admittedly, I want to stick to the very basics of the language syntax and data types - I’m not interested in tuples or features like autofree, defer, or pattern matching - but that doesn’t mean I’m against them. Let those who need them use them; the rest don’t have to.


However, in my opinion, this dialect is still missing a few features that would make writing code more convenient and improve its readability. So — @Fibonacci — can I suggest changes or additions to the language that I'm interested in?

Draft (so I don't forget):
  • named and unnamed unions (my old proposal),
  • C-style array declaration with the option to specify the number of items (no keywords needed),
  • bit fields with the option to specify the number of bits occupied by the field,
  • the ability to modify the for-loop iterator.
« Last Edit: May 11, 2026, 04:23:06 pm by flowCRANE »
Lazarus 4.6 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

Fibonacci

  • named and unnamed unions (my old proposal),
  • bit fields with the option to specify the number of bits occupied by the field.

Both of these fall under Composable Records - just wait ;) That's the one big language-level item still in flight. Unions I already have a clear picture of. Bit-fields I'm less sure about - haven't settled on what the syntax should look like in Pascal. If you have a concrete idea of how it should read, drop the pseudocode here, very welcome.

  • C-style array declaration with the option to specify the number of items (no keywords needed),

Need more info - can you sketch the syntax you have in mind? "C-style without keywords" can mean a few different things and I'd rather not guess.

  • the ability to modify the for-loop iterator.

Need more info too - what exactly do you mean by "modify"? Assigning to the counter from inside the body and having the loop continue from there? Or something else?

Also - is what you're after maybe already covered by custom enumerators? FPC has supported user-defined iteration via for x in collection do for years, and you can plug in any class implementing the enumerator pattern. If that's the direction you want, it's already there.
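A minimal sketch of that pattern, in case it helps. TStepRange, TStepEnumerator and SumRange are made-up names for this example, not anything shipped with FPC or Unleashed; the only fixed parts are the GetEnumerator / MoveNext / Current trio that for..in looks for:

```pascal
program StepEnumDemo;
{$mode objfpc}

type
  // Walks from a first value to a last value in steps of Stride.
  TStepEnumerator = class
  private
    FCurrent, FLast, FStride: Integer;
  public
    constructor Create(AFirst, ALast, AStride: Integer);
    function MoveNext: Boolean;
    property Current: Integer read FCurrent;
  end;

  // The "collection": all it has to offer is GetEnumerator.
  TStepRange = class
  private
    FFirst, FLast, FStride: Integer;
  public
    constructor Create(AFirst, ALast, AStride: Integer);
    function GetEnumerator: TStepEnumerator;
  end;

constructor TStepEnumerator.Create(AFirst, ALast, AStride: Integer);
begin
  FCurrent := AFirst - AStride; // MoveNext runs before the first read
  FLast := ALast;
  FStride := AStride;
end;

function TStepEnumerator.MoveNext: Boolean;
begin
  Inc(FCurrent, FStride);
  Result := FCurrent <= FLast;
end;

constructor TStepRange.Create(AFirst, ALast, AStride: Integer);
begin
  FFirst := AFirst;
  FLast := ALast;
  FStride := AStride;
end;

function TStepRange.GetEnumerator: TStepEnumerator;
begin
  Result := TStepEnumerator.Create(FFirst, FLast, FStride);
end;

// Sums every value the range yields.
function SumRange(AFirst, ALast, AStride: Integer): Integer;
var
  Range: TStepRange;
  Value: Integer;
begin
  Result := 0;
  Range := TStepRange.Create(AFirst, ALast, AStride);
  try
    for Value in Range do
      Inc(Result, Value);
  finally
    Range.Free;
  end;
end;

begin
  WriteLn(SumRange(0, 10, 2)); // 0+2+4+6+8+10 = 30
end.
```

The stepping logic lives in MoveNext, so the loop body never has to touch the counter. Whether that covers your case depends on what "modify" means for you.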

Curt Carpenter

  • Hero Member
  • *****
  • Posts: 759
I just spent a few days writing some C to finish a project on an RPi Pico, and it reminded me of how elegant basic Pascal is in comparison.   I wonder if adding features to the language isn't well into the law of diminishing returns at this point?  Is "feature bloat" a thing?


440bx

I am definitely very interested in the possible enhancements to the variants in records.  I already explicitly asked for that feature in this thread.

I've read the "composable records" thing but, I think it falls short of what a genuinely useful multiple variant record should provide.

Unnamed unions in C are another implementation disgrace.  That's because, in C, when a struct has multiple unnamed unions, the compiler allows fields of one union to be set while also allowing fields of a different union, which affect the fields just set in the previous union, to be modified.  That's ridiculous and, it is an obvious error the compiler should catch and, at the very least, warn the programmer of the problem but, not in C ... macho programmers don't need to be told they just shot themselves in the foot with a bazooka.  They don't need that and, why should the compiler warn about semantic mistakes ?... whose idea is it that the compiler should actually work and ensure things make sense ?  ... what's next ?... strong typing ?

Anyway...

What follows is a possible implementation...

First, let's define the syntactic structure of the variants (some of the following is heavily influenced by Ada variants)... there are two types of variants, tagged and untagged.  That's very important because the syntax of one should be a clear superset of the syntax of the other.  With that in mind...
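For reference, this is roughly how stock FPC spells the two kinds today. It is only a sketch (TKind, TTagged and TUntagged are names invented for the example), and it also shows that the current compiler does not tie field writes to the tag:

```pascal
program VariantTags;
{$mode objfpc}

type
  TKind = (kInt, kFloat);

  // Tagged: Kind is a real field stored in the record.
  TTagged = record
    case Kind: TKind of
      kInt:   (AsInt: LongInt);
      kFloat: (AsFloat: Single);
  end;

  // Untagged: TKind only labels the arms, no selector is stored.
  TUntagged = record
    case TKind of
      kInt:   (AsInt: LongInt);
      kFloat: (AsFloat: Single);
  end;

var
  T: TTagged;
begin
  T.Kind := kInt;
  T.AsInt := 1;
  T.AsFloat := 1.5; // accepted: nothing checks that Kind still says kInt
  WriteLn(T.Kind = kInt);
  WriteLn(SizeOf(TTagged) > SizeOf(TUntagged)); // the tag costs storage
end.
```

The write to AsFloat while the tag says kInt is the kind of inconsistency the rules below are meant to make impossible.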

1. Records may contain variant "case" parts anywhere in the body — not just at the end. Multiple variants per record allowed. Each variant has its own "end".

Anonymous variant — no tag field, no storage for selector:

Code: Pascal
case (ArmA, ArmB) of
   ArmA: ( ... );
   ArmB: ( ... );
end

Named variant — tag field stored in record:

Code: Pascal
case TagField : dword(ArmA, ArmB) of
   ArmA: ( ... );
   ArmB: ( ... );
end

2. Tagged variant fields may only be written via write-mode "with". Reading uses read-mode "with" — programmer asserts correct tag value  (this is a bit of a departure from Ada and, not necessarily a good one.) Untagged variant fields may be accessed directly via full path.

Code: Pascal
{ Write mode - sets tag, all fields must be assigned }
with r.FirstOption.NamedField := OtherOptionA do
begin
   AsInt32 := 1; AnotherField := true;
end

The important thing about the above "with" is that it assigns a value to the tag field AND requires every field in the union/variant to be assigned a value. Since the scope is clearly identified, fields that don't belong to that scope cannot be referenced (one of the absurd problems in C's unnamed unions implementation).

Code: Pascal
{ Read mode - programmer asserts tag value }
with r.FirstOption.NamedField.OtherOptionA do
   n := AsInt32;

When simply reading/referencing a field, for practical purposes there are no requirements: as long as the field is in scope, it can be "read".

The important thing is that the "with" statement becomes a safety net (instead of opening the door to problems which can happen when combined with some poor programming practices), here are the rules that should govern the "with" statement:

(a.) the with list may contain record variables, enumeration type names, untagged variant paths, and tagged variant write-mode entries

(b.) A record variable brings its fields into unqualified scope (in other words, no longer needs a full reference since the "with record_var" sets the scope)

(c.) An enumeration type name brings its elements into unqualified scope (presuming scoped enums are in effect.)

(d.) An enumeration variable in "with" list is a semantic error — use the type name

(e.) with r.ArmName do — untagged variant arm path — brings arm fields into unqualified scope

(f.) with r.path.TagField.ArmName do — tagged variant read mode — brings arm fields into scope

(g.) with r.path.TagField := ArmName do — tagged variant write mode — sets tag and activates arm

(h.) In write mode all fields of the activated arm must be assigned in the "with" (this ensures there cannot be partially initialized variant/union.)

(i.) In write mode leaving any field unassigned is a semantic error

(j.) The tag field value may not be changed outside of write-mode "with"

(k.) In read mode the compiler does not verify the tag field value (the compiler relies on the programmer to ensure the fields referenced are consistent with the value of the already set tag.  This isn't necessarily a good idea but, it does simplify the code generation and error handling... anyway, it's questionable...)

(l.) A variable may not appear more than once in same with list (I believe this has already been implemented in the "unleashed" version.)

(m.) A type name may not appear more than once in same "with" list.  Same as above but explicitly for types.

(n.) If any two entries share a field or element name — semantic error  (this prevents the most common complaint about the "with" statement, lack of "obviousness"/"uniqueness" of what field is getting set.)

(o.) Use separate nested "with" statements to resolve name ambiguity

(p.) Field and element names resolved at point of use within innermost active "with" scope

(q.) A name in scope from two or more active "with" scopes simultaneously — semantic error (the multiple "with" use must not itself create a situation where the field being set is not obvious/unique.)

(r.) Explicit qualified reference always valid inside "with" (helps clear possible ambiguities)

(s.) Explicit qualification takes precedence over unqualified resolution

(t.) "with" list may mix record variables and enumeration type names freely

(u.) Nested "with" statements resolve ambiguity for shared names

(v.) Each top-level record variable in a "with" list defines a scope chain — its nested field entries must immediately follow it as a contiguous group before any other top-level entry appears

(w.) Interleaving entries from different scope chains within the same "with" list is a semantic error

(x.) A scope chain entry is a field of the record or nested record most recently introduced by the preceding entry in the same chain

(y.) Each entry in a scope chain must be a valid field of the record type brought into scope by the immediately preceding entry — a field that does not belong to that scope is a semantic error

Example valid: "with r, inner1, inner2, inner3 do" — each entry descends from the previous
Example invalid: "with r, r2, inner1, r2inner1, inner2 do" — entries from r's chain and r2's chain are interleaved

(z.) Enumeration type names in a "with" list are not part of any record scope chain and may appear at any position without violating the contiguous grouping rule

(aa.) "with all variable do" requires every direct fixed field of the record variable to be assigned before the body exits (Note: this form requires a new keyword "all".)

(ab.) "with all" checks direct fields only — nested record fields are treated as a single unit; assigning the nested record as a whole satisfies the check

(ac.) To apply completeness checking to a nested record too use a separate "with all nestedfield do"

(ad.) "with all" on a record containing a variant is a semantic error — use write-mode "with" for variants instead

(ae.) FAM fields are excluded from the "with all" completeness check — the FAM field must be explicitly satisfied using "rec.fam := unassigned;"  (Note: this requires a new keyword "unassigned", "unassigned" is an escape valve to be used with arrays.)

(af.) Array fields satisfy "with all" either by whole-array assignment or by "MyArray := unassigned;"

(ag.) If any field is assigned only inside a conditional statement the "with all" check is considered satisfied but the compiler issues a warning that one or more fields may be left uninitialised

(ah.) "with all" applies only to the immediately following variable — it does not propagate to scope chain entries

(ai.) "with all" on an enumeration type name is a semantic error — enumeration elements cannot be assigned

(aj.) A tagged variant write-mode entry ("variable := identifier") must be the sole entry in the "with" list — no other entries of any kind may accompany it

(ak.) A "with" list containing a tagged variant write-mode entry alongside any other entry is a semantic error

(al.) Tagged variant read-mode entries and untagged variant path entries are not subject to this isolation rule and may appear alongside other entries subject to the contiguous scope chain rules

Those are the semantic rules the parser should enforce to ensure the "with" is crystal clear, introduces no possible ambiguities as to which field is being referenced and, for variants, it isolates the variant's fields ensuring all of its fields are set and no other "foreign" uninvited fields "crash the party."

Using those rules, the compiler can ensure consistency, uniqueness and completeness.  That's what a compiler should do.
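As a reminder of the problem rule (n.) targets, here is what stock FPC accepts today without complaint. It is a small sketch with made-up types (TPoint, TSize); both records happen to have a field called X:

```pascal
program WithShadow;
{$mode objfpc}

type
  TPoint = record
    X, Y: Integer;
  end;

  TSize = record
    X, Width: Integer; // shares the name X with TPoint
  end;

var
  P: TPoint;
  S: TSize;
begin
  P.X := 0; P.Y := 0;
  S.X := 0; S.Width := 0;

  with P, S do
    X := 42; // resolves to S.X: the last entry in the list wins, silently

  WriteLn(P.X, ' ', S.X); // prints: 0 42
end.
```

Under the rules above the "with P, S" line would be a semantic error, and the programmer would have to write P.X or S.X explicitly.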

HTH.


flowCRANE

@fibonacci — ok, so below are my humble suggestions for an improved syntax. 8)


1. Named and unnamed unions

As we know, Free Pascal allows you to declare records with variant parts, but this comes with a number of problems:
  • To declare a variant part, you have to use the case of syntax, which is absurd—it’s a statement that controls the flow of execution, which has nothing to do with a data type declaration.
  • Inside the case of statement, you should use "some" countable data type (such as integer or boolean) that has nothing to do with the variant part, which is unnecessarily confusing.
  • Each branch of the variant part must be enclosed in parentheses, where begin end would be the most one should need.
  • The variant part is not terminated with a standard end, which is completely at odds with typical code blocks and causes even more confusion.
  • The variant part extends all the way to the end of the entire record declaration, and there is no way to declare it so that it applies to fields located in the middle of the structure. To work around this, you must group such fields in an additional nested record end declaration, which unnecessarily lengthens the declaration of the entire data structure and introduces more bloat in the syntax.
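The last point looks roughly like this in stock FPC. This is only a sketch of the workaround; the wrapper field name Overlay is invented, and it is exactly the extra namespace the proposal wants to get rid of:

```pascal
program VariantInMiddle;
{$mode objfpc}

type
  TFoo = record
    Field1: Integer;
    Field2: Single;
    Overlay: record // wrapper needed: a variant part must come last
      case Integer of
        0: (Field3: Int64);
        1: (Field4: Pointer);
        2: (Field5: Char);
    end;
    Field6: Integer; // only possible because the variant is wrapped
  end;

var
  Foo: TFoo;
begin
  Foo.Overlay.Field3 := 0;
  // All arms of the variant start at the same address.
  WriteLn(Pointer(@Foo.Overlay.Field3) = Pointer(@Foo.Overlay.Field4));
  WriteLn(Pointer(@Foo.Overlay.Field3) = Pointer(@Foo.Overlay.Field5));
  // Field6 sits after the overlay, at a higher address.
  WriteLn(PtrUInt(@Foo.Field6) > PtrUInt(@Foo.Overlay.Field3));
end.
```

Every access has to go through Foo.Overlay, even though Overlay carries no meaning of its own.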

What I propose is adding support for union declarations in the simplest, shortest, and clearest way possible. If we want a union where all fields are at the same offset, we would declare this data type using the keyword union instead of record:

Code: Pascal
type
  TFoo = union
    Field1: Integer; // offset 0
    Field2: Single;  // offset 0
  end;

If we want to declare a variant part within a record or union, we can do so using a union block whose body is terminated by a standard end. The content between the union and end keywords belongs to the variant part, and everything after end consists of the structure's normal fields. The variant part can be declared as either named or unnamed, to avoid creating unnecessary namespaces:

Code: Pascal
type
  TFoo = record
    // normal fields
    Field1: Integer; // offset 0
    Field2: Single;  // offset 4

    // unnamed variant part
    union
      Field3: Int64;   // offset 8
      Field4: Pointer; // offset 8
      Field5: Char;    // offset 8
    end;

    // normal fields
    Field6: Integer; // offset 16
    {..}
  end;

If you need more examples, I’d be happy to provide them. The general idea is not only to make it as easy as possible to declare unions (using the union keyword), but also to allow them to be declared as named and unnamed unions, and to ensure that the variant part (the union) can apply to any fields, not just the last ones. The syntax proposed above is the shortest and simplest one that meets the requirements; moreover, it is very clear and unambiguous.


2. Bit fields

Free Pascal supports bit-packed records, but the syntax for declaring bit fields is... odd and counterintuitive, as it requires specifying the range of values for a given field instead of its size in bits. When working with such structures, we specify how many bits each field should occupy, but currently this information is not included in the field declaration syntax.
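For comparison, this is a sketch of how it has to be written today (TFlags and its field names are invented for the example). The bit widths only exist in the comments, because the compiler derives them from the ranges:

```pascal
program BitPackedToday;
{$mode objfpc}

type
  TFlags = bitpacked record
    Nibble: 0..15;   // range needs 4 bits
    Pair: 0..3;      // range needs 2 bits
    Offset: -32..31; // range needs 6 bits (signed)
  end;

var
  F: TFlags;
begin
  F.Nibble := 15;
  F.Pair := 3;
  F.Offset := -32;
  WriteLn(F.Nibble, ' ', F.Pair, ' ', F.Offset); // prints: 15 3 -32
  // BitSizeOf reports how many bits the compiler actually assigned.
  WriteLn(BitSizeOf(F.Nibble), ' ', BitSizeOf(F.Pair), ' ', BitSizeOf(F.Offset));
end.
```

To learn the real width of a field you have to work it out from the range or ask BitSizeOf, which is the opposite of how such structures are usually specified.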

I'm not sure about the exact syntax, but what's important to me is being able to see how many bits each field occupies. In theory, this type of information can be placed at the end of the field declaration, for example, after the additional keyword size (or a colon):

Code: Pascal
type
  TFoo = bitpacked record
    BitField1: Integer size 4; // four bits
    BitField2: Boolean size 2; // two bits
    BitField3: -2 .. 2 size 6; // six bits with a specific range of values
  end;

This way, the compiler will not only know what data type we are referring to, but also exactly how many bits the field should occupy. The size specified in bits is to be the target size of the field, regardless of the data type used. If we manually specify the allowed range of values, the size in bits specified at the end of the declaration must be used, even if the range of values requires fewer bits (if the range requires more bits than specified, a compilation error occurs). If you specify a data type instead of a range (e.g., Integer), values ranging from zero up to the maximum supported by that data type or up to the specified field size in bits are allowed. The compiler should have no trouble deducing the range of values based on the specified field size in bits, and then use it to verify the validity of values during range checks.


3. Abbreviated array declaration

The current syntax for declaring arrays is extremely verbose and unnecessarily requires specifying every detail. I propose changing the way arrays are declared so that only key information, such as the type and number of elements, is required:

Code: Pascal
var
  Array1: Integer[5];    // array of five integers, indexing from 0 to 4
  Array2: Integer[5, 5]; // 2D array of integers, indexing from 0 to 4 for both dimensions
  Array3: Integer[];     // dynamic array of integers, indexing from 0
  Array4: Integer[,];    // dynamic 2D array of integers, indexing from 0 for both dimensions

Of course, the standard ranges should still be supported:

Code: Pascal
var
  Array1: Integer[-2 .. 2];

Instead of specifying a range of indices, it should also be possible to specify a constant that defines the number of elements, as well as the data type (as is currently supported):

Code: Pascal
const
  ITEMS_NUM = 10;

var
  Array1: Integer[ITEMS_NUM]; // array of 10 integers, indexing from 0 to 9
  Array2: Integer[Boolean];   // array of 2 integers, indexing from False to True
  Array3: Integer[Char];      // array of 256 integers, indexing from #0 to #255

The general idea is to be able to declare arrays by specifying the number of elements (rather than a range of indices), as well as to simplify the syntax to a form familiar from other programming languages, since the current one is bloated. In any case, this shorthand syntax is not unfamiliar to Free Pascal, since it is used to declare short strings, such as Foo: String[10]; (short string with 10 characters).
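For comparison, here is a sketch of the same declarations in today's syntax, with the proposed shorthand in the comments. The variable names are arbitrary:

```pascal
program ArraysToday;
{$mode objfpc}

const
  ITEMS_NUM = 10;

var
  Fixed: array[0..4] of Integer;               // proposed: Integer[5]
  Grid: array[0..4, 0..4] of Integer;          // proposed: Integer[5, 5]
  Dyn: array of Integer;                       // proposed: Integer[]
  Dyn2: array of array of Integer;             // proposed: Integer[,]
  Counted: array[0..ITEMS_NUM - 1] of Integer; // proposed: Integer[ITEMS_NUM]
  ByBool: array[Boolean] of Integer;           // proposed: Integer[Boolean]
  Short: String[10];                           // the existing short-string form
begin
  SetLength(Dyn, 3);
  SetLength(Dyn2, 2, 3);
  WriteLn(Length(Fixed), ' ', Length(Counted), ' ', Length(ByBool)); // 5 10 2
  WriteLn(Length(Grid), 'x', Length(Grid[0]));                       // 5x5
  WriteLn(Low(Dyn), '..', High(Dyn));                                // 0..2
  WriteLn(Length(Dyn2), 'x', Length(Dyn2[0]));                       // 2x3
  WriteLn(SizeOf(Short)); // 11: ten characters plus the length byte
end.
```

Note the "- 1" in the Counted declaration: that off-by-one bookkeeping is what a count-based form would remove.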


4. Unleash the for-loop iterator!

Currently, the for-loop iterator is protected, which means that its value cannot be modified inside the loop body, neither by assigning directly to the iterator variable nor via a pointer (taking the address of the iterator is illegal). This makes absolutely no sense, because if we need to modify the iterator’s value inside the loop, the current nonsensical restrictions force us to use a while or repeat loop. I therefore propose that the iterator be freed from any restrictions and that the programmer be allowed to decide how the loop should work. Example:

Code: Pascal
for var I := 0 to 10 do
begin
  // some instructions

  if {condition} then
  begin
    I += 2;   // skip testing next two iterator values
    continue; // go to the next loop iteration
  end;

  // some instructions
end;

Second, the condition for exiting the for loop should also be modifiable (as another variable):

Code: Pascal
for var Current := 0 to var Last := 10 do
begin
  // some instructions

  if {condition} then
    Last += 1; // change the iterator value for the last iteration

  // some instructions
end;

An additional variable can also apply to the stepper:

Code: Pascal
for var Current := 0 to var Last := 10 step var Stride := 2 do

All variables used in the loop header declaration (one through three) can be freely modified within the loop body and declared inline, as shown in the example. This way, the programmer would have complete control over the loop's behavior, without any artificial limitations or being forced to use other loops and write more code.

This isn't just something I made up: I've often implemented algorithms that require these kinds of non-standard skips, but because the for-loop iterator currently cannot be modified, I was forced to use while loops and write more lines of code.

For example, I recently wrote a function designed to compare elements in two arrays and, if necessary, add a new element to the second array. When a new element is added, it must be inserted after the current one. In this case, the loop iterator must be additionally incremented (to skip this new element in the next iteration), as must the index of the last iteration (because a new element has been added to the array). Of course, this can’t be done with for loops, so I had to use a while loop. Too bad.
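For illustration, the while-based workaround looks roughly like this. It is a simplified sketch, not the actual function: PadOdds and InsertAt are made-up names, and the task (put a 0 after every odd element) is just a stand-in for the real comparison logic:

```pascal
program InsertWhileWalking;
{$mode objfpc}

type
  TIntArray = array of Integer;

// Hypothetical helper: grows A by one and places Value at Index.
procedure InsertAt(var A: TIntArray; Index, Value: Integer);
var
  K: Integer;
begin
  SetLength(A, Length(A) + 1);
  for K := High(A) downto Index + 1 do
    A[K] := A[K - 1];
  A[Index] := Value;
end;

// Stand-in task: insert a 0 after every odd element.
procedure PadOdds(var A: TIntArray);
var
  I, Last: Integer;
begin
  I := 0;
  Last := High(A);
  while I <= Last do
  begin
    if Odd(A[I]) then
    begin
      InsertAt(A, I + 1, 0);
      Inc(I);    // step over the element just inserted
      Inc(Last); // the array is now one element longer
    end;
    Inc(I);
  end;
end;

var
  Data: TIntArray;
  V: Integer;
begin
  SetLength(Data, 3);
  Data[0] := 1; Data[1] := 2; Data[2] := 3;
  PadOdds(Data);
  for V in Data do
    Write(V, ' ');
  WriteLn; // prints: 1 0 2 3 0
end.
```

The two Inc calls inside the if are the part a for loop cannot express today: one moves the iterator, the other moves the upper bound.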
« Last Edit: May 12, 2026, 12:01:47 am by flowCRANE »

creaothceann

  • Sr. Member
  • ****
  • Posts: 375
Quote
What I propose is adding support for union declarations in the simplest, shortest, and clearest way possible. If we want a union where all fields are at the same offset, we would declare this data type using the keyword union instead of record:

Code: Pascal
type
  TFoo = union
    Field1: Integer; // offset 0
    Field2: Single;  // offset 0
  end;

If we want to declare a variant part within a record or union, we can do so using a union block whose body is terminated by a standard end. The content between the union and end keywords belongs to the variant part, and everything after end consists of the structure's normal fields. The variant part can be declared as either named or unnamed, to avoid creating unnecessary namespaces:

Code: Pascal
type
  TFoo = record
    // normal fields
    Field1: Integer; // offset 0
    Field2: Single;  // offset 4

    // unnamed variant part
    union
      Field3: Int64;   // offset 8
      Field4: Pointer; // offset 8
    end;

    // normal fields
    Field5: Integer; // offset 16
    {..}
  end;

If you need more examples, I’d be happy to provide them. The general idea is not only to make it as easy as possible to declare unions (using the union keyword), but also to allow them to be declared as named and unnamed unions, and to ensure that the variant part (the union) can apply to any fields, not just the last ones. The syntax proposed above is the shortest and simplest one that meets the requirements; moreover, it is very clear and unambiguous.

Speaking of memory layout...

As someone who often implements emulated CPUs, the regular Pascal syntax has its drawbacks, and this union syntax isn't 100% there yet either. Imagine x86 CPUs - you can refer to the register RAX (8 bytes) or EAX (lower 4 bytes of RAX) or AX (lower 2 bytes of EAX) or AL (low byte of AX) or AH (high byte of AX). Regular Pascal forces splitting up all registers into several "cases", with manually inserted padding bytes wherever necessary.

Code: Pascal
type
        x86 = packed record  // old syntax
                case integer of
                        1: (
                                // byte identifiers
                                AL, AH : u8;   _reserved1 : array[2..7] of u8;  // needs padding
                                BL, BH : u8;   {etc.}
                                );
                        2: (
                                // word identifiers
                                AX     : u16;  _reserved2 : array[1..3] of u16;  // needs padding
                                BX     : u16;  {etc.}
                                );
                        4: (
                                // dword identifiers
                                EAX    : u32;  _reserved4 : u32;  // needs padding
                                EBX    : u32;  {etc.}
                                );
                        8: (
                                // qword identifiers
                                RAX    : u64;
                                RBX    : u64;
                                {etc.}
                                );
                end;

The only alternative would be declaring an array of u8 + an array of u16 etc., hidden getter/setter methods for every register, and public properties referring to these methods (with no way to use Dec/Inc on them).
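
For reference, a minimal sketch of that property-based alternative in current Free Pascal. The names TRegFile, FRegs, GetAX and SetAX are hypothetical, and only one register view is shown; a real emulator would need such a getter/setter pair for every register name.

```pascal
program RegProps;
{$mode objfpc}{$modeswitch advancedrecords}

type
  u16 = Word;
  u64 = QWord;

  // Hypothetical register file: raw storage is private,
  // each register name is a property backed by a getter/setter pair.
  TRegFile = record
  private
    FRegs: array[0..1] of u64;          // 0 = RAX, 1 = RBX
    function  GetAX: u16;
    procedure SetAX(Value: u16);
  public
    property AX: u16 read GetAX write SetAX;
  end;

function TRegFile.GetAX: u16;
begin
  Result := u16(FRegs[0] and $FFFF);    // low word of RAX
end;

procedure TRegFile.SetAX(Value: u16);
begin
  // replace the low word, keep the upper 48 bits
  FRegs[0] := (FRegs[0] and not u64($FFFF)) or Value;
end;

var
  R: TRegFile;
begin
  R := Default(TRegFile);
  R.AX := $1234;
  R.AX := R.AX + 1;     // read, add, write back through the setter
  // Inc(R.AX);         // rejected by the compiler: a property is not a variable
  WriteLn(HexStr(R.AX, 4));
end.
```

The commented-out Inc call is the limitation mentioned above: every update has to be spelled out as a read followed by a write.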

An FPC Unleashed alternative could be unions that can contain "regular" lists of variables:

Code: Pascal
type
        x86 = packed record  // theoretical draft
                union  (AL, AH : u8;)  AX : u16;  EAX : u32;  RAX : u64;  end;
                union  (BL, BH : u8;)  BX : u16;  EBX : u32;  RBX : u64;  end;
                {etc.}
                end;

Just as an example.


2. Bit fields

Free Pascal supports bit-packed records, but the syntax for declaring bit fields is... odd and counterintuitive, as it requires specifying the range of values for a given field instead of its size in bits. When working with such structures, we specify how many bits each field should occupy, but currently this information is not included in the field declaration syntax.

That's exactly why I declare and use u1..u64 and i1..i64 in my projects. Actual support from the compiler would make that a bit easier.
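
A minimal sketch of how this looks in current Free Pascal, assuming helper types in the spirit of u1..u64 (the names u1, u3, u4 and TStatus are hypothetical). Note that the bit width never appears in the declaration; it is implied by the subrange.

```pascal
program BitFieldsToday;
{$mode objfpc}

type
  // Hypothetical helper types: the width follows from the subrange,
  // it is not stated in bits anywhere.
  u1 = 0..1;     // needs 1 bit
  u3 = 0..7;     // needs 3 bits
  u4 = 0..15;    // needs 4 bits

  TStatus = bitpacked record
    Enabled : u1;
    Mode    : u3;
    Level   : u4;
  end;

var
  S: TStatus;
begin
  S.Enabled := 1;
  S.Mode    := 5;
  S.Level   := 12;
  WriteLn('bits:  ', BitSizeOf(TStatus));
  WriteLn('bytes: ', SizeOf(TStatus));
end.
```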
« Last Edit: May 12, 2026, 12:41:26 am by creaothceann »

flowCRANE

  • Hero Member
  • *****
  • Posts: 986
An FPC Unleashed alternative could be unions that can contain "regular" lists of variables:

Code: Pascal
type
        x86 = packed record  // theoretical draft
                union  (AL, AH : u8;)  AX : u16;  EAX : u32;  RAX : u64;  end;
                union  (BL, BH : u8;)  BX : u16;  EBX : u32;  RBX : u64;  end;
                {etc.}
                end;

Just as an example.

This is very close to the syntax I proposed but, instead of parentheses, you would use begin end (my preferred choice):

Code: Pascal
type
        x86 = packed record  // theoretical draft
                union  begin AL, AH : u8; end;  AX : u16;  EAX : u32;  RAX : u64;  end;
                union  begin BL, BH : u8; end;  BX : u16;  EBX : u32;  RBX : u64;  end;
                {etc.}
                end;

or record end (still an unnamed group of fields):

Code: Pascal
type
        x86 = packed record  // theoretical draft
                union  record AL, AH : u8; end;  AX : u16;  EAX : u32;  RAX : u64;  end;
                union  record BL, BH : u8; end;  BX : u16;  EBX : u32;  RBX : u64;  end;
                {etc.}
                end;

So yes, my proposal covers your needs: much less to write and much easier to understand. Current official Free Pascal does not support unnamed records; the unleashed Free Pascal can support both unnamed records and unions (as inner parts of a whole record/union).
« Last Edit: May 12, 2026, 02:32:27 am by flowCRANE »
Lazarus 4.6 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

440bx

  • Hero Member
  • *****
  • Posts: 6532
Code: Pascal
type
        x86 = packed record  // theoretical draft
                union  (AL, AH : u8;)  AX : u16;  EAX : u32;  RAX : u64;  end;
                union  (BL, BH : u8;)  BX : u16;  EBX : u32;  RBX : u64;  end;
                {etc.}
                end;
There is much to like about what you proposed but it has all the problems C unions have, and even a few more: they are semantically bankrupt.  They barely provide enough information to the compiler. There should be more information for the compiler to ensure the structure is used in a coherent, rational manner.  Help the compiler help the programmer write correct, sensible code.

I'd suggest something like what's below, which does everything your proposal does and gives the compiler a lot more information it can use to ensure the programmer uses the structure correctly/"as intended":

Code: Pascal
type
  x86 = packed record
    rax = packed container : qword is
      AL                       : byte  at 0;
      AH                       : byte;
      AX                       : word  at 0;
      EAX                      : dword at 0;
    end;

    rbx = packed container : qword is
      BL                       : byte  at 0;
      BH                       : byte;
      BX                       : word  at 0;
      EBX                      : dword at 0;
    end;

    { and so on for all registers }
  end;

A full generalization of the above would also require a coherent definition for bit fields, something along the lines of "n bits at <offset>", something to think about.

The main advantage of the above is that it explicitly reveals there is a container and it is a hierarchical container.  One thing that could potentially be added is a way to describe what happens to other fields in the container when one field is modified, e.g., does writing to EAX zero out the high dword, or are whatever bits were there left untouched?  That kind of detail is important when emulating hardware.  That would be a natural and possible enhancement.

FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Fibonacci

  • Hero Member
  • *****
  • Posts: 1000
  • Behold, I bring salvation - FPC Unleashed
Thanks everyone for the input - this is exactly the kind of concrete pseudocode + use cases I was hoping for. Quick reactions and where things are landing:



Bit fields and alignment

I really like the post-suffix form (Field: Type bitsize N). Reads naturally, Pascal-friendly. Originally I was leaning toward something more C-style, but the more I look at it the less it fits - a reader who doesn't know the feature exists would just be lost, and Pascal is supposed to "read like a book" ;) Building on @flowCRANE's size suggestion, going with bitsize (more explicit than plain size, which could be misread as bytes).

@440bx earlier proposed align for per-field / per-record alignment - keeping that, with the standard convention: align N = N bytes (matches C / C++ / Rust / ABI specs everywhere). For the rare case where you need sub-byte alignment, bitalign N = N bits as an escape valve. So you get both:
  • align 4 -> aligned on a 4-byte boundary (the usual meaning)
  • bitalign 3 -> aligned on a 3-bit boundary (specialised, only inside bit-packed contexts)
Two separate keywords, no overlap in meaning, no surprise for readers coming from other languages.



Unions

I'm going to borrow the keyword and wire it in roughly the way @flowCRANE described - possibly exactly that. Part of the implementation is already written. I can't fully recall the current state from memory, honestly - too much in-flight stuff in my head right now without proper docs or design notes for the in-progress pieces :P

@creaothceann - as someone who actually writes CPU emulators: any objection to the begin ... end form instead of parenthesized sub-groups? Both express the same intent, and I'm currently leaning toward begin/end because it's more standard Pascal and parses cleanly inside a record body. But if there's a real-world reason the parens read better for register-overlay use cases, I want to hear it before it sets.



Record composition (replacing case)

The new model doesn't actually need case at all. Plan:

- old records with case ... of keep working as today (backward compat),
- new code uses union ... end for variant overlays - anywhere in the body (not just at the end), multiple unions per record allowed,
- anonymous nested record ... end; (no field name) flattens its fields into the outer scope,
- you can also embed an existing record type by writing its bare name on its own line - same flatten semantics,
- regular named subfields (name: T;) keep their standard Pascal meaning - no flatten, access via outer.name.field.

Code: Pascal
type
  TInner = record
    x, y: integer;
  end;

  TOuter = record
    TInner;                  // anonymous embed - outer.x, outer.y reachable directly
    record a, b: byte; end;  // inline anonymous record - outer.a, outer.b
    named: TInner;           // standard named field - outer.named.x (no flatten)
  end;

Zero new keywords on the composition side. The bare type name and bare record ... end; forms are unambiguous (a regular field always has : between names and type), and the named form is just standard Pascal. So you pick by writing what you mean: anonymous = flat fields, named = access through the field name (outer.named.field).



PEB example - putting union and bitsize together

Concrete example showing how the syntax plays out in real code - a slice of the Windows PEB struct (trimmed, just enough to demonstrate the shape):

Code: Pascal
type
  TPEB = record
    InheritedAddressSpace:    bytebool;
    ReadImageFileExecOptions: bytebool;
    BeingDebugged:            bytebool;
    union
      BitField: byte;
      record
        ImageUsesLargePages:          boolean bitsize 1;
        IsProtectedProcess:           boolean bitsize 1;
        IsImageDynamicallyRelocated:  boolean bitsize 1;
        SkipPatchingUser32Forwarders: boolean bitsize 1;
        IsPackagedProcess:            boolean bitsize 1;
        IsAppContainer:               boolean bitsize 1;
        IsProtectedProcessLight:      boolean bitsize 1;
        IsLongPathAwareProcess:       boolean bitsize 1;
      end;
    end;
    // ...
  end;

References used to cross-check the PEB layout and the BitField byte:

1) http://terminus.rewolf.pl/terminus/structures/ntdll/_PEB32_x64.html
2) https://processhacker.sourceforge.io/doc/struct___p_e_b.html
3) https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/pebteb/peb/bitfield.htm

The union gives both views of the same byte - raw byte (via BitField) for fast bitfield I/O, named Boolean flags for clarity in code. The example doesn't demonstrate align/bitalign (which will also be available).
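
For comparison, a rough sketch of how the same slice can be written in current Free Pascal, using a bitpacked record plus a classic case variant. The type names TPebBits and TPebHead and the field name Bits are hypothetical; this is not the proposed syntax, only today's closest equivalent under those assumptions.

```pascal
program PebToday;
{$mode objfpc}

type
  // Inside a bitpacked record a Boolean occupies a single bit.
  TPebBits = bitpacked record
    ImageUsesLargePages,
    IsProtectedProcess,
    IsImageDynamicallyRelocated,
    SkipPatchingUser32Forwarders,
    IsPackagedProcess,
    IsAppContainer,
    IsProtectedProcessLight,
    IsLongPathAwareProcess: Boolean;
  end;

  // Hypothetical trimmed slice: the variant part has to come last,
  // and the flag view needs a field name (Bits).
  TPebHead = packed record
    InheritedAddressSpace:    ByteBool;
    ReadImageFileExecOptions: ByteBool;
    BeingDebugged:            ByteBool;
    case Integer of
      0: (BitField: Byte);
      1: (Bits: TPebBits);
  end;

var
  P: TPebHead;
begin
  P := Default(TPebHead);
  P.Bits.IsAppContainer := True;       // set one flag through the named view
  WriteLn('raw byte: $', HexStr(P.BitField, 2));
  WriteLn('size: ', SizeOf(TPebHead));
end.
```

Two differences from the proposed form stand out: the flags are reached as P.Bits.IsAppContainer rather than directly, and no regular field can follow the variant part, so the rest of the PEB could not simply be appended after it.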

Feedback welcome on any of the above before things lock down.
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

440bx

  • Hero Member
  • *****
  • Posts: 6532
In this example:
Code: Pascal
type
  TInner = record
    x, y: integer;
  end;

  TOuter = record
    TInner;                  // anonymous embed - outer.x, outer.y reachable directly
    record a, b: byte; end;  // inline anonymous record - outer.a, outer.b
    named: TInner;           // standard named field - outer.named.x (no flatten)
  end;

What comes to mind is the following:

1. I don't recall ever having needed to simply ensure a record has a copy of another record's elements inside it.  Whenever I've needed that, it was actually critical to have the group of fields named as in "named : TInner;" in your example so that the group of fields could be manipulated as a record instead of individually.  Succinctly, I haven't needed that feature and don't see how it might even be useful.  I would really appreciate an example of where that feature is genuinely useful.

2.  How is "record a, b: byte; end;" useful?  There is no way to refer to the group a, b as a single unit, therefore the elements must be referenced by their individual names, which is what would happen if the "record ... end" was omitted.  Succinctly, I don't see what having a & b in an anonymous record accomplishes.  Am I missing something?

Thausand

  • Hero Member
  • *****
  • Posts: 560
  • align 4 -> aligned on a 4-byte boundary (the usual meaning)
  • bitalign 3 -> aligned on a 3-bit boundary (specialised, only inside bit-packed contexts)
Two separate keywords, no overlap in meaning, no surprise for readers coming from other languages.
How does that work (or clash) with the existing align and packrecords directives? https://www.freepascal.org/daily/doc/prog/progsu1.html

Am I missing something ?
The outer.named.x example makes it easier to include a record without duplicating declarations. As for examples 1 and 2, I think they may be meant for records that have many variants?
« Last Edit: May 12, 2026, 08:13:01 am by Thausand »
A docile goblin always follow HERMES.md

Fibonacci

  • Hero Member
  • *****
  • Posts: 1000
  • Behold, I bring salvation - FPC Unleashed
1. I don't recall ever having needed to simply ensure a record has a copy of another record's elements inside it.  Whenever I've needed that, it was actually critical to have the group of fields named as in "named : TInner;" in your example so that the group of fields could be manipulated as a record instead of individually.  Succinctly, I haven't needed that feature and don't see how it might even be useful.  I would really appreciate an example of where that feature is genuinely useful.

Two use cases for the anonymous form (TInner):

a) Breaking a long record down into smaller pieces. If you want to keep some order or grouping in a 50+ field struct, you can carve it into named sub-records (TPart1, TPart2, etc.) and embed them anonymously into the main record. Existing call-site code accessing outer.Field doesn't change - the flat path is preserved across the refactor. Just an option for whoever likes organising fields that way.

b) Reusable record fragments. Write THeader / TFooter / whatever once, then embed it into every record that needs that block - instead of copy-pasting the same field list across N record definitions. Same flat-access semantics, no duplication, easy to update in one place.

So named: TInner; is the right call when you want to manipulate the group as a single record value. The anonymous form is for organising or reusing chunks at the layout level while keeping flat access.

2.  How is "record a, b: byte; end;" useful?  There is no way to refer to the group a, b as a single unit, therefore the elements must be referenced by their individual names, which is what would happen if the "record ... end" was omitted.  Succinctly, I don't see what having a & b in an anonymous record accomplishes.  Am I missing something?

You already have the example - the PEB struct earlier in this thread, inside the union. The inner record ... end; is what binds the 8 single-bit flags together as the alternative view of BitField: byte. Without that wrapper there's no way to say "these 8 fields are one variant of the union".

And as for why the inner record isn't named - what would that buy? Access becomes peb.WhateverName.ImageUsesLargePages instead of peb.ImageUsesLargePages? Just longer paths for the same memory, no semantic gain.



How does that work (or clash) with the existing align and packrecords directives? https://www.freepascal.org/daily/doc/prog/progsu1.html

Override - per-field align N / bitalign N take precedence over the surrounding {$align} / {$packrecords} for that specific field. The directives still set the default for the rest of the record; the per-field form is just a local override.

One thing worth pointing out: as documented, the directives you linked cap at 32 bytes ({$packrecords 1|2|4|8|16|32|default|c|normal}, same numeric range for {$align}). The per-field form accepts arbitrary power-of-two boundaries (including 64, 128, ...) - so it also covers cases the global directives can't express today: cache line alignment (typically 64), AVX-512 (64), or whatever the target ABI requires for a specific field. That's part of why per-field is worth having on top of the existing directives, not just instead of them.
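
As a point of reference, a small sketch of how the existing directive behaves today in current Free Pascal: it sets the default for every record declared after it, which is the default a per-field override would sit on top of. The record names TTight and TPadded are hypothetical.

```pascal
program PackDemo;
{$mode objfpc}

type
  {$packrecords 1}
  TTight = record
    Tag   : Byte;
    Value : LongWord;   // follows Tag immediately
  end;

  {$packrecords 4}
  TPadded = record
    Tag   : Byte;
    Value : LongWord;   // pushed to the next 4-byte boundary
  end;
  {$packrecords default}

begin
  // The two sizes are expected to differ because of the padding after Tag.
  WriteLn('packrecords 1: ', SizeOf(TTight));
  WriteLn('packrecords 4: ', SizeOf(TPadded));
end.
```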
« Last Edit: May 12, 2026, 08:55:09 am by Fibonacci »

 
