
Topic: FPC Unleashed (inline vars, statement expr, tuples, match, indexed/lazy labels)  (Read 41851 times)

creaothceann

  • Sr. Member
  • Posts: 377
Just for the record (pun not intended) and in addition to my previous post, this is the syntax that I'd probably end up using in regular FPC:

Code: Pascal
    {$ModeSwitch AdvancedRecords}

    type
        u8  =  byte;
        u16 =  word;
        u32 = dword;  uint = u32;
        u64 = qword;


        T_x86 = packed record
            public
            type
                TRegister64 = packed record
                    case uint of
                        1: (L, H : u8 );
                        2: (X    : u16);
                        4: (E    : u32);
                        8: (R    : u64);
                    end;

            const
                RegisterCount64 = 2;

            private
            var
                _Registers64 : array[0..RegisterCount64 - 1] of TRegister64;

            public
            property  AL : u8  read _Registers64[0].L write _Registers64[0].L;
            property  AH : u8  read _Registers64[0].H write _Registers64[0].H;
            property  AX : u16 read _Registers64[0].X write _Registers64[0].X;
            property EAX : u32 read _Registers64[0].E write _Registers64[0].E;
            property RAX : u64 read _Registers64[0].R write _Registers64[0].R;

            property  BL : u8  read _Registers64[1].L write _Registers64[1].L;
            property  BH : u8  read _Registers64[1].H write _Registers64[1].H;
            property  BX : u16 read _Registers64[1].X write _Registers64[1].X;
            property EBX : u32 read _Registers64[1].E write _Registers64[1].E;
            property RBX : u64 read _Registers64[1].R write _Registers64[1].R;
            end;  {$if SizeOf(T_x86) <> SizeOf(T_x86.TRegister64) * T_x86.RegisterCount64}  {$fatal}  {$endif}

    var
        CPU : T_x86;

... with the downside that Inc/Dec wouldn't work with the properties.
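A minimal sketch of that downside in current FPC, assuming FPC 3.x with {$mode objfpc} and a little-endian (x86) host. TReg and TCpu are hypothetical, trimmed stand-ins for the types above: since a property is not a variable, the increment has to be written out as an explicit read-modify-write.

```pascal
program RegisterProperties;
{$mode objfpc}
{$ModeSwitch AdvancedRecords}

type
  // Hypothetical, trimmed stand-in for TRegister64 above
  TReg = packed record
    case Byte of
      1: (L, H: Byte);
      2: (X: Word);
      4: (E: DWord);
      8: (R: QWord);
  end;

  // Hypothetical, trimmed stand-in for T_x86 (one register only)
  TCpu = record
  private
    FA: TReg;
  public
    property AL: Byte read FA.L write FA.L;
    property AH: Byte read FA.H write FA.H;
    property AX: Word read FA.X write FA.X;
    property RAX: QWord read FA.R write FA.R;
  end;

var
  CPU: TCpu;
begin
  CPU.RAX := 0;
  // Inc(CPU.AX);              // not accepted: Inc/Dec need a variable, not a property
  CPU.AX := CPU.AX + $0101;    // explicit read-modify-write instead: AX = $0101
  CPU.AH := CPU.AH + 1;        // AH = $02, AL unchanged
  WriteLn(HexStr(CPU.RAX, 4)); // prints 0201 on a little-endian host
end.
```

The commented-out Inc line is where the limitation shows up; everything else compiles as ordinary FPC.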


- - -


begin end (my preferred choice) [...] or record end (still unnamed group of fields)

That might work too. Though I prefer the parentheses because regular variant parts already use them to group fields, and the "begin..end" / "record" keywords are already used for code / regular records.


There is much to like about what you proposed, but it has all the problems C unions have, and even a few more: they are semantically bankrupt. They barely give the compiler enough information. The compiler should have more information so it can ensure the structure is used in a coherent, rational manner. Help the compiler help the programmer write correct, sensible code.

I'd suggest something like what's below, which does everything your proposal does and gives the compiler a lot more information it can use to ensure the programmer uses the structure correctly/"as intended":

Code: Pascal
    type
      x86 = packed record
        rax = packed container : qword is
          AL                       : byte  at 0;
          AH                       : byte;
          AX                       : word  at 0;
          EAX                      : dword at 0;
        end;

        rbx = packed container : qword is
          BL                       : byte  at 0;
          BH                       : byte;
          BX                       : word  at 0;
          EBX                      : dword at 0;
        end;

        { and so on for all registers          }
      end;

The main advantage of the above is that it explicitly reveals there is a container and it is a hierarchical container.

But it wouldn't be a hierarchy (if I understand you correctly). AL, AH, AX, EAX, RAX, BL etc. all have the same scope.


A full generalization of the above would also require a coherent definition for bit fields, something along the lines of "n bits at <offset>"

"name : int(<bit count>) at <offset>" or "name : uint(<number of bits>) at <offset>" would be my ideal in terms of clarity. ("unsigned integer" if we want to preserve the Pascal spirit of natural-language keywords.) This is also compatible with flowCRANE's shorter array form.

EDIT: bitsize N would of course work too.
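The practical difference between int(N) and uint(N) is whether reading the field sign-extends it. A minimal sketch of both reads in current FPC, assuming the field lives in a QWord and that 1 <= Bits and Offset + Bits <= 64; GetUBits and GetSBits are hypothetical helper names, and SarInt64 is the arithmetic shift from FPC's System unit.

```pascal
program BitFieldRead;
{$mode objfpc}

// uint(N)-style read: Bits bits starting at bit Offset, zero-extended
function GetUBits(Value: QWord; Offset, Bits: Byte): QWord;
begin
  Result := Value shr Offset;
  if Bits < 64 then
    Result := Result and ((QWord(1) shl Bits) - QWord(1));
end;

// int(N)-style read: move the field to the top bits, then shift it back
// down arithmetically so the field's highest bit fills the upper bits
function GetSBits(Value: QWord; Offset, Bits: Byte): Int64;
begin
  Result := SarInt64(Int64(Value shl (64 - Offset - Bits)), 64 - Bits);
end;

begin
  WriteLn(GetUBits($F0, 4, 4));  // 15
  WriteLn(GetSBits($F0, 4, 4));  // -1
  WriteLn(GetSBits($70, 4, 4));  // 7
end.
```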


Things that could potentially be added are a way to describe what happens to other fields in the container when one field is modified, e.g., does writing to EAX zero out the high dword, or are whatever bits were there left untouched? That kind of detail is important when emulating hardware. That would be a natural and possible enhancement.

Personally I wouldn't try to encode that in the type declaration language, because it wouldn't be able to cover all the weird ways that hardware can behave... The actual behavior would be in the code that emulates the instructions, partially also because some CPUs behave differently depending on certain bits in the status register.
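As an illustration of behaviour kept in the emulation code rather than in the declaration: on x86-64, a 32-bit register write zero-extends into the full 64-bit register, while 8- and 16-bit writes leave the remaining bits untouched. A minimal sketch with plain masks; the procedure names are hypothetical.

```pascal
program PartialWrites;
{$mode objfpc}

// AL-style write: only bits 0..7 change
procedure WriteLow8(var R: QWord; V: Byte);
begin
  R := (R and not QWord($FF)) or V;
end;

// AH-style write: only bits 8..15 change
procedure WriteHigh8(var R: QWord; V: Byte);
begin
  R := (R and not QWord($FF00)) or (QWord(V) shl 8);
end;

// AX-style write: only bits 0..15 change
procedure Write16(var R: QWord; V: Word);
begin
  R := (R and not QWord($FFFF)) or V;
end;

// EAX-style write: on x86-64 the upper 32 bits are cleared
procedure Write32(var R: QWord; V: DWord);
begin
  R := V;
end;

var
  RAX: QWord;
begin
  RAX := QWord($1122334455667788);
  Write16(RAX, $AAAA);
  WriteLn(HexStr(RAX, 16));   // 112233445566AAAA
  Write32(RAX, $BBBBBBBB);
  WriteLn(HexStr(RAX, 16));   // 00000000BBBBBBBB
end.
```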


@creaothceann - as someone who actually writes CPU emulators: any objection to the begin ... end form instead of parenthesized sub-groups? Both express the same intent, and I'm currently leaning toward begin/end because it's more standard Pascal and parses cleanly inside a record body. But if there's a real-world reason the parens read better for register-overlay use cases, I want to hear it before it sets.

See above. Though I'm relatively flexible with syntax (after all my interest in FPC Unleashed is mainly that it enables things that are straight up impossible in standard Pascal).
« Last Edit: May 12, 2026, 09:06:46 am by creaothceann »

Okoba

  • Hero Member
  • Posts: 660
@Fibonacci Thank you for the update for `with`.
Your work on Unleashed is a very interesting subject to me and I am trying every feature.

Thausand

  • Hero Member
  • Posts: 560
How does that (counter) work with the existing align and packrecords settings? https://www.freepascal.org/daily/doc/prog/progsu1.html
Override - per-field align N / bitalign N take precedence over the surrounding {$align} / {$packrecords} for that specific field. The directives still set the default for the rest of the record; the per-field form is just a local override.
OK, I understand. Thank you for explaining.

Quote
One thing worth pointing out: the directives you linked cap at 8 bytes ({$packrecords 1|2|4|8|default|c|normal}, same range for {$align}).
That is not completely correct, because there is also {$codealign} (and the default depends on the compiler, the platform, and the ABI's default structure layout); see also the manual: https://www.freepascal.org/docs-html/prog/progsu9.html#x16-150001.2.9 and https://www.freepascal.org/docs-html/ref/refsu15.html

Quote
The per-field form accepts arbitrary power-of-two boundaries (16, 32, 64, 128, ...) - so it also covers cases the global directives can't express today: cache line alignment (typically 64), AVX-512 (64), or whatever the target ABI requires for a specific field. That's part of why per-field is worth having on top of the existing directives, not just instead of them.
OK, even though the answer above was not completely correct, this is an interesting implementation. Might it add more confusion about how record/field alignment works?
« Last Edit: May 12, 2026, 09:13:12 am by Thausand »
A docile goblin always follow HERMES.md

440bx

  • Hero Member
  • Posts: 6542
@Fibonacci,

First, thank you for the answers; I have a clearer view of the intent now.

about this:
Code: Pascal
    type
      TPEB = record
        InheritedAddressSpace:    bytebool;
        ReadImageFileExecOptions: bytebool;
        BeingDebugged:            bytebool;
        union
          BitField: byte;
          record
            ImageUsesLargePages:          boolean bitsize 1;
            IsProtectedProcess:           boolean bitsize 1;
            IsImageDynamicallyRelocated:  boolean bitsize 1;
            SkipPatchingUser32Forwarders: boolean bitsize 1;
            IsPackagedProcess:            boolean bitsize 1;
            IsAppContainer:               boolean bitsize 1;
            IsProtectedProcessLight:      boolean bitsize 1;
            IsLongPathAwareProcess:       boolean bitsize 1;
          end;
        end;
        // ...
      end;
I see what you're aiming at, and I have some concerns; they are all related. In C, each field is preceded by the type of the container that holds it; in the example you presented, it is BOOLEAN fieldname : 1 { number of bits }. One undesirable effect of that syntax in C is the repetitive and mostly superfluous mention of the container, BOOLEAN in this case. In addition, the container type implicitly declares the alignment.

That's why I offered the "container" syntax: it declares the container size, and with it the default container alignment, only once. For the union in question, it would end up being:
Code: Pascal
    bitfield = packed container : byte is
      ImageUsesLargePages : 1 bit at 0;
      { etc, etc, all fields are sequential and bitpacked, gaps and overlays can be had by specifying a different "at" bit }
    end;
That allows the compiler to verify that there aren't more bits defined in the container than the container size allows.  Alternatively, to make the individual flags anonymous, it could be defined as:
Code: Pascal
    packed container : byte is
      BitField : byte at 0;

      ImageUsesLargePages : 1 bit at 0;   { since "at" specifies 0, it overlays BitField }
      { etc, etc, all fields are sequential and bitpacked, gaps and overlays can be had by specifying a different "at" bit }
    end;
In this case, since the container is anonymous, all identifier references are unqualified (there is no way to qualify them). The compiler can still warn the programmer about two possible cases: 1. the container is incomplete, and 2. the size of the container has been exceeded. Those are two significant advantages that the C syntax and its proposed translation don't offer.

Note: the above is a "watered down" variation of Ada's facilities to declare variants and bit fields.  Ada offers a much richer semantic syntax that I recommend checking out for potential ideas and ways for the compiler to ensure correct usage.



FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

440bx

  • Hero Member
  • Posts: 6542
But it wouldn't be a hierarchy (if I understand you correctly). AL, AH, AX, EAX, RAX, BL etc. all have the same scope.
It's a container hierarchy not a scope hierarchy.  RAX contains EAX which contains AX which contains AH and AL.

Personally I wouldn't try to encode that in the type declaration language, because it wouldn't be able to cover all the weird ways that hardware can behave... The actual behavior would be in the code that emulates the instructions, partially also because some CPUs behave differently depending on certain bits in the status register.
I'd much rather have it declared in the definition. That way, the compiler does the work, instead of it being implemented in code, where the programmer maintaining the code has to spend more effort to see what the intent is. That said, I readily acknowledge that it would be difficult to account for all possible cases, and I wouldn't even try. I'd initially account for the few already known cases, which would allow the compiler either to generate code that ensures things happen as desired or, if the programmer ends up having to write that code, to verify that the code does not violate the intent.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Fibonacci

  • Hero Member
  • Posts: 1002
  • Behold, I bring salvation - FPC Unleashed
@Fibonacci Thank you for the update for `with`.
Your work on Unleashed is a very interesting subject to me and I am trying every feature.

Thanks, glad you like it. The more Unleashed users the better, so we can actually write code in Unleashed syntax and share it. More users = more visibility = people will know what to compile it with, what {$mode unleashed} means at the top of a unit, no questions asked. That's the goal - getting the mode recognised on sight, the same way {$mode delphi} or {$mode objfpc} is recognised today. Did you give the project a star on GitHub yet? :D



That is not completely correct, because there is also {$codealign} (and the default depends on the compiler, the platform, and the ABI's default structure layout); see also the manual: https://www.freepascal.org/docs-html/prog/progsu9.html#x16-150001.2.9 and https://www.freepascal.org/docs-html/ref/refsu15.html

You may have a point - to my surprise, this code:

Code: Pascal
    {$mode objfpc}
    {$codealign recordmin=64}
    type
      trec = record
        a: integer;
        b: integer;
      end;
    begin
      writeln(sizeof(trec));
      readln;
    end.

prints 68. I wasn't aware FPC's {$codealign recordmin=N} could push record field alignment beyond 8. Fair correction.

That said, per-record / per-field align still beats it for granularity. {$codealign recordmin=64} affects every record in scope at the unit level. Per-field align N lets you align one specific field differently - the rest of the record is unaffected, other records aren't touched at all.

Might it add more confusion about how record/field alignment works?

Fair concern, but I don't think so. The way it works: per-field align N / bitalign N is opt-in - if you don't write it on a field, nothing changes from current FPC behaviour for that field. Existing code keeps doing exactly what it does today; the new form only kicks in where a user explicitly writes it.

So the global mental model stays intact - you only have to think about per-field alignment when you're staring at an align N on a specific line. If anything, it reduces confusion for the cases where today you'd have to fight {$packrecords}/{$codealign} juggling at the unit level just to align one field differently. With per-field align you say it on the field itself - the rest of the record is unaffected.
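For comparison, here is one way to place a single field at a 64-byte offset in current FPC without unit-wide directives: explicit padding inside a packed record. This is a minimal sketch; TCacheLined and its fields are hypothetical, and it only fixes the field's offset within the record, not the alignment of the record's own address in memory.

```pascal
program ManualFieldAlign;
{$mode objfpc}

type
  TCacheLined = packed record
    Header: Integer;                 // offset 0, 4 bytes in objfpc mode
    Pad:    array[0..59] of Byte;    // hand-computed: 64 - 4 bytes
    Hot:    Int64;                   // lands at offset 64
  end;

var
  R: TCacheLined;
begin
  WriteLn(PtrUInt(@R.Hot) - PtrUInt(@R));  // 64
  WriteLn(SizeOf(TCacheLined));            // 72
end.
```

The padding length has to be recomputed by hand whenever the fields before Hot change, which is the bookkeeping a per-field align N would take over.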



That allows the compiler to verify that there aren't more bits defined in the container than the container size allows.  Alternatively, to make the individual flags anonymous, it could be defined as:
Code: Pascal
    packed container : byte is
      BitField : byte at 0;

      ImageUsesLargePages : 1 bit at 0;   { since "at" specifies 0, it overlays BitField }
      { etc, etc, all fields are sequential and bitpacked, gaps and overlays can be had by specifying a different "at" bit }
    end;

Honestly - your pseudocode doesn't read well at first glance. I had to slow down and parse it: what does N bit at Z actually mean, what does the container declaration buy, what does at anchor to, what's the difference between the named and the anonymous form. Multiple new pieces of syntax to learn before any single line makes sense.

Compared to TPEB: regular Pascal field declarations - a name, a colon, a type. The only addition is a bitsize 1 suffix that says "this field takes 1 bit". Anyone who can read a Pascal record can read TPEB at a glance, including someone seeing Unleashed for the very first time.

N bit at Z reads more like an Ada representation clause than a Pascal field. Expressive, yes - but every reader now has to learn a new mini-language (what is bit as a noun-suffix? what does at anchor to? when is it required, when optional?) before they can read a single declaration. If we keep going down that road, we'll basically need a manual for newcomers just to read Unleashed records - and that's a sign the syntax is doing too much work.

The compile-time checks you want (total bits fit the container, warn on incomplete coverage) - those are useful, and they will land. The compiler will emit a warning when bitsize members exceed the union container size, or when they leave it incomplete. Same safety as your container syntax. The difference: 3 new keywords (container, at, is - well, that one's repurposed) in your version vs 0 new keywords in mine - the parser already has everything it needs from union ... end + bitsize N: the container size (the byte / word / dword the union overlays) and the bit count of each member.
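Both warnings reduce to summing the bitsize of every member and comparing the total with the container's width in bits. A minimal sketch of that arithmetic, assuming members are laid out sequentially without explicit offsets; TFit and CheckContainer are hypothetical names, not compiler internals.

```pascal
program ContainerCheck;
{$mode objfpc}

type
  TFit = (fitExact, fitIncomplete, fitOverflow);

// Sums the bitsize of every member and compares it with the container width
function CheckContainer(ContainerBits: Integer;
  const MemberBits: array of Integer): TFit;
var
  I, Total: Integer;
begin
  Total := 0;
  for I := Low(MemberBits) to High(MemberBits) do
    Inc(Total, MemberBits[I]);
  if Total > ContainerBits then
    Result := fitOverflow      // warning: container size exceeded
  else if Total < ContainerBits then
    Result := fitIncomplete    // warning: container not fully covered
  else
    Result := fitExact;
end;

begin
  // TPEB flags: eight 1-bit members in a byte-sized union
  WriteLn(CheckContainer(8, [1, 1, 1, 1, 1, 1, 1, 1]));  // fitExact
  WriteLn(CheckContainer(8, [1, 1, 1]));                 // fitIncomplete
  WriteLn(CheckContainer(8, [4, 4, 1]));                 // fitOverflow
end.
```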



Everyone has their own vision, everyone would do it differently. We need to find a middle ground - it's impossible to please everyone. I'll keep the design as close to what reads naturally as Pascal, while taking in suggestions that genuinely improve it without bloating the syntax. Some calls won't make everyone happy - that's just the nature of designing a language.

Right, you kids keep at it - I've got work to do :)
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

creaothceann

  • Sr. Member
  • Posts: 377
I'd much rather have it declared in the definition, that way, I can have the compiler do the work instead of having to implement it in code which takes more effort for the programmer maintaining the code to see what the intent is. [...] I'd initially account for the few already known cases, which would allow the compiler to either generate code to ensure things occur as desired or, if the programmer ends up having to write the code, the compiler can verify the code does not violate the intent.

It's a balancing act... Putting it into the definition requires more specialized syntax that has to be implemented and learned. Meanwhile, code is much more flexible (and would be even more useful if FPC supported compile-time code execution). Macros, especially FPC's neutered form, are barely enough for size checks.

flowCRANE

  • Hero Member
  • Posts: 986
PEB example - putting union and bitsize together

Concrete example showing how the syntax plays out in real code - a slice of the Windows PEB struct (trimmed, just enough to demonstrate the shape):

Code: Pascal
    type
      TPEB = record
        InheritedAddressSpace:    bytebool;
        ReadImageFileExecOptions: bytebool;
        BeingDebugged:            bytebool;
        union
          BitField: byte;
          record
            ImageUsesLargePages:          boolean bitsize 1;
            IsProtectedProcess:           boolean bitsize 1;
            IsImageDynamicallyRelocated:  boolean bitsize 1;
            SkipPatchingUser32Forwarders: boolean bitsize 1;
            IsPackagedProcess:            boolean bitsize 1;
            IsAppContainer:               boolean bitsize 1;
            IsProtectedProcessLight:      boolean bitsize 1;
            IsLongPathAwareProcess:       boolean bitsize 1;
          end;
        end;
        // ...
      end;

It looks great and is very easy to read. Just remember that when using a bitpacked record, the fields of the internal union must also be bit-packed, as is currently the case. So the record from your example (in this form) takes up 4 bytes and has bit-packed union fields (packed to one byte). This structure should also take up 4 bytes if we declare it as follows:

Code: Pascal
    type
      TPEB = record
        InheritedAddressSpace:    bytebool;
        ReadImageFileExecOptions: bytebool;
        BeingDebugged:            bytebool;
        union
          BitField: byte;
          bitpacked record // here is forcing all record fields to be bit-packed
            ImageUsesLargePages:          boolean;
            IsProtectedProcess:           boolean;
            IsImageDynamicallyRelocated:  boolean;
            SkipPatchingUser32Forwarders: boolean;
            IsPackagedProcess:            boolean;
            IsAppContainer:               boolean;
            IsProtectedProcessLight:      boolean;
            IsLongPathAwareProcess:       boolean;
          end;
        end;
        // ...
      end;

Since the unnamed record containing these eight bit fields is marked with the bitpacked modifier, its fields should be bit-packed (this is currently supported by the official dialect). For internal unnamed records and unions, it must be possible to mark them as packed and bitpacked, just as is currently done for the entire record:

Code: Pascal
    type
      TPEB = bitpacked record // here is forcing all fields of the entire record to be bit-packed
        InheritedAddressSpace:    bytebool;
        ReadImageFileExecOptions: bytebool;
        BeingDebugged:            bytebool;
        union
          BitField: byte;
          record
            ImageUsesLargePages:          boolean;
            IsProtectedProcess:           boolean;
            IsImageDynamicallyRelocated:  boolean;
            SkipPatchingUser32Forwarders: boolean;
            IsPackagedProcess:            boolean;
            IsAppContainer:               boolean;
            IsProtectedProcessLight:      boolean;
            IsLongPathAwareProcess:       boolean;
          end;
        end;
        // ...
      end;
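For comparison with the proposals above, the same slice of the PEB can already be expressed in current FPC as a variant record whose second alternative is a bitpacked record. A minimal sketch, assuming a little-endian target (where FPC places the first bitpacked field in the least significant bit); the type names are hypothetical.

```pascal
program PebFlags;
{$mode objfpc}

type
  TPEBFlags = bitpacked record      // eight Booleans, one bit each
    ImageUsesLargePages:          Boolean;
    IsProtectedProcess:           Boolean;
    IsImageDynamicallyRelocated:  Boolean;
    SkipPatchingUser32Forwarders: Boolean;
    IsPackagedProcess:            Boolean;
    IsAppContainer:               Boolean;
    IsProtectedProcessLight:      Boolean;
    IsLongPathAwareProcess:       Boolean;
  end;

  TPEBHead = record
    InheritedAddressSpace:    ByteBool;
    ReadImageFileExecOptions: ByteBool;
    BeingDebugged:            ByteBool;
    case Integer of                 // the variant part plays the role of "union"
      0: (BitField: Byte);
      1: (Flags: TPEBFlags);
  end;

var
  P: TPEBHead;
begin
  WriteLn(SizeOf(TPEBFlags));       // 1
  WriteLn(SizeOf(TPEBHead));        // 4
  P.BitField := $81;                // bits 0 and 7 set
  WriteLn(P.Flags.ImageUsesLargePages, ' ', P.Flags.IsProtectedProcess,
          ' ', P.Flags.IsLongPathAwareProcess);  // TRUE FALSE TRUE
end.
```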

As for additional keywords related to packing and alignment—I really like the choice of keywords like size, bitsize, align, and bitalign. Unlike C, Pascal’s syntax relies on keywords rather than special characters, which makes it much more pleasant and readable. Therefore, if the syntax is to be changed, everyone should be keen to stick with simple keywords. And that is precisely why we need to change the syntax of variant records to readable unions, in order to eliminate special characters (and syntactic bloat) in favor of simple, readable keywords.


As for the suggestion to allow specifying not only the size of a field in bits but also its location—I’m not entirely convinced, though I do see the potential. On the one hand, specifying the position of a bit field after the `at` keyword would eliminate the need to add unnecessary padded fields. But on the other hand, this would be prone to incorrect position specification (e.g., out of scope), and that would require the compiler to check the target locations of the fields and ensure that the user is notified of such errors. So, on the one hand, we eliminate the need for extra fields to enforce padding, but on the other hand, the compiler has more work calculating the structure’s size, the offset of each field, and validating the correctness of the offsets.

And another problem is that currently, field offsets correspond to the order in which the fields are declared, but if we have the option to specify field offsets manually, their order may not match the order in which they are declared. What then? Should this be allowed (just like, for example, non-ascending enum order) and result in a warning at most, or should it cause a compilation error? I’m leaning toward the compiler issuing a warning that the order of field declarations does not match the specified offsets.

By the way, specifying positions manually would require distinguishing between offsets in bytes (for normal fields) and in bits (for bit-packed fields). So we would need two keywords—at for specifying an offset in bytes and atbit for an offset in bits. However, to make it convenient to specify offsets for long structures, the offset itself should be able to be a numeric literal, but also an expression (the offset of field X plus a certain number of bytes/bits). I'm not sure if this could be meaningfully combined with modifiers for specifying field memory alignment (such as suggested align and bitalign).
« Last Edit: May 12, 2026, 04:03:51 pm by flowCRANE »
Lazarus 4.6 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

flowCRANE

  • Hero Member
  • Posts: 986
1. I don't recall ever having needed to simply ensure a record has a copy of another record's elements inside it.  Whenever I've needed that, it was actually critical to have the group of fields named as in "named : TInner;" in your example so that the group of fields could be manipulated as a record instead of individually.  Succinctly, I haven't needed that feature and don't see how it might even be useful.  I would really appreciate an example of where that feature is genuinely useful.

This type of composition is nothing more than the inheritance familiar from objects and classes: one structure absorbs another without imposing an additional namespace/scope. Combined with readable unions, this makes it excellent for composing “fat structures”, such as those commonly used in game development.

I'm just not sure about specifying only the data type (without the field name) in this case. Pascal uses the order of name followed by type for any declaration, so omitting the field name is likely to negatively affect the code's readability/formatting. Perhaps in this case it would be better to use a special keyword, such as unnamed, in place of the name of such an unnamed field? This way, the anonymity of a given field would be apparent, and the keyword would replace the name without disrupting the formatting or readability.

Example:

Code: Pascal
    type
      TPosition = record
        X: Integer;
        Y: Integer;
      end;

    type
      TSize = record
        W: Integer;
        H: Integer;
      end;


    type
      TEntity = record
        Kind:   Integer;    // Entity.Kind         (normal field)
        unnamed TPosition;  // Entity.X, Entity.Y  (structure absorbed as unnamed)
        unnamed TSize;      // Entity.W, Entity.H  (structure absorbed as unnamed)
        Name:   String;     // Entity.Name         (normal field)
      end;

Something like this.
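Until something like this exists, a similar effect (Entity.X instead of Entity.Pos.X) can be approximated in current FPC with an advanced record that keeps named sub-records and forwards their fields through properties. A minimal sketch, assuming FPC 3.x; the Pos and Size field names are hypothetical, and unlike real absorption every forwarded field has to be listed by hand.

```pascal
program AbsorbedFields;
{$mode objfpc}{$H+}
{$ModeSwitch AdvancedRecords}

type
  TPosition = record
    X: Integer;
    Y: Integer;
  end;

  TSize = record
    W: Integer;
    H: Integer;
  end;

  TEntity = record
    Kind: Integer;
    Pos:  TPosition;            // named sub-record, still usable as a whole
    Size: TSize;
    Name: String;
    // Forwarding properties give Entity.X, Entity.W, ... without the extra qualifier
    property X: Integer read Pos.X write Pos.X;
    property Y: Integer read Pos.Y write Pos.Y;
    property W: Integer read Size.W write Size.W;
    property H: Integer read Size.H write Size.H;
  end;

var
  E: TEntity;
begin
  E.X := 10;
  E.Y := 20;
  E.W := 32;
  E.H := 48;
  WriteLn(E.Pos.X, ' ', E.Pos.Y, ' ', E.Size.W, ' ', E.Size.H);  // 10 20 32 48
end.
```

Because Pos and Size stay named fields, the group can still be handled as a whole record when needed.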
« Last Edit: May 12, 2026, 04:21:42 pm by flowCRANE »
Lazarus 4.6 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

creaothceann

  • Sr. Member
  • Posts: 377
As for the suggestion to allow specifying not only the size of a field in bits but also its location—I’m not entirely convinced, though I do see the potential. On the one hand, specifying the position of a bit field after the `at` keyword would eliminate the need to add unnecessary padded fields. But on the other hand, this would be prone to incorrect position specification (e.g., out of scope)

It's just as prone to incorrect layouts as the regular methods, imo. Byte- or bit-perfect memory layouts simply need careful consideration and checking.


And another problem is that currently, field offsets correspond to the order in which the fields are declared, but if we have the option to specify field offsets manually, their order may not match the order in which they are declared. What then? Should this be allowed (just like, for example, non-ascending enum order) and result in a warning at most, or should it cause a compilation error? I’m leaning toward the compiler issuing a warning that the order of field declarations does not match the specified offsets.

I think it would be fine (as long as the compiler can handle it). That way, field declarations could be grouped by size or usage.

Unless you think this would indicate a logic error, or be difficult to read?


By the way, specifying positions manually would require distinguishing between offsets in bytes (for normal fields) and in bits (for bit-packed fields). So we would need two keywords—at for specifying an offset in bytes and atbit for an offset in bits. However, to make it convenient to specify offsets for long structures, the offset itself should be able to be a numeric literal, but also an expression (the offset of field X plus a certain number of bytes/bits). I'm not sure if this could be meaningfully combined with modifiers for specifying field memory alignment (such as suggested align and bitalign).

byte.bit (with bit being 0..7) or field.bit (with bit being 0..BitSizeOf(field) - 1) would work fine, imo.

I've seen it elsewhere, e.g. "The reload also occurs on a 1->0 transition of $2100.7."
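Once a byte.bit reference such as $2100.7 is resolved, it is just a shift and a mask. A minimal sketch of the 1->0 transition test from the quoted sentence, treating the register as a plain Byte sampled twice; BitSet and FallingEdge are hypothetical helper names.

```pascal
program ByteDotBit;
{$mode objfpc}

type
  TBitIndex = 0..7;

// True if bit Bit of Value is 1, i.e. what "Value.Bit" denotes
function BitSet(Value: Byte; Bit: TBitIndex): Boolean;
begin
  Result := (Value shr Bit) and 1 = 1;
end;

// True on a 1->0 transition of the given bit between two samples
function FallingEdge(OldValue, NewValue: Byte; Bit: TBitIndex): Boolean;
begin
  Result := BitSet(OldValue, Bit) and not BitSet(NewValue, Bit);
end;

begin
  WriteLn(BitSet($80, 7));            // TRUE
  WriteLn(FallingEdge($80, $00, 7));  // TRUE
  WriteLn(FallingEdge($00, $80, 7));  // FALSE
end.
```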

flowCRANE

  • Hero Member
  • Posts: 986
It's just as prone to incorrect layouts as the regular methods, imo. Byte- or bit-perfect memory layouts simply need careful consideration and checking.

I’m not opposed to being able to manually define the memory layout of structures—quite the contrary. All I’m suggesting is keeping the syntax as simple as possible (using short, readable keywords) to make declaring fields as easy as possible while retaining full control over the layout.

Quote
Unless you think this would indicate a logic error, or difficult to read?

I think that if the order in which fields are declared does not match the specified offsets, this should be treated as a logical error, and compilation should be terminated upon detection. This way, the compiler will help catch errors in the declared memory layout of the structure, so that they don't slip past the user. In other words, it should not be possible for a given field to exist before any other field declared above it:

Code: Pascal
    type
      TStruct = record
        Field1: Integer at 4;
        Field2: Integer at 0; // Error: invalid offset, offsets must be non-decreasing
      end;

The valid declaration is as follows:

Code: Pascal
    type
      TStruct = record
        Field1: Integer at 0;
        Field2: Integer at 0; // at the same offset as Field1
        Field3: Integer at 4; // after Field2
      end;
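The rule illustrated by these two declarations (offsets may repeat for overlays but must never decrease in declaration order) needs only a single pass over the declared offsets. A minimal sketch of that check; FirstBadField is a hypothetical name, not how FPC would implement it internally.

```pascal
program OffsetOrder;
{$mode objfpc}

// Offsets are the values written after "at", in declaration order.
// Returns the index of the first field placed before its predecessor,
// or -1 if the order is valid (equal offsets are allowed: overlays).
function FirstBadField(const Offsets: array of Integer): Integer;
var
  I: Integer;
begin
  for I := Low(Offsets) + 1 to High(Offsets) do
    if Offsets[I] < Offsets[I - 1] then
      Exit(I);
  Result := -1;
end;

begin
  WriteLn(FirstBadField([4, 0]));     // 1  (Field2 at 0 follows Field1 at 4)
  WriteLn(FirstBadField([0, 0, 4]));  // -1 (the valid declaration)
end.
```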

Similarly with bit fields:

Code: Pascal
    type
      TStruct = record
        Field1: Integer at 0; // bits  0-31
        Field2: Integer at 0; // bits  0-31
        Field3: Integer at 4; // bits 32-63
        union
          Field4: Boolean atbit 0; // at bit 64?
          Field5: Boolean atbit 1; // at bit 65?
          Field6: Boolean atbit 0; // invalid offset
          {..}
        end
      end;

But there's a problem with unnamed union fields here, because they use a local offset. The solution could be to also specify the target offset in bytes:

Code: Pascal
    type
      TStruct = record
        Field1: Integer at 0;
        Field2: Integer at 0;
        Field3: Integer at 4;
        union
          Field4: Boolean at 8 atbit 0;
          Field5: Boolean at 8 atbit 1;
          {..}
        end
      end;

Not bad. It can be also combined with bitsize:

Code: Pascal
    type
      TStruct = record
        Field1: Integer at 0;
        Field2: Integer at 0;
        Field3: Integer at 4;
        union
          Field4: Boolean at 8 atbit 0 bitsize 1;
          Field5: Boolean at 8 atbit 1 bitsize 3;
          {..}
        end
      end;

The compiler would have a lot of work validating this data and calculating the memory layout, but I have to admit that it seems to be very powerful and, on top of that, very easy to read. Interestingly, by manually specifying the field offsets and their sizes in bytes/bits, the packed and bitpacked modifiers are not needed at all.

Not only that, but with at, atbit, and bitsize, even the proposed unions become irrelevant, because each field can be declared at a specific location of our choosing—meaning it can also be at the same offset as other fields (hence, grouping fields into a union is unnecessary) and eliminating the need to declare useless fields that serve as padding:

Code: Pascal
  type
    TStruct = record
      Field1: UInt32  at 0;
      Field2: UInt32  at 0;
      Field3: UInt16  at 6; // 2 bytes padding between Field2 and Field3
      Field4: Boolean at 8 atbit 0 bitsize 1;
      Field5: Boolean at 8 atbit 1 bitsize 3;
    end;
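For comparison, this is roughly how the same layout has to be spelled out in current FPC 3.2.2, which has no at/atbit/bitsize: a variant part for the two overlapping fields, an explicit padding field, and a nested bitpacked record for the bit fields. It is only a sketch; the names TFlags, TStructToday, Pad and OffsetOf are hypothetical, and Field5 uses the subrange 0..7 because a Boolean in a bitpacked record always occupies exactly one bit.

```pascal
program OffsetSketch;
{$mode objfpc}

type
  // Bits of byte 8: Field4 = bit 0, Field5 = bits 1-3 (hypothetical names).
  TFlags = bitpacked record
    Field4: Boolean;  // 1 bit
    Field5: 0..7;     // 3 bits
  end;                // 4 bits, rounded up to 1 byte

  TStructToday = packed record
    case Byte of
      0: (Field1: UInt32);              // bytes 0-3
      1: (Field2: UInt32;               // bytes 0-3, overlaps Field1
          Pad:    array[0..1] of Byte;  // bytes 4-5, explicit padding
          Field3: UInt16;               // bytes 6-7
          Flags:  TFlags);              // byte 8
  end;

var
  S: TStructToday;

// Byte offset of a field of S relative to the start of S.
function OffsetOf(P: Pointer): PtrUInt;
begin
  Result := PtrUInt(P) - PtrUInt(@S);
end;

begin
  WriteLn('SizeOf = ', SizeOf(TStructToday));  // 9
  WriteLn('Field1 at ', OffsetOf(@S.Field1));  // 0
  WriteLn('Field2 at ', OffsetOf(@S.Field2));  // 0
  WriteLn('Field3 at ', OffsetOf(@S.Field3));  // 6
  WriteLn('Flags  at ', OffsetOf(@S.Flags));   // 8
end.
```

The variant part and the Pad field are exactly the scaffolding that per-field offsets would make unnecessary.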

Man, that would be really powerful. 8)
« Last Edit: May 12, 2026, 06:28:08 pm by flowCRANE »
Lazarus 4.6 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

440bx

  • Hero Member
  • *****
  • Posts: 6542
I just want to point out that in modern language design, the philosophy is to give the compiler as much information as possible because this enables the compiler to ensure the language construct is used, in the code, as the programmer intended.

That's the reason languages such as Ada, Rust, Eiffel, Zig and others are semantically rich, and the reason C and C++ are semantically bankrupt.

It is definitely true that using a semantically rich language requires more work from the programmer, because the programmer has to include more information for the construct to be acceptable to the compiler. The benefit? The compiler can tell you when the usage is "questionable" (emit a warning) or simply unacceptable (emit an error) because it conflicts with the construct.

It also means the compiler has to work harder and, as far as I know, programming languages exist to spare the human from having to do work the computer can do.  They don't exist to spare the CPU from working hard (I want the CPU to sweat bits... I want to get my money's worth.)

Pascal was likely one of the first languages that was semantically rich (relatively speaking, when it was created), which is what allows it to be strongly typed. These days it is far, very far, behind modern languages, yet it is one of the languages that explicitly opened the door to semantic affluence. IOW, adding semantic richness to it is the most Pascal thing that can be added to Pascal (just ask Ada, which in turn has influenced Rust, Go, Eiffel, Zig and others.)

The point of this post? Every addition, change or modification to the language should be loaded with as much semantic information as possible, so that the compiler can help the programmer avoid unintended mistakes as much as possible. Add more Pascal to Pascal. :) The balancing act in pursuing that goal is to give the compiler explicit defaults that, in the common cases, are intuitive, predictable and clearly stated in the construct's definition, so the programmer is spared from having to be explicit all the time. As mathematicians are fond of saying, the defaults don't cause any loss of generality.

As is often stated in open source code, the above is provided with the intent of offering something useful.
« Last Edit: May 12, 2026, 06:08:11 pm by 440bx »
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Fibonacci

  • Hero Member
  • *****
  • Posts: 1002
  • Behold, I bring salvation - FPC Unleashed
Status update - today's work + a few clarifications



Quote
So the record from your example (in this form) takes up 4 bytes and has bit-packed union fields (packed into one byte).

Confirmed - TPEB as written takes 4 bytes: 3 bytebools (3 bytes) + the union (1 byte, holding the 8 single-bit flags).



What happens with a 9th boolean bitsize 1 field?

The union grows. 9 single-bit booleans = 9 bits, which needs 2 bytes to fit. The union becomes 2 bytes, and the whole TPEB record grows from 4 to 5 bytes.
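The growth from 4 to 5 bytes can be reproduced in current FPC 3.2.2 with a variant part standing in for the union (in standard FPC a variant part must be the last thing in a record). A minimal sketch; TFlags8, TFlags9, TPEB8 and TPEB9 are hypothetical names, not the real PEB declaration:

```pascal
program UnionGrowth;
{$mode objfpc}

type
  // Eight single-bit flags: 8 bits -> 1 byte.
  TFlags8 = bitpacked record
    F0, F1, F2, F3, F4, F5, F6, F7: Boolean;
  end;

  // Nine single-bit flags: 9 bits -> rounded up to 2 bytes.
  TFlags9 = bitpacked record
    F0, F1, F2, F3, F4, F5, F6, F7, F8: Boolean;
  end;

  // Three ByteBools, then a "union" of a raw byte and the flags.
  // The variant part is as large as its largest variant.
  TPEB8 = packed record
    B0, B1, B2: ByteBool;
    case Byte of
      0: (BitField: Byte);
      1: (Flags: TFlags8);
  end;

  TPEB9 = packed record
    B0, B1, B2: ByteBool;
    case Byte of
      0: (BitField: Byte);
      1: (Flags: TFlags9);
  end;

begin
  WriteLn('TFlags8: ', SizeOf(TFlags8), '  TPEB8: ', SizeOf(TPEB8));  // 1 and 4
  WriteLn('TFlags9: ', SizeOf(TFlags9), '  TPEB9: ', SizeOf(TPEB9));  // 2 and 5
end.
```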

Some of you might prefer a compile error here. @440bx's container proposal solves exactly this - declare an explicit budget at the top, and the compiler errors if members overflow. Clean rule, real safety. The reason I'm not going that way: cost. The container construct adds container / is / at keywords and Ada-style verbosity, and the bit-field syntax becomes N bit at Z - which reads like a foreign mini-language rather than a Pascal field declaration.

Better path: keep union as-is, and let it take an optional size cap. Three forms:

Code: Pascal
  union size 1                  // this union must fit in 1 byte
    BitField: byte;
    bitpacked record
      // 8 single-bit flags - fits in 1 byte, OK
    end;
  end;

  union type byte               // shorthand for "size sizeof(byte)" = 1 byte
    BitField: byte;
    bitpacked record
      // ...
    end;
  end;

  union size sizeof(TSomeRec)   // arbitrary compile-time-constant expression
    BitField: byte;
    bitpacked record
      // ...
    end;
  end;

union size N is the general form: N can be a literal, a sizeof() expression, a named constant, or any compile-time-constant expression. union type T is shorthand for the common case size sizeof(T). Equivalent pairs: size 1 = type byte, size 2 = type word, size 4 = type dword, size 8 = type qword. Use type T when the cap is just a type's size; use size N for everything else (odd widths like size 3, sizeof(...) of a record / array / nested type, named consts, etc.).

If any variant of the union exceeds the cap, the compiler errors. Add a 9th bitsize 1 flag to a union size 1 block -> compile error: "union body 9 bits / 2 bytes exceeds declared size 1 byte". Same safety as @440bx's container, without the Ada-style scaffolding.

Without a cap (union alone, as before), behaviour is unchanged - the union sizes itself to fit whatever is inside. Backward compatible.
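Until something like union size N exists, a similar compile-time guard can be approximated in current FPC with the classic array-range trick: the declaration only compiles while the condition holds. A sketch under that assumption (TFlags and TFlagsFitInOneByte are hypothetical names); unlike the proposed cap, the compiler reports a generic range error rather than describing the overflow:

```pascal
program SizeCapSketch;
{$mode objfpc}

type
  TFlags = bitpacked record
    F0, F1, F2, F3, F4, F5, F6, F7: Boolean;
    // Adding a ninth Boolean here makes SizeOf(TFlags) = 2,
    // and the declaration of TFlagsFitInOneByte below stops compiling.
  end;

  // Static assertion: the range is 0..0 while the condition holds,
  // and 0..-1 (an invalid range, hence a compile error) when it fails.
  TFlagsFitInOneByte = array[0..Ord(SizeOf(TFlags) <= 1) - 1] of Byte;

begin
  WriteLn('SizeOf(TFlags) = ', SizeOf(TFlags));  // 1
end.
```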



Quote
Since the unnamed record containing these eight bit fields is marked with the bitpacked modifier, its fields should be bit-packed (this is currently supported by the official dialect).

Yes, confirmed - if the inner record is declared bitpacked, the per-field bitsize 1 isn't needed. bitpacked forces every field to its minimum width automatically:

Code: Pascal
  union
    BitField: byte;
    bitpacked record
      ImageUsesLargePages:          boolean;
      IsProtectedProcess:           boolean;
      // ... 6 more, no bitsize 1 needed
    end;
  end;

bitsize N is the explicit form for widths other than 1. Booleans don't really need it, since they collapse to 1 bit when packed, but a 3-bit priority field or a 5-bit counter does.
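In current FPC the width of a field inside a bitpacked record is derived from its type's range, so a subrange already acts as an implicit bitsize N. A small sketch with hypothetical names (TStatus, Priority, Counter):

```pascal
program BitWidthSketch;
{$mode objfpc}

type
  // The field width follows from the subrange: 0..7 needs 3 bits,
  // 0..31 needs 5 bits, so both fit together in one byte.
  TStatus = bitpacked record
    Priority: 0..7;   // 3 bits
    Counter:  0..31;  // 5 bits
  end;                // 3 + 5 = 8 bits -> 1 byte

var
  S: TStatus;

begin
  S.Priority := 5;
  S.Counter  := 17;
  WriteLn('SizeOf(TStatus) = ', SizeOf(TStatus));                 // 1
  WriteLn('Priority = ', S.Priority, ', Counter = ', S.Counter);  // 5, 17
end.
```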



Anonymous embed syntax - actual implementation

Quick correction on what the parser actually accepts right now. To embed an existing record type:

Code: Pascal
  TOuter = record
    embed TInner;     // anonymous embed - requires the embed keyword
    named: TInner;    // named subfield - standard Pascal, no keyword
  end;

The bare TInner; form (without embed) - which I sketched in my earlier post - isn't what the parser accepts in the current cut. The disambiguation rule needs the embed keyword for the unnamed form, otherwise it collides with "incomplete field declaration".

I argued against new keywords on the composition side earlier - this is one concession. embed earns it because the parser genuinely needs the marker; one keyword vs three for the same disambiguation isn't a bad trade.



align example - confirming behaviour, @440bx feedback wanted

For:

Code: Pascal
  TRec = record
    a: byte align 8;
    b: byte align 8;
    c: byte;
  end;

The layout:
  • a at offset 0 (already 8-aligned)
  • b at offset 8 (next 8-aligned boundary after a)
  • c at offset 9 (immediately after b, no special alignment)
  • sizeof(TRec) = 16, padded up to a multiple of the largest alignment in the record (8). With packed record there is no trailing pad, so sizeof = 10.
The trailing padding to 16 is the standard C-struct convention - it ensures arrays of TRec keep each element 8-aligned.
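The rule described above (align each field's offset up to its requested alignment, then round the total up to the largest alignment unless the record is packed) comes down to a few lines of arithmetic. This is a sketch of the rule as stated in this post, not of how the compiler implements it; TField, AlignUp and Layout are hypothetical names.

```pascal
program AlignLayoutSketch;
{$mode objfpc}

type
  TField = record
    Size, Align: PtrUInt;  // Align must be a power of two
  end;

const
  // TRec: a: byte align 8; b: byte align 8; c: byte (natural alignment 1)
  TRecFields: array[0..2] of TField = (
    (Size: 1; Align: 8),
    (Size: 1; Align: 8),
    (Size: 1; Align: 1));

// Round Value up to the next multiple of Align (Align is a power of two).
function AlignUp(Value, Align: PtrUInt): PtrUInt;
begin
  Result := (Value + Align - 1) and not (Align - 1);
end;

// Place each field at the next offset that satisfies its alignment.
// When Pad is True, round the total size up to the largest alignment.
function Layout(const Fields: array of TField; Pad: Boolean;
  var Offsets: array of PtrUInt): PtrUInt;
var
  I: Integer;
  Ofs, MaxAlign: PtrUInt;
begin
  Ofs := 0;
  MaxAlign := 1;
  for I := 0 to High(Fields) do
  begin
    Ofs := AlignUp(Ofs, Fields[I].Align);
    Offsets[I] := Ofs;
    Inc(Ofs, Fields[I].Size);
    if Fields[I].Align > MaxAlign then
      MaxAlign := Fields[I].Align;
  end;
  if Pad then
    Result := AlignUp(Ofs, MaxAlign)
  else
    Result := Ofs;
end;

var
  Offs: array[0..2] of PtrUInt;
  RecSize: PtrUInt;

begin
  RecSize := Layout(TRecFields, True, Offs);
  WriteLn('a=', Offs[0], ' b=', Offs[1], ' c=', Offs[2], ' size=', RecSize);
  // prints: a=0 b=8 c=9 size=16
  RecSize := Layout(TRecFields, False, Offs);
  WriteLn('packed size=', RecSize);
  // prints: packed size=10
end.
```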

A couple more cases to confirm the rule:

Code: Pascal
  TTest1 = record
    a: byte align 16;
    b: byte align 8;
    c: byte;
  end;

Same memory layout as the first one: a at 0, b at 8, c at 9. Why? a's requested align 16 is already satisfied at offset 0, and b only needs align 8, so b still lands at offset 8. The record's overall alignment becomes 16 (the max field alignment), which means sizeof must be a multiple of 16. The last used byte is c at offset 9, so the data occupies 10 bytes (offsets 0-9). The smallest multiple of 16 that is >= 10 is 16, so sizeof = 16.

(Why the rounding? So arrays of TTest1 keep every element 16-aligned - if sizeof were 10, the second element would land at offset 10, which isn't 16-aligned.)
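The array argument can be checked numerically: with a 16-byte record alignment, element i of an array starts at i * sizeof. A sketch with hypothetical names (RecAlign, ElementAligned, ShowStride):

```pascal
program StrideSketch;
{$mode objfpc}

const
  RecAlign = 16;  // TTest1's overall alignment (its largest field alignment)

// True if element Index of an array starts on a RecAlign boundary,
// given that consecutive elements are ElemSize bytes apart.
function ElementAligned(ElemSize, Index: PtrUInt): Boolean;
begin
  Result := (Index * ElemSize) mod RecAlign = 0;
end;

// Print the start offsets of the first four elements.
procedure ShowStride(ElemSize: PtrUInt);
var
  I: PtrUInt;
begin
  Write('sizeof=', ElemSize, ':');
  for I := 0 to 3 do
  begin
    Write(' ', I * ElemSize);
    if not ElementAligned(ElemSize, I) then
      Write('(misaligned)');
  end;
  WriteLn;
end;

begin
  ShowStride(10);  // prints: sizeof=10: 0 10(misaligned) 20(misaligned) 30(misaligned)
  ShowStride(16);  // prints: sizeof=16: 0 16 32 48
end.
```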

Code: Pascal
  TTest2 = record
    a: byte align 16;
    b: byte align 32;
    c: byte;
  end;

Layout:
  • a at offset 0
  • b at offset 32 (next 32-aligned boundary after a)
  • c at offset 33
  • sizeof = 64: the record's overall alignment is 32 (the max field alignment). The last used byte is c at offset 33, so the data occupies 34 bytes (offsets 0-33). The smallest multiple of 32 that is >= 34 is 64.
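For comparison, the same offsets and size can be forced in current FPC 3.2.2 only with explicit padding in a packed record. A sketch with hypothetical padding fields Pad1 and Pad2; it reproduces the offsets and sizeof = 64, but it does not make variables of this type start on a 32-byte boundary, which is what the record's 32-byte alignment would add:

```pascal
program TTest2Sketch;
{$mode objfpc}

type
  // Current-FPC emulation of TTest2's proposed layout using explicit padding.
  TTest2Emulated = packed record
    a:    Byte;                   // offset 0
    Pad1: array[1..31] of Byte;   // offsets 1-31
    b:    Byte;                   // offset 32 (next 32-aligned boundary)
    c:    Byte;                   // offset 33
    Pad2: array[34..63] of Byte;  // offsets 34-63, tail padding up to 64
  end;

var
  R: TTest2Emulated;

begin
  WriteLn('b at ', PtrUInt(@R.b) - PtrUInt(@R));  // 32
  WriteLn('c at ', PtrUInt(@R.c) - PtrUInt(@R));  // 33
  WriteLn('SizeOf = ', SizeOf(TTest2Emulated));   // 64
end.
```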
@440bx - does this match what you'd expect? Before it's locked in, I want to confirm the convention.
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

Thausand

  • Hero Member
  • *****
  • Posts: 560
Quote
That said, per-record / per-field align still beats it for granularity. {$codealign recordmin=64} affects every record in scope at the unit level. Per-field align N lets you align one specific field differently - the rest of the record is unaffected, other records aren't touched at all.
OK, I understand that. Thank you for elaborating.

Then a separate question: when a record alignment is set and one field is given a different alignment, which one takes precedence? For example, must the field alignment always fit within the record alignment, or can it be larger for one field and override the record alignment? The latter slightly contradicts the existing documentation.

I think this may be important for good documentation that explains how the align rules are applied and what is restricted (or not restricted) when both align options are used.

Quote
So the global mental model stays intact - you only have to think about per-field alignment when you're staring at an align N on a specific line. If anything, it reduces confusion for the cases where today you'd have to fight {$packrecords}/{$codealign} juggling at the unit level just to align one field differently. With per-field align you say it on the field itself - the rest of the record is unaffected.
Yes, I agree that implemented that way it is clearer. But my concern is that, in theory, this conflicts with the existing documentation for (code)align recordfield.

But I also understand this is the Unleashed version, so it does not have to match the documentation 100% (your align implementation is a new feature). My only concern is to avoid confusing documentation.


Another possible source of confusion is the bit-field feature. FPC is currently confusing when bit fields are used with a different endianness (the bit fields get swapped), and I wonder how this works in your implementation. Could you give a short explanation so I can understand it better?
A docile goblin always follow HERMES.md

flowCRANE

  • Hero Member
  • *****
  • Posts: 986
After working on my game's code today, I have another suggestion, this time regarding loops. Free Pascal supports four types of loops, and each requires a condition to keep the iteration going. Unfortunately, this language does not have a loop specifically designed for infinite iteration, which forces you to use one of these basic loops and specify a redundant condition (such as while True do or repeat until False).

So I propose adding a new type of loop, designed specifically for infinite loops:

Code: Pascal
  loop
    // iterating forever
  end;

This is the syntax I use myself, and it’s modeled after loops in other languages—Rust has loop {}, and Ruby has loop do end (which is syntactically closer to Pascal). Of course, to get this syntax in current Free Pascal, I had to use a macro and add a syntax highlighting rule (markup and matches) to the IDE settings so that the word loop is highlighted just like other keywords. The macro looks like this:

Code: Pascal
  {$MACRO ON}  // required, otherwise FPC does not expand macro values
  {$DEFINE loop := while True do begin}

Unfortunately, the problem with macros is that if this loop macro is on the first line of a function's body, the IDE's jump to function body feature jumps not to the body of the function, but to the macro's declaration (which is super annoying).

So, the complete set of loops would look like this:
  • for to <step> do - known number of iterations
  • for in do - known number of iterations
  • while do - variable number of iterations
  • repeat until - variable number of iterations
  • loop end - infinite spinning
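A self-contained sketch of the macro workaround in current FPC 3.2.2. The {$MACRO ON} switch is needed because FPC does not expand macro values otherwise; the Frame counter and the exit condition are hypothetical stand-ins for a real game loop, and Break is how such a loop is left:

```pascal
program LoopMacroSketch;
{$mode objfpc}
{$MACRO ON}  // FPC only expands macro values when macro support is on
{$DEFINE loop := while True do begin}

var
  Frame: Integer = 0;

begin
  loop
    // Stand-in for one game-loop iteration; a real loop would
    // poll events, update and render here.
    Inc(Frame);
    if Frame = 3 then
      Break;  // an infinite loop is left via Break, Exit, Halt or an exception
  end;
  WriteLn('Frames run: ', Frame);  // 3
end.
```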
« Last Edit: May 12, 2026, 08:45:28 pm by flowCRANE »

 
