Status update - today's work + a few clarifications
So the record from your example (in this form) takes up 4 bytes and has bit-packed union fields (packed to one byte).
Confirmed - TPEB as written takes 4 bytes: 3 bytebools (3 bytes) + the union (1 byte, holding the 8 single-bit flags).
What happens with a 9th boolean bitsize 1 field?
The union grows. 9 single-bit booleans = 9 bits, which needs 2 bytes to fit. The union becomes 2 bytes, and the whole TPEB record grows from 4 to 5 bytes.
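The arithmetic above can be checked with a small Python model. This is only a sketch of the sizing rule as described in this post, not the compiler's code; the function names are made up for illustration, and it assumes the three bytebools and the union sit back to back with no padding.

```python
# Hypothetical model of the TPEB sizes discussed above (not compiler code).

def bytes_for_bits(bits: int) -> int:
    """Smallest whole number of bytes that can hold `bits` bits."""
    return (bits + 7) // 8

def union_size(variant_sizes) -> int:
    """An uncapped union is as large as its largest variant."""
    return max(variant_sizes)

def tpeb_size(flag_count: int) -> int:
    bytebools = 3                               # three 1-byte bytebool fields
    bitfield_variant = 1                        # BitField: byte
    flags_variant = bytes_for_bits(flag_count)  # bitpacked record of 1-bit flags
    return bytebools + union_size([bitfield_variant, flags_variant])

print(tpeb_size(8))  # 4
print(tpeb_size(9))  # 5
```

The 9th flag pushes the flags variant from 1 byte to 2, and since the union takes the size of its largest variant, the record grows by exactly one byte.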
Some of you might prefer a compile error here.
@440bx's container proposal solves exactly this - declare an explicit budget at the top, and the compiler errors if members overflow. Clean rule, real safety. The reason I'm not going that way: cost. The container construct adds container / is / at keywords and Ada-style verbosity, and the bit-field syntax becomes N bit at Z - which reads like a foreign mini-language rather than a Pascal field declaration.
Better path: keep union as-is, and let it take an optional size cap. Three forms:
union size 1                  // this union must fit in 1 byte
  BitField: byte;
  bitpacked record
    // 8 single-bit flags - fits in 1 byte, OK
  end;
end;

union type byte               // shorthand for "size sizeof(byte)" = 1 byte
  BitField: byte;
  bitpacked record
    // ...
  end;
end;

union size sizeof(TSomeRec)   // arbitrary compile-time-constant expression
  BitField: byte;
  bitpacked record
    // ...
  end;
end;
union size N is the general form - N can be a literal, a sizeof() expression, a named constant, or any compile-time-constant expression.
union type T is shorthand for the common case size sizeof(T). Equivalent pairs: size 1 ≡ type byte, size 2 ≡ type word, size 4 ≡ type dword, size 8 ≡ type qword. Use type T when the cap is just a type's size; use size N for everything else (odd widths like size 3, sizeof(...) of a record / array / nested type, named consts, etc.).
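To make the equivalence concrete, here is a minimal Python sketch of how the two forms could resolve to the same byte cap. The table and the function name are assumptions for illustration only; the sizes are the ones listed in the equivalent pairs above.

```python
# Sizes of the four types named in the equivalent pairs above.
TYPE_SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}

def cap_in_bytes(form: str, arg) -> int:
    """Resolve a union cap to a byte count.

    form == "size": arg is already a byte count (any constant expression,
                    evaluated beforehand)
    form == "type": arg is a type name, and the cap is sizeof(type)
    """
    if form == "size":
        return int(arg)
    if form == "type":
        return TYPE_SIZES[arg]
    raise ValueError(f"unknown cap form: {form!r}")

# "type word" and "size 2" describe the same cap.
print(cap_in_bytes("type", "word") == cap_in_bytes("size", 2))  # True
# Odd widths only exist in the "size" form.
print(cap_in_bytes("size", 3))  # 3
```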
If any variant of the union exceeds the cap, the compiler errors. Add a 9th bitsize 1 flag to a union size 1 block -> compile error: "union body 9 bits / 2 bytes exceeds declared size 1 byte". Same safety as @440bx's container, without the Ada-style scaffolding.
Without a cap (union alone, as before), behaviour is unchanged - the union sizes itself to fit whatever is inside. Backward compatible.
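The cap rule can be modelled roughly as below. This is a sketch under stated assumptions, not the actual checker: each variant is described only by its width in bits, the names are hypothetical, and because this post does not say whether a capped union is padded up to its cap, the sketch simply returns the body size.

```python
class UnionOverflow(Exception):
    """Stands in for the compile error described above."""

def check_union(variant_bits, cap_bytes=None) -> int:
    """Return the union body size in bytes; raise if it exceeds the cap."""
    body_bits = max(variant_bits)          # the largest variant decides the size
    body_bytes = (body_bits + 7) // 8      # round up to whole bytes
    if cap_bytes is not None and body_bytes > cap_bytes:
        unit = "byte" if cap_bytes == 1 else "bytes"
        raise UnionOverflow(
            f"union body {body_bits} bits / {body_bytes} bytes "
            f"exceeds declared size {cap_bytes} {unit}"
        )
    return body_bytes                      # assumption: no padding up to the cap

print(check_union([8, 8], cap_bytes=1))    # 1 (BitField: byte + 8 flags)
print(check_union([8, 9]))                 # 2 (no cap, the union just grows)
try:
    check_union([8, 9], cap_bytes=1)       # 9th flag under "union size 1"
except UnionOverflow as err:
    print(err)
```

The uncapped call shows the backward-compatible behaviour, while the capped call with nine flags raises the overflow error.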
Since the unnamed record containing these eight bit fields is marked with the bitpacked modifier, its fields should be bit-packed (this is currently supported by the official dialect).
Yes, confirmed - if the inner record is declared bitpacked, the per-field bitsize 1 isn't needed. bitpacked forces every field to its minimum width automatically:
union
  BitField: byte;
  bitpacked record
    ImageUsesLargePages: boolean;
    IsProtectedProcess: boolean;
    // ... 6 more, no bitsize 1 needed
  end;
end;
bitsize N is the explicit form for non-1 widths - booleans don't really need it since they collapse to 1 bit when packed, but a 3-bit priority field or a 5-bit counter does.
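A rough Python sketch of the width rule, assuming that "minimum width" means the fewest bits that can hold the type's largest value and that an explicit bitsize overrides it. The helper names are hypothetical, and the real compiler may derive widths differently.

```python
def min_bits(max_value: int) -> int:
    """Bits needed to store every value in 0..max_value (at least 1)."""
    return max(1, max_value.bit_length())

def field_width(type_max: int, bitsize=None) -> int:
    """Width of a field inside a bitpacked record.

    type_max is the largest value of the declared type (1 for boolean,
    255 for byte); an explicit bitsize, when given, wins.
    """
    return bitsize if bitsize is not None else min_bits(type_max)

print(field_width(1))               # 1 (boolean collapses to one bit)
print(field_width(255))             # 8 (a plain byte stays 8 bits wide)
print(field_width(255, bitsize=3))  # 3 (3-bit priority field)
print(field_width(255, bitsize=5))  # 5 (5-bit counter)
```

This is why the booleans need no annotation while the narrower-than-type fields do: the type alone cannot tell the compiler that only 3 or 5 bits are wanted.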
Anonymous embed syntax - actual implementation
Quick correction on what the parser actually accepts right now. To embed an existing record type:
TOuter = record
  embed TInner;    // anonymous embed - requires the embed keyword
  named: TInner;   // named subfield - standard Pascal, no keyword
end;
The bare TInner; form (without embed) - which I sketched in my earlier post - isn't what the parser accepts in the current cut. The disambiguation rule needs the embed keyword for the unnamed form, otherwise it collides with "incomplete field declaration".
I argued against new keywords on the composition side earlier - this is one concession. embed earns it because the parser genuinely needs the marker; one keyword vs three for the same disambiguation isn't a bad trade.
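The disambiguation can be illustrated with a toy classifier in Python. It is only a sketch of the rule described above, not the real parser: it handles a single simple declaration per line, treats identifiers case-sensitively, and the function name and error texts are invented for the example.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

def classify_field(decl: str) -> str:
    """Classify one field declaration inside a record body."""
    text = decl.strip()
    m = re.fullmatch(rf"embed\s+({IDENT})\s*;", text)
    if m:
        return f"anonymous embed of {m.group(1)}"
    m = re.fullmatch(rf"({IDENT})\s*:\s*({IDENT})\s*;", text)
    if m:
        return f"named field {m.group(1)} of type {m.group(2)}"
    if re.fullmatch(rf"{IDENT}\s*;", text):
        # A lone identifier looks like a field whose ": Type" part is missing.
        raise ValueError("incomplete field declaration")
    raise ValueError("unrecognised declaration")

print(classify_field("embed TInner;"))   # anonymous embed of TInner
print(classify_field("named: TInner;"))  # named field named of type TInner
```

Passing the bare `TInner;` form raises the "incomplete field declaration" error, which is the collision the keyword avoids.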
align example - confirming behaviour, @440bx feedback wanted
For:
TRec = record
  a: byte align 8;
  b: byte align 8;
  c: byte;
end;
The layout:
- a at offset 0 (already 8-aligned)
- b at offset 8 (next 8-aligned boundary after a)
- c at offset 9 (immediately after b, no special alignment)
- sizeof(TRec) = 16 - padded up to a multiple of the largest alignment in the record (8). With packed record, no trailing pad -> sizeof = 10.
The trailing padding to 16 is the standard C-struct convention - it ensures arrays of TRec keep each element 8-aligned.
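The layout rule above can be written down as a short Python model. It is a sketch of the convention as described here, not the compiler's layout code: fields are given as hypothetical (name, size, align) tuples, and the packed case is assumed to keep per-field alignment while dropping only the trailing pad, which is what sizeof = 10 implies.

```python
def round_up(value: int, alignment: int) -> int:
    """Smallest multiple of `alignment` that is >= value."""
    return (value + alignment - 1) // alignment * alignment

def layout(fields, packed=False):
    """fields: list of (name, size_in_bytes, align). Returns (offsets, sizeof)."""
    offsets = {}
    cursor = 0       # first free byte
    max_align = 1    # the record's overall alignment
    for name, size, align in fields:
        cursor = round_up(cursor, align)   # move to the field's boundary
        offsets[name] = cursor
        cursor += size
        max_align = max(max_align, align)
    # Unpacked records pad the tail up to the record's own alignment.
    total = cursor if packed else round_up(cursor, max_align)
    return offsets, total

TREC = [("a", 1, 8), ("b", 1, 8), ("c", 1, 1)]
print(layout(TREC))               # ({'a': 0, 'b': 8, 'c': 9}, 16)
print(layout(TREC, packed=True))  # ({'a': 0, 'b': 8, 'c': 9}, 10)
```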
A couple more cases to confirm the rule:
TTest1 = record
  a: byte align 16;
  b: byte align 8;
  c: byte;
end;
Same memory layout as the first one - a at 0, b at 8, c at 9. Why? a's requested align 16 is already satisfied at offset 0, and b only needs align 8, so b still lands at offset 8. The record's overall alignment becomes 16 (the max field alignment), which means sizeof must be a multiple of 16. The last used byte is c at offset 9, so the data spans 10 bytes. The smallest multiple of 16 that is >= 10 is 16, so sizeof = 16.
(Why the rounding? So arrays of TTest1 keep every element 16-aligned - if sizeof were 10, the second element would land at offset 10, which isn't 16-aligned.)
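The array argument is easy to check numerically. In this small Python sketch (helper names are made up), element i of an array starts at i * sizeof, assuming the array itself starts on a 16-aligned address.

```python
def element_offsets(sizeof: int, count: int):
    """Start offset of each element in an array of `count` records."""
    return [i * sizeof for i in range(count)]

def all_aligned(offsets, alignment: int) -> bool:
    """True if every offset is a multiple of `alignment`."""
    return all(offset % alignment == 0 for offset in offsets)

print(element_offsets(16, 4))                   # [0, 16, 32, 48]
print(all_aligned(element_offsets(16, 4), 16))  # True
print(element_offsets(10, 4))                   # [0, 10, 20, 30]
print(all_aligned(element_offsets(10, 4), 16))  # False
```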
TTest2 = record
  a: byte align 16;
  b: byte align 32;
  c: byte;
end;
Layout:
- a at offset 0
- b at offset 32 (next 32-aligned boundary after a)
- c at offset 33
- sizeof = 64 - record's overall alignment is 32 (the max field alignment). The last used byte is c at offset 33, so the data spans 34 bytes. The smallest multiple of 32 that is >= 34 is 64.
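The two "next boundary" steps for TTest2 come down to the same round-up formula. A minimal Python check, with a hypothetical helper name and assuming the convention exactly as described above:

```python
def round_up(value: int, alignment: int) -> int:
    """Smallest multiple of `alignment` that is >= value."""
    return (value + alignment - 1) // alignment * alignment

# TTest2: a is 1 byte at offset 0, so the first free byte is offset 1.
b_offset = round_up(1, 32)            # next 32-aligned boundary after a
c_offset = b_offset + 1               # c follows b directly
data_end = c_offset + 1               # bytes occupied by a, padding, b and c
record_align = max(16, 32, 1)         # the max field alignment
sizeof = round_up(data_end, record_align)

print(b_offset, c_offset, data_end, sizeof)  # 32 33 34 64
```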
@440bx - does this match what you'd expect? Before it's locked in, I want to confirm the convention.