
Topic: Strict Aliasing Rule

nixbody

  • New Member
  • Posts: 12
Strict Aliasing Rule
« on: June 16, 2025, 08:52:25 pm »
I would like to ask whether it is legal to access a block of memory via pointers of different types. It seems to be common practice in the FreePascal code I've seen so far, including the FCL, to declare storage as one type and then intentionally access it through a pointer of a different type. E.g.
Code: Pascal
  TFPMemoryImage.FData: PFPIntegerArray
which is treated as
Code: Pascal
  PFPColorArray
for non-paletted images. That would be undefined behaviour in C (at least since C99). I would be happy if such casts were legal in FreePascal but, since they're illegal in C, I would like to hear from the FPC developers whether or not they're legal. Thank you.

marcov

  • Administrator
  • Hero Member
  • Posts: 12314
  • FPC developer.
Re: Strict Aliasing Rule
« Reply #1 on: June 16, 2025, 09:19:34 pm »
It's simply accepted that such usage is implementation-defined/architecture-dependent, which is probably also the idea behind the defined/undefined distinction in C.
« Last Edit: June 16, 2025, 09:25:30 pm by marcov »

nixbody

  • New Member
  • Posts: 12
Re: Strict Aliasing Rule
« Reply #2 on: June 16, 2025, 09:41:38 pm »
The idea behind the strict aliasing rule in C is that it allows aggressive optimisations: an optimising compiler can safely assume that two pointers of different types never alias each other. Because GCC and other major compilers really do use this rule to eliminate some loads/stores, it is very dangerous to break it at high optimisation levels. So what's the situation with FPC?
« Last Edit: June 17, 2025, 03:57:49 pm by nixbody »

Khrys

  • Sr. Member
  • Posts: 256
Re: Strict Aliasing Rule
« Reply #3 on: June 17, 2025, 10:30:00 am »
I don't think that the ISO Pascal Standard mentions aliasing at all (unlike the ISO C Standard), so I'd agree with @marcov that it's implementation-defined.

Regarding FPC's implementation, it seems that currently no such aggressive optimizations are performed:

Code: Pascal
  function Foo(F: PSingle; I: PInteger): Integer;
  begin
    I^ := 1;
    F^ := 0.0;
    Result := I^;
  end;

With -O4 and constprop, deadstore, deadvalues, dfa, regvar explicitly turned on via {$optimization}, the above code compiles to this:

Code: ASM
  mov   dword ptr [rsi], 1
  xorps xmm0, xmm0
  movss dword ptr [rdi], xmm0
  mov   eax, dword ptr [rsi]
  ret

For comparison, GCC optimizes the equivalent C function to eliminate the reload of I via [rsi], under the assumption that F and I may never refer to the same object:

Code: ASM
  mov dword ptr [rsi], 1
  mov eax, 1
  mov dword ptr [rdi], 0
  ret
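The C source behind that listing isn't shown; a plausible reconstruction, assuming the same parameter order as the Pascal Foo (so f arrives in rdi and i in rsi under the x86-64 System V ABI), is sketched below. The name foo is hypothetical. Called with pointers to two distinct objects it is well-defined and returns 1; passing pointers to the same storage is exactly the case that strict aliasing makes undefined.

```c
/* Hypothetical C counterpart of the Pascal Foo above. */
int foo(float *f, int *i)
{
    *i = 1;      /* store through an int pointer */
    *f = 0.0f;   /* store through a float pointer: assumed not to touch *i */
    return *i;   /* with strict aliasing, GCC may fold this reload to 1 */
}
```

Comparing `gcc -O2 -S` with `gcc -O2 -fno-strict-aliasing -S` on this function is a quick way to see whether the reload of `*i` survives.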

With the amount of code in the FCL (and elsewhere) that relies on the absence of strict aliasing rules, I don't expect this to change soon (or at all).
Apart from that, I personally don't think that the existence of the absolute variable modifier is compatible with the notion of strict aliasing being part of FPC's memory model.

MarkMLl

  • Hero Member
  • Posts: 8453
Re: Strict Aliasing Rule
« Reply #4 on: June 17, 2025, 11:19:47 am »
In any event untagged variant records are a legal part of the language, they may be passed by reference and pointers to them may be declared.

It's difficult to imagine a language which could be used to write system-level modules (i.e. including its own RTL etc.) which did not have this capability. Whether it is something which should be made available at the application level is a different matter.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

gues1

  • Jr. Member
  • Posts: 80
Re: Strict Aliasing Rule
« Reply #5 on: June 17, 2025, 11:54:03 am »
Quote
In any event untagged variant records are a legal part of the language, they may be passed by reference and pointers to them may be declared.

It's difficult to imagine a language which could be used to write system-level modules (i.e. including its own RTL etc.) which did not have this capability. Whether it is something which should be made available at the application level is a different matter.

MarkMLl
My thinking, though it is worth less than a cent, is that an explicit or implicit cast from one type to another at the application level is acceptable if it is performed under the control of the compiler (according to its safety rules), but reinterpreting a type via a pointer... well, that is an operation which, even if legitimate, I would mark with BIG question marks.
I am talking about the application level, excluding the RTL and RTTI.

Pascal as a language is defined (and I like to think of it this way) as strictly typed. Using such techniques, even if, I repeat, they are legitimate, poses serious and concrete problems for the security, interoperability, maintenance and portability of the code.

And as for the purists who rightly refuse to consider implementing "inline" variables, I do not see how they can accept such practices.

I have also noticed that this way of working is very widespread among FPC users and developers; sometimes it looks like code ported directly from C.

Mine is just a thought read aloud, I absolutely don't want to argue.

nixbody

  • New Member
  • Posts: 12
Re: Strict Aliasing Rule
« Reply #6 on: June 17, 2025, 04:19:41 pm »
Thank you all for your answers, those were very helpful and insightful. I don't think I'm a fan of the strict aliasing rule and so I'm glad that such a thing doesn't seem to exist in FPC. For those who might wonder how C can be a systems programming language with such a restriction, there are a few exceptions to this rule.
  • void * can be used to pass around arbitrary pointers, but once converted back, the pointee may only be accessed through its "effective type".
  • (un/signed) char * can be used to access individual bytes of any object, but the opposite is not true: an object declared as a char array (signedness doesn't matter) cannot be accessed through a pointer to anything else (this may change in the next C standard).
  • memcpy and memmove can copy data from an object to another object of a different type.
  • Type punning via unions is allowed (in C, not in C++).
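A minimal C sketch of the memcpy, union and unsigned char * exceptions listed above. It assumes float is a 32-bit IEEE 754 type on the target; the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this target. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* memcpy: copies the object representation, so no aliasing rule is broken. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Union punning: reading a member other than the last one written (C, not C++). */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* unsigned char *: any object may be inspected byte by byte. */
unsigned char first_byte(const void *obj)
{
    const unsigned char *bytes = obj;
    return bytes[0];
}
```

The reverse direction, e.g. reading a `char buf[4]` through an `int *`, is not covered by any of these exceptions; copying the bytes out with memcpy is the portable route.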

tetrastes

  • Hero Member
  • Posts: 666
Re: Strict Aliasing Rule
« Reply #7 on: June 19, 2025, 12:06:08 am »
Quote
For those who might wonder how C can be a systems programming language with such a restriction, there are a few exceptions to this rule.

This option is not the default. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html:
Quote
The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.

PS. And it can be turned off with -fno-strict-aliasing.
« Last Edit: June 19, 2025, 12:16:11 am by tetrastes »

nixbody

  • New Member
  • Posts: 12
Re: Strict Aliasing Rule
« Reply #8 on: June 19, 2025, 01:30:18 am »
Quote
This option is not the default. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It's part of the C standard, and therefore any conforming C program has to adhere to this rule; otherwise the behaviour is undefined. As for GCC, as you pointed out, it's enabled at -O2/-Os and higher optimisation settings, so I think it's safe to say that it's the default for any production build unless explicitly turned off with -fno-strict-aliasing (I think Linux does that).

Thaddy

  • Hero Member
  • Posts: 17414
  • Ceterum censeo Trumpum esse delendum (Tnx Charlie)
Re: Strict Aliasing Rule
« Reply #9 on: June 19, 2025, 07:02:16 am »
Quote
This option is not the default. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It's part of the C standard
Which one? Definitely not C99 which is de facto still the most commonly used.
There are two newer standards, but these are not yet "standard" because of existing codebases.
(And to many, also highly controversial)

If a "standard" supports something like this, it is considered "optional" and not enforced in any way. It just means you may use it if you decide to do so.
The Linux kernel adheres to C99, nothing newer: although officially C11 is the baseline, you won't find much use, if any, of new C11 features in the kernel.

In the Linux kernel, strict aliasing is explicitly disabled by compiling with the -fno-strict-aliasing flag.
This is a deliberate choice because the kernel often uses type punning and low-level memory manipulation
that would violate the C standard’s aliasing rules. Enabling strict aliasing could allow the compiler to make optimizations that assume, for example, that a float* and an int* never point to the same memory—which isn’t always true in kernel code.
This is pretty much what was already mentioned above.

FreePascal has a similar situation with e.g. floats and integers, where you cannot assume that the bit pattern is the same for both. Reinterpreting one as the other can only be achieved with hard casts.
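For comparison, the same distinction in C: a value conversion keeps the number and changes the bits, while a bit-level reinterpretation keeps the bits and changes the number. The sketch assumes a 32-bit IEEE 754 float and uses memcpy for the reinterpretation so that no aliasing rule is broken; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the number is preserved (truncated), the bits are not. */
int32_t convert_value(float f)
{
    return (int32_t)f;
}

/* Bit reinterpretation: the bits are preserved, the number is not. */
int32_t reinterpret_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);  /* assumes sizeof(float) == sizeof(int32_t) */
    return i;
}
```

With these definitions, 1.0f converts to the value 1 but reinterprets to 0x3F800000 (1065353216).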

Fun fact: try translating the famous "Bit Twiddling Hacks" from Stanford to Pascal. It can be done, but it is easy to make mistakes, and when done correctly the Pascal code becomes extremely verbose while the compiler still generates code similar (in fact mostly equivalent) to GNU C. A great way to spend a weekend.
(And I used part of that for some of the bit manipulation helpers in sysutils!)
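As a taste of what those hacks look like, here is the "counting bits set, in parallel" trick from that collection, written in C for a 32-bit unsigned value (the function name popcount32 is ours). It only uses shifts, masks and one multiply on a single integer type, so no pointer reinterpretation is involved.

```c
#include <stdint.h>

/* Parallel bit count: sum bits in 2-, 4- and 8-bit groups, then add the bytes. */
unsigned popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit partial sums */
    return (uint32_t)(v * 0x01010101u) >> 24;          /* top byte = total */
}
```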
« Last Edit: June 19, 2025, 08:27:41 am by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

Warfley

  • Hero Member
  • Posts: 1930
Re: Strict Aliasing Rule
« Reply #10 on: June 19, 2025, 11:43:56 am »
The C standard has two kinds of behavior that it does not fully specify. The first is implementation-defined behavior, where the standard does not prescribe something directly but says each implementation has to decide on it, document it and behave consistently. The second is undefined behavior, which means a program relying on it is not a valid C program.

One example of an implementation-defined feature is the bit size of the int types. int is only required to cover at least the range -32767 to 32767 (that is, -(2^15 - 1) to 2^15 - 1), so on one platform you can have a 16-bit two's complement integer, on another a 32-bit sign-and-magnitude one, and on a third a 47-bit integer type that uses 2 sign bits and a 16-bit CRC at the end. But the same compiler on the same platform should always use the same type with the same rules, independent of the optimization level or the context it is used in.
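A small C sketch of how a program can inspect these implementation-defined properties through <limits.h>; the function names are hypothetical. Whatever values a given platform reports, the standard only promises the minimum range.

```c
#include <limits.h>

/* Storage size of int in bits; this may include padding bits. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* The standard only guarantees that int covers at least -32767..32767. */
int int_meets_minimum_range(void)
{
    return INT_MIN <= -32767 && INT_MAX >= 32767;
}
```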

Pointer casting, on the other hand, is only well-defined in a few cases: converting to and from an integer type of sufficient size, to char * (for byte-wise access), or to and from void *. Note that only the conversion is allowed, not arbitrary access: accessing an object through a pointer of an incompatible type is UB, and converting a modified integer back to a pointer gives an implementation-defined result at best.
This means pointer casting like Quake's famous fast inverse square root is not valid standard C.
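For illustration, a sketch of the fast inverse square root written without the pointer cast: the float's bits are copied into an integer with memcpy, which the aliasing rules allow. It assumes float is a 32-bit IEEE 754 type; the magic constant 0x5f3759df and the single Newton-Raphson step follow the well-known Quake III version, while the function name is ours.

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root with the float/integer pun done via memcpy.
   Assumes float is a 32-bit IEEE 754 type. */
float q_rsqrt(float number)
{
    const float x2 = number * 0.5f;
    uint32_t bits;
    float y;

    memcpy(&bits, &number, sizeof bits);  /* float -> integer bits */
    bits = 0x5f3759dfu - (bits >> 1);     /* initial guess via the magic constant */
    memcpy(&y, &bits, sizeof y);          /* integer bits -> float */

    y = y * (1.5f - x2 * y * y);          /* one Newton-Raphson refinement step */
    return y;
}
```

Optimizing compilers typically turn such fixed-size memcpy calls into plain register moves, so the conforming version need not be slower than the pointer-cast original.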

That said, not everything that is UB is an error; it only means it's not valid standard C. Compilers like MSVC, GCC or Clang sometimes let you define behavior that the standard leaves undefined, e.g. -fno-strict-aliasing for this rule. But when you rely on UB, you must be clear that you are writing non-standard C code, and you should usually have a good reason for doing so.

PS: There are also a few ways to access memory under a different layout that are not UB. C allows unions to be used to access a common initial sequence (a shared prefix) of structs, but only that shared prefix. So the following is allowed:
Code: C
  #include <stdio.h>
  
  int main(void)
  {
    union {
      struct { int i; float f; } b1;
      struct { int i; char c; } b2;
    } u;
  
    u.b1.i = 42;
    printf("%d\n", u.b2.i); /* i lies in the common initial sequence */
    return 0;
  }
But the following is not:
Code: C
  #include <stdio.h>
  
  int main(void)
  {
    union {
      struct { short s; float f; } b1;
      struct { int i; char c; } b2;
    } u;
  
    u.b1.s = 42;
    printf("%d\n", u.b2.i); /* short s and int i share no common initial sequence */
    return 0;
  }
Even though it may appear to work on a little-endian system, assuming short and int are both two's complement integers with sizeof(short) < sizeof(int) (which is not necessarily true), the standard does not guarantee the result.

C++, on the other hand, has dynamic_cast for class hierarchies and, since C++20, std::bit_cast, which specifically fills that gap.
« Last Edit: June 19, 2025, 04:28:33 pm by Warfley »

nixbody

  • New Member
  • Posts: 12
Re: Strict Aliasing Rule
« Reply #11 on: June 19, 2025, 04:00:25 pm »
Quote
Which one? Definitely not C99 which is de facto still the most commonly used.
There are two newer standards, but these are not yet "standard" because of existing codebases.
(And to many, also highly controversial)
Hey, I said that I'm not exactly a fan of C's aliasing rules, but the fact is that it is a part of the C standard since C99 and it is also one of the things that was somewhat controversial when C99 was introduced.

I hope you consider the GCC C manual an authoritative-enough source.
https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html

Quote
However, when optimizing, the compiler is allowed to assume (mistakenly, here) that q does not point to the same storage as p, because their data types are not allowed to alias.
From this assumption, the compiler can deduce (falsely, here) that the assignment into q->size has no effect on the value of p->size, which must therefore still be 0. Thus, x will be set to 0.
GNU C, following the C standard, defines this optimization as legitimate. Code that misbehaves when optimized following these rules is, by definition, incorrect C code.
« Last Edit: June 19, 2025, 04:04:18 pm by nixbody »

Thaddy

  • Hero Member
  • Posts: 17414
  • Ceterum censeo Trumpum esse delendum (Tnx Charlie)
Re: Strict Aliasing Rule
« Reply #12 on: June 19, 2025, 04:28:29 pm »
Yes, but we already agreed.... ::) Note the manual is not authoritative enough, but the standard document is:
The official standard document for C99 is ISO/IEC 9899:1999

More generally, though: being strict about whatever lies beneath a pointer is silly in every respect.
But I can do silly things in Ada too.... :o... just because I can....on purpose....
That's frankly what the core of the question was: you can only be strict about the underlying structures.
Usually a union, or in Pascal a variant record.
« Last Edit: June 19, 2025, 04:43:10 pm by Thaddy »

nixbody

  • New Member
  • Posts: 12
Re: Strict Aliasing Rule
« Reply #13 on: June 19, 2025, 05:22:13 pm »
I wanted to give potential dear readers something more readable than the C standard document, which is written in such an arcane language.  :)
But we're going off on a tangent here, because my original question was about FreePascal, and C was only given as an example of why I'm concerned about pointer casting. I believe @Khrys pretty much answered that question with their nicely detailed answer. Some found it hard to imagine that a systems-level language could be that restrictive, and for them I wrote my short summary of what is allowed within C's aliasing rules.
« Last Edit: June 19, 2025, 05:44:55 pm by nixbody »

tetrastes

  • Hero Member
  • Posts: 666
Re: Strict Aliasing Rule
« Reply #14 on: June 20, 2025, 01:45:13 pm »
Quote
https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html

Quote
However, when optimizing, the compiler is allowed to assume (mistakenly, here) that q does not point to the same storage as p, because their data types are not allowed to alias.
From this assumption, the compiler can deduce (falsely, here) that the assignment into q->size has no effect on the value of p->size, which must therefore still be 0. Thus, x will be set to 0.
GNU C, following the C standard, defines this optimization as legitimate. Code that misbehaves when optimized following these rules is, by definition, incorrect C code.

In fact, something else is highlighted there:
Quote
GNU C, following the C standard, defines this optimization as legitimate.

So this is a GCC decision, not a standard one (and MSVC does not have this optimization). Then there are self-contradictory attempts to justify this decision:
Quote
What do these rules say about the example in this subsection?

For foo.size (equivalently, a->size), t is int. The type float is not allowed as an aliasing type by those rules, so b->size is not supposed to alias with elements of a. Based on that assumption, GNU C makes a permitted optimization that was not, in this case, consistent with what the programmer intended the program to do.

But if we make b->size an int as well (i.e. structs a and b become identical):
Code: C
  #include <stdio.h>
  struct a { int size; char *data; };
  //struct b { float size; char *data; };
  struct b { int size; char *data; };    // !!!
  
  void sub (struct a *p, struct b *q)
  {
    int x;
    p->size = 0;
    q->size = 1;
    x = p->size;
    printf("x       =%d\n", x);
    printf("p->size =%d\n", (int)p->size);
    printf("q->size =%d\n", (int)q->size);
  }
  
  int main(void)
  {
    struct a foo;
    struct a *p = &foo;
    struct b *q = (struct b *) &foo;
  
    sub (p, q);
  }

The optimization still happens (x = 0).

In my opinion this is a stupid optimization (especially as a silent default at -O2/-O3/-Os): if programmers do not need aliasing, they will not introduce unnecessary aliasing pointers, and if they have already introduced them, what the hell is there to optimize?

P.S. I first came across this optimization in the 1990s, before the C99 standard, in the Watcom compiler (famous at the time), whose documentation honestly stated that it was the programmer's responsibility.

 
