
Author Topic: FPC + SDL2 SIGSEGV because code alignment using SSE  (Read 4324 times)

jorgicor

  • New member
  • *
  • Posts: 11
FPC + SDL2 SIGSEGV because code alignment using SSE
« on: July 17, 2015, 09:13:38 pm »
Hi, I have a very strange bug. I don't know exactly where the problem lies (the FPC compiler, the SDL libraries, the Pascal SDL headers, or something else), but I'm posting here first in case someone can give me a clue.

I have a program using SDL2 in Free Pascal, using Tim Blume's SDL2 Pascal headers: https://github.com/ev1313/Pascal-SDL-2-Headers .

What happens is that when I call SDL_FillRect() I get a SIGSEGV.

After a lot of debugging and investigation, it seems that SDL_FillRect is using an SSE version. The code that causes the problem is this (inside SDL_FillRect, in C):

Code: [Select]
static void
SDL_FillRect4SSE(Uint8 *pixels, int pitch, Uint32 color, int w, int h)
{
    int i, n;
    Uint8 *p = NULL;

    __m128 c128;
    DECLARE_ALIGNED(Uint32, cccc[4], 16);
    cccc[0] = color;
    cccc[1] = color;
    cccc[2] = color;
    cccc[3] = color;
    c128 = *(__m128 *)cccc;

The segmentation fault happens on the last line because, I suppose, neither c128 nor cccc is actually aligned to 16 bytes: when I debugged, the address of c128 was 0xbffff69c and the address of cccc was 0xbffff60c.
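
Both reported addresses end in 0xc, so each sits 12 bytes past a 16-byte boundary while still being 4-byte aligned. A minimal C sketch of that check (the helper name `misalignment` is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes by which addr lies past the previous `alignment`
 * boundary; 0 means addr is aligned. alignment must be a power of two. */
unsigned misalignment(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (unsigned)(addr & (alignment - 1));
}
```

Called with the two addresses from the debugger and an alignment of 16, it gives 12 in both cases; with an alignment of 4 it gives 0.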

DECLARE_ALIGNED is actually defined as

#define DECLARE_ALIGNED(t,v,a)  t __attribute__((aligned(a))) v

so gcc is supposed to align c128 by default, and cccc because the alignment is specified explicitly. But when I run the program, they are not aligned.
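
As a rough sketch of what the macro does, the function below declares a local the same way and reports how far it ended up from a 16-byte boundary. The function name is hypothetical and GCC or Clang is assumed. For alignments up to its preferred stack boundary, GCC appears to place such locals at fixed offsets from a stack pointer it assumes is already aligned on entry, rather than realigning at run time, so a caller that breaks that assumption could make the result non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define DECLARE_ALIGNED(t, v, a) t __attribute__((aligned(a))) v

/* Declares a 16-byte aligned local, as SDL_FillRect4SSE does, and
 * returns its address modulo 16 (0 means it really is aligned). */
unsigned aligned_local_offset(uint32_t color)
{
    DECLARE_ALIGNED(uint32_t, cccc[4], 16);
    cccc[0] = color;
    cccc[1] = color;
    cccc[2] = color;
    cccc[3] = color;
    return (unsigned)((uintptr_t)cccc & 15u);
}
```

When the caller keeps the stack aligned the way the compiler expects, this returns 0.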

Note that this always works if I write the program in C (at least in the several tests I have done, the variables are aligned; for example, the addresses end in 0x...90).

In my Free Pascal program the error only happens sometimes, depending on the amount of code I write, or maybe on where the shared library is loaded in memory, so I am a bit lost.

The question is: when the SDL dynamic library is loaded because my Pascal program needs it, who decides where the library is aligned? I suppose that the DECLARE_ALIGNED directive above aligns the stack variables in the function, but 'relative' to the function start when it is compiled. The whole library, or the whole function, still has to be loaded at some address in memory, and maybe someone decides where the code as a whole is loaded and aligned. Who decides that? Is this a problem of FPC in combination with a shared library that requires specific alignment, are the SDL2 Pascal headers lacking something, or can it be a problem of the SDL libraries?

I know that the problem is strange, but I am sure the Free Pascal compiler writers have a lot of knowledge about these things, so maybe they have a clue.

Thank you very much.

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #1 on: July 18, 2015, 05:53:22 am »
I guess the problem has nothing to do with the local variables; it is the parameter pixels that needs to be aligned.

jorgicor

  • New member
  • *
  • Posts: 11
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #2 on: July 18, 2015, 11:55:17 am »
Perhaps the whole program, when linked with the shared library stub, must be aligned. Let me explain: after some more investigation I found this in the GCC manual:

Quote
-mstackrealign
    Realign the stack at entry. On the Intel x86, the -mstackrealign option will generate an alternate prologue/epilogue that realigns the runtime stack. This supports mixing legacy codes that keep a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and they require one dedicated register for the entire function. This also lowers the number of registers available if used in conjunction with the regparm attribute. Nested functions encountered while -mstackrealign is on will generate warnings, and they will not realign the stack when called.

-mpreferred-stack-boundary=num
    Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits), except when optimizing for code size (-Os or -Oz (APPLE ONLY)), in which case the default is the minimum correct alignment (4 bytes for x86, and 8 bytes for x86-64).

    On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

    To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

    This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

Reading this, I conclude that older versions of GCC aligned everything to 4 bytes, but nowadays everything is aligned to 16 bytes to be compatible with SSE (although you can change that with these options to remain compatible with old code). That's why my C program linked with SDL2 never crashes: everything is aligned to 16 bytes, and when the C program is linked with the SDL2 library, the whole library is properly aligned to 16 bytes.

If this is correct, then the problem is perhaps that FPC always aligns to 4 bytes, so when linked with SDL the library is aligned to 4 bytes and not 16. That would explain why, depending on the amount of code I write, the library sometimes ends up aligned and it works, and sometimes not and it crashes.

I've seen that FPC has a $CODEALIGN directive. I have tried it with some values, for example PROC=16 and LOCALMIN=16, but without success. (By the way, $CODEALIGN 16 without any parameter such as PROC, as specified in the documentation, gives a compiler error with 2.6.4.)

Maybe FPC should provide an option to align all the code to 16 bytes to make it compatible with SSE code? Or maybe it is at link time (with {$LINKLIB}, perhaps) that we have to specify that?

I am only guessing...

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #3 on: July 18, 2015, 03:39:03 pm »
Disregard this post.

**************************
The problem is with the value pixels points at; it has to be aligned.

See, on Intel CPUs with SSE you can store a value to a memory location using either a fast instruction that expects an aligned address or a slower instruction that accepts unaligned addresses. Check the attached image.

It is easy to test the validity of this guess by calling FillRect with an aligned address.
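
To illustrate the two kinds of instruction: the SSE intrinsic _mm_load_ps (or a plain dereference of a __m128 pointer, as in the SDL code in the first post) expects a 16-byte aligned address, while _mm_loadu_ps accepts any address. A minimal sketch using the unaligned forms; the function name `splat_color` is hypothetical, and builds without SSE fall back to memcpy:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Copies `color` into four consecutive 32-bit slots of `out` without
 * assuming that either buffer is 16-byte aligned. */
void splat_color(uint32_t color, uint32_t out[4])
{
    uint32_t cccc[4] = {color, color, color, color};
    assert(out != NULL);
#if defined(__SSE__)
    /* Unaligned load and store: valid for any address. */
    __m128 c128 = _mm_loadu_ps((const float *)cccc);
    _mm_storeu_ps((float *)out, c128);
#else
    memcpy(out, cccc, sizeof cccc);
#endif
}
```

The aligned variants are the ones that fault when handed an address such as 0xbffff60c.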
« Last Edit: July 18, 2015, 03:57:20 pm by engkin »

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #4 on: July 18, 2015, 04:41:00 pm »
In your code, do you write to any member of the TSDL_Surface record?

If you do, there might be a disagreement between FPC and C record alignment which corrupts the record when you write to it. You can try the {$PACKRECORDS} directive to see if you can bring them into agreement.
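
One way to check for such a disagreement is to assert the field offsets on the C side and compare them with what the Pascal record declares. The two structs below are made-up stand-ins, not the real SDL_Surface layout, and the packed attribute assumes GCC or Clang:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with the compiler's natural alignment. */
struct natural_layout {
    uint8_t  flag;
    uint32_t value;   /* padded so that it starts at offset 4 */
};

/* The same fields with padding disabled. */
struct __attribute__((packed)) packed_layout {
    uint8_t  flag;
    uint32_t value;   /* starts right after flag, at offset 1 */
};

size_t natural_value_offset(void) { return offsetof(struct natural_layout, value); }
size_t packed_value_offset(void)  { return offsetof(struct packed_layout, value); }

/* Aborts if the C layout differs from what a binding was written for. */
void check_expected_layout(void)
{
    assert(natural_value_offset() == 4);
    assert(packed_value_offset() == 1);
}
```

If the Pascal record places a field at a different offset than the C struct, writes from one side land in the wrong place on the other.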

jorgicor

  • New member
  • *
  • Posts: 11
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #5 on: July 18, 2015, 04:53:11 pm »
Thank you engkin, but no, I don't touch the record. As you can see in the first post, the function called, SDL_FillRect, gives the segmentation fault before it even touches pixels. The problem is with the alignment of c128 and cccc: they are not aligned to a 16-byte boundary, as the memory addresses I found show.

I have investigated this further and found the same problem reported for the FreeBasic compiler. So the question is: is Free Pascal generating code aligned to a 16-byte boundary, so as to be compatible with external code using SSE on Linux, since this now seems to be the Linux ABI?

I have seen that FPC has an option to choose the ABI; on my machine I have: DEFAULT, SYSV, AIX, EABI, ARMEB. I suppose that SYSV aligns to a 4-byte boundary, the same as DEFAULT.

Please see this post (taken from: http://webcache.googleusercontent.com/search?q=cache:5mIYiCV7m1gJ:sourceforge.net/p/fbc/bugs/659/+&cd=1&hl=es&ct=clnk&gl=es) :

Quote
The GCC devs decided to unilaterally change the Linux x86 ABI [1] [2] [3] [4]. I'm not sure when this happened, but I think it first became a common problem with GCC 4.1. Previously the Linux x86 ABI was the "SysV i386 ABI", which stated that the stack is aligned to a 4-byte boundary on function entry. GCC now assumes by default that the stack is aligned to a 16-byte boundary. This is an very controversial issue with the GCC devs saying they have changed the ABI, and many other people considering the SysV ABI to be the real ABI and GCC to be buggy. GCC Bugzilla is full of flamewars. GCC devs have said "GCC chose to change the unwritten standard for the ABI in use for IA32 GNU/Linux" and "The ABI is undocumented; that is reality" [3]

Anyway, this is a problem because of the existence of SSE instructions that segfault if their operands are not 16-byte aligned. When compiling code with -O3 GCC will use SSE instructions if they are enabled on the selected CPU architecture (eg. -march=pentium4 or -msse or -m32 on a 64 bit machine). If you try to link these object files into a FB program and call them you can get a segfault (testcase below).

Regardless of whether GCC is right or wrong, FBC should be updated to ensure a 16 byte stack alignment when calling external code so that it's compatible with Linux libraries. Some ways this could be done:

1) What I did this for the OSX port was to realign the stack before pushing arguments for any CDECL call, then restore $esp afterwards
2) ensure that the size of a function's stack frame is a multiple of 16 bytes on every function call, plus align the stack on every entry point (program start and thread starts) or in any function that actually makes uses of SSE instructions. But it's not truely necessary to realign the stack when a FB function is called from an external library if you assume that all external linked code ensures 16 byte alignment if any of it requires 16 byte alignment.
3) Realign the stack to 16 bytes at the beginning of every function, and ensure that the size of a function's stack frame is a multiple of 16 bytes on every function call

1) is a hack, tricky, and relatively expensive. 2) is a far cleaner and faster solution and is AFAIK what GCC does. 3) is the simplest, though slightly slower than 2), and is what GCC does with the -mstackrealign argument. I suggest using 3); the speed difference will be tiny compared to all the other optimisations that FBC doesn't do. And I suggest doing so regardless of OS, for simplicity.

16 byte stack alignment is and always has been a part of the OSX ABI.

It appears that GCC's alignment expectation has changed to 16 bytes on all x86 platforms, but it's only a semi-official change to the ABI on Linux. The *BSDs maintainers have apparently been patching GCC or ensuring build args to fix its observation of their ABI (eg. [5]).

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #6 on: July 18, 2015, 06:00:25 pm »
The location of the dynamic library code is not related to the stack.

Before you call FillRect you can make sure the stack is aligned with something like
Code: [Select]
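Var
  someDWvar: dword;
...
{$asmmode Intel}
Asm
  Mov someDWvar, esp   // save the current stack pointer
  And esp, $fffffff0   // round esp down to a 16-byte boundary
End;
FillRect...
Asm
  Mov esp, someDWvar   // restore the original stack pointer
End;
...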
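
The arithmetic behind this trick can be sketched in C. The starting esp value and the helper names are made up for illustration. The sketch also assumes that the three SDL_FillRect arguments are pushed as 4-byte values after the `And`, and that GCC-compiled i386 code expects esp to be 16-byte aligned at the call instruction itself; neither point is confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* What `And esp, $fffffff0` does: round down to a 16-byte boundary. */
uint32_t align_down16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* Stack pointer after pushing `nargs` 4-byte arguments. */
uint32_t after_pushes(uint32_t esp, unsigned nargs)
{
    return esp - 4u * nargs;
}
```

Starting from 0xbffff6ac, align_down16 gives 0xbffff6a0, and pushing three arguments then leaves esp at 0xbffff694, which is no longer a multiple of 16. Under these assumptions, aligning esp before the arguments are pushed would not by itself guarantee the alignment the callee expects.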

jorgicor

  • New member
  • *
  • Posts: 11
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #7 on: July 18, 2015, 06:47:06 pm »
I will try your solution, thank you engkin. Anyway everyone can program a Fillrect in 3 lines of code ;) So if that is the solution I prefer to program it in pascal directly, even if slower.

The real question is why there is a problem when interfacing with a C library that uses SSE instructions. The previous posts tell the whole story.

Anyway, it seems that the FPC developers were already aware of the problem and for now they are not going to change anything. As they see it, it is the C library that should be compiled with -mstackrealign, see: http://bugs.freepascal.org/view.php?id=15582

I respect the decision.

Thanks.

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #8 on: July 18, 2015, 07:18:49 pm »
I didn't mean to present a solution, it was to confirm that the problem is indeed related to stack alignment in the procedure before calling FillRect.

After reading the replies in that bug report, this became the solution.

It might be worth mentioning that only:
Code: [Select]
And esp, $fffffff0
is needed if USEEBP is not used.

jorgicor

  • New member
  • *
  • Posts: 11
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #9 on: July 18, 2015, 07:39:33 pm »
Thank you engkin for your replies. The solution didn't work though, it still crashes.

Recompiling the SDL2 library using -mstackrealign fixes the problem, but my Linux distro (Slackware) does not compile the library with that flag by default. I don't know if others do.
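
If rebuilding a whole library with -mstackrealign is not an option, GCC also documents a per-function x86 attribute, force_align_arg_pointer, that emits the same kind of realigning prologue for one function only. This was not tried in this thread; the wrapper below and its name are hypothetical, and it only helps for C code you compile yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only x86 targets know this attribute; elsewhere the macro is empty. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical entry point callable from code that only guarantees a
 * 4-byte aligned stack: the prologue realigns the stack before the
 * 16-byte aligned local is used. */
REALIGN_STACK
void fill4_realigned(uint32_t *pixels, uint32_t color, int count)
{
    uint32_t cccc[4] __attribute__((aligned(16))) = {color, color, color, color};
    assert(pixels != NULL || count == 0);
    for (int i = 0; i < count; i++)
        pixels[i] = cccc[i & 3];
}
```

A small C shim like this could sit between the Pascal program and a library function that needs an aligned stack, at the cost of the slightly larger prologue the GCC manual mentions.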

Thaddy

  • Hero Member
  • *****
  • Posts: 5187
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #10 on: November 21, 2017, 08:57:41 am »
Old topic, but given bug report 32710 in Mantis:
Code: Pascal  [Select]
{$CODEALIGN LOCALMIN=16}
should align the stack correctly, since local vars and parameters are on top of the stack.
There really is no need for assembler trickery.
« Last Edit: November 21, 2017, 09:08:21 am by Thaddy »
"Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive."

engkin

  • Hero Member
  • *****
  • Posts: 1811
Re: FPC + SDL2 SIGSEGV because code alignment using SSE
« Reply #11 on: November 25, 2017, 06:58:10 pm »
Quote
Old topic but given a bug report 32710 in Mantis:
Code: Pascal  [Select]
{$CODEALIGN LOCALMIN=16}
should align the stack correctly since local vars and parameters are on top of stack.
There really is no need for assembler trickery.

Unfortunately at this time {$CODEALIGN LOCALMIN=16} is not working properly according to Daniel Glöckner.

 
