
FPC + SDL2 SIGSEGV because code alignment using SSE


jorgicor:
Hi, I have a very strange bug. I don't know exactly where the problem lies: in the FPC compiler, in the SDL libraries, in the Pascal SDL headers, or somewhere else. I am writing here first in case someone can give me a clue.

I have a program using SDL2 in Free Pascal, using Tim Blume's SDL2 Pascal headers: https://github.com/ev1313/Pascal-SDL-2-Headers .

What happens is that when I call SDL_FillRect() I get a SIGSEGV.

After a lot of debugging and investigating, it seems that SDL_FillRect is using an SSE version. The code that causes the problem is this (inside SDL_FillRect, in C):


--- Code: ---
static void
SDL_FillRect4SSE(Uint8 *pixels, int pitch, Uint32 color, int w, int h)
{
    int i, n;
    Uint8 *p = NULL;

    __m128 c128;
    DECLARE_ALIGNED(Uint32, cccc[4], 16);
    cccc[0] = color;
    cccc[1] = color;
    cccc[2] = color;
    cccc[3] = color;
    c128 = *(__m128 *)cccc;
--- End code ---

The segmentation fault happens on the last line because, I suppose, neither c128 nor cccc is actually aligned to 16 bytes: when I debugged, the address of c128 was 0xbffff69c and the address of cccc was 0xbffff60c.
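
As a quick check of the two addresses quoted above, here is a minimal C sketch. The helper names misalignment and report_addresses are made up for this illustration and are not part of SDL or FPC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns how many bytes addr lies past the previous 'alignment' boundary.
   A result of 0 means the address is aligned. 'alignment' must be a power
   of two. (Hypothetical helper, for illustration only.) */
static unsigned misalignment(uintptr_t addr, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (unsigned)(addr & (uintptr_t)(alignment - 1));
}

/* Prints the check for the two addresses reported in the post. */
static void report_addresses(void)
{
    const uintptr_t addrs[2] = { 0xbffff69cu, 0xbffff60cu };
    for (int i = 0; i < 2; i++) {
        printf("0x%lx: %u bytes past a 16-byte boundary\n",
               (unsigned long)addrs[i], misalignment(addrs[i], 16));
    }
}
```

Both addresses end in 0xc, so each is 12 bytes past a 16-byte boundary while still being 4-byte aligned. That is consistent with a stack that is kept at 4-byte rather than 16-byte alignment.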

DECLARE_ALIGNED is actually defined as

#define DECLARE_ALIGNED(t,v,a)  t __attribute__((aligned(a))) v

That is, gcc is supposed to align the c128 variable by default and cccc because the alignment is specified explicitly. But when I run the program, they are not aligned.

Note that this always works if I write the program in C (at least in the several tests I have done, the variables are aligned; for example, the addresses end in 0x...90).
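
The C-side test described above can be sketched roughly as follows. The macro is the one quoted earlier; the function name aligned_local_is_aligned is invented for this example, and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the macro quoted above (GCC/Clang attribute syntax). */
#define DECLARE_ALIGNED(t, v, a) t __attribute__((aligned(a))) v

/* Returns 1 if a local declared with DECLARE_ALIGNED really sits on a
   16-byte boundary in this call, 0 otherwise. (Hypothetical helper.) */
static int aligned_local_is_aligned(uint32_t color)
{
    DECLARE_ALIGNED(uint32_t, cccc[4], 16);
    cccc[0] = cccc[1] = cccc[2] = cccc[3] = color;
    assert(cccc[3] == color);
    return ((uintptr_t)(void *)cccc & 15u) == 0;
}
```

Called from a program built entirely with gcc, this is expected to return 1, because every caller in the chain keeps the stack at the alignment the compiler assumes. The attribute only helps if that assumption about the incoming stack pointer holds.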

In the case of my Free Pascal program, this error sometimes happens and sometimes does not, depending on the amount of code I write, or maybe depending on where the shared library is loaded in memory, so I am a bit lost.

The question is: when the SDL dynamic library is loaded because my Pascal program needs it, who decides where the library is aligned? I suppose the DECLARE_ALIGNED directive above aligns the stack variables in the function, but only 'relative' to the function start at compile time. The whole library, or the whole function, still has to be loaded into memory at some address, and maybe someone decides where the code is loaded and how it is aligned. Who decides that? Is this a problem of FPC in combination with a shared library that requires specific alignment, are the SDL2 Pascal headers lacking something, or can it be a problem in the SDL libraries?

I know that the problem is strange, but I am sure the Free Pascal compiler writers have a lot of knowledge about these things, so maybe they have a clue.

Thank you very much.

engkin:
I guess the problem has nothing to do with the local variables; it is the parameter pixels that needs to be aligned.

jorgicor:
Perhaps the whole program, when linked with the shared library stub, must be aligned. Let me explain: after some more investigation I found this in the GCC manual:


--- Quote ---
-mstackrealign
    Realign the stack at entry. On the Intel x86, the -mstackrealign option will generate an alternate prologue/epilogue that realigns the runtime stack. This supports mixing legacy codes that keep a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and they require one dedicated register for the entire function. This also lowers the number of registers available if used in conjunction with the regparm attribute. Nested functions encountered while -mstackrealign is on will generate warnings, and they will not realign the stack when called.

-mpreferred-stack-boundary=num
    Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits), except when optimizing for code size (-Os or -Oz (APPLE ONLY)), in which case the default is the minimum correct alignment (4 bytes for x86, and 8 bytes for x86-64).

    On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

    To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

    This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
--- End quote ---

Reading this, I come to the conclusion that older versions of GCC aligned everything to 4 bytes, but nowadays everything is aligned to 16 bytes to be compatible with SSE (although you can change that with these options to remain compatible with old code). That is why my C program linked with SDL2 never crashes: everything is aligned to 16 bytes, and when the C program is linked with the SDL2 library, the whole library is properly aligned to 16 bytes.

If this is correct, then the problem is perhaps that FPC always aligns to 4 bytes, so when my program is linked with SDL the library ends up aligned to 4 bytes and not 16. That would explain why, depending on the amount of code I write, the library sometimes ends up aligned and works, and sometimes does not and crashes.
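
The stack-boundary passage in the GCC quote can be illustrated with a small arithmetic model in C. This is only a simplified sketch, not what GCC or FPC actually emit: all function names are hypothetical, and it assumes the callee places its "aligned" local at a fixed offset (a multiple of 16) below the stack pointer it receives.

```c
#include <assert.h>
#include <stdint.h>

/* Model: the callee puts a local at a fixed offset below the stack pointer
   it was entered with, trusting the caller to have kept it 16-byte aligned. */
static uintptr_t local_address(uintptr_t sp_at_entry, uintptr_t frame_offset)
{
    return sp_at_entry - frame_offset;
}

/* In effect what a realigning prologue does: round the stack pointer down
   to the next 16-byte boundary before laying out locals. */
static uintptr_t realign_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Tries the four 4-byte-aligned stack pointers inside one 16-byte window
   starting at 'base' and counts how many give a 16-byte-aligned local. */
static int count_aligned_cases(uintptr_t base, uintptr_t frame_offset, int realign)
{
    int aligned = 0;
    assert(base % 16 == 0 && frame_offset % 16 == 0);
    for (uintptr_t sp = base; sp < base + 16; sp += 4) {
        uintptr_t entry = realign ? realign_down_16(sp) : sp;
        if ((local_address(entry, frame_offset) & 15u) == 0)
            aligned++;
    }
    return aligned;
}
```

In this model, a caller that only guarantees 4-byte alignment gives the callee an aligned local in one case out of four, which would match a crash that comes and goes as the amount of code changes. With the realigning step, all four cases are aligned.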

I've seen that FPC has a $CODEALIGN option. I have tried it with some values, for example PROC=16 and LOCALMIN=16, but without success. (By the way, $CODEALIGN 16 without a parameter such as PROC, as specified in the documentation, gives a compiler error with 2.6.4.)

Maybe FPC should offer an option to align all the code to 16 bytes to make it compatible with SSE code? Or maybe it is at link time (with {$LINKLIB}, perhaps) that we have to specify that?

I am only guessing...

engkin:
Disregard this post.

**************************
The problem is with what value pixels points at, it has to be aligned.

See, with Intel CPUs using SSE instructions, you can store a value in a memory location using either a fast instruction that expects an aligned address or a slower instruction that accepts unaligned addresses. Check the attached image.

It is easy to test the validity of this guess by calling FillRect with an aligned address.
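
One way to get an aligned address for such a test is sketched below in C. The helper names are hypothetical, and it assumes a C11 library that provides aligned_alloc (for example glibc). The buffer would still have to be handed to SDL somehow, for instance by wrapping it in a surface with SDL_CreateRGBSurfaceFrom; that step is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is on a 16-byte boundary, 0 otherwise. */
static int is_aligned_16(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocates a w x h buffer of 32-bit pixels that starts on a 16-byte
   boundary. Returns NULL on failure; release it with free(). */
static uint32_t *alloc_aligned_pixels(size_t w, size_t h)
{
    size_t bytes = w * h * sizeof(uint32_t);
    /* aligned_alloc wants the size to be a multiple of the alignment. */
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;
    return aligned_alloc(16, rounded);
}
```

Only the start of the buffer is guaranteed to be aligned this way. For every row to start on a 16-byte boundary as well, the pitch (w * 4 bytes here) would also have to be a multiple of 16.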

engkin:
In your code, do you write to any member of the TSDL_Surface record?

If you do, there might be a disagreement between the FPC and C record alignment, which corrupts the record when you write to it. You can try the {$PACKRECORDS} directive (for example {$PACKRECORDS C}) to see if you can bring them into agreement.
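
To show what such a layout disagreement looks like, here is a small C sketch with a made-up two-field record (not the real SDL_Surface layout). It compares the compiler's natural layout with a packed one; the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, laid out with the C compiler's natural alignment:
   padding is inserted so that 'value' starts on a 4-byte boundary. */
struct natural_rec {
    uint8_t  flags;
    uint32_t value;
};

/* The same fields with all padding removed (GCC/Clang attribute). */
struct __attribute__((packed)) packed_rec {
    uint8_t  flags;
    uint32_t value;
};

/* Byte distance between where the two layouts expect to find 'value'. */
static size_t value_offset_mismatch(void)
{
    assert(offsetof(struct natural_rec, value) >= offsetof(struct packed_rec, value));
    return offsetof(struct natural_rec, value) - offsetof(struct packed_rec, value);
}
```

If one side of a binding writes a field using the packed offsets while the library reads it using the natural ones, the write lands a few bytes away from where the library looks. That is the kind of corruption the suggestion above is meant to rule out.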
