Lazarus

Topic started by: ChrisR on May 09, 2021, 08:02:58 pm

Title: SIMD Intrinsics in FPC/Lazarus
Post by: ChrisR on May 09, 2021, 08:02:58 pm
Single instruction, multiple data (SIMD) can dramatically accelerate some computations. C programmers can use intrinsics to write SSE/AVX (for x86-64) and Neon (for ARM) SIMD code, which many find more familiar than assembly language. However, FPC does not yet provide intrinsics. I created a repository that demonstrates writing intrinsics in C and linking the resulting object files into FPC code. I demonstrate this on both ARM and x86-64 CPUs. I also show how the C SSE2Neon library allows you to write your SIMD code once in SSE and then compile it for either x86-64 or ARM.

https://github.com/neurolabusc/FPCintrinsics

While I would prefer native FPC intrinsics, in truth only a tiny fraction of time-critical code really benefits from SIMD. This kludge can provide a convenient way to optimize that critical code.
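
For readers who have not opened the repository, the C side of this approach might look roughly like the sketch below. It is not taken from the repository: the file name `simd_add.c`, the function `add_f32`, and the Pascal declaration in the comment are all illustrative assumptions. The routine uses SSE intrinsics where available and falls back to plain C otherwise.

```c
/* simd_add.c: hypothetical file name, not from the linked repository. */
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE/SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/*
 * Adds b[i] to a[i] for i in [0, n). The name add_f32 is illustrative.
 *
 * Assumed Pascal side (a sketch, adjust to your setup):
 *   {$L simd_add.o}
 *   procedure add_f32(a, b: PSingle; n: SizeUInt); cdecl; external;
 */
void add_f32(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
#ifdef HAVE_SSE2
    /* Four floats per iteration; unaligned loads and stores so the
       caller does not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the full loop when SSE2 is unavailable. */
    for (; i < n; i++)
        a[i] += b[i];
}
```

Compiling this with something like `cc -O2 -c simd_add.c` would produce the object file that the Pascal program links against. Symbol naming can differ by platform (some targets prefix C symbols with an underscore), so the `external` declaration may need an explicit `name`.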
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: marcov on May 09, 2021, 10:21:33 pm
Quote from: ChrisR
However, FPC does not yet provide intrinsics.

Check rtl/i386/cpummprocs.inc (and rtl/x86_64) in trunk.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: PascalDragon on May 10, 2021, 01:27:16 pm
Quote from: ChrisR
While I would prefer native FPC intrinsics, in truth only a tiny fraction of time-critical code really benefits from SIMD. This kludge can provide a convenient way to optimize that critical code.

SIMD intrinsics are a work in progress.

Quote from: ChrisR
However, FPC does not yet provide intrinsics.

Quote from: marcov
Check rtl/i386/cpummprocs.inc (and rtl/x86_64) in trunk.

They are not yet enabled.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: Laksen on May 10, 2021, 01:57:38 pm
I have a branch somewhere with a lot of fixes to the intrinsics, enough that they seem to work consistently. I'll see if I can clean it up soon.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: ChrisR on May 10, 2021, 05:12:14 pm
Laksen, this sounds terrific! I look forward to native FPC intrinsics.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: PascalDragon on May 11, 2021, 08:38:31 am
Quote from: Laksen
I have a branch somewhere with a lot of fixes to the intrinsics, enough that they seem to work consistently. I'll see if I can clean it up soon.

Though we still need to settle how to name them, as that was still an open point ;)
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: Laksen on May 11, 2021, 12:06:45 pm
Quote from: Laksen
I have a branch somewhere with a lot of fixes to the intrinsics, enough that they seem to work consistently. I'll see if I can clean it up soon.

Quote from: PascalDragon
Though we still need to settle how to name them, as that was still an open point ;)

For sure. I wasn't planning to enable the extension, just to clean up some of the tertiary stuff that needs fixing either way :)
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: MathMan on May 11, 2021, 01:21:58 pm
Quote from: Laksen
I have a branch somewhere with a lot of fixes to the intrinsics, enough that they seem to work consistently. I'll see if I can clean it up soon.

Quote from: PascalDragon
Though we still need to settle how to name them, as that was still an open point ;)

Follow the naming convention used in C compilers, maybe? This, however, does not generalise across hardware platforms: x86 (MMX, AVX, AVX512) vs ARM (NEON, SVE), etc. But I doubt that a generalised naming scheme is possible anyway, as the individual CPU instructions can hardly be mapped onto one another with equivalent functionality.
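
To make the naming point concrete, here is a small C sketch of the same four-float operations spelled with the x86 names (`_mm_*`) and the ARM NEON names (`v*q_f32`). The `vec4f_*` wrapper names and `scale_f32` are hypothetical, invented for this illustration. Simple arithmetic like this maps one-to-one, which is the easy case; many other instructions have no direct counterpart on the other platform, which is the difficulty raised above.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* x86 SSE: _mm_* names */
typedef __m128 vec4f;
static inline vec4f vec4f_load(const float *p)     { return _mm_loadu_ps(p); }
static inline void  vec4f_store(float *p, vec4f v) { _mm_storeu_ps(p, v); }
static inline vec4f vec4f_splat(float s)           { return _mm_set1_ps(s); }
static inline vec4f vec4f_mul(vec4f x, vec4f y)    { return _mm_mul_ps(x, y); }
#define HAVE_VEC4F 1
#elif defined(__ARM_NEON)
#include <arm_neon.h> /* ARM NEON: v*q_f32 names */
typedef float32x4_t vec4f;
static inline vec4f vec4f_load(const float *p)     { return vld1q_f32(p); }
static inline void  vec4f_store(float *p, vec4f v) { vst1q_f32(p, v); }
static inline vec4f vec4f_splat(float s)           { return vdupq_n_f32(s); }
static inline vec4f vec4f_mul(vec4f x, vec4f y)    { return vmulq_f32(x, y); }
#define HAVE_VEC4F 1
#endif

/* Multiplies every element of x by factor (illustrative name). */
void scale_f32(float *x, float factor, size_t n)
{
    assert(x != NULL);
    size_t i = 0;
#ifdef HAVE_VEC4F
    const vec4f vf = vec4f_splat(factor);
    for (; i + 4 <= n; i += 4)
        vec4f_store(x + i, vec4f_mul(vec4f_load(x + i), vf));
#endif
    /* Scalar tail, and the full loop on targets without SSE or NEON. */
    for (; i < n; i++)
        x[i] *= factor;
}
```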
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: MarkMLl on May 14, 2021, 10:38:22 pm
I'm not familiar enough with modern SSE etc. to transcribe examples, but it will be interesting to compare the performance of any SSE (etc.) extensions with that of Vector Pascal, which claims to have nailed this sort of thing already.

MarkMLl
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: marcov on May 14, 2021, 11:36:24 pm
Quote from: MarkMLl
I'm not familiar enough with modern SSE etc. to transcribe examples, but it will be interesting to compare the performance of any SSE (etc.) extensions with that of Vector Pascal, which claims to have nailed this sort of thing already.

Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: MarkMLl on May 15, 2021, 09:34:18 am
Quote from: marcov
Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.

Yes, but it would be nice if FPC made some measure of parallel processing comparatively accessible, and so far it looks like SSE etc. is as far as it's realistic to go. I'm obviously not talking about any sort of source-level compatibility.

MarkMLl
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: Seenkao on May 15, 2021, 09:59:38 am
(Translated from Russian.) This is an interesting topic! But the methods being used are not quite right for FPC. To use these instructions properly, we need to be able to handle arrays and loops correctly. And for matrices, it takes a lot of difficult work to use the available instructions really effectively.
I wanted to set aside time for this, but it needs a great deal of time, and I would not want to deliver something slapdash.

The convenient part is that across many processors the instructions are (for the most part) similar.
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: marcov on May 15, 2021, 04:22:30 pm
Quote from: marcov
Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.

Quote from: MarkMLl
Yes, but it would be nice if FPC made some measure of parallel processing comparatively accessible, and so far it looks like SSE etc. is as far as it's realistic to go. I'm obviously not talking about any sort of source-level compatibility.

Maybe, but operations on arbitrary-length vectors are not the only application of SSE; they are only a small part of it. Image processing is a major one.

And intrinsics are like writing SSE/AVX assembler, but they can be rearranged more easily (no need to redo all register allocation manually) and are inlinable.
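
As a rough illustration of both points (image processing, and inlinable code with no manual register allocation), here is a C sketch of a pixel-brightening helper. The name `brighten_u8` is hypothetical. The SSE2 path uses the saturating byte add `_mm_adds_epu8`, and the compiler decides which registers hold `v` and `vd` and is free to inline the whole helper into its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 integer intrinsics */
#define HAVE_SSE2 1
#endif

/* Brightens n 8-bit pixels by delta, clamping at 255 (illustrative name). */
static inline void brighten_u8(uint8_t *px, size_t n, uint8_t delta)
{
    assert(px != NULL);
    size_t i = 0;
#ifdef HAVE_SSE2
    /* Broadcast delta into all 16 byte lanes. */
    const __m128i vd = _mm_set1_epi8((char)delta);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(px + i));
        /* Unsigned saturating add: results above 255 clamp to 255. */
        _mm_storeu_si128((__m128i *)(px + i), _mm_adds_epu8(v, vd));
    }
#endif
    /* Scalar tail, and the full loop when SSE2 is unavailable. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)px[i] + delta;
        px[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

In hand-written assembler the equivalent loop would fix the register choices up front, so moving it into another routine means revisiting them; here that bookkeeping is left to the compiler.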
Title: Re: SIMD Intrinsics in FPC/Lazarus
Post by: PascalDragon on May 16, 2021, 05:13:47 pm
Quote from: Seenkao
This is an interesting topic! But the methods being used are not quite right for FPC. To use these instructions properly, we need to be able to handle arrays and loops correctly. And for matrices, it takes a lot of difficult work to use the available instructions really effectively.
I wanted to set aside time for this, but it needs a great deal of time, and I would not want to deliver something slapdash.

The convenient part is that across many processors the instructions are (for the most part) similar.

The intrinsics are important because they are the low-level building blocks needed for any higher-level optimizations of SIMD operations.