Lazarus forum (Free Pascal, General board). Topic started by: ChrisR on May 09, 2021, 08:02:58 pm
-
Single instruction, multiple data (SIMD) can dramatically accelerate some computations. C programmers can use intrinsics to write SSE/AVX (x86-64) and Neon (ARM) SIMD code, which many find more approachable than assembly language. However, FPC does not yet provide intrinsics. I created a repository that demonstrates writing intrinsics in C and linking the resulting object files into FPC code. I demonstrate this on both ARM and x86-64 CPUs. I also show how the C library SSE2Neon lets you write your SIMD code once using SSE intrinsics and then compile it for either x86-64 or ARM.
https://github.com/neurolabusc/FPCintrinsics
While I would prefer native FPC intrinsics, in truth only a tiny fraction of time-critical code really benefits from SIMD. This kludge can provide a convenient way to optimize that critical code.
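A minimal sketch of the kind of C kernel this approach relies on, assuming an x86-64 compiler with SSE2 enabled (other targets fall back to a plain scalar loop). The name `simd_add_f32` and its signature are hypothetical and not taken from the repository; the idea is only that such a function is compiled to an object file and then linked into the FPC program.

```c
#include <stddef.h>

#if defined(__SSE2__)
/* SSE/SSE2 intrinsics. Targets without SSE2 use the scalar loop below;
   the SSE2Neon route would instead include sse2neon.h when building for ARM. */
#include <emmintrin.h>
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, n).
   Compile to an object file (e.g. cc -O2 -c simd_add.c) for linking from FPC. */
void simd_add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Main loop: four single-precision lanes per iteration.
       Unaligned loads/stores, so the caller need not align the arrays. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail for the last n % 4 elements (or the whole array without SSE2). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

On the FPC side, a declaration along the lines of `procedure simd_add_f32(a, b, outp: PSingle; n: SizeUInt); cdecl; external;` together with a `{$L simd_add.o}` directive would be the usual way to link such an object, but the names and build setup here are assumptions, not the repository's actual code.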
-
Check rtl/i386/cpummprocs.inc (and rtl/x86_64) in trunk
-
SIMD intrinsics are a work-in-progress.
They are not yet enabled.
-
I have a branch somewhere with a lot of fixes to the intrinsics, enough that they seem to work consistently. I'll see if I can clean it up soon.
-
Laksen, this sounds terrific! I look forward to native FPC intrinsics.
-
Though we still need to settle how to name them, as that is still an open point ;)
-
For sure. I wasn't planning to enable the extension, just to fix some of the tertiary stuff that needs fixing either way :)
-
Follow the naming convention used in C compilers, maybe? That convention does not generalise across hardware platforms, though: x86 (MMX, AVX, AVX-512) vs. ARM (NEON, SVE), etc. But I doubt that a generalised naming scheme is possible anyway, as the individual CPU instructions can hardly be mapped onto one another with equivalent functionality.
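To illustrate why the C names do not carry over between platforms, here is the same 4-lane float multiply written with the x86 intrinsic `_mm_mul_ps` and the ARM NEON intrinsic `vmulq_f32`, as declared in the standard C headers `xmmintrin.h` and `arm_neon.h`. The wrapper name `mul4_f32` is hypothetical; this only shows the existing C naming schemes, not anything FPC has adopted.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise multiply of four floats: out[i] = a[i] * b[i], i = 0..3. */
void mul4_f32(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* x86 scheme: _mm_<op>_<type>, where "ps" means packed single precision. */
    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(__ARM_NEON)
    /* NEON scheme: v<op>[q]_<type>, where "q" means a 128-bit register. */
    vst1q_f32(out, vmulq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    /* Portable fallback when neither instruction set is available. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];
#endif
}
```

Even for this trivial operation, load, store and arithmetic all have different names and suffix conventions per vendor, which is the mapping problem described above.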
-
I'm not familiar enough with modern SSE etc. to transcribe examples, but it will be interesting to compare the performance of any SSE (etc.) extensions with that of Vector Pascal, which claims to have nailed this sort of thing already.
MarkMLl
-
Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.
-
Yes, but it would be nice if FPC made some measure of parallel processing comparatively accessible, and so far it looks like SSE etc. is as far as it's realistic to go. I'm obviously not talking about any sort of source-level compatibility.
MarkMLl
-
This is an interesting topic! But the methods being used are not quite correct in FPC. To use them correctly, we have to be able to handle arrays and loops properly. And working with matrices takes a great deal of difficult work before the provided instructions are really used efficiently.
I wanted to set aside time for this, but it takes a lot of time, and I would not want to release something done haphazardly.
The convenient part is that across many processors the instructions are, for the most part, similar.
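As a small illustration of matrix work with intrinsics, the sketch below computes a 4x4 matrix times a 4-vector by broadcasting each vector component and accumulating scaled columns. It assumes column-major storage (`m[col*4 + row]`) and SSE support, with a scalar fallback; the name `mat4_mul_vec4` is hypothetical. Full matrix-matrix kernels need much more care (blocking, alignment, data layout), consistent with the warning above.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out = M * v, with M a 4x4 matrix stored column-major (m[col*4 + row]).
   out must not alias v. */
void mat4_mul_vec4(const float *m, const float *v, float *out)
{
#if defined(__SSE__)
    /* Each column (4 contiguous floats) is scaled by the matching component of v. */
    __m128 r = _mm_mul_ps(_mm_loadu_ps(m + 0), _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 4), _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 8), _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, r);
#else
    /* Scalar reference: row-by-row dot products over the same layout. */
    for (int row = 0; row < 4; ++row) {
        float s = 0.0f;
        for (int col = 0; col < 4; ++col)
            s += m[col * 4 + row] * v[col];
        out[row] = s;
    }
#endif
}
```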
-
Maybe, but operations on arbitrary-length vectors are not the only application of SSE; rather, they are only a small part of it. Image processing is a major one.
And intrinsics are like writing SSE/AVX assembler, but they can be rearranged more easily (no need to redo all the register allocation manually) and they are inlinable.
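As a sketch of both the image-processing case and the inlining point, the helper below brightens an 8-bit grayscale buffer using the SSE2 unsigned saturating add `_mm_adds_epu8`, 16 pixels at a time. The function name and the flat grayscale buffer are assumptions for illustration. Because it is an ordinary `static inline` C function, the compiler does the register allocation and may inline it into callers, which hand-written assembler would not give you.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Add `delta` to every pixel, clamping at 255 instead of wrapping around. */
static inline void brighten_u8(uint8_t *px, size_t n, uint8_t delta)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i d = _mm_set1_epi8((char)delta); /* delta in all 16 byte lanes */
    for (; i + 16 <= n; i += 16) {
        __m128i p = _mm_loadu_si128((const __m128i *)(px + i));
        /* Saturating add: 250 + 10 gives 255 rather than 4. */
        _mm_storeu_si128((__m128i *)(px + i), _mm_adds_epu8(p, d));
    }
#endif
    /* Remaining pixels (or all of them without SSE2), same clamping rule. */
    for (; i < n; ++i) {
        unsigned s = (unsigned)px[i] + delta;
        px[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```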
-
The intrinsics are important because they are the low-level building blocks needed for any higher-level optimization of SIMD operations.