
Topic: SIMD Intrinsics in FPC/Lazarus

ChrisR

  • Full Member
  • ***
  • Posts: 206
SIMD Intrinsics in FPC/Lazarus
« on: May 09, 2021, 08:02:58 pm »
Single instruction, multiple data (SIMD) can dramatically accelerate some computations. C programmers can use intrinsics to write SSE/AVX (for x86-64) and Neon (for ARM) SIMD code, which many find more approachable than assembly language. However, FPC does not yet provide intrinsics. I created a repository that demonstrates writing SIMD code with C intrinsics and linking the resulting object files into FPC programs, on both ARM and x86-64 CPUs. I also show how the C SSE2Neon library lets you write your SIMD code once using SSE intrinsics and then compile it for either x86-64 or ARM.

https://github.com/neurolabusc/FPCintrinsics

While I would prefer native FPC intrinsics, in truth only a tiny fraction of time critical code really benefits from SIMD. This kludge can provide a convenient way to optimize that critical code.
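For readers who have not seen the approach, here is a minimal sketch of what the C side of such a setup might look like. It is not taken from the repository: the routine name `scale_f32` and the `USE_SSE2NEON` build flag are assumptions made for illustration. The SSE calls (`_mm_set1_ps`, `_mm_loadu_ps`, `_mm_mul_ps`, `_mm_storeu_ps`) are standard intrinsics, and on ARM the same names would come from `sse2neon.h`.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>          /* native SSE/SSE2 intrinsics on x86-64 */
  #define HAVE_SSE_API 1
#elif defined(USE_SSE2NEON)       /* hypothetical build flag for ARM builds */
  #include "sse2neon.h"           /* maps the SSE names onto Neon */
  #define HAVE_SSE_API 1
#else
  #define HAVE_SSE_API 0          /* no SIMD API available: scalar only */
#endif

/* Multiply n floats in place by `scale`.
   Hypothetical routine, written only to illustrate the technique. */
void scale_f32(float *v, size_t n, float scale)
{
    size_t i = 0;
#if HAVE_SSE_API
    const __m128 factor = _mm_set1_ps(scale);   /* broadcast scale to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(v + i);         /* unaligned load of 4 floats */
        _mm_storeu_ps(v + i, _mm_mul_ps(x, factor));
    }
#endif
    for (; i < n; ++i)                          /* scalar tail, or full fallback */
        v[i] *= scale;
}
```

Compiled to an object file, a routine like this could then be linked into an FPC program and declared there as an external cdecl procedure. The exact Pascal declarations are not shown in this thread, so see the repository for the real ones.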

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 9340
  • FPC developer.
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #1 on: May 09, 2021, 10:21:33 pm »
Quote from: ChrisR on May 09, 2021, 08:02:58 pm
However, FPC does not yet provide intrinsics.

Check rtl/i386/cpummprocs.inc (and rtl/x86_64) in trunk

PascalDragon

  • Hero Member
  • *****
  • Posts: 3068
  • Compiler Developer
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #2 on: May 10, 2021, 01:27:16 pm »
Quote from: ChrisR on May 09, 2021, 08:02:58 pm
While I would prefer native FPC intrinsics, in truth only a tiny fraction of time critical code really benefits from SIMD. This kludge can provide a convenient way to optimize that critical code.

SIMD intrinsics are a work-in-progress.

Quote from: ChrisR on May 09, 2021, 08:02:58 pm
However, FPC does not yet provide intrinsics.

Quote from: marcov on May 09, 2021, 10:21:33 pm
Check rtl/i386/cpummprocs.inc (and rtl/x86_64) in trunk

They are not yet enabled.

Laksen

  • Hero Member
  • *****
  • Posts: 684
    • J-Software
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #3 on: May 10, 2021, 01:57:38 pm »
I have a branch somewhere with a lot of fixes to the Intrinsics, enough to where they seem to work consistently. I'll see if I can clean it up soon

ChrisR

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #4 on: May 10, 2021, 05:12:14 pm »
Laksen, this sounds terrific! I look forward to native FPC intrinsics.

PascalDragon

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #5 on: May 11, 2021, 08:38:31 am »
Quote from: Laksen on May 10, 2021, 01:57:38 pm
I have a branch somewhere with a lot of fixes to the Intrinsics, enough to where they seem to work consistently. I'll see if I can clean it up soon

Though we still need to settle how to name them, as that is still an open point ;)

Laksen

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #6 on: May 11, 2021, 12:06:45 pm »
Quote from: Laksen on May 10, 2021, 01:57:38 pm
I have a branch somewhere with a lot of fixes to the Intrinsics, enough to where they seem to work consistently. I'll see if I can clean it up soon

Quote from: PascalDragon on May 11, 2021, 08:38:31 am
Though we still need to settle how to name them, as that is still an open point ;)

For sure. I wasn't planning to enable the extension, just to clean up some of the tertiary stuff that needs fixing either way :)

MathMan

  • Full Member
  • ***
  • Posts: 223
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #7 on: May 11, 2021, 01:21:58 pm »
Quote from: Laksen on May 10, 2021, 01:57:38 pm
I have a branch somewhere with a lot of fixes to the Intrinsics, enough to where they seem to work consistently. I'll see if I can clean it up soon

Quote from: PascalDragon on May 11, 2021, 08:38:31 am
Though we still need to settle how to name them, as that is still an open point ;)

Follow the naming convention used in C compilers, maybe? This, however, does not generalise across hardware platforms: x86 (MMX, AVX, AVX512) vs ARM (NEON, SVE), etc. But I doubt that a generalised naming scheme is possible anyway, as the individual CPU instructions can hardly be mapped onto one another with similar functionality.
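To make the naming difference concrete, here is a small sketch of the same four-float addition written with each platform's C intrinsic names. The wrapper name `add4` is an assumption for illustration. The intrinsics themselves (`_mm_add_ps` and friends from `emmintrin.h`, `vaddq_f32` and friends from `arm_neon.h`) are the standard C names.

```c
#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>
  /* x86 convention: _mm_<operation>_<element type> */
  void add4(const float *a, const float *b, float *out)
  {
      _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
  }
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  /* Neon convention: v<operation>q_<element type>, q = 128-bit register */
  void add4(const float *a, const float *b, float *out)
  {
      vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
  }
#else
  /* Scalar fallback so the sketch still builds on other targets. */
  void add4(const float *a, const float *b, float *out)
  {
      for (int i = 0; i < 4; ++i)
          out[i] = a[i] + b[i];
  }
#endif
```

A simple add maps one-to-one between the two instruction sets, but many other operations (shuffles, horizontal operations, saturating conversions) have no direct counterpart, which is the difficulty described above.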

MarkMLl

  • Hero Member
  • *****
  • Posts: 2721
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #8 on: May 14, 2021, 10:38:22 pm »
I'm not familiar enough with modern SSE etc. to transcribe examples, but it will be interesting to compare the performance of any SSE (etc.) extensions with that of Vector Pascal, which claims to have nailed this sort of thing already.

MarkMLl
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

marcov

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #9 on: May 14, 2021, 11:36:24 pm »
Quote from: MarkMLl on May 14, 2021, 10:38:22 pm
I'm not familiar enough with modern SSE etc. to transcribe examples, but it will be interesting to compare the performance of any SSE (etc.) extensions with that of Vector Pascal, which claims to have nailed this sort of thing already.

Aren't these two different things?  Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.

MarkMLl

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #10 on: May 15, 2021, 09:34:18 am »
Quote from: marcov on May 14, 2021, 11:36:24 pm
Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.

Yes, but it would be nice if FPC made some measure of parallel processing comparatively accessible, and so far it looks like SSE etc. is as far as it's realistic to go. I'm obviously not talking about any sort of source-level compatibility.

MarkMLl

Seenkao

  • Full Member
  • ***
  • Posts: 157
Re: SIMD Intrinsics in FPC/Lazarus
« Reply #11 on: May 15, 2021, 09:59:38 am »
This is an interesting topic! But the methods being used are not quite right for FPC. To use them properly, we need to be able to work correctly with arrays and loops. And for matrices, it takes a lot of hard work to use the available instructions really effectively.
I wanted to set aside time for this, but it needs a great deal of time, and I would not want to publish something slapdash.

The convenient part is that, across many processors, the instructions are (for the most part) similar.

marcov

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #12 on: May 15, 2021, 04:22:30 pm »
Quote from: marcov on May 14, 2021, 11:36:24 pm
Aren't these two different things? Vector Pascal tries to lift vector expressions to a higher level, while intrinsics are more about generating inlinable and optimizable code.

Quote from: MarkMLl on May 15, 2021, 09:34:18 am
Yes, but it would be nice if FPC made some measure of parallel processing comparatively accessible, and so far it looks like SSE etc. is as far as it's realistic to go. I'm obviously not talking about any sort of source-level compatibility.

Maybe, but operations on arbitrary length vectors are not the only application of SSE. Or rather, only a small part. Image processing is a major one.

And intrinsics are like writing SSE/AVX assembler, but they can be rearranged more easily (no need to redo all the register allocation manually) and are inlinable.
« Last Edit: May 16, 2021, 05:15:47 pm by marcov »
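As an illustration of the image-processing case, here is a minimal C sketch that brightens 8-bit pixels with a saturating add. The function names `brighten16` and `brighten` are assumptions for illustration. `_mm_adds_epu8` is the standard SSE2 intrinsic for unsigned saturating byte addition. The point is that the helper is an ordinary inlinable function and the compiler picks the registers, which is what hand-written assembler would not give you.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>
  #define HAVE_SSE2 1

  /* Add `delta` to 16 pixels at once, clamping each byte at 255.
     Being static inline, it can be folded into any calling loop. */
  static inline __m128i brighten16(__m128i px, __m128i delta)
  {
      return _mm_adds_epu8(px, delta);
  }
#else
  #define HAVE_SSE2 0
#endif

/* Brighten n 8-bit pixels in place by `amount`, saturating at 255. */
void brighten(uint8_t *pixels, size_t n, uint8_t amount)
{
    size_t i = 0;
#if HAVE_SSE2
    const __m128i delta = _mm_set1_epi8((char)amount);  /* same byte in all 16 lanes */
    for (; i + 16 <= n; i += 16) {
        __m128i px = _mm_loadu_si128((const __m128i *)(pixels + i));
        _mm_storeu_si128((__m128i *)(pixels + i), brighten16(px, delta));
    }
#endif
    for (; i < n; ++i) {                                /* scalar tail, or full fallback */
        unsigned sum = (unsigned)pixels[i] + amount;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```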

PascalDragon

Re: SIMD Intrinsics in FPC/Lazarus
« Reply #13 on: May 16, 2021, 05:13:47 pm »
Quote from: Seenkao on May 15, 2021, 09:59:38 am
This is an interesting topic! But the methods being used are not quite right for FPC. To use them properly, we need to be able to work correctly with arrays and loops. And for matrices, it takes a lot of hard work to use the available instructions really effectively.
I wanted to set aside time for this, but it needs a great deal of time, and I would not want to publish something slapdash.

The convenient part is that, across many processors, the instructions are (for the most part) similar.

The intrinsics are important, because they are the low level stuff that is necessary to do any higher level optimizations regarding SIMD operations.

 
