Lazarus


Title: Question on compiler provided defines and optimisation flags
Post by: MathMan on July 17, 2022, 11:08:34 am

Hello all,

I just discovered that there are two interesting compiler-provided defines, FPC_HAS_FAST_FMA_SINGLE & FPC_HAS_FAST_FMA_DOUBLE, which seem to interact with the FASTMATH optimisation flag.

These might come in handy for something I am currently working on and I would like to fully understand their effects - but unfortunately there is little to no information in the manuals for FPC 3.2.2.

My current understanding, after playing around and analysing the assembler output, is as follows

* the defines are set if a suitable target architecture is selected - e.g. for COREAVX2 they are set, for COREI they are unset
* in the source I can enable / disable the use via {$OPTIMIZATION FASTMATH} and {$OPTIMIZATION NOFASTMATH}

Is the above correct, or is there something else I have to consider, if I want to compile a function with & without FMA operations?

Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

* is this really for generation of vector operations, or does it only relate to the vectorised parameter passing under Win?
* if the former, what magic must be applied in the source to make use of this? <= I did try some variants, but without any effect

Kind regards,
MathMan

Title: Re: Question on compiler provided defines and optimisation flags
Post by: Jonas Maebe on July 17, 2022, 11:42:09 am

Quote from: MathMan on July 17, 2022, 11:08:34 am

Is the above correct, or is there something else I have to consider, if I want to compile a function with & without FMA operations?

Put it in an include file, and include it once in a unit compiled for one architecture and once for the other. Make the function name either configurable via a macro, or keep the header in the unit so that you can give them different names.
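A rough sketch of the second variant described above (header kept in the unit, body in an include file). All file, unit and function names here are hypothetical, and the three files are shown in one listing only for brevity. Each unit would be compiled separately with its own FPU type selection, for example via the -Cf option; which exact values are appropriate is an assumption to check against your FPC version.

```pascal
{ ---- file muladd_body.inc (hypothetical): the shared function body ---- }
begin
  Result := a * b + c;
end;

{ ---- file muladd_fma.pas: compile this unit for an FMA-capable FPU type ---- }
unit muladd_fma;
{$mode objfpc}
{$OPTIMIZATION FASTMATH}   { let the compiler try to fuse the multiply-add }
interface
function MulAddFma(a, b, c: Double): Double;
implementation
function MulAddFma(a, b, c: Double): Double;
{$I muladd_body.inc}
end.

{ ---- file muladd_plain.pas: compile this unit for the baseline FPU type ---- }
unit muladd_plain;
{$mode objfpc}
{$OPTIMIZATION NOFASTMATH}
interface
function MulAddPlain(a, b, c: Double): Double;
implementation
function MulAddPlain(a, b, c: Double): Double;
{$I muladd_body.inc}
end.
```

A caller can then use both units and pick MulAddFma or MulAddPlain at run time, since the two headers carry different names.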

Quote

Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

It enables adding (and several other math and logic operations) for array operands, and will use vector operations to calculate the results. It does not perform auto-vectorization.

Title: Re: Question on compiler provided defines and optimisation flags
Post by: PascalDragon on July 17, 2022, 11:51:41 am

Quote from: MathMan on July 17, 2022, 11:08:34 am

* the defines are set if a suitable target architecture is selected - e.g. for COREAVX2 they are set, for COREI they are unset

The defines solely depend on the specified FPU type. Both AVX2 and AVX512 have them set.

Quote from: MathMan on July 17, 2022, 11:08:34 am

* in the source I can enable / disable the use via {$OPTIMIZATION FASTMATH} and {$OPTIMIZATION NOFASTMATH}

FastMath might result in the loss of precision on certain platforms with certain operations (especially on platforms that support 80-bit floating point). Otherwise it will try to use FMA in case of Single or Double types (see compiler/nadd.pas, taddnode.try_fma (https://gitlab.com/freepascal.org/fpc/source/-/blob/main/compiler/nadd.pas#L3907)).
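A minimal sketch for checking this yourself, assuming the second define is named FPC_HAS_FAST_FMA_DOUBLE and that the optimisation switch can be toggled between functions as described in the first post. The program and function names are made up for illustration. Compile it once with an FMA-capable FPU type (for example -CfAVX2, an assumption to verify for your target) and once without, then compare the {$INFO} messages and the assembler output (-al) of the two functions.

```pascal
program FmaProbe;
{$mode objfpc}

{ Report at compile time which of the FMA defines are set. }
{$IFDEF FPC_HAS_FAST_FMA_SINGLE}
  {$INFO FPC_HAS_FAST_FMA_SINGLE is defined}
{$ELSE}
  {$INFO FPC_HAS_FAST_FMA_SINGLE is not defined}
{$ENDIF}
{$IFDEF FPC_HAS_FAST_FMA_DOUBLE}
  {$INFO FPC_HAS_FAST_FMA_DOUBLE is defined}
{$ELSE}
  {$INFO FPC_HAS_FAST_FMA_DOUBLE is not defined}
{$ENDIF}

{$OPTIMIZATION FASTMATH}
{ With FASTMATH on, the compiler may fuse a * b + c into one FMA instruction. }
function MulAddFast(a, b, c: Double): Double;
begin
  Result := a * b + c;
end;

{$OPTIMIZATION NOFASTMATH}
{ With FASTMATH off, a separate multiply and add are expected. }
function MulAddPlain(a, b, c: Double): Double;
begin
  Result := a * b + c;
end;

var
  x, y, z: Double;
begin
  x := 1.5;
  y := 2.0;
  z := 0.25;
  WriteLn('fast : ', MulAddFast(x, y, z):0:4);
  WriteLn('plain: ', MulAddPlain(x, y, z):0:4);
end.
```

The values are chosen so that both variants print the same result. Differences between fused and unfused evaluation only show up where the intermediate product needs rounding.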

Quote from: MathMan on July 17, 2022, 11:08:34 am

Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

* is this really for generation of vector operations, or does it only relate to the vectorised parameter passing under Win?
* if the former, what magic must be applied in the source to make use of this? <= I did try some variants, but without any effect

It's related to the former. You need arrays with 4 fields (or perhaps arrays in general?) of an ordinal or floating point type (with a size <= 8 Byte) and you need to have SSE or higher enabled. Then the compiler will allow the use of vector operations on these types and use SIMD to achieve them.
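As an illustration only, a small program of the kind this describes might look as follows. It assumes compilation with -Sv on a target with SSE enabled and assumes the operators work element by element on the arrays. The type and variable names are hypothetical, and behaviour may differ between compiler versions.

```pascal
program VecOps;
{$mode objfpc}

type
  TVec4 = array[0..3] of Single;

var
  Src1, Src2, Sum, Prod: TVec4;
  Dot: Single;
  i: Integer;
begin
  for i := 0 to 3 do
  begin
    Src1[i] := i + 1;
    Src2[i] := i + 5;
  end;

  { These two lines are only accepted when -Sv is given on the command line. }
  Sum := Src1 + Src2;
  Prod := Src1 * Src2;

  { Assuming an element-wise product, a dot product still needs a final sum. }
  Dot := 0;
  for i := 0 to 3 do
    Dot := Dot + Prod[i];

  for i := 0 to 3 do
    WriteLn(Sum[i]:0:1, ' ', Prod[i]:0:1);
  WriteLn('dot = ', Dot:0:1);
end.
```

Without -Sv the array additions and multiplications should be rejected by the compiler, which is a quick way to confirm that the switch is actually being picked up.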

Title: Re: Question on compiler provided defines and optimisation flags
Post by: MathMan on July 17, 2022, 12:10:11 pm

Many thanks PascalDragon - that clarifies it.

* regarding the first point above, I assume that at some point in the future this may also become available for ARM cores with SVE (or some such)?
* regarding the second point - I only intend to use Single / Double types, so that fits <= I usually stay away from Extended as far as I can
* regarding the last, I thought I had tried that, but will try again - I will also take a look at the intrinsics file mentioned in the other thread. Should it be something like the below, then?

Code: Pascal

var
  Src1, Src2: array [ 0..3 ] of single;
  Mult: array [ 0..3 ] of single;
 
...
  Mult := Src1 * Src2; // Mult is the dot-product of the 4 point vectors Src1, Src2?
 

Kind regards,
MathMan

Title: Re: Question on compiler provided defines and optimisation flags
Post by: MathMan on July 18, 2022, 08:15:29 am

Thanks Jonas - I don't know how, but I must have overlooked your response yesterday :-[

Regarding the first point - yes, this approach is also the one I had in mind (I just looked at the MMX unit the other day ...)

Regarding the second - I have it working now. However, it looks like -Sv and FASTMATH do not go hand in hand as of yet, and I still haven't found a way to detect whether -Sv has been set or to manipulate it in the source like I can with FASTMATH. So maybe I have to put that on hold and take it up again at some later date.

Kind regards,
MathMan