Question on compiler provided defines and optimisation flags

MathMan:
Hello all,

I just discovered that there are two interesting compiler-provided defines, FPC_HAS_FAST_FMA_SINGLE and FPC_HAS_FAST_FMA_DOUBLE, which seem to interact with the FASTMATH optimisation flag.

These might come in handy for something I am currently working on and I would like to fully understand their effects - but unfortunately there is little to no information in the manuals for FPC 3.2.2.

My current understanding, after playing around and analysing the assembler output, is as follows

* the defines are set if a suitable target architecture is selected - e.g. for COREIAVX2 they are set, for COREI they are unset
* in the source I can enable / disable the use via {$optimisation FASTMATH} and {$optimisation NOFASTMATH}

Is the above correct, or is there something else I have to consider, if I want to compile a function with & without FMA operations?

Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

* is this really for generation of vector operations, or does it only relate to the vectorised parameter passing under Win?
* if the former, what magic must be applied in the source to make use of this? <= I did try some variants, but without any effect

Kind regards,
MathMan

Jonas Maebe:

--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---Is the above correct, or is there something else I have to consider, if I want to compile a function with & without FMA operations?

--- End quote ---
Put it in an include file, and include it once in a unit compiled for one architecture and once for the other. Make the function name either configurable via a macro, or keep the header in the unit so that you can give them different names.


--- Quote ---Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

--- End quote ---
It enables adding (and several other math and logic operations) for array operands, and will use vector operations to calculate the results. It does not perform auto-vectorization.

PascalDragon:

--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---* the defines are set if a suitable target architecture is selected - e.g. for COREIAVX2 they are set, for COREI they are unset
--- End quote ---

The defines solely depend on the specified FPU type. Both AVX2 and AVX512 have them set.


--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---* in the source I can enable / disable the use via {$optimisation FASTMATH} and {$optimisation NOFASTMATH}
--- End quote ---

FastMath might result in the loss of precision on certain platforms with certain operations (especially on platforms that support 80-bit floating point). Otherwise it will try to use FMA in case of Single or Double types (see compiler/nadd.pas, taddnode.try_fma).
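A minimal sketch of how the points above could be checked in one source file. The program and function names are made up for illustration, and the build line in the comment is an assumption based on the FPU types named in this thread. Whether an FMA instruction is actually emitted has to be confirmed in the assembler output (for example with -al).

```pascal
program fmacheck;
{$mode objfpc}

// Assumed build line (not taken from the manuals): fpc -O2 -CfAVX2 -al fmacheck.pas

{$OPTIMIZATION FASTMATH}
// With FASTMATH on and a suitable FPU type, the compiler may fuse a*b+c.
function MulAddFast(a, b, c: Double): Double;
begin
  Result := a * b + c;
end;

{$OPTIMIZATION NOFASTMATH}
// With FASTMATH off, a separate multiply and add are expected.
function MulAddStrict(a, b, c: Double): Double;
begin
  Result := a * b + c;
end;

begin
  // Report at run time what the compiler defined at compile time.
  {$IFDEF FPC_HAS_FAST_FMA_SINGLE}
  WriteLn('FPC_HAS_FAST_FMA_SINGLE is defined');
  {$ELSE}
  WriteLn('FPC_HAS_FAST_FMA_SINGLE is not defined');
  {$ENDIF}
  {$IFDEF FPC_HAS_FAST_FMA_DOUBLE}
  WriteLn('FPC_HAS_FAST_FMA_DOUBLE is defined');
  {$ELSE}
  WriteLn('FPC_HAS_FAST_FMA_DOUBLE is not defined');
  {$ENDIF}

  // These inputs are exactly representable, so both variants agree here.
  WriteLn(MulAddFast(2.0, 3.0, 1.0):0:1);
  WriteLn(MulAddStrict(2.0, 3.0, 1.0):0:1);
end.
```

The two functions differ only in the optimisation switch active when they are compiled, so comparing their assembler listings shows whether FASTMATH changed the generated code for the chosen FPU type.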


--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

* is this really for generation of vector operations, or does it only relate to the vectorised parameter passing under Win?
* if the former, what magic must be applied in the source to make use of this? <= I did try some variants, but without any effect
--- End quote ---

It's related to the former. You need to use arrays with 4 elements (or in general?) of an ordinal or floating point type (with a size <= 8 bytes), and you need to have SSE or higher enabled. Then the compiler will allow the use of vector operations on these types and use SIMD to achieve them.
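A minimal sketch of what such source could look like, assuming a build with -Sv on a target where SSE is enabled (on x86-64 that is my understanding of the default, otherwise add a suitable -Cf option). The program and type names are hypothetical. Only addition is used, because that is the operation explicitly confirmed earlier in this thread; the other operators are presumably analogous, but that is not verified here.

```pascal
program vecadd;
{$mode objfpc}

// Assumed build line: fpc -Sv vecadd.pas
// Without -Sv the array addition below is expected to be rejected.

type
  TVec4 = array[0..3] of Single;

var
  A, B, Sum: TVec4;
  I: Integer;

begin
  // Fill the operands with small values that are exact in Single.
  for I := 0 to 3 do
  begin
    A[I] := I + 1;
    B[I] := 10 * (I + 1);
  end;

  // Whole-array addition; with -Sv this is handled element by element.
  Sum := A + B;

  // Compare against the scalar computation to check the semantics.
  for I := 0 to 3 do
    if Sum[I] <> A[I] + B[I] then
    begin
      WriteLn('mismatch at index ', I);
      Halt(1);
    end;

  WriteLn('element-wise sum ok');
end.
```

The scalar comparison loop is there so the program itself confirms that the array operator works per element rather than producing a single reduced value.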

MathMan:

--- Quote from: PascalDragon on July 17, 2022, 11:51:41 am ---
--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---* the defines are set if a suitable target architecture is selected - e.g. for COREIAVX2 they are set, for COREI they are unset
--- End quote ---

The defines solely depend on the specified FPU type. Both AVX2 and AVX512 have them set.


--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---* in the source I can enable / disable the use via {$optimisation FASTMATH} and {$optimisation NOFASTMATH}
--- End quote ---

FastMath might result in the loss of precision on certain platforms with certain operations (especially on platforms that support 80-bit floating point). Otherwise it will try to use FMA in case of Single or Double types (see compiler/nadd.pas, taddnode.try_fma).


--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

* is this really for generation of vector operations, or does it only relate to the vectorised parameter passing under Win?
* if the former, what magic must be applied in the source to make use of this? <= I did try some variants, but without any effect
--- End quote ---

It's related to the former. You need to use arrays with 4 elements (or in general?) of an ordinal or floating point type (with a size <= 8 bytes), and you need to have SSE or higher enabled. Then the compiler will allow the use of vector operations on these types and use SIMD to achieve them.

--- End quote ---

Many thanks PascalDragon - that clarifies it.

* regarding first point above I assume sometime in the future this may also become available for ARM cores with SVE (or somesuch)?
* regarding the second point - I only intend to use single / double types, so that fits <= I stay away from extended as far as I can usually
* regarding the last, I thought I tried that, but will try again - will also take a look at the intrinsics file mentioned in the other thread. Should that be something like the below then


--- Code: Pascal ---
var
  Src1, Src2: array [ 0..3 ] of single;
  Mult: array [ 0..3 ] of single;
...
  Mult := Src1 * Src2; // Mult is the dot-product of the 4 point vectors Src1, Src2?
--- End code ---

Kind regards,
MathMan

MathMan:

--- Quote from: Jonas Maebe on July 17, 2022, 11:42:09 am ---
--- Quote from: MathMan on July 17, 2022, 11:08:34 am ---Is the above correct, or is there something else I have to consider, if I want to compile a function with & without FMA operations?

--- End quote ---
Put it in an include file, and include it once in a unit compiled for one architecture and once for the other. Make the function name either configurable via a macro, or keep the header in the unit so that you can give them different names.


--- Quote ---Finally there is the command line parameter -Sv which is only explained as "use vector operations if available".

--- End quote ---
It enables adding (and several other math and logic operations) for array operands, and will use vector operations to calculate the results. It does not perform auto-vectorization.

--- End quote ---

Thanks Jonas - I don't know how, but I must have overlooked your response yesterday  :-[

Regarding the first point - yes, this approach is also the one I had in mind (I just looked at the MMX unit the other day ...)

Regarding the second point - I have it working now. However, it looks like -Sv and FASTMATH do not go hand in hand as of yet, and I still haven't found a way to detect whether -Sv has been set, or to toggle it in the source like I can with FASTMATH. So maybe I have to put that on hold and take it up again at some later date.

Kind regards,
MathMan
