
Topic: CORE-MATH 32bits in Pascal! (beta version)

schuler

  • Sr. Member
  • ****
  • Posts: 339
CORE-MATH 32bits in Pascal! (beta version)
« on: April 21, 2026, 02:21:53 pm »
:) Hello :)
CORE-MATH is a research project from INRIA providing correctly-rounded implementations of standard math functions. pas-core-math is a faithful port of the CORE-MATH binary32 (single-precision floating-point) C library to Free Pascal:

https://github.com/joaopauloschuler/pas-core-math
https://forum.lazarus.freepascal.org/index.php/topic,73881.0.html

In the benchmark below, the Pascal port was faster than the C implementation on average: it won on 21 functions, lost on 13, and tied (within 5%) on 8.
Code: Pascal
src/tests$ uname -a
Linux Ubuntu24 6.17.0-22-generic #22~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 26 15:25:54 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
src/tests$ lscpu | grep Model
Model name:                              13th Gen Intel(R) Core(TM) i7-1355U
src/tests$ ./build.sh -dAVX2 -CfAVX2 -CpCOREAVX -OpCOREAVX
src/tests$ taskset -c 1 env LD_LIBRARY_PATH=/home/bpsa/app/pas-core-math/src/tests/.. /home/bpsa/app/pas-core-math/src/tests/../../bin/Benchmark32
Compiled with AVX2.
=== Benchmark: 200000000 calls per function ===

acosf             C:  367.0 Mops/s  Pascal:  343.1 Mops/s  sink=MATCH
acoshf            C:  128.3 Mops/s  Pascal:  192.3 Mops/s  sink=MISMATCH  FASTER! YAY!
acospif           C:  237.8 Mops/s  Pascal:  246.6 Mops/s  sink=MISMATCH  TIE
asinf             C:  383.9 Mops/s  Pascal:  400.8 Mops/s  sink=MATCH  TIE
asinhf            C:  251.3 Mops/s  Pascal:  249.4 Mops/s  sink=MATCH  TIE
asinpif           C:  258.4 Mops/s  Pascal:  408.2 Mops/s  sink=MISMATCH  FASTER! YAY!
atanf             C:  337.3 Mops/s  Pascal:  331.1 Mops/s  sink=MATCH  TIE
atanhf            C:  198.0 Mops/s  Pascal:  163.1 Mops/s  sink=MATCH
atanpif           C:  120.0 Mops/s  Pascal:  173.6 Mops/s  sink=MATCH  FASTER! YAY!
cbrtf             C:   79.6 Mops/s  Pascal:   91.1 Mops/s  sink=MATCH  FASTER! YAY!
cosf              C:  192.5 Mops/s  Pascal:   85.2 Mops/s  sink=MATCH
coshf             C:  144.0 Mops/s  Pascal:  400.8 Mops/s  sink=MATCH  FASTER! YAY!
cospif            C:  148.4 Mops/s  Pascal:  143.5 Mops/s  sink=MATCH  TIE
erff              C:  185.0 Mops/s  Pascal:  171.8 Mops/s  sink=MATCH
erfcf             C:  168.2 Mops/s  Pascal:  286.1 Mops/s  sink=MATCH  FASTER! YAY!
expf              C:  131.8 Mops/s  Pascal:  228.6 Mops/s  sink=MATCH  FASTER! YAY!
exp10f            C:  125.7 Mops/s  Pascal:  252.5 Mops/s  sink=MATCH  FASTER! YAY!
exp10m1f          C:  191.4 Mops/s  Pascal:  139.2 Mops/s  sink=MATCH
exp2f             C:   92.3 Mops/s  Pascal:  188.7 Mops/s  sink=MATCH  FASTER! YAY!
exp2m1f           C:  140.7 Mops/s  Pascal:  197.4 Mops/s  sink=MATCH  FASTER! YAY!
expm1f            C:  138.2 Mops/s  Pascal:  257.7 Mops/s  sink=MATCH  FASTER! YAY!
lgammaf           C:   67.5 Mops/s  Pascal:   70.3 Mops/s  sink=MATCH  TIE
logf              C:  209.4 Mops/s  Pascal:  210.5 Mops/s  sink=MATCH  TIE
log10f            C:  149.1 Mops/s  Pascal:  129.8 Mops/s  sink=MATCH
log10p1f          C:   92.5 Mops/s  Pascal:   99.6 Mops/s  sink=MATCH  FASTER! YAY!
log1pf            C:  171.1 Mops/s  Pascal:  143.0 Mops/s  sink=MATCH
log2f             C:  172.9 Mops/s  Pascal:  211.4 Mops/s  sink=MATCH  FASTER! YAY!
log2p1f           C:  158.5 Mops/s  Pascal:  105.1 Mops/s  sink=MATCH
rsqrtf            C:   96.3 Mops/s  Pascal:  275.1 Mops/s  sink=MISMATCH  FASTER! YAY!
sinf              C:  180.5 Mops/s  Pascal:  135.3 Mops/s  sink=MATCH
sinhf             C:  187.6 Mops/s  Pascal:  317.0 Mops/s  sink=MATCH  FASTER! YAY!
sinpif            C:  295.4 Mops/s  Pascal:  328.4 Mops/s  sink=MATCH  FASTER! YAY!
tanf              C:  105.5 Mops/s  Pascal:   89.5 Mops/s  sink=MATCH
tanhf             C:  119.3 Mops/s  Pascal:  191.4 Mops/s  sink=MATCH  FASTER! YAY!
tanpif            C:   97.3 Mops/s  Pascal:   99.4 Mops/s  sink=MATCH  TIE
tgammaf           C:  110.1 Mops/s  Pascal:  169.8 Mops/s  sink=MATCH  FASTER! YAY!
atan2f            C:  106.4 Mops/s  Pascal:  132.0 Mops/s  sink=MATCH  FASTER! YAY!
atan2pif          C:    3.4 Mops/s  Pascal:    1.8 Mops/s  sink=MATCH
hypotf            C:   67.0 Mops/s  Pascal:   84.4 Mops/s  sink=MATCH  FASTER! YAY!
powf              C:   94.0 Mops/s  Pascal:   79.9 Mops/s  sink=MATCH
compoundf         C:   56.2 Mops/s  Pascal:   67.8 Mops/s  sink=MATCH  FASTER! YAY!
sincosf           C:  108.6 Mops/s  Pascal:   72.6 Mops/s  sink=MATCH

Pascal won: 21  |  C won: 13  |  Ties (<5%): 8
On average, Pascal is 1.24x faster than C (arithmetic mean over 42 functions)
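For readers who want to check the two summary lines against the table, here is a minimal sketch in Python (used only because it runs anywhere; it is not part of pas-core-math). It assumes a function counts as a tie when the Pascal/C throughput ratio is within 5% of 1, and that the "average" is the arithmetic mean of those per-function ratios. Both rules are guesses at what the benchmark program does, since its source is not shown in this thread. The name `summarize` is hypothetical.

```python
import math

# name, C Mops/s, Pascal Mops/s: copied from the benchmark output above.
RESULTS = """
acosf 367.0 343.1
acoshf 128.3 192.3
acospif 237.8 246.6
asinf 383.9 400.8
asinhf 251.3 249.4
asinpif 258.4 408.2
atanf 337.3 331.1
atanhf 198.0 163.1
atanpif 120.0 173.6
cbrtf 79.6 91.1
cosf 192.5 85.2
coshf 144.0 400.8
cospif 148.4 143.5
erff 185.0 171.8
erfcf 168.2 286.1
expf 131.8 228.6
exp10f 125.7 252.5
exp10m1f 191.4 139.2
exp2f 92.3 188.7
exp2m1f 140.7 197.4
expm1f 138.2 257.7
lgammaf 67.5 70.3
logf 209.4 210.5
log10f 149.1 129.8
log10p1f 92.5 99.6
log1pf 171.1 143.0
log2f 172.9 211.4
log2p1f 158.5 105.1
rsqrtf 96.3 275.1
sinf 180.5 135.3
sinhf 187.6 317.0
sinpif 295.4 328.4
tanf 105.5 89.5
tanhf 119.3 191.4
tanpif 97.3 99.4
tgammaf 110.1 169.8
atan2f 106.4 132.0
atan2pif 3.4 1.8
hypotf 67.0 84.4
powf 94.0 79.9
compoundf 56.2 67.8
sincosf 108.6 72.6
"""


def summarize(text, tie_band=0.05):
    """Classify each row and average the Pascal/C throughput ratios.

    tie_band is an assumed rule: ratios within 5% of 1.0 count as ties.
    """
    ratios = []
    for line in text.split("\n"):
        if line.strip():
            _name, c_mops, pascal_mops = line.split()
            ratios.append(float(pascal_mops) / float(c_mops))
    pascal_won = sum(r >= 1 + tie_band for r in ratios)
    c_won = sum(r <= 1 - tie_band for r in ratios)
    ties = len(ratios) - pascal_won - c_won
    arith_mean = sum(ratios) / len(ratios)
    # The geometric mean treats "2x faster" and "2x slower" symmetrically.
    geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return pascal_won, c_won, ties, arith_mean, geo_mean


pascal_won, c_won, ties, arith_mean, geo_mean = summarize(RESULTS)
print(f"Pascal won: {pascal_won}  |  C won: {c_won}  |  Ties: {ties}")
print(f"arithmetic mean of ratios: {arith_mean:.2f}x")
print(f"geometric mean of ratios:  {geo_mean:.2f}x")
```

The geometric mean is printed alongside because an arithmetic mean of ratios is pulled upward by a few large wins (coshf, rsqrtf), so the two figures can differ noticeably.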

The current Pascal source code has contributions from mathman and nanobit.
« Last Edit: April 21, 2026, 03:10:48 pm by schuler »

Thaddy

  • Hero Member
  • *****
  • Posts: 19165
  • Glad to be alive.
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #1 on: April 21, 2026, 03:24:38 pm »
Great achievement.
objects are fine constructs. You can even initialize them with constructors.

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #2 on: April 21, 2026, 03:30:26 pm »
Seems interesting, but I don't know how using these compares to what comes with the compiler. Can you explain a little, please?

Thaddy

  • Hero Member
  • *****
  • Posts: 19165
  • Glad to be alive.
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #3 on: April 21, 2026, 03:50:12 pm »
@Okoba
The compiler takes a "generic" approach to math routines.
This project is about providing math routines that:
- round better, which in my opinion is the more important point.
- make better use of newer, faster instruction sets.

The compiler cannot always do that in its default configuration, because it is built with a different mindset (e.g. in system.pas, which you normally never see, the basic math routines cater for extended precision by default and go down from there).
This project (the meta view) adds ways to be more correct in rounding and faster, and the eventual goal is to merge it, as some other major compilers have already done.

Technically, this would mean the compiler redirects to routines for the type that is asked for, e.g. single, double or extended, where it is now all extended (double on most 64-bit targets). The compiler already has the infrastructure for that, but its implementation is quite complex.
The changes to system.pas would be quite considerable, so for the moment it is presented as a separate library. This is unlike C and its family, where all of that lives in libraries anyway.

You can also see this as a sub-project of the meta core-math project, which is about many compilers, not just Pascal compilers.

The speed win is just a side effect ;) (Sorry Joao  O:-), great win!!)

You can already speed up the compiler by building it with newer instruction sets for the platform (Intel/AMD or ARM), but it still won't be as efficient or as accurate as this library for 32-bit Intel/AMD math.

Since inaccuracy accumulates, this is not something to be ignored.
Within the confines of 32-bit, this library is both faster and more precise.
« Last Edit: April 21, 2026, 04:21:08 pm by Thaddy »
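To make the "inaccuracy accumulates" point concrete, here is a generic sketch in Python (stdlib only). It is not taken from pas-core-math and does not call any of its functions. It only shows how per-operation rounding in Single adds up when the same small step is summed a million times, compared with doing the same in Double. The helper names `to_single` and `accumulate` are made up for this illustration.

```python
import struct


def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    With single=True the step and every partial sum are rounded to
    binary32, which mimics doing the whole loop in the Single type.
    """
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


single_sum = accumulate(0.1, 1_000_000, single=True)
double_sum = accumulate(0.1, 1_000_000, single=False)
print("Single total:", single_sum)
print("Double total:", double_sum)
print("Single error:", abs(single_sum - 100000.0))
print("Double error:", abs(double_sum - 100000.0))
```

The same mechanism applies when each term comes from a math function: the less rounding error each call contributes, the less there is to pile up.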

schuler

  • Sr. Member
  • ****
  • Posts: 339
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #4 on: April 21, 2026, 09:32:39 pm »
@Okoba,
Quote
I don't know how using these compares to what comes with the compiler. Can you explain a little, please?

As per https://forum.lazarus.freepascal.org/index.php/topic,73881.0.html, pas-core-math is ~2.4× faster than the FPC builtins on Windows/x86-64. On Linux/x86-64, pas-core-math is 6x to 20x faster than the FPC builtins. Benchmarking on ARM is still missing.

JCLRQ

  • Newbie
  • Posts: 3
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #5 on: April 24, 2026, 01:01:09 pm »
Thanks, Schuler, for this development. This sounds promising. I may try it in my code, since my algorithm solves in type "single".

Boleeman

  • Hero Member
  • *****
  • Posts: 1158
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #6 on: May 01, 2026, 01:15:39 pm »
The speed increase (pas-core-math is ~2.4× faster than the FPC builtins on Windows/x86-64; on Linux/x86-64 it is 6x to 20x faster) might come in useful for graphics programs such as fractal renderers (where now it is all extended, i.e. double on most 64-bit targets).

Would it work equally well on both Intel and AMD CPU computers?

Thaddy

  • Hero Member
  • *****
  • Posts: 19165
  • Glad to be alive.
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #7 on: May 01, 2026, 02:05:33 pm »
Yes. One of my test laptops is an AMD Ryzen 7020, another one is a Ryzen 7700.
Both support AVX2 and also FMA3, which is not even necessary.
By the way, extended is only double on x86_64-win64.
But math libraries are often double, and that is the case here.
« Last Edit: May 01, 2026, 07:48:55 pm by Thaddy »

schuler

  • Sr. Member
  • ****
  • Posts: 339
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #8 on: May 02, 2026, 08:33:18 pm »
@boleeman
Quote
The speed increase (pas-core-math is ~2.4× faster than the FPC builtins on Windows/x86-64; on Linux/x86-64 it is 6x to 20x faster) might come in useful for graphics programs such as fractal renderers (where now it is all extended, i.e. double on most 64-bit targets).

This concerns the 32-bit Single float type. The 64-bit version is still in testing and is certainly slower.

microxa

  • New Member
  • *
  • Posts: 13
Re: CORE-MATH 32bits in Pascal! (beta version)
« Reply #9 on: May 08, 2026, 03:52:54 pm »
:) Hello :)
CORE-MATH


A monstrous exp()!
And I was looking for something simpler, for an MNIST-like ANN "hello world".

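Code: Pascal
procedure Act_Sigmoid_4x(var f: single);
// Approximates the sigmoid of 4 packed singles in place.
// Expects the caller to have set xmm7 = 1.0 and xmm6 = -12102203.0 in every lane.
asm
  movaps xmm0, [eax]        // load 4 singles (f must be 16-byte aligned)
  movaps xmm1, xmm7         // 1.0
  mulps xmm0, xmm6          // x * -12102203.0 (about -2^23 / ln 2)
  cvttps2dq xmm0, xmm0      // truncate to int32
  paddd xmm0, xmm1          // add the bit pattern of 1.0: bits now read as ~exp(-x)
  addps xmm0, xmm1          // 1 + exp(-x)

  //divps xmm1, xmm0

  rcpps xmm0, xmm0          // a / b ≈ a * rcpps(b)
  mulps xmm1, xmm0          // 1.0 * 1 / (1 + exp(-x))

  movaps [eax], xmm1        // store the 4 results
end;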
« Last Edit: May 09, 2026, 04:03:15 pm by microxa »
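For anyone trying to follow what the SSE routine above computes, here is a scalar sketch of the same idea in Python (stdlib only). It reads the routine as the well-known exponent-bits trick: scale x by about -2^23 / ln 2, truncate to an integer, add the bit pattern of 1.0, and reinterpret the result as a Single to get a rough exp(-x). That reading is inferred from the constants in the post, not confirmed by its author. The function names are hypothetical, and the sketch uses an exact division where the assembly uses the approximate `rcpps`.

```python
import math
import struct

SCALE = -12102203.0     # about -(2**23) / ln(2); the constant held in xmm6
ONE_BITS = 0x3F800000   # bit pattern of 1.0 as binary32; the constant in xmm7


def fast_exp_neg(x):
    """Rough exp(-x), built by writing the result's bits directly.

    Only meaningful for moderate |x|: the integer must stay a valid
    positive int32 for the reinterpretation to make sense.
    """
    bits = int(x * SCALE) + ONE_BITS   # int() truncates, like cvttps2dq
    return struct.unpack("<f", struct.pack("<i", bits))[0]


def fast_sigmoid(x):
    """Approximate 1 / (1 + exp(-x)) using fast_exp_neg."""
    return 1.0 / (1.0 + fast_exp_neg(x))


def exact_sigmoid(x):
    """Reference value computed with math.exp."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    approx = fast_sigmoid(x)
    exact = exact_sigmoid(x)
    print(f"x={x:5.1f}  fast={approx:.4f}  exact={exact:.4f}  "
          f"diff={abs(approx - exact):.4f}")
```

The approximation is coarse compared with a correctly-rounded expf, which is the trade-off being discussed: for an activation function in a small network a couple of percent may be acceptable, while for general-purpose math it is not.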

 
