Author Topic: AI assisted translation of CORE-MATH to Free Pascal  (Read 5027 times)

MathMan

  • Hero Member
  • *****
  • Posts: 501
AI assisted translation of CORE-MATH to Free Pascal
« on: April 12, 2026, 10:02:27 am »
I have spawned this topic from thread 'Can /my/ AI help me with pascal coding?' as I foresee substantial exchange on this topic, deserving a separate thread.

Can one of the admins please take the related posts from the original thread over to here?

Regards,
MathMan

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #1 on: April 12, 2026, 10:22:06 am »
@schuler,

Regarding your request to validate 'BenchmarkFPC': on my machine I get substantially different results compared to the ones posted under https://github.com/joaopauloschuler/pas-core-math/tree/main#benchmark-results.

Environment:
- FPC 3.2.2
- Zen 3 (AMD Ryzen 7 7730U) @ 4.5 GHz
- 16 GByte
- all checks and assertions disabled, debug info kept

Result O1:

Code:
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   15,3 Mops/s  PCM:   39,7 Mops/s  FASTER! YAY!
cosf              FPC:   15,1 Mops/s  PCM:   41,8 Mops/s  FASTER! YAY!
tanf              FPC:   29,4 Mops/s  PCM:   39,3 Mops/s  FASTER! YAY!
asinf             FPC:  133,7 Mops/s  PCM:  206,6 Mops/s  FASTER! YAY!
acosf             FPC:  110,1 Mops/s  PCM:  164,5 Mops/s  FASTER! YAY!
atanf             FPC:  200,8 Mops/s  PCM:  127,2 Mops/s
sinhf             FPC:   86,5 Mops/s  PCM:  182,5 Mops/s  FASTER! YAY!
coshf             FPC:  158,2 Mops/s  PCM:  194,6 Mops/s  FASTER! YAY!
tanhf             FPC:  193,1 Mops/s  PCM:  156,7 Mops/s
asinhf            FPC:   41,1 Mops/s  PCM:   83,5 Mops/s  FASTER! YAY!
acoshf            FPC:   87,0 Mops/s  PCM:  161,3 Mops/s  FASTER! YAY!
atanhf            FPC:   44,6 Mops/s  PCM:  197,6 Mops/s  FASTER! YAY!
expf              FPC:  196,1 Mops/s  PCM:  163,9 Mops/s
logf              FPC:   69,5 Mops/s  PCM:  142,0 Mops/s  FASTER! YAY!
log2f             FPC:   66,6 Mops/s  PCM:  144,9 Mops/s  FASTER! YAY!
log10f            FPC:   66,7 Mops/s  PCM:  143,7 Mops/s  FASTER! YAY!
atan2f            FPC:  167,8 Mops/s  PCM:   66,6 Mops/s
hypotf            FPC:  153,8 Mops/s  PCM:   75,5 Mops/s
powf              FPC:   39,8 Mops/s  PCM:   39,2 Mops/s  TIE
sincosf           FPC:   24,1 Mops/s  PCM:   37,5 Mops/s  FASTER! YAY!

PCM won: 14  |  FPC won: 5  |  Ties (<5%): 1
On average, PCM is 1,7x faster than FPC (arithmetic mean over 19 functions)
GlobalSink = 591609502 (prevents dead-code elimination)

Result O3:

Code:
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   15,5 Mops/s  PCM:   90,9 Mops/s  FASTER! YAY!
cosf              FPC:   15,3 Mops/s  PCM:   82,0 Mops/s  FASTER! YAY!
tanf              FPC:   32,9 Mops/s  PCM:   84,2 Mops/s  FASTER! YAY!
asinf             FPC:  151,1 Mops/s  PCM:  328,9 Mops/s  FASTER! YAY!
acosf             FPC:  126,9 Mops/s  PCM:  299,4 Mops/s  FASTER! YAY!
atanf             FPC:  227,3 Mops/s  PCM:  252,5 Mops/s  FASTER! YAY!
sinhf             FPC:   96,2 Mops/s  PCM:  214,6 Mops/s  FASTER! YAY!
coshf             FPC:  170,1 Mops/s  PCM:  247,5 Mops/s  FASTER! YAY!
tanhf             FPC:  211,9 Mops/s  PCM:  266,0 Mops/s  FASTER! YAY!
asinhf            FPC:   42,7 Mops/s  PCM:  173,6 Mops/s  FASTER! YAY!
acoshf            FPC:  108,5 Mops/s  PCM:  290,7 Mops/s  FASTER! YAY!
atanhf            FPC:   45,5 Mops/s  PCM:  285,7 Mops/s  FASTER! YAY!
expf              FPC:  227,3 Mops/s  PCM:  337,8 Mops/s  FASTER! YAY!
logf              FPC:   77,2 Mops/s  PCM:  285,7 Mops/s  FASTER! YAY!
log2f             FPC:   72,0 Mops/s  PCM:  312,5 Mops/s  FASTER! YAY!
log10f            FPC:   72,8 Mops/s  PCM:  320,5 Mops/s  FASTER! YAY!
atan2f            FPC:  176,1 Mops/s  PCM:  166,1 Mops/s
hypotf            FPC:  290,7 Mops/s  PCM:  167,2 Mops/s
powf              FPC:   42,7 Mops/s  PCM:  100,4 Mops/s  FASTER! YAY!
sincosf           FPC:   24,2 Mops/s  PCM:   73,1 Mops/s  FASTER! YAY!

PCM won: 18  |  FPC won: 2  |  Ties (<5%): 0
On average, PCM is 2,9x faster than FPC (arithmetic mean over 19 functions)
GlobalSink = 591609502 (prevents dead-code elimination)

Result O4 + -CfAVX2 -CpCOREI -OpCOREI -dAVX2

Code:
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   15,4 Mops/s  PCM:   91,4 Mops/s  FASTER! YAY!
cosf              FPC:   15,3 Mops/s  PCM:   85,8 Mops/s  FASTER! YAY!
tanf              FPC:   33,1 Mops/s  PCM:   85,0 Mops/s  FASTER! YAY!
asinf             FPC:  160,8 Mops/s  PCM:  331,1 Mops/s  FASTER! YAY!
acosf             FPC:  128,2 Mops/s  PCM:  303,0 Mops/s  FASTER! YAY!
atanf             FPC:  240,4 Mops/s  PCM:  260,4 Mops/s  FASTER! YAY!
sinhf             FPC:   96,3 Mops/s  PCM:  212,8 Mops/s  FASTER! YAY!
coshf             FPC:  175,4 Mops/s  PCM:  243,9 Mops/s  FASTER! YAY!
tanhf             FPC:  211,0 Mops/s  PCM:  270,3 Mops/s  FASTER! YAY!
asinhf            FPC:   42,7 Mops/s  PCM:  174,8 Mops/s  FASTER! YAY!
acoshf            FPC:  109,6 Mops/s  PCM:  295,9 Mops/s  FASTER! YAY!
atanhf            FPC:   45,4 Mops/s  PCM:  292,4 Mops/s  FASTER! YAY!
expf              FPC:  245,1 Mops/s  PCM:  318,5 Mops/s  FASTER! YAY!
logf              FPC:   77,2 Mops/s  PCM:  289,0 Mops/s  FASTER! YAY!
log2f             FPC:   72,7 Mops/s  PCM:  316,5 Mops/s  FASTER! YAY!
log10f            FPC:   72,3 Mops/s  PCM:  318,5 Mops/s  FASTER! YAY!
atan2f            FPC:  197,6 Mops/s  PCM:  167,2 Mops/s
hypotf            FPC:  287,4 Mops/s  PCM:  162,9 Mops/s
powf              FPC:   42,8 Mops/s  PCM:  100,2 Mops/s  FASTER! YAY!
sincosf           FPC:   24,4 Mops/s  PCM:   69,1 Mops/s  FASTER! YAY!

PCM won: 18  |  FPC won: 2  |  Ties (<5%): 0
On average, PCM is 2,9x faster than FPC (arithmetic mean over 19 functions)
GlobalSink = 591609502 (prevents dead-code elimination)

Paolo

  • Hero Member
  • *****
  • Posts: 710
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #2 on: April 12, 2026, 11:39:19 am »
@mathman, out of curiosity:

- Is the test done for different function arguments?
- Is a comparison of the numerical results done (I mean, are the results the same in both cases)?

Ciao.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12842
  • FPC developer.
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #3 on: April 12, 2026, 01:11:04 pm »
Maybe benchmark with -Cfcoreavx2 or so too? (Or 64-bit.)

Basically, FPC has two FPU implementations for x86: x87 and SSE2+. Win64 defaults to the SSE2 option.

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #4 on: April 12, 2026, 01:44:56 pm »
Quote from: Paolo on April 12, 2026, 11:39:19 am
@mathman, out of curiosity:

- Is the test done for different function arguments?
- Is a comparison of the numerical results done (I mean, are the results the same in both cases)?

Ciao.

@Paolo

- the function arguments are spread over the full range of Single
- the results are definitely not the same for various functions and arguments, as the current FPC RTL is not correctly rounded for all functions and arguments

However, the majority of the FPC CORE-MATH functions (with 2 or 3 exceptions) have been verified against the C implementation of CORE-MATH.

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #5 on: April 12, 2026, 01:53:44 pm »
Quote from: marcov on April 12, 2026, 01:11:04 pm
Maybe benchmark with -Cfcoreavx2 or so too? (Or 64-bit.)

Basically, FPC has two FPU implementations for x86: x87 and SSE2+. Win64 defaults to the SSE2 option.

Here we go

FPC 3.2.2 -O4 -CfAVX2 -CpCoreAVX2 -OpCoreAVX2 -dAVX2

Code:
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   15,7 Mops/s  PCM:   73,2 Mops/s  FASTER! YAY!
cosf              FPC:   15,4 Mops/s  PCM:   68,0 Mops/s  FASTER! YAY!
tanf              FPC:   33,2 Mops/s  PCM:   67,3 Mops/s  FASTER! YAY!
asinf             FPC:  155,8 Mops/s  PCM:  340,1 Mops/s  FASTER! YAY!
acosf             FPC:  126,3 Mops/s  PCM:  335,6 Mops/s  FASTER! YAY!
atanf             FPC:  243,9 Mops/s  PCM:  279,3 Mops/s  FASTER! YAY!
sinhf             FPC:   96,7 Mops/s  PCM:  214,6 Mops/s  FASTER! YAY!
coshf             FPC:  183,8 Mops/s  PCM:  252,5 Mops/s  FASTER! YAY!
tanhf             FPC:  224,2 Mops/s  PCM:  276,2 Mops/s  FASTER! YAY!
asinhf            FPC:   42,8 Mops/s  PCM:  186,6 Mops/s  FASTER! YAY!
acoshf            FPC:  108,9 Mops/s  PCM:  188,0 Mops/s  FASTER! YAY!
atanhf            FPC:   45,9 Mops/s  PCM:  297,6 Mops/s  FASTER! YAY!
expf              FPC:  239,2 Mops/s  PCM:  340,1 Mops/s  FASTER! YAY!
logf              FPC:   78,7 Mops/s  PCM:  280,9 Mops/s  FASTER! YAY!
log2f             FPC:   73,4 Mops/s  PCM:  333,3 Mops/s  FASTER! YAY!
log10f            FPC:   74,2 Mops/s  PCM:  303,0 Mops/s  FASTER! YAY!
atan2f            FPC:  183,2 Mops/s  PCM:  163,4 Mops/s
hypotf            FPC:  287,4 Mops/s  PCM:  175,4 Mops/s
powf              FPC:   43,8 Mops/s  PCM:  105,9 Mops/s  FASTER! YAY!
sincosf           FPC:   24,0 Mops/s  PCM:   58,6 Mops/s  FASTER! YAY!

PCM won: 18  |  FPC won: 2  |  Ties (<5%): 0
On average, PCM is 2,7x faster than FPC (arithmetic mean over 19 functions)
GlobalSink = 591609502 (prevents dead-code elimination)

and

FPC 3.3.1 -O4 -CfFMA -CpZEN3 -OpZEN3 -dAVX2

Code:
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   19,1 Mops/s  PCM:   90,1 Mops/s  FASTER! YAY!
cosf              FPC:   19,3 Mops/s  PCM:   85,5 Mops/s  FASTER! YAY!
tanf              FPC:   33,2 Mops/s  PCM:   82,4 Mops/s  FASTER! YAY!
asinf             FPC:  191,6 Mops/s  PCM:  335,6 Mops/s  FASTER! YAY!
acosf             FPC:  157,7 Mops/s  PCM:  312,5 Mops/s  FASTER! YAY!
atanf             FPC:  252,5 Mops/s  PCM:  267,4 Mops/s  FASTER! YAY!
sinhf             FPC:  185,2 Mops/s  PCM:  216,5 Mops/s  FASTER! YAY!
coshf             FPC:  177,9 Mops/s  PCM:  251,3 Mops/s  FASTER! YAY!
tanhf             FPC:  145,8 Mops/s  PCM:  282,5 Mops/s  FASTER! YAY!
asinhf            FPC:   42,8 Mops/s  PCM:  177,3 Mops/s  FASTER! YAY!
acoshf            FPC:  113,1 Mops/s  PCM:  264,6 Mops/s  FASTER! YAY!
atanhf            FPC:   46,2 Mops/s  PCM:  289,0 Mops/s  FASTER! YAY!
expf              FPC:  227,3 Mops/s  PCM:  352,1 Mops/s  FASTER! YAY!
logf              FPC:   77,6 Mops/s  PCM:  285,7 Mops/s  FASTER! YAY!
log2f             FPC:   73,0 Mops/s  PCM:  316,5 Mops/s  FASTER! YAY!
log10f            FPC:   72,6 Mops/s  PCM:  316,5 Mops/s  FASTER! YAY!
atan2f            FPC:  215,5 Mops/s  PCM:  121,4 Mops/s
hypotf            FPC:  318,5 Mops/s  PCM:  176,7 Mops/s
powf              FPC:   45,1 Mops/s  PCM:   79,9 Mops/s  FASTER! YAY!
sincosf           FPC:   24,2 Mops/s  PCM:   81,7 Mops/s  FASTER! YAY!

PCM won: 18  |  FPC won: 2  |  Ties (<5%): 0
On average, PCM is 2,7x faster than FPC (arithmetic mean over 19 functions)
GlobalSink = 2161696855 (prevents dead-code elimination)

Notice that the GlobalSink is different from the previous results. As this is simply an XOR over all results, some computations must have yielded different results.
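
To illustrate why a changed sink implies changed results, here is a minimal sketch of an XOR fold over the bit patterns of Single values. The names 'BitsOf' and 'FoldXor' are made up for this example and are not taken from BenchmarkFPC; the real benchmark may accumulate its sink differently.

```pascal
program SinkSketch;
{$mode objfpc}

type
  TSingleArray = array of Single;

// Reinterpret the 4 bytes of a Single as an unsigned 32-bit integer.
function BitsOf(x: Single): UInt32;
begin
  Result := PDWord(@x)^;
end;

// Hypothetical sink: XOR of the bit patterns of all values.
function FoldXor(const Values: TSingleArray): UInt32;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Values) do
    Result := Result xor BitsOf(Values[i]);
end;

var
  A, B: TSingleArray;
  i: Integer;
begin
  SetLength(A, 8);
  for i := 0 to High(A) do
    A[i] := 1.0 + i * 0.125;   // exactly representable in Single

  B := Copy(A);
  // Simulate a one-ulp difference in a single result.
  PDWord(@B[3])^ := PDWord(@B[3])^ xor 1;

  WriteLn('sink A = ', FoldXor(A));
  WriteLn('sink B = ', FoldXor(B));
  WriteLn('differ = ', FoldXor(A) <> FoldXor(B));

  if FoldXor(A) = FoldXor(B) then
    Halt(1);   // a single flipped bit must change the XOR fold
end.
```

Note the asymmetry: different sinks prove that at least one result differs, but equal sinks do not prove that all results are equal, because differences can cancel out under XOR.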

schuler

  • Sr. Member
  • ****
  • Posts: 336
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #6 on: April 12, 2026, 02:27:12 pm »
:) very good morning :)

@mathman, love your testing!

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #7 on: April 12, 2026, 02:59:27 pm »
Good morning to you too, @schuler  :)

Unfortunately I cannot test as much as I'd like. I'm a hobbyist running Win64 and FPC only - no Linux & C compiler here.

Nevertheless, I am currently going through the sources in depth. I have already detected several things that I would handle differently, but a full report will take some more time.

What's really bothering me is the double-rounding issue (e.g. in 'compoundf'). I spotted it when I first coarsely scanned the sources and thought "that is asking for trouble" - only to find, when I later read the 'tasklist.md', that there actually is trouble ...

schuler

  • Sr. Member
  • ****
  • Posts: 336
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #8 on: April 12, 2026, 03:23:01 pm »
@mathman and everyone: to compile and test on Linux for free on GitHub, you can press the green “code” button at https://github.com/joaopauloschuler/pas-core-math/ and then create a free codespace. The repo is set up to install all dependencies. You will have to wait a bit while the codespace is built, and then you can compile with build.sh and run the tests.

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #9 on: April 12, 2026, 11:48:54 pm »
@schuler,

Here is my first feedback. There is probably more to come, but I wanted to provide a first response today. All of the following is in light of use under FPC 3.2.2.

1. FPC 3.2.2 cannot inline across units
1.1. the design of 'pascoremathtypes' and 'pascoremathhelperfuncs' should be modified accordingly
1.2. 'pascoremathtypes' should only contain type definitions - the operator overload of + and the MulWide functions should be moved
1.3. pack all functions from 'pascoremathtypes' and 'pascoremathhelperfuncs' that are marked as inline into a separate include file that is then included in the implementation section of 'pascoremath'
1.4. 'pascoremathhelperfuncs' should then contain all functions that are not marked as inline - including 'pcr_poly12'
1.5. due to this there is no use in marking the 'pcr...' functions of 'pascoremath' as inline

2. FPC cannot inline functions containing asm - try to resolve these directly into the 'pcr...' functions.

3. FPC 3.2.2 already provides intrinsics for forward/reverse bit scan and arithmetic shift right - 'BsfDWord', 'BsrDWord', 'SarLongint' & 'SarInt64'. Use these to replace the locally defined equivalents 'pcr_bsf32' etc. The advantage is that the intrinsics are inlined automatically, whereas the locally defined variants use asm in some cases, which cannot be inlined and therefore generates subroutine calls.

4. I think 'MulWide' should be renamed to 'Mulu64u64'. Besides that, the pure Pascal variant can be inlined and is therefore faster than the asm variant, which cannot be inlined. Finally, the current pure Pascal implementation is not good - it uses way too many variables. A better implementation would be

Code: Pascal
function Mulu64u64(a, b: UInt64): TUInt128; inline;
var
  MulHi: UInt64;   // intermediates for double limb product calculation
  MulLo: UInt64;
  Temp1: UInt64;
  Temp2: UInt64;
begin
  MulLo := Lo( a ) * Lo( b );
  Temp1 := ( a shr 32 ) * Lo( b );
  Temp2 := Lo( a ) * ( b shr 32 ) + Lo( Temp1 ) + ( MulLo shr 32 );
  MulLo := Lo( MulLo ) + ( Temp2 shl 32 );
  MulHi := ( a shr 32 ) * ( b shr 32 ) + ( Temp1 shr 32 ) + ( Temp2 shr 32 );

  // these must be the last actions, as Low=Val1 or Low=Val2 is possible
  Result.lo := MulLo;
  Result.hi := MulHi;
end;

5. I think the type system is lacking - strictly use the FPC types UInt32, UInt64, Int32 and Int64. Especially avoid casts to Integer.

6. I think the benchmark should be modified. The current approach goes through the full range of the input parameter(s) with a fixed stride. However, in real-life scenarios, the majority of calls (like 90%) have the parameter(s) in a sweet-spot range - e.g. -10.0..10.0 for the trigonometrics, -1.0..1.0 for the hyperbolics etc. And within that sweet spot the values are mostly uniformly distributed. Can the benchmark be modified so that the values are taken from an array that is prefilled with random values according to the above description?

7. still open
7.1. the double-rounding issue and subsequent failures of 'compoundf' etc.
7.2. investigate why 'pcr_atan2f' and 'pcr_hypotf' are slower than the current FPC RTL equivalents. I suspect that the RTL implementation only follows a fast path and has lots of errors in the outer regions.
7.3. can you have BPSA implement a check of the FPC RTL functions against the CORE-MATH C library, similar to the one that exists for the Pascal CORE-MATH?

8. Constants
8.1. All tables have been converted from C hexadecimal floating-point notation to decimal floating-point notation for Pascal. But decimal fp notation is a can of worms - especially when getting close to +/-Inf or into the subnormal range. As the compound types 'tb32u32' and 'tb64u64' are already available, I suggest converting the hex floats to UInt32 / UInt64 and modifying the tables accordingly.
8.2. The same holds for constants used in the function implementations - I would also try to use UInt32 / UInt64 throughout. Funnily enough, some of the 'pcr...' functions already do this - e.g. 'pcr_rsqrtf' or 'pcr_tanhf' - while others don't.
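
To make 8.1 a bit more concrete, here is a minimal sketch of a constant stored as a raw bit pattern. 'TBits32' is a hypothetical overlay type invented for this example; I have not checked it against the actual declarations of 'tb32u32' / 'tb64u64', which may look different.

```pascal
program BitPatternConstants;
{$mode objfpc}

type
  // Hypothetical overlay type, for illustration only.
  TBits32 = packed record
    case Boolean of
      False: (u: UInt32);   // raw IEEE-754 binary32 encoding
      True:  (f: Single);   // the same 4 bytes viewed as a Single
  end;

const
  // C hex float 0x1.8p+1 = 1.5 * 2 = 3.0
  cThree:        TBits32 = (u: $40400000);
  // C hex float 0x1p-149: smallest positive subnormal Single
  cMinSubnormal: TBits32 = (u: $00000001);
  // C hex float 0x1.fffffep+127: largest finite Single
  cMaxFinite:    TBits32 = (u: $7F7FFFFF);

begin
  WriteLn('cThree        = ', cThree.f);
  WriteLn('cMinSubnormal = ', cMinSubnormal.f);
  WriteLn('cMaxFinite    = ', cMaxFinite.f);

  if cThree.f <> 3.0 then
    Halt(1);   // would indicate a wrong bit pattern
end.
```

The point of this form is that no decimal-to-binary conversion happens at compile time, so the table entry is exactly the bit pattern of the C hex float, also at the edges of the range.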

Cheers,
MathMan
« Last Edit: April 13, 2026, 12:45:04 am by MathMan »

nanobit

  • Full Member
  • ***
  • Posts: 189
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #10 on: April 13, 2026, 10:35:35 am »

Quote from: MathMan on April 12, 2026, 11:48:54 pm
A better implementation would be
Code: Pascal
function Mulu64u64(a, b: UInt64): TUInt128; inline;
var
  MulHi: UInt64;   // intermediates for double limb product calculation
  MulLo: UInt64;
  Temp1: UInt64;
  Temp2: UInt64;
begin
  MulLo := Lo( a ) * Lo( b );
  Temp1 := ( a shr 32 ) * Lo( b );
  Temp2 := Lo( a ) * ( b shr 32 ) + Lo( Temp1 ) + ( MulLo shr 32 );
  MulLo := Lo( MulLo ) + ( Temp2 shl 32 );
  MulHi := ( a shr 32 ) * ( b shr 32 ) + ( Temp1 shr 32 ) + ( Temp2 shr 32 );

  // these must be the last actions, as Low=Val1 or Low=Val2 is possible
  Result.lo := MulLo;
  Result.hi := MulHi;
end;

Lo( a ) * Lo( b ) will give uint32 under cpu32.
I modified your routine somewhat:

Code: Pascal
function Mulu64u64(a, b: UInt64): TUInt128; inline;
var
  MulLo, Temp1, Temp2: UInt64;
begin
  // force 64-bit products by widening the low halves explicitly
  MulLo := uint64(uint32(a)) * uint64(uint32(b));
  Temp1 := (a shr 32) * uint64(uint32(b)) + (MulLo shr 32);
  Temp2 := uint64(uint32(a)) * (b shr 32) + uint64(uint32(Temp1));
  Result.lo := (Temp2 shl 32) or (MulLo and $FFFFFFFF);
  Result.hi := (a shr 32) * (b shr 32) + (Temp1 shr 32) + (Temp2 shr 32);
end;
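
A routine like this can be sanity-checked against a few products whose 128-bit value is easy to derive by hand. The sketch below is only an illustration: the 'TUInt128' layout (two UInt64 limbs named lo and hi) is an assumption based on the Result.lo / Result.hi usage above, and the 'Check' helper is made up for this example.

```pascal
program MulWideCheck;
{$mode objfpc}

type
  // Assumed layout: two 64-bit limbs, as suggested by Result.lo / Result.hi.
  TUInt128 = record
    lo, hi: UInt64;
  end;

function Mulu64u64(a, b: UInt64): TUInt128;
var
  MulLo, Temp1, Temp2: UInt64;
begin
  MulLo := UInt64(UInt32(a)) * UInt64(UInt32(b));
  Temp1 := (a shr 32) * UInt64(UInt32(b)) + (MulLo shr 32);
  Temp2 := UInt64(UInt32(a)) * (b shr 32) + UInt64(UInt32(Temp1));
  Result.lo := (Temp2 shl 32) or (MulLo and $FFFFFFFF);
  Result.hi := (a shr 32) * (b shr 32) + (Temp1 shr 32) + (Temp2 shr 32);
end;

// Hypothetical helper: abort with exit code 1 on the first wrong product.
procedure Check(a, b, ExpHi, ExpLo: UInt64);
var
  r: TUInt128;
begin
  r := Mulu64u64(a, b);
  if (r.hi <> ExpHi) or (r.lo <> ExpLo) then
  begin
    WriteLn('MISMATCH for ', a, ' * ', b);
    Halt(1);
  end;
end;

begin
  // 0 * max = 0
  Check(0, High(UInt64), 0, 0);
  // (2^32 - 1)^2 = 2^64 - 2^33 + 1, still fits into the low limb
  Check(UInt64($FFFFFFFF), UInt64($FFFFFFFF), 0, UInt64($FFFFFFFE00000001));
  // 2^32 * 2^32 = 2^64
  Check(UInt64($100000000), UInt64($100000000), 1, 0);
  // (2^32 + 1)^2 = 2^64 + 2^33 + 1
  Check(UInt64($100000001), UInt64($100000001), 1, UInt64($200000001));
  // (2^64 - 1) * 2 = 2^65 - 2
  Check(High(UInt64), 2, 1, UInt64($FFFFFFFFFFFFFFFE));
  // (2^64 - 1)^2 = 2^128 - 2^65 + 1
  Check(High(UInt64), High(UInt64), UInt64($FFFFFFFFFFFFFFFE), 1);
  WriteLn('all checks passed');
end.
```

The cases are chosen so that each partial product and each carry path between the limbs is exercised at least once; a randomized comparison against the asm variant would be the natural next step.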
« Last Edit: April 15, 2026, 06:04:39 am by nanobit »

schuler

  • Sr. Member
  • ****
  • Posts: 336
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #11 on: April 13, 2026, 02:21:29 pm »
STATUS UPDATE
  • Of all the original 32-bit float functions, 6 do not produce bit-exact results compared with the original C implementation. In all 6 functions that fail the binary test, the numerical error is smaller than 10^(-18).
  • On 2 Linux machines, the new functions are 20x faster than the existing FPC functions according to the benchmarking code BenchmarkFPC32. I would like to ask more members of this forum to replicate this experiment, as it varies from hardware to hardware. On Windows, the speed gain is in the range from 2x to 3x. Given that my own Neural API is mostly run on Linux, I’m already making plans to use pas-core-math with CAI Neural API.

Code:
bpsa@Ubuntu24:~/app/pas-core-math/src/tests$ lscpu | grep "Model name"
Model name:                              13th Gen Intel(R) Core(TM) i7-1355U

bpsa@Ubuntu24:~/app/pas-core-math/src/tests$ uname -a
Linux Ubuntu24 6.17.0-20-generic #20~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 01:28:37 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

bpsa@Ubuntu24:~/app/pas-core-math/src/tests$ ./bin/BenchmarkFPC32
=== FPC vs Pascal CORE-MATH (PCM) Benchmark: 50000000 calls per function ===

sinf              FPC:   45.3 Mops/s  PCM:  135.1 Mops/s  FASTER! YAY!
cosf              FPC:   44.0 Mops/s  PCM:  130.2 Mops/s  FASTER! YAY!
tanf              FPC:   17.1 Mops/s  PCM:  132.6 Mops/s  FASTER! YAY!
asinf             FPC:    8.4 Mops/s  PCM:  431.0 Mops/s  FASTER! YAY!
acosf             FPC:    8.1 Mops/s  PCM:  201.6 Mops/s  FASTER! YAY!
atanf             FPC:   27.8 Mops/s  PCM:  347.2 Mops/s  FASTER! YAY!
sinhf             FPC:    5.3 Mops/s  PCM:  347.2 Mops/s  FASTER! YAY!
coshf             FPC:    5.4 Mops/s  PCM:  409.8 Mops/s  FASTER! YAY!
tanhf             FPC:   31.6 Mops/s  PCM:  294.1 Mops/s  FASTER! YAY!
asinhf            FPC:   27.9 Mops/s  PCM:  238.1 Mops/s  FASTER! YAY!
acoshf            FPC:    6.3 Mops/s  PCM:  187.3 Mops/s  FASTER! YAY!
atanhf            FPC:    7.0 Mops/s  PCM:  431.0 Mops/s  FASTER! YAY!
expf              FPC:   16.7 Mops/s  PCM:  179.2 Mops/s  FASTER! YAY!
logf              FPC:   46.1 Mops/s  PCM:  233.6 Mops/s  FASTER! YAY!
log2f             FPC:   43.3 Mops/s  PCM:  271.7 Mops/s  FASTER! YAY!
log10f            FPC:   14.4 Mops/s  PCM:  400.0 Mops/s  FASTER! YAY!
atan2f            FPC:   17.6 Mops/s  PCM:   92.1 Mops/s  FASTER! YAY!
hypotf            FPC:   67.0 Mops/s  PCM:  203.3 Mops/s  FASTER! YAY!
powf              FPC:    1.8 Mops/s  PCM:  106.2 Mops/s  FASTER! YAY!
sincosf           FPC:   18.2 Mops/s  PCM:  108.7 Mops/s  FASTER! YAY!

PCM won: 20  |  FPC won: 0  |  Ties (<5%): 0
On average, PCM is 24.6x faster than FPC (arithmetic mean over 19 functions)


bpsa@Ubuntu24:~/app/pas-core-math/src/tests$ LD_LIBRARY_PATH=/home/bpsa/app/pas-core-math/src/tests/.. /home/bpsa/app/pas-core-math/src/tests/../../bin/Benchmark32
=== Benchmark: 50000000 calls per function ===
acosf             C:  349.7 Mops/s  Pascal:  174.8 Mops/s  sink=MATCH
acoshf            C:  127.9 Mops/s  Pascal:  176.1 Mops/s  sink=MATCH  FASTER! YAY!
acospif           C:  227.3 Mops/s  Pascal:  220.3 Mops/s  sink=MATCH  TIE
asinf             C:  365.0 Mops/s  Pascal:  416.7 Mops/s  sink=MATCH  FASTER! YAY!
asinhf            C:  243.9 Mops/s  Pascal:  222.2 Mops/s  sink=MATCH
asinpif           C:  216.5 Mops/s  Pascal:  197.6 Mops/s  sink=MATCH
atanf             C:  301.2 Mops/s  Pascal:  337.8 Mops/s  sink=MATCH  FASTER! YAY!
atanhf            C:  378.8 Mops/s  Pascal:  393.7 Mops/s  sink=MATCH  TIE
atanpif           C:  373.1 Mops/s  Pascal:  413.2 Mops/s  sink=MATCH  FASTER! YAY!
cbrtf             C:  229.4 Mops/s  Pascal:  181.8 Mops/s  sink=MATCH
cosf              C:  218.3 Mops/s  Pascal:  119.3 Mops/s  sink=MISMATCH
coshf             C:  312.5 Mops/s  Pascal:  400.0 Mops/s  sink=MATCH  FASTER! YAY!
cospif            C:  373.1 Mops/s  Pascal:  378.8 Mops/s  sink=MISMATCH  TIE
erff              C:  326.8 Mops/s  Pascal:  183.2 Mops/s  sink=MATCH
erfcf             C:  340.1 Mops/s  Pascal:  354.6 Mops/s  sink=MATCH  TIE
expf              C:  320.5 Mops/s  Pascal:  167.8 Mops/s  sink=MATCH
exp10f            C:  304.9 Mops/s  Pascal:  308.6 Mops/s  sink=MATCH  TIE
exp10m1f          C:  274.7 Mops/s  Pascal:  248.8 Mops/s  sink=MATCH
exp2f             C:  333.3 Mops/s  Pascal:  335.6 Mops/s  sink=MATCH  TIE
exp2m1f           C:  297.6 Mops/s  Pascal:  340.1 Mops/s  sink=MATCH  FASTER! YAY!
expm1f            C:  299.4 Mops/s  Pascal:  340.1 Mops/s  sink=MATCH  FASTER! YAY!
lgammaf           C:  138.9 Mops/s  Pascal:   35.9 Mops/s  sink=MATCH
logf              C:  367.6 Mops/s  Pascal:  314.5 Mops/s  sink=MATCH
log10f            C:  263.2 Mops/s  Pascal:  362.3 Mops/s  sink=MATCH  FASTER! YAY!
log10p1f          C:  219.3 Mops/s  Pascal:  158.2 Mops/s  sink=MATCH
log1pf            C:  333.3 Mops/s  Pascal:  316.5 Mops/s  sink=MATCH
log2f             C:  335.6 Mops/s  Pascal:  337.8 Mops/s  sink=MATCH  TIE
log2p1f           C:  352.1 Mops/s  Pascal:  183.2 Mops/s  sink=MATCH
rsqrtf            C:  171.2 Mops/s  Pascal:  463.0 Mops/s  sink=MATCH  FASTER! YAY!
sinf              C:  214.6 Mops/s  Pascal:  126.6 Mops/s  sink=MISMATCH
sinhf             C:  312.5 Mops/s  Pascal:  333.3 Mops/s  sink=MATCH  FASTER! YAY!
sinpif            C:  378.8 Mops/s  Pascal:  375.9 Mops/s  sink=MISMATCH  TIE
tanf              C:  192.3 Mops/s  Pascal:  121.7 Mops/s  sink=MISMATCH
tanhf             C:  312.5 Mops/s  Pascal:  284.1 Mops/s  sink=MATCH
tanpif            C:  204.1 Mops/s  Pascal:  219.3 Mops/s  sink=MISMATCH  FASTER! YAY!
tgammaf           C:  314.5 Mops/s  Pascal:  123.5 Mops/s  sink=MATCH
atan2f            C:  260.4 Mops/s  Pascal:   85.5 Mops/s  sink=MATCH
atan2pif          C:    6.6 Mops/s  Pascal:    3.2 Mops/s  sink=MATCH
hypotf            C:  156.3 Mops/s  Pascal:  202.4 Mops/s  sink=MATCH  FASTER! YAY!
powf              C:  151.1 Mops/s  Pascal:   95.4 Mops/s  sink=MISMATCH
compoundf         C:   99.4 Mops/s  Pascal:  110.9 Mops/s  sink=MISMATCH  FASTER! YAY!
sincosf           C:  199.2 Mops/s  Pascal:   95.8 Mops/s  sink=MATCH

Pascal won: 13  |  C won: 21  |  Ties (<5%): 8


bpsa@Ubuntu24:~/app/pas-core-math/src/tests$ LD_LIBRARY_PATH=/home/bpsa/app/pas-core-math/src/tests/.. /home/bpsa/app/pas-core-math/src/tests/../../bin/TestHarness32
=== TestHarness: comparing Pascal (pcr_*) vs C (cr_*) ===
acosf             tested=4294967296  mismatches=2  FAIL max_error=0.000000000000000000
acoshf            tested=4294967296  mismatches=0  PASS
acospif           tested=4294967296  mismatches=0  PASS
asinf             tested=4294967296  mismatches=0  PASS
asinhf            tested=4294967296  mismatches=0  PASS
asinpif           tested=4294967296  mismatches=0  PASS
atanf             tested=4294967296  mismatches=0  PASS
atanhf            tested=4294967296  mismatches=0  PASS
atanpif           tested=4294967296  mismatches=0  PASS
cbrtf             tested=4294967296  mismatches=0  PASS
cosf              tested=4294967296  mismatches=0  PASS
coshf             tested=4294967296  mismatches=0  PASS
cospif            tested=4294967296  mismatches=0  PASS
erff              tested=4294967296  mismatches=0  PASS
erfcf             tested=4294967296  mismatches=0  PASS
expf              tested=4294967296  mismatches=1  FAIL max_error=0.000000000000000000
exp10f            tested=4294967296  mismatches=0  PASS
exp10m1f          tested=4294967296  mismatches=0  PASS
exp2f             tested=4294967296  mismatches=0  PASS
exp2m1f           tested=4294967296  mismatches=0  PASS
expm1f            tested=4294967296  mismatches=0  PASS
lgammaf           tested=4294967296  mismatches=0  PASS
logf              tested=4294967296  mismatches=0  PASS
log10f            tested=4294967296  mismatches=0  PASS
log10p1f          tested=4294967296  mismatches=0  PASS
log1pf            tested=4294967296  mismatches=0  PASS
log2f             tested=4294967296  mismatches=0  PASS
log2p1f           tested=4294967296  mismatches=1  FAIL max_error=0.000000000000000000
rsqrtf            tested=4294967296  mismatches=0  PASS
sinf              tested=4294967296  mismatches=0  PASS
sinhf             tested=4294967296  mismatches=0  PASS
sinpif            tested=4294967296  mismatches=0  PASS
tanf              tested=4294967296  mismatches=0  PASS
tanhf             tested=4294967296  mismatches=0  PASS
tanpif            tested=4294967296  mismatches=0  PASS
tgammaf           tested=4294967296  mismatches=1  FAIL max_error=0.000000000000000000
atan2f            tested=  10000000  mismatches=0  PASS
atan2pif          tested=  10000000  mismatches=0  PASS
hypotf            tested=  10000000  mismatches=0  PASS
powf              tested=  10000000  mismatches=458657  FAIL max_error=0.000000000000000000
compoundf         tested=  10000000  mismatches=1903717  FAIL max_error=0.000000000000000000
sincosf           tested=4294967296  mismatches=0  PASS

=== TOTAL: 36 PASS, 6 FAIL ===
« Last Edit: April 13, 2026, 04:24:53 pm by schuler »

schuler

  • Sr. Member
  • ****
  • Posts: 336
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #12 on: April 13, 2026, 02:25:26 pm »
@mathman, many thanks for spending time on a detailed code review. I’ll follow up.

note:
Quote
In old versions of Free Pascal, inline code was not exported from a unit. This meant that when calling an inline procedure from another unit, a normal procedure call will be performed. Only inside units, Inline procedures are really inlined. As of version 2.0.2, inline works across units.
https://www.freepascal.org/docs-html/ref/refsu77.html
« Last Edit: April 13, 2026, 09:27:16 pm by schuler »

MathMan

  • Hero Member
  • *****
  • Posts: 501
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #13 on: April 13, 2026, 10:02:59 pm »
Quote from: schuler on April 13, 2026, 02:25:26 pm
@mathman, many thanks for spending time on a detailed code review. I’ll follow up.

note:
Quote
In old versions of Free Pascal, inline code was not exported from a unit. This meant that when calling an inline procedure from another unit, a normal procedure call will be performed. Only inside units, Inline procedures are really inlined. As of version 2.0.2, inline works across units.
https://www.freepascal.org/docs-html/ref/refsu77.html

@schuler, I stand corrected - somehow this crucial info passed me by.

schuler

  • Sr. Member
  • ****
  • Posts: 336
Re: AI assisted translation of CORE-MATH to Free Pascal
« Reply #14 on: April 13, 2026, 10:41:29 pm »
@mathman, from your list:
3) Done.
4) Renaming: done.
5) Done for Int32.
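
For anyone following along who has not used the intrinsics from item 3 of the review, here is a small sketch of what they return. The input values are picked only for illustration, and the program is not taken from the repository.

```pascal
program IntrinsicsSketch;
{$mode objfpc}

var
  Mask: DWord;
  NegSmall: Longint;
  NegOne: Int64;
begin
  Mask := $50;       // binary 0101 0000: lowest set bit is 4, highest is 6
  NegSmall := -8;
  NegOne := -1;

  WriteLn('BsfDWord($50)     = ', BsfDWord(Mask));
  WriteLn('BsrDWord($50)     = ', BsrDWord(Mask));
  // Arithmetic shift right replicates the sign bit into the vacated positions.
  WriteLn('SarLongint(-8, 1) = ', SarLongint(NegSmall, 1));
  WriteLn('SarInt64(-1, 63)  = ', SarInt64(NegOne, 63));

  if (BsfDWord(Mask) <> 4) or (BsrDWord(Mask) <> 6) or
     (SarLongint(NegSmall, 1) <> -4) or (SarInt64(NegOne, 63) <> -1) then
    Halt(1);
end.
```

One caveat when replacing hand-written helpers: the result for a zero argument of the bit-scan intrinsics should be checked against what the old 'pcr_bsf32' style helpers returned, since callers may depend on that edge case.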

 
