
Author Topic: Optimization: Performance fpc -ic for x86_64-win64?  (Read 462 times)

HenrikErlandsson

  • New Member
  • *
  • Posts: 19
  • ^ Happy coder :)
Optimization: Performance fpc -ic for x86_64-win64?
« on: September 07, 2019, 01:40:47 am »
This is a niche question for Win7x64 and Win10x64 builds only. You could say it's only for my own personal pleasure :) but I do want to support i7 4790k up to Ryzen 3rd gen, hopefully including Ryzen 9 3950X without fail.

What I get from fpc -ic on the base CPU i7 4790k is:

Code: Pascal
  ATHLON64
  COREI
  COREAVX
  COREAVX2

Am I correct in thinking -CpATHLON64 will do what I want?

I will also listen to any fpc -io advice towards my goal. ;D
Turbo Pascal was the tool of my trade as a young professional.

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #1 on: September 07, 2019, 07:51:59 am »
ATHLON64 is the baseline for 64-bit. I believe both processor families you mention also support coreavx or coreavx2. Check that with e.g. cpuid. If they do, use coreavx2.
You can combine the latter with the vectorcall calling convention on Windows to achieve better performance for properly aligned single and double arrays (FPC 3.2 and higher).
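A minimal sketch of the suggested check, done at run time from Pascal rather than with an external cpuid tool. It assumes the FPC RTL unit cpu exports AVXSupport and AVX2Support on x86_64 (an assumption worth verifying against your FPC version); the program name and messages are made up for illustration.

```pascal
program checkavx2;
{$mode objfpc}

uses
  cpu;  // FPC RTL unit; assumed to provide AVXSupport and AVX2Support on x86_64

begin
  // Pick the most capable instruction set the current machine reports.
  if AVX2Support then
    WriteLn('AVX2 reported: -CpCOREAVX2 looks usable on this machine')
  else if AVXSupport then
    WriteLn('AVX only: -CpCOREAVX is the safer choice')
  else
    WriteLn('No AVX reported: stay with the ATHLON64 baseline');
end.
```

Build this checker itself with the ATHLON64 baseline, otherwise it could fault on an older CPU before it gets to print anything.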
« Last Edit: September 07, 2019, 09:47:52 am by Thaddy »
Most people that want to use threading should learn to patch their jeans first: use a needle.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7362
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #2 on: September 07, 2019, 01:38:55 pm »
Haswell and later mostly have AVX2 (at least i3, i5 and i7; according to Wikipedia, some SKUs below i3 don't). Ryzen also has AVX2.

Not all CPUs are equal, though: e.g. the Ryzen 1000 and 2000 series execute each 256-bit AVX2 instruction as two 128-bit halves, while the Ryzen 3000 series has native 256-bit AVX2 (and thus can do more AVX2 instructions per tick).

HenrikErlandsson

  • New Member
  • *
  • Posts: 19
  • ^ Happy coder :)
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #3 on: September 07, 2019, 03:44:43 pm »
Thx for clear answers, AVX2 it is. Cheers. 8)

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #4 on: September 07, 2019, 04:38:27 pm »
Don't forget my tip regarding vectorcall. It may be interesting to you: it shows quite some speed improvement in my tests, especially when using AVX2.
From the preliminary fpc 3.2.0 documentation: https://wiki.freepascal.org/FPC_New_Features_3.2#Support_for_Microsoft.27s_vectorcall_calling_convention

HenrikErlandsson

  • New Member
  • *
  • Posts: 19
  • ^ Happy coder :)
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #5 on: September 08, 2019, 02:29:43 am »
All right, cheers Thaddy. How do I specify the vectorcall modifier? I can't find anything like a -Gv switch; surely I don't type it as a keyword or something in each call?? (Asking because I just ran the standalone physics example in the Castle Game Engine and didn't see anything like that in the code at first glance.)

BTW, this performance thing is for OpenGL, and it will be GPU heavy not CPU heavy I think. Just looking to avoid unnecessary CPU performance hits if I can. This is for playing around with GLUT (not LCL) + GLSL (at least I *think* that's what I want, do suggest, I'm sort of just finding out stuff at this point  :D)

Thing is, I've coded OpenGL before, but that was either through layers or WebGL, so I haven't really given this performance thing a proper try yet.

« Last Edit: September 08, 2019, 02:36:45 am by HenrikErlandsson »

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Optimization: Performance fpc -ic for x86_64-win64?
« Reply #6 on: September 08, 2019, 07:15:48 am »
Similar to:
Code: Pascal
  function testme(const a:TSingleArray):Boolean;stdcall; {or nothing at all}
  // you would declare it like this:
  function fastertestme(const a:TSingleArray):Boolean;vectorcall;

You can also use {$calling vectorcall} at the top of a unit, but that is overkill. Use it only where it matters.

Even in GPU-intensive code it will have a speed effect, e.g. on moves or pre-processing buffers.
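To make the declarations above concrete, here is a small self-contained sketch. The thread never shows how TSingleArray is declared, so the fixed-size array type here is hypothetical, as are the names SumPlain and SumVector. It only shows where the modifier goes and is not a benchmark; vectorcall needs FPC 3.2 or higher and is meant for the x86_64-win64 target.

```pascal
program vectorcalldemo;
{$mode objfpc}

type
  // Hypothetical declaration: the thread does not define TSingleArray.
  TSingleArray = array[0..7] of Single;

// Default calling convention, no modifier.
function SumPlain(const a: TSingleArray): Single;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(a) to High(a) do
    Result := Result + a[i];
end;

// Same routine, declared with the vectorcall modifier.
// The modifier goes on the declaration only; call sites stay unchanged.
function SumVector(const a: TSingleArray): Single; vectorcall;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(a) to High(a) do
    Result := Result + a[i];
end;

var
  data: TSingleArray;
  i: Integer;
begin
  // Fill the array with 1, 2, ..., 8.
  for i := Low(data) to High(data) do
    data[i] := i + 1;
  WriteLn('default:    ', SumPlain(data):0:1);
  WriteLn('vectorcall: ', SumVector(data):0:1);
end.
```

Both calls look identical at the call site, which answers the earlier question: the keyword is typed once per routine declaration, not in each call.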
« Last Edit: September 08, 2019, 07:24:10 am by Thaddy »