
Author Topic: The fastest integer type?  (Read 4377 times)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: The fastest integer type?
« Reply #15 on: August 10, 2019, 01:36:45 pm »
The code already is avx2 ?
It's enough SSE2 to beat AVX2 that working with Int32.

Why? Show and explain, let us learn. Do you miss certain instructions in AVX2? Do you have a processor (like a Ryzen before the 3000 series) that implements AVX2 with two pipes? Is the 128-bit lane shuffle limit somehow a problem?

I mostly do bytewise SSE2 (and in rare cases AVX2), but intermediate results are often 16-bit.
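
To illustrate why bytewise code ends up with 16-bit intermediate results, here is a minimal sketch in C (not the code from work mentioned in this thread). It averages two byte arrays: the sum of two bytes needs 9 bits, so the bytes are widened to 16-bit words, added, and packed back. The function name `average_u8` and the multiple-of-16 restriction are assumptions made for the example, and a scalar fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[i] = (a[i] + b[i]) / 2 for n bytes.
   n must be a multiple of 16 to keep the sketch short. */
void average_u8(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    assert(n % 16 == 0);
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Unpack: widen 16 bytes into two vectors of eight 16-bit words,
           then add. The sums can need 9 bits, hence the 16-bit words. */
        __m128i lo = _mm_add_epi16(_mm_unpacklo_epi8(va, zero),
                                   _mm_unpacklo_epi8(vb, zero));
        __m128i hi = _mm_add_epi16(_mm_unpackhi_epi8(va, zero),
                                   _mm_unpackhi_epi8(vb, zero));
        /* Halve each 16-bit sum. */
        lo = _mm_srli_epi16(lo, 1);
        hi = _mm_srli_epi16(hi, 1);
        /* Pack: narrow back to 16 bytes (with unsigned saturation). */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(lo, hi));
    }
#else
    /* Scalar fallback with the same 16-bit intermediate. */
    for (size_t i = 0; i < n; i++) {
        uint16_t sum = (uint16_t)(a[i] + b[i]);
        dst[i] = (uint8_t)(sum >> 1);
    }
#endif
}
```

Each 16-byte load turns into two registers of 16-bit words, so the arithmetic part of the loop handles only 8 elements per instruction even though loads and stores move 16.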

Thaddy

  • Hero Member
  • *****
  • Posts: 14159
  • Probably until I exterminate Putin.
Re: The fastest integer type?
« Reply #16 on: August 10, 2019, 01:52:07 pm »
The code already is avx2 ?
It's enough SSE2 to beat AVX2 that working with Int32.

Please answer my question! You are completely incomprehensible. Usually that is caused by language problems (fair) or having no clue at all (worrying). For now I assume the latter.
« Last Edit: August 10, 2019, 01:54:38 pm by Thaddy »
Specialize a type, not a var.

LemonParty

  • Jr. Member
  • **
  • Posts: 58
Re: The fastest integer type?
« Reply #17 on: August 10, 2019, 02:07:23 pm »
Why?
There is an explanation in this answer.
In short, the processor can calculate fast only until the cache becomes the limit.
For example, one fetch from cache gives you 64 bytes, which is 64 byte-sized integers or 16 dwords, so you can process 64 or 16 integers per unit of time.
This doesn't depend on the instructions, because the speed of RAM is constant.
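
The arithmetic above as a small C sketch. The 64-byte line size is the figure used in this post and is an assumption about the target CPU; `CACHE_LINE_BYTES` and `elems_per_line` are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, the figure used in the post. */
#define CACHE_LINE_BYTES ((size_t)64)

/* How many elements of a given size fit into one cache line. */
size_t elems_per_line(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= CACHE_LINE_BYTES);
    return CACHE_LINE_BYTES / elem_size;
}
```

With these assumptions `elems_per_line(sizeof(uint8_t))` is 64 and `elems_per_line(sizeof(uint32_t))` is 16, so a memory-bound loop gets four times as many byte elements as dwords out of every line it fetches.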
« Last Edit: August 10, 2019, 02:10:39 pm by LemonParty »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: The fastest integer type?
« Reply #18 on: August 10, 2019, 02:33:07 pm »
Why?
There is an explanation in this answer.
In short, the processor can calculate fast only until the cache becomes the limit.
For example, one fetch from cache gives you 64 bytes, which is 64 byte-sized integers or 16 dwords, so you can process 64 or 16 integers per unit of time.
This doesn't depend on the instructions, because the speed of RAM is constant.

Yes. That is possible if your dataset is much larger than your cache, so that it can be assumed cold, and the instructions are relatively simple (an unpack - add - pack cycle unrolled a few times, or not even that, I assume). Memory sizes, cache sizes etc. ARE increasing considerably, though. Today's cold load might still be in cache tomorrow; look at the sizes on these puppies.

I have some SIMD code for work, mostly dealing with image format transformation and kernel operations. It is simplified (64-bit only, aligned only, only widths that are multiples of 32px etc.).

Most of it is still SSE2 for similar reasons. Only color distance and YUV/HSV conversions are AVX2. For three reasons:

  • simple code doesn't benefit as much
  • I use Delphi, which @$*@$HYQ#E@# still doesn't support AVX2. The AVX2 code is in FPC-generated DLLs, but I only do it when it matters
  • The shuffle units of AVX2 still have some limitations when moving 1- and 2-byte quantities across 128-bit lanes. It is probably possible, but a whole lot more complicated
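
The lane limitation in the last bullet can be shown with a scalar model in C of the 256-bit byte shuffle (the `VPSHUFB` instruction, `_mm256_shuffle_epi8` as an intrinsic). This is a plain-C illustration of the instruction's documented behaviour, not real AVX2 code, and the function name `shuffle_bytes_256_model` is made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the AVX2 byte shuffle on a 256-bit register.
   Each 16-byte half is shuffled on its own: an index can only
   select a byte from the same 128-bit lane. */
void shuffle_bytes_256_model(const uint8_t src[32], const uint8_t idx[32],
                             uint8_t dst[32])
{
    uint8_t out[32];
    assert(src && idx && dst);
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 16; i++) {
            uint8_t sel = idx[lane * 16 + i];
            /* High bit set means "write zero"; otherwise only the low
               4 bits are used, so the source stays inside the lane. */
            out[lane * 16 + i] =
                (sel & 0x80) ? 0 : src[lane * 16 + (sel & 0x0F)];
        }
    }
    memcpy(dst, out, sizeof out); /* allows dst to alias src */
}
```

Asking for byte 31 from every position returns byte 15 in the low lane and byte 31 in the high lane. Moving bytes between the two halves therefore takes an extra cross-lane permute step, which is where the added complexity comes from.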


LemonParty

  • Jr. Member
  • **
  • Posts: 58
Re: The fastest integer type?
« Reply #19 on: August 10, 2019, 02:47:13 pm »
CPUs with 8+ cores can unlock the potential of AVX2 instructions in multithreaded code, because the bigger the cache you have, the more data you can prefetch.

circular

  • Hero Member
  • *****
  • Posts: 4181
    • Personal webpage
Re: The fastest integer type?
« Reply #20 on: August 19, 2019, 02:43:57 pm »
I would agree that cache is the main limit. I tried using 64-bit integers instead of 32-bit integers, but as a matter of fact it did not increase the speed significantly. Hitting the cache limit, however, makes a very big difference.
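
A rough way to see when the element size pushes a dataset over the cache limit, as a C sketch. The helper names and the cache size passed in are assumptions for the example; real cache sizes differ per CPU and per cache level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by an array of `count` elements of `elem_size` bytes. */
size_t footprint_bytes(size_t count, size_t elem_size)
{
    /* Guard against overflow of the multiplication. */
    assert(elem_size == 0 || count <= SIZE_MAX / elem_size);
    return count * elem_size;
}

/* Nonzero if the array fits into a cache of `cache_bytes` bytes. */
int fits_in_cache(size_t count, size_t elem_size, size_t cache_bytes)
{
    return footprint_bytes(count, elem_size) <= cache_bytes;
}
```

For example, with an assumed 256 KiB cache, 50000 elements fit as 32-bit integers (200000 bytes) but not as 64-bit integers (400000 bytes).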
Conscience is the debugger of the mind

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: The fastest integer type?
« Reply #21 on: August 19, 2019, 04:58:43 pm »
CPUs with 8+ cores can unlock the potential of AVX2 instructions in multithreaded code, because the bigger the cache you have, the more data you can prefetch.

On AMD afaik cores can only use cache on the same core complex. Larger numbers are typically fragmented over multiple core complexes.

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: The fastest integer type?
« Reply #22 on: September 20, 2019, 08:40:43 pm »
I wonder if a CPU cache would make much difference here. Generally the fastest integer is the native bit width of a system. So on a 16-bit system two-byte integers are fastest, on a 32-bit system that would be 4 bytes, etc.
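
For comparison, C exposes this idea through the `int_fastN_t` types in `<stdint.h>`: the implementation's choice of the fastest type with at least N bits. The sketch below only prints the sizes picked on the machine it runs on; the function name is made up, and the sizes vary between platforms and C libraries, so no output is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Print the sizes chosen for the "fast" integer types on this platform. */
void print_fast_int_sizes(void)
{
    /* The standard only guarantees a minimum width for each type. */
    assert(sizeof(int_fast16_t) >= sizeof(int16_t));
    assert(sizeof(int_fast32_t) >= sizeof(int32_t));

    printf("int_fast8_t : %zu bytes\n", sizeof(int_fast8_t));
    printf("int_fast16_t: %zu bytes\n", sizeof(int_fast16_t));
    printf("int_fast32_t: %zu bytes\n", sizeof(int_fast32_t));
    printf("int_fast64_t: %zu bytes\n", sizeof(int_fast64_t));
    printf("pointer     : %zu bytes\n", sizeof(void *));
}
```

On some platforms the "fast" 16-bit type is as wide as a pointer, on others it is not, which is one sign that "native width is fastest" is a rule of thumb and not a guarantee.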
keep it simple

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: The fastest integer type?
« Reply #23 on: September 20, 2019, 11:43:39 pm »
Not everyone has AVX btw

Till last week I have been using an i5-520M without any AVX

VTwin

  • Hero Member
  • *****
  • Posts: 1215
  • Former Turbo Pascal 3 user
Re: The fastest integer type?
« Reply #24 on: September 21, 2019, 01:49:34 am »
I love this forum. Looking forward to the shootout.

It won't change my code though, I agree with wp.
« Last Edit: September 21, 2019, 01:51:32 am by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 12.1: Lazarus 2.2.6 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 2.2.6 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.2.6 (64 bit on VBox)

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: The fastest integer type?
« Reply #25 on: September 21, 2019, 03:10:54 pm »
I love this forum. Looking forward to the shootout.
It won't change my code though, I agree with wp.
+1

 
