Recent

Author Topic: Count elements in set  (Read 8588 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 19245
  • Glad to be alive.
Re: Count elements in set
« Reply #30 on: August 09, 2025, 04:55:13 pm »
I settled for:
Code: Pascal  [Select][+][-]
  1. function CountSet<T>(const aSet: T): Integer; inline;
  2. var
  3.   p: PByte;
  4.   i: Integer;
  5. begin
  6.   if GetTypeKind(T) <> tkSet then
  7.      Exit(-1);// -1 means it failed. Skipped exception handling.
  8.   Result := 0;
  9.   p := @aSet;
  10.   for i := 0 to SizeOf(T) - 1 do
  11.   begin
  12.     Result := Result + popcnt(p^);
  13.     Inc(p);
  14.   end;
  15. end;
Because that tests 100% coverage without any other issues and is fast if you specify something more modern than AMD64.
Then the asm instruction popcnt is used. ARM/AARCH also seem to be doing OK.
It covers all {$packset} settings on testing.
And yes, I agree there should be an intrinsic. This is a prime example where an intrinsic is warranted.

« Last Edit: August 09, 2025, 04:59:01 pm by Thaddy »
objects are fine constructs. You can even initialize them with constructors.

avk

  • Hero Member
  • *****
  • Posts: 826
Re: Count elements in set
« Reply #31 on: August 09, 2025, 05:00:27 pm »
And I ended up with something like:
Code: Pascal  [Select][+][-]
  1. program test;
  2. {$mode objfpc}{$modeswitch implicitfunctionspecialization}{$packset 1}
  3.  
  4.  
  5. {$PUSH}{$MACRO ON}
  6. {$IF FPC_FULLVERSION>30300}{$WARN 6060 OFF : Case statement does not handle all possible cases}{$ENDIF}
  7. generic function Card<TSet>(const s: TSet): Integer; inline;
  8. {$DEFINE CardCaseMacro :=
  9.   case SizeOf(s) of
  10.     OfsMacro+1: Inc(Result, PopCnt(p[OfsMacro]));
  11.     OfsMacro+2: Inc(Result, PopCnt(PWord(p+OfsMacro)^));
  12.     OfsMacro+3:
  13.       begin
  14.         Inc(Result, PopCnt(PWord(p+OfsMacro)^));
  15.         Inc(Result, PopCnt(p[OfsMacro+2]));
  16.       end;
  17.     OfsMacro+4: Inc(Result, PopCnt(PDWord(p+OfsMacro)^));
  18.     OfsMacro+5:
  19.       begin
  20.         Inc(Result, PopCnt(PDWord(p+OfsMacro)^));
  21.         Inc(Result, PopCnt(p[OfsMacro+4]));
  22.       end;
  23.     OfsMacro+6:
  24.       begin
  25.         Inc(Result, PopCnt(PDWord(p+OfsMacro)^));
  26.         Inc(Result, PopCnt(PWord(p+OfsMacro+4)^));
  27.       end;
  28.     OfsMacro+7:
  29.       begin
  30.         Inc(Result, PopCnt(PDWord(p+OfsMacro)^));
  31.         Inc(Result, PopCnt(PWord(p+OfsMacro+4)^));
  32.         Inc(Result, PopCnt(p[OfsMacro+6]));
  33.       end;
  34.   end
  35. }
  36. var
  37.   p: PByte;
  38. begin
  39.   if GetTypeKind(s) <> tkSet then exit(-1);
  40.   p := @s;
  41.   case SizeOf(s) div 8 of
  42.     0:
  43.       begin
  44.         Result := 0;
  45.         {$DEFINE OfsMacro := 0}
  46.         CardCaseMacro;
  47.       end;
  48.     1:
  49.       begin
  50.         Result := PopCnt(PQWord(p)^);
  51.         {$DEFINE OfsMacro := 8}
  52.         CardCaseMacro;
  53.       end;
  54.     2:
  55.       begin
  56.         Result := PopCnt(PQWord(p)^) + PopCnt(PQWord(p+8)^);
  57.         {$DEFINE OfsMacro := 16}
  58.         CardCaseMacro;
  59.       end;
  60.     3:
  61.       begin
  62.         Result := PopCnt(PQWord(p)^) + PopCnt(PQWord(p+8)^) + PopCnt(PQWord(p+16)^);
  63.         {$DEFINE OfsMacro := 24}
  64.         CardCaseMacro;
  65.       end;
  66.     4: Result := PopCnt(PQWord(p)^) + PopCnt(PQWord(p+8)^) + PopCnt(PQWord(p+16)^) + PopCnt(PQWord(p+24)^);
  67.   end;
  68. end;
  69. {$UNDEF CardCaseMacro}{$UNDEF OfsMacro}{$POP}
  70.  
  71. type
  72.   TSet = set of 0..230;
  73.  
  74. var
  75.   s: TSet;
  76.  
  77. begin
  78.   s := [0..230];
  79.   WriteLn(SizeOf(s));
  80.   WriteLn(Card(s));
  81. end.
  82.  

FPC-3.3.1 generates the following code:
Code: ASM  [Select][+][-]
  1. ...
  2. # [80] WriteLn(Card(s));
  3.         call    fpc_get_output
  4.         movq    %rax,%rbx
  5.  
  6.         leaq    U_$P$TEST_$$_S(%rip),%rax
  7.         popcntq (%rax),%rdx
  8.         popcntq 8(%rax),%rcx
  9.         addq    %rcx,%rdx
  10.         popcntq 16(%rax),%rcx
  11.         leaq    (%rdx,%rcx),%r8
  12.         popcntl 24(%rax),%edx
  13.         addl    %edx,%r8d
  14.         movzbw  28(%rax),%ax
  15.         popcntw %ax,%ax
  16.         movzbl  %al,%eax
  17.         addl    %eax,%r8d
  18.         movslq  %r8d,%r8
  19.  
  20.         movq    %rbx,%rdx
  21.         xorl    %ecx,%ecx
  22.         call    fpc_write_text_sint
  23.         call    fpc_iocheck
  24.         movq    %rbx,%rcx
  25.         call    fpc_writeln_end
  26.         call    fpc_iocheck
  27.   ...
  28.  

Thaddy

  • Hero Member
  • *****
  • Posts: 19245
  • Glad to be alive.
Re: Count elements in set
« Reply #32 on: August 09, 2025, 08:51:04 pm »
Code: Pascal  [Select][+][-]
  1. call    fpc_iocheck
Use {$I-}, makes it faster. %)
Nice macro use, but I stick to the simplicity.
KUDOS.
And thx for helping.
Code: ASM  [Select][+][-]
  1. # Var p located in register rcx
  2. # Var i located in register r8d
  3. # [15] for i := 0 to SizeOf(T) - 1 do
  4.         movl    $24,%r8d
  5.         .p2align 4,,10
  6.         .p2align 3
  7. .Lj12:
  8. # [17] Result := Result + popcnt(p^);
  9.         movzbw  (%rcx),%dx
  10.         popcntw %dx,%dx
  11.         movzbl  %dl,%edx
  12.         addl    %edx,%eax
  13. # [18] Inc(p);
  14.         addq    $1,%rcx
  15.         subl    $1,%r8d
  16.         jne     .Lj12
  17. .Lc6:
  18. # [20] end;
  19.         ret
Which is slower.. Good enough.
« Last Edit: August 09, 2025, 09:09:56 pm by Thaddy »
objects are fine constructs. You can even initialize them with constructors.

avk

  • Hero Member
  • *****
  • Posts: 826
Re: Count elements in set
« Reply #33 on: August 10, 2025, 09:24:00 am »
@Thaddy, thanks.
I think that despite its grim appearance, Card() should be on average about ten times faster than some function that uses byte-wise calculation.

...
 All this bloat being presented does not show a lot of confidence of clean coding. You should know that for each
incursion of a Generic on a different Set is going to generate a complete set of code again, which is why I suggested a simple G interface calling a single generated code block for all.
...

@Jamie, thank you for your concern.
Nevertheless, the situation with the Card() function is even more dramatic than you described. It should be inlined in every place where it is used.
But no one forces you to use it, is it? After all, what you don't use can't hurt you in any way.

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Count elements in set
« Reply #34 on: August 10, 2025, 09:24:22 pm »
its obvious that all that needs to be done is have the compiler allow the use of popcnt with sets which are just
simple storage types. You can do that now with type cast and it works just fine.

No, sets can be larger than 4 (or 8 on 64-bit) Bytes, then you can't simply use PopCnt on them.

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Count elements in set
« Reply #35 on: August 14, 2025, 08:24:16 pm »
It just makes no sense to extend PopCnt to essentially work with any size of argument.
The correct solution for this is the introduction of the ISO Extended Pascal-compatible Cards intrinsic which obviously will use PopCnt internally.

Bart

  • Hero Member
  • *****
  • Posts: 5727
    • Bart en Mariska's Webstek
Re: Count elements in set
« Reply #36 on: August 19, 2025, 07:37:48 pm »
The correct solution for this is the introduction of the ISO Extended Pascal-compatible Cards intrinsic which obviously will use PopCnt internally.

IIRC it is CARD() not CARDS()?
At least Gnu Pascal implements it like that.

Bart

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Count elements in set
« Reply #37 on: August 19, 2025, 08:53:32 pm »
The correct solution for this is the introduction of the ISO Extended Pascal-compatible Cards intrinsic which obviously will use PopCnt internally.

IIRC it is CARD() not CARDS()?
At least Gnu Pascal implements it like that.

Yes, you're right, it's Card(). :-[

avk

  • Hero Member
  • *****
  • Posts: 826
Re: Count elements in set
« Reply #38 on: August 21, 2025, 10:46:39 am »
Finally, I added a generic GCard() function to compute the cardinality of an arbitrary set to the lgUtils unit (as a first approximation).

A quick and dirty benchmark for some set sizes, the code from post #27 was used as a reference implementation:
Code: Pascal  [Select][+][-]
  1. program gcard_bench;
  2. {$mode delphi}
  3. uses
  4.   SysUtils, lgUtils, StopWatch;
  5.  
  6.  
  7. function Card(const ASet; SizeOfSet: Cardinal): Integer;
  8. const
  9.   MaxSetSize = 32;
  10. type
  11.   TSetBytes = array [0..MaxSetSize-1] of byte;
  12. var
  13.   Bytes: TSetBytes absolute ASet;
  14.   i: Integer;
  15. begin
  16.   Assert(((SizeOfSet > 0) and (SizeOfSet <= MaxSetSize)),'Card: invalid value for SizeOfSet');
  17.   Result := 0;
  18.   for i := 0 to SizeOfSet-1 do Result := Result + PopCnt(Bytes[i]);
  19. end;
  20.  
  21. function GenRandomSet<TSet>: TSet;
  22. var
  23.   p: PByte;
  24.   I: Integer;
  25. begin
  26.   p := @Result;
  27.   for I := 0 to Pred(SizeOf(Result)) do
  28.     p[I] := Random(256);
  29. end;
  30.  
  31. const
  32.   TEST_SIZE = 1000000;
  33.   REP_COUNT = 10;
  34.  
  35. function GenData<TSet>: TArray<TSet>;
  36. var
  37.   I: Integer;
  38. begin
  39.   SetLength(Result, TEST_SIZE);
  40.   for I := 0 to High(Result) do
  41.     Result[I] := GenRandomSet<TSet>;
  42. end;
  43.  
  44. procedure Bench<TSet>;
  45. var
  46.   a: array of TSet;
  47.   I, J, K: Integer;
  48.   best: Double;
  49.   sw: TStopWatch;
  50. const
  51.   Inf = 1000000;
  52. begin
  53.   a := GenData<TSet>;
  54.   WriteLn('Set size: ', SizeOf(TSet));
  55.  
  56.   best := Inf;
  57.   for J := 1 to REP_COUNT do begin
  58.     sw := TStopWatch.StartNew;
  59.     for I := 0 to High(a) do
  60.       K := Card(a[I], SizeOf(TSet));
  61.     sw.Stop;
  62.     if sw.ElapsedMilliseconds < best then
  63.       best := sw.ElapsedMilliseconds;
  64.     Sleep(1);
  65.   end;
  66.   WriteLn('  Card:  ', best.ToString);
  67.  
  68.   best := Inf;
  69.   for J := 1 to REP_COUNT do begin
  70.     sw := TStopWatch.StartNew;
  71.     for I := 0 to High(a) do
  72.       K := GCard<TSet>(a[I]);
  73.     sw.Stop;
  74.     if sw.ElapsedMilliseconds < best then
  75.       best := sw.ElapsedMilliseconds;
  76.     Sleep(1);
  77.   end;
  78.   WriteLn('  GCard: ', best.ToString);
  79. end;
  80.  
  81. type
  82.   TSet1 = set of 0..7;
  83.   TSet2 = set of 0..31;
  84.   TSet3 = set of 0..55;
  85.   TSet4 = set of 0..119;
  86.   TSet5 = set of 0..183;
  87.   TSet6 = set of 0..247;
  88.  
  89. begin
  90.   Randomize;
  91.   Bench<TSet1>;
  92.   Bench<TSet2>;
  93.   Bench<TSet3>;
  94.   Bench<TSet4>;
  95.   Bench<TSet5>;
  96.   Bench<TSet6>;
  97. end.
  98.  

Results on my old Intel PC using FPC-3.2.2
  x86, default:
Code: Text  [Select][+][-]
  1. Set size: 1
  2.   Card:  3,9239
  3.   GCard: 3,492
  4. Set size: 4
  5.   Card:  10,5485
  6.   GCard: 3,2621
  7. Set size: 7
  8.   Card:  17,8451
  9.   GCard: 5,7214
  10. Set size: 15
  11.   Card:  43,5154
  12.   GCard: 9,5757
  13. Set size: 23
  14.   Card:  70,1043
  15.   GCard: 12,5587
  16. Set size: 31
  17.   Card:  96,3956
  18.   GCard: 15,6061  
  19.  

  x86, -CpCOREI:
Code: Text  [Select][+][-]
  1. Set size: 1
  2.   Card:  2,3765
  3.   GCard: 2,1752
  4. Set size: 4
  5.   Card:  4,4838
  6.   GCard: 2,0353
  7. Set size: 7
  8.   Card:  7,4558
  9.   GCard: 2,2124
  10. Set size: 15
  11.   Card:  15,8504
  12.   GCard: 2,8975
  13. Set size: 23
  14.   Card:  28,1081
  15.   GCard: 5,4356
  16. Set size: 31
  17.   Card:  38,7562
  18.   GCard: 4,7267  
  19.  

  x86_64, default:
Code: Text  [Select][+][-]
  1. Set size: 1
  2.   Card:  4,0573
  3.   GCard: 2,8238
  4. Set size: 4
  5.   Card:  9,3194
  6.   GCard: 2,6717
  7. Set size: 7
  8.   Card:  14,638
  9.   GCard: 3,1556
  10. Set size: 15
  11.   Card:  28,4771
  12.   GCard: 5,968
  13. Set size: 23
  14.   Card:  42,4791
  15.   GCard: 8,4285
  16. Set size: 31
  17.   Card:  56,4404
  18.   GCard: 10,9051
  19.  

  x86_64, -CpCOREI:
Code: Text  [Select][+][-]
  1. Set size: 1
  2.   Card:  2,0105
  3.   GCard: 0,8958
  4. Set size: 4
  5.   Card:  4,0216
  6.   GCard: 0,8657
  7. Set size: 7
  8.   Card:  6,3206
  9.   GCard: 1,5099
  10. Set size: 15
  11.   Card:  10,9119
  12.   GCard: 1,818
  13. Set size: 23
  14.   Card:  15,6164
  15.   GCard: 2,3253
  16. Set size: 31
  17.   Card:  20,5967
  18.   GCard: 2,9742
  19.  

jamie

  • Hero Member
  • *****
  • Posts: 7761
Re: Count elements in set
« Reply #39 on: August 22, 2025, 02:52:57 am »
My bigPopCnt.
I think this is faster than What I had before, seems to generate good code with -o4 level.

Code: Pascal  [Select][+][-]
  1. Function BigPopcnt(const aSet; aByteCount:Integer):Integer;
  2. Label Reloop,FirstEnter;
  3. Var
  4.  ByteCount:Integer=0;
  5.  BitCount:integer=0;
  6.  P:Pointer;
  7. Begin
  8.  P := @aSet;
  9.  Result:= 0;
  10.  Goto FirstEnter;
  11.  Reloop:
  12.    Inc(Result, BitCount);
  13.    Inc(P, ByteCount);
  14.    Dec(aByteCount,ByteCount);
  15.  FirstEnter:
  16.    Case aByteCount of
  17.     0:Exit;
  18.     4..7:Begin
  19.        BitCOunt := PopCnt(PDWord(P)^);
  20.        ByteCount := 4;
  21.        Goto Reloop;
  22.       End;
  23.     1:Begin
  24.        BitCount := PopCnt(PByte(P)^);
  25.        ByteCount := 1;
  26.        Goto Reloop;
  27.       End;
  28.     2..3:
  29.       Begin
  30.         BitCount := PopCnt(PWord(P)^);
  31.         ByteCount := 2;
  32.         Goto Reloop;
  33.       end;
  34.     8..255:Begin
  35.        BitCount := PopCnt(PQWord(P)^);
  36.        ByteCount := 8;
  37.        GOto Reloop;
  38.     end;
  39.   End;
  40. end;                                              
  41.  

But still, Compiler intrinsic can pick out the even based calls that already exist until it gets to large ones or unaligned types, then it can use some lard version.

Jamie


The only true wisdom is knowing you know nothing

 

TinyPortal © 2005-2018