Recent

Author Topic: Large or huge sets  (Read 827 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Large or huge sets
« on: September 10, 2019, 10:15:11 am »
Large or huge sets are often requested. Just today I needed one. Wrote it in ten minutes, but maybe useful?
If the critics are not too harsh, I will add it to to the wiki.
Note it can be written *way* more efficient.
Code: Pascal  [Select]
  1. {$mode objfpc}{$inline on}
  2. { A simple way to implement large or huge sets.
  3.  (c) 2019 Thaddy de Koning. No license and not
  4.  re-licensable: free for all to use.
  5.  Copyright holds:
  6.  You can not claim license nor copyrigths }
  7. unit largesets;
  8. { Operator      Action
  9.   ---------------------------------------------
  10.   +             Union
  11.   -             Difference
  12.   *             Intersection
  13.   ><        Symmetric difference
  14.   <=        Contains
  15.   include   Include an element in the set
  16.   exclude   Exclude an element from the set
  17.   in        Check wether an element is in a set
  18. }
  19. interface
  20. uses
  21.   generics.collections;
  22.  
  23. type
  24.   TWordSet  = specialize TSortedHashSet<word >;
  25.   { The next two can cause some real memory pressure, be careful }
  26.   TDWordSet = specialize TSortedHashSet<dword>;
  27.   TQWordSet = specialize TSortedHashSet<Qword>;
  28.  
  29.   { union }
  30.   operator + (const a,b:TWordset ):TWordSet;
  31.   operator + (const a,b:TDWordset):TDWordSet;
  32.   operator + (const a,b:TQWordset):TQWordSet;
  33.  
  34.   { difference }
  35.   operator - (const a,b:Twordset ):TWordSet;
  36.   operator - (const a,b:TDwordset):TDWordSet;
  37.   operator - (const a,b:TQwordset):TQWordSet;
  38.  
  39.   { intersection }
  40.   operator * (const a,b:TwordSet ):TWordSet;
  41.   operator * (const a,b:TDwordSet):TDWordSet;
  42.   operator * (const a,b:TQwordSet):TQWordSet;
  43.  
  44.   { symmetric difference }
  45.   operator >< (const a,b:TWordSet ):TWordSet;
  46.   operator >< (const a,b:TDWordSet):TDWordSet;
  47.   operator >< (const a,b:TQWordSet):TQWordSet;
  48.  
  49.   { contains }
  50.   operator <= (const a,b:TWordSet ):Boolean;
  51.   operator <= (const a,b:TDWordSet):Boolean;
  52.   operator <= (const a,b:TQWordSet):Boolean;
  53.  
  54.   { in }
  55.   operator in (const a:word; const b:TWordset ):Boolean;
  56.   operator in (const a:dword;const b:TDWordset):Boolean;
  57.   operator in (const a:qword;const b:TQWordset):Boolean;
  58.  
  59.   { include }
  60.   procedure include(const a:TWordSet; const b:Word);
  61.   procedure include(const a:TDWordSet;const b:DWord);
  62.   procedure include(const a:TQWordSet;const b:QWord);
  63.  
  64.   { exclude }
  65.   procedure exclude(const a:TWordSet;const  b:Word);
  66.   procedure exclude(const a:TDWordSet;const b:DWord);
  67.   procedure exclude(const a:TQWordSet;const b:QWord);
  68.  
  69. implementation
  70.  
  71.   { union }
  72.   operator + (const a,b:TWordset):TWordSet;
  73.   begin  
  74.     b.UnionWith(a);
  75.     Result := b;
  76.   end;  
  77.  
  78.   operator + (const a,b:TDWordset):TDWordSet;
  79.   begin
  80.     b.UnionWith(a);
  81.     Result := b;
  82.   end;
  83.  
  84.   operator + (const a,b:TQWordset):TQWordSet;
  85.   begin
  86.     b.UnionWith(a);
  87.     Result := b;
  88.   end;
  89.  
  90.   { difference }
  91.   operator -(const a,b:Twordset):TWordSet;
  92.   begin
  93.     b.ExceptWith(a);
  94.     result:=b;
  95.   end;
  96.  
  97.   operator -(const a,b:TDwordset):TDWordSet;
  98.   begin
  99.     b.ExceptWith(a);
  100.     result:=b;
  101.   end;
  102.  
  103.   operator -(const a,b:TQwordset):TQWordSet;
  104.   begin
  105.     b.ExceptWith(a);
  106.     result:=b;
  107.   end;
  108.  
  109.   { intersection }
  110.   operator *(const a,b:TwordSet):TWordSet;
  111.   begin
  112.     b.IntersectWith(a);
  113.     Result := b;
  114.   end;
  115.  
  116.   operator *(const a,b:TDwordSet):TDWordSet;
  117.   begin
  118.     b.IntersectWith(a);
  119.     Result := b;
  120.   end;
  121.  
  122.   operator *(const a,b:TQwordSet):TQWordSet;
  123.   begin
  124.     b.IntersectWith(a);
  125.     Result := b;
  126.   end;
  127.  
  128.   { symmetric difference }
  129.   operator ><(const a,b:TWordSet):TWordSet;
  130.   begin
  131.     b.SymmetricExceptWith(a);
  132.     Result := b;
  133.   end;
  134.  
  135.   operator ><(const a,b:TDWordSet):TDWordSet;
  136.   begin
  137.     b.SymmetricExceptWith(a);
  138.     Result := b;
  139.   end;
  140.  
  141.   operator ><(const a,b:TQWordSet):TQWordSet;
  142.   begin
  143.     b.SymmetricExceptWith(a);
  144.     Result := b;
  145.   end;
  146.  
  147.   { contains }
  148.   operator <=(const a,b:TWordSet):Boolean;
  149.   var
  150.     c:word;
  151.   begin
  152.     Result := true;
  153.     for c in a do
  154.     if not b.contains(c) then
  155.     begin
  156.       Result := False;
  157.       break;
  158.     end;
  159.   end;
  160.  
  161.   operator <=(const a,b:TDWordSet):Boolean;
  162.   var
  163.     c:word;
  164.   begin
  165.     Result := true;
  166.     for c in a do
  167.     if not b.contains(c) then
  168.     begin
  169.       Result := False;
  170.       break;
  171.     end;
  172.   end;
  173.  
  174.   operator <=(const a,b:TQWordSet):Boolean;
  175.   var
  176.     c:word;
  177.   begin
  178.     Result := true;
  179.     for c in a do
  180.     if not b.contains(c) then
  181.     begin
  182.       Result := False;
  183.       break;
  184.     end;
  185.   end;
  186.  
  187.   { in }
  188.   operator in (const a:word;const b:TWordset):Boolean;
  189.   begin
  190.     Result := b.Contains(a);
  191.   end;  
  192.  
  193.   operator in (const a:dword;const b:TDWordset):Boolean;
  194.   begin
  195.     Result := b.Contains(a);
  196.   end;
  197.  
  198.   operator in (const a:qword;const b:TQWordset):Boolean;
  199.   begin
  200.     Result := b.Contains(a);
  201.   end;
  202.  
  203.   { include }
  204.   procedure include(const a:TWordSet;const b:Word);
  205.   begin
  206.     a.Add(b);
  207.   end;
  208.  
  209.   procedure include(const a:TDWordSet;const b:DWord);
  210.   begin
  211.     a.Add(b);
  212.   end;
  213.  
  214.   procedure include(const a:TQWordSet;const b:QWord);
  215.   begin
  216.     a.Add(b);
  217.   end;
  218.  
  219.   { exclude }
  220.   procedure exclude(const a:TWordSet;const b:Word);
  221.   begin
  222.     a.Remove(b);
  223.   end;
  224.  
  225.   procedure exclude(const a:TDWordSet;const b:DWord);
  226.   begin
  227.     a.Remove(b);
  228.   end;
  229.  
  230.   procedure exclude(const a:TQWordSet;const b:QWord);
  231.   begin
  232.     a.Remove(b);
  233.   end;
  234.  
  235. end.
Most people that want to use threading should learn to patch their jeans first: use a needle.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7362
Re: Large or huge sets
« Reply #1 on: September 10, 2019, 11:05:51 am »
The problem is that useset is the same as for TBits.

The problem with the set 256 limit is that you have a enum (as in, with identifier names) that suddenly grows beyond 256, and you have to redo your program.  Specially when the enum set tests are used in business code that hurts.

So try if your solution also works for a large enum.

And a demo to test (how would code look?) would be nice. E.g. will my IN operator tests remain the same?
« Last Edit: September 10, 2019, 11:11:20 am by marcov »

Ñuño_Martínez

  • Hero Member
  • *****
  • Posts: 914
    • Burdjia
Re: Large or huge sets
« Reply #2 on: September 10, 2019, 11:09:43 am »
Not sure but I think that some operators don't work as expected.  I mean, operator + shouldn't be like this?
Code: Pascal  [Select]
  1.  
  2. OPERATOR + (CONST a, b:TWordset): TWordSet;
  3. BEGIN
  4.   RESULT := TWordSet.Create;
  5.   RESULT.Copy (b);
  6.   RESULT.UnionWith (a)
  7. END;
  8.  

Note: Never used generics.collections and I didn't look for information about it, so my code would be so wrong... :-[

Also: I'm not a big fan of operators overloading using classes for obvious reasons (i.e: I expect that the result of a + b is a new object, not to modify one of the operands).
« Last Edit: September 10, 2019, 11:11:58 am by Ñuño_Martínez »
Are you interested in game programming? Join the Pascal Game Development community!
Also visit the Game Development Portal

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Large or huge sets
« Reply #3 on: September 10, 2019, 11:14:08 am »
So try if your solution also works for a large enum.
That works.  Becase it avoids the size of enums.  work in progress.
Most people that want to use threading should learn to patch their jeans first: use a needle.

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Large or huge sets
« Reply #4 on: September 10, 2019, 11:18:30 am »
Not sure but I think that some operators don't work as expected.  I mean, operator + shouldn't be like this?
Also: I'm not a big fan of operators overloading using classes for obvious reasons (i.e: I expect that the result of a + b is a new object, not to modify one of the operands).
You would be amazed. This code is not fastest, but satisfies most tests.
You are wrong about the new object: that would cause leaks, and I tested against leaks.

Anyway, it is rough and probably needs refinement.

Thanks for the response, both.
Most people that want to use threading should learn to patch their jeans first: use a needle.

Kays

  • Full Member
  • ***
  • Posts: 178
  • Whasup!?
    • KaiBurghardt.de
Re: Large or huge sets
« Reply #5 on: September 10, 2019, 12:06:08 pm »
Large or huge sets are often requested. […]
Meh, your code doesn’t support set of real. :P
Yours Sincerely
Kai Burghardt

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Large or huge sets
« Reply #6 on: September 10, 2019, 12:20:14 pm »
Large or huge sets are often requested. […]
Meh, your code doesn’t support set of real. :P
No, because that is not mathematically a viable option for real sets.
The TSortedHashset supports that to some extend...But not like real sets, like in a Pascal set. My code does.
You need to have a certain amount of bit patterns, just like normal sets.
(I know it requires a small, just a small, bit of computer science...)
Furthermore, I never pretend to have infinite scale, just larger scale if you run out of 256....If you do not understand that plz refrain from taking part of the discussion,
I just got two very informed answers that really help.
« Last Edit: September 10, 2019, 12:28:02 pm by Thaddy »
Most people that want to use threading should learn to patch their jeans first: use a needle.

440bx

  • Hero Member
  • *****
  • Posts: 1087
Re: Large or huge sets
« Reply #7 on: September 10, 2019, 02:50:24 pm »
Large or huge sets are often requested. […]
Meh, your code doesn’t support set of real. :P
I presume that comment was tongue in cheek as only countable sets can be represented digitally.  The set of reals is not countable.

using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

jamie

  • Hero Member
  • *****
  • Posts: 1922
Re: Large or huge sets
« Reply #8 on: September 10, 2019, 04:01:17 pm »
There is one thing I can say about @Thaddy, good of bad, he knows how to write bloat!
 ;)

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7362
Re: Large or huge sets
« Reply #9 on: September 10, 2019, 04:09:39 pm »
Oops. I only read quickly initially, and it seems not to be a value type, meaning you must change business code (and streaming!). Not really an option for me. I wonder if changing the class to a record<>,  and then hacking till it works might do it. However the problem is dimensioning the static data field to the size of the enum.

Hmm. maybe abusing an ansistring/dynarray. Automated but many aspects like streaming and " IN " operator cost magnitude, remains the same
« Last Edit: September 10, 2019, 04:42:30 pm by marcov »

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Large or huge sets
« Reply #10 on: September 10, 2019, 04:31:28 pm »
There is one thing I can say about @Thaddy, good of bad, he knows how to write bloat!
 ;)
Do it better. You can't. In 10 minutes. Clock is ticking....
( I am  one of the few that actually produce working code all the time..with compilable code....)
« Last Edit: September 10, 2019, 05:24:30 pm by Thaddy »
Most people that want to use threading should learn to patch their jeans first: use a needle.

howardpc

  • Hero Member
  • *****
  • Posts: 3102
Re: Large or huge sets
« Reply #11 on: September 10, 2019, 06:28:30 pm »
Thanks, Thaddy for sharing useful functionality.
My preference for "contains" (rather than via the slightly cryptic operator <=) is for a named function.

I think it also warrants a count function, which is what I miss most in the basic Pascal set implementation.

I would implement these as follows:
Code: Pascal  [Select]
  1.   function SetAContainsSetB(const a, b: TWordSet): Boolean;
  2.   var
  3.     c: Word;
  4.   begin
  5.     for c in a do
  6.       if not b.Contains(c) then
  7.         Exit(False);
  8.     Result := True;
  9.   end;
  10.  
  11.   function GetCount(const aSet: TWordSet): SizeInt;
  12.   begin
  13.     Exit(aSet.Count);
  14.   end;

And corresponding  code for DWord and QWord if you need such enormous data structures.
If you do need such things, you are probably better off with a database anyway. But all things have their niche uses and appeal.
« Last Edit: September 10, 2019, 06:37:42 pm by howardpc »

Thaddy

  • Hero Member
  • *****
  • Posts: 8681
Re: Large or huge sets
« Reply #12 on: September 10, 2019, 07:12:37 pm »
The slightly cryptic <= is a part of long standing Pascal set syntax, Howard  ;D ....
But I will try to improve.... (I know I could do it with bitmasks, but that is not the point here)
Most people that want to use threading should learn to patch their jeans first: use a needle.

howardpc

  • Hero Member
  • *****
  • Posts: 3102
Re: Large or huge sets
« Reply #13 on: September 10, 2019, 07:34:38 pm »
Well I am aware of that syntax. I was merely expressing my preference for a more verbose alternative.
To those of a mathematical bent <= is no doubt transparent and self-explanatory when used as a set operator.
Lesser mortals like myself prefer wordy self-declaring function names, a luxury which is also Pascalish.
FPC provides also for those who prefer more C-like syntax, so we can have the best of both worlds.

Kays

  • Full Member
  • ***
  • Posts: 178
  • Whasup!?
    • KaiBurghardt.de
Re: Large or huge sets
« Reply #14 on: September 10, 2019, 08:47:34 pm »
[…] Meh, your code doesn’t support set of real. :P
No, because that is not mathematically a viable option for real sets. […]
[…] I presume that comment was tongue in cheek as only countable sets can be represented digitally.  The set of reals is not countable.
Yes, it was tongue in cheek. Sorry, if that didn’t get through with the smiley. Of course I know there’s no bijective mapping ℝ → ℕ, so there’s no way to store 2 in a discrete manner, and bla, bla, bla. (Obviously I have *only* good memories about the math lectures. ;))
Yours Sincerely
Kai Burghardt