Recent

Author Topic: fyi benchmarks game k-nucleotide  (Read 16059 times)

igouy

  • New Member
  • *
  • Posts: 25
fyi benchmarks game k-nucleotide
« on: May 18, 2016, 01:55:09 am »
The benchmarks game k-nucleotide task now explicitly requires use of a built-in / library HashMap, here's the new description -

http://benchmarksgame.alioth.debian.org/u64q/knucleotide-description.html#knucleotide


So this old Free Pascal program will no longer be shown -

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=fpascal&id=2


If any one is interested in contributing a program that used the library hash table implementation then here are some instructions -

http://benchmarksgame.alioth.debian.org/play.html

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #1 on: May 29, 2016, 08:46:34 pm »
I'm implementing in FreePascal same knucleotide algorithm as in C++ version by Branimir Maksimovic.

I use ghashmap unit from {fpc_source}/packages/fcl-stl/src/.
That library was contributed by Vlado Boza in 2011. It has one problem though.

The algorithm in C++ uses custom class as a hash key, it requires overloading of "==" comparison operator for that key class to be set/get in a hash table(unique keys).
This is the only comparison operator in FreePascal that can't be overloaded for classes.
And ghashmap uses it, "=" for hash keys:

Code: Pascal  [Select][+][-]
  1. if ((FData[h])[i]).Key=key then exit(true);

For the algorithm I changed all such comparisons in ghashmap library unit to:

Code: Pascal  [Select][+][-]
  1. if ((FData[h])[i]).Key.equals(key) then exit(true);

The hash key custom class must provide equals method.
But in this case it doesn't support simple ordinal types for keys of hash. Dilemma.
« Last Edit: May 29, 2016, 09:07:45 pm by AlexK »

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #2 on: May 30, 2016, 12:10:12 am »
Even this does not work:

Code: Pascal  [Select][+][-]
  1. {$mode objfpc}
  2.  
  3. type
  4.    hashkey = record
  5.       data : qword; size : byte;
  6.       class operator =(lkey, rkey : hashkey) : boolean;
  7.    end;
  8.  
  9. class operator hashkey.=(lkey, rkey : hashkey) : boolean;
  10. begin
  11.    result := lkey.data = rkey.data;
  12. end;
  13.  
  14. begin  
  15.   // Code here  
  16. end.

test.pp(6,7) Fatal: Syntax error, "END" expected but "CLASS" found

But this page says it should work: http://www.freepascal.org/docs-html/current/ref/refse56.html

This renders FPC useless in many cases.

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: fyi benchmarks game k-nucleotide
« Reply #3 on: May 30, 2016, 12:16:29 am »
Sure it works:
Code: [Select]
program test;

{$mode objfpc}{$MODESWITCH ADVANCEDRECORDS}


type
   hashkey = record
      data : qword; size : byte;
      class operator =(lkey, rkey : hashkey) : boolean;
   end;

class operator hashkey.=(lkey, rkey : hashkey) : boolean;
begin
   result := lkey.data = rkey.data;
end;

begin 
  // Code here 
end.

That has nothing to do with operator overloading, rather with using advanced records.

As operator overloading works even without defining any mode at all (actually it defaults to mode fpc):
Code: [Select]
program LookNoModeDelphiOrModeFPC;
type 
  complex = record 
    re : real; 
    im : real; 
  end;

Var 
  C,Z : Complex; // New type complex 
  R   : Real;

operator := (r : real) z : complex; 
begin 
  z.re := r; 
  z.im := 0.0; 
end;

begin 
  Z := C;  // assignments between complex types. 
  C := R;
end.

You should be reading this and apply to your situation.
« Last Edit: May 30, 2016, 12:38:07 am by molly »

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #4 on: May 30, 2016, 01:05:51 am »

As operator overloading works even without defining any mode at all (actually it defaults to mode fpc):

You should be reading this and apply to your situation.

Nothing's mentioned there(official docs) about use cases for {$modeswitch advancedrecords} in a context of operator overloading.
You should've pointed to a wiki page: http://wiki.freepascal.org/Record
If one never used Delphi, he would've been clueless about such feature(operator overloading in advanced records).

"advancedrecords" works. It helped.

Code: Pascal  [Select][+][-]
  1. type  
  2.   complex = record  
  3.     re : real;  
  4.     im : real;  
  5.   end;
  6.  
  7. operator := (r : real) z : complex;  
  8. begin  
  9.   z.re := r;  
  10.   z.im := 0.0;  
  11. end;

I suppose this form of operator overloading is visible only in a unit it was defined in.
Therefore it's not visible to an external library unit, and compiler says "operator not overloaded" when compiling a library generic type specialized with user's record.

If a record is passed to a library, that record must contain operator overloading inside its definition.
So, in this case only {$modeswitch advancedrecords} works.
« Last Edit: May 30, 2016, 01:27:41 am by AlexK »

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: fyi benchmarks game k-nucleotide
« Reply #5 on: May 30, 2016, 01:35:54 am »
Quote
Nothing's mentioned there(official docs) about use cases for {$modeswitch advancedrecords} in a context of operator overloading.
_You_ are the one using advanced records, ergo you need the modeswitch. Don't use advanced records and you won't need the switch. Simple as that, and as explained in the documentation about generics that you linked to.

Quote
If one never used Delphi, he would be clueless about such feature(operator overloading in advanced records).
Pardon but you was the one linking to generics documentation, not advanced records. Please do not confuse the two as they are a different kind of breed.

The advanced record as shown in that documentation you linked to only serves as a _type_ for which the generics class is specialized to.

Quote
I suppose this form of operator overloading is visible only in a unit it was defined in.

Therefore it's not visible to an external library unit, and compiler says "operator not overloaded" when compiling that library unit .
Are you honestly saying that you have no idea how to declare functions and procedures in the interface section of a unit and let your other unit/program uses that unit so that unit can use those functions and procedures ?

Guess what: that also works for operators ;-)

Quote
If a record is passed to a library, that record must contain operator overloading inside its definition. So only advancedrecords works in this case.
Also wrong. Only if you want it to.

You simply do not have to if you use classes *period*

And yes, you can use class/records operators wherever you want (edit: might even be preferred) but afaik never required as you currently seem to think.

edit:
the reason i pointed you to generic operator overloading is that it seems the only documentation available about operator overloading, e.g. there does not seem specific documentation on using class operators and/or record operators. i was even unable to find anything on advanced records inside the online manual (but perhaps i overlooked). Thus, read what i linked to and apply for your situation. If that means you want to use record operators and/or class operators then simply use that :-)

edit2:
Oh, i might have been using some wrong wording there (or at least might work confusing). Yes, you need a record in case you want to enclose the operator declaration inside your class.
« Last Edit: May 30, 2016, 02:32:41 am by molly »

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #6 on: May 30, 2016, 02:36:05 am »
Thanks for {$modeswitch advancedrecords} hint, it's the only option that suits this algorithm.

FPC does not allow overloading of equality operator for classes because it's already overloaded internally as a comparison of pointers to instances(my conclusion after reading some threads).
I decided to use a record instead of a class as a hash key.

Top level operator overloading is visible only in a compilation unit it was defined in. Operator overloading doesn't become a part of a record's type.
If you specialize some library generic class by that record, then operator overloading for the record is ignored in methods of the specialized class.

{$modeswitch advancedrecords} makes operator overloading a part of a record's type.
If you specialize some library generic class by advanced record, then operator overloading for the record will be used in methods of the specialized class.

Here, TSomeHashMapImpl is from some library unit, and TSomeHashMapImpl methods will use operator overloading for the hashkey record, because operator overloading of an advanced record is a part its type:

Code: Pascal  [Select][+][-]
  1. {$modeswitch advancedrecords}
  2.  
  3. type
  4.    hashkey = record
  5.       data : qword; size : byte;
  6.       class operator = (lkey, rkey : hashkey) : boolean;
  7.    end;
  8.    hashtable = specialize TSomeHashMapImpl<hashkey>;
  9.  
  10. // TSomeHashMapImpl methods will use this overload
  11. class operator hashkey.= (lkey, rkey : hashkey) : boolean;
  12. begin
  13.    result := lkey.data = rkey.data;
  14. end;

In the next code snippet TSomeHashMapImpl methods will ignore operator overloading. Error: Operator is not overloaded: "hashkey" = "hashkey":

Code: Pascal  [Select][+][-]
  1. type
  2.    hashkey = record
  3.       data : qword; size : byte;
  4.    end;
  5.    hashtable = specialize TSomeHashMapImpl<hashkey>;
  6.  
  7. // this will be used only in the current compilation unit; TSomeHashMapImpl methods do not see this
  8. operator = (lkey, rkey : hashkey) : boolean;
  9. begin
  10.     result := lkey.data = rkey.data;
  11. end;
« Last Edit: May 30, 2016, 03:01:58 am by AlexK »

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: fyi benchmarks game k-nucleotide
« Reply #7 on: May 30, 2016, 03:05:33 am »
Quote
Here, TSomeHashMapImpl is from some library unit, and TSomeHashMapImpl methods will use operator overloading for the hashkey record, because operator overloading of an advanced record is a part its type:
Yes, that is correct AlexK. I f you want to use it the way you showed then that is perfectly ok.

You are also correct about the overloading of the = or <> operators, indeed this seems impossible to do globally (the infamous "impossble operator overload" error).

Ergo, in the last part that i correct you , _i_ was wrong and not you. my apologies for that.  :-[

The reason you get into problems with the last snippet you posted is because you defined a new generic using the record type. If you'd declare the hashkey and operator in the right order it should be no problem afaik.
Code: [Select]
type
   hashkey = record
      data : qword; size : byte;
   end;

   operator = (lkey, rkey : hashkey) : boolean;

Type
  hashtable = specialize TSomeHashMapImpl<hashkey>;

Please strike that last snippet/remark. It works for me either way.

main:
Code: [Select]
program testunits;

uses
  unit1;

var
  G1, G2: Ghashtable;

begin
  G1 := GhashTable.Create;
  G2 := GhashTable.Create;

  If G1.value = G2.value then
  begin
    WriteLn('are same');
  end;
 
  G1.Free;
  G2.Free;
end.
unit1:
Code: [Select]
unit unit1;

{$MODE OBJFPC}{$H+}

interface

uses
  unit2;

type
   Thashkey = record
     data : qword;
     size : byte;
   end;

Type
  Ghashtable = specialize TSomeHashMapImpl<Thashkey>;

  operator = (lkey, rkey : Thashkey) : boolean;

implementation

operator = (lkey, rkey : Thashkey) : boolean;
begin
  result := lkey.data = rkey.data;
end;

end.
unit2:
Code: [Select]
unit unit2;

{$MODE OBJFPC}{$H+}

interface

type
  generic TSomeHashMapImpl<T> = class(TObject)
   public
    Value : T;
  end;

implementation

end.
« Last Edit: May 30, 2016, 03:36:41 am by molly »

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #8 on: May 30, 2016, 03:39:50 am »
The reason you get into problems with the last snippet you posted is because you defined a new generic using the record type. If you'd declare the hashkey and operator in the right order it should be no problem afaik.
Code: Pascal  [Select][+][-]
  1. type
  2.    hashkey = record
  3.       data : qword; size : byte;
  4.    end;
  5.  
  6.    operator = (lkey, rkey : hashkey) : boolean;
  7.  
  8. Type
  9.   hashtable = specialize TSomeHashMapImpl<hashkey>;
  10.  

That won't compile.
Hypothetically it could be implemented in FPC as some kind of forward declaration for an operator overload, but it's not how it should be IMO.

{$modeswitch advancedrecords} is the only option. Extended record type.

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: fyi benchmarks game k-nucleotide
« Reply #9 on: May 30, 2016, 03:47:31 am »
it compiles perfectly fine here, and the forwarding works as advertised for me.

I'm not really sure what's buggin' your code.

PS: but one tends to get that with pasting half arsed snippets. The error "Error: Operator is not overloaded: "hashkey" = "hashkey":" can be solved by including your haskey definition unit in the unit with the generics declaration. So it would have been very informative telling that you are using (or at least wanted to use) the same operator inside your generics class again.
« Last Edit: May 30, 2016, 05:06:57 am by molly »

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #10 on: May 30, 2016, 09:28:26 am »
The difficulty of ghashmap unit consists in the fact that it does not provide default hash functor, user has to implement it.
Yet it's a way to optimize hashtables for specific use cases.

I just tested a primitive case statement as a hash function for small sequences(write_frequencies), and it generates the same output as a C++ version.
For long sequences like "GGTATT" and "GGTATTTTAATT" a real hash function must be implemented. That's a completely different story.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11450
  • FPC developer.
Re: fyi benchmarks game k-nucleotide
« Reply #11 on: May 30, 2016, 02:40:38 pm »

Nothing's mentioned there(official docs) about use cases for {$modeswitch advancedrecords} in a context of operator overloading.

If you don't want to know these kind of things, simply use {$mode delphi}

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #12 on: May 30, 2016, 06:11:17 pm »
it compiles perfectly fine here, and the forwarding works as advertised for me.

You helped by pointing to {$modeswitch advancedrecords}. That worked and I moved forward.
I'm sure that advancedrecords is the only way for external library units to use overloaded operators on records defined in your code, because advancedrecords contain operator overloading in their type definition, not as a top level unit definition.

I'm not really sure what's buggin' your code.

"Geometry".  ( https://www.youtube.com/watch?v=BKorP55Aqvg )

PS: but one tends to get that with pasting half arsed snippets. The error "Error: Operator is not overloaded: "hashkey" = "hashkey":" can be solved by including your haskey definition unit in the unit with the generics declaration. So it would have been very informative telling that you are using (or at least wanted to use) the same operator inside your generics class again.

I implemented a FreePascal version of the fastest(C++) knucleotide algorithm for The Computer Language Benchmarks Game.
The only problem left to solve, is to create a real hash function for ghashmap from FCL. Then multithreading maybe.
I used this func in place of the real hash function temporarily:

Code: Pascal  [Select][+][-]
  1. class function hashfunctor.hash(key : hashkey; b : sizeUint) : sizeUint;
  2. begin
  3.    case key.data of
  4.       0..2: result := 0;
  5.       3..5: result := 1;
  6.       6..8: result := 2;
  7.       9..11: result := 3;
  8.       12..14: result := 4;
  9.       15..255: result := 5;
  10.       256..65536: result := 6;
  11.    else
  12.       result := 7;
  13.    end;
  14. end;

It returns an index of a bucket in hashmap.
« Last Edit: May 30, 2016, 06:46:56 pm by AlexK »

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: fyi benchmarks game k-nucleotide
« Reply #13 on: May 31, 2016, 04:59:31 am »
First of all congratz on your progress !  :)

...
I'm sure that advancedrecords is the only way for external library units to use overloaded operators on records defined in your code, because advancedrecords contain operator overloading in their type definition, not as a top level unit definition.
You seem as much sure as i believe you're not entirely correct on that ;-)

Since you moved one, it is of no importance anymore but, i did not wanted to let this go without showing a possibility for the casual reader. Note that i don't recommend using it, only here to show it is possible (and it might not work for your specific implementation). Of course the generic itself as well as rest of the code contains nonsense and only serves as example.

Main program, includes p1unit
Code: [Select]
program p1testunits;

uses
  p1unit1;

var
  G1, G2  : Ghashtable;
  Hashish : Array[0..1] of THashKey =
  (
   ( Data : 1234 ; Size: 1),
   ( Data : 54321; Size: 1)
  );
  SamSam  : array[boolean] of string = ('differ', 'same');

begin
  G1 := GhashTable.Create;
  G2 := GhashTable.Create;


  G1.Value := Hashish[0];
  G2.Value := Hashish[0];
  WriteLn('G1.Value.Data = ', G1.Value.Data);
  WriteLn('G1.Value.Size = ', G1.Value.Size);
  WriteLn('G2.Value.Data = ', G2.Value.Data);
  WriteLn('G2.Value.Size = ', G2.Value.Size);
  WriteLn('G1 and G2 ', SamSam[G1.value = G2.value]);

  WriteLn;

  G2.value := Hashish[1];
  WriteLn('G1.Value.Data = ', G1.Value.Data);
  WriteLn('G1.Value.Size = ', G1.Value.Size);
  WriteLn('G2.Value.Data = ', G2.Value.Data);
  WriteLn('G2.Value.Size = ', G2.Value.Size);
  WriteLn('G1 and G2 ', SamSam[G1.value = G2.value]);

  WriteLn;

  WriteLn(G1.TestSomeThing(G1.Value));
  WriteLn(G1.TestSomeThing(G2.Value));
  WriteLn(G1.TestSomeThing(Hashish[0]));
  WriteLn(G1.TestSomeThing(Hashish[1]));


  G1.Free;
  G2.Free;
end.

Unit 1: includes hash record, specialization of the generic and the operator.
Code: [Select]
unit p1unit1;

{$MODE OBJFPC}{$H+}

interface

uses
  p1unit2;

type
   Thashkey = record
     data : qword;
     size : byte;
   end;

  operator   = (lkey, rkey : Thashkey) : boolean;

type
  Ghashtable = specialize TSomeHashMapImpl<Thashkey>;

implementation

operator = (lkey, rkey : Thashkey) : boolean;
begin
  result := lkey.data = rkey.data;
end;

end.

Unit 2: the generic declaration. Note how unit 1 is included in the implementation section in order to use the correct operator (inside the class itself).
Code: [Select]
unit p1unit2;

{$MODE OBJFPC}{$H+}

interface

type
  generic TSomeHashMapImpl<T> = class(TObject)
   private
    FValue : T;
   public
    constructor Create;
    function TestSomething(SomeHash: T): string;
   public
    property value: T read FValue Write FValue;
  end;

implementation

uses
  p1unit1;

constructor TSomeHashMapImpl.Create;
begin
  FValue := default(T);
end;

function TSomeHashMapImpl.TestSomething(SomeHash: T): string;
const Fn = 'TSomeHashMapImpl.TestSomething ';
begin
  if (FValue = SomeHash)
  then Result := Fn + 'determined that passed hash is same as internal stored hasvalue'
  else Result := Fn + 'determined that passed hash differs from internal stored hasvalue';
end;

end.

AlexK

  • Jr. Member
  • **
  • Posts: 55
Re: fyi benchmarks game k-nucleotide
« Reply #14 on: May 31, 2016, 06:16:58 am »
...
I'm sure that advancedrecords is the only way for external library units to use overloaded operators on records defined in your code, because advancedrecords contain operator overloading in their type definition, not as a top level unit definition.
You seem as much sure as i believe you're not entirely correct on that ;-)
...
Unit 2: the generic declaration. Note how unit 1 is included in the implementation section in order to use the correct operator (inside the class itself).

That is supposed to be a library unit with a generic type. A library doesn't know about your unit. You are using that library, not library uses your unit.
I explained it in initial comments.

You can only specialize library generic with your type in your unit. For operator overloading to work, it must be a part of the type you specialize library generic with, not a part of a unit's interface.

 

TinyPortal © 2005-2018