Forum > Suggestions

Proposal: Associative String Arrays

<< < (3/4) > >>

Stefan Glienke:

--- Quote from: marcov on May 11, 2023, 04:35:57 pm ---afaik generic.collections has special cuckoo2 hash for string content

--- End quote ---
By default it uses crc32 or xxhash32 (see initialization of Generics.Hashes)

440bx:

--- Quote from: PascalDragon on May 11, 2023, 10:05:46 pm ---Classes like Generics.Collections.TDictionary<,> and FGL.TFPGMap<,> already fullfill this purpose and work today and it's much easier to change the implementation than with a compiler-magic type. So, no, I'm against something like this. There are much more important things that need to be implemented that can't be easily done with existing functionality.

--- End quote ---
I have to concur.

When it comes to the implementation, critical considerations are, among others, the number of elements the array will host and the hashing algorithm used. 

In particular, a hashing algorithm that produces good results in one case can easily produce dismal ones in other cases and there is no way for a compiler to "divine" which algorithm it should use to ensure good performance.  The programmer is the one who should make the choice based on the nature of the data, not the compiler.

Interpreters can get away with this kind of inefficiencies because no one can expect an interpreter to rival the speeds obtained by a compiler (at least in most cases.)

I completely agree that hash tables, which is all an associative array is, should be implemented externally.

Warfley:

--- Quote from: 440bx on May 12, 2023, 02:01:01 am ---In particular, a hashing algorithm that produces good results in one case can easily produce dismal ones in other cases and there is no way for a compiler to "divine" which algorithm it should use to ensure good performance.  The programmer is the one who should make the choice based on the nature of the data, not the compiler.

Interpreters can get away with this kind of inefficiencies because no one can expect an interpreter to rival the speeds obtained by a compiler (at least in most cases.)

I completely agree that hash tables, which is all an associative array is, should be implemented externally.

--- End quote ---

One does not exclude the other. Having a solution that is easy to use and works "good enough" most of the time is still an improvement over something that is more complex to use but configurable for every Situation.

Take for example generics.collection maps, they use a runtime comparer using runtime polymorphism to handle the data. That's kinda inefficient, to always do VMT lookups for a simple comparison. A compiletime comparer like the generic parameter of the ghashmaps THashmap does not have any indirection and can even be inclined, making it much more efficient. But using gHashMap is much more complex then TDictionary, I often use TDictionary, as long as it's fast enough. If I run into problems I can change the hasmap implementation afterwards.

Similarly looking at the already existing dynamic arrays and strings, their lazy copy mechanism and reference counting is sub optimal in many cases where you don't pass the array around. Also insertion can be quite slow, due to linear growth. So if you need high performance it can be better to use TList instead (I even had at one point wrote my own array implementation with raw pointers due to performance). This does not mean that arrays are useless as a an inbuilt language construct, just that in some cases you may need to manually choose a more complex alternative.

If we would follow such a philosophy consequently, we would end up with something like C where you always have to do everything yourself from scratch, and ever program begins with you writing your own array implementation (I've spent uncountable hours implementing arrays manually in C for different projects, it's always the same for every new program I write), because c does not provide anything higher than raw memory management and basic control flow

BeniBela:

--- Quote from: Warfley on May 12, 2023, 10:44:03 am ---Take for example generics.collection maps, they use a runtime comparer using runtime polymorphism to handle the data. That's kinda inefficient, to always do VMT lookups for a simple comparison. A compiletime comparer like the generic parameter of the ghashmaps THashmap does not have any indirection and can even be inclined, making it much more efficient. But using gHashMap is much more complex then TDictionary, I often use TDictionary, as long as it's fast enough. If I run into problems I can change the hasmap implementation afterwards.

Similarly looking at the already existing dynamic arrays and strings, their lazy copy mechanism and reference counting is sub optimal in many cases where you don't pass the array around. Also insertion can be quite slow, due to linear growth. So if you need high performance it can be better to use TList instead (I even had at one point wrote my own array implementation with raw pointers due to performance). This does not mean that arrays are useless as a an inbuilt language construct, just that in some cases you may need to manually choose a more complex alternative.


--- End quote ---

I write my own arrays and hashmaps in Pascal, too.

I was just debugging my hashmap a few minute agos, after trying to change the stored types, and having forgotten how to change them.

The problem is that FPC does dynamic dispatch even for build in types like strings. Even if it knows the type at compile time, all memory management goes through functions that use RTTI to determine the type at runtime: https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/383

So I have a triple leveled hashmap. First a fully generic hash map. Then I specialize it with pointer, because pointer avoid that memory management.  Then I put a generic hash map over it, that cast everything to pointer and then puts it in the specialized map.

Like when I have a string, it is faster to keep the string in a pointer variable and call fpc_AnsiStr_Decr_Ref on it to free it, rather than relying on the automated freeing of fpc

jcmontherock:
In that case I personally use TStringList with names-values.

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version