Recent

Author Topic: Faster hash tables  (Read 671 times)

cpicanco

  • Hero Member
  • *****
  • Posts: 661
  • Behavioral Scientist and Programmer
    • Portfolio
Faster hash tables
« on: February 14, 2025, 02:31:06 am »
Hi everyone, I just came across this news. Anyone working on something like that feels comfortable to comment upon it? I am just curious about your thoughts on the impact of this new state of the art hash tables.

https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/

Here is the paper: https://arxiv.org/abs/2501.02305
Be mindful and excellent with each other.
https://github.com/cpicanco/

440bx

  • Hero Member
  • *****
  • Posts: 5080
Re: Faster hash tables
« Reply #1 on: February 14, 2025, 04:41:36 am »
Hi everyone, I just came across this news. Anyone working on something like that feels comfortable to comment upon it? I am just curious about your thoughts on the impact of this new state of the art hash tables.
First, my comfort level isn't quite what I'd like it to be since I spent all of about 15 minutes speed reading the paper.  To make it clear: I'm far from smart enough to understand all the mathematical nuances in such a short amount of time.

With that out of the way, my opinion is that the state of the art will have advanced when real algorithms have demonstrated having those mathematical properties.  IOW, it's one thing to prove mathematically that an algorithm with specific properties will perform within proven bounds and quite another to have a _real_ algorithm that has those properties.

All that said, it is very useful to know that, at least in theory, it is conceivable to have an algorithm find keys in O(1) even in a table that is very close to 100% full.  Having an implementation that behaves that way would be genuinely impressive.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

TRon

  • Hero Member
  • *****
  • Posts: 4157
Re: Faster hash tables
« Reply #2 on: February 14, 2025, 05:09:08 am »
Could have sworn I watched a video from his professor but am unable to find it. Nevermind though, here he explains himself.
Today is tomorrow's yesterday.

440bx

  • Hero Member
  • *****
  • Posts: 5080
Re: Faster hash tables
« Reply #3 on: February 14, 2025, 06:35:46 am »
Thank you @TRon.  The presentation really shed light on the math and the concepts.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 16655
  • Kallstadt seems a good place to evict Trump to.
Re: Faster hash tables
« Reply #4 on: February 14, 2025, 06:51:43 am »
comment tempory on hold because I think i can improve.
« Last Edit: February 14, 2025, 07:29:12 am by Thaddy »
But I am sure they don't want the Trumps back...

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12118
  • FPC developer.
Re: Faster hash tables
« Reply #5 on: February 14, 2025, 10:14:07 am »
I'm not a CS scholar, but with my limited datastructures knowledge, I notice two things from reading the blurb

- no cache effects discussed. Cache locality is a big thing in datastructures nowadays.
- they are talking about very full hashtables. Is this relevant for how most software hashtables work, or is this more important for hardware assisted implementations with limited space? Improving the worst case doesn't always improve the average case, and maybe it can be simply circumvent by periodic enlargement.

I do still have to watch the video later though.
« Last Edit: February 14, 2025, 10:21:03 am by marcov »

BildatBoffin

  • New Member
  • *
  • Posts: 33
Re: Faster hash tables
« Reply #6 on: February 14, 2025, 10:30:26 am »
This is about an enhancement of probing-based hash tables. Basically you have two possible designs for a hash table

1. the hash gives you the entry point toward a "bucket", i.e a linked list of entries. But a bucket is a separate vector.
2. the hash gives you the entry point in a vector from where you start searching ("probing") for the thing (until you reach an empty slot, which means "thing is not there")

The way 2 is simple but known to be slower than the way 1. So different techniques exist to make it better than the naive algorithm (aka "linear probing"). to describe a few:

1. for insertion always put the new item first and move the previous one (if any) instead. The science for that is non-existent. The idea is simply that "last added is likely what is required"
2. double-hash. instead of using the first hash as base for the entry point, use a second hash.

A new technique seems to be tiny pointers...


« Last Edit: February 14, 2025, 10:32:06 am by BildatBoffin »

Thaddy

  • Hero Member
  • *****
  • Posts: 16655
  • Kallstadt seems a good place to evict Trump to.
Re: Faster hash tables
« Reply #7 on: February 14, 2025, 11:35:31 am »
I do still have to watch the video later though.
I suggest you do.
That's why I put my initial comment on hold.
He presents something "newish" that i can't directly place under current closed hashing theory.
But I am sure they don't want the Trumps back...

 

TinyPortal © 2005-2018