
Topic: Just numerical MD5 or something else. (Read 534 times)

loaded

  • Hero Member
  • Posts: 876
Just numerical MD5 or something else.
« on: October 16, 2025, 01:33:18 pm »
Hi All,
I have a text like Konya-Selçuklu-Akıncılar-101-2. When I convert it to MD5, I get 4abd8df9fadd3ddd19a66ffcd853686e. What I actually want is to generate a unique number for the sample text above. Is this possible?
The more memory computers have, the less memory people seem to use. 😅

Zvoni

  • Hero Member
  • Posts: 3135
Re: Just numerical MD5 or something else.
« Reply #1 on: October 16, 2025, 02:02:47 pm »
Why?

Rewriting 4abd8df9fadd3ddd19a66ffcd853686e to
4a bd 8d f9 fa dd 3d dd 19 a6 6f fc d8 53 68 6e
and seeing that "f" is the highest single "value", I wouldn't be surprised if this is hex, which IS a number, with a high chance of being unique in your environment.
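A minimal Python sketch of the point that the hex digest already is a number (the same steps map onto Free Pascal's md5 unit, e.g. MD5String and MD5Print). It assumes the text is encoded as UTF-8 before hashing; since the sample contains ç and ı, a different encoding such as a Windows code page would give a different digest. md5_as_int is a hypothetical helper name.

```python
import hashlib

def md5_as_int(text: str) -> int:
    """Hypothetical helper: MD5 of `text` (encoded as UTF-8) as a 128-bit integer."""
    hex_digest = hashlib.md5(text.encode("utf-8")).hexdigest()  # 32 hex digits
    return int(hex_digest, 16)  # read the hex string as a base-16 number

# The hex string from the question, read as one number
n = int("4abd8df9fadd3ddd19a66ffcd853686e", 16)
print(n)                  # the same value in decimal
print(format(n, "032x"))  # and back to the original 32 hex digits

key = md5_as_int("Konya-Selçuklu-Akıncılar-101-2")
print(key.bit_length() <= 128)  # True: an MD5 value always fits in 128 bits
```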
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Thaddy

  • Hero Member
  • Posts: 18306
  • Here stood a man who saw the Elbe and jumped it.
Re: Just numerical MD5 or something else.
« Reply #2 on: October 16, 2025, 02:26:59 pm »
MD5 is simply a 128-bit number, e.g. $4abd8df9fadd3ddd19a66ffcd853686e would be valid Pascal notation if Pascal knew a native 128-bit OWORD.
You can represent it as a packed record of 2 QWORDs and/or 4 DWORDs.
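To sketch the 2 QWORD / 4 DWORD representation, the 16 digest bytes can be unpacked into unsigned 64-bit and 32-bit parts. This Python sketch reads them big-endian, so the parts match the hex string from left to right; that byte order is a choice, not a requirement. Overlaying a packed record of QWORDs on the digest bytes on a little-endian CPU such as x86 would give byte-swapped values instead; either works as long as it is used consistently. md5_parts is a hypothetical name.

```python
import hashlib
import struct

def md5_parts(text: str):
    """Hypothetical helper: split MD5(text) into 2 x 64-bit and 4 x 32-bit unsigned parts.

    '>' means big-endian (an assumption), so the parts read like the hex string.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()  # 16 raw bytes
    hi, lo = struct.unpack(">QQ", digest)                # two QWORD-sized values
    dwords = struct.unpack(">IIII", digest)              # four DWORD-sized values
    return hi, lo, dwords

hi, lo, dwords = md5_parts("Konya-Selçuklu-Akıncılar-101-2")
full = (hi << 64) | lo  # reassemble the original 128-bit value
print(f"{hi:016x} {lo:016x}")
print(" ".join(f"{d:08x}" for d in dwords))
```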
« Last Edit: October 16, 2025, 02:31:42 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

loaded

  • Hero Member
  • Posts: 876
Re: Just numerical MD5 or something else.
« Reply #3 on: October 16, 2025, 02:36:55 pm »
Thank you so much for your answer, Zvoni.
I have a dataset containing only provinces, districts, neighborhoods, attribute 1, and attribute 2. I want to save these to the database with a unique key and also be able to verify that the unique key belongs to the data.

Yes, I can use MD5 numerically. I just wanted to know if there is a simpler way, or if anyone has done this before.
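One way to build the key from the five fields is to join them and hash the result. In this Python sketch the fields are joined with the ASCII unit separator (\x1f), an arbitrary choice assumed never to appear inside a field, because with "-" as the separator a hyphen inside a name can make two different records produce the same joined string. record_key is a hypothetical helper, and the two records below are made up to show the problem.

```python
import hashlib

SEP = "\x1f"  # ASCII unit separator; assumed never to occur inside a field

def record_key(province, district, neighborhood, attr1, attr2) -> str:
    """Hypothetical helper: MD5 hex key for one record, built from all five fields."""
    joined = SEP.join(str(f) for f in (province, district, neighborhood, attr1, attr2))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Two made-up records that differ only in where a hyphen sits
a = ("Konya", "Selçuklu", "Akıncılar-101", "2", "3")
b = ("Konya", "Selçuklu", "Akıncılar", "101-2", "3")
print("-".join(a) == "-".join(b))        # True: with "-" both become the same string
print(record_key(*a) == record_key(*b))  # False: the separator keeps them apart
```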

Thank you for your answer, Commander Thaddy. I'll try to do that.

Zvoni

  • Hero Member
  • Posts: 3135
Re: Just numerical MD5 or something else.
« Reply #4 on: October 16, 2025, 02:56:23 pm »
See Thaddy's answer regarding 128-bit.
As for database storage:
Have you thought about using a GUID instead of MD5?
A GUID is basically a 128-bit number, too, not to mention that some DBMSs offer a ready-to-use GUID data type for columns.
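A short Python sketch of the GUID option. A random (version 4) GUID is also a 128-bit number, but it is not derived from the data, so it cannot be recomputed later to check that a key belongs to a row. If a GUID-shaped key derived from the text is wanted, a name-based version 3 UUID is one option (not mentioned in the thread): it is built from an MD5 hash of a namespace plus the name, with six bits overwritten by version and variant markers, so it differs from the plain MD5 of the text. The choice of uuid.NAMESPACE_URL is arbitrary.

```python
import uuid

# Random (version 4) GUID: a 128-bit value, but unrelated to any data
g = uuid.uuid4()
print(g, g.version, g.int.bit_length() <= 128)

# Name-based (version 3) GUID: MD5 of namespace + text,
# so the same text always yields the same GUID
NAMESPACE = uuid.NAMESPACE_URL  # arbitrary namespace choice for this sketch
k1 = uuid.uuid3(NAMESPACE, "Konya-Selçuklu-Akıncılar-101-2")
k2 = uuid.uuid3(NAMESPACE, "Konya-Selçuklu-Akıncılar-101-2")
print(k1, k1 == k2)
```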

Since Thaddy mentioned representing a 128-bit number as a record of 2 QWords:
In your DB table, define 2 columns with a 64-bit unsigned integer each (for the lower and higher part), and declare them COMBINED as the primary key.
(That's apart from the other options: storing the MD5 as text/string/varchar, or using a GUID.)
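A Python/SQLite sketch of the two-column primary key. One assumption worth flagging: SQLite's INTEGER is a signed 64-bit type with no unsigned variant (and Python's sqlite3 module raises OverflowError for larger ints), so each unsigned 64-bit half is stored as its two's-complement signed equivalent. The table and column names (places, key_hi, key_lo, label) are hypothetical.

```python
import hashlib
import sqlite3

def to_signed64(u: int) -> int:
    """Map an unsigned 64-bit value onto SQLite's signed 64-bit INTEGER range."""
    return u - (1 << 64) if u >= (1 << 63) else u

def md5_halves(text: str):
    """Hypothetical helper: high and low 64-bit halves of MD5(text), as signed values."""
    d = hashlib.md5(text.encode("utf-8")).digest()
    hi = int.from_bytes(d[:8], "big")  # unsigned 64-bit, like a QWORD
    lo = int.from_bytes(d[8:], "big")
    return to_signed64(hi), to_signed64(lo)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE places (
        key_hi INTEGER NOT NULL,
        key_lo INTEGER NOT NULL,
        label  TEXT    NOT NULL,
        PRIMARY KEY (key_hi, key_lo)
    )""")

text = "Konya-Selçuklu-Akıncılar-101-2"
conn.execute("INSERT INTO places VALUES (?, ?, ?)", (*md5_halves(text), text))
try:
    # Same text, same halves: the combined primary key rejects the duplicate
    conn.execute("INSERT INTO places VALUES (?, ?, ?)", (*md5_halves(text), text))
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```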

EDIT: If your backend is SQLite, I seem to remember there is a loadable extension offering an MD5 function, which can be used directly within SQL statements.
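If no such extension is at hand, Python's sqlite3 module can register an md5() SQL function itself via Connection.create_function; this sketch does that instead of loading an extension. The function exists only for the connection that registered it, so an external tool opening the same database file won't see it. deterministic=True needs Python 3.8+ and SQLite 3.8.3+; md5_hex and the table layout are hypothetical.

```python
import hashlib
import sqlite3

def md5_hex(value):
    """Hypothetical helper for TEXT values: MD5 of the UTF-8 text as lowercase hex.
    SQL NULL (None) stays NULL."""
    if value is None:
        return None
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# Make md5(x) callable from SQL, on this connection only
conn.create_function("md5", 1, md5_hex, deterministic=True)

conn.execute("CREATE TABLE places (label TEXT, hash TEXT)")
conn.execute("INSERT INTO places (label) VALUES (?)", ("Konya-Selçuklu-Akıncılar-101-2",))
conn.execute("UPDATE places SET hash = md5(label)")
print(conn.execute("SELECT label, hash FROM places").fetchall())
```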
« Last Edit: October 16, 2025, 03:00:35 pm by Zvoni »

loaded

  • Hero Member
  • Posts: 876
Re: Just numerical MD5 or something else.
« Reply #5 on: October 16, 2025, 03:01:14 pm »
I think I understand a little more. Thanks for the ideas. I'll try it out as soon as possible and let you know how it goes. Thanks, Zvoni. 🙋‍♂️

440bx

  • Hero Member
  • *****
  • Posts: 5809
Re: Just numerical MD5 or something else.
« Reply #6 on: October 16, 2025, 03:04:12 pm »
@loaded,

Basically, what you are doing is hashing, in this case using MD5 as the hash algorithm, which is perfectly fine. The advantage of doing that, which I'm sure you already know, is that you don't actually need to store the hash result since you can recompute it at any time.
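A minimal Python sketch of why the hash need not be stored: MD5 is deterministic, so recomputing it from the same bytes always gives the same digest, while a one-character change gives an unrelated one. UTF-8 encoding is assumed; md5_hex is a hypothetical helper.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Hypothetical helper: MD5 of the UTF-8 bytes of `text`, as 32 lowercase hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

text = "Konya-Selçuklu-Akıncılar-101-2"
first = md5_hex(text)
later = md5_hex(text)  # recomputed from the data; nothing was stored in between
print(first == later)  # True: same input, same digest
print(md5_hex("Konya-Selçuklu-Akıncılar-101-3"))  # one character changed: an unrelated digest
```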

With the above out of the way, _no_ hash algorithm can guarantee general uniqueness. It can only provide a very high probability of uniqueness. This is because a hash is a function that maps points from one domain to another, and the first domain is larger than the second one, so duplicates can always occur. However, if the second domain is very large, e.g. a 128- or 256-bit number, the probability of a duplicate/collision is very small (provided a reasonably decent hash algorithm, which MD5 is).
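To put numbers on "very small", here is the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2^(b+1)), for n values spread uniformly over b bits, as a Python sketch. Treating a hash's outputs as uniformly random is a modelling assumption that fits accidental collisions; it says nothing about inputs crafted on purpose to collide, which is known to be feasible for MD5. With more than 2^b inputs a collision is certain by the pigeonhole argument. collision_probability is a hypothetical name.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Hypothetical helper: birthday approximation of the chance that n uniformly
    random `bits`-bit values contain at least one duplicate."""
    if n > 2 ** bits:
        return 1.0  # pigeonhole: more inputs than possible outputs
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)  # 1 - exp(-x), accurate even for tiny x

print(collision_probability(10**6, 128))  # a million records, 128-bit hash: about 1.5e-27
print(collision_probability(1000, 16))    # 1000 records, 16-bit hash: about 0.9995
```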

If the consequences of having a duplicate are "benign", you may want to accept the possibility. If the consequences are "dire", you may want to detect the occurrence and perform an additional transformation to eliminate the collision (the double-hashing principle).
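A Python sketch of the "detect and transform" idea. It deliberately truncates MD5 to 16 bits so that collisions actually happen with a few thousand records; when a key is already taken by a different text, it re-hashes the text with an attempt counter appended. This is a simple rehash-with-counter variant in the spirit of the double-hashing principle, not the textbook double-hashing probe sequence used in hash tables, and short_hash / assign_key are hypothetical names. The trade-off: a record's final key can depend on insertion order, so it can no longer be recomputed from the text alone without consulting the table.

```python
import hashlib

def short_hash(text: str, bits: int = 16) -> int:
    """Hypothetical helper: top `bits` bits of MD5(text), deliberately tiny so collisions occur."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16) >> (128 - bits)

def assign_key(text: str, table: dict) -> int:
    """Hypothetical scheme: return the key owned by `text`, claiming a free one if needed.

    On a collision with a *different* text, re-hash with an attempt counter appended.
    Assumes free slots remain; a full table would make this loop forever.
    """
    attempt = 0
    while True:
        candidate = text if attempt == 0 else f"{text}#{attempt}"
        key = short_hash(candidate)
        owner = table.get(key)
        if owner is None:   # free slot: claim it
            table[key] = text
            return key
        if owner == text:   # this text already owns the slot
            return key
        attempt += 1        # collision with another text: transform and retry

table = {}
keys = [assign_key(f"record-{i}", table) for i in range(2000)]
print(len(set(keys)) == len(keys))  # True: every record ended up with its own key
```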

As for other people having done it before: yes, it's extremely common and it has been done in countless ways. A search for "hashing algorithm" will yield a lot of information.

HTH.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Zvoni

  • Hero Member
  • Posts: 3135
Re: Just numerical MD5 or something else.
« Reply #7 on: October 16, 2025, 03:13:52 pm »
Quote
I have a dataset containing only provinces, districts, neighborhoods, attribute 1, and attribute 2. I want to save these to the database with a unique key and also verify the relevance of the unique key to the data.
I think he wants to store his data including the hash, to check whether someone has changed something in the database itself (say, with an external tool).

Anything else wouldn't make sense since, as you correctly pointed out, he can recalculate the hash at any time.
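If the stored hash is meant to catch changes made with an external tool, one caveat: anyone who can edit the row can also recompute a plain MD5 and store it alongside. A keyed hash such as HMAC-SHA256 (a swapped-in technique; the thread only discusses MD5) cannot be recomputed without the key, so the key should be kept outside the database. This Python sketch uses the standard hmac module; SECRET, row_tag and row_is_intact are hypothetical.

```python
import hashlib
import hmac

SECRET = b"keep-this-outside-the-database"  # hypothetical key, e.g. loaded from a config file

def row_tag(fields) -> str:
    """Hypothetical helper: HMAC-SHA256 over a row's fields, joined with a unit separator."""
    msg = "\x1f".join(fields).encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def row_is_intact(fields, stored_tag: str) -> bool:
    """Hypothetical helper: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(row_tag(fields), stored_tag)

row = ["Konya", "Selçuklu", "Akıncılar", "101", "2"]
tag = row_tag(row)              # stored alongside the row
print(row_is_intact(row, tag))  # True
row[3] = "102"                  # someone edits the row with an external tool
print(row_is_intact(row, tag))  # False
```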

440bx

  • Hero Member
  • Posts: 5809
Re: Just numerical MD5 or something else.
« Reply #8 on: October 16, 2025, 03:40:51 pm »
The point was that a hash is a very common thing to use, and a general hash from one domain into a smaller one can never guarantee a unique number (since the source domain is larger than the target), but given a large target domain, the probability of a collision is usually small (provided a reasonably decent hash algorithm, which MD5 definitely is).

It's up to him to use that information however it suits his needs best.

loaded

  • Hero Member
  • Posts: 876
Re: Just numerical MD5 or something else.
« Reply #9 on: October 16, 2025, 03:46:04 pm »
Thank you very much for the valuable information, 440bx Academician. 👨‍🎓

Thaddy

  • Hero Member
  • Posts: 18306
  • Here stood a man who saw the Elbe and jumped it.
Re: Just numerical MD5 or something else.
« Reply #10 on: October 16, 2025, 03:47:22 pm »
Note that MD5 has a small, but higher, chance of hash collisions compared to a GUID, which has a timestamp component.
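Whether a GUID has a timestamp depends on its version: time-based version 1 GUIDs embed a 60-bit count of 100-nanosecond intervals since 15 October 1582, while random version 4 GUIDs carry no timestamp at all. This Python sketch decodes the timestamp of a version 1 GUID and prints the version of a random one for comparison; GREGORIAN_EPOCH is just a local name for that start date.

```python
import uuid
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)  # start of the UUID v1 clock

t = uuid.uuid1()  # time-based GUID
# .time counts 100-ns ticks; divide by 10 to get microseconds
created = GREGORIAN_EPOCH + timedelta(microseconds=t.time // 10)
print(t.version, created)

r = uuid.uuid4()  # random GUID: no timestamp field to decode
print(r.version)
```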