Recent

Author Topic: Algorithm for "unfair" dice roll  (Read 1831 times)

ArminLinder

  • Sr. Member
  • ****
  • Posts: 316
  • Keep it simple.
Algorithm for "unfair" dice roll
« on: May 05, 2024, 06:19:41 pm »
Hi all,

can anyone suggest a fast algorithm which I can use to solve this scenario:

For testing SQL databases I do quite often need to create large (100.000 - 1.000.000) records of random person data. To achieve this I have wordlists of common first names, last names, town names, street names, and so on.

To generate, I currently simply generate a random number between 1 and Ubound(wordlist) for each list, and concat the selected entries from each list, and repeat this until I get the desired number of records.

Now comes the interesting part :-)

To make things a bit more interesting I want to be able to add a "Bias" to the list entries, so the more common names and the towns with more inhabitants are picked more frequently. Ideally the "Bias" number is treated in such a way that I can, for instance, look up the number of inhabitants and use that. A town with n=100000 inhabitants would habe 10 times the chance to get selected compared with another town having n=10000 inhabitants.

Popular first- and given names should also appear more often than unusual ones.

The only idea i had so far was preprocessing the lists and adding n duplicates, maybe with some optimization to scale n down into a range between say 1 to 100.

Any idea for a smarter and faster algorithm?

Thnx, Armin



« Last Edit: May 05, 2024, 06:21:47 pm by ArminLinder »
Lazarus 3.3.2 on Windows 7,10,11, Debian 10.8 "Buster", macOS Catalina, macOS BigSur, VMWare Workstation 15, Raspberry Pi

cdbc

  • Hero Member
  • *****
  • Posts: 1673
    • http://www.cdbc.dk
Re: Algorithm for "unfair" dice roll
« Reply #1 on: May 05, 2024, 06:38:26 pm »
Hi
Maybe search google for 'Priority-lists'...?!?
I dunno, just a thought...
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE5 -> FPC 3.2.2 -> Lazarus 2.2.6 up until Jan 2024 from then on it's: KDE5/QT5 -> FPC 3.3.1 -> Lazarus 3.0

explainedd

  • New Member
  • *
  • Posts: 11
Re: Algorithm for "unfair" dice roll
« Reply #2 on: May 05, 2024, 09:36:02 pm »
Consider the following normalized histogram:

event   relative probability   cumulative sum of relative probabilities
A       0.5                    1.0
B       0.3                    0.5
C       0.2                    0.2

The sum of the relative probabilities are one.

A,B,C are baskets. Each basket contains elementary events.

1. Generate a uniformly distributed floating point random value in range: 0, 1

This random value selects a point of “cumulative sum of relative probabilities” column.
You have to find the corresponding basket:

if random_number < 0.2 then C else
if random_number < 0.5 then B else
A

2. Generate a uniformly distributed integer random value in range: 0, high(selected_basket)
This random value gives the index of your item in the basket.
« Last Edit: May 05, 2024, 09:47:46 pm by explainedd »

Thaddy

  • Hero Member
  • *****
  • Posts: 16198
  • Censorship about opinions does not belong here.
Re: Algorithm for "unfair" dice roll
« Reply #3 on: May 05, 2024, 09:44:43 pm »
You can use a priority queue as a buiding block.
AVK has some good implementations in his lgenerics library.
https://github.com/avk959/LGenerics/tree/master.
Each item in the priority queue has an associated priority value, so there will be a bias towards the higher values.

Another - better - possibility is to use Gaussian random values, which implicitly has a normal distribution, which is what you want. It picks a random value based on the mean and standard deviation to achieve this. (It is a bell shape which in your case means that you can get randoms over a population where common values will appear more often than uncommon values. In the math unit there is a function for Gaussian random numbers: randg. That is probably easier than starting with a priority queue.
I would choose randg for what you need. It is actually there for just that goal.
Code: Pascal  [Select][+][-]
  1. function randg(mean,stddev: float): float;
As long as your are familiar with basic statistical functions it is really easy.
« Last Edit: May 05, 2024, 09:53:08 pm by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 859
Re: Algorithm for "unfair" dice roll
« Reply #4 on: May 05, 2024, 09:56:15 pm »
It's very easy, if you know probability theory. In case of dice each number has equal weight 1. Probability formula is following: Probability = Weight / Sum_Of_All_Weights. So, for dice it's P = 1/(1+1+1+1+1+1) = 1/6. If you want some number to be more probable - just change it's weight. For example if you want number 1 to be 2x more probable than any other number, then just do Weight1 = 2. And therefore P1 = 2/(2+1+1+1+1+1) = 2/7. All you need to do after that - to change random number generation algorithm. It should return number between 0 and 6 instead of 0 and 5. And number 1 should be picked in case of 0 and 1 instead of just 0. Of course it's simplified explanation. In common case priority list would be needed.

For example you can use this list:
Code: [Select]
N - W
-------
1 - 1
2 - 2
3 - 3
4 - 4
5 - 5
6 - 6
Number N is picked, if Random(7) <= W. Binary search algorithm can be used to search for proper N.
« Last Edit: May 05, 2024, 10:06:38 pm by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

Thaddy

  • Hero Member
  • *****
  • Posts: 16198
  • Censorship about opinions does not belong here.
Re: Algorithm for "unfair" dice roll
« Reply #5 on: May 05, 2024, 10:01:15 pm »
What you describe is basically what randg does for you.
Here's an example that generates a series of random values for a given mean and standard deviation. Next we calculate if the random values indeed approximate the distribution.
Code: Pascal  [Select][+][-]
  1. Program Example40;
  2.  
  3. { Program to demonstrate the randg function. }
  4.  
  5. Uses Math;
  6.  
  7. Var
  8.   I : Integer;
  9.   ExArray : Array[1..10000] of Float;
  10.   Mean,stddev : Float;
  11.  
  12. begin
  13.   Randomize;
  14.   for I:=low(ExArray) to high(ExArray) do
  15.     ExArray[i]:=Randg(1,0.2);
  16.   MeanAndStdDev(ExArray,Mean,StdDev);
  17.   Writeln('Mean       : ',Mean:8:4);
  18.   Writeln('StdDev     : ',StdDev:8:4);
  19. end.
This code is taken from the manual.

In practice, you also need to use EnsureRange() because randg can return values outside  of the range of the actual distribution: this is normal however.
« Last Edit: May 06, 2024, 09:42:22 am by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

Lansdowne

  • New Member
  • *
  • Posts: 35
Re: Algorithm for "unfair" dice roll
« Reply #6 on: May 06, 2024, 05:26:17 pm »
I would consider a different technique.  Just model the real world.  (I am using the example of First names and last names.)

I presume you already have a database of names.  Many genealogy sites will have lists of the frequency of names in their database.  Here is one for the US at namecensus.com:
Code: [Select]
Smith      2442977
Johnson   1932812
Williams   1625252
Brown    1437026
Jones     1425470
Garcia    1166120
...
and at position 500 we have
Harrington 66959

So if you create a stringlist which contains 'Smith' 24 times, 'Johnson' 19 times, 'Garcia' 12 times and so on for the first 150 entries maybe, then one instance of each of the next 350 you would have surname database of the kind you want.
At a very approximate approximation, you would have a 1/100k sample of the US population's names and for a better sample just change the sample size.

And of course you could use a similar database for first names.  For sample database that you're looking for, it probably doesn't matter that you will get random but unlikely characters like Krzysztof Balasubramaniam or Saoirse González Pérez.

dseligo

  • Hero Member
  • *****
  • Posts: 1418
Re: Algorithm for "unfair" dice roll
« Reply #7 on: May 06, 2024, 07:40:00 pm »
You could do it like this (similar to answer #2).

You have your list of towns with number of inhabitants in thousands, like this:
Code: Text  [Select][+][-]
  1. Town            Pop.
  2. --------------------
  3. Port of Spain     38
  4. Tunis            639
  5. Ankara          5747
  6. Ashgabat         791
  7. Cockburn Town      4

Now, you calculate "begin" "end" inhabitant number for each town:
Code: Text  [Select][+][-]
  1. Town            Pop.   Begin    End
  2. -----------------------------------
  3. Port of Spain     38       1     38
  4. Tunis            639      39    677
  5. Ankara          5747     678   6424
  6. Ashgabat         791    6425   7215
  7. Cockburn Town      4    7216   7219

Now you get random number from 1 to 7219, and search which town falls in range of your result. In my example Ankara will be most often.

 

TinyPortal © 2005-2018