Hi all,
Can anyone suggest a fast algorithm for the following scenario:
For testing SQL databases I quite often need to create large sets (100,000 to 1,000,000) of records with random person data. For this I have wordlists of common first names, last names, town names, street names, and so on.
To generate a record, I currently pick a random number between 1 and Ubound(wordlist) for each list, concatenate the selected entries, and repeat until I have the desired number of records.
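In code, that step is essentially the following (a rough VB sketch; the array and variable names are just for illustration):

    ' Current approach: uniform pick, every entry equally likely.
    ' firstNames, lastNames, towns are the 1-based wordlist arrays.
    Dim record As String
    Randomize
    record = firstNames(Int(Rnd() * UBound(firstNames)) + 1) & ";" & _
             lastNames(Int(Rnd() * UBound(lastNames)) + 1) & ";" & _
             towns(Int(Rnd() * UBound(towns)) + 1)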
Now comes the interesting part :-)
To make things a bit more interesting, I want to be able to attach a "bias" to each list entry, so that more common names and towns with more inhabitants are picked more frequently. Ideally the bias is interpreted so that I can simply look up a real-world figure (say, the number of inhabitants) and use it directly: a town with n=100,000 inhabitants would have 10 times the chance of being selected compared with a town of n=10,000.
Popular first and last names should likewise appear more often than unusual ones.
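To pin down what I mean, a naive weighted pick would behave like the sketch below (a linear scan over a parallel weights array; 1-based arrays and all names are again just illustration). It gives the right distribution but is obviously too slow to run per record on big lists:

    ' Entry i should be picked with probability weights(i) / total,
    ' where total = weights(1) + ... + weights(UBound(weights)).
    ' Naive linear scan; correct, but O(list length) per pick.
    Function PickWeightedNaive(weights() As Long, total As Long) As Long
        Dim r As Long, i As Long
        r = Int(Rnd() * total)              ' random value in 0 .. total-1
        For i = 1 To UBound(weights)
            r = r - weights(i)
            If r < 0 Then PickWeightedNaive = i: Exit Function
        Next i
        PickWeightedNaive = UBound(weights) ' guard against rounding edge cases
    End Function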
The only idea I have had so far for making this fast is to preprocess the lists and insert n duplicates of each entry, perhaps scaling n down into a range of, say, 1 to 100, so that a plain uniform pick works again.
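In code, that preprocessing would look something like this (again only a sketch; words() and weights() as parallel 1-based arrays, maxW as the largest weight, and a VB Collection for the expanded list are all my own illustration choices):

    ' Preprocessing idea: insert scaled duplicates, then pick uniformly.
    Dim expanded As New Collection
    Dim i As Long, j As Long, copies As Long
    For i = 1 To UBound(words)
        copies = Int(weights(i) / maxW * 100) ' scale weights into roughly 1..100
        If copies < 1 Then copies = 1
        For j = 1 To copies
            expanded.Add words(i)
        Next j
    Next i
    ' afterwards a plain uniform pick works again:
    ' town = expanded(Int(Rnd() * expanded.Count) + 1)

The obvious downsides are the memory blow-up and the precision lost by the 1-to-100 scaling.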
Any ideas for a smarter and faster algorithm?
Thnx, Armin