Author Topic: Best data structure to use?  (Read 12946 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #45 on: March 23, 2021, 12:08:06 am »
Hey Rich,

How extremely kind of you!  Thank you - I just might do that.

Aww shucks  :-[ , just trying to help, so go ahead and do that. I'm available and eager to help.

Right now I just want to get a v1.0 going so we can test it out with our youth group.  I think they'll have a blast with it.

Yep, I completely agree with that.

If you have the time to explore, you'll see that my code already covers your data needs well enough. And once you load the file, all data manipulation is done in memory, so it's rather quick.

At the moment the TCWSContainer does not expose direct access to the Categories/Words. But it's as simple as declaring a read-only property on the TCWSContainer.
But I think my code is simple enough for you to get your head around and then make all the changes you need.
All of the data pretty much behaves like arrays of objects, or more specifically TFPObjectList.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #46 on: March 23, 2021, 01:06:31 am »
Hey all,

It was actually quite easy and quick to make the Category names unique.

New version coming up!!
Get it while it's hot!!

Version 0.6

Cheers,
Gus

Finally loaded this up - looks pretty much like what I need - I looked at the cws file it produces.  Do you think this will handle up to thousands of words split between many categories?

Rich

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #47 on: March 23, 2021, 01:30:59 am »
Hey Rich,

Finally loaded this up - looks pretty much like what I need - I looked at the cws file it produces.  Do you think this will handle up to thousands of words split between many categories?

Ok, let's do some math/maths (dunno which side of the pond you are on):
  • Let's assume, just for giggles, that average word and category name is 10 characters.
  • Let's say you have 1,000 categories, again, just for giggles.
  • Let's say you have 1,000 words per category, yeah, you got it, giggles.

The math says that you'll need at least:

10 x 1,000 x 1,000 x 10 = 100,000,000 bytes just for the strings alone.

That's 100,000,000 / 1024 / 1024 = 95.367431641 megabytes.

Add a little sprinkle for the boolean and the date/time if you want and it's still gonna be WAYYYY under 1 gigabyte.
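
For reference, the estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic in Python; the names are made up for illustration, and the second factor of 10 is taken straight from the posted formula and treated here as headroom, since the post does not say what it stands for.

```python
# Rough size estimate for the word list, using the figures from the post.
AVG_CHARS = 10              # assumed average length of a word or category name
CATEGORIES = 1_000
WORDS_PER_CATEGORY = 1_000
EXTRA_FACTOR = 10           # second "x 10" in the posted formula (assumption: headroom)

# One byte per character, words only.
word_bytes = AVG_CHARS * CATEGORIES * WORDS_PER_CATEGORY
# The posted figure, including the extra factor.
padded_bytes = word_bytes * EXTRA_FACTOR


def to_mib(n_bytes: int) -> float:
    """Convert bytes to units of 1024 * 1024 bytes, as the post does."""
    return n_bytes / 1024 / 1024


print(f"words only : {word_bytes:,} bytes = {to_mib(word_bytes):.2f} MB")
print(f"with factor: {padded_bytes:,} bytes = {to_mib(padded_bytes):.2f} MB")
```

Either way the result stays far below 1 GB, which is the point of the estimate.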

So you tell me: With today's CPUs and today's size of RAM, do you think this app can cope?

Cheers.
Gus

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #48 on: March 23, 2021, 01:34:05 am »
Quote
So you tell me: With today's CPUs and today's size of RAM, do you think this app can cope?

Oh that's not what I meant. When you look at the file with a text editor it looks like everything is in a single line - don't know if that matters or not.

I imagine a 95MB text file might take some time to read in - not that I'll have that many words but anyway...

Yeah no problem on the computer being able to handle it.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #49 on: March 23, 2021, 01:56:00 am »
Hey Rich,

Oh that's not what I meant. When you look at the file with a text editor it looks like everything is in a single line - don't know if that matters or not.

That's because I didn't tell the TJSONObject to save it formatted. If you change the extension from *.cws to *.json and open it in Firefox, I think it has a JSON parser by default, so it will show you the info in a nice-looking, tree-like manner.
If you open it in an advanced editor like Notepad++ or even Microsoft's VS Code, I think they provide ways to turn an unformatted JSON file into a formatted one.
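
If no such editor is at hand, reformatting a single-line JSON file is a short job in most languages. Below is a minimal sketch in Python, assuming the *.cws file is plain JSON as described above; the function name and the file names in the usage note are hypothetical. On the Pascal side, fpjson offers `FormatJSON` as the indented counterpart of `AsJSON`, as far as I know.

```python
import json
from pathlib import Path


def pretty_print_json(src: Path, dst: Path, indent: int = 2) -> None:
    """Read compact JSON from src and write an indented copy to dst."""
    data = json.loads(src.read_text(encoding="utf-8"))
    # ensure_ascii=False keeps accented characters readable in the output.
    text = json.dumps(data, indent=indent, ensure_ascii=False)
    dst.write_text(text + "\n", encoding="utf-8")
```

Something like `pretty_print_json(Path("words.cws"), Path("words.json"))` would then leave the original untouched and produce a readable copy next to it.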

Or, if I may be allowed to toot my own horn, I've developed laz-JSON-Viewer just for this occasion.
You can get an installer that will associate all *.json files with the app under releases.
If you don't like to install stuff on your computer from strangers on the internet, you can always download the *.zip file with the app only.

I imagine a 95MB text file might take some time to read in - not that I'll have that many words but anyway...

I guess, if it's on a spinning-platter HDD, but even then it's not gonna be more than a minute or so. If you have an SSD, then it's negligible.
And that's once when you load the file. After loaded into memory, it's all in memory.
Then you have to save the thing at the end.
So you can count that under prep time for the game on start, and clean-up after the game is finished.

Yeah no problem on the computer being able to handle it.

Indeed!!

Cheers,
Gus

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #50 on: March 23, 2021, 02:02:16 am »
The particular machine this will run on for the youth group indeed has an SSD.  It's an i7 HP EliteBook with 16GB ram that has a somewhat striking resemblance to a macbook air  8)

The way my wife is going with putting together words though I wouldn't be surprised to have a 95MB file soon!

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: Best data structure to use?
« Reply #51 on: March 23, 2021, 02:07:54 am »
I imagine a 95MB text file might take some time to read in - not that I'll have that many words but anyway...

Well, you may not need to read it all. Though you may not get "perfect randomness".

If you have a line-based file of 95MB, with an average word size of 10:
1=> compute a random number between 0 and the file size in bytes (about 95 million).
2=> seek to the given byte pos in the file, and read 1kb.
3=> search for the next new line
4=> check if the word is marked as "used"; if it is, repeat from step 3 (also check if the end of the 1k is reached and load the next 1k, wrapping around from the end of the file to its start).

However, this only works if you have a reasonable amount of "still unused" words, otherwise you will search through more and more of the file.
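
The steps above can be sketched roughly as follows. This is an illustration in Python, not code from the thread: it assumes one word per line with no blank lines, keeps the "used" marks as a set of byte offsets (a hypothetical representation), and lets the buffered `readline` do the chunked reading instead of handling 1k blocks by hand.

```python
import os
import random


def pick_random_word(path, used, rng=random):
    """Return (byte_offset, word) for an unused word, or None if all are used.

    `used` is a set of byte offsets of words that were already played.
    """
    size = os.path.getsize(path)
    if size == 0:
        return None
    with open(path, "rb") as f:
        # Steps 1-2: jump to a random byte position in the file.
        f.seek(rng.randrange(size))
        # Step 3: discard the partial line so we sit at the start of a word.
        f.readline()
        first = f.tell() % size          # wrap to offset 0 at end of file
        pos = first
        while True:
            f.seek(pos)
            line = f.readline()
            # Step 4: accept the word unless it is marked as used.
            if pos not in used:
                return pos, line.rstrip(b"\r\n").decode("utf-8")
            pos = f.tell() % size        # next word, wrapping around
            if pos == first:
                return None              # every word has been used
```

As noted above, the choice is not perfectly uniform: a word that follows a long word is hit a little more often, and the scan gets longer as fewer unused words remain.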

Yet, if you keep the used words in a separate file (as "byte pos, word len") and you have loaded that file completely, then you can check for the next unused word pos, without reading the word file.
Of course, that again only works if you can keep the "already used" file reasonably small. So again, keep only the last 100000 used words (then recycle them). Store the used info in binary, as
"record filepos: integer; wordlen: word end;", then that is 600 KB, which you can still load in a timely manner.
If you keep them in order, so that the first entry in the file is the one used furthest back, and the last entry is the most recent used one, then you need no timestamp. But you must order the entries when you read them (or put them into a tree). So in memory you need the list twice (with some tricks you can get away with once).
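
A rough sketch of such a "used" file, in Python for illustration. It assumes a packed 6-byte record (4-byte position plus 2-byte length, which is what the 600 KB figure implies) and little-endian byte order; the function names are hypothetical.

```python
import struct
from collections import deque

# 4-byte signed file position + 2-byte unsigned word length, no padding.
RECORD = struct.Struct("<iH")
MAX_USED = 100_000  # 100000 records * 6 bytes = 600000 bytes


def load_used(path):
    """Load the used-word records, oldest first; a missing file means none."""
    used = deque(maxlen=MAX_USED)
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return used
    for filepos, wordlen in RECORD.iter_unpack(data):
        used.append((filepos, wordlen))
    return used


def save_used(path, used):
    """Write the records back in order, so the oldest entry stays first."""
    with open(path, "wb") as f:
        for filepos, wordlen in used:
            f.write(RECORD.pack(filepos, wordlen))
```

Appending to the deque once it is full silently drops the oldest entry, which is the recycling step. For fast lookups one would also build something like `{pos for pos, _ in used}`, which is the second in-memory copy mentioned above.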


If you do not want to store the used words as "byte pos", you can have an index into the words file that stores the byte pos of every word, and then store the position (1st, 2nd, ...) of each used word.

If words do not differ too much in length, you can pad all words to be stored as exactly 20 bytes. Then you can store a number and calculate the pos.
You can group words into many files, one file for each word-len. And store for each file the start number of its first word, in such a way that you do not have overlaps.
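
The fixed-width idea can be sketched like this (Python, for illustration only). The 20-byte record size comes from the post; padding with zero bytes and the function names are assumptions.

```python
RECORD_SIZE = 20  # bytes per stored word, as suggested above


def write_padded(path, words):
    """Store every word as exactly RECORD_SIZE bytes, padded with zero bytes."""
    with open(path, "wb") as f:
        for word in words:
            raw = word.encode("utf-8")
            if len(raw) > RECORD_SIZE:
                raise ValueError(f"word too long for a fixed record: {word!r}")
            f.write(raw.ljust(RECORD_SIZE, b"\0"))


def read_word(path, index):
    """Fetch the word with the given 0-based number without scanning the file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)      # position is calculated, not searched
        raw = f.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:
        raise IndexError(index)
    return raw.rstrip(b"\0").decode("utf-8")
```

With this layout the "used" list only needs to store word numbers, and a random pick is a random integer below `file size / RECORD_SIZE`.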


The easy option is a database, that will do all that kind of work for you...

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: Best data structure to use?
« Reply #52 on: March 23, 2021, 02:13:28 am »
Another option, if you want to keep the "used date" in the main words file.

The time-consuming part is updating the file, as it would rewrite large portions of the file.

So, partition the file.
Create many files, with 1000 words each.

At the start of each file write a header line.
  number of unused words in this file, date of oldest used word (for re-using), maybe date of most recent used word.

Then each time you need a word randomly choose a file, and then a word in that file.

For convenience,  when you start your app, pre-read the headers of each file. (That is only a few bytes from each file, and should be fast).
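
A small sketch of the header handling, in Python for illustration. The header layout (`unused,oldest_used,newest_used` as the first line) is an assumption, and so is weighting the file choice by the number of unused words; the post only says to choose a file at random.

```python
import random
from pathlib import Path


def read_header(path):
    """Read only the first line of a partition: 'unused,oldest_used,newest_used'."""
    with open(path, "r", encoding="utf-8") as f:
        unused, oldest, newest = f.readline().rstrip("\n").split(",")
    return {"path": Path(path), "unused": int(unused),
            "oldest": oldest, "newest": newest}


def choose_partition(headers, rng=random):
    """Pick a partition that still has unused words, or None if none is left.

    Weighting by the unused count gives every unused word a similar chance,
    whichever file it lives in (an assumption, not part of the post).
    """
    candidates = [h for h in headers if h["unused"] > 0]
    if not candidates:
        return None
    weights = [h["unused"] for h in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

At startup the app would call `read_header` once per file and keep the results in memory; after a word is played, only the one small file it came from has to be rewritten.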

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #53 on: March 23, 2021, 02:21:07 am »
Hi Rich,

The particular machine this will run on for the youth group indeed has an SSD.  It's an i7 HP EliteBook with 16GB ram that has a somewhat striking resemblance to a macbook air  8)

Hummm, looks like a really nice machine. I don't see it struggling with this amount of data.

The way my wife is going with putting together words though I wouldn't be surprised to have a 95MB file soon!

ROTFL!!! She seems quite invested I'm guessing.

Well, you let your wife play around with my sample program and then report the overall impression of speed.
It has the ever useful Ctrl+S shortcut to save the file. (Hummm, it opens the dialog every time, may need to fix that if a file is already loaded... You tell me.)
Even if we add or remove fields from the words object, she will not lose her progress since my implementation kinda glosses over missing or extra fields.
As long as we don't mess with Category.Name and Word.Word, she'll be fine.

This just popped into my head: you can find out whether it's free to create an Organization on GitHub. If so, I think it's free for up to two programmers.
You then create one in the name of your church, or some such.
You add me to it and I'll then start up the GitHub repo with my side of the code.
That way we can both be administrators (I think; still not 100% sure) of the repo, rather than one being the admin and the other just a collaborator.
What do you think?

Cheers,
Gus

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #54 on: March 23, 2021, 03:12:19 am »
I'll check into the non-profit, but another option I'll just throw out there is that I have a private Bitbucket server that could be used as well.  I have set up repositories on that before - we use it for Confluence but we also have a Bitbucket license that we purchased with it.

Rich

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #55 on: March 23, 2021, 03:39:56 am »
Hey Rich,

I'll check into the non-profit, but another option I'll just throw out there is that I have a private Bitbucket server that could be used as well.  I have set up repositories on that before - we use it for Confluence but we also have a Bitbucket license that we purchased with it.

I have to admit that I'm ignorant of the Bitbucket interface. I don't know what capabilities it has.

It also would be private, hence only visible to us both and not to this community at large, but in the end it is your code and U da boss :)

At the moment I think the peeps behind GitHub have a very nice UI and, I can be biased here, it's a very good UX (while @wp may disagree :) ) and I would prefer it. And it does have private repos for free!!
But again, ultimately it's your code, so U da boss'man :)

Cheers,
Gus

dseligo

  • Hero Member
  • *****
  • Posts: 1196
Re: Best data structure to use?
« Reply #56 on: March 23, 2021, 04:03:11 am »
I don't understand why you don't want to use a database.
With SQLite you need one file alongside your executable.
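
To make the suggestion concrete, here is a minimal sketch of how categories and words could look in SQLite. It is written in Python with the standard `sqlite3` module purely for illustration; the table and column names are hypothetical and only mirror the "played" flag and "play date" discussed in this thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS words (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    word        TEXT NOT NULL,
    played      INTEGER NOT NULL DEFAULT 0,
    play_date   INTEGER            -- Unix time, NULL until played
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database file and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_word(conn, category, word):
    """Insert a word, creating its category on first use."""
    conn.execute("INSERT OR IGNORE INTO categories(name) VALUES (?)", (category,))
    (cat_id,) = conn.execute(
        "SELECT id FROM categories WHERE name = ?", (category,)).fetchone()
    conn.execute("INSERT INTO words(category_id, word) VALUES (?, ?)", (cat_id, word))


def draw_word(conn, category, now):
    """Return a random unplayed word from the category and mark it as played."""
    row = conn.execute(
        "SELECT w.id, w.word FROM words w "
        "JOIN categories c ON c.id = w.category_id "
        "WHERE c.name = ? AND w.played = 0 "
        "ORDER BY RANDOM() LIMIT 1", (category,)).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE words SET played = 1, play_date = ? WHERE id = ?",
                 (now, row[0]))
    return row[1]
```

The database takes over both the random pick and the bookkeeping of used words, so nothing has to be loaded or rewritten as a whole.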

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1114
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #57 on: March 23, 2021, 04:39:05 am »
Hi dseligo,

I don't understand why you don't want to use a database.
With SQLite you need one file alongside your executable.

This might come as a surprise, since I've been so vocal about the JSON thing I made, but it just hit me after I had a look, again, at the OP's first post: why did SQLite get the boot??

@Rich: To be quite honest, SQLite would conform to all 5 of your initial needs, so why did SQLite get the boot?
I hope it wasn't me being a pestering loud mouth, that would shame me to death!!!

And @dseligo, thanks for the wake up call!!

Cheers,
Gus

PS: @dseligo: Your screen name always gets me because it's really close to the Portuguese word for "turn off": desligo. Whenever I type it I have to triple check I'm not "turning you off" ;)

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Best data structure to use?
« Reply #58 on: March 23, 2021, 09:11:26 am »
I don't understand why you don't want to use a database.
With SQLite you need one file alongside your executable.
I guess it's just that @gcarreno hasn't gotten to the DB components yet.

Of course, the OP's problem can be solved in many ways.
I'll try to explain why I liked @gcarreno's idea of using JSON.
The OP's data structure is essentially a tree, and trees are not very conveniently mapped to relational tables. But TJSONObject is exactly a tree with convenient methods for loading, navigating, editing and saving the encapsulated structure. What can also be important, JSON is a human-readable format and, if necessary, can be edited in any text editor. The speed of loading/saving data can, of course, cause concern. For the sake of curiosity, I generated a fairly extreme JSON consisting of 1000 categories with 1000 words for each category like:
Code: Text
{
  "categories":
  {
    "category1":
    {
      "word1": {"played": false, "play date": <UnixTime>},
      "word2": {"played": false, "play date": <UnixTime>},
      ...
    },
    "category2":
    {
      .....
    }
    ...
  }
}

The resulting file is ~ 43MB in size, TJSONObject parses it in about 2 seconds on my old machine. If necessary, you can find a faster parser.
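
For anyone who wants to repeat the experiment, a structure of this shape is easy to generate and time. The sketch below uses Python's standard `json` module rather than TJSONObject, so its timings say nothing about fpjson; the function names are made up for illustration, and the play date is filled with a fixed placeholder value.

```python
import json
import time


def build_sample(n_categories, words_per_category, unix_time=0):
    """Build the nested categories -> words structure described above."""
    return {
        "categories": {
            f"category{c}": {
                f"word{w}": {"played": False, "play date": unix_time}
                for w in range(1, words_per_category + 1)
            }
            for c in range(1, n_categories + 1)
        }
    }


def time_parse(text):
    """Parse the JSON text and return (data, elapsed seconds)."""
    start = time.perf_counter()
    data = json.loads(text)
    return data, time.perf_counter() - start


if __name__ == "__main__":
    # Small scale for a quick check; 1000 x 1000 matches the test in the post
    # but needs noticeably more memory and time.
    text = json.dumps(build_sample(100, 100))
    _, seconds = time_parse(text)
    print(f"{len(text):,} characters parsed in {seconds:.3f} s")
```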

So I think the idea is quite good.
« Last Edit: March 23, 2021, 10:18:14 am by avk »

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #59 on: March 23, 2021, 03:22:37 pm »
I don't understand why you don't want to use a database.
With SQLite you need one file alongside your executable.

I was wanting to keep it in a single "portable" executable so nothing needs to be installed on the PC.

 
