Recent

Author Topic: Best data structure to use?  (Read 12942 times)

rwebb616

  • Full Member
  • ***
  • Posts: 133
Best data structure to use?
« on: March 20, 2021, 11:42:38 pm »
Hello,

I'm writing a HeadsUp type game that has a series of categories and for each category there is a list of words.  I'm trying to figure out how best to handle the storing/retrieval of the data.  Here are the requirements:

1 - don't want to use a database because it requires a client library
2 - I want to be able to file/open a file and close it so it makes it easy to make "add-on" packs
3 - There are 3 fields of information - category (string), word (string), and played (boolean)
4 - I want to keep track of which words were already played and pick back up when the game is next loaded
5 - possibly randomizing the unplayed words

Right now I'm storing the data in an ini file and it works okay for filling up the list boxes (see attached) but it doesn't track whether the word is played.  I'm thinking either a class for the card with category, word, played as fields then using a Tcollection to track them but Tcollections are messy having to typecast to Tcard for example.  Thought about creating a record type Tcard with those three fields and making an array of Tcards but then dealing with the sizing of the array is complicated.  Also thought of using a binary file instead of an ini file that I could store the records into but then I don't know how to deal with a collection of them - back to the array discussion again.

Anyone have some suggestions how best to set this up?  I've got most of the rest of the application framework done just have to deal with the data at this point.

Rich

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Best data structure to use?
« Reply #1 on: March 20, 2021, 11:59:16 pm »
You could use a TCSVDataSet, no external libary required.
Alternatively, if you can forbid some character (like '*') to be in a "word", then you can use that character append (or prefixed) to that word in the inifile, to indicate wether it is used.

Bart

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Best data structure to use?
« Reply #2 on: March 21, 2021, 12:01:43 am »
How about using TList to manage you TCard records ?  Far easier than an array.  I have used that model in my project (see sig) in the note_lister unit.

(I believe TFPList is faster but I have not tried it)

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #3 on: March 21, 2021, 12:02:19 am »
You could use a TCSVDataSet, no external libary required.

I will check into that - sounds interesting.

Quote
Alternatively, if you can forbid some character (like '*') to be in a "word", then you can use that character append (or prefixed) to that word in the inifile, to indicate wether it is used.

Tinifile does have the concept of name/value pairs so I could do it like this:

[category]
word=played

so..
[food]
apple=0 or 1

So when I read in the ini file I have 0 for not played and 1 for played.

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #4 on: March 21, 2021, 12:03:35 am »
How about using TList to manage you TCard records ?  Far easier than an array.  I have used that model in my project (see sig) in the note_lister unit.

(I believe TFPList is faster but I have not tried it)

Can you store all three fields in a Tlist? I was thinking it was only strings... maybe it can hold a custom data type (record maybe?)

I'll have to check into that.

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Best data structure to use?
« Reply #5 on: March 21, 2021, 12:28:59 am »
TList manages a mob of pointers.  So, you declare a record (of anything) and then pass a pointer to it to the TList.  Need to remember to free things afterwards but other than that, dead easy.

https://github.com/tomboy-notes/tomboy-ng/blob/master/source/note_lister.pas

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1111
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #6 on: March 21, 2021, 10:36:07 am »
Hi All,

I've put together an example with the data being stored in JSON format.
I called it CWS(Categories and Word Soup).

We have:
  • TCWSContainer
     
    • TCWSCategories
         
      • TCWSCategory
       
    • TCWSWords
         
      • TCWSWord
       

All the words are unique, no repetition. I'm not sure if this is intended, if not let me know and I'll do a normal tree like structure.

The Category only stores an index of the unique word.

The only thing that makes me unsure of this implementation is the played field.


Cheers,
Gus
« Last Edit: March 21, 2021, 10:38:11 am by gcarreno »
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #7 on: March 21, 2021, 01:12:07 pm »
The only thing that makes me unsure of this implementation is the played field.

Hi Gus,

So in the game Heads UP words are put up on the screen and there is a person standing with his/her back to the screen so they don't see the word.  Then team members (team 1 or 2) try to describe the word to the guesser to get them to say the word.  (very much like catch phrase but in reverse).  The intention for the played field is that when someone guesses a word correctly the played field will be toggled to a "true" condition (1 or whatever) to indicate that word has already been played (and written to disk).  Then next time that category is used it doesn't repeat words that have already been played.  Then once all words are "true" it will automatically go through and toggle all words back to false.

Quote
All the words are unique, no repetition. I'm not sure if this is intended, if not let me know and I'll do a normal tree like structure.
Words only must be unique within it's particular category so you don't want "Orange" to show twice in the "Food" category but if there were a "Colors" category it would be okay for Orange to be listed there.

I have never worked with JSON so I will have to look at your example and get an idea how it works.

Thank You for your suggestion.
Rich

rwebb616

  • Full Member
  • ***
  • Posts: 133
Re: Best data structure to use?
« Reply #8 on: March 21, 2021, 01:17:54 pm »
TList manages a mob of pointers.  So, you declare a record (of anything) and then pass a pointer to it to the TList.  Need to remember to free things afterwards but other than that, dead easy.

https://github.com/tomboy-notes/tomboy-ng/blob/master/source/note_lister.pas

Davo

I will need to read up on pointers - I understand how a pointer works as in passing a variable by reference but the ^ and @ syntax is new to me.  I must say I got a little overwhelmed with the 1500 lines of code in your note lister.  I know I probably need to focus on a specific section but just by nature if I have something like that in front of me I just read and try to figure out how it works.  Then there is the issue of once the Tlist has all the records in it what is the best way to store / retrieve that data from disk?  I'm sure many options - JSON, XML, CSV (for TCSVDataset), INI, binary .dat file... endless possibilities :)

Is that entire notes application written in Pascal?  Seems that I saw a version that had .cs files (c#?)

Rich

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1111
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #9 on: March 21, 2021, 01:42:43 pm »
Hi Rich,

So in the game Heads UP words are put up on the screen and there is a person standing with his/her back to the screen so they don't see the word.  Then team members (team 1 or 2) try to describe the word to the guesser to get them to say the word.  (very much like catch phrase but in reverse).  The intention for the played field is that when someone guesses a word correctly the played field will be toggled to a "true" condition (1 or whatever) to indicate that word has already been played (and written to disk).  Then next time that category is used it doesn't repeat words that have already been played.  Then once all words are "true" it will automatically go through and toggle all words back to false.

Okydokes, I think I know the game now :D

Just to be clear here: Different categories can have the same word, but the played field has to be relative to the category.
OR
I can have the word Mouse under Category Animals and Mouse under Category House Pests, but if I played Animals/Mouse, then House Pest/Mouse is not played, right?
If so, this needs a tree like approach, and not the 2 lists like I have now. That actually means less of the fiddling with the indexes.
I'll cook up the difference in my next version.

Words only must be unique within it's particular category so you don't want "Orange" to show twice in the "Food" category but if there were a "Colors" category it would be okay for Orange to be listed there.

Yeah, I noticed, quite late, that I allowed repeated words inside the category. Need to fix that.

I have never worked with JSON so I will have to look at your example and get an idea how it works.

The JSON part is just baked in for storage purposes. It's really just one object(Container) that encapsulates 2 TFPObjectList of TObject. One for the Categories and the other for Words.
But still, there is a bit of magic involved in getting it done.

Thank You for your suggestion.

No probs mate, quite welcome.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Best data structure to use?
« Reply #10 on: March 21, 2021, 02:08:00 pm »
using Strings and String based storage is going to be slow and if you are looking for closer as possible real time stuff then I would suggest you use the string base as storage only for the initial start of the game and read in all values used into record fields first before the game starts.
Although "real time" generally means responses within microsconds... for something like this that might possibly be overkill.

Quote
Your game may require a Linked List to track tree type nodes etc. I am sure you understand how that works, if not that can be explained to you also...

That's cruel, OP has already explained that he's a bit shaky on pointers etc.

Rich: a pointer is the address of something, but from the POV of programming it also carries type information around with it. @ takes the address of something (most commonly a record), ^ dereferences a pointer and returns whatever was pointed to.

However, objects are a special case since they are implicitly represented by pointers which do not need to be explicitly dereferenced.

In practical terms, if somebody's not messing around at the system level I'd not expect them to need pointers in Pascal these days.

A useful compromise is the TStringList type. This is an object, i.e. you'd do something like

Code: Pascal  [Select][+][-]
  1. myStringList := TStringList.Create;
  2. try
  3.   ...
  4. finally
  5.   FreeAndNil(myStringList)
  6. end;
  7.  

It looks a bit like an array of strings, but it can be sorted and searched fairly efficiently (see the documentation starting at https://www.freepascal.org/docs-html/current/rtl/classes/index.html ) and each string can also have an object associated with it.

In your case it might be feasible to have a stringlist which contains the categories, each category having a stringlist containing the word. You could mark a word as used either by deleting it from the relevant stringlist, or by having a very simple object associated with each word... you'd almost certainly find other useful things you could store on a per-word basis.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Best data structure to use?
« Reply #11 on: March 21, 2021, 02:30:15 pm »
About the disk storage.
You may want to consider using a separate file for storing the info which words were already played.

That way you do not need to modify the original data file (or update packs).
If you do a new release, and the data file comes with more entries (added to the same main data file), you can replace the old data file, and still have the "played" info in the separate file.

It also would be in line with the recommendations of most OS. At least OS like Win,Linux,Mac that are multi-user.
If you distribute the app, it is possible that it is played by more than one person (OS User account)  on one PC. So the "played" file could be saved in the users home directory.

Also consider that if the app is installed normally (c\program files\  or /usr/local/bin|share) then the folder with the "words" file may not be write-able to the user. So it can not be updated. (Of course, you can install somewhere else...)


Next question is, if you want the files to be human readable. But probably that is a good idea.

Then keep it simple. Any line based file format will be fine.

For the static ("words" / "categories") files, ini files are ok, since there is existing code to read/write them.

For the "already played" I would suggest to just dump the words, one per line into a text file (TStringList can do that).
Note that if you keep the order in which they were played, then you can limit how many words you save.
You could limit to store the last 1000 played words. After that they can be played again. (If you want).


When you load the data, you may have to re-arrange it.

Note that if you sort "played" you may need to make a copy, so you can also keep the original order for limiting the total amount remembered.

Once the "words" are loaded and the "played" are loaded, you need to combine the info, so you can mark/remove already played items.
With big amount of words  (and lots already played that becomes crucial).

Say you have 1000 words, 500 played.
If you did
Code: Pascal  [Select][+][-]
  1. for i := 0 to words.count -1 do
  2.   for j := 0 to played.count - 1 do
  3.     if words[i].text = played[i].text
That makes 500 thousand runs of the loops.
Make it 2000 words, and 1000 played => 2 million runs.

If you did
Code: Pascal  [Select][+][-]
  1. played.sort;
  2. for i := 0 to words.count -1 do
  3.     if played.BinarySearch(words[i].text)
BinarySearch is BigO(log n). So 500 words means a loop of 9 iterations. For 1000 words = 9000 iterations total.
For 1000 of 2000 = 20000 iterations total.

And if you sort both list
Code: Pascal  [Select][+][-]
  1. played.sort; words.sort;
  2. p := 0;
  3. for i := 0 to words.count - 1 do begin
  4.   while (p<=played.count-1) and (played[p].text < words[i].text) do inc(p);
  5.   if words[i].text = played[p].text then MarkPlayed;
  6. end;
  7.  
You only need words.count (or played.count / whichever is bigger) iterations


As for the Data format in memory. Create a record (or class) and specialize a list for it.





Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1111
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #12 on: March 21, 2021, 03:10:45 pm »
Hey Rich,

Did a bit of a rewrite and now:
  • TCWSWords are under TCWSCategory.
  • Under each Category you cannot insert duplicate words.
  • Removed the Index from words

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Best data structure to use?
« Reply #13 on: March 21, 2021, 03:32:50 pm »
I like the idea of using TJSONObject.
It generally covers almost all needs and is fast enough. Something like:
Code: Text  [Select][+][-]
  1. {
  2.   "category1":
  3.     {
  4.       "new": ["word1", "word2"],
  5.       "played": ["word3", "word4"]
  6.     },
  7.   "category2":
  8.     {
  9.       ...
  10.     }
  11.   ...
  12. }
  13.  

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1111
  • Professional amateur ;-P
Re: Best data structure to use?
« Reply #14 on: March 21, 2021, 03:45:34 pm »
Hey avk,

At the moment, v0.5, my example produces a save file like this:

Code: Javascript  [Select][+][-]
  1. {
  2.     "categories": [
  3.         {
  4.             "name": "Cat1",
  5.             "words": [
  6.                 {
  7.                     "word": "Word1",
  8.                     "played": false
  9.                 },
  10.                 {
  11.                     "word": "Word3",
  12.                     "played": false
  13.                 }
  14.             ]
  15.         },
  16.         {
  17.             "name": "Cat2",
  18.             "words": [
  19.                 {
  20.                     "word": "Word1",
  21.                     "played": false
  22.                 },
  23.                 {
  24.                     "word": "Word4",
  25.                     "played": false
  26.                 }
  27.             ]
  28.         }
  29.     ]
  30. }

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

 

TinyPortal © 2005-2018