Recent

Author Topic: Is this Stupid? What do you think?  (Read 2792 times)

440bx

  • Hero Member
  • *****
  • Posts: 5820
Re: Is this Stupid? What do you think?
« Reply #15 on: September 13, 2025, 04:33:08 am »
I just wanted to say that, generally speaking, using a DB such as PostgreSQL is good for the flexibility it offers. Particularly when the number of ways the data can be queried is "unlimited".

OTH, if the dataset is relatively small, say less than 512MB _and_ the number of ways the data needs to be queried is limited and known in advance then having the data in flat files, loading them when needed and _indexing_ them dynamically in memory offers an in-memory database that is usually blazingly fast and is a good option for a single user application that uses data that will be queried a known number of ways.


FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

dbannon

  • Hero Member
  • *****
  • Posts: 3568
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Is this Stupid? What do you think?
« Reply #16 on: September 13, 2025, 05:26:10 am »
Wow, that is interesting !  I have often wondered how (good) authors keep each character's personality distinct, and sometimes, even, evolving.

Sadly, I do not purchase from Amazon. But thanks Egan for an insight into the process.

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

CharlyTango

  • Full Member
  • ***
  • Posts: 169
Re: Is this Stupid? What do you think?
« Reply #17 on: September 13, 2025, 08:04:27 pm »
@EganSolo
Give ZMSQL a try
https://wiki.lazarus.freepascal.org/ZMSQL

You can get it by OPM and it seems to be exactly what you're looking for.
A "database" based on csv files in memory with the possibility to query by SQL statements
Lazarus stable, Win32/64

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12536
  • FPC developer.
Re: Is this Stupid? What do you think?
« Reply #18 on: September 13, 2025, 10:24:00 pm »
These qualities are worthy of consideration. Nevertheless, there's another side to this coin:

1. Uniform debugging: Free Pascal doesn't treat SQL statements as first-class. It can't compile or debug them. By foregoing the DB, I'm able to leverage the IDE to its fullest
2. Transparency: My data is in CSV files. I can open them in a text file if I so wish, or leverage the full power of Excel
3. Reduction of bugs I don't understand: When using (or misusing) a component, I run into bugs I don't understand because I'm not the maintainer of the code or because I'm missing the author's original intent. By keeping the back-end of the app lightweight, I reduce these issues. You don't think that's important? Check the SQLite threads in the forum  :)

4. Any application that operates on a very large % of the dataset is not for a standard database (dedicated in memory databases however might work).

I have several applications that do several forms of analysis on multi gigabyte datatsets (up to 10GB with indexing) that usually touch every byte each query. I don't use binary formats rather than CSVs, but principle is the same.   Some of the query also need context, e.g. if a query yields a result "x", I also want the rows before and after that match the same entity.

Maybe this is old fashioned nowadays, but the base framework was from a case with about 300MB DBF datasets on a P4, in a time that even 1GB was a quite extreme luxury.  I hooked into the application server to create a logfile of its actions that fed an oracle database, allowing me to creating a shadow dataset, based on last weeks dump + mutations.  The dump was from Oracle to DBF for an older system. This app is probably the reason why TDBF is part of FPC now :-)

The consultants for the application server (with Oracle database) spelled doom and gloom (scalability, correctness etc etc), but the fact was that my analysis took 1m for data loading and 4s for the query. Their (Java J2EE) application ran for 3-4 hours, and frequently crashed and had to be restarted. Some days they didn't even manage a result (they ran once per quarter for the official billing, I ran mine weekly for statistics for the management and running projects).

Scalability was truly not a problem, the dataset was naturally partitioned (by muncipality), and could have easily been made more scalable by pulling some more P4s out of the dumpster and investing Eur 200 in extra memory for them and running the program on multiple machines with subsets of the dataset, and a simple combining program.

Moral of the story: yes, RDBMS is the sane default. But never close your eyes for the exceptions. Very large result sets or a very variable analysis (hard to pre-index, needing lots of temp tables in the SQL way) might be predictors for such exceptions.
« Last Edit: September 15, 2025, 12:00:20 pm by marcov »

EganSolo

  • Sr. Member
  • ****
  • Posts: 380
Re: Is this Stupid? What do you think?
« Reply #19 on: September 13, 2025, 11:51:39 pm »
@EganSolo
Give ZMSQL a try
https://wiki.lazarus.freepascal.org/ZMSQL

You can get it by OPM and it seems to be exactly what you're looking for.
A "database" based on csv files in memory with the possibility to query by SQL statements

Thanks, CharlyTango!
I'll seriously consider it post MVP. Right now, I'm on a tight schedule to get Book 8 of the Epic out by end of year. Nevertheless, ZMSQL looks promising.

Zvoni

  • Hero Member
  • *****
  • Posts: 3140
Re: Is this Stupid? What do you think?
« Reply #20 on: September 15, 2025, 09:18:00 am »
What about a "hybrid"-approach?
Keep your data in your csv flat files, but use the "InMemory-Database"-feature of sqlite?

In a nutshell: On Program startup (or whenever you read your flat files), instead of throwing the Data into that Array of StringLists, create an InMemory-sqlite-DB, throw your Data into those "virtual" tables (ForeignKeys supported!!), use SQL to your hearts content, on Program shutdown, serialize back to flat files.

You can even use UserDefinedFunctions like Regex (search this forum for my sample-code)
« Last Edit: September 15, 2025, 09:21:25 am by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

EganSolo

  • Sr. Member
  • ****
  • Posts: 380
Re: Is this Stupid? What do you think?
« Reply #21 on: September 15, 2025, 10:53:12 am »
Hi Zvoni,

  The approach you proposed is sound and has many advantages, which other contributors have highlighted.

  I guess the one quality I wanted to preserve was code simplicity: keeping the entire code in Object Pascal means that, from a debugging standpoint, each line is a first-class citizen, capable of being inspected and debugged. Mixed code with SQL and Object Pascal means that to debug SQL, I have to rely on external means, stepping away from the IDE. Having extensive experience with this mixed model, I wanted to know what would happen if I kept the entirety of the code in Object Pascal.

  My experience suggests that for a small data footprint, that approach works well from a time-to-market and performance standpoint. As things stand, I understand a large portion of the code. Furthermore, by overusing TStringGrid instead of other facilities to manage CSV files, I've kept my dependencies on components to a minimum, which meant that I had a greater chance to avoid mystical bugs, that is bugs in components I didn't write and I don't understand.

  Are the qualities I chose to optimize for the best in all cases? Certainly not! I understand and respect the choices made by other coders implementing different use cases or who wish to optimize other qualities. Knuth called his classic tomes "The Art of Computer Science" to highlight the choices we make.

  Am I fully sold on my current approach? No. My code is highly encapsulated, and the effort to switch the TstringGrid-based backend with a SQLite one would take no more than ten days (with testing) to complete.

   When I started, I set out to find out if this approach is foolish, and, at this point, I would say the jury is still out, with a leaning toward "No. It depends..."

Zvoni

  • Hero Member
  • *****
  • Posts: 3140
Re: Is this Stupid? What do you think?
« Reply #22 on: September 15, 2025, 11:47:17 am »
Hi Zvoni,

  The approach you proposed is sound and has many advantages, which other contributors have highlighted.

  I guess the one quality I wanted to preserve was code simplicity: keeping the entire code in Object Pascal means that, from a debugging standpoint, each line is a first-class citizen, capable of being inspected and debugged. Mixed code with SQL and Object Pascal means that to debug SQL, I have to rely on external means, stepping away from the IDE. Having extensive experience with this mixed model, I wanted to know what would happen if I kept the entirety of the code in Object Pascal.

  My experience suggests that for a small data footprint, that approach works well from a time-to-market and performance standpoint. As things stand, I understand a large portion of the code. Furthermore, by overusing TStringGrid instead of other facilities to manage CSV files, I've kept my dependencies on components to a minimum, which meant that I had a greater chance to avoid mystical bugs, that is bugs in components I didn't write and I don't understand.

  Are the qualities I chose to optimize for the best in all cases? Certainly not! I understand and respect the choices made by other coders implementing different use cases or who wish to optimize other qualities. Knuth called his classic tomes "The Art of Computer Science" to highlight the choices we make.

  Am I fully sold on my current approach? No. My code is highly encapsulated, and the effort to switch the TstringGrid-based backend with a SQLite one would take no more than ten days (with testing) to complete.

   When I started, I set out to find out if this approach is foolish, and, at this point, I would say the jury is still out, with a leaning toward "No. It depends..."
I did read through the Thread, and i got your reasons for doing it your way.
Don't get me wrong: In the end you have to be happy with your solution.

But: I wouldn't be surprised, if the majority of your code is (basically) rewriting a "database-logic": Adding, Updating, Deleting, Finding "records" etc.
By using a real database-engine, you wouldn't need that, nevermind gaining automagically DRI (Data Referential Integrity) and other powerful features

As for "debugging code which isn't Pascal":
You would only have to "debug" your SQL-Statements, which is pretty easy for SQLite: for design use an external tool like DB-Browser for SQLite, design everything, test your SQL's,
copy all Statements from there, put it in your Lazarus-Project as String-Constants.
Use them in a logical order (e.g. creating Parent-Table before child-table etc.)
Done (more or less).

But as you said: Your choice, and we have to respect that (which i do)
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

BeniBela

  • Hero Member
  • *****
  • Posts: 947
    • homepage
Re: Is this Stupid? What do you think?
« Reply #23 on: September 15, 2025, 02:18:38 pm »
I used CSV for all my data but then I thought I would modernize it and switched to XML.

(and I had to change my files anyway because I wanted to add new columns and my CSV files didn't have a header. And I was using Delphi 4 which didn't have a UTF-8 string)


but that made everything worse, I should have stayed with CSV

but one can discuss the separators. Commas are the worst, there is really no reason to use commas. It is probably best to actually learn ASCII. There were already separator characters just for this purpose: GS, RS and US (U+001C~U+001F)

 

TinyPortal © 2005-2018