Author Topic: Easiest way to persist data?  (Read 7787 times)

francisco1844

  • New Member
  • *
  • Posts: 15
Easiest way to persist data?
« on: February 27, 2017, 12:16:13 am »
I am writing an open source program as I learn Object Pascal.

I need to find, for now, the simplest way to persist data. I tried looking at TZMSQL, but could not even figure out how to install or use it.

Right now I need the simplest approach to store data that will only be used by my application, database or not (which is why I did not post in the database area).

Any suggestions?
My initial needs, which will grow later, are:
* Ability to store multiple lines in a file.
* Ability to load data from file to memory.
* Ability to save data from memory to file. Ok to re-write entire file and not update.

Use case:
My utility, runcontrol, will be initially just to help some shell scripts not to execute too often. So I want to save something like
20170226-17|1

The format is
YYYYMMDD-HH|#

Where "#" is just a number.

For my "phase 1" I just want to limit how many times something runs, so I will be saving the date+hour and how many times it has run so far.

Eventually I will move to a database format, but I am finding that I spend more time trying to figure out which library to use and how to use it than it would likely take me to just do something simple myself. However, I would prefer to at least get familiar with some existing library, not only to avoid re-inventing the wheel but also to keep learning what is available.

Any pointers greatly appreciated.

sky_khan

  • Guest
Re: Easiest way to persist data?
« Reply #1 on: February 27, 2017, 12:59:18 am »
If your needs are really that simple, you can just use TStringList.

Create a stringlist; if a previously saved file exists, load it with TStringList.LoadFromFile, process it however you like (add or delete some lines), and save it back with the SaveToFile method. Problem solved?
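
A minimal sketch of that load / modify / save cycle might look like the following. The file name runcontrol.dat and the appended record are assumptions made only for this example; they do not come from the original poster's program.

```pascal
program PersistDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // Hypothetical file name, chosen only for this example.
  DataFile = 'runcontrol.dat';

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Load the previous contents, if the file was saved before.
    if FileExists(DataFile) then
      Lines.LoadFromFile(DataFile);

    // Process the list in memory: here we simply append one record
    // in the YYYYMMDD-HH|# layout described in the first post.
    Lines.Add(FormatDateTime('yyyymmdd-hh', Now) + '|1');

    // Rewrite the entire file from memory.
    Lines.SaveToFile(DataFile);
    WriteLn('Lines stored: ', Lines.Count);
  finally
    Lines.Free;
  end;
end.
```

SaveToFile rewrites the whole file on each call, which matches the "ok to re-write entire file" requirement.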

molly

  • Hero Member
  • *****
  • Posts: 2330
Re: Easiest way to persist data?
« Reply #2 on: February 27, 2017, 01:06:21 am »
As SkyKhan already wrote, a simple stringlist seems to be enough for your needs.

In case you need something a bit more advanced, you could try TIniFile (working with ini files).
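
As a rough illustration of the ini file route, a sketch using TIniFile from the IniFiles unit could look like this. The file name, the section name example_script and the key names are hypothetical and were picked only for the example.

```pascal
program IniDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, IniFiles;

const
  // Hypothetical names, used only for this example.
  IniName = 'runcontrol.ini';
  Section = 'example_script';

var
  Ini: TIniFile;
  Runs: Integer;
begin
  Ini := TIniFile.Create(IniName);
  try
    // Read the stored counter; 0 is returned when the key is missing.
    Runs := Ini.ReadInteger(Section, 'runs', 0);
    Inc(Runs);

    // Store the new counter together with the current date+hour.
    Ini.WriteInteger(Section, 'runs', Runs);
    Ini.WriteString(Section, 'hour', FormatDateTime('yyyymmdd-hh', Now));
    Ini.UpdateFile;

    WriteLn('runs = ', Runs);
  finally
    Ini.Free;
  end;
end.
```

Compared with a bare stringlist, the ini file gives named sections and keys, so several values per script can be stored without inventing a line format.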

If that is still too simplistic, you could perhaps store your data using xml or json.

When that also isn't enough (e.g. too much data, require advanced filtering or there are too many relations) then it might perhaps be better to use a database.

Of course, storing 'simple' data into a database is also allowed. Just telling that you have multiple options at your disposal (and the ones i mentioned are not even all of them  :)).

francisco1844

  • New Member
  • *
  • Posts: 15
Re: Easiest way to persist data?
« Reply #3 on: February 27, 2017, 03:34:48 am »
Thanks SkyKhan and molly.

I think TStringList will be a great starting point.

And thanks molly for the other two suggestions. May come in really handy later in this or other projects.

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Easiest way to persist data?
« Reply #4 on: February 27, 2017, 04:53:32 am »
Writing an open source program as I am learning Object Pascal.
Need to find, for now, the simplest way to persist data.
...

Use case:
My utility, runcontrol, will be initially just to help some shell scripts not to execute too often. So I want to save something like
...

For my "phase 1" I just want to limit how many times something runs so I will be saving date+hour and how many times has run so far.
...

Just curious.
You have not said what OS you are using, but let's suppose it is Linux: why are you not using cron to control the shell script frequency, or using the shell scripts themselves to control whether they should be executed or not?

http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/
https://www.cyberciti.biz/faq/unix-howto-read-line-by-line-from-file/
https://www.digitalocean.com/community/tutorials/how-to-read-and-set-environmental-and-shell-variables-on-a-linux-vps

Thaddy

  • Hero Member
  • *****
  • Posts: 14215
Re: Easiest way to persist data?
« Reply #5 on: February 27, 2017, 11:37:17 am »

Just curious.
You have not said what OS you are using, but let's suppose it is Linux: why are you not using cron to control the shell script frequency, or using the shell scripts themselves to control whether they should be executed or not?
Uhhhmm... Maybe because we are programmers? And cron is written in the "wrong language"?   >:( So we can do better?  O:-)
Note that Windows also has perfectly capable scheduling software as standard. That's certainly not a prerogative of Linux. < grumpy mode >:D >:D >

I would also suggest a TStringList, but note that it has its limitations for that kind of job, and you may want to implement a rotation scheme (e.g. limit the size of the file, then rename it).
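
One possible reading of such a rotation scheme is sketched below: once the data file grows past a size limit it is renamed to a backup, and the next save starts a fresh file. The 64 KB limit, the file name and the '.old' suffix are assumptions for illustration only, and SizeOfFile and RotateIfTooBig are hypothetical helper names.

```pascal
program RotateDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  MaxBytes = 64 * 1024;  // hypothetical size limit

// Returns the size of a file in bytes, or -1 if it does not exist.
function SizeOfFile(const FileName: string): Int64;
var
  Stream: TFileStream;
begin
  Result := -1;
  if not FileExists(FileName) then
    Exit;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    Result := Stream.Size;
  finally
    Stream.Free;
  end;
end;

// Renames FileName to FileName + '.old' once it exceeds Limit bytes.
// Returns True if a rotation took place.
function RotateIfTooBig(const FileName: string; Limit: Int64): Boolean;
var
  Backup: string;
begin
  Result := False;
  if SizeOfFile(FileName) <= Limit then
    Exit;
  Backup := FileName + '.old';
  // Keep only one generation of backup in this sketch.
  if FileExists(Backup) then
    DeleteFile(Backup);
  Result := RenameFile(FileName, Backup);
end;

begin
  if RotateIfTooBig('runcontrol.dat', MaxBytes) then
    WriteLn('rotated')
  else
    WriteLn('no rotation needed');
end.
```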
« Last Edit: February 27, 2017, 11:41:36 am by Thaddy »

francisco1844

  • New Member
  • *
  • Posts: 15
Re: Easiest way to persist data?
« Reply #6 on: February 27, 2017, 04:13:29 pm »
..let's suppose it is Linux, why are you not using cron to control the shell script frequency

The OS is Ubuntu.

We use both the built in cron as well as some commercial "orchestration" software to manage crons across many machines.

The first programs I want to use my program with are monitoring programs. Because we want to be alerted ASAP when there is an event, we have sometimes set cron to run every two minutes. Depending on the issue/event, we could get a flood of emails every two minutes until someone logs in to manually comment out the cron jobs, potentially on multiple machines. So, my first target for my program is to set a limit on how many alerts go out per hour.

or using the shell scripts themselves to control whether they should be executed or not?

I did not write the bash shell scripts and python programs that I am hoping to use my software with. It would be far more work to go program by program and add logic than it would be to write the program I am trying to create and run a "pre-check" with it. I am also scheduled to "inherit" some of these systems soon and it will take me time to get familiar with them.

Additionally, with my program we can continue to run the checks every 2 minutes and only have the alerting part of the program check if it is ok to run by calling my program.

Moreover, my long term goal is to have dependency prioritization:
  • Program A should run first
  • Program B should run after A has run
  • Program C should run after B has run

The above is a trivial example. The actual dependencies are far more complex. Currently multiple teams try and estimate, based on historical data, how long a process has taken and then try and figure out how to schedule it all.

Recently we did some re-organization of databases which caused some jobs to run faster; that should be a good thing, except that it totally threw off the dependencies because now some jobs are trying to do parts before some other parts are done.

Long term, my goal is for my program to handle dependencies so that not only do jobs not run out of sequence, but we also waste less time. Right now, if a job takes 1 to 2 hours, we may put the follow-up job 3 hours after the dependency to make sure the other finishes. With my program, eventually we will be able to run programs shortly after their dependencies are done.

Lastly, I want to learn Pascal and this seemed like a good use case. If I had done this with python I would have to worry about installing dependencies and potentially OS modules on machines where I may not have root. As for doing it in Bash, it would likely have been ok for the first phase of simple controls, but it is unlikely to work for the dependency part, where I will need to run queries against a DB.

francisco1844

  • New Member
  • *
  • Posts: 15
Re: Easiest way to persist data?
« Reply #7 on: February 27, 2017, 04:27:59 pm »
I also would suggest a TStringlist, but note that it has its limitations for that kind of job and you may want to implement a rotation

Eventually I will move to something like sqlite or similar, but wanted to get something test ready sooner rather than later.

I am breaking my program into phases so I can get something in production sooner rather than later. For my "phase 1" I literally will have a single row per file so TStringlist sounds pretty viable.

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Easiest way to persist data?
« Reply #8 on: February 27, 2017, 04:52:10 pm »
Thaddy, you are not being constructive here. And you are not always right; as a matter of fact, nobody is.
francisco1844 has said that he is trying to use binaries to control shell scripts, which is usually done by OS scheduling tools, such as cron on Linux.
Any average programmer knows that ANY OS has scheduling tools. I gave an example on Linux and you gave another on Windows. What's the difference?
I can understand that Microsoft Windows is so important to you, but not everybody here uses that OS.
I am trying to understand francisco1844's problem to see if I could help him, and your sarcasm is useless on this thread. That is a pity, because you seem to be a good person and a competent professional, and you usually help many of us.
I don't know how your bad humor or being rude can make you happier, but there is no joy in it for the rest of us, especially for the newcomers.
« Last Edit: February 27, 2017, 05:16:33 pm by valdir.marcos »

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Easiest way to persist data?
« Reply #9 on: February 27, 2017, 05:15:15 pm »
I used to see the same problem at some clients, and my suggestion is always very similar to what you are doing:
- review all processes;
- create stored procedures on the databases, one stored procedure for each task;
- for batch operations, decide whether to use binaries, shell scripts or both;
- create one binary or shell script (or both) for each task;
- create one binary or shell script (or both) to control (start/finish) all tasks and report all successes and errors in log text files (usually) or a database (rarely) for checking and statistical purposes;
- this program that controls everything can send EMAIL, SMS, Facebook, Whatsapp, Telegram, etc. to a group when bad things happen.

Welcome to improve your Pascal skills here.
« Last Edit: February 27, 2017, 05:48:20 pm by valdir.marcos »

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Easiest way to persist data?
« Reply #10 on: February 27, 2017, 05:42:02 pm »
I also would suggest a TStringlist, but note that it has its limitations for that kind of job and you may want to implement a rotation

Eventually I will move to something like sqlite or similar, but wanted to get something test ready sooner rather than later.

I am breaking my program into phases so I can get something in production sooner rather than later. For my "phase 1" I literally will have a single row per file so TStringlist sounds pretty viable.

Some Linux log solutions such as syslog-ng can log directly into a database:
http://serverfault.com/questions/692309/what-is-the-difference-between-syslog-rsyslog-and-syslog-ng

Talking about file handling in Pascal, have you read http://wiki.freepascal.org/File_Handling_In_Pascal ?

francisco1844

  • New Member
  • *
  • Posts: 15
Re: Easiest way to persist data?
« Reply #11 on: February 27, 2017, 05:56:48 pm »
- for batch operations, decide whether use binaries or shell scripts or both;
...
- this program that control everything can send EMAIL, SMS, Facebook, Whatsapp, Telegram, etc for a group when the bad things happen.

I am currently a Postgresql DBA. Most of the monitoring scripts I am about to inherit were done by a team member. For those I just need to make sure that we don't spam out alerts when something goes down.

The coordination part is ETL jobs done by multiple teams/people.

I am hoping to present my open sourced program to the different people/teams and see if we can (long term) have a sort of centralized dependency tree. Right now, as I mentioned, we have some system (I don't use it myself) for managing crons, but that is purely a distributed type of cron system. It doesn't know about dependencies.

I understand what you are saying about binary or script, but if I were to go to all these people/teams with these 2 options:
* Fix your program so it doesn't spam / coordinate with other people/teams more tightly so jobs run in the proper slot
* Use my script to avoid alerts from spamming and use my script so you can easily have your bash/python script run when it is supposed to

I think the second option is likely going to work better, if for no other reason than that it will be less work for all those other teams/people.

Other than the scripts I am about to inherit most of the issues don't impact/involve me. But I thought I could create something to make the work easier for the other teams.

Some Linux log solutions such as syslog-ng can log directly into a database:
...
Talking about file handling in Pascal, have you read http://wiki.freepascal.org/File_Handling_In_Pascal ?

I am not trying to log, but basically to keep control data: how many times has this shell script run this hour? For now, to keep it simple, my first phase will create a single file per script. Later I could use something like sqlite to centralize everything in a single file.

Imagine a cron like:
*/2 * * * * monitoring_program1
*/2 * * * * monitoring_program2

With my little program I just need to have the programs call my utility with something that will likely look like:
if (runcontrol -h 1 monitoring_program1)
then
 do work
else
 exit
fi

Same for each monitoring program. I will then create one control file in ~/runcontrol/monitoring_program1 with how many times the program has run in the given hour (as an example). So, in this initial phase, I will just have one control file per shell script.
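
The per-script control file described above could be handled roughly as in the sketch below. This is not the actual runcontrol source: ParseRecord and AllowRun are hypothetical helpers, the file name monitoring_program1.ctl and the limit of 3 runs per hour are assumptions for the example, and no command line parsing is shown. The control file holds a single line in the YYYYMMDD-HH|# layout from the first post.

```pascal
program RunControlSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Splits a record such as '20170226-17|1' into key and count.
// Returns False if the line is not in that format.
function ParseRecord(const Line: string; out Key: string;
  out Count: Integer): Boolean;
var
  P: Integer;
begin
  P := Pos('|', Line);
  Result := (P > 1) and TryStrToInt(Copy(Line, P + 1, MaxInt), Count);
  if Result then
    Key := Copy(Line, 1, P - 1)
  else
  begin
    Key := '';
    Count := 0;
  end;
end;

// Returns True, and records the run, if fewer than Limit runs
// are stored for HourKey in ControlFile.
function AllowRun(const ControlFile, HourKey: string;
  Limit: Integer): Boolean;
var
  Lines: TStringList;
  Key: string;
  Count: Integer;
begin
  Count := 0;
  Lines := TStringList.Create;
  try
    if FileExists(ControlFile) then
      Lines.LoadFromFile(ControlFile);

    // A count stored for a different hour does not apply any more.
    if (Lines.Count > 0) and ParseRecord(Lines[0], Key, Count)
      and (Key <> HourKey) then
      Count := 0;

    Result := Count < Limit;
    if Result then
    begin
      // Rewrite the single record with the incremented count.
      Lines.Clear;
      Lines.Add(HourKey + '|' + IntToStr(Count + 1));
      Lines.SaveToFile(ControlFile);
    end;
  finally
    Lines.Free;
  end;
end;

begin
  // Exit code 0 means "ok to run", so a shell `if` test succeeds.
  if AllowRun('monitoring_program1.ctl',
    FormatDateTime('yyyymmdd-hh', Now), 3) then
    ExitCode := 0
  else
    ExitCode := 1;
end.
```

Passing the hour key in as a parameter, rather than reading the clock inside AllowRun, keeps the function easy to exercise with fixed values.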

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Easiest way to persist data?
« Reply #12 on: February 27, 2017, 06:50:03 pm »
I understand what you mean.
My experience goes in a different direction:
- centralized decisions;
- centralized server for production and centralized OLAP server for business intelligence;
- all branch servers work independently and replicate both ways to the centralized production server;
- all ETL work happens outside working hours, in batch mode, on only one BI server and by only one BI team working strictly with the DBA;
- all other BI work, such as analysis cubes, drills and reporting, happens only on one OLAP BI server separated from the production server;
- I rarely see OLTP data warehouses spread across the branches of a large private company, and that is really hard to work in.

Sorry that I am not able to help you right away.

As you explained it, it seems that you are on the right track to a solution for your problem.
Have you read the links below?
http://wiki.freepascal.org/File_Handling_In_Pascal
http://wiki.freepascal.org/Lazarus_Database_Overview
http://wiki.freepascal.org/Databases
