Hey SymbolicFrank,
I do use classes. Each table has it's own class, for example (a descendant of a superclass that does all the heavy lifting).
Hummm, this is where I disagree with your statement. The work of matching left(Input Data) to right(Database Data) should not be left to the parent class.
I see the data as very tightly coupled. Unless for some pratical or monetary reason some liberties had to be done to conform.
Let's say you have a field from the left that is a string with a name. You would never create a matching right data that would contain a DateTime.
In another example, if the left has a timestamp, you would never translate that into a file size, right?
The context has to be maintained/managed at the lowest end, the last child, right?
So yeah, the superclass can send you a basic data type, like Int64, but ultimately it's up to the children that is coupled with the database children that need to have the context of what that int64 is: ex. FileSize.
In my mind this leads to a lot of code like this:
begin
Table.Insert;
RightClass.Name := LeftClass.Name;
RightClass.DateOfBirth := LeftClass.DateOfBirth;
{......}
Table.Commit; // Forgot the actual procedure's name to save a table row that is in insert mode, sorry
end;
And this code is the culmination of the
LeftClass already containing all the parsed and sanitized data.
The
LeftClass did all the heavy lifting of getting the fields from the Input data and transforming it into something that can be immediately shoved into a database.
I do have a DataSet for each table, but they aren't used anymore. Everything is done directly with SQL. My "current record" for each table is basically an array of Variants (in a small class) .The problem is the relational database model. You cannot just add stuff to a table. You need all kinds of keys and there are many restrictions.
That is absolutely true, you do have to take into account the hierarchical nature of the input data.
For JSON, that's rather easy since the format already lends to it, but you mentioned that you can get stuff from CSV and other non hierarchical, by nature, types of sources.
Well, yes, this is where some effort is needed to implement it into the Classes.
The ultimate goal is to have the data ready to shove into the DB like I showed above. So all preparation done before, will pay off, right?
If you make a different application for each import, you end up with a lot of different applications, made by copying parts of other ones. You always have to debug and fix them to bring them up to date before they work. On last count, we have 92 of those. And some work for multiple formats.
Well, yes, if you have 92 Left sources, you'll need 92 ways of transposing it into the Right side.
But I wouldn't try and make each instance an island.
I would do one of the following:
- All common code should reside in a single unit, or a set of units included by all 92 instances
- Make a big monolith of it and have 92 tabs, one for each import Left/Right pair.
I'm more sympathetic to the former. I like neat little black boxes that help me do my job quick and not having to think about it.
The latter is a more challenging mess due to the shear size of such a beast
Some parts of the data imported are spread out over 10+ tables. Some of it is in this table, with those foreign keys. A slight variation is in those four tables, with the same foreign keys that point to different tables, etc. Three quarters of a conversion is in querying the keys, inserting new records if they don't exist and adding a lot of calculated values everywhere. Reading and converting the source data is less than a quarter of the work, and by far the easiest bit. So I made more classes and functions to vastly simplify those three quarters.
For me the classes on the Right should also be able to create/manage their Database Schema with tables, fields, indexes and foreign keys. But again, just my 2c.
Like I said before, I like self contained classes that have low to no coupling between them. The only coupling I'm allowing here is that the Class on the Left has the same exact fields as the class on the Right.
And say, I have an integer that is 0. As a key, probably completely valid (although not in DBF files, they start at 1). As an amount, probably not. It depends. Some fields expect a value of 0 to say that another field should be NULL or ignored. As a price or percentage, totally legit.
So, it is far easier to determine the status just before you insert it into the database, because it's hard to generalize.
I've explained this before. The class on the Left will identify and sanitize all the data. The class on the Right just has to deal with shoving the already sanitized data into the DB according to the hierarchy that the classes on the Left have implemented.
Hoping that this, still makes some sense
Cheers,
Gus