Recent

Author Topic: Testing (JSON) Variants  (Read 4441 times)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 935
Testing (JSON) Variants
« on: January 21, 2022, 12:20:35 pm »
If I read a JSON file with fpjson, a value can either not exist (exception), be null or have a value. So, you first have to test for null.

Then I turn it into a variant, so I can pass it around. This has some obstacles as well, like JSON DateTime being a string, typically ISO. Variants of the varDate type are especially prone to raising undebuggable exceptions.

Anyway, then I have this Variant, and I want to see if it has an actual value. Comparing Variants to nil doesn't seem to be a good idea (strange things happen). Testing for NULL is mostly safe. But then, I would want to see if it is not a zero-length string, a numeric value of 0, or a TDateTime(0). Which I cannot figure out how to do without very frequent and totally undebuggable exceptions.

My current function:

Code: Pascal  [Select][+][-]
  1. function Empty(const s: string): Boolean;
  2. begin
  3.   if Length(Trim(s)) = 0 then Result := True
  4.   else Result := False;
  5. end;
  6.  
  7. function NullOrEmpty(const ThisOne: TJSONdata): Boolean;
  8. begin
  9.   Result := False;
  10.  
  11.   if (ThisOne = nil) or ThisOne.IsNull or Empty(ThisOne.AsString) then
  12.     Result := True;
  13. end;
  14.  
  15. function NullOrEmpty(const v: Variant): Boolean;
  16. begin
  17.   Result := False;
  18.  
  19.   if VarIsNull(v) or (VarIsStr(v) and Empty(v)) or
  20.     (VarIsType(v, varDate) and (v = TDateTime(0))) or
  21.     (VarIsNumeric(v) and (v = 0)) then
  22.     Result := True;
  23. end;

The last one is the main problem.

AlexTP

  • Hero Member
  • *****
  • Posts: 1775
    • UVviewsoft
Re: Testing (JSON) Variants
« Reply #1 on: January 21, 2022, 12:43:26 pm »
I use TJsonConfig method GetValue, so it returns "default" parameter value when JSON key not exists. I don't get an exception.

loaded

  • Hero Member
  • *****
  • Posts: 569
Re: Testing (JSON) Variants
« Reply #2 on: January 21, 2022, 12:58:43 pm »
When working with json I check with its assigned process.
Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button1Click(Sender: TObject);
  2. var
  3.   t:TJSONData;
  4.   jsonval:String='{"tip": "Feature","geometry": 2022}';
  5. begin
  6.   t:=GetJSON(jsonval);
  7.    if Assigned(t.FindPath('tipx')) then     //correct key name  tip
  8.     showmessage(t.FindPath('tipx').AsString)
  9.       else
  10.         showmessage('There is no such key');
  11. end;
If Ide=Lazarus 2.0.10 32 Bit and Os=Win 10 Home 64 Bit then Get up and do something useful! Because God is the helper of those who start again;

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 935
Re: Testing (JSON) Variants
« Reply #3 on: January 21, 2022, 01:11:18 pm »
Well, the main problem is not the JSON, but testing and comparing the variants without getting exceptions that present you with the "Quit or risk data corruption?" dialog. Continuing execution at that point is useless anyway, because you're thrown back into the message handler. There is no way to debug it.

In JSON, unused numeric values tend to be 0, not NULL. And dates are strings. There is no NULL-value in a TDateTime, and it seems that a varDate Variant is just a Double flagged as being varDate. So, you end up with either a Variant you cannot assign to a TDateTime, or a Variant that is actually NULL when it contains TDateTime(0). And assigning 0 to a Variant turns it into a Numeric value, not a TDateTime. So, I really want to know if the Variant contains an actual value, before I try assigning it to something typed. Otherwise, I get that dreaded dialog. 'try..except' doesn't help (I tried).

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 866
  • Professional amateur ;-P
Re: Testing (JSON) Variants
« Reply #4 on: January 21, 2022, 01:33:42 pm »
Hey SymbolicFrank,

I'm a bit confused...

When you have a TJSONData, there's this property called JSONType which is of type:
Code: Pascal  [Select][+][-]
  1. type
  2.   TJSONtype = (jtUnknown, jtNumber, jtString, jtBoolean, jtNull, jtArray, jtObject);

This will give you the major types that are stored on a TJSONData.

To drill down on the jtNumber case, the corresponding TJSONumber has a property called NumberType that is declared as:
Code: Pascal  [Select][+][-]
  1. type
  2.   TJSONNumberType = (
  3.     ntFloat,
  4.     ntInteger
  5.     {$IFNDEF PAS2JS}
  6.     ,ntInt64
  7.     ,ntQWord
  8.     {$else}
  9.     ,ntNativeInt
  10.     {$ENDIF}
  11.     );

So, you always know what your TJSONDATA and TJSONNumber are with very good precision and keeping out all the Variant mess.

Of course there's the dates, or timestamps. Well, these are the usual pain in the proverbial *rse and should be treated accordingly.

I'm not quite sure about using variants just for the dates/timestamps.
Dates will come in string format and should, in my opinion, be made into TDateTime.
TimeStamps will come in Int64 and should, again in my opinion, be made into TDateTime.

I'm guessing that there are way more code supporting TDateTime manipulation/testing/conversion/adding/subtrating etc, than with Variants of type date.
But I could be horribly wrong, in which case I deeply apologize.

Hope any of this helps in any small way.

Cheers,
Gus
« Last Edit: January 21, 2022, 01:36:28 pm by Gustavo 'Gus' Carreno »
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.10 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 866
  • Professional amateur ;-P
Re: Testing (JSON) Variants
« Reply #5 on: January 21, 2022, 01:57:34 pm »
Hey SymbolicFrank,

A bit more rant on how I get out of some holes...

When I have to consume JSON, I never consume it RAW(as to say).

What I do is create a group of Object Pascal Classes that will contain the data serialized in the JSON.

This allows me to mimic the JSON data types with know Object Pascal data types, making it a bit more type safe.

For the data types that are not dates or timestamps on the JSON, it's easy to transpose the data.

The problem arises when you have strings with dates or Int64s with timestamps.

But because you know what they contain, or what they represent, since you're pairing a know JSON to a know Object Pascal Class, you don't have to test for all the others.
A field called DoB, or date-of-birth, will either be of type jtString and have a string with a date, or it will be a jtNumber, and depending of how log ago the timestamps represents it will be an Integer or an Int64.

To sum up, what I do is have a parallel set of Object Pascal Classes that they themselves gobble up the JSON either on the Create(), or a FillFromJSON(), or some such device and then I only deal with the classes, never the TJSON* types.

As an example have a look at this project: https://github.com/gcarreno/lazGeoJSON
You can maybe focus on this object: https://github.com/gcarreno/lazGeoJSON/blob/master/src/lazGeoJSON.Geometry.Point.pas

That project is still in an early stage and it's not that hard to follow.
I have other projects to consume more complex JSON sources that have Input and Output as JSON and also the basic TJSONData, TJSONObject and TJSONArray

This makes sense to me. I can concede that it's a bit time consuming to get the initial code up and running. but once you do have the self sustained classes, it's a breeze to consume JSON.
And going back to change a small bit is also a breeze if all is well organized, right?

Again, in the hopes that these rants will give you a bit of an inkling of a path to follow.

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.10 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.10 64b Dark Theme
http://github.com/gcarreno

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 935
Re: Testing (JSON) Variants
« Reply #6 on: January 21, 2022, 02:16:34 pm »
Ok. Let's forget about the JSON altogether. It was just there to outline the problem. Want I want to know is:

How do you test if a Variant is of type varDate, without getting some 'illegal conversion error' if it DOESN'T contain a valid DateTime?

As there is no separate container for a TDateTime, VarIsType(v, varDate) seems to try to convert the value that is in there (like, NULL, or a string with an ISO date) to a date, which raises an exception when that fails. Or at least, that's what seems to be happening.


@Gustavo 'Gus' Carreno: I appreciate all the help, often the problem is me thinking the wrong way :)

I read some arbitrary data source that contains a (partial) database. In this case it is JSON, but it could just as easy be a bunch of CSV or XML files, or even a backup of an SQL server. It doesn't matter. I don't know up front. So, I make functions that accept Variants.

In that source file, each Integer that should be NULL, is most often 0, "0" or "". A DateTime can have many variants: Strings and Doubles are popular, 32 or 64 bit Cardinals as well (Unix time), etc. And if it is a String, it can be all kind of things. An empty date could be NULL, 0, 0.0, "0", "", "1970-01-01 00:00" or a lot of other things.

To prevent me having to write a separate conversion function for each variant of all the different input formats, I put the data, as-is, into a Variant and have the functions that are to insert it into the database figure it out. So, just as you are suggesting, I made a class hierarchy and import functions that look at the Variant values supplied and figure out if and how it should go into the database.

Hence my question.
« Last Edit: January 21, 2022, 02:18:43 pm by SymbolicFrank »

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 866
  • Professional amateur ;-P
Re: Testing (JSON) Variants
« Reply #7 on: January 21, 2022, 02:48:58 pm »
Hey SymbolicFrank,

How do you test if a Variant is of type varDate, without getting some 'illegal conversion error' if it DOESN'T contain a valid DateTime?

I'm gonna defer to someone with more experience in variants since I'm a complete ignorant of it's perils.
The only thing I know is to avoid them as the plague if at all possible.

@Gustavo 'Gus' Carreno: I appreciate all the help, often the problem is me thinking the wrong way :)

You are more than welcome and I completely understand the feeling. It plagues us all  :D

I read some arbitrary data source that contains a (partial) database. In this case it is JSON, but it could just as easy be a bunch of CSV or XML files, or even a backup of an SQL server. It doesn't matter. I don't know up front. So, I make functions that accept Variants.

My only advice here is that you should try and get away from generalization.
When I was a younger developer, every problem I tackled I tried to make it as generic as possible.
Most of the times the dead-lines would go WOOOSSHHH and I would be in a big mess of an untenable design.

But I think I've never tried to make an importer of assorted data very generic. I guess I always saw the data as being so unique that it would never see a similar enough recurring instance in the wild, or even in the same shop.

So every time I imported data, I would do it laboriously but with the knowledge that I would not have to chase generic windmills...

I would probably create a program that would generate Pascal code with the classes needed to import the data. I'm actually in the middle of doing such a program for JSON API's.
I'm even thinking about the program ingesting a JSON file to try and determine the basic constructs and then give me a base to go from.
There is also an API description effort, kinda like the WSDL for SOAP. I'll have to look into that.
The code output is almost done, just need to complete the UI to define the connection between JSON<-->ObjectPascal and I'm done.
I'm sorry I don't link to the project, but it's in a state of such incompleteness that it would hinder more than help anyone but me.

So if you do seek generalization, I would suggest you forget the Variants and invest your time in making a program that generates Classes :)

Again, my 2c. With the disclaimer that I abhor variants simply because I read somewhere or someone told me that they are more of a problem than a solution. Kinda like Reg Exps :)

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.10 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.10 64b Dark Theme
http://github.com/gcarreno

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 935
Re: Testing (JSON) Variants
« Reply #8 on: January 21, 2022, 03:13:29 pm »
Thanks.

I do use classes. Each table has it's own class, for example (a descendant of a superclass that does all the heavy lifting). I do have a DataSet for each table, but they aren't used anymore. Everything is done directly with SQL. My "current record" for each table is basically an array of Variants (in a small class) .The problem is the relational database model. You cannot just add stuff to a table. You need all kinds of keys and there are many restrictions.

If you make a different application for each import, you end up with a lot of different applications, made by copying parts of other ones. You always have to debug and fix them to bring them up to date before they work. On last count, we have 92 of those. And some work for multiple formats.

Some parts of the data imported are spread out over 10+ tables. Some of it is in this table, with those foreign keys. A slight variation is in those four tables, with the same foreign keys that point to different tables, etc. Three quarters of a conversion is in querying the keys, inserting new records if they don't exist and adding a lot of calculated values everywhere. Reading and converting the source data is less than a quarter of the work, and by far the easiest bit. So I made more classes and functions to vastly simplify those three quarters.

And say, I have an integer that is 0. As a key, probably completely valid (although not in DBF files, they start at 1). As an amount, probably not. It depends. Some fields expect a value of 0 to say that another field should be NULL or ignored. As a price or percentage, totally legit.

So, it is far easier to determine the status just before you insert it into the database, because it's hard to generalize.
« Last Edit: January 21, 2022, 03:18:47 pm by SymbolicFrank »

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 866
  • Professional amateur ;-P
Re: Testing (JSON) Variants
« Reply #9 on: January 21, 2022, 03:58:18 pm »
Hey SymbolicFrank,

I do use classes. Each table has it's own class, for example (a descendant of a superclass that does all the heavy lifting).

Hummm, this is where I disagree with your statement. The work of matching left(Input Data) to right(Database Data) should not be left to the parent class.

I see the data as very tightly coupled. Unless for some pratical or monetary reason some liberties had to be done to conform.

Let's say you have a field from the left that is a string with a name. You would never create a matching right data that would contain a DateTime.
In another example, if the left has a timestamp, you would never translate that into a file size, right?

The context has to be maintained/managed at the lowest end, the last child, right?

So yeah, the superclass can send you a basic data type, like Int64, but ultimately it's up to the children that is coupled with the database children that need to have the context of what that int64 is: ex. FileSize.

In my mind this leads to a lot of code like this:
Code: Pascal  [Select][+][-]
  1. begin
  2.   Table.Insert;
  3.   RightClass.Name := LeftClass.Name;
  4.   RightClass.DateOfBirth := LeftClass.DateOfBirth;
  5.  {......}
  6.   Table.Commit; // Forgot the actual procedure's name to save a table row that is in insert mode, sorry
  7. end;

And this code is the culmination of the LeftClass already containing all the parsed and sanitized data.
The LeftClass did all the heavy lifting of getting the fields from the Input data and transforming it into something that can be immediately shoved into a database.


I do have a DataSet for each table, but they aren't used anymore. Everything is done directly with SQL. My "current record" for each table is basically an array of Variants (in a small class) .The problem is the relational database model. You cannot just add stuff to a table. You need all kinds of keys and there are many restrictions.

That is absolutely true, you do have to take into account the hierarchical nature of the input data.
For JSON, that's rather easy since the format already lends to it, but you mentioned that you can get stuff from CSV and other non hierarchical, by nature, types of sources.
Well, yes, this is where some effort is needed to implement it into the Classes.

The ultimate goal is to have the data ready to shove into the DB like I showed above. So all preparation done before, will pay off, right?

If you make a different application for each import, you end up with a lot of different applications, made by copying parts of other ones. You always have to debug and fix them to bring them up to date before they work. On last count, we have 92 of those. And some work for multiple formats.

Well, yes, if you have 92 Left sources, you'll need 92 ways of transposing it into the Right side.
But I wouldn't try and make each instance an island.
I would do one of the following:
  • All common code should reside in a single unit, or a set of units included by all 92 instances
  • Make a big monolith of it and have 92 tabs, one for each import Left/Right pair.

I'm more sympathetic to the former. I like neat little black boxes that help me do my job quick and not having to think about it.
The latter is a more challenging mess due to the shear size of such a beast :)

Some parts of the data imported are spread out over 10+ tables. Some of it is in this table, with those foreign keys. A slight variation is in those four tables, with the same foreign keys that point to different tables, etc. Three quarters of a conversion is in querying the keys, inserting new records if they don't exist and adding a lot of calculated values everywhere. Reading and converting the source data is less than a quarter of the work, and by far the easiest bit. So I made more classes and functions to vastly simplify those three quarters.

For me the classes on the Right should also be able to create/manage their Database Schema with tables, fields, indexes and foreign keys. But again, just my 2c.
Like I said before, I like self contained classes that have low to no coupling between them. The only coupling I'm allowing here is that the Class on the Left has the same exact fields as the class on the Right.

And say, I have an integer that is 0. As a key, probably completely valid (although not in DBF files, they start at 1). As an amount, probably not. It depends. Some fields expect a value of 0 to say that another field should be NULL or ignored. As a price or percentage, totally legit.

So, it is far easier to determine the status just before you insert it into the database, because it's hard to generalize.

I've explained this before. The class on the Left will identify and sanitize all the data. The class on the Right just has to deal with shoving the already sanitized data into the DB according to the hierarchy that the classes on the Left have implemented.


Hoping that this, still makes some sense :D

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.10 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.10 64b Dark Theme
http://github.com/gcarreno

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 935
Re: Testing (JSON) Variants
« Reply #10 on: January 21, 2022, 04:46:15 pm »
Ok.

I have a large DBF database, with ~400 tables, which contain on average ~100 fields each. The first version of the application that used this is more than 30 years old. It gets updated at least once a month.

There is a lot of data customers want inserted. From lists that contain thousands of records with just two fields: a number and a short description, to complete databases from comparable products. There are many build-in import functions and many external ones. They all have to insert the data into that same database.


The current record of a TDataSet consists of a collection of TField, which is basically a set of Variants with a field name. Because that's the only way to generalize it. Yes, you could make a class TBasicField, from which you could inherit a whole set of fields with a specific type, like TStringField. But that's basically what a Variant is as well. Same functionality.

If you want to add a lot of data to a database, the most time-consuming part, by far (90+%), is looking up the keys. Especially with a file-based database. So, I start with duplicating an empty DBF database in an SQL database. 'select .. where' is far faster than filtering a DBF file.

It is easiest to add a new field to each table, in which you can insert the external key from the source data. Which can be just about anything. Numbers and strings are the most popular. So, I chose a string.

Because I do everything directly with SQL, I made a wrapper around each table, so you can do: TableName['FieldName'] := v; Like, a TSQLQuery-lite. And on top of that I made a few functions, that reduce the amount of code needed to insert the complex stuff by a factor of about 1000. Better to do it once in the core classes, than 100 times in each different import. So, I end up with a function like: AddLine(ExternalKey1, ExternalKey2, ExternalKey3, ExternalKey4, ExternalKey5: Variant; <a list of descriptions and values>: Variant); Originally that was a few thousand lines of code, all in all. Well, it still is, but now it is inside a black box.

So, at the outside, I read the source data and feed it to the core functions, which figure out how to insert it into the database. And the very specific stuff you add later with the individual table classes. And then I export it back to the DBF database.
« Last Edit: January 21, 2022, 04:49:22 pm by SymbolicFrank »

AlexTP

  • Hero Member
  • *****
  • Posts: 1775
    • UVviewsoft
Re: Testing (JSON) Variants
« Reply #11 on: January 21, 2022, 05:29:18 pm »
This page https://stackoverflow.com/questions/10286204/what-is-the-right-json-date-format tells that JSON has no standard date-time type. But JS has.

AlexTP

  • Hero Member
  • *****
  • Posts: 1775
    • UVviewsoft
Re: Testing (JSON) Variants
« Reply #12 on: January 21, 2022, 05:34:25 pm »
So app should (?) save date-time to a string, and then convert that string to date-time on JSON reading. No need to use Variant here.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 866
  • Professional amateur ;-P
Re: Testing (JSON) Variants
« Reply #13 on: January 21, 2022, 05:40:57 pm »
Hey SymbolicFrank,

I have a large DBF database, with ~400 tables, which contain on average ~100 fields each. The first version of the application that used this is more than 30 years old. It gets updated at least once a month.

Ok, right here lies the crux of the issue in what is becoming apparent to me: I'm talking in hypotheticals, trying to use best practices and assuming a fresh start and you are dealing with 30 of technical debt.
It's not fair to you, at all!!

I wasn't trying to make you enter the fool path of a complete re-write, please don't think I was.
I was just musing on the way I would've done it if I had to start today, from fresh and had to decide on an architecture.

I would opt for the most OOP based implementation I could get away with based in Classes implementing Interfaces.
But that's me. My Opinion based on a path that is diametrically opposed to your reality.

You've mentioned function many times and I kinda glossed my eyes over it, but I'm guessing that you've inherited a procedural mess and are trying to get it into the now with a lot of pain, blood and tears.
30 years of legacy code is quite the daunting precipice  :o

It was nice to have a bit of a moment where I could hear myself talk about what I would do, but I now think that I should've kept quiet and let someone else, more versed in Variants, take the lead and advise you.

I'm really sorry if I made you loose your precious time on an old fool's ranting about the Best Way(tm)  :-[

I'll let you go on your quest for better Variant knowledge, then and give you my most sincere apologies :)

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.10 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.10 64b Dark Theme
http://github.com/gcarreno

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Testing (JSON) Variants
« Reply #14 on: January 21, 2022, 09:37:00 pm »
Well, the main problem is not the JSON, but testing and comparing the variants without getting exceptions that present you with the "Quit or risk data corruption?" dialog. Continuing execution at that point is useless anyway, because you're thrown back into the message handler. There is no way to debug it.

In JSON, unused numeric values tend to be 0, not NULL. And dates are strings. There is no NULL-value in a TDateTime, and it seems that a varDate Variant is just a Double flagged as being varDate. So, you end up with either a Variant you cannot assign to a TDateTime, or a Variant that is actually NULL when it contains TDateTime(0). And assigning 0 to a Variant turns it into a Numeric value, not a TDateTime. So, I really want to know if the Variant contains an actual value, before I try assigning it to something typed. Otherwise, I get that dreaded dialog. 'try..except' doesn't help (I tried).

Unit Variants has a few global variables to handle Null, like NullStrictConvert. Try setting it to false. Also, if you typecast your variable to TVarData, you get access to what you want.

Variant is very flexible, you can add your own custom type to it, if you need.
« Last Edit: January 21, 2022, 09:38:40 pm by engkin »

 

TinyPortal © 2005-2018