Programming => General => Topic started by: sysrpl on August 26, 2019, 09:12:55 am

Title: A new design for a JSON Parser
Post by: sysrpl on August 26, 2019, 09:12:55 am

I know the FCL already has a capable JSON parser, but I am writing some Amazon web service interfacing projects and wanted a smaller, easier-to-use JSON parser to assist. I've created a new design for a JSON parser that is pretty small, yet powerful.

If you're interested, I've posted the code under GPLv3 and a write up of my thought process and the workflow of using a single small class to work with JSON:

https://www.getlazarus.org/json/

Any and all feedback is welcome.

Update: Aug 1 2021

I've posted an update that fixes an issue with escaped double quote characters and adds some convenience methods. You may read more about the update at the page linked above.

Title: Re: A new design for a JSON Parser
Post by: k1ng on August 26, 2019, 10:57:01 am

Hey,
nice work! Would be nice to see a speed comparison with LkJSON :)

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 26, 2019, 11:51:17 am

I am unsure how fast or slow it is, but I didn't design it for speed. That's not to say it's slow, but it's meant to be small, with powerful features, and just one class / unit.

With regard to speed, I am creating one Pascal object for every node. If I wanted to make it fast, I would GetMem for many objects at once and neither create nor destroy them. Instead, I would take object memory from that pool and return it there, rather than heap-allocating and deallocating an object for each node parsed.

That said, would it really be worth it? Do your programs spend most of their time parsing JSON? Are you writing a heavy traffic outward facing service that parses JSON frequently?

If this is the case and speed / scalability is a concern then you probably want to switch to nodejs which is optimized for heavy traffic and parallelization, and is based on JSON to boot. Many smart engineers have designed nodejs for exactly this use case.

Title: Re: A new design for a JSON Parser
Post by: marcov on August 26, 2019, 12:03:31 pm

(the pooling functionality in the fcl-XML unit has also been designed out over time; it was too sensitive, maintenance-wise)

Title: Re: A new design for a JSON Parser
Post by: k1ng on August 26, 2019, 01:06:07 pm

Quote from: sysrpl on August 26, 2019, 11:51:17 am

I am unsure how fast or slow it is, but I didn't design it for speed. That's not to say it's slow, but it's meant to be small, with powerful features, and just one class / unit.

With regard to speed, I am creating one Pascal object for every node. If I wanted to make it fast, I would GetMem for many objects at once and neither create nor destroy them. Instead, I would take object memory from that pool and return it there, rather than heap-allocating and deallocating an object for each node parsed.

I'm not sure how LkJSON works internally but I assume it also creates objects for each node as you refer to them via Field identifier.

Code: Pascal

// ParseText builds the object tree itself; a separate Create() beforehand would leak
js := TlkJSON.ParseText(jsonstr) as TlkJSONObject;
 
if js.Field['name'].Field['surname'].SelfType <> jsNull then
  surname := String(js.Field['name'].Field['surname'].Value);

So to me it seems both work more or less the same, just with a difference in usage when getting values. For the latter I'd prefer your version, as one doesn't need a typecast. Using AsString etc. is more common in recent Delphi/FPC.
It was just a suggestion because I think more users would use your library if it's also faster/comparable to LkJSON, just with a better syntax. So it'd be another pro to try your version ;)

Quote from: sysrpl on August 26, 2019, 11:51:17 am

That said, would it really be worth it? Do your programs spend most of their time parsing JSON? Are you writing a heavy traffic outward facing service that parses JSON frequently?

If this is the case and speed / scalability is a concern then you probably want to switch to nodejs which is optimized for heavy traffic and parallelization, and is based on JSON to boot. Many smart engineers have designed nodejs for exactly this use case.

No, personally I don't need much JSON parsing, but others may. E.g. if your library is 100 times slower than LkJSON (I don't know if there are any other Delphi+FPC JSON parsers), many people wouldn't use your version, as the only 'pro' would be the different syntax, but you only need to write code once...

Title: Re: A new design for a JSON Parser
Post by: minesadorada on August 26, 2019, 01:15:50 pm

Good work sysrpl. I like a design based on simplicity of use and clear syntax.
Thank you for your effort.

Title: Re: A new design for a JSON Parser
Post by: marcov on August 26, 2019, 01:25:35 pm

Quote from: k1ng on August 26, 2019, 01:06:07 pm

I don't know if there are any other Delphi+FPC JSON Parsers

FPC comes with its own, fcl-json

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 26, 2019, 03:54:19 pm

k1ng,

Code: Pascal

// ParseText builds the object tree itself; a separate Create() beforehand would leak
js := TlkJSON.ParseText(jsonstr) as TlkJSONObject;
 
if js.Field['name'].Field['surname'].SelfType <> jsNull then
  surname := String(js.Field['name'].Field['surname'].Value);

The equivalent version with my library would be:

Code: Pascal

N := TJsonNode.Create;
if N.TryParse(S) and (N.Find('name/surname') <> nil) then
  SurName := N.Find('name/surname').AsString;

With regard to speed, I am considering an experiment for my own curiosity. Here is how and what I would test.

1) Time parsing a large JSON structure thousands of times.
2) Remove the TJsonNode create during the parsing, internally overwriting the same node over and over again, and repeat the same test.
3) Remove the internal TList and add, repeat the test yet again, and note the time.

This should give me a good baseline to understand how much time it takes FPC to parse JSON with my library: first as it is now, second as it would be with some type of object pooling, and third with a fixed-size list shared among all nodes.

If the times show a marked difference in speed, then adding pooling and a shared list might be a worthwhile enhancement. Also, I may test against uLkJSON. I've looked at its source code and I'll be curious to see the speed difference.
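
A minimal timing harness for step 1 might look like the sketch below. It uses fpjson's GetJSON from the FCL only so that the example builds without extra units; the sample document, the iteration count and the program name are assumptions, and testing another parser means swapping the two lines inside the loop.

```pascal
program parsebench;

{$mode objfpc}{$H+}

uses
  SysUtils, fpjson, jsonparser; // jsonparser registers the handler behind GetJSON

const
  Iterations = 10000; // assumed repeat count, raise it for steadier numbers
  Sample = '{"name":{"surname":"Smith","given":"Ann"},"tags":[1,2,3]}';

var
  I: Integer;
  Start: QWord;
  Data: TJSONData;
begin
  Start := GetTickCount64;
  for I := 1 to Iterations do
  begin
    // One full parse and one full teardown per iteration,
    // so node allocation cost is part of the measurement
    Data := GetJSON(Sample);
    Data.Free;
  end;
  WriteLn(Iterations, ' parses took ', GetTickCount64 - Start, ' ms');
end.
```

A small document like this mostly measures per-node allocation, which is the cost that pooling would target; a large file would shift the balance toward the tokenizer.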

Thank you for your replies.

Title: Re: A new design for a JSON Parser
Post by: serbod on August 26, 2019, 04:40:17 pm

Another implementation of JSON serialization:

https://github.com/serbod/dbitems/blob/master/datastorage.pas
https://github.com/serbod/dbitems/blob/master/JsonStorage.pas

IDataStorage (TDataStorage) - abstract item similar to Variant that can store any value/list/dictionary. Using it as an interface allows unused items to be freed automatically by reference counting.

TDataSerializerJson - serialize/deserialize items to/from JSON.

TDataSerializerBencode - Bencode serializer/deserializer. Fast, compact and human-readable. Used in torrent files, for example.

Title: Re: A new design for a JSON Parser
Post by: BeniBela on August 26, 2019, 05:54:32 pm

My internettools (http://benibela.de/sources_en.html#internettools) can do JSON, too:

The above task could be solved as:

Code: Pascal

  xsurname := query('json($_1)/name/surname', [jsonstr]);
  if not xsurname.isUndefined then
    surname := xsurname.toString;
 

It is the opposite of being small. If jsonstr were a URL, it would be downloaded from the internet, and you can use it with the same syntax for HTML.

Title: Re: A new design for a JSON Parser
Post by: avra on August 26, 2019, 08:53:14 pm

Quote from: sysrpl on August 26, 2019, 09:12:55 am

https://www.getlazarus.org/json/

I could not connect via https because of an invalid certificate. I could not connect via http because OpenDNS flagged the site as malware.

Title: Re: A new design for a JSON Parser
Post by: Leledumbo on August 27, 2019, 08:38:24 pm

I see a few improvements over the fcl-json interface. First, a lot of fcl-json methods accept or return TJSONData, the topmost generic JSON value representation. This requires downcasting to the actual type every time the real value is used. Second, your node is designed with parent node access, so moving around the tree is possible without the need to keep a parent node reference. Other than those, they're pretty much equal. I just wish the root node didn't have to be explicitly created (well, a little wrapper function similar to fcl-json's GetJSON is rather easy to make).
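
To illustrate the downcasting point, here is a small sketch using only the FCL units fpjson and jsonparser. The sample JSON and the program name are made up for the example.

```pascal
program downcast;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser; // jsonparser registers the handler behind GetJSON

var
  Data: TJSONData;
  Obj: TJSONObject;
begin
  // GetJSON hands back the generic base class
  Data := GetJSON('{"name":{"surname":"Smith"}}');
  try
    // The caller has to check the type and downcast before
    // the object-specific accessors become available
    if Data.JSONType = jtObject then
    begin
      Obj := TJSONObject(Data);
      WriteLn(Obj.Objects['name'].Strings['surname']);
    end;
  finally
    Data.Free;
  end;
end.
```

This prints Smith. With a single node class there is no cast, at the price of every accessor existing on every node.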

Title: Re: A new design for a JSON Parser
Post by: heejit on August 27, 2019, 10:37:22 pm

If possible, please add your library to the Online Package Manager.
It would help make this library available to many users.

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 28, 2019, 01:36:03 am

I appreciate the feedback

Leledumbo,

I just thought I'd explain the Find(Path) syntax, as it relates to your mentioning of the Parent node.

Code: Pascal

AnyNode.Find('/'); // returns the root node
AnyNode.Find('search/for/name'); // returns a node 3 levels from the current node
AnyNode.Find('/search/for/name'); // returns a node 3 levels from the root node
 

There is also a NodeByName property, which does not try to evaluate a path.

Code: Pascal

AnyNode.NodeByName['/search/for/name']; // returns a node directly under the current node with a name of "/search/for/name"

In other words if S contains:

{
  "test": {
  },
  "stuff": {
    "enabled": true,
    "/search/for/name": "you've found me"
  }
}

Then ...

Code: Pascal

N := TJsonNode.Create;
N.Value := S;
N := N.Find('stuff');
WriteLn(N.NodeByName['/search/for/name'].AsString);
N.Root.Free;

Outputs:

you've found me

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 28, 2019, 02:00:09 am

More usage examples:

Each node can parse / load / save JSON text at any level. That is you can compose a document like so:

Code: Pascal

N := TJsonNode.Create;
N.Add('employees').LoadFromFile('employees.json');
N := N.Add('contacts');
N.LoadFromFile('contacts.json');
N.Add('emergency').LoadFromFile('emergency.json');
WriteLn(N.Value);
WriteLn(N.Root.Value);

Would write out first:

{
  .. contact nodes
  "emergency": {
    .. emergency nodes
  }
}

Then write out second:

{
  "employees": {
    .. employee nodes
  },
  "contacts": {
    .. contact nodes
    "emergency": {
      .. emergency nodes
    }
  }
}

So in this way my library allows you to load, parse, or set the JSON value of any node at any level as if it were the root node. The difference is that child nodes just append more nodes, which ultimately fall under the same root.

The only requirement is that the JSON for the root node must be in the form of an object {} or array []. Child node JSON can be an object {}, an array [], a boolean true/false, null, a number 123, or a string "hello world" (note the double quotes).

To use NON JSON with nodes, such as 'hello world' (single quotes), use the type safe properties:

AsObject
AsArray
AsBoolean
AsNull
AsNumber
AsString

Title: Re: A new design for a JSON Parser
Post by: Leledumbo on August 29, 2019, 02:15:34 am

Quote from: sysrpl on August 28, 2019, 01:36:03 am

Code: Pascal
...
AnyNode.Find('search/for/name'); // returns a node 3 levels from the current node
AnyNode.Find('/search/for/name'); // returns a node 3 levels from the root node

I think I missed the difference between the two. From your example, you go for the second after setting N to stuff node and in this case, stuff node is the root? Then what is current node here? If instead you go for the first, what will you get?

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 29, 2019, 06:02:41 am

It's the same as if you were typing a file system path. If your path string begins with a forward slash, then the path identifies an item starting at the root of the file system. If it does not start with a forward slash, then the path evaluates from the current directory.

So for example:

NodeA.Find('/preferences/pallets/inspector/visible').AsBoolean;
NodeA.Find('visible').AsBoolean;

The first line would search the JSON starting at the root, even if NodeA is not the root.

The second line would search for an item called "visible" directly under NodeA.

So the way it works is that you can Find from the root level at any node if the path string begins with "/". This is how file paths work, it's how XPath works, and I may eventually make Find support more of XPath.
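
The rule can be sketched with a tiny stand-alone tree. TNode below is a hypothetical class written only for this illustration; it is not TJsonNode and does not claim to mirror how JsonTools implements Find.

```pascal
program pathfind;

{$mode objfpc}{$H+}

uses
  Classes;

type
  // TNode is a hypothetical stand-in for a JSON node
  TNode = class
  private
    FName: string;
    FParent: TNode;
    FChildren: TList;
  public
    constructor Create(const AName: string; AParent: TNode);
    destructor Destroy; override;
    function Add(const AName: string): TNode;
    function Child(const AName: string): TNode;
    function Root: TNode;
    function Find(const Path: string): TNode;
  end;

constructor TNode.Create(const AName: string; AParent: TNode);
begin
  inherited Create;
  FName := AName;
  FParent := AParent;
  FChildren := TList.Create;
end;

destructor TNode.Destroy;
var
  I: Integer;
begin
  for I := 0 to FChildren.Count - 1 do
    TNode(FChildren[I]).Free;
  FChildren.Free;
  inherited Destroy;
end;

function TNode.Add(const AName: string): TNode;
begin
  Result := TNode.Create(AName, Self);
  FChildren.Add(Result);
end;

function TNode.Child(const AName: string): TNode;
var
  I: Integer;
begin
  for I := 0 to FChildren.Count - 1 do
    if TNode(FChildren[I]).FName = AName then
      Exit(TNode(FChildren[I]));
  Result := nil;
end;

function TNode.Root: TNode;
begin
  Result := Self;
  while Result.FParent <> nil do
    Result := Result.FParent;
end;

function TNode.Find(const Path: string): TNode;
var
  Rest, Part: string;
  P: Integer;
begin
  Rest := Path;
  Result := Self;
  // A leading slash means "start at the root", as in a file system path
  if (Rest <> '') and (Rest[1] = '/') then
  begin
    Result := Root;
    Delete(Rest, 1, 1);
  end;
  // Walk one path segment at a time; stop early if a segment is missing
  while (Result <> nil) and (Rest <> '') do
  begin
    P := Pos('/', Rest);
    if P = 0 then
      P := Length(Rest) + 1;
    Part := Copy(Rest, 1, P - 1);
    Delete(Rest, 1, P);
    Result := Result.Child(Part);
  end;
end;

var
  Tree, Inspector: TNode;
begin
  Tree := TNode.Create('', nil);
  Inspector := Tree.Add('preferences').Add('pallets').Add('inspector');
  Inspector.Add('visible');
  // Relative path: resolved against the node Find is called on
  WriteLn(Inspector.Find('visible') <> nil);
  // Absolute path: resolved against the root, whichever node is asked
  WriteLn(Inspector.Find('/preferences/pallets/inspector/visible') <> nil);
  // The root itself has no direct child called "visible"
  WriteLn(Tree.Find('visible') <> nil);
  Tree.Free;
end.
```

The three lines print TRUE, TRUE and FALSE: the same name resolves or fails depending on the starting node, unless the path is anchored with a leading slash.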

Title: Re: A new design for a JSON Parser
Post by: Leledumbo on August 29, 2019, 01:01:04 pm

Quote from: sysrpl on August 29, 2019, 06:02:41 am

It's the same as if you were typing a file system path. If your path string begins with a forward slash, then the path identifies an item starting at the root of the file system. If it does not start with a forward slash, then the path evaluates from the current directory.

If that's so, recalling your example:

Code: Pascal

N := N.Find('stuff');
WriteLn(N.NodeByName['/search/for/name'].AsString);
 

Shouldn't output "you've found me"; instead 'search/for/name' or '/stuff/search/for/name' should, but the latter doesn't care whether N points to the 'stuff' node or not. Am I right, or is something missing here?

Title: Re: A new design for a JSON Parser
Post by: krexon on September 25, 2019, 11:05:48 am

@sysrpl I use your parser, because fpJsonparser can't parse JSON, when there are duplicate keys.
Everything works fine, but ...

There is a key with price (always with 4 decimal places, but last 2 digits are zero), ie.

Code: Pascal

'price1': 0.9900
'price2': 1.9900

I get this price using such code:

Code: Pascal

p1 := n.Find('price1').AsNumber; // p1: double
p2 := n.Find('price2').AsNumber; // p2: double
ShowMessage(FloatToStr(p1+p2));

On two computers (Windows 10) I didn't have any problems, but on one computer (Windows 10) the above code sometimes shows 0 instead of 2.98.

I modified code to check if parser gets proper JSON value:

Code: Pascal

p1 := n.Find('price1').AsNumber; // p1: double
p2 := n.Find('price2').AsNumber; // p2: double
pj1 := n.Find('price1').AsJSON; // pj1: string
pj2 := n.Find('price2').AsJSON; // pj2: string
ShowMessage(FloatToStr(p1+p2) + LineEnding + pj1 + LineEnding + pj2);

Then it shows:
0
0.9900
1.9900
Everything works fine after restarting the app. The problem occurs again after some time :(
So it seems that getting the value via AsNumber is broken.

Title: Re: A new design for a JSON Parser
Post by: lainz on December 10, 2019, 12:59:13 am

Thanks for this Library.

A thing I found that maybe can be improved is 'AsString', maybe you can do some defaults when the value is Double? Like doing automatically value.ToString?

As well maybe you can provide AsInteger? Doing internally a trunc(value)...?

Title: Re: A new design for a JSON Parser
Post by: marcov on December 10, 2019, 12:49:58 pm

krexon: do those computers have different locale systems?
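
The locale question can be checked without the parser. The sketch below feeds the same JSON-style text to StrToFloatDef twice, once with a comma as decimal separator and once with a dot. The two TFormatSettings records are built by hand for the test, so nothing here depends on the machine's real locale; the program name is arbitrary.

```pascal
program localefloat;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Comma, Dot: TFormatSettings;
begin
  // Simulate a locale that writes 0,99
  Comma := DefaultFormatSettings;
  Comma.DecimalSeparator := ',';
  Comma.ThousandSeparator := '.';

  // JSON numbers always use a dot, whatever the locale
  Dot := DefaultFormatSettings;
  Dot.DecimalSeparator := '.';
  Dot.ThousandSeparator := ',';

  // If the conversion rejects the dot, the default value 0 comes back
  WriteLn('comma settings: ', StrToFloatDef('0.9900', 0, Comma):0:4);
  WriteLn('dot settings:   ', StrToFloatDef('0.9900', 0, Dot):0:4);
end.
```

The second line prints 0.9900. If the first line prints 0.0000, that would match the reported symptom of the sum showing 0 while AsJSON still returns the correct text, and it would point at the active format settings rather than at the JSON itself.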

Title: Re: A new design for a JSON Parser
Post by: Hansaplast on December 23, 2019, 01:48:00 pm

Quote from: sysrpl on August 26, 2019, 09:12:55 am

I know the FCL already has a capable JSON parser, but I am writing some Amazon web service interfacing projects and wanted a smaller easier to use JSON parser to assist. I've created a new design for a JSON parser that is pretty small, yet powerful.

I've been tinkering with several JSON parsers, and just wanted to express my gratitude for this fast and very easy to use parser. It works great for my purposes.

Title: Re: A new design for a JSON Parser
Post by: mboxmas on January 21, 2020, 07:48:43 pm

Hi sysrpl,

For the following JSON data

{
"attr1": "value1",
"attr2": {
"attr21": "value21",
"attr22": "value22"
}
}

why does the construct

for C in N do
  WriteLn(C.Value);

yield only the value for attr1, but the attribute-value pair for attr2?

Title: Re: A new design for a JSON Parser
Post by: soerensen3 on January 21, 2020, 09:37:55 pm

Is it just me or is your page getlazarus always redirecting to youtube?
Maybe the site has been hacked?

Title: Re: A new design for a JSON Parser
Post by: GAN on January 21, 2020, 10:12:55 pm

Quote from: soerensen3 on January 21, 2020, 09:37:55 pm

Is it just me or is your page getlazarus always redirecting to youtube?
Maybe the site has been hacked?

Yes, redirecting to youtube.
I checked the site:

Connecting to https://www.getlazarus.org
Exception: Unexpected response status code: 500
Status: 500

Title: Re: A new design for a JSON Parser
Post by: Hansaplast on January 22, 2020, 12:43:33 pm

Redirection here as well - sure seems "hacked" ...

For anyone who is interested (I hope sysrpl doesn't mind):
I still have a copy of the JSONTools source and the documentation that I had found on getlazarus.org (saved as RTF).
This is a copy of 12/23/2019.

I'm using this version in one of my recent projects to read JSON files - I love its simplicity and its very good performance.

Title: Re: A new design for a JSON Parser
Post by: Thaddy on January 22, 2020, 01:54:24 pm

getlazarus has never been an official source.

Title: Re: A new design for a JSON Parser
Post by: soerensen3 on January 27, 2020, 02:04:33 pm

I really like it. Especially the possibility to walk nodes from child to parent and to use absolute paths is nice.
However I'm missing some features.

- You do not have the formatjson method to format the output with spaces which makes the output less readable.
- A GetJSON function would be nice which is trivial to implement.
- There is no possibility to add existing nodes to the tree (None I could find).
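
For comparison, this is roughly what the first two items look like on the fpjson side, where GetJSON and FormatJSON are part of the FCL. The sample JSON and the program name are made up for the example.

```pascal
program prettyprint;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser; // jsonparser registers the handler behind GetJSON

var
  Data: TJSONData;
begin
  // GetJSON parses and creates the root in one call
  Data := GetJSON('{"stuff":{"enabled":true,"items":[1,2,3]}}');
  try
    WriteLn(Data.AsJSON);     // everything on a single line
    WriteLn(Data.FormatJSON); // indented, spread over several lines
  finally
    Data.Free;
  end;
end.
```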

Title: Re: A new design for a JSON Parser
Post by: lainz on January 27, 2020, 05:14:07 pm

Quote from: Thaddy on January 22, 2020, 01:54:24 pm

getlazarus has never been an official source.

Well, yes for jsontools that is what we're talking about

But the source is at github
https://github.com/sysrpl/JsonTools/blob/master/jsontools.pas

Title: Re: A new design for a JSON Parser
Post by: GDean on July 11, 2020, 08:04:24 am

Quote from: sysrpl on August 26, 2019, 09:12:55 am

Any and all feedback is welcome.

Great work sysrpl. Checked speed and it was at least 30% faster than fpjson in my middle tier app. Some 1.6 seconds to return a string in 100,000 loops running on a i7 3770k.

Given my middle tier is parsing a lot of json data from vue front end, Its speed does make a lot of difference.

I have swapped over to yours :)

Thanks Glen

Title: Re: A new design for a JSON Parser
Post by: Awkward on July 11, 2020, 08:31:52 am

Yes, JSONTools is good, but it looks unfinished and abandoned :(

Title: Re: A new design for a JSON Parser
Post by: alantelles on August 13, 2020, 07:17:04 am

Quote from: sysrpl on August 26, 2019, 09:12:55 am

I know the FCL already has a capable JSON parser, but I am writing some Amazon web service interfacing projects and wanted a smaller easier to use JSON parser to assist. I've created a new design for a JSON parser that is pretty small, yet powerful.

If you're interested, I've posted the code under GPLv3 and a write up of my thought process and the workflow of using a single small class to work with JSON:

https://www.getlazarus.org/json/

Any and all feedback is welcome.

I used your parser to be the json to dictionary parser for my UltraGen language. Thanks! it was easy to use your parser to make the conversion.

https://github.com/alantelles/ultragen

Thanx!

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 15, 2020, 01:22:43 am

I have been using json to store program preferences, recently switching from xml. I have been working with a user from Costa Rica who reports floating point errors in Windows 10. I can not reproduce the issue myself, even when internationalizing to Costa Rica on my computer.

The error seems to have started in the version that changes over from xml to json, possibly a coincidence, but a clue I am following up.

In poking into the fpjson code I see it uses FloatToStr and TryStrToFloat which internationalize, assuming DefaultFormatSettings is initialized.

Getting to my question. It seems logical to me that json should use a standard float format that can be read regardless of location, such as output by the Str and Val functions. Is that addressed by any standard?

EDIT

I guess this answers my question:

https://www.json.org/json-en.html

I assume fpjson follows this convention by using the appropriate TFormatSettings? I'll poke around some more, but I'd appreciate confirmation if someone knows the answer.

Title: Re: A new design for a JSON Parser
Post by: BeniBela on October 15, 2020, 10:19:18 pm

Quote from: VTwin on October 15, 2020, 01:22:43 am

In poking into the fpjson code I see it uses FloatToStr and TryStrToFloat which internationalize, assuming DefaultFormatSettings is initialized.

Floating point numbers are really broken

Never use FloatToStr. Besides the format settings, it is printing 15 digit numbers, which is not enough to encode a double precisely. Use Str directly
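
A quick way to see the difference on a given compiler is to round-trip one value both ways and compare the result with the original double. This is only a sketch: the test value is arbitrary, the format settings are forced to a dot so that Val can read the FloatToStr output, and no particular result is promised here since it depends on the FPC version and platform.

```pascal
program roundtrip;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Fmt: TFormatSettings;
  A, B, D, Back: Double;
  S: string;
  Code: Integer;
begin
  Fmt := DefaultFormatSettings;
  Fmt.DecimalSeparator := '.';

  // Computed at run time so the sum is a double that needs 17 digits
  A := 0.1;
  B := 0.2;
  D := A + B;

  // Str writes the full precision of the double
  Str(D, S);
  Val(S, Back, Code);
  WriteLn('Str:        ', Trim(S), '  same double: ', (Code = 0) and (Back = D));

  // FloatToStr stops at 15 significant digits
  S := FloatToStr(D, Fmt);
  Val(S, Back, Code);
  WriteLn('FloatToStr: ', S, '  same double: ', (Code = 0) and (Back = D));
end.
```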

Title: Re: A new design for a JSON Parser
Post by: Jvan on October 17, 2020, 03:55:02 am

How to get only the value of a json pair?

Code: Pascal

ShowMessage(myJson.Find('Data/SubData').AsJson);
 

but I get this:

Quote

"SubData":{...}

And what I want is:

Quote

{...}

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 17, 2020, 05:11:13 pm

Quote from: BeniBela on October 15, 2020, 10:19:18 pm

Quote from: VTwin on October 15, 2020, 01:22:43 am

In poking into the fpjson code I see it uses FloatToStr and TryStrToFloat which internationalize, assuming DefaultFormatSettings is initialized.

Floating point numbers are really broken

Never use FloatToStr. Besides the format settings, it is printing 15 digit numbers, which is not enough to encode a double precisely. Use Str directly

Thanks for the reply. I was surprised to see FloatToStr and TryStrToFloat used in fpjson. I suspect this is causing the problem, but was having trouble pinning it down. I have been bitten by these before when trying to internationalize. I will likely go back to the hand-rolled xml code I was using previously.

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 17, 2020, 08:05:42 pm

Actually in fpjson I see:

Code: Pascal

function TJSONString.GetAsFloat: TJSONFloat;
 
Var
  C : Integer;
 
begin
  Val(FValue,Result,C);
  If (C<>0) then
    If Not TryStrToFloat(FValue,Result) then
      Raise EConvertError.CreateFmt(SErrInvalidFloat,[FValue]);
end;

and

Code: Pascal

procedure TJSONString.SetAsFloat(const AValue: TJSONFloat);
begin
  FValue:=FloatToStr(AValue);
end;

So Val is called before trying TryStrToFloat, however FloatToStr is used instead of Str.

JsonTools uses StrToFloatDef and FloatToStr.

EDIT

I see that

FloatToStr(Value);

is equivalent to:

FloatToStrF(Value, ffGeneral,15, 0);

so may not be a problem, even if Str might be a better choice.

Title: Re: A new design for a JSON Parser
Post by: BeniBela on October 17, 2020, 11:20:58 pm

There is no good choice

Val/Str is also broken

If you parse '1.421085474167199E-14' with Val/TryStrToFloat you get a double that Str prints as 1.4210854741671992E-014

But that is an abbreviation, the actual value of the double is 1.42108547416719915783983492945642068425800459696706212753269937820732593536376953125E-014

Which is wrong!

Because the next smaller double is 1.421085474167198842295472841051698519566578483852570258250125334598124027252197265625E-014

Which is closer to 1.421085474167199E-14 than the former double, so that is the one it should convert to

-

And FloatToStr makes an even bigger mess out of this by returning '1.4210854741672E-14'

Which is nowhere close to the doubles above

And is converted by Val and Str back to 1.4210854741672001E-014 (which is in this case correct)

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 18, 2020, 05:38:32 pm

:o

Do you know of bug reports, or third party tools?

Title: Re: A new design for a JSON Parser
Post by: Hansaplast on October 19, 2020, 11:06:24 am

Quote from: VTwin on October 18, 2020, 05:38:32 pm

Do you know of bug reports, or third party tools?

For JSONTools, if that is what you are referring to, I believe you can report bugs here: JSONTools Github Issues (https://github.com/sysrpl/JsonTools/issues).

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 20, 2020, 01:17:21 am

Quote from: Hansaplast on October 19, 2020, 11:06:24 am

Quote from: VTwin on October 18, 2020, 05:38:32 pm
Do you know of bug reports, or third party tools?

For JSONTools, if that is what you are referring to, I believe you can report bugs here: JSONTools Github Issues (https://github.com/sysrpl/JsonTools/issues).

I'd prefer to just use fpjson, rather than depend on an additional library.

Title: Re: A new design for a JSON Parser
Post by: BeniBela on October 20, 2020, 07:23:25 pm

Quote from: VTwin on October 18, 2020, 05:38:32 pm

Do you know of bug reports, or third party tools?

here: https://bugs.freepascal.org/view.php?id=29531

Title: Re: A new design for a JSON Parser
Post by: VTwin on October 20, 2020, 10:03:21 pm

Quote from: BeniBela on October 20, 2020, 07:23:25 pm

Quote from: VTwin on October 18, 2020, 05:38:32 pm

Do you know of bug reports, or third party tools?

here: https://bugs.freepascal.org/view.php?id=29531

Excellent! Thank you for looking into this issue.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 25, 2021, 06:49:39 pm

If you want a fast JSON parser for FPC, you may try what mORMot 2 offers.

Some numbers, parsing a JSON array of 8000 objects, for a bit more than 1MB:

Code:

  - JSON benchmark: 100,267 assertions passed  1.31s
     IsValidUtf8() in 16.63ms, 1.1 GB/s
     IsValidJson(RawUtf8) in 24.78ms, 790.8 MB/s
     IsValidJson(PUtf8Char) in 23.22ms, 843.9 MB/s
     JsonArrayCount(P) in 23.26ms, 842.7 MB/s
     JsonArrayCount(P,PMax) in 22.74ms, 862 MB/s
     JsonObjectPropCount() in 9.28ms, 1.1 GB/s
     TDocVariant in 140.43ms, 139.6 MB/s
     TDocVariant dvoInternNames in 156.73ms, 125 MB/s
     TOrmTableJson GetJsonValues in 24.98ms, 345.1 MB/s
     TOrmTableJson expanded in 37.36ms, 524.7 MB/s
     TOrmTableJson not expanded in 20.96ms, 411.2 MB/s
     fpjson in 810.40ms, 10.6 MB/s

In short, the mORMot 2 JSON parser is 13 to 50 times faster than fpjson, and I guess than JsonTools too.

Title: Re: A new design for a JSON Parser
Post by: avk on July 26, 2021, 08:12:00 am

Have you tried your parser with this test suite (https://github.com/nst/JSONTestSuite)?

Title: Re: A new design for a JSON Parser
Post by: Gustavo 'Gus' Carreno on July 26, 2021, 07:59:07 pm

Hey A.Bouchez,

Quote from: abouchez on July 25, 2021, 06:49:39 pm

If you want a fast JSON parser for FPC, you may try what mORMot 2 offers.

Is there a link you can provide that gives a simple example on how to start with using mORMot 2's JSON parser only?
Something that will give you a simple set of instructions to only install the parser and not have to depend on the entirety of mORMot's code.

I would be eternally grateful for that!!

Cheers,
Gus

Title: Re: A new design for a JSON Parser
Post by: Okoba on July 27, 2021, 10:33:37 am

To get you started:
- Use mORMot2, and it has a package for Lazarus: https://github.com/synopse/mORMot2
- Remember that some methods are renamed in version 2, but read the comments; they usually tell you what to use instead
- Always read the comments, they have instructions
- Start with variant version as it is quick, easy and still very fast
- For a more structured code, use record or class way
- For the record and class ways, you will hit some issues when you use custom types; you will need to register them like I did, or register for custom events and other stuff. Read the blog and docs and search the forum if you need it. Almost all questions have already been answered there.
- There are more JSON methods for arrays (eg JsonArrayCount) and custom field reading. Read the code for more info, but you probably will not need them for daily stuff.

- Forum: https://synopse.info/forum/viewforum.php?id=2
- Docs: https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITLE_237
- Blog: https://blog.synopse.info/?tag/JSON/

Here is a sample:

Code: Pascal

program project1;
 
{$mode objfpc}{$H+}
 
uses
  mormot.core.base,
  mormot.core.text,
  mormot.core.json,
  mormot.core.variants;
 
type
  TTestClass = class(TSynAutoCreateFields)
  private
    FX: Integer;
    FY: String;
    FZ: TBooleanDynArray;
  published
    property X: Integer read FX write FX;
    property Y: String read FY write FY;
    property Z: TBooleanDynArray read FZ write FZ;
  end;
 
  TTestRecord = packed record //Need to be packed
    X: Integer;
    Y: String;
    Z: TBooleanDynArray;
  end;
const
  __TTestRecord = 'X: Integer;  Y: String; Z: TBooleanDynArray';
 
  procedure Decode;
  var
    S: RawUtf8;
    J: Variant;
    V: array[0..1] of TValuePUTF8Char;
    C: TTestClass;
    R: TTestRecord;
  begin
    S := '{"X":1,Y:"Test",Z:[false,true]}';
    J := TDocVariant.NewJson(S);
 
    //Variant way
    WriteLn(J.X);
    WriteLn(J.Y);
    WriteLn(J.Z._(0));
 
    //TDocVariantData way
    WriteLn(TDocVariantData(J).S['Y']);
 
    //ObjectLoadJson Way
    C := TTestClass.Create;
    WriteLn(ObjectLoadJson(C, S));
    WriteLn(C.X);
    WriteLn(C.Z[0]);
    C.Free;
 
    RecordLoadJson(R, S, TypeInfo(TTestRecord));
    WriteLn(R.X);
    WriteLn(R.Z[0]);
 
    //JsonDecode way (Warning: Inplace and changes S)
    JsonDecode(S, ['X', 'Y'], @V);
    WriteLn(V[0].ToCardinal);
  end;
 
  procedure Encode;
  var
    C: TTestClass;
    R: TTestRecord;
  begin
    //ObjectToJson way
    C := TTestClass.Create;
    C.X := 1;
    C.Y := 'Test';
    C.Z := [False, True];
    WriteLn(ObjectToJson(C));
    C.Free;
 
    //RecordSaveJson way
    R.X := 1;
    R.Y := 'Test';
    R.Z := [False, True];
    WriteLn(RecordSaveJson(R, TypeInfo(TTestRecord)));
 
    //JsonEncode way
    WriteLn(JsonEncode(['X', 1, 'Y', 'Test', 'Z', '[', False, True, ']']));
  end;
 
begin
  //Only needed once
  TRttiJson.RegisterFromText(TypeInfo(TTestRecord), __TTestRecord, [], []);
 
  Decode;
  Encode;
  ReadLn;
end.

Title: Re: A new design for a JSON Parser
Post by: Gustavo 'Gus' Carreno on July 27, 2021, 04:42:26 pm

Hey Okoba,

Quote from: Okoba on July 27, 2021, 10:33:37 am

To get you started:
[...]
- Forum: https://synopse.info/forum/viewforum.php?id=2
- Docs: https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITLE_237
- Blog: https://blog.synopse.info/?tag/JSON/

This is freakin AWESOME, thank you SOOOO much Okoba!!!

I'll pore over all the code and blog posts you provided to wrap my head around a different paradigm of doing JSON.

I have to admit that, from the code you provided, it is quite a paradigm shift from the approach that fpjson takes ;)

Again, thank you SOO much for all the detailed info!!

Cheers,
Gus

Title: Re: A new design for a JSON Parser
Post by: Okoba on July 27, 2021, 04:47:11 pm

Welcome!
If you like fpjson approach, you may like to use the Variant way.

Title: Re: A new design for a JSON Parser
Post by: Gustavo 'Gus' Carreno on July 27, 2021, 04:52:51 pm

Hey Okoba,

Quote from: Okoba on July 27, 2021, 04:47:11 pm

If you like fpjson approach, you may like to use the Variant way.

It's not that I like it per se. It's the fact that it's the only one I've been exposed to up til now. But I'll keep it in mind :)

I don't mind change and I'm actually really curious to learn this new approach, so again, many thanks for giving me a guide on how to tackle this new challenge ;)

Cheers,
Gus

Title: Re: A new design for a JSON Parser
Post by: alpine on July 27, 2021, 09:32:32 pm

Quote from: Okoba on July 27, 2021, 10:33:37 am

To get you started:
- Use mORMot2, and it has a package for Lazarus: https://github.com/synopse/mORMot2
- Remember that some methods are renamed in version 2, but read the comments, it always helps what you should use next
- Always read the comments, they have instructions
- Start with variant version as it is quick, easy and still very fast

May I politely ask what are the advantages of using a Variant instead of fpjson.TJSONData and descendants?

Quote from: Okoba on July 27, 2021, 10:33:37 am

- For a more structured code, use record or class way
- For record and class ways, you will hit some issues when you use custom types, you will need to register them like I did or register for custom events and other stuff.
*snip*

What is the point when we have fine fpjsonrtti unit with the TJSONStreamer and TJSONDeStreamer?

Sorry for being off topic, but I don't really see a big difference.

Title: Re: A new design for a JSON Parser
Post by: engkin on July 27, 2021, 09:58:19 pm

Quote from: y.ivanov on July 27, 2021, 09:32:32 pm

I don't really see a big difference.

I am also interested. According to reply #43, it is 13 to 50 times faster. Would be nice to see the benchmark code.

Title: Re: A new design for a JSON Parser
Post by: Okoba on July 28, 2021, 04:42:50 am

The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes. If you need more structured code, you should use the record or class way.
I am not very experienced with TJSONStreamer, but the mORMot version has options like:
- Auto creating and destroying fields (if you inherit from TSynAutoCreateFields)
- Supports records
- Much more options for handling custom types, enums, comments, keyword names in JSON (type, class)

The key to choosing between them: if you need more speed, more options, or Delphi support, then mORMot seems the better option.

The benchmark code:
https://github.com/synopse/mORMot2/blob/087f740c577a0e38f83f8193874a343ed789fb46/test/test.core.data.pas#L2840

Title: Re: A new design for a JSON Parser
Post by: engkin on July 28, 2021, 04:51:02 am

Quote from: Okoba on July 28, 2021, 04:42:50 am

The benchmark code:
https://github.com/synopse/mORMot2/blob/087f740c577a0e38f83f8193874a343ed789fb46/test/test.core.data.pas#L2840

Thank you.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 28, 2021, 09:44:01 am

I tried to include jsontools in the benchmark.
I downloaded the current version from https://github.com/sysrpl/JsonTools

Sadly, this library doesn't seem very well tested.
TryParse('["XS\"\"\"."]') fails, whereas this is valid JSON.

After a quick fix, I ran the benchmark tests:

Code:

  Some numbers on FPC 3.2 + Linux x86_64:
  - JSON benchmark: 100,299 assertions passed  810.30ms
     StrLen() in 820us, 23.3 GB/s
     IsValidUtf8(RawUtf8) in 1.46ms, 13 GB/s
     IsValidUtf8(PUtf8Char) in 2.23ms, 8.5 GB/s
     IsValidJson(RawUtf8) in 27.23ms, 719.8 MB/s
     IsValidJson(PUtf8Char) in 25.87ms, 757.6 MB/s
     JsonArrayCount(P) in 25.26ms, 775.9 MB/s
     JsonArrayCount(P,PMax) in 25.04ms, 783 MB/s
     JsonObjectPropCount() in 8.40ms, 1.3 GB/s
     TDocVariant in 118.81ms, 165 MB/s
     TDocVariant dvoInternNames in 145.08ms, 135.1 MB/s
     TOrmTableJson GetJsonValues in 22.88ms, 376.8 MB/s (write)
     TOrmTableJson expanded in 41.26ms, 475.1 MB/s
     TOrmTableJson not expanded in 21.44ms, 402.2 MB/s
     DynArrayLoadJson in 62.02ms, 316 MB/s
     fpjson in 79.36ms, 24.7 MB/s
     jsontools in 51.41ms, 38.1 MB/s
     SuperObject in 187.79ms, 10.4 MB/s

So mORMot 2 DynArrayLoadJson() is about 8 times faster than jsontools, and TDocVariant is more than 4 times faster.

The fix is a dirty goto (the fastest to write):

Code: Pascal

  if C^ = '"'  then
  begin
    repeat
fix:  Inc(C);
      if C^ = '\' then
      begin
        Inc(C);
        if C^ = '"' then
          goto fix
        else if C^ = 'u' then
 

I would not use a library with such limited testing, anyway.
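
For readers who want to see the idea behind the fix without the goto, here is a minimal sketch. It is not the JsonTools code: SkipJsonString is a hypothetical helper, and it only skips over a string literal (it does not validate \u sequences). The point is that a backslash and the character after it are consumed as a pair, so an escaped quote can never be mistaken for the closing one.

```pascal
program SkipStringDemo;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of JsonTools: given P pointing at the opening
// quote of a JSON string, returns a pointer just past the closing quote,
// or nil if the input ends first or P is not at a quote.
function SkipJsonString(P: PChar): PChar;
begin
  Result := nil;
  if P^ <> '"' then
    Exit;
  Inc(P);
  while P^ <> #0 do
  begin
    if P^ = '\' then
    begin
      Inc(P);            // step onto the escaped character
      if P^ = #0 then
        Exit;            // dangling backslash at end of input
    end
    else if P^ = '"' then
    begin
      Result := P + 1;   // an unescaped quote closes the string
      Exit;
    end;
    Inc(P);              // move past the current (or escaped) character
  end;
end;

var
  S: string;
  E: PChar;
begin
  // The string element from the failing test case, followed by ']'
  S := '"XS\"\"\"."]';
  E := SkipJsonString(PChar(S));
  if E <> nil then
    WriteLn('string ends before: ', E^)
  else
    WriteLn('unterminated string');
end.
```

With the test input above, the scan should stop right before the closing bracket of the array.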

Title: Re: A new design for a JSON Parser
Post by: alpine on July 28, 2021, 09:53:51 am

@Okoba,
Thank you for the info.

Quote from: Okoba on July 28, 2021, 04:42:50 am

The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes.

By "simpler" I guess you mean writing J.X instead of C.Integers['X'], both of them require a lookup, but as the former depends on some compiler magic to skip quotes, the latter has at least a run-time type check. Both ways will require a Find('X') to ensure the attribute is present and there won't be a "bang".

So, the latter is for my taste, it's just not so crafty.
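
To make the fpjson side of that comparison concrete, here is a small sketch using only calls that exist in the fpjson unit (GetJSON, TJSONObject.Find, the Integers property, AsJSON). DescribeMember is a hypothetical helper name, used only for illustration.

```pascal
program FpjsonLookupDemo;
{$mode objfpc}{$H+}
uses
  fpjson, jsonparser;   // jsonparser registers the parser behind GetJSON

// Hypothetical helper: returns the member's JSON text, or 'absent'.
function DescribeMember(const Json, Name: string): string;
var
  Data, Node: TJSONData;
begin
  Data := GetJSON(Json);
  try
    // Find returns nil for a missing name instead of raising.
    Node := (Data as TJSONObject).Find(Name);
    if Node = nil then
      Result := 'absent'
    else
      Result := Node.AsJSON;
  finally
    Data.Free;
  end;
end;

var
  Data: TJSONData;
begin
  WriteLn(DescribeMember('{"X": 1, "Y": "Test"}', 'X'));
  WriteLn(DescribeMember('{"X": 1, "Y": "Test"}', 'Z'));

  Data := GetJSON('{"X": 1, "Y": "Test"}');
  try
    // Typed accessor: raises an exception if 'X' is missing.
    WriteLn(TJSONObject(Data).Integers['X']);
  finally
    Data.Free;
  end;
end.
```

The Find call is the "is it there?" check; the typed accessor is the part that fails loudly when the name is missing.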

Quote from: Okoba on July 28, 2021, 04:42:50 am

If you need more structured code, you should use the record or class way.
I am not very experienced with TJSONStreamer, but the mORMot version has options like:
- Auto creating and destroying fields (if you inherit from TSynAutoCreateFields)

The mere existence of TSynAutoCreateFields is something that worries me. Hacking with the RTTI is a bummer, and how can it be justified? What if the RTTI layout changes? Is it portable?

Quote from: Okoba on July 28, 2021, 04:42:50 am

- Supports records
- Much more options for handling custom types, enums, comments, keyword names in JSON (type, class)
*snip*

IMHO that framework tends to shift the Pascal paradigm toward something dynamically typed, e.g. Python, which I don't agree with. But that is my personal opinion.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 28, 2021, 10:11:17 am

Some hints:
- late binding with the mORMot custom variant type is just one way of using it - you are not required to use late binding - and in fact, I prefer to use the TDocVariantData record directly and only typecast it into a variant when I want to transmit it as such;
- the mORMot custom variant type is just a convenient way to store some object/array document, with built-in JSON support, and automatic memory management by the compiler, like any variant or record; the mORMot ORM also uses such document variants to store any JSON/BSON in a SQL/NoSQL database, or handle dynamic content from client/server SOA using interfaces; on Delphi (I hope with fpdebug soon) you can even see the JSON content when you inspect any such variant value in the debugger - much appreciated, and impossible to do with a class or an interface;
- the more "pascalish" way is to use records and arrays of records with mORMot JSON serialization: there will be no lookup, minimal memory consumption, and best performance (>300MB/s instead of 24MB/s for fpjson), with no compiler magic - just plain efficient pascal code;
- mORMot doesn't change the RTTI - TSynAutoCreateFields is just a way to auto-instantiate nested published class instances in a class, which is very handy in some cases; what mORMot does is cache the RTTI for efficiency, and in a cross-platform way.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 28, 2021, 10:22:53 am

The \" parsing issue I found has been known since October 2019.
https://github.com/sysrpl/JsonTools/issues/11

But the https://github.com/sysrpl/JsonTools/issues/12 decimal dot problem is even more concerning.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 28, 2021, 10:40:17 am

Quote from: abouchez on July 28, 2021, 10:11:17 am

*snip*
- mORMot doesn't change the RTTI - TSynAutoCreateFields is just a way to auto-instantiate nested published class instances in a class, which is very handy in some cases; what mORMot does is cache the RTTI for efficiency, and in a cross-platform way.

I see.
You're building it, not changing it. Does it make a difference?

in mormot.core.json:

Code: Pascal

procedure AutoCreateFields(ObjectInstance: TObject);
var
  rtti: TRttiJson;
  n: integer;
  p: ^PRttiCustomProp;
begin
  // inlined ClassPropertiesGet
  rtti := PPointer(PPAnsiChar(ObjectInstance)^ + vmtAutoTable)^;
  if (rtti = nil) or
     not (rcfAutoCreateFields in rtti.Flags) then
    rtti := DoRegisterAutoCreateFields(ObjectInstance);
  p := pointer(rtti.fAutoCreateClasses);
  if p = nil then
    exit;
  // create all published class fields
  n := PDALen(PAnsiChar(p) - _DALEN)^ + _DAOFF; // length(AutoCreateClasses)
  repeat
    with p^^ do
      PPointer(PAnsiChar(ObjectInstance) + OffsetGet)^ :=
        TRttiJson(Value).fClassNewInstance(Value);
    inc(p);
    dec(n);
  until n = 0;
end;

and a lot of internal definitions in mormot.core.base.pas:

Code: Pascal

/// cross-compiler negative offset to TDynArrayRec.high/length field
  // - to be used inlined e.g. as
  // ! PDALen(PAnsiChar(Values) - _DALEN)^ + _DAOFF
  // - both FPC and Delphi uses PtrInt/NativeInt for dynamic array high/length
  _DALEN = SizeOf(TDALen);
 
  /// cross-compiler adjuster to get length from TDynArrayRec.high/length field
  _DAOFF = {$ifdef FPC} 1 {$else} 0 {$endif};
  
  /// cross-compiler negative offset to TDynArrayRec.refCnt field
  // - to be used inlined e.g. as PRefCnt(PAnsiChar(Values) - _DAREFCNT)^
  _DAREFCNT = Sizeof(TRefCnt) + _DALEN;
 
 // ... and a lot more FPC/Delphi internal layouts ... 
 

I believe those defs aren't for patching, right?

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 28, 2021, 10:56:05 am

> You're building it, not changing it. Does it make a difference?

I am not sure I understand what you mean.
We are not building it, we are using it.
In the AutoCreateFields() we don't build anything, we just cache the RTTI and its published properties classes the first time we use this class.
Then fClassNewInstance() is a very efficient way of creating each needed class instance, with the proper virtual constructor if needed.

The FPC internal layouts are used to bypass the RTL when it makes a difference.
See mormot.core.rtti.pas about how we use the official typinfo unit as source, but encapsulate it into a Delphi/FPC compatible wrapper, and also introduce some RTTI cache as TRttiCustom/TRttiJson classes, with ready-to-use methods and settings.

mORMot users don't need to deal with those details. They just use the high level methods like JSON, ORM or SOA, letting the low level framework do its work.
Most of the low level code is deeply optimized, with a lot of pointer arithmetic for sure, sometimes with a huge amount of asm (up to AVX2/BMI SIMD), but it is transparent to the user, and cross-platform.

If you look at the AutoCreateFields() function generated, once inlined into the class constructor, you will see:

Code:

MORMOT.CORE.JSON$_$TSYNAUTOCREATEFIELDS_$__$$_CREATE$$TSYNAUTOCREATEFIELDS PROC
        push    rbx                                     ; 0000 _ 53
.....
        mov     rax, qword ptr [rsp+8H]                 ; 0072 _ 48: 8B. 44 24, 08
        mov     rax, qword ptr [rax]                    ; 0077 _ 48: 8B. 00
        mov     rbx, qword ptr [rax+48H]                ; 007A _ 48: 8B. 58, 48
        test    rbx, rbx                                ; 007E _ 48: 85. DB
        jz      ?_2462                                  ; 0081 _ 74, 09
        test    dword ptr [rbx+3CH], 4000H              ; 0083 _ F7. 43, 3C, 00004000
        jnz     ?_2463                                  ; 008A _ 75, 0D
?_2462: mov     rdi, qword ptr [rsp+8H]                 ; 008C _ 48: 8B. 7C 24, 08
        call    MORMOT.CORE.JSON_$$_DOREGISTERAUTOCREATEFIELDS$TOBJECT$$TRTTIJSON; 0091 _ E8, 00000000(PLT r)
        mov     rbx, rax                                ; 0096 _ 48: 89. C3
?_2463: mov     r12, qword ptr [rbx+0DCH]               ; 0099 _ 4C: 8B. A3, 000000DC
        test    r12, r12                                ; 00A0 _ 4D: 85. E4
        jz      ?_2465                                  ; 00A3 _ 74, 35
        mov     rax, qword ptr [r12-8H]                 ; 00A5 _ 49: 8B. 44 24, F8
        lea     rbx, ptr [rax+1H]                       ; 00AA _ 48: 8D. 58, 01
ALIGN   8
?_2464: mov     r13, qword ptr [r12]                    ; 00B0 _ 4D: 8B. 2C 24
        mov     rdi, qword ptr [r13]                    ; 00B4 _ 49: 8B. 7D, 00
        mov     rax, qword ptr [r13]                    ; 00B8 _ 49: 8B. 45, 00
        call    qword ptr [rax+0D4H]                    ; 00BC _ FF. 90, 000000D4
        mov     rcx, qword ptr [rsp+8H]                 ; 00C2 _ 48: 8B. 4C 24, 08
        mov     rdx, qword ptr [r13+8H]                 ; 00C7 _ 49: 8B. 55, 08
        add     rdx, rcx                                ; 00CB _ 48: 01. CA
        mov     qword ptr [rdx], rax                    ; 00CE _ 48: 89. 02
        add     r12, 8                                  ; 00D1 _ 49: 83. C4, 08
        sub     ebx, 1                                  ; 00D5 _ 83. EB, 01
        jnz     ?_2464                                  ; 00D8 _ 75, D6
?_2465: mov     qword ptr [rsp+10H], 1                  ; 00DA _ 48: C7. 44 24, 10, 00000001
.....

The resulting asm is really optimized, as fast as it could be with manually written asm, even if it was written in plain pascal.
It may be confusing to read, but it is how we achieve best performance.
But it is still real cross-platform pascal, and the very same code works on ARM32 or AARCH64 with no problem, and good performance.

In the mORMot core, we use the pascal language as a "portable assembler", as C is used in the Linux kernel or the SQLite3 library for instance.
It may be confusing, but it is similar to what is done in the lowest part of the FPC RTL.
This is how we achieved JSON parsing that is an order of magnitude faster than the FPC/Delphi alternatives, in plain pascal code: by looking deeply at the generated assembly and aggressively profiling the code, following https://www.agner.org/optimize reference material.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 28, 2021, 11:29:03 am

Quote from: abouchez on July 28, 2021, 10:56:05 am

*snip*
The FPC internal layouts are used to bypass the RTL when it makes a difference.
See mormot.core.rtti.pas about how we use the official typinfo unit as source, but encapsulate it into a Delphi/FPC compatible wrapper, and also introduce some RTTI cache as TRttiCustom/TRttiJson classes, with ready-to-use methods and settings.

Patching variants, strings, dynarrays, bypassing RTL, caching RTTI (OK, you named it), alternate ways of creating instances - that is what I meant. In other words - hacks: https://en.wikipedia.org/wiki/Hack_(computer_science)

Quote from: abouchez on July 28, 2021, 10:56:05 am

mORMot users don't need to deal with those details. They just use the high level methods like JSON, ORM or SOA, letting the low level framework do its work.
*snip*

Until they hit the curb! I've been there.

Title: Re: A new design for a JSON Parser
Post by: sysrpl on July 28, 2021, 11:29:34 am

I am the author of JsonTools. I know guys it's been a long while since it was originally posted, but I will publish an update with fixes and some new features, including xpath like querying.

If you have any bug examples with JsonTools, or feature requests, please post them here. The update will likely be pushed this weekend, and I will provide a write up of the fixes and enhancements.

Anthony

Title: Re: A new design for a JSON Parser
Post by: engkin on July 28, 2021, 01:05:21 pm

Quote from: sysrpl on July 28, 2021, 11:29:34 am

If you have any bug examples with JsonTools, or feature requests, please post them here.

There is a small bug pointed out by Lainz here (https://forum.lazarus.freepascal.org/index.php/topic,54850.html).

Title: Re: A new design for a JSON Parser
Post by: avk on July 28, 2021, 01:15:44 pm

Hmm, maybe I don't understand something, but is this

Code: Javascript

  [ -01001, ,- , , ,42.e]
 

valid JSON?
Anyway, Mormot.Core.Json.IsValidJson() claims yes.

Title: Re: A new design for a JSON Parser
Post by: PascalDragon on July 28, 2021, 01:19:05 pm

Quote from: y.ivanov on July 28, 2021, 09:53:51 am

Quote from: Okoba on July 28, 2021, 04:42:50 am
The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes.
By "simpler" I guess you mean writing J.X instead of C.Integers['X'], both of them require a lookup, but as the former depends on some compiler magic to skip quotes, the latter has at least a run-time type check. Both ways will require a Find('X') to ensure the attribute is present and there won't be a "bang".

There is no compiler magic that "skips quotes", but there is compiler magic that will replace J.X by (pseudocode) TDocVariantDataInstance.DispInvoke(@J, 'X'). So it is a bit more indirect, but in the end both will do the same to determine whether X exists.

Also this is not a hack, but a well defined feature of the Object Pascal language.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 28, 2021, 01:39:49 pm

Quote from: PascalDragon on July 28, 2021, 01:19:05 pm

Quote from: y.ivanov on July 28, 2021, 09:53:51 am
Quote from: Okoba on July 28, 2021, 04:42:50 am
The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes.
By "simpler" I guess you mean writing J.X instead of C.Integers['X'], both of them require a lookup, but as the former depends on some compiler magic to skip quotes, the latter has at least a run-time type check. Both ways will require a Find('X') to ensure the attribute is present and there won't be a "bang".

There is no compiler magic that "skips quotes", but there is compiler magic that will replace J.X by (pseudocode) TDocVariantDataInstance.DispInvoke(@J, 'X'). So it is a bit more indirect, but in the end both will do the same to determine whether X exists.

Sorry if I didn't use correct wording, by 'skipping quotes' I meant calling fpc_dispinvoke_variant internally, instead of compile-time field address calculation, i.e. method calling, just like TJSONData.Integers['X']. Same as your explanation.

Quote from: PascalDragon on July 28, 2021, 01:19:05 pm

Also this is not a hack, but a well defined feature of the Object Pascal language.

I'm not calling that a hack.

Quote from: y.ivanov

Patching variants, strings, dynarrays, bypassing RTL, caching RTTI (OK, you named it), alternate ways of creating instances - that is what I meant. In other words - hacks

What I'm saying is that the mORMot source is full of hacks. All justified with one word: speed.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 28, 2021, 03:08:25 pm

Quote from: avk on July 28, 2021, 01:15:44 pm

Hmm, maybe I don't understand something, but is this
Code: Javascript
[ -01001, ,- , , ,42.e]

valid JSON?
Anyway, Mormot.Core.Json.IsValidJson() claims yes.

My findings:

Code: Pascal

  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.e]')));    // [-1001,"42.e"]
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e]')));   // [-1001,42]
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0.1e]'))); // [-1001,"42.0.1e"]
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e1]')));  // [-1001,42]
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e1,]'))); // null
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e1,false]'))); // [-1001,42,false]
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e1,FALSE]'))); // null 
  WriteLn(VariantSaveJSON(_Json('[ -01001, 42.0e1,0FALSE]'))); // [-1001,42,"0FALSE"] 

It accepts leading zeros in numbers (all lines) (!?)
It treats tokens starting with digit as strings when they are ill-formed numbers (lines 1,3,8) (!?!)
It accepts numbers with empty exponent (line 2) (!?!)

Edit:
It accepts leading zeros in numbers (all lines), because of wrong condition at:
https://github.com/synopse/mORMot2/blob/f2d748b39fd582a61d18e7972447724dbefb8a3b/src/core/mormot.core.variants.pas#L7345

It treats tokens starting with digit as strings when they are ill-formed numbers (lines 1,3,8), because of the default result at:
https://github.com/synopse/mORMot2/blob/f2d748b39fd582a61d18e7972447724dbefb8a3b/src/core/mormot.core.variants.pas#L7337
... and then ...
https://github.com/synopse/mORMot2/blob/f2d748b39fd582a61d18e7972447724dbefb8a3b/src/core/mormot.core.variants.pas#L7396
... or ...
https://github.com/synopse/mORMot2/blob/f2d748b39fd582a61d18e7972447724dbefb8a3b/src/core/mormot.core.variants.pas#L7413
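
For comparison, a strict check of the number grammar from RFC 8259 is short. The sketch below is not mORMot code; IsStrictJsonNumber is a hypothetical helper that accepts an optional minus, an integer part with no leading zeros, an optional fraction with at least one digit, and an optional exponent with at least one digit.

```pascal
program StrictNumberDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: True only if S is a number per the RFC 8259 grammar
//   [ "-" ] int [ "." digits ] [ ("e"|"E") [ "+"|"-" ] digits ]
function IsStrictJsonNumber(const S: string): Boolean;
var
  I, N: Integer;

  // Consumes a run of digits and reports whether at least one was present.
  function Digits: Boolean;
  var
    Start: Integer;
  begin
    Start := I;
    while (I <= N) and (S[I] in ['0'..'9']) do
      Inc(I);
    Result := I > Start;
  end;

begin
  Result := False;
  N := Length(S);
  I := 1;
  if (I <= N) and (S[I] = '-') then
    Inc(I);
  if I > N then
    Exit;                       // empty input or a lone minus sign
  if S[I] = '0' then
    Inc(I)                      // a leading zero must stand alone
  else if not Digits then
    Exit;
  if (I <= N) and (S[I] = '.') then
  begin
    Inc(I);
    if not Digits then
      Exit;                     // "42." has no fraction digits
  end;
  if (I <= N) and (S[I] in ['e', 'E']) then
  begin
    Inc(I);
    if (I <= N) and (S[I] in ['+', '-']) then
      Inc(I);
    if not Digits then
      Exit;                     // "42.0e" has an empty exponent
  end;
  Result := I > N;              // nothing may follow the number
end;

const
  Samples: array[0..4] of string =
    ('-01001', '42.e', '42.0e', '42.0.1e', '42.0e1');
var
  K: Integer;
begin
  for K := Low(Samples) to High(Samples) do
    WriteLn(Samples[K], ' -> ', IsStrictJsonNumber(Samples[K]));
end.
```

Of the five samples taken from the findings above, only 42.0e1 should be accepted by such a check.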

Title: Re: A new design for a JSON Parser
Post by: avk on July 29, 2021, 05:59:54 am

Sometimes worse things can happen. If you feed Mormot.Core.Variants.JSONToVariant() JSON from an attachment, the parser will just silently die.

Edit:
At least in Win64, it really dies silently.
In Win32, it crashes with a yell "Out of memory".

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 29, 2021, 08:09:59 am

@avk
Thanks for the feedback!

IsValidJson() is meant to be fast, not strict. It guesses whether the layout looks like JSON. It has false positives for sure, but valid JSON will always be seen as such. It is used before feeding some slower functions.
This is a limitation I will try to circumvent. A bit more paranoia may not hurt.

About [[[[[[[[[......[[[[[[[[[ it is a nice catch. It should reject it directly.
But the "out of memory" seems like a stack overflow problem due to the recursive calls handling the arrays. But not easy to fix I guess: switching from recursion to a state machine will be some work for sure.

About numbers or invalid numbers converted into strings, this was seen as a feature: the idea is to not lose information if the data can't be stored.
For instance, with TDocVariant double values are not converted by default. It parses only currency values (up to 4 decimals) which are always correct. You need to add the dvoAllowDoubleValue option when you create the variant.

Note that TextToVariantNumberType() is not involved in this conversion. But GetNumericVariantFromJson() is used for TDocVariant. I will try to make it more compliant: it rejects 0123 but not -0123 indeed.

Last hint is that the mORMot JSON parser can be very relaxed. On purpose.
For instance, it supports the MongoDB extended syntax: {a:1,b:2} or {first:date("2021/01/23")} is seen as valid JSON. We use this syntax between mORMot client and server to save bandwidth.
It was never meant to be a strict JSON parser. Just an efficient JSON parser which will try to reflect its input.
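
As an illustration of rejecting input like [[[[[...[[[[[ before any recursive parsing starts, here is a minimal non-recursive pre-check. It is only a sketch and not the mORMot state machine: NestingWithinLimit and its MaxDepth parameter are hypothetical names, and the function only counts brackets outside string literals without validating anything else.

```pascal
program DepthCheckDemo;
{$mode objfpc}{$H+}

// Hypothetical pre-check: scans the text once, without recursion, and
// returns False as soon as '[' / '{' nesting exceeds MaxDepth.
// It only counts brackets outside string literals; it does not validate JSON.
function NestingWithinLimit(const Json: string; MaxDepth: Integer): Boolean;
var
  I, Depth: Integer;
  InString: Boolean;
begin
  Result := False;
  Depth := 0;
  InString := False;
  I := 1;
  while I <= Length(Json) do
  begin
    if InString then
    begin
      if Json[I] = '\' then
        Inc(I)                    // skip the escaped character
      else if Json[I] = '"' then
        InString := False;
    end
    else
      case Json[I] of
        '"': InString := True;
        '[', '{':
          begin
            Inc(Depth);
            if Depth > MaxDepth then
              Exit;               // too deep: reject before parsing
          end;
        ']', '}':
          if Depth > 0 then
            Dec(Depth);
      end;
    Inc(I);
  end;
  Result := True;
end;

begin
  // Brackets inside the string value are ignored, so this stays shallow.
  WriteLn(NestingWithinLimit('[[1,2],{"a":"]]]"}]', 8));
  // 100000 opening brackets exceed the limit of 512.
  WriteLn(NestingWithinLimit(StringOfChar('[', 100000), 512));
end.
```

Because the scan keeps its state in two local variables, its stack usage does not grow with the nesting depth of the input.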

Title: Re: A new design for a JSON Parser
Post by: avk on July 29, 2021, 08:58:24 am

Quote from: abouchez on July 29, 2021, 08:09:59 am

...
About [[[[[[[[[......[[[[[[[[[ it is a nice catch.
...

I didn't invent it myself.
In my post #44 I asked you a question about an interesting test suite, you didn't answer, so I decided to find out myself.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 29, 2021, 04:09:51 pm

Quote from: abouchez on July 25, 2021, 06:49:39 pm

If you want a fast JSON parser for FPC, you may try what mORMot 2 offers.

Some numbers, parsing a JSON array of 8000 objects, for a bit more than 1MB:

Code:

- JSON benchmark: 100,267 assertions passed 1.31s
     IsValidUtf8() in 16.63ms, 1.1 GB/s
     IsValidJson(RawUtf8) in 24.78ms, 790.8 MB/s
     IsValidJson(PUtf8Char) in 23.22ms, 843.9 MB/s
     JsonArrayCount(P) in 23.26ms, 842.7 MB/s
     JsonArrayCount(P,PMax) in 22.74ms, 862 MB/s
     JsonObjectPropCount() in 9.28ms, 1.1 GB/s
     TDocVariant in 140.43ms, 139.6 MB/s
     TDocVariant dvoInternNames in 156.73ms, 125 MB/s
     TOrmTableJson GetJsonValues in 24.98ms, 345.1 MB/s
     TOrmTableJson expanded in 37.36ms, 524.7 MB/s
     TOrmTableJson not expanded in 20.96ms, 411.2 MB/s
     fpjson in 810.40ms, 10.6 MB/s

I have certain doubts about the numbers written and their meanings. As far as I understand this is the output from the tests\mormot2tests program. For the specific tests cited, they don't do the same thing the last one does, i.e. full JSON parsing: fpjson := GetJSON(people, {utf8=}true).
In particular:

IsValidUtf8() - returns TRUE if the supplied buffer has valid UTF-8 encoding - irrelevant
IsValidJson(RawUtf8), IsValidJson(PUtf8Char) - returns true when argument looks like a JSON - irrelevant
JsonArrayCount() - does a flat scan, counting the elements in a JSON, which is assumed to be an array
JsonObjectPropCount() - counts the number of fields in the first object in the JSON array
TOrmTableJson - does a partial split (1-st level) of a JSON, which is assumed to be an array

Throughput is calculated as length of the parsed data (* iterations) for the duration of execution, normalized in seconds. But how can they be compared when they do different things?

The only relevant test I see is TDocVariant. In the above figures TDocVariant is 139.6 MB/s vs 10.6 MB/s for fpjson. Let's say that it is approx. 13:1.
On the other hand, I ran the recent mormot2tests as-is from GitHub. What I got is:

Code: Text

- JSON benchmark: 100,293 assertions passed  3.13s
     StrLen() in 1.01ms, 18.8 GB/s
     IsValidUtf8(RawUtf8) in 14.84ms, 1.2 GB/s
     IsValidUtf8(PUtf8Char) in 17.48ms, 1 GB/s
     IsValidJson(RawUtf8) in 41.40ms, 473.4 MB/s
     IsValidJson(PUtf8Char) in 39.19ms, 500.2 MB/s
     JsonArrayCount(P) in 39.87ms, 491.6 MB/s
     JsonArrayCount(P,PMax) in 37.54ms, 522.2 MB/s
     JsonObjectPropCount() in 12.99ms, 873.4 MB/s
     TDocVariant in 1.22s, 15.9 MB/s
     TDocVariant dvoInternNames in 710.61ms, 27.5 MB/s
     TOrmTableJson GetJsonValues in 36.38ms, 237 MB/s
     TOrmTableJson expanded in 66.58ms, 294.4 MB/s
     TOrmTableJson not expanded in 41.52ms, 207.7 MB/s
     DynArrayLoadJson in 359.06ms, 54.6 MB/s
     fpjson in 480.29ms, 4 MB/s

Quote from: abouchez on July 25, 2021, 06:49:39 pm

In short, mORMot 2 JSON parser is from 13 times to 50 times faster than fpjson - and I guess JSON tools.

Figures are: TDocVariant is 15.9 MB/s vs 4 MB/s for fpjson. That is 4:1. Not 50:1! Not 13:1!
I think 4:1 is also impressive, but in no way same as 13-50 to 1.
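
The arithmetic behind these figures can be written down in a few lines. This is only a sketch: ThroughputMBs is a hypothetical helper, the first call uses made-up input, and treating 1 MB as 1,000,000 bytes is an assumption about the unit the benchmark reports.

```pascal
program ThroughputDemo;
{$mode objfpc}{$H+}

// Throughput as described above: bytes processed over all iterations,
// divided by elapsed seconds. Assumes 1 MB = 1,000,000 bytes.
function ThroughputMBs(Bytes: Int64; Iterations: Integer;
  ElapsedMs: Double): Double;
begin
  Result := (Bytes * Iterations) / (ElapsedMs / 1000.0) / 1.0e6;
end;

begin
  // Hypothetical input: 1,000,000 bytes parsed 10 times in 500 ms.
  WriteLn(ThroughputMBs(1000000, 10, 500.0):0:1, ' MB/s');
  // TDocVariant vs fpjson ratios, taken from the two listings above.
  WriteLn('first listing:  ', (139.6 / 10.6):0:1, ':1');
  WriteLn('second listing: ', (15.9 / 4.0):0:1, ':1');
end.
```

The two ratio lines work out to roughly 13:1 and 4:1, which is the gap discussed above.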

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 29, 2021, 05:11:15 pm

The FPC tests were done on Linux x86_64 with FPC 3.2.0 and -O3.
So it is properly inlined, and uses our x86_64 memory manager written in asm.
Which are the recommended settings for a production server. You get similar results on Win64.
I guess you are using Win32 with the default FPC memory manager.

Some numbers are indeed irrelevant when comparing to other parsers - but meaningful anyway, since they are used within the framework and I wanted to have wide benchmark values. For instance, IsValidUtf8() on x86_64 is around 13GB/s on my AVX2 CPU.

The relevant parsers, making a full parsing and extracting, are indeed:
- TDocVariant = object/array nodes parser storing values as variants, stored in a TDocVariantData record which could be mapped as a custom variant type;
- TOrmTableJson = array parser, optimized for an ORM list, with in-place unescape and #0 ending, creating a list of PUtf8Char to each value;
- DynArrayLoadJson = array parser, filling a dynamic array of records with all values.

As I wrote after this initial post, the initial numbers above were wrong about fpjson, due to an invalid length in the calculation. I have then published new values.

I have rewritten the JSON validator to be stronger and also slightly faster.
With the latest JSON validator (which I will commit in the next hours), here are some updated numbers:

Code:

 - JSON benchmark: 100,307 assertions passed  843.40ms
     StrLen() in 826us, 23.1 GB/s
     IsValidUtf8(RawUtf8) in 1.46ms, 13 GB/s
     IsValidUtf8(PUtf8Char) in 2.29ms, 8.3 GB/s
     IsValidJson(RawUtf8) in 20.74ms, 0.9 GB/s
     IsValidJson(PUtf8Char) in 20.95ms, 0.9 GB/s
     JsonArrayCount(P) in 20.12ms, 0.9 GB/s
     JsonArrayCount(P,PMax) in 19.97ms, 0.9 GB/s
     JsonObjectPropCount() in 10.98ms, 1 GB/s
     TDocVariant in 123.71ms, 158.4 MB/s
     TDocVariant dvoInternNames in 146.39ms, 133.9 MB/s
     TOrmTableJson GetJsonValues in 24.31ms, 354.5 MB/s
     TOrmTableJson expanded in 39.12ms, 501 MB/s
     TOrmTableJson not expanded in 20.89ms, 412.6 MB/s
     DynArrayLoadJson in 61.68ms, 317.8 MB/s
     fpjson in 79.39ms, 24.6 MB/s
     jsontools in 50.50ms, 38.8 MB/s
     SuperObject in 184.59ms, 10.6 MB/s

The numbers slightly change during each call on my Core i5 laptop, but the order of magnitude remains.
DynArrayLoadJson is almost 13 times faster than fpjson, and TOrmTableJson is 20 times faster. TDocVariant is "only" 6 times faster, and if you intern the names (i.e. you don't allocate a string for each property name, but reuse an existing string with refcnt + 1, which reduces memory usage a lot) it is slightly below 6 times faster.
Since jsontools is faster than fpjson, the numbers are lower for it, but DynArrayLoadJson is still 8 times faster than jsontools, with a lot less memory consumption.
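
The idea of interning names can be shown with a plain sorted TStringList. This is only an illustration of the principle, not how dvoInternNames is implemented (the mORMot pool is hash-based); Intern is a hypothetical helper. Two equal names that start out as separate heap strings end up sharing one reference-counted string.

```pascal
program InternDemo;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical interning pool, only to illustrate the idea.
function Intern(Pool: TStringList; const S: string): string;
var
  Index: Integer;
begin
  if not Pool.Find(S, Index) then
    Index := Pool.Add(S);
  Result := Pool[Index];   // shares the pooled string, refcount + 1
end;

var
  Pool: TStringList;
  N1, N2, A, B: string;
begin
  Pool := TStringList.Create;
  try
    Pool.Sorted := True;          // Find requires a sorted list
    Pool.CaseSensitive := True;   // JSON names are case-sensitive
    // Copy() allocates, so N1 and N2 are equal but separate heap strings.
    N1 := Copy('name!', 1, 4);
    N2 := Copy('name!', 1, 4);
    WriteLn('before interning, same memory: ', Pointer(N1) = Pointer(N2));
    A := Intern(Pool, N1);
    B := Intern(Pool, N2);
    WriteLn('after interning, same memory:  ', Pointer(A) = Pointer(B));
  finally
    Pool.Free;
  end;
end.
```

With 8000 objects sharing the same handful of property names, the pool holds each name once instead of once per object.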

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 29, 2021, 05:20:36 pm

Here are some numbers on Win32:

Code:

  - Encode decode JSON: 430,145 assertions passed  97.31ms
  - JSON benchmark: 100,307 assertions passed  1.03s
     StrLen() in 813us, 23.5 GB/s
     IsValidUtf8(RawUtf8) in 11.08ms, 1.7 GB/s
     IsValidUtf8(PUtf8Char) in 11.91ms, 1.6 GB/s
     IsValidJson(RawUtf8) in 23.44ms, 836.2 MB/s
     IsValidJson(PUtf8Char) in 21.99ms, 891.6 MB/s
     JsonArrayCount(P) in 21.29ms, 920.6 MB/s
     JsonArrayCount(P,PMax) in 21.38ms, 917 MB/s
     JsonObjectPropCount() in 10.54ms, 1 GB/s
     TDocVariant in 200.88ms, 97.6 MB/s
     TDocVariant dvoInternNames in 196.29ms, 99.8 MB/s
     TOrmTableJson GetJsonValues in 24.37ms, 353.9 MB/s
     TOrmTableJson expanded in 44.57ms, 439.8 MB/s
     TOrmTableJson not expanded in 30.68ms, 281 MB/s
     DynArrayLoadJson in 88.07ms, 222.6 MB/s
     fpjson in 74.31ms, 26.3 MB/s
     jsontools in 61.22ms, 32 MB/s
     SuperObject in 178.94ms, 10.9 MB/s

On Win32, DynArrayLoadJson is still 8 times faster than fpjson and 7 times faster than jsontools.

When you see that dvoInternNames is faster than the plain TDocVariant, we can guess that on Win32 the FPC memory manager becomes a bottleneck: our AesNiHash + cached string assignment is faster than direct string allocation.

Edit: Please check https://github.com/synopse/mORMot2/commit/10cef0c3edeee786707ac23aaa91c6e23842bce8 about the new state engine.
IsValidJson() GotoEndJsonItemStrict() JsonArrayCount() JsonObjectPropCount() are still relaxed on numbers, escaped strings and commas, but [[[[[...[[[[[[ will be detected, with an explicit Strict: boolean parameter if you really don't want to parse the MongoDB extended JSON syntax. On my laptop, the state machine reaches 900MB/s. Of course, it is not a full parser, but it is used e.g. by TDocVariant or DynArrayLoadJson to guess the number of items of the supplied JSON, to pre-allocate the buffers. I will now try to integrate it in other mORMot methods, instead of dedicated written code.

Title: Re: A new design for a JSON Parser
Post by: BeniBela on July 29, 2021, 05:58:47 pm

fpjson also has three different variants: jsonparser, jsonreader, and jsonscanner.

You should benchmark them all.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 29, 2021, 06:20:15 pm

@abouchez,
From your latest figures I can see that a more realistic TDocVariant vs fpjson ratio emerges: 3.711. You continue to compare items which do other things and were specially crafted for specific use cases. TDocVariant is the only thing that is comparable. Do you realize that you're throwing out some unrelated numbers and then stating a general conclusion from them?

Title: Re: A new design for a JSON Parser
Post by: avk on July 29, 2021, 06:50:35 pm

It seems that the parsing speed of TDocVariant largely depends on the structure of the document. If we take, for example, this sample.json (https://code.google.com/archive/p/json-test-suite/downloads), then the benchmark results will be something like this:

Code: Text

sample.json:
 lgJson    171.6 MB/s
 JsonTools 0 MB/s
 FpJson    122.2 MB/s
 Mormot2   6.2 MB/s
 

JsonTools flatly refuses to parse this JSON.

TDocVariant also stopped parsing this JSON after the last commit.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 29, 2021, 09:22:13 pm

@y.ivanov
TDocVariant vs fpjson = 158.4 / 24.6 = 6.4 on x86_64 - please try at least on x86_64 with -O3
And DynArrayLoadJson is IMHO comparable to fpjson - it is a very native and efficient way to parse some JSON and fill a dynamic array of records. The fact that fpjson creates nodes is a technical detail.

@avk
I will look into your sample file.
The performance problem comes from a regression in JSON parsing, which I will fix.

Thank you all for the feedback.

Title: Re: A new design for a JSON Parser
Post by: alpine on July 29, 2021, 09:55:48 pm

@abouchez,
Your figures from post #72:

Code: Text [Select][+]

  - Encode decode JSON: 430,145 assertions passed  97.31ms
  - JSON benchmark: 100,307 assertions passed  1.03s
     StrLen() in 813us, 23.5 GB/s
     IsValidUtf8(RawUtf8) in 11.08ms, 1.7 GB/s
     IsValidUtf8(PUtf8Char) in 11.91ms, 1.6 GB/s
     IsValidJson(RawUtf8) in 23.44ms, 836.2 MB/s
     IsValidJson(PUtf8Char) in 21.99ms, 891.6 MB/s
     JsonArrayCount(P) in 21.29ms, 920.6 MB/s
     JsonArrayCount(P,PMax) in 21.38ms, 917 MB/s
     JsonObjectPropCount() in 10.54ms, 1 GB/s
     TDocVariant in 200.88ms, 97.6 MB/s
     TDocVariant dvoInternNames in 196.29ms, 99.8 MB/s
     TOrmTableJson GetJsonValues in 24.37ms, 353.9 MB/s
     TOrmTableJson expanded in 44.57ms, 439.8 MB/s
     TOrmTableJson not expanded in 30.68ms, 281 MB/s
     DynArrayLoadJson in 88.07ms, 222.6 MB/s
     fpjson in 74.31ms, 26.3 MB/s
     jsontools in 61.22ms, 32 MB/s
     SuperObject in 178.94ms, 10.9 MB/s

97.6 MB/s divided by 26.3 MB/s = 3.711026615969582

DynArrayLoadJson loads an array. Not any valid JSON. It can't load the last avk sample, because it is an object at the top. It can't handle even '{}'!

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 29, 2021, 10:58:04 pm

@y.ivanov
To read {} you can use RecordLoadJson of course: it is another pattern.
Did you run the tests on x86_64 with -O3 ? I don't cheat the numbers, just copy&paste from my terminal.
Post #72 was on Win32. The best numbers, and the ones which matter most because they are for a server process, are on x86_64 with our memory manager (JSON parsing is always fast enough on client side). Our framework is specifically optimized for CPUs with a lot of registers (like x86_64 or ARM/AARCH64 - i386 lags behind).
On x86_64 the ratio is more than 6 times faster (159.4 / 23.6 = 6.754237288 for the last numbers I took):

Code: [Select]

     TDocVariant in 122.94ms, 159.4 MB/s
     TDocVariant no guess in 127.56ms, 153.7 MB/s
     TDocVariant dvoInternNames in 146.02ms, 134.2 MB/s
     fpjson in 82.91ms, 23.6 MB/s

@avk
The sample.json contains some floating point values, which are not read by default, because most of the time the precision is lost - only currency values are read by default.
So to load it properly, you need to add the corresponding flag:

Code: Pascal [Select][+]

 dv.InitJson(people, JSON_OPTIONS_FAST + [dvoAllowDoubleValue]);

You are right: by default, TDocVariant is not good with a lot of nested documents (but who would create such a document?).
I have added a new parameter to disable the "count guess" optimization, which works well on small objects/arrays but not on such nested documents.

Code: Pascal [Select][+]

        dv.InitJson(sample, JSON_OPTIONS_FAST +
          [dvoAllowDoubleValue, dvoJsonParseDoNotGuessCount]);
 

And here are the numbers:

Code: [Select]

     TDocVariant sample.json in 38.94ms, 16.8 MB/s
     TDocVariant sample.json no guess in 31.93ms, 410.6 MB/s
     fpjson sample.json in 11.20ms, 116.9 MB/s

So with this option, TDocVariant is faster than fpjson.

Edit:
The dvoJsonParseDoNotGuessCount option will now be forced by InitJson if a huge nest of objects is detected - this doesn't slow down standard content like people.json but dramatically enhances performance on some deeply nested documents like sample.json.
New numbers:

Code: [Select]

     TDocVariant sample.json in 1.70ms, 384.3 MB/s
     TDocVariant sample.json no guess in 30.77ms, 426 MB/s
     fpjson sample.json in 11.18ms, 117.2 MB/s

Thanks a lot for your feedback: it helps a lot!

Title: Re: A new design for a JSON Parser
Post by: alpine on July 30, 2021, 01:17:13 am

Quote from: abouchez on July 29, 2021, 10:58:04 pm

@y.ivanov
To read {} you can use RecordLoadJson of course: it is another pattern.

Both "patterns", as you call them, are included in RFC8259. So, your routines are fast, but only on half of the specification.

Quote from: abouchez on July 29, 2021, 10:58:04 pm

Did you run the tests on x86_64 with -O3 ? I don't cheat the numbers, just copy&paste from my terminal.
Post #72 was on Win32.

I wouldn't try it. On the contrary - I intend to disable as many of your optimizations, inline assembly and other 'hacks' as I can, and to evaluate what overall impact they have. My initial guess is that they give a speed-up of no more than 20-30%. Not x13-x50.

Quote from: abouchez on July 29, 2021, 10:58:04 pm

The best numbers, and the one which matter most because it is for a server process, are on x86_64 with our memory manager (JSON parsing is always fast enough on client side). Our framework is specifically optimized for CPUs with a lot of registers (like x86_64 or ARM/AARCH64 - i386 lags behind).

You didn't mention those requirements (64-bit, your own memory manager) with your initial claims of x13-50 times supremacy over fpjson.

Quote from: abouchez on July 29, 2021, 10:58:04 pm

On x86_64 the ratio is more than 6 times faster (159.4 / 23.6 = 6.754237288 for the last numbers I took):
*snip*

Good. Now we're arguing about 4-6 times against fpjson. What is the reduction over your initial claim? Tenfold?

IMHO your extensive use of adjectives, such as: "specifically optimized", "dramatically enhance", "a very native and efficient way", etc. won't help much and is rather irritating, like a TV commercial.

I am very well aware of why your 'parser' routines are faster than fpjson and how much faster they can actually be, so please, don't impose untrue statements such as those in post #43.

Title: Re: A new design for a JSON Parser
Post by: abouchez on July 30, 2021, 08:34:33 am

@y.ivanov
You can do whatever you want. If you require slow code and want to run everything in -O0 on a Z80, abusing the slow IX/IY registers, you of course can: my first computer was a 1MHz ZX81 and I wrote asm on it by poking hex into REM statements (!), so I find the CP/M and Pascal in your signature a bit too fast and lazy. ;)

But it is not what we do in production. We need to host as many clients as possible per server. This helps a lot - and also the planet by being more "green".
I specifically wrote where the numbers come from, either Linux x86_64 and our memory manager, or Win32 and the FPC memory manager.
The initial post #43 had an issue with the JSON length used to benchmark fpjson. As soon as I discovered that, I said there was a mistake, apologized for it, fixed it, and provided new numbers.

I am sorry if you don't like "marketing" stuff, but I also provided the numbers.
To not be too much marketing, I propose we take a look at the big picture of JSON and its real use.

I think the user/consumer/practical point of view is the point. JSON is needed for real work, not for benchmarks or just validation.
- If the user needs to parse some input, it is most likely that the structure is known. So a dynamic array and DynArrayLoadJson is very easy and efficient, to fill a record or a dynamic array of class instances (yes, it works too with mORMot).
- We use JSON as basic data representation in our ORM, between the objects and the data provider, and also when publishing a DB over REST: we introduced a direct DB access writing JSON from the SQL/NoSQL drivers with no TDataSet in between, for both reading (lists) and writing (even in batch insert/update mode): this is why our ORM could be so fast - faster than TDataSet for individual objects SELECT for instance, or bulk insertion.
- We use JSON as basic data representation in our SOA, for interface-based services. If you know the data structure, any class or record or array would be serialized very efficiently, without creating nodes in memory, but directly filling data structures. If you don't know the data structure, you specify a variant parameter and the JSON will be parsed or written using a TDocVariant custom variant instance.
- On whatever use we may imagine, one huge point about performance is to reduce memory allocation, especially on multi-threaded server process. The whole mORMot code tends to allocate/reallocate as little as possible, to leverage the performance. This is also why we wrote a dedicated MM for x86_64, and why we added unique features like field names (or values) text interning as an option. Because it helps in the real world, even if it is slightly slower in micro benchmarks.
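As a rough illustration of the text interning mentioned in the last point, here is a toy sketch. It is not mORMot's implementation (which, per the earlier post, relies on hashing): TInternPool is a made-up class that uses a sorted TStringList with binary search, and it only shows the effect, namely that equal field names end up sharing one heap allocation.

```pascal
program InternSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical toy interning pool; NOT how mORMot does it.
  TInternPool = class
  private
    FItems: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    // Returns the pooled copy of S, adding S to the pool on first use.
    function Intern(const S: string): string;
  end;

constructor TInternPool.Create;
begin
  inherited Create;
  FItems := TStringList.Create;
  FItems.CaseSensitive := True;
  FItems.Sorted := True;   // enables binary search via Find
end;

destructor TInternPool.Destroy;
begin
  FItems.Free;
  inherited Destroy;
end;

function TInternPool.Intern(const S: string): string;
var
  Idx: Integer;
begin
  if not FItems.Find(S, Idx) then
    Idx := FItems.Add(S);
  // Strings are reference counted, so this shares the pooled buffer.
  Result := FItems[Idx];
end;

var
  Pool: TInternPool;
  A, B: string;
begin
  Pool := TInternPool.Create;
  try
    // Copy() forces two separately allocated strings with equal content.
    A := Pool.Intern(Copy('first_name!', 1, 10));
    B := Pool.Intern(Copy('first_name?', 1, 10));
    // Compare the underlying buffers, not just the text.
    WriteLn('same buffer: ', Pointer(A) = Pointer(B));
  finally
    Pool.Free;
  end;
end.
```

With thousands of objects repeating the same few field names, a pool like this trades a lookup per name for far fewer string allocations, which is the trade-off described above.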

A "JSON library to rule them all" is not so useful in practice. Only in benchmarks you need to parse anything with no context.
If you need to create fpjson or jsontools classes by hand when writing SOA server or client side, it is less convenient than a set of dedicated JSON engines as mORMot does. In fact, even if the nodes are automatically created by the SOA library, it is always slower than a dedicated engine.
So with respect to production code, where performance matters, which is mainly on server side, putting TOrmTableJson results in a benchmark does make sense. And TOrmTableJson is definitely 20 times faster than fpjson at parsing a typical JSON ORM/REST result on x86_64. And if the DB emits "not expanded" JSON (array of values instead of array of objects), it is 40 times faster in practice because there is less JSON to parse for the same dataset. This 40 times factor is a fact for this realistic dataset.

So here are this morning's numbers, on Debian x86_64, from FPC 3.2.0 in -O3 mode:

Code: [Select]

(people.json = array of 8227 ORM objects, for 1MB file)
     StrLen() in 828us, 23.1 GB/s
     IsValidUtf8(RawUtf8) in 1.45ms, 13.1 GB/s
     IsValidUtf8(PUtf8Char) in 2.21ms, 8.6 GB/s
     IsValidJson(RawUtf8) in 21.36ms, 917.7 MB/s
     IsValidJson(PUtf8Char) in 20.63ms, 0.9 GB/s
     JsonArrayCount(P) in 19.70ms, 0.9 GB/s
     JsonArrayCount(P,PMax) in 20.14ms, 0.9 GB/s
     JsonObjectPropCount() in 10.54ms, 1 GB/s

     TDocVariant in 121.68ms, 161.1 MB/s
     TDocVariant no guess in 127.85ms, 153.3 MB/s
     TDocVariant dvoInternNames in 147.27ms, 133.1 MB/s

     TOrmTableJson expanded in 37.57ms, 521.8 MB/s
     TOrmTableJson not expanded in 20.36ms, 423.5 MB/s
 (here the time is relevant because the JSON size is smaller: 20.36 ms instead of 777.7 ms)

     DynArrayLoadJson in 62.38ms, 314.3 MB/s

     fpjson in 77.78ms, 25.2 MB/s
 (run 10 times less because it is slower - and yes, the length is also div 10 and correct I hope)

(sample.json with a lot of nested documents)
     TDocVariant sample.json in 32.32ms, 405.7 MB/s
     TDocVariant sample.json no guess in 31.93ms, 410.6 MB/s
     fpjson sample.json in 11.25ms, 116.4 MB/s

Title: Re: A new design for a JSON Parser
Post by: sysrpl on July 31, 2021, 08:50:19 pm

An update to JsonTools has been posted to its GitHub page. The issue with escaped double quotes in strings has been fixed and some helpful methods have been added. This page summarizes the changes:

https://www.getlazarus.org/json/#update

Title: Re: A new design for a JSON Parser
Post by: avk on August 01, 2021, 01:12:56 pm

@sysrpl, did I understand correctly, if any key in JSON contains a slash, then it will be impossible to find this key using TJsonNode.Find()?

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 01, 2021, 03:59:20 pm

Quote from: avk on August 01, 2021, 01:12:56 pm

@sysrpl, did I understand correctly, if any key in JSON contains a slash, then it will be impossible to find this key using TJsonNode.Find()?

As it currently stands, yes. The forward slash is used as a name separator, much like with XPath. If you wanted a free text search using arbitrary keys, then you'd need to provide a list of string keys.

For example:

Code: Pascal [Select][+]

// Walks down from N one key at a time; returns nil as soon as a key is missing.
function JsonFindKeys(N: TJsonNode; const Keys: array of string): TJsonNode;
var
  I: Integer;
begin
  for I := 0 to High(Keys) do
  begin
    N := N.Child(Keys[I]);
    if N = nil then
      Break;
  end;
  Result := N;
end;

Usage:

Code: Pascal [Select][+]

S := JsonFindKeys(N, ['customer', 'first']).AsString;

If you want I can add this form of Find as a method overload.
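For completeness, a small sketch of how the key list sidesteps the separator problem, with a nil check since the helper returns nil when a key is missing. The sample JSON and the key "in/out" are made up for this example; the helper is repeated so the program stands alone, and it assumes the JsonTools unit is on the unit path.

```pascal
program FindKeysDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, JsonTools;

// Key-by-key walk; returns nil as soon as a key is missing.
function JsonFindKeys(N: TJsonNode; const Keys: array of string): TJsonNode;
var
  I: Integer;
begin
  for I := 0 to High(Keys) do
  begin
    N := N.Child(Keys[I]);
    if N = nil then
      Break;
  end;
  Result := N;
end;

var
  Root, Hit: TJsonNode;
begin
  Root := TJsonNode.Create;
  try
    // "in/out" contains the character that Find treats as a separator.
    if Root.TryParse('{ "paths": { "in/out": "data.bin" } }') then
    begin
      Hit := JsonFindKeys(Root, ['paths', 'in/out']);
      if Hit <> nil then
        WriteLn(Hit.AsString)
      else
        WriteLn('key not found');
    end;
  finally
    Root.Free;
  end;
end.
```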

Title: Re: A new design for a JSON Parser
Post by: avk on August 01, 2021, 04:16:35 pm

No thanks, I just wanted to clarify.

Title: Re: A new design for a JSON Parser
Post by: zoltanleo on August 04, 2021, 07:40:10 am

Hi sysrpl

I want to express my deep gratitude for the wonderful json parser. I have taken the liberty of translating this manual (https://www.getlazarus.org/json/) into Russian (https://github.com/zoltanleo/translations/blob/master/JSONTool/JSON%20Tools%20for%20Pascal.md). I hope this will make your module more popular and motivate you for new projects. ;)

Title: Re: A new design for a JSON Parser
Post by: zoltanleo on August 05, 2021, 10:41:54 pm

Hi sysrpl

Right now I check whether the contents of the file are valid JSON by loading them into a string list. If the TryParse function returns true, I then load the contents of the file into the node.

Code: Pascal [Select][+]

var
  RootNode: TJsonNode = nil;
  SL: TStringList = nil;
  FileName: string;
begin
  FileName := ExtractFilePath(Application.ExeName) + jsonfile;
  RootNode := TJsonNode.Create;
  SL := TStringList.Create;
  try
    try
      // check the validity of the file contents
      if FileExistsUTF8(FileName) then
      begin
        SL.LoadFromFile(FileName);

        // if the file content is valid JSON, then load it
        if RootNode.TryParse(SL.Text) then
          RootNode.LoadFromFile(FileName);
      end;

      with RootNode do
      begin
        // some useful work
      end;
    finally
      FreeAndNil(SL);
      RootNode.Free;
    end;
  except
    on E: Exception do
      ShowMessage(Format('Error: %s' + LineEnding + LineEnding + '%s',
        [E.Message, SysErrorMessageUTF8(GetLastOSError)]));
  end;
end;

Can I ask you to add a function (something like TryToParseLoadingFile(const aFileName: string; out aNode: TJsonNode): boolean) that will combine these two operations? If the function returns true, you can read the parameters from the node aNode: TJsonNode.
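Until something like that exists in the unit, the two steps can be wrapped in a standalone helper. This is only a sketch under assumptions: TryToParseLoadingFile is the name proposed above and is not part of JsonTools, the file name test.json is a placeholder, and the helper relies on nothing beyond TJsonNode.Create and TryParse.

```pascal
program TryParseFileDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, JsonTools;

// Hypothetical helper (not part of JsonTools): reads aFileName and parses it.
// On success aNode holds the parsed tree and the caller must free it;
// on failure aNode is nil.
function TryToParseLoadingFile(const aFileName: string;
  out aNode: TJsonNode): Boolean;
var
  SL: TStringList;
begin
  Result := False;
  aNode := nil;
  if not FileExists(aFileName) then
    Exit;
  SL := TStringList.Create;
  try
    SL.LoadFromFile(aFileName);
    aNode := TJsonNode.Create;
    // TryParse already fills the node, so no second load is needed.
    Result := aNode.TryParse(SL.Text);
    if not Result then
      FreeAndNil(aNode);
  finally
    SL.Free;
  end;
end;

var
  Node: TJsonNode;
begin
  if TryToParseLoadingFile('test.json', Node) then
  begin
    try
      WriteLn('parsed OK');
      // read the parameters from Node here
    finally
      Node.Free;
    end;
  end
  else
    WriteLn('missing file or invalid JSON');
end.
```

Because the text is parsed only once, this also avoids reading and parsing the file a second time after the validity check.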

Title: Re: A new design for a JSON Parser
Post by: zoltanleo on August 06, 2021, 09:11:03 pm

Hi all

Please tell me how I can save all the elements of a list as an array to a JSON file using a for..to loop?

Title: Re: A new design for a JSON Parser
Post by: engkin on August 06, 2021, 09:54:37 pm

Add a node of type array where you need it:

Code: Pascal [Select][+]

  Arr:=rootNode.Add('testArray',nkArray);

Now, simply use Add, ignoring the first param, to add the values one by one:

Code: Pascal [Select][+]

  Arr.Add('',s);

Quick test:

Code: Pascal [Select][+]

program project1;
 
{$mode objfpc}{$H+}
 
uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes,SysUtils,
  JsonTools
  { you can add units after this };
 
var
  j,Arr:TJSONNode;
  s:string;
  sl:TStringList;
  i:integer;
begin
  {Test data}
  sl:=TStringList.Create;
  for i:=1 to 10 do
    sl.Add('Test'+i.ToString);
 
  j:=TJSONNode.Create;//nkObject by default
  Arr:=j.Add('TestArray',nkArray);
  for s in sl do
    Arr.add('',s);
 
  j.SaveToFile('test.json');
 
  j.Free;
  sl.Free;
end.

The file it generates is:

Quote

{
   "TestArray": [
      "Test1",
      "Test2",
      "Test3",
      "Test4",
      "Test5",
      "Test6",
      "Test7",
      "Test8",
      "Test9",
      "Test10"
   ]
}

Title: Re: A new design for a JSON Parser
Post by: zoltanleo on August 06, 2021, 10:29:48 pm

Quote from: engkin on August 06, 2021, 09:54:37 pm

Now, simply use add, ignoring the first param, to add the values one by one:
Code: Pascal [Select][+][-]
Arr.Add('',s);

Hi engkin.
Thank you for the answer. I didn't know there was such a way.

Thanks again!

Title: Re: A new design for a JSON Parser
Post by: sysrpl on August 11, 2021, 11:18:23 am

Another way to add an array of string nodes would be:

Code: Pascal [Select][+]

procedure AppendStrings(Node: TJsonNode; Strings: TStrings);
var
  S: string;
begin
  Node := Node.Add('Strings').AsArray;
  for S in Strings do
    Node.Add.AsString := S;
end;

If Strings contained `apple, banana, cucumber, doughnut`, then after using the `AppendStrings` code above Node would at minimum look like:

Code: Text [Select][+]

{ "Strings": ["apple", "banana", "cucumber", "doughnut"] }

Title: Re: A new design for a JSON Parser
Post by: zoltanleo on August 11, 2021, 04:49:07 pm

Hi sysrpl.

Thanks a lot. I will try to use it.

Upd: I was about to rejoice that you had added a new convenient procedure to the module, but I was wrong.
I found a workaround a long time ago by following @engkin's advice. I was hoping you would add one more useful feature to the unit. :-[