
Topic: A new design for a JSON Parser

avk

Re: A new design for a JSON Parser
« Reply #75 on: July 29, 2021, 06:50:35 pm »
It seems that the parsing speed of TDocVariant largely depends on the structure of the document. If we take, for example, this sample.json, then the benchmark results will be something like this:
Code: Text
sample.json:
 lgJson    171.6 MB/s
 JsonTools 0 MB/s
 FpJson    122.2 MB/s
 Mormot2   6.2 MB/s

JsonTools flatly refuses to parse this JSON.

TDocVariant also stopped parsing this JSON after the last commit.

abouchez

« Reply #76 on: July 29, 2021, 09:22:13 pm »
@y.ivanov
TDocVariant vs fpjson = 158.4 / 24.6 = 6.4 on x86_64 - please try at least on x86_64 with -O3
And DynArrayLoadJson is IMHO comparable to fpjson - it is a very native and efficient way to parse some JSON and fill a dynamic array of records. The fact that fpjson creates nodes is a technical detail.

@avk
I will look into your sample file.
The performance problem comes from a regression in JSON parsing, which I will fix.

Thank you all for the feedback.

y.ivanov

« Reply #77 on: July 29, 2021, 09:55:48 pm »
@abouchez,
Your figures from post #72:
Code: Text
  - Encode decode JSON: 430,145 assertions passed  97.31ms
  - JSON benchmark: 100,307 assertions passed  1.03s
     StrLen() in 813us, 23.5 GB/s
     IsValidUtf8(RawUtf8) in 11.08ms, 1.7 GB/s
     IsValidUtf8(PUtf8Char) in 11.91ms, 1.6 GB/s
     IsValidJson(RawUtf8) in 23.44ms, 836.2 MB/s
     IsValidJson(PUtf8Char) in 21.99ms, 891.6 MB/s
     JsonArrayCount(P) in 21.29ms, 920.6 MB/s
     JsonArrayCount(P,PMax) in 21.38ms, 917 MB/s
     JsonObjectPropCount() in 10.54ms, 1 GB/s
     TDocVariant in 200.88ms, 97.6 MB/s
     TDocVariant dvoInternNames in 196.29ms, 99.8 MB/s
     TOrmTableJson GetJsonValues in 24.37ms, 353.9 MB/s
     TOrmTableJson expanded in 44.57ms, 439.8 MB/s
     TOrmTableJson not expanded in 30.68ms, 281 MB/s
     DynArrayLoadJson in 88.07ms, 222.6 MB/s
     fpjson in 74.31ms, 26.3 MB/s
     jsontools in 61.22ms, 32 MB/s
     SuperObject in 178.94ms, 10.9 MB/s

97.6 MB/s divided by 26.3 MB/s = 3.711026615969582

DynArrayLoadJson loads an array. Not any valid JSON. It can't load the last avk sample, because it is an object at the top. It can't handle even '{}'!
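
The object-versus-array distinction raised here can be sketched as a check on the first non-whitespace character of the input. This is only an illustration: `TopLevelKindOf` and `TJsonTopLevel` are hypothetical names, the function does not validate anything, and it does not call any mORMot routine. A caller could use the result to decide between an array loader and an object loader.

```pascal
program TopLevelKindSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical classification of a JSON text by its first significant character.
  TJsonTopLevel = (jtlEmpty, jtlObject, jtlArray, jtlOther);

{ Returns the kind suggested by the first non-whitespace character.
  It only peeks; it does not check that the rest of the text is valid JSON. }
function TopLevelKindOf(const Json: string): TJsonTopLevel;
var
  I: Integer;
begin
  Result := jtlEmpty;
  for I := 1 to Length(Json) do
    case Json[I] of
      ' ', #9, #10, #13: ;      // JSON whitespace: keep scanning
      '{': Exit(jtlObject);     // e.g. '{}' or the sample with an object at the top
      '[': Exit(jtlArray);      // e.g. an array of records
    else
      Exit(jtlOther);           // string, number, true, false or null
    end;
end;

begin
  WriteLn(TopLevelKindOf('{}'));
  WriteLn(TopLevelKindOf('  [1, 2, 3]'));
  WriteLn(TopLevelKindOf('42'));
end.
```

RFC 8259 allows any value at the top level, so a general-purpose entry point has to handle the `jtlOther` case as well.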
« Last Edit: July 29, 2021, 09:59:58 pm by y.ivanov »

abouchez

« Reply #78 on: July 29, 2021, 10:58:04 pm »
@y.ivanov
To read {} you can use RecordLoadJson of course: it is another pattern.
Did you run the tests on x86_64 with -O3 ? I don't cheat the numbers, just copy&paste from my terminal.
Post #72 was on Win32. The best numbers, and the ones that matter most because they are for a server process, are on x86_64 with our memory manager (JSON parsing is always fast enough on client side). Our framework is specifically optimized for CPUs with a lot of registers (like x86_64 or ARM/AARCH64 - i386 lags behind).
On x86_64 the ratio is more than 6 times faster (159.4 / 23.6 = 6.754237288 for the last numbers I took):
Code:
     TDocVariant in 122.94ms, 159.4 MB/s
     TDocVariant no guess in 127.56ms, 153.7 MB/s
     TDocVariant dvoInternNames in 146.02ms, 134.2 MB/s
     fpjson in 82.91ms, 23.6 MB/s
 

@avk
The sample.json contains some floating point values, which are not read by default, because most of the time the precision is lost - only currency values are read by default.
So to load it properly, you need to add the corresponding flag:
Code: Pascal
  dv.InitJson(people, JSON_OPTIONS_FAST + [dvoAllowDoubleValue]);
You are right: by default, TDocVariant is not good with a lot of nested documents (but who would create such a document?).
I have added a new parameter to disable the "count guess" optimization, which works well on small objects/arrays but not on such nested documents.
Code: Pascal
  dv.InitJson(sample, JSON_OPTIONS_FAST +
    [dvoAllowDoubleValue, dvoJsonParseDoNotGuessCount]);
And here are the numbers:
Code:
     TDocVariant sample.json in 38.94ms, 16.8 MB/s
     TDocVariant sample.json no guess in 31.93ms, 410.6 MB/s
     fpjson sample.json in 11.20ms, 116.9 MB/s
So with this option, TDocVariant is faster than fpjson.

Edit:
The dvoJsonParseDoNotGuessCount option will now be forced by InitJson if deep nesting of objects is detected - this doesn't slow down standard content like people.json but dramatically enhances performance on some deeply nested documents like sample.json.
New numbers:
Code:
     TDocVariant sample.json in 1.70ms, 384.3 MB/s
     TDocVariant sample.json no guess in 30.77ms, 426 MB/s
     fpjson sample.json in 11.18ms, 117.2 MB/s

Thanks a lot for your feedback: it helps a lot!
« Last Edit: July 29, 2021, 11:32:16 pm by abouchez »

y.ivanov

« Reply #79 on: July 30, 2021, 01:17:13 am »
Quote from: abouchez
@y.ivanov
To read {} you can use RecordLoadJson of course: it is another pattern.

Both "patterns", as you call them, are included in RFC 8259. So your routines are fast, but only on half of the specification.

Quote from: abouchez
Did you run the tests on x86_64 with -O3 ? I don't cheat the numbers, just copy&paste from my terminal.
Post #72 was on Win32.

I wouldn't try it. On the contrary, I intend to disable as many of your optimizations, inline assembly and other 'hacks' as I can, and to evaluate what impact they have overall. My initial guess is that they speed things up by no more than 20-30%. Not by x13-x50.

Quote from: abouchez
The best numbers, and the ones that matter most because they are for a server process, are on x86_64 with our memory manager (JSON parsing is always fast enough on client side). Our framework is specifically optimized for CPUs with a lot of registers (like x86_64 or ARM/AARCH64 - i386 lags behind).

You didn't mention those requirements (64-bit, your own memory manager) with your initial claims of x13-50 times supremacy over fpjson.

Quote from: abouchez
On x86_64 the ratio is more than 6 times faster (159.4 / 23.6 = 6.754237288 for the last numbers I took):
*snip*

Good. Now we're arguing about 4-6 times against fpjson. What is the reduction from your initial claim? Tenfold?

IMHO your extensive use of phrases such as "specifically optimized", "dramatically enhance", "a very native and efficient way", etc. won't help much and is rather irritating, like a TV commercial.

I am very well aware why your 'parser' routines are faster than fpjson and how much faster they can actually be, so please don't make untrue statements like those in post #43.

abouchez

« Reply #80 on: July 30, 2021, 08:34:33 am »
@y.ivanov
You can do whatever you want. If you require slow code and want to run everything in -O0 on a Z80, abusing the slow IX/IY registers, you can of course: my first computer was a 1MHz ZX81 and I wrote asm on it by poking hex into REM statements (!), so I found that the CP/M and Pascal in your signature were a bit too fast and lazy. ;)

But that is not what we do in production. We need to host as many clients as possible per server. Fast code helps a lot with that - and also helps the planet by being more "green".
I specifically wrote where the numbers come from: either Linux x86_64 with our memory manager, or Win32 with the FPC memory manager.
The initial post #43 had an issue with the JSON length used to benchmark fpjson. As soon as I discovered that, I acknowledged the mistake, apologized for it, fixed it, and provided new numbers.

I am sorry if you don't like "marketing" stuff, but I also provided the numbers.
To avoid sounding too much like marketing, I propose we take a look at the big picture of JSON and its real use.

I think the practical, user-side point of view is what matters. JSON is needed for real work, not for benchmarks or just validation.
- If the user needs to parse some input, the structure is most likely known. So a dynamic array and DynArrayLoadJson are an easy and efficient way to fill a record or a dynamic array of class instances (yes, that works too with mORMot).
- We use JSON as basic data representation in our ORM, between the objects and the data provider, and also when publishing a DB over REST: we introduced a direct DB access writing JSON from the SQL/NoSQL drivers with no TDataSet in between, for both reading (lists) and writing (even in batch insert/update mode): this is why our ORM could be so fast - faster than TDataSet for individual object SELECTs for instance, or bulk insertion.
- We use JSON as basic data representation in our SOA, for interface-based services. If you know the data structure, any class or record or array would be serialized very efficiently, without creating nodes in memory, but directly filling data structures. If you don't know the data structure, you specify a variant parameter and the JSON will be parsed or written using a TDocVariant custom variant instance.
- Whatever use we may imagine, one huge point about performance is to reduce memory allocation, especially in a multi-threaded server process. The whole mORMot code tends to allocate/reallocate as little as possible, to improve performance. This is also why we wrote a dedicated MM for x86_64, and why we added unique features like field names (or values) text interning as an option. Because it helps in the real world, even if it is slightly slower in micro benchmarks.
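
The text interning mentioned in the last point can be illustrated with a small sketch. It is not mORMot's implementation: `Intern` and `Pool` are hypothetical names, and a sorted TStringList stands in for whatever lookup structure a real library would use. The idea is only that equal field names end up sharing one string allocation instead of one per occurrence.

```pascal
program InternSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  Pool: TStringList; // sorted pool holding one instance of each distinct name

{ Hypothetical helper: returns the pooled instance of S, adding S on first use. }
function Intern(const S: string): string;
var
  Index: Integer;
begin
  if Pool.Find(S, Index) then
    Result := Pool[Index]   // reuse the string stored by an earlier call
  else
  begin
    Pool.Add(S);            // first occurrence: keep this instance
    Result := S;
  end;
end;

var
  A, B: string;
begin
  Pool := TStringList.Create;
  try
    Pool.CaseSensitive := True; // JSON names are case-sensitive
    Pool.Sorted := True;        // Find requires a sorted list

    // Build two equal strings at run time so they start as separate allocations.
    A := 'first' + IntToStr(1);
    B := 'first' + IntToStr(1);
    WriteLn('same allocation before interning: ', Pointer(A) = Pointer(B));

    A := Intern(A);
    B := Intern(B);
    WriteLn('same allocation after interning:  ', Pointer(A) = Pointer(B));
  finally
    Pool.Free;
  end;
end.
```

The lookup on every name is the extra cost that can show up in a micro benchmark; the saving is in memory when the same few names repeat across thousands of objects.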

A "JSON library to rule them all" is not so useful in practice. Only in benchmarks do you need to parse anything with no context.
If you need to create fpjson or jsontools classes by hand when writing SOA server or client side, it is less convenient than a set of dedicated JSON engines as mORMot does. In fact, even if the nodes are automatically created by the SOA library, it is always slower than a dedicated engine.
So in respect to production code, where performance matters, which is mainly on server side, putting TOrmTableJson results in a benchmark does make sense. And TOrmTableJson is definitely 20 times faster parsing a typical JSON ORM/REST result than fpjson on x86_64. And if the DB emits "not expanded" JSON (array of values instead of array of objects), it is 40 times faster in practice because there is less JSON to parse for the same dataset. This 40 times factor is a fact for this realistic dataset.

So here are this morning's numbers, on Debian x86_64, from FPC 3.2.0 in -O3 mode:
Code:
(people.json = array of 8227 ORM objects, for 1MB file)
     StrLen() in 828us, 23.1 GB/s
     IsValidUtf8(RawUtf8) in 1.45ms, 13.1 GB/s
     IsValidUtf8(PUtf8Char) in 2.21ms, 8.6 GB/s
     IsValidJson(RawUtf8) in 21.36ms, 917.7 MB/s
     IsValidJson(PUtf8Char) in 20.63ms, 0.9 GB/s
     JsonArrayCount(P) in 19.70ms, 0.9 GB/s
     JsonArrayCount(P,PMax) in 20.14ms, 0.9 GB/s
     JsonObjectPropCount() in 10.54ms, 1 GB/s

     TDocVariant in 121.68ms, 161.1 MB/s
     TDocVariant no guess in 127.85ms, 153.3 MB/s
     TDocVariant dvoInternNames in 147.27ms, 133.1 MB/s

     TOrmTableJson expanded in 37.57ms, 521.8 MB/s
     TOrmTableJson not expanded in 20.36ms, 423.5 MB/s
 (here the time is relevant because the JSON size is smaller: 20.36 ms instead of 777.7 ms)

     DynArrayLoadJson in 62.38ms, 314.3 MB/s

     fpjson in 77.78ms, 25.2 MB/s
 (run 10 times less because it is slower - and yes, the length is also div 10 and correct I hope)

(sample.json with a lot of nested documents)
     TDocVariant sample.json in 32.32ms, 405.7 MB/s
     TDocVariant sample.json no guess in 31.93ms, 410.6 MB/s
     fpjson sample.json in 11.25ms, 116.4 MB/s
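
As a cross-check, the factors quoted in this post can be recomputed from the people.json figures in the listing. This is only arithmetic on the posted numbers. It assumes, as the annotations in the listing say, that the fpjson run covered one tenth of the iterations, so its equivalent time is 77.78 ms × 10.

```latex
\begin{align*}
\text{TDocVariant vs fpjson (throughput)} &: \frac{161.1\ \text{MB/s}}{25.2\ \text{MB/s}} \approx 6.4 \\
\text{TOrmTableJson expanded vs fpjson (throughput)} &: \frac{521.8\ \text{MB/s}}{25.2\ \text{MB/s}} \approx 20.7 \\
\text{TOrmTableJson not expanded vs fpjson (throughput)} &: \frac{423.5\ \text{MB/s}}{25.2\ \text{MB/s}} \approx 16.8 \\
\text{TOrmTableJson not expanded vs fpjson (time)} &: \frac{77.78\ \text{ms} \times 10}{20.36\ \text{ms}} \approx 38.2
\end{align*}
```

The last two lines show why the post compares times for the not expanded case: the JSON is smaller for the same dataset, so the MB/s ratio understates the gain, while the elapsed-time ratio gives the roughly 40 times factor.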
« Last Edit: July 30, 2021, 01:38:23 pm by abouchez »

sysrpl

« Reply #81 on: July 31, 2021, 08:50:19 pm »
An update to JsonTools has been posted to its GitHub page. The escaped double-quoted string issue has been fixed and some helpful methods have been added. This page summarizes the changes:

https://www.getlazarus.org/json/#update

avk

« Reply #82 on: August 01, 2021, 01:12:56 pm »
@sysrpl, did I understand correctly that if any key in the JSON contains a slash, it will be impossible to find this key using TJsonNode.Find()?

sysrpl

« Reply #83 on: August 01, 2021, 03:59:20 pm »
Quote from: avk
@sysrpl, did I understand correctly, if any key in JSON contains a slash, then it will be impossible to find this key using TJsonNode.Find()?

As it currently stands, yes. The forward slash is used as a name separator, much like in XPath. If you wanted a free-text search that works with any keys, you'd need to provide a list of string keys.

For example:
Code: Pascal
// Walks down from N one key at a time; returns nil as soon as a key is missing.
function JsonFindKeys(N: TJsonNode; const Keys: array of string): TJsonNode;
var
  I: Integer;
begin
  Result := nil;
  for I := 0 to Length(Keys) - 1 do
  begin
    N := N.Child(Keys[I]);
    if N = nil then
      Break;
  end;
  Result := N;
end;

Usage:

Code: Pascal
S := JsonFindKeys(N, ['customer', 'first']).AsString;

If you want I can add this form of Find as a method overload.
« Last Edit: August 01, 2021, 04:02:19 pm by sysrpl »

avk

« Reply #84 on: August 01, 2021, 04:16:35 pm »
No thanks, I just wanted to clarify.

zoltanleo

« Reply #85 on: August 04, 2021, 07:40:10 am »
Hi sysrpl

I want to express my deep gratitude for the wonderful json parser. I have taken the liberty of translating this manual into Russian. I hope this will make your module more popular and motivate you for new projects.  ;)

Win10 LTSC x64/Deb 11 amd64/Darwin Cocoa (Big Sur):
Lazarus x32/x64 2.3(trunk); FPC 3.3.1 (trunk), FireBird 3.0.7

Sorry for my bad English, I'm using translator ;)

zoltanleo

« Reply #86 on: August 05, 2021, 10:41:54 pm »
Hi sysrpl

At the moment I check whether the file contains valid JSON by loading its contents into a string list. If the TryParse function returns true, I then load the contents of the file into the prepared node.

Code: Pascal
var
  RootNode: TJsonNode = nil;
  SL: TStringList = nil;
begin
  RootNode := TJsonNode.Create;
  SL := TStringList.Create;
  try
    try
      // check the validity of the file contents
      if FileExistsUTF8(ExtractFilePath(Application.ExeName) + jsonfile) then
      begin
        SL.LoadFromFile(ExtractFilePath(Application.ExeName) + jsonfile);

        // if the file content is valid JSON, load it
        if RootNode.TryParse(SL.Text) then
          RootNode.LoadFromFile(ExtractFilePath(Application.ExeName) + jsonfile);
      end;

      with RootNode do
      begin
        // some useful work
      end;

    finally
      FreeAndNil(SL);
      RootNode.Free;
    end;
  except
    on E: Exception do
      ShowMessage(Format('Error: %s' + LineEnding + LineEnding + '%s',
        [E.Message, SysErrorMessageUTF8(GetLastOSError)]));
  end;
end;

Can I ask you to add a function (something like TryToParseLoadingFile(const aFileName: string; out aNode: TJsonNode): boolean) that will combine these two operations? If the function returns true, you can read the parameters from the node aNode: TJsonNode.
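
Until such a function exists in the library, the two steps can be combined in user code. The following is a minimal sketch, assuming only the TJsonNode.Create, TryParse and Free calls already used in this thread; `TryParseFile` is a hypothetical name and is not part of JsonTools, and 'settings.json' is just a placeholder file name. The file is read once and parsed once, and on success the caller owns the returned node.

```pascal
program TryParseFileSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, JsonTools;

{ Hypothetical helper, not part of JsonTools. Returns True and a parsed node
  when the file exists and holds valid JSON; otherwise returns False and nil. }
function TryParseFile(const aFileName: string; out aNode: TJsonNode): Boolean;
var
  SL: TStringList;
begin
  Result := False;
  aNode := nil;
  if not FileExists(aFileName) then
    Exit;
  SL := TStringList.Create;
  try
    SL.LoadFromFile(aFileName);
    aNode := TJsonNode.Create;
    Result := aNode.TryParse(SL.Text);
    if not Result then
      FreeAndNil(aNode);   // do not hand back a node that failed to parse
  finally
    SL.Free;
  end;
end;

var
  Root: TJsonNode;
begin
  if TryParseFile('settings.json', Root) then
  try
    WriteLn('settings.json parsed');
    // read the parameters from Root here
  finally
    Root.Free;
  end
  else
    WriteLn('settings.json is missing or is not valid JSON');
end.
```

Read errors other than a missing file still raise exceptions from TStringList.LoadFromFile, so a caller that wants a pure Boolean result would need to catch those as well.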



zoltanleo

« Reply #87 on: August 06, 2021, 09:11:03 pm »
Hi all

Please tell me how can I save all the elements of a list as an array to a json file using a for..to loop?

engkin

« Reply #88 on: August 06, 2021, 09:54:37 pm »
Add a node of type array where you need it:
Code: Pascal
  Arr:=rootNode.Add('testArray',nkArray);

Now simply call Add, leaving the first parameter empty, to add the values one by one:
Code: Pascal
  Arr.Add('',s);

Quick test:
Code: Pascal
program project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes,SysUtils,
  JsonTools
  { you can add units after this };

var
  j,Arr:TJSONNode;
  s:string;
  sl:TStringList;
  i:integer;
begin
  {Test data}
  sl:=TStringList.Create;
  for i:=1 to 10 do
    sl.Add('Test'+i.ToString);

  j:=TJSONNode.Create;//nkObject by default
  Arr:=j.Add('TestArray',nkArray);
  for s in sl do
    Arr.Add('',s);

  j.SaveToFile('test.json');

  j.Free;
  sl.Free;
end.

The file it generates is:
Quote
{
   "TestArray": [
      "Test1",
      "Test2",
      "Test3",
      "Test4",
      "Test5",
      "Test6",
      "Test7",
      "Test8",
      "Test9",
      "Test10"
   ]
}
« Last Edit: August 06, 2021, 10:00:37 pm by engkin »

zoltanleo

« Reply #89 on: August 06, 2021, 10:29:48 pm »
Quote from: engkin
Now, simply use add, ignoring the first param, to add the values one by one:
Code: Pascal
  Arr.Add('',s);

Hi engkin.
Thank you for the answer. I didn't know there was such a way.

Thanks again!

 
