Author Topic: A new design for a JSON Parser  (Read 42691 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1088
  • Professional amateur ;-P
Re: A new design for a JSON Parser
« Reply #45 on: July 26, 2021, 07:59:07 pm »
Hey A.Bouchez,

If you want a fast JSON parser for FPC, you may try what mORMot 2 offers.

Is there a link you can provide with a simple example of how to get started using only mORMot 2's JSON parser?
Something with a short set of instructions for installing just the parser, without having to depend on the entirety of mORMot's code.

I would be eternally grateful for that!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Okoba

  • Hero Member
  • *****
  • Posts: 528
Re: A new design for a JSON Parser
« Reply #46 on: July 27, 2021, 10:33:37 am »
To get you started:
- Use mORMot2, which has a package for Lazarus: https://github.com/synopse/mORMot2
- Remember that some methods were renamed in version 2; the comments usually tell you what to use instead
- Always read the comments, they contain instructions
- Start with the variant version, as it is quick, easy and still very fast
- For more structured code, use the record or class way
- For the record and class ways, you will hit some issues when you use custom types: you will need to register them like I did, or register for custom events and other stuff. Read the blog and docs and search the forum if you need to. Almost all questions have already been answered.
- There are more JSON methods for arrays (e.g. JsonArrayCount) and for custom field reading. Read the code for more info, but you probably will not need them for daily stuff.

- Forum: https://synopse.info/forum/viewforum.php?id=2
- Docs: https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITLE_237
- Blog: https://blog.synopse.info/?tag/JSON/

Here is a sample:
Code: Pascal
program project1;

{$mode objfpc}{$H+}

uses
  mormot.core.base,
  mormot.core.text,
  mormot.core.json,
  mormot.core.variants;

type
  TTestClass = class(TSynAutoCreateFields)
  private
    FX: Integer;
    FY: String;
    FZ: TBooleanDynArray;
  published
    property X: Integer read FX write FX;
    property Y: String read FY write FY;
    property Z: TBooleanDynArray read FZ write FZ;
  end;

  TTestRecord = packed record //Need to be packed
    X: Integer;
    Y: String;
    Z: TBooleanDynArray;
  end;
const
  __TTestRecord = 'X: Integer;  Y: String; Z: TBooleanDynArray';

  procedure Decode;
  var
    S: RawUtf8;
    J: Variant;
    V: array[0..1] of TValuePUTF8Char;
    C: TTestClass;
    R: TTestRecord;
  begin
    S := '{"X":1,"Y":"Test","Z":[false,true]}';
    J := TDocVariant.NewJson(S);

    //Variant way
    WriteLn(J.X);
    WriteLn(J.Y);
    WriteLn(J.Z._(0));

    //TDocVariantData way
    WriteLn(TDocVariantData(J).S['Y']);

    //ObjectLoadJson Way
    C := TTestClass.Create;
    WriteLn(ObjectLoadJson(C, S));
    WriteLn(C.X);
    WriteLn(C.Z[0]);
    C.Free;

    //RecordLoadJson way
    RecordLoadJson(R, S, TypeInfo(TTestRecord));
    WriteLn(R.X);
    WriteLn(R.Z[0]);

    //JsonDecode way (Warning: Inplace and changes S)
    JsonDecode(S, ['X', 'Y'], @V);
    WriteLn(V[0].ToCardinal);
  end;

  procedure Encode;
  var
    C: TTestClass;
    R: TTestRecord;
  begin
    //ObjectToJson way
    C := TTestClass.Create;
    C.X := 1;
    C.Y := 'Test';
    C.Z := [False, True];
    WriteLn(ObjectToJson(C));
    C.Free;

    //RecordSaveJson way
    R.X := 1;
    R.Y := 'Test';
    R.Z := [False, True];
    WriteLn(RecordSaveJson(R, TypeInfo(TTestRecord)));

    //JsonEncode way
    WriteLn(JsonEncode(['X', 1, 'Y', 'Test', 'Z', '[', False, True, ']']));
  end;

begin
  //Only needed once
  TRttiJson.RegisterFromText(TypeInfo(TTestRecord), __TTestRecord, [], []);

  Decode;
  Encode;
  ReadLn;
end.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1088
  • Professional amateur ;-P
Re: A new design for a JSON Parser
« Reply #47 on: July 27, 2021, 04:42:26 pm »
Hey Okoba,

To get you started:
[...]
- Forum: https://synopse.info/forum/viewforum.php?id=2
- Docs: https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITLE_237
- Blog: https://blog.synopse.info/?tag/JSON/

This is freakin AWESOME, thank you SOOOO much Okoba!!!

I'll pore over all the code and blog posts you provided to wrap my head around a different paradigm of doing JSON.

I have to admit that, from the code you provided, it is quite a paradigm shift from the approach that fpjson takes ;)

Again, thank you SOO much for all the detailed info!!

Cheers,
Gus

Okoba

  • Hero Member
  • *****
  • Posts: 528
Re: A new design for a JSON Parser
« Reply #48 on: July 27, 2021, 04:47:11 pm »
Welcome!
If you like the fpjson approach, you may like the Variant way.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1088
  • Professional amateur ;-P
Re: A new design for a JSON Parser
« Reply #49 on: July 27, 2021, 04:52:51 pm »
Hey Okoba,

If you like the fpjson approach, you may like the Variant way.

It's not that I like it per se. It's just the only one I've been exposed to up till now. But I'll keep it in mind :)

I don't mind change and I'm actually really curious to learn this new approach, so again, many thanks for giving me a guide on how to tackle this new challenge ;)

Cheers,
Gus

alpine

  • Hero Member
  • *****
  • Posts: 1032
Re: A new design for a JSON Parser
« Reply #50 on: July 27, 2021, 09:32:32 pm »
To get you started:
- Use mORMot2, which has a package for Lazarus: https://github.com/synopse/mORMot2
- Remember that some methods were renamed in version 2; the comments usually tell you what to use instead
- Always read the comments, they contain instructions
- Start with the variant version, as it is quick, easy and still very fast
May I politely ask what the advantages are of using a Variant instead of fpjson.TJSONData and its descendants?

- For more structured code, use the record or class way
- For the record and class ways, you will hit some issues when you use custom types: you will need to register them like I did, or register for custom events and other stuff.
*snip*

What is the point when we have the fine fpjsonrtti unit with TJSONStreamer and TJSONDeStreamer?

Sorry for being off topic, but I don't really see a big difference.
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: A new design for a JSON Parser
« Reply #51 on: July 27, 2021, 09:58:19 pm »
I don't really see a big difference.

I am also interested. According to reply #43, it is 13 to 50 times faster. It would be nice to see the benchmark code.

Okoba

  • Hero Member
  • *****
  • Posts: 528
Re: A new design for a JSON Parser
« Reply #52 on: July 28, 2021, 04:42:50 am »
The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes. If you need more structured code, you should use the record or class way.
I am not very experienced with TJSONStreamer, but the mORMot version has options like:
- Auto creating and destroying fields (if you inherit from TSynAutoCreateFields)
- Support for records
- Many more options for handling custom types, enums, comments, keyword names in JSON (type, class)

The key thing when choosing between them: if you need more speed, more options or Delphi support, then mORMot seems the better option.

The benchmark code:
https://github.com/synopse/mORMot2/blob/087f740c577a0e38f83f8193874a343ed789fb46/test/test.core.data.pas#L2840


abouchez

  • Full Member
  • ***
  • Posts: 110
    • Synopse
Re: A new design for a JSON Parser
« Reply #54 on: July 28, 2021, 09:44:01 am »
I tried to include jsontools in the benchmark.
I downloaded the current version from https://github.com/sysrpl/JsonTools

Sadly, this library doesn't seem very well tested.
TryParse('["XS\"\"\"."]') fails, whereas this is valid JSON.
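
As a quick cross-check that the input really is valid JSON, here is a minimal sketch that feeds the same text to FPC's own fpjson parser. It is not part of the benchmark above; it assumes only the standard fpjson and jsonparser units, and the program name is made up for illustration.

```pascal
program CheckEscapedQuotes;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser; // jsonparser registers the parser used by GetJSON

var
  Data: TJSONData;
begin
  // Same text as the failing TryParse() call: an array holding one string
  // that contains three escaped double quotes.
  Data := GetJSON('["XS\"\"\"."]');
  try
    WriteLn('Elements: ', Data.Count);
    // The unescaped value is XS followed by three quotes and a dot
    WriteLn('Value   : ', Data.Items[0].AsString);
  finally
    Data.Free;
  end;
end.
```

If the parser accepts the text, GetJSON returns normally; a parser that stops at the first escaped quote would raise or return a truncated string instead.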

After a quick fix, I ran the benchmark tests:

Code:
  Some numbers on FPC 3.2 + Linux x86_64:
  - JSON benchmark: 100,299 assertions passed  810.30ms
     StrLen() in 820us, 23.3 GB/s
     IsValidUtf8(RawUtf8) in 1.46ms, 13 GB/s
     IsValidUtf8(PUtf8Char) in 2.23ms, 8.5 GB/s
     IsValidJson(RawUtf8) in 27.23ms, 719.8 MB/s
     IsValidJson(PUtf8Char) in 25.87ms, 757.6 MB/s
     JsonArrayCount(P) in 25.26ms, 775.9 MB/s
     JsonArrayCount(P,PMax) in 25.04ms, 783 MB/s
     JsonObjectPropCount() in 8.40ms, 1.3 GB/s
     TDocVariant in 118.81ms, 165 MB/s
     TDocVariant dvoInternNames in 145.08ms, 135.1 MB/s
     TOrmTableJson GetJsonValues in 22.88ms, 376.8 MB/s (write)
     TOrmTableJson expanded in 41.26ms, 475.1 MB/s
     TOrmTableJson not expanded in 21.44ms, 402.2 MB/s
     DynArrayLoadJson in 62.02ms, 316 MB/s
     fpjson in 79.36ms, 24.7 MB/s
     jsontools in 51.41ms, 38.1 MB/s
     SuperObject in 187.79ms, 10.4 MB/s

So mORMot 2 DynArrayLoadJson() is about 8 times faster than jsontools (316 MB/s vs 38.1 MB/s), and TDocVariant is more than 4 times faster (165 MB/s vs 38.1 MB/s).

The fix is a dirty goto (the fastest to write):
Code: Pascal
  if C^ = '"'  then
  begin
    repeat
fix:  Inc(C);
      if C^ = '\' then
      begin
        Inc(C);
        if C^ = '"' then
          goto fix
        else if C^ = 'u' then

I would not use a library with such limited testing, anyway.

alpine

  • Hero Member
  • *****
  • Posts: 1032
Re: A new design for a JSON Parser
« Reply #55 on: July 28, 2021, 09:53:51 am »
@Okoba,
Thank you for the info.

The Variant version is faster, not so much because it is a variant but because of mORMot's underlying JSON parsing. Being a variant makes it simpler to use, for some tastes.
By "simpler" I guess you mean writing J.X instead of C.Integers['X']. Both require a lookup, but while the former depends on some compiler magic to skip the quotes, the latter at least has a run-time type check. Both ways will require a Find('X') to ensure the attribute is present, so there won't be a "bang".

So the latter is more to my taste; it's just not so crafty.
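
To make the fpjson side of that comparison concrete, here is a minimal sketch of the "Find first, then read" pattern. It assumes only the standard fpjson and jsonparser units; the program name, the sample document and the Fallback variable are made up for illustration.

```pascal
program FindBeforeRead;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser;

var
  Data, Item: TJSONData;
  Obj: TJSONObject;
  Fallback: Integer;
begin
  Data := GetJSON('{"X":1,"Y":"Test"}');
  try
    Obj := Data as TJSONObject;

    // Find returns nil when the name is absent, so nothing is raised here
    Item := Obj.Find('X');
    if (Item <> nil) and (Item.JSONType = jtNumber) then
      WriteLn('X = ', Item.AsInteger);

    if Obj.Find('Missing') = nil then
      WriteLn('Missing is not present');

    // Get with a default value is another way to avoid the exception
    Fallback := -1;
    WriteLn('Missing or default = ', Obj.Get('Missing', Fallback));
  finally
    Data.Free;
  end;
end.
```

Reading Obj.Integers['Missing'] directly, without the Find check, is the case that goes "bang".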

If you need more structured code, you should use the record or class way.
I am not very experienced with TJSONStreamer, but the mORMot version has options like:
- Auto creating and destroying fields (if you inherit from TSynAutoCreateFields)
The mere existence of TSynAutoCreateFields is something that worries me. Hacking with the RTTI is a bummer, and how can it be justified? What if the RTTI layout changes? Is it portable?

- Support for records
- Many more options for handling custom types, enums, comments, keyword names in JSON (type, class)
*snip*
IMHO that framework tends to shift the Pascal paradigm towards something dynamically typed like, e.g., Python, which is something I don't agree with. But that is my personal opinion.
« Last Edit: July 28, 2021, 09:55:32 am by y.ivanov »

abouchez

  • Full Member
  • ***
  • Posts: 110
    • Synopse
Re: A new design for a JSON Parser
« Reply #56 on: July 28, 2021, 10:11:17 am »
Some hints:
- late binding is just one way of using the mORMot custom variant type - you are not required to use it - and in fact, I prefer to use the TDocVariantData record directly and only typecast it into a variant when I want to pass it on as such;
- the mORMot custom variant type is just a convenient way to store some object/array document, with built-in JSON support and automatic memory management by the compiler, like any variant or record; the mORMot ORM also uses such document variants to store any JSON/BSON in a SQL/NoSQL database, or to handle dynamic content from client/server SOA using interfaces; on Delphi (and, I hope, with fpdebug soon) you can even see the JSON content when you inspect any such variant value in the debugger - much appreciated, and impossible to do with a class or an interface;
- the more "pascalish" way is to use records and arrays of records with mORMot JSON serialization: there will be no lookup, minimal memory consumption, and the best performance (>300MB/s instead of 24MB/s for fpjson), with no compiler magic - just plain efficient pascal code;
- mORMot doesn't change the RTTI - TSynAutoCreateFields is just a way to auto-create nested published class instances in a class, which is very handy in some cases; what mORMot does is cache the RTTI for efficiency, in a cross-platform way.
« Last Edit: July 28, 2021, 10:24:13 am by abouchez »

abouchez

  • Full Member
  • ***
  • Posts: 110
    • Synopse
Re: A new design for a JSON Parser
« Reply #57 on: July 28, 2021, 10:22:53 am »
The \" parsing issue I found has been known since October 2019.
https://github.com/sysrpl/JsonTools/issues/11

But the https://github.com/sysrpl/JsonTools/issues/12 decimal dot problem is even more concerning.
« Last Edit: July 28, 2021, 10:36:34 am by abouchez »

alpine

  • Hero Member
  • *****
  • Posts: 1032
Re: A new design for a JSON Parser
« Reply #58 on: July 28, 2021, 10:40:17 am »
*snip*
- mORMot doesn't change the RTTI - TSynAutoCreateFields is just a way to auto-create nested published class instances in a class, which is very handy in some cases; what mORMot does is cache the RTTI for efficiency, in a cross-platform way.
I see.
You're building it, not changing it. Does it make a difference?

In mormot.core.json:
Code: Pascal
procedure AutoCreateFields(ObjectInstance: TObject);
var
  rtti: TRttiJson;
  n: integer;
  p: ^PRttiCustomProp;
begin
  // inlined ClassPropertiesGet
  rtti := PPointer(PPAnsiChar(ObjectInstance)^ + vmtAutoTable)^;
  if (rtti = nil) or
     not (rcfAutoCreateFields in rtti.Flags) then
    rtti := DoRegisterAutoCreateFields(ObjectInstance);
  p := pointer(rtti.fAutoCreateClasses);
  if p = nil then
    exit;
  // create all published class fields
  n := PDALen(PAnsiChar(p) - _DALEN)^ + _DAOFF; // length(AutoCreateClasses)
  repeat
    with p^^ do
      PPointer(PAnsiChar(ObjectInstance) + OffsetGet)^ :=
        TRttiJson(Value).fClassNewInstance(Value);
    inc(p);
    dec(n);
  until n = 0;
end;

and a lot of internal definitions in mormot.core.base.pas:
Code: Pascal
  /// cross-compiler negative offset to TDynArrayRec.high/length field
  // - to be used inlined e.g. as
  // ! PDALen(PAnsiChar(Values) - _DALEN)^ + _DAOFF
  // - both FPC and Delphi uses PtrInt/NativeInt for dynamic array high/length
  _DALEN = SizeOf(TDALen);

  /// cross-compiler adjuster to get length from TDynArrayRec.high/length field
  _DAOFF = {$ifdef FPC} 1 {$else} 0 {$endif};

  /// cross-compiler negative offset to TDynArrayRec.refCnt field
  // - to be used inlined e.g. as PRefCnt(PAnsiChar(Values) - _DAREFCNT)^
  _DAREFCNT = Sizeof(TRefCnt) + _DALEN;

  // ... and a lot more FPC/Delphi internal layouts ...
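
For readers unfamiliar with what those offsets refer to, here is a minimal FPC-only sketch of the same idea, written without any mORMot units. It assumes the FPC dynamic array header described in the comments above (a PtrInt-sized "high" field stored right before the first element); that layout is a compiler internal rather than a documented contract, and the program and variable names are made up for illustration.

```pascal
program DynArrayHeader;

{$mode objfpc}{$H+}

var
  Values: array of Integer;
  StoredHigh: PtrInt;
begin
  SetLength(Values, 5);

  // Assumption: FPC stores "high" (length - 1) in the PtrInt just before
  // the data, which is why the FPC branch of _DAOFF above is 1.
  StoredHigh := PPtrInt(PAnsiChar(Pointer(Values)) - SizeOf(PtrInt))^;

  WriteLn('Length(Values)  = ', Length(Values));
  WriteLn('header high + 1 = ', StoredHigh + 1);
end.
```

On an FPC build where the assumption holds, the two lines should report the same number. The empty-array case is deliberately left out, since an empty dynamic array is a nil pointer with no header to read.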

I believe those defs aren't for patching, right?

abouchez

  • Full Member
  • ***
  • Posts: 110
    • Synopse
Re: A new design for a JSON Parser
« Reply #59 on: July 28, 2021, 10:56:05 am »
> You're building it, not changing it. Does it make a difference?

I am not sure I understand what you mean.
We are not building it, we are using it.
In AutoCreateFields() we don't build anything; we just cache the RTTI and the classes of its published properties the first time we use this class.
Then fClassNewInstance() is a very efficient way of creating each needed class instance, with the proper virtual constructor if needed.

The FPC internal layouts are used to bypass the RTL when it makes a difference.
See mormot.core.rtti.pas about how we use the official typinfo unit as source, but encapsulate it into a Delphi/FPC compatible wrapper, and also introduce some RTTI cache as TRttiCustom/TRttiJson classes, with ready-to-use methods and settings.

mORMot users don't need to delve into those details. They just use the high level methods like JSON, ORM or SOA, letting the low level framework do its work.
Most of the low level code is deeply optimized, with a lot of pointer arithmetic for sure, and sometimes with a huge amount of asm (up to AVX2/BMI SIMD), but it is transparent to the user, and cross-platform.

If you look at the code generated for AutoCreateFields(), once inlined into the class constructor, you will see:
Code:
MORMOT.CORE.JSON$_$TSYNAUTOCREATEFIELDS_$__$$_CREATE$$TSYNAUTOCREATEFIELDS PROC
        push    rbx                                     ; 0000 _ 53
.....
        mov     rax, qword ptr [rsp+8H]                 ; 0072 _ 48: 8B. 44 24, 08
        mov     rax, qword ptr [rax]                    ; 0077 _ 48: 8B. 00
        mov     rbx, qword ptr [rax+48H]                ; 007A _ 48: 8B. 58, 48
        test    rbx, rbx                                ; 007E _ 48: 85. DB
        jz      ?_2462                                  ; 0081 _ 74, 09
        test    dword ptr [rbx+3CH], 4000H              ; 0083 _ F7. 43, 3C, 00004000
        jnz     ?_2463                                  ; 008A _ 75, 0D
?_2462: mov     rdi, qword ptr [rsp+8H]                 ; 008C _ 48: 8B. 7C 24, 08
        call    MORMOT.CORE.JSON_$$_DOREGISTERAUTOCREATEFIELDS$TOBJECT$$TRTTIJSON; 0091 _ E8, 00000000(PLT r)
        mov     rbx, rax                                ; 0096 _ 48: 89. C3
?_2463: mov     r12, qword ptr [rbx+0DCH]               ; 0099 _ 4C: 8B. A3, 000000DC
        test    r12, r12                                ; 00A0 _ 4D: 85. E4
        jz      ?_2465                                  ; 00A3 _ 74, 35
        mov     rax, qword ptr [r12-8H]                 ; 00A5 _ 49: 8B. 44 24, F8
        lea     rbx, ptr [rax+1H]                       ; 00AA _ 48: 8D. 58, 01
ALIGN   8
?_2464: mov     r13, qword ptr [r12]                    ; 00B0 _ 4D: 8B. 2C 24
        mov     rdi, qword ptr [r13]                    ; 00B4 _ 49: 8B. 7D, 00
        mov     rax, qword ptr [r13]                    ; 00B8 _ 49: 8B. 45, 00
        call    qword ptr [rax+0D4H]                    ; 00BC _ FF. 90, 000000D4
        mov     rcx, qword ptr [rsp+8H]                 ; 00C2 _ 48: 8B. 4C 24, 08
        mov     rdx, qword ptr [r13+8H]                 ; 00C7 _ 49: 8B. 55, 08
        add     rdx, rcx                                ; 00CB _ 48: 01. CA
        mov     qword ptr [rdx], rax                    ; 00CE _ 48: 89. 02
        add     r12, 8                                  ; 00D1 _ 49: 83. C4, 08
        sub     ebx, 1                                  ; 00D5 _ 83. EB, 01
        jnz     ?_2464                                  ; 00D8 _ 75, D6
?_2465: mov     qword ptr [rsp+10H], 1                  ; 00DA _ 48: C7. 44 24, 10, 00000001
.....
The resulting asm is really optimized, as fast as manually written asm could be, even though it was written in plain pascal.
It may be confusing to read, but it is how we achieve the best performance.
But it is still real cross-platform pascal, and the very same code works on ARM32 or AARCH64 with no problem, and with good performance.

In the mORMot core, we use the pascal language as a "portable assembler", as C is used in the Linux kernel or the SQLite3 library, for instance.
It may be confusing, but it is similar to what is done in the lowest parts of the FPC RTL.
This is how we made our JSON parsing an order of magnitude faster than FPC/Delphi alternatives, in plain pascal code: by looking deeply at the generated assembly and aggressively profiling the code, following the https://www.agner.org/optimize reference material.
« Last Edit: July 28, 2021, 10:59:45 am by abouchez »

 
