Lazarus
Programming => General => Topic started by: sysrpl on August 30, 2019, 10:20:04 am
-
I've posted a new page that tests the speed and correctness of several pascal based JSON parsers.
https://www.getlazarus.org/json/tests/
In full disclosure, I am the author of the new open source JsonTools library, and even though my parser seems to be a big improvement over the other alternatives, my tests were not biased.
If anyone would like help in replicating the tests, let me know and I'll see what I can do.
Also, to be thorough, you should read through both the article I posted at the top of this message and my original page (https://www.getlazarus.org/json), which has been updated with more information. Both pages took some time to write, and I promise if you read through them some of your questions will be answered without having to ask others for help or insight.
-
Thank you for sharing this elegantly and compactly coded tool, which is most impressive.
However, I could not get the tests.lpr in the /tests folder to compile. It looks as though tests.lpr was designed for a different incarnation of jsontools.pas?
-
Your parser seems not to handle 4-byte UTF-16 codepoints... It seems limited to the 2-byte subset of UTF-16, a.k.a. UCS-2? I still have to test it, though; this observation is based on reading your code.
-
Thaddy, it supports UTF8, but strictly adheres to the JSON spec, which says you can only use 4-character hex escapes with \u. It also says that it supports UTF8. What this means is that if you want a 4-byte Unicode character, don't try to encode it in hex; just use the 4-byte character directly.
So to do this you would write:
{ "name": "𠜎" }
Instead of trying to write:
{ "name": "\u2070E" }
Does that make any sense?
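For what it's worth, the behavior described above is easy to see with any spec-conforming parser; the sketch below uses Python's json module purely as a neutral illustration (it is not the JsonTools API). Because the grammar takes exactly four hex digits after \u, the text "\u2070E" is read as U+2070 followed by a literal "E". The spec's alternative for code points outside the BMP is a UTF-16 surrogate pair escape, and embedding the raw character also works:

```python
import json

# "\u2070E" is parsed as the 4-hex-digit escape \u2070 plus a literal "E",
# not as the single code point U+2070E.
doc = json.loads('{"name": "\\u2070E"}')
assert doc["name"] == "\u2070" + "E"

# A code point outside the BMP can still be escaped as a UTF-16 surrogate pair...
pair = json.loads('{"name": "\\uD841\\uDF0E"}')
# ...or embedded directly as a raw character (4 bytes when encoded as UTF-8).
raw = json.loads('{"name": "\U0002070E"}')
assert pair["name"] == raw["name"] == "\U0002070E"
```

Any of the three spellings is legal JSON input, but only the surrogate pair and the raw character produce U+2070E.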
howardpc,
Sorry about that, I forgot to add tests.lpr back into the git repo and had only been adding the jsontools.pas unit. I've pushed the newer version of tests.lpr and it's fixed. Thanks for noticing.
-
Thaddy, it supports UTF8, but strictly adheres to the JSON spec
That's a contradiction in terms: JSON specifies UTF-16; see the final draft of ECMA-404, second edition, section 4: JSON text.
It is also implied by the ECMAScript specification, which is likewise UTF-16.
So even if UTF-8 is supported, its format should be UTF-16.
-
From the current specification, RFC 8259 (https://tools.ietf.org/html/rfc8259), as also noted on Wikipedia:
8.1. Character Encoding
JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].
Previous specifications of JSON have not required the use of UTF-8
when transmitting JSON text. However, the vast majority of JSON-
based software implementations have chosen to use the UTF-8 encoding,
to the extent that it is the only encoding that achieves
interoperability.
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
From https://www.json.org/ :
string -> \ -> u -> 4 hex digits
UTF-8 allows for encoding and decoding of 4-byte characters.
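The two RFC 8259 points quoted above (UTF-8 as the interchange encoding, and the BOM rule) can be demonstrated with any conforming implementation; this sketch uses Python's json module as a neutral example, not the Pascal library under discussion:

```python
import codecs
import json

# A code point outside the BMP occupies 4 bytes in UTF-8 and round-trips cleanly.
payload = json.dumps({"name": "\U0002070E"}, ensure_ascii=False).encode("utf-8")
assert b"\xf0\xa0\x9c\x8e" in payload          # the 4-byte UTF-8 sequence for U+2070E
assert json.loads(payload)["name"] == "\U0002070E"

# RFC 8259: writers MUST NOT emit a BOM, but parsers MAY ignore one.
# Python's json module takes the lenient path and accepts a leading UTF-8 BOM.
bom_payload = codecs.BOM_UTF8 + b'{"a": 1}'
assert json.loads(bom_payload) == {"a": 1}
```

Note that the writer above never emits a BOM itself; tolerating one on input while never producing one on output is exactly the interoperability posture the RFC describes.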
-
I've posted a new page that tests the speed and correctness of several pascal based JSON parsers.
https://www.getlazarus.org/json/tests/
Thanks for the code and also the test page. I found it intriguing. There is one typo that I noticed on the second test:
"100,00 times."
I expect this to be: "100,000 times."
-
100,000 fixed. Thanks for noticing.
-
From the current specification, RFC 8259 (https://tools.ietf.org/html/rfc8259), as also noted on Wikipedia:
The only reliable source is 8.2, not 8.1, and it is not accepted.
Furthermore: it will not be accepted because of what is described in 8.1. Usually one is a lot smarter than giving in to inappropriate use of, or deviation from, underlying standards.
It would be a first: it has no technical merit.
To paraphrase:
We are not all elderly British that try to destroy their children's future... Some English bury their heads in the sand about Brexit and listen to a blond with dual nationality... - not Swedish 8-) - (temporarily on topic... today... 8-) I am still a political scientist too.. :-X )
Because he can't https://www.nytimes.com/2017/02/08/world/europe/britain-boris-johnson-renounces-american-citizenship.html according to a certain average golfer...
And so he proves himself an opportunist...which we can translate to certain prediction algorithms... (Yes, it IS on topic... just.... ;D ;D :D :'( )
Thank you to everyone that is actually following my reasoning... You do not have to agree..
-
Thanks, hope this can speed up our application. :)
-
Thanks, hope this can speed up our application. :)
Original author here. Let me know if my library works better for you.
-
It would be nice to add this lib as a package and make it available through OPM.
The library is dual-licensed as GPL3 and LGPL3. As I understand it, LGPL3 forces derivative work (including modifications or anything statically linked to the library) to be redistributed only under LGPL3. Simply including your unit makes it statically linked by default, so it seems that my application would then be forced to be LGPL3? You do not allow linking exceptions like FPC and LAZ, do you?
Anyway thanks for sharing the library. ;)
-
Thanks, hope this can speed up our application. :)
Original author here. Let me know if my library works better for you.
Thanks, really impressive, it's super fast now!!! =)