So this could be a case study for dynamic arrays. Perhaps the size exceeds the maximum?
The maximum size of a dynamic array is High(SizeInt), so it differs between 32- and 64-bit: SizeInt is LongInt on 32-bit targets and Int64 on 64-bit targets.
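For illustration, a tiny program (my own snippet, not from the thread) that prints the platform limit:

program SizeIntDemo;
{$mode objfpc}
begin
  // High(SizeInt) = 2147483647 on 32-bit, 9223372036854775807 on 64-bit.
  WriteLn('Max dynamic array length: ', High(SizeInt));
end.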
I seem to have hit either a bug or limitation of dynamic array. Attached is my current attempt, it crashes when reading stop_times.txt (that file has 1790906 rows!) with the following stack trace:<...>
I guess the reason is mainly in this line:
Hell, you're correct, alongside another SetLength in GetTrips. I guess Go doesn't allocate immediately while FPC does, and here it allocates too much. Now it succeeds; indeed the amount of RAM eaten is almost 2GB (1.8GB to be exact). I'll continue with the benchmark, could be fun to see where FPC sits.
running (0m43.4s), 00/50 VUs, 50 complete and 0 interrupted | running (0m34.0s), 00/50 VUs, 286 complete and 0 interrupted
default ✓ [======================================] 50 VUs 30 default ✓ [======================================] 50 VUs 30
data_received..................: 658 MB 15 MB/s | data_received..................: 28 GB 823 MB/s
data_sent......................: 106 kB 2.4 kB/s | data_sent......................: 2.7 MB 78 kB/s
http_req_blocked...............: avg=6.81ms min=0s | http_req_blocked...............: avg=7.66µs min=1.08µs
http_req_connecting............: avg=6.38ms min=0s | http_req_connecting............: avg=2.12µs min=0s
http_req_duration..............: avg=428.67ms min=0s | http_req_duration..............: avg=57.8ms min=98.58µs
{ expected_response:true }...: avg=2.32s min=1.04ms | { expected_response:true }...: avg=57.8ms min=98.58µs
http_req_failed................: 85.63% ✓ 4239 ✗ 7 | http_req_failed................: 0.00% ✓ 0 ✗ 2
http_req_receiving.............: avg=1.96ms min=0s | http_req_receiving.............: avg=7.16ms min=15.06µs
http_req_sending...............: avg=484.72µs min=0s | http_req_sending...............: avg=42.78µs min=5.11µs
http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s
http_req_waiting...............: avg=426.23ms min=0s | http_req_waiting...............: avg=50.59ms min=74.66µs
http_reqs......................: 4950 114.065486/s | http_reqs......................: 28314 831.923149/s
iteration_duration.............: avg=43.35s min=43.25s | iteration_duration.............: avg=5.73s min=3.91s
iterations.....................: 50 1.152177/s | iterations.....................: 286 8.403264/s
vus............................: 50 min=50 max | vus............................: 4 min=4 max
vus_max........................: 50 min=50 max vus_max........................: 50 min=50 max
And I got so many connection refused errors (see that http_req_failed? It means only about 15% of all requests were handled successfully). Lastly, fphttpserver is really not meant for high performance; I remember Marco saying so a couple of years ago. So maybe I need an alternative implementation tuned for performance.
The latest code can be found on my fork (https://github.com/leledumbo/transit-lang-cmp/tree/main/trascal) of the original repo. I don't think I will make a PR until the performance is satisfactory.
<...>
I guess Go doesn't allocate immediately while FPC does, and here it allocates too much.
<...>
Maybe base it on mORMot2 (https://github.com/synopse/mORMot2) instead?
I'll check if this example (https://github.com/synopse/mORMot2/blob/master/ex/http-server-raw/httpServerRaw.dpr) is enough to use as a base.
It's just that in the Go version, stopTimes is not an array but a slice, whose capacity can be increased on demand.
All of my arrays are dynamic as well, which should be equivalent to a Go slice.
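That said, a dynamic array only behaves like a slice if it is grown with spare capacity; SetLength(A, Length(A) + 1) per element reallocates every time. A minimal sketch of the amortized-doubling idea (the Append helper and the external Count variable are mine, not from the repo):

{$mode objfpc}
type
  TIntArray = array of Integer;

// Grow the backing store geometrically, like Go's append():
// SetLength is only hit when the array is full, so appends are
// amortized O(1) instead of one reallocation per element.
procedure Append(var A: TIntArray; var Count: SizeInt; const Value: Integer);
begin
  if Count = Length(A) then
    if Length(A) = 0 then
      SetLength(A, 16)
    else
      SetLength(A, Length(A) * 2);
  A[Count] := Value;
  Inc(Count);
end;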
BTW, on my machine TCSVDocument parses stop_times.txt in about 14 seconds. I suspect that even loading the file into a TStringList and then parsing it line by line using string.Split() would go about 6-7 times faster.
Either that, or I'll just stream it along the way; no idea which one is faster. But I still think TCSVDocument is just not made with performance in mind (plus that Cells property is kinda arrrgh because it puts column first instead of row first).
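A minimal sketch of the TStringList idea, under the assumption that this dataset has no quoted fields (a plain comma split is not a general CSV parser):

program SplitDemo;
{$mode objfpc}{$H+}
uses Classes, SysUtils;
var
  Lines: TStringList;
  Fields: TStringArray;
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile('../MBTA_GTFS/stop_times.txt');
    for i := 1 to Lines.Count - 1 do  // row 0 is the header
    begin
      // Naive split: fine for this data, wrong for quoted CSV fields.
      Fields := Lines[i].Split([',']);
      // Fields[0] = trip_id, Fields[1] = arrival_time,
      // Fields[2] = departure_time, Fields[3] = stop_id
    end;
  finally
    Lines.Free;
  end;
end.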
running (0m52.6s), 00/15 VUs, 15 complete and 0 interrupted | running (0m30.8s), 00/15 VUs, 301 complete and 0 interrupted
default ✓ [======================================] 15 VUs 30 default ✓ [======================================] 15 VUs 30
data_received..................: 1.5 GB 28 MB/s | data_received..................: 30 GB 956 MB/s
data_sent......................: 277 kB 5.3 kB/s | data_sent......................: 2.8 MB 91 kB/s
http_req_blocked...............: avg=139.3µs min=66.52µ | http_req_blocked...............: avg=4.79µs min=1.06µs
http_req_connecting............: avg=2.64µs min=0s | http_req_connecting............: avg=725ns min=0s
http_req_duration..............: avg=526.63ms min=2.51ms | http_req_duration..............: avg=15.24ms min=85.57µs
{ expected_response:true }...: avg=526.63ms min=2.51ms | { expected_response:true }...: avg=15.24ms min=85.57µs
http_req_failed................: 0.00% ✓ 0 ✗ 14 | http_req_failed................: 0.00% ✓ 0 ✗ 2
http_req_receiving.............: avg=1.84ms min=34.02µ | http_req_receiving.............: avg=1.41ms min=12.72µs
http_req_sending...............: avg=19.56ms min=19.47µ | http_req_sending...............: avg=49.25µs min=4.83µs
http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s
http_req_waiting...............: avg=505.22ms min=181.48 | http_req_waiting...............: avg=13.78ms min=54.7µs
http_reqs......................: 1485 28.236484/s | http_reqs......................: 29799 966.969529/s
iteration_duration.............: avg=52.14s min=51.45s | iteration_duration.............: avg=1.51s min=898.64m
iterations.....................: 15 0.285217/s | iterations.....................: 301 9.767369/s
vus............................: 11 min=11 max= | vus............................: 15 min=15 max
vus_max........................: 15 min=15 max= | vus_max........................: 15 min=15 max
- Only one last bit: the HTTP server. I just realized that the number of iterations = number of VUs in both the previously failed test and the last successful one; it seems the connection isn't properly closed after sending the response, leaving each VU stuck, unable to issue another request.
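If the connection really is left dangling, one blunt workaround would be to opt out of keep-alive entirely; a sketch of an fphttpserver request handler doing that (untested here, and it trades throughput for predictable connection teardown):

{$mode objfpc}{$H+}
uses Classes, httpdefs, fphttpserver;

type
  TMyHandler = class
    procedure HandleRequest(Sender: TObject;
      var ARequest: TFPHTTPConnectionRequest;
      var AResponse: TFPHTTPConnectionResponse);
  end;

procedure TMyHandler.HandleRequest(Sender: TObject;
  var ARequest: TFPHTTPConnectionRequest;
  var AResponse: TFPHTTPConnectionResponse);
begin
  // Ask the server to close the socket after each response,
  // so a stuck keep-alive connection cannot block a VU.
  AResponse.SetCustomHeader('Connection', 'close');
  AResponse.Content := '...';
end;

// Wired up with: Server.OnRequest := Handler.HandleRequest;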
EDIT: forgot to commit csvutils.pas. It now uses a low-level implementation that speeds up reading stop_times.txt by another 33% (7s -> 5s). I have no idea what technique Go employs to reach that 2s. Impressive stuff.
EDIT: forgot to commit csvutils.pas. It now uses a low-level implementation that speeds up reading stop_times.txt by another 33% (7s -> 5s). I have no idea what technique Go employs to reach that 2s. Impressive stuff.
Curious, which platform did you test on? I just ran your latest version against Go in a Linux x86-64 virtual machine and got a slightly different result:
You could try with FPC 3.3.1, as there were some fixes (https://gitlab.com/freepascal.org/fpc/source/-/commit/a1a30876d596e9bca2a5409b53b0fc637eda5dfd) regarding the closing of sockets recently (don't know if that is what affects you, however).
I do use 3.3.1, updated like 3 days ago or so. Let me update again.
What is the time if you simply read the file with TFileStream (discarding the data, only the raw read time)?
You mean only the
Curious, which platform did you test on? I just ran your latest version against Go in a Linux x86-64 virtual machine and got a slightly different result:
Same Linux x86_64, but on a real machine: i7-7700HQ, DDR4-2400 dual channel, SanDisk Extreme Portable SSD V2 500GB over USB 3.0.
go version: go1.19.3 linux/amd64
go:  parsed 1739278 stop times in 2.3342127s
fpc: parsed 1739278 stop times in 2.470 seconds
fpc version: 3.3.1-12058-g9b6926c5f5 linux x86_64
Same Linux x86_64, but on a real machine: i7-7700HQ, DDR4-2400 dual channel, SanDisk Extreme Portable SSD V2 500GB over USB 3.0.
EDIT: Optimized by replacing the direct dynamic array with TVector, which has a quite efficient growth factor; I gained an additional 250ms.
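For reference, fcl-stl's vector lives in the gvector unit; a minimal usage sketch:

program VectorDemo;
{$mode objfpc}
uses gvector;
type
  TIntVector = specialize TVector<Integer>;
var
  V: TIntVector;
  i: Integer;
begin
  V := TIntVector.Create;
  try
    for i := 1 to 1000000 do
      V.PushBack(i);  // capacity grows geometrically: few reallocations
    WriteLn('size = ', V.Size);
  finally
    V.Free;
  end;
end.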
Hmm, then it's even more curious; my virtual Linux is running on an ancient i3-4150 with an HDD.
That's sick, how come it beats my 4-generations-younger CPU with an SSD?
It seems even more:
parsed 1739278 stop times in 2.124 seconds
Same reason as above, I think.
I may have missed something, but what are the optimization settings for Go and FPC? And which linker is used?
Go: none, they don't have individual optimization switches, only all (default) or none. FPC uses -CX -XXs -O3 (tried -O4, doesn't matter much). The linker is also the default for both.
Furthermore, language comparisons are futile, since what really matters is the compiler and linker implementation.
Sure, I do realize this; that's why I said FPC, not Pascal. I still want to know where FPC lies in the performance realm relative to more mainstream language implementations. I consider only Google Go's gc, ignoring gccgo and gollvm even though all three are officially supported.
In principle there is no such thing as one language being faster than another. It is all about how the compiler generates efficient code and how the linker can further optimize.
The above is a constant sorrow and shows that not many people get how it really works...
Furthermore, language comparisons are futile, since what really matters is the compiler and linker implementation.
Don't forget about libraries too.
Libraries by themselves are by their very nature not part of it: they have nothing to do with code generation.
BTW, lines 245-246 of app.pas look suspicious. Seems like it should be:
Nice catch, it might be a problem for correctness, though not performance.
LStopTimesIx.Add(i - 1);
AStopTimes[i - 1] := TStopTime.Create(LTrip, LCSV.Cells[3, i], LCSV.Cells[1, i], LCSV.Cells[2, i]);
There is another question: is it really necessary to fully parse a multi-megabyte CSV document if only its first 4 columns are needed?
My expectation is to use generic (in the sense of not specifically tailored for this need) CSV-loading code, because the Go version also uses the generic encoding/csv package. Hence, the whole parsing still needs to be done. Using TMemoryStream might be a good idea, since FPC doesn't yet optimize array indexing with loop variables (I requested it years ago, but FPK only came up with a showcase and never really committed it to the repo, AFAIR), which I believe Go's gc has. So using explicit pointers is the way to go.
For example, this version works out in 1.317 s, and if primitives from LGenerics are used, then in 0.810 s:
...
type
...
  TIntList = specialize TList<Integer>;
  TStringIntListMap = specialize TObjectDictionary<string, TIntList>;
  TStopTimeDynArr = specialize TObjectList<TStopTime>;
...
procedure GetStopTimes(out AStopTimes: TStopTimeDynArr; out AStopTimesIxByTrip: TStringIntListMap);
type
  TKeySet = array[0..3] of string;

  procedure ParseLine(p, pEnd: PChar; out Keys: TKeySet);
  var
    Idx: Integer;
    pStart: PChar;
  begin
    pStart := p;
    Idx := 0;
    while p <= pEnd do
    begin
      if p^ = ',' then
      begin
        SetLength(Keys[Idx], p - pStart);
        Move(pStart^, Keys[Idx][1], p - pStart);
        if Idx = 3 then exit;
        Inc(Idx);
        pStart := p + 1;
      end;
      Inc(p);
    end;
  end;

var
  ms: TMemoryStream;
  LStart, LEnd: TDateTime;
  p, pStart, pStop: PChar;
  LStopTimesIx: TIntList;
  k: TKeySet;
  HeaderOk: Boolean = False;
begin
  ms := TMemoryStream.Create;
  try
    LStart := Now;
    ms.LoadFromFile('../MBTA_GTFS/stop_times.txt');
    p := ms.Memory;
    pStop := p + ms.Size;
    pStart := nil;
    AStopTimesIxByTrip := TStringIntListMap.Create([doOwnsValues]);
    AStopTimes := TStopTimeDynArr.Create;
    while p < pStop do
    begin
      if p^ in [#10, #13] then
        if pStart <> nil then
        begin
          ParseLine(pStart, p - 1, k);
          if HeaderOk then
          begin
            if not AStopTimesIxByTrip.TryGetValue(k[0], LStopTimesIx) then
            begin
              LStopTimesIx := TIntList.Create;
              AStopTimesIxByTrip.Add(k[0], LStopTimesIx);
            end;
            LStopTimesIx.Add(AStopTimes.Count);
            AStopTimes.Add(TStopTime.Create(k[0], k[3], k[1], k[2]));
          end
          else
          begin
            if (k[0] <> 'trip_id') or (k[3] <> 'stop_id') or
               (k[1] <> 'arrival_time') or (k[2] <> 'departure_time') then
            begin
              WriteLn('stop_times.txt not in expected format.');
              Halt(1);
            end;
            HeaderOk := True;
          end;
          pStart := nil;
        end
        else
      else
        if pStart = nil then
          pStart := p;
      Inc(p);
    end;
  finally
    ms.Free;
  end;
  LEnd := Now;
  WriteLn('parsed ', AStopTimes.Count, ' stop times in ', SecondSpan(LStart, LEnd):1:3, ' seconds');
end;
...
Libraries by themselves are by their very nature not part of it: they have nothing to do with code generation.
Modifying csvutils.TCSVDocument.LoadFromFile(const AFileName: String) to precompute the number of rows and do a single initial SetLength(FCells, lRows) seems to improve GetStopTimes by roughly 20%.
Still slower than the latest commit, which is based on TVector:
parsed 1790905 stop times in 4600.000ms
parsed 71091 trips in 180.000ms
vs:
parsed 1790905 stop times in 5266.000ms
parsed 71091 trips in 211.000ms
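The precount idea from the quote, roughly (my own sketch, not the actual csvutils patch): scan the buffer once for line ends, then allocate the row array with a single SetLength before parsing.

// Count data rows so FCells can be sized once up front.
function CountRows(Buf: PChar; Size: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to Size - 1 do
    if Buf[i] = #10 then
      Inc(Result);
  // A final line without a trailing newline still counts as a row.
  if (Size > 0) and (Buf[Size - 1] <> #10) then
    Inc(Result);
end;

// In LoadFromFile, something like:
//   SetLength(FCells, CountRows(ms.Memory, ms.Size));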
Starfighters usually crashed.
https://en.wikipedia.org/wiki/Lockheed_F-104_Starfighter
Probably, lgList is not a very good name; this unit contains some special kinds of lists: sorted, hashed. You can look into the lgVector unit.
Oh, my dear eyes. How the heck did they skip "vector" while it's so clear... maybe I shouldn't code after midnight, haha.
Although TVector from fcl-stl is quite fast, it seems that this version of CSVDocument
<code intentionally skipped for brevity>
is 8-9 percent faster than your TVector-based one.
And accordingly, this version of GetStopTimes()
<code intentionally skipped for brevity>
works out in about 1.640 s.
Wow! Almost another full second drop on my system!
parsed 1790905 stop times in 3.846 seconds
parsed 71091 trips in 155.000ms
This is how the fun in code optimizing should be! Great job, avk!
Thank you. It seems possible to squeeze a little more out of TCSVDoc; at least my Windows version of this
<code intentionally skipped for brevity>
shows a 7-8 percent performance improvement.
Yep, another 200ms improvement. Getting closer to C#. I still think we can do something about the else part of that case, as it will be the most frequent match; basically we check pCell for nil on every iteration, which is not so good.
<...>
I still think we can do something about the else part of that case, as it will be the most frequent match; basically we check pCell for nil on every iteration, which is not so good.
Something like this?
Yep, but I'm surprised the improvement is so small, though consistent. Results of three runs, old code:
$ ./app
parsed 1790905 stop times in 3.389 seconds
parsed 71091 trips in 133.000ms
$ ./app
parsed 1790905 stop times in 3.396 seconds
parsed 71091 trips in 133.000ms
$ ./app
parsed 1790905 stop times in 3.386 seconds
parsed 71091 trips in 131.000ms
New code:
$ ./app
parsed 1790905 stop times in 3.373 seconds
parsed 71091 trips in 128.000ms
$ ./app
parsed 1790905 stop times in 3.391 seconds
parsed 71091 trips in 128.000ms
$ ./app
parsed 1790905 stop times in 3.378 seconds
parsed 71091 trips in 127.000ms
Nevertheless, I think this is good enough. I'm working on the HTTP server now; I found that the Fundamentals library (both version 4 and 5) also has one, but mORMot 2 might be the first I try, as mORMot (1) has always been very concerned with benchmarks and performance.
buildTripResponse: 1.324407ms
jsonify: 16.314437ms
buildTripResponse: 912.183µs
jsonify: 10.437676ms
buildTripResponse: 779.098µs
jsonify: 8.114551ms
buildTripResponse: 103.812µs
jsonify: 1.047072ms
buildTripResponse: 68.561µs
jsonify: 953.276µs
and the Pascal version:
BuildTripResponse: 1ms
JSONify: 36ms
BuildTripResponse: 4ms
JSONify: 90ms
BuildTripResponse: 3ms
JSONify: 73ms
BuildTripResponse: 7ms
JSONify: 176ms
BuildTripResponse: 23ms
JSONify: 581ms
or maybe in table format:
+------------+----------------------+-----------------------+-----------------+----------------+
| request no | Go buildTripResponse | FPC BuildTripResponse | Go json.Marshal | FPC FormatJSON |
+------------+----------------------+-----------------------+-----------------+----------------+
| 1 | 1.32ms | 1ms | 16.31ms | 36ms |
| 2 | 0.91ms | 4ms | 10.44ms | 90ms |
| 3 | 0.78ms | 3ms | 8.11ms | 73ms |
| 4 | 0.10ms | 7ms | 1.05ms | 176ms |
| 5 | 0.07ms | 23ms | 0.95ms | 581ms |
The JSON-ification is clearly a bottleneck, partly due to my way of coding it: the response expects an array at the top level instead of an object, and there's no ArrayToJSON, only ObjectToJSON, hence my manual per-object method. But BuildTripResponse is not good either, losing by up to 328x (the last result).
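For context, the manual per-object method looks roughly like this with fpjson (my sketch; note that FormatJSON pretty-prints, while AsJSON emits compact output and skips the indentation work):

program JsonDemo;
{$mode objfpc}{$H+}
uses fpjson;
var
  Arr: TJSONArray;
  S: TJSONStringType;
begin
  Arr := TJSONArray.Create;
  try
    // The array takes ownership of the object added to it.
    Arr.Add(TJSONObject.Create(['trip_id', 'T1', 'route_id', 'R1']));
    S := Arr.AsJSON;  // compact; cheaper than the pretty FormatJSON
    WriteLn(S);
  finally
    Arr.Free;
  end;
end.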
You could share your code on a gist (or somewhere else) so that we could look at it and optimize it to match mORMot's best performance needs.
I've pushed it to the repo, with all the elapsed-time counters. Feel free to dig in and find things to optimize. BTW, my test still shows that mORMot's HTTP server bandwidth is not that much higher than fphttpserver's, rather similar in fact. So maybe you can also take a look at that.
*Edit*: there are .o and .ppu files included in the repo. :(
Dang, they must have been accidentally added when committing.
In fact, there are several HTTP servers.
THttpAsyncServer is meant for thousands of concurrent kept-alive clients.
THttpServer is socket-based, and use one thread per kept-alive client.
THttpApiServer is Windows only, and use the http.sys API.
For 50 concurrent clients, on Windows, THttpApiServer is likely to be the fastest, and perhaps THttpServer would also have good numbers.
As the load test does concurrent attacks, I chose the first one (THttpAsyncServer).
From what I can see, the main bottleneck is not the server but the trip/route processing: a lot of allocations, and I am not sure that using generics is the fastest path.
I will try to reimplement it in a mORMot way.
Please do, but please also compare with the original Go code from which I converted.
ab@ab:~/dev/github/mORMot2$ wrk -c 50 -d 15s -t 4 http://localhost:4000/schedules/354
Running 15s test @ http://localhost:4000/schedules/354
4 threads and 50 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 16.54ms 4.18ms 53.47ms 91.70%
Req/Sec 728.48 62.13 840.00 72.00%
43545 requests in 15.02s, 8.96GB read
Requests/sec: 2898.99
Transfer/sec: 611.09MB
ab@ab:~/dev/github/mORMot2$ wrk -c 50 -d 15s -t 4 http://localhost:4000/schedules/7777777
Running 15s test @ http://localhost:4000/schedules/7777777
4 threads and 50 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 323.12us 396.82us 20.32ms 97.96%
Req/Sec 34.93k 1.70k 41.89k 72.83%
2086314 requests in 15.01s, 240.75MB read
Requests/sec: 138980.30
Transfer/sec: 16.04MB
Please check https://github.com/synopse/mORMot2/tree/master/ex/lang-cmp
This is the "mORMot way" of implementing the language comparison server.
parsed 1790905 stop times in 972.85ms
parsed 71091 trips in 45.05ms
- It uses mORMot RTTI to parse the CSV into records and to generate JSON from the results.
I see you just added the CSV loading functionality, and boy, mORMot's RTTI is surely powerful.
- We did not use a map/dictionary, but TDynArray's feature of binary searching sorted data.
Lots of pointers, but I guess that's what makes it fast, compared to the convenience of generics.
- By default, the CSV parser will "intern" all values to de-duplicate strings: it is slightly slower (around 30%, I guess), but it reduces memory a lot and also speeds up comparisons of identical strings (they share the same pointer, so there is no need to compare the characters).
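The interning idea in a generic form (my own sketch, not mORMot's implementation): keep one canonical copy of each distinct value, so equal strings share a heap instance and can be compared by pointer.

program InternDemo;
{$mode objfpc}{$H+}
uses SysUtils, Generics.Collections;
type
  TPool = specialize TDictionary<string, string>;
var
  Pool: TPool;

// Return the canonical instance of S, registering it on first sight.
function Intern(const S: string): string;
begin
  if not Pool.TryGetValue(S, Result) then
  begin
    Pool.Add(S, S);
    Result := S;
  end;
end;

var
  A, B: string;
begin
  Pool := TPool.Create;
  try
    A := Intern('trip_' + IntToStr(42));
    B := Intern('trip_' + IntToStr(42));
    // Both now point at the same payload: equality is a pointer check.
    WriteLn(Pointer(A) = Pointer(B));  // TRUE
  finally
    Pool.Free;
  end;
end.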
From what I can see, performance should be much better than the current Pascal version.
One amazing difference from other implementations is the memory consumption. A running server consumes less than 70MB of RAM on my PC with all the data loaded, thanks to interning: it is 380MB otherwise, which is much lower than the alternatives anyway.
Please compare with Go and Rust on your machine.
Done:
running (0m34.7s), 00/50 VUs, 296 complete and 0 interrupted | running (0m31.8s), 00/50 VUs, 308 complete and 0 interrupted
default ✓ [======================================] 50 VUs 30 default ✓ [======================================] 50 VUs 30
data_received..................: 27 GB 763 MB/s | data_received..................: 30 GB 948 MB/s
data_sent......................: 2.8 MB 79 kB/s | data_sent......................: 2.9 MB 90 kB/s
http_req_blocked...............: avg=7.46µs min=1.09µs | http_req_blocked...............: avg=7.78µs min=1.04µs
http_req_connecting............: avg=1.17µs min=0s | http_req_connecting............: avg=2.71µs min=0s
http_req_duration..............: avg=56.71ms min=73.52µs | http_req_duration..............: avg=50.84ms min=143.5µs
{ expected_response:true }...: avg=56.71ms min=73.52µs | { expected_response:true }...: avg=50.84ms min=143.5µs
http_req_failed................: 0.00% ✓ 0 ✗ 2 | http_req_failed................: 0.00% ✓ 0 ✗ 3
http_req_receiving.............: avg=24.56ms min=16.13µs | http_req_receiving.............: avg=7.9ms min=15.58µs
http_req_sending...............: avg=23.58µs min=5.41µs | http_req_sending...............: avg=54.32µs min=5.56µs
http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s
http_req_waiting...............: avg=32.12ms min=45.78µs | http_req_waiting...............: avg=42.88ms min=98.37µs
http_reqs......................: 29304 844.805816/s | http_reqs......................: 30492 958.247358/s
iteration_duration.............: avg=5.62s min=4.38s | iteration_duration.............: avg=5.04s min=2s
iterations.....................: 296 8.533392/s | iterations.....................: 308 9.679266/s
vus............................: 13 min=13 max | vus............................: 30 min=30 max
vus_max........................: 50 min=50 max vus_max........................: 50 min=50 max
The Go version still runs a little bit faster, but not by much. I still attribute this to their higher HTTP server download bandwidth for some reason.
ab@ab:~/dev/github/mORMot2$ wrk -c 50 -d 15s -t 4 http://localhost:4000/schedules/354
Running 15s test @ http://localhost:4000/schedules/354
4 threads and 50 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.97ms 2.73ms 38.58ms 92.03%
Req/Sec 1.10k 70.60 1.30k 70.83%
65615 requests in 15.01s, 13.51GB read
Requests/sec: 4371.23
Transfer/sec: 0.90GB
You are right: some pointers, but they are typed pointers, so it is a somewhat safe approach.
Using array indexes with no runtime range checking is as unsafe as inc(typedpointer).
Nice!
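A sketch of that equivalence (my example, not from the thread): with range checking off, the indexed loop and the typed-pointer loop are both unchecked memory walks, so neither is safer than the other.

{$mode objfpc}
type
  TIntArray = array of Integer;

function SumIndexed(const A: TIntArray): Int64;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to High(A) do
    Result := Result + A[i];  // unchecked when built without -Cr
end;

function SumPointer(const A: TIntArray): Int64;
var
  p, pEnd: PInteger;
begin
  Result := 0;
  if Length(A) = 0 then Exit;
  p := @A[0];
  pEnd := p;
  Inc(pEnd, Length(A));
  while p < pEnd do
  begin
    Result := Result + p^;  // Inc(typedpointer): equally unchecked
    Inc(p);
  end;
end;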
I have just committed a huge LangCmp sample speed-up:
https://github.com/synopse/mORMot2/commit/cfdcb7f2
- we use PUtf8Char for JSON responses
- it is fair, since Rust is using &'data str
- string refcounting has a price, even if it does not allocate memory: it requires a thread-safe locked increment, which is somewhat slow, and even slower on ARM
- also made the sample Windows compatible
- should be faster than Go now - please try !
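To illustrate the refcounting cost (my sketch, not the actual commit): every by-value string assignment performs an interlocked increment on the payload's reference count, which a const parameter or a raw PChar/PUtf8Char view avoids.

{$mode objfpc}{$H+}

procedure ByValue(S: string);           // copying into S does a LOCKed
begin                                   // refcount increment per call
end;

procedure ByConstRef(const S: string);  // const: no refcount traffic
begin
end;

procedure RawView(P: PChar; Len: SizeInt);  // pointer + length: no string
begin                                       // management at all
end;

var
  Line: string;
begin
  Line := 'trip_id,arrival_time,departure_time,stop_id';
  ByValue(Line);                        // atomic inc + dec
  ByConstRef(Line);                     // free of that cost
  RawView(PChar(Line), Length(Line));   // and can point into a buffer slice
end.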
parsed 1790905 stop times in 968.43ms | parsed 1790905 stop times in 3.245251432s
parsed 71091 trips in 39.54ms | parsed 71091 trips in 85.747852ms
running (0m33.4s), 00/50 VUs, 348 complete and 0 interrupted | running (0m32.3s), 00/50 VUs, 320 complete and 0 interrupted
default ✓ [======================================] 50 VUs 30 default ✓ [======================================] 50 VUs 30
data_received..................: 31 GB 933 MB/s | data_received..................: 31 GB 971 MB/s
data_sent......................: 3.2 MB 97 kB/s | data_sent......................: 3.0 MB 92 kB/s
http_req_blocked...............: avg=9µs min=1.09µs | http_req_blocked...............: avg=6.77µs min=1.09µs
http_req_connecting............: avg=2.95µs min=0s | http_req_connecting............: avg=1.73µs min=0s
http_req_duration..............: avg=47.59ms min=97.28µs | http_req_duration..............: avg=49.02ms min=123.81µ
{ expected_response:true }...: avg=47.59ms min=97.28µs | { expected_response:true }...: avg=49.02ms min=123.81µ
http_req_failed................: 0.00% ✓ 0 ✗ | http_req_failed................: 0.00% ✓ 0 ✗ 3
http_req_receiving.............: avg=9.66ms min=15.35µs | http_req_receiving.............: avg=5.92ms min=14.76µs
http_req_sending...............: avg=87.24µs min=5.2µs | http_req_sending...............: avg=70.71µs min=5.2µs
http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s
http_req_waiting...............: avg=37.83ms min=54.74µs | http_req_waiting...............: avg=43.02ms min=91.84µs
http_reqs......................: 34452 1032.205528/s | http_reqs......................: 31680 981.949476/s
iteration_duration.............: avg=4.72s min=3.54s | iteration_duration.............: avg=4.86s min=2.19s
iterations.....................: 348 10.426318/s | iterations.....................: 320 9.918682/s
vus............................: 30 min=30 ma | vus............................: 15 min=15 max
vus_max........................: 50 min=50 ma | vus_max........................: 50 min=50 max
To be included in the main repository; performance may be less obvious on the Mac M1 / AARCH64 architecture.
Yes, I don't have the machine either, so I can't test it myself.
The mORMot HTTP server is less optimized on Mac (it uses the good old poll API there), and there is no optimized asm involved.
But I have seen string interning to be of good benefit on AARCH64, because it avoids most memory allocations.
My problem is that I don't know how to propose a simple way of building the project with no FPC/Lazarus knowledge. ;)
My local build uses -Fu<mormot2>/src/* and -Fi<mormot2>/src as well as -FU<somewhere else>; this way the compiled units end up in a single directory instead of alongside their source code.
I wonder about the memory consumption of the server on your machine, e.g. in comparison with Go and its GC.
After it finishes loading the CSV, it eats only 80MB, heck, so little. Sounds a bit magical. But during the load test it fluctuates between 250-350MB, and it returns to 80MB at the end. I kinda sense GC-like behavior here, or just a proper memory deallocation scheme.
Free Pascal has always measured well on memory consumption. That is nothing new.
@FredvS
I never tested fpc-llvm to be fair.
I usually use fpcupdeluxe to setup my environment, and llvm is not supported by it, IIRC.
The original FreePascal version by Leledumbo consumed much more memory.
True. A good example of how the convenience of object orientation, reference-counted strings and generics can massively increase memory consumption compared to plain dynamic arrays (with clever growth), PChars and manual memory (de)allocation.
wrk -c 50 -d 15s -t 4 http://localhost:4000/schedules/354
Then try with -c 500 and perhaps -c 5000 (with a proper ulimit set if needed), to see how they scale with a high number of connections.
@Leledumbo
Perhaps you may add my last blog article in the ReadMe:
https://blog.synopse.info/?post/2022/11/26/Modern-Pascal-is-Still-in-the-Race
OK
Numbers on my laptop, using whatever versions of compilers I happened to have installed, not the latest but not very outdated either, with no attempt to 'control the environment':
Rust ran the fastest.
LLVM magic is hard to beat, indeed.
As for Leledumbo's version, fphttpapp is the bottleneck.
Acknowledged, but I can't figure out how to make it faster. Michael patched it heavily with exception handling to solve stability issues way back then; I suppose it will need a proper rewrite if performance has become a target.
<...>
It looks like the problem is not just with fphttpapp?
No, you are right, I don't think the web server is the issue with Leledumbo's version.
<...>
Switching to the mORMot web server did not make the code much better, whereas the same mORMot web server with mORMot JSON serialization is way faster: as fast as Go, i.e. reaching 1GB/s.
<...>
parsed 1739279 stop times in 0.611 seconds | parsed 1739278 stop times in 748.66ms
parsed 69754 trips in 0.042 seconds | parsed 69753 trips in 36.05ms
|
running (0m42.2s), 00/50 VUs, 100 complete and 0 inter| running (0m35.6s), 00/50 VUs, 100 complete and 0 interra
default ✓ [ 100% ] 50 VUs 30s default ✓ [ 100% ] 50 VUs 30ss
|
data_received..................: 8.5 GB 201 MB/s | data_received..................: 8.5 GB 239 MB/s
data_sent......................: 1.3 MB 30 kB/s | data_sent......................: 930 kB 26 kB/s
http_req_blocked...............: avg=2.75ms min=0s | http_req_blocked...............: avg=107.98µs min=0s
http_req_connecting............: avg=1.67ms min=0s | http_req_connecting............: avg=37.17µs min=0s
http_req_duration..............: avg=206.97ms min=0s | http_req_duration..............: avg=171.51ms min=0s
{ expected_response:true }...: avg=206.97ms min=0s | { expected_response:true }...: avg=171.51ms min=0s
http_req_failed................: 0.00% ✓ 0 http_req_failed................: 0.00% ✓ 0
http_req_receiving.............: avg=8.23ms min=0s | http_req_receiving.............: avg=123.33ms min=0s
http_req_sending...............: avg=1.35ms min=0s | http_req_sending...............: avg=192.63µs min=0s
http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s
http_req_waiting...............: avg=197.38ms min=0s | http_req_waiting...............: avg=47.99ms min=0s
http_reqs......................: 9900 234.367143/s | http_reqs......................: 9900 278.394605/s
iteration_duration.............: avg=20.72s min=18.| iteration_duration.............: avg=17.04s min=10.44s
iterations.....................: 100 2.367345/s | iterations.....................: 100 2.812067/s
vus............................: 16 min=16 | vus............................: 12 min=12
vus_max........................: 50 min=50 | vus_max........................: 50 min=50
Results; the right column shows the results of the application named alt in Leledumbo's repository:
I think in your solution fphttpapp is now the bottleneck (look at http_req_blocked, http_req_connecting, http_req_sending and http_req_waiting; interestingly, http_req_receiving is the other way around, but that's the only metric that's significantly better). I know mine isn't the bottleneck yet; the too-literal translation from Go, with the random inefficient replacements I made, was the actual bottleneck.
<...>
BTW, would you want to incorporate this solution into my repo?
I also noticed that you are using PRow data, not high-level classes/records to store the data.
Another trick which is not fair with respect to the language comparison goal.
What I don't understand is why the "sent" data is lower on the left. It should be the same for both sides.
Did you verify the JSON content?
Leledumbo
I just built FPC LLVM from source with LLVM 14. It doesn't compile the lgenerics library:
% fpc -Clv14.0 -Fulgenerics app.pas
Free Pascal Compiler version 3.3.1 [2022/12/10] for x86_64
Copyright (c) 1993-2022 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling app.pas
Compiling ./lgenerics/lgutils.pas
Compiling ./lgenerics/lgstrconst.pas
Writing Resource String Table file: lgstrconst.rsj
Assembling lgstrconst
lgutils.pas(3321,26) Warning: Function result variable of a managed type does not seem to be initialized
Assembling lgutils
Compiling ./lgenerics/lghashmap.pas
Compiling ./lgenerics/lghelpers.pas
Compiling ./lgenerics/lghash.pas
lghash.pas(442,35) Fatal: Internal error 200611011
Fatal: Compilation aborted
Error: /home/pierce/pkg/fpcllvm/lib/fpc/3.3.1/ppcx64 returned an error exitcode
This is the second program I'm building with FPC LLVM. The first one, hello world :P, builds and runs.
Doesn't it bother you that manual serialization in this case looks like obvious cheating?
Not really. After all, the goal of this language comparison is to compare possible solutions in each language, so if the language can do it, it's not cheating. At least that's my interpretation.