Recent

Author Topic: Programming language comparison by implementing the same transit data app  (Read 12792 times)

PierceNg

  • Sr. Member
  • ****
  • Posts: 387
    • SamadhiWeb
In case someone is interested to implement in Pascal:

Programming language comparison by reimplementing the same transit data app

Current implementations:
  • C#
  • TypeScript (Deno)
  • Elixir
  • Go
  • Rust
  • Scala

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Interesting for a quick coding fun. The Go implementation is single short which can be easily translated. Perhaps a little more inconvenient because our JSON streaming library doesn't yet incorporate attributes, so proper casing of properties will be used instead.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #2 on: November 14, 2022, 12:26:13 pm »
I seem to have hit either a bug or limitation of dynamic array. Attached is my current attempt, it crashes when reading stop_times.txt (that file has 1790906 rows!) with the following stack trace:
Code: Bash  [Select][+][-]
  1. Program received signal SIGSEGV, Segmentation fault.
  2. 0x000000000042e0e3 in SYSTEM_$$_SPLIT_BLOCK$PMEMCHUNK_VAR$QWORD$$QWORD ()
  3. (gdb) bt
  4. #0  0x000000000042e0e3 in SYSTEM_$$_SPLIT_BLOCK$PMEMCHUNK_VAR$QWORD$$QWORD ()
  5. #1  0x000000000042ea22 in SYSTEM_$$_SYSGETMEM_VAR$QWORD$$POINTER ()
  6. #2  0x0000000000000300 in ?? ()
  7. #3  0x0000000000000188 in ?? ()
  8. #4  0x0000000000000158 in ?? ()
  9. #5  0x0000000000000300 in ?? ()
  10. #6  0x00007fffffffd458 in ?? ()
  11. #7  0x000000000042eaaa in SYSTEM_$$_SYSGETMEM$QWORD$$POINTER ()
  12. #8  0x00000000000002b0 in ?? ()
  13. #9  0x0000000000000188 in ?? ()
  14. #10 0x00007fffffffd458 in ?? ()
  15. #11 0x000000000042f207 in SYSTEM_$$_SYSREALLOCMEM$POINTER$QWORD$$POINTER ()
  16. #12 0x0000000000543468 in RTTI_$P$TRASCAL_$$_TARRAYHELPER$1$CRC713F463B$indirect ()
  17. #13 0x00007fffad0aebb0 in ?? ()
  18. #14 0x00000000001b53b9 in ?? ()
  19. #15 0x000000000042dc9b in SYSTEM_$$_REALLOCMEM$POINTER$QWORD$$POINTER ()
  20. #16 0x000000000000003f in ?? ()
  21. #17 0x0000000000426855 in fpc_dynarray_setlength ()
  22. #18 0x000000000040225b in PrepareAddingItem (this=0x7fff7bad1f00) at generics.collections.pas:1253
  23. #19 0x0000000000403415 in Add (this=0x7fff7bad1f00, AValue=1128500) at generics.collections.pas:1466
  24. #20 0x0000000000414ed2 in GetStopTimes (AStopTimes=<error reading variable: value of type `TStopTimeDynArr' requires 8000000 bytes, which is more than max-value-size>, AStopTimesIxByTrip=0x7ffff7f84140) at app.pas:223
  25. #21 0x0000000000416615 in $main () at app.pas:305
  26.  
At first, I thought it was due to generics.collections.TList bug, but then I replace it with gvector.TVector, while adapting Add -> PushBack and Count->Size and still get a crash with similar stacktrace:
Code: Bash  [Select][+][-]
  1. Program received signal SIGSEGV, Segmentation fault.
  2. 0x000000000042b3f3 in SYSTEM_$$_SPLIT_BLOCK$PMEMCHUNK_VAR$QWORD$$QWORD ()
  3. (gdb) bt
  4. #0  0x000000000042b3f3 in SYSTEM_$$_SPLIT_BLOCK$PMEMCHUNK_VAR$QWORD$$QWORD ()
  5. #1  0x000000000042bd32 in SYSTEM_$$_SYSGETMEM_VAR$QWORD$$POINTER ()
  6. #2  0x0000000000000300 in ?? ()
  7. #3  0x0000000000000210 in ?? ()
  8. #4  0x0000000000000158 in ?? ()
  9. #5  0x0000000000000300 in ?? ()
  10. #6  0x00007fffffffd478 in ?? ()
  11. #7  0x000000000042bdba in SYSTEM_$$_SYSGETMEM$QWORD$$POINTER ()
  12. #8  0x00000000000002b0 in ?? ()
  13. #9  0x0000000000000210 in ?? ()
  14. #10 0x00007fffffffd478 in ?? ()
  15. #11 0x000000000042c517 in SYSTEM_$$_SYSREALLOCMEM$POINTER$QWORD$$POINTER ()
  16. #12 0x000000000053ee48 in RTTI_$P$TRASCAL_$$_TINTLIST$indirect ()
  17. #13 0x00007fffa8d499b0 in ?? ()
  18. #14 0x00000000001b53b9 in ?? ()
  19. #15 0x000000000042afab in SYSTEM_$$_REALLOCMEM$POINTER$QWORD$$POINTER ()
  20. #16 0x0000000000000040 in ?? ()
  21. #17 0x0000000000423b65 in fpc_dynarray_setlength ()
  22. #18 0x00000000004012cf in IncreaseCapacity (this=0x7fff7b748980) at gvector.pp:195
  23. #19 0x000000000041141d in Add (this=0x7fff7b748980, i=1192582) at app.pas:111
  24. #20 0x0000000000412429 in GetStopTimes (AStopTimes=<error reading variable: value of type `TStopTimeDynArr' requires 8000000 bytes, which is more than max-value-size>, AStopTimesIxByTrip=0x7ffff7f84140) at app.pas:241
  25. #21 0x0000000000413925 in $main () at app.pas:323
  26.  
So this could be a case study for dynamic arrays. Perhaps the size exceeds maximum?

Thaddy

  • Hero Member
  • *****
  • Posts: 15682
  • Censorship about opinions does not belong here.
Re: Programming language comparison by implementing the same transit data app
« Reply #3 on: November 14, 2022, 01:04:08 pm »
So this could be a case study for dynamic arrays. Perhaps the size exceeds maximum?
The maximum size for a dynamic array is High(SizeInt). So this differs between 32/64 bit. SizeInt = longint.
So on 32 bit it is 2147483647 and 64 bit is 9223372036854775807 both are way more than 8000000 but of course that depends on memory and element size.

If I smell bad code it usually is bad code and that includes my own code.

avk

  • Hero Member
  • *****
  • Posts: 756
Re: Programming language comparison by implementing the same transit data app
« Reply #4 on: November 14, 2022, 04:30:21 pm »
I seem to have hit either a bug or limitation of dynamic array. Attached is my current attempt, it crashes when reading stop_times.txt (that file has 1790906 rows!) with the following stack trace:<...>

I guess the reason is mainly in this line:
Code: Pascal  [Select][+][-]
  1. procedure GetStopTimes(var AStopTimes: TStopTimeDynArr; var AStopTimesIxByTrip: TStringIntListMap);
  2. var
  3.   LCSV: TCSVDocument;
  4.   LStart,LEnd: TDateTime;
  5.   i: Integer;
  6.   LTrip: String;
  7.   LStopTimesIx: TIntList;
  8. begin
  9.   ...
  10.     //SetLength(AStopTimes, 1000000);  //<===
  11.     SetLength(AStopTimes, LCSV.RowCount - 1);
  12.   ...
  13. end;
  14.  

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #5 on: November 15, 2022, 02:58:29 am »
I guess the reason is mainly in this line:
Hell, you're correct, alongside another SetLength in GetTrips. Guess Go doesn't do immediate allocation while FPC does, where it allocates too much. Now it has succeeded, indeed the amount of RAM eaten is almost 2GB (1.8GB to be exact). I'll continue with the benchmark, could be fun to see where FPC sits.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #6 on: November 15, 2022, 04:30:20 am »
Alright, seems like it's working now but results is way worse than Go counterpart. Side by side load test result comparison:
Code: [Select]
running (0m43.4s), 00/50 VUs, 50 complete and 0 interrupted i | running (0m34.0s), 00/50 VUs, 286 complete and 0 interrupted
default ✓ [======================================] 50 VUs  30   default ✓ [======================================] 50 VUs  30

     data_received..................: 658 MB 15 MB/s          |      data_received..................: 28 GB  823 MB/s
     data_sent......................: 106 kB 2.4 kB/s         |      data_sent......................: 2.7 MB 78 kB/s
     http_req_blocked...............: avg=6.81ms   min=0s     |      http_req_blocked...............: avg=7.66µs  min=1.08µs
     http_req_connecting............: avg=6.38ms   min=0s     |      http_req_connecting............: avg=2.12µs  min=0s     
     http_req_duration..............: avg=428.67ms min=0s     |      http_req_duration..............: avg=57.8ms  min=98.58µs
       { expected_response:true }...: avg=2.32s    min=1.04ms |        { expected_response:true }...: avg=57.8ms  min=98.58µs
     http_req_failed................: 85.63% ✓ 4239       ✗ 7 |      http_req_failed................: 0.00%  ✓ 0          ✗ 2
     http_req_receiving.............: avg=1.96ms   min=0s     |      http_req_receiving.............: avg=7.16ms  min=15.06µs
     http_req_sending...............: avg=484.72µs min=0s     |      http_req_sending...............: avg=42.78µs min=5.11µs
     http_req_tls_handshaking.......: avg=0s       min=0s     |      http_req_tls_handshaking.......: avg=0s      min=0s     
     http_req_waiting...............: avg=426.23ms min=0s     |      http_req_waiting...............: avg=50.59ms min=74.66µs
     http_reqs......................: 4950   114.065486/s     |      http_reqs......................: 28314  831.923149/s
     iteration_duration.............: avg=43.35s   min=43.25s |      iteration_duration.............: avg=5.73s   min=3.91s 
     iterations.....................: 50     1.152177/s       |      iterations.....................: 286    8.403264/s
     vus............................: 50     min=50       max |      vus............................: 4      min=4        max
     vus_max........................: 50     min=50       max        vus_max........................: 50     min=50       max
And I've got so many connection refused (you see that http_req_failed? that means only almost 15% of all requests are successfully handled).
TCSVDocument.LoadFromFile bottlenecks so bad. I'm on an NVMe SSD so this is not a slow I/O problem.
Lastly, fphttpserver is really not meant for high performance, I remember Marco said so a couple of years ago. So maybe I would need an alternative implementation configured for performance.

Latest code can be found on my fork of the original repo. I don't think I will make a PR yet, until the performance is satisfactory.

PierceNg

  • Sr. Member
  • ****
  • Posts: 387
    • SamadhiWeb
Re: Programming language comparison by implementing the same transit data app
« Reply #7 on: November 15, 2022, 04:47:32 am »
Lastly, fphttpserver is really not meant for high performance, I remember Marco said so a couple of years ago. So maybe I would need an alternative implementation configured for performance.

I've done some testing and indeed the pure Pascal fphttpserver server performs the worst. The libmicrohttpd-based one (in <fpc-src>/packages/libmicrohttpd) and Brook Framework (which uses a C library that wraps libmicrohttpd) are much faster.

Some more notes using libmicrohttpd (built from source) on Ubuntu 20.04 with FPC:

- fcl-web options (mcoSelectInternally, mcoEPollLinuxOnly) => good
- fcl-web option mcoThreadPerConnection => poor
- Brook Framework single-threaded => quite good
- Brook Framework multi-threaded => poor
- Brook Framework with built-in favicon handling => boosted single-threaded's performance
« Last Edit: November 15, 2022, 04:57:51 am by PierceNg »

PascalDragon

  • Hero Member
  • *****
  • Posts: 5678
  • Compiler Developer
Re: Programming language comparison by implementing the same transit data app
« Reply #8 on: November 15, 2022, 07:08:55 am »
Latest code can be found on my fork of the original repo. I don't think I will make a PR yet, until the performance is satisfactory.

Maybe base it on mORMot2 instead?

avk

  • Hero Member
  • *****
  • Posts: 756
Re: Programming language comparison by implementing the same transit data app
« Reply #9 on: November 15, 2022, 08:31:05 am »
<...>
Guess Go doesn't do immediate allocation while FPC does, where it allocates too much.
<...>

It's just that in the go version, stopTimes is not an array, but a slice, the capacity of which can be increased on demand.
BTW, on my machine TCSVDocument parses the file stop_times.txt in about 14 seconds. I suspect that even if you load the file into a TStringList and then parse it line by line using string.Split(), things will go about 6-7 times faster.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #10 on: November 15, 2022, 10:01:11 am »
Maybe base it on mORMot2 instead?
I'll check if this example is enough to take as a base.
It's just that in the go version, stopTimes is not an array, but a slice, the capacity of which can be increased on demand.
All of my arrays are dynamic as well, which should be equivalent to Go slice.
BTW, on my machine TCSVDocument parses the file stop_times.txt in about 14 seconds. I suspect that even if you load the file into a TStringList and then parse it line by line using string.Split(), things will go about 6-7 times faster.
Either that or I'll just stream it along the way, no idea which one is faster. But still I think TCSVDocument is just not made with performance in mind (plus that Cells property is kinda arrrgh because it puts column first instead of row first).

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #11 on: November 15, 2022, 12:18:18 pm »
An improvement:
Code: [Select]
running (0m52.6s), 00/15 VUs, 15 complete and 0 interrupted i | running (0m30.8s), 00/15 VUs, 301 complete and 0 interrupted
default ✓ [======================================] 15 VUs  30   default ✓ [======================================] 15 VUs  30

     data_received..................: 1.5 GB 28 MB/s          |      data_received..................: 30 GB  956 MB/s
     data_sent......................: 277 kB 5.3 kB/s         |      data_sent......................: 2.8 MB 91 kB/s
     http_req_blocked...............: avg=139.3µs  min=66.52µ |      http_req_blocked...............: avg=4.79µs  min=1.06µs
     http_req_connecting............: avg=2.64µs   min=0s     |      http_req_connecting............: avg=725ns   min=0s     
     http_req_duration..............: avg=526.63ms min=2.51ms |      http_req_duration..............: avg=15.24ms min=85.57µs
       { expected_response:true }...: avg=526.63ms min=2.51ms |        { expected_response:true }...: avg=15.24ms min=85.57µs
     http_req_failed................: 0.00%  ✓ 0         ✗ 14 |      http_req_failed................: 0.00%  ✓ 0          ✗ 2
     http_req_receiving.............: avg=1.84ms   min=34.02µ |      http_req_receiving.............: avg=1.41ms  min=12.72µs
     http_req_sending...............: avg=19.56ms  min=19.47µ |      http_req_sending...............: avg=49.25µs min=4.83µs
     http_req_tls_handshaking.......: avg=0s       min=0s     |      http_req_tls_handshaking.......: avg=0s      min=0s     
     http_req_waiting...............: avg=505.22ms min=181.48 |      http_req_waiting...............: avg=13.78ms min=54.7µs
     http_reqs......................: 1485   28.236484/s      |      http_reqs......................: 29799  966.969529/s
     iteration_duration.............: avg=52.14s   min=51.45s |      iteration_duration.............: avg=1.51s   min=898.64m
     iterations.....................: 15     0.285217/s       |      iterations.....................: 301    9.767369/s
     vus............................: 11     min=11      max= |      vus............................: 15     min=15       max
     vus_max........................: 15     min=15      max= |      vus_max........................: 15     min=15       max
  • The latest code now contains my own made TCSVDocument with the same properties and methods of TCSVDocument that I use in this program, requiring 0 changes apart from the uses clause. It boosts reading stop_times.txt by 400% on my machine (28s down to 7s) with -O3. Yet still loses to Go's 2s, seems like TStringList isn't the best choice despite being magnitude faster than whatever TCSVDocument uses under the hood.
  • I also fixed all of the memory leaks which makes the previous test fails due to OOM.
  • Only one last byte: the HTTP server. I just realized that the number of iterations = number of VUs in previous failed test and last successful one, seems like the connection ain't properly closed after sending the response, making each VU stuck and cannot issue another request.

EDIT: forgot to commit csvutils.pas, it's now using low level implementation that pushes reading stop_times.txt to another 33% (7s -> 5s), I have no idea what technique employed by Go to reach that 2s. Impressive stuff.

« Last Edit: November 15, 2022, 01:42:00 pm by Leledumbo »

PascalDragon

  • Hero Member
  • *****
  • Posts: 5678
  • Compiler Developer
Re: Programming language comparison by implementing the same transit data app
« Reply #12 on: November 16, 2022, 07:26:35 am »
  • Only one last byte: the HTTP server. I just realized that the number of iterations = number of VUs in previous failed test and last successful one, seems like the connection ain't properly closed after sending the response, making each VU stuck and cannot issue another request.

You could try with FPC 3.3.1 as there were some fixes regarding the closing of sockets recently (don't know if that is what affects you however).

EDIT: forgot to commit csvutils.pas, it's now using low level implementation that pushes reading stop_times.txt to another 33% (7s -> 5s), I have no idea what technique employed by Go to reach that 2s. Impressive stuff.

What is the time if you simply read the file with TFileStream (discarding the data, only the raw read time)?

avk

  • Hero Member
  • *****
  • Posts: 756
Re: Programming language comparison by implementing the same transit data app
« Reply #13 on: November 16, 2022, 07:56:56 am »
EDIT: forgot to commit csvutils.pas, it's now using low level implementation that pushes reading stop_times.txt to another 33% (7s -> 5s), I have no idea what technique employed by Go to reach that 2s. Impressive stuff.
Curious, which platform did you test on? I just ran your latest version against Go in a Linux x86-64 virtual machine and got a slightly different result:
Code: Text  [Select][+][-]
  1. go:  parsed 1739278 stop times in 2.3342127s
  2. fpc: parsed 1739278 stop times in 2.470 seconds
  3.  
go version:  go1.19.3 linux/amd64
fpc version: 3.3.1-12058-g9b6926c5f5 linux x86_64

Leledumbo

  • Hero Member
  • *****
  • Posts: 8770
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Programming language comparison by implementing the same transit data app
« Reply #14 on: November 16, 2022, 09:00:52 am »
You could try with FPC 3.3.1 as there were some fixes regarding the closing of sockets recently (don't know if that is what affects you however).
I do use 3.3.1, updated like 3 days ago or so. Let me update again.
What is the time if you simply read the file with TFileStream (discarding the data, only the raw read time)?
You mean only the
Code: Pascal  [Select][+][-]
  1. fs.Read(s[1], n);
  2.  
? It's only like 50-70ms.
Curious, which platform did you test on? I just ran your latest version against Go in a Linux x86-64 virtual machine and got a slightly different result:
Code: Text  [Select][+][-]
  1. go:  parsed 1739278 stop times in 2.3342127s
  2. fpc: parsed 1739278 stop times in 2.470 seconds
  3.  
go version:  go1.19.3 linux/amd64
fpc version: 3.3.1-12058-g9b6926c5f5 linux x86_64
Same Linux x86_64, but on a real machine, i7-7700HQ, DDR4-2400 dual channel, Sandisk Extreme Portable SSD V2 500GB over USB 3.0.

I've taken a peek at Go implementation, and boy it's a hell lot more complex than my code, so I'm surprised how good their optimizations work. Or perhaps SetLength() on every iteration is not a good idea, let's see if I can optimize that.

EDIT: Optimized by replacing direct dynamic array with TVector which has quite efficient growth factor, I got additional 250ms.
« Last Edit: November 16, 2022, 09:48:44 am by Leledumbo »

 

TinyPortal © 2005-2018