Author Topic: would multi threading help this project?  (Read 2496 times)

creaothceann

  • Sr. Member
  • ****
  • Posts: 263
Re: would multi threading help this project?
« Reply #30 on: January 24, 2026, 04:53:27 pm »
AMD 7800X3D, -O3

Code: Text
  Allocating 1144 MB RAM...
  Running benchmarks...

  Threads | Time (ms) | Speedup | Efficiency
  ------------------------------------------
        1 |       281 |   1.00x |     100.0%
        2 |       140 |   2.01x |     100.4%
        4 |        93 |   3.02x |      75.5%
        6 |        62 |   4.53x |      75.5%
        8 |        63 |   4.46x |      55.8%
       10 |        47 |   5.98x |      59.8%
       12 |        47 |   5.98x |      49.8%
       14 |        47 |   5.98x |      42.7%
       16 |        47 |   5.98x |      37.4%
  ------------------------------------------
  Done. Press Enter to exit.

LeP

  • Full Member
  • ***
  • Posts: 135
Re: would multi threading help this project?
« Reply #31 on: January 24, 2026, 05:50:51 pm »
With Delphi, the run takes roughly half the time of my Lazarus 4.4 / FPC 3.2.2 build (@LV's approach)  :o
Strange, FPC is usually faster.

Code: Text
  Allocating 1144 MB RAM...
  Running benchmarks...

  Threads | Time (ms) | Speedup | Efficiency
  ------------------------------------------
        1 |       147 |   1.00x |     100.0%
        2 |        81 |   1.81x |      90.7%
        4 |        60 |   2.45x |      61.3%
        6 |        37 |   3.97x |      66.2%
        8 |        43 |   3.42x |      42.7%
       10 |        37 |   3.97x |      39.7%
       12 |        36 |   4.08x |      34.0%
       14 |        37 |   3.97x |      28.4%
       16 |        35 |   4.20x |      26.3%
  ------------------------------------------
  Done. Press Enter to exit.

cdbc

  • Hero Member
  • *****
  • Posts: 2611
    • http://www.cdbc.dk
Re: would multi threading help this project?
« Reply #32 on: January 24, 2026, 11:11:09 pm »
Hi
Right, so I had to dig deep into my old "Sandbox / Testbed", to find a project that might be of use to you...
It's an app that resamples images with the help of BGRABitmap. That process can take a while to complete, so threading is in order...
It creates a thread pool with a thread count of your choice and then uses the pool to resample several images at once...
To compile it you'll ofc. need BGRABitmap (I fetched the one in OPM tonight).
edit: You don't have to install BGRABitmap, just enter its path in "-Fu"   8) :D (I hate installing comps)
The attached project compiles on FPC 3.2.2 & Laz 4.4
On my old lappy, it takes a while to build, due to the sheer size of BGRABitmap... %)
But I think there should be something in there you can study and rewrite to your own needs...  ;D
Well Have fun & happy coding
Regards Benny
« Last Edit: January 25, 2026, 02:04:07 pm by cdbc »
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

creaothceann

  • Sr. Member
  • ****
  • Posts: 263
Re: would multi threading help this project?
« Reply #33 on: January 25, 2026, 12:24:10 pm »
i7-4790K

cdbc

  • Hero Member
  • *****
  • Posts: 2611
    • http://www.cdbc.dk
Re: would multi threading help this project?
« Reply #34 on: January 25, 2026, 02:05:08 pm »
Hi
See edit in post #32
Regards Benny

LV

  • Sr. Member
  • ****
  • Posts: 412
Re: would multi threading help this project?
« Reply #35 on: January 25, 2026, 02:54:18 pm »
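It looks like we're hitting the "Memory Wall" effect in this benchmark: beyond roughly 4 to 6 threads the time stops improving and the bandwidth levels off at about 24-29 GB/s.  %)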

i7-12700H.

Code: Text
  Pascal FPC 3.2.2 -O3
  -------------------------------------------------------------
  Allocating 1144 MB RAM...
  -------------------------------------------------------------
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s)
  -------------------------------------------------------------
        1 |       187 |   1.00x |     100.0% |             5.98
        2 |       110 |   1.70x |      85.0% |            10.16
        4 |        62 |   3.02x |      75.4% |            18.03
        6 |        47 |   3.98x |      66.3% |            23.78
        8 |        47 |   3.98x |      49.7% |            23.78
       10 |        47 |   3.98x |      39.8% |            23.78
       12 |        47 |   3.98x |      33.2% |            23.78
       14 |        47 |   3.98x |      28.4% |            23.78
       16 |        47 |   3.98x |      24.9% |            23.78
       18 |        46 |   4.07x |      22.6% |            24.30
       20 |        47 |   3.98x |      19.9% |            23.78
       22 |        47 |   3.98x |      18.1% |            23.78
       24 |        47 |   3.98x |      16.6% |            23.78

  Pascal FPC 3.2.2 -O3
  -------------------------------------------------------------
  Allocating 4577 MB RAM...
  -------------------------------------------------------------
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s)
  -------------------------------------------------------------
        1 |       828 |   1.00x |     100.0% |             5.40
        2 |       422 |   1.96x |      98.1% |            10.59
        4 |       250 |   3.31x |      82.8% |            17.88
        6 |       172 |   4.81x |      80.2% |            25.99
        8 |       172 |   4.81x |      60.2% |            25.99
       10 |       156 |   5.31x |      53.1% |            28.66
       12 |       172 |   4.81x |      40.1% |            25.99
       14 |       156 |   5.31x |      37.9% |            28.66
       16 |       172 |   4.81x |      30.1% |            25.99
       18 |       172 |   4.81x |      26.7% |            25.99
       20 |       172 |   4.81x |      24.1% |            25.99
       22 |       187 |   4.43x |      20.1% |            23.91
       24 |       172 |   4.81x |      20.1% |            25.99

Code: Text
  C++ gcc 14.2.0 -O3
  -------------------------------------------------------------
  Allocating 1144.41 MB RAM...
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s)
  -------------------------------------------------------------
        1 |       213 |   1.00x |     100.0% |             5.25
        2 |        53 |   4.02x |    200.94% |            21.09
        4 |        44 |   4.84x |    121.02% |            25.40
        6 |        43 |   4.95x |     82.56% |            25.99
        8 |        44 |   4.84x |     60.51% |            25.40
       10 |        44 |   4.84x |     48.41% |            25.40
       12 |        44 |   4.84x |     40.34% |            25.40
       14 |        44 |   4.84x |     34.58% |            25.40
       16 |        44 |   4.84x |     30.26% |            25.40
       18 |        43 |   4.95x |     27.52% |            25.99
       20 |        44 |   4.84x |     24.20% |            25.40
       22 |        43 |   4.95x |     22.52% |            25.99
       24 |        44 |   4.84x |     20.17% |            25.40

  C++ gcc 14.2.0 -O3
  -------------------------------------------------------------
  Allocating 4577.64 MB RAM...
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s)
  -------------------------------------------------------------
        1 |       894 |   1.00x |     100.0% |             5.00
        2 |       212 |   4.22x |    210.85% |            21.09
        4 |       167 |   5.35x |    133.83% |            26.77
        6 |       167 |   5.35x |     89.22% |            26.77
        8 |       172 |   5.20x |     64.97% |            25.99
       10 |       170 |   5.26x |     52.59% |            26.30
       12 |       166 |   5.39x |     44.88% |            26.93
       14 |       166 |   5.39x |     38.47% |            26.93
       16 |       165 |   5.42x |     33.86% |            27.09
       18 |       165 |   5.42x |     30.10% |            27.09
       20 |       168 |   5.32x |     26.61% |            26.61
       22 |       165 |   5.42x |     24.63% |            27.09
       24 |       166 |   5.39x |     22.44% |            26.93
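
For reference, the Bandwidth column in the tables above appears to be the buffer size divided by the elapsed time. A minimal sketch, assuming the 1144.41 MB buffer is 1.2e9 bytes (for example 300 million 4-byte values) and that "GB" in the tables means 2^30 bytes. Both points are inferred from the printed numbers, not taken from the benchmark source, and `bandwidth_gib_s` is a made-up helper name:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: bytes processed per second, in units of 2^30 bytes.
// Assumes the whole buffer is touched exactly once per timed run.
double bandwidth_gib_s(std::uint64_t bytes, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    const double gib = static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
    return gib / (elapsed_ms / 1000.0);
}
```

Under these assumptions, 1.2e9 bytes in 47 ms comes out at about 23.78, which matches the plateau rows. Once the time stops shrinking, this figure stops growing too, whatever the thread count.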

LV

  • Sr. Member
  • ****
  • Posts: 412
Re: would multi threading help this project?
« Reply #36 on: January 25, 2026, 11:04:05 pm »
🤔 To demonstrate multithreaded scalability, let's burden the threads with intensive algebraic operations.

i7-12700H.

Code: Text
  Pascal FPC 3.2.2 -O3 (Double Precision + Heavy Algebra)
  ------------------------------------------------------------------------
  Allocating 2288 MB RAM...
  ------------------------------------------------------------------------
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s) | Validate
  ------------------------------------------------------------------------
        1 |      9109 |   1.00x |    100.00% |             0.25 | OK
        2 |      4750 |   1.92x |     95.88% |             0.47 | OK
        4 |      2594 |   3.51x |     87.79% |             0.86 | OK
        6 |      1797 |   5.07x |     84.48% |             1.24 | OK
        8 |      1656 |   5.50x |     68.76% |             1.35 | OK
       10 |      1453 |   6.27x |     62.69% |             1.54 | OK
       12 |      1344 |   6.78x |     56.48% |             1.66 | OK
       14 |      1250 |   7.29x |     52.05% |             1.79 | OK
       16 |      1187 |   7.67x |     47.96% |             1.88 | OK
       18 |      1172 |   7.77x |     43.18% |             1.91 | OK
       20 |      1187 |   7.67x |     38.37% |             1.88 | OK
       22 |      1141 |   7.98x |     36.29% |             1.96 | OK
       24 |      1125 |   8.10x |     33.74% |             1.99 | OK
  ------------------------------------------------------------------------

Code: Text
  C++ gcc 14.2.0 -O3 (Double Precision + Heavy Algebra)
  ------------------------------------------------------------------------
  Allocating 2288 MB RAM...
  ------------------------------------------------------------------------
  Threads | Time (ms) | Speedup | Efficiency | Bandwidth (GB/s) | Validate
  ------------------------------------------------------------------------
        1 |      9340 |   1.00x |    100.00% |             0.24 | OK
        2 |      4571 |   2.04x |    102.17% |             0.49 | OK
        4 |      2504 |   3.73x |     93.25% |             0.89 | OK
        6 |      1739 |   5.37x |     89.52% |             1.29 | OK
        8 |      1535 |   6.08x |     76.06% |             1.46 | OK
       10 |      1410 |   6.62x |     66.24% |             1.59 | OK
       12 |      1312 |   7.12x |     59.32% |             1.70 | OK
       14 |      1244 |   7.51x |     53.63% |             1.80 | OK
       16 |      1161 |   8.04x |     50.28% |             1.93 | OK
       18 |      1086 |   8.60x |     47.78% |             2.06 | OK
       20 |      1174 |   7.96x |     39.78% |             1.90 | OK
       22 |      1256 |   7.44x |     33.80% |             1.78 | OK
       24 |      1292 |   7.23x |     30.12% |             1.73 | OK
  ------------------------------------------------------------------------
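
The benchmark source is not included in this post. As a rough idea of what such a compute-bound test can look like, here is a minimal sketch using `std::thread`: each worker applies a division- and square-root-heavy function to its own slice of the buffer, and the result is checked against a single-threaded run. The kernel `heavy`, its iteration count, and the helper names are all hypothetical; this is not the code that produced the tables above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical kernel: many divisions and square roots per element,
// so the run is limited by arithmetic rather than by memory traffic.
// Intended for non-negative inputs (the denominator stays positive).
double heavy(double x) {
    double acc = x;
    for (int i = 0; i < 64; ++i) {
        acc = std::sqrt(acc * acc + 1.0) / (1.0 + 0.5 * acc);
    }
    return acc;
}

// Apply heavy() in place to data[begin, end).
void process_range(std::vector<double>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i] = heavy(data[i]);
    }
}

// Split the buffer into `threads` contiguous slices, process them in
// parallel, and return the elapsed wall-clock time in milliseconds.
double run_parallel(std::vector<double>& data, unsigned threads) {
    assert(threads > 0);
    const std::size_t n = data.size();
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = n * t / threads;
        const std::size_t end = n * (t + 1) / threads;
        workers.emplace_back(process_range, std::ref(data), begin, end);
    }
    for (auto& w : workers) {
        w.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// "Validate" step: a multi-threaded run must match a 1-thread run exactly,
// because every element is computed independently of the others.
bool validate(const std::vector<double>& input, unsigned threads) {
    std::vector<double> reference = input;
    std::vector<double> parallel = input;
    run_parallel(reference, 1);
    run_parallel(parallel, threads);
    return reference == parallel;
}
```

Timing `run_parallel` for 1, 2, 4, ... threads and dividing the 1-thread time by each result would give a Speedup column like the ones above. Because the slices do not overlap, no locking is needed.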

The P-Core Zone (1–6 Threads)

Near-Perfect Scaling: Efficiency is still about 84% (Pascal) to 90% (C++) at 6 threads. These are presumably the 6 high-performance cores working at full tilt. C++ is slightly ahead here (5.37x Speedup vs. Pascal's 5.07x), possibly due to GCC 14’s more aggressive instruction scheduling.
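
The Speedup and Efficiency figures quoted here can be re-derived from the Time column. A minimal sketch, assuming the usual definitions (speedup = T1 / Tn, efficiency = speedup / n); the function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Speedup of an n-thread run relative to the 1-thread run.
double speedup(double t1_ms, double tn_ms) {
    assert(tn_ms > 0.0);
    return t1_ms / tn_ms;
}

// Parallel efficiency in percent: speedup divided by the thread count.
double efficiency_pct(double t1_ms, double tn_ms, int threads) {
    assert(threads > 0);
    return 100.0 * speedup(t1_ms, tn_ms) / threads;
}
```

For the 6-thread rows, `efficiency_pct(9109, 1797, 6)` gives about 84.5% for Pascal and `efficiency_pct(9340, 1739, 6)` about 89.5% for C++.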

The E-Core Entry (8–14 Threads)

This is where it gets interesting. After the 6th thread, the speedup stops being linear.

Heterogeneity in Action: At 14 threads (6P + 8E, assuming one thread per physical core), the execution time hits 1244–1250ms. Note that adding the 8 small cores provided a significant boost, but Efficiency naturally dropped to ~52-54%. This is expected: E-cores are slower, which reduces the "average" performance per thread.

Hyper-Threading & Over-Saturation (16–24 Threads)

Peak Performance: C++ peaks at 18 threads (1086ms), while Pascal peaks at 24 threads (1125ms).

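C++ Performance Drop: Look at the C++ results after 18 threads: the time begins to increase (1086 -> 1174 -> 1256 -> 1292). A plausible explanation is contention for execution resources: Hyper-Threading siblings on the P-cores start fighting for the same DIVSD (Division) units, which are probably already saturated. The i7-12700H also exposes only 20 hardware threads, so the 22- and 24-thread runs oversubscribe the CPU.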

Pascal’s Stability: Pascal remains more stable at extremely high thread counts (1125ms at 24 threads). This suggests the code generated by FPC might be creating less "noise" in the instruction cache or interacting differently with the Windows thread scheduler.

🚴


 
