Recent

Author Topic: Parallel Code  (Read 14706 times)

dieselfan

  • New Member
  • *
  • Posts: 16
Parallel Code
« on: February 15, 2018, 11:10:22 am »
I copied some code from Delphi and used the equiv in Laz with the MTProcs unit.

My parallel code is slightly slower than the single threaded code so obviously have a bug somewhere.

I've attached the pas. By comparison on Linux my Delphi code is

SingleThread: 30s
MT: 4s
Laz
ST: 33s
MT: 35s

I'd appreciate some pointers ;)
http://docwiki.embarcadero.com/RADStudio/XE8/en/Using_TParallel.For_from_the_Parallel_Programming_Library
http://wiki.lazarus.freepascal.org/Parallel_procedures
AMD 1800X
Manjaro KDE
Rarely Windows 10 Pro

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #1 on: February 15, 2018, 11:57:21 am »
I'm not familiar with MTProcs, but shouldn't you do something with ProcThreadPool.CalcBlockSize()?
And only loop to the resulting BlockCount.

And in DoLoop() you should loop from BlockStart to BlockEnd (received from Item.CalcBlock()).

Isn't that explained on the wiki?
http://wiki.lazarus.freepascal.org/Parallel_procedures#DoParallel

As it is now you loop from 1 to cMax in all the 8 threads.
Yeah, that's going to be slower than even singlethread :)

So examine how the example on the wiki splits the work and only does that part of the work for each thread.

bylaardt

  • Sr. Member
  • ****
  • Posts: 309
Re: Parallel Code
« Reply #2 on: February 15, 2018, 03:09:34 pm »
Use this:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   {$IFDEF UNIX}
  7.   cthreads, cmem,
  8.   {$ENDIF}
  9.   MTProcs, Classes, SysUtils, syncobjs;
  10.  
  11. var
  12.   lT : QWord;
  13.   gTot: integer;
  14.  
  15. const cMax = 50000000;
  16.  
  17. function IsPrime(N: Integer): Boolean;
  18. var
  19.   Test, k: Integer;
  20. begin
  21.   if N <= 3 then
  22.     IsPrime := N > 1
  23.   else if ((N mod 2) = 0) or ((N mod 3) = 0) then
  24.     IsPrime := False
  25.   else
  26.   begin
  27.     IsPrime := True;
  28.     k := Trunc(Sqrt(N));
  29.     Test := 5;
  30.     while Test <= k do
  31.     begin
  32.       if ((N mod Test) = 0) or ((N mod (Test + 2)) = 0) then
  33.       begin
  34.         IsPrime := False;
  35.         break; { jump out of the for loop }
  36.       end;
  37.       Test := Test + 6;
  38.     end;
  39.   end;
  40. end;
  41.  
  42. procedure SingleThread;
  43. var
  44.   I: Integer;
  45. begin
  46.   for I := 1 to cMax do
  47.    begin
  48.      if IsPrime(I) then
  49.         Inc(gTot);
  50.    end;
  51. end;
  52.  
  53. procedure DoLoop(Index: PtrInt; Data: Pointer; Item: TMultiThreadProcItem);
  54. begin
  55.   if IsPrime(Index) then
  56.     Inc(gTot);
  57. end;
  58.  
  59. procedure MultiThread;
  60. begin
  61.   ProcThreadPool.DoParallel(@DoLoop, 1, cMax);
  62. end;
  63.  
  64. begin
  65.   try
  66.     Writeln('Lazarus 1.8');
  67.     gTot := 0;
  68.     lT := GetTickCount64;
  69.     SingleThread;
  70.     lT := GetTickCount64 - lT;
  71.     WriteLn(Format('SingleThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  72.  
  73.     gTot := 0;
  74.     Sleep(100);
  75.     lT := GetTickCount64;
  76.     MultiThread;
  77.     lT := GetTickCount64 - lT;
  78.     WriteLn(Format('MultiThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  79.     { TfODO -oUser -cConsole Main : Insert code here }
  80.   except
  81.     on E: Exception do
  82.       Writeln(E.ClassName, ': ', E.Message);
  83.   end;
  84. end.

your code has a good example for bad use of TCriticalSection:

"A critical section is a resource which can be owned by only 1 caller: it can be used to make sure that in a multithreaded application only 1 thread enters pieces of code protected by the critical section. "
text above from https://www.freepascal.org/docs-html/fcl/syncobjs/tcriticalsection.html

 
   

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #3 on: February 15, 2018, 03:45:48 pm »
Use this:
Did you try that code yourself? It doesn't make it faster for Multithread?

It just doesn't follow the guidelines needed to work with the MTProcs code.
And in this case the critical section IS needed because the global variable gTot is updated.
If you do this without critical section, the end result is wrong.
Try it yourself. (below is the code)

This code does take the blocksize and multiple threads into account. It divides the work over the maximum number of thread (like it should). The library does not do this automatically.

Quote
Lazarus 1.8
SingleThread took 13,469 seconds, highest prime was 3001134
50000000 dividing by 8 = 8 blocks of 6250000
Index 0 From 1 to 6250000
Index 1 From 6250001 to 12500000
Index 2 From 12500001 to 18750000
Index 3 From 18750001 to 25000000
Index 4 From 25000001 to 31250000
Index 5 From 31250001 to 37500000
Index 6 From 37500001 to 43750000
Index 7 From 43750001 to 50000000
MultiThread took 3,891 seconds, highest prime was 3001134

Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses {$IFDEF UNIX}
  6.   cthreads,
  7.   cmem, {$ENDIF}
  8.   MTProcs,
  9.   Classes,
  10.   SysUtils,
  11.   syncobjs;
  12.  
  13. var
  14.   lT: QWord;
  15.   gTot: integer;
  16.   TotalLock: TCriticalSection;
  17.  
  18. const
  19.   cMax = 50000000;
  20.  
  21.   function IsPrime(N: integer): boolean;
  22.   var
  23.     Test, k: integer;
  24.   begin
  25.     if N <= 3 then
  26.       IsPrime := N > 1
  27.     else if ((N mod 2) = 0) or ((N mod 3) = 0) then
  28.       IsPrime := False
  29.     else
  30.     begin
  31.       IsPrime := True;
  32.       k := Trunc(Sqrt(N));
  33.       Test := 5;
  34.       while Test <= k do
  35.       begin
  36.         if ((N mod Test) = 0) or ((N mod (Test + 2)) = 0) then
  37.         begin
  38.           IsPrime := False;
  39.           break; { jump out of the for loop }
  40.         end;
  41.         Test := Test + 6;
  42.       end;
  43.     end;
  44.   end;
  45.  
  46.   procedure SingleThread;
  47.   var
  48.     I: integer;
  49.   begin
  50.     for I := 1 to cMax do
  51.     begin
  52.       if IsPrime(I) then
  53.         Inc(gTot);
  54.     end;
  55.   end;
  56.  
  57.   var
  58.     BlockCount, BlockSize: PtrInt;
  59.  
  60.   procedure DoLoop(Index: PtrInt; Data: Pointer; Item: TMultiThreadProcItem);
  61.   var
  62.     i: integer;
  63.     BlockStart, BlockEnd: PtrInt;
  64.   begin
  65.     // compute maximum of block
  66.     Item.CalcBlock(Index, BlockSize, cMax, BlockStart, BlockEnd);
  67.     writeln('Index ', Index, ' From ', BlockStart + 1, ' to ', BlockEnd + 1);
  68.     for i := BlockStart + 1 to BlockEnd + 1 do
  69.     begin
  70.       if IsPrime(I) then
  71.       begin
  72.         TotalLock.Acquire;
  73.         Inc(gTot);
  74.         TotalLock.Release;
  75.       end;
  76.     end;
  77.   end;
  78.  
  79.   procedure MultiThread;
  80.   begin
  81.     // split work into equally sized blocks
  82.     ProcThreadPool.CalcBlockSize(cMax, BlockCount, BlockSize);
  83.     writeln(cMax, ' dividing by ', ProcThreadPool.MaxThreadCount,' = ', BlockCount, ' blocks of ', BlockSize);
  84.     // compute each block
  85.     ProcThreadPool.DoParallel(@DoLoop, 0, BlockCount - 1);
  86.   end;
  87.  
  88. begin
  89.   TotalLock := TCriticalSection.Create;
  90.   try
  91.     Writeln('Lazarus 1.8');
  92.  
  93.     gTot := 0;
  94.     lT := GetTickCount64;
  95.     SingleThread;
  96.     lT := GetTickCount64 - lT;
  97.     WriteLn(Format('SingleThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  98.  
  99.     gTot := 0;
  100.     Sleep(100);
  101.     lT := GetTickCount64;
  102.     MultiThread;
  103.     lT := GetTickCount64 - lT;
  104.     WriteLn(Format('MultiThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  105.  
  106.   except
  107.     on E: Exception do
  108.       Writeln(E.ClassName, ': ', E.Message);
  109.   end;
  110.   TotalLock.Free;
  111.  
  112.   readln;
  113.  
  114. end.

(The writelns should probably be also in critical sections but they are there just for show. The "highest prime" should be "total # of primes")
« Last Edit: February 15, 2018, 03:52:30 pm by rvk »

bylaardt

  • Sr. Member
  • ****
  • Posts: 309
Re: Parallel Code
« Reply #4 on: February 15, 2018, 04:30:13 pm »
@rvk
you are right. I really don't payed attention enough.

but the entire code was under criticalsection, what the point about that?
if you want use parallel processing, you must use  un the code in other thread, not the entire code (or most of them) under criticalsection.
 

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #5 on: February 15, 2018, 04:33:30 pm »
but the entire code was under criticalsection, what the point about that?
Are you sure?

I saw in the original code TotalLock := TCriticalSection.Create;
That doesn't yet create a critical section. Only when calling TotalLock.Acquire; the lock is acquired.
With TotalLock.Release; it is released again.

And .Acquire and .Release were only around the Inc(gTot); line.

dieselfan

  • New Member
  • *
  • Posts: 16
Re: Parallel Code
« Reply #6 on: February 16, 2018, 07:39:00 am »
Thanks for the replies. This is basically my proof of concept of using a "parallel" library to speed up future use cases.

- bylaardt To my knowledge I used critical section correctly. The variable had to be outside of the thread code ie created in main thread. Then I only lock and release in the multithreaded code as I updated a global variable for use in the main thread. How would you advise I update the gTot variable differently?
- rvk with respect to the code you posted, thanks for the block size etc breakdown however what I was attempting was to emulate the equivalent of using TParallel from Delphi. The idea being that I shouldn't have to break it myself into block sizes and it would scale dynamically. I've attached the Delphi equiv below. You can see how much more elegant it is, hence my use of http://wiki.lazarus.freepascal.org/Parallel_procedures
Code: Pascal  [Select][+][-]
  1. program Parallel;
  2.  
  3. {$APPTYPE CONSOLE}
  4.  
  5. {$R *.res}
  6. uses
  7.   Classes,
  8.   SysUtils,
  9.   System.Threading,
  10.   SyncObjs;
  11.  
  12. var lT, gTot: integer;
  13.   s: string;
  14.  
  15. const cMax = 50000000;
  16.  
  17. function IsPrime(N: Integer): Boolean;
  18. var
  19.   Test, k: Integer;
  20. begin
  21.   if N <= 3 then
  22.     IsPrime := N > 1
  23.   else if ((N mod 2) = 0) or ((N mod 3) = 0) then
  24.     IsPrime := False
  25.   else
  26.   begin
  27.     IsPrime := True;
  28.     k := Trunc(Sqrt(N));
  29.     Test := 5;
  30.     while Test <= k do
  31.     begin
  32.       if ((N mod Test) = 0) or ((N mod (Test + 2)) = 0) then
  33.       begin
  34.         IsPrime := False;
  35.         break; { jump out of the for loop }
  36.       end;
  37.       Test := Test + 6;
  38.     end;
  39.   end;
  40. end;
  41.  
  42. procedure SingleThread;
  43. var
  44.   I: Integer;
  45. begin
  46.   for I := 1 to cMax do
  47.    begin
  48.      if IsPrime(I) then
  49.         Inc(gTot);
  50.    end;
  51. end;
  52.  
  53. procedure MultiThread;
  54. begin
  55.   TParallel.&For(1, cMax, procedure(I: integer)
  56.   begin
  57.     if IsPrime(I) then
  58.       TInterLocked.Increment(gTot);
  59.   end);
  60. end;
  61.  
  62. begin
  63.   try
  64.     Writeln('Delphi Tokyo 10.2');
  65.     gTot := 0;
  66.     lT := TThread.GetTickCount;
  67.     SingleThread;
  68.     lT := TThread.GetTickCount - lT;
  69.     WriteLn(Format('SingleThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  70.  
  71.     gTot := 0;
  72.     lT := TThread.GetTickCount;
  73.     MultiThread;
  74.     lT := TThread.GetTickCount - lT;
  75.     WriteLn(Format('MultiThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  76.     { TODO -oUser -cConsole Main : Insert code here }
  77.   except
  78.     on E: Exception do
  79.       Writeln(E.ClassName, ': ', E.Message);
  80.   end;
  81. end.
AMD 1800X
Manjaro KDE
Rarely Windows 10 Pro

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #7 on: February 16, 2018, 09:57:16 am »
- rvk with respect to the code you posted, thanks for the block size etc breakdown however what I was attempting was to emulate the equivalent of using TParallel from Delphi. The idea being that I shouldn't have to break it myself into block sizes and it would scale dynamically.
Well, then MTProc isn't the choice for you, is it. With MTProc you need to break down the tasks yourself in equal parts.

You could put the breakdown in a separate procedure so it looks for your code like it is one call (emulating what Delphi does). But there are other libraries too. Maybe there are some which already have this in them.

I also found another, PasMP, which should be compatible with FPC and Delphi. But I didn't check if it has the break-down in it.

As far as I can see, TParallel.&For() has a separate index-manager which the workerthreads get their number for to execute. Sort of like an array in which the index is flagged as already processed. The upside of this, is that all threads are always busy to the end. With MTProc it could be possible that one thread finishes earlier than the others and sits there doing nothing until the others are done too. I'm not aware of such workerthread-queues in FPC (but I'm not that familiar with threading in FPC).

But emulating the TParallel.&For() should be that difficult to do in a small unit.

Edit:
I've attached a small project. The call from your main program is just
Code: Pascal  [Select][+][-]
  1. MultiThread(1, cMax, @DoLoop);
from parallel_thread.pas.

In parallel_thread.pas there are 8 (default) workerthreads which take an increasing index-number and execute the DoLoop. The only downside is that it's still slow because it acquires a lock for every index it needs to take. This is very inefficient.

Code: Pascal  [Select][+][-]
  1. IndexLock.Acquire;
  2. Inc(CurrentIndex);
  3. DoIndex := CurrentIndex;
  4. IndexLock.Release;
(if you comment out the lock it does seem to work much faster but I don't think it's safe, although it does work for me)

Delphi does this via a "TStrideManager". So, to get more speed, something else must be created other then Acquiring and Releasing the lock 50.000.000 times. But it does proves the concept.

Somebody has an idea how to speed that up?



« Last Edit: February 16, 2018, 10:50:08 am by rvk »

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #8 on: February 16, 2018, 11:02:55 am »
Haha, (please read the Edit from above post first)

There is function InterlockedIncrement which does a much better job than TCriticalSection.Acquire and .Release :D
(You even used a version, TInterLocked.Increment, in your Delphi version :D)

I changed the code (attached) and now I have the result as below.
Because the InterlockedIncrement can also be used to increase the CurrentIndex you see it's much faster.

I think this is the simplest code you're going to get with parallel threads in FPC. And no need for large library/underlying code like in Delphi.

Quote
Lazarus 1.8
SingleThread took 14,406 seconds, highest prime was 3001134
MultiThread took 3,500 seconds, highest prime was 3001134
press enter key


So, as proof of concept you have this in FPC/Lazarus. (just as short as in Delphi)
Code: Pascal  [Select][+][-]
  1. program parallel_thread_example;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses {$IFDEF UNIX}
  6.   cthreads,
  7.   cmem, {$ENDIF}
  8.   Classes,
  9.   SysUtils,
  10.   parallel_thread;
  11.  
  12. var
  13.   lT: QWord;
  14.   gTot: integer;
  15.  
  16. const
  17.   cMax = 50000000;
  18.  
  19.   function IsPrime(N: integer): boolean;
  20.   var
  21.     Test, k: integer;
  22.   begin
  23.     if N <= 3 then
  24.       IsPrime := N > 1
  25.     else if ((N mod 2) = 0) or ((N mod 3) = 0) then
  26.       IsPrime := False
  27.     else
  28.     begin
  29.       IsPrime := True;
  30.       k := Trunc(Sqrt(N));
  31.       Test := 5;
  32.       while Test <= k do
  33.       begin
  34.         if ((N mod Test) = 0) or ((N mod (Test + 2)) = 0) then
  35.         begin
  36.           IsPrime := False;
  37.           break; { jump out of the for loop }
  38.         end;
  39.         Test := Test + 6;
  40.       end;
  41.     end;
  42.   end;
  43.  
  44.   procedure SingleThread;
  45.   var
  46.     I: integer;
  47.   begin
  48.     for I := 1 to cMax do
  49.     begin
  50.       if IsPrime(I) then
  51.         Inc(gTot);
  52.     end;
  53.   end;
  54.  
  55.   procedure DoLoop(Index: integer);
  56.   begin
  57.     if IsPrime(Index) then
  58.       InterlockedIncrement(gTot);
  59.   end;
  60.  
  61. begin
  62.   try
  63.     Writeln('Lazarus 1.8');
  64.  
  65.     gTot := 0;
  66.     lT := GetTickCount64;
  67.     SingleThread;
  68.     lT := GetTickCount64 - lT;
  69.     WriteLn(Format('SingleThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  70.  
  71.     gTot := 0;
  72.     Sleep(100);
  73.     lT := GetTickCount64;
  74.     MultiThread(1, cMax, @DoLoop);
  75.     lT := GetTickCount64 - lT;
  76.     WriteLn(Format('MultiThread took %.3f seconds, highest prime was %d', [lT / 1000, gTot]));
  77.  
  78.   except
  79.     on E: Exception do
  80.       Writeln(E.ClassName, ': ', E.Message);
  81.   end;
  82.  
  83.   writeln('press enter key');
  84.   readln;
  85.  
  86. end.

And the accompanied parallel_thread.pas is much smaller than the whole MTProcs etc.
Code: Pascal  [Select][+][-]
  1. unit parallel_thread;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.   Classes, SysUtils;
  9.  
  10. type
  11.   DoProcProcedure = procedure(Index: Integer);
  12.  
  13. procedure MultiThread(FromIndex, ToIndex: Integer; Proc: DoProcProcedure; ThreadCount: Integer = 8);
  14.  
  15. implementation
  16.  
  17. var
  18.   CurrentIndex: Integer;
  19.   MaxIndex: Integer;
  20.  
  21. type
  22.   TMyWorkerThread = class(TThread)
  23.   protected
  24.     AProc: DoProcProcedure;
  25.     procedure Execute; override;
  26.   public
  27.     constructor Create(Proc: DoProcProcedure);
  28.   end;
  29.  
  30. constructor TMyWorkerThread.Create(Proc: DoProcProcedure);
  31. begin
  32.   inherited Create(false);
  33.   AProc := Proc;
  34.   FreeOnTerminate := True;
  35. end;
  36.  
  37. procedure TMyWorkerThread.Execute;
  38. var
  39.   DoIndex: integer;
  40. begin
  41.   while not Terminated do
  42.   begin
  43.  
  44.     DoIndex := InterlockedIncrement(CurrentIndex);
  45.  
  46.     if DoIndex > MaxIndex then
  47.       break;
  48.     AProc(DoIndex);
  49.  
  50.   end;
  51. end;
  52.  
  53.  
  54. procedure MultiThread(FromIndex, ToIndex: Integer; Proc: DoProcProcedure; ThreadCount: Integer = 8);
  55. var
  56.   Workers: array of TMyWorkerThread;
  57.   I: integer;
  58. begin
  59.   SetLength(Workers, ThreadCount);
  60.   CurrentIndex := FromIndex;
  61.   MaxIndex := ToIndex;
  62.   for I := 0 to ThreadCount - 1 do
  63.     Workers[I] := TMyWorkerThread.Create(Proc);
  64.  
  65.   while CurrentIndex < MaxIndex do
  66.     sleep(100);
  67.  
  68. end;
  69.  
  70. end.
« Last Edit: February 16, 2018, 11:26:45 am by rvk »

dieselfan

  • New Member
  • *
  • Posts: 16
Re: Parallel Code
« Reply #9 on: February 16, 2018, 01:06:51 pm »
rvk Thanks a mil it's a great POC. I was replying to your earlier post then had to cancel  ::) - this one is way faster and only a couple percent slower than Delphi.

Couple interesting things I noticed though your ST performance is GREAT. What are your specs? I'm running a stock Ryzen 1800X (Manjaro) and the dif with laz on 8/16 thread is negligible when running the latest code you presented.
My times now are
Lazarus 1.8
SingleThread took 31.875 seconds, highest prime was 3001134
MultiThread took 4.205 seconds, highest prime was 3001134

Thanks again!
AMD 1800X
Manjaro KDE
Rarely Windows 10 Pro

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #10 on: February 16, 2018, 01:31:31 pm »
Couple interesting things I noticed though your ST performance is GREAT. What are your specs? I'm running a stock Ryzen 1800X (Manjaro) and the dif with laz on 8/16 thread is negligible when running the latest code you presented.
I use a Dell XPS8500, Intel Core i7-3770 CPU @ 3.40Ghz. with 16GB internal memory. Nothing really special.

But I actually used an old trunk version of FPC (37381 from oktober). Lazarus doesn't even matter because this is only a console application so it can be compiled directly in FPC.

Quote
C:\dev\fpc\bin\i386-win32>fpc d:parallel_thread_example.pas
Free Pascal Compiler version 3.1.1 [2017/10/02] for i386
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling d:parallel_thread_example.pas
Compiling D:\parallel_thread_example\parallel_thread.pas
Linking d:parallel_thread_example.exe
156 lines compiled, 1.6 sec, 149280 bytes code, 6468 bytes data

C:\dev\fpc\bin\i386-win32>d:parallel_thread_example.exe
FPC 3.1.1 32 bit (trunk 37381, old one from oktober 2017 )
SingleThread took 16,344 seconds, highest prime was 3001134
MultiThread took 3,954 seconds, highest prime was 3001134

press enter key

And WOW, if I compile in Lazarus 1.8 64 bit or even directly in FPC 3.0.4 64 bit I also get a way slower result !!
Quote
C:\laz1.8\fpc\3.0.4\bin\x86_64-win64>fpc d:parallel_thread_example.pas
Free Pascal Compiler version 3.0.4 [2017/12/03] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Win64 for x64
Compiling d:parallel_thread_example.pas
Compiling D:\parallel_thread_example\parallel_thread.pas
Linking d:parallel_thread_example.exe
156 lines compiled, 0.4 sec, 159584 bytes code, 6580 bytes data

C:\laz1.8\fpc\3.0.4\bin\x86_64-win64>d:parallel_thread_example.exe
FPC 3.0.4 64 bit
SingleThread took 48,156 seconds, highest prime was 3001134
MultiThread took 10,844 seconds, highest prime was 3001134

press enter key

And in Lazarus 1.6.4 with FPC 3.0.2 also 64 bit
Quote
C:\laz1.6.4\fpc\3.0.2\bin\x86_64-win64>fpc d:parallel_thread_example.pas
Free Pascal Compiler version 3.0.2 [2017/02/27] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Win64 for x64
Compiling d:parallel_thread_example.pas
Compiling D:\parallel_thread_example\parallel_thread.pas
Linking d:parallel_thread_example.exe
156 lines compiled, 0.6 sec, 159376 bytes code, 6580 bytes data

C:\laz1.6.4\fpc\3.0.2\bin\x86_64-win64>d:parallel_thread_example.exe
FPC 3.0.2 64 bit
SingleThread took 48,015 seconds, highest prime was 3001134
MultiThread took 10,781 seconds, highest prime was 3001134

press enter key

I'll try to compile a trunk version for 64 bit and see if that's the problem.

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #11 on: February 16, 2018, 03:04:46 pm »
I just compiled FPC with new trunk version (rev.38246) in 32 bit and 64 bit.

Results:
Quote
Free Pascal Compiler version 3.1.1 [2018/02/16] for i386
SingleThread took 16,250 seconds, highest prime was 3001134
MultiThread took 3,922 seconds, highest prime was 3001134

Free Pascal Compiler version 3.1.1 [2018/02/16] for x86_64
SingleThread took 49,000 seconds, highest prime was 3001134
MultiThread took 10,890 seconds, highest prime was 3001134

As you can see the 64 bit version is also slower in the latest trunk FPC version.
All 8 logical processors (4 cores) went to 100% during the multi-thread phase.

Maybe you could try a 32 bit version of Lazarus/FPC to see if it's even faster than that 3,5 seconds in multi-thread on your PC.
And maybe somebody else can explain why it's so much slower on 64 bit.

dieselfan

  • New Member
  • *
  • Posts: 16
Re: Parallel Code
« Reply #12 on: February 16, 2018, 04:54:19 pm »

As you can see the 64 bit version is also slower in the latest trunk FPC version.
All 8 logical processors (4 cores) went to 100% during the multi-thread phase.

Maybe you could try a 32 bit version of Lazarus/FPC to see if it's even faster than that 3,5 seconds in multi-thread on your PC.
And maybe somebody else can explain why it's so much slower on 64 bit.
I'll give it a go - that's a huge difference in performance!
AMD 1800X
Manjaro KDE
Rarely Windows 10 Pro

Nitorami

  • Sr. Member
  • ****
  • Posts: 496
Re: Parallel Code
« Reply #13 on: February 16, 2018, 05:09:59 pm »
The difference in speed between 32 and 64 bit is indeed puzzling. In my own average programs, I normally experience differences of +/-20% , but not factor 2 or 3.

Where speed matters, it is a good idea to use native unsigned integers at least in tight loops. The type is defined in FPC as nativeuint, equivalent to qword on 64bit systems, and to dword on 32 bit. Indeed when changing the local variables within the IsPrime function to nativeuint, I get 20% better performance, but still a factor 2 slower than the 32bit version.

rvk

  • Hero Member
  • *****
  • Posts: 6163
Re: Parallel Code
« Reply #14 on: February 16, 2018, 05:30:46 pm »
At least it's not a thread problem.
Because the both SingleThread and Multithread are about 2,5 to 3 times slower.

So the problem is indeed in IsPrime() and the difference in 32 bit and 64 bit compiler.

 

TinyPortal © 2005-2018