Recent

Author Topic: efficiency problem  (Read 49651 times)

Paolo

  • Hero Member
  • *****
  • Posts: 569
Re: efficiency problem
« Reply #90 on: February 17, 2025, 06:27:40 pm »
well done Photor,

but why didn't you use I-K-J or K-I-J looping, they are row oriented, and I see they are much faster. Later I'll post my results.
« Last Edit: February 17, 2025, 07:12:22 pm by Paolo »

LV

  • Full Member
  • ***
  • Posts: 237
Re: efficiency problem
« Reply #91 on: February 17, 2025, 11:58:38 pm »
After so many years, I just realized that one of the main reasons is that matrices of Pascal (like C) are row major instead of column major, so the Multiply function in the code should be modified as
Code: Pascal  [Select][+][-]
  1. Function Multiply(Const mat1,mat2:Matrix):Matrix;
  2. Var
  3.  i,j,k:Word;
  4.  sum:Float;
  5.  v:Vector;
  6. Begin
  7.  For j:=0 to n-1 do
  8.   Begin
  9.    For k:=0 to n-1 do v[k]:=mat2[k,j];
  10.    For i:=0 to n-1 do
  11.    Begin
  12.     sum:=0;
  13.     For k:=0 to n-1 do sum+=mat1[i,k]*v[k];
  14.     Multiply[i,j]:=sum;
  15.    End;
  16.   End;
  17. End;
  18.  
Now running the code takes about 1.5 seconds, instead of 11 seconds.

It seems that all computers have been multi-core for a long time. We should utilize this capability. :-[

Laptop AMD Ryzen 7 4700U:

Single-threaded program

Code: Pascal  [Select][+][-]
  1. program MatrixMultiplication;
  2.  
  3. uses
  4.   SysUtils,
  5.   Math,
  6.   DateUtils;
  7.  
  8. const
  9.   N = 1000;
  10.  
  11. type
  12.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  13.   TVector = array [0..N - 1] of double;
  14.  
  15. var
  16.   A, B, C: TMatrix;
  17.   StartTime, EndTime: TDateTime;
  18.  
  19.   procedure InitializeMatrix(var M: TMatrix);
  20.   var
  21.     i, j: integer;
  22.   begin
  23.     for i := 0 to N - 1 do
  24.       for j := 0 to N - 1 do
  25.         M[i, j] := Random;
  26.   end;
  27.  
  28.   function Multiply(const mat1, mat2: TMatrix): TMatrix;
  29.   var
  30.     i, j, k: word;
  31.     sum: double;
  32.     v: TVector;
  33.     res: TMatrix;
  34.   begin
  35.     FillChar(res, SizeOf(res), 0);
  36.     for j := 0 to N - 1 do
  37.     begin
  38.       for k := 0 to N - 1 do v[k] := mat2[k, j];
  39.       for i := 0 to N - 1 do
  40.       begin
  41.         sum := 0;
  42.         for k := 0 to N - 1 do sum := sum + mat1[i, k] * v[k];
  43.         res[i, j] := sum;
  44.       end;
  45.     end;
  46.     Multiply := res;
  47.   end;
  48.  
  49. begin
  50.   Randomize;
  51.   InitializeMatrix(A);
  52.   InitializeMatrix(B);
  53.  
  54.   StartTime := Now;
  55.   C := Multiply(A, B);
  56.   EndTime := Now;
  57.  
  58.   Writeln('Execution Time: ', MilliSecondsBetween(EndTime, StartTime), ' ms');
  59.   readln;
  60. end.
  61.  

Execution Time: 876 ms  :(

Multi-threaded program

Code: Pascal  [Select][+][-]
  1. program ParMatrixMultiplication;
  2.  
  3. uses
  4.   SysUtils,
  5.   Classes,
  6.   Math,
  7.   DateUtils;
  8.  
  9. const
  10.   N = 1000; // Matrix size
  11.   ThreadCount = 8; // Number of threads
  12.  
  13. type
  14.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  15.   TVector = array [0..N - 1] of double;
  16.  
  17. var
  18.   A, B, C: TMatrix;
  19.   StartTime, EndTime: TDateTime;
  20.  
  21.   // Matrix initialization with random values
  22.   procedure InitializeMatrix(var M: TMatrix);
  23.   var
  24.     i, j: integer;
  25.   begin
  26.     for i := 0 to N - 1 do
  27.       for j := 0 to N - 1 do
  28.         M[i, j] := Random;
  29.   end;
  30.  
  31.   // Thread for matrix multiplication
  32. type
  33.   TMatrixMultiplier = class(TThread)
  34.   private
  35.     FStartRow, FEndRow: integer;
  36.   protected
  37.     procedure Execute; override;
  38.   public
  39.     constructor Create(StartRow, EndRow: integer);
  40.   end;
  41.  
  42.   constructor TMatrixMultiplier.Create(StartRow, EndRow: integer);
  43.   begin
  44.     inherited Create(True);
  45.     FStartRow := StartRow;
  46.     FEndRow := EndRow;
  47.     FreeOnTerminate := False;
  48.   end;
  49.  
  50.   procedure TMatrixMultiplier.Execute;
  51.   var
  52.     i, j, k: integer;
  53.     sum: double;
  54.     v: TVector;
  55.   begin
  56.     for j := 0 to N - 1 do
  57.     begin
  58.       // Optimization: copy column of matrix B to vector
  59.       for k := 0 to N - 1 do
  60.         v[k] := B[k, j];
  61.  
  62.       // Multiplication for the row range
  63.       for i := FStartRow to FEndRow do
  64.       begin
  65.         sum := 0;
  66.         for k := 0 to N - 1 do
  67.           sum := sum + A[i, k] * v[k];
  68.         C[i, j] := sum;
  69.       end;
  70.     end;
  71.   end;
  72.  
  73.   // Main program
  74. var
  75.   Threads: array of TMatrixMultiplier;
  76.   i, Step: integer;
  77. begin
  78.   Randomize;
  79.   InitializeMatrix(A);
  80.   InitializeMatrix(B);
  81.   SetLength(Threads, ThreadCount);
  82.  
  83.   StartTime := Now;
  84.  
  85.   // Create and start threads
  86.   Step := N div ThreadCount;
  87.   for i := 0 to ThreadCount - 1 do
  88.   begin
  89.     if i < ThreadCount - 1 then
  90.       Threads[i] := TMatrixMultiplier.Create(i * Step, (i + 1) * Step - 1)
  91.     else
  92.       Threads[i] := TMatrixMultiplier.Create(i * Step, N - 1);
  93.     Threads[i].Start;
  94.   end;
  95.  
  96.   // Wait for all threads to finish
  97.   for i := 0 to ThreadCount - 1 do
  98.   begin
  99.     Threads[i].WaitFor;
  100.     Threads[i].Free;
  101.   end;
  102.  
  103.   EndTime := Now;
  104.  
  105.   Writeln('Execution Time: ', MilliSecondsBetween(EndTime, StartTime), ' ms');
  106.   Readln;
  107. end.
  108.  

Execution Time: 164 ms  ;)
« Last Edit: February 18, 2025, 12:01:50 am by LV »

photor

  • Jr. Member
  • **
  • Posts: 79
Re: efficiency problem
« Reply #92 on: February 18, 2025, 10:18:54 am »
Execution Time: 164 ms  ;)

Good job, LV! But I intend to fairly compare the single-threaded speed first 8)
« Last Edit: February 23, 2025, 12:01:46 pm by photor »

photor

  • Jr. Member
  • **
  • Posts: 79
Re: efficiency problem
« Reply #93 on: February 18, 2025, 10:26:22 am »
well done Photor,

but why didn't you use I-K-J or K-I-J looping, they are row oriented, and I see they are much faster. Later I'll post my results.

Looking forward to seeing your codes and results  8)
« Last Edit: February 23, 2025, 12:01:23 pm by photor »

ALLIGATOR

  • Full Member
  • ***
  • Posts: 131
Re: efficiency problem
« Reply #94 on: February 18, 2025, 12:01:10 pm »
Code: Pascal  [Select][+][-]
  1. program MatrixMultiplication;
  2. {$mode objfpc}
  3. {$optimization on}
  4.  
  5. uses
  6.   SysUtils,
  7.   DateUtils;
  8.  
  9. const
  10.   N = 1000;
  11.  
  12. type
  13.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  14.   TVector = array [0..N - 1] of double;
  15.  
  16. var
  17.   A, B: TMatrix;
  18.   StartTime: TDateTime;
  19.  
  20.   procedure InitializeMatrix(out M: TMatrix);
  21.   var
  22.     i, j: integer;
  23.   begin
  24.     for i := 0 to N - 1 do
  25.       for j := 0 to N - 1 do
  26.         M[i, j] := Random;
  27.   end;
  28.  
  29.   function Multiply(const mat1, mat2: TMatrix): TMatrix;
  30.   var
  31.     i, j, k: NativeUInt;
  32.     sum: double;
  33.     v: TVector;
  34.   begin
  35.     for j := 0 to N - 1 do
  36.     begin
  37.       for k := 0 to N - 1 do v[k] := mat2[k, j];
  38.       for i := 0 to N - 1 do
  39.       begin
  40.         sum := 0;
  41.         k:=0;
  42.         while k < N div 8 do
  43.         begin
  44.           sum += mat1[i, k+0] * v[k+0]
  45.                + mat1[i, k+1] * v[k+1]
  46.                + mat1[i, k+2] * v[k+2]
  47.                + mat1[i, k+3] * v[k+3]
  48.                + mat1[i, k+4] * v[k+4]
  49.                + mat1[i, k+5] * v[k+5]
  50.                + mat1[i, k+6] * v[k+6]
  51.                + mat1[i, k+7] * v[k+7];
  52.           inc(k, 8);
  53.         end;
  54.         Result[i, j] := sum;
  55.       end;
  56.     end;
  57.   end;
  58.  
  59. begin
  60.   InitializeMatrix(A);
  61.   InitializeMatrix(B);
  62.  
  63.   StartTime := Now;
  64.   Multiply(A, B);
  65.   Writeln('Execution Time: ', MilliSecondsBetween(Now, StartTime), ' ms');
  66.  
  67.   ReadLn;
  68. end.
  69.  

I may have done something wrong, I can't figure it out, my head is busy with other things, but this code speeds up calculations significantly (multiple times). But I repeat - it requires verification.
« Last Edit: February 18, 2025, 12:19:21 pm by ALLIGATOR »

Paolo

  • Hero Member
  • *****
  • Posts: 569
Re: efficiency problem
« Reply #95 on: February 18, 2025, 12:26:51 pm »
You are doing un-rolling, yes it can help.

photor

  • Jr. Member
  • **
  • Posts: 79
Re: efficiency problem
« Reply #96 on: February 18, 2025, 12:32:16 pm »
I may have done something wrong, I can't figure it out, my head is busy with other things, but this code speeds up calculations significantly (multiple times). But I repeat - it requires verification.

What's your command line options for compilation?
« Last Edit: February 23, 2025, 12:01:16 pm by photor »

Thaddy

  • Hero Member
  • *****
  • Posts: 16628
  • Kallstadt seems a good place to evict Trump to.
Re: efficiency problem
« Reply #97 on: February 18, 2025, 12:32:55 pm »
You are doing un-rolling, yes it can help.
The compiler does that for you and often more efficient.
But I am sure they don't want the Trumps back...

photor

  • Jr. Member
  • **
  • Posts: 79
Re: efficiency problem
« Reply #98 on: February 18, 2025, 01:04:29 pm »
Code: Pascal  [Select][+][-]
  1. program MatrixMultiplication;
  2. {$mode objfpc}
  3. {$optimization on}
  4.  
  5. uses
  6.   SysUtils,
  7.   DateUtils;
  8.  
  9. const
  10.   N = 1000;
  11.  
  12. type
  13.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  14.   TVector = array [0..N - 1] of double;
  15.  
  16. var
  17.   A, B: TMatrix;
  18.   StartTime: TDateTime;
  19.  
  20.   procedure InitializeMatrix(out M: TMatrix);
  21.   var
  22.     i, j: integer;
  23.   begin
  24.     for i := 0 to N - 1 do
  25.       for j := 0 to N - 1 do
  26.         M[i, j] := Random;
  27.   end;
  28.  
  29.   function Multiply(const mat1, mat2: TMatrix): TMatrix;
  30.   var
  31.     i, j, k: NativeUInt;
  32.     sum: double;
  33.     v: TVector;
  34.   begin
  35.     for j := 0 to N - 1 do
  36.     begin
  37.       for k := 0 to N - 1 do v[k] := mat2[k, j];
  38.       for i := 0 to N - 1 do
  39.       begin
  40.         sum := 0;
  41.         k:=0;
  42.         while k < N div 8 do
  43.         begin
  44.           sum += mat1[i, k+0] * v[k+0]
  45.                + mat1[i, k+1] * v[k+1]
  46.                + mat1[i, k+2] * v[k+2]
  47.                + mat1[i, k+3] * v[k+3]
  48.                + mat1[i, k+4] * v[k+4]
  49.                + mat1[i, k+5] * v[k+5]
  50.                + mat1[i, k+6] * v[k+6]
  51.                + mat1[i, k+7] * v[k+7];
  52.           inc(k, 8);
  53.         end;
  54.         Result[i, j] := sum;
  55.       end;
  56.     end;
  57.   end;
  58.  
  59. begin
  60.   InitializeMatrix(A);
  61.   InitializeMatrix(B);
  62.  
  63.   StartTime := Now;
  64.   Multiply(A, B);
  65.   Writeln('Execution Time: ', MilliSecondsBetween(Now, StartTime), ' ms');
  66.  
  67.   ReadLn;
  68. end.
  69.  

I may have done something wrong, I can't figure it out, my head is busy with other things, but this code speeds up calculations significantly (multiple times). But I repeat - it requires verification.

"while k < N div 8 do" is wrong.
« Last Edit: February 23, 2025, 12:00:57 pm by photor »

Paolo

  • Hero Member
  • *****
  • Posts: 569
Re: efficiency problem
« Reply #99 on: February 18, 2025, 02:18:36 pm »
Now I am busy, not checked  :o

Code: Pascal  [Select][+][-]
  1.  
  2. program MatrixMultiplication;
  3. {$mode objfpc}
  4. {$optimization on}
  5.  
  6. uses
  7.   SysUtils,
  8.   DateUtils;
  9.  
  10. const
  11.   N = 1000;
  12.  
  13. type
  14.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  15.   TVector = array [0..N - 1] of double;
  16.  
  17. var
  18.   A, B: TMatrix;
  19.   StartTime: TDateTime;
  20.  
  21.   procedure InitializeMatrix(out M: TMatrix);
  22.   var
  23.     i, j: integer;
  24.   begin
  25.     for i := 0 to N - 1 do
  26.       for j := 0 to N - 1 do
  27.         M[i, j] := Random;
  28.   end;
  29.  
  30.   function Multiply(const mat1, mat2: TMatrix): TMatrix;
  31.   var
  32.     i, j, k,s: NativeUInt;
  33.     sum: double;
  34.     V: Tvector;
  35.   Totalblocksize: integer;
  36.   begin
  37.   Totalblocksize:=8*((n)div(8));
  38.     for j := 0 to N - 1 do
  39.     begin
  40.       for k := 0 to N - 1 do v[k] := mat2[k, j];
  41.       for i := 0 to N - 1 do
  42.       begin
  43.         sum := 0;
  44.         k:=0;
  45.         while k < Totalblocksize do
  46.         begin
  47.           sum += mat1[i, k+0] * v[k+0]
  48.                + mat1[i, k+1] * v[k+1]
  49.                + mat1[i, k+2] * v[k+2]
  50.                + mat1[i, k+3] * v[k+3]
  51.                + mat1[i, k+4] * v[k+4]
  52.                + mat1[i, k+5] * v[k+5]
  53.                + mat1[i, k+6] * v[k+6]
  54.                + mat1[i, k+7] * v[k+7];
  55.           inc(k, 8);
  56.         end;
  57.         For s:=k to n-1 do
  58.            Sum:=sum+mat1[i,k]*v[k];
  59.         Result[i, j] := sum;
  60.       end;
  61.     end;
  62.   end;
  63.  

ALLIGATOR

  • Full Member
  • ***
  • Posts: 131
Re: efficiency problem
« Reply #100 on: February 18, 2025, 02:40:08 pm »
"while k < N div 8 do" is wrong.

 :D
Ha ha! ) You're right! A keen eye ) Thanks!!!

The corrected version, on my hardware (i7-8750H, laptop) gives x2

Code: Pascal  [Select][+][-]
  1. program MatrixMultiplication;
  2. {$mode objfpc}
  3. {$optimization on}
  4.  
  5. uses
  6.   SysUtils,
  7.   DateUtils, Math;
  8.  
  9. const
  10.   N = 1000;
  11.  
  12. type
  13.   TMatrix = array [0..N - 1, 0..N - 1] of double;
  14.   TVector = array [0..N - 1] of double;
  15.  
  16. var
  17.   A, B, R1, R2: TMatrix;
  18.   StartTime: TDateTime;
  19.  
  20.   procedure InitializeMatrix(out M: TMatrix);
  21.   var
  22.     i, j: integer;
  23.   begin
  24.     for i := 0 to N - 1 do
  25.       for j := 0 to N - 1 do
  26.         M[i, j] := Random;
  27.   end;
  28.  
  29.   function Multiply1(const mat1, mat2: TMatrix): TMatrix;
  30.    var
  31.      i, j, k: NativeUInt;
  32.      sum: double;
  33.      v: TVector;
  34.    begin
  35.      for j := 0 to N - 1 do
  36.      begin
  37.        for k := 0 to N - 1 do v[k] := mat2[k, j];
  38.        for i := 0 to N - 1 do
  39.        begin
  40.          sum := 0;
  41.          for k := 0 to N - 1 do
  42.          begin
  43.            sum := sum + mat1[i, k] * v[k];
  44.          end;
  45.          // fix
  46.          while k < N do
  47.          begin
  48.            sum += mat1[i, k] * v[k];
  49.            inc(k);
  50.          end;
  51.          // fix
  52.          Result[i, j] := sum;
  53.        end;
  54.      end;
  55.    end;
  56.  
  57.   function Multiply2(const mat1, mat2: TMatrix): TMatrix;
  58.   var
  59.     i, j, k: NativeUInt;
  60.     sum: double;
  61.     v: TVector;
  62.   begin
  63.     for j := 0 to N - 1 do
  64.     begin
  65.       for k := 0 to N - 1 do v[k] := mat2[k, j];
  66.       for i := 0 to N - 1 do
  67.       begin
  68.         sum := 0;
  69.         k:=0;
  70.         while k < N do
  71.         begin
  72.           sum += mat1[i, k+0] * v[k+0]
  73.                + mat1[i, k+1] * v[k+1]
  74.                + mat1[i, k+2] * v[k+2]
  75.                + mat1[i, k+3] * v[k+3];
  76.           inc(k, 4);
  77.         end;
  78.         Result[i, j] := sum;
  79.       end;
  80.     end;
  81.   end;
  82.  
  83. function IsSame(const mat1, mat2: TMatrix): Boolean;
  84. var
  85.   i, k: NativeUInt;
  86. begin
  87.   Result := True;
  88.   for i := 0 to N - 1 do
  89.     for k := 0 to N - 1 do
  90.       if not SameValue(mat1[i, k], mat2[i, k]) then Exit(False);
  91. end;
  92.  
  93. begin
  94.   Randomize;
  95.   InitializeMatrix(A);
  96.   InitializeMatrix(B);
  97.  
  98.   StartTime := Now;
  99.   R1 := Multiply1(A, B);
  100.   Writeln('Execution Time 1: ', MilliSecondsBetween(Now, StartTime), ' ms');
  101.  
  102.   StartTime := Now;
  103.   R2 := Multiply2(A, B);
  104.   Writeln('Execution Time 2: ', MilliSecondsBetween(Now, StartTime), ' ms');
  105.  
  106.   WriteLn('IsSame: ', IsSame(R1, R2));
  107.  
  108.   ReadLn;
  109. end.
  110.  

Code: Pascal  [Select][+][-]
  1. Execution Time 1: 1164 ms
  2. Execution Time 2: 653 ms
  3. IsSame: TRUE
  4.  

« Last Edit: February 18, 2025, 03:20:20 pm by ALLIGATOR »

Paolo

  • Hero Member
  • *****
  • Posts: 569
Re: efficiency problem
« Reply #101 on: February 18, 2025, 03:12:36 pm »
@Alligator,
but it is still wrong. You are assuming "n" be a multiple of 4  but that us not general case.

ALLIGATOR

  • Full Member
  • ***
  • Posts: 131
Re: efficiency problem
« Reply #102 on: February 18, 2025, 03:15:05 pm »
@Alligator,
but it is still wrong. You are assuming "n" be a multiple of 4  but that us not general case.
Of course! It is! I believe this is sufficient to show the acceleration effect itself.

UPD: Corrected the code to be correct for N not divisible by the loop promotion value
seems to be wrong, but it doesn't change the essence - unrolling the loop body - gives acceleration, depending on the hardware, on mine - x2.
« Last Edit: February 18, 2025, 03:54:13 pm by ALLIGATOR »

ALLIGATOR

  • Full Member
  • ***
  • Posts: 131
Re: efficiency problem
« Reply #103 on: February 18, 2025, 05:16:37 pm »
Code: Pascal  [Select][+][-]
  1. Execution Time 1: 1245 ms
  2. Execution Time 2: 543 ms
  3. IsSame: TRUE
  4.  

Code: Pascal  [Select][+][-]
  1. program MatrixMultiplication;
  2. {$mode objfpc}
  3. {$optimization on}
  4.  
  5. uses
  6.   SysUtils,
  7.   DateUtils, Math;
  8.  
  9. const
  10.   N = 1023; // prime
  11.   unroll = 4;
  12.  
  13. type
  14.   TMatrix = array of double;
  15.   TVector = array of double;
  16.  
  17. var
  18.   A, B, R1, R2: TMatrix;
  19.   StartTime: TDateTime;
  20.  
  21.   procedure InitializeMatrix(var M: TMatrix);
  22.   var
  23.     i, j: integer;
  24.   begin
  25.     for i := 0 to N - 1 do
  26.       for j := 0 to N - 1 do
  27.         M[i*N + j] := Random;
  28.   end;
  29.  
  30.   function Multiply1(const mat1, mat2: TMatrix): TMatrix;
  31.    var
  32.      i, j, k: NativeUInt;
  33.      sum: double;
  34.      v: TVector;
  35.    begin
  36.      SetLength(v, N);
  37.      SetLength(Result, N*N);
  38.      for j := 0 to N - 1 do
  39.      begin
  40.        for k := 0 to N - 1 do v[k] := mat2[k*N + j];
  41.        for i := 0 to N - 1 do
  42.        begin
  43.          sum := 0;
  44.          for k := 0 to N - 1 do
  45.          begin
  46.            sum := sum + mat1[i*N + k] * v[k];
  47.          end;
  48.          Result[i*N + j] := sum;
  49.        end;
  50.      end;
  51.    end;
  52.  
  53.   function Multiply2(const mat1, mat2: TMatrix): TMatrix;
  54.   var
  55.     i, j, k: NativeUInt;
  56.     sum: double;
  57.     sum1,sum2,sum3,sum4: double;
  58.     v: TVector;
  59.     pm, pv: PDouble;
  60.   begin
  61.     SetLength(v, N);
  62.     SetLength(Result, N*N);
  63.     pv:=@v[0];
  64.     for j := 0 to N - 1 do
  65.     begin
  66.       for k := 0 to N - 1 do v[k] := mat2[k*N + j];
  67.       for i := 0 to N - 1 do
  68.       begin
  69.         sum := 0;
  70.         sum1:=0; sum2:=0; sum3:=0; sum4:=0;
  71.         k:=0;
  72.         pm:=@mat1[i*N];
  73.         while k < (N-unroll+1) do
  74.         begin
  75.           sum1 += pm[k+0] * pv[k+0];
  76.           sum2 += pm[k+1] * pv[k+1];
  77.           sum3 += pm[k+2] * pv[k+2];
  78.           sum4 += pm[k+3] * pv[k+3];
  79.  
  80.           inc(k, unroll);
  81.         end;
  82.  
  83.         sum:=sum1+sum2+sum3+sum4;
  84.  
  85.         while k < N do
  86.         begin
  87.           sum += mat1[i*N + k] * v[k];
  88.           inc(k);
  89.         end;
  90.         Result[i*N + j] := sum;
  91.       end;
  92.     end;
  93.   end;
  94.  
  95. function IsSame(const mat1, mat2: TMatrix): Boolean;
  96. var
  97.   i, k: NativeUInt;
  98. begin
  99.   Result := True;
  100.   for i := 0 to N - 1 do
  101.     for k := 0 to N - 1 do
  102.       if not SameValue(mat1[i*N + k], mat2[i*N + k]) then Exit(False);
  103. end;
  104.  
  105.  
  106. begin
  107.   SetLength(A, N*N);
  108.   SetLength(B, N*N);
  109.  
  110.   Randomize;
  111.   InitializeMatrix(A);
  112.   InitializeMatrix(B);
  113.  
  114.   StartTime := Now;
  115.   R1 := Multiply1(A, B);
  116.   Writeln('Execution Time 1: ', MilliSecondsBetween(Now, StartTime), ' ms');
  117.  
  118.   StartTime := Now;
  119.   R2 := Multiply2(A, B);
  120.   Writeln('Execution Time 2: ', MilliSecondsBetween(Now, StartTime), ' ms');
  121.  
  122.   WriteLn('IsSame: ', IsSame(R1, R2));
  123.  
  124.   ReadLn;
  125. end.
  126.  

Further, probably only SIMD will help and various techniques of working with cache

photor

  • Jr. Member
  • **
  • Posts: 79
Re: efficiency problem
« Reply #104 on: February 19, 2025, 06:28:46 am »
@Alligator,
but it is still wrong. You are assuming "n" be a multiple of 4  but that us not general case.
Of course! It is! I believe this is sufficient to show the acceleration effect itself.

UPD: Corrected the code to be correct for N not divisible by the loop promotion value
seems to be wrong, but it doesn't change the essence - unrolling the loop body - gives acceleration, depending on the hardware, on mine - x2.

Good job. Does that mean the fp compiler is not smart enough to do unrolling automatically?
« Last Edit: February 23, 2025, 12:00:21 pm by photor »

 

TinyPortal © 2005-2018