Author Topic: Fast Base64 encoding/decoding  (Read 2361 times)

mikerabat

  • New Member
  • *
  • Posts: 39
Re: Fast Base64 encoding/decoding
« Reply #15 on: June 02, 2023, 02:53:39 pm »
I guess mORMot speed is higher, near 7 GB/s.
You are right.
New test results including mORMot

FastBase64
encoding      2.703 GB/s
decoding      2.915 GB/s

mORMot2 (AVX2 version)
encoding      4.386 GB/s
decoding      5.587 GB/s

mORMot2 (pascal version)
encoding      1.274 GB/s
decoding      1.383 GB/s

FCL base64
encoding      0.123 GB/s
decoding      0.059 GB/s

These speeds were measured on a large (~1 GB) block of random data. With small inputs the speeds would be different.


Anyway, unrolling 4 times to a 96-byte block saves some redundant loads, so there is room for improvement.

But for whom is encoding/decoding mail attachments so critical?
Compared to the mORMot2 Pascal implementation of Base64, we have a lot to improve. Maybe "copy&paste" from mORMot2 would work?
A very simple improvement would be to replace Move(...,3) in the FCL base64 encoder with a single simple assignment; that would double the speed.


How did you actually measure the performance?

I tried to integrate the necessary parts of mORMot into my project and ended up with the following code snippet:

Code: Pascal
program FPCBenchmark;

uses
  SysUtils,
  idGlobal,
  Classes,
  idCoderMime,
  Windows,
  FastBase64,
  AVXBase64_x64,
  AVXBase64_x86,
  mormot.core.base;

{$IFDEF CPUX64}
{$DEFINE x64}
{$ENDIF}
{$IFDEF cpux86_64}
{$DEFINE x64}
{$ENDIF}

// Thin wrapper around the mORMot AVX2 encoder; does nothing on non-x64 targets.
procedure mBase64( dest : PAnsiChar; buf : PByte; len : integer );
var b64 : PAnsiChar;
    b : PAnsiChar;
    bLen : PtrUint;
begin
     b64 := dest;
     b := PAnsiChar(buf);
     bLen := PtrUint( len );

     {$IFDEF x64}
     Base64EncodeAvx2( b, blen, b64 );
     {$ENDIF}
end;

procedure LargeBench;
var startTime, EndTime, freq : Int64;
    buf : TidBytes;
    i: integer;
    j: Integer;
    base64Buf : RawByteString;
const cLargeBuf = 512*1024*1024;
      cNumRounds = 10;
begin
     QueryPerformanceFrequency(freq);
        //RandSeed := 79929;     // just a seed to get the same results for debugging,...
        Randomize;

        // fill the input with random bytes; Random(256) yields 0..255
        // (Random(255) would never produce $FF)
        SetLength(buf, cLargeBuf);
        for j := 0 to Length(buf) - 1 do
            buf[j] := Random(256);

        // output size: 4 characters for every (started) group of 3 input bytes
        SetLength(base64Buf, ((cLargeBuf + 2) div 3) * 4);

        {$IFDEF x64}
        QueryPerformanceCounter(startTime);
        for i := 1 to cNumRounds do
            mBase64(PAnsiChar(base64Buf), @buf[0], Length(buf) );
        QueryPerformanceCounter(endtime);
        Writeln(Format('mormot encoding took: %.3fms', [(endTime - startTime)/freq*1000]));
        Writeln(Format('mormot  MB/s: %.3f', [ (cNumRounds*Length(buf)/(1024*1024) )/((endTime - startTime)/freq)]));
        {$ENDIF}

        // both encoders write into the same output buffer (base64Buf)
        QueryPerformanceCounter(startTime);
        for i := 1 to cNumRounds do
            Base64Encode(PAnsiChar(base64Buf), @buf[0], Length(buf), True );
        QueryPerformanceCounter(endtime);
        Writeln(Format('FastBase64 encoding took: %.3fms', [(endTime - startTime)/freq*1000]));
        Writeln(Format('FastBase64 MB/s: %.3f', [ (cNumRounds*Int64(cLargeBuf)/(1024*1024) )/((endTime - startTime)/freq)]));
end;

begin
  try
    { TODO -oUser -cConsole Main : Insert code here }
     LargeBench;
    readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
 

The result depends on the order. If I run the mORMot Base64 first and then my code, I get:

mormot encoding took: 580,650ms
mormot  MB/s: 8817,704
FastBase64 encoding took: 535,960ms
FastBase64 MB/s: 9552,957

and the other way round:
FastBase64 encoding took: 577,464ms
FastBase64 MB/s: 8866,347
mormot encoding took: 537,666ms
mormot  MB/s: 9522,633

so I would say these are even ;)
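
For anyone comparing these numbers with the GB/s figures elsewhere in the thread, here is a small sketch of the arithmetic behind the "MB/s" lines. The function name ThroughputMiBps is made up for illustration; the inputs (10 rounds, 512 MiB buffer, 580.650 ms) are taken from the first run above. Note that the benchmark divides by 1024*1024, so the unit is really MiB of input per second.

```pascal
program ThroughputCheck;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Input throughput in MiB/s for `rounds` passes over `bytesPerRound` bytes.
// (Hypothetical helper, not part of FastBase64 or mORMot.)
function ThroughputMiBps(bytesPerRound: Int64; rounds: Integer; seconds: Double): Double;
begin
  // Int64 arithmetic avoids a 32-bit overflow: 10 * 512 MiB is more than 2^32 bytes
  Result := (rounds * bytesPerRound / (1024 * 1024)) / seconds;
end;

const
  cLargeBuf: Int64 = 512 * 1024 * 1024;
var
  mibPerSec: Double;
begin
  // 10 rounds over 512 MiB in 580.650 ms, as in the first result above
  mibPerSec := ThroughputMiBps(cLargeBuf, 10, 0.580650);
  Writeln(Format('%.1f MiB/s', [mibPerSec]));
  // the same figure expressed in GiB/s
  Writeln(Format('%.2f GiB/s', [mibPerSec / 1024]));
end.
```

The first line printed should land close to the 8817 MB/s reported above (the decimal separator depends on the locale).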




mika

  • Full Member
  • ***
  • Posts: 100
Re: Fast Base64 encoding/decoding
« Reply #16 on: June 02, 2023, 10:53:10 pm »
How did you actually measure the performance?
You are right that your FastBase64 is as good as the mORMot implementation.
I wrote this new test, which shows exactly how I got the results that I got.

Code: Pascal
program FPCBenchmark;
{$ifdef fpc}{$mode delphi}{$endif}
{$r-}
uses
      SysUtils,
      FastBase64,
      mormot.core.buffers;

{$define mormotFirst}     //-- enable or disable to swap the test order
{$define measure_2nd_Run} //-- with a fast option: measure the 2nd run instead of the 1st

{$define fastFastBase64} //-- make fast FastBase64 (pre-touches its output)  1st run 4.5, 2nd run  5.5 GB/s    (disabled 2.7 GB/s)
{$define fastMormot}     //-- make fast Mormot (pre-touches its output)      1st run 4.4, 2nd run  5.5 GB/s    (disabled 2.7 GB/s)

//-- converts a TDateTime difference (shorter than one day) to seconds
function toSeconds (d : double):double;
var hh,mm,ss,ms : word;
begin
     DecodeTime (d,hh,mm,ss,ms);
     //-- avoid division by zero: a zero interval is reported as a huge time
     if (hh=0) and (mm=0) and (ss=0) and (ms=0) then hh:=$ffff;
     toSeconds:=hh*60*60+mm*60+ss+ms/1000;
end;


//-- writes to destbuf once so that its memory has been accessed before the timed run
//-- (note: only the first len bytes of destbuf are written)
procedure activateBuffer(destbuf, buf:pansichar; len : qword);
var k : qword;
begin
     if len>0 then
     for k:=0 to len-1 do
     begin
          //-- do whatever (but do something)
          destbuf[k]:=char(byte(buf[k])+byte(k));
     end;
end;

procedure LargeBench;
var buf : PByte;
    destBuf : PByte;
    i: qword;
    j: qword;
    base64Buf : RawByteString;
    sTim, eTim : double;


const cLargeBuf = 512*1024*1024;
      cNumRounds = 1; ///---- should be "one"
      cLargeBuf2 = ((cLargeBuf + 2) div 3) * 4;

begin
     //RandSeed := 79929;     // just a seed to get the same results for debugging,...
     Randomize;

     getmem(buf,cLargeBuf);
     getmem(destbuf,cLargeBuf2);
     writeln('Generating data..');
     //-- Random(256) yields 0..255 (Random(255) would never produce $FF)
     for j := 0 to cLargeBuf - 1 do
         buf[j] := Random(256);

     SetLength(base64Buf, ((cLargeBuf + 2) div 3) * 4);

     writeln('Testing..');

     //-- test boosters (or not)
     //-- mORMot writes to destbuf, FastBase64 writes to base64Buf
     {$ifdef fastMormot}
     activateBuffer( PAnsiChar(destbuf), @buf[0] ,cLargeBuf);
     {$endif}
     {$ifdef fastFastBase64}
     activateBuffer( PAnsiChar(base64Buf), @buf[0], cLargeBuf);
     {$endif}

     //-- booster for 2nd run
     {$ifdef measure_2nd_Run}
     {$ifdef fastMormot}
     mormot.core.buffers.Base64Encode( PAnsiChar(destbuf), @buf[0] ,cLargeBuf);
     {$endif}
     {$ifdef fastFastBase64}
     FastBase64.Base64Encode(PAnsiChar(base64Buf), @buf[0], cLargeBuf);
     {$endif}
     {$endif}

     //-- test one
     stim:=now();
     for i := 1 to cNumRounds do
     begin
         {$ifdef mormotFirst}
         mormot.core.buffers.Base64Encode( PAnsiChar(destbuf), @buf[0] ,cLargeBuf);
         {$else}
         FastBase64.Base64Encode(PAnsiChar(base64Buf), @buf[0], cLargeBuf );
         {$endif}
     end;
     etim:=now();


     {$ifdef mormotFirst}
     Writeln('mormot encoding took: ',toSeconds(etim-stim):12:3,' sec');
     {$ifdef fastMormot}write('Fast ');{$ifdef measure_2nd_Run}write('2nd run ');{$else}write('1st run ');{$endif}{$endif}
     Writeln(Format('mormot  GB/s: %.3f', [  (cLargeBuf*cNumRounds/1024/1024/1024)/toSeconds(etim-stim) ]));
     {$else}
     Writeln('FastBase64 encoding took: ',toSeconds(etim-stim):12:3,' sec');
     {$ifdef fastFastBase64}write('Fast ');{$ifdef measure_2nd_Run}write('2nd run ');{$else}write('1st run ');{$endif}{$endif}
     Writeln(Format('FastBase64 GB/s: %.3f', [  (cLargeBuf*cNumRounds/1024/1024/1024)/ toSeconds(etim-stim)  ]));
     {$endif}

     //-- test two
     stim:=now();
     for i := 1 to cNumRounds do
     begin
         {$ifndef mormotFirst}
         mormot.core.buffers.Base64Encode( PAnsiChar(destbuf), @buf[0] ,cLargeBuf);
         {$else}
         FastBase64.Base64Encode(PAnsiChar(base64Buf), @buf[0], cLargeBuf );
         {$endif}
     end;
     etim:=now();

     {$ifndef mormotFirst}
     Writeln('mormot encoding took: ',toSeconds(etim-stim):12:3,' sec');
     {$ifdef fastMormot}write('Fast ');{$ifdef measure_2nd_Run}write('2nd run ');{$else}write('1st run ');{$endif}{$endif}
     Writeln(Format('mormot  GB/s: %.3f', [  (cLargeBuf*cNumRounds/1024/1024/1024)/toSeconds(etim-stim) ]));
     {$else}
     Writeln('FastBase64 encoding took: ',toSeconds(etim-stim):12:3,' sec');
     {$ifdef fastFastBase64}write('Fast ');{$ifdef measure_2nd_Run}write('2nd run ');{$else}write('1st run ');{$endif}{$endif}
     Writeln(Format('FastBase64 GB/s: %.3f', [  (cLargeBuf*cNumRounds/1024/1024/1024)/ toSeconds(etim-stim)  ]));
     {$endif}

     freemem(buf,cLargeBuf);
     freemem(destbuf,cLargeBuf2);
end;

begin
     LargeBench;
     readln;
end.
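
As a side note on the timing itself, here is a minimal sketch of a simpler way to turn two Now() stamps into seconds. It is an alternative to the DecodeTime approach, not what was used for the numbers in this post. ElapsedSeconds and cMinInterval are made-up names, and clamping a zero interval to 1 ms is an assumption chosen only to keep a later division safe. SecsPerDay is the constant from SysUtils. The resolution of Now() depends on the platform, so for runs as short as 0.09 s a higher-resolution timer is probably the better choice.

```pascal
program ElapsedDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Converts the difference of two TDateTime stamps to seconds.
// TDateTime counts days, so multiplying by SecsPerDay (86400) is enough.
// (Hypothetical helper for illustration.)
function ElapsedSeconds(startT, endT: TDateTime): Double;
const
  cMinInterval = 0.001; // assumption: report "too fast to measure" as 1 ms
begin
  Result := (endT - startT) * SecsPerDay;
  if Result < cMinInterval then
    Result := cMinInterval;
end;

var
  t0, t1: TDateTime;
begin
  t0 := Now;
  Sleep(250);            // stand-in for the code being measured
  t1 := Now;
  Writeln(Format('elapsed: %.3f s', [ElapsedSeconds(t0, t1)]));
end.
```

Unlike the $ffff trick, a zero interval is reported here as a very short time rather than a very long one, so the computed GB/s becomes an upper bound instead of dropping to about zero.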

Run with
{ $define fastFastBase64} //-- make fast FastBase64
{$define fastMormot}     //-- make fast Mormot

FastBase64 encoding took:        0.181 sec
FastBase64 GB/s: 2.762
mormot encoding took:        0.091 sec
mormot  GB/s: 5.495

Run with
{$define fastFastBase64} //-- make fast FastBase64
{ $define fastMormot}     //-- make fast Mormot

FastBase64 encoding took:        0.089 sec
FastBase64 GB/s: 5.618
mormot encoding took:        0.191 sec
mormot  GB/s: 2.618


Note that the two tests do not write to the same output buffer. If a test is run only once with the desired length, the results are significantly worse.

Edit:
Changed the benchmark example to better highlight the problem: the first access to a large, newly allocated memory block adds execution time.
@mikerabat I was using your FPCBenchmark.ppr, changed the last test size to 1 GB and added a mORMot test after the FastBase64 test, and it "naturally" performed better. You might want to make some changes to avoid misrepresenting benchmark results in the future.

2nd Edit:
Changed the benchmark example again.
There are 3 possible measurements:
1. output memory has not been accessed before (2.7 GB/s)
2. output memory accessed before, 1st run (4.5 GB/s)
3. output memory accessed before, 2nd run (5.5 GB/s)
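
The effect behind these three cases can be sketched without any Base64 code at all. The program below is only an illustration under assumptions: TimedFill and the 256 MiB size are made up, and it relies on the memory manager getting a block this large fresh from the OS, which is typical but not guaranteed. It fills the same newly allocated buffer three times and prints the time of each pass.

```pascal
program FirstTouchDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

const
  cBufSize = 256 * 1024 * 1024; // hypothetical size, chosen to be large

// Fills the whole buffer once and returns the elapsed time in milliseconds.
// (Hypothetical helper for illustration.)
function TimedFill(p: PByte; len: SizeInt; value: Byte): QWord;
var
  t0: QWord;
begin
  t0 := GetTickCount64;
  FillChar(p^, len, value);
  Result := GetTickCount64 - t0;
end;

var
  buf: PByte;
  pass: Integer;
begin
  GetMem(buf, cBufSize);
  try
    // Pass 1 writes to memory that has never been accessed;
    // passes 2 and 3 write to the same, already accessed memory.
    for pass := 1 to 3 do
      Writeln('pass ', pass, ': ', TimedFill(buf, cBufSize, Byte(pass)), ' ms');
  finally
    FreeMem(buf);
  end;
end.
```

If the first pass comes out clearly slower than the later ones, that is the same cost that shows up in measurement 1. A benchmark that wants to exclude it has to write to every output buffer once before the timer starts.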
« Last Edit: June 03, 2023, 06:40:59 am by mika »

 
