
Topic: Benchmark test in nanoseconds

backprop

  • Full Member
  • Posts: 216
Benchmark test in nanoseconds
« on: March 02, 2026, 01:39:05 pm »
I need a finer measurement of the execution time of parts of a process than GetTickCount64 provides, since it is limited to 1 ms. On Windows I used the rdtsc instruction, but I have not used Windows for 20 years now. I need a platform-independent solution, if one exists, or at least one for Linux.

In C/C++ that granularity exists, but is it implemented in FPC/Lazarus?
« Last Edit: March 02, 2026, 01:41:20 pm by backprop »

LeP

  • Full Member
  • Posts: 244
Re: Benchmark test in nanoseconds
« Reply #1 on: March 02, 2026, 02:34:26 pm »
You can use this: it is the TStopWatch implementation, which has 100-nanosecond resolution and is standard on all platforms (I mean Windows and Linux, and Mac if someone improves it).

How to use:

Code: Pascal
    uses Diagnostics;
    var Tim1: TStopWatch;

    begin
      Tim1 := TStopWatch.StartNew;
      { ... code to measure ... }
      Tim1.Stop;
      writeln(Tim1.ElapsedTicks); // number of 100-nanosecond steps elapsed
      Tim1.Reset;
      Tim1.Start;
      { ... code to measure ... }
      Tim1.Stop;
      { ... }
    end.
An operating system to tame them, an IDE to find them, a code to catch them and in the framework chain them.

creaothceann

  • Sr. Member
  • Posts: 335
Re: Benchmark test in nanoseconds
« Reply #2 on: March 02, 2026, 02:52:45 pm »
Or build your own like this:

Code: Pascal
    unit U_HRT;

    // Clock based on the system-wide high-resolution timer.

    {$mode objfpc}{$H+}
    {$ModeSwitch AdvancedRecords}

    interface  ////////////////////////////////////////////////////////////////
    uses
            {$ifdef Unix} Linux, UnixType, {$endif}
            SysUtils;

    type
            Clock = record
                    type
                            Seconds = Double;
                            Time    = Int64;

                    class function GetTime         : Time;             static;
                    class function Convert(const t : Time) : Seconds;  static;  inline;

                    class procedure Start;            static;  inline;
                    class procedure Stop;             static;  inline;
                    class function  Delta : Seconds;  static;  inline;

                    private

                    class var
                            _InternalCounter : Time;
                            _TicksPerSecond  : Int64;

                    class procedure _Init;  inline;  static;

                    public

                    class property Resolution : Int64 read _TicksPerSecond;
                    end;


    implementation  ///////////////////////////////////////////////////////////


    {$ifdef Windows}
    // stdcall is required on 32-bit Windows (it makes no difference on Win64)
    function QueryPerformanceCounter  (out i : Clock.Time) : LongBool;  stdcall;  external 'kernel32' name 'QueryPerformanceCounter';
    function QueryPerformanceFrequency(out i : Clock.Time) : LongBool;  stdcall;  external 'kernel32' name 'QueryPerformanceFrequency';
    {$endif}


    {$ifdef Unix}
    // Counter in nanoseconds: TV_nsec alone wraps around every second, so TV_sec must be included.
    function QueryPerformanceCounter  (out i : Clock.Time) : LongBool;  inline;  var t : TimeSpec;  begin  Result := (Clock_GetTime(Clock_Monotonic, @t) >= 0);  if Result then i := Int64(t.TV_sec) * 1000000000 + t.TV_nsec;  end;
    // The counter above runs at 10^9 ticks per second. Clock_GetRes reports the clock's
    // granularity, not its frequency, so here it only serves as an availability check.
    function QueryPerformanceFrequency(out i : Clock.Time) : LongBool;  inline;  var t : TimeSpec;  begin  Result := (Clock_GetRes (Clock_Monotonic, @t) >= 0);  if Result then i := 1000000000;  end;
    {$endif}


    class procedure Clock._Init;           inline;  begin  if not QueryPerformanceFrequency(_TicksPerSecond) then raise Exception.Create('could not get clock resolution');  end;
    class function  Clock.GetTime : Time;           begin  if not QueryPerformanceCounter  (Result         ) then raise Exception.Create('could not get clock tick'      );  end;


    class procedure Clock.Start;                              inline;  begin  _InternalCounter := GetTime;                     end;
    class procedure Clock.Stop;                               inline;  begin  _InternalCounter := GetTime - _InternalCounter;  end;
    class function  Clock.Convert(const t : Time) : Seconds;  inline;  begin  Result           := t       / _TicksPerSecond;   end;
    class function  Clock.Delta                   : Seconds;  inline;  begin  Result           := Convert(_InternalCounter);   end;


    ////////////////////////////////////////////////////////////////////////////


    initialization
            Clock._Init;


    end.

Just call Start, Stop and Delta to get the time in (fractions of) a second.
« Last Edit: March 03, 2026, 08:01:19 am by creaothceann »

LeP

Re: Benchmark test in nanoseconds
« Reply #3 on: March 02, 2026, 03:26:20 pm »
@creaothceann,

TStopWatch should be compatible with Delphi, so porting code to and from Delphi is not a problem, and it mimics the functionality of the Microsoft API.
« Last Edit: March 02, 2026, 03:28:37 pm by LeP »

backprop

Re: Benchmark test in nanoseconds
« Reply #4 on: March 02, 2026, 07:23:58 pm »
Quote from: LeP
You can use this: it is the TStopWatch implementation, which has 100-nanosecond resolution and is standard on all platforms (I mean

The problem with this is that the current process must be given high priority. Before measuring, some 20 years ago on Windows, I did something like this:

Code: Pascal
    uses Windows, SysUtils;

    var
      PriorityClass: DWORD;
      PriorityThread: Integer;
      PerfFreq, PerfStart, PerfEnd, PerfTemp: Int64;
    begin
      // Priority settings
      // Without them the measurements become unstable
      // Save the current priorities so they can be restored afterwards
      PriorityClass := Windows.GetPriorityClass(GetCurrentProcess);
      PriorityThread := Windows.GetThreadPriority(GetCurrentThread);

      // Set thread priority to time critical
      Windows.SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_TIME_CRITICAL);
      Windows.SetPriorityClass(GetCurrentProcess, REALTIME_PRIORITY_CLASS);

      // Busy-wait a quarter of a second so SpeedStep brings the CPU up to speed
      Win32Check(QueryPerformanceFrequency(PerfFreq));
      Win32Check(QueryPerformanceCounter(PerfStart));
      PerfEnd := PerfStart + (PerfFreq div 4);
      repeat
        Win32Check(QueryPerformanceCounter(PerfTemp));
      until PerfTemp >= PerfEnd;

      { ... measurement ... }

      // Restore the original priorities
      Windows.SetPriorityClass(GetCurrentProcess, PriorityClass);
      Windows.SetThreadPriority(GetCurrentThread, PriorityThread);
    end.

That way the proper frequency value can be established, and the rest is precise enough too. I don't know how to do something similar on Linux. With TStopWatch the error is quite high, because the base frequency is not established accurately: each measurement is off by around 500 microseconds. So the measurement as a whole can't be called precise...
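For the Linux side of this question, here is a minimal sketch of the same quarter-second warm-up loop, assuming Linux and the `clock_gettime`/`CLOCK_MONOTONIC` routines from FPC's `Linux` unit (the same ones the U_HRT unit above relies on). `MonotonicNs` and `WarmUpCpu` are hypothetical helper names. It only ports the SpeedStep warm-up; raising the scheduling priority (for example to a real-time policy, which normally needs elevated privileges) is not shown.

```pascal
{$mode objfpc}{$H+}

uses
  BaseUnix, Linux;

// CLOCK_MONOTONIC in nanoseconds (not affected by wall-clock jumps)
function MonotonicNs: Int64;
var
  ts: timespec;
begin
  clock_gettime(CLOCK_MONOTONIC, @ts);
  Result := Int64(ts.tv_sec) * 1000000000 + ts.tv_nsec;
end;

// Busy-waits for DurationMs milliseconds so the CPU can leave its
// power-saving state before measuring; returns the number of clock reads.
function WarmUpCpu(DurationMs: Int64): Int64;
var
  Deadline: Int64;
begin
  Result := 0;
  Deadline := MonotonicNs + DurationMs * 1000000;
  repeat
    Inc(Result);
  until MonotonicNs >= Deadline;
end;

var
  StartNs, Spins: Int64;
begin
  StartNs := MonotonicNs;
  Spins := WarmUpCpu(250);   // a quarter of a second, as in the Windows code
  WriteLn('Warm-up: ', (MonotonicNs - StartNs) div 1000000, ' ms, ',
          Spins, ' clock reads');
end.
```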
« Last Edit: March 02, 2026, 07:27:15 pm by backprop »

LeP

Re: Benchmark test in nanoseconds
« Reply #5 on: March 02, 2026, 07:50:49 pm »
This error is not caused by the latency of QueryPerformanceCounter (not at all, although it may account for a small percentage of the value), but by the fact that Windows is not a real-time OS, so the timings cannot be, and never will be, precise.

You can set the thread to critical priority, but the kernel, drivers, and some thousands of other processes still come before it.

So don't set anything to "critical"; it is not necessary. Use TStopWatch (you can use several instances of the record to take several measurements) and insert it into your normal code.

STORE the values in memory, NEVER WRITE THEM TO THE CONSOLE. Under Windows the console takes a lot of time to execute something like "writeln" (and it stalls your execution).
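On Linux the same advice can be followed by preallocating an array for the results and printing only after the timed loop has finished. A minimal sketch, assuming Linux and `clock_gettime` from FPC's `Linux` unit; `MonotonicNs`, `Samples`, and `Workload` are hypothetical names, with `Workload` standing in for the code being measured.

```pascal
{$mode objfpc}{$H+}

uses
  BaseUnix, Linux;

const
  Runs = 10;

function MonotonicNs: Int64;
var
  ts: timespec;
begin
  clock_gettime(CLOCK_MONOTONIC, @ts);
  Result := Int64(ts.tv_sec) * 1000000000 + ts.tv_nsec;
end;

// Placeholder for the code being measured: sum of 1..10000
function Workload: Int64;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to 10000 do
    Inc(Result, i);
end;

var
  Samples: array[0..Runs - 1] of Int64;  // preallocated: nothing is allocated while timing
  Sink, t0: Int64;
  i: Integer;
begin
  Sink := 0;
  for i := 0 to Runs - 1 do
  begin
    t0 := MonotonicNs;
    Inc(Sink, Workload);
    Samples[i] := MonotonicNs - t0;   // store only; no console output inside the loop
  end;

  // Report only after all measurements are done
  for i := 0 to Runs - 1 do
    WriteLn('run ', i, ': ', Samples[i], ' ns');
  WriteLn('checksum: ', Sink);        // keeps the workload result in use
end.
```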

I tried rdtsc (and I still use it sometimes), but the differences from TStopWatch are really small. And rdtsc has some very serious drawbacks.

I normally use about 36 TStopWatch instances to monitor most things in my applications, and with or without them there is no difference in timing, no change in CPU load, nothing (but I always use an Intel Core i9-class CPU).

Of course, if you want a measurement that is precise on average, you can use rdtsc (take 10 measurements and average them), but this changes your process execution and the real timing of your application.
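Averaging repeated runs can be sketched as follows. For portability across Linux machines this sketch assumes `CLOCK_MONOTONIC` instead of rdtsc, but the arithmetic is the same for rdtsc tick counts. It also keeps the minimum: interruptions by the OS can only make a run slower, so the fastest run is a useful companion to the mean. `MonotonicNs`, `MeasureNs`, and `Workload` are hypothetical names.

```pascal
{$mode objfpc}{$H+}

uses
  BaseUnix, Linux;

const
  Runs = 10;

function MonotonicNs: Int64;
var
  ts: timespec;
begin
  clock_gettime(CLOCK_MONOTONIC, @ts);
  Result := Int64(ts.tv_sec) * 1000000000 + ts.tv_nsec;
end;

// Placeholder for the code being measured
function Workload: Int64;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to 10000 do
    Inc(Result, i);
end;

// Runs Workload Runs times; returns the mean duration in ns and the
// fastest run in MinNs.
function MeasureNs(out MinNs: Int64): Int64;
var
  Total, t0, d, Sink: Int64;
  i: Integer;
begin
  Total := 0;
  MinNs := High(Int64);
  Sink := 0;
  for i := 1 to Runs do
  begin
    t0 := MonotonicNs;
    Inc(Sink, Workload);
    d := MonotonicNs - t0;
    Inc(Total, d);
    if d < MinNs then
      MinNs := d;
  end;
  if Sink = 0 then
    WriteLn('unexpected checksum');   // uses the workload result
  Result := Total div Runs;
end;

var
  MeanNs, MinNs: Int64;
begin
  MeanNs := MeasureNs(MinNs);
  WriteLn('mean: ', MeanNs, ' ns, min: ', MinNs, ' ns');
end.
```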
« Last Edit: March 02, 2026, 07:52:54 pm by LeP »

backprop

Re: Benchmark test in nanoseconds
« Reply #6 on: March 02, 2026, 08:12:18 pm »
Yes, I could use RDTSC, knowing all the drawbacks, but I'm not sure how to do that with FPC/Lazarus. With Delphi 7 I used this:

Code: Pascal
    function RDTSC: Int64;
    // Returns the 64-bit count of CPU clock cycles (Delphi 7, 32-bit: result in EDX:EAX).
    asm
      // Attention! CPUID in this combination may badly affect application performance.
      // dw $A20F  // opcode for CPUID
      dw $310F  // opcode for RDTSC
    end;

440bx

  • Hero Member
  • Posts: 6375
Re: Benchmark test in nanoseconds
« Reply #7 on: March 02, 2026, 08:39:46 pm »
using RDTSC should yield something along the lines of (untested):
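Code: Pascal
    {$asmmode intel}

    // Pure assembler function, so the value left in the result register(s)
    // really is the function result.
    function _rdtsc: qword; assembler; nostackframe;
    asm
      rdtsc      { result in edx:eax; on x86-64 the upper halves of rdx/rax are zeroed }

      {$ifdef CPUX86_64}
        { x86-64 (Windows and Linux): a qword result is returned in rax }
        shl rdx, 32
        or  rax, rdx
      {$endif}
      { 32-bit x86: a qword result is already returned in edx:eax }
    end;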
As noted the above code is untested but, in the worst case it should only need minor modifications to work as desired.

Link to RDTSC instruction information: https://www.felixcloutier.com/x86/rdtsc

Note the comments about LFENCE and MFENCE.
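Following those notes, a fenced read might look like this minimal sketch, assuming an x86-64 CPU and FPC's Intel assembler syntax (`FencedRdtsc` is a hypothetical name). Per the Intel description linked above, an LFENCE before RDTSC makes it wait until earlier instructions have executed, and an LFENCE after it keeps later instructions from starting before the counter is read. It is written as a pure assembler function so the result is returned directly in RAX.

```pascal
{$mode objfpc}{$H+}
{$asmmode intel}

// TSC read with LFENCE on both sides (x86-64 only). RDTSC leaves the low
// 32 bits in EAX and the high 32 bits in EDX; they are merged into RAX,
// where a QWord function result is returned.
function FencedRdtsc: QWord; assembler; nostackframe;
asm
  lfence          // let earlier instructions finish before reading the counter
  rdtsc
  lfence          // keep later instructions from starting before the read
  shl rdx, 32
  or  rax, rdx
end;

var
  t0, t1: QWord;
  i, Sum: Integer;
begin
  Sum := 0;
  t0 := FencedRdtsc;
  for i := 1 to 1000 do
    Inc(Sum, i);
  t1 := FencedRdtsc;
  WriteLn('Elapsed TSC ticks: ', t1 - t0, ' (sum = ', Sum, ')');
end.
```

The result is a raw tick count; turning it into time still requires knowing the TSC rate, which is what the rest of this thread is about.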

As has already been pointed out by others in this thread, in a non-RTOS, the measurement accuracy will always be compromised.

Unfortunately, FPC doesn't support inlining of functions/procedures that contain assembler.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

LeP

Re: Benchmark test in nanoseconds
« Reply #8 on: March 02, 2026, 10:21:03 pm »
@backprop, @440bx

You must calibrate before using rdtsc. This means you must calculate the duration of one rdtsc "tick".
There are many ways to do it, e.g. with the Multimedia API or with QueryPerformanceCounter.

I normally compare it against the Multimedia API (99% of my work is on Windows systems).
I use this code, the same for many years (I changed it a little, because I use it inside an advanced record):

Code: Pascal
    {$asmmode intel}
    uses Windows, MMSystem;

    function Tick: uint64; register;
    asm
      rdtsc              // result in RDX:RAX on 64 bit, EDX:EAX on Win32
      lfence
      {$IFDEF WIN64}
         shl RDX, 32
         or RAX, RDX       // RAX holds the full 64-bit value (Win64)
      {$ENDIF}
    end;

    var
        FOver: uint64;
        FDiv: uint64;

    procedure Calibrate;
    var
      mtmr, tmr: uint64;
      FStart: uint64;
      i: integer;
    begin
      FOver := 0;
      mtmr := 0;
      // Measure the overhead of two back-to-back Tick calls, 20 times
      for i := 0 to 19 do
        begin
          FStart := Tick;
          tmr := Tick - FStart - FOver;
          // Difference counter used to compensate
          mtmr := mtmr + tmr;
        end;
      // Average value
      mtmr := mtmr div 20;
      FOver := mtmr;
      mtmr := 0;
      // Count ticks across five 500 ms sleeps with 1 ms timer resolution
      for i := 0 to 4 do
        begin
          timeBeginPeriod(1);
          sleep(50);
          FStart := Tick;
          Sleep(500);
          tmr := Tick - FStart - FOver;
          mtmr := mtmr + tmr;
          timeEndPeriod(1);
        end;
      // FDiv = number of ticks per microsecond (5 x 500000 us in total)
      FDiv := (mtmr div 500000) div 5;
    end;
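Since timeBeginPeriod and Sleep are Windows-only, a Linux counterpart could calibrate the TSC against `CLOCK_MONOTONIC` over a busy-wait window instead. This is a minimal sketch under stated assumptions: x86-64 Linux, an invariant TSC, and an arbitrary 100 ms window. `ReadTsc`, `MonotonicNs`, and `TscTicksPerMicrosecond` are hypothetical names, and unlike the code above it does not subtract the read overhead (`FOver`).

```pascal
{$mode objfpc}{$H+}
{$asmmode intel}

uses
  BaseUnix, Linux;

// Raw time-stamp counter (x86-64): RDTSC puts the low half in EAX and the
// high half in EDX; both are merged into RAX, where a QWord is returned.
function ReadTsc: QWord; assembler; nostackframe;
asm
  rdtsc
  shl rdx, 32
  or  rax, rdx
end;

// CLOCK_MONOTONIC in nanoseconds
function MonotonicNs: Int64;
var
  ts: timespec;
begin
  clock_gettime(CLOCK_MONOTONIC, @ts);
  Result := Int64(ts.tv_sec) * 1000000000 + ts.tv_nsec;
end;

// Spins for at least WindowMs milliseconds of CLOCK_MONOTONIC time and returns
// the TSC ticks per microsecond (numerically equal to the TSC rate in MHz).
function TscTicksPerMicrosecond(WindowMs: Int64): Double;
var
  Ns0, Ns1: Int64;
  Tsc0, Tsc1: QWord;
begin
  Ns0 := MonotonicNs;
  Tsc0 := ReadTsc;
  repeat
    { spin until the window has passed }
  until MonotonicNs - Ns0 >= WindowMs * 1000000;
  Tsc1 := ReadTsc;
  Ns1 := MonotonicNs;
  Result := (Tsc1 - Tsc0) / ((Ns1 - Ns0) / 1000.0);
end;

var
  TicksPerUs: Double;
begin
  TicksPerUs := TscTicksPerMicrosecond(100);
  WriteLn('TSC ticks per microsecond: ', TicksPerUs:0:1);
end.
```

Because the elapsed time is measured with the monotonic clock rather than assumed from the requested delay, the exact window length does not matter. Elapsed microseconds for a measured section are then roughly `(t1 - t0) / TicksPerUs`, which plays the role of the division by `FDiv` above.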
« Last Edit: March 02, 2026, 10:23:46 pm by LeP »

440bx

Re: Benchmark test in nanoseconds
« Reply #9 on: March 02, 2026, 11:12:10 pm »
All you need is to find out the processor speed in clock cycles per second, and I definitely wouldn't do it the way you have.

In today's processors determining the processor speed is not as simple as it seems, because the core speed can vary quite a bit (from roughly 1 GHz to 5+ GHz). Using the "nominal" speed usually yields a good enough approximation.

I'd rather rely on what the O/S determined the processor speed to be which is kept in the registry.


LeP

Re: Benchmark test in nanoseconds
« Reply #10 on: March 03, 2026, 12:05:42 am »
For Intel, the TSC is invariant: it doesn't depend on the core frequency.
Every processor (I mean every processor family and type) has its own tick duration, which is uniform and never depends on the clock speed or on which core is running. This has been the case since the Pentium 4 era (2006).

For example, my 14th-generation Intel Core i9 has a TSC period of about 0.5 ns, even when the cores run near 6 GHz.

AMD has behaved the same way since 2007. I don't know about others, such as ARM.

So the only way to measure time is to calibrate it; otherwise I don't know what one is reporting (but it's certainly not the time).

440bx

Re: Benchmark test in nanoseconds
« Reply #11 on: March 03, 2026, 12:31:06 am »
Quote from: LeP
For Intel, the TSC is invariant: it doesn't depend on the core frequency.
That's true of current processors, but not of the early processors that implemented rdtsc. I don't know when the TSC became invariant, but it's rather unlikely it happened at the same time on the Intel and AMD platforms.

Odds are extremely high that the O/S needs to determine the TSC frequency for its own use and, if so, that value is stored somewhere.  I'd find where and use it.




LeP

Re: Benchmark test in nanoseconds
« Reply #12 on: March 03, 2026, 01:27:28 am »
Quote from: 440bx
That's true of current processors, but not of the early processors that implemented rdtsc. I don't know when the TSC became invariant, but it's rather unlikely it happened at the same time on the Intel and AMD platforms.
If you read my post more carefully, you will find the date. I noted those dates in my sources, and of course I may be wrong (but until now it has always worked, for almost 15 years, since Delphi XE2).
You can examine CPUID.80000007H:EDX[8] to find out whether a processor has an invariant TSC (for Intel and AMD).
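The CPUID check can be done from FPC with a few lines of assembler. A minimal sketch, assuming x86-64 Linux: under the System V calling convention the leaf number arrives in EDI (Win64 passes it in ECX, so the `mov eax, edi` lines would have to change there). `CpuidEax`, `CpuidEdx`, and `HasInvariantTsc` are hypothetical names. Leaf 80000007H is only read after leaf 80000000H confirms that it exists.

```pascal
{$mode objfpc}{$H+}
{$asmmode intel}

// CPUID helpers for x86-64 Linux (System V ABI: first argument in EDI).
// CPUID overwrites RBX, which must be preserved for the caller.
function CpuidEax(Leaf: LongWord): LongWord; assembler; nostackframe;
asm
  mov  eax, edi
  xor  ecx, ecx
  push rbx
  cpuid
  pop  rbx
end;

function CpuidEdx(Leaf: LongWord): LongWord; assembler; nostackframe;
asm
  mov  eax, edi
  xor  ecx, ecx
  push rbx
  cpuid
  mov  eax, edx      // return EDX in the result register
  pop  rbx
end;

// Invariant TSC flag: CPUID.80000007H:EDX[8]
function HasInvariantTsc: Boolean;
begin
  Result := (CpuidEax($80000000) >= $80000007) and
            ((CpuidEdx($80000007) and (1 shl 8)) <> 0);
end;

begin
  WriteLn('Invariant TSC: ', HasInvariantTsc);
end.
```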

Quote from: 440bx
Odds are extremely high that the O/S needs to determine the TSC frequency for its own use and, if so, that value is stored somewhere.  I'd find where and use it.
If you find it, I'd be glad to know, since it would have the right "value".

440bx

Re: Benchmark test in nanoseconds
« Reply #13 on: March 03, 2026, 01:39:41 am »
Quote from: LeP
You can examine CPUID.80000007H:EDX[8] to find out whether a processor has an invariant TSC (for Intel and AMD).
That's correct, and it is one of the reasons (among many) why I doubt it was invariant from the get-go.

Quote from: LeP
If you find it, I'd be glad to know, since it would have the right "value".
I routinely read disassembled Windows code, maybe I'll run into it sometime.  Actually, I believe I've already run into it in the past but, since I'm not sure at this time, I am not claiming anything.

ETA:

For those who are interested in using RDTSC to measure execution time, the following paper from Intel can be useful:
https://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf

HTH.
« Last Edit: March 03, 2026, 02:18:48 am by 440bx »

Thaddy

  • Hero Member
  • Posts: 18975
  • Glad to be alive.
Re: Benchmark test in nanoseconds
« Reply #14 on: March 03, 2026, 09:44:27 am »
For Linux (and probably other Unix systems):
Code: Pascal
    {$mode objfpc}{$H+}

    uses
      baseunix, linux;

    function NanoTime: qword;
    var
      ts: timespec;
    begin
      // clock_gettime returns 0 on success; you can check that with an assert
      clock_gettime(CLOCK_MONOTONIC, @ts);
      // cast before multiplying so tv_sec * 10^9 cannot overflow on 32-bit targets
      Result := qword(ts.tv_sec) * 1000000000 + qword(ts.tv_nsec);
    end;

    var
      t0, t1: qword;
      i: Integer;
    begin
      t0 := NanoTime;
      for i := 1 to 1000000 do { workload };
      t1 := NanoTime;
      WriteLn('Elapsed ns: ', t1 - t0);
    end.
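To check that the clock is available and how fine its granularity is, `clock_getres` from the same `linux` unit can be queried. A minimal sketch; `ClockResolutionNs` is a hypothetical helper that returns -1 if the call fails.

```pascal
{$mode objfpc}{$H+}

uses
  BaseUnix, Linux;

// Granularity of CLOCK_MONOTONIC in nanoseconds, or -1 if the clock is unavailable
function ClockResolutionNs: Int64;
var
  ts: timespec;
begin
  if clock_getres(CLOCK_MONOTONIC, @ts) <> 0 then
    Exit(-1);
  Result := Int64(ts.tv_sec) * 1000000000 + ts.tv_nsec;
end;

begin
  WriteLn('CLOCK_MONOTONIC resolution: ', ClockResolutionNs, ' ns');
end.
```

If this reports a value well above a few nanoseconds, nanosecond-level readings from `clock_gettime` on that machine are only as fine as the reported granularity.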
« Last Edit: March 03, 2026, 09:46:45 am by Thaddy »
Recovered from removal of tumor in tongue following tongue reconstruction with a part from my leg.
