Recent

Author Topic: To optimize RTL, make function PCharToInt  (Read 37127 times)

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: To optimize RTL, make function PCharToInt
« Reply #60 on: July 31, 2021, 01:01:13 pm »

The Discussion goes: How many more BITS beyond bit 54 are needed, before you can decide if 54 is zero or one.


And you need arbitrary many


And how many bits you can put in a string255 with digits?????



Around 700

But that is not enough



engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: To optimize RTL, make function PCharToInt
« Reply #61 on: July 31, 2021, 06:28:48 pm »
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Me too. I still remember how I felt when someone got close to my set of floppy disks with a magnet. My first copy of DOS was in German, don't remember its version.



Carbon dating?
Because we use an ancient "dead" tongue?  :P

Not dead yet, but definitely dying.



I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.

How delighted I was when inspecting a non-working disk reveals a hair or some fingerprint mark. Gentle cleaning used to bring its data back.

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: To optimize RTL, make function PCharToInt
« Reply #62 on: July 31, 2021, 07:01:30 pm »
My first copy of DOS was in German, don't remember its version.


Time for the very old spanish joke:

DOS dos dos

Winni

jamie

  • Hero Member
  • *****
  • Posts: 6077
Re: To optimize RTL, make function PCharToInt
« Reply #63 on: July 31, 2021, 07:21:40 pm »

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

Thanx to PascalDragon!

And for those who are eager to optimize the whole day and night:

Pascal is not the right choice.
Use Forth.
There you can optimize everything.
On speed. On size. On everything.

Winni

I was part of a project using Forth years ago for a video inspection system to reject bottle items from the packing line if the labels were not on the bottle or misplaced. The two, bottle and label were very close in colors.. Kind of hard to do when all you have is a B&W camera, but a Hi-Res one at that.
 
 On top of that, the CPU box which had special camera interface cards was composed completely of a Z80 cpu set.

 So every CPU cycle counted to obtain the fasted decoded image to be good or bad and then run some IO to operate a machine that would advance the conveyer line and push the defected bottle off the line, too.
The only true wisdom is knowing you know nothing

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: To optimize RTL, make function PCharToInt
« Reply #64 on: July 31, 2021, 07:27:20 pm »
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB ;D

Ah! Those were the days ... 8)
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: To optimize RTL, make function PCharToInt
« Reply #65 on: July 31, 2021, 08:04:01 pm »
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB ;D

Ah! Those were the days ... 8)

No - I am not too young: I started with an Apple ][ where everything was different. As always with Apple.
I just came late to the IBM PC. Where the single sided and the double sided floppies worked.

We found out that it was a great cheating with the floppies:
You bought the cheaper single sided and had only to tape the hole opposite to the write protection hole with black tape.
Zack - you had a double sided disk.

My first hard disk was horrible expensive but 49 times faster than the floppy.
Measured again and again.

And in the night  we travelled with an  acoustic coupler  and 300 Baud through the world.
From 3 to 6 in the night. Only in that time german telephone was cheap.

Winni

PascalDragon

  • Hero Member
  • *****
  • Posts: 5444
  • Compiler Developer
Re: To optimize RTL, make function PCharToInt
« Reply #66 on: August 01, 2021, 02:16:05 pm »
Everyone, please, this topic is about RTL optimizations, not computer history!

No. Those would need to be allocated on the heap (especially the return value) and then you can just as well use a dynamic array.

But the point is to avoid heap allocations and return a reference to the already existing array

But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.
 
The Trim function would be the best example for that

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

And open arrays of char is how you make thing maintainable.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

For example, there is CompareText

The implementations of CompareText and ShortCompareText could indeed be unified by using stub functions and then calling a common implementation, but can not be generalized for all kinds of string functions, especially not those that modify strings or generate them. (And the Java implementation is kept separate on purpose to avoid even more ifdefs in the core includes).

The Val function is especially horrible. 15 Val functions on 64-bit and a lot more:

The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

And what would such a Val do?

It should do what Val always does and convert the string to a number type

That was not clear from your description.

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: To optimize RTL, make function PCharToInt
« Reply #67 on: August 12, 2021, 01:57:22 pm »
But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.

But I do not want to create an array

I want a (startPointer, length) pair in registers, which I can use like an array

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

That needs some kind of burrow checker, so the input is not used while the trimmed result is used.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

Then you have a fast pchar function that does not do code page conversions, and a slow path function that makes the conversion and then calls the pchar variant


The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

The performance is broken

I have now benchmarked it

Compared with the simplest possible pchar:

Code: Pascal  [Select][+][-]
  1. function simplestTryPcharToInt(pstart, pend: pchar; out unsigned: UInt64): boolean;
  2. var
  3.   length: SizeUInt;
  4. begin
  5.   result := false;
  6.   if pend <= pstart then exit;
  7.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  8.   length := pend - pstart;
  9.   if length > 20 then exit;
  10.   if (length = 20) and (pstart^ >= '2') then exit;
  11.   unsigned := 0;
  12.   while (pstart < pend) do begin
  13.     case pstart^ of
  14.     '0'..'9': unsigned := unsigned * 10 + (ord(pstart^) - ord('0'));
  15.     else exit;
  16.     end;
  17.     inc(pstart);
  18.   end;
  19.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
  20.   result := true;
  21. end;

on ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615') 10000000 times.


Code: [Select]
val (copy ansistring)     run-time: 6379 ms
val (ansistring)          run-time: 5098 ms
val (shortstring)         run-time: 4981 ms
simplestPchar             run-time: 1327 ms
simplestPchar   (-O4)     run-time:  914 ms
simplestPchar (-O4 -Cort) run-time: 2042 ms

Val is already so bad, the copy hardly matters. Val should be removed


But that was a simple pchar. You can convert an entire block of digits in one go with some bit twiddling:

Code: Pascal  [Select][+][-]
  1. function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
  2. const
  3.   selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  4.   decimalZeros8        = UInt64($3030303030303030);
  5.   overflowMaxDigit8    = UInt64($0606060606060606);
  6.   selectFirstHalfByte4 = UInt32($F0F0F0F0);
  7.   decimalZeros4        = UInt32($30303030);
  8.   overflowMaxDigit4    = UInt32($06060606);
  9. var
  10.   length: SizeUInt;
  11.   temp8: UInt64;
  12.   temp4: UInt32;
  13.   bytes: pbyte;
  14.   unsigned: UInt64;
  15. begin
  16.   result := false;
  17.   if pend <= pstart then exit;
  18.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  19.   length := pend - pstart;
  20.   if length > 20 then exit;
  21.   if (length = 20) and (pstart^ >= '2') then exit;
  22.   unsigned := 0;
  23.   if PtrUInt(TObject(pstart)) and 7 = 0 then begin
  24.     while pstart + 8 < pend do begin
  25.       temp8 := PUInt64(pstart)^;
  26.       if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
  27.       temp8 := temp8 - decimalZeros8;
  28.       if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
  29.       bytes := @temp8;
  30.       unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
  31.       inc(pstart, 8);
  32.     end;
  33.     while pstart + 4 < pend do begin
  34.       temp4 := PUInt32(pstart)^;
  35.       if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
  36.       temp4 := temp4 - decimalZeros4;
  37.       if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
  38.       bytes := @temp4;
  39.       unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
  40.       inc(pstart, 4);
  41.     end;
  42.   end;
  43.   while (pstart < pend) do begin
  44.     case pstart^ of
  45.     '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
  46.     else exit;
  47.     end;
  48.     inc(pstart);
  49.   end;
  50.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
  51.   result := true;
  52.   unsignedResult:=unsigned;
  53. end;
  54.  
  55.  

(btw how do you do an unaligned read?)

That gives

Code: [Select]
run-time: 679 ms

That is almost 10 times faster than copying Val!!

MathMan

  • Sr. Member
  • ****
  • Posts: 325
Re: To optimize RTL, make function PCharToInt
« Reply #68 on: August 12, 2021, 03:09:42 pm »
Quote
(btw how do you do an unaligned read?)

Usually via memcpy, as one can not guarantee misaligned fetches are supported by the destination CPU platform.

[Edit]
Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

An alternative would be to do two aligned QWORD reads and then re-combine. But depending on OS, CPU and memory management one might generate a SIGSEGV with one of the two reads.
[/Edit]

BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

Cheers,
MathMan
« Last Edit: August 12, 2021, 04:21:49 pm by MathMan »

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: To optimize RTL, make function PCharToInt
« Reply #69 on: August 12, 2021, 06:16:05 pm »
Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

FPC does not inline Move. That is the problem

There should be something like ignore it on x86 and do the unaligned move, and do a bytewise copy on other platforms


BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

I am too lazy to figure that out right now

MathMan

  • Sr. Member
  • ****
  • Posts: 325
Re: To optimize RTL, make function PCharToInt
« Reply #70 on: August 13, 2021, 11:26:54 am »
BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

Yes, something like that  :) It should look like the following (untested, direct code)

Code: [Select]
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry

  Dec( Lower, $3030303030303030 ); // convert to numbers 0-9

  Higher := ( Lower * 10 ) shr 8; // no overflow possible
  Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );
 
  Higher := ( Lower * 100 ) shr 16;
  Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );
 
  Higher := ( Lower * 10000 ) shr 32;
  Lower := ( Lower and $00000000ffffffff ) + Higher;

Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

Cheers,
MathMan
« Last Edit: August 13, 2021, 01:52:03 pm by MathMan »

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: To optimize RTL, make function PCharToInt
« Reply #71 on: August 14, 2021, 01:46:41 pm »

Code: [Select]
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry

  Dec( Lower, $3030303030303030 ); // convert to numbers 0-9

  Higher := ( Lower * 10 ) shr 8; // no overflow possible
  Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );
 
  Higher := ( Lower * 100 ) shr 16;
  Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );
 
  Higher := ( Lower * 10000 ) shr 32;
  Lower := ( Lower and $00000000ffffffff ) + Higher;

I  put it in, but it did not help.

Code: [Select]
old run-time: 606 ms
with 3 mul (wrong): run-time: 624 ms
with 3 mul endian corrected: run-time: 667 ms



Code: [Select]
program Project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes, bbutils, bbutilsbeta,bbdebugtools
  { you can add units after this };

function strTryParseInt(pstart, pend: pchar; out unsigned: UInt64): boolean;
var
  length: SizeUInt;
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  unsigned := 0;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
end;
{$AsmMode intel}
function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
      bytes := @temp8;
      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;



function PcharToIntMM(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
//      bytes := @temp8;
//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
       unsigned := unsigned * 100000000;



      lower := SwapEndian(temp8);

      Higher := ( Lower * 10 ) shr 8; // no overflow possible
      Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );

      Higher := ( Lower * 100 ) shr 16;
      Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );

      Higher := ( Lower * 10000 ) shr 32;
      unsigned += ( Lower and $00000000ffffffff ) + Higher;
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;


function PcharToIntMMWrong(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
//      bytes := @temp8;
//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
       unsigned := unsigned * 100000000;

       lower := temp8;

      Higher := ( Lower * 10 ) shr 8; // no overflow possible
      Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );

      Higher := ( Lower * 100 ) shr 16;
      Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );

      Higher := ( Lower * 10000 ) shr 32;
      unsigned += ( Lower and $00000000ffffffff ) + Higher;
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;

var s: array of string = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');//, '18446744073709551616');
  ss: array of shortstring = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');

var i, j, code: integer;
    v: uint64;
    t: String;
begin
  i := 2;// high(s);
  t := '12345678';
  writeln(strTryParseInt(pchar(t), pchar(t) + length(t), v));
  writeln(v);
  writeln(PcharToIntMM(pchar(t), pchar(t) + length(t), v));
  writeln(v);
  //exit;
{  startTiming('copy');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(copy(s[i],1,length(s[i])), v, code)
  end;
  stopTiming('copy');
  startTiming('val');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(s[i], v, code)
  end;
  stopTiming('val');
  startTiming('valss');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(ss[i], v, code)
  end;
  stopTiming('valss');   }
  startTiming('new');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToInt(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming('new');
  startTiming('new');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToIntMMWrong(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming('new');
  startTiming();
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToIntMM(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming();
end.


Although I had a bug that it only used the 8-digit fast path when there where at least 9 digits. The condition in the loop should have been while pstart + 8 <= pend do begin rather than while pstart + 8 < pend do begin. Changing that made it somewhat faster. Or sometimes slower. It is very fiddly to benchmark. The branch predicator appears to be very random


Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

And the code got exactly the wrong endianess. Everything comes out in reverse.

I had to put in a SwapEndian but that call slow  it down too much. FPC not inlining such functions is another big RTL problem.

Inline assembler bswap is even worse because it apparently prevents optimizations.


However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

That is for float. Integer should always be faster than float.

I have already convinced Bero to implement that.  But I am not sure he got everything right.




A bigger issue is how to name this function. Could be PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try. Try at the beginning is more common, but Try at the end is easier to find in the Lazarus completion list. But there is not other PcharTo* function in the RTL.
Could be TextToUInt, TextToUInt64, TryTextToUInt, TryTextToUInt64, TextToUIntTry, or TextToUInt64Try, because there is sysutils.TextToFloat(). But that is an odd one, a Try function without Try in its name and the only one of its kind.
Could be StrToUInt, StrToUInt64, TryStrToUInt, TryStrToUInt64, or StrToUIntTry, TryStrToUInt64Try, because the overload resolving can find it from its type, so the type does not need to be in the name. Nor the 64.
But the classical one also handles other bases. Perhaps DecimalStrToUInt, DecimalStrToUInt64, TryDecimalStrToUInt, TryDecimalStrToUInt64,  DecimalStrToUIntTry, DecimalStrToUInt64Try.
StrDecimalToUInt, StrDecimalToUInt64, TryStrDecimalToUInt, TryStrDecimalToUInt64, or StrDecimalToUIntTry, StrDecimalToUInt64Try
Could also drop the string type, TryParse, or TryParseDecimal or TryParseDecimalNumber



AlexTP

  • Hero Member
  • *****
  • Posts: 2365
    • UVviewsoft
Re: To optimize RTL, make function PCharToInt
« Reply #72 on: August 14, 2021, 01:53:52 pm »
Quote
>PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try
I think better name them with 'Buffer...' instead of 'PChar' (PChar sounds not nice). For Widestring buffer, name them with 'BufferW...'.

'Decimal' word is nice too.
« Last Edit: August 14, 2021, 01:56:55 pm by Alextp »

MathMan

  • Sr. Member
  • ****
  • Posts: 325
Re: To optimize RTL, make function PCharToInt
« Reply #73 on: August 14, 2021, 02:24:20 pm »

[snip]

Although I had a bug that it only used the 8-digit fast path when there where at least 9 digits. The condition in the loop should have been while pstart + 8 <= pend do begin rather than while pstart + 8 < pend do begin. Changing that made it somewhat faster. Or sometimes slower. It is very fiddly to benchmark. The branch predicator appears to be very random

Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

And the code got exactly the wrong endianess. Everything comes out in reverse.

I had to put in a SwapEndian but that call slow  it down too much. FPC not inlining such functions is another big RTL problem.

Inline assembler bswap is even worse because it apparently prevents optimizations.


However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

That is for float. Integer should always be faster than float.

I have already convinced Bero to implement that.  But I am not sure he got everything right.

Thanks for trying - you gave me some food for thought already. Things I can already state are

* you don't need the SwapEndian - better turn around the muls and shifts
* yes, the paper from Lemire is about float to binary, but if you read careful then Uint to binary is a sub-component of that - iirc
* I also think that the benchmark can be improved - for my taste it has to little variance in sizes of the input strings

Give me a couple of days to play with some ideas.

I'll leave the naming thing unanswered - I am absurdly bad at naming.

Cheers,
MathMan

440bx

  • Hero Member
  • *****
  • Posts: 3921
Re: To optimize RTL, make function PCharToInt
« Reply #74 on: August 14, 2021, 10:19:36 pm »
I am absurdly bad at naming.
In that case, I hope your wife named the kids. ;)
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

 

TinyPortal © 2005-2018