### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook

### Author Topic: To optimize RTL, make function PCharToInt  (Read 32127 times)

#### ccrause

• Sr. Member
• Posts: 435
##### Re: To optimize RTL, make function PCharToInt
« Reply #60 on: July 31, 2021, 10:45:22 am »
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Winni
I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.

#### winni

• Hero Member
• Posts: 2671
##### Re: To optimize RTL, make function PCharToInt
« Reply #61 on: July 31, 2021, 10:55:21 am »
[
I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.
Hi!

Remedy:

Code: Pascal  [Select][+][-]
1. {\$i-}
2. type  Tblock = array[0..511] of byte;
3. f: file of TBlock;
4. block: TBlock;
5. ...
6.
7. repeat
8. seek (f, neededBlock);
10. until ioresult = 0;
11.

Leave the PC and cook coffee ....

Winni

#### BeniBela

• Hero Member
• Posts: 817
##### Re: To optimize RTL, make function PCharToInt
« Reply #62 on: July 31, 2021, 01:01:13 pm »

The Discussion goes: How many more BITS beyond bit 54 are needed, before you can decide if 54 is zero or one.

And you need arbitrary many

And how many bits you can put in a string255 with digits?????

Around 700

But that is not enough

#### engkin

• Hero Member
• Posts: 2934
##### Re: To optimize RTL, make function PCharToInt
« Reply #63 on: July 31, 2021, 06:28:48 pm »
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Me too. I still remember how I felt when someone got close to my set of floppy disks with a magnet. My first copy of DOS was in German, don't remember its version.

Carbon dating?
Because we use an ancient "dead" tongue?

Not dead yet, but definitely dying.

I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.

How delighted I was when inspecting a non-working disk reveals a hair or some fingerprint mark. Gentle cleaning used to bring its data back.

#### winni

• Hero Member
• Posts: 2671
##### Re: To optimize RTL, make function PCharToInt
« Reply #64 on: July 31, 2021, 07:01:30 pm »
My first copy of DOS was in German, don't remember its version.

Time for the very old spanish joke:

DOS dos dos

Winni

#### jamie

• Hero Member
• Posts: 4849
##### Re: To optimize RTL, make function PCharToInt
« Reply #65 on: July 31, 2021, 07:21:40 pm »

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

Thanx to PascalDragon!

And for those who are eager to optimize the whole day and night:

Pascal is not the right choice.
Use Forth.
There you can optimize everything.
On speed. On size. On everything.

Winni

I was part of a project using Forth years ago for a video inspection system to reject bottle items from the packing line if the labels were not on the bottle or misplaced. The two, bottle and label were very close in colors.. Kind of hard to do when all you have is a B&W camera, but a Hi-Res one at that.

On top of that, the CPU box which had special camera interface cards was composed completely of a Z80 cpu set.

So every CPU cycle counted to obtain the fasted decoded image to be good or bad and then run some IO to operate a machine that would advance the conveyer line and push the defected bottle off the line, too.
The only true wisdom is knowing you know nothing

#### lucamar

• Hero Member
• Posts: 4219
##### Re: To optimize RTL, make function PCharToInt
« Reply #66 on: July 31, 2021, 07:27:20 pm »
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB

Ah! Those were the days ...
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!)
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

#### winni

• Hero Member
• Posts: 2671
##### Re: To optimize RTL, make function PCharToInt
« Reply #67 on: July 31, 2021, 08:04:01 pm »
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB

Ah! Those were the days ...

No - I am not too young: I started with an Apple ][ where everything was different. As always with Apple.
I just came late to the IBM PC. Where the single sided and the double sided floppies worked.

We found out that it was a great cheating with the floppies:
You bought the cheaper single sided and had only to tape the hole opposite to the write protection hole with black tape.
Zack - you had a double sided disk.

My first hard disk was horrible expensive but 49 times faster than the floppy.
Measured again and again.

And in the night  we travelled with an  acoustic coupler  and 300 Baud through the world.
From 3 to 6 in the night. Only in that time german telephone was cheap.

Winni

#### PascalDragon

• Hero Member
• Posts: 3339
• Compiler Developer
##### Re: To optimize RTL, make function PCharToInt
« Reply #68 on: August 01, 2021, 02:16:05 pm »

No. Those would need to be allocated on the heap (especially the return value) and then you can just as well use a dynamic array.

But the point is to avoid heap allocations and return a reference to the already existing array

But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.

The Trim function would be the best example for that

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

And open arrays of char is how you make thing maintainable.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

For example, there is CompareText

The implementations of CompareText and ShortCompareText could indeed be unified by using stub functions and then calling a common implementation, but can not be generalized for all kinds of string functions, especially not those that modify strings or generate them. (And the Java implementation is kept separate on purpose to avoid even more ifdefs in the core includes).

The Val function is especially horrible. 15 Val functions on 64-bit and a lot more:

The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

And what would such a Val do?

It should do what Val always does and convert the string to a number type

That was not clear from your description.

#### BeniBela

• Hero Member
• Posts: 817
##### Re: To optimize RTL, make function PCharToInt
« Reply #69 on: August 12, 2021, 01:57:22 pm »
But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.

But I do not want to create an array

I want a (startPointer, length) pair in registers, which I can use like an array

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

That needs some kind of burrow checker, so the input is not used while the trimmed result is used.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

Then you have a fast pchar function that does not do code page conversions, and a slow path function that makes the conversion and then calls the pchar variant

The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

The performance is broken

I have now benchmarked it

Compared with the simplest possible pchar:

Code: Pascal  [Select][+][-]
1. function simplestTryPcharToInt(pstart, pend: pchar; out unsigned: UInt64): boolean;
2. var
3.   length: SizeUInt;
4. begin
5.   result := false;
6.   if pend <= pstart then exit;
7.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
8.   length := pend - pstart;
9.   if length > 20 then exit;
10.   if (length = 20) and (pstart^ >= '2') then exit;
11.   unsigned := 0;
12.   while (pstart < pend) do begin
13.     case pstart^ of
14.     '0'..'9': unsigned := unsigned * 10 + (ord(pstart^) - ord('0'));
15.     else exit;
16.     end;
17.     inc(pstart);
18.   end;
19.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
20.   result := true;
21. end;

on ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615') 10000000 times.

Code: [Select]
`val (copy ansistring)     run-time: 6379 msval (ansistring)          run-time: 5098 msval (shortstring)         run-time: 4981 mssimplestPchar             run-time: 1327 mssimplestPchar   (-O4)     run-time:  914 mssimplestPchar (-O4 -Cort) run-time: 2042 ms`
Val is already so bad, the copy hardly matters. Val should be removed

But that was a simple pchar. You can convert an entire block of digits in one go with some bit twiddling:

Code: Pascal  [Select][+][-]
1. function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
2. const
3.   selectFirstHalfByte8 = UInt64(\$F0F0F0F0F0F0F0F0);
4.   decimalZeros8        = UInt64(\$3030303030303030);
5.   overflowMaxDigit8    = UInt64(\$0606060606060606);
6.   selectFirstHalfByte4 = UInt32(\$F0F0F0F0);
7.   decimalZeros4        = UInt32(\$30303030);
8.   overflowMaxDigit4    = UInt32(\$06060606);
9. var
10.   length: SizeUInt;
11.   temp8: UInt64;
12.   temp4: UInt32;
13.   bytes: pbyte;
14.   unsigned: UInt64;
15. begin
16.   result := false;
17.   if pend <= pstart then exit;
18.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
19.   length := pend - pstart;
20.   if length > 20 then exit;
21.   if (length = 20) and (pstart^ >= '2') then exit;
22.   unsigned := 0;
23.   if PtrUInt(TObject(pstart)) and 7 = 0 then begin
24.     while pstart + 8 < pend do begin
25.       temp8 := PUInt64(pstart)^;
26.       if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
27.       temp8 := temp8 - decimalZeros8;
28.       if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
29.       bytes := @temp8;
30.       unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
31.       inc(pstart, 8);
32.     end;
33.     while pstart + 4 < pend do begin
34.       temp4 := PUInt32(pstart)^;
35.       if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
36.       temp4 := temp4 - decimalZeros4;
37.       if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
38.       bytes := @temp4;
39.       unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
40.       inc(pstart, 4);
41.     end;
42.   end;
43.   while (pstart < pend) do begin
44.     case pstart^ of
45.     '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
46.     else exit;
47.     end;
48.     inc(pstart);
49.   end;
50.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
51.   result := true;
52.   unsignedResult:=unsigned;
53. end;
54.
55.

(btw how do you do an unaligned read?)

That gives

Code: [Select]
`run-time: 679 ms`
That is almost 10 times faster than copying Val!!

#### MathMan

• Full Member
• Posts: 236
##### Re: To optimize RTL, make function PCharToInt
« Reply #70 on: August 12, 2021, 03:09:42 pm »
Quote
(btw how do you do an unaligned read?)

Usually via memcpy, as one can not guarantee misaligned fetches are supported by the destination CPU platform.

Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

An alternative would be to do two aligned QWORD reads and then re-combine. But depending on OS, CPU and memory management one might generate a SIGSEGV with one of the two reads.
[/Edit]

BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

Cheers,
MathMan
« Last Edit: August 12, 2021, 04:21:49 pm by MathMan »

#### BeniBela

• Hero Member
• Posts: 817
##### Re: To optimize RTL, make function PCharToInt
« Reply #71 on: August 12, 2021, 06:16:05 pm »
Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

FPC does not inline Move. That is the problem

There should be something like ignore it on x86 and do the unaligned move, and do a bytewise copy on other platforms

BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

I am too lazy to figure that out right now

#### MathMan

• Full Member
• Posts: 236
##### Re: To optimize RTL, make function PCharToInt
« Reply #72 on: August 13, 2021, 11:26:54 am »
BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

Yes, something like that  It should look like the following (untested, direct code)

Code: [Select]
`var  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry  Dec( Lower, \$3030303030303030 ); // convert to numbers 0-9  Higher := ( Lower * 10 ) shr 8; // no overflow possible  Lower := ( Lower and \$00ff00ff00ff00ff ) + ( Higher and \$00ff00ff00ff00ff );    Higher := ( Lower * 100 ) shr 16;  Lower := ( Lower and \$0000ffff0000ffff ) + ( Higher and \$0000ffff0000ffff );    Higher := ( Lower * 10000 ) shr 32;  Lower := ( Lower and \$00000000ffffffff ) + Higher;`
Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

Cheers,
MathMan
« Last Edit: August 13, 2021, 01:52:03 pm by MathMan »

#### BeniBela

• Hero Member
• Posts: 817
##### Re: To optimize RTL, make function PCharToInt
« Reply #73 on: August 14, 2021, 01:46:41 pm »

Code: [Select]
`var  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry  Dec( Lower, \$3030303030303030 ); // convert to numbers 0-9  Higher := ( Lower * 10 ) shr 8; // no overflow possible  Lower := ( Lower and \$00ff00ff00ff00ff ) + ( Higher and \$00ff00ff00ff00ff );    Higher := ( Lower * 100 ) shr 16;  Lower := ( Lower and \$0000ffff0000ffff ) + ( Higher and \$0000ffff0000ffff );    Higher := ( Lower * 10000 ) shr 32;  Lower := ( Lower and \$00000000ffffffff ) + Higher;`

I  put it in, but it did not help.

Code: [Select]
`old run-time: 606 mswith 3 mul (wrong): run-time: 624 mswith 3 mul endian corrected: run-time: 667 ms`

Code: [Select]
`program Project1;{\$mode objfpc}{\$H+}uses  {\$IFDEF UNIX}{\$IFDEF UseCThreads}  cthreads,  {\$ENDIF}{\$ENDIF}  Classes, bbutils, bbutilsbeta,bbdebugtools  { you can add units after this };function strTryParseInt(pstart, pend: pchar; out unsigned: UInt64): boolean;var  length: SizeUInt;begin  result := false;  if pend <= pstart then exit;  while (pstart < pend) and (pstart^ = '0') do inc(pstart);  length := pend - pstart;  if length > 20 then exit;  unsigned := 0;  while (pstart < pend) do begin    case pstart^ of    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));    else exit;    end;    inc(pstart);  end;  if (length = 20) and (unsigned < 10000000000000000000) then exit;  result := true;end;{\$AsmMode intel}function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;const  selectFirstHalfByte8 = UInt64(\$F0F0F0F0F0F0F0F0);  decimalZeros8        = UInt64(\$3030303030303030);  overflowMaxDigit8    = UInt64(\$0606060606060606);  selectFirstHalfByte4 = UInt32(\$F0F0F0F0);  decimalZeros4        = UInt32(\$30303030);  overflowMaxDigit4    = UInt32(\$06060606);var  length: SizeUInt;  temp8: UInt64;  temp4: UInt32;  bytes: pbyte;  unsigned: UInt64;begin  result := false;  if pend <= pstart then exit;  while (pstart < pend) and (pstart^ = '0') do inc(pstart);  length := pend - pstart;  if length > 20 then exit;  if (length = 20) and (pstart^ >= '2') then exit;  unsigned := 0;  if PtrUInt(TObject(pstart)) and 7 = 0 then begin    while pstart + 8 <= pend do begin      temp8 := PUInt64(pstart)^;      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;      temp8 := temp8 - decimalZeros8;      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;      bytes := @temp8;      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];      inc(pstart, 8);    end;    while pstart + 4 <= pend do begin      temp4 := PUInt32(pstart)^;      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;      temp4 := temp4 - decimalZeros4;      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;      bytes := @temp4;      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);      inc(pstart, 4);    end;  end;  while (pstart < pend) do begin    case pstart^ of    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));    else exit;    end;    inc(pstart);  end;  if (length = 20) and (unsigned < 10000000000000000000) then exit;  result := true;  unsignedResult:=unsigned;end;function PcharToIntMM(pstart, pend: pchar; out unsignedResult: UInt64): boolean;const  selectFirstHalfByte8 = UInt64(\$F0F0F0F0F0F0F0F0);  decimalZeros8        = UInt64(\$3030303030303030);  overflowMaxDigit8    = UInt64(\$0606060606060606);  selectFirstHalfByte4 = UInt32(\$F0F0F0F0);  decimalZeros4        = UInt32(\$30303030);  overflowMaxDigit4    = UInt32(\$06060606);var  length: SizeUInt;  temp8: UInt64;  temp4: UInt32;  bytes: pbyte;  unsigned: UInt64;var  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entrybegin  result := false;  if pend <= pstart then exit;  while (pstart < pend) and (pstart^ = '0') do inc(pstart);  length := pend - pstart;  if length > 20 then exit;  if (length = 20) and (pstart^ >= '2') then exit;  unsigned := 0;  if PtrUInt(TObject(pstart)) and 7 = 0 then begin    while pstart + 8 <= pend do begin      temp8 := PUInt64(pstart)^;      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;      temp8 := temp8 - decimalZeros8;      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;//      bytes := @temp8;//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];       unsigned := unsigned * 100000000;      lower := SwapEndian(temp8);      Higher := ( Lower * 10 ) shr 8; // no overflow possible      Lower := ( Lower and \$00ff00ff00ff00ff ) + ( Higher and \$00ff00ff00ff00ff );      Higher := ( Lower * 100 ) shr 16;      Lower := ( Lower and \$0000ffff0000ffff ) + ( Higher and \$0000ffff0000ffff );      Higher := ( Lower * 10000 ) shr 32;      unsigned += ( Lower and \$00000000ffffffff ) + Higher;      inc(pstart, 8);    end;    while pstart + 4 <= pend do begin      temp4 := PUInt32(pstart)^;      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;      temp4 := temp4 - decimalZeros4;      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;      bytes := @temp4;      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);      inc(pstart, 4);    end;  end;  while (pstart < pend) do begin    case pstart^ of    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));    else exit;    end;    inc(pstart);  end;  if (length = 20) and (unsigned < 10000000000000000000) then exit;  result := true;  unsignedResult:=unsigned;end;function PcharToIntMMWrong(pstart, pend: pchar; out unsignedResult: UInt64): boolean;const  selectFirstHalfByte8 = UInt64(\$F0F0F0F0F0F0F0F0);  decimalZeros8        = UInt64(\$3030303030303030);  overflowMaxDigit8    = UInt64(\$0606060606060606);  selectFirstHalfByte4 = UInt32(\$F0F0F0F0);  decimalZeros4        = UInt32(\$30303030);  overflowMaxDigit4    = UInt32(\$06060606);var  length: SizeUInt;  temp8: UInt64;  temp4: UInt32;  bytes: pbyte;  unsigned: UInt64;var  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entrybegin  result := false;  if pend <= pstart then exit;  while (pstart < pend) and (pstart^ = '0') do inc(pstart);  length := pend - pstart;  if length > 20 then exit;  if (length = 20) and (pstart^ >= '2') then exit;  unsigned := 0;  if PtrUInt(TObject(pstart)) and 7 = 0 then begin    while pstart + 8 <= pend do begin      temp8 := PUInt64(pstart)^;      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;      temp8 := temp8 - decimalZeros8;      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;//      bytes := @temp8;//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];       unsigned := unsigned * 100000000;       lower := temp8;      Higher := ( Lower * 10 ) shr 8; // no overflow possible      Lower := ( Lower and \$00ff00ff00ff00ff ) + ( Higher and \$00ff00ff00ff00ff );      Higher := ( Lower * 100 ) shr 16;      Lower := ( Lower and \$0000ffff0000ffff ) + ( Higher and \$0000ffff0000ffff );      Higher := ( Lower * 10000 ) shr 32;      unsigned += ( Lower and \$00000000ffffffff ) + Higher;      inc(pstart, 8);    end;    while pstart + 4 <= pend do begin      temp4 := PUInt32(pstart)^;      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;      temp4 := temp4 - decimalZeros4;      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;      bytes := @temp4;      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);      inc(pstart, 4);    end;  end;  while (pstart < pend) do begin    case pstart^ of    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));    else exit;    end;    inc(pstart);  end;  if (length = 20) and (unsigned < 10000000000000000000) then exit;  result := true;  unsignedResult:=unsigned;end;var s: array of string = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');//, '18446744073709551616');  ss: array of shortstring = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');var i, j, code: integer;    v: uint64;    t: String;begin  i := 2;// high(s);  t := '12345678';  writeln(strTryParseInt(pchar(t), pchar(t) + length(t), v));  writeln(v);  writeln(PcharToIntMM(pchar(t), pchar(t) + length(t), v));  writeln(v);  //exit;{  startTiming('copy');  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      val(copy(s[i],1,length(s[i])), v, code)  end;  stopTiming('copy');  startTiming('val');  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      val(s[i], v, code)  end;  stopTiming('val');  startTiming('valss');  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      val(ss[i], v, code)  end;  stopTiming('valss');   }  startTiming('new');  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      PcharToInt(pchar(s[i]), pchar(s[i]) + length(s[i]), v)  end;  stopTiming('new');  startTiming('new');  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      PcharToIntMMWrong(pchar(s[i]), pchar(s[i]) + length(s[i]), v)  end;  stopTiming('new');  startTiming();  for i := 0 to high(s) do begin    for j := 1 to 10000000 do      PcharToIntMM(pchar(s[i]), pchar(s[i]) + length(s[i]), v)  end;  stopTiming();end.`
Although I had a bug that it only used the 8-digit fast path when there where at least 9 digits. The condition in the loop should have been while pstart + 8 <= pend do begin rather than while pstart + 8 < pend do begin. Changing that made it somewhat faster. Or sometimes slower. It is very fiddly to benchmark. The branch predicator appears to be very random

Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

And the code got exactly the wrong endianess. Everything comes out in reverse.

I had to put in a SwapEndian but that call slow  it down too much. FPC not inlining such functions is another big RTL problem.

Inline assembler bswap is even worse because it apparently prevents optimizations.

However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

That is for float. Integer should always be faster than float.

I have already convinced Bero to implement that.  But I am not sure he got everything right.

A bigger issue is how to name this function. Could be PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try. Try at the beginning is more common, but Try at the end is easier to find in the Lazarus completion list. But there is not other PcharTo* function in the RTL.
Could be TextToUInt, TextToUInt64, TryTextToUInt, TryTextToUInt64, TextToUIntTry, or TextToUInt64Try, because there is sysutils.TextToFloat(). But that is an odd one, a Try function without Try in its name and the only one of its kind.
Could be StrToUInt, StrToUInt64, TryStrToUInt, TryStrToUInt64, or StrToUIntTry, TryStrToUInt64Try, because the overload resolving can find it from its type, so the type does not need to be in the name. Nor the 64.
But the classical one also handles other bases. Perhaps DecimalStrToUInt, DecimalStrToUInt64, TryDecimalStrToUInt, TryDecimalStrToUInt64,  DecimalStrToUIntTry, DecimalStrToUInt64Try.
StrDecimalToUInt, StrDecimalToUInt64, TryStrDecimalToUInt, TryStrDecimalToUInt64, or StrDecimalToUIntTry, StrDecimalToUInt64Try
Could also drop the string type, TryParse, or TryParseDecimal or TryParseDecimalNumber

#### Alextp

• Hero Member
• Posts: 1384
##### Re: To optimize RTL, make function PCharToInt
« Reply #74 on: August 14, 2021, 01:53:52 pm »
Quote
>PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try
I think better name them with 'Buffer...' instead of 'PChar' (PChar sounds not nice). For Widestring buffer, name them with 'BufferW...'.

'Decimal' word is nice too.
« Last Edit: August 14, 2021, 01:56:55 pm by Alextp »