Lazarus

Free Pascal => General => Topic started by: AlexTP on July 22, 2021, 09:06:58 am

Title: To optimize RTL, make function PCharToInt
Post by: AlexTP on July 22, 2021, 09:06:58 am
As I see per this patch
https://github.com/graemeg/freepascal/commit/71e8160cb755f38cae6ce40db0bd72f968c5a95c
RTL has places where you do StrToInt(Copy(S, n, m)) and StrToIntDef(Copy(...)) and TryStrToInt(Copy(...)).
This can be optimized with a PCharToInt which won't do string allocs.
Code of PCharToInt is simple (using pointer+length).
Agree?
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 22, 2021, 12:28:20 pm
It is all over the RTL

Not just StrToInt. This goes much much deeper.



Just a quick search:

In inifiles.pp: Trim(Copy(sLine, 1,  j - 1))
In  fphttpclient.pp: LowerCase(Copy(S,1,Length(SetCookie)))=SetCookie
In fphttpclient.pp: ((LowerCase(Copy(HTTPHeaders[Result],1,l)))<>h)
In fpreport.pp: SameText(Copy(BaseName,1,9),'TFPReport')


The proper solution would be to have a new string type SubString or something, which could be used anywhere a string can be used. But it would only refer an existing string, and not require memory allocations.


You could build your own type with a record, but then the problem is that record fields are not kept in registers, so it is slower than a local variable pchar. This needs an improved optimizer
Title: Re: To optimize RTL, make function PCharToInt
Post by: AlexTP on July 22, 2021, 12:45:43 pm
These 3 can be opt'ed already:

In  fphttpclient.pp: LowerCase(Copy(S,1,Length(SetCookie)))=SetCookie
In fphttpclient.pp: ((LowerCase(Copy(HTTPHeaders[Result],1,l)))<>h)
In fpreport.pp: SameText(Copy(BaseName,1,9),'TFPReport')

Using StrLComp() and StrLIComp().
Title: Re: To optimize RTL, make function PCharToInt
Post by: PascalDragon on July 23, 2021, 10:49:31 am
Please stop coming up with these badly thought out ideas.

Please stop making claims without supporting them with evidence.

If a function is not used in your program then it will be smartlinked out, especially if you did indeed enable the smartlink options. If it is not left out then it is indeed used either directly or indirectly.

Take the following program:

Code: Pascal  [Select][+][-]
  1. program tsmartlink;
  2.  
  3. procedure Test;
  4. begin
  5.   Writeln('Test');
  6. end;
  7.  
  8. procedure TestSub;
  9. begin
  10.   Writeln('TestSub');
  11. end;
  12.  
  13. procedure Test2;
  14. begin
  15.   TestSub;
  16. end;
  17.  
  18. procedure TestSub2;
  19. begin
  20.   Writeln('TestSub2');
  21. end;
  22.  
  23. procedure Test3(aArg: Integer); inline;
  24. begin
  25.   if aArg = 2 then
  26.     TestSub2;
  27. end;
  28.  
  29. begin
  30. {$ifdef USE_TEST}
  31.   Test;
  32. {$endif}
  33. {$ifdef USE_TEST2}
  34.   Test2;
  35. {$endif}
  36. {$ifdef USE_TEST3}
  37.   Test3(1);
  38. {$endif}
  39. end.

If you compile without any of the defines set (just call the compiler manually so that none of Lazarus' options intefere) then you won't find any of the three strings inside the resulting binary. If you compile with USE_TEST1 (-dUSE_TEST1) you will find the 'Test' string. If you compile with USE_TEST2 you'll find the 'TestSub' string and if you compile with USE_TEST3 you will find no string, because due to the call of Test3 with 1 instead of 2 the call to TestSub2 will be optimized as well.

This will also work across units and thus also the RTL.
Title: Re: To optimize RTL, make function PCharToInt
Post by: ccrause on July 23, 2021, 03:01:25 pm
Please stop coming up with these badly thought out ideas.

It only aids in adding more useless redundant junk to an already over whelmed RTL..
I think the intent of @Alextp was to highlight the use of copying (sub)strings when it may not be necessary (e.g. when the copied string is not modified), not to suggest yet more functionality.  The following post clearly illustrates that there are existing alternatives that avoids making copies of strings:
These 3 can be opt'ed already:

In  fphttpclient.pp: LowerCase(Copy(S,1,Length(SetCookie)))=SetCookie
In fphttpclient.pp: ((LowerCase(Copy(HTTPHeaders[Result],1,l)))<>h)
In fpreport.pp: SameText(Copy(BaseName,1,9),'TFPReport')

Using StrLComp() and StrLIComp().

To me this suggestion seems sensible, but then I'm no expert on the low level details of string handling.

Perhaps @Alextp can come up with some benchmark tests to highlight potential improvements?
Title: Re: To optimize RTL, make function PCharToInt
Post by: wp on July 23, 2021, 03:39:45 pm
Such micro-optimizations usually are not noticable, but have the risk of breaking something somewhere else. I fully agree with jamie.
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 04:28:46 pm
1- I disagree with jamie.
2- "the risk of breaking something somewhere else" is not a good reason,  that same risk is there by simply touching the code anywhere, including fixing a bug.
3- Objecting to every "micro-optimization" will not enhance the speed. It is the collective result of all micro-optimizations that increases the speed.
4- Typically, every function that deals with strings should have a counterpart that takes pchar and length to allow for same usage even when a String type is used.

Title: Re: To optimize RTL, make function PCharToInt
Post by: wp on July 23, 2021, 04:32:06 pm
Typically, every function that deals with strings should have a counterpart that takes pchar and length to allow for same usage even when a String type is used.
The problem will arise when the string has a legal #0 inside. If not done properly the PChar version can stop parsing here and miss the rest of the string.
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 04:41:03 pm
I did mention "length" in the section you quoted, for this and to make the move from String type to pchar easy.
Title: Re: To optimize RTL, make function PCharToInt
Post by: lucamar on July 23, 2021, 04:57:29 pm
The problem will arise when the string has a legal #0 inside. If not done properly the PChar version can stop parsing here and miss the rest of the string.

Note that he said "[...] takes pchar and length [...]" so the problem doesn't need to arise. And if passing a "normal" string it's even easier since all one has to do is:
Code: Pascal  [Select][+][-]
  1. Somefunction(PChar(AString), Length(AString));
;)

ETA: As he just said. D***ed cross-posts! :-[
Title: Re: To optimize RTL, make function PCharToInt
Post by: wp on July 23, 2021, 05:05:07 pm
Yes, and if length were always respected we would not have all these buffer underruns in C. As I said: "if done properly"... "When things can go wrong they will go wrong." Or: "Never change a winning team".

I did a lot of these micro-optimizations myself with arguments like "it looks better when I do it this way", "ah, this is a bit faster" etc., and I almost always ended up in introducing a bug somewhere. This is different from fixing a bug. A bug MUST be fixed, micro-optimizations gains nothing.
Title: Re: To optimize RTL, make function PCharToInt
Post by: AlexTP on July 23, 2021, 05:10:53 pm
@wp,
Okay, if you think 'it gives nothing', let the DataTime code be as is.
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 05:26:29 pm
This is exactly what we are trying to do, to give you String functions in the RTL that are optimized so you don't need to do more micro-optimizations and introduce bugs into your code.

Forced or not, the reasoning behind rejecting "micro-optimizations" is not correct. Dealing with strings, pchar is the natural type to use, while String type is more of a user-friendly type to hide address, length, codepage and reference count.

Length should always be respected instead of being lazy and use Copy with MaxInt.
Title: Re: To optimize RTL, make function PCharToInt
Post by: howardpc on July 23, 2021, 05:34:04 pm
Length should always be respected instead of being lazy and use Copy with MaxInt.
Note that recent FPCs allow you to use Copy() with only two parameters when you want the copy to be from Index to Length inclusive -- so no Maxint is needed.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 05:34:49 pm
Dealing with strings, pchar is the natural type to use, while String type is more of a user-friendly type to hide address, length, codepage and reference count.


Hi!

The natural Pascal type is string, not pchar. Turbo Pascal lived years without pchar. I think pchar was first used in version 7 but only with extended syntax switch on.  pchar is a concession to the Operating systems but not  original Pascal syntax.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 05:37:01 pm
Length should always be respected instead of being lazy and use Copy with MaxInt.
Note that recent FPCs allow you to use Copy() with only two parameters when you want the copy to be from Index to Length inclusive -- so no Maxint is needed.

Thank you. This is a step in the correct direction. I mentioned this example as one of my bad habits.
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 23, 2021, 05:38:41 pm
Yes, and if length were always respected we would not have all these buffer underruns in C. As I said: "if done properly"... "When things can go wrong they will go wrong." Or: "Never change a winning team".

I did a lot of these micro-optimizations myself with arguments like "it looks better when I do it this way", "ah, this is a bit faster" etc., and I almost always ended up in introducing a bug somewhere. This is different from fixing a bug. A bug MUST be fixed, micro-optimizations gains nothing.


That is the problem with pchar

It is very unsafe, and easy to accidentally use a wrong length


But with a sub string type, it would store the length together with the pchar, so you cannot use the wrong length. The example

In fpreport.pp: SameText(Copy(BaseName,1,9),'TFPReport')

would be changed

In fpreport.pp: SameText(CopyAsSubStringView(BaseName,1,9),'TFPReport')

and is faster, and still safe.

Or with type helpers, one could do:

In fpreport.pp: BaseName.SubStringView(1,9).SameText('TFPReport')

It is safe, faster and looks better



If a function is not used in your program then it will be smartlinked out, especially if you did indeed enable the smartlink options. If it is not left out then it is indeed used either directly or indirectly.


That did not work at all

Looks like -gl disables smartlinking
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 05:42:11 pm
Dealing with strings, pchar is the natural type to use, while String type is more of a user-friendly type to hide address, length, codepage and reference count.


Hi!

The natural Pascal type is string, not pchar. Turbo Pascal lived years without pchar. I think pchar was first used in version 7 but only with extended syntax switch on.  pchar is a concession to the Operating systems but not  original Pascal syntax.

Winni

Hi. Yes, I meant natural in the sense when you need to deal with individual characters and at a closer level to the CPU, pchar is the type to use. Glad you pointed this out, it shows when pchar was perceived as a needed type.
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 05:45:14 pm
@BeniBela

+1 for Sub String Type

The header preceding String type is more of a burden. At least a new type that can be separate from the String itself is needed.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 06:03:59 pm
Hi!

Depends what you want:

If you want Pascal then use the string.

If you want speed then use pchar which has nothing to do with Pascal but with the OS. And if you want speed then use Assembler but not Pascal.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 06:16:24 pm
You make it sound like you choose one and only one, with no other possibility. It sounds more like a gang membership, it is not. You can have Pascal, String, and speed through pchar or sub string.
Title: Re: To optimize RTL, make function PCharToInt
Post by: lucamar on July 23, 2021, 06:29:15 pm
I think pchar was first used in version 7 but only with extended syntax switch on.  pchar is a concession to the Operating systems but not  original Pascal syntax.

[...] Glad you pointed this out, it shows when pchar was perceived as a needed type.

Historical disgression:
It was in fact introduced as a predeclared type in Turbo Pascal for Windows 1.0, which extended TP6.0, though IIRC the (very limited) compatibility rules enabled through Extended Syntax were introduced in TP7. Almost-full compatibility (allowing you to e.g. typecast a String as a PChar) were not added until Delphi 2 or 3 (don'r rememeber which). So it was indeed the drive toward Windows (and Protected-mode) programming which made the addition necessary.
 8-)
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 06:42:11 pm
Thank you. I am happy they gave us pchar, and also extending strings beyond 255-char length, not to forget adding code pages, and later breaking out of ANSI codepage that used to limit us to only two languages, or more precisely one additional language beside English.

Should this progress stop?
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 07:25:33 pm
@lucamar

Delphi1 introduced strpas and strPcopy to convert  string <--> pchar

@engkin

Don't struggle so useless. The introduction of the AnsiString and the UTF8 codepage has nothing to do with the pchar.

The pchar is just an awful bridge to connect to the OS.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: lucamar on July 23, 2021, 08:14:59 pm
Delphi1 introduced strpas and strPcopy to convert  string <--> pchar

Not quite: those two (among others) were already in TP7, in the Strings unit ;D
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 08:33:50 pm
Delphi1 introduced strpas and strPcopy to convert  string <--> pchar

Not quite: those two (among others) were already in TP7, in the Strings unit ;D

Thanx, lucamar!

That was the time when I (and a lot other people) used TP 5.5, TP 7.0, TP 1.0/1.5 for Windows (cruel!!) and Delphi1 parallel. So the reminder is not quiet sharp.

That stopped with 32 bit and Delphi 2,3,...

But the simple implicit cast of
Code: Pascal  [Select][+][-]
  1.  
  2.  
  3. String := pchar;

came some versions later.

Winni

Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 08:55:17 pm
Don't struggle so useless.

How is this supposed to help?

The introduction of the AnsiString and the UTF8 codepage has nothing to do with the pchar.

Ok, let's forget pchar for a second. Can I assume you have no problem with another string type like SubString?
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 09:08:11 pm


Ok, let's forget pchar for a second. Can I assume you have no problem with another string type like SubString?

What  you call a "SubString" s nothing but an Alias.
A collegue told me that is a great source of confusion in C++

So: When you like to increase the Confusion in Pascal:  Why not?

I go allways the opposite direction:  Keep It Simple, Stupid: KISS

Winni

Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 23, 2021, 09:16:26 pm
This is not helping. The argument now is based on what your collegue is telling you. It does not leave a way for me to accept your argument, nor it leaves a way to convince you of anything your colleague does not like.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 23, 2021, 09:29:57 pm
It is so obvious:

Whatever I tell you - you try to get out of the trap of logic.

So keep on going, muddle Pascal with pchar, "SubStrings" or whatever you like.
But you can count me out.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: PascalDragon on July 26, 2021, 01:45:20 pm
Like it or not you seem to keep defending bugs and claimed features that just don't work.

 Also bad code generation with blanket statements in return.

Like it or not, but you either never provide evidence for bugs or you don't believe or ignore us when we say it's not a bug, but instead you doing something that can even fail on Delphi (with proof by us!).
Title: Re: To optimize RTL, make function PCharToInt
Post by: rsz on July 27, 2021, 06:37:02 pm
FPC already seems to have the concept of partial arrays and partial strings, in other languages they're called slices.

Unfortunately I can't seem to find much documentation about partial strings. It is only mentioned briefly in the Free Pascal Reference Guide but it seems like something that would be fitting to use here instead of Copy but unfortunately it seems to do a conversion (and allocation?):

In this code for example:
Code: Pascal  [Select][+][-]
  1. procedure SomeProc;
  2.   procedure Proc1(const XS: array of Integer); begin end;
  3.   procedure Proc2(const Str: String); begin end;
  4. var
  5.   A: array of Integer = (1, 2, 3, 4, 5, 6, 7, 8);
  6.   B: String = 'Hello world!';
  7. begin
  8.   Proc1(A[0..1]); // passes (1, 2) to Proc1
  9.   Proc2(B[1..5]); // passes 'Hello' to Proc2
  10. end;

partial strings seem to do a conversion (fpc_chararray_to_ansistr), unlike partial arrays:
Code: ASM  [Select][+][-]
  1. project1.lpr:52                           Proc1(A[0..1]); // passes (1, 2) to Proc1
  2. 0000000000401334 488b5df8                 mov    -0x8(%rbp),%rbx
  3. 0000000000401338 48c7c600000000           mov    $0x0,%rsi
  4. 000000000040133F 488b7df8                 mov    -0x8(%rbp),%rdi
  5. 0000000000401343 e878010100               callq  0x4114c0 <fpc_dynarray_rangecheck>
  6. 0000000000401348 4889de                   mov    %rbx,%rsi
  7. 000000000040134B 48c7c201000000           mov    $0x1,%rdx
  8. 0000000000401352 4889ef                   mov    %rbp,%rdi
  9. 0000000000401355 e8c6000000               callq  0x401420 <PROC1>
Code: ASM  [Select][+][-]
  1. project1.lpr:53                           Proc2(B[1..5]); // passes 'Hello' to Proc2
  2. 000000000040135A 488b5df0                 mov    -0x10(%rbp),%rbx
  3. 000000000040135E 48c7c601000000           mov    $0x1,%rsi
  4. 0000000000401365 488b7df0                 mov    -0x10(%rbp),%rdi
  5. 0000000000401369 e832b20000               callq  0x40c5a0 <fpc_ansistr_rangecheck>
  6. 000000000040136E 4889de                   mov    %rbx,%rsi
  7. 0000000000401371 41b001                   mov    $0x1,%r8b
  8. 0000000000401374 b900000000               mov    $0x0,%ecx
  9. 0000000000401379 48c7c204000000           mov    $0x4,%rdx
  10. 0000000000401380 488d7d88                 lea    -0x78(%rbp),%rdi
  11. 0000000000401384 e807ad0000               callq  0x40c090 <fpc_chararray_to_ansistr>
  12. 0000000000401389 488b7588                 mov    -0x78(%rbp),%rsi
  13. 000000000040138D 4889ef                   mov    %rbp,%rdi
  14. 0000000000401390 e84b000000               callq  0x4013e0 <PROC2>
  15. 0000000000401395 e8c6340100               callq  0x414860 <fpc_popaddrstack>

fpc_chararray_to_ansistr:
https://github.com/graemeg/freepascal/blob/ef144e079c01ce9c6b54af9d348b70bcd47a1c58/rtl/inc/astrings.inc#L578

I did some rudimentary and quick non-scientific benchmarking and it seems to be unfortunately as slow as a copy on my machine. I expected it (maybe wrongly) to be as fast as passing a pointer with a range.

Edit: They should be const.
Title: Re: To optimize RTL, make function PCharToInt
Post by: marcov on July 27, 2021, 06:55:48 pm
1. A subrange of a shortstring is probably an array of char, so proc2 probably needs to be array of char not string
2. the arguments of proc1 (XS and StR) should probably be const.

Code: [Select]
.Ll7:
movl TC_$P$PROGRAM$_$SOMEPROC_$$_defaultB,%edx
leal -36(%ebp),%eax
call fpc_ansistr_assign
// this optimization is weird, since this snapshot has no avx2 ?
vmovdqu TC_$P$PROGRAM$_$SOMEPROC_$$_defaultA,%ymm0
vmovdqu %ymm0,-32(%ebp)
.Ll8:
leal -32(%ebp),%edx
movl $1,%ecx
call P$PROGRAM$_$SOMEPROC_$$_PROC1$array_of_LONGINT
.Ll9:
movl -36(%ebp),%edx
movl $4,%ecx
call P$PROGRAM$_$SOMEPROC_$$_PROC2$array_of_CHAR
.Ll10:
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 27, 2021, 07:27:30 pm
Hi!

What is often forgetten is the ability of fpc to use only a slice of an array as parameter.

Example:

Code: Pascal  [Select][+][-]
  1. Type arrayOfTpoint = array of TPoint;
  2. var  ar :arrayOfTpoint;
  3. begin
  4.   setLength(ar,123);
  5.   //.....
  6.   Image1.Canvas.polyline(ar[23..42]);
  7.  
  8. end;              

In the example only the points 23..42 are used for drawing.
So you don't need to use copy or slice but you can do the same with the range of the parameter.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 27, 2021, 07:40:12 pm
But it looks like the called function makes a copy with getmem and move:

Code: Pascal  [Select][+][-]
  1.   procedure Proc1(XS: array of Integer);
  2.   begin
  3.     writeln(length(XS));
  4.   end;
  5.  
  6.  
Code: ASM  [Select][+][-]
  1. project1.lpr:3                            begin
  2. 0000000000401090 55                       push   %rbp
  3. 0000000000401091 4889e5                   mov    %rsp,%rbp
  4. 0000000000401094 488da42470ffffff         lea    -0x90(%rsp),%rsp
  5. 000000000040109C 48899d78ffffff           mov    %rbx,-0x88(%rbp)
  6. 00000000004010A3 4c896580                 mov    %r12,-0x80(%rbp)
  7. 00000000004010A7 4c896d88                 mov    %r13,-0x78(%rbp)
  8. 00000000004010AB 4c897590                 mov    %r14,-0x70(%rbp)
  9. 00000000004010AF 48897df8                 mov    %rdi,-0x8(%rbp)
  10. 00000000004010B3 4889f3                   mov    %rsi,%rbx
  11. 00000000004010B6 4c8b65f8                 mov    -0x8(%rbp),%r12
  12. 00000000004010BA 4c8d6b01                 lea    0x1(%rbx),%r13
  13. 00000000004010BE 49c1e502                 shl    $0x2,%r13
  14. 00000000004010C2 4c89ef                   mov    %r13,%rdi
  15. 00000000004010C5 e876760100               callq  0x418740 <fpc_getmem>
  16. 00000000004010CA 4989c6                   mov    %rax,%r14
  17. 00000000004010CD 4c89ea                   mov    %r13,%rdx
  18. 00000000004010D0 4c89f6                   mov    %r14,%rsi
  19. 00000000004010D3 4c89e7                   mov    %r12,%rdi
  20. 00000000004010D6 e835040000               callq  0x401510 <SYSTEM_$$_MOVE$formal$formal$INT64>
  21. 00000000004010DB 4c8975f8                 mov    %r14,-0x8(%rbp)
  22. 00000000004010DF 488d55e0                 lea    -0x20(%rbp),%rdx
  23. 00000000004010E3 488d75a0                 lea    -0x60(%rbp),%rsi
  24. 00000000004010E7 bf01000000               mov    $0x1,%edi
  25. 00000000004010EC e83f2f0100               callq  0x414030 <fpc_pushexceptaddr>
  26. 00000000004010F1 4889c7                   mov    %rax,%rdi
  27. 00000000004010F4 e8670d0000               callq  0x401e60 <fpc_setjmp>
  28. 00000000004010F9 4863d0                   movslq %eax,%rdx
  29. 00000000004010FC 48895598                 mov    %rdx,-0x68(%rbp)
  30. 0000000000401100 85c0                     test   %eax,%eax
  31. 0000000000401102 7528                     jne    0x40112c <PROC1+156>
  32. project1.lpr:4                            writeln(length(XS));
  33. 0000000000401104 e887bc0100               callq  0x41cd90 <fpc_get_output>
  34. 0000000000401109 4989c4                   mov    %rax,%r12
  35. 000000000040110C 488d5301                 lea    0x1(%rbx),%rdx
  36. 0000000000401110 4c89e6                   mov    %r12,%rsi
  37. 0000000000401113 31ff                     xor    %edi,%edi
  38. 0000000000401115 e8f6c80100               callq  0x41da10 <fpc_write_text_sint>
  39. 000000000040111A e871610100               callq  0x417290 <fpc_iocheck>
  40. 000000000040111F 4c89e7                   mov    %r12,%rdi
  41. 0000000000401122 e809bf0100               callq  0x41d030 <fpc_writeln_end>
  42. 0000000000401127 e864610100               callq  0x417290 <fpc_iocheck>
  43. 000000000040112C e82f320100               callq  0x414360 <fpc_popaddrstack>
  44. project1.lpr:5                            end;
  45. 0000000000401131 488b7df8                 mov    -0x8(%rbp),%rdi
  46. 0000000000401135 e826760100               callq  0x418760 <fpc_freemem>
  47. 000000000040113A 488b4598                 mov    -0x68(%rbp),%rax
  48. 000000000040113E 4885c0                   test   %rax,%rax
  49. 0000000000401141 7405                     je     0x401148 <PROC1+184>
  50. 0000000000401143 e8a8330100               callq  0x4144f0 <fpc_reraise>
  51. 0000000000401148 488b9d78ffffff           mov    -0x88(%rbp),%rbx
  52. 000000000040114F 4c8b6580                 mov    -0x80(%rbp),%r12
  53. 0000000000401153 4c8b6d88                 mov    -0x78(%rbp),%r13
  54. 0000000000401157 4c8b7590                 mov    -0x70(%rbp),%r14
  55. 000000000040115B 4889ec                   mov    %rbp,%rsp
  56. 000000000040115E 5d                       pop    %rbp
  57. 000000000040115F c3554889e5488da42470ffff retq  
  58.  
  59.  

Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 27, 2021, 07:55:40 pm
But it looks like the called function makes a copy with getmem and move:

Code: Pascal  [Select][+][-]
  1.   procedure Proc1(XS: array of Integer);
  2.   begin
  3.     writeln(length(XS));
  4.   end;
  5.  
  6.  

That is one of the first lessons in Pascal:

The parameter of a procedure working with a local copy.
If you want to avoid that change it to

 
Code: Pascal  [Select][+][-]
  1.   procedure Proc1( var XS: array of Integer);
  2.   begin
  3.     writeln(length(XS));
  4.   end;
  5.  
  6.  

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: rsz on July 27, 2021, 07:56:19 pm
1. A subrange of a shortstring is probably an array of char, so proc2 probably needs to be array of char not string
2. the arguments of proc1 (XS and StR) should probably be const.
1. It indeed seems to be and in the case of an overloaded function with the option of either string or array of char, the compiler chooses the one with array of char.
2. You're right they should be const and the compiler is able to optimize the string slice to const array of char to pass it as fast as passing a pointer with separate range as parameters. This is very good to know 8-)! Unfortunately the compiler does not seem to be able to optimize a string slice when passed to a const string parameter the same way. If it were able to do this then the copy function in the first post on the first page could be replaced with partial strings (subrange).

What is often forgetten is the ability of fpc to use only a slice of an array as parameter.
I didn't know about it up until yesterday :-[.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 27, 2021, 08:23:15 pm
@rsz

Hi!

As a string is nothing but an special array of char you can use the shown range for the parameter also with a string:

Code: Pascal  [Select][+][-]
  1. const hello = 'Hello fpc Users!';  
  2. ....
  3. showMessage (hello[7..9]);
  4.  
  5.  

This shows fpc

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: Kays on July 27, 2021, 10:18:42 pm
FPC already seems to have the concept of partial arrays and partial strings, in other languages they're called slices.

Unfortunately I can't seem to find much documentation about partial strings. […]
Those are substrings and they are standardized in ISO 10206 “Extended Pascal”. It’s an FPC extension that this syntax works on other arrays too.
Title: Re: To optimize RTL, make function PCharToInt
Post by: rsz on July 28, 2021, 01:40:03 am
Hi winni!

Thanks. I wasn't aware of it until yesterday but I understand how to use it. I'm trying to figure out how it works internally as it does not seem to be documented in the Free Pascal Reference Guide, see https://forum.lazarus.freepascal.org/index.php/topic,55472.msg413300.html#msg413300 . So far it seems that a slice of a string is handled like an array of char and when passed to a function/procedure which accepts a const string it has to do a conversion.

FPC already seems to have the concept of partial arrays and partial strings, in other languages they're called slices.

Unfortunately I can't seem to find much documentation about partial strings. […]
Those are substrings and they are standardized in ISO 10206 “Extended Pascal”. It’s an FPC extension that this syntax works on other arrays too.
I'll have a read, thank you.

In case anyone is interested, here are my benchmark results. Source code is attached.
The functions are declared as:
Code: Pascal  [Select][+][-]
  1. function FuncAcceptingString(const S: String): Integer;
  2. function FuncAcceptingArray(const S: array of Char): Integer;
  3. function FuncAcceptingPChar(const S: PChar; const FromIndex, ToIndex: Integer): Integer;
And the results are:
Quote
Running benchmarks 1000000000 times per function...

FuncAcceptingString(Copy(InputString, 2, 4))) finished in 18.967 seconds
FuncAcceptingString(InputString[2..5])) finished in 14.641 seconds
FuncAcceptingArray(InputArray[1..4])) finished in 1.776 seconds
FuncAcceptingArray(InputString[2..5])) finished in 1.751 seconds
FuncAcceptingPChar(PChar(InputString), 1, 4)) finished in 1.743 seconds

All accumulators are equal to: 212000000000
That passing a string with copy is the slowest is not really a surprise, but the overhead of a partial string passed to a function with const string compared to an array is rather unexpected when you've never used it before. I have expected it to perform more like passing a pointer.
Title: Re: To optimize RTL, make function PCharToInt
Post by: lucamar on July 28, 2021, 03:48:20 am
[...] but the overhead of a partial string passed to a function with const string compared to an array is rather unexpected when you've never used it before. I have expected it to perform more like passing a pointer.

It's not so surprising; one just has to remember that despite the sintactic similarities the internal structure of strings is different from that of arrays: they always have that more or less complex record before the "char array" itself and the compiler has to build that (so in fact a whole new string) before using it as a parameter, it can't just pass an "array of char" where the function/procedure expects a full blown "String"
Title: Re: To optimize RTL, make function PCharToInt
Post by: PascalDragon on July 28, 2021, 09:11:32 am
FPC already seems to have the concept of partial arrays and partial strings, in other languages they're called slices.

Slices won't help here, because the relevant functions that Alextp mentioned (e.g. SameText or LowerCase) take a String and not an array of Char, thus there will be a copy of the string anyway.

[...] but the overhead of a partial string passed to a function with const string compared to an array is rather unexpected when you've never used it before. I have expected it to perform more like passing a pointer.

It's not so surprising; one just has to remember that despite the sintactic similarities the internal structure of strings is different from that of arrays: they always have that more or less complex record before the "char array" itself and the compiler has to build that (so in fact a whole new string) before using it as a parameter, it can't just pass an "array of char" where the function/procedure expects a full blown "String"

Just to clear up potential confusion:
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 29, 2021, 06:18:45 pm

That is one of the first lessons in Pascal:

The parameter of a procedure working with a local copy.

Right. Unless it is a dynamic array parameter

If you want to avoid that change it to

 
Code: Pascal  [Select][+][-]
  1.   procedure Proc1( var XS: array of Integer);
  2.   begin
  3.     writeln(length(XS));
  4.   end;
  5.  
  6.  



That is much better


Now if local variables or result could be declared to be open arrays

And if there were typehelpers for (open) arrays





Slices won't help here, because the relevant functions that Alextp mentioned (e.g. SameText or LowerCase) take a String and not an array of Char, thus there will be a copy of the string anyway.

Then perhaps we need new overloaded functions

Especially an open array version of Val
Title: Re: To optimize RTL, make function PCharToInt
Post by: PascalDragon on July 30, 2021, 09:13:34 am
Now if local variables or result could be declared to be open arrays

No. Those would need to be allocated on the heap (especially the return value) and then you can just as well use a dynamic array.

Slices won't help here, because the relevant functions that Alextp mentioned (e.g. SameText or LowerCase) take a String and not an array of Char, thus there will be a copy of the string anyway.

Then perhaps we need new overloaded functions

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

Especially an open array version of Val

And what would such a Val do?
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 10:40:42 am

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

Thanx to PascalDragon!

And for those who are eager to optimize the whole day and night:

Pascal is not the right choice.
Use Forth.
There you can optimize everything.
On speed. On size. On everything.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: Bart on July 30, 2021, 02:59:37 pm
Use Forth.

Didn't you mean: "use the force"  O:-)

Bart
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 04:35:13 pm
Hi!

No, have a look at

https://en.wikipedia.org/wiki/Forth_(programming_language) (https://en.wikipedia.org/wiki/Forth_(programming_language))

Forth is its own OS like UCSD Pascal or Oberon.

Forth is used for example in Riad Airport  and the controling of some satellites.
The Bios of the PowerMac is written in Forth.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 30, 2021, 07:18:54 pm
No. Those would need to be allocated on the heap (especially the return value) and then you can just as well use a dynamic array.

But the point is to avoid heap allocations and return a reference to the already existing array
 
The Trim function would be the best example for that

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

And open arrays of char is how you make thing maintainable.

Especially if there were an automatic conversion from string to char array.

For example, there is CompareText

Code: Pascal  [Select][+][-]
  1. function CompareText(const S1, S2: string): Integer; overload;
  2.  
  3. var
  4.   i, count, count1, count2: sizeint;
  5.   Chr1, Chr2: byte;
  6.   P1, P2: PChar;
  7. begin
  8.   Count1 := Length(S1);
  9.   Count2 := Length(S2);
  10.   if (Count1>Count2) then
  11.     Count := Count2
  12.   else
  13.     Count := Count1;
  14.   i := 0;
  15.   if count>0 then
  16.     begin
  17.       P1 := @S1[1];
  18.       P2 := @S2[1];
  19.       while i < Count do
  20.         begin
  21.           Chr1 := byte(p1^);
  22.           Chr2 := byte(p2^);
  23.           if Chr1 <> Chr2 then
  24.             begin
  25.               if Chr1 in [97..122] then
  26.                 dec(Chr1,32);
  27.               if Chr2 in [97..122] then
  28.                 dec(Chr2,32);
  29.               if Chr1 <> Chr2 then
  30.                 Break;
  31.             end;
  32.           Inc(P1); Inc(P2); Inc(I);
  33.         end;
  34.     end;
  35.   if i < Count then
  36.     result := Chr1-Chr2
  37.   else
  38.     // CAPSIZEINT is no-op if Sizeof(Sizeint)<=SizeOF(Integer)
  39.     result:=CAPSIZEINT(Count1-Count2);
  40. end;    
  41.  

ShortCompareText

Code: Pascal  [Select][+][-]
  1. function ShortCompareText(const S1, S2: shortstring): SizeInt;
  2. var
  3.   c1, c2: Byte;
  4.   i: SizeInt;
  5.   L1, L2, Count: SizeInt;
  6.   P1, P2: PChar;
  7. begin
  8.   L1 := Length(S1);
  9.   L2 := Length(S2);
  10.   if L1 > L2 then
  11.     Count := L2
  12.   else
  13.     Count := L1;
  14.   i := 0;
  15.   P1 := @S1[1];
  16.   P2 := @S2[1];
  17.   while i < count do
  18.   begin
  19.     c1 := byte(p1^);
  20.     c2 := byte(p2^);
  21.     if c1 <> c2 then
  22.     begin
  23.       if c1 in [97..122] then
  24.         Dec(c1, 32);
  25.       if c2 in [97..122] then
  26.         Dec(c2, 32);
  27.       if c1 <> c2 then
  28.         Break;
  29.     end;
  30.     Inc(P1); Inc(P2); Inc(I);
  31.   end;
  32.   if i < count then
  33.     ShortCompareText := c1 - c2
  34.   else
  35.     ShortCompareText := L1 - L2;
  36. end;        
  37.  

and ShortCompareText again:

Code: Pascal  [Select][+][-]
  1. {$define FPC_HAS_COMPARETEXT_SHORTSTR}
  2. function ShortCompareText(const S1, S2: shortstring): SizeInt;
  3. var
  4.   c1, c2: Byte;
  5.   i: Integer;
  6.   L1, L2, Count: SizeInt;
  7.   P1, P2: PChar;
  8. begin
  9.   L1 := Length(S1);
  10.   L2 := Length(S2);
  11.   if L1 > L2 then
  12.     Count := L2
  13.   else
  14.     Count := L1;
  15.   i := 0;
  16.   P1 := @ShortstringClass(@S1).fdata[0];
  17.   P2 := @ShortstringClass(@S2).fdata[0];
  18.   c1 := 0;
  19.   c2 := 0;
  20.   while i < count do
  21.   begin
  22.     c1 := byte(p1[i]);
  23.     c2 := byte(p2[i]);
  24.     if c1 <> c2 then
  25.     begin
  26.       if c1 in [97..122] then
  27.         Dec(c1, 32);
  28.       if c2 in [97..122] then
  29.         Dec(c2, 32);
  30.       if c1 <> c2 then
  31.         Break;
  32.     end;
  33.     Inc(I);
  34.   end;
  35.   if i < count then
  36.     ShortCompareText := c1 - c2
  37.   else
  38.     ShortCompareText := L1 - L2;
  39. end;            
  40.  

If you merge these two functions (or three functions if jvm even has open arrays) to a single function with open char arrays, you only have to maintain half (a third) as many functions.
 


The Val function is especially horrible. 15 Val functions on 64-bit and a lot more:

Code: Pascal  [Select][+][-]
  1. {$ifndef FPUNONE}
  2. Function fpc_Val_Real_ShortStr(const s : shortstring; out code : ValSInt): ValReal; compilerproc;
  3. {$endif}
  4. Function fpc_Val_SInt_ShortStr(DestSize: SizeInt; Const S: ShortString; out Code: ValSInt): ValSInt; compilerproc;
  5. Function fpc_Val_UInt_Shortstr(Const S: ShortString; out Code: ValSInt): ValUInt; compilerproc;
  6. function fpc_val_enum_shortstr(str2ordindex:pointer;const s:shortstring;out code:valsint):longint; compilerproc;
  7. Function fpc_Val_Currency_ShortStr(const s : shortstring; out Code : ValSInt): currency; compilerproc;
  8. {$ifdef FPC_HAS_FEATURE_ANSISTRINGS}
  9. {$ifndef FPUNONE}
  10. Function fpc_Val_Real_AnsiStr(Const S : RawByteString; out Code : ValSInt): ValReal; compilerproc;
  11. {$endif}
  12. Function fpc_Val_UInt_AnsiStr (Const S : RawByteString; out Code : ValSInt): ValUInt; compilerproc;
  13. Function fpc_Val_SInt_AnsiStr (DestSize: SizeInt; Const S : RawByteString; out Code : ValSInt): ValSInt; compilerproc;
  14. Function fpc_Val_Currency_AnsiStr(Const S : RawByteString; out Code : ValSInt): Currency; compilerproc;
  15. function fpc_Val_enum_ansistr(str2ordindex:pointer;const s:RawByteString;out code:valsint):longint; compilerproc;
  16. {$endif FPC_HAS_FEATURE_ANSISTRINGS}
  17.  
  18. {$ifdef FPC_HAS_FEATURE_WIDESTRINGS}
  19.   {$ifndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  20.   {$ifndef FPUNONE}
  21.   Function fpc_Val_Real_WideStr(Const S : WideString; out Code : ValSInt): ValReal; compilerproc;
  22.   {$endif}
  23.   Function fpc_Val_SInt_WideStr (DestSize: SizeInt; Const S : WideString; out Code : ValSInt): ValSInt; compilerproc;
  24.   Function fpc_Val_UInt_WideStr (Const S : WideString; out Code : ValSInt): ValUInt; compilerproc;
  25.   function fpc_val_Enum_WideStr (str2ordindex:pointer;const s:WideString;out code:valsint):longint;compilerproc;
  26.   Function fpc_Val_Currency_WideStr(Const S : WideString; out Code : ValSInt): Currency; compilerproc;
  27.   {$endif ndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  28.   {$ifndef FPUNONE}
  29.   Function fpc_Val_Real_UnicodeStr(Const S : UnicodeString; out Code : ValSInt): ValReal; compilerproc;
  30.   {$endif}
  31.   Function fpc_Val_SInt_UnicodeStr (DestSize: SizeInt; Const S : UnicodeString; out Code : ValSInt): ValSInt; compilerproc;
  32.   Function fpc_Val_UInt_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): ValUInt; compilerproc;
  33.   function fpc_val_Enum_UnicodeStr(str2ordindex:pointer;const s:UnicodeString;out code:valsint):longint;compilerproc;
  34.   Function fpc_Val_Currency_UnicodeStr(Const S : UnicodeString; out Code : ValSInt): Currency; compilerproc;
  35. {$endif FPC_HAS_FEATURE_WIDESTRINGS}
  36.  
  37. {$ifndef CPU64}
  38. Function fpc_val_int64_shortstr(Const S: ShortString; out Code: ValSInt): Int64; compilerproc;
  39. Function fpc_val_qword_shortstr(Const S: ShortString; out Code: ValSInt): QWord; compilerproc;
  40. {$ifdef FPC_HAS_FEATURE_ANSISTRINGS}
  41. Function fpc_Val_qword_AnsiStr (Const S : RawByteString; out Code : ValSInt): qword;compilerproc;
  42. Function fpc_Val_int64_AnsiStr (Const S : RawByteString; out Code : ValSInt): Int64; compilerproc;
  43. {$endif FPC_HAS_FEATURE_ANSISTRINGS}
  44.  
  45. {$ifdef FPC_HAS_FEATURE_WIDESTRINGS}
  46. {$ifndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  47. Function fpc_Val_qword_WideStr (Const S : WideString; out Code : ValSInt): qword; compilerproc;
  48. Function fpc_Val_int64_WideStr (Const S : WideString; out Code : ValSInt): Int64; compilerproc;
  49. {$endif ndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  50. Function fpc_Val_qword_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): qword; compilerproc;
  51. Function fpc_Val_int64_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): Int64; compilerproc;
  52. {$endif FPC_HAS_FEATURE_WIDESTRINGS}
  53.  
  54. {$endif CPU64}
  55.  
  56. {$if defined(CPU16) or defined(CPU8)}
  57. Function fpc_val_longint_shortstr(Const S: ShortString; out Code: ValSInt): LongInt; compilerproc;
  58. Function fpc_val_longword_shortstr(Const S: ShortString; out Code: ValSInt): LongWord; compilerproc;
  59. {$ifdef FPC_HAS_FEATURE_ANSISTRINGS}
  60. Function fpc_Val_longword_AnsiStr (Const S : RawByteString; out Code : ValSInt): LongWord;compilerproc;
  61. Function fpc_Val_longint_AnsiStr (Const S : RawByteString; out Code : ValSInt): LongInt; compilerproc;
  62. {$endif FPC_HAS_FEATURE_ANSISTRINGS}
  63.  
  64. {$ifdef FPC_HAS_FEATURE_WIDESTRINGS}
  65. {$ifndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  66. Function fpc_Val_longword_WideStr (Const S : WideString; out Code : ValSInt): LongWord; compilerproc;
  67. Function fpc_Val_longint_WideStr (Const S : WideString; out Code : ValSInt): LongInt; compilerproc;
  68. {$endif ndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  69. Function fpc_Val_longword_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): LongWord; compilerproc;
  70. Function fpc_Val_longint_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): LongInt; compilerproc;
  71. {$endif FPC_HAS_FEATURE_WIDESTRINGS}
  72.  
  73. Function fpc_val_smallint_shortstr(Const S: ShortString; out Code: ValSInt): SmallInt; compilerproc;
  74. Function fpc_val_word_shortstr(Const S: ShortString; out Code: ValSInt): Word; compilerproc;
  75. {$ifdef FPC_HAS_FEATURE_ANSISTRINGS}
  76. Function fpc_Val_word_AnsiStr (Const S : RawByteString; out Code : ValSInt): Word;compilerproc;
  77. Function fpc_Val_smallint_AnsiStr (Const S : RawByteString; out Code : ValSInt): SmallInt; compilerproc;
  78. {$endif FPC_HAS_FEATURE_ANSISTRINGS}
  79.  
  80. {$ifdef FPC_HAS_FEATURE_WIDESTRINGS}
  81. {$ifndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  82. Function fpc_Val_word_WideStr (Const S : WideString; out Code : ValSInt): Word; compilerproc;
  83. Function fpc_Val_smallint_WideStr (Const S : WideString; out Code : ValSInt): SmallInt; compilerproc;
  84. {$endif ndef FPC_WIDESTRING_EQUAL_UNICODESTRING}
  85. Function fpc_Val_word_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): Word; compilerproc;
  86. Function fpc_Val_smallint_UnicodeStr (Const S : UnicodeString; out Code : ValSInt): SmallInt; compilerproc;
  87. {$endif FPC_HAS_FEATURE_WIDESTRINGS}
  88. {$endif CPU16 or CPU8}
  89.  


That are 10 useless functions which could be completely removed by open array of char and implicit string conversion.

And what would such a Val do?

It should do what Val always does and convert the string to a number type


And be much faster because it would not convert it to a shortstring first


And it should actually work, because there are numbers that are too long to be converted to a shortstring. Right now Val is  broken for too many floating point numbers.


For example 1.000000000000000083440445735494114201498374684297748090773074248463985239407051498604593735845779837892931801205519006540143046983522050187320219603051931177873942083183212403692841435861457337958101109706529749626122129770312276153129073731249624318317970050884468946605920791625976563 E-100.


It is too long to be converted to a shortstring.

If you truncate it after 256 characters, and convert that to double, Val returns 1e-100. Which is just wrong, because the number is not 1e-100. It is closer to double(1.0000000000000001E-100) = 1.000000000000000146888991668385344783348988580061336761363545490991374779558559529591511183078266686203117527600126162281757823427294446352372003164142598747391601025087706864660753929253834010024571626412624920649942809289830052437699004074943449904555592411270481534302234649658203125 E-100





Pascal is not the right choice.
Use Forth.
There you can optimize everything.
On speed. On size. On everything.


I use Pascal, because it used to be especially fast
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 08:00:55 pm
Hi!

From your university mind games back to reality:

A Double has a 15..16 significant decimal digits precision.

Whatever you put into a string[255]  - a double will not take all the information.

And that is because we all work with some kind of a PC.
And not with some super-duper-number-crunchers

Winni

Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 30, 2021, 08:27:20 pm

A Double has a 15..16 significant decimal digits precision.


That is for printing a double

Parsing a string to a double needs to consider all digits: https://www.exploringbinary.com/decimal-to-floating-point-needs-arbitrary-precision/
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 08:43:47 pm

A Double has a 15..16 significant decimal digits precision.


That is for printing a double

Parsing a string to a double needs to consider all digits: https://www.exploringbinary.com/decimal-to-floating-point-needs-arbitrary-precision/

Please read the links that you advise.

The Discussion goes: How many more BITS beyond bit 54 are needed, before you can decide if 54 is zero or one.

And how many bits you can put in a string255 with digits?????
Think about it.
Or compute. If you can.

Winni

Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 09:36:44 pm
Hi!

200 digits in, only 15 digits precision out.
As we all knew before and the "wise fpc book" said.

Simple test:
Code: Pascal  [Select][+][-]
  1. const tenDigits = '9876543210';
  2. var s : string;
  3.     d: double;
  4.     i,err: Integer;
  5. begin
  6.    s := '';
  7.    for i := 1 to 20 do s := s +tenDigits;
  8.    val (s,d,err);
  9.    writeln  (d);
  10.   end;  
                   


This results in

Code: Text  [Select][+][-]
  1. 9.87654321098765E199
  2.  

needless to say

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: Bart on July 30, 2021, 10:49:08 pm
No, have a look at

https://en.wikipedia.org/wiki/Forth_(programming_language) (https://en.wikipedia.org/wiki/Forth_(programming_language))

I knew that.
It was an attempt at humor.
Obviously I failed.

Bart
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 30, 2021, 11:21:00 pm


I knew that.
It was an attempt at humor.
Obviously I failed.

Bart

Sorry. So it was me who failed.
Humor is so seldom in this forum (the same with women) so I don't exspect it.
And obvious overlooked it.

But some fun facts concerning Forth:

Forth was not only part of the Space Shuttle but it was the OS and programming language for the pinball machines. Since the time they had a chip and not only relais (click clack).

Winni (aka Pinball Wizard)
Title: Re: To optimize RTL, make function PCharToInt
Post by: 440bx on July 31, 2021, 12:21:57 am
Humor is so seldom in this forum (the same with women) so I don't exspect it.
Maybe it would "optimize" the forums to have a dating section... <chuckle>
Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 31, 2021, 01:18:52 am
Carbon dating?

Talking about Forth and  humor:
Quote
Forth is so-named, because in 1968 "the file holding the interpreter was labeled FOURTH, for 4th (next) generation software, but the IBM 1130 operating system restricted file names to five characters."
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 31, 2021, 10:26:26 am

Talking about Forth and  humor:
Quote
Forth is so-named, because in 1968 "the file holding the interpreter was labeled FOURTH, for 4th (next) generation software, but the IBM 1130 operating system restricted file names to five characters."

Hi!

That were the times when RAM and disk space was rare and expensive.

Another Example of those restrictions:
UCSD Pascal uses only the first 15 chars of a variable name.
If you made a typo beyond string[15] it never occured as error.
Until ....
Some years later you ported your program to Turbo.
There the first 63 chars of a variable were significant.

And another byte fiddeling use:
UCSD Pascal aligned a string to word borders (16 bit times)
So  it was a waste to declare a string[20] because it used 21 bytes and one byte was wasted.
So you always declared an odd string length that nothing was unused!

For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: ccrause on July 31, 2021, 10:39:00 am
Carbon dating?
Because we use an ancient "dead" tongue?  :P
Title: Re: To optimize RTL, make function PCharToInt
Post by: ccrause on July 31, 2021, 10:45:22 am
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Winni
I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 31, 2021, 10:55:21 am
[
I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.
Hi!


Remedy:

Code: Pascal  [Select][+][-]
  1. {$i-}
  2. type  Tblock = array[0..511] of byte;
  3. f: file of TBlock;
  4. block: TBlock;
  5. ...
  6.  
  7. repeat
  8. seek (f, neededBlock);
  9. blockread (f, block, 512);
  10. until ioresult = 0;
  11.  

Leave the PC and cook coffee ....

Winni


Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on July 31, 2021, 01:01:13 pm

The Discussion goes: How many more BITS beyond bit 54 are needed, before you can decide if 54 is zero or one.


And you need arbitrary many


And how many bits you can put in a string255 with digits?????



Around 700

But that is not enough


Title: Re: To optimize RTL, make function PCharToInt
Post by: engkin on July 31, 2021, 06:28:48 pm
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB

Glad that we dont have to fight this battle anymore.

Me too. I still remember how I felt when someone got close to my set of floppy disks with a magnet. My first copy of DOS was in German, don't remember its version.



Carbon dating?
Because we use an ancient "dead" tongue?  :P

Not dead yet, but definitely dying.



I remember the dread when hearing the disk reader head move back to reread a sector. An ominous sign that the disk may be failing.

How delighted I was when inspecting a non-working disk reveals a hair or some fingerprint mark. Gentle cleaning used to bring its data back.
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 31, 2021, 07:01:30 pm
My first copy of DOS was in German, don't remember its version.


Time for the very old spanish joke:

DOS dos dos

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: jamie on July 31, 2021, 07:21:40 pm

No, we don't. This is also about maintainability. And adding functions for everything and the kitchen sink is not maintainable. Also the main types for strings in Object Pascal are the various String types and not arrays of Char.

Thanx to PascalDragon!

And for those who are eager to optimize the whole day and night:

Pascal is not the right choice.
Use Forth.
There you can optimize everything.
On speed. On size. On everything.

Winni

I was part of a project using Forth years ago for a video inspection system to reject bottle items from the packing line if the labels were not on the bottle or misplaced. The two, bottle and label were very close in colors.. Kind of hard to do when all you have is a B&W camera, but a Hi-Res one at that.
 
 On top of that, the CPU box which had special camera interface cards was composed completely of a Z80 cpu set.

 So every CPU cycle counted to obtain the fasted decoded image to be good or bad and then run some IO to operate a machine that would advance the conveyer line and push the defected bottle off the line, too.
Title: Re: To optimize RTL, make function PCharToInt
Post by: lucamar on July 31, 2021, 07:27:20 pm
For the "young blood":
A  5 1/4 inch  floopy disk had (in the beginning) a capacity of 360 kB and an 8 inch floppy  1.1 MB
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB ;D

Ah! Those were the days ... 8)
Title: Re: To optimize RTL, make function PCharToInt
Post by: winni on July 31, 2021, 08:04:01 pm
You're too young; I do remember when floppies could be anywhere from 80 to 120 Kb ... both soft- and hard-sectored, 8" and the then "latest" 5,25" ... not to mention the 3" ones. And ultra-expensive hard-disks with up to .... 5 MiB ;D

Ah! Those were the days ... 8)

No - I am not too young: I started with an Apple ][ where everything was different. As always with Apple.
I just came late to the IBM PC. Where the single sided and the double sided floppies worked.

We found out that it was a great cheating with the floppies:
You bought the cheaper single sided and had only to tape the hole opposite to the write protection hole with black tape.
Zack - you had a double sided disk.

My first hard disk was horrible expensive but 49 times faster than the floppy.
Measured again and again.

And in the night  we travelled with an  acoustic coupler  and 300 Baud through the world.
From 3 to 6 in the night. Only in that time german telephone was cheap.

Winni
Title: Re: To optimize RTL, make function PCharToInt
Post by: PascalDragon on August 01, 2021, 02:16:05 pm
Everyone, please, this topic is about RTL optimizations, not computer history!

No. Those would need to be allocated on the heap (especially the return value) and then you can just as well use a dynamic array.

But the point is to avoid heap allocations and return a reference to the already existing array

But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.
 
The Trim function would be the best example for that

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

And open arrays of char is how you make thing maintainable.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

For example, there is CompareText

The implementations of CompareText and ShortCompareText could indeed be unified by using stub functions and then calling a common implementation, but can not be generalized for all kinds of string functions, especially not those that modify strings or generate them. (And the Java implementation is kept separate on purpose to avoid even more ifdefs in the core includes).

The Val function is especially horrible. 15 Val functions on 64-bit and a lot more:

The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

And what would such a Val do?

It should do what Val always does and convert the string to a number type

That was not clear from your description.
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on August 12, 2021, 01:57:22 pm
But you can't. What if the input array is marked as const? What if the function does not take an array as input at all? The caller can't know how much space to reserve and the function's stack will be removed once it returns, so the only place to return an array is the heap. And as I said: then you can just as well use a dynamic array.

But I do not want to create an array

I want a (startPointer, length) pair in registers, which I can use like an array

It's not, cause I might want to use the input for something else. And the input of Trim isn't marked as const for no reason.

That needs some kind of burrow checker, so the input is not used while the trimmed result is used.

No, it's not. Also AnsiString involves code page conversions, while ShortString or UnicodeString does not.

Then you have a fast pchar function that does not do code page conversions, and a slow path function that makes the conversion and then calls the pchar variant


The Val intrinsic needs to be able to deal with all these combinations. If it ain't broken, don't fix it.

The performance is broken

I have now benchmarked it

Compared with the simplest possible pchar:

Code: Pascal  [Select][+][-]
  1. function simplestTryPcharToInt(pstart, pend: pchar; out unsigned: UInt64): boolean;
  2. var
  3.   length: SizeUInt;
  4. begin
  5.   result := false;
  6.   if pend <= pstart then exit;
  7.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  8.   length := pend - pstart;
  9.   if length > 20 then exit;
  10.   if (length = 20) and (pstart^ >= '2') then exit;
  11.   unsigned := 0;
  12.   while (pstart < pend) do begin
  13.     case pstart^ of
  14.     '0'..'9': unsigned := unsigned * 10 + (ord(pstart^) - ord('0'));
  15.     else exit;
  16.     end;
  17.     inc(pstart);
  18.   end;
  19.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
  20.   result := true;
  21. end;

on ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615') 10000000 times.


Code: [Select]
val (copy ansistring)     run-time: 6379 ms
val (ansistring)          run-time: 5098 ms
val (shortstring)         run-time: 4981 ms
simplestPchar             run-time: 1327 ms
simplestPchar   (-O4)     run-time:  914 ms
simplestPchar (-O4 -Cort) run-time: 2042 ms

Val is already so bad, the copy hardly matters. Val should be removed


But that was a simple pchar. You can convert an entire block of digits in one go with some bit twiddling:

Code: Pascal  [Select][+][-]
  1. function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
  2. const
  3.   selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  4.   decimalZeros8        = UInt64($3030303030303030);
  5.   overflowMaxDigit8    = UInt64($0606060606060606);
  6.   selectFirstHalfByte4 = UInt32($F0F0F0F0);
  7.   decimalZeros4        = UInt32($30303030);
  8.   overflowMaxDigit4    = UInt32($06060606);
  9. var
  10.   length: SizeUInt;
  11.   temp8: UInt64;
  12.   temp4: UInt32;
  13.   bytes: pbyte;
  14.   unsigned: UInt64;
  15. begin
  16.   result := false;
  17.   if pend <= pstart then exit;
  18.   while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  19.   length := pend - pstart;
  20.   if length > 20 then exit;
  21.   if (length = 20) and (pstart^ >= '2') then exit;
  22.   unsigned := 0;
  23.   if PtrUInt(TObject(pstart)) and 7 = 0 then begin
  24.     while pstart + 8 < pend do begin
  25.       temp8 := PUInt64(pstart)^;
  26.       if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
  27.       temp8 := temp8 - decimalZeros8;
  28.       if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
  29.       bytes := @temp8;
  30.       unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
  31.       inc(pstart, 8);
  32.     end;
  33.     while pstart + 4 < pend do begin
  34.       temp4 := PUInt32(pstart)^;
  35.       if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
  36.       temp4 := temp4 - decimalZeros4;
  37.       if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
  38.       bytes := @temp4;
  39.       unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
  40.       inc(pstart, 4);
  41.     end;
  42.   end;
  43.   while (pstart < pend) do begin
  44.     case pstart^ of
  45.     '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
  46.     else exit;
  47.     end;
  48.     inc(pstart);
  49.   end;
  50.   if (length = 20) and (unsigned < 10000000000000000000) then exit;
  51.   result := true;
  52.   unsignedResult:=unsigned;
  53. end;
  54.  
  55.  

(btw how do you do an unaligned read?)

That gives

Code: [Select]
run-time: 679 ms

That is almost 10 times faster than copying Val!!
Title: Re: To optimize RTL, make function PCharToInt
Post by: MathMan on August 12, 2021, 03:09:42 pm
Quote
(btw how do you do an unaligned read?)

Usually via memcpy, as one can not guarantee misaligned fetches are supported by the destination CPU platform.

[Edit]
Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

An alternative would be to do two aligned QWORD reads and then re-combine. But depending on OS, CPU and memory management one might generate a SIGSEGV with one of the two reads.
[/Edit]

BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

Cheers,
MathMan
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on August 12, 2021, 06:16:05 pm
Thinking a bit more about this. C compilers usually inline the above stated memcpy approach, but I don't know how FPC is handling this. So from a performance perspective it might not be an optimal choice - but it is a general solution.

FPC does not inline Move. That is the problem

There should be something like ignore it on x86 and do the unaligned move, and do a bytewise copy on other platforms


BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

I am too lazy to figure that out right now
Title: Re: To optimize RTL, make function PCharToInt
Post by: MathMan on August 13, 2021, 11:26:54 am
BTW - the conversion of 8 digits can be done by 3 muls, 3 adds and some shifts+ands - if you are really speed crazy :-)

How?

Something like you multiply by 10, 100,  and 10000, and rearrange the digits that they participate at the right number of multiplications?

Yes, something like that  :) It should look like the following (untested, direct code)

Code: [Select]
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry

  Dec( Lower, $3030303030303030 ); // convert to numbers 0-9

  Higher := ( Lower * 10 ) shr 8; // no overflow possible
  Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );
 
  Higher := ( Lower * 100 ) shr 16;
  Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );
 
  Higher := ( Lower * 10000 ) shr 32;
  Lower := ( Lower and $00000000ffffffff ) + Higher;

Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

Cheers,
MathMan
Title: Re: To optimize RTL, make function PCharToInt
Post by: BeniBela on August 14, 2021, 01:46:41 pm

Code: [Select]
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry

  Dec( Lower, $3030303030303030 ); // convert to numbers 0-9

  Higher := ( Lower * 10 ) shr 8; // no overflow possible
  Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );
 
  Higher := ( Lower * 100 ) shr 16;
  Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );
 
  Higher := ( Lower * 10000 ) shr 32;
  Lower := ( Lower and $00000000ffffffff ) + Higher;

I  put it in, but it did not help.

Code: [Select]
old run-time: 606 ms
with 3 mul (wrong): run-time: 624 ms
with 3 mul endian corrected: run-time: 667 ms



Code: [Select]
program Project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes, bbutils, bbutilsbeta,bbdebugtools
  { you can add units after this };

function strTryParseInt(pstart, pend: pchar; out unsigned: UInt64): boolean;
var
  length: SizeUInt;
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  unsigned := 0;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
end;
{$AsmMode intel}
function PcharToInt(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
      bytes := @temp8;
      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;



function PcharToIntMM(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
//      bytes := @temp8;
//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
       unsigned := unsigned * 100000000;



      lower := SwapEndian(temp8);

      Higher := ( Lower * 10 ) shr 8; // no overflow possible
      Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );

      Higher := ( Lower * 100 ) shr 16;
      Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );

      Higher := ( Lower * 10000 ) shr 32;
      unsigned += ( Lower and $00000000ffffffff ) + Higher;
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;


function PcharToIntMMWrong(pstart, pend: pchar; out unsignedResult: UInt64): boolean;
const
  selectFirstHalfByte8 = UInt64($F0F0F0F0F0F0F0F0);
  decimalZeros8        = UInt64($3030303030303030);
  overflowMaxDigit8    = UInt64($0606060606060606);
  selectFirstHalfByte4 = UInt32($F0F0F0F0);
  decimalZeros4        = UInt32($30303030);
  overflowMaxDigit4    = UInt32($06060606);
var
  length: SizeUInt;
  temp8: UInt64;
  temp4: UInt32;
  bytes: pbyte;
  unsigned: UInt64;
var
  Higher, Lower: QWORD; // Lower contains 8 ASCII digits on entry
begin
  result := false;
  if pend <= pstart then exit;
  while (pstart < pend) and (pstart^ = '0') do inc(pstart);
  length := pend - pstart;
  if length > 20 then exit;
  if (length = 20) and (pstart^ >= '2') then exit;
  unsigned := 0;
  if PtrUInt(TObject(pstart)) and 7 = 0 then begin
    while pstart + 8 <= pend do begin
      temp8 := PUInt64(pstart)^;
      if (temp8 and selectFirstHalfByte8) <> decimalZeros8 then exit;
      temp8 := temp8 - decimalZeros8;
      if ((temp8 + overflowMaxDigit8) and selectFirstHalfByte8) <> 0 then exit;
//      bytes := @temp8;
//      unsigned := unsigned * 100000000 + (((((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3])* 10 + bytes[4])* 10 + bytes[5])* 10 + bytes[6])* 10 + bytes[7];
       unsigned := unsigned * 100000000;

       lower := temp8;

      Higher := ( Lower * 10 ) shr 8; // no overflow possible
      Lower := ( Lower and $00ff00ff00ff00ff ) + ( Higher and $00ff00ff00ff00ff );

      Higher := ( Lower * 100 ) shr 16;
      Lower := ( Lower and $0000ffff0000ffff ) + ( Higher and $0000ffff0000ffff );

      Higher := ( Lower * 10000 ) shr 32;
      unsigned += ( Lower and $00000000ffffffff ) + Higher;
      inc(pstart, 8);
    end;
    while pstart + 4 <= pend do begin
      temp4 := PUInt32(pstart)^;
      if (temp4 and selectFirstHalfByte4) <> decimalZeros4 then exit;
      temp4 := temp4 - decimalZeros4;
      if ((temp4 + overflowMaxDigit4) and selectFirstHalfByte4) <> 0 then exit;
      bytes := @temp4;
      unsigned := unsigned * 10000 + ((((bytes[0] * 10) + bytes[1])* 10 + bytes[2])* 10 + bytes[3]);
      inc(pstart, 4);
    end;
  end;
  while (pstart < pend) do begin
    case pstart^ of
    '0'..'9': unsigned := unsigned * 10 + UInt64(ord(pstart^) - ord('0'));
    else exit;
    end;
    inc(pstart);
  end;
  if (length = 20) and (unsigned < 10000000000000000000) then exit;
  result := true;
  unsignedResult:=unsigned;
end;

var s: array of string = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');//, '18446744073709551616');
  ss: array of shortstring = ('123', '456', '9223372036854775807', '9223372036854775808', '18446744073709551615');

var i, j, code: integer;
    v: uint64;
    t: String;
begin
  i := 2;// high(s);
  t := '12345678';
  writeln(strTryParseInt(pchar(t), pchar(t) + length(t), v));
  writeln(v);
  writeln(PcharToIntMM(pchar(t), pchar(t) + length(t), v));
  writeln(v);
  //exit;
{  startTiming('copy');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(copy(s[i],1,length(s[i])), v, code)
  end;
  stopTiming('copy');
  startTiming('val');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(s[i], v, code)
  end;
  stopTiming('val');
  startTiming('valss');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      val(ss[i], v, code)
  end;
  stopTiming('valss');   }
  startTiming('new');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToInt(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming('new');
  startTiming('new');
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToIntMMWrong(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming('new');
  startTiming();
  for i := 0 to high(s) do begin
    for j := 1 to 10000000 do
      PcharToIntMM(pchar(s[i]), pchar(s[i]) + length(s[i]), v)
  end;
  stopTiming();
end.


Although I had a bug that it only used the 8-digit fast path when there where at least 9 digits. The condition in the loop should have been while pstart + 8 <= pend do begin rather than while pstart + 8 < pend do begin. Changing that made it somewhat faster. Or sometimes slower. It is very fiddly to benchmark. The branch predicator appears to be very random


Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

And the code got exactly the wrong endianess. Everything comes out in reverse.

I had to put in a SwapEndian but that call slow  it down too much. FPC not inlining such functions is another big RTL problem.

Inline assembler bswap is even worse because it apparently prevents optimizations.


However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

That is for float. Integer should always be faster than float.

I have already convinced Bero to implement that.  But I am not sure he got everything right.




A bigger issue is how to name this function. Could be PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try. Try at the beginning is more common, but Try at the end is easier to find in the Lazarus completion list. But there is not other PcharTo* function in the RTL.
Could be TextToUInt, TextToUInt64, TryTextToUInt, TryTextToUInt64, TextToUIntTry, or TextToUInt64Try, because there is sysutils.TextToFloat(). But that is an odd one, a Try function without Try in its name and the only one of its kind.
Could be StrToUInt, StrToUInt64, TryStrToUInt, TryStrToUInt64, or StrToUIntTry, TryStrToUInt64Try, because the overload resolving can find it from its type, so the type does not need to be in the name. Nor the 64.
But the classical one also handles other bases. Perhaps DecimalStrToUInt, DecimalStrToUInt64, TryDecimalStrToUInt, TryDecimalStrToUInt64,  DecimalStrToUIntTry, DecimalStrToUInt64Try.
StrDecimalToUInt, StrDecimalToUInt64, TryStrDecimalToUInt, TryStrDecimalToUInt64, or StrDecimalToUIntTry, StrDecimalToUInt64Try
Could also drop the string type, TryParse, or TryParseDecimal or TryParseDecimalNumber


Title: Re: To optimize RTL, make function PCharToInt
Post by: AlexTP on August 14, 2021, 01:53:52 pm
Quote
>PcharToUInt, PcharToUInt64, TryPcharToUInt, TryPcharToUInt64, or PcharToUIntTry, or PcharToUInt64Try
I think better name them with 'Buffer...' instead of 'PChar' (PChar sounds not nice). For Widestring buffer, name them with 'BufferW...'.

'Decimal' word is nice too.
Title: Re: To optimize RTL, make function PCharToInt
Post by: MathMan on August 14, 2021, 02:24:20 pm

[snip]

Although I had a bug that it only used the 8-digit fast path when there where at least 9 digits. The condition in the loop should have been while pstart + 8 <= pend do begin rather than while pstart + 8 < pend do begin. Changing that made it somewhat faster. Or sometimes slower. It is very fiddly to benchmark. The branch predicator appears to be very random

Point is - if you read in the ASCII digits from memory, then you have to differentiate between little & big endian CPU in the above!

And the code got exactly the wrong endianess. Everything comes out in reverse.

I had to put in a SwapEndian but that call slow  it down too much. FPC not inlining such functions is another big RTL problem.

Inline assembler bswap is even worse because it apparently prevents optimizations.


However - if you really want to dive into implementing a faster 'Val' then maybe you want to take a look at this - https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigabyte-per-second/

That is for float. Integer should always be faster than float.

I have already convinced Bero to implement that.  But I am not sure he got everything right.

Thanks for trying - you gave me some food for thought already. Things I can already state are

* you don't need the SwapEndian - better turn around the muls and shifts
* yes, the paper from Lemire is about float to binary, but if you read careful then Uint to binary is a sub-component of that - iirc
* I also think that the benchmark can be improved - for my taste it has to little variance in sizes of the input strings

Give me a couple of days to play with some ideas.

I'll leave the naming thing unanswered - I am absurdly bad at naming.

Cheers,
MathMan
Title: Re: To optimize RTL, make function PCharToInt
Post by: 440bx on August 14, 2021, 10:19:36 pm
I am absurdly bad at naming.
In that case, I hope your wife named the kids. ;)
Title: Re: To optimize RTL, make function PCharToInt
Post by: AlexTP on August 14, 2021, 10:27:36 pm
Names:
Now we have

  StrToInt
  StrToInt64
  StrToInt64Def
  StrToIntDef
  TryStrToInt
  TryStrToInt64

I suggest the word "Buffer" for 1-byte strings, and BufferW for widechar strings.

  BufferToInt
  BufferToInt64
  BufferToInt64Def
  BufferToIntDef
  TryBufferToInt
  TryBufferToInt64
Title: Re: To optimize RTL, make function PCharToInt
Post by: MathMan on August 15, 2021, 10:53:12 am
I am absurdly bad at naming.
In that case, I hope your wife named the kids. ;)

She did - though she didn't have a chance to do at the same time  ;)
TinyPortal © 2005-2018