Recent

Author Topic: Alternative set of string-to-int conversion routines  (Read 22480 times)

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Alternative set of string-to-int conversion routines
« on: January 19, 2022, 06:45:12 am »
Hi!
For a long time I was going to try to create a string-to-integer conversion function that would be noticeably faster than the built-in Val().
There seems to be something like this at the moment. Anyway, a quick and dirty benchmark against Val() from the current development version of FPC on my machine looks like this:
Code: Text  [Select][+][-]
  1. Int32:
  2. Val(), score:        2853
  3. TryChars2Int, score: 1137
  4. Int64:
  5. Val(), score:        3045
  6. TryChars2Int, score: 1121
  7.  
Parsing is carried out basically according to the same rules as in the built-in Val() (well, as far as I understood them). IMO there is only one significant difference from Val(): since the lone leading zero is the prefix of an octal string, decimal strings cannot start with a zero.
The set also includes functions that accept an array of char and a PChar.
Feedback and comments are highly appreciated, if any.
« Last Edit: January 19, 2022, 10:54:39 am by avk »

Thaddy

  • Hero Member
  • *****
  • Posts: 11547
Re: Alternative set of string-to-int conversion routines
« Reply #1 on: January 19, 2022, 08:06:46 am »
Nice. One remark: there is a lot of duplicate code and that can be avoided.
There are two options I can see:
1. You could write generic functions, implemented and declared in the implementation section. That will (should) not affect speed since generics are templates.
2. Use the technique sometimes used in the rtl, where a header include uses the same implementation include per routine wherein the type is changed in the header, so the implementation include knows the correct type. (predates generics) Also should not affect speed.

But compliments. I find the same speed gain, percent wise.
Путин преступник. Россияне дезинформированы.

zamtmn

  • Sr. Member
  • ****
  • Posts: 481
Re: Alternative set of string-to-int conversion routines
« Reply #2 on: January 19, 2022, 08:15:50 am »
Please add
Code: Pascal  [Select][+][-]
  1. function TryStr2xxx(const s: string; const aIndex, aCount: SizeInt; out aValue: xxx): Boolean; inline;

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Re: Alternative set of string-to-int conversion routines
« Reply #3 on: January 19, 2022, 08:35:43 am »
@Thaddy, thank you.
You're right, a lot of code is duplicated and I don't have an option yet to avoid this without sacrificing performance. On the other hand, this set is still a proof of concept.

@zamtmn, doesn't this code
Code: Pascal  [Select][+][-]
  1.   if TryChars2Int(s[5..15], MyInt) then
  2.     ...
  3.  
cover your case?
« Last Edit: January 19, 2022, 08:37:46 am by avk »

zamtmn

  • Sr. Member
  • ****
  • Posts: 481
Re: Alternative set of string-to-int conversion routines
« Reply #4 on: January 19, 2022, 08:50:10 am »
Code: Pascal  [Select][+][-]
  1. TryStrToInt(Copy(s,5,10),MyInt)
also covers my case, but the point is to make it fast, without unnecessary allocations and copies
« Last Edit: January 19, 2022, 08:56:32 am by zamtmn »

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Re: Alternative set of string-to-int conversion routines
« Reply #5 on: January 19, 2022, 09:27:07 am »
I just compiled this example:
Code: Pascal  [Select][+][-]
  1. program project1;
  2. {$MODE OBJFPC}{$H+}
  3. uses
  4.   SysUtils, Str2IntAlter;
  5.  
  6. procedure Test(const s: string);
  7. var
  8.   I: Integer;
  9. begin
  10.   if TryChars2Int(s[22..Length(s)], I) then
  11.     WriteLn('I = ', I)
  12.   else
  13.     WriteLn('Ooooppss!');
  14. end;
  15.  
  16. var
  17.   s: string = '';
  18. begin
  19.   Randomize;
  20.   s := 'it is random integer:  ' + Random(MaxInt).ToString;
  21.   Test(s);
  22.   ReadLn;
  23. end.
  24.  

To call TryChars2Int(s[22..Length(s)], I) FPC generated the following code:
32 bits
Code: ASM  [Select][+][-]
  1. # [10] if TryChars2Int(s[22..Length(s)], I) then
  2.         movl    %eax,%edx
  3.         testl   %eax,%eax
  4.         je      .Lj5
  5.         movl    -4(%edx),%edx
  6. .Lj5:
  7.         subl    $22,%edx
  8.         movl    %esp,%ecx
  9.         addl    $21,%eax
  10.         call    STR2INTALTER_$$_TRYCHARS2INT$array_of_CHAR$LONGINT$$BOOLEAN
  11.         testb   %al,%al
  12.         je      .Lj7
  13. # [11] WriteLn('I = ', I)
  14.  
64 bits
Code: ASM  [Select][+][-]
  1. # [10] if TryChars2Int(s[22..Length(s)], I) then
  2.         movq    %rcx,%rdx
  3. # Peephole Optimization: %rdx = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
  4.         testq   %rcx,%rcx
  5.         je      .Lj5
  6.         movq    -8(%rdx),%rdx
  7. .Lj5:
  8.         subq    $22,%rdx
  9.         leaq    32(%rsp),%r8
  10.         leaq    21(%rax),%rcx
  11.         call    STR2INTALTER_$$_TRYCHARS2INT$array_of_CHAR$LONGINT$$BOOLEAN
  12.         testb   %al,%al
  13.         je      .Lj7
  14. # [11] WriteLn('I = ', I)
  15.  

There doesn't seem to be any unnecessary copying going on.

zamtmn

  • Sr. Member
  • ****
  • Posts: 481
Re: Alternative set of string-to-int conversion routines
« Reply #6 on: January 19, 2022, 09:54:08 am »
Oh yes, I completely forgot that
Code: Pascal  [Select][+][-]
  1. TryChars2Int(const a: array of char; ...
this is not equivalent to
Code: Pascal  [Select][+][-]
  1. type
  2.   TTest=array of char;
  3. procedure TryChars2Int(const a:TTest; ...
the request is closed :-[

Seenkao

  • Sr. Member
  • ****
  • Posts: 407
    • New ZenGL.
Re: Alternative set of string-to-int conversion routines
« Reply #7 on: January 19, 2022, 09:22:23 pm »
Благодарю за выложенный код!
Я буду знать, что я иду в правильном направлении!  :)

google translate:
Thanks for posting the code!
I will know that I am going in the right direction! :)
Rus: Стремлюсь к созданию минимальных и достаточно быстрых приложений.
Работаю над ZenGL.
Eng: I strive to create applications that are minimal and reasonably fast.
Working on ZenGL. :)

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Re: Alternative set of string-to-int conversion routines
« Reply #8 on: January 20, 2022, 02:25:55 pm »
As per Thaddy's remarks, tried to somehow reduce code duplication.
Generic functions need to be declared in the interface part of the unit, and since they are helpers, this is not very good.
Includes are also not suitable, because I want everything to stay in one unit.
As a result, as an experiment, I settled on macros. The code looks unreadable, of course, but it seems that it would not be much better with includes.

zamtmn

  • Sr. Member
  • ****
  • Posts: 481
Re: Alternative set of string-to-int conversion routines
« Reply #9 on: January 20, 2022, 03:09:18 pm »
It became very ugly. In my opinion, the best solution would still be genetics and two modules:
Str2IntAlter - for uses in main unit
Str2IntAlterInternal - for uses in Str2IntAlter, contains generic help functions

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Re: Alternative set of string-to-int conversion routines
« Reply #10 on: January 20, 2022, 03:36:03 pm »
Ok, I get it, I need to think about it some more. In any case, there is no problem to roll back to the previous version.
But I really want everything to be in one self-sufficient unit.

Thaddy

  • Hero Member
  • *****
  • Posts: 11547
Re: Alternative set of string-to-int conversion routines
« Reply #11 on: January 20, 2022, 04:38:47 pm »
In principle, using generics (as I suggested too) should not have a speed penalty at all.
Путин преступник. Россияне дезинформированы.

MarkMLl

  • Hero Member
  • *****
  • Posts: 4212
Re: Alternative set of string-to-int conversion routines
« Reply #12 on: January 20, 2022, 06:58:36 pm »
In principle, using generics (as I suggested too) should not have a speed penalty at all.

In principle. In practice I find that debug sessions on my code regularly take me through the generics unit, which suggests that it /does/ interpose code in various places even when not used explicitly.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

avk

  • Hero Member
  • *****
  • Posts: 588
    • my self-education project
Re: Alternative set of string-to-int conversion routines
« Reply #13 on: January 20, 2022, 07:20:06 pm »
Lyrical digression.
As I mentioned above, I had a desire to create a faster function for converting a string to an integer quite a long time ago, or more precisely, at the time of this epic story.
At that time, in order to defeat Windows file mapping in the company with NtDll, it came to some completely crazy decisions.
I hope now it can be done without much straining.

Seenkao

  • Sr. Member
  • ****
  • Posts: 407
    • New ZenGL.
Re: Alternative set of string-to-int conversion routines
« Reply #14 on: January 20, 2022, 10:43:05 pm »
Quote
Int32:
Val(), score:        3073
rejected:            0

TryChars2Int, score: 1486
rejected:            0

sc_StrToLongWord or sc_StrToInt, score: 1104
rejected:            0

Int64:
Val(), score:        3075
rejected:            0

TryChars2Int, score: 1627
rejected:            0

sc_StrToQWord or sc_StrToInt64, score: 1072
rejected:            0

Press any key to exit...
Я проверил свой код в вашей демке. Не воспринимайте это как "вызов". Ваш код людям нравится больше. И они считают, что вашим кодом можно "выдёргивать" цифры из текста. Возможно они правы.
Не всегда "вычурные" решения бывают самыми лучшими. При переводе строк, табличными значениями удобно пользоваться только в шестнадцатеричной системе. Ваш код помог мне решить несколько мелких проблем в моём коде. Хотя идеи далеко не новы... я просто забыл о них.
google translate:
I checked my code in your demo. Don't take it as a "challenge". People like your code more. And they think that your code can "pull out" numbers from the text. Perhaps they are right.
Not always "artsy" solutions are the best. When translating strings, it is convenient to use table values only in the hexadecimal system. Your code helped me solve a few small problems in my code. Although the ideas are far from new... I just forgot about them.
Rus: Стремлюсь к созданию минимальных и достаточно быстрых приложений.
Работаю над ZenGL.
Eng: I strive to create applications that are minimal and reasonably fast.
Working on ZenGL. :)

 

TinyPortal © 2005-2018