
Author Topic: SIMD libraries Fasta v1.0  (Read 1252 times)

LemonParty

  • Sr. Member
  • ****
  • Posts: 393
SIMD libraries Fasta v1.0
« on: June 25, 2025, 03:17:01 pm »
Hello. I am glad to announce the Fasta SIMD libraries. As the name suggests, the main purpose of these libraries is to make basic functions run faster, which is achieved with hand-written assembler and SIMD optimizations. The main optimizations are SSE2-based, which means most processors will work with this library. I decided to sell each unit independently, so you don't have to pay for functionality you don't need.
    Acceleration is implemented for the following platforms:
  • Windows x86
  • Windows x64
  • Linux x86
  • Linux x64
  • AArch64 (partially).
    Where acceleration is not implemented, regular (loop-unrolled) code is used, so the library can be used on various architectures without changing anything.



    The first unit is uFastaStrAnsi.pas. This unit is designed to process ANSI strings. It provides the following functions:
1. AnyOf (Returns the number of consecutive chars that belong to aSet);
2. DecLength (Returns the number of characters the decimal representation will take);
3. DecLengthInt64 (See DecLength);
4. IndexOf (Returns the zero-based index of C in the set, -1 if not found);
5. IntToStr (Converts an integer to a string);
6. Lower (Makes Count chars of the given string lowercase);
7. MaskRange16 (Returns a mask of the characters that are in the given range; processes 16 characters of the given string);
8. MaskRange16F (Returns a mask of the characters that are >Low value and <=High value of the given range; processes 16 characters of the given string);
9. MaskSet (Returns a bitmask of the chars that belong to the given set. Count must be in the range 1..32);
10. Pos (Various overloads of this function search for one (C), two (CC) or four (CCCC) chars in a string);
11. PosAny2 (Searches for any character from the pair CC in a string);
12. PosAny3 (Searches for any character from the set CCC in a string);
13. PosAny4 (Searches for any character from the set CCCC in a string);
14. PosU (Unlimited versions of the Pos function: no check for zero chars, no range limit);
15. PrepareRange (Fills a range according to the given aLo and aHi);
16. PrepareSet (Takes Count chars from a string (or the whole string) and fills the given set with them);
17. Quantity (Counts the occurrences of C in a zero-terminated string);
18. Size (Returns the actual size of the given set, or the number of identical chars when the input parameter is a string);
19. StrToInt (Converts a string to an integer);
20. StrToInt64 (Converts a string to Int64);
21. StrToQWord (Converts a string to QWord);
22. Upper (Makes Count chars of the given string uppercase).
Buy uFastaStrAnsi.pas https://sellcodes.com/KeCSkA6m
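
To make the description of MaskRange16 more concrete, here is a plain scalar sketch of what such a mask could look like. It is not the library's code and the signature is not taken from the unit: the name MaskRange16Ref, the parameter order, the inclusive bounds and the bit order (bit I set when character I is in range) are all assumptions made for illustration.

```pascal
program MaskRangeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical scalar reference, NOT the Fasta implementation.
// Assumption: bit I of the result is set when P[I] lies in aLo..aHi (inclusive).
function MaskRange16Ref(P: PAnsiChar; aLo, aHi: AnsiChar): Word;
var
  I: Integer;
begin
  Result := 0;
  // Look at exactly 16 characters, as the description of MaskRange16 says.
  for I := 0 to 15 do
    if (P[I] >= aLo) and (P[I] <= aHi) then
      Result := Result or (Word(1) shl I);
end;

var
  S: AnsiString;
begin
  // Digits sit at positions 0..3 and 8..11, so bits 0..3 and 8..11 are set.
  S := '0123abcd4567efgh';
  WriteLn(IntToHex(MaskRange16Ref(PAnsiChar(S), '0', '9'), 4)); // prints 0F0F
end.
```

An SSE2 version would presumably compute the same 16 comparisons in one register and extract the mask in a single step, which is where the speedup over this loop would come from.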



    The second unit is uFastaStrUnicode.pas. This unit is designed to process UTF-16 strings. It provides the following functions:
1. AnyOf (Returns the number of consecutive chars that belong to aSet);
2. DecLength (Returns the number of decimal characters the given integer will take, including the sign);
3. DecLengthInt64 (See DecLength);
4. IndexOf (Returns the zero-based index of C in the set, -1 if not found);
5. IntToStr (Converts an integer to a string);
6. Lower (Makes Count chars of the given string lowercase);
7. MaskRange8 (Returns a mask of the characters that are in the given range, 2 bits per character; processes 8 characters of the given string);
8. MaskRange8F (Returns a mask of the characters that are >Low value and <=High value of the given range, 2 bits per character; processes 8 characters of the given string);
9. MaskSet (Returns a bitmask of the chars that belong to the given set. Count must be in the range 1..32);
10. Pos (Various overloads of this function search for one (C), two (CC) or four (CCCC) chars in a string);
11. PosAny2 (Searches for any character from the pair CC in a string);
12. PosAny3 (Searches for any character from the set CCC in a string);
13. PosU (Unlimited versions of the Pos function: no check for zero chars, no range limit);
14. PrepareRange (Fills a range according to the given aLo and aHi);
15. PrepareSet (Takes Count chars from a string (or the whole string) and fills the given set with them);
16. Size (Returns the actual size of the given set, or the number of identical chars when the input parameter is a string);
17. StrToInt (Converts a string to LongInt);
18. StrToInt64 (Converts a string to Int64);
19. StrToQWord (Converts a string to QWord);
20. Upper (Makes Count chars of the given string uppercase).
Buy uFastaStrUnicode.pas https://sellcodes.com/JGAJTTiS

There are more units in this package; they will be released later.
Lazarus v. 4.99. FPC v. 3.3.1. Windows 11

Thaddy

  • Hero Member
  • *****
  • Posts: 18728
  • To Europe: simply sell USA bonds: dollar collapses
Re: SIMD libraries Fasta v1.0
« Reply #1 on: June 25, 2025, 05:10:36 pm »
Is this related to the intrinsics effort that is being developed for trunk?
I will have a look either way.
If Europe sells their USA bonds the USD will collapse. Europe can afford that given average state debts. The USA can't afford that. Just some advice...

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 362
  • I use FPC [main] 💪🐯💪
Re: SIMD libraries Fasta v1.0
« Reply #2 on: June 25, 2025, 05:29:49 pm »
Benchmarks and comparisons?!  :D

Nobody wants a pig in a poke  ;D
« Last Edit: June 25, 2025, 05:35:49 pm by ALLIGATOR »
I may seem rude - please don't take it personally

LemonParty

  • Sr. Member
  • ****
  • Posts: 393
Re: SIMD libraries Fasta v1.0
« Reply #3 on: June 26, 2025, 01:56:06 pm »
Benchmarks and comparisons?!  :D

Nobody wants a pig in a poke  ;D
I have compared my code with the following code on a large JSON file:
Code: Pascal
function PosSimple(constref aStr: AnsiChar; C: AnsiChar; Count: SizeUInt): SizeUInt;
var
  P: PAnsiChar;
  I: SizeUInt;
begin
  // Scalar baseline: returns the 1-based index of the first C, or 0 if not found.
  Result := 0;
  P := @aStr;
  // A separate loop variable is used instead of reusing the Count parameter.
  for I := 1 to Count do
    if P[I - 1] = C then
      Exit(I);
end;
The gain is 4x-4.5x.
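
For anyone who wants to run a comparison of this kind, a small timing harness along the following lines might be a starting point. It is only a sketch: the buffer size, the repeat count and the program name are assumptions, the input is a synthetic buffer rather than the JSON file mentioned above, and it times only a cleaned-up variant of the scalar baseline because the Fasta signatures are not shown in this thread.

```pascal
program BenchSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Scalar baseline: 1-based index of the first C, or 0 if not found.
function PosSimple(constref aStr: AnsiChar; C: AnsiChar; Count: SizeUInt): SizeUInt;
var
  P: PAnsiChar;
  I: SizeUInt;
begin
  Result := 0;
  P := @aStr;
  for I := 1 to Count do
    if P[I - 1] = C then
      Exit(I);
end;

const
  BufSize = 1024 * 1024; // hypothetical test size, not the author's data
  Rounds  = 200;         // hypothetical repeat count
var
  Buf: AnsiString;
  T0: QWord;
  R, Found: SizeUInt;
begin
  // Fill the buffer with 'a' and place the needle at the very end (worst case).
  Buf := StringOfChar('a', BufSize);
  Buf[BufSize] := '"';

  Found := 0;
  T0 := GetTickCount64;
  for R := 1 to Rounds do
    Found := PosSimple(Buf[1], '"', BufSize);
  WriteLn('index: ', Found, ', elapsed ms: ', GetTickCount64 - T0);
end.
```

Timing the accelerated routine in the same loop, on the same buffer, would give a ratio that can be compared with the figure quoted above.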

LemonParty

  • Sr. Member
  • ****
  • Posts: 393
Re: SIMD libraries Fasta v1.0
« Reply #4 on: June 26, 2025, 02:08:51 pm »
Is this related to the intrinsics effort that is being developed for trunk?
I will have a look either way.
Yes, intrinsics and the assembler code in my library achieve the same result.

 
