Author Topic: strlen() Function - Alternative  (Read 2614 times)

domibay_hugo

  • New Member
  • *
  • Posts: 37
  • Site Reliabilty / DevOps Engineer at Domibay S.L.
    • GitHub Profile
strlen() Function - Alternative
« on: June 05, 2020, 02:00:37 pm »
When the memory reserved for a Pascal string with the SetLength() procedure is bigger than the content actually written to it, I found it useful to terminate the written sequence with the #0 character and to use strlen() to find the length of the usable string.

I ran into this especially when working with TStringStream.
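
A minimal sketch of that idea, assuming a buffer of 1024 bytes and a sample payload (both values are made up for illustration and do not come from the post). It reserves the buffer, writes the content plus a terminating #0, and then trims the string to the length that StrLen() reports:

```pascal
program TrimBuffer;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  content, s: String;
begin
  content := 'Hello';  // hypothetical payload, for illustration only

  // Reserve more memory than the content needs.
  SetLength(s, 1024);

  // Write the content and terminate it with #0.
  Move(content[1], s[1], Length(content));
  s[Length(content) + 1] := #0;

  // Length() still reports the reserved size.
  WriteLn('Reserved: ', Length(s));

  // StrLen() stops at the first #0, so it yields the usable length.
  SetLength(s, StrLen(PChar(s)));
  WriteLn('Usable:   ', Length(s), ' "', s, '"');
end.
```

The scan in StrLen() is the part that grows with the size of the string.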

On big strings, however, this can cost some performance.

But thinking of this Documentation:
https://www.man7.org/linux/man-pages/man3/memchr.3.html

I was wondering if the IndexByte() Function could be useful to write a faster strlen() Function just like in this Example:
Quote
The following call is a fast means of locating a string's terminating null byte:

           char *p = rawmemchr(s, '\0');

So I developed this little Function:
Code: Pascal
{$IFOPT D+}
  {$NOTE debug mode is active. disabling inline}
  {$DEFINE debug_on}
{$ENDIF}

// Returns the number of bytes before the terminating #0 of ssource.
function PCharLength(const ssource: PChar): Cardinal; {$IFNDEF debug_on} inline; {$ENDIF}
var
  pssrc: PChar;
  imtchps: SizeInt;
  isrhrng: Cardinal;
begin
  Result := 0;

  // A nil pointer has no content to scan
  if ssource = nil then
    Exit;

  pssrc := ssource;
  isrhrng := 1024;  // size of the first search window in bytes

  repeat  //until imtchps <> -1;
    // Search the current window for the terminating #0
    imtchps := IndexByte(pssrc^, isrhrng, 0);

    if imtchps = -1 then
    begin
      // Not found: skip this window and double its size for the next pass
      inc(Result, isrhrng);
      inc(pssrc, isrhrng);

      isrhrng := isrhrng * 2;
    end;  //if imtchps = -1 then
  until imtchps <> -1;

  // Add the offset of the #0 within the last window
  inc(Result, imtchps);
end;

I made a little benchmark to see if there is any measurable improvement:

-------
Big Source 1 (length: '33044') - strlen (count: '1000') - Start - Now in millisecs since midnight : 44566167
Big Source 1 - strlen - End - Now in millisecs since midnight : 44566191
Big Source 1: strlen completed in '23.9996938034892' ms.
Big Source 1 (length: '33044') - PCharLength (count: '1000') - Start - Now in millisecs since midnight : 44566191
Big Source 1 - PCharLength - End - Now in millisecs since midnight : 44566212
Big Source 1: PCharLength completed in '21.0004393011332' ms.
-------
Big Source 1 (length: '66088') - strlen (count: '1000') - Start - Now in millisecs since midnight : 44566212
Big Source 1 - strlen - End - Now in millisecs since midnight : 44566250
Big Source 1: strlen completed in '37.9995675757527' ms.
Big Source 1 (length: '66088') - PCharLength (count: '1000') - Start - Now in millisecs since midnight : 44566250
Big Source 1 - PCharLength - End - Now in millisecs since midnight : 44566283
Big Source 1: PCharLength completed in '32.9999718815088' ms.


On strings larger than 32 KB you can notice an advantage of 2 to 3 ms per 1000 operations,
and on strings larger than 64 KB an advantage of 3 to 5 ms per 1000 operations.
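
For readers who want to repeat the comparison, here is a rough sketch of such a timing loop. It is not the benchmark program used above: it assumes GetTickCount64 from SysUtils as the clock, reuses the length and iteration count of the first run, and compares StrLen() with a single bounded IndexByte() call instead of PCharLength():

```pascal
program StrLenBench;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  Iterations = 1000;   // same count as in the post
  SourceLen  = 33044;  // same length as the first run

var
  s: String;
  i: Integer;
  total: Int64;
  t0: QWord;
begin
  s := StringOfChar('x', SourceLen);

  total := 0;
  t0 := GetTickCount64;
  for i := 1 to Iterations do
    Inc(total, StrLen(PChar(s)));
  WriteLn('StrLen:    ', GetTickCount64 - t0, ' ms (sum ', total, ')');

  total := 0;
  t0 := GetTickCount64;
  for i := 1 to Iterations do
    // Bounded search: the terminating #0 sits right after the last character.
    Inc(total, IndexByte(s[1], Length(s) + 1, 0));
  WriteLn('IndexByte: ', GetTickCount64 - t0, ' ms (sum ', total, ')');
end.
```

The millisecond tick counter is coarse, so differences of a few milliseconds per 1000 calls are close to its resolution. More iterations give steadier numbers.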

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12645
  • FPC developer.
Re: strlen() Function - Alternative
« Reply #1 on: June 05, 2020, 02:12:05 pm »
Yes, but of course the smart thing is to record the number of bytes written, so that you don't have to do this.

That's 0ms / 1000.

That larger ones are slower might simply be a cache effect.
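
A minimal sketch of recording the number of bytes written, assuming a hypothetical Append helper and a global counter named used (neither appears in the thread). The counter replaces the scan for #0 entirely:

```pascal
program TrackLength;
{$mode objfpc}{$H+}

var
  buf: String;
  used: SizeInt = 0;  // number of bytes written so far

// Hypothetical helper: copies chunk into buf and advances the counter.
procedure Append(const chunk: String);
begin
  if chunk = '' then
    Exit;
  // Grow the buffer when the chunk does not fit.
  if used + Length(chunk) > Length(buf) then
    SetLength(buf, 2 * (used + Length(chunk)));
  Move(chunk[1], buf[used + 1], Length(chunk));
  Inc(used, Length(chunk));
end;

begin
  SetLength(buf, 1024);
  Append('Hello, ');
  Append('world');

  // No search for #0 needed: the counter already holds the length.
  SetLength(buf, used);
  WriteLn(Length(buf), ': ', buf);
end.
```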

domibay_hugo

  • New Member
  • *
  • Posts: 37
  • Site Reliabilty / DevOps Engineer at Domibay S.L.
    • GitHub Profile
Re: strlen() Function - Alternative
« Reply #2 on: June 19, 2020, 12:56:12 pm »
Quote from: marcov on June 05, 2020, 02:12:05 pm
Yes, but of course the smart thing is to record the number of bytes written, so that you don't have to do this.

Yes, when using TStringStream you actually have the TStringStream.Position property, which internally is a byte counter that advances on each TStringStream.Write() call.

So you can build a PChar as a pointer to its data like this:
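Code: Pascal
// Assumes: uses Classes; var stream: TStringStream; sdata: String;
//          psdata: PChar; idatalength: Int64;
stream := TStringStream.Create('');

stream.WriteString('My unchangeable Source String. Is it really unchangeable?');

// Terminate the string for PChar
stream.WriteByte(0);

// Copy the data into a string variable first, so that psdata
// stays valid after the stream has been freed
sdata := stream.DataString;
psdata := @sdata[1];
//psdata := PChar(sdata);    produces the same effect
// Position counts the bytes written; subtract 1 for the terminating #0
idatalength := stream.Position - 1;

stream.Free;
stream := Nil;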

Moreover, Free Pascal supports pointer arithmetic, as shown in the wiki at:
https://wiki.freepascal.org/UTF8_strings_and_characters#Iterating_over_string_analysing_individual_codepoints
So given two PChars, EndP and StartP, you can calculate the length of the PChar string like this:
Code: Pascal
idatalength := EndP - StartP;
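
A small self-contained sketch of that subtraction, assuming EndP is obtained by walking forward to the terminating #0 (the sample text is made up for illustration):

```pascal
program PtrDiff;
{$mode objfpc}{$H+}

var
  s: String;
  StartP, EndP: PChar;
  idatalength: SizeInt;
begin
  s := 'pointer arithmetic';  // hypothetical sample text

  StartP := PChar(s);

  // Advance EndP until it points at the terminating #0.
  EndP := StartP;
  while EndP^ <> #0 do
    Inc(EndP);

  // The difference of two PChars is the number of chars between them.
  idatalength := EndP - StartP;
  WriteLn(idatalength, ' = ', Length(s));
end.
```

If EndP is already known from writing the data, the loop is unnecessary and the subtraction alone gives the length.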

Thaddy

  • Hero Member
  • *****
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: strlen() Function - Alternative
« Reply #3 on: June 19, 2020, 02:39:25 pm »
What Marco means is that for Pascal string types the length is always stored and known. There is no need for the cumbersome strlen, which is meant for C-like PChars.
Length() should be used on Pascal strings: it simply reads the length from a negative offset.
It also can't overflow. (In Pascal strings #0 is allowed as content in any place, including the middle, which makes strlen useless.)
Note that you refer explicitly to Pascal string types, for which reading the length is always O(1), as opposed to C strings, where it is O(n).
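
A short sketch of the difference, assuming a sample string with a #0 in the middle (the content is made up for illustration):

```pascal
program EmbeddedNul;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  s: String;
begin
  // A Pascal string may contain #0 anywhere in its content.
  s := 'abc' + #0 + 'def';

  // Length() reads the stored length and counts all 7 characters.
  WriteLn('Length: ', Length(s));

  // StrLen() scans for the first #0 and stops after 'abc'.
  WriteLn('StrLen: ', StrLen(PChar(s)));
end.
```
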
« Last Edit: June 19, 2020, 07:29:48 pm by Thaddy »
