Benchmarking methodology:
- I used 5 files of about 30 MB each. Each of the first four contains only one kind of character (1-, 2-, 3-, or 4-code-unit characters, respectively); the fifth contains mixed characters.
- I used engkin's HardwareTicks function to get the ticks (thank you, engkin).
- First, I load the file with TMemoryStream, then copy it to a TStringStream.
- I loop 7 times (ignoring the first two iterations), calling each UTF8Len** function with TStringStream.DataString, and then calculate the mean tick count for each function.
Sorry for the blabla; maybe the code will give a clearer description:
ms := TMemoryStream.Create;
try
  ms.LoadFromFile(p5);
  //ss := TStringStream.Create('$¢€𐍈سé');
  ss := TStringStream.Create('');
  try
    ss.CopyFrom(ms,0);
    hL0 := 0; hL1 := 0; hL2 := 0; hL3 := 0; hL4 := 0; hL5 := 0; hL6 := 0;
    for i:= 0 to 6 do
    begin
      ht0 := HardwareTicks; L0 := UTF8LengthFast_Deamon(ss.DataString);
      ht1 := HardwareTicks; L1 := UTF8LengthFast(ss.DataString);
      ht2 := HardwareTicks; L2 := UTF8LengthN(ss.DataString);
      ht3 := HardwareTicks; L3 := mebUTF8Length(ss.DataString);
      ht4 := HardwareTicks; L4 := mUTF8Length(ss.DataString);
      ht5 := HardwareTicks; L5 := myUTF8Len1(ss.DataString);
      ht6 := HardwareTicks; L6 := myUTF8Len2(ss.DataString);
      ht7 := HardwareTicks;
      // Skip the first two (warm-up) iterations and accumulate the other five,
      // so that the division by 5 below yields the mean.
      if i > 1 then
      begin
        hL0 := hL0 + (ht1 - ht0);
        hL1 := hL1 + (ht2 - ht1);
        hL2 := hL2 + (ht3 - ht2);
        hL3 := hL3 + (ht4 - ht3);
        hL4 := hL4 + (ht5 - ht4);
        hL5 := hL5 + (ht6 - ht5);
        hL6 := hL6 + (ht7 - ht6);
      end;
    end;
    hL0 := hL0 div 5;
    hL1 := hL1 div 5;
    hL2 := hL2 div 5;
    hL3 := hL3 div 5;
    hL4 := hL4 div 5;
    hL5 := hL5 div 5;
    hL6 := hL6 div 5;
    Memo1.Lines.Add('UTF8LengthFast_Deamon: ' + IntToStr(L0) + ' : ' + IntToStr(hL0) + ' ticks' );
    Memo1.Lines.Add('UTF8LengthFast : ' + IntToStr(L1) + ' : ' + IntToStr(hL1) + ' ticks' );
    Memo1.Lines.Add('UTF8LengthN : ' + IntToStr(L2) + ' : ' + IntToStr(hL2) + ' ticks' );
    Memo1.Lines.Add('mebUTF8Length(inline): ' + IntToStr(L3) + ' : ' + IntToStr(hL3) + ' ticks' );
    Memo1.Lines.Add('mUTF8Length : ' + IntToStr(L4) + ' : ' + IntToStr(hL4) + ' ticks' );
    Memo1.Lines.Add('myUTF8Len1 : ' + IntToStr(L5) + ' : ' + IntToStr(hL5) + ' ticks' );
    Memo1.Lines.Add('myUTF8Len2 : ' + IntToStr(L6) + ' : ' + IntToStr(hL6) + ' ticks' );
  finally
    ss.Free;
  end;
finally
  ms.Free;
end;
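
For anyone who wants to repeat the measurement without the copy-pasted timing lines, here is a minimal sketch of the same warm-up-and-average scheme as a reusable helper. It is only an illustration under some assumptions: TLenFunc, MeanTicks and CountCodePoints are hypothetical names, CountCodePoints is just a stand-in for the functions under test, and GetTickCount64 (milliseconds, from SysUtils) replaces engkin's HardwareTicks, so the resolution is much coarser.

```pascal
program BenchSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical signature shared by the functions under test.
  TLenFunc = function(const s: string): PtrInt;

const
  WarmupRuns = 2;  // iterations whose timings are discarded
  TimedRuns  = 5;  // iterations that contribute to the mean

// Stand-in for a function under test: counts every byte that is not
// a UTF-8 continuation byte (10xxxxxx).
function CountCodePoints(const s: string): PtrInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Calls f WarmupRuns + TimedRuns times and returns the mean elapsed
// time of the timed runs only. GetTickCount64 is an assumption here;
// the original post uses HardwareTicks instead.
function MeanTicks(f: TLenFunc; const s: string; out Len: PtrInt): QWord;
var
  i: Integer;
  t0, total: QWord;
begin
  total := 0;
  Len := 0;
  for i := 1 to WarmupRuns + TimedRuns do
  begin
    t0 := GetTickCount64;
    Len := f(s);
    if i > WarmupRuns then
      total := total + (GetTickCount64 - t0);  // accumulate, do not overwrite
  end;
  Result := total div TimedRuns;
end;

var
  s: string;
  n: PtrInt;
  ticks: QWord;
begin
  s := StringOfChar('a', 1000000);
  ticks := MeanTicks(@CountCodePoints, s, n);
  WriteLn('CountCodePoints: ', n, ' : ', ticks, ' ms');
end.
```

Passing the string once as a const parameter also keeps the cost of reading TStringStream.DataString out of the timed region, which may or may not matter depending on how DataString is implemented in your FPC version.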
myUTF8Length uses a very simple way to get the byte length of a character (is this correct for valid UTF-8 characters?):
function myCharLen(p: PChar): integer;
begin
  Result := 1;
  case Ord(p^) and %11110000 of
    %11110000 : Result := 4;
    %11100000 : Result := 3;
    %11010000 : Result := 2;
    %11000000 : Result := 2;
  end;
end;
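
On the question of correctness: in valid UTF-8 the lead byte patterns are 0xxxxxxx (1 byte), 110xxxxx (2 bytes), 1110xxxx (3 bytes) and 11110xxx (4 bytes), so the high nibbles $C and $D map to 2, $E to 3 and $F to 4, which is what the case statement does. The small check below is a sketch, not the code used in the benchmark: CountByLeadBytes is a hypothetical helper that hops from lead byte to lead byte, and the sample is the test string from the commented-out line above, truncated to its first four characters and written as explicit bytes so the result does not depend on the source file encoding.

```pascal
program CharLenCheck;
{$mode objfpc}{$H+}

function myCharLen(p: PChar): integer;
begin
  Result := 1;
  case Ord(p^) and %11110000 of
    %11110000: Result := 4;
    %11100000: Result := 3;
    %11010000,
    %11000000: Result := 2;
  end;
end;

// Hypothetical helper: counts characters by jumping from one lead
// byte to the next. Only meaningful for valid UTF-8 input.
function CountByLeadBytes(const s: string): SizeInt;
var
  p, pEnd: PChar;
begin
  Result := 0;
  if s = '' then
    Exit;
  p := PChar(s);
  pEnd := p + Length(s);
  while p < pEnd do
  begin
    Inc(p, myCharLen(p));
    Inc(Result);
  end;
end;

const
  // '$', U+00A2, U+20AC, U+10348 as UTF-8 bytes (1 + 2 + 3 + 4 bytes)
  Sample = #$24#$C2#$A2#$E2#$82#$AC#$F0#$90#$8D#$88;

begin
  WriteLn(CountByLeadBytes(Sample));  // prints 4
end.
```

For invalid input the function is not a validator: a stray continuation byte (10xxxxxx) is counted as a 1-byte character, and a truncated sequence at the end of the string makes the pointer step past the end before the loop stops.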
As you can see, UTF8LengthFast (the ported one) does very well for 1uc and 2uc, is almost the same as my function for 3uc, and loses for 4uc (given the clarity of my function, I'd say it is doing well).

UTF8CharacterLengthFast is not that fast and loses to the old function UTF8CharacterLength (inlined) for 1uc and mixed characters.

The new UTF8Length is worse than the old one with UTF8CharacterLength (inlined); I think reverting to the old code is better here.
