I did not touch the old UTF8Length. Mattias did in r52857.
Sorry for the mix-up.
in r52857.
We are talking about the same code.
Now it is optimized and inlined for the most common 1-byte case but is slightly slower for others.
No, the old one still performs better when inlined (see benchmark).
In practice the net result will be positive because online data always contains ASCII metadata and markup tags, even if the text is of non-European origin.
That's why I made the mixed UTF-8 file: 3/8 uc1 (ASCII), 2/8 uc2, 2/8 uc3, 1/8 uc4.
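For readers who want to reproduce a similar mix, here is a minimal sketch of how such test data could be built. It is not the actual benchmark file: MixedBlock and MakeMixedUTF8Data are hypothetical names, and the sketch assumes the 3/8, 2/8, 2/8, 1/8 ratio is counted per character (not per byte) and that uc1..uc4 mean 1- to 4-byte sequences.

```pascal
program MakeMixedUTF8;
{$mode objfpc}{$H+}

const
  // One block of 8 characters (17 bytes), assumed ratio per character:
  // 3 x 1-byte, 2 x 2-byte, 2 x 3-byte, 1 x 4-byte.
  MixedBlock = 'abc'                         // 3 x 1-byte (ASCII)
             + #$C3#$A4#$C3#$B6              // 2 x 2-byte
             + #$E2#$82#$AC#$E2#$82#$AC      // 2 x 3-byte
             + #$F0#$9F#$98#$80;             // 1 x 4-byte

// Hypothetical helper: repeats the block to get a buffer of any size.
function MakeMixedUTF8Data(Blocks: Integer): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Blocks do
    Result := Result + MixedBlock;
end;

begin
  // 1000 blocks give 17000 bytes holding 8000 characters.
  WriteLn(Length(MakeMixedUTF8Data(1000)));
end.
```

If the ratio in the real file is per byte rather than per character, the block would need different counts.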
The whole function is way too big for inlining.
When I say inlining, I don't really mean UTF8Length itself (that doesn't make sense to me) but the inlining of UTF8CharacterLength.
In the same r52857 Mattias also improved the inlined UTF8CharacterLengthFast. Please take a look.
I already benchmark it under the name UTF8LengthFast(alternative).
Did you test with a 64-bit compiler?
No, I never intended to load 64 bits of data even in a 64-bit environment, so I guess PtrUInt should be changed to Integer and PPtrUInt to PInteger.
This makes no sense:
case b of
0..127 : inc(Result);
192..223 : inc(Result);
224..239 : inc(Result);
240..247 : inc(Result);
end;
Why not just:
if b in [0..127, 192..247] then inc(Result);
Well, this is hilarious. I paused at this block many times with the feeling that something was wrong, but never gave it deeper thought since it wouldn't hurt performance.
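To make the simplification concrete, here is a small self-contained sketch that counts characters with the single set test. CountUTF8LeadBytes is a hypothetical name, not the function in LazUTF8, and the byte-by-byte loop is only for illustration. It counts every byte that can start a UTF-8 sequence and skips continuation bytes (128..191) and the invalid values 248..255.

```pascal
program CountLeadBytes;
{$mode objfpc}{$H+}

// Hypothetical helper: counts UTF-8 characters by counting lead bytes.
// All four ranges of the original case statement did the same Inc(Result),
// so one set membership test is equivalent.
function CountUTF8LeadBytes(p: PChar; ByteCount: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to ByteCount - 1 do
    if Ord(p[i]) in [0..127, 192..247] then
      Inc(Result);
end;

var
  s: string;
begin
  // One character of each length: 1, 2, 3 and 4 bytes (10 bytes total).
  s := 'a' + #$C3#$A4 + #$E2#$82#$AC + #$F0#$9F#$98#$80;
  WriteLn(CountUTF8LeadBytes(PChar(s), Length(s)));  // prints 4
end.
```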