Recent

Author Topic: "UTF8CodepointSizeFast" — question about existing optimization  (Read 1639 times)

flowCRANE

  • Hero Member
  • *****
  • Posts: 937
"UTF8CodepointSizeFast" — question about existing optimization
« on: February 27, 2024, 05:57:10 pm »
The UTF8CodepointSizeFast is implemented in this way (comments are original):

Code: Pascal  [Select][+][-]
  1. function UTF8CodepointSizeFast(p: PChar): integer;
  2. begin
  3.   case p^ of
  4.     #0..#191   : Result := 1;
  5.     #192..#223 : Result := 2;
  6.     #224..#239 : Result := 3;
  7.     #240..#247 : Result := 4;
  8.     //#248..#255 : Result := 1;
  9.     // Theoretically UTF-8 supports length 1-7, but since 2003, RFC 3629 limits
  10.     // it to 1-4 bytes.
  11.     // This is an inline function, so keep the function short.
  12.     //#248..#251   : Result := 5;
  13.     //#252, #253   : Result := 6;
  14.     //#254         : Result := 7;
  15.  
  16.     else Result := 1; // An optimization + prevents compiler warning about uninitialized Result.
  17.   end;
  18. end;

Can somebody explain me please why the else statement is an optimization, as the comment says?
Lazarus 4.2 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

Thaddy

  • Hero Member
  • *****
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #1 on: February 27, 2024, 06:06:26 pm »
1. makes the case stay in order for most of the part. In order case generates efficient code.
2. unexpected result are treated as resolving to case 1, which means result is always valid
3. Prevents the Jonas' case without else warning in trunk (6060)
« Last Edit: February 27, 2024, 06:08:19 pm by Thaddy »
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

domasz

  • Hero Member
  • *****
  • Posts: 616
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #2 on: February 27, 2024, 06:50:04 pm »
Is the above any better than the below?

Code: Pascal  [Select][+][-]
  1.     function UTF8CodepointSizeFast(p: PChar): integer;
  2.     begin
  3.   Result := 1;
  4.       case p^ of
  5.         #0..#191   : Result := 1;
  6.         #192..#223 : Result := 2;
  7.         #224..#239 : Result := 3;
  8.         #240..#247 : Result := 4;
  9.         //#248..#255 : Result := 1;
  10.         // Theoretically UTF-8 supports length 1-7, but since 2003, RFC 3629 limits
  11.         // it to 1-4 bytes.
  12.         // This is an inline function, so keep the function short.
  13.         //#248..#251   : Result := 5;
  14.         //#252, #253   : Result := 6;
  15.         //#254         : Result := 7;
  16.       end;
  17.     end;

Thaddy

  • Hero Member
  • *****
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #3 on: February 27, 2024, 06:59:58 pm »
No it fails my points 2 and 3. At least 2 needs to be taken into account. Since it has a less likelihood, solving it in the else will result in faster code in the common case. Unless you want exception frames and these are dead slow.
« Last Edit: February 27, 2024, 07:09:04 pm by Thaddy »
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

domasz

  • Hero Member
  • *****
  • Posts: 616
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #4 on: February 27, 2024, 07:08:59 pm »
Thank you!

Thaddy

  • Hero Member
  • *****
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #5 on: February 27, 2024, 07:10:42 pm »
Note I believe the compiler can already optimize case loops in higher optimization settings: it is an "easy" one.
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

RayoGlauco

  • Full Member
  • ***
  • Posts: 223
  • Beers: 1567
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #6 on: February 27, 2024, 07:22:15 pm »
I think originally the code was:

Code: Pascal  [Select][+][-]
  1.     function UTF8CodepointSizeFast(p: PChar): integer;
  2.     begin
  3.       case p^ of
  4.         #0..#191   : Result := 1;
  5.         #192..#223 : Result := 2;
  6.         #224..#239 : Result := 3;
  7.         #240..#247 : Result := 4;
  8.         #248..#255 : Result := 1;
  9.       end;
  10.     end;
  11.  

Then, someone commented this
Code: Pascal  [Select][+][-]
  1. #248..#255 : Result := 1;
and added this
Code: Pascal  [Select][+][-]
  1. else Result := 1;
To err is human, but to really mess things up, you need a computer.

RayoGlauco

  • Full Member
  • ***
  • Posts: 223
  • Beers: 1567
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #7 on: February 27, 2024, 07:31:40 pm »
The final version seems faster, and avoids the compiler warning
To err is human, but to really mess things up, you need a computer.

Thaddy

  • Hero Member
  • *****
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #8 on: February 27, 2024, 08:23:09 pm »
Less comparisons at the cost of one jump.
Usually this is such a case that compilers can optimize better than humans, well humans who do not take into account extra complexity. It is a kind of pre-AI type AI, well it is AI: making compilers smarter than people. :o
« Last Edit: February 27, 2024, 08:27:05 pm by Thaddy »
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

flowCRANE

  • Hero Member
  • *****
  • Posts: 937
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #9 on: February 28, 2024, 02:36:01 am »
I still don't know what type of optimization would be used. This code might as well look like this:

Code: Pascal  [Select][+][-]
  1. function UTF8CodepointSizeFast(p: PChar): integer;
  2. begin
  3.   case p^ of
  4.     #0   .. #191: Result := 1;
  5.     #192 .. #223: Result := 2;
  6.     #224 .. #239: Result := 3;
  7.     #240 .. #247: Result := 4;
  8.     #248 .. #255: Result := 1;
  9.   end;
  10. end;

The order is correct, the full range of byte values has been used. A redundant compiler message about a possible missing result does not bother (although such a case does not exist for single-byte Char). So what advantage does the else version give in practice (at the asm level and various optimizations)?

Code: Pascal  [Select][+][-]
  1. function UTF8CodepointSizeFast(p: PChar): integer;
  2. begin
  3.   case p^ of
  4.     #0   .. #191: Result := 1;
  5.     #192 .. #223: Result := 2;
  6.     #224 .. #239: Result := 3;
  7.     #240 .. #247: Result := 4;
  8.   else
  9.     Result := 1;
  10.   end;
  11. end;

Does it even matter?
Lazarus 4.2 with FPC 3.2.2, Windows 11 — all 64-bit

Working solo on a top-down retro-style action/adventure game (pixel art), programming the engine from scratch, using Free Pascal and SDL3.

TRon

  • Hero Member
  • *****
  • Posts: 4377
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #10 on: February 28, 2024, 04:03:52 am »
Does it even matter?
Not really as the true winner is an 256 byte array  :P
Today is tomorrow's yesterday.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4673
  • I like bugs.
Re: "UTF8CodepointSizeFast" — question about existing optimization
« Reply #11 on: February 28, 2024, 09:39:02 am »
I removed the inaccurate part of the comment about optimization in fee8dbf80e.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

 

TinyPortal © 2005-2018