Recent

Author Topic: Computed goto  (Read 27822 times)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12202
  • FPC developer.
Re: Computed goto
« Reply #15 on: November 01, 2016, 09:33:44 pm »
An interpreter is where this matters, when every instruction moves through that case statement, even if it is only becomes 1% faster.

If you wanted it fast, you'd make a compiler. :) Such an interpreter would not be portable with the asm anyway.

BeniBela

  • Hero Member
  • *****
  • Posts: 927
    • homepage
Re: Computed goto
« Reply #16 on: November 02, 2016, 01:06:01 am »

If you wanted it fast, you'd make a compiler. :)

It does not have to be that fast. It is not supposed to be important. Too fast here, and a bottleneck will appear somewhere else

Such an interpreter would not be portable with the asm anyway.

That is why I was hoping for a computed goto without asm

Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: Computed goto
« Reply #17 on: November 02, 2016, 10:09:46 am »
Type helper for pointer? ;) Or a set of pointer operators ;)
Basically you can design a computed goto with a limited stack based design. As you would with an interpreter.
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

derek.john.evans

  • Guest
Re: Computed goto
« Reply #18 on: November 02, 2016, 12:28:22 pm »
This compiles (Win32 Lazarus 1.6). Unsure if it actually works correctly.

EDIT: left out a dispatch..

Code: Pascal  [Select][+][-]
  1. {$define DISPATCH := p:= dispatch_table[code[pc]]; Inc(pc); asm JMP p;end;}
  2.  
  3. function interp_cgoto(code: PByte; initval: integer): integer;
  4. label
  5.   do_halt, do_inc, do_dec, do_mul2, do_div2, do_add7, do_neg;
  6. var
  7.   pc, val: integer;
  8.   dispatch_table: array[0..6] of
  9.   pointer = (@do_halt, @do_inc, @do_dec, @do_mul2, @do_div2, @do_add7, @do_neg);
  10.   p: pointer;
  11. begin
  12.   pc := 0;
  13.   val := initval;
  14.   DISPATCH;  
  15.   while True do begin
  16.     do_halt: Exit(val);
  17.     do_inc: Inc(val);
  18.     DISPATCH;
  19.     do_dec: Dec(val);
  20.     DISPATCH;
  21.     do_mul2: val := val * 2;
  22.     DISPATCH;
  23.     do_div2: val := val div 2;
  24.     DISPATCH;
  25.     do_add7: Inc(val, 7);
  26.     DISPATCH;
  27.     do_neg: val := -val;
  28.     DISPATCH;
  29.   end;
  30. end;  
  31.  

Requires these directives:
Code: Pascal  [Select][+][-]
  1. {$GOTO ON}
  2. {$MACRO ON}
  3. {$ASMMODE intel}  
  4.  
« Last Edit: November 02, 2016, 12:36:00 pm by Geepster »

derek.john.evans

  • Guest
Re: Computed goto
« Reply #19 on: November 02, 2016, 12:38:34 pm »
The following code calculates: (((x + 1) * 2) + 1 + 1) / 2
Code: Pascal  [Select][+][-]
  1. code: array[0..5] of byte = (1, 3, 1, 1, 4, 0);  
  2.  

If x = 10, I get 12
Code: Pascal  [Select][+][-]
  1. ShowMessage(IntToStr(interp_cgoto(code, 10)));  
  2.  

I guess the question is, what is the speed advantage of this vs a case statement?

derek.john.evans

  • Guest
Re: Computed goto
« Reply #20 on: November 02, 2016, 01:33:05 pm »
I ran some quick tests: 1=Computed gotos / 2=case statement

-O1 1=4313 2=5281
-O2 1=4282 2=3513
-O3 1=4328 2=3516

To me, it looks like there is no point in using computed gotos with Pascal.

Code: Pascal  [Select][+][-]
  1. program SpeedTest;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. {$GOTO ON}
  6. {$MACRO ON}
  7. {$ASMMODE intel}
  8.  
  9. uses
  10.   LclIntf;
  11.  
  12. {$R *.res}
  13.  
  14. {$define DISPATCH := p:= dispatch_table[code^]; Inc(code); asm JMP p;end;}
  15.  
  16.   function interp_cgoto1(code: PByte; initval: integer): integer;
  17.   label
  18.     do_halt, do_inc, do_dec, do_mul2, do_div2, do_add7, do_neg;
  19.   var
  20.     dispatch_table: array[0..6] of
  21.     pointer = (@do_halt, @do_inc, @do_dec, @do_mul2, @do_div2, @do_add7, @do_neg);
  22.     p: pointer;
  23.   begin
  24.     Result := initval;
  25.     DISPATCH;
  26.     while True do begin
  27.       do_halt: Exit;
  28.       do_inc: Inc(Result);
  29.       DISPATCH;
  30.       do_dec: Dec(Result);
  31.       DISPATCH;
  32.       do_mul2: Result := Result * 2;
  33.       DISPATCH;
  34.       do_div2: Result := Result div 2;
  35.       DISPATCH;
  36.       do_add7: Inc(Result, 7);
  37.       DISPATCH;
  38.       do_neg: Result := -Result;
  39.       DISPATCH;
  40.     end;
  41.   end;
  42.  
  43.   function interp_cgoto2(code: PByte; initval: integer): integer;
  44.   begin
  45.     Result := initval;
  46.     while True do begin
  47.       case code^ of
  48.         0: begin
  49.           Exit;
  50.         end;
  51.         1: begin
  52.           Inc(Result);
  53.         end;
  54.         2: begin
  55.           Dec(Result);
  56.         end;
  57.         3: begin
  58.           Result := Result * 2;
  59.         end;
  60.         4: begin
  61.           Result := Result div 2;
  62.         end;
  63.         5: begin
  64.           Inc(Result, 7);
  65.         end;
  66.         6: begin
  67.           Result := -Result;
  68.         end;
  69.       end;
  70.       Inc(code);
  71.     end;
  72.   end;
  73.  
  74. const
  75.   CodeSize = 100000;
  76.   LoopCount = 10000;
  77.   InitVal = 0;
  78.  
  79. var
  80.   I, Sum: integer;
  81.   Tick: QWord;
  82.   Code: array of byte;
  83. begin
  84.   SetLength(Code, CodeSize);
  85.   for I := 0 to High(Code) do begin
  86.     Code[I] := 6 - (I mod 6);
  87.   end;
  88.   Code[High(Code)] := 0;
  89.   Tick := GetTickCount64;
  90.   Sum := 0;
  91.   for I := 1 to LoopCount do begin
  92.     Inc(Sum, interp_cgoto1(@Code[0], InitVal));
  93.   end;
  94.   WriteLn('Time: ', GetTickCount64 - Tick, ' Sum: ', Sum);
  95.   Tick := GetTickCount64;
  96.   Sum := 0;
  97.   for I := 1 to LoopCount do begin
  98.     Inc(Sum, interp_cgoto2(@Code[0], InitVal));
  99.   end;
  100.   WriteLn('Time: ', GetTickCount64 - Tick, ' Sum: ', Sum);
  101.   ReadLn;
  102. end.
  103.  

mischi

  • Full Member
  • ***
  • Posts: 178
Re: Computed goto
« Reply #21 on: November 02, 2016, 01:39:50 pm »
I guess the question is, what is the speed advantage of this vs a case statement?
No speed advantage. On the contrary. I need about 10 times longer to understand the code ;-)

Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: Computed goto
« Reply #22 on: November 02, 2016, 01:51:08 pm »
Basically what he did is the limited stack algorithm I suggested. That's valid. There are other options.
Since you will always introduce another level of indirection, there's always a speed penalty.

OTOH: look at the compiler code for case with strings ;)

This is a solvable issue,provided a case loop is optimized in sorted order and continue is allowed,.
That's nothing new in compiler construction.

But if then else comes very very close to that too. So it is notational candy anyway. And candy is bad for your teeth.
« Last Edit: November 02, 2016, 01:53:32 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12202
  • FPC developer.
Re: Computed goto
« Reply #23 on: November 02, 2016, 02:06:33 pm »
Free Pascal has some jump table support, contrary to (afaik) Delphi.

According to the docs jumptables should kick in starting with -O1, but from the benchmarks I'd say -O2 :-)

It seems however that in this (Geepster's code) case the asm prohibits keeping values in registers, blowing up the code (-O4)

Code: Pascal  [Select][+][-]
  1. # [18] do_inc: Inc(val);
  2.         addw    $1,-20(%ebp)
  3.         movl    -4(%ebp),%edx
  4.         movswl  -16(%ebp),%eax
  5.         movzbl  (%edx,%eax,1),%eax
  6.         movl    -48(%ebp,%eax,4),%eax
  7.         movl    %eax,-52(%ebp)
  8.         addw    $1,-16(%ebp)
  9.         jmp     *-52(%ebp)
  10. .Lj7:
  11. # [20] do_dec: Dec(val);
  12.         subw    $1,-20(%ebp)
  13.         movl    -4(%ebp),%edx
  14.         movswl  -16(%ebp),%eax
  15.         movzbl  (%edx,%eax,1),%eax
  16.         movl    -48(%ebp,%eax,4),%eax
  17.         movl    %eax,-52(%ebp)
  18.         addw    $1,-16(%ebp)
  19.         jmp     *-52(%ebp)
  20. .Lj8:
  21. # [22] do_mul2: val := val * 2;
  22.         shlw    $1,-20(%ebp)
  23.         movl    -4(%ebp),%edx
  24.         movswl  -16(%ebp),%eax
  25.         movzbl  (%edx,%eax,1),%eax
  26.         movl    -48(%ebp,%eax,4),%eax
  27.         movl    %eax,-52(%ebp)
  28.         addw    $1,-16(%ebp)
  29.         jmp     *-52(%ebp)
« Last Edit: November 02, 2016, 02:11:29 pm by marcov »

BeniBela

  • Hero Member
  • *****
  • Posts: 927
    • homepage
Re: Computed goto
« Reply #24 on: November 02, 2016, 02:19:58 pm »

Code: Pascal  [Select][+][-]
  1. {$define DISPATCH := p:= dispatch_table[code[pc]]; Inc(pc); asm JMP p;end;}
  2.  
  3. function interp_cgoto(code: PByte; initval: integer): integer;
  4. label
  5.   do_halt, do_inc, do_dec, do_mul2, do_div2, do_add7, do_neg;
  6. var
  7.   pc, val: integer;
  8.   dispatch_table: array[0..6] of
  9.   pointer = (@do_halt, @do_inc, @do_dec, @do_mul2, @do_div2, @do_add7, @do_neg);
  10.   p: pointer;
  11. begin
  12.   pc := 0;
  13.   val := initval;
  14.   DISPATCH;  
  15.   while True do begin
  16.     do_halt: Exit(val);
  17.     do_inc: Inc(val);
  18.     DISPATCH;
  19.     do_dec: Dec(val);
  20.     DISPATCH;
  21.     do_mul2: val := val * 2;
  22.     DISPATCH;
  23.     do_div2: val := val div 2;
  24.     DISPATCH;
  25.     do_add7: Inc(val, 7);
  26.     DISPATCH;
  27.     do_neg: val := -val;
  28.     DISPATCH;
  29.   end;
  30. end;  
  31.  


That is awesome


-O1 1=4313 2=5281
-O2 1=4282 2=3513
-O3 1=4328 2=3516


It depends on the interpreted program.

Try Code [ I ] := 6 - (i mod 2) ;  then the case is slower, because it is turned into a list of if statements and it has to check all conditions before finding the right one



OTOH: look at the compiler code for case with strings ;)
 

That is horrible.

Slow enough to show up in my benchmarks, even if it is only used during the parsing phase

Free Pascal has some jump table support, contrary to (afaik) Delphi.

According to the docs jumptables should kick in starting with -O1, but from the benchmarks I'd say -O2 :-)

Oh, it does?

I thought it does not have them, because it does not create jumptables in this function.

Actually, it seems that is only created after there are 12 cases.



Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: Computed goto
« Reply #25 on: November 02, 2016, 02:20:29 pm »
Free Pascal has some jump table support, contrary to (afaik) Delphi.
Delphi has jump table optimization support in the compiler. Always had (at least since D2).
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12202
  • FPC developer.
Re: Computed goto
« Reply #26 on: November 02, 2016, 02:22:48 pm »
In my case the first still is faster (-O4)

1= 1782  2=2250   (FPC 3.0.0/win32 on a i7-3770)

with trunk however the difference is much less:

1= 1785  2=2000   (FPC 3.1.1/win32 on a i7-3770)

I looked at the assembler, no jumptables were used.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12202
  • FPC developer.
Re: Computed goto
« Reply #27 on: November 02, 2016, 02:28:20 pm »
Actually, it seems that is only created after there are 12 cases.

I added a few dummy cases and indeed it does. Gets slightly faster too:

1=1766  2=1906

The remaining difference is probably the jump to the end of the case and then back to the beginning. Working on an optimization that detects the construct (while true do begin..end) and optimizing it would do that without all kinds of ugly shenanigans.
« Last Edit: November 02, 2016, 03:35:13 pm by marcov »

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Computed goto
« Reply #28 on: November 02, 2016, 02:45:39 pm »
Perhaps the best solution is   a macro:

Code: [Select]
{$define dispatch:=p := dispatch_table[byte(code[p])]; inc(pc); asm jmp p; end }
how come? Is it be because a macro is used in C example?

Well, a nested inlined procedure will do the same trick, but it will be more pascal way.
Just replace GotoAddr call with asm
« Last Edit: November 02, 2016, 04:58:45 pm by skalogryz »

BeniBela

  • Hero Member
  • *****
  • Posts: 927
    • homepage
Re: Computed goto
« Reply #29 on: November 02, 2016, 08:07:15 pm »


how come? Is it be because a macro is used in C example?

Well, a nested inlined procedure will do the same trick, but it will be more pascal way.
Just replace GotoAddr call with asm

Because with the nested procedure it gives

Code: [Select]
Hint: "assembler" not yet supported inside inline procedure/function
Hint: Inlining disabled

Although when it confuses the registers, it is worse than the procedure

Code: [Select]
inline macro asm: Time: 3266 Sum: 30000
inline GotoAddr call: Time: 2639 Sum: 30000
case: Time: 2829 Sum: 30000
case with table: Time: 3363 Sum: 30000

Actually the interpreter is too simple. An actual interpreter cannot keep all data in an register anyways. And the program is too simple, too. The taken case repeats every 6 iterations, the branch predictor can learn that.

Try Code[ I ] := Random(6)+1;     

Code: [Select]
inline macro asm: Time: 10334 Sum: 46130000
inline GotoAddr call: Time: 8901 Sum: 46130000
case: Time: 11211 Sum: 46130000
case with table: Time: 11287 Sum: 46130000




 

TinyPortal © 2005-2018