Recent

Author Topic: Compiler case optimization vs if then else  (Read 5405 times)

damieiro

  • Full Member
  • ***
  • Posts: 200
Compiler case optimization vs if then else
« on: December 09, 2019, 07:48:17 pm »
Hello!

Some (perhaps beginner) question.

How does compiler optimization perform for case vs if then else?

Code: Pascal  [Select][+][-]
  1.  case MyVar of
  2.    1:    write ('one');
  3.  else
  4.    write ('Not one');
  5.  end;

Generates exactly the same machine code like it would generate this? (rephrase: generate the same intermediate code?)
Code: Pascal  [Select][+][-]
  1. if MyVar=1
  2.   then write ('one')
  3.   else write ('Not one');

I think it should.

Same for simplest form?

Code: Pascal  [Select][+][-]
  1.  case MyVar of
  2.    1:    write ('one');
  3.  end;
Code: Pascal  [Select][+][-]
  1. if MyVar = 1 then write ('one');

but... what else if nested conditions? does compiler optimice better that "at if-then-else at hand"?

Code: Pascal  [Select][+][-]
  1.  case MyVar of
  2.    1:  write ('one');
  3.    2:  write ('two');
  4.  else
  5.    write ('Not one neither two');
  6.  end;

What should be their "if then else tree"?
Code: Pascal  [Select][+][-]
  1. if MyVar=1
  2.   then
  3.     write ('one')
  4.   else
  5.     begin    
  6.       if My Var=2
  7.         then write ('two')
  8.         else  write ('Not one neither two');
  9.     end;

and for N conditions? does some kind of binary tree the case statement in compiler optimization? does the compiler rearrange if-then-else? do the same intermediate code?

I see this on docs : https://wiki.freepascal.org/Case_Compiler_Optimization, but it seems an Stub
« Last Edit: December 09, 2019, 08:34:09 pm by damieiro »

stocki

  • Full Member
  • ***
  • Posts: 144
Re: Compiler case optimization vs if then else
« Reply #1 on: December 09, 2019, 08:06:32 pm »
You can see that in the asm code.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #2 on: December 09, 2019, 08:22:54 pm »
Quote
You can see that in the asm code

Well, my fault. I rephrase the question: do the compiler the same intermediate code for the case and the if then else? Do some optimizations? Does the compiler detect better optimization with the case statement?.
(not asking even for other optimizations depending of data type, for example if "my var" were an string)

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Compiler case optimization vs if then else
« Reply #3 on: December 09, 2019, 08:33:57 pm »
Quote
You can see that in the asm code

Well, my fault. I rephrase the question: do the compiler the same intermediate code for the case and the if then else? Do some optimizations? Does the compiler detect better optimization with the case statement?.
(not asking even for other optimizations depending of data type, for example if "my var" were an string)
Since the values compared against in a normal Pascal case statement are constants, that fact can be used to create a jump table. Whether or not the compiler sees fit to create a jump table may depend on compiler settings such as optimizing for speed or size or a balance between the two.

That optimization cannot be done when the values are strings.

Nested "if" statements are rarely subject to general optimizations.

If you want to know how the compiler optimizes something, as others above already pointed out, you look at the code the compiler generated. 


(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #4 on: December 09, 2019, 08:44:45 pm »
Quote
Since the values compared against in a normal Pascal case statement are constants, that fact can be used to create a jump table. Whether or not the compiler sees fit to create a jump table may depend on compiler settings such as optimizing for speed or size or a balance between the two.

That optimization cannot be done when the values are strings.

Nested "if" statements are rarely subject to general optimizations.

If you want to know how the compiler optimizes something, as others above already pointed out, you look at the code the compiler generated.


Well, thanks, but i think the answer i need is not to read assembly code generated which can have different aproachs strongly dependant of architecture (for example jump prediction).

The optimization i was asking is at higher level. Before the translation to assembly.

But yeah, you give me a clue. Case statement seems to give to compiler better information to optimize than "at hand" optimized if-then-else which is the real question i need to be anwered. I only find this in docs: https://wiki.freepascal.org/Case_Compiler_Optimization

I even think if for a "optimiced way" the "if-then-else" could be always replaced by case statements without performance penalty and even a boost in performance.

ASerge

  • Hero Member
  • *****
  • Posts: 2222
Re: Compiler case optimization vs if then else
« Reply #5 on: December 09, 2019, 09:40:08 pm »
For a single value, the if then else expression looks more natural, and as I see assembler code, the compiler generates a more optimal code for Intel x64 in this case.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #6 on: December 09, 2019, 10:37:01 pm »
Quote
For a single value, the if then else expression looks more natural, and as I see assembler code, the compiler generates a more optimal code for Intel x64 in this case.

Thanks.
I agree that if then else for single value looks more natural an readable.
However, i do not understand why compiler cannot generate same code for a single value if..then and case. These should be translated the same


440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Compiler case optimization vs if then else
« Reply #7 on: December 09, 2019, 11:32:52 pm »
However, i do not understand why compiler cannot generate same code for a single value if..then and case. These should be translated the same
A compiler translates the high level instructions into machine instructions.  In the great majority of cases, it doesn't analyze the code logic and create simpler logic that accomplishes the same thing.

That means, for an "if/then/else" construct, the compiler has a general "formula" (to put it that way) that works for all "if/then/else" constructs, the same for the case construct and, the two formulas are very different. 

IOW, the compiler doesn't try to figure out if some if/then/else construction can be expressed as a case statement.

That said, some compilers can do logic optimizations of that kind but, the cases they can handle are usually quite limited. 

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Compiler case optimization vs if then else
« Reply #8 on: December 10, 2019, 09:29:26 am »
The optimization i was asking is at higher level. Before the translation to assembly.
There is no general optimization for these two constructs at the higher level. A case is lowered by the architecture backend to a jump table or something similar and an if-statement is lowered to jumps.

The only optimization occurs if the value that is checked against is constant (e.g. might be the case thanks to inlining) then the unused branches will be discarded before the backend lowers the code.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #9 on: December 10, 2019, 12:49:21 pm »
Quote
A compiler translates the high level instructions into machine instructions.  In the great majority of cases, it doesn't analyze the code logic and create simpler logic that accomplishes the same thing.

That means, for an "if/then/else" construct, the compiler has a general "formula" (to put it that way) that works for all "if/then/else" constructs, the same for the case construct and, the two formulas are very different.

IOW, the compiler doesn't try to figure out if some if/then/else construction can be expressed as a case statement.
That said, some compilers can do logic optimizations of that kind but, the cases they can handle are usually quite limited.

Well, taking your words, what i expected (my knowledge in compilers isn't too high) is that compiler will do some translation like this

1.- If ... then  -> APLY formula 1: (some conditional jump  in assembly and if it could be possible - not in this example- with prediction -> would be great to inform compiler the probability ->  i think a while loop will be used probability)
2.- If.. then ... else -> APLY formula 2.

3 Case X Of One Case end; APLY formula 1 (if.. then)
  Case X of One Case else end; APLY formula 2 (if..then..else)

   Case X of ManyCases : Aply formula 3.. (case of more items)

This seems intuitive for me, but from a compiler engineering perhaps is not aplicable.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #10 on: December 10, 2019, 12:52:40 pm »
Quote
There is no general optimization for these two constructs at the higher level. A case is lowered by the architecture backend to a jump table or something similar and an if-statement is lowered to jumps.

The only optimization occurs if the value that is checked against is constant (e.g. might be the case thanks to inlining) then the unused branches will be discarded before the backend lowers the code.

Thanks, this is the answer i needed.
These constructs use different formulas, and it seems that simplest "cases" do not reuse if-them-else formulae.

Are the case using other optimization besides jumptables (some trees, or things like that)...

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Compiler case optimization vs if then else
« Reply #11 on: December 10, 2019, 05:31:19 pm »
This seems intuitive for me, but from a compiler engineering perhaps is not aplicable.
It is applicable but, doing that requires that the compiler do logical optimizations, that is, optimizations before generating code.  Those kinds of optimizations are often difficult to perform, though in the specific case you mentioned it wouldn't be very hard.

These constructs use different formulas, and it seems that simplest "cases" do not reuse if-them-else formulae.
That is correct.  The reason is, the typical "case" statement (match against constants) can often be implemented with a jump table whereas the typical (nested) "if" statements cannot.

Are the case using other optimization besides jumptables (some trees, or things like that)...
The nature of the typical case statement limits its possible optimization to the use of a jump table (speed-wise).  If a programmer is implementing a tree structure using one or more case statements, it's likely that is not the cleanest and easiest to understand implementation of the algorithm.  Compilers do not and, should not, concern themselves with that kind of stuff.

I'll give you an analogy.  A cross-shaped screw, can be screwed using a flat screwdriver ("if" statement) instead of a Phillips screwdriver ("case" statement)  but, reusing a flat screwdriver isn't the ideal solution in that case.  That said, in the specific case of testing a single condition with a "case" statement, it would be better if the compiler degenerated the "case" into an "if" but, that's the kind of thing the programmer is supposed to do.  IOW, it's the programmer who didn't pick the right "tool" and it isn't the compiler's job to override the tool the programmer chose.

consider this code snippet
Code: Pascal  [Select][+][-]
  1. procedure Code;
  2. var
  3.   a, b : integer;
  4.  
  5. begin
  6.   if a = 3 then writeln('3');
  7.  
  8.   case a of
  9.     3 : writeln('3');
  10.   end;
  11.  
  12.   if a = 3 then writeln('3') else writeln('not 3');
  13.  
  14.   case a of
  15.     3 : writeln('3');
  16.   else
  17.     writeln('not 3');
  18.   end;
  19. end;
  20.  
That code generates this
Code: ASM  [Select][+][-]
  1. Generics.lpr:33                   if a = 3 then writeln('3');
  2.  
  3. ; the following two (2) statements are the "formula" for 1- condition if statement
  4.  
  5. 00401422 83fb03                   cmp    $0x3,%ebx
  6. 00401425 7526                     jne    0x40144d <CODE+45>
  7.  
  8. ; a is 3
  9.  
  10. 00401427 e8b4540000               call   0x4068e0 <fpc_get_output>
  11. 0040142C 89c6                     mov    %eax,%esi
  12. 0040142E 89f2                     mov    %esi,%edx
  13. 00401430 b133                     mov    $0x33,%cl
  14. 00401432 b800000000               mov    $0x0,%eax
  15. 00401437 e884570000               call   0x406bc0 <fpc_write_text_char>
  16. 0040143C e84f290000               call   0x403d90 <fpc_iocheck>
  17. 00401441 89f0                     mov    %esi,%eax
  18. 00401443 e8c8550000               call   0x406a10 <fpc_writeln_end>
  19. 00401448 e843290000               call   0x403d90 <fpc_iocheck>
  20.  
  21.  
  22. Generics.lpr:35                   case a of
  23.  
  24. ; the following five (5) statements are the "formula" for a general multi branch statement which
  25. ; does not use a jump table
  26.  
  27. 0040144D 89d8                     mov    %ebx,%eax
  28. 0040144F 83f803                   cmp    $0x3,%eax
  29. 00401452 7c2b                     jl     0x40147f <CODE+95>
  30. 00401454 83e803                   sub    $0x3,%eax
  31. 00401457 7526                     jne    0x40147f <CODE+95>
  32.  
  33. ; a is 3
  34.  
  35. Generics.lpr:36                   3 : writeln('3');
  36. 00401459 e882540000               call   0x4068e0 <fpc_get_output>
  37. 0040145E 89c6                     mov    %eax,%esi
  38. 00401460 89f2                     mov    %esi,%edx
  39. 00401462 b133                     mov    $0x33,%cl
  40. 00401464 b800000000               mov    $0x0,%eax
  41. 00401469 e852570000               call   0x406bc0 <fpc_write_text_char>
  42. 0040146E e81d290000               call   0x403d90 <fpc_iocheck>
  43. 00401473 89f0                     mov    %esi,%eax
  44. 00401475 e896550000               call   0x406a10 <fpc_writeln_end>
  45. 0040147A e811290000               call   0x403d90 <fpc_iocheck>
  46.  
  47.  
  48. ; -------------------------  binary choice using "if" and using "case"
  49.  
  50. Generics.lpr:39                   if a = 3 then writeln('3') else writeln('not 3');
  51.  
  52. ; the following two (2) statements are the "formula" for the  if/else statement
  53.  
  54. 0040147F 83fb03                   cmp    $0x3,%ebx
  55. 00401482 7528                     jne    0x4014ac <CODE+140>
  56.  
  57. ; a is 3
  58.  
  59. 00401484 e857540000               call   0x4068e0 <fpc_get_output>
  60. 00401489 89c6                     mov    %eax,%esi
  61. 0040148B 89f2                     mov    %esi,%edx
  62. 0040148D b133                     mov    $0x33,%cl
  63. 0040148F b800000000               mov    $0x0,%eax
  64. 00401494 e827570000               call   0x406bc0 <fpc_write_text_char>
  65. 00401499 e8f2280000               call   0x403d90 <fpc_iocheck>
  66. 0040149E 89f0                     mov    %esi,%eax
  67. 004014A0 e86b550000               call   0x406a10 <fpc_writeln_end>
  68. 004014A5 e8e6280000               call   0x403d90 <fpc_iocheck>
  69. 004014AA eb29                     jmp    0x4014d5 <CODE+181>
  70.  
  71. ; a is _not_ 3
  72.  
  73. 004014AC e82f540000               call   0x4068e0 <fpc_get_output>
  74. 004014B1 89c6                     mov    %eax,%esi
  75. 004014B3 b900b04000               mov    $0x40b000,%ecx
  76. 004014B8 89f2                     mov    %esi,%edx
  77. 004014BA b800000000               mov    $0x0,%eax
  78. 004014BF e8ec550000               call   0x406ab0 <fpc_write_text_shortstr>
  79. 004014C4 e8c7280000               call   0x403d90 <fpc_iocheck>
  80. 004014C9 89f0                     mov    %esi,%eax
  81. 004014CB e840550000               call   0x406a10 <fpc_writeln_end>
  82. 004014D0 e8bb280000               call   0x403d90 <fpc_iocheck>
  83.  
  84.  
  85. ; binary choice using "case" statement
  86.  
  87. Generics.lpr:41                   case a of
  88.  
  89. ; the following five (5) statements are the "formula" for a general multi branch statement which
  90. ; does not use a jump table
  91.  
  92. ; note the similarity (basically identical) with the single condition case
  93.  
  94. 004014D5 89d8                     mov    %ebx,%eax
  95. 004014D7 83f803                   cmp    $0x3,%eax
  96. 004014DA 7c2d                     jl     0x401509 <CODE+233>
  97. 004014DC 83e803                   sub    $0x3,%eax
  98. 004014DF 7528                     jne    0x401509 <CODE+233>
  99.  
  100. ; a is 3
  101.  
  102. Generics.lpr:42                   3 : writeln('3');
  103. 004014E1 e8fa530000               call   0x4068e0 <fpc_get_output>
  104. 004014E6 89c3                     mov    %eax,%ebx
  105. 004014E8 89da                     mov    %ebx,%edx
  106. 004014EA b133                     mov    $0x33,%cl
  107. 004014EC b800000000               mov    $0x0,%eax
  108. 004014F1 e8ca560000               call   0x406bc0 <fpc_write_text_char>
  109. 004014F6 e895280000               call   0x403d90 <fpc_iocheck>
  110. 004014FB 89d8                     mov    %ebx,%eax
  111. 004014FD e80e550000               call   0x406a10 <fpc_writeln_end>
  112. 00401502 e889280000               call   0x403d90 <fpc_iocheck>
  113. 00401507 eb29                     jmp    0x401532 <CODE+274>
  114.  
  115. ; a is not 3
  116.  
  117. Generics.lpr:44                   writeln('not 3');
  118. 00401509 e8d2530000               call   0x4068e0 <fpc_get_output>
  119. 0040150E 89c3                     mov    %eax,%ebx
  120. 00401510 b900b04000               mov    $0x40b000,%ecx
  121. 00401515 89da                     mov    %ebx,%edx
  122. 00401517 b800000000               mov    $0x0,%eax
  123. 0040151C e88f550000               call   0x406ab0 <fpc_write_text_shortstr>
  124. 00401521 e86a280000               call   0x403d90 <fpc_iocheck>
  125. 00401526 89d8                     mov    %ebx,%eax
  126. 00401528 e8e3540000               call   0x406a10 <fpc_writeln_end>
  127. 0040152D e85e280000               call   0x403d90 <fpc_iocheck>
  128.  

If the compiler did the kinds of optimizations you are thinking about (essentially, optimizing by altering the logic), it could implement the "if/then/else" (note the "else" part) and equivalent "case" statement using _half_ the code (no faster and no slower) but, choosing the right logic is the programmer's responsibility not the compiler's.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Compiler case optimization vs if then else
« Reply #12 on: December 10, 2019, 10:29:20 pm »
@440bx: Many thanks for your exhaustive explanation!!!  :) :). I have learnt a lot reading you.

With your response i know how actually fpc does with these statements.

Only for the pleasure of debate,

Quote
I'll give you an analogy.  A cross-shaped screw, can be screwed using a flat screwdriver ("if" statement) instead of a Phillips screwdriver ("case" statement)  but, reusing a flat screwdriver isn't the ideal solution in that case.  That said, in the specific case of testing a single condition with a "case" statement, it would be better if the compiler degenerated the "case" into an "if" but, that's the kind of thing the programmer is supposed to do.  IOW, it's the programmer who didn't pick the right "tool" and it isn't the compiler's job to override the tool the programmer chose.

I  agree with the analogy and i agree that compiler cannot do all the work, but:
-  i think that programmer should be well informed: There should be in the wiki that a case for 1 or 2 elements hit in performance penalty. For strings, actual docs indicates that case is similar as nested if then elses with no optimization, for example. With actual doc, programmer can think it has two flat screwdrivers with different colours, not a different ones.
- i think compiler should go for best code: If we know that a single case statement can be optimized allways, i think it should be done and i think it's not changing the logic of the program. Even a warning explaining that a single case hits in penalty would help and would Teach. It can be viewed as logical optimization (changing single switch for if) or viewed as formula optimization (seeing the case and choosing the formulae to apply -> i will go for this)

Quote
Quote
    Are the case using other optimization besides jumptables (some trees, or things like that)...
The nature of the typical case statement limits its possible optimization to the use of a jump table (speed-wise).  If a programmer is implementing a tree structure using one or more case statements, it's likely that is not the cleanest and easiest to understand implementation of the algorithm.  Compilers do not and, should not, concern themselves with that kind of stuff.
Well, i didn't explain me well. I was talking about this entry on the pascal wiki: https://wiki.freepascal.org/Case_Compiler_Optimization that talks about possible enhancements of the case "formula"

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Compiler case optimization vs if then else
« Reply #13 on: December 11, 2019, 12:26:04 am »
@440bx: Many thanks for your exhaustive explanation!!!  :) :). I have learnt a lot reading you.
You're welcome.  I'm pleased you found it helpful.

-  i think that programmer should be well informed: There should be in the wiki that a case for 1 or 2 elements hit in performance penalty.
That's quite reasonable but, there is an inherent problem, which is, the way code is generated for a particular statement can change from one version of the compiler to the next.  When the documentation goes into details as to how code is generated for a particular construct, then keeping the documentation accurate and up to date becomes a real headache.

For strings, actual docs indicates that case is similar as nested if then elses with no optimization, for example.
Yes, for strings, a case statement is simply a syntactically cleaner construction than a bunch of nested if/then/else(s) and, the generated code should be the same.

With actual doc, programmer can think it has two flat screwdrivers with different colours, not a different ones.
The documentation should only go "so far" in explaining how a construct is implemented. The reason for that is, as mentioned previously, if it's too detailed, then the overhead of keeping the documentation up to date can become rather significant.  The documentation should focus on what a construct does and be very limited as to its implementation.

When a programmer wants to know every detail of how a construct is implemented, he/she should look at the generated assembly code.  If the code can be improved, he/she can point out how and make a suggestion.

- i think compiler should go for best code: If we know that a single case statement can be optimized allways, i think it should be done and i think it's not changing the logic of the program.
Expecting the compiler to go for best code is reasonable.  However, it is also reasonable to expect the programmer to pick the right constructs to express logic flow.  The if/then/else construct was designed for a single conditional test, case was designed for multi branch tests.  If the programmer isn't using the right construct, the compiler shouldn't be expected to morph one into the other, even if it can.

One crucial thing to keep in mind is: how much information does the compiler have to gather to accomplish a particular optimization.  The more information the compiler needs, the more complex the code generation algorithm gets and, complexity often results in a performance penalty too (not to mention subtle bugs.)

For the compiler to figure out that a "case" statement is equivalent to a single test "if" statement, it has to parse the entire case statement, once that is done, it cannot simply redirect execution to the code for an "if" statement because the code for the "if" statement expects to parse an "if" not a "case".   That means there has to be code to account for the single condition case statement, which is applicable only when the programmer makes a "less than optimal" choice to express the logic flow.  Optimization does _not_ include compensating for a programmer making poor choices to express logic.

There is one book I always recommend to someone who is interested in how a compiler works, it is: Brinch Hansen on Pascal Compilers.  Unfortunately, it is out of print and, the price people are asking for it is outrageous but, I haven't seen a better introductory book on compilers.   If you can get a copy at a reasonable price, I recommend it very highly.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Compiler case optimization vs if then else
« Reply #14 on: December 11, 2019, 01:28:23 am »
   If you can get a copy at a reasonable price, I recommend it very highly.

Incredible : used > 170.- $,  new  > 800.- $

Or some PDF-bandits: give us your credit card number ...

Winni

 

TinyPortal © 2005-2018