
Author Topic: FPC 3.1.1. vs 2.6.4  (Read 26290 times)

Nitorami

  • Sr. Member
  • ****
  • Posts: 491
FPC 3.1.1. vs 2.6.4
« on: March 04, 2015, 09:52:00 pm »
Excuse me if this is a naive question - clueless as I am, I should possibly not try to build a compiler from source, but I was curious and downloaded the fpc 3.1.1 source trunk a while ago (29.1.), after having read about new features and optimisations. I am not familiar with makefiles etc., and consequently it took me a while to successfully build fpc including the IDE (for win32). But I got it working after all.

However, I found that the code generated by 3.1.1 appears to be 10% to 20% slower than that generated by 2.6.4, and this is more or less independent of the programs tried, except for massive floating point operations, where the performance appears to be just the same. I tried all combinations of optimisation switches, including the new ones, but it just does not matter: the code is slower altogether, and none of the new optimisations makes a noticeable difference. Are there specific switches for the build process I should have activated?

airpas

  • Full Member
  • ***
  • Posts: 179
Re: FPC 3.1.1. vs 2.6.4
« Reply #1 on: March 04, 2015, 10:01:00 pm »
Did you try the -O4 switch?

Nitorami

  • Sr. Member
  • ****
  • Posts: 491
Re: FPC 3.1.1. vs 2.6.4
« Reply #2 on: March 04, 2015, 10:07:35 pm »
Not for building the compiler, anyway.

Not for compiling my programs either, because I would have expected the code to run at the same speed when specifying the same "2.6.4" switches. But it does not...

airpas

  • Full Member
  • ***
  • Posts: 179
Re: FPC 3.1.1. vs 2.6.4
« Reply #3 on: March 04, 2015, 10:19:35 pm »
You can always pass -al to the compiler to generate an assembler listing, so you can see what's going on.

Nitorami

  • Sr. Member
  • ****
  • Posts: 491
Re: FPC 3.1.1. vs 2.6.4
« Reply #4 on: March 04, 2015, 10:30:42 pm »
Yes, maybe, if you can read assembler - I don't.

But that slightly misses the point - I am not talking about specific code - my impression is that 3.1.1 produces slower code regardless of the program I compile, and my suspicion is just that I should have built the compiler differently.

airpas

  • Full Member
  • ***
  • Posts: 179
Re: FPC 3.1.1. vs 2.6.4
« Reply #5 on: March 04, 2015, 10:50:21 pm »
There is no guarantee that every new release of the compiler will generate faster code in 100% of cases.
I did some benchmarks; as a result, in 60% of cases fpc 3.1.1 is ~20% faster than 2.6, while in the other 40% it is equal or slower. This is normal, even with C++: I tried gcc 4.7 and 4.9 and the results varied.

zamtmn

  • Hero Member
  • *****
  • Posts: 594
Re: FPC 3.1.1. vs 2.6.4
« Reply #6 on: March 04, 2015, 10:53:12 pm »
I also noticed that older versions generate faster code.
For example, in my case fpc-2.7.1-20130828 is ~15% faster than the trunk version.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 1058
Re: FPC 3.1.1. vs 2.6.4
« Reply #7 on: March 04, 2015, 11:03:12 pm »
Quote from: Nitorami on March 04, 2015, 10:30:42 pm
But that slightly misses the point - I am not talking about specific code - my impression is that 3.1.1 produces slower code regardless of the program I compile, and my suspicion is just that I should have built the compiler differently.

There is no way to build the compiler itself so that it generates either faster or slower code.

Nitorami

  • Sr. Member
  • ****
  • Posts: 491
Re: FPC 3.1.1. vs 2.6.4
« Reply #8 on: March 04, 2015, 11:09:47 pm »
@airpas: Alright, I can accept that.

I have learned (from Jonas) that the speed of superscalar CPU architectures is not predictable anyway. The times are gone when you could simply add up clock cycles to determine execution speed. The numerous attempts by the CPU itself to optimise the code - cache lookahead, uop cache, branch prediction, instruction fetch, retirement, memory cache, memory prefetch, whatever - make it virtually impossible to generate optimal code other than by trial and error, and then it will be CPU specific.
This can be quite irritating. I remember a small mandelbrot benchmark program which I tried to optimise over and over and just hit a brick wall - then I accidentally declared an unused shortstring in the program body and the speed went up by a factor of 2.

Anyway, the results on a specific CPU probably don't say a lot about the performance of the compiler, but I'll try to install 3.1.1 on a different machine to see whether it makes a difference. I agree that the speed of the generated code may vary with compiler versions on a specific machine, but this should even out over different CPUs. If a compiler version produces slower code on several CPUs or platforms, I would think it has a bug.

@Jonas - thanks. Thought I had missed something.

trayres

  • Jr. Member
  • **
  • Posts: 92
Re: FPC 3.1.1. vs 2.6.4
« Reply #9 on: March 05, 2015, 06:18:41 am »
I believe there's talk of an LLVM backend for FPC - if this is true, we should be able to get the same performance/optimizations that are available to an industry standard C++ implementation.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: FPC 3.1.1. vs 2.6.4
« Reply #10 on: March 05, 2015, 08:14:04 am »
From my own benchmark, 3.1.1 produces faster code in most cases. Well, logically, if there are more (generic) optimizations, how can it slow down existing code? Could you post some examples that are slower when compiled by 3.1.1?

Quote from: trayres on March 05, 2015, 06:18:41 am
I believe there's talk of an LLVM backend for FPC - if this is true, we should be able to get the same performance/optimizations that are available to an industry standard C++ implementation.

The backend is far from complete or even usable, to my knowledge. I believe we could get the same level of performance without it; we need their knowledgeable and skilled developers, not their code ;)

FPK

  • Moderator
  • Full Member
  • *****
  • Posts: 118
Re: FPC 3.1.1. vs 2.6.4
« Reply #11 on: March 05, 2015, 08:27:02 am »
Quote from: Leledumbo on March 05, 2015, 08:14:04 am
From my own benchmark, 3.1.1 produces faster code in most cases. Well, logically, if there are more (generic) optimizations, how can it slow down existing code? Could you post some examples that are slower when compiled by 3.1.1?

Indeed, 3.x should generate better code than previous versions. But
- 3.x has new features which might slow down programs: most notably the new string mechanisms, the Windows exception handling, and generating PIC by default
- there might indeed be bugs which slow down certain examples

So please submit examples which clearly show what kind of code is slower (and on which platform).

airpas

  • Full Member
  • ***
  • Posts: 179
Re: FPC 3.1.1. vs 2.6.4
« Reply #12 on: March 05, 2015, 11:45:49 am »
In this example fpc 2.6.4 is faster than 3.1.1 (fpc 2.6.4 shows 1482 ms, fpc 3.1.1 shows 1684 ms).
Both examples were compiled with -O3 on 32-bit Windows 7.
Code:
program mandelbrot;
 uses windows;
const
   ixmax = 2500;
   iymax = 2000;
   cxmin = -2.5;
   cxmax =  1.5;
   cymin = -2.0;
   cymax =  2.0;
   maxcolorcomponentvalue = 255;
   maxiteration = 200;
   escaperadius = 2;
 
type
   colortype = record
      red   : byte;
      green : byte;
      blue  : byte;
   end;
 
var
   ix, iy      : integer;
   cx, cy      : real;
   pixelwidth  : real = (cxmax - cxmin) / ixmax;
   pixelheight : real = (cymax - cymin) / iymax;
   filename    : string = 'new1.ppm';
   comment     : string = '# ';
   outfile     : textfile;
   color       : colortype;
   zx, zy      : real;
   zx2, zy2    : real;
   iteration   : integer;
   er2         : real = (escaperadius * escaperadius);
   tm : longword;
begin
   tm := GetTickCount();
   {$I-}
   assign(outfile, filename);
   rewrite(outfile);
   if ioresult <> 0 then
   begin
      writeln(stderr, 'unable to open output file: ', filename);
      exit;
   end;
 
   writeln(outfile, 'P6');
   writeln(outfile, ' ', comment);
   writeln(outfile, ' ', ixmax);
   writeln(outfile, ' ', iymax);
   writeln(outfile, ' ', maxcolorcomponentvalue);
 
   for iy := 1 to iymax do
   begin
      cy := cymin + (iy - 1)*pixelheight;
      if abs(cy) < pixelheight / 2 then cy := 0.0;
      for ix := 1 to ixmax do
      begin
         cx := cxmin + (ix - 1)*pixelwidth;
         zx := 0.0;
         zy := 0.0;
         zx2 := zx*zx;
         zy2 := zy*zy;
         iteration := 0;
         while (iteration < maxiteration) and (zx2 + zy2 < er2) do
         begin
            zy := 2*zx*zy + cy;
            zx := zx2 - zy2 + cx;
            zx2 := zx*zx;
            zy2 := zy*zy;
            iteration := iteration + 1;
         end;
         if iteration = maxiteration then
         begin
            color.red   := 0;
            color.green := 0;
            color.blue  := 0;
         end
         else
         begin
            color.red   := 255;
            color.green := 255;
            color.blue  := 255;
         end;
         write(outfile, chr(color.red), chr(color.green), chr(color.blue));
      end;
   end;
 
   close(outfile);
   writeln(GetTickCount() - tm,'ms');
   readln;
end.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: FPC 3.1.1. vs 2.6.4
« Reply #13 on: March 05, 2015, 01:08:38 pm »
Quote from: airpas on March 05, 2015, 11:45:49 am
in this example fpc 2.6.4 is faster than 3.1.1 (fpc 2.6.4 show 1482ms , fpc 3.1.1 show 1684ms)

Unless you're running a single-tasking OS, differences of milliseconds or even several seconds are negligible in a multitasking environment. Below is the result from my machine, executed 10 times in a row using a bash for loop:
Code:
2.6.4 - 3.1.1
1760.00038628 ms - 1493.99964139 ms
2843.00032072 ms - 2430.00027258 ms
2800.99944212 ms - 3722.00002894 ms
3038.00026886 ms - 2577.99974643 ms
2724.99967832 ms - 2460.99990327 ms
3045.00020575 ms - 2432.99952708 ms
2713.00014574 ms - 2427.00038943 ms
2739.99972269 ms - 2781.00043070 ms
2783.00014324 ms - 2688.00028134 ms
2759.99999139 ms - 2790.99962208 ms
As you can see, the execution time can differ by up to 2+ seconds even between runs of the same binary. Between the two different binaries, the spread is similar. If you can show a consistent (i.e. executed at least 10 times in a row) difference of 5 seconds, that's what you can call slower.
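
The "run it at least 10 times in a row" advice can also be followed inside the program instead of from a shell loop. The following is only a minimal sketch under stated assumptions: the program name, the `Runs` constant, the `Workload` placeholder and the `Sink` variable are made up for illustration and are not taken from the benchmark above, and timing via `Now` from SysUtils is a choice made here to avoid depending on the windows unit. `Now` has limited resolution, so the workload should run long enough for that not to matter.

```pascal
program benchrepeat;
{$mode objfpc}

uses
  SysUtils;  // provides Now and MSecsPerDay

const
  Runs = 10;  // assumed repeat count ("at least 10 times in a row")

var
  Sink: Int64;  // global result, so the loop has an observable effect

// Hypothetical placeholder workload; substitute the code under test.
procedure Workload;
var
  i: LongInt;
  acc: Int64;
begin
  acc := 0;
  for i := 1 to 50000000 do
    acc := acc + (i and 255);
  Sink := acc;
end;

var
  run: Integer;
  t0: TDateTime;
  ms, minMs, maxMs, sumMs: Double;
begin
  minMs := 0.0;
  maxMs := 0.0;
  sumMs := 0.0;
  for run := 1 to Runs do
  begin
    t0 := Now;
    Workload;
    // TDateTime counts days, so scale the difference to milliseconds.
    ms := (Now - t0) * MSecsPerDay;
    if (run = 1) or (ms < minMs) then minMs := ms;
    if (run = 1) or (ms > maxMs) then maxMs := ms;
    sumMs := sumMs + ms;
    WriteLn('run ', run:2, ': ', ms:0:2, ' ms');
  end;
  // Report the spread, not just a single number.
  WriteLn('min ', minMs:0:2, ' ms, mean ', (sumMs / Runs):0:2,
          ' ms, max ', maxMs:0:2, ' ms');
end.
```

Building the same source with both compiler versions and comparing the minimum and the spread of the runs, rather than one measurement each, makes it easier to tell a real difference from scheduling noise.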

airpas

  • Full Member
  • ***
  • Posts: 179
Re: FPC 3.1.1. vs 2.6.4
« Reply #14 on: March 05, 2015, 01:38:08 pm »
In the example the time difference is an integer; how did you get floating-point values? Maybe you're on Linux.
I retested more than 10 times: always 2.6.4 <= 1450 ms and 3.1.1 >= ~1600 ms.
Try increasing ixmax and iymax.

 
