Recent

Author Topic: fpc lines of code  (Read 14865 times)

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: fpc lines of code
« Reply #15 on: March 07, 2015, 12:49:46 pm »
Ah, alright. Now that's something I guess. Been stupid, associated "hardware" with some sort of FPGA or similar... ::)

The attached is the best I can achieve on my machine. Got another 150ms or so down by using fast buffered file write (using the oldstyle Turbopascal blockwrite routine) instead of writing three bytes at a time, which causes a lot of overhead. It is now 1280ms here with FPC 3.1.1.

Code: [Select]
{$mode objfpc}
program mandelbrot;
uses sysutils;

{$FPUTYPE SSE3}

const
   ixmax = 2500;
   iymax = 2000;
   cxmin = -2.5;
   cxmax =  1.5;
   cymin = -2.0;
   cymax =  2.0;
   pixelwidth  = (cxmax - cxmin) / ixmax;
   pixelheight = (cymax - cymin) / iymax;
   maxcolorcomponentvalue = 255;
   maxiteration = 200;
   escaperadius = 2.0;
   er2 = escaperadius * escaperadius;
   ret = #13#10;

type
   colortype = object
      red   : byte;
      green : byte;
      blue  : byte;
      procedure SetColor (r,g,b: byte);
   end;

   procedure colortype.SetColor (r,g,b: byte);
   begin
     red := r; green := g; blue := b;
   end;

var
   n, ix, iy   : dword;
   cx, cy      : real;
   filename    : string = 'new1.ppm';
   outfile     : file;
   color       : colortype;
   tm          : TDateTime;
   LineStr     : ansistring;

function Iterate (const cx,cy,er2: real): dword; inline;
var i: dword;
    zx,zy,zx2,zy2: real;
begin
  i := 0; zx := 0; zy := 0; zx2 := 0; zy2 := 0;
  while  (zx2 + zy2 < er2) and (i < maxiteration) do
  begin
     zy := cy + zx*zy*2;
     zx := cx + zx2 - zy2;
     zx2 := zx*zx;
     zy2 := zy*zy;
     inc (i);
  end;
  result := i;
end;


begin
   tm := Now;
   {$I-}
   assign  (outfile, filename);
   rewrite (outfile, 1);
   if ioresult <> 0 then
   begin
      writeln(stderr, 'unable to open output file: ', filename);
      exit;
   end;
   {$I+}
   LineStr := 'P6'+ ret
              +' # '+ret
              +' '+IntToStr (ixmax)+ret
              +' '+IntToStr (iymax)+ret
              +' '+IntToStr (maxcolorcomponentvalue)+ret;

   blockwrite (outfile,LineStr[1], length (LineStr));

   SetLength (LineStr, sizeof (colortype) * ixmax);
   for iy := 0 to iymax-1 do
   begin
      cy := cymin + iy*pixelheight;
      if abs(cy) < pixelheight / 2 then cy := 0.0;
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         if iterate (cx,cy,er2) = maxiteration
         then Color.SetColor (0,0,0)
         else Color.SetColor (255,255,255);
         n := ix*3;
         LineStr[n+1] := chr (color.red);
         LineStr[n+2] := chr (color.green);
         LineStr[n+3] := chr (color.blue);
      end;
      blockwrite(outfile, LineStr[1], length (LineStr));
   end;

   close(outfile);
   writeln((Now-tm)*3600000*24:0:0,'ms');
   readln;
end.


Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: fpc lines of code
« Reply #16 on: March 07, 2015, 01:43:18 pm »
Some suggestions for your new code:

Change
Code: [Select]
SetLength (LineStr, sizeof (colortype) * ixmax); to 
Code: [Select]
SetLength (LineStr, sizeof (colortype) * ixmax * iymax);
Move
Code: [Select]
blockwrite(outfile, LineStr[1], length (LineStr)); out of for loop and to before
Code: [Select]
close(outfile);

Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: fpc lines of code
« Reply #17 on: March 07, 2015, 04:07:06 pm »
Code: [Select]
project1.lpr:92                   LineStr[n+1] := chr (color.red);
00401953 b8c0c24100               mov    $0x41c2c0,%eax
00401958 e8a3100000               call   0x402a00 <fpc_ansistr_unique>
0040195D 8b1500c04100             mov    0x41c000,%edx
00401963 8d5201                   lea    0x1(%edx),%edx
00401966 8a0da0c24100             mov    0x41c2a0,%cl
0040196C 884c10ff                 mov    %cl,-0x1(%eax,%edx,1)
project1.lpr:93                   LineStr[n+2] := chr (color.green);
00401970 b8c0c24100               mov    $0x41c2c0,%eax
00401975 e886100000               call   0x402a00 <fpc_ansistr_unique>
0040197A 8b1500c04100             mov    0x41c000,%edx
00401980 8d5202                   lea    0x2(%edx),%edx
00401983 8a0da1c24100             mov    0x41c2a1,%cl
00401989 884c10ff                 mov    %cl,-0x1(%eax,%edx,1)
project1.lpr:94                   LineStr[n+3] := chr (color.blue);
0040198D b8c0c24100               mov    $0x41c2c0,%eax
00401992 e869100000               call   0x402a00 <fpc_ansistr_unique>
00401997 8b1500c04100             mov    0x41c000,%edx
0040199D 8d5203                   lea    0x3(%edx),%edx
004019A0 8a0da2c24100             mov    0x41c2a2,%cl
004019A6 884c10ff                 mov    %cl,-0x1(%eax,%edx,1)

It seems to wise to drop using string variable as place to save results.

airpas

  • Full Member
  • ***
  • Posts: 179
Re: fpc lines of code
« Reply #18 on: March 07, 2015, 04:49:37 pm »
visual test , based on Nitorami optimization , i got 577ms , with -O4 -XX -WG -XS -Cfsse2
Code: [Select]
program mandelbrot;
{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
uses windows;

const
   w = 800;
   h = 600;
   ixmax = 2000;
   iymax = 2000;
   cxmin = -2.5;
   cxmax =  1.5;
   cymin = -2.0;
   cymax =  2.0;
   pixelwidth  = (cxmax - cxmin) / ixmax;
   pixelheight = (cymax - cymin) / iymax;
   maxcolorcomponentvalue = 255;
   maxiteration = 200;
   escaperadius = 2.0;
   er2 = escaperadius * escaperadius;

var
  wc : twndclassex;
    wnd: hwnd;
    msg: tmsg;
    bmp: HBITMAP;
    fnt : HFONT;
    dc,back_dc : longword;
    data : plongword;
    ix, iy   : longword;
    cx, cy      : real;
    tm          : longword;


function inttostr(i : integer):ansistring;
begin
  str(i,inttostr);
end;

procedure setup(win : longword);
var
  binfo : BITMAPINFO;
begin
 
  dc := GetDC(win);
  back_dc := CreateCompatibleDC(dc);

  fillchar(binfo,sizeof(BITMAPINFO),0);
  binfo.bmiHeader.biSize   := sizeof(BITMAPINFOHEADER);
  binfo.bmiHeader.biWidth  := ixmax;
  binfo.bmiHeader.biHeight := -iymax;
  binfo.bmiHeader.biPlanes      :=1;
  binfo.bmiHeader.biBitCount    :=32;
  binfo.bmiHeader.biCompression :=BI_RGB;
  binfo.bmiHeader.biSizeImage   :=0;
  binfo.bmiHeader.biClrUsed     :=256;
  binfo.bmiHeader.biClrImportant:=256;

  bmp := CreateDIBSection(back_dc,binfo,DIB_RGB_COLORS,data,0,0);
  selectObject(back_dc,bmp);
  fnt := CreateFont(14, 0, 0, 0, FW_LIGHT, 0, 0, 0,
                    ANSI_CHARSET, OUT_TT_PRECIS, CLIP_DEFAULT_PRECIS, 0,
    FF_DONTCARE or DEFAULT_PITCH, 'tahoma');
  SelectObject(dc,fnt);
 
end;

procedure clean();
begin
  deleteDC(back_dc);
  deleteDC(dc);
  deleteObject(bmp);
  deleteObject(fnt);
end;

function Iterate (const cx,cy,er2: real): dword; inline;
var i: dword;
    zx,zy,zx2,zy2: real;
begin
  i := 0; zx := 0; zy := 0; zx2 := 0; zy2 := 0;
  while  (zx2 + zy2 < er2) and (i < maxiteration) do
  begin
     zy := cy + zx*zy*2;
     zx := cx + zx2 - zy2;
     zx2 := zx*zx;
     zy2 := zy*zy;
     inc (i);
  end;
  result := i;
end;

procedure draw();
var
s : ansistring;
c : longword;
begin
   tm := GetTickCount;
 
   for iy := 0 to iymax-1 do
   begin
      cy := cymin + iy*pixelheight;
      if abs(cy) < pixelheight / 2 then cy := 0.0;
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         if iterate (cx,cy,er2) = maxiteration
         then c := 0
         else c := $ffffff;
         data[ (iymax * iy ) +  ix]  := c;
      end;
   
   end;
   SetStretchBltMode(dc,HALFTONE);
   StretchBlt(dc,0,0,w,h,back_dc,0,0,ixmax,iymax,SRCCOPY);
   
   s := inttostr(GetTickCount()-tm) + 'ms';
   SetBKMode(dc,TRANSPARENT);
   TextoutA(dc,0,0,pchar(s),length(s));
   
end;

function wndproc(wnd, message:longword;wParam,lParam: longint):longint; stdcall;
begin
    case (message) of
     WM_CLOSE:
           ExitProcess(0);
    end;       
    wndproc := DefWindowProc (wnd, message, wParam, lParam);
end;

begin
    FillChar(wc,sizeof(wc),0);

wc.cbSize        := sizeof(twndclassex);
    wc.style         := CS_HREDRAW or CS_VREDRAW;
    wc.lpfnWndProc   := @wndproc;
    wc.hbrBackground := 1;
    wc.lpszClassName := 'wclass';

    registerclassex (@wc);

wnd := CreateWindowEx(WS_EX_TOOLWINDOW or WS_EX_APPWINDOW ,
                  'wclass',
                  'mandelbrot',
                   WS_OVERLAPPEDWINDOW or WS_VISIBLE,
                   GetSystemMetrics( SM_CXSCREEN ) div 2 -  w  div 2,
                   GetSystemMetrics( SM_CYSCREEN ) div 2 -  h  div 2,
                   w,
                   h,
                   0,
                   0,
                   0,
                   nil);

  setup(wnd);
  draw();
while ( msg.message <> WM_QUIT ) do
begin
if( PeekMessage( @msg, 0, 0, 0, PM_REMOVE ) ) then
begin
TranslateMessage( @msg );
DispatchMessage( @msg );
end else
begin
         
        end;
end;
clean();
    unregisterclass(@wc,hinstance);
    ExitProcess(wnd);
end.

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: fpc lines of code
« Reply #19 on: March 07, 2015, 05:37:38 pm »
@Cyrax: You mean buffer the whole image in memory instead of a scanline only. The drawback is, that the image has a size of 15MB, and the compiler needs to allocate memory and clear it by SetLength (). That adds so much overhead that in the end the overall execution time gets higher.

As to the string variable, I only used it because we need to write the file header first ("P6" etc), and it was convenient to use a string to spare the hassle with type conversion. Otherwise, I would have used a dynamic array. The trick is that the access to string or array is fast, while a call to writeln () takes a while. Therefore better buffer a decent amount of bytes (more than a few kB is normally not necessary and has not further benefit), then call writeln and flush the whole string or array in one go.

@airpas: Hey, you are cheating  :-[ Your original code had ixmax=2500, now it is 2000...

Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: fpc lines of code
« Reply #20 on: March 07, 2015, 06:52:51 pm »
You only need to allocate needed memory once during execution of the program and in this situation memory requirement wont change, either. So there is no overhead at all. Also you don't need time memory allocation or other unneeded things, only what matters is  performance of the algorithm.

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: fpc lines of code
« Reply #21 on: March 07, 2015, 07:34:26 pm »
Hm, I understand that SetLength (AnsiString) does allocate memory, although this is different from a dynamic array, where SetLength allocates memory and clears it, which does take time (although not a lot for 15MB).

Still, the whole takes distinctly longer when using a whole image buffer, maybe due to cache issues again (not sure whether the cache is used at all in such cases, but 15MB is a lot larger than my 2nd level cache on my CPU).

The reason for using SetLength is that when adding characters to an Ansistring (or dynamic array) one by one e.g. by LineStr := LineStr + 'P', this may be getting really slow because new memory needs to be allocated on every call. When setting the required length at the start, this is not necessary.

lagprogramming

  • Sr. Member
  • ****
  • Posts: 405
Re: fpc lines of code
« Reply #22 on: March 07, 2015, 08:35:40 pm »
Out of curiosity.

Within "procedure draw();"
Remove variable "c : longword;" and remove a branch by changing
Code: [Select]
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         if iterate (cx,cy,er2) = maxiteration
         then c := 0
         else c := $ffffff;
         data[ (iymax * iy ) +  ix]  := c;
      end;

with

Code: [Select]
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         data[ (iymax * iy ) +  ix]  := longword(longbool(iterate (cx,cy,er2) <> maxiteration));
      end;

   How will it change the code execution speed on your CPU?

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: fpc lines of code
« Reply #23 on: March 07, 2015, 11:25:13 pm »
   The trick is that the access to string or array is fast, while a call to writeln () takes a while. Therefore better buffer a decent amount of bytes (more than a few kB is normally not necessary and has not further benefit), then call writeln and flush the whole string or array in one go.
 

Isn't there a write buffer whose size you can increase to make it fast?

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: fpc lines of code
« Reply #24 on: March 08, 2015, 01:36:18 am »
@lagprogramming: On my machine not at all. This is as expected, because the time is spent in interate which does a lot of floating point operations, and the time for the  branching does not matter at all in comparison. This is different than for tiny loops such as the example in your bug report (for which I can confirm the same behaviour on my machine)

@BeniBela: write () has certainly its own buffer and it may be possible to change its size, but the issue is that to call write () character by character does entail some overhead, regardless of buffer size.
Nowadays most would use memory/filestreams for that anyway, but I have no experience how they work.
 

lagprogramming

  • Sr. Member
  • ****
  • Posts: 405
Re: fpc lines of code
« Reply #25 on: March 08, 2015, 12:30:30 pm »
I might be wrong here but changing

Code: [Select]
function Iterate (const cx,cy,er2: real): dword; inline;
var i: dword;
    zx,zy,zx2,zy2: real;
begin
  i := 0; zx := 0; zy := 0; zx2 := 0; zy2 := 0;
  while  (zx2 + zy2 < er2) and (i < maxiteration) do
  begin
     zy := cy + zx*zy*2;
     zx := cx + zx2 - zy2;
     zx2 := zx*zx;
     zy2 := zy*zy;
     inc (i);
  end;
  result := i;
end;

with

Code: [Select]
function Iterate (const cx,cy,er2: real): dword; inline;
var zx,zy,zx2,zy2: real;
begin
  Result := maxiteration; zy := 0; zx2 := 0; zy2 := 0;
  while  (zx2 + zy2 < er2) and (Result <> 0) do
  begin
     zx := cx + zx2 - zy2;
     zy := cy + zx*zy*2;
     zx2 := zx*zx;
     zy2 := zy*zy;
     dec(Result);
  end;
end;

and further modify

Code: [Select]
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         data[ (iymax * iy ) +  ix]  := longword(longbool(iterate (cx,cy,er2) <> maxiteration));
      end;

with

Code: [Select]
      for ix := 0 to ixmax-1 do
      begin
         cx := cxmin + ix*pixelwidth;
         data[ (iymax * iy ) +  ix]  := longword(longbool(iterate (cx,cy,er2) <> 0));
      end;

might increase a bit the speed.

I've removed local variable "i: dword;" by replacing it with function's result and replaced "maxiteration" comparisons with zero comparisons.
The idea was that if a value that lies in CPU register has to be compared with zero, then instead of "cmp" the compiler generates a "test" instruction in the generated assembler code, which on many CPUs improves the speed a bit.
I've also switched the order of zx,zy,zx2,zy2 computation because some CPUs might stall a little bit with the original code order.
I've also removed "zx := 0;" because it seemed useless to me.
I don't know what's the effect of the changes, there might not be a single one :).

airpas

  • Full Member
  • ***
  • Posts: 179
Re: fpc lines of code
« Reply #26 on: March 08, 2015, 12:56:17 pm »
your modification slow down the rendering x2 , and gave wrong shape .
i think the code as is , is optimized anyways , in my machine gives me 577ms , rebuild it with gcc 4.9.2 with a lots of speed switches + see2 gives 490ms  , so fpc already done a great job

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: fpc lines of code
« Reply #27 on: March 08, 2015, 06:28:57 pm »
If you want REAL speed for fractals, you should use the GPU, it is incredibly powerful. I learned this when playing with the free winamp media player which comes with a visualisation plugin called "milkdrop2". Milkdrop allows simple and wysiwig access to the GPU shaders via HLSL (high level shader language, sort of C-like but optimised for speed and massive parallel processing.)

I was fascinated what can be done with shaders. See this link to a movie of a code I made, it is a mandelbox rendering in realtime (the Mandelbox is a 3D version of the Mandelbrot, and the code allows you to fly through the structure at fullscreen resolution and in realtime. This is not pre-rendered or something)

https://www.youtube.com/watch?feature=player_embedded&v=HhPuI4QHR4s
 

 

TinyPortal © 2005-2018