Author Topic: Numerical Speed test Delphi 7 versus Lazarus programs  (Read 18833 times)

crorden

  • New Member
  • *
  • Posts: 36
Numerical Speed test Delphi 7 versus Lazarus programs
« Reply #15 on: November 16, 2007, 10:30:08 pm »
I have documented this, with examples of the results and code, in the FPC bug tracker:

http://www.freepascal.org/mantis/view.php?id=10201

crorden

  • New Member
  • *
  • Posts: 36
Numerical Speed test Delphi 7 versus Lazarus programs
« Reply #16 on: November 17, 2007, 07:19:45 pm »
Ruma-
I have added new findings to the bug-tracking details:
http://www.freepascal.org/mantis/view.php?id=10201

In short, it appears that shortints are not optimized to the same extent as integers. You can get similar performance with shortints as with integers by using
 inc(s,2)
instead of
 s := s + 2;
and by using
 dec(s,2)
instead of
 s := s - 2;
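
A minimal sketch for reproducing this comparison is given here. It is not the code attached to the bug report: the program name, the loop count kLoops and the two function names are made up for illustration, and GetTickCount64 from SysUtils assumes a reasonably current FPC. Each loop adds and then subtracts 2 so that the shortint never leaves its range. An optimizing compiler may simplify such a tight loop, so any timings are only a rough indication.

```pascal
program ShortIntIncDemo;
{$mode objfpc}
{$R-}{$Q-} // range and overflow checks off, so that only the arithmetic is timed

uses
  SysUtils;

const
  kLoops = 100000000; // hypothetical iteration count, adjust to your machine

// Uses plain assignment: s := s + 2 and s := s - 2
function AddAssign: shortint;
var
  i: integer;
  s: shortint;
begin
  s := 0;
  for i := 1 to kLoops do begin
    s := s + 2;
    s := s - 2;
  end;
  Result := s;
end;

// Uses inc(s,2) and dec(s,2) instead
function IncDec: shortint;
var
  i: integer;
  s: shortint;
begin
  s := 0;
  for i := 1 to kLoops do begin
    inc(s, 2);
    dec(s, 2);
  end;
  Result := s;
end;

var
  t0: QWord;
  r: shortint;
begin
  t0 := GetTickCount64;
  r := AddAssign;
  WriteLn('s := s + 2 / s := s - 2: ', GetTickCount64 - t0, ' ms, result ', r);

  t0 := GetTickCount64;
  r := IncDec;
  WriteLn('inc(s,2) / dec(s,2):     ', GetTickCount64 - t0, ' ms, result ', r);
end.
```

Both functions should return 0, which gives a quick check that the two forms compute the same thing before the timings are compared.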

The new notes also suggest that the poor performance of doubles relative to single precision floating points may be due to poor byte alignment.

-c

ruma1974

  • New member
  • *
  • Posts: 7
Numerical Speed test Delphi 7 versus Lazarus programs
« Reply #17 on: November 19, 2007, 09:33:22 pm »
Thanks for submitting it to the FPC bug tracker. I am afraid that inc(s,3) and dec(s,2) do not improve the performance for me. However, I still have to study your info in more detail.

Rune

crorden

  • New Member
  • *
  • Posts: 36
bounties added
« Reply #18 on: November 24, 2007, 09:56:00 pm »
I have posted two bounties for anyone who can help improve the speed of FPC/Lazarus for a specific numerical problem. I hope the solution could help the speed of numerical processing in general...

http://www.sph.sc.edu/comd/rorden/mricron/bounty/

http://wiki.lazarus.freepascal.org/Bounties#Multi-platform_bounties

Marius

  • New Member
  • *
  • Posts: 31
RE: bounties added
« Reply #19 on: November 25, 2007, 04:38:49 am »
Unfortunately I cannot improve the internal CPU routines.

Your smooth procedure does a lot of multiplications (~350 million for this test), so I thought it was better to avoid the multiplications and do some precalculations instead. Here are my results on a (slow) 2400 MHz P4 ;-)

Delphi 7: newsmooth/oldaligned = 4125/4391 (~6% speed increase)
Lazarus: newsmooth/oldaligned = 4094/4703 (~13% speed increase)

Have fun,
Marius

procedure SmoothInput(lFWHM: integer);
{$ALIGN 8}
var i, lcutoffvoxx, lY, lX, lMin, lMax, lPos, lYPos: integer;
  lDataBuffer: array of array of single;
  lsigma, lexpd, lcumgauss: single;
  lTempBuff: array of byte;
  lxra: array of single;
  pbyte1: pbyte;
begin
  //Calculate static data
  lsigma :=(lFWHM) / sqrt(8 * ln(2));
  lcutoffvoxx := round(6 * lsigma);
  lexpd := 2 * lsigma * lsigma;
  Setlength(lTempBuff, gSrcWid * gSrcHt);

  //Calculate lxra tables
  SetLength(lxra, lcutoffvoxx + 1);
  lCumGauss := 0;
  for i := 0 to lcutoffvoxx do begin
    lxra[i] := exp( - 1 *(i * i) / lexpd);
    lCumGauss := lCumGauss + lxra[i];
  end;
  lCumGauss := 2 * lCumGauss - lxra[0];
  if lCumGauss <> 0 then begin
    for i := 0 to lcutoffvoxx do begin
      lxra[i] := lxra[i] / lCumGauss;
    end;
  end;

  //Precalculate to avoid multiplications in inner loop (reduce it to a sum)
  //Dynamic arrays are really surprisingly efficient (and clearer to read :P)
  SetLength(lDataBuffer, 256, lcutoffvoxx + 1);
  for lx := 0 to 255 do begin
    for ly := 0 to lcutoffvoxx do begin
      lDataBuffer[lx, ly] := lx * lxra[ly];
    end;
  end;

  //Smooth horizontally
  lyPos := 0;
  for lY := 0 to gSrcHt - 1 do begin
    for lX := 0 to gSrcWid - 1 do begin
      lMin := lX - lCutoffVoxX;
      if lMin < 0
      then lMin := 0;
      lMax := lX + lCutoffVoxX;
      if lMax >= gSrcWid
      then lMax := gSrcWid - 1;

      lCumGauss := 0;
      pbyte1 := @gBuff^[lYPos + lMin];
      for lPos := lMin to lMax do begin
        lCumGauss := lCumGauss + lDataBuffer[pbyte1^, abs(lX - lPos)];
        inc(pByte1);
      end;
      lTempBuff[lX + lYPos] := round(lCumGauss);
    end;
    inc(lyPos, gSrcWid);
  end;

  //Smooth vertically
  for lX := 0 to gSrcWid - 1 do begin
    lyPos := 0;
    for lY := 0 to gSrcHt - 1 do begin
      lMin := lY - lCutoffVoxX;
      if lMin < 0
      then lMin := 0;
      lMax := lY + lCutoffVoxX;
      if lMax >= gSrcHt
      then lMax := gSrcHt - 1;

      lCumGauss := 0;
      pbyte1 := @lTempBuff[(lMin * gSrcWid) + lX];
      for lPos := lMin to lMax do begin
        lCumGauss := lCumGauss + lDataBuffer[pbyte1^, abs(lY - lPos)];
        inc(pbyte1, gSrcWid);
      end;
      gSmoothBuff^[lYPos + lX] := round(lCumGauss);
      inc(lyPos, gSrcWid);
    end;
  end;
end;

Marc

  • Administrator
  • Hero Member
  • *
  • Posts: 2512
Numerical Speed test Delphi 7 versus Lazarus programs
« Reply #20 on: November 26, 2007, 11:16:06 am »
In normal cases, variables are always aligned to their size.

Also, the cited article compares apples and oranges to some extent and completely misses the point of what the $A directive does.
It compares 1-byte alignment versus 16-byte alignment, while in Delphi and FPC doubles are 8-byte aligned (unless you are using them in a packed array or have tweaked the $A directive), so the given numbers don't say anything about normal double operation.
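
As a rough illustration of the difference between default and packed layout, the following sketch prints where a double ends up inside a record. The type names TNatural and TPacked are invented for this example. The packed record always places the double at offset 1, giving a size of 9 bytes; the offset in the non-packed record depends on the target platform and the active $A / $PACKRECORDS setting, so no value is claimed for it here.

```pascal
program AlignDemo;
{$mode objfpc}

type
  // Default layout: the compiler may insert padding before Value
  TNatural = record
    Flag: byte;
    Value: double;
  end;

  // Packed layout: no padding, so Value starts right after Flag
  TPacked = packed record
    Flag: byte;
    Value: double;
  end;

var
  n: TNatural;
  p: TPacked;
begin
  // Offset of the double field = address of field minus address of record
  WriteLn('default: offset ', PtrUInt(@n.Value) - PtrUInt(@n),
          ', size ', SizeOf(TNatural));
  WriteLn('packed : offset ', PtrUInt(@p.Value) - PtrUInt(@p),
          ', size ', SizeOf(TPacked));
end.
```

Running this with different $A or $PACKRECORDS settings shows which layouts the directive actually changes; plain local or global double variables are not affected by the packed case at all.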
//--
{$I stdsig.inc}
//-I still can't read someones mind
//-Bugs reported here will be forgotten. Use the bug tracker

Marius

  • New Member
  • *
  • Posts: 31
Numerical Speed test Delphi 7 versus Lazarus programs
« Reply #21 on: November 26, 2007, 02:08:27 pm »
Yes, I was already uncertain about including the align directive, since it's more of a compiler matter. I later realized it wasn't doing much good, and it also had very little effect on the stack (obviously ;-))
