
Topic: issues with fpc numlib

wp

  • Hero Member
  • Posts: 11916
issues with fpc numlib
« on: October 02, 2011, 06:40:26 pm »
While writing a curve-fitting series for TAChart using the polynomial fitting routine in FPC's numlib (unit ipf, procedure ipfpol), I found several errors:

- There is a typo in the comment documenting the ipfpol procedure. It says "Calculate n-degree polynomial b for dataset (x,y) with n elements using the least squares method.", but this should read "... with m elements...".

- There seems to be a memory allocation issue somewhere within ipfpol. When fitting an n-degree polynomial, it should be sufficient to allocate n+1 elements for the parameter array, since there are n+1 fitting parameters. However, the program crashes in this situation; when n+2 elements are allocated, it runs fine.

- The fit results are incorrect when there is a large difference between the input data and the fitted curve. This can be verified by comparing with, e.g., gnuplot.

The attachment contains a short project demonstrating the last issue. It creates three test data sets and fits a straight line (y = a + bx) to each of them. Only the straight-line data set is fitted well; the other two (parabola and exponential) are fitted with an incorrect y-axis intercept (compared with gnuplot).
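For reference, the straight-line fit the demo asks for has a closed-form least-squares solution, which gives an independent check against both ipfpol and gnuplot. This is a minimal Python sketch; the function name `fit_line` is hypothetical and not part of numlib or TAChart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums avoid the cancellation of the textbook sum(x*x) formula.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Parabola y = x**2 sampled at x = 0..4, fitted with a straight line.
    print(fit_line([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))  # → (-2.0, 4.0)
```

So even for the parabola data set, a correct straight-line fit has a well-defined intercept (here -2), which both programs should reproduce.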

Please note that the "Call gnuplot" button works only if gnuplot is installed. I am on 64-bit Windows and have gnuplot in "C:\Program Files (x86)\gnuplot\binary"; in other cases, modify the gnuplot path in the procedure "ExecFit" accordingly.

Is there anybody out there who knows more about the internals of numlib than I do and who could fix these issues? I will also post this report in the Lazarus Bug Tracker.

marcov

  • Administrator
  • Hero Member
  • Posts: 11451
  • FPC developer.
Re: issues with fpc numlib
« Reply #1 on: October 02, 2011, 09:21:23 pm »
I don't think it is fair to compare something like numlib's ipfpol with gnuplot.

Programs like gnuplot usually spend more time characterizing and scanning the input, and then select the algorithm appropriately.

"ipfpol" is just one of those algorithms.

wp

  • Hero Member
  • Posts: 11916
Re: issues with fpc numlib
« Reply #2 on: October 02, 2011, 09:30:49 pm »
Sure, ipfpol is very limited compared to gnuplot. But the demo requires just a simple linear fit. If that did not work, ipfpol would be useless.
« Last Edit: October 02, 2011, 09:35:32 pm by wp »

marcov

  • Administrator
  • Hero Member
  • Posts: 11451
  • FPC developer.
Re: issues with fpc numlib
« Reply #3 on: October 02, 2011, 10:10:06 pm »
Quote
Sure, ipfpol is very limited compared to gnuplot. But the demo requires just a simple linear fit. If that did not work, ipfpol would be useless.

Least squares is typically quite sensitive to outliers. It is usually used as a primitive to build more complex methods on.

For example, in Delphi code (using TPMath) I iterate several times, removing outliers with extreme relative residuals in each iteration.
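The iterative approach described above can be sketched as follows. This is not marcov's TPMath code: it is a minimal Python illustration that assumes a straight-line model, interprets "relative residual" as the residual divided by the RMS residual, and uses a hypothetical cutoff of 2.5. All names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_line_trimmed(xs, ys, threshold=2.5, max_iter=5, min_points=3):
    """Repeatedly fit a line, dropping the single point with the largest
    scaled residual |r| / RMS(r) as long as it exceeds `threshold`.
    Returns (a, b, number_of_points_kept)."""
    xs, ys = list(xs), list(ys)
    a, b = fit_line(xs, ys)
    for _ in range(max_iter):
        if len(xs) <= min_points:
            break
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        rms = math.sqrt(sum(r * r for r in res) / len(res))
        # A (numerically) perfect fit has nothing left to remove.
        if rms <= 1e-12 * max(1.0, max(abs(y) for y in ys)):
            break
        worst = max(range(len(res)), key=lambda i: abs(res[i]))
        if abs(res[worst]) / rms <= threshold:
            break  # no remaining point is extreme enough to drop
        del xs[worst], ys[worst]
        a, b = fit_line(xs, ys)  # refit without the outlier
    return a, b, len(xs)
```

Removing one point per pass and refitting matters because an outlier also pulls the fit toward itself, so the residuals of the contaminated fit understate how far it really lies from the rest of the data.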

I meanwhile checked the out of bounds array access. I think I know where it happens, but it is not easily correctable without reverse engineering the code.

If you can read Dutch, here are some unprocessed scans of the docs:

http://www.stack.nl/~marcov/numlib/

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: issues with fpc numlib
« Reply #4 on: October 02, 2011, 10:35:05 pm »
Thank you for the docs. I can't read Dutch, but being German I can at least get an impression of what's going on.

Quote
removing outliers with extreme relative residuals in each iteration
I didn't do that, and I don't think gnuplot did either. So I still don't see why the two programs get different results; both are just minimizing the sum of squared residuals. But just by looking at the ipfpol result one can see that it is not the minimum of the fit residuals (see "ipfpol-result.png"): the fitted line lies almost entirely on one side of the input data, in contrast to the gnuplot fit ("gnuplot-results.png"), where the straight line goes right through the data.
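This argument can be made precise. For a straight line y = a + bx fitted to m points by least squares, setting the derivative with respect to the intercept a to zero forces the residuals to sum to zero. Unless every point is fitted exactly, positive and negative residuals must therefore both occur, so a line lying almost entirely on one side of the data cannot be the least-squares minimum:

```latex
S(a, b) = \sum_{i=1}^{m} r_i^2, \qquad r_i = y_i - a - b\,x_i

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{m} r_i = 0
\quad\Longrightarrow\quad \sum_{i=1}^{m} r_i = 0

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{m} x_i\, r_i = 0
```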

Quote
I meanwhile checked the out of bounds array access. I think I know where it happens
Give me a hint and I'll try to debug it myself.


« Last Edit: October 03, 2011, 01:18:14 am by wp »

Ask

  • Global Moderator
  • Hero Member
  • Posts: 687
Re: issues with fpc numlib
« Reply #5 on: October 03, 2011, 05:24:54 am »
I agree that numlib has many problems and is in need of serious overhaul.
I have decided to use it in TAChart since it is already bundled with FPC,
so I thought that, even if imperfect, it is still better than importing yet another library.

I'd like to see numlib improved, but have no time to do that now.
As a rough plan, first I'd recommend getting rid of the archaic manual memory allocation
and using dynamic arrays everywhere. This will allow you to use range checks.
Second, I would work on the naming. I am not sure how much code is using numlib now, but I guess very little. So I'd recommend mass-renaming units and procedures to get away from the 1960s-era FORTRAN naming style.
Third, the code should be brought to a consistent styling.
After that, the real work may begin on improving and fixing the code.

If you modify numlib, note that TAChart includes a modified copy to cater for
version mismatches between Lazarus and FPC. I plan to synchronize the copies after each FPC release.

So you'll have to send your changes both to FPC devels and me, sorry for the inconvenience.

wp

  • Hero Member
  • Posts: 11916
Re: issues with fpc numlib
« Reply #6 on: October 03, 2011, 12:17:47 pm »
This looks like re-inventing the wheel. How about adding another numerical library to fpc, such as tpmath? This package (under LGPL) is very complete, much superior to numlib regarding curve fitting, and still actively developed.

marcov

  • Administrator
  • Hero Member
  • Posts: 11451
  • FPC developer.
Re: issues with fpc numlib
« Reply #7 on: October 03, 2011, 12:52:16 pm »
Quote
This looks like re-inventing the wheel. How about adding another numerical library to fpc, such as tpmath? This package (under LGPL) is very complete, much superior to numlib regarding curve fitting, and still actively developed.

numlib is a large body of routines, not just curve fitting. By the way, the memory allocation is not even manual enough for me.

Ask

  • Global Moderator
  • Hero Member
  • Posts: 687
Re: issues with fpc numlib
« Reply #8 on: October 03, 2011, 01:13:16 pm »
Quote
This looks like re-inventing the wheel. How about adding another numerical library to fpc, such as tpmath? This package (under LGPL) is very complete, much superior to numlib regarding curve fitting, and still actively developed.

I would certainly support getting a better math library,
and TPMath (actually, I think DMath variant is more fitting) looks like a decent candidate.
However, there are some hurdles:
1) DMath should be a strict superset of numlib. For example, from a quick look, I do not see any spline-related functions in DMath. Maybe I just missed them?
2) DMath contains a lot of auxiliary code, such as string and BGI graphics routines, which IMO should not be added to FPC.
3) Finally, you have to convince FPC developers, not me, which is very hard.

So until the above happens, we are stuck with numlib.

wp

  • Hero Member
  • Posts: 11916
[Solved] issues with fpc numlib
« Reply #9 on: October 03, 2011, 11:27:37 pm »
I found the problem: no problem! I had called the ipfpol procedure with the understanding that the parameter n would be the number of fitting parameters, but as written in the unit, it is the degree of the polynomial.

After changing that, the array overrun error is gone, and the fit results are correct.
« Last Edit: October 03, 2011, 11:33:12 pm by wp »
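The difference between "degree" and "number of fitting parameters" is easy to see in a small sketch. The Python function below is hypothetical and does not reproduce ipfpol's signature or internals (it simply solves the normal equations, which may not be what numlib does); like ipfpol, it takes the degree n and returns n + 1 coefficients. Passing 2 when a straight line (two parameters) was intended silently fits a parabola instead, with a different intercept.

```python
def polyfit_degree(xs, ys, n):
    """Least-squares fit of a polynomial of *degree* n.

    Returns n + 1 coefficients [b0, b1, ..., bn] with
    y ~ b0 + b1*x + ... + bn*x**n.
    """
    k = n + 1  # number of fitting parameters = degree + 1
    if len(xs) != len(ys) or len(xs) < k:
        raise ValueError("need at least n + 1 data points")
    # Normal equations A b = r, A[i][j] = sum x^(i+j), r[i] = sum y * x^i
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        if a[piv][col] == 0:
            raise ValueError("singular normal equations")
        a[col], a[piv] = a[piv], a[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= f * a[col][j]
            r[row] -= f * r[col]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (r[i] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b
```

For the parabola data x = 0, 1, 2, 3, 4 with y = x², `polyfit_degree(xs, ys, 1)` returns two coefficients close to [-2, 4] (the straight line), while `polyfit_degree(xs, ys, 2)` returns three coefficients close to [0, 0, 1]. A caller that sizes the result array for two values but passes n = 2 would also overrun it, since three coefficients are written.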

 
