Author Topic: Measuring distances between data points  (Read 15583 times)

wp

  • Hero Member
  • *****
  • Posts: 7534
Re: Measuring distances between data points
« Reply #15 on: September 01, 2012, 06:03:23 pm »
Although it is hard to imagine that anybody would be interested in the distance from a data point to another one on an invisible series, I can imagine that there might be some quirky application in which limitations can only be overcome in this strange way. Therefore, I agree that my proposal might be too much of a limitation, particularly because a workaround with AffectedSeries is available for my special case. After adapting my main project accordingly, speed is not an issue any more.

Quote
bounds calculation also requires a linear scan
But not whenever the mouse moves. And as far as I understand, the bounds are updated automatically whenever new data are added - so no "scanning" involved.

Quote
To help general case
The general case of XY distance measurement is very exotic because 99% of all charts plot different quantities on x and y axes -- the geometrical distance between the points (2010, $1 million) and (2011, $2 million) is meaningless. Therefore I would not consider it seriously. BTW, as proposed some time ago, the default value of MeasureMode should not be cdmXY for the same reason.
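The argument about mixed units can be made concrete with a small stand-alone sketch. It does not use TAChart; the mode names mXY, mOnlyX and mOnlyY and the Distance function are local stand-ins invented for this illustration, not the tool's own identifiers.

```pascal
program DistanceModesSketch;
{$mode objfpc}{$H+}

type
  // Local stand-ins for the measure modes (names are assumptions).
  TMode = (mXY, mOnlyX, mOnlyY);

// Distance between (X1, Y1) and (X2, Y2) in the given mode.
function Distance(X1, Y1, X2, Y2: Double; Mode: TMode): Double;
begin
  case Mode of
    mOnlyX: Result := Abs(X2 - X1);
    mOnlyY: Result := Abs(Y2 - Y1);
  else
    // Geometric distance: adds squared years to squared dollars.
    Result := Sqrt(Sqr(X2 - X1) + Sqr(Y2 - Y1));
  end;
end;

begin
  // Years on x, dollars on y, as in the example above.
  WriteLn('X only: ', Distance(2010, 1e6, 2011, 2e6, mOnlyX):0:1, ' (years)');
  WriteLn('Y only: ', Distance(2010, 1e6, 2011, 2e6, mOnlyY):0:1, ' (dollars)');
  WriteLn('XY:     ', Distance(2010, 1e6, 2011, 2e6, mXY):0:1, ' (no meaningful unit)');
end.
```

With these inputs the XY value is almost entirely determined by the dollar difference, which is the effect described above. For plots where both axes carry lengths, the same formula gives a meaningful result.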

But I do agree that indexing may not be worth the effort.

Let me go one step back. The performance issues in distance measurement are important only when DatapointMode is not dpmFree. One reason why I introduced the locking mechanism was because the distance tool does not know which axes it should use to calculate axis units. I am thinking about a way to provide this information in a different way, like properties AxisIndexX, AxisIndexY for the distance tool. But what if we have several axes in the same direction but with different transformations?

« Last Edit: September 01, 2012, 06:57:07 pm by wp »
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

Ask

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 687
Re: Measuring distances between data points
« Reply #16 on: September 02, 2012, 11:20:12 am »
Quote
bounds calculation also requires a linear scan
But not whenever the mouse moves. And as far as I understand, the bounds are updated automatically whenever new data are added - so no "scanning" involved.
What I wanted to say is that this caching may, in theory, be skipped by some exotic source. For example, the caching is currently not efficient for multi-valued sources.
Anyway, I implemented the boundary check in r38478.
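The idea of a boundary check can be sketched as follows. This is only a minimal stand-alone illustration, not the code committed in r38478: the types TPt and TBox and the functions NearBox and FindNearest are hypothetical names, and the bounding box is assumed to be cached by the data source.

```pascal
program BoundaryCheckSketch;
{$mode objfpc}{$H+}

type
  TPt = record X, Y: Double; end;
  TBox = record MinX, MinY, MaxX, MaxY: Double; end;

// Hypothetical helper: True if P lies inside the box grown by R on all sides.
function NearBox(const B: TBox; const P: TPt; R: Double): Boolean;
begin
  Result := (P.X >= B.MinX - R) and (P.X <= B.MaxX + R) and
            (P.Y >= B.MinY - R) and (P.Y <= B.MaxY + R);
end;

// Returns the index of the nearest point within radius R, or -1.
// The linear scan runs only when the cheap box test passes.
function FindNearest(const Pts: array of TPt; const B: TBox;
  const P: TPt; R: Double): Integer;
var
  i: Integer;
  d, best: Double;
begin
  Result := -1;
  if not NearBox(B, P, R) then Exit;
  best := Sqr(R);
  for i := 0 to High(Pts) do begin
    d := Sqr(Pts[i].X - P.X) + Sqr(Pts[i].Y - P.Y);
    if d <= best then begin
      best := d;
      Result := i;
    end;
  end;
end;

var
  Pts: array[0..2] of TPt;
  Box: TBox;
  Probe: TPt;
begin
  Pts[0].X := 0; Pts[0].Y := 0;
  Pts[1].X := 1; Pts[1].Y := 1;
  Pts[2].X := 2; Pts[2].Y := 0;
  // Bounds assumed to be cached, not recomputed on every mouse move.
  Box.MinX := 0; Box.MaxX := 2; Box.MinY := 0; Box.MaxY := 1;

  Probe.X := 1.1; Probe.Y := 0.9;
  WriteLn(FindNearest(Pts, Box, Probe, 0.5));  // prints 1
  Probe.X := 10; Probe.Y := 10;
  WriteLn(FindNearest(Pts, Box, Probe, 0.5));  // prints -1, no scan performed
end.
```

The gain comes from series whose box is far from the mouse position: they are rejected in constant time, so only the series near the cursor pay for the scan.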

Quote from: wp
The general case of XY distance measurement is very exotic because 99% of all charts plot different quantities on x and y axes -- the geometrical distance between the points (2010, $1 million) and (2011, $2 million) is meaningless. Therefore I would not consider it seriously. BTW, as proposed some time ago, the default value of MeasureMode should not be cdmXY for the same reason.
I disagree. For example, my experience of using similar tools, limited as it is, tells me that measurement is mostly useful in scatter, GIS-oriented and other scientific plots with meaningful 2D distances, since measuring a single-axis distance is easily performed "mentally" and so does not need a specialized tool.
I am NOT saying that I am right "statistically", but only that I find your anecdotal evidence not convincing enough. We need at least a third person :)

After some experiments, I have implemented yet another variant of optimization, which works for all distance modes and is asymptotically logarithmic, but is based on a heuristic (i.e. may return wrong results). Basically, it supposes that the data is "smooth enough" so that a hierarchical gradient search will find the global minimum.
I am not 100% sure that it is useful, so I attach it as a patch instead of committing.
After applying the patch, you can control this algorithm by changing the constant in the "p.FApproxLevel := 10" line of TDataPointTool.FindNearestPoint.
Also, you can insert "exit(true)" as the first line of the InBoundaryBox function to disable the previous optimization so that it does not obscure the effect of this one.
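For readers without the patch at hand, here is one possible reading of such a hierarchical search, written as a stand-alone sketch. It is an assumption about the general technique, not the patch itself: ApproxNearest and its Level parameter are hypothetical, and the search is one-dimensional (nearest value to a target) to keep it short. Each pass samples roughly Level points of the current range, keeps the neighbourhood of the best sample, and repeats.

```pascal
program CoarseToFineSketch;
{$mode objfpc}{$H+}

// Hypothetical illustration only; not the code of the attached patch.
// Returns the index whose value is (heuristically) closest to Target.
function ApproxNearest(const Y: array of Double; Target: Double;
  Level: Integer): Integer;
var
  lo, hi, i, step, best: Integer;
begin
  lo := 0;
  hi := High(Y);
  best := lo;
  // Levels below 4 fall through to the exact scan at the end;
  // this also guarantees that the range shrinks on every pass.
  if Level >= 4 then
    while hi - lo > Level do begin
      step := (hi - lo) div Level;
      best := lo;
      i := lo;
      while i <= hi do begin
        if Abs(Y[i] - Target) < Abs(Y[best] - Target) then best := i;
        Inc(i, step);
      end;
      // Keep one step on each side of the best sample.
      if best - step > lo then lo := best - step;
      if best + step < hi then hi := best + step;
    end;
  // Exact scan of the small remaining range.
  for i := lo to hi do
    if Abs(Y[i] - Target) < Abs(Y[best] - Target) then best := i;
  Result := best;
end;

var
  Y: array of Double;
  i: Integer;
begin
  SetLength(Y, 1000);
  for i := 0 to High(Y) do
    Y[i] := i * 0.1;  // smooth, monotone test data
  WriteLn(ApproxNearest(Y, 42.3, 10));  // prints 423
end.
```

On noisy or oscillating data the best coarse sample may lie far from the true minimum, which is why such a search can return wrong results. A larger Level samples more densely per pass and is therefore slower but less likely to miss.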

Questions are:
1) Should this optimization be committed?
2) If yes, should the interface be published?
3) If yes, what should that interface be?

Quote
One reason why I introduced the locking mechanism was because the distance tool does not know which axes it should use to calculate axis units.
Was that "the only one reason" or "one of (several) reasons"?

Quote
I am thinking about a way to provide this information in a different way, like properties AxisIndexX, AxisIndexY for the distance tool. But what if we have several axes in the same direction but with different transformations?
Consider a chart with several overlapping series assigned to different axes.
If the user wants to measure distance in dpmFree mode, but always on one specific axis, he currently cannot do that.
AxisIndex properties might help in this case but, as you noted, will cause trouble for the case of multiple non-overlapping axes.
One compromise may be to implement a new mode, which does not snap the tool to the nearest point, but still uses FindNearest (possibly with increased GrabRadius) to find a series for the purpose of axis transformation.

wp

  • Hero Member
  • *****
  • Posts: 7534
Re: Measuring distances between data points
« Reply #17 on: September 03, 2012, 12:01:59 am »
Quote
I find your anecdotal evidence not convincing enough
In engineering and physics, almost all plots have different quantities on the x and y axes, but I am probably underestimating the importance of length-length plots since my field of work is not GIS.

Quote
implemented boundary check
Wow, a dramatic increase in speed. Both the modified distance demo as well as my main project gain substantially. Thank you.

Quote
implemented yet another variant of optimization which works for all distance modes and is asymptotically logarithmic, but is based on a heuristic (i.e. may return wrong results).
Not much faster, if at all, than the version after boundary check. But the snapping effect is less efficient: it is easy to miss the starting point, and then the proper axis transformation is not detected, resulting in wrong distance values. I played with several ApproxLevels: the larger values, around 10, show the behavior described, and the lower values, 0 or 1, seem to have no advantage over the boundary check alone. I tend to drop this optimization.
I have to admit that there is a good chance that my "real life" noisy data may not be appropriate, but I get the same behaviour when I replace the RandomChartSource in the distance demo by the internal ListSource that I populate with values of the sine function.

Here is another idea for an optimization -- which is not an optimization at all, but adds an event "OnGetNearestPoint" to the DataPointTool and shifts the burden of finding the optimum algorithm to the programmer by using his own procedure.

Quote
Was that "the only one reason" or "one of (several) reasons"?
My primary concern was the issue with the transformations, but the possibility to measure "exact" distances by locking to the data points was a motivation as well.

Quote
implement a new mode, which does not snap the tool to the nearest point, but still uses FindNearest (possibly with increased GrabRadius) to find a series for the purpose of axis transformation.
No opinion yet.

A remark on the snapping mode: I realized during testing today that the snapping distance line can start in "nowhere land", i.e. off of any series, and this means that the transformations may not be detected correctly. That's not what I meant. In snapping mode, the first point should always be on a data point, but the second point can be anywhere. This is in contrast to dpmLock where both points are data points, and to dpmFree where both points can be anywhere.

Ask

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 687
Re: Measuring distances between data points
« Reply #18 on: September 03, 2012, 04:16:16 am »
Quote
Not much faster, if at all, than the version after boundary check.
This is why I recommended to disable the boundary check when testing it
(or to increase the number of points by another order of magnitude).

Quote
I played with several ApproxLevels: the larger values, around 10, show the behavior described, and the lower values, 0 or 1, seem like having no advantage over the boundary check alone. I tend to drop this optimization.
Yes, ApproxLevel lower than 4 is just ignored. To get "better, but slower" approximation the ApproxLevel should be *increased* to e.g. 20.
Anyway, I find it doubtful too, so agree to drop it.

Quote
In snapping mode, the first point should always be on a data point, but the second point can be anywhere.
I did notice that behavior while testing your code, but was fully convinced that it was a bug -- I did not even mention it in the list of changes.
At least, it seems non-intuitive to me.
How about separate "DataPointModeStart" and "DataPointModeEnd" properties?
Or "dpmLockStartSnapSecond" mode -- although the latter seems very awkward.

wp

  • Hero Member
  • *****
  • Posts: 7534
Re: Measuring distances between data points
« Reply #19 on: September 03, 2012, 12:20:10 pm »
Quote
I recommended to disable the boundary check
...
To get "better, but slower" approximation the ApproxLevel should be *increased* to e.g. 20
I tested these cases. Maybe my summary was not detailed enough.

Quote
to increase the number of points by another order of magnitude
In the modified distance demo (with sine instead of random data) I went up to 250000 points per series and at first did not see any difference with and without your modifications any more. This was due to the redrawing of the entire chart in "normal" drawing mode. After switching to XOR mode, the differences were back again. The heuristic optimization (without boundary check) appeared to be a bit smoother than the boundary check even at an ApproxLevel of 30 which I needed to find the starting point of snap mode.
So, I still tend to drop the heuristic mode, but since the work is done and there may be cases where it has clearer advantages than in my tests, there would be no risk in keeping it.

Quote
How about separate "DataPointModeStart" and "DataPointModeEnd" properties?
Fine, but then we can skip either dpmLock or dpmSnap.

When working with the modified distance demo I had the idea that it would be nice to zoom in while dragging the end point. Or to seek the start point at high zoom level, zoom out, drag to the approximate position of the end point, and zoom in again to find the exact end position. Is it possible to combine tools? (Please don't implement it if it is too difficult).

Ask

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 687
Re: Measuring distances between data points
« Reply #20 on: September 03, 2012, 04:02:42 pm »
Quote
separate "DataPointModeStart" and "DataPointModeEnd" properties?
Implemented in r38498. Now you can achieve what you wanted
by setting DataPointModeStart=dpmLock, DataPointModeEnd=dpmSnap

Quote
I had the idea that it would be nice to zoom in while dragging the end point.
...
Is it possible to combine tools?

It is indeed hard to combine tools -- not so much in implementation,
but in design -- too many weird combinations and corner cases.
However, you can simply perform zooming in event handlers:

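Code:
procedure TForm1.ctDistance1BeforeKeyDown(ATool: TChartTool; APoint: TPoint);
var
  ext: TDoubleRect;
  x: Double;
begin
  // Zoom in only while Shift is held down.
  if ssShift in GetKeyShiftState then begin
    ext := Chart1.LogicalExtent;
    // Skip if the visible x range is already narrower than 10 units.
    if ext.b.x - ext.a.x >= 10 then begin
      // Show a window of +/-4 axis units around the mouse position.
      x := Chart1.XImageToGraph(APoint.X);
      ext.a.x := x - 4;
      ext.b.x := x + 4;
      Chart1.LogicalExtent := ext;
    end;
    // Mark the event as handled so that no other tool processes it.
    ATool.Handled;
  end;
end;

procedure TForm1.ctDistance1BeforeKeyUp(ATool: TChartTool; APoint: TPoint);
begin
  // Restore the full extent when the key is released.
  Chart1.ZoomFull;
  ATool.Handled;
end;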

Some notes:
1) As usual, this will work only if the chart has focus, so you need to call Chart1.SetFocus everywhere. You can probably use
"if ctDistance1.IsActive" from the form's OnKeyDown handler, but I did not test that.
2) The zooming code in the example above is very crude. Check zoom tools for the correct fixed-point zooming implementation. Zooming code can probably be refactored out from the tool... but not right now :)
3) GetKeyShiftState is declared in LCLIntf unit. Since r38500 I have added
Toolset.DispatchedShiftState property, so you can check that instead.
It should also let you capture mouse clicks, although I did not test that either.
4) Calls of "Handled" and using Before- rather than After- events are essential to this solution.
5) Another solution would be to use a timer, reset the count on every mouse move, and zoom in after a certain period of hovering over a single point (or a small area). It could be implemented similarly, using events.
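The timer idea in note 5 can be sketched without any GUI code. The following stand-alone program models only the bookkeeping; THoverState, MouseMoved and ShouldZoom are hypothetical names, and in a real application they would be called from the tool's mouse-move event and from a timer event respectively.

```pascal
program HoverZoomSketch;
{$mode objfpc}{$H+}

type
  THoverState = record
    AnchorX, AnchorY: Integer;  // where the current hover started
    StartMs: Int64;             // when it started
    Active: Boolean;
  end;

// Call on every mouse move: restarts the hover clock whenever the
// pointer leaves a small square of half-width Radius around the anchor.
procedure MouseMoved(var H: THoverState; X, Y: Integer; NowMs: Int64;
  Radius: Integer);
begin
  if (not H.Active) or (Abs(X - H.AnchorX) > Radius)
    or (Abs(Y - H.AnchorY) > Radius) then
  begin
    H.AnchorX := X;
    H.AnchorY := Y;
    H.StartMs := NowMs;
    H.Active := True;
  end;
end;

// Call on every timer tick: True once the pointer has stayed put long enough.
function ShouldZoom(const H: THoverState; NowMs, DelayMs: Int64): Boolean;
begin
  Result := H.Active and (NowMs - H.StartMs >= DelayMs);
end;

var
  H: THoverState;
begin
  H.Active := False;
  MouseMoved(H, 100, 100, 0, 3);
  MouseMoved(H, 101, 100, 200, 3);   // small jitter, clock keeps running
  WriteLn(ShouldZoom(H, 400, 500));  // FALSE: only 400 ms so far
  WriteLn(ShouldZoom(H, 600, 500));  // TRUE: hovered for 600 ms
  MouseMoved(H, 150, 100, 650, 3);   // real move, clock restarts
  WriteLn(ShouldZoom(H, 700, 500));  // FALSE: only 50 ms since restart
end.
```

Passing the time in as a parameter keeps the logic testable. The zoom itself would then be done as in the key handlers above.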

wp

  • Hero Member
  • *****
  • Posts: 7534
Re: Measuring distances between data points
« Reply #21 on: September 03, 2012, 06:08:27 pm »
Quote
Now you can achieve what you wanted
by setting DataPointModeStart=dpmLock, DataPointModeEnd=dpmSnap
At first I thought - what's that, aren't dpmLock and dpmSnap the same? No, they are not: dpmSnap shows the distance line after hitting a data point and dragging it into nowhere land, dpmLock does not.

Quote
you can simply perform zooming in event handlers
Excellent solution. I added a MouseWheelZoomTool and borrowed its zooming code for the OnMouseWheel handlers of the distance tool. Now my application can zoom before, after and while measuring distances using the mouse wheel. There's a danger of getting twisted fingers, though...
