Recent

Author Topic: [TLineSeries] Slow PrepareGraphPoints on very large data sets  (Read 2331 times)

Martok

  • New Member
  • *
  • Posts: 20
Hi y'all,

the TAChart documentation Wiki page mentions that TLineSeries can "quickly draw line series from extremely large datasets (10000+ points)". Even the "line" demo seems to think that "large" is a bit more than that and starts at 600k items. I've recently come across actual applications with moderately large measurement data sets in that order of magnitude, hence the issue.

The "line" demo als lets you measure the redraw time, and it turns out that the *drawing* of those 600k points is in fact very optimized. What is not very fast is the stuff before that, most importantly TBasicPointSeries.PrepareGraphPoints (easily noticeable just by stepping through TLineSeries.Draw in the debugger)

So, I did some experiments to see where the time goes and made some modifications in TACustomSeries.pas:1966, starting from the first version as it is normally.
Code: Text  [Select][+][-]
  1. Timed refresh (s)     Test Case
  2. 600kPts    1.1MPts
  3. 0.450      0.869      FGraphPoints[i - FLoBound] := GetGraphPoint(i);
  4. 0.295      0.540      FGraphPoints[i - FLoBound] := DoublePoint(i, i mod 3);
  5. 0.145      0.271      FGraphPoints[i - FLoBound] := DoublePoint(i, 1.0);
  6. 0.081      0.145      FGraphPoints[i - FLoBound] := ZeroDoublePoint;
  7. 0.071      0.124      FillByte(FGraphPoints[0], Sizeof(TDoublePoint) * (FUpBound-FLoBound+1), 0);
  8.  

As you can see, about 83% of the time is spent just copying the mapped values to the internal buffer, drawing itself is basically free. What makes this an actual problem is that this is done for all visible values on each draw call, not just when any of the settings or data that define these values changed. This means that when a TAChart shows a running measurement progress, it starts out fast and becomes progressively slower as more data is accumulated. Linear complexity, but with a pretty high constant factor... As a bonus, that factor multiplies by 4-5 when running in a heaptrc binary since there are a lot of memory allocations downstream of that point.

I think I've figured out the architecture enough to understand why this is clearly the easiest way to do this, but it seems very inefficient.

Do you think something could be done about this? Just unrolling GetGraphPoint saves about 10%, which is nice but still nowhere near what could clearly be achieved by a smarter update strategy.


Regards,
Martok

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #1 on: May 21, 2023, 12:06:42 pm »
What is not very fast is the stuff before that, most importantly TBasicPointSeries.PrepareGraphPoints (easily noticeable just by stepping through TLineSeries.Draw in the debugger)
I cannot confirm this. In the line demo the debugger steps over the PrepareGraphPoints command immediately. The execution time cannot be extended by adding more data because the linedemo is setup that a click on the "Add" button adds more series, but every series has the same number of points (50,000).

To see thereal effect of PrepareGraphPoints I am adding a dedicated demo which adds 500,000 data to the same single series with each button click. I clicked it 4 times to have 2 millions values in the series and set a breakpoint on the "PrepareGraphPoints" in TLineSeries.Draw. Stepping over the breakpoint in the debugger occurs instantly in the debugger, but in the line "DrawSingleLineInStack" there is a noticeable delay - this is clear, because drawing requires much more operations than just copying data into a buffer.

Besides putting the data in a single series there is one other major difference to the linedemo. The series in the linedemo are created with AxisIndexX <> -1 and AxisIndexY <> -1 while in the new demo they are equal to -1 which is the normal setting of most charting applications. In the linedemo, this has the consequence that the GetGraphPoint instruction is execute at all (see the code in PrepareGraphPoints). This procedure takes care of axis transformations (logarithmic axes, paned charts, ...). In the linedemo the axis transformation is an identity transformation, quite trivial, but of course requires to enter and exit some subroutines along the way. Modify the linedemo and comment out the lines in btnAddSeriesClick which set the AxisIndexX and AxisIndexY of the new series s, and you will get faster execution because now GetGraphPoint is not called any more.

Of course, the greatest impact on drawing speed is the fact that you want to draw hundreds of thousands of data points in the same chart. In time-critical applications, it is more appropriate to draw only the most recent values and let the others scroll out of the viewport. TAChart has the component TChartLiveView for exactly this purpose (https://wiki.lazarus.freepascal.org/TAChart_documentation#Live_View). Search the forum for details, it was created by user request.

When data values arrive from an external source at a high rate you should not redraw after receiving each single value, but buffer them and redraw only larger chunks. Make sure to enclose the AddXY part of your code into Series.BeginUpdate/EndUpdate and/or Chart.DisableRedrawing/EndableRedrawing pairs to optimize extent calculation and redrawing.

Other advice for high-speed line series: Do not show data point symbols, use the default pen (pen styles other than psSolid, widths other than 1, Cosmetic other than true are guarantees for slow drawing in particular on Windows; only changing the pen color is safe).

Martok

  • New Member
  • *
  • Posts: 20
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #2 on: May 21, 2023, 06:16:11 pm »
In the line demo the debugger steps over the PrepareGraphPoints command immediately. The execution time cannot be extended by adding more data because the linedemo is setup that a click on the "Add" button adds more series, but every series has the same number of points (50,000).
Well, yeah, the total time measured is not "1x 600kpts" but "12x 50kpts", but as everything is nicely linear that doesn't matter too much.

To see thereal effect of PrepareGraphPoints I am adding a dedicated demo which adds 500,000 data to the same single series with each button click. I clicked it 4 times to have 2 millions values in the series and set a breakpoint on the "PrepareGraphPoints" in TLineSeries.Draw. Stepping over the breakpoint in the debugger occurs instantly in the debugger, but in the line "DrawSingleLineInStack" there is a noticeable delay - this is clear, because drawing requires much more operations than just copying data into a buffer.
Wait, am I getting this right. Are you saying it does not behave like this for you? Youtube Screengrab

If I completely comment out the DrawSingleLineInStack call, processing 2Mpts still takes 980ms, compared to 1990ms with actual pixel ops (or 6.2s with heaptrc).

Modify the linedemo and comment out the lines in btnAddSeriesClick which set the AxisIndexX and AxisIndexY of the new series s, and you will get faster execution because now GetGraphPoint is not called any more.
Yep, that gets the time split down from 80:20 in the first post to roughly 50:50 like here. But since I need multiple vertical axes with different (but aligned) transforms, that doesn't really help either.


Of course, the greatest impact on drawing speed is the fact that you want to draw hundreds of thousands of data points in the same chart. In time-critical applications, it is more appropriate to draw only the most recent values and let the others scroll out of the viewport.
Hm, switching out the component at runtime is not really an option, but I could/will have to scroll the extent manually during acquisition.

The thing is, I'm not getting all that many values. 1kS/s for about 10 minutes (so that's where the 600kpts come from) which already come in blocks of about 500 from the hardware. But then adding them takes almost no time, processing 500k points only to add 500 new ones does. So while I technically have around 500ms for a refresh, most of the time is spent just copying the same values over and over and over, until it gets to the point where it can't keep up.

Also I'm already wrapping all points adding DisableRedrawing/BeginUpdate, even before I realized how bad the performance gets for longer runs, just because that's the pattern for lists that update something.

Other advice for high-speed line series: Do not show data point symbols, use the default pen (pen styles other than psSolid, widths other than 1, Cosmetic other than true are guarantees for slow drawing in particular on Windows; only changing the pen color is safe).
Thanks for pointing that out, I already found that on the "Fast Line Series" section in the Wiki. Doesn't do much, as drawing isn't really the issue.

Interesting point about Windows at least: GDI actually optimizes pixel fill pretty well if line segments end up not moving the pointer at all (ie. neighboring points map to the same pixel), this can be tested with your demo here by replacing the random points with a constant value. 2Mpts then take 1.4s instead of 1.9s for the normal "every point matters" case. So for data that comes in with sufficiently little noise, naively emitting draw calls without any downsampling is not all that problematic.



And now I've had this post open so long that I tried something. What if PrepareGraphPoints only updates points it knows it needs to update (nb: I haven't implemented invalidation at all, and I'm not sure how to get notifications for a single changed point from the source). Not perfect and probably buggy as hell, but it does give a 3x speedup of the expensive part.

Graph is from the test project modified to add 50k points per step, fired from a timer. t_paint is first paint, t_repaint is second, t_overhead is the difference, aka how much is spent on preparing the new points.

Do you think that sort of idea is worth following?

Code: Pascal  [Select][+][-]
  1. procedure TBasicPointSeries.PrepareGraphPoints(
  2.   const AExtent: TDoubleRect; AFilterByExtent: Boolean);
  3.  
  4.   procedure UpdateRange(ALo, AUp: integer);
  5.   var
  6.     i: Integer;
  7.   begin
  8.     if (AxisIndexX < 0) and (AxisIndexY < 0) then begin
  9.       // Optimization: bypass transformations in the default case.
  10.       if Source.XCount > 0 then
  11.         for i := ALo to AUp do
  12.           with Source[i]^ do
  13.             FGraphPoints[i - FLoBound] := DoublePoint(X, Y)
  14.       else
  15.         for i := ALo to AUp do
  16.           with Source[i]^ do
  17.             FGraphPoints[i - FLoBound] := DoublePoint(i, Y);
  18.     end else
  19.       for i := ALo to AUp do
  20.         FGraphPoints[i - FLoBound] := GetGraphPoint(i);
  21.   end;
  22.  
  23. var
  24.   newCount, oldCount, shift: Integer;
  25. begin
  26.   FindExtentInterval(AExtent, AFilterByExtent);
  27.  
  28.   newCount:= Max(FUpBound - FLoBound + 1, 0);
  29.   SetLength(FGraphPoints, newCount);
  30.  
  31.   // Salvage as many points as possible
  32.   if (newCount>0) and (FGPLoBound>=0) and (FGPUpBound>=FGPLoBound) then begin
  33.     if (FLoBound>FGPUpBound) or (FUpBound<FGPLoBound) then
  34.       // no overlap at all
  35.       UpdateRange(FLoBound, FUpBound)
  36.     else begin
  37.       oldCount:= FGPUpBound - FGPLoBound + 1;
  38.       if FLoBound > FGPLoBound then begin
  39.         // old: [0 1 2 3 4 5 6 7]8 9
  40.         // new:  0 1[2 3 4 5 6 7 8]9
  41.         shift:= FLoBound - FGPLoBound;
  42.         Move(FGraphPoints[shift], FGraphPoints[0],
  43.              sizeof(TDoublePoint) * Min(oldCount - shift, newCount));
  44.         UpdateRange(FGPUpBound + 1, FUpBound);
  45.       end else
  46.       if FLoBound < FGPLoBound then begin
  47.         // old:  0 1[2 3 4 5 6 7]8 9
  48.         // new: [0 1 2 3 4 5 6 7]8 9
  49.         shift:= FGPLoBound - FLoBound;
  50.         Move(FGraphPoints[0], FGraphPoints[shift],
  51.              sizeof(TDoublePoint) * Min(oldCount, newCount));
  52.         UpdateRange(FLoBound, FGPLoBound - 1);
  53.         UpdateRange(FGPUpBound + 1, FUpBound);
  54.       end else begin
  55.         // old:  0[1 2 3 4 5 6]7 8 9
  56.         // new:  0[1 2 3 4 5 6 7 8]9
  57.         UpdateRange(FGPUpBound + 1, FUpBound);
  58.       end;
  59.     end;
  60.   end else
  61.     UpdateRange(FLoBound, FUpBound);
  62.  
  63.   FGPLoBound:= FLoBound;
  64.   FGPUpBound:= FUpBound;
  65. end;

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #3 on: May 21, 2023, 08:20:33 pm »
To see thereal effect of PrepareGraphPoints I am adding a dedicated demo which adds 500,000 data to the same single series with each button click. I clicked it 4 times to have 2 millions values in the series and set a breakpoint on the "PrepareGraphPoints" in TLineSeries.Draw. Stepping over the breakpoint in the debugger occurs instantly in the debugger, but in the line "DrawSingleLineInStack" there is a noticeable delay - this is clear, because drawing requires much more operations than just copying data into a buffer.
Wait, am I getting this right. Are you saying it does not behave like this for you? Youtube Screengrab
Well, in the linedemo I do not see any delay at these two lines, both are executed instantly (I wonder: is my computer really that much faster than yours?) I do see a difference in the demo that I added to the previous post with 2 millions of data points. And here the line PrepareGraphPoints is executed immediately, and there is some (small) delay on "DrawSinglelineInStack"

If I completely comment out the DrawSingleLineInStack call, processing 2Mpts still takes 980ms, compared to 1990ms with actual pixel ops (or 6.2s with heaptrc).
That's interesting: I am seeing a reduction by about 50%, too, when "DrawSingleLineInStack" is commented out, I would have expected a greater effect.

What if PrepareGraphPoints only updates points it knows it needs to update (nb: I haven't implemented invalidation at all, and I'm not sure how to get notifications for a single changed point from the source). Not perfect and probably buggy as hell, but it does give a 3x speedup of the expensive part.
Good.

But please understand the following arguments why I will not add your code to TAChart: This is because PrepareGraphPoints is called by all descendants of TBasicPointSeries under many conditions: unsorted chart source, changing of the transformations, deleting of points, and many more which do no come to my mind now. Your code, on the other hand, is an optimization for one specific condition, but I am rather sure that it will not work in any case and have negative side-effects. But I moved the PrepareGraphPoints method into the protected section of TBasicPointSeries and made it "virtual". Therefore, now you can write a descendant of TLineSeries in which you replace the inherited PrepareGraphPoints by your optimization.

Martok

  • New Member
  • *
  • Posts: 20
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #4 on: May 25, 2023, 11:16:03 pm »
Well, in the linedemo I do not see any delay at these two lines, both are executed instantly (I wonder: is my computer really that much faster than yours?)
Maybe, but I don't think that changes the relative order of things.

I've changed the program to do the test in a reproducible fashion and also output timing from within TBasicPointSeries and ran it on a few different PCs. Full results attached, here's the 600k (my real use case) and 3M (for reference) points lines of each:

Code: [Select]
Total Size        : 3000000
Block Size        : 50000
Waves / Block     : 3
Series.AxisIndexX : 1
Series.AxisIndexY : -1
Chart.Width       : 1280
Chart.Height      : 720

=== Core i7-6700HQ "Fast Prepare Patch" @ 1.66GHz ===
600000 / 3000000 add=0.016 prepare=0.032 drawSLIS=0.156 draw_total=0.242
2950000 / 3000000 add=0.016 prepare=0.056 drawSLIS=0.715 draw_total=0.938

=== Core i7-6700HQ @ 1.66GHz ===
600000 / 3000000 add=0.016 prepare=0.318 drawSLIS=0.158 draw_total=0.530
2950000 / 3000000 add=0.017 prepare=1.469 drawSLIS=0.722 draw_total=2.359

=== Core i7-6700HQ @ 3.1GHz ===
600000 / 3000000 add=0.009 prepare=0.170 drawSLIS=0.084 draw_total=0.285
2950000 / 3000000 add=0.009 prepare=0.804 drawSLIS=0.390 draw_total=1.288

=== i5-4600 @ 3.2GHz ===
600000 / 3000000 add=0.008 prepare=0.159 drawSLIS=0.084 draw_total=0.275
2950000 / 3000000 add=0.008 prepare=0.729 drawSLIS=0.376 draw_total=1.196

=== Ryzen 9 3900X ===
600000 / 3000000 add=0.005 prepare=0.128 drawSLIS=0.067 draw_total=0.216
2950000 / 3000000 add=0.006 prepare=0.589 drawSLIS=0.304 draw_total=0.958

So I do have some slow systems, but PrepareGraphPoints consistently takes about 2x longer than actual drawing, even for low point counts (yay linear runtime).

Does it do the same for you? In addition to the code in the attached zip, I've modified TLineSeries.Draw as follows:
Code: Pascal  [Select][+][-]
  1. // ...
  2.   t0:= Now;
  3.   PrepareGraphPoints(ext, LineType <> ltFromOrigin);
  4.   t1:= Now;
  5.   Write(' prepare=', FormatDateTime('s.zzz', t1-t0));
  6.   RemoveStackedNaN;
  7.   t0:= Now;
  8.   DrawSingleLineInStack(ADrawer, 0);
  9.   t1:= Now;
  10.   Write(' drawSLIS=', FormatDateTime('s.zzz', t1-t0));        
  11. // ...

I would have liked to also include my executable in case of compiler weirdness, but the attachment size limit doesn't let me.

That's interesting: I am seeing a reduction by about 50%, too, when "DrawSingleLineInStack" is commented out, I would have expected a greater effect.
Unless the testcase turns into a GDI pixel fill rate benchmark (which I'd say is a degenerate case for charting), the cost of PrepareGraphPoints is just so much larger than drawing. Also I just saw that the "same pixel" optimization is actually in TLineSeries.DrawSingleLineInStack/CacheLine, so it was actually you instead of MS who was clever here :D

But I moved the PrepareGraphPoints method into the protected section of TBasicPointSeries and made it "virtual". Therefore, now you can write a descendant of TLineSeries in which you replace the inherited PrepareGraphPoints by your optimization.
That seems like a very good idea, a TFastLineSeries that also ensures "fast" settings both on Pen etc. and Source side seems like the way to go here. I'm going to give that a go.

Edit: yeah, that is the way to go. About a 3-4x speedup from the modified PrepareGraphPoints and a stripped-down DrawSingleLine. Interestingly, downsampling to drawing resolution doesn't actually do much, most of the remaining time is iteration and mapping alone. You'd think there'd be a difference between drawing 243394 or 1212 points, but... barely 100ms.
Code: [Select]
=== Core i7-6700HQ "Fast Line Series Patch" @ 1.66GHz ===
600000 / 3000000 add=0.017 prepare=0.027 points=53598 breaks=3 drawSLIS=0.102 draw_total=0.151
2950000 / 3000000 add=0.017 prepare=0.051 points=243394 breaks=6 drawSLIS=0.462 draw_total=0.592

The comment above TManhattanSeries says it's "Optimized to work efficiently for millions of points.", and doesn't precompute the points at all. Weird question, but how sure are you that this actually saves time in the general case?
« Last Edit: May 26, 2023, 01:30:23 am by Martok »

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #5 on: May 26, 2023, 08:14:52 pm »
Nice test. These are my results:
Code: [Select]
Intel i7-10700 CPU @ 2.90; Windows 11

32 bit, -O1: 2950000 / 3000000 add=0.000 prepareGP=0.543 drawSLIS=0.331 draw_total=0.950
32 bit, -O1, no debugger: 2950000 / 3000000 add=0.000 prepareGP=0.551 drawSLIS=0.323 draw_total=0.953
32 bit, -O2: 2950000 / 3000000 add=0.000 prepareGP=0.490 drawSLIS=0.319 draw_total=0.876
32 bit, -O4: 2950000 / 3000000 add=0.000 prepareGP=0.473 drawSLIS=0.327 draw_total=0.879

64 bit, -O1: 2950000 / 3000000 add=0.011 prepareGP=0.507 drawSLIS=0.122 draw_total=0.691
64 bit, -O2: 2950000 / 3000000 add=0.016 prepareGP=0.444 drawSLIS=0.127 draw_total=0.634
64 bit, -O4: 2950000 / 3000000 add=0.000 prepareGP=0.448 drawSLIS=0.108 draw_total=0.603

So, essentially the same as you.

I tried to write a "TFastLineSeries" which does not execute PrepareGraphPoints at all, but accesses the datavalues in the chart source directly. Moreover, it simplifies the painting a lot. But it is difficult to become significantly faster than the standard TLineSeries. Thinking about it, it is clear because the conversion to graph coordinates always has to be performed, only at a different place in the code now.

The only way to speed it up is to do what you are proposing in your patch a few posts higher: to cache previously calculated data points and to calculate only those which are added.

It is a quite clever procedure, it even seems to work correctly when values are deleted rather than added. However, it has the pitfall that I suspected: When the user edits a data value somewhere in the "salvaged" part of the data range the conversion to graph coordinates is not updated, and the chart plots the old data - see attached demo (left chart: a plain-old TLineSeries, right chart: "TMyLineSeries" which inherits from TLineSeries but replaces the PrepareGraphPoints method by your optimized one; the button click changes the value at x = 5, but only the non-optimized series is updated.)
« Last Edit: May 26, 2023, 08:16:57 pm by wp »

Martok

  • New Member
  • *
  • Posts: 20
Re: [TLineSeries] Slow PrepareGraphPoints on very large data sets
« Reply #6 on: May 29, 2023, 01:11:28 am »
Thanks for the extensive tests. Should have expected it (lots ouf Doubles), but x64 being that much faster still surprises me.


So, turns out the attempt in #2 is completly buggy and also way more complex than it needed to be :-[

Better code attached, this time as an actual class and not just a weird patch.

Code: Text  [Select][+][-]
  1. === Core i7-6700HQ @ 3.1GHz ===
  2. 600000 / 3000000 add=0.009 prepare=0.170 drawSLIS=0.084 draw_total=0.285
  3. 2950000 / 3000000 add=0.009 prepare=0.804 drawSLIS=0.390 draw_total=1.288
  4.  
  5. === Core i7-6700HQ "TFastLineSeries" @ 3.1GHz ===
  6. 600000 / 3000000 add=0.010 prepare=0.013 points=53598 drawSLIS=0.057 draw_total=0.083
  7. 2950000 / 3000000 add=0.010 prepare=0.033 points=243394 drawSLIS=0.258
  8.  
  9. Full invalidate 3M pts:
  10.  prepare=0.797 points=235319 breaks=6 drawSLIS=0.254
  11.  
  12. Selective invalidate, 100kpts:
  13.  prepare=0.047 points=235319 breaks=6 drawSLIS=0.266
  14.  

Like yours, my heavily reduced Draw is only about a third faster than the general one in TLineSeries. The main advantage is in caching the points.

However, it has the pitfall that I suspected: When the user edits a data value somewhere in the "salvaged" part of the data range the conversion to graph coordinates is not updated, and the chart plots the old data
Yup, I was aware ;-)

Notifications from the Source don't carry any information what was changed, so I don't think there is a good way to invalidate that automatically. I think the most sensible way to handle invalidation is to explicitly have a user of the specialized series trigger it if the data is changed. For example in my application I only ever need full invalidation after offset correction, nothing else changes the series' data.

The results above contain two invalidations after altering the values from 50k to 150k to zero. A naive full invalidation is exactly as expensive as it used to be, a "smart" selective invalidation that keeps the largest contiguous range alive only has to rebuild the first 150k points, so it's still pretty fast.

Let's see if I can get some real tests in this week.

 

TinyPortal © 2005-2018