
Author Topic: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow  (Read 10130 times)

AlexTP

  • Hero Member
  • *****
  • Posts: 1776
    • UVviewsoft
Demo is here
https://github.com/Alexey-T/FreePascal-tests/tree/master/Cocoa%20rendering%20speed%20test

GUI has 4 checkboxes:
- use TextOut with transparent BG
- use TextOut with colored font+BG
- use Rectangle
- use FillRect

Everything is fast on win64/gtk2, and everything is slow on Cocoa.
Times in msec:
                  pc: linux gtk2 | pc: win64 | pc: osx10.8 | macbook osx10.11 (another CPU)
1st textout             20       |   0-15    |     60      |     85
2nd textout             30       |   0-15    |    250      |    320
both textouts          105       |  15-45    |    500      |    580

1st rect                15       |    15     |     20      |     55
2nd rect                10       |    15     |     10      |     30

Sorry that the "macbook" test was run on a different machine, but its CPU speed is roughly similar to the main PC's; maybe it's 1.5x slower than the PC.
Why is the Cocoa "2nd TextOut" test SO SLOW?
« Last Edit: February 16, 2022, 07:32:49 am by Alextp »

AlexTP

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #1 on: July 19, 2021, 05:11:49 pm »
Now a benchmark of the TCocoaContext.TextOut function in LCL.
It has 3 parts:
- context switching (at the beginning and at the end of the function)
- textout
- rectangle (the block below "if Assigned(Rect) then")

Modified LCL
Code: Pascal
// at the start of cocoaGdiObjects.pas

var
  // accumulated time per part, in msec
  _cocoa_ctx: qword;   // saving/restoring the graphics state
  _cocoa_text: qword;  // drawing the text itself
  _cocoa_rect: qword;  // background rectangle and clipping

// later
procedure TCocoaContext.TextOut(X, Y: Integer; Options: Longint; Rect: PRect; UTF8Chars: PChar; Count: Integer; CharsDelta: PInteger);
var
  BrushSolid, FillBg: Boolean;
  q0, q1: qword;
begin
  // part 1: context switching (save)
  q0 := GetTickCount64;
  CGContextSaveGState(CGContext());
  q1 := GetTickCount64;
  Inc(_cocoa_ctx, q1 - q0);
  q0 := q1;

  // part 2: rectangle
  if Assigned(Rect) then
  begin
    // fill background
    //debugln(['TCocoaContext.TextOut ',UTF8Chars,' ',dbgs(Rect^)]);
    if (Options and ETO_OPAQUE) <> 0 then
    begin
      BrushSolid := BkBrush.Solid;
      BkBrush.Solid := True;
      with Rect^ do
        Rectangle(Left, Top, Right, Bottom, True, BkBrush);
      BkBrush.Solid := BrushSolid;
    end;

    if ((Options and ETO_CLIPPED) <> 0) and (Count > 0) then
    begin
      CGContextBeginPath(CGContext);
      CGContextAddRect(CGContext, RectToCGrect(Rect^));
      CGContextClip(CGContext);
    end;
  end;

  q1 := GetTickCount64;
  Inc(_cocoa_rect, q1 - q0);
  q0 := q1;

  // part 3: textout
  if (Count > 0) then
  begin
    FillBg := BkMode = OPAQUE;
    if FillBg then
      FText.BackgroundColor := BkBrush.ColorRef;
    FText.SetText(UTF8Chars, Count);
    FText.Draw(ctx, X, Y, FillBg, CharsDelta);
  end;

  q1 := GetTickCount64;
  Inc(_cocoa_text, q1 - q0);
  q0 := q1;

  // part 1 again: context switching (restore)
  CGContextRestoreGState(CGContext());

  q1 := GetTickCount64;
  Inc(_cocoa_ctx, q1 - q0);
  q0 := q1;

  AttachedBitmap_SetModified();
end;

Results for (context)+(textout)+(rect), in msec.
On the same PC on osx10.8.

- 1st TextOut: 1+36+0 ..... 3+53+0
- 2nd TextOut: 1+180+0 .... 3+240+0


six1

  • Full Member
  • ***
  • Posts: 110
Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #2 on: July 19, 2021, 05:29:44 pm »
it is 65 on MacBook Air BigSur 11.4

Zoë

  • New Member
  • *
  • Posts: 23
Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #3 on: September 16, 2021, 09:45:39 pm »
I did some playing around with raw Cocoa APIs and got some interesting and unexpected results.  The attached zip is my modifications, with a couple of different versions implemented.  Only one is enabled at a time; you'll need to adjust what's commented out to see the variations.

Timings on my system (MacBook 2012, macOS 10.13.8):

TCanvas.TextOut:
* Black fg, bsClear bg: 90ms
* Random fg, bsClear bg: 216ms
* Random fg/bg color: 300ms

NSString.drawAtPoint_withAttributes:
* No fg/bg color: 50ms
* Random fg color: 63ms
* Random fg/bg color: 260ms

NSAttributedString.drawAtPoint:
* No fg/bg color: 52ms
* Random fg color: 68ms
* Random fg/bg color: 264ms
* Random fg, NSRectFill for bg: 100ms
* Single random fg/bg color, created at loop start: 240ms
* Single random fg created at loop start, NSRectFill for bg: 87ms

NSTextStorage/NSLayoutManager (created at top of loop and cached):
* No fg/bg color: 43ms
* Random fg color: 170ms
* Random fg/bg color: 230ms
* Random fg/bg set once outside loop: 61ms
* Update with new NSAttributedString, random fg/bg: 235ms

Windows 10 TCanvas.TextOut:
* Black fg, bsClear bg: 40-60ms
* Random fg/bg: 31-47ms

The NSTextStorage/NSLayoutManager "Random fg/bg color" case (230ms) is essentially what's implemented by LCLCocoa's TCanvas.TextOut.  The idea is that the string is set/decoded into font glyphs once then cached, and then setting the foreground/background color just updates the drawing.  Based on my understanding of the various APIs, I would have expected that to be the fastest.

Relying on the text apis to do the background fill is extremely slow, and again, I'm not sure why.  Replacing the background with a simple NSRectFill dramatically speeds things up.  The one exception is when using NSTextStorage with a single cached foreground/background, which is fast.  Aside from that case, the fastest looks to be NSAttributedString with a separate NSRectFill call.
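
The "measure, fill the background yourself, then draw the text on top" idea can also be sketched at the LCL level. This is only an illustration, not the patch discussed in this thread: the unit name and the procedure TextOutWithManualBg are hypothetical, and whether it actually helps on Cocoa is an assumption that would need to be measured, since the timings above were taken with raw Cocoa calls.

```pascal
unit ManualBgSketch; // hypothetical unit name, for illustration only

{$mode objfpc}{$H+}

interface

uses
  Classes, Types, Graphics;

// Draws S at (X, Y): the background is filled with a plain FillRect,
// then the text is drawn with a clear brush so the text routine
// does not have to paint the background itself.
procedure TextOutWithManualBg(C: TCanvas; X, Y: Integer; const S: string;
  FontColor, BgColor: TColor);

implementation

procedure TextOutWithManualBg(C: TCanvas; X, Y: Integer; const S: string;
  FontColor, BgColor: TColor);
var
  Ext: TSize;
begin
  if S = '' then
    Exit;

  C.Font.Color := FontColor;

  // 1. measure the string with the current font
  Ext := C.TextExtent(S);

  // 2. fill the background rectangle manually
  C.Brush.Style := bsSolid;
  C.Brush.Color := BgColor;
  C.FillRect(Rect(X, Y, X + Ext.cx, Y + Ext.cy));

  // 3. draw only the glyphs, with a transparent background
  C.Brush.Style := bsClear;
  C.TextOut(X, Y, S);
end;

end.
```

Note that this adds a TextExtent call per string, so it trades one cost for another.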

Moving the for loop to only enclose NS(Attributed)String creation/painting, so the context saving/restoring is only done once, improves things by 5-10ms, but that's probably not possible in the general case.

It might be possible to get faster speed for the random foreground color using CoreText, but I haven't tested that, and you'd need to dig down a few layers for it.  CTLine are created from an NSAttributedString and CTLineDraw uses the foreground color it has, so using that won't gain anything.  Using CTRunDraw or whatever the direct glyph drawing function is might allow more control.

There's a commented out block in CreateWnd that might be worth testing in your more complicated apps.  It tells the system that the app doesn't use HDR color spaces, and was suggested by one of our customers based on a similar patch in LibreOffice.  In the sample app it doesn't change the speed, but in Beyond Compare it noticeably improved resize speed on 4K+ monitors.  We're still evaluating it, so I can't promote it as a patch just yet.

The attached patch is almost certainly wrong and definitely needs cleanup, but adds an optimized NSAttributedString + NSRectFill path to TCocoaContext.TextOut which improves performance of the benchmark by about 50%.  I have not tested it outside the benchmark.
« Last Edit: September 17, 2021, 12:46:46 am by Zoë »

AlexTP

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #4 on: September 21, 2021, 07:25:19 pm »
Trying Zoë's patch in a real app, CudaText.
So far I don't see regressions.
Rendering of bold/italic is OK, and Dx offsets still seem to be supported.

Speed: CudaText frame render time is now ~2x faster (when Dx is off; CudaText has an option for it).
« Last Edit: September 21, 2021, 08:04:39 pm by Alextp »

AlexTP

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #5 on: September 21, 2021, 08:04:14 pm »
Oops, I corrected my test: I had compared against too old an app version.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 7892
  • Debugger - SynEdit - and more
    • wiki
Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #6 on: September 29, 2021, 09:54:15 pm »
I noticed this thread via the mailing list. I am not an expert on Cocoa, so I can't give any specifics for it.
So this post of mine may end up as a one-off contribution to this thread.

What I do know is that "CharsDelta" is indeed a feature from Windows.

It is mainly used to force chars into a monospaced grid, though in theory every cell (one per char) can have its very own width.
It basically forces each char to be fitted to a given width.

It appears to be a Windows-only thing, and other widgetsets (WS) have various "workaround implementations". Usually those are slow.
They may also have other side effects, such as script fonts (Arabic) not looking correct at all, because the text may be drawn one char at a time.

==> Therefore it is probably correct to split textout and other routines into with/without CharsDelta.
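
To make the "monospaced grid" use of CharsDelta concrete, here is a small standalone sketch. It does no drawing at all; it only shows how a Dx array with one advance per char determines where each char would be placed. The names MonospaceDx and TIntArray are made up for this example, and the string is assumed to be plain ASCII so that one byte is one char.

```pascal
program CharsDeltaDemo;

{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

// Builds a Dx array that gives every char the same cell width,
// which is the typical "force into a monospaced grid" usage.
function MonospaceDx(CharCount, CellWidth: Integer): TIntArray;
var
  i: Integer;
begin
  Result := nil;
  SetLength(Result, CharCount);
  for i := 0 to CharCount - 1 do
    Result[i] := CellWidth;
end;

var
  S: string;
  Dx: TIntArray;
  i, PenX: Integer;
begin
  S := 'Wide';
  Dx := MonospaceDx(Length(S), 10);

  // A widgetset emulating CharsDelta has to place each char at its own
  // position; the advance after char i is Dx[i], not the glyph width.
  PenX := 0;
  for i := 1 to Length(S) do
  begin
    WriteLn(S[i], ' at x=', PenX);
    Inc(PenX, Dx[i - 1]);
  end;
end.
```

With a cell width of 10 this prints the chars at x=0, 10, 20 and 30, regardless of how wide "W" and "i" really are in the font. When no Dx array is passed, none of this per-char placement is needed, which is the case a separate fast path could serve.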


As to how this is best achieved on any WS, well, I do not have that answer.

One avenue possibly worth exploring is whether this was implemented in the original Cocoa code or added later.
Afaik in gtk2 it was added later. But Cocoa is a younger WS, and it may have been done from the start.

Anyway, one might get lucky exploring this code with "git blame". Maybe there is an older and faster version.

Zoë

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #7 on: September 29, 2021, 11:30:07 pm »
Quote
What I do know is that indeed "CharsDelta" is a feature from Windows.
...
Therefore it is probably correct to split textout and other routines into with/without CharsDelta.

I agree that the simple case should be an optimized path if it isn't already, but also, the CharsDelta support is not the only overhead, and may not even be the most significant factor.

Aside from the TCanvas.TextOut calls, none of the timings I gave use the LCL GDI routines for drawing at all, and don't attempt to emulate CharsDelta, but still have unexpected slowdowns.  Using simple, raw Cocoa calls to just draw strings with a particular font/foreground/background color shows that having that routine fill the background color is significantly slower than measuring the string, manually filling the background rect, then drawing the foreground text on top.  That doesn't make any sense.  Changing the foreground color of the existing string object similarly takes longer than it should.  Based on my understanding of how the macOS text rendering systems work, that shouldn't be the case.

AlexTP

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #8 on: January 14, 2022, 10:31:06 am »
Quote
I agree that the simple case should be an optimized path if it isn't already, but also, the CharsDelta support is not the only overhead, and may not even be the most significant factor.

Even if CharsDelta is not the most significant factor, can we apply Zoë's current patch? It speeds things up, and I will be 'safe' if I forget to apply the patch myself (when using the latest Lazarus).
Dmitry, if you agree, could you please apply it in your fork?
People can improve things later.

Zoë

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #9 on: January 14, 2022, 10:04:42 pm »
David is working on a proper fix for this issue.  We've had some issues crop up with macOS 12 that are taking priority right now, but we should have a full patch relatively soon.  My patch breaks support for underline and strikethrough styles and I still do not support applying it as-is.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2732
    • havefunsoft.com
Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #10 on: January 14, 2022, 10:10:11 pm »
Quote
We've had some issues crop up with macOS 12

Anything for LCL to be concerned about?

Zoë

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #11 on: January 14, 2022, 11:41:00 pm »
Quote
anything for LCL to be concerned about?

No. 

We use the ScrollWindowEx API pretty extensively for our internal editor/grid/treeview controls.  LCLCOCOA's implementation just invalidates the entire control rather than using a bitblit like Win32/Qt do, so it's a lot slower.  We had an alternate implementation that used NSView.scrollRect_by instead, but Apple changed that in 10.14 to do a full invalidate too, so it's no longer any better.  We worked around that by having a new TCustomControl subclass that uses double buffering and System.Move to scroll the contents.  Unfortunately, to avoid conversion overhead, the backing bitmap and context have to exactly match what macOS is using for the view's drawing surface, and that's been very fragile.  Long term we're now planning on making our custom controls use NSScrollView natively instead to avoid fighting Apple, but in the meantime we've had to work around breaks/crashes whenever macOS or Xcode get updated.
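
The "double buffering and System.Move" scrolling mentioned above can be illustrated with a tiny standalone program. This is not the control described in the post, only a sketch of the core idea: the backing buffer is treated as a flat array of pixels, and scrolling up by N rows is one Move of the remaining rows. TPixel and ScrollUp are names invented for this example.

```pascal
program ScrollBufferDemo;

{$mode objfpc}{$H+}

type
  TPixel = LongWord; // assumed 32-bit pixel, for illustration only

// Scrolls the buffer contents up by Rows rows with a single Move.
// The vacated rows at the bottom keep their old contents; a real
// control would repaint just those rows afterwards.
procedure ScrollUp(var Buf: array of TPixel; Width, Height, Rows: Integer);
begin
  if (Rows <= 0) or (Rows >= Height) then
    Exit;
  // System.Move copes with overlapping source and destination
  Move(Buf[Rows * Width], Buf[0], (Height - Rows) * Width * SizeOf(TPixel));
end;

var
  Buf: array[0..11] of TPixel; // 3 pixels wide, 4 rows high
  i: Integer;
begin
  // use the row index as the pixel value: rows 0, 1, 2, 3
  for i := 0 to High(Buf) do
    Buf[i] := i div 3;

  ScrollUp(Buf, 3, 4, 1);

  // prints: 1 1 1 2 2 2 3 3 3 3 3 3
  for i := 0 to High(Buf) do
    Write(Buf[i], ' ');
  WriteLn;
end.
```

The fragile part described in the post is not this copy but keeping the buffer's pixel format identical to what macOS uses for the view, which the sketch does not attempt to show.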


Zoë

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #12 on: January 15, 2022, 12:06:12 am »
Quote
anything for LCL to be concerned about?

Oh, there was one thing:  We needed to make some changes to NSTableView related to the new "style" property they introduced, which added a bunch of padding to listboxes.  The one I know of is calling NSTableView.setStyle(NSTableViewStyleFullWidth) in TCocoaTableListView.initWithFrame.  I think that was Xcode and/or MACOSX_DEPLOYMENT_TARGET dependent. 

It looks like we have quite a few changes to  CocoaTables.pas though, and I think we use it in a non-standard configuration (I see DYNAMIC_NSTABLEVIEW_BASE defined), so I don't know how much affects stock LCL.  I'll have David look into contributing the relevant changes upstream after he gets the TextOut patch finalized.

skalogryz

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #13 on: January 15, 2022, 04:35:37 am »
Quote
It looks like we have quite a few changes to CocoaTables.pas though, and I think we use it in a non-standard configuration (I see DYNAMIC_NSTABLEVIEW_BASE defined), so I don't know how much affects stock LCL.  I'll have David look into contributing the relevant changes upstream after he gets the TextOut patch finalized.

Apple has not removed the NSCell APIs yet (and not much deprecation is happening there).
I think for now NSCell has more features implemented, but I'm all for removing the NSCell-based implementation at some point.

Quote
We needed to make some changes to NSTableView related to the new "style" property they introduced

Ah yes. Now I recall that TListView looks a bit weird on macOS 12.
Thanks for the highlight!

Zoë

Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
« Reply #14 on: February 10, 2022, 12:58:08 am »
https://gitlab.com/freepascal.org/lazarus/lazarus/-/issues/39641

Polished patch finally submitted.  :D  David doesn't monitor the forums, so any feedback related to it should happen in the bug tracker.

One thing to note: TCanvas.TextOut calls both ExtUTF8Out and TextWidth, which adds about 30% overhead relative to calling just ExtUTF8Out on its own.  There doesn't appear to be a way to eliminate that in a way that would satisfy the existing design constraints without optimizing for this specific edge case.  It's already significantly faster, but if you need a bit of extra speed just use ExtUTF8Out directly instead.
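
For anyone who wants to try the direct route, a minimal sketch might look like this. The unit and procedure names are hypothetical. It relies on two assumptions that should be checked against your Lazarus version: that ExtUTF8Out is available from LCLIntf with the usual (DC, X, Y, Options, Rect, Str, Count, Dx) parameters, and that reading Canvas.Handle makes the canvas select its current font and brush into the device context.

```pascal
unit DirectTextOutSketch; // hypothetical unit name, for illustration only

{$mode objfpc}{$H+}

interface

uses
  Graphics, LCLIntf;

// Draws S at (X, Y) by calling ExtUTF8Out directly,
// without the extra TextWidth call made by TCanvas.TextOut.
procedure CanvasTextOutDirect(C: TCanvas; X, Y: Integer; const S: string);

implementation

procedure CanvasTextOutDirect(C: TCanvas; X, Y: Integer; const S: string);
begin
  if S = '' then
    Exit;
  // Assumption: accessing C.Handle applies the canvas font and brush.
  // Options = 0, no clip rect, and no Dx array (no CharsDelta).
  ExtUTF8Out(C.Handle, X, Y, 0, nil, PChar(S), Length(S), nil);
end;

end.
```

Because the TextWidth call is skipped, anything TCanvas.TextOut derives from it is not done here, so this is only suitable where the caller does not depend on that.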
« Last Edit: February 10, 2022, 01:02:18 am by Zoë »

 
