Lazarus

Programming => Widgetset => Cocoa => Topic started by: AlexTP on July 18, 2021, 11:14:11 pm

Title: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: AlexTP on July 18, 2021, 11:14:11 pm
Demo is here
https://github.com/Alexey-T/FreePascal-tests/tree/master/Cocoa%20rendering%20speed%20test

GUI has 4 checkboxes:
- use TextOut with transparent BG
- use TextOut with colored font+BG
- use Rectangle
- use FillRect

Everything is fast on win64/gtk2, and everything is slow on Cocoa.
Code:
                 pc: linux gtk2 | pc: win64 | pc: osx10.8 | macbook osx10.11 (another CPU)
1st textout            20           0-15          60           85
2nd textout            30           0-15         250           320
both textouts          105          15-45        500           580

1st rect               15           15            20           55
2nd rect               10           15            10           30

Sorry that the "macbook" test was run on a different machine, but that MacBook has roughly the same CPU speed as the main PC; maybe it's 1.5x slower than the PC.
Why is the Cocoa "2nd TextOut" test SO SLOW?
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: AlexTP on July 19, 2021, 05:11:49 pm
Now a benchmark of the TCocoaContext.TextOut function in LCL.
It has 3 parts:
- context switching (at the beginning and at the end of the function)
- textout
- rectangle (the block below "if Assigned(Rect) then")

Modified LCL code:
Code: Pascal
//at start of cocoaGdiObjects.pas

var
  // accumulated time in msec, one counter per part of TextOut
  _cocoa_ctx: qword;
  _cocoa_text: qword;
  _cocoa_rect: qword;

//later
procedure TCocoaContext.TextOut(X, Y: Integer; Options: Longint; Rect: PRect; UTF8Chars: PChar; Count: Integer; CharsDelta: PInteger);
var
  BrushSolid, FillBg: Boolean;
  q0, q1: qword;
begin
  // part 1: saving the graphics state
  q0:= gettickcount64;
  CGContextSaveGState(CGContext());
  q1:= gettickcount64;
  inc(_cocoa_ctx, q1-q0);
  q0:= q1;

  // part 2: background rectangle and clipping
  if Assigned(Rect) then
  begin
    // fill background
    //debugln(['TCocoaContext.TextOut ',UTF8Chars,' ',dbgs(Rect^)]);
    if (Options and ETO_OPAQUE) <> 0 then
    begin
      BrushSolid := BkBrush.Solid;
      BkBrush.Solid := True;
      with Rect^ do
        Rectangle(Left, Top, Right, Bottom, True, BkBrush);
      BkBrush.Solid := BrushSolid;
    end;

    if ((Options and ETO_CLIPPED) <> 0) and (Count > 0) then
    begin
      CGContextBeginPath(CGContext);
      CGContextAddRect(CGContext, RectToCGrect(Rect^));
      CGContextClip(CGContext);
    end;
  end;

  q1:= gettickcount64;
  inc(_cocoa_rect, q1-q0);
  q0:= q1;

  // part 3: the text itself
  if (Count > 0) then
  begin
    FillBg := BkMode = OPAQUE;
    if FillBg then
      FText.BackgroundColor := BkBrush.ColorRef;
    FText.SetText(UTF8Chars, Count);
    FText.Draw(ctx, X, Y, FillBg, CharsDelta);
  end;

  q1:= gettickcount64;
  inc(_cocoa_text, q1-q0);
  q0:= q1;

  // part 1 again: restoring the graphics state
  CGContextRestoreGState(CGContext());

  q1:= gettickcount64;
  inc(_cocoa_ctx, q1-q0);
  q0:= q1;

  AttachedBitmap_SetModified();
end;

Results for (context)+(textout)+(rect), in msec.
On the same PC on osx10.8.

- 1st TextOut: 1+36+0 ..... 3+53+0
- 2nd TextOut: 1+180+0 .... 3+240+0

Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: six1 on July 19, 2021, 05:29:44 pm
it is 65 on MacBook Air BigSur 11.4
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on September 16, 2021, 09:45:39 pm
I did some playing around with raw Cocoa APIs and got some interesting and unexpected results.  The attached zip is my modifications, with a couple of different versions implemented.  Only one is enabled at a time; you'll need to adjust what's commented out to see the variations.

Timings on my system (MacBook 2012, macOS 10.13.8):

TCanvas.TextOut:
* Black fg, bsClear bg: 90ms
* Random fg, bsClear bg: 216ms
* Random fg/bg color: 300ms

NSString.drawAtPoint_withAttributes:
* No fg/bg color: 50ms
* Random fg color: 63ms
* Random fg/bg color: 260ms

NSAttributedString.drawAtPoint:
* No fg/bg color: 52ms
* Random fg color: 68ms
* Random fg/bg color: 264ms
* Random fg, NSRectFill for bg: 100ms
* Single random fg/bg color, created at loop start: 240ms
* Single random fg created at loop start, NSRectFill for bg: 87ms

NSTextStorage/NSLayoutManager (created at top of loop and cached):
* No fg/bg color: 43ms
* Random fg color: 170ms
* Random fg/bg color: 230ms
* Random fg/bg set once outside loop: 61ms
* Update with new NSAttributedString, random fg/bg: 235ms

Windows 10 TCanvas.TextOut:
* Black fg, bsClear bg: 40-60ms
* Random fg/bg: 31-47ms

The NSTextStorage/NSLayoutManager "Random fg/bg color" case (230ms) is essentially what's implemented by LCLCocoa's TCanvas.TextOut.  The idea is that the string is set/decoded into font glyphs once then cached, and then setting the foreground/background color just updates the drawing.  Based on my understanding of the various APIs, I would have expected that to be the fastest.

Relying on the text APIs to do the background fill is extremely slow, and again, I'm not sure why.  Replacing the background with a simple NSRectFill dramatically speeds things up.  The one exception is when using NSTextStorage with a single cached foreground/background, which is fast.  Aside from that case, the fastest looks to be NSAttributedString with a separate NSRectFill call.

Moving the for loop to only enclose NS(Attributed)String creation/painting, so that the context saving/restoring is only done once, improves things by 5-10ms, but that's probably not possible in the general case.

It might be possible to get faster speed for the random foreground color using CoreText, but I haven't tested that, and you'd need to dig down a few layers for it.  CTLines are created from an NSAttributedString and CTLineDraw uses the foreground color it has, so using that won't gain anything.  Using CTRunDraw or whatever the direct glyph drawing function is might allow more control.

There's a commented-out block in CreateWnd that might be worth testing in your more complicated apps.  It tells the system that the app doesn't use HDR color spaces, and was suggested by one of our customers based on a similar patch in LibreOffice.  In the sample app it doesn't change the speed, but in Beyond Compare it noticeably improved resize speed on 4K+ monitors.  We're still evaluating it, so I can't promote it as a patch just yet.

The attached patch is almost certainly wrong and definitely needs cleanup, but adds an optimized NSAttributedString + NSRectFill path to TCocoaContext.TextOut which improves performance of the benchmark by about 50%.  I have not tested it outside the benchmark.
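
A minimal LCL-level sketch of the "fill the background separately, then draw the text on top" idea from the timings above. This is not the attached patch (which works at the Cocoa level with NSAttributedString + NSRectFill); the unit name FastOpaqueText and the procedure DrawTextWithManualBg are hypothetical names used only for illustration, and whether it actually helps in a given app would need to be measured.

```pascal
unit FastOpaqueText;

{$mode objfpc}{$H+}

interface

uses
  Types, Graphics;

// Hypothetical helper: draws AText at (X, Y) with an opaque background,
// but fills the background with FillRect instead of the text API.
procedure DrawTextWithManualBg(ACanvas: TCanvas; X, Y: Integer;
  const AText: string; AFontColor, ABgColor: TColor);

implementation

procedure DrawTextWithManualBg(ACanvas: TCanvas; X, Y: Integer;
  const AText: string; AFontColor, ABgColor: TColor);
var
  Ext: TSize;
begin
  // Measure the string so we know which rectangle to fill.
  Ext := ACanvas.TextExtent(AText);

  // Fill the background ourselves.
  ACanvas.Brush.Style := bsSolid;
  ACanvas.Brush.Color := ABgColor;
  ACanvas.FillRect(Rect(X, Y, X + Ext.cx, Y + Ext.cy));

  // Draw only the glyphs on top, with a transparent background.
  ACanvas.Brush.Style := bsClear;
  ACanvas.Font.Color := AFontColor;
  ACanvas.TextOut(X, Y, AText);
end;

end.
```

From a paint handler this would be called as, for example, DrawTextWithManualBg(Canvas, 10, 10, 'Hello', clYellow, clNavy). Note that it measures the string on every call, which has its own cost.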
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: AlexTP on September 21, 2021, 07:25:19 pm
Trying Zoe's patch in a real app, CudaText.
So far I don't see regressions.
Rendering of bold/italic is ok, Dx offsets are still supported it seems.

Speed: CudaText frame render time is now ~2x faster (when Dx is off, CudaText has option for it).
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: AlexTP on September 21, 2021, 08:04:14 pm
Oops, corrected my test; I had compared against too old a version of the app.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Martin_fr on September 29, 2021, 09:54:15 pm
I noticed this thread via the mailing list. I am not an expert on Cocoa, so I can't give any specifics on it.
So this post of mine may end up being a one-off contribution to this thread.

What I do know is that indeed "CharsDelta" is a feature from Windows.

It is mainly used to force chars into a monospaced grid, though in theory every cell (one per char) can have its very own width.
It basically forces each char to be adapted to a given width.

It appears to be a Windows-only thing; other widgetsets have various "workaround implementations", and those are usually slow.
They may also have other side effects, such as script fonts (Arabic) not looking correct at all, because the text may be drawn one char at a time.

==> Therefore it is probably correct to split textout and other routines into with/without CharsDelta.


As to how this is best achieved on any WS, well I do not have that answer.

One avenue possibly worth exploring is whether this was implemented in the original Cocoa widgetset or added later.
Afaik in gtk2 it was added later. But Cocoa is a younger WS, and it may have been done from the start.

Anyway, one might get lucky exploring this code with "git blame". Maybe there is an older and faster version.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on September 29, 2021, 11:30:07 pm
Quote
What I do know is that indeed "CharsDelta" is a feature from Windows.
...
Therefore it is probably correct to split textout and other routines into with/without CharsDelta.

I agree that the simple case should be an optimized path if it isn't already, but also, the CharsDelta support is not the only overhead, and may not even be the most significant factor.

Aside from the TCanvas.TextOut calls, none of the timings I gave use the LCL GDI routines for drawing at all, and they don't attempt to emulate CharsDelta, but they still have unexpected slowdowns.  Using simple, raw Cocoa calls to just draw strings with a particular font/foreground/background color shows that having that routine fill the background color is significantly slower than measuring the string, manually filling the background rect, then drawing the foreground text on top.  That doesn't make any sense.  Changing the foreground color of the existing string object similarly takes longer than it should.  Based on my understanding of how the macOS text rendering systems work, that shouldn't be the case.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: AlexTP on January 14, 2022, 10:31:06 am
Quote
I agree that the simple case should be an optimized path if it isn't already, but also, the CharsDelta support is not the only overhead, and may not even be the most significant factor.

Even if CharsDelta is not the most significant factor, can we apply Zoe's current patch? It speeds things up, and I will be 'safe' if I forget to apply the patch myself (when using the latest Lazarus).
Dmitry, if you agree, could you please apply it in your fork?
People can improve things later.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on January 14, 2022, 10:04:42 pm
David is working on a proper fix for this issue.  We've had some issues crop up with macOS 12 that are taking priority right now, but we should have a full patch relatively soon.  My patch breaks support for underline and strikethrough styles and I still do not support applying it as-is.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: skalogryz on January 14, 2022, 10:10:11 pm
Quote
We've had some issues crop up with macOS 12

Anything for LCL to be concerned about?
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on January 14, 2022, 11:41:00 pm
Quote
anything for LCL to be concerned about?

No. 

We use the ScrollWindowEx API pretty extensively for our internal editor/grid/treeview controls.  LCLCOCOA's implementation just invalidates the entire control rather than using a bitblit like Win32/Qt do, so it's a lot slower.  We had an alternate implementation that used NSView.scrollRect_by instead, but Apple changed that in 10.14 to do a full invalidate too, so it's no longer any better.  We worked around that by having a new TCustomControl subclass that uses double buffering and System.Move to scroll the contents.  Unfortunately, to avoid conversion overhead, the backing bitmap and context have to exactly match what macOS is using for the view's drawing surface, and that's been very fragile.  Long term we're now planning on making our custom controls use NSScrollView natively instead to avoid fighting Apple, but in the meantime we've had to work around breaks/crashes whenever macOS or Xcode get updated.
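
To illustrate the System.Move scrolling mentioned above, here is a small stand-alone sketch that scrolls a pixel buffer up by a number of rows. It is not our actual control code: the 32-bit pixel type, the row-major layout and the names TPixelBuffer and ScrollUp are assumptions made for the example, and a real backing store would also have to match the view's drawing surface as described.

```pascal
program ScrollBufferDemo;

{$mode objfpc}{$H+}

type
  TPixel = LongWord;                 // assumed 32-bit pixel format
  TPixelBuffer = array of TPixel;    // row-major, Width * Height entries

// Scroll the buffer contents up by DY rows. The DY rows at the bottom keep
// their old contents and would need to be repainted by the caller.
procedure ScrollUp(var Buf: TPixelBuffer; Width, Height, DY: Integer);
begin
  if (DY <= 0) or (DY >= Height) then
    Exit;
  // System.Move copies correctly even when source and destination overlap.
  Move(Buf[DY * Width], Buf[0], (Height - DY) * Width * SizeOf(TPixel));
end;

var
  Buf: TPixelBuffer;
  Row, Col: Integer;
begin
  // 4x3 buffer in which every pixel stores its own row index.
  SetLength(Buf, 4 * 3);
  for Row := 0 to 2 do
    for Col := 0 to 3 do
      Buf[Row * 4 + Col] := TPixel(Row);

  ScrollUp(Buf, 4, 3, 1);

  // Rows 0 and 1 now hold the old rows 1 and 2; row 2 is stale.
  for Row := 0 to 2 do
    WriteLn('row ', Row, ': ', Buf[Row * 4]);
end.
```

Only the stale rows at the bottom need repainting afterwards, which is what makes this cheaper than invalidating the whole control.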

Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on January 15, 2022, 12:06:12 am
Quote
anything for LCL to be concerned about?

Oh, there was one thing:  We needed to make some changes to NSTableView related to the new "style (https://developer.apple.com/documentation/appkit/nstableview/3622475-style?language=objc)" property they introduced, which added a bunch of padding to listboxes.  The one I know of is calling NSTableView.setStyle(NSTableViewStyleFullWidth) in TCocoaTableListView.initWithFrame.  I think that was Xcode and/or MACOSX_DEPLOYMENT_TARGET dependent. 

It looks like we have quite a few changes to  CocoaTables.pas though, and I think we use it in a non-standard configuration (I see DYNAMIC_NSTABLEVIEW_BASE defined), so I don't know how much affects stock LCL.  I'll have David look into contributing the relevant changes upstream after he gets the TextOut patch finalized.
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: skalogryz on January 15, 2022, 04:35:37 am
Quote
It looks like we have quite a few changes to  CocoaTables.pas though, and I think we use it in a non-standard configuration (I see DYNAMIC_NSTABLEVIEW_BASE defined), so I don't know how much affects stock LCL.  I'll have David look into contributing the relevant changes upstream after he gets the TextOut patch finalized.

Apple has not removed the NSCell APIs yet (and not much deprecation is happening there).
I think for now the NSCell-based implementation has more features implemented, but I'm all for removing it at some point.

Quote
We needed to make some changes to NSTableView related to the new "style" property they introduced

Ah yes. Now I recall that TListView looks a bit weird on macOS 12.
Thanks for the highlight!
Title: Re: Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on February 10, 2022, 12:58:08 am
https://gitlab.com/freepascal.org/lazarus/lazarus/-/issues/39641

Polished patch finally submitted.  :D  David doesn't monitor the forums, so any feedback related to it should happen in the bug tracker.

One thing to note: TCanvas.TextOut calls both ExtUTF8Out and TextWidth, which adds about 30% overhead relative to calling just ExtUTF8Out on its own.  There doesn't appear to be a way to eliminate that in a way that would satisfy the existing design constraints without optimizing for this specific edge case.  It's already significantly faster, but if you need a bit of extra speed just use ExtUTF8Out directly instead.
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: skalogryz on February 18, 2022, 02:48:40 am
I'm observing some differences in text drawing before and after the patch.

It appears that the text is drawn one pixel lower than before, and the bottom part is cut off.

This might be due to a difference in text measurement, as the screenshot was taken from a SynEdit component.

Update: the following seems to do the trick, but it doesn't feel like the proper fix.
Code: Diff
Index: lcl/interfaces/cocoa/cocoagdiobjects.pas
===================================================================
--- lcl/interfaces/cocoa/cocoagdiobjects.pas    (revision 67313)
+++ lcl/interfaces/cocoa/cocoagdiobjects.pas    (working copy)
@@ -1934,7 +1934,7 @@
 
           CGContextTranslateCTM(cg, 0, FSize.Height);
           CGContextScaleCTM(cg, 1, -1);
-          YPrime := FSize.Height - y - Font.Font.ascender;
+          YPrime := FSize.Height - y - Font.Font.ascender+1;
           CGContextSetTextPosition(cg, x, YPrime);
 
           if CharsDelta = nil then
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on February 18, 2022, 03:06:16 am
What font and size are you using?

There is some difference between CoreText and the higher level APIs, which is discussed here: https://stackoverflow.com/questions/5511830/how-does-line-spacing-work-in-core-text-and-why-is-it-different-from-nslayoutm

David’s second patch solved the issue of clipped text for us, and most fonts look correct, but we did see a few that were drawing at the top of the rect and we couldn’t figure out a fix. I haven’t seen any that were too low though. It seems to be an issue with specific font constructions (e.g., no external leading).
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: skalogryz on February 18, 2022, 03:11:34 am
Quote
What font and size are you using?

The default one preselected on TSynEdit.

on Monterey it's reported as "Andale Mono", Height 10

Quote
It seems to be an issue with specific font constructions (e.g., no external leading).

I'd think that the vertical placement is off (as shown in the patch above).
(On the screenshot, the post-patch lines are drawn one pixel lower than before the patch.)

As always, Cocoa's need to flip Y coordinates back and forth doesn't help.

Quote
There is some difference between CoreText and the higher level APIs, which is discussed here: https://stackoverflow.com/questions/5511830/how-does-line-spacing-work-in-core-text-and-why-is-it-different-from-nslayoutm

I don't believe line spacing is involved here.
SynEdit draws the text line by line and does its own line spacing (not related to the font's line spacing at all).
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: davidfjenkins on February 18, 2022, 10:08:03 pm
Dmitry,

What are you using to generate the screen shot with the lower part of the y cut off?  I know it is synedit related but in what format?

I've tried building trunk laz-ide and I am not seeing the descent values of text getting cut off like in your screen shot.

I would like to test out some thoughts about rounding error, but I need to be able to reproduce the problem.

Thanks
David
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: davidfjenkins on February 19, 2022, 12:48:19 am
I think the change that is actually making the difference might be the changes to TCocoaContext.GetTextExtentPoint().

The original code, before any of our cocoagdiobjects.pas patches were applied, created an NSMutableAttributedString, set the current font and then used:

r := M.boundingRectWithSize_options(NSMakeSize(MaxInt, MaxInt), 0);
height := Round(r.size.height);

New code uses CTLineGetTypographicBounds() and then

height := Round(ascent + abs(descent) + lead);

I put the boundingRectWithSize() code back in and checked the two different results before they were rounded.

For one font I got
original code: height = 17.0 , new code: height = 15.31

for another font:
original code: height = 16.0 , new code: height = 14.13

and
original code: height = 14.0 , new code: height = 12.96.

This is important, I believe, because SynEdit spaces its text lines by
FTextHeight := CharHeight + FCurrentExtraLineSpace, where the initial calculation of CharHeight is based on calling GetTextExtentPoint.

I could not repeat the clipping you were seeing when I compiled Laz IDE at first.  But if I went into preferences and changed Current Extra Line Space and made it smaller then I did get clipping.  You can probably make your clipping go away by increasing Current Extra Line Space.

You could also test this by removing the +1 in TextOut and instead doing a +1 on the height returned in GetTextExtentPoint().

That's my theory.  The issue is that I spent quite some time trying to get GetTextExtentPoint() correct.  This is the second patch I sent in.  The initial patch had replaced the NSMutableAttributedString call and was using CTLineGetBoundsWithOptions() instead.  I looked at the height value returned by CTLineGetBoundsWithOptions() and noticed that it was larger than ascent+descent+lead.  I haven't checked this, but it is probably pretty close to what NSMutableAttributedString.boundingRectWithSize returns.

I wasn't sure which of the values was correct.  The one that matched ascent+descent+lead (which is what GetTextMetrics returns) or the larger values from the two functions that returned bounds. 

So I went and looked at what windows GetTextExtentPoint returns.  It appears to me that GetTextExtentPoint() on windows returns ascent+descent+lead.  And the GetTextMetrics on windows also returns that same value for Tm.height.

So my last patch made GetTextExtentPoint() return a height that matches what I saw Windows doing.  That value IS smaller than previous code and could well be what is causing the difference you are seeing.

If ascent+descent+lead is not the correct value for GetTextExtentPoint() to return for height and if that is what is causing the SynEdit clipping you are seeing then I suggest reverting my last patch on cocoagdiobjects.pas and seeing if that fixes the problem.




Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: skalogryz on February 19, 2022, 03:51:24 am
Quote
What are you using to generate the screen shot with the lower part of the y cut off?  I know it is synedit related but in what format?

A plain project with a TSynEdit on it and a bunch of "SynEdit" lines set.
Nothing special really.
The TSynEdit's font is not modified.
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: davidfjenkins on March 02, 2022, 07:17:10 pm
I was able to repeat the clipping by setting up the simple SynEdit app.  At first it only repeated if I compiled on macOS 10.14 and not on my 11.6 machine.  But then we noticed that the default font size for TSynEdit was different between the two:

10.14 Font.Height = $0000000A
11.6 Font.Height = -13

When I changed to Font.Height = 10 on the 11.6 machine, I was able to repeat the clipping there too.

I have submitted a new patch in bug report 39660.  The specific issue the patch fixes that caused the clipping is described in the report.  In summary, I needed to make sure that wherever ascent and/or descent values are used, they are converted from float->integer in the same fashion.

I looked at GetTextMetrics on Windows to see how the conversion was done there.  It is always Round(ascent) and Round(descent).  cocoagdiobjects.pas now emulates that behavior and is consistent across the three functions GetTextMetrics, GetTextExtentPoint, and TextOut.
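
As a small illustration of why the float->integer conversion has to be done the same way everywhere: rounding the sum of the metrics and rounding each metric separately can differ by a pixel. The numbers below are hypothetical, picked only to show the effect; the exact mismatch that caused the clipping is the one described in the bug report.

```pascal
program RoundingDemo;

{$mode objfpc}{$H+}

var
  Ascent, Descent, Lead: Double;
  SumThenRound, RoundThenSum: Int64;
begin
  // Hypothetical font metrics in pixels (not taken from a real font).
  Ascent := 11.4;
  Descent := 3.4;
  Lead := 0.0;

  // One function rounds the total height once ...
  SumThenRound := Round(Ascent + Abs(Descent) + Lead);
  // ... another rounds each metric on its own and adds the results.
  RoundThenSum := Round(Ascent) + Round(Abs(Descent)) + Round(Lead);

  WriteLn('Round(ascent + descent + lead)              = ', SumThenRound);
  WriteLn('Round(ascent) + Round(descent) + Round(lead) = ', RoundThenSum);
end.
```

With these values the first expression gives 15 and the second gives 14, so a caller that positions text with one and sizes the line with the other ends up one pixel off.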
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: Zoë on March 02, 2022, 07:28:44 pm
Some additional info related to line layout that we ran into that David's patch doesn't touch on:

There are fonts installed on macOS by default that draw flush to the top of the rect with excessive empty space at the bottom (e.g., DIN Condensed).  Using those same fonts on Windows draws them in the middle of the rect with space both above and below the text like you'd expect.  This is not a bug in LCL.  It turns out the TTF font files actually have two different headers that store the ascent/descent/leading values, and they aren't guaranteed to be in sync.  Windows reads one and macOS uses the other.  Using TextEdit with those same fonts shows the same behavior.
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: davidfjenkins on March 03, 2022, 04:44:21 pm
The two headers are hhea, the Horizontal Header Table

  https://docs.microsoft.com/en-us/typography/opentype/spec/hhea

and OS/2, the OS/2 and Windows Metrics Table.

  https://docs.microsoft.com/en-us/typography/opentype/spec/os2

The hhea table has one set of ascent/descent/lead values, which are described as typographic values and are Apple-specific (see the Microsoft documentation linked above):

ascender
descender
lineGap

The OS/2 table has two sets of values:

sTypoAscender
sTypoDescender
sTypoLineGap

usWinAscent
usWinDescent

I have looked at ttf files that have different values for all three sets, though the hhea values and the sTypo values can be the same.  There is a flag USE_TYPO_METRICS that indicates to use the sTypo values instead of the usWin values.

But as Zoe points out, the important point is that the same font file can easily result in different vertical placement of text between Windows and Cocoa.
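
A rough sketch of how the three metric sets and the USE_TYPO_METRICS flag (bit 7 of fsSelection in the OS/2 table) could lead to different vertical metrics for the same font file. The record types, function names and font values are hypothetical, and real renderers apply more rules than this; it only illustrates the selection logic described above.

```pascal
program TypoMetricsDemo;

{$mode objfpc}{$H+}

const
  USE_TYPO_METRICS = 1 shl 7;  // bit 7 of OS/2.fsSelection

type
  // Only the fields discussed above, in font design units.
  TFontTables = record
    HheaAscender, HheaDescender, HheaLineGap: Integer;
    TypoAscender, TypoDescender, TypoLineGap: Integer;
    WinAscent, WinDescent: Integer;
    FsSelection: Word;
  end;

  TVMetrics = record
    Ascent, Descent, LineGap: Integer;  // Descent as a positive distance
  end;

// Metrics as a reader of the hhea table would see them.
function HheaMetrics(const F: TFontTables): TVMetrics;
begin
  Result.Ascent := F.HheaAscender;
  Result.Descent := Abs(F.HheaDescender);
  Result.LineGap := F.HheaLineGap;
end;

// Metrics as a reader of the OS/2 table would see them, honoring the flag.
function OS2Metrics(const F: TFontTables): TVMetrics;
begin
  if (F.FsSelection and USE_TYPO_METRICS) <> 0 then
  begin
    Result.Ascent := F.TypoAscender;
    Result.Descent := Abs(F.TypoDescender);
    Result.LineGap := F.TypoLineGap;
  end
  else
  begin
    Result.Ascent := F.WinAscent;
    Result.Descent := F.WinDescent;
    Result.LineGap := 0;
  end;
end;

var
  DemoFont: TFontTables;
  A, B: TVMetrics;
begin
  // Hypothetical font whose tables are not in sync.
  DemoFont.HheaAscender := 1000; DemoFont.HheaDescender := -200; DemoFont.HheaLineGap := 0;
  DemoFont.TypoAscender := 800;  DemoFont.TypoDescender := -200; DemoFont.TypoLineGap := 200;
  DemoFont.WinAscent := 900;     DemoFont.WinDescent := 300;
  DemoFont.FsSelection := 0;     // USE_TYPO_METRICS not set

  A := HheaMetrics(DemoFont);
  B := OS2Metrics(DemoFont);
  WriteLn('hhea: ascent=', A.Ascent, ' descent=', A.Descent);
  WriteLn('OS/2: ascent=', B.Ascent, ' descent=', B.Descent);
end.
```

With the same total height available, the two readers place the baseline differently, which matches the kind of platform difference described above.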
Title: Re: [Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
Post by: trev on March 04, 2022, 01:57:53 am
Quote
Dependencies

A number of fields in the 'OS/2' table replicate data found elsewhere in the font; most notably, the various ascent and descent fields mirror some of the contents of the horizontal header table. The macOS derives ascent and descent information from the latter; Windows from the former.

Fonts intended to be used on both platforms must have consistent values for the font's ascent and descent in these two tables.

Source: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6OS2.html

So perhaps some fonts are not intended to be used on macOS but only Windows etc.