[Solved] Demo to benchmark TextOut on win64/gtk2/Cocoa - Cocoa is very slow
AlexTP:
Demo is here
https://github.com/Alexey-T/FreePascal-tests/tree/master/Cocoa%20rendering%20speed%20test
GUI has 4 checkboxes:
- use TextOut with transparent BG
- use TextOut with colored font+BG
- use Rectangle
- use FillRect
All four are fast on win64/gtk2, and all are slow on Cocoa.
--- Code: Pascal ---
               | pc: linux gtk2 | pc: win64 | pc: osx10.8 | macbook osx10.11 (another CPU)
1st textout    | 20             | 0-15      | 60          | 85
2nd textout    | 30             | 0-15      | 250         | 320
both textouts  | 105            | 15-45     | 500         | 580
1st rect       | 15             | 15        | 20          | 55
2nd rect       | 10             | 15        | 10          | 30
Sorry, the "macbook" test was run on a different machine (a MacBook), but its CPU speed is roughly similar to the main PC; maybe it's 1.5x slower.
Why is the Cocoa "2nd TextOut" test SO SLOW?
AlexTP:
Now a benchmark of the TCocoaContext.TextOut function in LCL.
It has 3 parts:
- context save/restore (at the beginning and at the end of the function)
- textout
- rectangle (the block below "if Assigned(Rect) then")
Modified LCL code:
--- Code: Pascal ---
// at start of cocoaGdiObjects.pas
var
  _cocoa_ctx: qword;
  _cocoa_text: qword;
  _cocoa_rect: qword;

// later
procedure TCocoaContext.TextOut(X, Y: Integer; Options: Longint; Rect: PRect; UTF8Chars: PChar; Count: Integer; CharsDelta: PInteger);
var
  BrushSolid, FillBg: Boolean;
  q0, q1: qword;
begin
  q0:= gettickcount64;

  CGContextSaveGState(CGContext());

  q1:= gettickcount64;
  inc(_cocoa_ctx, q1-q0);
  q0:= q1;

  if Assigned(Rect) then
  begin
    // fill background
    //debugln(['TCocoaContext.TextOut ',UTF8Chars,' ',dbgs(Rect^)]);
    if (Options and ETO_OPAQUE) <> 0 then
    begin
      BrushSolid := BkBrush.Solid;
      BkBrush.Solid := True;
      with Rect^ do
        Rectangle(Left, Top, Right, Bottom, True, BkBrush);
      BkBrush.Solid := BrushSolid;
    end;

    if ((Options and ETO_CLIPPED) <> 0) and (Count > 0) then
    begin
      CGContextBeginPath(CGContext);
      CGContextAddRect(CGContext, RectToCGrect(Rect^));
      CGContextClip(CGContext);
    end;
  end;

  q1:= gettickcount64;
  inc(_cocoa_rect, q1-q0);
  q0:= q1;

  if (Count > 0) then
  begin
    FillBg := BkMode = OPAQUE;
    if FillBg then
      FText.BackgroundColor := BkBrush.ColorRef;
    FText.SetText(UTF8Chars, Count);
    FText.Draw(ctx, X, Y, FillBg, CharsDelta);
  end;

  q1:= gettickcount64;
  inc(_cocoa_text, q1-q0);
  q0:= q1;

  CGContextRestoreGState(CGContext());

  q1:= gettickcount64;
  inc(_cocoa_ctx, q1-q0);
  q0:= q1;

  AttachedBitmap_SetModified();
end;
Results for (context)+(textout)+(rect), in msec.
On the same PC on osx10.8.
- 1st TextOut: 1+36+0 ..... 3+53+0
- 2nd TextOut: 1+180+0 .... 3+240+0
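The accumulate-per-phase timing used above can be tried outside LCL with a small console program. This is only a minimal sketch: TPhaseTimer, ResetTimer and Mark are hypothetical helper names (not part of LCL), and the Sleep calls are placeholder workloads standing in for the real context, rectangle and text work. Because each mark charges the time since the previous mark, the three totals add up to the overall elapsed tick count.

--- Code: Pascal ---
```pascal
program PhaseTimerSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TPhase = (phContext, phRect, phText);

  // Hypothetical helper, not part of LCL: per-phase millisecond totals.
  TPhaseTimer = record
    Start, Last: QWord;
    Totals: array[TPhase] of QWord;
  end;

// Zero all totals and remember the starting tick count.
procedure ResetTimer(out T: TPhaseTimer);
begin
  T := Default(TPhaseTimer);
  T.Start := GetTickCount64;
  T.Last := T.Start;
end;

// Charge the time elapsed since the previous mark to the given phase.
procedure Mark(var T: TPhaseTimer; Phase: TPhase);
var
  NowTicks: QWord;
begin
  NowTicks := GetTickCount64;
  Inc(T.Totals[Phase], NowTicks - T.Last);
  T.Last := NowTicks;
end;

var
  T: TPhaseTimer;
  i: Integer;
begin
  ResetTimer(T);
  for i := 1 to 20 do
  begin
    Sleep(1);              // placeholder for the context save/restore work
    Mark(T, phContext);
    Sleep(1);              // placeholder for the background/clip block
    Mark(T, phRect);
    Sleep(2);              // placeholder for the text drawing
    Mark(T, phText);
  end;
  WriteLn('context: ', T.Totals[phContext], ' ms');
  WriteLn('rect:    ', T.Totals[phRect], ' ms');
  WriteLn('text:    ', T.Totals[phText], ' ms');
  WriteLn('total:   ', T.Last - T.Start, ' ms');
end.
```

GetTickCount64 only has millisecond resolution, so a single very short phase may be charged 0 ms; the totals become meaningful only after many iterations, as in the benchmark above.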
six1:
it is 65 on MacBook Air BigSur 11.4
Zoë:
I did some playing around with raw Cocoa APIs and got some interesting and unexpected results. The attached zip is my modifications, with a couple of different versions implemented. Only one is enabled at a time; you'll need to adjust what's commented out to see the variations.
Timings on my system (MacBook 2012, macOS 10.13.8):
TCanvas.TextOut:
* Black fg, bsClear bg: 90ms
* Random fg, bsClear bg: 216ms
* Random fg/bg color: 300ms
NSString.drawAtPoint_withAttributes:
* No fg/bg color: 50ms
* Random fg color: 63ms
* Random fg/bg color: 260ms
NSAttributedString.drawAtPoint:
* No fg/bg color: 52ms
* Random fg color: 68ms
* Random fg/bg color: 264ms
* Random fg, NSRectFill for bg: 100ms
* Single random fg/bg color, created at loop start: 240ms
* Single random fg created at loop start, NSRectFill for bg: 87ms
NSTextStorage/NSLayoutManager (created at top of loop and cached):
* No fg/bg color: 43ms
* Random fg color: 170ms
* Random fg/bg color: 230ms
* Random fg/bg set once outside loop: 61ms
* Update with new NSAttributedString, random fg/bg: 235ms
Windows 10 TCanvas.TextOut:
* Black fg, bsClear bg: 40-60ms
* Random fg/bg: 31-47ms
The NSTextStorage/NSLayoutManager "Random fg/bg color" case (230ms) is essentially what's implemented by LCLCocoa's TCanvas.TextOut. The idea is that the string is set/decoded into font glyphs once then cached, and then setting the foreground/background color just updates the drawing. Based on my understanding of the various APIs, I would have expected that to be the fastest.
Relying on the text APIs to do the background fill is extremely slow, and again, I'm not sure why. Replacing the background with a simple NSRectFill dramatically speeds things up. The one exception is when using NSTextStorage with a single cached foreground/background, which is fast. Aside from that case, the fastest looks to be NSAttributedString with a separate NSRectFill call.
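For readers who want to try the "NSAttributedString with a separate NSRectFill call" variant, here is a rough Objective-Pascal sketch of the idea. It is not the attached patch and has not been benchmarked; the unit name TextFillSketch and the helper DrawTextWithRectFill are hypothetical, and it assumes it is called while an NSGraphicsContext is current (for example inside drawRect) with FPC's CocoaAll bindings available.

--- Code: Pascal ---
```pascal
unit TextFillSketch;

{$mode objfpc}{$H+}
{$modeswitch objectivec1}

interface

uses
  CocoaAll;

// Hypothetical helper, not part of LCL: fills the background with NSRectFill
// and lets the text API draw only the foreground.
procedure DrawTextWithRectFill(AText: NSString; X, Y: CGFloat; Fg, Bg: NSColor);

implementation

procedure DrawTextWithRectFill(AText: NSString; X, Y: CGFloat; Fg, Bg: NSColor);
var
  Attrs: NSDictionary;
  Str: NSAttributedString;
  Sz: NSSize;
begin
  // Only the foreground color goes into the attributes; no background attribute is set.
  Attrs := NSDictionary.dictionaryWithObject_forKey(Fg, NSForegroundColorAttributeName);
  Str := NSAttributedString.alloc.initWithString_attributes(AText, Attrs);
  try
    // Paint the background ourselves, sized to the measured text.
    Sz := Str.size;
    Bg.setFill;
    NSRectFill(NSMakeRect(X, Y, Sz.width, Sz.height));
    // Then draw the text on top of the filled rectangle.
    Str.drawAtPoint(NSMakePoint(X, Y));
  finally
    Str.release;
  end;
end;

end.
```

A call would look like DrawTextWithRectFill(NSSTR('Hello'), 10, 10, NSColor.redColor, NSColor.yellowColor). Measuring the string on every call has its own cost, so a real implementation would probably reuse a rectangle it already knows.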
Moving the for loop so that it only encloses the NS(Attributed)String creation/painting, so that the context saving/restoring is done only once, improves things by 5-10ms, but that's probably not possible in the general case.
It might be possible to get faster speed for the random foreground color using CoreText, but I haven't tested that, and you'd need to dig down a few layers for it. CTLines are created from an NSAttributedString and CTLineDraw uses the foreground color it has, so using that won't gain anything. Using CTRunDraw or whatever the direct glyph drawing function is might allow more control.
There's a commented out block in CreateWnd that might be worth testing in your more complicated apps. It tells the system that the app doesn't use HDR color spaces, and was suggested by one of our customers based on a similar patch in LibreOffice. In the sample app it doesn't change the speed, but in Beyond Compare it noticeably improved resize speed on 4K+ monitors. We're still evaluating it, so I can't promote it as a patch just yet.
The attached patch is almost certainly wrong and definitely needs cleanup, but adds an optimized NSAttributedString + NSRectFill path to TCocoaContext.TextOut which improves performance of the benchmark by about 50%. I have not tested it outside the benchmark.
AlexTP:
I tried Zoë's patch in a real app, CudaText.
So far I don't see any regressions.
Rendering of bold/italic is OK, and Dx offsets still seem to be supported.
Speed: CudaText's frame render time is now ~2x faster (with Dx off; CudaText has an option for it).