Author Topic: [Solved] can't explain, not the Who, but console/terminal performance  (Read 1418 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 16201
  • Censorship about opinions does not belong here.
I had to go back to a simple threading example I gave a couple of weeks ago, and while testing some code on an arguably much slower processor I got results that are inexplicable to me: a Windows console running the app is thousands of times slower than the same code on Linux running under WSL2 (on Hyper-V) on the same machine (or, for that matter, on a Raspberry Pi 3/4/5).
Does anyone know why the Windows console is so much slower? (File I/O is OK.)
Code: Pascal
program simplethread2;
{$mode objfpc}{$I-}
{$modeswitch anonymousfunctions}
uses {$ifdef unix}cthreads,{$endif}sysutils,classes;
var
  Sync:IReadWriteSync;
  WorkerThread1,
  WorkerThread2,
  ControllerThread:TThread;
  start:int64;
begin
  Sync:=TMultiReadExclusiveWriteSynchronizer.create;
  Sync.BeginWrite;
  start := gettickcount64;
  Sync.EndWrite;
  writeln('Program start time saved');
  WorkerThread1:=TThread.executeinthread(procedure
    var
      s:string = 'Hey Main, I am still busy....';
      i:integer;
    begin
      for i := 0 to 100000 do if i mod 100 = 0 then
      begin
        Sync.BeginWrite;
        writeln(s);
        Sync.EndWrite;
      end;
      Sync.BeginRead;
      writeln('WorkerThread1 done',  ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  WorkerThread2:=TThread.executeinthread(procedure
    var
      s:string = 'Hey, One,Still chatting?....';
      i:integer;
    begin
      for i := 0 to 100000 do if i mod 100 = 0 then
      begin
        Sync.BeginWrite;
        writeln(s);
        Sync.EndWrite;
      end;
      Sync.BeginRead;
      writeln('WorkerThread2 done',  ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  ControllerThread:=TThread.executeinthread(procedure
    begin
      writeln('Test the ControllerThread');
      WorkerThread1.waitfor;
      WorkerThread2.waitfor;
      Sync.BeginRead;
      writeln('ControllerThread done, handing over to main', ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  writeln('perform main workload');
  //sleep(10000);
  writeln('Main workload finished', ' took:', gettickcount64 - start);
  if not ControllerThread.finished then ControllerThread.waitfor;
  writeln('Main thread finished in ', gettickcount64 - start);
end.
The sync code is superfluous; you can ignore it. That is ridiculous.
Without the console I/O the code runs at about the same speed. (And yes, I already applied the {$I-} state.)
Ignore the output; it is as expected. In your own tests you can see that the threading is working: just scroll up.
« Last Edit: November 06, 2024, 01:05:41 pm by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

Thaddy

Re: can't explain, not the Who, but console/terminal performance
« Reply #1 on: November 06, 2024, 10:34:08 am »
I also rewrote it (well, something similar) in C, just to be sure it isn't FPC: same effect... Why is the Windows console so slow?

440bx

  • Hero Member
  • *****
  • Posts: 4760
Re: can't explain, not the Who, but console/terminal performance
« Reply #2 on: November 06, 2024, 11:24:21 am »
Quote from: Thaddy
"Why is the Windows console so slow?"
Part of the reason is that the console is a separate process, therefore every write to the console involves a process switch.  In turn a process switch involves a ring transition which, since there is a process switch, involves the scheduler too.  That's a lot of overhead to write something out to the console.

There might be other reasons but, the process switch alone, required for every write will have a very noticeable performance penalty.
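
A minimal sketch of how the per-write cost could be checked, assuming (as described above) that each console write carries a roughly fixed overhead. It prints the same 1001 lines twice: once with one writeln per line, and once collected in memory and written with a single call. The program name, the constants and the use of THandleStream on StdOutputHandle are illustrative choices, not code from this thread.

```pascal
program batchedoutput;
{$mode objfpc}{$H+}
uses sysutils, classes;
const
  Line  = 'Hey Main, I am still busy....';
  Count = 1001; // number of lines one worker prints in the original test
var
  i: integer;
  start, perLine, batched: qword;
  buf: ansistring;
  ConsoleOut: THandleStream;
begin
  // Variant 1: one writeln per line, i.e. one request to the console per line
  // (assumption: output to a console is flushed after every writeln).
  start := GetTickCount64;
  for i := 1 to Count do
    writeln(Line);
  Flush(Output);
  perLine := GetTickCount64 - start;

  // Variant 2: build the same text in memory and write it in a single call.
  start := GetTickCount64;
  buf := '';
  for i := 1 to Count do
    buf := buf + Line + LineEnding;
  ConsoleOut := THandleStream.Create(StdOutputHandle);
  try
    ConsoleOut.WriteBuffer(buf[1], Length(buf));
  finally
    ConsoleOut.Free; // THandleStream does not close the stdout handle
  end;
  batched := GetTickCount64 - start;

  // Report on stderr so the figures stay visible when stdout is redirected.
  writeln(StdErr, 'one writeln per line: ', perLine, ' ms');
  writeln(StdErr, 'single batched write: ', batched, ' ms');
end.
```

If the per-write explanation holds, one would expect the second figure to be clearly smaller in a Windows console window, and both figures to be close when stdout is redirected to a file.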
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Thaddy

Re: can't explain, not the Who, but console/terminal performance
« Reply #3 on: November 06, 2024, 11:42:07 am »
As I wrote, you can remove all the sync: same behavior. There are no context switches but one: when the controller finishes.
On both platforms I/O is deemed thread-safe.
Same behavior in cross-platform C (GNU). It is awful, and I can't explain it.

440bx

Re: can't explain, not the Who, but console/terminal performance
« Reply #4 on: November 06, 2024, 11:55:09 am »
Quote from: Thaddy
"there are no context switches, but one: when the controller finishes."
No.  In Windows, there is a context switch for every write.  They are not visible because they happen behind the scenes, but the console is another process, so outputting to the console forces a process switch.  It also requires copying the data being written into memory that belongs to the console; failing to do that would create the risk of the console process trashing the caller's stack, which is not acceptable.

There is a _lot_ of overhead to output to a Windows console.

rvk

  • Hero Member
  • *****
  • Posts: 6594
Re: can't explain, not the Who, but console/terminal performance
« Reply #5 on: November 06, 2024, 12:38:38 pm »
Quote from: Thaddy
"a Windows console running the app is thousands of times slower than running the same code on Linux running under WSL2 (on Hyper-V) on the same machine (or, for that matter, on a Raspberry Pi 3/4/5)."
Running this on an RPi in a terminal over SSH from Windows is fast (I expected that).
Running it in a terminal window in a desktop environment on the RPi, I actually expected it to be slow (as on Windows).
But even then it is fast on the RPi.

For me it's not a factor of 1000, but it is still a lot (about 25 times for the worker threads in the runs below).

On Windows itself (in the console, cmd.exe):
Quote
WorkerThread2 done took:1062
ControllerThread done, handing over to main took:1062
Main thread finished in 1062

SSH from Windows to rpi:
Quote
WorkerThread2 done took:39
ControllerThread done, handing over to main took:40
Main thread finished in 100

On rpi in the GUI (via RDP) in terminal:
Quote
WorkerThread2 done took:42
WorkerThread1 done took:42
ControllerThread done, handing over to main took:43
Main thread finished in 101

BTW: when running project1.exe multiple times directly in a PowerShell console, it gets faster and faster  :D :D

Third time running it, it gave me:
Quote
Hey, One,Still chatting?....
WorkerThread2 done took:94
ControllerThread done, handing over to main took:94
Main thread finished in 94

Thaddy

[SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #6 on: November 06, 2024, 01:02:18 pm »
The output (the order) is irrelevant, Rik, but the speed difference amazed me. Is the Windows console really so slow?
Hence I wrote a C program and that confirmed it: the Windows console is really slow.
It has nothing to do with Free Pascal. That is a good thing to know.
« Last Edit: November 06, 2024, 01:06:39 pm by Thaddy »

440bx

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #7 on: November 06, 2024, 01:09:00 pm »
Quote from: Thaddy
"Windows console is really slow."
Yes, it really is.  In addition, in some cases it exhibits really strange (and perplexing) performance behavior.

rvk

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #8 on: November 06, 2024, 01:09:14 pm »
Quote from: Thaddy
"Hence I wrote a C program and that confirmed it: the Windows console is really slow."
The cmd.exe console, yes. The PowerShell console is much faster.

Microsoft wants us to use the new powershell anyway.
Isn't that the new 'terminal' in Windows 11  ;)

https://devblogs.microsoft.com/commandline/windows-terminal-is-now-the-default-in-windows-11/

But yes... good to know the performance of the cmd.exe console is lousy.

440bx

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #9 on: November 06, 2024, 01:25:58 pm »
Quote from: rvk
"But yes... good to know the performance of the cmd.exe console is lousy."
It's not really cmd.exe that is slow; the performance culprit is conhost.exe, combined with having two processes involved in the output.

That cmd.exe has little to no effect on performance shows in the fact that output is just as slow when using TCCLE (which also uses conhost.exe), and the folks at JPSoft are really good at making fast (and good) software.

rvk

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #10 on: November 06, 2024, 01:41:03 pm »
Quote from: 440bx (in reply to rvk's "But yes... good to know the performance of the cmd.exe console is lousy.")
"It's not really cmd.exe that is slow; the performance culprit is conhost.exe, combined with having two processes involved in the output.

That cmd.exe has little to no effect on performance shows in the fact that output is just as slow when using TCCLE (which also uses conhost.exe), and the folks at JPSoft are really good at making fast (and good) software."
powershell.exe also runs under conhost.exe, doesn't it? (at least for me it still does)
And there it seems to be fast.
But yes, it can be the interaction between cmd.exe and conhost.exe which is the culprit.

There is also the new Windows terminal which opens a powershell under openconsole.exe instead of conhost.exe.
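
When comparing timings across these hosts, it may help to record which kind of window a run happened in. A small sketch, assuming only that Windows Terminal sets the WT_SESSION environment variable for the processes it launches; it cannot tell conhost.exe from other hosts, and the program name is made up for illustration.

```pascal
program whichterminal;
{$mode objfpc}{$H+}
uses sysutils;
var
  session: string;
begin
  // Assumption: Windows Terminal exports WT_SESSION to its child processes.
  // An empty value only means "probably not Windows Terminal".
  session := GetEnvironmentVariable('WT_SESSION');
  if session <> '' then
    writeln('Looks like Windows Terminal (WT_SESSION=', session, ')')
  else
    writeln('WT_SESSION not set: likely a classic console window or another terminal');
end.
```

Printing this line next to the timing results makes it easier to tell afterwards whether a fast run came from the new terminal or from a classic console window.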

440bx

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #11 on: November 06, 2024, 01:48:45 pm »
Quote from: rvk
"powershell.exe also runs under conhost.exe, doesn't it? (at least for me it still does)
And there it seems to be fast.
But yes, it can be the interaction between cmd.exe and conhost.exe which is the culprit.

There is also the new Windows terminal which opens a powershell under openconsole.exe instead of conhost.exe."
I am running Windows 7, which has version 1.0 of PowerShell; that version uses conhost.exe and it is just as slow as cmd.exe and/or TCCLE.  I tried it as a result of your comments.

There is also a graphical version, but it also uses conhost.exe, and that one seems to be even a bit slower than the one that outputs directly to the text console.

I have a Windows 10 VM, but it is not up at this time and I avoid using it.  I'll check it out sometime.  I'll try the "legacy" terminal mode (which uses conhost or its Win10 equivalent) and the new terminal it offers (I haven't looked at how that one is implemented).

Thaddy

Re: [Solved] can't explain, not the Who, but console/terminal performance
« Reply #12 on: November 06, 2024, 02:00:13 pm »
The comforting thing to know is that FPC is not the culprit.
Thank you both for confirming it.

rvk

Re: [SOLVED]Re: can't explain, not the Who, but console/terminal performance
« Reply #13 on: November 06, 2024, 02:03:30 pm »
Quote from: 440bx
"I am running Windows 7 which has version 1.0 of Powershell and that version uses conhost.exe and it is just as slow as cmd.exe and/or TCCLE.  I tried it as a result of your comments."
Have you tried starting powershell.exe directly from the Start menu (via search)?

If you start a powershell.exe under cmd.exe it gets executed inside cmd.exe and thus conhost.exe (recognizable via the black screen).
If you start powershell.exe directly via start menu, you get a blue background screen.
That one also runs under conhost.exe but is much much faster.

(At least, that's how it is under Windows 10)


440bx

Re: [Solved] can't explain, not the Who, but console/terminal performance
« Reply #14 on: November 06, 2024, 02:12:56 pm »
So far, I've only tested in Win 7.  In Win 7, I started powershell directly from Explorer and it used conhost.exe and it was just as slow as cmd.exe.  Also, started from Explorer, I started the graphical version available in Win 7 (which also uses conhost.exe) and it was slow too (even felt a tad slower.)

I'm starting the Win10 VM to try it.  I'll report my findings using Win10 once I have them.



 
