
Author Topic: Common File Dialogs Have Been Broken for a Very Long Time (Ex: Ubuntu AARCH64)  (Read 9397 times)

rvk

  • Hero Member
  • *****
  • Posts: 6572
Quote
Is QEMU actually a hypervisor now? I thought it was always emulating everything.
It emulates when the guest architecture differs from the host. On the same architecture it will virtualize.

Quote
That could certainly explain why you're seeing (partially at least) different behavior, but I've been out of the QEMU loop so it may well have become its own hypervisor.
No, because running on bare metal rpi I also don't have this behavior.

TRon

  • Hero Member
  • *****
  • Posts: 3619
Quote
No, because running on bare metal rpi I also don't have this behavior.
+1, and all the more reason to 'blame' it on the virtualization software (and not on an odd-one-out situation for rvk, as no one has ever complained and it also works for me (tm)).

@msintle, have you ever actually run compiled code natively on such an architecture? Does it then produce these error(s) in a similar manner?
« Last Edit: November 14, 2024, 01:11:50 pm by TRon »
This tagline is powered by AI (AI advertisement: Free Pascal the only programming language that matters)

robert rozee

  • Full Member
  • ***
  • Posts: 176
my 2 cents worth:
i'm reasonably happy that what msintle is seeing is quite real, and don't really think that it matters too much if the problem is related to virtualization causing misbehavior or real hardware masking it. running in a VM is perfectly legitimate - indeed, it may become even more common on desktops in future - and any misbehavior, no matter what exposes it, warrants an investigation into locating the root cause.

my feeling is that somewhere deep within GTK2 and Qt5 lies a bug that is causing the behavior that msintle has encountered, and that as both GTK2 and Qt5 are free open-source software, it is quite conceivable that both projects have borrowed large chunks of source code from the other; said bug will almost certainly be located in a block of such shared code.


as an aside, for years i have randomly observed GTK throw various error messages when a GUI application is run from the console (including, but not limited to, applications created with FPC/Lazarus), but the big difference is that throwing these error messages is a well known behavior, and the solution, i've been informed by numerous folks, is to just ignore them. the GUI application will still work fine, and who runs a GUI application from the console anyway?


might i suggest a possible way forward: given that the chances of any of us being able to 'fix' GTK2 or Qt5 are at best slim, has anyone thought of simply writing replacement dialogs? surely these dialogs are not rocket science, and definitely not high-performance. how hard could it be to write a handful of replacements that could be slotted in place of the ones provided by GTK2/Qt5? and remember, we have at least one person amongst us who is able to generate a list of problematic dialogs that are in need of replacement!


cheers,
rob   :-)
« Last Edit: November 14, 2024, 02:52:58 pm by robert rozee »

rvk

  • Hero Member
  • *****
  • Posts: 6572
Quote
my 2 cents worth:
i'm reasonably happy that what msintle is seeing is quite real, and don't really think that it matters too much if the problem is related to virtualization causing misbehavior or real hardware masking it. running in a VM is perfectly legitimate - indeed, it may become even more common on desktops in future - and any misbehavior, no matter what exposes it, warrants an investigation into locating the root cause.
I completely agree (and we never doubted that it crashes); it's only that nobody could reproduce it.

But the solution could be as simple as to adjust the earlier segment:
Code: Pascal
  procedure TGtk2WidgetSet.AppInit(var ScreenInfo: TScreenInfo);
  begin
    {$if defined(cpui386) or defined(cpux86_64)}
    // needed otherwise some gtk theme engines crash with division by zero
    {$IFNDEF DisableGtkDivZeroFix}
      SetExceptionMask(GetExceptionMask + [exOverflow,exZeroDivide,exInvalidOp]);
    {$ENDIF}
    {$ifend}
    ...
  end;

To this:

Code: Pascal
  procedure TGtk2WidgetSet.AppInit(var ScreenInfo: TScreenInfo);
  begin
    {$if defined(cpui386) or defined(cpux86_64) or defined(aarch64)}
    // needed otherwise some gtk theme engines crash with division by zero
    {$IFNDEF DisableGtkDivZeroFix}
      SetExceptionMask(GetExceptionMask + [exOverflow,exZeroDivide,exInvalidOp]);
    {$ENDIF}
    {$ifend}
    ...
  end;

Although it would be useful to know if this would also happen on arm 32 bit.
In that case defined(arm) should also be added.

But the only way to know if this is a correct fix is for msintle to do this and compile a trunk version of Lazarus and test this with the Lazarus IDE itself.
« Last Edit: November 14, 2024, 02:55:55 pm by rvk »

TRon

  • Hero Member
  • *****
  • Posts: 3619
Quote
my 2 cents worth:
..
I do not disagree with that assessment (*)

Alas, it currently is as rvk wrote.

A possible solution (bug/oversight ?) is already suggested and it is not possible (at least for me) to recreate the issues that msintle seems to experience (issues which I do not dispute either).

In order to approach the issue, some fundamental testing needs to be done first; otherwise it just becomes a wild goose chase, with a wrong fix as a possible outcome. That the suggested workaround seems to work is no guarantee that it addresses the actual cause of the issues as experienced (though at first glance it sure does look like the VM software or the OS is resetting the flags).

edit: (*) it would be interesting to know how you would classify the host shares issues with VirtualBox (because their driver sucks and causes all sorts of issues because of that)
« Last Edit: November 14, 2024, 03:24:44 pm by TRon »

robert rozee

  • Full Member
  • ***
  • Posts: 176
Quote
But the solution could be as simple as to adjust the earlier segment [...] To this:
Code: Pascal
  procedure TGtk2WidgetSet.AppInit(var ScreenInfo: TScreenInfo);
  begin
    {$if defined(cpui386) or defined(cpux86_64) or defined(aarch64)}
    // needed otherwise some gtk theme engines crash with division by zero
    {$IFNDEF DisableGtkDivZeroFix}
      SetExceptionMask(GetExceptionMask + [exOverflow,exZeroDivide,exInvalidOp]);
    {$ENDIF}
    {$ifend}
    ...
  end;

while it does seem to be the 'accepted' solution (for i386 and x86_64), such an approach does give me the heebie-jeebies!

question: when you set up the exception mask at the point of application initialization, can you then guarantee that the mask will not be changed later on? intuitively, i'd have thought a much safer approach would be to set up the mask before making each call into GTK code and restore it after the return from each call - as rvk suggested back in Reply #51. what happens if your program makes use of a library that changes the mask?
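
a minimal sketch of that per-call idea, for illustration only: it saves the current mask, masks the same three exceptions as the AppInit snippet, makes the call, and restores the saved mask afterwards. CallWithMaskedFPU and FakeGtkCall are hypothetical names invented for this example (FakeGtkCall merely stands in for a real call into GTK2); this is not how the LCL currently does it.

```pascal
program MaskAroundCall;

{$mode objfpc}{$H+}

uses
  Math; // GetExceptionMask, SetExceptionMask, TFPUExceptionMask

type
  TSimpleProc = procedure;

// Hypothetical stand-in for a call into GTK2 code.
procedure FakeGtkCall;
begin
end;

// Hypothetical helper: mask the FPU exceptions only for the duration
// of one call, then put back whatever mask was active before.
procedure CallWithMaskedFPU(AProc: TSimpleProc);
var
  SavedMask: TFPUExceptionMask;
begin
  SavedMask := GetExceptionMask;
  SetExceptionMask(SavedMask + [exOverflow, exZeroDivide, exInvalidOp]);
  try
    AProc();
  finally
    // Restore the caller's mask even if AProc raises.
    SetExceptionMask(SavedMask);
  end;
end;

begin
  CallWithMaskedFPU(@FakeGtkCall);
end.
```

the try/finally is there so that the caller's mask comes back even when the wrapped call raises; the cost is two mask changes on every wrapped call.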

and is the exception mask similarly adjusted in the application initialization code when using Qt5? if not, would doing so help?


cheers,
rob   :-)
« Last Edit: November 14, 2024, 03:55:49 pm by robert rozee »

rvk

  • Hero Member
  • *****
  • Posts: 6572
Quote
while it does seem to be the 'accepted' solution (for i386 and x86_64), such an approach does give me the heebie-jeebies!
The current masking of those exceptions for cpui386 and cpux86_64 also gives me the heebie-jeebies  :D
Why was this added to gtk2 anyway?

Masking exceptions is something you would do as a last resort. I'm not sure whether the reason for adding this masking for the Intel architecture on gtk2 is documented, or whether it is even still needed. Maybe the crashes can be fixed another way, or maybe the masking was added because of a problem similar to the one now surfacing on aarch64. If so, adding aarch64 would be fine, but I agree, it would be better not to do something like this (hiding exceptions thrown by the OS).

Quote
question: when you set up the exception mask at the point of application initialization, can you then guarantee that the mask will not be changed later on? intuitively, i'd have thought a much safer approach would be to set up the mask before making each call into GTK code and restore it after the return from each call - as rvk suggested back in Reply #51. what happens if your program makes use of a library that changes the mask?
Yes, for something like a dialog this would be perfectly fine. But for something like a Gtk2.Draw function, this could give a huge performance hit.
So it's better that we first find out WHY the masking of these exceptions was included to begin with.

Quote
and is the exception mask similarly adjusted in the application initialization code when using Qt5? if not, would doing so help?
I thought from TS that there was no problem under Qt5 ?

TRon

  • Hero Member
  • *****
  • Posts: 3619
« Last Edit: November 14, 2024, 04:17:57 pm by TRon »

rvk

  • Hero Member
  • *****
  • Posts: 6572
heebie-jeebies already discussed: https://wiki.freepascal.org/Multiplatform_Programming_Guide#Gtk2_and_masking_FPU_exceptions  :)

Quote
If the Gtk2 widgetset adds this masking it is most likely because it was necessary, because someone in the past had trouble without this code. Gtk2 is a C library and depending on its version and compilation (or maybe always?), C libraries might crash if we throw exceptions to zero divizions.
So in general, I doubt that simply removing the code that does this will be a safe solution.

Well... isn't that exactly what's happening here (someone getting an exception in the OS, i.e. in the dialog)?

Quote
GTK2 cannot be run with exceptions enabled because the C code assumes that exceptions are turned off. Windows gui libs does not make this assumption so exceptions can be left turned on. There is not much which can be done about this: so either use another gui library or turn on/off exceptions as your code needs them while not calling gui related stuff.
So... GTK2 expects exceptions to be turned off. It doesn't say this is only for intel. So why couldn't this also be true for arm (like we see in this topic)?

If masking those exceptions for the Intel systems wasn't that bad of a solution, adding it for aarch64 shouldn't be either.
True... it needs to be tested... but why have this masking only for Intel and not for other architectures?

Also true... I would have liked to see some more people having the same problems before actually adding that masking of exceptions.

TRon

  • Hero Member
  • *****
  • Posts: 3619
Quote
If masking those exceptions for the Intel systems wasn't that bad of a solution, adding it for aarch64 shouldn't be either.
True... it needs to be tested... but why have this masking only for Intel and not for other architectures?
The patch/workaround existed before aarch64 was born ?

Quote
Also true... I would have liked to see some more people having the same problems before actually adding that masking of exceptions.
Especially since this behaviour does not seem to apply for/to our aarch64 configurations  %)

Note that there is more code in the FPC tree that sets the exceptionmask before making a library call (I seem to remember some xml stuff).

For all we know we both have something running (or was invoked) that set the exception mask as expected for FPC but the same does not happen for msintle (for whatever reason that we currently don't know about).

For me the workaround is acceptable, but that little voice keeps telling me there's no other proof besides this one report (with a VM, which needs to be excluded from the equation). So yeah, if someone does run FPC natively on one of those new machines then please do test.

msintle

  • Full Member
  • ***
  • Posts: 233
Thanks for all the thoughts. Instead of replying individually I'll just consolidate everything here:

- Thanks for acknowledging my reports as valid, I do very much appreciate that.

- I don't have a physical Linux installation to test this on. I'm not about to wipe out macOS on my M3 Submax to do that, given how restrictive Apple is, ostensibly for my security. I do have an aarch64 PC I could sacrifice, and Microsoft have just released a publicly downloadable AARCH64 ISO for Windows 11 for the first time, but that would still be scary, mostly due to the lack of imaging tooling for the aarch64 platform even on the Microsoft side (at least the last time I checked, which was a couple of years ago). I could also easily test on x86_64 PCs with a physical Linux installation, but I am really disinclined to do so, because:

- Through our QA we know the contagion has "spread" to x86_64, where they've reported random crashes after closing file dialogs on those platforms with Centos 9 and Fedora on Qt5 (architecture and widgetset previously thought immune). Again, this is VM testing - so you should be able to easily reproduce this with - now free VMware Workstation - or Parallels, still, running on Intel or AMD CPUs, macOS or Windows hosts (I myself have only tested on a macOS host thus far, but at least this should make it much easier for other folks to reproduce).

Just a note for the above point, our x86_64 code does NOT yet incorporate the bugfix on this thread. That test is still pending at QA.

- And as mentioned earlier the issue happens with both Parallels and VMware virtual machines on aarch64, at least on Apple Silicon (so not one but two competitor hypervisor stacks [although I don't know if they just now use Apple's hypervisor on aarch64 so don't have much of a difference left]) - so it would be infinitely more cost and time effective for folks here to reproduce it either on x86_64 or aarch64 through virtualization, instead of me flattening physical devices and then trying to rebuild them (really the only sensible way for me to undertake that would be if I had a spare M2 SSD lying around which I could use for the exercise instead).

- To reconfirm, the issue did go away when the bugfix on this thread was incorporated into our aarch64 code. So it is effective, and my hunch is to agree that, since aarch64 is a newer platform, it was simply left out of the define by mistake.

- Sadly, again, I must echo Martin_fr's sentiments that something else might be going on, since we're still having random problems on GTK2 with aarch64 - random freezes here and there, which reproduce when the end-user cancels out of particular dialogs but do not reproduce when the end-user okays out of the same dialogs (which just doesn't make sense), as I described in my immediately preceding post on this thread. But these bugs might be entirely unrelated to the bug that we may well have fixed on this thread, too.

- I respectfully disagree with the suggestion to get rid of platform native file dialogs, this is avoiding the issue which would come back to bite us sooner or later, to say nothing of the fact that trying to clone platform native file dialogs is always a losing war, with the platform vendor potentially upending you anytime they've released an OS upgrade, and downgrading the perceived quality of your applications consequently.

I believe the right approach to solve this issue is by having others reproduce it at this point, lest I become a chokepoint, all your sympathies notwithstanding. And hey, I mean earnestly, if still nobody else can reproduce the problem; perhaps I really should start looking for another line of work if only MY VM tests somehow reproduce the issue in what ought to be a fully portable, reproducible situation for everyone. That would really shake my worldview, and I really wouldn't want to find myself waking up from the matrix to find myself an old man living in my mother's basement or something :O
« Last Edit: November 15, 2024, 12:21:29 pm by msintle »

rvk

  • Hero Member
  • *****
  • Posts: 6572
I just installed a new trunk on Linux. Now I see this:

Code: Pascal
  procedure TGtk2WidgetSet.AppInit(var ScreenInfo: TScreenInfo);
  begin
    // needed otherwise some gtk theme engines crash with division by zero
    {$IFNDEF DisableGtkDivZeroFix}
      SetExceptionMask(GetExceptionMask + [exOverflow,exZeroDivide,exInvalidOp]);
    {$ENDIF}
  6.   {$ENDIF}

So I guess that define for cpui386 and cpux86_64 is now removed and the masking of the exception is now implemented for all platforms.

@msintle Can you test a new trunk version of Lazarus?

Revision: 604c8de01610a4044d0a5d6cfc57bf9b9e3ae907
Author: Maxim Ganetsky <ganmax@narod.ru>
Date: 06-11-2024 14:23:17
Message:
LCL-Gtk2: Call SetExceptionMask for all CPUs, not only for x86 (thus unify it with other widgetsets). Fixes crashes on AArch64, issue #41188.

Quote
Revision: 7cfcb81528b10f3ccaad566f7b12d23b38303368
Author: Maxim Ganetsky <maxim@lazarus-ide.org>
Date: 06-11-2024 13:35:08
Message:
LCL-Gtk2: Call SetExceptionMask for all CPUs, not only for x86 (thus unify it with other widgetsets). Fixes crashes on AArch64, issue #41188.

----
Modified: lcl/interfaces/gtk2/gtk2widgetset.inc

/edit/
BTW. Removing the masking of exceptions on cpux86_64, recompiling Lazarus, and starting it from a terminal hangs the Open dialog, with the following on the terminal:
Quote
TApplication.HandleException: EAccessViolation
Access violation
  Stack trace:
  $00007F1843EB20D4

(It doesn't show an exception dialog but certainly crashes/hangs silently without masking the exception)
« Last Edit: November 15, 2024, 02:18:27 pm by rvk »

msintle

  • Full Member
  • ***
  • Posts: 233
Oh, right. The exception was already being masked on x86_64.

Our QA also reported (kindly before the end of the week) that they found no different results when we explicitly added remasking, so that makes sense. They tested x86_64 with Qt5 and the immediate application crashes upon cancelling out of a common file dialog reproduced readily on Centos and Fedora. This is different from the aarch64 GTK2 freeze upon showing a common file dialog.

It can be very challenging to install Lazarus on Linux from trunk with FPCUPDLX. I'll definitely let you know what I find as soon as I get around to it. Which widgetset and bitness did you want me to look at in particular?

rvk

  • Hero Member
  • *****
  • Posts: 6572
Quote
It can be very challenging to install Lazarus on Linux from trunk with FPCUPDLX.
That's why I use my own script. I posted this script recently in this topic.
https://forum.lazarus.freepascal.org/index.php/topic,68845.msg537003.html#msg537003

I use this script on Raspberry pi (arm32 and aarch64) and on Linux virtual and bare metal machines (Ubuntu, Debian, etc) on Intel.

msintle

  • Full Member
  • ***
  • Posts: 233
Quote
It can be very challenging to install Lazarus on Linux from trunk with FPCUPDLX.
That's why I use my own script. I posted this script recently in this topic.
https://forum.lazarus.freepascal.org/index.php/topic,68845.msg537003.html#msg537003

I use this script on Raspberry pi (arm32 and aarch64) and on Linux virtual and bare metal machines (Ubuntu, Debian, etc) on Intel.

Would be really great if somebody could port that script into InstallAware Multi Platform.

Would your script also run on macOS (and Windows, if the commands were updated)?

If anyone is interested, please PM me for a very nice development bounty!

This could produce the first single-source cross-platform setup for Lazarus itself, which would be exciting.

 
