Even stranger: for testing, I compiled the IDE with the 'Optimized IDE' profile, and now the bug goes away completely on 32-bit Windows and 32-bit Linux.
I didn't mention it, but at least the bug that also affects fpDebug is a 64-bit issue (afaik; at least it depends on CPU registers, and those differ by bitness).
In any case, optimizer bugs are notoriously elusive....
One tiny change (a line of code, some re-ordering, or a compiler setting) and they can come and go. They can also be present but depend on program flow: they only cause an error if certain other code runs before or after the affected code, and otherwise not.
When I first encountered the issue, it took me 4 or 5 weeks (combined; I had to have several goes) to trace it. And "trace it" does not mean finding the actual wrong code in the compiler; it meant finding the code in my app that was incorrectly compiled, and seeing what had gone wrong in the generated asm. From that info, a member of the FPC team was then able to find the bug in FPC.
Maybe we need to upgrade to master in order to get the bug fixed independently of the optimization level?
FPC 3.2.3 (fixes branch) should be fine.
FPC 3.2.2 with -O1 should also be fine (at least from all that I have seen so far).
The bug isn't in our own code, but in the combination of the LCL (or another package) and kbmMemTable; the same code with bufdataset was working fine, except that it was slow as hell.
Probably... But never say never.
While we know there are issues in FPC, we don't know whether they are what causes this particular issue.
It is very possible, and has happened more than often enough, that a bug in the "user code" (including the LCL) only led to errors when compiled with optimization. Things like missing initializations, incorrect nil checks, incorrect boolean/math logic in expressions and comparisons... all of those may sometimes still yield correct results, until some optimization comes into play.
And even if the feature that breaks is in the LCL, the cause can still be user code. A write through a dangling pointer can destroy any data of the app, including data completely unrelated to the buggy code. In such a case, the LCL could easily crash in reaction to having its data trashed.
Moreover, going back to the possible optimizer bug: the optimizer may incorrectly translate some of your valid code and thereby introduce a dangling pointer (or similar) issue, which in turn manifests in the LCL.
So -O3 may work... well, at least today. It may fail if you add or remove a line of code somewhere.
If indeed it is an optimizer bug, then you have 2 options:
- use -O1 (for your code, and all packages including LCL...)
- upgrade FPC to 3.2.3
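To decide between those two options on a given machine, a small helper could look at the installed compiler version. This is only a sketch: the function names are hypothetical, and it assumes that every version sorting at or above 3.2.3 has the fix. It only reports; you still have to set the level for your code and all packages yourself, and still verify afterwards.

```shell
#!/bin/sh
# Hypothetical helper, not part of FPC or Lazarus: suggest an -O level
# for the installed compiler, following the advice above.
# Assumption: every FPC version that sorts at or above 3.2.3 has the fix.

# version_ge A B: succeeds if version A >= version B (needs `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# suggest_opt_level VERSION: print "-O1" for compilers older than 3.2.3,
# otherwise "any" (no optimizer-related restriction assumed).
suggest_opt_level() {
    if version_ge "$1" "3.2.3"; then
        printf '%s\n' "any"
    else
        printf '%s\n' "-O1"
    fi
}

# Only query the compiler if one is installed; `fpc -iV` prints its version.
if command -v fpc >/dev/null 2>&1; then
    version=$(fpc -iV)
    printf 'FPC %s: suggested optimization level: %s\n' \
        "$version" "$(suggest_opt_level "$version")"
fi
```

For example, 3.2.2 gives -O1, while 3.2.3 or a newer main branch build gives "any".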
If you do either of the above, and the issue goes away....
Well, you still want to make sure that you actually fixed the right thing. Because, as I said, "notoriously elusive.... One tiny change": there is still the possibility that there is a bug in your code affecting the LCL (or maybe it is in the LCL itself), and that changing the compiler version was just one tiny change. And the next tiny change brings it back. Or it is even still there, but just needs other input, or a different order of buttons pressed by the user. So it may work for you, but the same compiled exe that works for today's use case can still fail next week.
So throw in all the checks the compiler has (in different combinations).
-CRriot
-gh (and set the environment variable HEAPTRC="keepreleased")
-gt (with 1, 2, 3 or 4 "t"s; you need 4 different test builds, because -gt -gt is equal to -gtt, so each compiled exe can only be tested with one of the 4 options)
add assertions wherever you can (and run with and without them: -Sa on or off)
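The combinations above add up to 8 test builds (4 -gt levels, each with -Sa on and off). A minimal sketch that prints one fpc command per build could look like this. The project name myapp.lpr is hypothetical, -gl is added here only so heaptrc reports come with line numbers, and each build gets its own unit output dir (-FU) so units compiled with other flags are not reused.

```shell
#!/bin/sh
# Sketch: print one fpc command per test build described above.
# Hypothetical: the project file is myapp.lpr (override via PROJECT=...).
PROJECT=${PROJECT:-myapp.lpr}
APP=${PROJECT%.lpr}
COMMON="-CRriot -gl -gh"   # runtime checks, line info (-gl), heaptrc (-gh)

# trash_flag N: "-g" followed by N "t"s, i.e. -gt, -gtt, -gttt or -gtttt.
trash_flag() {
    flag="-g"
    i=0
    while [ "$i" -lt "$1" ]; do
        flag="${flag}t"
        i=$((i + 1))
    done
    printf '%s\n' "$flag"
}

# variant_cmd N SA: build command for -gt level N (1-4), assertions SA (on/off).
# Each variant gets its own unit output dir (-FU) and its own exe name (-o).
variant_cmd() {
    tag="gt${1}_sa${2}"
    asserts=""
    if [ "$2" = on ]; then
        asserts=" -Sa"
    fi
    printf '%s\n' "mkdir -p lib/${tag} && fpc ${COMMON} $(trash_flag "$1")${asserts} -FUlib/${tag} -o${APP}_${tag} ${PROJECT}"
}

# 4 trash levels x assertions on/off = 8 separate executables.
for n in 1 2 3 4; do
    for sa in off on; do
        variant_cmd "$n" "$sa"
    done
done
```

Pipe the output into sh to actually run the builds, then run each exe with HEAPTRC="keepreleased" in its environment. Note that plain fpc only applies these flags to units it compiles from source; to get the same checks into the LCL and other packages, set them in Lazarus (e.g. Project Options, "Additions and Overrides").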
And last but not least, if you are on Linux: use -gv and test with
valgrind --tool=memcheck
And if you are using threads, valgrind has thread checkers too (actually several); google that.
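For the valgrind route, a sketch that prints a separate -gv build plus one run per tool. memcheck is the one named above; helgrind and drd are valgrind's two thread error detectors. The names myapp.lpr / myapp_vg are hypothetical, and -gh is left out of this build because, as far as I know, heaptrc is not meant to be combined with valgrind.

```shell
#!/bin/sh
# Sketch (Linux only): a -gv build plus one valgrind run per tool.
# Hypothetical: project myapp.lpr, executable myapp_vg.
# heaptrc (-gh) is deliberately left out of this build.
PROJECT=${PROJECT:-myapp.lpr}
APP=${PROJECT%.lpr}

valgrind_cmds() {
    printf '%s\n' "mkdir -p lib/vg && fpc -CRriot -gv -gl -FUlib/vg -o${APP}_vg ${PROJECT}"
    # memcheck: invalid reads/writes, use of uninitialised values, leaks.
    # helgrind and drd: thread error detectors (data races, lock misuse).
    for tool in memcheck helgrind drd; do
        printf '%s\n' "valgrind --tool=${tool} ./${APP}_vg"
    done
}

valgrind_cmds
```

helgrind and drd only make sense if the app actually uses threads; for a single-threaded app, memcheck alone is enough.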