Here you can see that switching between Buttons works fine until you put an Image on the form. After that, switching components becomes unusable.
Video: https://www.youtube.com/watch?v=rUItOqTz96U&feature=youtu.be
The bad news is that I can reproduce that now. The good news is that work is being done on the performance of the OpenGL back-end for the VideoCore VI.
It has not yet reached the same level of optimization as the VideoCore IV back-end, but that is a matter of time.
If you are adventurous, you can follow the development and try the trunk version of the OpenGL back-end.
It is the overlapping images that cause the problem; if they are nicely separated there is no issue.
Eventually the VideoCore VI back-end should be *much* faster, going by the specs.
Note (to be sure): you do not need to use OpenGL or OpenGL ES yourself; this is abstracted away in the X renderer.
One thing worries me slightly: you write that you are sharing code between the 3 and the 4? I would not do that. Compile for the 3 and the 4 separately.
The 3 does not use Mesa drivers but EGL on top of the Broadcom blobs for the X renderer; the 4 has proper Mesa drivers, but the X renderer does not take full advantage of them yet.
One way to test it is to compile the OpenGL ES and OpenGL examples that come with FPC. (Skip the glx examples: these do not work fast enough yet.)
You will note that you get twice the framerate compared to the RPi 3, so it is the X renderer that does not yet use all the available speed.