I don't understand what you meant about the blur: "that does the blur (you may need to derive the class to add 'variables')".
OK. I suggest looking at the example I posted, in particular the GLSL files. They contain the shader code, and that's where you would need to add your own code.
The GLSL code is associated with a TBGLShader3D class, which in the example is defined in umyshader.pas. Uniform variables act like parameters passed from the Pascal code to the GLSL code. They are declared both in the class and in the GLSL source (in mix.fragment.glsl).
For example, the blur radius would be a uniform variable, like fade_factor in the example.
I just tested it: it takes 62 ms for r = 9 and 90 ms for r = 11. It doesn't seem to make much of a difference to me, and I expected a faster result.
Surprising. Well, the fast blur without the box blur is already optimized, so the difference is not that big with a small radius; it becomes significant with a larger radius, such as 80. To see the difference, you would need to comment out the "if" block in the fast blur that calls the box blur.
After that, you can try calling the box blur directly. Maybe the box blur itself could be optimized?
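On the question of optimizing the box blur: the thread doesn't show the library's box blur code, so here is only a generic sketch in C (not BGRABitmap's implementation, and not Pascal) of the usual running-sum technique. Instead of re-summing all 2r+1 samples for each pixel, the window sum is updated by adding the sample that enters and subtracting the one that leaves, so the cost per pixel no longer depends on the radius. Assumptions: an 8-bit grayscale image, edges handled by repeating the border pixel, and a blur done as two separable passes (rows, then columns). The function names `box_blur_1d_naive`, `box_blur_1d_running` and `box_blur_2d` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Clamp an index into [0, n-1] so edge pixels are repeated. */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Reference box blur along one line of n samples spaced `stride` apart.
   It sums all 2r+1 samples for every pixel, so its cost grows with r. */
void box_blur_1d_naive(const unsigned char *src, unsigned char *dst,
                       int n, int stride, int r) {
    assert(r >= 0);
    int window = 2 * r + 1;
    for (int x = 0; x < n; x++) {
        long sum = 0;
        for (int k = -r; k <= r; k++)
            sum += src[(size_t)clamp_index(x + k, n) * stride];
        dst[(size_t)x * stride] = (unsigned char)((sum + window / 2) / window);
    }
}

/* Running-sum box blur: same result, but the window sum is updated by
   adding the sample entering on the right and subtracting the one
   leaving on the left, so each pixel costs O(1) whatever r is. */
void box_blur_1d_running(const unsigned char *src, unsigned char *dst,
                         int n, int stride, int r) {
    assert(r >= 0);
    int window = 2 * r + 1;
    long sum = 0;
    /* Initial window centered on pixel 0 */
    for (int k = -r; k <= r; k++)
        sum += src[(size_t)clamp_index(k, n) * stride];
    for (int x = 0; x < n; x++) {
        dst[(size_t)x * stride] = (unsigned char)((sum + window / 2) / window);
        /* Slide the window one pixel to the right */
        sum += src[(size_t)clamp_index(x + r + 1, n) * stride];
        sum -= src[(size_t)clamp_index(x - r, n) * stride];
    }
}

/* Separable 2D box blur of a w x h 8-bit grayscale image:
   a horizontal pass over each row, then a vertical pass over each column. */
void box_blur_2d(const unsigned char *src, unsigned char *dst,
                 int w, int h, int r) {
    unsigned char *tmp = malloc((size_t)w * h);
    if (tmp == NULL) return;
    for (int y = 0; y < h; y++)
        box_blur_1d_running(src + (size_t)y * w, tmp + (size_t)y * w, w, 1, r);
    for (int x = 0; x < w; x++)
        box_blur_1d_running(tmp + x, dst + x, h, w, r);
    free(tmp);
}
```

For example, `box_blur_2d(pixels, out, width, height, 80)` should take roughly the same time as with a radius of 9, whereas the naive version slows down as r grows. Timing both versions at a small and a large radius is a quick way to check whether a given box blur already uses this kind of sliding window. Note that each pass rounds to 8 bits, so the separable result can differ by one level from an exact (2r+1)×(2r+1) average.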