21.183 seconds -- Above system
15.723 seconds -- Intel(R) Xeon(R) CPU E31235 at 3.20GHz
18.048 seconds -- Intel(R) Core(TM) i5-7300HQ CPU at 2.50GHz
14.207 seconds -- AMD Ryzen 7 2700X Eight-Core Processor [4GHz]
I need to test this on some more processors and see. I don't have any newer ones to try it on as yet.
The results you get are along the lines of what I expect.
I can't put my finger on it, but executing this code sequence
# [67] e1^^^^^ := 1;
movq (%rbx),%rax
movq (%rax),%rax
movq (%rax),%rax
movq (%rax),%rax
movq $1,(%rax)
.Ll20:
3 times (for e1, e2 and e3) in under 3 seconds doesn't add up. Each of those instructions takes on the order of 3 clock cycles, times 5, that's 15, times 3 (e1, e2, e3), that's 45, times a billion, that's 45 billion. IOW, a 3.7GHz processor is magically executing somewhere in the ballpark of 20 billion instructions per second. It doesn't add up.
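For anyone who wants to plug in their own numbers, the back-of-the-envelope estimate above can be written out as a small program. This is only a sketch: every constant is taken from the post or assumed (3 cycles per instruction, a billion passes, 3 seconds used as an upper bound on the measured time), and all identifier names are made up for illustration.

```pascal
program CycleEstimate;

const
  Iterations      = 1000000000; { one billion passes, as in the post }
  InstrPerAssign  = 5;          { four loads and one store per assignment }
  Assignments     = 3;          { e1, e2 and e3 }
  CyclesPerInstr  = 3.0;        { assumption: rough per-instruction cost from the post }
  ClockHz         = 3.7e9;      { 3.7 GHz }
  ObservedSeconds = 3.0;        { "under 3 seconds", taken as an upper bound }

var
  TotalInstr, TotalCycles, PredictedSeconds, MinInstrPerSec: Double;

begin
  { Multiply in floating point so the product cannot overflow an integer type. }
  TotalInstr := 1.0 * Iterations * InstrPerAssign * Assignments;
  TotalCycles := TotalInstr * CyclesPerInstr;

  { Time the run should take if every instruction really cost CyclesPerInstr. }
  PredictedSeconds := TotalCycles / ClockHz;

  { Lower bound on the rate actually achieved, since the run took less than ObservedSeconds. }
  MinInstrPerSec := TotalInstr / ObservedSeconds;

  WriteLn('Instructions executed (billions): ', TotalInstr / 1e9:0:1);
  WriteLn('Cycles at the assumed cost (billions): ', TotalCycles / 1e9:0:1);
  WriteLn('Predicted run time (seconds): ', PredictedSeconds:0:1);
  WriteLn('Minimum observed rate (billions of instr/sec): ', MinInstrPerSec / 1e9:0:1);
end.
```

With those inputs the estimate comes to 15 billion instructions and 45 billion cycles, or roughly 12 seconds at 3.7GHz, against a measured time of under 3 seconds. Changing CyclesPerInstr or ObservedSeconds shows how sensitive the mismatch is to each assumption.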
You are a rare lover of complications.
I don't think of organization as creating complication, on the contrary.
But I don't recommend it to others.
I don't recommend any particular number of levels. What I "recommend" is for the code structure to be parallel to the problem structure. The number of levels is irrelevant; the structure of the problem determines the structure of the solution.
In all those years of Pascal, I never saw a fourth level.
There are plenty of things that should be commonly seen that are either never seen or extremely rarely seen. Their absence is no indication that they are undesirable, nor is their presence an indication of desirability. I'm sure you've seen a few gotos, but I doubt you'd use that as a basis to recommend their use.