Where's the optimization? ... (ARM)

AlanTheBeast:
In 32 bit ARM.


--- Code: Pascal ---
UBXMSEi := 7;
UBXCSEi := 7;
CFLA := 0;
CFLB := 0;
--- End code ---
With or without -O2, this yields:


--- Code: Pascal ---
# [489] UBXMSEi := 7;
        mov     r1,#7
        ldr     r0,.Lj475
        strh    r1,[r0]
# [490] UBXCSEi := 7;
        mov     r0,#7
        ldr     r1,.Lj473
        strh    r0,[r1]
# [491] CFLA := 0;
        mov     r0,#0
        ldr     r1,.Lj486
        strb    r0,[r1]
# [492] CFLB := 0;
        mov     r1,#0
        ldr     r0,.Lj488
        strb    r1,[r0]
--- End code ---
I would have thought one load of 7 (or of 0 for CFLA/CFLB) and then stores to each location ...

And perhaps register adds to point to the next location, rather than loading a second pointer for each store. (Yes, the ops are in the order of the declarations - I even threw in an {$ALIGN 2} there....)

Or does more optimization take place in the assembler pass?

I suppose I could put them in a record with a case overlay (effectively absolute) to improve that, but expected it out of the compiler.
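A minimal sketch of the record-with-overlay idea, not the poster's actual code. The field types are guessed from the strh/strb stores in the listing above (16-bit and 8-bit), and the names TUbxState, Indices and Flags are hypothetical. It assumes the compiler lays the record out with natural alignment, so the two variants cover the same six bytes.

```pascal
program OverlayDemo;
{$mode objfpc}

type
  // Hypothetical layout; field sizes inferred from the strh/strb stores.
  TUbxState = record
    case Boolean of
      False: (UBXMSEi, UBXCSEi: Word;   // two 16-bit values
              CFLA, CFLB: Byte);        // two 8-bit flags
      True:  (Indices: LongWord;        // overlays UBXMSEi and UBXCSEi
              Flags: Word);             // overlays CFLA and CFLB
  end;

var
  State: TUbxState;

begin
  // Two stores instead of four. Both 16-bit halves hold 7, so the
  // constant is the same on little- and big-endian targets.
  State.Indices := $00070007;
  State.Flags := 0;

  // Read the values back through the individual fields.
  WriteLn(State.UBXMSEi, ' ', State.UBXCSEi, ' ', State.CFLA, ' ', State.CFLB);
end.
```

This only reduces the number of stores written in the source. Whether the generated ARM code actually improves would still need to be checked in the assembly output.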

NOTE: not critical here, running at about 25 - 30 Hz normally - but really expected better.

Laksen:
It's a known optimization possibility that the compiler isn't smart enough to do currently.
On paper it's simple enough, but once you are in the node tree it can get tricky to implement :)

AlanTheBeast:

--- Quote from: Laksen on September 26, 2022, 09:21:41 pm ---It's a known optimization possibility that the compiler isn't smart enough to do currently.
On paper it's simple enough, but once you are in the node tree it can get tricky to implement :)

--- End quote ---

Actually why I'm curious if there are assembler level optimizations that are invisible to us up here in so-called high level land.

For me, the above would be a no-brainer - but I used to write x86 and other less RISC oriented assembler ...

There is another trick, of course: declare a 64-bit word (absolute) at the location where those 4 vars are located and clobber them with 1 write... may go there yet.
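A rough sketch of the absolute trick, under several assumptions that are not in the thread. The four variables are taken to be two Words and two Bytes (six bytes in total), so a hypothetical Pad field is added to make the block eight bytes and keep a 64-bit store from spilling into neighbouring data. The target is assumed to be little-endian. The QWord is declared as the storage and the record is overlaid on it, so that the block gets QWord alignment. TUbxBlock, Raw and Pad are illustrative names.

```pascal
program AbsoluteDemo;
{$mode objfpc}

type
  // Hypothetical 8-byte block. Pad is an assumption, added so the
  // 64-bit write covers only this record.
  TUbxBlock = record
    UBXMSEi: Word;
    UBXCSEi: Word;
    CFLA: Byte;
    CFLB: Byte;
    Pad: Word;
  end;

var
  Raw: QWord;                     // storage, aligned as a QWord
  Block: TUbxBlock absolute Raw;  // field view of the same memory

begin
  // One assignment in the source. Little-endian assumed: the low
  // 32 bits set both Words to 7, the high 32 bits clear flags and pad.
  Raw := $0000000000070007;

  WriteLn(Block.UBXMSEi, ' ', Block.UBXCSEi, ' ', Block.CFLA, ' ', Block.CFLB);
end.
```

On a 32-bit ARM target the compiler will probably still split the 64-bit assignment into two 32-bit stores, so the gain over the four original stores should be confirmed in the assembly output.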

And then of course some time soon, this project will go 64bit and I'll be obsessing over other little bits...

(To be clear, the question is for a pretty key bit of code where receiving data is concerned, it's not that frequent but I need to get the function done in r/t close to when measurements were made...)

PascalDragon:

--- Quote from: AlanTheBeast on September 26, 2022, 09:49:55 pm ---
--- Quote from: Laksen on September 26, 2022, 09:21:41 pm ---It's a known optimization possibility that the compiler isn't smart enough to do currently.
On paper it's simple enough, but once you are in the node tree it can get tricky to implement :)

--- End quote ---

Actually why I'm curious if there are assembler level optimizations that are invisible to us up here in so-called high level land.
--- End quote ---

Well, essentially anything located in the assembler optimizer (in this case either the AArch64-specific one or the general ARM one). In both cases the results will be visible in the assembly output, however.

You can always open a feature request for that optimization, I'm rather sure someone like FPK or Gareth would like to play with that...

Seenkao:

--- Quote from: PascalDragon on September 27, 2022, 09:46:58 am ---I'm rather sure someone like FPK or Gareth would like to play with that...

--- End quote ---
https://gitlab.com/freepascal.org/fpc/source/-/issues/39781

The optimization problem is the same for all architectures, and nobody is going to solve it in the near future. The compiler fails to notice trivial things. And for the ARM architecture, optimization is much worse than for x86.
