### Topic: Where's the optimization? ... (ARM)

#### AlanTheBeast

• Sr. Member
• Posts: 306
• My software never cras....
##### Where's the optimization? ... (ARM)
« on: September 26, 2022, 07:52:30 pm »
On 32-bit ARM:

```pascal
UBXMSEi := 7;
UBXCSEi := 7;
CFLA := 0;
CFLB := 0;
```

With or without -O2, this yields:

```asm
# [489] UBXMSEi := 7;
        mov     r1,#7
        ldr     r0,.Lj475
        strh    r1,[r0]
# [490] UBXCSEi := 7;
        mov     r0,#7
        ldr     r1,.Lj473
        strh    r0,[r1]
# [491] CFLA := 0;
        mov     r0,#0
        ldr     r1,.Lj486
        strb    r0,[r1]
# [492] CFLB := 0;
        mov     r1,#0
        ldr     r0,.Lj488
        strb    r1,[r0]
```

I would have expected a single load of 7 (or of 0 for CFLA/CFLB), followed by stores to each location.

And perhaps register adds to point to the next location, rather than loading a second pointer for each store. (Yes, the operations are in declaration order; I even threw in an {$ALIGN 2} there.)

Or does more optimization take place in the assembler pass?

I suppose I could put them in a record with a case overlay (effectively absolute) to improve that, but I expected the compiler to do this on its own.
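A minimal sketch of that record-with-overlay idea, assuming the four variables can be moved into one record. The names `TUbxState`, `Pad`, `Raw`, `ResetPattern` and `ResetState` are hypothetical, the field widths (two 16-bit, two 8-bit) are inferred from the `strh`/`strb` stores above, and the constant depends on byte order. Whether FPC actually emits fewer instructions for this on 32-bit ARM is not something this sketch verifies.

```pascal
program OverlaySketch;
{$mode objfpc}

type
  // Hypothetical record grouping the four variables from the post.
  TUbxState = packed record
    case Boolean of
      False: (
        UBXMSEi: Word;   // 16-bit, as implied by the strh store
        UBXCSEi: Word;
        CFLA: Byte;      // 8-bit, as implied by the strb store
        CFLB: Byte;
        Pad: Word        // assumed padding so the overlay is exactly 8 bytes
      );
      True: (Raw: QWord);  // overlay covering every field at once
  end;

const
{$ifdef ENDIAN_LITTLE}
  ResetPattern: QWord = $0000000000070007;  // 7, 7, 0, 0 in memory order
{$else}
  ResetPattern: QWord = $0007000700000000;  // same values on a big-endian target
{$endif}

var
  State: TUbxState;

// Sets UBXMSEi and UBXCSEi to 7 and CFLA and CFLB to 0 with one assignment.
procedure ResetState(var S: TUbxState);
begin
  S.Raw := ResetPattern;
end;

begin
  ResetState(State);
  WriteLn(State.UBXMSEi, ' ', State.UBXCSEi, ' ', State.CFLA, ' ', State.CFLB);
end.
```

The trade-off is that every use of the variables elsewhere has to go through the record, and the single assignment also overwrites the padding bytes.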

NOTE: not critical here, running at about 25 - 30 Hz normally - but really expected better.

#### Laksen

• Hero Member
• Posts: 702
##### Re: Where's the optimization? ... (ARM)
« Reply #1 on: September 26, 2022, 09:21:41 pm »
It's a known optimization possibility that the compiler currently isn't smart enough to do.
On paper it's simple enough, but once you are in the node tree it can get tricky to implement.

#### AlanTheBeast

• Sr. Member
• Posts: 306
• My software never cras....
##### Re: Where's the optimization? ... (ARM)
« Reply #2 on: September 26, 2022, 09:49:55 pm »
> It's a known optimization possibility that the compiler isn't smart enough to do currently.
> On paper it's simple enough, but once you are in the node tree it can get tricky to implement

That's actually why I'm curious whether there are assembler-level optimizations that are invisible to us up here in so-called high-level land.

For me, the above would be a no-brainer, but I used to write x86 and other less RISC-oriented assembler ...

There is another trick, of course: declare a 64-bit word (absolute) at the location where those 4 vars are located and clobber them with 1 write... may go there yet.
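A rough sketch of that `absolute` trick, under some stated assumptions. Going by the `strh`/`strb` stores in the first post, the four variables occupy 6 bytes, so a 64-bit write also overwrites the 2 bytes that follow; the sketch reserves those explicitly. It also groups the variables into a record (`TUbxVars`, `Pad`, `VarsRaw` and `ResetVars` are all hypothetical names), since I would not rely on separately declared globals being laid out contiguously.

```pascal
program AbsoluteSketch;
{$mode objfpc}

type
  // Hypothetical record; packed so the layout is exactly 2+2+1+1+2 bytes.
  TUbxVars = packed record
    UBXMSEi: Word;
    UBXCSEi: Word;
    CFLA: Byte;
    CFLB: Byte;
    Pad: array[0..1] of Byte;  // the 2 extra bytes a 64-bit write clobbers
  end;

var
  Vars: TUbxVars;
  VarsRaw: QWord absolute Vars;  // 64-bit view of the same memory

// Writes 7, 7, 0, 0 into the four fields through the 64-bit alias.
procedure ResetVars;
begin
{$ifdef ENDIAN_LITTLE}
  VarsRaw := $0000000000070007;
{$else}
  VarsRaw := $0007000700000000;
{$endif}
end;

begin
  ResetVars;
  WriteLn(Vars.UBXMSEi, ' ', Vars.UBXCSEi, ' ', Vars.CFLA, ' ', Vars.CFLB);
end.
```

On a 32-bit target the compiler may still split the 64-bit store, so it is worth checking the assembly output before committing to this.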

And then of course some time soon this project will go 64-bit and I'll be obsessing over other little bits...

(To be clear, the question concerns a fairly key piece of code on the data-receiving path. It doesn't run that often, but I need the function to finish in real time, close to when the measurements were made...)

#### PascalDragon

• Hero Member
• Posts: 4759
• Compiler Developer
##### Re: Where's the optimization? ... (ARM)
« Reply #3 on: September 27, 2022, 09:46:58 am »

> Actually why I'm curious if there are assembler level optimizations that are invisible to us up here in so-called high level land.

Well, essentially anything located in the assembler optimizer (in this case either the AArch64-specific one or the general ARM one). In both cases, however, the results will be visible in the assembly output.

You can always open a feature request for that optimization, I'm rather sure someone like FPK or Gareth would like to play with that...

#### Seenkao

• Sr. Member
• Posts: 417
##### Re: Where's the optimization? ... (ARM)
« Reply #4 on: September 28, 2022, 10:00:54 am »
> I'm rather sure someone like FPK or Gareth would like to play with that...

https://gitlab.com/freepascal.org/fpc/source/-/issues/39781

The optimization problem is the same for all architectures. Nobody is going to solve it any time soon. The compiler fails to notice trivial things. And for the ARM architecture, optimization is much worse than for x86.