Topic: Can the compiler be told to align character arrays on a byte boundary?

440bx

  • Hero Member
  • *****
  • Posts: 1086
Hello,

I've been trying, unsuccessfully, to get FPC to align character arrays on a byte boundary.

The question is: is there a way to tell FPC to align character arrays on a byte boundary?

Thank you.

PS: I've tried many ways (some creative), none of which worked. The following sample program shows a few of them.

Code: Pascal
{$MODE OBJFPC                      }

{$MODESWITCH     ADVANCEDRECORDS   }
{$MODESWITCH     TYPEHELPERS       }
{$MODESWITCH     ALLOWINLINE       }
{$MODESWITCH     RESULT            }
{$MODESWITCH     PROPERTIES        }

{$MODESWITCH     ANSISTRINGS-      }
{$MODESWITCH     AUTODEREF-        }
{$MODESWITCH     UNICODESTRINGS-   }
{$MODESWITCH     POINTERTOPROCVAR- }

{$LONGSTRINGS    OFF               }
{$WRITEABLECONST ON                }
{$TYPEDADDRESS   ON                }
{$ALIGN          ON                }


(*
0000 afb0 46 50 43 20 33 2e 30 2e  34 20 5b 32 30 31 37 2f  FPC 3.0.4 [2017/
0000 afc0 31 30 2f 30 36 5d 20 66  6f 72 20 78 38 36 5f 36  10/06] for x86_6
0000 afd0 34 20 2d 20 57 69 6e 36  34 00 00 00 00 00 00 00  4 - Win64.......
0000 afe0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
0000 aff0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................

// seems to always align on a 16 byte boundary no matter the setting of
// align or packed record

0000 b000 66 69 72 73 74 20 20 63  68 61 72 61 63 74 65 72  first  character
0000 b010 20 61 72 72 61 79 00 00  00 00 00 00 00 00 00 00   array..........

0000 b020 73 65 63 6f 6e 64 20 63  68 61 72 61 63 74 65 72  second character
0000 b030 20 61 72 72 61 79 00 00  00 00 00 00 00 00 00 00   array..........

0000 b040 74 68 69 72 64 20 20 63  68 61 72 61 63 74 65 72  third  character
0000 b050 20 61 72 72 61 79 00 00  00 00 00 00 00 00 00 00   array..........

0000 b060 66 6f 75 72 74 68 20 63  68 61 72 61 63 74 65 72  fourth character
0000 b070 20 61 72 72 61 79 00 00  00 00 00 00 00 00 00 00   array..........

0000 b080 61 6e 64 20 73 6f 20 6f  6e 00 00 00 00 00 00 00  and so on.......

0000 b090 74 65 78 74 20 69 6e 20  61 20 72 65 63 6f 72 64  text in a record
0000 b0a0 20 31 00 00 00 00 00 00  00 00 00 00 00 00 00 00   1..............

0000 b0b0 74 65 78 74 20 69 6e 20  61 20 72 65 63 6f 72 64  text in a record
0000 b0c0 20 32 00 00 00 00 00 00  00 00 00 00 00 00 00 00   2..............

0000 b0d0 74 65 78 74 20 69 6e 20  61 20 72 65 63 6f 72 64  text in a record
0000 b0e0 20 33 00 00 00 00 00 00  00 00 00 00 00 00 00 00   3..............

0000 b0f0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
0000 b100 39 54 68 69 73 20 62 69  6e 61 72 79 20 68 61 73  9This binary has
0000 b110 20 6e 6f 20 73 74 72 69  6e 67 20 63 6f 6e 76 65   no string conve
0000 b120 72 73 69 6f 6e 20 73 75  70 70 6f 72 74 20 63 6f  rsion support co
*)



program CharacterArraysAlignment;


const
  {$align 1}
  s1             = 'first  character array';
  s2             = 'second character array';
  s3             = 'third  character array';
  s4             = 'fourth character array';
  s5             = 'and so on';

  p1             : pchar = s1;
  p2             : pchar = s2;
  p3             : pchar = s3;
  p4             : pchar = s4;
  p5             : pchar = s5;

  l1             = length(s1);
  l2             = length(s2);
  l3             = length(s3);
  l4             = length(s4);
  l5             = length(s5);


type
  {$align 1}
  {$packrecords 1}
  tmessages = bitpacked record
    const m1          = 'text in a record 1';
    const m2          = 'text in a record 2';
    const m3          = 'text in a record 3';
  end;

var
  messages      : tmessages;

const
  mp1           : pchar = messages.m1;
  mp2           : pchar = messages.m2;
  mp3           : pchar = messages.m3;



begin
  writeln(p1, l1);
  writeln(p2, l2);
  writeln(p3, l3);
  writeln(p4, l4);
  writeln(p5, l5);

  writeln(mp1);
  writeln(mp2);
  writeln(mp3);

end.

using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 8679
Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #1 on: August 17, 2018, 09:04:49 pm »
https://www.freepascal.org/docs-html/current/prog/progsu9.html#x16-150001.2.9

{$CODEALIGN VARMAX=1}{$CODEALIGN RECORDMAX=1}{$CODEALIGN LOCALMAX=1}
Usually the compiler picks natural alignment for records and their members (and you specified {$align on}, which forces that). The directives above should prevent that, but in my experience not on all platforms: some CPUs rely on naturally aligned data.
If this works for you, note that you have just slowed down your code *a lot* (except on 8-bit processors).

Also note that half of the switches here are redundant, and {$align on} can even clash with other alignment settings.
Code: Pascal
{$MODE OBJFPC                      }

{$MODESWITCH     ADVANCEDRECORDS   }
{$MODESWITCH     TYPEHELPERS       }
{$MODESWITCH     ALLOWINLINE       }
{$MODESWITCH     RESULT            }  // is already on in objfpc mode
{$MODESWITCH     PROPERTIES        }  // is already on in objfpc mode

{$MODESWITCH     ANSISTRINGS-      }  // default is already H- in objfpc mode: the standard type is shortstring
{$MODESWITCH     AUTODEREF-        }  // is already off in objfpc mode
{$MODESWITCH     UNICODESTRINGS-   }  // is already off in objfpc mode: the standard is shortstring
{$MODESWITCH     POINTERTOPROCVAR- }  // is already off in objfpc mode

{$LONGSTRINGS    OFF               }  // is already off in objfpc mode: standard is shortstring
{$WRITEABLECONST ON                }
{$TYPEDADDRESS   ON                }
{$ALIGN          ON                }  // forces natural alignment...
Better to clean that up a bit.
« Last Edit: August 17, 2018, 09:33:57 pm by Thaddy »
Most people that want to use threading should learn to patch their jeans first: use a needle.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7355
Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #2 on: August 17, 2018, 09:26:59 pm »
Add

Code: Pascal
writeln(sizeof(messages));

Those literals are only part of the record's namespace; they are not members. Moreover, AFAIK literals are assignment compatible with ansistrings, so they have a record at a negative offset, which contains non-byte entities.

Thaddy

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #3 on: August 17, 2018, 09:40:13 pm »
I overlooked that. Indeed, they are literals, not members.
Remove the consts from the record and my advice above applies. Note, though, that shortstrings also carry a byte-sized prefix: the length byte at index 0.
For it to work properly you probably need to use fixed-length arrays of AnsiChar as record members.
And if you do use shortstrings as record members, give them the proper length as well: string[22], etc. Otherwise the compiler will reserve 256 bytes for every shortstring.
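
A minimal sketch of the size difference described above, assuming FPC in objfpc mode. The type and field names (TFixedText, TShortText, TDefaultText, data) are made up for the illustration and do not come from the original program. It compares a fixed-length array of AnsiChar, a length-limited shortstring and a default shortstring as packed record members:

```pascal
program StringMemberSizes;

{$MODE OBJFPC}

type
  // Hypothetical record types, for illustration only.
  TFixedText = packed record
    data : array[0..21] of AnsiChar;  // 22 characters, no length prefix
  end;

  TShortText = packed record
    data : string[22];                // length byte followed by 22 characters
  end;

  TDefaultText = packed record
    data : shortstring;               // length byte followed by 255 characters
  end;

begin
  // print the size of each record so the overhead can be compared
  writeln('array[0..21] of AnsiChar : ', sizeof(TFixedText));
  writeln('string[22]               : ', sizeof(TShortText));
  writeln('shortstring              : ', sizeof(TDefaultText));
end.
```

The second record should come out one byte larger than the first (the length byte), and the third should show the 256 bytes mentioned above.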
« Last Edit: August 17, 2018, 09:45:30 pm by Thaddy »

440bx

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #4 on: August 17, 2018, 10:11:47 pm »
Quote from: Thaddy
> https://www.freepascal.org/docs-html/current/prog/progsu9.html#x16-150001.2.9
>
> {$CODEALIGN VARMAX=1}{$CODEALIGN RECORDMAX=1}{$CODEALIGN LOCALMAX=1}
> Usually the compiler will pick natural alignment for records and its members. (AND you specified {$align on}, which forces that...) The above should prevent that, but in my experience not on all platforms: some CPU's rely on naturally aligned data.
> If this works for you, note that you just slowed down your code *a lot*!. (With the exception of 8 bit processors)
>
> Also note that half of the switches here are duplicates and align on can even clash with other alignment settings.

Thank you Thaddy, that was useful. No matter the settings, it still insists on 8 byte alignment, but that's an improvement. There doesn't seem to be a way to convince the compiler to align those arrays on a byte or word boundary, which is unfortunate.
There shouldn't be any slowdown, because the natural alignment of a (single byte) character array is byte alignment: the characters are accessed a byte at a time, and as long as that is the case, the alignment makes no difference. I'm not changing the alignment of any ordinal types (those would definitely be affected), or of code for that matter.

I'm not worried about telling the compiler multiple times what I want. I want to be certain that nothing resembling managed types gets into the code.

Quote from: marcov
> Add
>
> Code: Pascal
> writeln(sizeof(messages));
>
> Those literals are only part of the namespace of the record, they are not members. Moreover afaik literals are assignment compatible to ansistrings so have a record at negative offset, which contains non-byte entities.
Yes, I was aware of that, but I was throwing stuff at the compiler to see what, if anything, made a difference. I do appreciate your pointing it out, though. Thank you. Just FYI, the literals are still kept as plain literals; no size is present in the executable file (at least not on Windows). I guess the compatibility is handled at runtime through library code.

8 byte alignment is an improvement over 16 byte alignment.  That's good enough for most cases.

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #5 on: August 18, 2018, 05:10:12 am »
These are constants, so I assume {$CodeAlign CONSTMAX=1} should do it.

440bx

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #6 on: August 18, 2018, 05:42:14 am »
Quote from: engkin
> These are constants, so I assume {$CodeAlign CONSTMAX=1} should do it.
Yes, I tried that. In spite of that, it still aligns to 8 bytes. I couldn't find a way to make it align to 1, 2 or 4 bytes.

ETA
I looked at the executable file in hex after every attempt. It was always aligned on an 8 byte boundary, no matter what. With no directive, it aligns to a 16 byte boundary. Those seem to be the only two possible results.
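
One way to cross-check the hex view is to let the program report the addresses itself. This is only a sketch, assuming the same two constants as in the first post and assuming the loader maps the image at a base address that is a multiple of 16, so that the remainder reflects the alignment inside the file. The program name is made up, and whether the compiler emits the literals in declaration order is also an assumption.

```pascal
program ReportAlignment;

{$MODE OBJFPC}
{$CODEALIGN CONSTMAX=1}   // the setting under test; remove it to compare

const
  s1 = 'first  character array';
  s2 = 'second character array';

  p1 : pchar = s1;
  p2 : pchar = s2;

begin
  // address modulo 16: 0 means the literal starts on a 16 byte boundary,
  // 8 means it is only 8 byte aligned, and so on
  writeln('p1 mod 16 : ', PtrUInt(p1) mod 16);
  writeln('p2 mod 16 : ', PtrUInt(p2) mod 16);

  // distance between the two literals; if they were emitted back to back
  // without padding this would be 23 (22 characters plus the closing #0)
  writeln('p2 - p1   : ', PtrInt(p2) - PtrInt(p1));
end.
```

Compiling it once with and once without the directive should make the difference between the two settings visible without opening the .exe in a hex utility.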
« Last Edit: August 18, 2018, 05:45:21 am by 440bx »

engkin

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #7 on: August 18, 2018, 05:48:52 am »
Quote from: 440bx
> Quote from: engkin
> > These are constants, so I assume {$CodeAlign CONSTMAX=1} should do it.
> Yes, I tried that.  In spite of that, it still aligns to 8 bytes.   I couldn't find a way to make it align to 1, 2 or 4 bytes.
I just tried it on Win32.

Without CONSTMAX=1:
Code: ASM
.section .data.n__$CHARACTERARRAYSALIGNMENT$_Ld1,"d"
        .balign 16
.globl  _$CHARACTERARRAYSALIGNMENT$_Ld1
_$CHARACTERARRAYSALIGNMENT$_Ld1:
# [73] p1             : pchar = s1;
        .ascii  "first  character array\000"

With CONSTMAX=1:
Code: ASM
.section .data.n__$CHARACTERARRAYSALIGNMENT$_Ld1,"d"
.globl  _$CHARACTERARRAYSALIGNMENT$_Ld1
_$CHARACTERARRAYSALIGNMENT$_Ld1:
# [73] p1             : pchar = s1;
        .ascii  "first  character array\000"

So possibly a 32-bit vs. 64-bit difference.

Edit:
I think you are right, but on the 32-bit version it is 4 instead of 8.
« Last Edit: August 18, 2018, 05:58:40 am by engkin »

440bx

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #8 on: August 18, 2018, 06:04:28 am »
Quote from: engkin
> Edit:
> I think you are right, but on 32-bit version it is 4 instead of 8.
I didn't try it on 32 bit. I focused on 64 bit because that's what I'm using. At least it can be made to align on an 8 byte boundary; that will have to do.

Thank you for checking it out.

Thaddy

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #9 on: August 18, 2018, 09:04:30 am »
We overlooked that the memory manager also comes into play: I suspect the memory manager aligns all allocations to at least natural alignment.
That would explain what we observe: the start of the record is, at a minimum, naturally aligned (2/4/8 for 16/32/64-bit CPUs). Its content can be byte aligned, though.
So in theory you could write a byte-aligning memory manager  8-). I can confirm that fixed-length array of char record members have no overhead and *are* byte aligned.
I would have a look at what the cmem C memory manager does.

LemonParty

  • New Member
  • *
  • Posts: 28
Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #10 on: August 18, 2018, 05:25:26 pm »
Put your data into a packed record:
Code: Pascal
type
 TRec = packed record
  ...
 end;

A packed record ignores alignment (no additional switches required).
Then use a constant record.
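
A minimal sketch of that suggestion, assuming FPC in objfpc mode. The record layout and the names (TTexts, texts, t1 to t3) are invented for the example and are not from the original program. The text lives in fixed-length character arrays inside a packed record, and a typed constant of that record type holds the data:

```pascal
program PackedConstRecord;

{$MODE OBJFPC}

type
  // Hypothetical layout: each field holds exactly 18 characters.
  // There is no length byte and no closing #0, so these are not pchars.
  TTexts = packed record
    t1 : array[0..17] of AnsiChar;
    t2 : array[0..17] of AnsiChar;
    t3 : array[0..17] of AnsiChar;
  end;

const
  // typed constant record; every literal is exactly 18 characters long
  texts : TTexts = (
    t1 : 'text in a record 1';
    t2 : 'text in a record 2';
    t3 : 'text in a record 3'
  );

begin
  writeln('sizeof(TTexts) : ', sizeof(TTexts));

  // field offsets relative to the start of the record
  writeln('offset of t2   : ', PtrInt(@texts.t2) - PtrInt(@texts));
  writeln('offset of t3   : ', PtrInt(@texts.t3) - PtrInt(@texts));

  // a character array can be written directly
  writeln(texts.t1);
end.
```

If the fields are packed as expected, the offsets are multiples of 18 with no padding between the texts. The start of the record as a whole may still be placed on a larger boundary by the compiler.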

soerensen3

  • Full Member
  • ***
  • Posts: 162
Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #11 on: August 20, 2018, 11:48:02 pm »
Just for the record: you are piping the console output to a file and looking at it with a hex editor, right? Are you sure that WriteLn outputs your data as is, or could it be doing a conversion? It would IMO be better to write the record to a file stream to test that.
Lazarus 1.9 with FPC 3.0.4
Target: Manjaro Linux 64 Bit (4.9.68-1-MANJARO)

440bx

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #12 on: August 21, 2018, 12:07:08 am »
Quote from: soerensen3
> Just for the record, you are piping the console output to a file and look at it with a hex editor. Right? Are you sure that WriteLn just outputs your data as is or maybe it is doing a conversion? It would i.m.o. be better to write the record to a file stream to test that.
I'm not sure if you were referring to what I'm doing but, just in case: I am not piping console output. I let the compiler generate the .exe file and I look at the resulting .exe in hex using a hex utility.

soerensen3

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #13 on: August 21, 2018, 04:15:24 pm »
Quote from: 440bx
> Quote from: soerensen3
> > Just for the record, you are piping the console output to a file and look at it with a hex editor. Right? Are you sure that WriteLn just outputs your data as is or maybe it is doing a conversion? It would i.m.o. be better to write the record to a file stream to test that.
> I'm not sure if you were referring to what I'm doing but, just in case, I am not piping console output.  I let the compiler generate the .exe file and I look at the resulting .exe in hex using a hex utility.

OK, it was an assumption because of the WriteLn calls in your code. But those are probably just there to prevent the compiler from removing the records from your code.

440bx

Re: can the compiler be told to align character arrays on a byte boundary ?
« Reply #14 on: August 21, 2018, 04:54:37 pm »
Quote from: soerensen3
> But that's probably just to prevent the compiler from removing the records from your code.
You got it.