Recent

Author Topic: Pointer math and (un)packed arrays  (Read 1435 times)

Zoran

  • Hero Member
  • *****
  • Posts: 1464
    • http://wiki.lazarus.freepascal.org/User:Zoran
Pointer math and (un)packed arrays
« on: July 04, 2019, 11:29:55 pm »
There are many articles on internet about using pointer math with arrays (mostly for Delphi, but it should be same with FPC).
However, they just don't mention anything about whether these arrays have to be declared with packed keyword. It seems to me like a natural precondition for pointer math, but I might be wrong.

Are dynamic arrays always packed actually? I can't find it documented anywhere, but pointer math examples lead to that conclusion.
When exactly can we safely use pointer math on arrays?

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7507
Re: Pointer math and (un)packed arrays
« Reply #1 on: July 04, 2019, 11:53:18 pm »
Afaik on anything x86: yes, but that has only penalties, no hard exceptions for misaligned access. (well, except maybe for SSE2 but not SSE3 systems, but we have no SSE vector support up to that level)

The interesting case would be e.g. a record of a dword and a byte. So the record has 5 bytes, with both fields naturally aligned.

But the array would not, since the second record's dword would be at offset 5 in the array and thus misaligned.

440bx

  • Hero Member
  • *****
  • Posts: 1201
Re: Pointer math and (un)packed arrays
« Reply #2 on: July 05, 2019, 01:11:11 am »
they just don't mention anything about whether these arrays have to be declared with packed keyword. It seems to me like a natural precondition for pointer math, but I might be wrong.
They don't have to be packed.  The compiler doesn't care as long as it _knows_ what the packing state is.  The packing, as Marco, pointed out can result in a performance penalty when there are many elements/fields starting at an address that is not a multiple of their size but, the compiler can calculate the correct address either way.

When exactly can we safely use pointer math on arrays?
As long as you don't "lie" to the compiler, addressing through pointers or indexing should _functionally_ be the same.  IOW, the following should be an identity maintained by the compiler : ArrayBase[Index] = (ArrayBase + Index)^.  In both cases, the compiler knows it's dealing with an array and as long as it knows that, the generated code should be correct.

The fast way of getting in trouble is to declare some records (say 10 for example's sake) and then _overlay_ an array of records of that type over them.  In many cases, the alignment will be wrong because the compiler had no idea the 10 record declarations were to be part of an array.

There are ways to work around that, (I don't know how to do it in Pascal/FPC) but, in C/C++ you can tell the compiler the alignment for each individual record then you can declare a pointer to the first record/structure including its alignment.  Once the compiler knows the alignment and the alignment of the individual elements matches the alignment of the structure pointer, you go to town.

Here is an example in C,
Code: C  [Select]
  1. typedef struct TFORMATTED_OPTIONAL_HEADER
  2. {
  3. /* 00 */
  4.   MS_ALIGN
  5.   DWORD32      _00_OffsetMagic                 GCC_ALIGN = 0;
  6.   FMT_HEX      _00_OffsetMagicHex              ;
  7.   DWORD64      _00_ValueMagic                  = 0;
  8.   FMT_HEX      _00_ValueMagicHex               ;
  9.   FMT_DEC      _00_ValueMagicDec               ;
  10.   const WCHAR* _00_LabelMagic                  = OptionalHeaderFieldMeanings[0];
  11.   TDATA_TYPE   _00_Type32Magic                 = dt_word;
  12.   TDATA_TYPE   _00_Type64Magic                 = dt_word;
  13.   DWORD32      _00_DecimalMagic                = FALSE;
  14.  
  15. /* 01 */
  16.   MS_ALIGN
  17.   DWORD32      _01_OffsetMajorLinkerVersion
  18.                                                GCC_ALIGN   = 0;
  19.  
  20.   FMT_HEX      _01_OffsetMajorLinkerVersionHex ;
  21.   DWORD64      _01_ValueMajorLinkerVersion     = 0;
  22.   FMT_HEX      _01_ValueMajorLinkerVersionHex  ;
  23.   FMT_DEC      _01_ValueMajorLinkerVersionDec  ;
  24.   const WCHAR* _01_LabelMajorLinkerVersion     = OptionalHeaderFieldMeanings[1];
  25.   TDATA_TYPE   _01_Type32MajorLinkerVersion    = dt_byte;
  26.   TDATA_TYPE   _01_Type64MajorLinkerVersion    = dt_byte;
  27.   DWORD32      _01_DecimalMajorLinkerVersion   = TRUE;
  28. ...
  29. ...
  30. a bunch of definitions like the above
  31. ...
  32. ...
  33.  
  34. /* 29 */
  35.   MS_ALIGN
  36.   DWORD32      _29_OffsetNumberOfRvaAndSizes
  37.                                                 GCC_ALIGN = 0;
  38.  
  39.   FMT_HEX      _29_OffsetNumberOfRvaAndSizesHex ;
  40.   DWORD64      _29_ValueNumberOfRvaAndSizes     = 0;
  41.   FMT_HEX      _29_ValueNumberOfRvaAndSizesHex  ;
  42.   FMT_DEC      _29_ValueNumberOfRvaAndSizesDec  ;
  43.   const WCHAR* _29_LabelNumberOfRvaAndSizes   = OptionalHeaderFieldMeanings[29];
  44.   TDATA_TYPE   _29_Type32NumberOfRvaAndSizes    = dt_dword;
  45.   TDATA_TYPE   _29_Type64NumberOfRvaAndSizes    = dt_dword;
  46.   DWORD32      _29_DecimalNumberOfRvaAndSizes   = TRUE;
  47. } FORMATTED_OPTIONAL_HEADER, *PFORMATTED_OPTIONAL_HEADER;
  48.  

The MS_ALIGN and GCC_ALIGN are macros for MSVC and GCC to inform them of the alignment of every structure/record. 

Then the above structs/records that were individually defined can be accessed as an array by defining the following

// NOTE: the layout of each entry in the above structure MUST match the layout
//       of the entry structure below.

Code: C  [Select]
  1. typedef struct TOPTIONAL_HEADER_ENTRY
  2. {
  3.   MS_ALIGN
  4.   DWORD32      Offset                           GCC_ALIGN;
  5.   FMT_HEX      OffsetHex;
  6.   DWORD64      Value;
  7.   FMT_HEX      ValueHex;
  8.   FMT_DEC      ValueDec;
  9.   const WCHAR* Label;
  10.   TDATA_TYPE   Type32;
  11.   TDATA_TYPE   Type64;
  12.   DWORD32      DecimalOutput;
  13. } OPTIONAL_HEADER_ENTRY, *POPTIONAL_HEADER_ENTRY;
  14.  
That last definition is used to access the individually declared structs as if they were an ARRAY of OPTIONAL_HEADER_ENTRY and it works because the alignments match.

In Pascal/FPC instead of using a dirty trick like the above, you're better off having an array of initialized variables which, you can access with an index or a pointer since the compiler didn't get "lied" to.

HTH.



using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 1411
Re: Pointer math and (un)packed arrays
« Reply #3 on: July 05, 2019, 03:58:40 am »
Are dynamic arrays always packed actually? I can't find it documented anywhere, but pointer math examples lead to that conclusion.
When exactly can we safely use pointer math on arrays?
I haven't seen any documentation either, but since ancient times all arrays (dynamic and static) have always been packed (not to be confused with bitpacked). At the moment this is true:
Code: Pascal  [Select]
  1. program Test;
  2. {$APPTYPE CONSOLE}
  3. {$MODE OBJFPC}
  4. {$ALIGN 8}
  5.  
  6. type
  7.   TRec = packed record
  8.     W: Word;
  9.     B: Byte;
  10.   end;
  11.   TRecArrayStatic = array[1..20] of TRec;
  12.   TRecArrayDynamic = array of TRec;
  13.  
  14.   TDynArrayStruct = packed record
  15.     RefCount: PtrInt;
  16.     High: SizeInt;
  17.   end;
  18.  
  19. var
  20.   A: TRecArrayDynamic;
  21. begin
  22.   SetLength(A, Length(TRecArrayStatic));
  23.   Writeln('SizeOf(TRec)=', SizeOf(TRec)); // =3
  24.   Writeln('SizeOf(TRecArrayStatic)=', SizeOf(TRecArrayStatic)); // =60
  25.   // Do not forget, the memory allocation is also aligned
  26.   Writeln('SizeOf(TRecArrayDynamic)=', MemSize(Pointer(A) - SizeOf(TDynArrayStruct)) - SizeOf(TDynArrayStruct)); // =68 on x32
  27.   Readln;
  28. end.
« Last Edit: July 05, 2019, 04:01:44 am by ASerge »


Zoran

  • Hero Member
  • *****
  • Posts: 1464
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Pointer math and (un)packed arrays
« Reply #5 on: July 05, 2019, 03:05:48 pm »

440bx

  • Hero Member
  • *****
  • Posts: 1201
Re: Pointer math and (un)packed arrays
« Reply #6 on: July 05, 2019, 07:46:29 pm »
Yes, Sergej, from that, we should really conclude that arrays are always packed, but the existance of "packed" keyword in array declaration syntax and all this about packing and unpacking arrays makes this very confusing.
The usage of "packed" with arrays is mostly legacy.  It was mostly used to tell the compiler the format of some data it read from, usually, a file.

Here is what Jensen and Wirth's Pascal report says, in one place, about "packed":
Quote
B.2  Representation  of files

In  the  case  of  external  files  it  is  important  to  know  the
representation  of  files  chosen  by the  PASCAL  compiler.  Every
component  of  a  PASCAL-6000  file  occupies  an  integral  number  of
60-bit  words,  with  the  exception  of  files  with  component  type
char  (textfiles).  In this  case  PASCAL  files  use the  "standard"
representation  imDosed  by  CDC's  text  file  conventions:  10
characters  are  packed  into  each  word,  implying  that  the
procedures  put  and  get  include  packing  and  unpacking  operations
when  applied  to  textfiles.  The  end  of  a  line  is  represented  by
at  least  12,  right-adjusted  zero-bits  in  a  word.  Files
originating  from  card  decks follow  the  same  general  textfile
conventions.  Note  that  the  operating  system  removes  most  (but
not  necessary  all)  trailing  blanks  when  reading  cards,  Hence,
such  files  do  not  necessarily  consist  of  80-character  "card
images"
It used to be that a single character wasn't always represented by an 8 bit byte.  Quite often characters were "packed" together.  It was necessary to let the compiler know of this which was done through the use of "packed" in front of a character array.  The original Pascal included facilities to "pack" and "unpack" to take data from a hard-to-use packed state to a simpler/easier "unpacked" state. (a bit like IBM's packed and unpacked BCD)

Today, it's rare to see a Pascal compiler where "packed"/"unpacked" have an effect on array definitions.  These days, what the compiler really cares about is whether the array elements (records) are packed or not.  That will determine the size of the element which the compiler needs to calculate the address of each element.

I have attached a very simple little program that shows that "packed"/"unpacked" has no effect on an array definition (at least not in FPC/Delphi) but has a significant effect on the elements of the array (when they are records.)

Hopefully that takes care of the confusing part of "packed"/"unpacked" when applied to arrays.

HTH.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

Zoran

  • Hero Member
  • *****
  • Posts: 1464
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Pointer math and (un)packed arrays
« Reply #7 on: July 05, 2019, 09:57:14 pm »
Thank you, 440bx,
As expected from all said in this topic, your program shows that "packed" keyword in array declaration changes nothing.
Of course, the size of whole array changes with the size of its elements, and these, being records, do change its size when they are packed.

What you say in your post makes perfect sense.

When it is so, I still do find it strange that, while arrays have ignored "packed" keyword for decades already, neither Borland, nor Embarcadero, or FPC team don't say it explicitely in the documentation. They do not deprecate its usage, don't talk about it as legacy, but the docs still talk about packing arrays the way as if it really makes some difference.

440bx

  • Hero Member
  • *****
  • Posts: 1201
Re: Pointer math and (un)packed arrays
« Reply #8 on: July 05, 2019, 10:24:49 pm »
Thank you, 440bx,
...
I still do find it strange that, while arrays have ignored "packed" keyword for decades already, neither Borland, nor Embarcadero, or FPC team don't say it explicitely in the documentation. They do not deprecate its usage, don't talk about it as legacy, but the docs still talk about packing arrays the way as if it really makes some difference.
You're welcome and I completely agree about the general "silence" regarding the use of "packed" with arrays.  I've always wondered if there was some case I had missed with Delphi (and now FPC) where specifying "packed array" made a difference, I am yet to find a case where it does.

I worked with a Pascal compiler where "packed" made a difference, for instance, a packed array[0..31] of Boolean would take significantly less space (usually in the ballpark of 1/8th depending on the number of bits) than the same array unpacked - it bitpacked the Boolean elements.  For curiosity, I just tried it with FPC, and it totally ignored "packed" in that case too.

It might be worth suggesting improving the documentation regarding the use and the effect (or lack thereof) of "packed" with arrays.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

PascalDragon

  • Hero Member
  • *****
  • Posts: 674
  • Compiler Developer
Re: Pointer math and (un)packed arrays
« Reply #9 on: July 06, 2019, 10:39:38 am »
FPC by default processes packed in a TP/Delphi compatible way, meaning that for arrays this is ignored. To get the behavior you described you can either use bitpacked or {$BitPacking On} (this will change the behavior of packed on array types to that of bitpacked).

This example

Code: Pascal  [Select]
  1. type
  2.   TArray1 = packed array[0..31] of Boolean;
  3.   TArray2 = bitpacked array[0..31] of Boolean;
  4.   {$BitPacking On}
  5.   TArray3 = packed array[0..31] of Boolean;
  6.  
  7. begin
  8.   Writeln('Size of TArray1: ', SizeOf(TArray1));
  9.   Writeln('Size of TArray2: ', SizeOf(TArray2));
  10.   Writeln('Size of TArray3: ', SizeOf(TArray3));
  11. end.

Will print:

Code: [Select]
Size of TArray1: 32
Size of TArray2: 4
Size of TArray3: 4

Zoran

  • Hero Member
  • *****
  • Posts: 1464
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Pointer math and (un)packed arrays
« Reply #10 on: July 06, 2019, 02:59:24 pm »
Thank you, Sven.

Bitpacking obviously works with arrays, same way as with records, which is clearly documented.
The point is -- arrays are always packed, at least on byte boundaries. That is clearly different than records.

I expected that answer, but I could not find it explicitly stated anywhere in documentation (not by Borland, Embarcadero or FPC).
Therefore I asked for straight answer.

I worked with a Pascal compiler where "packed" made a difference, for instance, a packed array[0..31] of Boolean would take significantly less space

So, you only saw further packing possibility. That is behaviour you get in FPC with {$bitpacking on}.
But still, unlike records, "unpacked" arrays are actually packed, at least on byte boundaries.

440bx

  • Hero Member
  • *****
  • Posts: 1201
Re: Pointer math and (un)packed arrays
« Reply #11 on: July 06, 2019, 09:03:57 pm »
Thank you, Sven.

Bitpacking obviously works with arrays, same way as with records, which is clearly documented.
The point is -- arrays are always packed, at least on byte boundaries. That is clearly different than records.

I expected that answer, but I could not find it explicitly stated anywhere in documentation (not by Borland, Embarcadero or FPC).
Therefore I asked for straight answer.

I worked with a Pascal compiler where "packed" made a difference, for instance, a packed array[0..31] of Boolean would take significantly less space

So, you only saw further packing possibility. That is behaviour you get in FPC with {$bitpacking on}.
But still, unlike records, "unpacked" arrays are actually packed, at least on byte boundaries.
I hope I can explain this in a way that is understandable.  It is basically incorrect to think of the array itself as being packed or unpacked since an array will always be a collection of elements one after the other without gaps between them (which, because of that, you are thinking it's always packed.)   The modifier "packed" doesn't change that, what it changes (if applicable) is how the _elements_ of the array will be stored (but, they will still be stored one after the other without gaps, that won't change.) In the case of Boolean, packed changes how the _Booleans_ will be stored/represented not how they will be arranged in the array regardless of how they are represented.

IOW, it's not the array that is "packed", what's packed are the elements of the array (if and when applicable.)

For instance, in "packed array[0..n] of qword", the "packed" is superfluous since "qword" can't be packed, whereas, it is applicable in "packed array[0..n] of Boolean" since "Boolean" can be packed into a single bit.

HTH.

using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 1411
Re: Pointer math and (un)packed arrays
« Reply #12 on: July 07, 2019, 01:02:41 am »
...whereas, it is applicable in "packed array[0..n] of Boolean" since "Boolean" can be packed into a single bit.
Don't confused with "bitpacked". "packet" - this is the absence of gaps between the elements (byte alignment). The elements themselves can be "packet" or not, this has nothing to do with the array modifier. Therefore, for arrays, specifying "packed" is syntactic sugar.

440bx

  • Hero Member
  • *****
  • Posts: 1201
Re: Pointer math and (un)packed arrays
« Reply #13 on: July 07, 2019, 01:42:11 am »
Don't confused with "bitpacked". "packet" - this is the absence of gaps between the elements (byte alignment). The elements themselves can be "packet" or not, this has nothing to do with the array modifier. Therefore, for arrays, specifying "packed" is syntactic sugar.
"bitpacked" is an FPC extension.  The "normal" way in Pascal to have a bitpacked array of Boolean is to specify "packed array[low..high] of Boolean".  The "packed" modifier isn't applied to the array, it's applied to the array elements.

The example that @PascalDragon gave shows it very clearly
Code: Pascal  [Select]
  1. type
  2.   TArray1 = packed array[0..31] of Boolean;
  3.   TArray2 = bitpacked array[0..31] of Boolean;
  4.   {$BitPacking On}
  5.   TArray3 = packed array[0..31] of Boolean;
  6.  
  7. begin
  8.   Writeln('Size of TArray1: ', SizeOf(TArray1));
  9.   Writeln('Size of TArray2: ', SizeOf(TArray2));
  10.   Writeln('Size of TArray3: ', SizeOf(TArray3));
  11. end.
when "{$BitPacking On}" is enabled then the "packed" modifier works as it was intended in "pure" Pascal.  It acts on the elements of the array not on the way array elements are arranged in memory (which is always consecutive, packed or not.)


 
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 1411
Re: Pointer math and (un)packed arrays
« Reply #14 on: July 08, 2019, 04:15:52 pm »
"bitpacked" is an FPC extension.  The "normal" way in Pascal to have a bitpacked array of Boolean is to specify "packed array[low..high] of Boolean".  The "packed" modifier isn't applied to the array, it's applied to the array elements.
I don't even know where you got that from. There is no such.
From FPC documentation (3.3 Structured Types): The byte packing mechanism is simple: the compiler aligns each element of the structure on the first available byte boundary. A structure element, but not elements of its elements. Array is structure type.
Another example, from the documentation of the ISO 10206:1990 (6.4.3.1 General): The designation of a structured-type as packed shall affect the representation in data-storage of that structured-type only; i.e., if a component is itself structured, the component's representation in data-storage shall be packed only if the type of the component is designated packed.
And as proof you began to give examples from bitpacket mode, which I immediately separated.