Recent

Author Topic: What is more (memory) efficient  (Read 4760 times)

JdeHaan

  • Full Member
  • ***
  • Posts: 171
What is more (memory) efficient
« on: February 05, 2025, 01:40:47 pm »
Using this example:

Code: Pascal  [Select][+][-]
  1.   TValueType = (vtBoolean, vtInteger, vtReal);
  2.   TValue =  record
  3.     case Typ: TValueType of
  4.       vtBoolean:  (BoolVal: Boolean);
  5.       vtInteger:  (IntVal: Int64);
  6.       vtReal:     (RealVal: Double);
  7.   end;
  8.  
  9.   TValueArray = array [1..10] of TValue;
  10.  
The size of the record is 16 bytes, and the array occupies 160 bytes.

But if I use 'packed', like so:

Code: Pascal  [Select][+][-]
  1.   TValue = packed record
  2.   etc
  3.  
Then the record size is 9 bytes and the array is 90 bytes.

But the question is, what is more efficient in a 64 bit CPU, considering aligning and processing speed? How does the FPC compiler handle this?

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 410
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #1 on: February 05, 2025, 01:56:34 pm »
Strange, on my machine it's 12 bytes and 120 bytes
FPC[main] Win x64

And I have a question - where did 4 of those 12 bytes go? %)  I've never used union tightly, and I always thought that its size is equal to the largest in union

In D12CE it's 9 and 90.

UPD: Yep, after I switched to {$mode delphi} - I got 9 and 90 too
but why 9 ? one byte occupies a type ?
UPD2: Yep, that's exactly what it is.
« Last Edit: February 05, 2025, 02:02:45 pm by ALLIGATOR »
I may seem rude - please don't take it personally

alpine

  • Hero Member
  • *****
  • Posts: 1412
Re: What is more (memory) efficient
« Reply #2 on: February 05, 2025, 02:00:45 pm »
I've never used union tightly, and I always thought that its size is equal to the largest in union

+ For the Typ field.
« Last Edit: February 05, 2025, 02:02:34 pm by alpine »
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

JdeHaan

  • Full Member
  • ***
  • Posts: 171
Re: What is more (memory) efficient
« Reply #3 on: February 05, 2025, 02:03:02 pm »
Indeed strange.
I think it's automatically padding the typ field to 8 bytes.

Forgot to mention I'm using MacOS Sequoia 15.3 on an Intel Macbook


Lazarus 3.99 (rev main_3_99-2668-g6b352d830e) FPC 3.3.1 x86_64-darwin-cocoa

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12291
  • Debugger - SynEdit - and more
    • wiki
Re: What is more (memory) efficient
« Reply #4 on: February 05, 2025, 02:09:30 pm »
But the question is, what is more efficient in a 64 bit CPU, considering aligning and processing speed? How does the FPC compiler handle this?

I don't know the answer...

But, when you say "64 bit CPU" => do you mean any 64 bit CPU? I figure that includes Apple M CPUs, and other RISC... Or do you mean strictly Intel/AMD x86 CPUs?

From what I recall to have heard (and really not sure) RISC CPU may either be slow to access badly aligned data, or even throw an exception and not read the data at all.

Strictly INTEL/AMD, I don't know (and therefore also don't know if there would be a different by more recent vs older). I would suspect that maybe the double value could be affected. But again, I don't know.


How badly do you need to save that one byte? (Understanding that your real array is probably a lot bigger, but still...)
It isn't like you get much more data into a cache line, so you don't really gain speed by reducing cache misses.

440bx

  • Hero Member
  • *****
  • Posts: 6329
Re: What is more (memory) efficient
« Reply #5 on: February 05, 2025, 02:16:11 pm »
But the question is, what is more efficient in a 64 bit CPU, considering aligning and processing speed?
For speed, the unpacked record will be aligned therefore it will save the CPU having to align the data.

How does the FPC compiler handle this?
Depends, if the record is packed then the compiler "squeezes" all the fields so there are no unused bytes between them.  If it's not packed then the int64 and double cause 7 filler bytes to align those fields.

As far as "efficient", speed-wise the unpacked (i.e., aligned) record is more efficient but storage-wise it is not efficient. It's the exact opposite for the packed record (sort of "the other side of the coin".)

I don't know the details of the ARM architecture but, the vast majority of CPUs "prefer" the data types to be aligned on their natural boundaries.

FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Warfley

  • Hero Member
  • *****
  • Posts: 2049
Re: What is more (memory) efficient
« Reply #6 on: February 05, 2025, 02:16:36 pm »
The Compiler aligns memory to be most efficient for processing. If you use packed, packrecords or alignment directives you overwrite the behavior to have a specific memory layout, but potentially at the cost of execution speed.

Generally speaking compilers usually prefer execution speed over memory consumption because memory is cheap (32 gigs of ram cost like 50€) while processing power is expensive (single core processing speed hasn't increased in decades).

That said, it always depends on the specific code. If there is really a performance difference depends on the exact data and access pattern and so on. What I have outlined above is just the general trend and intention behind alignment and packed

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 410
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #7 on: February 05, 2025, 02:43:58 pm »
my bench:

Code: Pascal  [Select][+][-]
  1. program test;
  2. {$mode delphi}
  3. {$optimization on}
  4.  
  5. uses SysUtils;
  6.  
  7. type
  8.   TValueType = (vtBoolean, vtInteger, vtReal);
  9.   PValueNotPacked = ^TValueNotPacked;
  10.   TValueNotPacked = record
  11.     case Typ: TValueType of
  12.       vtBoolean:  (BoolVal: Boolean);
  13.       vtInteger:  (IntVal: Int64);
  14.       vtReal:     (RealVal: Double);
  15.   end;
  16.   PValuePacked = ^TValuePacked;
  17.   TValuePacked = packed record
  18.     case Typ: TValueType of
  19.       vtBoolean:  (BoolVal: Boolean);
  20.       vtInteger:  (IntVal: Int64);
  21.       vtReal:     (RealVal: Double);
  22.   end;
  23.  
  24. const
  25.   size = 512*1024*1024;
  26.  
  27. procedure bench_notpacked1;
  28. var
  29.   arr_notpacked: array of TValueNotPacked;
  30.   i, s: NativeInt;
  31.   t: TDateTime;
  32. begin
  33.   Write(SizeOf(TValueNotPacked), ', ');
  34.   SetLength(arr_notpacked, size);
  35.   t:=Now;
  36.   begin
  37.     for i:=0 to size-1 do
  38.       s+=arr_notpacked[i].IntVal;
  39.   end;
  40.   WriteLn('Not packed 1: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  41.   SetLength(arr_notpacked, 0);
  42. end;
  43.  
  44. procedure bench_notpacked2;
  45. var
  46.   arr_notpacked: array of TValueNotPacked;
  47.   s: NativeInt;
  48.   t: TDateTime;
  49.   p, pend: PValueNotPacked;
  50. begin
  51.   Write(SizeOf(TValueNotPacked), ', ');
  52.   SetLength(arr_notpacked, size);
  53.   t:=Now;
  54.   begin
  55.     p:=@arr_notpacked[0];
  56.     pend:=@arr_notpacked[size-1];
  57.     while p<=pend do
  58.     begin
  59.       s+=p.IntVal;
  60.       inc(p);
  61.     end;
  62.   end;
  63.   WriteLn('Not packed 2: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  64.   SetLength(arr_notpacked, 0);
  65. end;
  66.  
  67. procedure bench_packed1;
  68. var
  69.   arr_packed: array of TValuePacked;
  70.   i, s: NativeInt;
  71.   t: TDateTime;
  72. begin
  73.   Write(SizeOf(TValuePacked), ', ');
  74.   SetLength(arr_packed, size);
  75.   t:=Now;
  76.   begin
  77.     for i:=0 to size-1 do
  78.       s+=arr_packed[i].IntVal;
  79.   end;
  80.   WriteLn('Packed 1: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  81.   SetLength(arr_packed, 0);
  82. end;
  83.  
  84. procedure bench_packed2;
  85. var
  86.   arr_packed: array of TValuePacked;
  87.   s: NativeInt;
  88.   t: TDateTime;
  89.   p, pend: PValuePacked;
  90. begin
  91.   Write(SizeOf(TValuePacked), ', ');
  92.   SetLength(arr_packed, size);
  93.   t:=Now;
  94.   p:=@arr_packed[0];
  95.   pend:=@arr_packed[size-1];
  96.   while p<=pend do
  97.   begin
  98.     s+=p.IntVal;
  99.     inc(p);
  100.   end;
  101.   WriteLn('Packed 2: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  102.   SetLength(arr_packed, 0);
  103. end;
  104.  
  105. begin
  106.   bench_notpacked1;
  107.   bench_notpacked2;
  108.   bench_packed1;
  109.   bench_packed2;
  110.   ReadLn;
  111. end.
  112.  

i7-8750H laptop, DDR4

Code: Pascal  [Select][+][-]
  1. 16, Not packed 1: 558 ms.
  2. 16, Not packed 2: 467 ms.
  3. 9, Packed 1: 425 ms.
  4. 9, Packed 2: 310 ms.
  5.  
« Last Edit: February 05, 2025, 03:31:51 pm by ALLIGATOR »
I may seem rude - please don't take it personally

JdeHaan

  • Full Member
  • ***
  • Posts: 171
Re: What is more (memory) efficient
« Reply #8 on: February 05, 2025, 03:29:30 pm »
On my MacBookPro from 2019 (2,6 GHz 6-Core Intel Core i7; 16 GB 2667 MHz DDR4):

16, Not packed 1: 2003 ms.
16, Not packed 2: 609 ms.
9, Packed 1: 461 ms.
9, Packed 2: 373 ms.


So, conclusion is to use packed and pointer arithmetic?

Note, after the 3rd run the first Not Packed drops to around 700ms - probably due to caching?
« Last Edit: February 05, 2025, 03:37:26 pm by JdeHaan »

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 410
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #9 on: February 05, 2025, 03:37:56 pm »
because memory is cheap (32 gigs of ram cost like 50€)

cheap per gigabyte, but expensive for access time - so all algorithms for processing large amounts of data should primarily look at ways to optimize the location of data in memory, e.g. AoS and SoA
But it can also be the case that the computation is more expensive (time consuming) per “byte” than the reading of that “byte” from external memory itself
I may seem rude - please don't take it personally

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 410
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #10 on: February 05, 2025, 03:41:04 pm »
So, conclusion is to use packed and pointer arithmetic?

The best answer is to test on your data, on your algorithm, and on all possible hardware where your algorithm will run...
Even on x86_64 the performance of the same code may vary from generation to generation
Use SIMD and proven algorithms that big minds have already worked on
I may seem rude - please don't take it personally

Warfley

  • Hero Member
  • *****
  • Posts: 2049
Re: What is more (memory) efficient
« Reply #11 on: February 05, 2025, 03:47:26 pm »
cheap per gigabyte, but expensive for access time - so all algorithms for processing large amounts of data should primarily look at ways to optimize the location of data in memory, e.g. AoS and SoA
But it can also be the case that the computation is more expensive (time consuming) per “byte” than the reading of that “byte” from external memory itself

As I said, depends on the access pattern. In your benchmark the most obvious optimization would be to use an array of Integer and ditch records completely, because you only ever access the int value field. But I'd argue that this hardly represents real world applications

Record alignment is best for random access to fields, which is a pretty good assumption in most high level code, even tho not always true
« Last Edit: February 05, 2025, 03:50:27 pm by Warfley »

BrunoK

  • Hero Member
  • *****
  • Posts: 766
  • Retired programmer
Re: What is more (memory) efficient
« Reply #12 on: February 05, 2025, 04:03:05 pm »
But the question is, what is more efficient in a 64 bit CPU, considering aligning and processing speed?
For speed, the unpacked record will be aligned therefore it will save the CPU having to align the data.

How does the FPC compiler handle this?
Depends, if the record is packed then the compiler "squeezes" all the fields so there are no unused bytes between them.  If it's not packed then the int64 and double cause 7 filler bytes to align those fields.

As far as "efficient", speed-wise the unpacked (i.e., aligned) record is more efficient but storage-wise it is not efficient. It's the exact opposite for the packed record (sort of "the other side of the coin".)

I don't know the details of the ARM architecture but, the vast majority of CPUs "prefer" the data types to be aligned on their natural boundaries.
All the above affirmations do not match the results of ALLIGATOR's benchmark on my W10 Intel system.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 410
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #13 on: February 05, 2025, 04:18:50 pm »
Record alignment is best for random access to fields, which is a pretty good assumption in most high level code, even tho not always true

Ok, now with random access  :P

Code: Pascal  [Select][+][-]
  1. program test;
  2. {$mode delphi}
  3. {$optimization on}
  4.  
  5. uses SysUtils;
  6.  
  7. type
  8.   TValueType = (vtBoolean, vtInteger, vtReal);
  9.   PValueNotPacked = ^TValueNotPacked;
  10.   TValueNotPacked = record
  11.     case Typ: TValueType of
  12.       vtBoolean:  (BoolVal: Boolean);
  13.       vtInteger:  (IntVal: Int64);
  14.       vtReal:     (RealVal: Double);
  15.   end;
  16.   PValuePacked = ^TValuePacked;
  17.   TValuePacked = packed record
  18.     case Typ: TValueType of
  19.       vtBoolean:  (BoolVal: Boolean);
  20.       vtInteger:  (IntVal: Int64);
  21.       vtReal:     (RealVal: Double);
  22.   end;
  23.  
  24. const
  25.   size = 128*1024*1024;
  26.  
  27. function next(x: NativeUInt): NativeUInt; inline;
  28. begin
  29.   Result := (x * 1103515245 + 12345) mod size;
  30. end;
  31.  
  32.  
  33. procedure bench_notpacked1;
  34. var
  35.   arr_notpacked: array of TValueNotPacked;
  36.   i, s: NativeInt;
  37.   t: TDateTime;
  38. begin
  39.   Write(SizeOf(TValueNotPacked), ', ');
  40.   SetLength(arr_notpacked, size);
  41.   t:=Now;
  42.   begin
  43.     for i:=0 to size-1 do
  44.       s+=arr_notpacked[next(i)].IntVal;
  45.   end;
  46.   WriteLn('Not packed 1: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  47.   SetLength(arr_notpacked, 0);
  48. end;
  49.  
  50. procedure bench_packed1;
  51. var
  52.   arr_packed: array of TValuePacked;
  53.   i, s: NativeInt;
  54.   t: TDateTime;
  55. begin
  56.   Write(SizeOf(TValuePacked), ', ');
  57.   SetLength(arr_packed, size);
  58.   t:=Now;
  59.   begin
  60.     for i:=0 to size-1 do
  61.       s+=arr_packed[next(i)].IntVal;
  62.   end;
  63.   WriteLn('Packed 1: ',(Now-t)*MSecsPerDay:0:0, ' ms.');
  64.   SetLength(arr_packed, 0);
  65. end;
  66.  
  67. begin
  68.   bench_notpacked1;
  69.   bench_packed1;
  70.   ReadLn;
  71. end.
  72.  

Code: Pascal  [Select][+][-]
  1. 16, Not packed 1: 2783 ms.
  2. 9, Packed 1: 1619 ms.
  3.  

i7-8750H, laptop, ddr4
I may seem rude - please don't take it personally

440bx

  • Hero Member
  • *****
  • Posts: 6329
Re: What is more (memory) efficient
« Reply #14 on: February 05, 2025, 04:26:35 pm »
All the above affirmations do not match the results of ALLIGATOR's benchmark on my W10 Intel system.
I noticed that too.  Something is wrong with that program because the aligned structure should yield better times.

I'm guessing that the problem is in the memory allocation time consumed by SetLength. 

Anyway, since there is obviously something wrong with that program, I created my own which gives the expected results.  Here is the program for anyone interested to try it out:
Code: Pascal  [Select][+][-]
  1. {$APPTYPE CONSOLE}
  2.  
  3.  
  4. program _Alignment;
  5.  
  6. uses sysutils;
  7.  
  8. type
  9.   TValueType = (vtBoolean, vtInteger, vtReal);
  10.  
  11.   TValue1 =  record
  12.     case &Type: TValueType of
  13.       vtBoolean:  (BoolVal: Boolean);
  14.       vtInteger:  (IntVal : Int64);
  15.       vtReal:     (RealVal: Double);
  16.   end;
  17.  
  18.   TValue2 =  packed record
  19.     case &Type: TValueType of
  20.       vtBoolean:  (BoolVal: Boolean);
  21.       vtInteger:  (IntVal : Int64);
  22.       vtReal:     (RealVal: Double);
  23.   end;
  24.  
  25. type
  26.   RANGE       = 0..512 * 1024 - 1;
  27.  
  28.   REPETITIONS = 0..499;
  29.  
  30. var
  31.   ValueArray1 : array[RANGE] of TValue1;
  32.   ValueArray2 : array[RANGE] of TValue2;
  33.  
  34.  
  35. procedure bench_notpacked;
  36. var
  37.   r   : REPETITIONS;
  38.   i   : intptr;
  39.  
  40.   t   : TDateTime;
  41. begin
  42.   t := Now;
  43.  
  44.   for r in REPETITIONS do
  45.   for i in RANGE do
  46.   with ValueArray1[i] do
  47.   begin
  48.    &Type  := vtInteger;
  49.    IntVal := 0;
  50.   end;
  51.  
  52.   for r in REPETITIONS do
  53.   for i in RANGE do
  54.   begin
  55.     ValueArray1[i].IntVal += i;
  56.   end;
  57.  
  58.   writeln('NOT packed time elapsed: ', (Now - t) * msecsperday:0:0, ' ms');
  59. end;
  60.  
  61.  
  62. procedure bench_packed;
  63. var
  64.   r   : REPETITIONS;
  65.   i   : intptr;
  66.  
  67.   t   : TDateTime;
  68. begin
  69.   t := Now;
  70.  
  71.   for r in REPETITIONS do
  72.   for i in RANGE do
  73.   with ValueArray2[i] do
  74.   begin
  75.    &Type  := vtInteger;
  76.    IntVal := 0;
  77.   end;
  78.  
  79.   for r in REPETITIONS do
  80.   for i in RANGE do
  81.   begin
  82.     ValueArray2[i].IntVal += i;
  83.   end;
  84.  
  85.   writeln('    packed time elapsed: ', (Now - t) * msecsperday:0:0, ' ms');
  86. end;
  87.  
  88.  
  89.  
  90.  begin
  91.    writeln('sizeof TValueType: ', sizeof(TValueType):2);
  92.    writeln('sizeof TValue1   : ', sizeof(TValue1):2);
  93.    writeln('sizeof TValue2   : ', sizeof(TValue2):2);
  94.    writeln('sizeof ValueArray1: ', sizeof(ValueArray1));
  95.    writeln('sizeof ValueArray2: ', sizeof(ValueArray2));
  96.    writeln;
  97.  
  98.    bench_packed;
  99.    bench_notpacked;
  100.  
  101.    writeln;
  102.    writeln('Press ENTER/RETURN to end this program');
  103.    readln;
  104.  end.
  105.  
On my machine, the packed record is consistently about 10% slower.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

 

TinyPortal © 2005-2018