Recent

Author Topic: What is more (memory) efficient  (Read 4668 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12200
  • Debugger - SynEdit - and more
    • wiki
Re: What is more (memory) efficient
« Reply #15 on: February 05, 2025, 04:28:25 pm »
my bench:

See my post at https://forum.lazarus.freepascal.org/index.php/topic,70056.msg545622.html#msg545622
(and a few posts later, how it actually applied to benchmarks in that thread)

Benchmarks like this often have "arbitrary" speed changes.

I have not checked (as far as it would be possible) if that is the case with the benchmark here.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #16 on: February 05, 2025, 04:45:04 pm »
On my machine, the packed record is consistently about 10% slower.

{$mode objfpc}{$optimization on}
Code: Pascal  [Select][+][-]
  1. sizeof TValueType:  4
  2. sizeof TValue1   : 16
  3. sizeof TValue2   : 12
  4. sizeof ValueArray1: 8388608
  5. sizeof ValueArray2: 6291456
  6.  
  7.     packed time elapsed: 414 ms
  8. NOT packed time elapsed: 512 ms
  9.  
  10. Press ENTER/RETURN to end this program

{$mode delphi}{$optimization on}
Code: Pascal  [Select][+][-]
  1. sizeof TValueType:  1
  2. sizeof TValue1   : 16
  3. sizeof TValue2   :  9
  4. sizeof ValueArray1: 8388608
  5. sizeof ValueArray2: 4718592
  6.  
  7.     packed time elapsed: 347 ms
  8. NOT packed time elapsed: 487 ms
  9.  
  10. Press ENTER/RETURN to end this program
  11.  

Have you enabled optimization?
I did, and here's my result
« Last Edit: February 05, 2025, 04:46:51 pm by ALLIGATOR »
I may seem rude - please don't take it personally

BrunoK

  • Hero Member
  • *****
  • Posts: 766
  • Retired programmer
Re: What is more (memory) efficient
« Reply #17 on: February 05, 2025, 04:56:05 pm »
All the above affirmations do not match the results of ALLIGATOR's benchmark on my W10 Intel system.
I noticed that too.  Something is wrong with that program because the aligned structure should yield better times.

I'm guessing that the problem is in the memory allocation time consumed by SetLength. 

Anyway, since there is obviously something wrong with that program, I created my own which gives the expected results.  Here is the program for anyone interested to try it out:
...
On my machine, the packed record is consistently about 10% slower.
Try this :
Code: Pascal  [Select][+][-]
  1. program _Alignment_BK;
  2. {$APPTYPE CONSOLE}
  3.  
  4. uses
  5.   SysUtils;
  6.  
  7. type
  8.   TValueType = (vtBoolean, vtInteger, vtReal);
  9.  
  10.   TValue1 = record
  11.     case &Type: TValueType of
  12.       vtBoolean: (BoolVal: boolean);
  13.       vtInteger: (IntVal: int64);
  14.       vtReal: (RealVal: double);
  15.   end;
  16.  
  17.   {$PUSH}
  18.   {$Z1}
  19.   TValueType2 = (vt2Boolean, vt2Integer, vt2Real);
  20.   {$POP}
  21.  
  22.  
  23.   TValue2 = packed record
  24.     case &Type: TValueType2 of
  25.       vt2Boolean: (BoolVal: boolean);
  26.       vt2Integer: (IntVal: int64);
  27.       vt2Real: (RealVal: double);
  28.   end;
  29.  
  30. type
  31.   RANGE = 0..512 * 1024 - 1;
  32.  
  33.   REPETITIONS = 0..499;
  34.  
  35. var
  36.   ValueArray1: array[RANGE] of TValue1;
  37.   ValueArray2: array[RANGE] of TValue2;
  38.  
  39.  
  40.   procedure bench_notpacked;
  41.   var
  42.     r: REPETITIONS;
  43.     i: intptr;
  44.  
  45.     t: TDateTime;
  46.   begin
  47.     t := Now;
  48.  
  49.     for r in REPETITIONS do
  50.       for i in RANGE do
  51.         with ValueArray1[i] do
  52.         begin
  53.           &Type := vtInteger;
  54.           IntVal := 0;
  55.         end;
  56.  
  57.     for r in REPETITIONS do
  58.       for i in RANGE do
  59.       begin
  60.         ValueArray1[i].IntVal += i;
  61.       end;
  62.  
  63.     writeln('NOT packed time elapsed: ', (Now - t) * msecsperday: 0: 0, ' ms',
  64.       ' SizeOf(ValueArray1)=', SizeOf(ValueArray1));
  65.   end;
  66.  
  67.  
  68.   procedure bench_packed;
  69.   var
  70.     r: REPETITIONS;
  71.     i: intptr;
  72.  
  73.     t: TDateTime;
  74.   begin
  75.     t := Now;
  76.  
  77.     for r in REPETITIONS do
  78.       for i in RANGE do
  79.         with ValueArray2[i] do
  80.         begin
  81.           &Type := vt2Integer;
  82.           IntVal := 0;
  83.         end;
  84.  
  85.     for r in REPETITIONS do
  86.       for i in RANGE do
  87.       begin
  88.         ValueArray2[i].IntVal += i;
  89.       end;
  90.  
  91.     writeln('    packed time elapsed: ', (Now - t) * msecsperday: 0: 0, ' ms',
  92.       ' SizeOf(ValueArray2)=', SizeOf(ValueArray2));
  93.   end;
  94.  
  95.  
  96.  
  97. begin
  98.   writeln('sizeof TValueType: ', sizeof(TValueType): 2);
  99.   writeln('sizeof TValue1   : ', sizeof(TValue1): 2);
  100.   writeln('sizeof TValue2   : ', sizeof(TValue2): 2);
  101.   writeln('sizeof ValueArray1: ', sizeof(ValueArray1));
  102.   writeln('sizeof ValueArray2: ', sizeof(ValueArray2));
  103.   writeln;
  104.  
  105.   bench_notpacked;
  106.   bench_packed;
  107.  
  108.   writeln;
  109.   writeln('Press ENTER/RETURN to end this program');
  110.   readln;
  111. end.

On my machine, -O1 -OoREGVAR :

sizeof TValueType:  4
sizeof TValue1   : 16
sizeof TValue2   :  9
sizeof ValueArray1: 8388608
sizeof ValueArray2: 4718592

NOT packed time elapsed: 466 ms SizeOf(ValueArray1)=8388608
    packed time elapsed: 276 ms SizeOf(ValueArray2)=4718592

Press ENTER/RETURN to end this program



ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #18 on: February 05, 2025, 05:05:40 pm »
See my post at https://forum.lazarus.freepascal.org/index.php/topic,70056.msg545622.html#msg545622

Yeah, I saw your post back in that thread.
(thanks 👍, by the way)

Your main point is about code alignment, I understand I accept it

Yes, it does affect very much, but I want to consider the ideal case and discard variants with non-ideal code alignment in memory, exclude the part with code alignment (at the expense of any options to help the compiler, $codealign) and test only data handling
I may seem rude - please don't take it personally

440bx

  • Hero Member
  • *****
  • Posts: 6148
Re: What is more (memory) efficient
« Reply #19 on: February 05, 2025, 05:07:45 pm »
with Optimization level 1 and 4, the _unpacked_ is always faster than the packed (which is what it should be).  The same thing is true with optimization level 0.

with Optimization levels 2 and 3, packed yields faster results than unpacked.  That makes NO sense whatsoever because the majority (if not all) CPUs are optimized to access aligned data.

I don't know what FPC is doing but, it's the opposite of what happens with any other compiler (at least for optimization modes 2 and 3.)

Too weird.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #20 on: February 05, 2025, 05:19:51 pm »
I created my own which gives the expected results.  Here is the program for anyone interested to try it out:
...
On my machine, the packed record is consistently about 10% slower.

Code: Pascal  [Select][+][-]
  1. sizeof TValueType:  4
  2. sizeof TValue1   : 16
  3. sizeof TValue2   : 12
  4. sizeof ValueArray1: 8388608
  5. sizeof ValueArray2: 6291456
  6.  
  7.  
  8. O0:
  9.     packed time elapsed: 995 ms
  10. NOT packed time elapsed: 1056 ms
  11.  
  12. O1:
  13.     packed time elapsed: 993 ms
  14. NOT packed time elapsed: 1040 ms
  15.  
  16. O2:
  17.     packed time elapsed: 390 ms
  18. NOT packed time elapsed: 491 ms
  19.  
  20. O3:
  21.     packed time elapsed: 361 ms
  22. NOT packed time elapsed: 498 ms
  23.  
  24. O4:
  25.     packed time elapsed: 377 ms
  26. NOT packed time elapsed: 500 ms
  27.  
i7-8750H, laptop, ddr4
FPC[main]

What's your hardware like?
And the version of the compiler?
I may seem rude - please don't take it personally

BrunoK

  • Hero Member
  • *****
  • Posts: 766
  • Retired programmer
Re: What is more (memory) efficient
« Reply #21 on: February 05, 2025, 05:26:10 pm »
with Optimization level 1 and 4, the _unpacked_ is always faster than the packed (which is what it should be).  The same thing is true with optimization level 0.

with Optimization levels 2 and 3, packed yields faster results than unpacked.  That makes NO sense whatsoever because the majority (if not all) CPUs are optimized to access aligned data.

I don't know what FPC is doing but, it's the opposite of what happens with any other compiler (at least for optimization modes 2 and 3.)

Too weird.
You have to consider that there is nearly 2 times more processed structures in the processor cache when the records are contiguous and packed.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #22 on: February 05, 2025, 05:31:05 pm »
@440bx

Could you please show me the results of this test?

https://forum.lazarus.freepascal.org/index.php/topic,70108.msg546082.html#msg546082
I may seem rude - please don't take it personally

440bx

  • Hero Member
  • *****
  • Posts: 6148
Re: What is more (memory) efficient
« Reply #23 on: February 05, 2025, 05:45:41 pm »
@ALLIGATOR,

I find your results truly baffling.

Anyway, I'm using FPC v3.2.2 (the stable release version) on Win7 on an i860 (a fairly old CPU but that doesn't matter in this case.)

These are the results I get from the program you linked to (program untouched, run as-is):

16, Not packed 1: 56895 ms.
9, Packed 1: 13330 ms.

results with O4:

16, Not packed 1: 15464 ms.
9, Packed 1: 13147 ms.

It's quite telling that with O4 the packed time is basically unchanged whereas the unpacked one is over almost 4 times faster.

There's something fishy in there.



@BrunoK,

I understand what you're saying about the packing and the cache space but, this is the first time _ever_ that I see unaligned data be processed faster than aligned data. 

I cannot argue the timing results, they are what they are but, I believe there is something "not quite right" someplace.  I just don't know where.

I believe that in the worst possible case, aligned data would take as long as unaligned data. 

IOW, FPC has got the world upside down.


ETA:

I wonder what the results from Delphi are...

It would be nice if someone with a copy of Delphi would run the programs and post the results.

« Last Edit: February 05, 2025, 05:51:16 pm by 440bx »
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #24 on: February 05, 2025, 06:06:38 pm »
These are the results I get from the program you linked to (program untouched, run as-is):
Thank you! 🙏

I wonder what the results from Delphi are...

It would be nice if someone with a copy of Delphi would run the programs and post the results.

Code: Pascal  [Select][+][-]
  1. D12CE_x64_Release_test_v1_sequential_access:
  2. 16, Not packed 1: 862 ms.
  3. 16, Not packed 2: 927 ms.
  4. 9, Packed 1: 813 ms.
  5. 9, Packed 2: 849 ms.
  6.  
  7. D12CE_x64_Release_test_v2_random_access:
  8. 16, Not packed 1: 2854 ms.
  9. 9, Packed 1: 1626 ms.

Attached the executables, (if you're not afraid  ::))
I may seem rude - please don't take it personally

BildatBoffin

  • New Member
  • *
  • Posts: 47
Re: What is more (memory) efficient
« Reply #25 on: February 05, 2025, 06:09:34 pm »
The packed version requires an LEA instruction to access array elem, while the non-packed can just access the base ptr + some offset. See https://godbolt.org/z/x1M9s3ndY. So packed here saves the heap space, not the cpu time.

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 404
  • I use FPC [main] 💪🐯💪
Re: What is more (memory) efficient
« Reply #26 on: February 05, 2025, 06:25:50 pm »
The packed version requires an LEA instruction to access array elem, while the non-packed can just access the base ptr + some offset. See https://godbolt.org/z/x1M9s3ndY. So packed here saves the heap space, not the cpu time.

And you run a benchmark and see  :D

Besides, what's so scary about the LEA instructions ?  :D

difference:

Packed:
Code: ASM  [Select][+][-]
  1. leaq    (%rdx,%rdx,8),%rcx
  2.  

Not Packed:
Code: ASM  [Select][+][-]
  1. movq    %rdx,%rcx
  2. shlq    $4,%rcx
  3.  

lea will be just as fast as mov+shl
I may seem rude - please don't take it personally

440bx

  • Hero Member
  • *****
  • Posts: 6148
Re: What is more (memory) efficient
« Reply #27 on: February 05, 2025, 06:30:23 pm »
@ALLIGATOR,

Thank you for trying it in Delphi.  I appreciate that.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Zoran

  • Hero Member
  • *****
  • Posts: 1988
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: What is more (memory) efficient
« Reply #28 on: February 05, 2025, 06:47:57 pm »
Using this example:

Code: Pascal  [Select][+][-]
  1.   TValueType = (vtBoolean, vtInteger, vtReal);
  2.   TValue =  record
  3.     case Typ: TValueType of
  4.       vtBoolean:  (BoolVal: Boolean);
  5.       vtInteger:  (IntVal: Int64);
  6.       vtReal:     (RealVal: Double);
  7.   end;
  8.  
  9.   TValueArray = array [1..10] of TValue;
  10.  

Strange, on my machine it's 12 bytes and 120 bytes
FPC[main] Win x64

And I have a question - where did 4 of those 12 bytes go? %)  I've never used union tightly, and I always thought that its size is equal to the largest in union

In D12CE it's 9 and 90.

UPD: Yep, after I switched to {$mode delphi} - I got 9 and 90 too
but why 9 ? one byte occupies a type ?
UPD2: Yep, that's exactly what it is.


Just note that you probably don't need the "case selector" variable. If you use:

Code: Pascal  [Select][+][-]
  1. type
  2.   TValueType = (vtBoolean, vtInteger, vtReal);
  3.   TValue = record
  4.     case TValueType of  // note - variable Typ removed
  5.       vtBoolean:  (BoolVal: Boolean);
  6.       vtInteger:  (IntVal: Int64);
  7.       vtReal:     (RealVal: Double);
  8.   end;
  9.  

Then the size I get (x86_64 and i386) is 8 bytes.
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

BrunoK

  • Hero Member
  • *****
  • Posts: 766
  • Retired programmer
Re: What is more (memory) efficient
« Reply #29 on: February 05, 2025, 07:06:19 pm »

Just note that you probably don't need the "case selector" variable. If you use:
...
Then the size I get (x86_64 and i386) is 8 bytes.
Didn't know that the case selector field could be stored this way. Might be quite useful for variable records mapping; learned something new today.

 

TinyPortal © 2005-2018