Title: Initialize for object
Post by: Okoba on March 16, 2023, 04:25:22 pm
Hello,

What does Initialize do for an object? It seems to do nothing.
Code: Pascal
program Project1;

type
  TTest = object
    I: Integer;
  end;

procedure Test;
var
  V: TTest;
begin
  V.I := 10;
  WriteLn(V.I);
  Initialize(V);
  WriteLn(V.I);
end;

begin
  Test;
  ReadLn;
end.
Title: Re: Initialize for object
Post by: Nitorami on March 16, 2023, 06:14:30 pm
If you use virtual methods within your object, then you need to declare and call a constructor (conventionally named Init); otherwise the object's VMT (Virtual Method Table) pointer won't be set up and you'll get an AV (access violation). Whether and how Initialize is involved in that, I don't know. I did not even know that such a procedure exists for objects. I only use the Initialize operator for automatic initialisation of advanced records.
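A minimal sketch of the constructor requirement described above. The type TShape and its method Name are hypothetical names chosen for illustration; the point is only that the constructor call has to come before the first virtual call.

```pascal
program VmtDemo;

{$mode objfpc}

type
  // Hypothetical old-style object with one virtual method
  TShape = object
    constructor Init;
    function Name: String; virtual;
  end;

constructor TShape.Init;
begin
  // Nothing to do here: calling the constructor is what sets up
  // the VMT pointer inside the instance.
end;

function TShape.Name: String;
begin
  Result := 'shape';
end;

var
  S: TShape;
begin
  S.Init;           // must run before any virtual method is called
  WriteLn(S.Name);  // prints: shape
end.
```

Leaving out the S.Init call would make the virtual call go through an unset VMT pointer, which is the crash mentioned in the post.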
Title: Re: Initialize for object
Post by: PascalDragon on March 17, 2023, 03:42:10 pm
Quote
What does Initialize do for an object? It seems to do nothing.

Title: Re: Initialize for object
Post by: Okoba on March 17, 2023, 07:43:25 pm
So why does the sample code still write 10 after Initialize? Shouldn't it set the field to zero, like it does for a record? Is there a way to initialize the memory besides filling it with zeros?
Title: Re: Initialize for object
Post by: Thaddy on March 17, 2023, 07:54:51 pm
Default() ?
If that fails it is a bug.
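A short sketch of the suggestion, reusing the TTest object from the first post. It assumes a Free Pascal version that provides the Default intrinsic; the program name is made up.

```pascal
program DefaultDemo;

{$mode objfpc}

type
  TTest = object
    I: Integer;
  end;

var
  V: TTest;
begin
  V.I := 10;
  Initialize(V);        // TTest has no managed fields, so nothing happens
  WriteLn(V.I);         // prints: 10
  V := Default(TTest);  // assigns a zero-filled TTest value
  WriteLn(V.I);         // prints: 0
end.
```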
Title: Re: Initialize for object
Post by: Okoba on March 17, 2023, 08:11:19 pm
Default works. Thanks.
I am still confused about why Initialize works like this: https://forum.lazarus.freepascal.org/index.php/topic,61795
After years of working with FPC, I still cannot properly explain to someone what Initialize is and how it behaves. I must be very bad at understanding it.
Title: Re: Initialize for object
Post by: Warfley on March 17, 2023, 08:23:30 pm
Initialize only has an effect on managed types; for unmanaged types it does nothing. Also, you only need to call it when you are working with untyped memory (or memory whose type is different from the target type).
I must admit it is not quite clear from the doc (https://lazarus-ccr.sourceforge.io/docs/rtl/system/initialize.html):
Quote
Initialize is a compiler intrinsic: it initializes a memory area T for any kind of managed variable. Initializing means zeroeing out the memory area. In this sense it is close in functionality to Default, but Default requires an already initialized variable. It performs the opposite operation of finalize, which should be used to clean up the memory block when it is no longer needed.
Note that initialize is different from Default as Default can only be assigned to an initialized object

For each initialize you must also call a finalize. Example:
Code: Pascal
{$mode objfpc}{$H+} // String must be the managed AnsiString here
var
  buff: array[0..SizeOf(String) - 1] of Byte; // Unmanaged memory
  str: PString; // Pointer to managed memory
begin
  str := @buff[0]; // str now points to unmanaged memory
  Initialize(str^); // Because str points to unmanaged memory, it must be initialized manually
  ReadLn(str^);
  WriteLn('Hello ', str^);
  Finalize(str^); // The Initialize must be matched by a Finalize
end.
You only need Initialize if you know the type is managed, or if you are writing generic code whose type parameter could be a managed type. But a record or object can contain a managed type (or a type that itself contains one), which makes it a managed type by proxy, so it is often better to be safe than sorry and call Initialize whenever you allocate untyped memory.
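To illustrate the "managed by proxy" case, here is a small sketch with a record that contains a String and lives in memory obtained from GetMem. The names TPerson and PPerson are hypothetical.

```pascal
program InitRecordDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical record: the String field makes the whole record managed
  TPerson = record
    Name: String;
    Age: Integer;
  end;
  PPerson = ^TPerson;

var
  P: PPerson;
begin
  P := GetMem(SizeOf(TPerson)); // raw memory, contents undefined
  Initialize(P^);               // puts the Name field into a valid (empty) state
  P^.Name := 'Ada';
  P^.Age := 36;
  WriteLn(P^.Name, ' ', P^.Age); // prints: Ada 36
  Finalize(P^);                 // releases the string before the memory goes away
  FreeMem(P);
end.
```

When the type is known at the allocation site, New(P) and Dispose(P) perform the same initialization and finalization automatically; the manual calls are only needed for memory the compiler does not know the type of.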
Title: Re: Initialize for object
Post by: dsiders on March 17, 2023, 08:51:15 pm
Quote
Default works. Thanks.
I am still confused about why Initialize works like this: https://forum.lazarus.freepascal.org/index.php/topic,61795
After years of working with FPC, I still cannot properly explain to someone what Initialize is and how it behaves. I must be very bad at understanding it.

Perhaps this will help: https://www.freepascal.org/docs-html/ref/refse20.html
Do you see Integer mentioned anywhere on the page?
Title: Re: Initialize for object
Post by: Okoba on March 17, 2023, 10:08:53 pm
Thank you both.
So Initialize is almost never needed, and the way to reset an object to its default state is Default(). That's a little unfortunate, because (1) Default creates a new object and copies it to the destination, which is slow, and (2) the Initialize operator for records makes someone like me think that I need to call Initialize every time.
Now I think I have learned from my mistakes and your kind help. Default() is the way to go.
Title: Re: Initialize for object
Post by: Warfley on March 17, 2023, 10:12:08 pm
Maybe a real example of how to use Initialize is useful. Assume you need a data structure (for simplicity of this example, a stack) for high-performance usage, and you don't want to waste a lot of time in the memory allocator. You can make use of the existing virtual memory and paging functionality of the OS: simply preallocate a huge memory region beforehand and then operate on that.

Now if you use an array with SetLength, SetLength will implicitly call Initialize on all the elements of the array. This is good for ease of use, but it means that all the memory will be touched, and all your virtual memory and paging advantages go out the window. You want raw virtual memory, so you use GetMem instead:
Code: Pascal
program Project1;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  generic TPreallocatedStack<T> = class
  private type PT = ^T;
  private
    FData: PT;
    FLength: SizeInt;
    FSize: SizeInt;
  public
    constructor Create(const ASize: SizeInt);
    destructor Destroy; override;

    procedure Push(constref AItem: T);
    function Pop: T;
  end;

constructor TPreallocatedStack.Create(const ASize: SizeInt);
begin
  inherited Create;
  FData := GetMem(ASize * SizeOf(T));
  FSize := ASize;
  FLength := 0;
end;

destructor TPreallocatedStack.Destroy;
begin
  FreeMem(FData);
  inherited Destroy;
end;

procedure TPreallocatedStack.Push(constref AItem: T);
begin
  FData[FLength] := AItem;
  Inc(FLength);
end;

function TPreallocatedStack.Pop: T;
begin
  Dec(FLength);
  Result := FData[FLength];
end;

type
  TIntStack = specialize TPreallocatedStack<Integer>;

var
  Stack: TIntStack;
  i: Integer;
begin
  Stack := TIntStack.Create(1024*1024*1024*10); // Allocates around 40 Gigabytes (10 gigs * 4 bytes per int)
  try
    for i := 0 to 10 do
      Stack.Push(i);
    for i := 0 to 3 do
      WriteLn(Stack.Pop);
  finally
    Stack.Free;
  end;
end.
This works fine and is really fast. If, instead of the raw GetMem, we used SetLength, which initializes the memory, the program would hang in SetLength and at some point crash, because my computer would run out of memory (I don't have 40 gigs of RAM).

So we have everything we want, right? The problem is that if we use managed types, the management operators will not be called:
Code: Pascal
type
  TStringStack = specialize TPreallocatedStack<String>;

var
  Stack: TStringStack;
  i: Integer;
begin
  Stack := TStringStack.Create(1024*1024*1024); // Reduced to 1 Gi elements because of HeapTrc (about 8 GB with 8-byte string pointers)
  try
    for i := 0 to 10 do
      Stack.Push(i.ToString);
    for i := 0 to 3 do
      WriteLn(Stack.Pop);
  finally
    Stack.Free;
  end;
end.
I'm now using heaptrc, which fills the memory with $ff (a security feature). Because this touches all the memory, similar to how SetLength works, it of course completely removes the speed advantage and consumes all the virtual memory, so it is just for testing purposes. As a consequence I needed to reduce the size of the memory allocation, because otherwise, as with SetLength, it would crash my PC.

Now we get a segfault, because when heaptrc initializes the memory, it writes $ff into it. This results in invalid string values. If we mitigate this by nulling the data manually (by adding FillChar(FData^, ASize * SizeOf(T), 0) to the constructor), we get a bunch of memory leaks.

The reason for this is that the memory was not initialized in the beginning and finalized in the end (well, actually the FillChar above would be a correct initialization for String, but that's rather a coincidence). So to allow this data structure to use managed types, it needs to call Initialize and Finalize. And not over all the data like SetLength, but only where it is needed.
The simplest solution is to just do that in push and pop, as well as the destructor (as the destructor removes all the remaining items):
Code: Pascal
destructor TPreallocatedStack.Destroy;
begin
  Finalize(FData^, FLength);
  FreeMem(FData);
  inherited Destroy;
end;

procedure TPreallocatedStack.Push(constref AItem: T);
begin
  Initialize(FData[FLength]);
  FData[FLength] := AItem;
  Inc(FLength);
end;

function TPreallocatedStack.Pop: T;
begin
  Dec(FLength);
  Result := FData[FLength];
  Finalize(FData[FLength]);
end;

Now if we run the same code again it works flawlessly: no segfaults, no memory leaks. And when we remove heaptrc, we can again increase the size to ridiculous amounts and it is still blazing fast.

Of course there are still improvements to be made. For example, pushing, then popping, then pushing again results in Initialize, Finalize, Initialize. This could be just one Initialize: another counter can be added to track which slots were already initialized, so that nothing is finalized in between.
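One possible way to sketch the extra counter, as a variation of the stack in this post. The class name TLazyInitStack and the field FInitCount are made up for illustration; this is a sketch of the idea, not a tuned implementation, and like the original it does no bounds checking.

```pascal
program LazyInitStackDemo;

{$mode objfpc}{$H+}

type
  generic TLazyInitStack<T> = class
  private type PT = ^T;
  private
    FData: PT;
    FLength: SizeInt;    // number of items currently on the stack
    FInitCount: SizeInt; // hypothetical: slots 0..FInitCount-1 have been initialized
  public
    constructor Create(const ASize: SizeInt);
    destructor Destroy; override;
    procedure Push(constref AItem: T);
    function Pop: T;
  end;

constructor TLazyInitStack.Create(const ASize: SizeInt);
begin
  inherited Create;
  FData := GetMem(ASize * SizeOf(T));
  FLength := 0;
  FInitCount := 0;
end;

destructor TLazyInitStack.Destroy;
begin
  // Finalize every slot that was ever initialized, not only the live ones
  Finalize(FData^, FInitCount);
  FreeMem(FData);
  inherited Destroy;
end;

procedure TLazyInitStack.Push(constref AItem: T);
begin
  if FLength = FInitCount then
  begin
    Initialize(FData[FLength]); // first use of this slot
    Inc(FInitCount);
  end;
  FData[FLength] := AItem;      // slot is initialized, so a normal assignment is safe
  Inc(FLength);
end;

function TLazyInitStack.Pop: T;
begin
  Dec(FLength);
  Result := FData[FLength];     // slot stays initialized and is reused by the next Push
end;

type
  TStringStack = specialize TLazyInitStack<String>;

var
  Stack: TStringStack;
begin
  Stack := TStringStack.Create(1024);
  try
    Stack.Push('a');
    Stack.Push('b');
    WriteLn(Stack.Pop); // prints: b
    Stack.Push('c');    // reuses the slot without a second Initialize
    WriteLn(Stack.Pop); // prints: c
    WriteLn(Stack.Pop); // prints: a
  finally
    Stack.Free;
  end;
end.
```

The trade-off is that a popped slot keeps its old value alive (for a String, the reference) until the slot is overwritten or the stack is destroyed.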

Another thing is that managed types can add arbitrarily complex code during copying, e.g. assume the following managed record:
Code: Pascal
class operator TRec.Copy(constref Source: TRec; var Dest: TRec);
begin
  Sleep(1000);
  Dest.Data := Source.Data;
end;
This would add a 1 second sleep at every assignment of the record; really pointless, but possible. Now look at the Pop method:
Code: Pascal
function TPreallocatedStack.Pop: T;
begin
  Dec(FLength);
  Result := FData[FLength];
  Finalize(FData[FLength]);
end;
What is done here is an assignment, which is great and all, but it would cause the 1 second sleep. What you can do with Initialize and Finalize is implement a clear and a Move:
Code: Pascal
function TPreallocatedStack.Pop: T;
begin
  Dec(FLength);
  Finalize(Result); // Clears the result
  Move(FData[FLength], Result, SizeOf(T)); // Move does not call Copy operator
  FillChar(FData[FLength], SizeOf(T), 0); // Not necessary but for robustness
end;
What this does is clear the Result variable and then move the data from the stack into Result without calling the Copy operator. The clearing is necessary because otherwise overwriting without a copy operator would break the validity assumptions of the managed type (e.g. for a string the reference count would not be decreased).
After the move, the data in FData is invalid and should not be used anymore, because the only valid version of the data is now in Result (without the copy operator, no copy was created). To ensure this, FillChar is used to write a different value into it, but this is technically not necessary.
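The same clear-and-move pattern can be tried in isolation on two plain String variables. This is only a sketch to make the ownership transfer visible; the variable names are made up.

```pascal
program MoveDemo;

{$mode objfpc}{$H+}

var
  Src, Dst: String;
begin
  Src := StringOfChar('x', 5); // heap-allocated string owned by Src
  Dst := 'old value';

  Finalize(Dst);                    // release whatever Dst held before
  Move(Src, Dst, SizeOf(String));   // raw copy of the string pointer, no reference counting
  FillChar(Src, SizeOf(String), 0); // Src gives up ownership

  WriteLn(Dst);         // prints: xxxxx
  WriteLn(Length(Src)); // prints: 0
end.
```

Unlike in Pop, the FillChar is required here: Src is an ordinary variable that the compiler finalizes at program exit, so without zeroing it the same string would be released twice. In the stack, the slot beyond FLength is never finalized, which is why zeroing it is optional there.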

Such admittedly rather advanced programming techniques allow you to control managed behavior closely and, e.g. for performance reasons, avoid copies whenever possible (people familiar with C++ might recognize that this basically emulates std::move).

This is not that much of a concern now (except when you really need that last bit of performance for your strings), but assuming that managed records will some day work, you might have list-type records which, on assignment with :=, might copy large lists with gigabytes of data. E.g. assume a hash set of lists, something quite common (this is actually how I discovered it myself: when managed records came to trunk I built all kinds of data structures with them and was wondering why everything was so slow). If all the lists were copied element-wise on every rehash, this might cause long freezes and huge memory consumption for all the copies.

So while it's just a bit of a curiosity right now, if (and when) managed records become popular, this might be necessary to ensure compatibility with very complex copy mechanics.
Title: Re: Initialize for object
Post by: Okoba on March 17, 2023, 10:29:20 pm
Warfley, that is a great example, thank you so much.
So to summarise what Initialize() does: it prepares a managed type to be used? Like creating a class (and its virtual structure)? For example, for a string it prepares the reference counting?
I ask for clarification because the documentation only says that it zeroes out the memory, so it should work like FillZero.
Title: Re: Initialize for object
Post by: Warfley on March 17, 2023, 11:08:24 pm
Quote
Warfley, that is a great example, thank you so much.
So to summarise what Initialize() does: it prepares a managed type to be used? Like creating a class (and its virtual structure)? For example, for a string it prepares the reference counting?
I ask for clarification because the documentation only says that it zeroes out the memory, so it should work like FillZero.

Yes, zeroing out the memory was correct when the only managed types were arrays, strings and interfaces, but with the new managed records, initialization can mean anything. As an example:
Code: Pascal
{$mode objfpc}{$modeswitch advancedrecords} // needed for operators inside records
type
  PManagedTest = ^TManagedTest;
  TManagedTest = record
    Initialized: Boolean;

    class operator Initialize(var Self: TManagedTest);
  end;

class operator TManagedTest.Initialize(var Self: TManagedTest);
begin
  Self.Initialized := True;
end;

var
  p: PManagedTest;
begin
  p := GetMem(SizeOf(p^));
  FillChar(p^, SizeOf(p^), 0);
  WriteLn(p^.Initialized); // False because 0 initialized
  Initialize(p^);
  WriteLn(p^.Initialized); // True because initialize operator is called
  Finalize(p^); // Matches the Initialize above
  FreeMem(p);
end.
Here Initialized is explicitly set to True during initialization, so you can see that the documentation is not up to date anymore.

Managed records are still a bit icky. For example, Default zeroes fields, so p^ := Default(TManagedTest) will actually set p^.Initialized to False, meaning Default is not actually an initialized value (so using Default you might end up in an invalid state). Other notable things include that Generics.Collections.TCustomList does not use Finalize correctly either; in its DoRemove it does the following:
Code: Pascal
FItems[AIndex] := Default(T);
if AIndex <> FLength then
begin
  System.Move(FItems[AIndex + 1], FItems[AIndex], (FLength - AIndex) * SizeOf(T));
  FillChar(FItems[FLength], SizeOf(T), 0);
end;
Here FillChar with 0 is used, which with the type above would result in an uninitialized value.
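For comparison, a sketch of how a remove operation could pair Finalize and Initialize instead of relying on Default and FillChar. RemoveAt is a hypothetical helper written for this illustration, not the Generics.Collections code, and it works on String only to keep the example short; in generic code the same calls on a T would run that type's management operators.

```pascal
program RemoveAtDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: removes Items[Index] from the first Count elements
procedure RemoveAt(var Items: array of String; var Count: SizeInt; Index: SizeInt);
begin
  Finalize(Items[Index]); // release the removed element
  if Index < Count - 1 then
    // Shift the tail down by raw copy: no Copy operator, no reference counting
    Move(Items[Index + 1], Items[Index], (Count - Index - 1) * SizeOf(String));
  // The last slot now holds stale bits that must not be finalized again;
  // Initialize puts it back into a valid empty state.
  Initialize(Items[Count - 1]);
  Dec(Count);
end;

var
  Items: array[0..2] of String;
  Count: SizeInt = 3;
begin
  Items[0] := 'a';
  Items[1] := 'b';
  Items[2] := 'c';
  RemoveAt(Items, Count, 0);
  WriteLn(Items[0], ' ', Items[1]); // prints: b c
  WriteLn(Length(Items[2]));        // prints: 0
end.
```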

So yeah the new managed records aren't really thought through right now. So you are not alone with it :)
Title: Re: Initialize for object
Post by: Okoba on March 17, 2023, 11:13:37 pm
Oh now it clicked for me.
Thank you again.