Recent

Author Topic: Precautions related to absence of a garbage collector  (Read 4904 times)

rnfpc

  • Full Member
  • ***
  • Posts: 101
Precautions related to absence of a garbage collector
« on: May 05, 2023, 04:15:05 pm »
Unlike many popular languages today, Pascal/Lazarus is not a garbage-collected language.

For persons well versed in languages like Python or Java, which are garbage-collected, can coding in Pascal lead to unsuspected errors?

What precautions should be taken while coding in Pascal in view of the fact that Pascal is not a garbage collected language? Thanks for your insight.
« Last Edit: May 05, 2023, 04:52:25 pm by rnfpc »

Blaazen

  • Hero Member
  • *****
  • Posts: 3241
  • POKE 54296,15
    • Eye-Candy Controls
Re: Precautions related to absence of a garbage collector
« Reply #1 on: May 05, 2023, 04:59:18 pm »
Precautions:
1) Carefully destroy everything you created
2) Use heaptrace during development, it will show you memory leaks

Note: some types are managed (strings). It makes it easier.
Lazarus 2.3.0 (rev main-2_3-2863...) FPC 3.3.1 x86_64-linux-qt Chakra, Qt 4.8.7/5.13.2, Plasma 5.17.3
Lazarus 1.8.2 r57369 FPC 3.0.4 i386-win32-win32/win64 Wine 3.21

Try Eye-Candy Controls: https://sourceforge.net/projects/eccontrols/files/

Handoko

  • Hero Member
  • *****
  • Posts: 5396
  • My goal: build my own game engine using Lazarus
Re: Precautions related to absence of a garbage collector
« Reply #2 on: May 05, 2023, 05:54:07 pm »
  • Always free the object you created, but you should know some objects will be freed automatically by its owner.
  • Use try-finally block wisely.
    Read more: https://blog.marcocantu.com/blog/2018-january-multiple-try-finally-delphi.html

    If you're not a fan of using exception and try-finally blocks, make sure you test all the inputs and possible conditions before create the object and test and free it immediately if something goes wrong. For example you can avoid exception to happen by using TryStrToInt, and you can read the OS/hardware error using IOResult.
  • Make sure the flow runs correctly. Pay attention on these flow control statements: break, exit, goto, which may cause the flow to miss the line to free the object.
« Last Edit: May 05, 2023, 06:27:02 pm by Handoko »

Warfley

  • Hero Member
  • *****
  • Posts: 1869
Re: Precautions related to absence of a garbage collector
« Reply #3 on: May 05, 2023, 06:28:28 pm »
I'd be careful with the term garbage collector, bacause not all automated memory management is using garbage collection, e.g. a popular alternative is reference counting. But no matter if it's GC or Ref Counting more generally most modern languages have some form of automated memory management. Including Pascal.

Pascal does have automated memory management for three types (of types): Strings, Arrays and COM Interfaces (not counting managed records as they are not working correctly right now). For example:
Code: Pascal  [Select][+][-]
  1. procedure WriteAsString(data: Integer);
  2. var
  3.   s: String;
  4. begin
  5.   s := IntToStr(data); // Creates a new string, includes allocating memory for it
  6.   WriteLn(s);
  7. end; // The memory used by s will automatically be freed because strings are managed types

But you are right that for other types, namely classes and typed and untyped pointers this is not the case.

So here is some advice on how to deal with that. Always think in terms of ownership. Everytime you allocate memory there must be an "owner", some piece of code that is responsible for that memory. You then must make sure that that owner always frees the memory. Also important is that you must make sure that whenever the owner "dies" no one else has access to that memory anymore.

The easiest case is when you create a temporary object, e.g. a TStringList to read a config file:
Code: Pascal  [Select][+][-]
  1. procedure ReadConfigFile(const FileName: String);
  2. var
  3.   sl: TStringList;
  4. begin
  5.   sl := TStringList.Create;
  6.   try
  7.     sl.LoadFromFile(FileName);
  8.     // Do something with the data in SL
  9.     ConfigValue1 := sl.Values['Config1'];
  10.     ConfigValue2 := sl.Values['Config2'];
  11.   finally
  12.     sl.Free;
  13.   end;
  14. end;
In this case the owner of that SL is the function ReadConfigFile. It is created by that function, and in the end freed by that function. The try-finally ensures that this will always happen, no matter if you have an early break, exception or whatever in your code.

The important part here is that sl cannot leave this function. after the finally block sl should not be used anymore. This also includes passing SL to some other object, this is fine as long as this other object can only use SL while the function is still active. Example:
Code: Pascal  [Select][+][-]
  1. function EncodeBase64(data: Array of Byte): String;
  2. var
  3.   ss: TStringStream;
  4.   enc: TBase64EncodingStream;
  5.   b: Byte;
  6. begin
  7.   ss := TStringStream.Create;
  8.   try
  9.     enc := TBase64EncodingStream.Create(ss);
  10.     try
  11.       for b in data do
  12.         enc.WriteByte(b);
  13.       enc.Flush;
  14.       ss.Seek(0, soFromBeginning);
  15.       Result := ss.DataString;
  16.     finally
  17.       enc.Free;
  18.     end;
  19.   finally
  20.     ss.Free;
  21.   end;
  22. end;
Here this function owns two objects, the enc and ss. Again both are ensured to be freed through try-finally. But enc uses ss, so basically what you are doing here is "borrowing" the ss, which is owned by the function to the enc object to be used. It is important to ensure that enc is not using ss after it is freed. The only way to ensure this is to make sure that SS is not freed before enc is.

So here it is important for your owner to make sure that if you "borrow" your memory to another object, that object does not have access to the borrowed memory after it is freed. Most of the time this means either freeing the object that borrows the memory before freeing the borrowed memory, in other cases this may mean to revoke access to said object:
Code: Pascal  [Select][+][-]
  1. var
  2.   GlobalLogger: TStream;
  3.  
  4. procedure LogString(str: String);
  5. begin
  6.   if Assigned(GlobalLogger) then // If logger is registered use it
  7.   begin
  8.     GlobalLogger.Write(str[1], Length(str));
  9.     GlobalLogger.WriteByte(10); // Newline
  10.   end
  11.   else // otherwise use writeln
  12.     WriteLn(str);
  13. end;
  14.  
  15. procedure FileLoggerTest;
  16. var
  17.   fs: TFileStream;
  18. begin
  19.   fs := TFileStream.Create('test.log');
  20.   try
  21.     // Set global logger to "borrow" fs
  22.     GlobalLogger := fs;
  23.     LogString('Test log file');
  24.   finally
  25.     // ensure global logger is reset to revoke access before freeing fs
  26.     GlobalLogger := nil;
  27.     fs.Free;
  28.   end;
  29.   LogString('Test log WriteLn');
  30. end;
Here you borrow the filestream owned by the function to a global variable, which is used by another function. So when the filestream is freed, to ensure it cannot be used by the global variable anymore, it must be overriden.

Thats for local objects, this is generally quite easy because here the owner is the creating function, and can usually be "solved" with just 2 rules of thumb: 1. Always use try-finally after the creation to free (so as soon as you write a Create, write the try-finally-free afterwads) and 2. make sure that the object is not used outside the lifetime of that function. To do the second (which is easier said than done), just keep in mind who borrows the object and check that the lifetimes of those borrows are shorter than the lifetime of the owner and you are fine.

Now to the more complex situation, when the owner is not the creator, or ownership is transferred. The easiest example of this is a list:
Code: Pascal  [Select][+][-]
  1. uses
  2.   classes,
  3.   Generics.Collections;
  4.  
  5. type
  6.   TStringListList = specialize TObjectList<TStringList>;
  7.  
  8. var
  9.   list: TStringListList;
  10. begin
  11.   list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
  12.   try
  13.     list.Add(TStringList.Create); // creates new object that is owned by list
  14.   finally
  15.     list.Free; // TObjectList with AOwnsObjects will Free all objects it owns when freed
  16.   end;
  17. end.
Here TObjectList is the owner of the StringList, meaning even though it is created by the code block of the program, it does not need to be freed there, because the ownership is transfered to the list via the .Add.
You can also take back ownership of an object from a list:
Code: Pascal  [Select][+][-]
  1. type
  2.   TStringListList = specialize TObjectList<TStringList>;
  3.  
  4. var
  5.   list: TStringListList;
  6.   sl: TStringList;
  7. begin
  8.   list := TStringListList.Create(True); // The parameter AOwnsObjects is True so now the list owns the object
  9.   try
  10.     list.Add(TStringList.Create); // creates new object that is owned by list
  11.     sl := list.Extract(0); // Now the list does not own the StringList anymore
  12.     try
  13.       // ...
  14.     finally
  15.       sl.Free; // Because we taken back ownership, we must free it here
  16.     end;
  17.   finally
  18.     list.Free; // TObjectList with AOwnsObjects will Free all objects it owns when freed
  19.   end;
  20. end.

A usual use-case for that is in GUI development, every component has the Owner property. So when you create a Button and set it's owner to be the form it's located on, when the Form is freed, it will also be freed.

So the next important rule of thumb is: always know when some other entity claims ownership of an object. There must always be exactly one owner. When you put your elements into a TObjectList, that list becomes the owner until you take the object away (with Extract). If you put a component on a Form, that form becomes the owner of that component.

It is very important to not have multiple owners. E.g. if you put the same object into two object lists, it will crash:
Code: Pascal  [Select][+][-]
  1. type
  2.   TStringListList = specialize TObjectList<TStringList>;
  3.  
  4. var
  5.   l1 ,l2: TStringListList;
  6.   sl: TStringList;
  7. begin
  8.   l1 := TStringListList.Create(True);
  9.   try
  10.     l2 := TStringListList.Create(True);
  11.     try
  12.       sl := TStringList.Create;
  13.       l1.Add(sl); // Transfers ownership to l1
  14.       l2.Add(sl); // Transfers ownership to l2
  15.     finally
  16.       l2.Free; // L2 thinks it owns sl so it frees it
  17.     end;
  18.   finally
  19.     l1.Free; // l1 thinkts it owns sl so it tries to free it
  20.     // But sl was already freed by l2 -> error
  21.   end;
  22. end.

Following these simple rules will get you through 80%-90% of your programming just fine, just always remember to check that there is always exactly 1 owner, when giving away the object, check if it's a borrow, if so make sure that the borrow does not live longer than the owner, and if it's not a borrow and you transfer ownership, make sure that you don't end up with two owners because you forgott that you already transfered ownership.

Because this post is already getting ungodly long, I will write how to handle the remaining 10%-20% of cases (the "hard" cases) in another post. But this should be fine for most use cases.

PS: One big problem with failing to manage your memory properly is that it may cause bugs that are unpredictable and do not cause crashes:
Code: Pascal  [Select][+][-]
  1. var
  2.   sl1, sl2: TStringList;
  3. begin
  4.   sl1 := TStringList.Create;
  5.   try
  6.     sl1.Text := 'Hello';
  7.   finally
  8.     sl1.Free;
  9.   end;
  10.   sl2 := TStringList.Create;
  11.   try
  12.     sl2.Text := 'World';
  13.  
  14.     WriteLn(sl1.Text); // sl1 is already freed, so this is an error
  15.   finally
  16.     sl2.Free;
  17.   end;
  18. end.
Even though this is a clear use-after-free error, where you use an object after it's lifetime was ended by it's owner (through try-finally), it does not crash or report any error, but rather prints out "World". The reason: sl2 happend to get the same memory allocated that was just freed by sl1.

And this is the really tricky thing about memory management. Errors in memory management may not lead to any easy to spot crashes, but may just cause some weird unexplainable behavior, which makes them much harder to track down.

For this there are tools like Heaptrc or Valgrind, which can help you spot these errors, but 1. they are not foolproof either, and 2. they only show you the errors when you encounter them. Both of those tools are not useful for production code, so if you don't spot the error during testing, and you roll out your software to the users/customers, they may encounter this weird behavior and it's nearly impossible for you to track it down afterwards.

So always be aware, it's trickier than it might seem.

Warfley

  • Hero Member
  • *****
  • Posts: 1869
Re: Precautions related to absence of a garbage collector
« Reply #4 on: May 05, 2023, 06:31:13 pm »
    If you're not a fan of using exception and try-finally blocks, make sure you test all the inputs and possible conditions before create the object and test and free it immediately if something goes wrong. For example you can avoid exception to happen by using TryStrToInt, and you can read the OS/hardware error using IOResult.[/li]
    [/list]
    • Make sure the flow runs correctly. Pay attention on these flow control statements: break, exit, goto, which may cause the flow to miss the line to free the object.
    Even if you don't like exceptions, you should still always use try-finally, because try-finally is not just for exceptions. It also takes care of break, continue and exit, and also prevents you from gotoing out of the try-finally to ensure that finally will always be called.
    So if you use exceptions or not, always use try-finally. This way you don't have to think about control flow jumps at all

    Warfley

    • Hero Member
    • *****
    • Posts: 1869
    Re: Precautions related to absence of a garbage collector
    « Reply #5 on: May 05, 2023, 07:53:16 pm »
    Because this post is already getting ungodly long, I will write how to handle the remaining 10%-20% of cases (the "hard" cases) in another post. But this should be fine for most use cases.
    So let's talk about the hard cases then, which is when there is no apperant owner. As I previously said, when you have a tree structure, determining ownership is easy, each parent owns it's child nodes. For example let's take a simple expression tree:
    Code: Pascal  [Select][+][-]
    1. type
    2.   PTreeNode = ^TTreeNode;
    3.   TTreeNode = record
    4.     Value: String;
    5.     Children: Array of PTreeNode;
    6.   end;
    7.  
    8. function AddNode(const Value: String; Parent: PTreeNode = Nil): PTreeNode;
    9. begin
    10.   New(Result);  
    11.   Result^.Value := Value;
    12.   if Assigned(Parent) then
    13.     Parent^.Children += [Result];
    14. end;
    15.  
    16. procedure FreeNode(ANode: PTreeNode);
    17. var
    18.   Child: PTreeNode;
    19. begin
    20.   for Child in ANode^.Children do
    21.     FreeNode(child);
    22.   Dispose(ANode);
    23. end;
    24.  
    25. var
    26.   FivePlusThree: PTreeNode;
    27. begin
    28.   FivePlusThree := AddNode('+');
    29.   try
    30.     AddNode('5', FivePlusThree);
    31.     AddNode('3', FivePlusThree);
    32.     (* Tree:
    33.       '+'
    34.       / \
    35.     '5' '3'
    36.     *)
    37.   finally
    38.     FreeNode(FivePlusThree);
    39.   end;
    40. end.
    Here FreeNode of the root will free the whole tree, as the parent frees it's children. So far so easy. But what if you allow to the same subtrees internally:
    Code: Pascal  [Select][+][-]
    1. function NodeWithChildren(const Value: String; Children: Array of PTreeNode): PTreeNode;  
    2. var
    3.   Child: PTreeNode;
    4. begin
    5.   New(Result);
    6.   Result^.Value:=Value;
    7.   for Child in Children do
    8.     Result^.Children += [Child];
    9. end;  
    10.  
    11. var
    12.   FivePlusThree: PTreeNode;
    13.   AddSameSubtree: PTreeNode;
    14. begin
    15.   FivePlusThree := AddNode('+');
    16.   AddNode('5', FivePlusThree);
    17.   AddNode('3', FivePlusThree);
    18.   (* Tree:
    19.     '+'
    20.     / \
    21.   '5' '3'
    22.   *)
    23.   AddSameSubtree := NodeWithChildren('*', [FivePlusThree, FivePlusThree]);
    24.   (* Tree:
    25.     '*'
    26.     / \
    27.   5+3<-
    28.   *)
    29.   try
    30.     // Do something with tree
    31.   finally
    32.     FreeNode(AddSameSubtree); // with the FreeNode above this causes a crash because FivePlusThree would be freed twice
    33.   end;
    Now both of the two subtrees are the same node. This may seem contrived in this example code, but this is actually a simplified version of a real program I have worked on, where we had expressions which could have been recursive (i.e. a subtree pointing to itself, like the infinite series 1+1+1+1...). So it was impossible to make a strict each child has one parent relationship. So there was no clear owner.

    Basically here you have a few options. The easiest one, which I went with back then (due to performance reasons), is even though there is no "natural" ownership here, it can be artificially added a seperate owner, a graph structure, which would free all nodes together once I was done using the graph:
    Code: Pascal  [Select][+][-]
    1. type
    2.   PTreeNode = ^TTreeNode;
    3.   TTreeNode = record
    4.     Value: String;
    5.     Children: Array of PTreeNode;
    6.   end;
    7.  
    8.   TGraph = Array of PTreeNode;
    9.  
    10. function AddNode(AGraph: TGraph; const Value: String; Parent: PTreeNode = Nil): PTreeNode;
    11. begin
    12.   New(Result);
    13.   Result^.Value := Value;
    14.   if Assigned(Parent) then
    15.     Parent^.Children += [Result];
    16.   AGraph += [Result];
    17. end;
    18.  
    19. function NodeWithChildren(AGraph: TGraph; const Value: String; Children: Array of PTreeNode): PTreeNode;
    20. var
    21.   Child: PTreeNode;
    22. begin
    23.   New(Result);
    24.   Result^.Value:=Value;
    25.   for Child in Children do
    26.     Result^.Children += [Child];
    27.   AGraph += [Result];
    28. end;
    29.  
    30. procedure FreeGraph(var AGraph: TGraph);
    31. var
    32.   Node: PTreeNode;
    33. begin
    34.   for Node in AGraph do
    35.     Dispose(Node);
    36.   AGraph := nil;
    37. end;
    38.  
    39. var
    40.   Graph: TGraph;
    41.   FivePlusThree: PTreeNode;
    42.   AddSameSubtree: PTreeNode;
    43. begin
    44.   Graph := [];
    45.   try
    46.     FivePlusThree := AddNode(Graph, '+'); // Will transfer ownership to graph
    47.  
    48.     AddNode(Graph, '5', FivePlusThree); // Will transfer ownership to graph
    49.     AddNode(Graph, '3', FivePlusThree); // Will transfer ownership to graph
    50.     (* Tree:
    51.       '+'
    52.       / \
    53.     '5' '3'
    54.     *)
    55.     AddSameSubtree := NodeWithChildren(Graph, '*', [FivePlusThree, FivePlusThree]); // Will transfer ownership to graph
    56.     (* Tree:
    57.       '*'
    58.       / \
    59.     5+3<-
    60.     *)
    61.  
    62.   finally
    63.     FreeGraph(Graph); // Frees all nodes
    64.   end;
    65. end.

    Now rather than having each parent have their subtree as owner, there is one graph which owns all the nodes. This means all the nodes are deleted at once, so no node can be used while it's children might already have been freed.

    This is a very simple solution, which I chose back then because it is easy to implement, and fast, and I only construct the trees once, compute them and then delete them as a whole.

    But you might be in the situation where you want to manage the nodes individually, so you can replace or delete subtrees without deleting the whole graph. In that case, there simply is no owner, each subtree can have multiple owners, and should be freed only once all the owners don't use it anymore. In that case the easiest solution here is to simply count how many parents a node has, and once all parents have been freed, free the child.

    This is exactly the reference counting which I have already talked about in my first post. To do this you would give each node a reference count and when it hits 0 you free it:
    Code: Pascal  [Select][+][-]
    1.   PTreeNode = ^TTreeNode;
    2.   TTreeNode = record
    3.     RefCount: Integer;
    4.     Value: String;
    5.     Children: Array of PTreeNode;
    6.   end;
    7.  
    8. function AddNode(const Value: String; Parent: PTreeNode = Nil): PTreeNode;
    9. begin
    10.   New(Result);
    11.   Result^.RefCount := 0;
    12.   Result^.Value := Value;
    13.   if Assigned(Parent) then
    14.   begin
    15.     Inc(Result^.RefCount); // Has parent -> Refcount + 1
    16.     Parent^.Children += [Result];
    17.   end;
    18. end;
    19.  
    20. procedure FreeNode(ANode: PTreeNode);
    21. var
    22.   Child: PTreeNode;
    23. begin
    24.   Dec(ANode^.RefCount);
    25.   if ANode^.RefCount > 0 then
    26.     Exit; // Don't free as long as we still have references
    27.   for Child in ANode^.Children do
    28.     FreeNode(child);
    29.   Dispose(ANode);
    30. end;
    31.  
    32. function NodeWithChildren(const Value: String; Children: Array of PTreeNode): PTreeNode;  
    33. var
    34.   Child: PTreeNode;
    35. begin
    36.   New(Result);
    37.   Result^.Value:=Value;
    38.   for Child in Children do
    39.   begin
    40.     Inc(Child^.RefCount); // Increment refcount for new parent
    41.     Result^.Children += [Child];
    42.   end;
    43. end;  

    Now when FreeNode is called on the first time it encounters FivePlusThree it will decrement the Reference count, which is then 1, so it won't be freed. When it is then called a second time for the second reference on FivePlusThree, it will be decremented further to 0 and causing the Free.

    But doing this manually is actually not that easy. As you can see, even in this little example there are already two places AddNode and NodeWithChildren which have to increase the reference counter. If I am unaware and forget it for just one of these functions, or someone new to the codebase writes their own code and doesn't know about the reference counting or misses it, then this can very easiely go very wrong.
    But there is a solution to that, because as I said in my first post, Pascal already has reference counting for some datatypes, and you can simply use that. When using interfaces you get reference counting for free:
    Code: Pascal  [Select][+][-]
    1. type
    2.   ITreeNode = Interface
    3.   procedure SetValue(const AValue: String);
    4.   procedure AddChild(const AChild: ITreeNode);
    5.   end;
    6.  
    7.   TTreeNode = class(TInterfacedObject, ITreeNode)
    8.   public
    9.     Value: String;
    10.     Children: Array of ITreeNode;
    11.  
    12.     procedure SetValue(const AValue: String);
    13.     procedure AddChild(const AChild: ITreeNode);
    14.   end;
    15.  
    16.  
    17. procedure TTreeNode.SetValue(const AValue: String);
    18. begin
    19.   Value := AValue;
    20. end;
    21.  
    22. procedure TTreeNode.AddChild(const AChild: ITreeNode);
    23. begin
    24.   Children += [AChild];
    25. end;
    26.  
    27. function AddNode(const Value: String; Parent: ITreeNode = Nil): ITreeNode;
    28. begin
    29.   Result := TTreeNode.Create;
    30.   Result.SetValue(Value);
    31.   if Assigned(Parent) then
    32.     Parent.AddChild(Result);
    33. end;
    34.  
    35. function NodeWithChildren(const Value: String; Children: Array of ITreeNode): ITreeNode;
    36. var
    37.   Child: ITreeNode;
    38. begin
    39.   Result := TTreeNode.Create;
    40.   Result.SetValue(Value);
    41.   for Child in Children do
    42.     Result.AddChild(Child);
    43. end;
    44.  
    45. var
    46.   FivePlusThree: ITreeNode;
    47.   AddSameSubtree: ITreeNode;
    48.  
    49. begin
    50.   FivePlusThree := AddNode('+');
    51.  
    52.   AddNode('5', FivePlusThree);
    53.   AddNode('3', FivePlusThree);
    54.   (* Tree:
    55.     '+'
    56.     / \
    57.   '5' '3'
    58.   *)
    59.   AddSameSubtree := NodeWithChildren('*', [FivePlusThree, FivePlusThree]);
    60.   (* Tree:
    61.     '*'
    62.     / \
    63.   5+3<-
    64.   *)
    65.   AddSameSubtree := nil; // No more references -> AddSameSubtree will be freed
    66.                          // This will also reduce the Refcoutn of FivePlusThree by 2
    67.   FivePlusThree := nil;  // Remove last reference to Five PlusThree  
    As you can see using interfaces adds a bit of code overhead to manage that interface, but what you get in return is full and complete automated memory management, as if you are using a garbage collected language like C# or Python or whatever.

    I have done this exact trick for my STAX library for the generators (https://github.com/Warfley/STAX/blob/master/src/stax.generators.pas). One generator may be shared between multiple tasks, and should survive until all tasks that used it finished. So I simply made the IGenerator interface to use reference counting and it works like a charm.
    Sadly the overhead of using an Interface is only worth when you are writing a class from scratch. For all the pre existing classes like TStringList, etc. you need to work around the fact that they are not interfaces (I actually thought about writing a small code generation script which would go through the whole LCL, RTL and FCL documentation and create Interfaces for all the standard classes, to allow fully reference counted programming. Maybe I will pick that up some time in the future).

    So to summarize: Always make sure you have exactly one owner. For local objects it's often the function that creates the object with a Try-Finally block, for tree structures, it's a simple Parent-owns-Child relationship. You can borrow objects but then this borrow cannot live longer than the owner, or you can transfer ownership.
    If there is no clear ownership structure, e.g. in a non Tree Graph structure, then you can either try to artificially add an external owner, or if there simply is no sensible owner to be found, don't use manual memory management at all, but use automated memory management as you know it from other languages. Don't make your life extra hard, if the problem of memory management is to complicated, just let the compiler do it for you automatically with Interfaces.
    « Last Edit: May 05, 2023, 07:55:26 pm by Warfley »

    rnfpc

    • Full Member
    • ***
    • Posts: 101
    Re: Precautions related to absence of a garbage collector
    « Reply #6 on: May 05, 2023, 07:58:24 pm »
    In the context of this topic it may be interesting to comment on how the relatively new language Rust works.  Rust is said to be safer even though it does not use garbage collection.  It appeared to me (I am no expert) that the language itself is like C++ but its compiler checks all aspects of ownership, borrowing etc and refuses to compile if there is any laxity in the code.  Similar strict semantics can also be applied to other languages including Pascal to make them safer.

    Warfley

    • Hero Member
    • *****
    • Posts: 1869
    Re: Precautions related to absence of a garbage collector
    « Reply #7 on: May 05, 2023, 08:19:17 pm »
    In the context of this topic it may be interesting to comment on how the relatively new language Rust works.  Rust is said to be safer even though it does not use garbage collection.  It appeared to me (I am no expert) that the language itself is like C++ but its compiler checks all aspects of ownership, borrowing etc and refuses to compile if there is any laxity in the code.  Similar strict semantics can also be applied to other languages including Pascal to make them safer.
    Yes exactly, but there is a bit more to it. To pull a bit from theory, for a turing complete language like Pascal any non trivial semantic statement is undecidable (rices theorem), meaning you cannot prove any non trivial semantic features of a piece of code. Such a semantic statement may be: Does this code produce a certain error, does this code loop forever or in fact, does this code violate ownership and lifetime relationships. So it's not just a compilerswitch you can turn on for any language.
    How rust circumvents this annoying mathematical impossibility (not just impractical, provably impossible), is to basically try to reason about the code as good as possible. It can only prove this sort of stuff for certain structures, but it may very well be that you write code that does adhere to these ownership relationships, and as a human looking into it you can clearly see that, but the compiler is simply not possible to prove it automatically.

    To minimize this problem as much as possible, rust has two solutions, first the whole language is built around that concept. Basically the more information the compiler has about the ownership of an object the more it can reason about. For example in rust you can borrow as read only or as writer (i.e const or mutable), with const being the default. This makes it much easier for the compiler to argue who is touching that object at a time. This is simply not possible in Pascal, any member of a struct or class is always mutable. You can't have const members. And thats just one of the special rules from Rust. Basically the whole language is built in a way to help the compiler reason about the ownership of an object.

    The second solution rust has is unsafe code. Sometimes you just can't get the compiler to believe you that what you are doing is correct, even though it is. In that case you can just make the code unsafe, basically for a certain region in the code tell the compiler: "Don't try to check I know what I'm doing" and this allows you to avoid such problems. The idea being that 90%-99% of your code will be provably safe, and for the remaining 1%-10% it's on your own risk.

    Rust is a really interesting language, but it only works because it was built from the ground up to be that way, and even then you sometimes need unsafe code. It's sadly nothing you can just patch onto an existing language (and the C++ standardization team tries this very hard, with some success, but it can never be as seamless as in rust). If you look at modern C++ you see a bit of an in between, where you have with std::unique_ptr, const pointers and std::move some of the concepts you find in rust, and allow for strict ownership relations, but it always feels a bit tagged on. And you often see that many C++ programmers ignore this even where it may be sensible, because it's an opt-in and not the default behavior.

    But what's extremely interesting is this blog post from Mozilla a few years ago when they rewrote the Firefox CSS parser in Rust they made an analysis of the bugs that where previously found in that C++ code: https://hacks.mozilla.org/2019/02/rewriting-a-browser-component-in-rust/
    Quote
    Over the course of its lifetime, there have been 69 security bugs in Firefox’s style component. If we’d had a time machine and could have written this component in Rust from the start, 51 (73.9%) of these bugs would not have been possible. While Rust makes it easier to write better code, it’s not foolproof.

    So 74% of their bugs would simply not be possible in Rust, with all of the security critical bugs being part of that. So especially when writing security related code, manual memory management is the main source of vulnerabilities, and using automated memory management is the easiest solution to avoid this.
    But you don't need Rust for that. Using Pascals managed types like Interfaces, with range guards on array access provides the same security benefits. So when available, you should always prefer automated memory management over manual memory management. The only problem with Interfaces and reference counting compared to the Rust design is that it adds some runtime overhead. So while being as safe as Rust, it is not as fast and efficient. For most programs this may not be a problem, but for e.g. a Browser engine such as Firefox Rust is perfect (which is why Mozilla developed it in the first place)

    Rust is from the ground up developed to build high performance very secure software. So for most developers that don't require this level of performance, other memory management techniques such as reference counting or garbage collection are fully sufficient (Garbage collection has an even bigger advantage over reference counting, even though it is much less performant, because it basically just halts all execution, you don't have any threading issues when dealing with garbage collection, that you may encounter with reference counting)
    « Last Edit: May 05, 2023, 08:30:15 pm by Warfley »

    VTwin

    • Hero Member
    • *****
    • Posts: 1224
    • Former Turbo Pascal 3 user
    Re: Precautions related to absence of a garbage collector
    « Reply #8 on: May 05, 2023, 10:57:50 pm »
    Thanks Warfley, this is a very useful summary. Perhaps consider making a wiki page.

    I was reading about Zig recently, it seems to be an upcoming rival for C and Rust.
    “Talk is cheap. Show me the code.” -Linus Torvalds

    Free Pascal Compiler 3.2.2
    macOS 14.5: Lazarus 3.4 (64 bit Cocoa M1)
    Ubuntu 18.04.3: Lazarus 3.4 (64 bit on VBox)
    Windows 7 Pro SP1: Lazarus 3.4 (64 bit on VBox)

    rnfpc

    • Full Member
    • ***
    • Posts: 101
    Re: Precautions related to absence of a garbage collector
    « Reply #9 on: May 06, 2023, 10:38:12 am »
    A good Pascal code analyzer which points out ownership/borrowing issues can be very helpful. Some Pascal code analyzers are available, such as:

    https://www.peganza.com

    and more as discussed here:

    https://stackoverflow.com/questions/532986/are-there-any-static-code-analysis-tools-for-delphi-pascal

    I have not tried any of these and will appreciate if anyone here can share his/her experience.
    « Last Edit: May 06, 2023, 12:05:32 pm by rnfpc »

    VisualLab

    • Hero Member
    • *****
    • Posts: 639
    Re: Precautions related to absence of a garbage collector
    « Reply #10 on: May 08, 2023, 07:02:52 pm »
    For example in rust you can borrow as read only or as writer (i.e const or mutable), with const being the default.

    How does this affect the performance of the generated machine code? Doesn't the presence of many immutable variables cause slower execution? (overhead for memory allocation). Not only Rust uses immutable variables (e.g. functional languages) by default.

    I was reading about Zig recently, it seems to be an upcoming rival for C and Rust.

    After a cursory look at the Zig language, I get the impression that it has more advantages than Rust (time will tell if this impression is correct). And it certainly seems more consistent and clearer than Rust. Rust gives the impression of excessive and unnecessary "baroque bloat". What I really like about Zig is the lack of macros and preprocessor :)

    Anyway, if future OSes are written in Rust, it will be hard times for users of other languages (C was/is nasty, but many of its quirks can be worked around). Hopefully Rust will remain a niche for hardened "hermit programmers" and "whipper programmers" :)

    Warfley

    • Hero Member
    • *****
    • Posts: 1869
    Re: Precautions related to absence of a garbage collector
    « Reply #11 on: May 09, 2023, 01:07:27 pm »
    Performance is actually one of the great strength of rust, because the ownership and borrow system allows for much more aggressive optimization. The compiler knows at all times what data is used where and can therefore optimize a lot more, similar how fortran is used for HPC because it doesn't allow aliasing and can create much more efficient code. Also rust cannot just prove memory safety but also thread safety, making rust code much easier to parallelize. Rusts Performance benefits are one of the key reasons for why it is used in Firefox.

    About functional languages, they are a whole different beast, basically because functions are side effect free this allows for completely new optimizations that are not possible otherwise through fixpoint Analysis. This means unoptimized functional programs are really slow, but when you enable optimization and know how which data types work you can get amazing performance, a great example: https://stackoverflow.com/questions/6964392/speed-comparison-with-project-euler-c-vs-python-vs-erlang-vs-haskell
    Without optimization the Haskell code takes 2 minutes 37 seconds, with optimizations, without any algorithmic changes it drops to less than 8 seconds beating the performance of the C version of that algorithm (also on full optimizations)

    Mr.Madguy

    • Hero Member
    • *****
    • Posts: 865
    Re: Precautions related to absence of a garbage collector
    « Reply #12 on: May 09, 2023, 02:49:08 pm »
    Unlike many popular languages today, Pascal/Lazarus is not a garbage-collected language.

    For persons well versed in languages like Python or Java, which are garbage-collected, can coding in Pascal lead to unsuspected errors?

    What precautions should be taken while coding in Pascal in view of the fact that Pascal is not a garbage collected language? Thanks for your insight.
    Garbage collectors are used in scripted languages not to automate memory allocation/freeing to make them safer. Major reason: even basic types like integers being objects and requiring memory allocation/freeing - is much more slower design, that would be way too slow, if objects would be freed immediately.
    Is it healthy for project not to have regular stable releases?
    Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

    VisualLab

    • Hero Member
    • *****
    • Posts: 639
    Re: Precautions related to absence of a garbage collector
    « Reply #13 on: May 09, 2023, 03:23:01 pm »
    Performance is actually one of the great strength of rust, because the ownership and borrow system allows for much more aggressive optimization. The compiler knows at all times what data is used where and can therefore optimize a lot more, similar how fortran is used for HPC because it doesn't allow aliasing and can create much more efficient code.

    I agree with that. However, what I meant was that using a significant number of immutable variables will result in having to allocate memory frequently. Some time ago I was looking through test results somewhere (unfortunately I didn't write it down, which is a pity) where the program (EXE) generated by the Rust compiler was clearly slower than the program (EXE) generated by the C compiler. However, the differences were not as significant as between C and Java.

    Of course, to be able to say authoritatively which compiler generates "faster machine code" or that both generate code with a similar speed, you would need to do a lot of different tests, taking into account I/O operations (files, displaying graphics, etc.).

    Warfley

    • Hero Member
    • *****
    • Posts: 1869
    Re: Precautions related to absence of a garbage collector
    « Reply #14 on: May 09, 2023, 03:37:16 pm »
    This is only the case if you copy the data often, which is when you have a data structure and modify it while still using the old data structure:
    Code: Pascal  [Select][+][-]
    1. x := [1, 2, 3];
    2. y := x + [4];
    3.  
    4. WriteLn(x, y)
    Here you need to copy x to create y, but only because x is later used by WriteLn. If the compiler can prove that x is not used after y was created, it can just mutate x instead of copying it.
    And because rust knows at all times what data is used by whom, such optimizations can be done quite easily. But of course the data type must support these optimizations (i.e. the compiler must know that an addition can also be done in place on the existing array)

    This is also a very common optimization you see in functional languages with their algebraic/recursive types
    « Last Edit: May 09, 2023, 03:39:46 pm by Warfley »

     

    TinyPortal © 2005-2018