Author Topic: alloca  (Read 4709 times)

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #30 on: March 02, 2023, 08:29:23 pm »
So I ported it:  :P

I'm not using the optional accum.
On linux x86_64 (fpc trunk) it appears to be working for me. (minimal testing so far...)

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #31 on: March 02, 2023, 08:58:19 pm »
So I ported it:  :P

I'm not using the optional accum.
On linux x86_64 (fpc trunk) it appears to be working for me. (minimal testing so far...)

Although for classes (for me at least) it has limited usefulness. At least it autofrees, but I use destructors for more than just freeing... With alloca no destructor/finalizer is called. With advanced records I get automatic freeing and can have a finalizer as well.
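
The advanced-record route mentioned above can be sketched with management operators. This is only a minimal sketch, assuming FPC 3.2.0 or later (where the Initialize/Finalize operators are available); the record name TScopedBuffer and its field are made up for illustration.

```pascal
program recfinalize;

{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  { Hypothetical record, named here only for illustration }
  TScopedBuffer = record
    Data: array[0..63] of Byte;
    class operator Initialize(var r: TScopedBuffer);
    class operator Finalize(var r: TScopedBuffer);
  end;

class operator TScopedBuffer.Initialize(var r: TScopedBuffer);
begin
  // Called automatically when the variable comes into scope
  FillChar(r.Data, SizeOf(r.Data), 0);
  WriteLn('initialize');
end;

class operator TScopedBuffer.Finalize(var r: TScopedBuffer);
begin
  // Called automatically when the variable goes out of scope
  WriteLn('finalize');
end;

procedure Work;
var
  buf: TScopedBuffer; // lives in Work's stack frame, no heap allocation
begin
  buf.Data[0] := 42;
  WriteLn('working with ', buf.Data[0]);
end; // Finalize runs here, no Free call needed

begin
  Work;
  WriteLn('done');
end.
```

The record lives entirely in the caller's stack frame, so the "autofree" comes from the frame being released, while Finalize gives the hook that a plain alloca block lacks.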

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #32 on: March 02, 2023, 09:05:25 pm »
Well, with only one primary and writeln(i) in main it works.
With multiple it segfaults...

(using an inline class procedure to initialize).

Code: Pascal
program stackclass;

type
  TSBoo = class
    strict private
      fa: integer;
    public
      class procedure make(var self:TSBoo; const ia: integer); static; inline;
    published
      property a: integer read fa write fa;
  end;

//rcx=size
//rdx=alignm
function _alloca(const size: QWORD; const alignm: QWORD = 16):pointer; assembler nostackframe;
label
 _1,
 _2;
asm
  mov (%rsp),%r9  // return address
  mov %ecx  ,%ecx // zero-extend
  mov %edx  ,%edx // zero-extend

  cmp $16, %rdx
  jge _1
  mov $16, %rdx   // Minimum alignment to consider in Win 64 is 16 bytes
_1:
  cmp $4096, %rdx
  jle _2
  mov $4096, %rdx
_2:
  lea (%rcx), %rax  //rax:=size

  lea 8(%rsp), %r10 //ptr=rsp+8
  sub %rax, %r10    //ptr:=ptr-size
  neg %rdx          //alignm:=-alignm
  and %rdx, %r10    //ptr:=AlignDown(ptr,alignm)

  xor %r11, %r11    //r11:=0
  lea 8(%rsp), %rax //ptr2:=rsp+8
  sub %r10, %rax    //psize:=ptr2-ptr

  sub %rax, %rsp   //rsp:=rsp-psize

  mov  %r9,(%rsp)  //set return address
  mov %rsp,%rax    //Result:=rsp
  add   $8,%rax    //Result:=Result+8
end;

class procedure TSBoo.make(var self:TSBoo; const ia: integer); static; inline;
  begin
    self := TSBoo(_alloca(sizeof(TSBoo)));
    self.fa := ia;
  end;

procedure main;
  procedure primary;
    var
      boo: TSBoo = nil;
      boo1: TSBoo = nil;
      boo2: TSBoo = nil;
      boo3: TSBoo = nil;

    begin
      TSBoo.make(boo, 4);
      TSBoo.make(boo1, 41);
      TSBoo.make(boo2, 42);
      TSBoo.make(boo3, 43);
      writeln(boo.a);
      writeln(boo1.a);
      writeln(boo2.a);
      writeln(boo3.a);
    end;

  var i: integer;
  begin
    i := 40;
    primary;
    writeln(i);
    primary;
    writeln(i);
    primary;
    writeln(i);
  end;

begin
  main;
end.

Code: Text
$ ./bin/examples/stackclass
4
41
42
43
40
Segmentation fault (core dumped)

Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: alloca
« Reply #33 on: March 02, 2023, 09:29:50 pm »
That is missing the point. My code works with a constructor, not a fairy tale.
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #34 on: March 02, 2023, 09:36:36 pm »
That is missing the point. My code works with a constructor, not a fairy tale.

I agree it is missing the point... When I see your complete example for Linux 64 I'll play with it some...

I am merely doing this:
Code: Pascal
// snippet from classonstack with your and my code mixed:
  function _alloca(size:qword):Pointer; sysv_abi_default; assembler; nostackframe;
  asm
    movqq       %rsp,%rax
    subq        %rdi,%rax
    lea     -8(%rax),%rax
    andq        $-32,%rax
    movqq     (%rsp),%rdi
    movqq       %rax,%rsp
    lea    -32(%rsp),%rsp
    jmp    %rdi
  end;

  { this does all the magic }
  class function TStackObject.NewInstance:Tobject; // declaration is override
  var
    p : pointer;
  begin
    { allocate on the stack }
    p:=_alloca(instancesize);
    if p <> nil then InitInstance(p);
    NewInstance:=TObject(p);
  end;
This "works" on Win64, but not on linux64.
I quoted "works" because your code also fails on Win64 if in the same function SEH (exceptions) are used, whereas the code I used takes that into account.

It is pretty stable code the way I do it normally. This is adapted, just to show you what I am doing.

Won't TStackObject.NewInstance automatically free what has been allocated on the stack when it returns? What am I missing?

Red_prig

  • Full Member
  • ***
  • Posts: 153
Re: alloca
« Reply #35 on: March 02, 2023, 09:44:55 pm »
Yes, my first alloca formally releases memory when the function ends, therefore it is not suitable for class initialization. The ported code modifies the stackframe which makes the memory usage a bit freer (I didn't fully understand all the details)

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #36 on: March 02, 2023, 09:49:07 pm »
Yes, my first alloca formally releases memory when the function ends, therefore it is not suitable for class initialization. The ported code modifies the stackframe which makes the memory usage a bit freer (I didn't fully understand all the details)

Yeah, that is why I declared my initialization function inline in my toy (not practical for anything class related...) example.

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #37 on: March 02, 2023, 11:13:35 pm »
That is missing the point. My code works with a constructor, not a fairy tale.

I use constructors as well... That is of course another reason I don't see this as suitable for class instance allocation/deallocation.

Bogen85

  • Hero Member
  • *****
  • Posts: 702
Re: alloca
« Reply #38 on: March 02, 2023, 11:36:01 pm »
That is missing the point. My code works with a constructor, not a fairy tale.

I use constructors as well... That is of course another reason I don't see this as suitable for class instance allocation/deallocation.

EDIT: Acknowledgement of missing the point
Yes, using this for classes or records is likely missing the point on my part, so yeah, any usage for classes (since a constructor can't be used for starters) is contrived (a fairy tale?).
So I'm not sure what this would be useful for. A temporary buffer? A string can already do that... (yes, on the heap...)
But it might be faster than string or things that use heap based allocation, but are not records already on the stack?

PascalDragon

  • Hero Member
  • *****
  • Posts: 5968
  • Compiler Developer
Re: alloca
« Reply #39 on: March 03, 2023, 07:29:39 am »
Yes, I know, I'll rephrase it: Why is it not accessible through {$linklib c} on my system? (Debian 64 under WSL1 and Debian 64 on WSL2)

If you knew that, you would not ask why it's not accessible using $linklib c, because it's not part of the C library. “C library” <> GCC/Clang. It's essentially the same as Writeln or TypeInfo, which are intrinsics provided solely by the compiler.

Warfley

  • Hero Member
  • *****
  • Posts: 1872
Re: alloca
« Reply #40 on: March 03, 2023, 08:58:37 am »
EDIT: Acknowledgement of missing the point
Yes, using this for classes or records is likely missing the point on my part, so yeah, any usage for classes (since a constructor can't be used for starters) is contrived (a fairy tale?).
So I'm not sure what this would be useful for. A temporary buffer? A string can already do that... (yes, on the heap...)
But it might be faster than string or things that use heap based allocation, but are not records already on the stack?
Of course the constructor can be used. You just need to override NewInstance and FreeInstance, to use your own memory.
Example:
Code: Pascal
program Project1;

{$mode objfpc}{$H+}

uses
  heaptrc, Classes;

type
  TAnywhereStringList = class(TStringList)
  private
    class var DataPtr: Pointer;
  public
    class function newinstance: tobject; override;
    procedure FreeInstance; override;

    class function CreateAt(mem: Pointer): TAnywhereStringList;
  end;

class function TAnywhereStringList.newinstance: tobject;
begin
  if Assigned(DataPtr) then
    InitInstance(DataPtr);
  Result := TAnywhereStringList(DataPtr);
  DataPtr := nil;
end;

procedure TAnywhereStringList.FreeInstance;
begin
  CleanupInstance;
end;

class function TAnywhereStringList.CreateAt(mem: Pointer): TAnywhereStringList;
begin
  DataPtr := mem;
  Result := TAnywhereStringList.Create;
end;

var
  globalBuffer: Array[0..1024] of Byte;
procedure Test;
var
  localBuffer: Array[0..1024] of Byte;
  heapBuffer: PByte;

  globalSL, localSL, heapSL: TAnywhereStringList;
begin
  heapBuffer:=GetMem(1024);
  try
    globalSL := TAnywhereStringList.CreateAt(@globalBuffer);
    localSL := TAnywhereStringList.CreateAt(@localBuffer);
    heapSL := TAnywhereStringList.CreateAt(heapBuffer);

    globalSL.Add('Hello');
    globalSL.add('Global');
    localSL.Add('Hello');
    localSL.add('Local');
    heapSL.Add('Hello');
    heapSL.add('Heap');
    WriteLn(globalSL.Text);
    WriteLn(localSL.Text);
    WriteLn(heapSL.Text);
  finally
    globalSL.Free;
    localSL.Free;
    heapSL.Free;
    Freemem(heapBuffer);
  end;
end;

begin
  Test;
  ReadLn;
end.

This creates a TStringList descendant which can be created at any memory location, not just on the heap.
This has some advantages. First, the heap is very slow. With such a mechanism you can use stack allocators (which are not necessarily located on the stack, but use a stack as their internal data structure for memory allocation). Or, if you have multiple processes with shared memory between them, you can use this technique to place classes in that shared memory so that they are accessible from the different processes.

As for why to use alloca specifically: the stack is just way faster. An allocation on the heap is quite complex; on the stack it's pretty much just a subtraction from the stack pointer, and that's it. Also, you might have systems that do not have a heap to begin with (e.g. microcontrollers).

Most importantly, to your question of why not simply use records (or objects, for that matter): the answer is very simple. If you want to use existing classes, like a string list or the Base64 encoder/decoder, without relying on the heap, this is the way to go.
If you write your own types, I would recommend going with objects rather than classes (when possible; objects are a bit more restrictive) or advanced records (if inheritance isn't required), because they don't require all the boilerplate code above. But if you need to use existing classes from the RTL/FCL or LCL on the stack, this is the way to go.

Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: alloca
« Reply #41 on: March 03, 2023, 09:34:29 am »
In your code, only the pointer is allocated on the stack? The call to getmem() always allocates on the heap, not the stack.
My code allocates ALL memory for the object on the stack, and that is freed when the class goes out of scope. Of course that has certain risks if the stack is not big enough, so the stack size must be carefully managed when you use my code. Your example has fewer risks in that regard, but you still need to call Free on the classes and Freemem on the local pointer.
Another difference between my code and yours is that you must derive from my base class.
Also note that while it is true that you simply subtract the size from the stack pointer, this can get tricky when exception handling is used. Hence the assembler code that I use is more involved: it protects against stack corruption when SEH is used inside the routine even if the full class instance is allocated on the stack, but it does not have to care about the memory itself.
See, there are important differences between just a local pointer plus heap allocation and my approach.
I wish I could do more or less the same on Linux64, but thus far I have not succeeded, although it should be possible. (I have compilable and runnable code on Linux64, but it does not do what I want, i.e. although there are no leaks, it just does not work as designed. On Win64 the code works as designed.)

What you could do to create your whole class instance on the stack is to declare a sufficiently large static local array and use that for the class(es). That would be memory that in its entirety is allocated on the stack, not just the pointer. Then you get around the limitations of getmem(), which really is heap memory, but with a pointer on the stack.
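
A rough sketch of that local-array idea, not the code discussed in this thread: the class TPoint3, its CreateIn helper and the storage size are all hypothetical, and the class deliberately owns no heap data of its own. The size check against InstanceSize is an added assumption about how one might guard the buffer.

```pascal
program localarrayinstance;

{$mode objfpc}{$H+}

type
  { Hypothetical class for illustration; only meant to be created via CreateIn }
  TPoint3 = class
  public
    X, Y, Z: Integer;
  private
    class var FTarget: Pointer; // where the next instance will be placed
  public
    class function NewInstance: TObject; override;
    procedure FreeInstance; override;
    class function CreateIn(mem: Pointer; memSize: SizeInt): TPoint3;
  end;

class function TPoint3.NewInstance: TObject;
begin
  // Set up the VMT pointer and zero the fields inside the caller's buffer
  Result := InitInstance(FTarget);
  FTarget := nil;
end;

procedure TPoint3.FreeInstance;
begin
  // Finalize managed fields, but do not release the memory itself
  CleanupInstance;
end;

class function TPoint3.CreateIn(mem: Pointer; memSize: SizeInt): TPoint3;
begin
  if memSize < InstanceSize then
    Exit(nil); // buffer too small for this class
  FTarget := mem;
  Result := Create;
end;

procedure Demo;
var
  storage: array[0..7] of QWord; // 64 bytes in Demo's stack frame, 8-byte aligned
  p: TPoint3;
begin
  p := TPoint3.CreateIn(@storage, SizeOf(storage));
  p.X := 1;
  p.Y := 2;
  p.Z := 3;
  WriteLn(p.X + p.Y + p.Z);
  WriteLn(Pointer(p) = Pointer(@storage)); // instance sits inside the local array
  p.Free; // runs the destructor; the memory goes away with Demo's frame
end;

begin
  Demo;
end.
```

The instance must not outlive Demo, and because FTarget is a single class variable this sketch is not thread-safe.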

The heaptrc unit will reveal the difference between your code and my code: the allocations. In my example there are no heap allocations at all, except for the blocks necessary for the system unit.

Anyway it was and still is a fun subject with only marginal use cases, but the speed really matters sometimes.

I have been interested in this ever since I discovered this trick in the implementation section of the D4 grids.pas source code. It was NOT documented and hidden on purpose. I don't think it is used anymore, but I only have the source code for the D2..D2007 and Kylix3 RTL/VCL. A Jesuit priest actually wrote the first example.

Note that if some code in the stack-based class needs finalization apart from memory allocations, even with my code you should put that code in an overridden FreeInstance, e.g. when you manipulate class vars, etc.

This is not criticism in any way, I am merely pointing out the different approaches and the pitfalls.
« Last Edit: March 03, 2023, 10:02:50 am by Thaddy »

Warfley

  • Hero Member
  • *****
  • Posts: 1872
Re: alloca
« Reply #42 on: March 03, 2023, 10:10:23 am »
In your code, only the pointer is allocated on the stack? The call to getmem() always allocates on the heap, not the stack.
My code allocates ALL memory for the object on the stack, and that is freed when the class goes out of scope. Of course that has certain risks if the stack is not big enough, so the stack size must be carefully managed when you use my code. Your example has fewer risks in that regard, but you still need to call Free on the classes and Freemem on the local pointer.
Of course my example was not for a stack allocated class, but for a class that can be allocated in any memory, stack, heap, global variables (DATA segment), shared memory, file mapped memory, whatever.
Because as I said, the stack might not be the only target, but you may want to have a class in some other memory (like shared memory between processes, or global variable memory).

Quote
The heaptrc unit will reveal the difference between your code and my code: the allocations. In my example there are no heap allocations at all, except for the blocks necessary for the system unit.
It should be noted that I allocate a TStringList, which does internal allocations. Of course it's a bad example when you want to avoid heap allocations at all, but I think it's nice to visualize how to use custom allocation for already existing RTL classes.

Also, your example calls alloca within NewInstance, which as a function has its own stack frame. So when the constructor calls this function, the memory will be allocated in the stack frame of that function, but when NewInstance returns, the stack frame is popped and your memory is gone.

I believe this is also the reason why you have problems with SEH exceptions: you are referencing a stack frame which does not exist anymore. What you need to do is allocate the stack memory first and then hand it over to the class, to be used by the NewInstance function.

This is the main problem with a stack allocator. It would be extremely nice to encapsulate it all in its own class with its own functions, but alloca must be used within the function where the object is to be located, and therefore can't be encapsulated in another function.

This is the reason why I have this weird CreateAt function, which just sets a global variable with the memory address, which will later be used by NewInstance:
Code: Pascal
class function TAnywhereStringList.CreateAt(mem: Pointer): TAnywhereStringList;
begin
  DataPtr := mem;
  Result := TAnywhereStringList.Create;
end;

Not the prettiest solution, but it's necessary.
« Last Edit: March 03, 2023, 10:13:02 am by Warfley »

Thaddy

  • Hero Member
  • *****
  • Posts: 16945
  • Ceterum censeo Trump esse delendam
Re: alloca
« Reply #43 on: March 03, 2023, 10:32:17 am »
Well, we still differ. As I wrote, for my approach to succeed a class (and its subclasses) needs to derive from my own root object. Also, any memory that a class needs must be allocated using alloca. But there is no reason that you can't write or copy TStringList and simply change its root(s).

The use case where I used this professionally was a ticker app, where many short-lived allocations and deallocations of a ticker class (following the world's exchanges in near real time) could occur in a very short amount of time. In that use case the speed-up was (still is, it is still in use) immense, even compared to using records allocated with New() and Dispose instead.
Anyway, you have some nice ideas, just different from my purpose.
And the exception handling was sorted. Also note that the compiler can leave out any implicit exceptions: {$implicitexceptions off}. Since that is a locally controlled directive, you can control that with a very granular resolution. Ergo: my code does not apply to the normal classes derived from TObject.
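
To illustrate the granularity mentioned above, here is a small sketch, assuming the directive takes effect per routine as a local switch; the function names JoinFast and JoinSafe are invented for the example. With the switch off, an exception inside the routine could leave the managed temporaries unreleased, so this is only reasonable for code that cannot raise.

```pascal
program implicitexc;

{$mode objfpc}{$H+}

{$implicitexceptions off}
// Hypothetical routine: the compiler is asked not to wrap it in the
// hidden try..finally that normally finalizes managed locals.
function JoinFast(const a, b: string): string;
var
  tmp: string;
begin
  tmp := a + b;
  Result := tmp;
end;
{$implicitexceptions on}

// Same body, compiled with the default implicit exception frame.
function JoinSafe(const a, b: string): string;
var
  tmp: string;
begin
  tmp := a + b;
  Result := tmp;
end;

begin
  WriteLn(JoinFast('al', 'loca'));
  WriteLn(JoinSafe('al', 'loca'));
end.
```

Both calls return the same string; the difference is only in the code generated around the managed local.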
« Last Edit: March 03, 2023, 10:45:15 am by Thaddy »

Warfley

  • Hero Member
  • *****
  • Posts: 1872
Re: alloca
« Reply #44 on: March 03, 2023, 10:48:56 am »
The use case where I used this professionally was a ticker app, where many short-lived allocations and deallocations of a ticker class (following the world's exchanges in near real time) could occur in a very short amount of time. In that use case the speed-up was (still is, it is still in use) immense, even compared to using records allocated with New() and Dispose instead.
Anyway, you have some nice ideas, just different from my purpose.
I once had another interesting use case, where I needed to compute something by building a tree with a lot of nodes, where all the nodes were created dynamically but the whole tree was freed at once. So what I did was preallocate a huge area of memory on the heap and then allocate the node objects in it like a stack. This way I had only one large heap allocation, while all the new allocations of the nodes were basically nothing more than incrementing a pointer into that block. And after I was done with the computation, all that was needed to remove the whole tree was to free that whole memory block as one.

While the time saved per tree node was quite small, the total time saved over billions of nodes was actually quite huge.
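
A minimal sketch of that kind of allocator, assuming plain records for the nodes rather than the objects described above. TArena, ArenaInit, ArenaAlloc, ArenaDone and the 1024-byte capacity are hypothetical names and values chosen for the example.

```pascal
program arenatree;

{$mode objfpc}{$H+}

type
  PNode = ^TNode;
  TNode = record
    Value: Integer;
    Left, Right: PNode;
  end;

  { Hypothetical bump allocator: one heap block, handed out front to back }
  TArena = record
    Base: PByte;
    Used, Capacity: SizeUInt;
  end;

procedure ArenaInit(out a: TArena; capacity: SizeUInt);
begin
  a.Base := GetMem(capacity); // the only heap allocation
  a.Used := 0;
  a.Capacity := capacity;
end;

function ArenaAlloc(var a: TArena; size: SizeUInt): Pointer;
begin
  size := (size + 15) and not SizeUInt(15); // round up to a multiple of 16
  if a.Used + size > a.Capacity then
    Exit(nil); // arena exhausted
  Result := a.Base + a.Used;
  Inc(a.Used, size); // "allocation" is just moving the offset forward
end;

procedure ArenaDone(var a: TArena);
begin
  FreeMem(a.Base); // releases every node at once
  a.Base := nil;
end;

function NewNode(var a: TArena; value: Integer): PNode;
begin
  Result := ArenaAlloc(a, SizeOf(TNode));
  if Result = nil then
    Exit;
  Result^.Value := value;
  Result^.Left := nil;
  Result^.Right := nil;
end;

function Sum(n: PNode): Integer;
begin
  if n = nil then
    Exit(0);
  Result := n^.Value + Sum(n^.Left) + Sum(n^.Right);
end;

var
  arena: TArena;
  root: PNode;
begin
  ArenaInit(arena, 1024);
  root := NewNode(arena, 1);
  root^.Left := NewNode(arena, 2);
  root^.Right := NewNode(arena, 3);
  WriteLn(Sum(root));
  ArenaDone(arena);
end.
```

Individual nodes are never freed here; the whole tree goes away with the single FreeMem in ArenaDone, which is where the saving comes from.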
