I wouldn't go too deep into assembler. Each CPU type has its own.
- Intel/AMD
- Apple M1
- various arm...
....
For the stack, you need:
call / ret
which is more or less where the stack comes from.
call will push the return address, ret will pop it.
And you want to know what a "stack frame" is.
One step back, and without asm.
Look at the heap - your "unorganized" ram.
It's just one contiguous stretch of memory, like a row of boxes you can store stuff in.
And the problem is fragmentation:
- you allocate 3 times 10 boxes
- you no longer need the middle block of 10 boxes, you free it, now you have a gap
You can use that gap for anything up to 10 boxes. But not for 11 boxes.
In consequence, if you allocated all your memory in chunks of 10 boxes, and you freed every second chunk, then you have thousands of free boxes, yet you can't allocate one consecutive block of 11.
And, if you deal with the heap, the memory manager must keep track of all those blocks.
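To make the box picture concrete, here is a small sketch in Python. It is only a toy: the `heap` list, the `alloc`/`free` helpers and the first-fit search are made up for this illustration and are not how any real memory manager works. It allocates 6 chunks of 10 boxes, frees every second one, and then tries to get 11 boxes in a row.

```python
HEAP_SIZE = 60
heap = [None] * HEAP_SIZE  # None = free box, anything else = owner tag

def alloc(size, tag):
    """First-fit: find `size` consecutive free boxes, mark them, return the start index.
    Returns None if no run of free boxes is long enough."""
    run = 0
    for i, box in enumerate(heap):
        run = run + 1 if box is None else 0
        if run == size:
            start = i - size + 1
            heap[start:i + 1] = [tag] * size
            return start
    return None

def free(start, size):
    """Mark `size` boxes beginning at `start` as free again."""
    heap[start:start + size] = [None] * size

# Fill the whole heap with chunks of 10 boxes
starts = [alloc(10, f"blk{i}") for i in range(6)]

# Free every second chunk: half of the heap is free now, but in gaps of 10
for s in starts[1::2]:
    free(s, 10)

free_boxes = heap.count(None)  # total number of free boxes
fits_11 = alloc(11, "big")     # no gap is 11 boxes long
fits_10 = alloc(10, "ok")      # a gap of exactly 10 is fine

print(free_boxes, fits_11, fits_10)
```

Half the heap is free, yet the request for 11 boxes fails, while the request for 10 lands in the first gap.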
A stack is contiguous.
It is allocated up to a certain address.
If you need to push on it (or, more generally, allocate space), you take that space at the very top (the side with the "certain address") of the stack (no gaps introduced).
If you need a further block, you allocate it again in the same manner.
=> Now you simply can't free the first block before you freed the 2nd.
Example (here I will allocate by increasing / on Intel the stack works downwards, but that doesn't change the concept):
- Your stack starts at $0100000
- Your stack pointer (the marker up to where it is allocated) is at the same address $0100000 => the stack is empty
- You allocate $10 bytes, your stack pointer goes to $0100010 ($10 diff to the "stack start")
- You allocate another $20 bytes, your stack pointer goes to $0100030 ($10 + $20 diff to the "stack start")
Now you can see, you can't free the first $10 bytes, unless you freed the $20 too. Because you can't decrease the stackpointer to $0100000 without freeing the $20 too.
And if you decreased it by $10 from $0100030 to $0100020 you would free half of the $20 instead of the $10 block.
And that is the concept. It makes management easier.
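The same numbers as a small Python sketch. The `Stack` class and its `alloc`/`free` methods are invented for this illustration; it grows upwards like the example, and all it manages is a single number, the stack pointer.

```python
STACK_START = 0x0100000

class Stack:
    def __init__(self, start):
        self.start = start
        self.sp = start  # stack pointer: everything between start and sp is allocated

    def alloc(self, size):
        """Reserve `size` bytes at the top and return the address of the new block."""
        addr = self.sp
        self.sp += size
        return addr

    def free(self, size):
        """Release `size` bytes. It is always the top of the stack that goes away."""
        assert self.sp - size >= self.start, "stack underflow"
        self.sp -= size

stack = Stack(STACK_START)
first = stack.alloc(0x10)    # block at $0100000, sp goes to $0100010
second = stack.alloc(0x20)   # block at $0100010, sp goes to $0100030
sp_after_allocs = stack.sp

# Trying to "free the first $10 bytes" just moves sp down by $10.
# That does not touch `first`, it cuts the upper half off `second`.
stack.free(0x10)
sp_after_wrong_free = stack.sp

print(hex(sp_after_allocs), hex(sp_after_wrong_free))
```

There is no list of blocks and no search for gaps, only one pointer that moves.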
Now as I said, one big use is the return address.
There is also push/pop to add/remove. (Obviously the last pushed must be popped first: LIFO, last in, first out.)
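A rough sketch of how return addresses and pushed values share the same stack, using a plain Python list. The "addresses" are made-up labels, and `push`/`pop` are just helper names for this sketch, not real instructions.

```python
stack = []  # the top of the stack is the end of the list

def push(value):
    stack.append(value)

def pop():
    return stack.pop()

# "call" pushes the address to continue at, then jumps into the procedure.
push("return to main+5")   # main calls A
push(111)                  # A pushes a value
push("return to A+9")      # A calls B
push(222)                  # B pushes a value

# Everything comes off in reverse order: B must pop its own value
# before its "ret" can find the return address on top.
order = [pop(), pop(), pop(), pop()]
print(order)
```

If B forgot to pop its 222, its ret would take 222 as the return address, which is why push and pop must always be balanced inside a procedure.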
***
And there are stack frames.
If you enter a procedure that has locals, it may need $80 bytes for the locals (all fixed size).
Let's say the stack pointer was at $0100500
- The procedure takes the current $0100500 and stores it as frame base.
- it allocates $80 bytes => stack pointer to $0100580
Remember those can only be freed if all subsequent allocs have been freed too.
Nested/called procedures may alloc stack => but will also free it when they return
And when this function returns, it will be able to free its chunk.
- And this block of memory is now this function's stack frame. The function remembers the address $0100500 independently of the stack pointer (on Intel CPUs there is a register for that: RBP, the frame base pointer, while RSP is the stack pointer. Using RBP for this is not mandatory)
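The steps above as a Python sketch, again growing upwards. `enter` and `leave` are hypothetical helper names, and a real frame also holds things like the return address, which is left out here to keep it short.

```python
sp = 0x0100500  # stack pointer when the procedure is entered

def enter(locals_size):
    """Procedure entry: remember the current sp as frame base, then reserve the locals."""
    global sp
    frame_base = sp
    sp += locals_size
    return frame_base

def leave(frame_base):
    """Procedure exit: drop the whole frame by setting sp back to the frame base."""
    global sp
    sp = frame_base

outer_base = enter(0x80)        # frame base $0100500, sp goes to $0100580
sp_in_outer = sp

inner_base = enter(0x20)        # a called procedure puts its frame on top
sp_in_inner = sp
leave(inner_base)               # ... and removes it again when it returns

# Locals are found relative to the frame base, whatever sp is doing meanwhile
local_addr = outer_base + 0x08  # a local at offset $08 inside the frame

leave(outer_base)               # now the outer frame can be freed too
sp_after_return = sp

print(hex(sp_in_outer), hex(sp_in_inner), hex(sp_after_return))
```

Freeing the frame is one assignment, no matter how many locals it held.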
Obviously a dynamic array can't go into this stackframe, because the size of the frame is constant. If you have a local dyn array that is ok, because it is just the pointer. And the data goes on the heap.
Records are fixed size, and can go on the stack frame.
"objects" (instances of a class) are NOT fixed size from the variable's point of view. The variable could hold an instance of an inherited class with more fields. So therefore the variable is a pointer, and the instance must go on the heap.
Old-style "objects" are fixed size. And with inheritance that is trouble / and an entire topic of its own / and a "just keep away from it..."
And that's it.
You avoid fragmentation and managing fragments....
Now you can decide, if you still need asm.
On Intel, stacks grow downwards.
So if it starts at $0100000 and you alloc $10, the pointer goes to $00FFFF0 (and your allocated memory is from $00FFFF0 to $0100000). That doesn't matter, unless you do asm, or otherwise hack into the stack.
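The downward variant is the same arithmetic with the sign flipped. A tiny Python sketch (`alloc_down` is a made-up helper for this illustration):

```python
STACK_START = 0x0100000
sp = STACK_START

def alloc_down(size):
    """Reserve `size` bytes by moving the stack pointer to a lower address."""
    global sp
    sp -= size
    return sp  # lowest address of the new block

block = alloc_down(0x10)
# The block runs from the new sp up to (not including) the old sp
block_range = (block, block + 0x10)
formatted = f"${sp:07X}"  # Pascal-style hex, as used in this post

print(formatted)
```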