@marcov, @MarkMLI Yes, it seems strange, but it is the case.
Also thanks to everybody else for the ideas and hints.
Now I made four test runs (on a smaller machine with about 6 GB total memory: physical + swap):
The first run just lets the main array of Points grow normally from 100 elements. The 0-th element of the array moves to a lower address every 10k cycles (my printout frequency). This is fully in line with the earlier hypothesis: when I grow the array, it no longer fits where it was and is moved to a new location. Also, because I create small, long-lived ("sticky") arrays on the heap, such a small array apparently gets placed right next to the large array and stays there forever, so when the large array needs to grow again, it has to take a fresh memory area. In the end the whole memory is cut into pieces by the small sticky arrays, and when the large array once more needs a large block, it cannot find one and the process is killed. The attached table shows that the total allocated memory is fully in line with the expected 28 (+ some overhead) bytes/point, but the address of the 0-th element moves by about 6 GB in total before it cannot move any further and the program crashes.
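For reference, here is a stripped-down sketch of the pattern test one follows (not my actual code; the 28-byte record layout and the sticky-array size are just illustrative placeholders):

program GrowTest1;
{$mode objfpc}
uses
  heaptrc; // first unit, so allocation totals are reported (same as the -gh switch)

type
  TPoint3D = packed record // hypothetical ~28-byte point
    X, Y, Z: Double;       // 24 bytes
    Id: LongInt;           // +4 = 28 bytes/point
  end;

var
  Points: array of TPoint3D;
  Sticky: array of array of Byte; // small long-lived arrays that fragment the heap
  I: LongInt;
begin
  SetLength(Points, 100);
  for I := 1 to 10 * 1000 * 1000 do
  begin
    SetLength(Points, Length(Points) + 1); // grow the big array by one point
    SetLength(Sticky, Length(Sticky) + 1);
    SetLength(Sticky[High(Sticky)], 16);   // a small block that stays allocated
    if I mod 10000 = 0 then                // the 10k printout frequency
      WriteLn(I, ': Points[0] @ $', HexStr(PtrUInt(@Points[0]), 2 * SizeOf(Pointer)));
  end;
end.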
The second run is the same program with a one-line difference: at the very beginning I set the size of the large array to 10M points, and then immediately after it, as before, set its length back to 100 (the initial points) and grow from there, as shown below. The memory usage is different. The first 20k points fit in the original space, so the 0-th element does not move. Then it moves a few times (in the opposite direction in the heap compared to test one!), but between 40k and 50k it makes a much larger jump and stays put for a while. At 460k it moves again, this time to lower addresses, then up again at 510k, and so on. This goes on until 940k, where it suddenly starts moving the array a lot, until it again cannot find a new place after the address has moved by a net 4.6 GB. (I did parallel runs with this one, so it had somewhat less total memory; essentially it is the same problem as with the 6 GB above, where only one instance was running.)
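The one-line difference in test two, right after the array is created (10M being my intended end count):

  SetLength(Points, 10 * 1000 * 1000); // touch the final size once
  SetLength(Points, 100);              // then immediately back to the initial 100

As this run shows, with the default memory manager this does not act as a real "reserve": the array still gets moved later.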
The third test again builds up the array from 100 elements without the pre-size-up-then-down step, but with cmem switched on as the first unit of the program. Here I do not have the allocated-memory figures, since heaptrc does not report data when cmem is active. However, top shows normal memory usage, and from the address of the 0-th element it can be seen that at the beginning the array also moves often, as the allocated size is quickly outgrown, but later it makes big jumps (doubling the size at every step), so it needs to reallocate much less frequently. The program ran much farther than the others; I did not wait until it crashed.
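Switching the allocator only requires cmem as the very first unit:

program GrowTest3;
{$mode objfpc}
uses
  cmem; // must come first: installs the C allocator (malloc/realloc/free) as the memory manager
begin
  // ... same growth-and-print loop as in the test-one sketch ...
end.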
The last test combines the pre-allocation (again simply setting the size to 10M, my intended end count, and then back to 100) with cmem. Apparently cmem, unlike the normal memory manager in test two, does not pollute this memory region (I guess for as long as it can avoid it), so although the reported memory usage stays low (as I see in top/System Monitor), the area is somehow "reserved" for this array and it never needs to be moved (presumably until it reaches 10M).
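So test four is the two ideas combined; a sketch, again with the hypothetical record from above:

program GrowTest4;
{$mode objfpc}
uses
  cmem; // the C allocator, first in the uses clause

type
  TPoint3D = packed record // hypothetical ~28-byte point, as in the earlier sketch
    X, Y, Z: Double;
    Id: LongInt;
  end;

var
  Points: array of TPoint3D;
begin
  SetLength(Points, 10 * 1000 * 1000); // pre-size once to the intended end count
  SetLength(Points, 100);              // shrink back; growth then proceeds as in test one
  // ... grow the array and print @Points[0] every 10k cycles as before ...
end.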
As a conclusion, I do not need the workaround: the way test four is done seems perfect.
Just out of curiosity: what is the benefit of the "normal" memory management over cmem? If cmem is better (as it definitely is in this case), why is it not the built-in default?