Lazarus

Programming => General => Topic started by: julkas on August 15, 2019, 01:45:27 pm

Title: Why CMem?
Post by: julkas on August 15, 2019, 01:45:27 pm
Why does https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-fpascal-7.html use the CMem memory manager?
Title: Re: Why CMem?
Post by: Thaddy on August 15, 2019, 01:49:50 pm
Don't duplicate questions. Answered in the other thread.
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 01:53:01 pm
Quote
Don't duplicate questions. Answered in the other thread.

Your answer is not clear to me.
Moderator - If my question is duplicated, please remove my post.
Title: Re: Why CMem?
Post by: Thaddy on August 15, 2019, 02:12:00 pm
https://forum.lazarus.freepascal.org/index.php/topic,44953.msg330709.html#msg330709 is the exact same.
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 02:15:32 pm
Yes @Thaddy. Same, but in another topic (post). I want a clear answer.
And I think nobody can give me a clear answer. Even FPC developers.
Title: Re: Why CMem?
Post by: lucamar on August 15, 2019, 02:36:02 pm
My guess? To make it a fairer comparison. Read the intro at "Free Pascal versus C++ g++ fastest programs (https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/fpascal-gpp.html)"
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 02:44:00 pm
Quote
My guess? To make it a fairer comparison. Read the intro at "Free Pascal versus C++ g++ fastest programs (https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/fpascal-gpp.html)"

Quote
"measuring the quality of the generated code when both compilers are presented with what amounts to the same program."
Quote
These are only the fastest programs.
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 03:29:08 pm
FastMM, ScaleMM, BrainMM, FPC memory manager. So - CMem?
Title: Re: Why CMem?
Post by: avra on August 15, 2019, 03:35:06 pm
Quote
I want clear answer.

That is a benchmark game, and since FPC has a choice of memory managers, the author probably tried them all and kept the faster one.
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 03:38:05 pm
Quote
I want clear answer.
That is a benchmark game, and since FPC has a choice of memory managers, the author probably tried them all and kept the faster one.

Why is CMem the faster one?
Yes, this is a game.
Title: Re: Why CMem?
Post by: BrunoK on August 15, 2019, 04:35:00 pm
Because cmem seems faster in multithreaded applications (by much), whereas there is no noticeable difference with heap.inc in single-threaded apps.
At least on Windows.
Title: Re: Why CMem?
Post by: julkas on August 15, 2019, 05:39:13 pm
Quote
Because cmem seems faster in multithreaded applications (by much), whereas there is no noticeable difference with heap.inc in single-threaded apps.
At least on Windows.

https://github.com/graemeg/freepascal/blob/master/rtl/inc/cmem.pp
Title: Re: Why CMem?
Post by: Thaddy on August 15, 2019, 07:45:36 pm
Why do you refer to a fork instead of the real code available from freepascal.org? It is the exact same and usually more current.
Title: Re: Why CMem?
Post by: Akira1364 on August 17, 2019, 05:46:37 am
Quote
Why do you refer to a fork instead of the real code available from freepascal.org? It is the exact same and usually more current.

It's a direct mirror of the trunk FPC SVN, not a fork.

Quote
Why https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-fpascal-7.html uses CMem memory manager?

As the person who submitted that version of the benchmark: because CMem significantly improves the performance of multi-threaded applications in general (on Windows too, by the way). The execution time for that program without CMem is WAY longer than 3.06 seconds, to say the very least.
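
A rough way to check this kind of claim on your own machine is a small multi-threaded allocation loop built twice, once with and once without cmem. The sketch below is only an illustrative micro benchmark, not the benchmarks game program: the USE_CMEM define, the thread count, the iteration count and the block sizes are all assumptions chosen for the example.

```pascal
program allocbench;
{$mode objfpc}{$H+}
uses
  {$IFDEF USE_CMEM}cmem,{$ENDIF}      // hypothetical define; cmem must come first
  {$IFDEF UNIX}cthreads,{$ENDIF}      // thread manager is needed on Unix targets
  Classes, SysUtils;

const
  ThreadCount = 4;        // assumed value, adjust to your core count
  Iterations  = 1000000;  // assumed value

type
  TAllocThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TAllocThread.Execute;
var
  i: Integer;
  p: Pointer;
begin
  // Allocate and immediately free small blocks of varying size.
  for i := 1 to Iterations do
  begin
    GetMem(p, 16 + (i and 255));
    FreeMem(p);
  end;
end;

var
  Threads: array[0..ThreadCount - 1] of TAllocThread;
  i: Integer;
  StartTick: QWord;
begin
  StartTick := GetTickCount64;
  for i := 0 to ThreadCount - 1 do
    Threads[i] := TAllocThread.Create(False);   // start running at once
  for i := 0 to ThreadCount - 1 do
  begin
    Threads[i].WaitFor;
    Threads[i].Free;
  end;
  WriteLn('Elapsed ms: ', GetTickCount64 - StartTick);
end.
```

Building once with `fpc allocbench.pas` and once with `fpc -dUSE_CMEM allocbench.pas` gives two timings to compare. An allocate-then-free loop like this is far from a real workload, so treat the numbers as a hint at best.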
Title: Re: Why CMem?
Post by: Thaddy on August 17, 2019, 07:28:00 am
When we measured the different memory managers during the Delphi memory manager challenge we came to a different conclusion, at least compared to the newer per-thread allocators like scalemm, smartmm and topmm and family. It may be that an FPC conversion is possible for those. The most important conclusion was that the Delphi MM (FastMM) and cmem do not scale.
In some applications cmem indeed outperforms the FPC default memory manager, but it depends on the application, as I explained.
The benchmark source code (for 32 bit) can be obtained from http://fastcode.sourceforge.net/. My old colleague Andre Musche likely has a 64 bit version of SmartMMv2 as well. I will ask him.
(Lock-free context switching) My other former colleague Ivo Tops wrote topmm.
I can prepare those (smartmmv2 and topmm) for fpc - windows32/64 (did that before, but code is not current: fpc 2.0.)
[to be completed with example]
Title: Re: Why CMem?
Post by: k1ng on August 17, 2019, 10:53:00 am
There are some benchmarks in mORMot source code (https://github.com/synopse/mORMot/blob/c5b7e9a3716930be6d804a75d63920193c3ca14f/SynFPCCMemAligned.pas#L57):
Code:
  Some raw numbers, from TestSQL3 string allocation tests (single threaded):
    - FPC default heap
     500000 interning 8 KB in 77.34ms i.e. 6,464,959/s, aver. 0us, 98.6 MB/s
     500000 direct 7.6 MB in 100.73ms i.e. 4,963,518/s, aver. 0us, 75.7 MB/s
    - glibc 2.23
     500000 interning 8 KB in 76.06ms i.e. 6,573,152/s, aver. 0us, 100.2 MB/s
     500000 direct 7.6 MB in 36.64ms i.e. 13,645,915/s, aver. 0us, 208.2 MB/s
    - jemalloc 3.6
     500000 interning 8 KB in 78.60ms i.e. 6,361,323/s, aver. 0us, 97 MB/s
     500000 direct 7.6 MB in 58.08ms i.e. 8,608,667/s, aver. 0us, 131.3 MB/s
    - Intel TBB 4.4
     500000 interning 8 KB in 61.96ms i.e. 8,068,810/s, aver. 0us, 123.1 MB/s
     500000 direct 7.6 MB in 36.46ms i.e. 13,711,402/s, aver. 0us, 209.2 MB/s
    for multi-threaded process, we observed best scaling with TBB on this system
    BUT memory consumption raised to 60 more space (gblic=2.6GB vs TBB=170GB)!
    -> so for serious server work, glibc (FPC_SYNCMEM) sounds the best candidate
Unfortunately they didn't publish the results for the multi-threaded tests afaik.

Note that there is also a fork of FastMM (FastMM4-AVX) (https://github.com/maximmasiutin/FastMM4-AVX) which seems to do pretty well for multi-threading.

Delphi 10.2 Tokyo:
Code:
                     Xeon E5-2543v2 2*CPU      i7-7700K CPU
                    (allocated 20 logical   (8 logical threads,
                     threads, 10 physical    4 physical cores),
                     cores, NUMA), AVX-1          AVX-2

                    Orig.  AVX-br.  Ratio   Orig.  AVX-br. Ratio
                    ------  -----  ------   -----  -----  ------
02-threads realloc   96552  59951  62.09%   65213  49471  75.86%
04-threads realloc   97998  39494  40.30%   64402  47714  74.09%
08-threads realloc   98325  33743  34.32%   64796  58754  90.68%
16-threads realloc  116273  45161  38.84%   70722  60293  85.25%
31-threads realloc  122528  53616  43.76%   70939  62962  88.76%
64-threads realloc  137661  54330  39.47%   73696  64824  87.96%
NexusDB 02 threads  122846  90380  73.72%   79479  66153  83.23%
NexusDB 04 threads  122131  53103  43.77%   69183  43001  62.16%
NexusDB 08 threads  124419  40914  32.88%   64977  33609  51.72%
NexusDB 12 threads  181239  55818  30.80%   83983  44658  53.18%
NexusDB 16 threads  135211  62044  43.61%   59917  32463  54.18%
NexusDB 31 threads  134815  48132  33.46%   54686  31184  57.02%
NexusDB 64 threads  187094  57672  30.25%   63089  41955  66.50%

Delphi 10.2 Update 3 (note that it uses different CPUs as well)
Code:
                     Xeon E5-2667v4 2*CPU       i9-7900X CPU
                    (allocated 32 logical   (20 logical threads,
                     threads, 16 physical    10 physical cores),
                     cores, NUMA), AVX-2          AVX-512

                    Orig.  AVX-br.  Ratio   Orig.  AVX-br. Ratio
                    ------  -----  ------   -----  -----  ------
02-threads realloc   80544  60025  74.52%   66100  55854  84.50%
04-threads realloc   80751  47743  59.12%   64772  40213  62.08%
08-threads realloc   82645  32691  39.56%   62246  27056  43.47%
12-threads realloc   89951  43270  48.10%   65456  25853  39.50%
16-threads realloc   95729  56571  59.10%   67513  27058  40.08%
31-threads realloc  109099  97290  89.18%   63180  28408  44.96%
64-threads realloc  118589 104230  87.89%   57974  28951  49.94%
NexusDB 01 thread   160100 121961  76.18%   93341  95807 102.64%
NexusDB 02 threads  115447  78339  67.86%   77034  70056  90.94%
NexusDB 04 threads  107851  49403  45.81%   73162  50039  68.39%
NexusDB 08 threads  111490  36675  32.90%   70672  42116  59.59%
NexusDB 12 threads  148148  46608  31.46%   92693  53900  58.15%
NexusDB 16 threads  111041  38461  34.64%   66549  37317  56.07%
NexusDB 31 threads  123496  44232  35.82%   62552  34150  54.60%
NexusDB 64 threads  179924  62414  34.69%   83914  42915  51.14%
Title: Re: Why CMem?
Post by: julkas on August 17, 2019, 11:06:56 am
 Malloc intro by Dan Luu - https://danluu.com/malloc-tutorial/.
Title: Re: Why CMem?
Post by: BrunoK on August 17, 2019, 12:59:11 pm
I find this article and ensuing discussion pretty interesting on the relative nature of memory allocator performance.
Title: Re: Why CMem?
Post by: julkas on August 17, 2019, 01:25:16 pm
I find this article and ensuing discussion pretty interesting on the relative nature of memory allocator performance.
+1.
Title: Re: Why CMem?
Post by: BrunoK on August 17, 2019, 02:27:15 pm
I wanted also to mention this article and discussion :
http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
Title: Re: Why CMem?
Post by: marcov on August 17, 2019, 05:14:08 pm
If you benchmarked on Linux, your results are already old because the heapmanager changed:

Quote
r42713 florian 2019-08-16 22:47:37 +0200 (Fri, 16 Aug 2019)
+ make use of the mremap syscall of linux to re-allocate large memory blocks faster
Commit consists out of 1 line

    M /trunk/rtl/inc/heap.inc
    M /trunk/rtl/linux/ossysc.inc
    M /trunk/rtl/unix/sysheap.inc
    A /trunk/tests/test/theap2.pp

Also, do such micro benchmarks really mean anything? A memory manager that keeps a large per-thread reserve of 8 KB blocks wins the 8 KB benchmark, but in practice it might only be wasting memory.
Title: Re: Why CMem?
Post by: Thaddy on August 17, 2019, 05:20:55 pm
Yes. Then again I ran into the problem that in FPC we have to maintain and store size, otherwise Lazarus crashes..... Which is silly.
That slows down our memory manager considerably. The size can on many platforms be obtained elsewhere:
- windows _memsize
- linux & bsd  malloc_usable_size

For pure pascal programs you can skip the size requirement in the memory manager.
Lazarus is using some - should be! - implementation detail.

That said, I have come to realize that modern cmem is not like 2005, more like 2015, and indeed scales better than ours.
After a lot of testing today: yes, a modern cmem is faster in ALL cases I tested. Even with Florian's improvement.
Title: Re: Why CMem?
Post by: marcov on August 17, 2019, 05:33:45 pm
Quote
Yes. Then again I ran into the problem that in FPC we have to maintain and store size, otherwise Lazarus crashes..... Which is silly.

It is a property of the language.

Quote
That slows down our memory manager considerably. The size can on many platforms be obtained elsewhere:
- windows _memsize
- linux & bsd  malloc_usable_size

No. Malloc_usable_size is a property of the heapmanager, just like FPC's heapmanager has it.

You are confusing base memory primitives (like mmap and realloc) with alternate heapmanagers. FPC's heapmanager is an alternative for malloc, so telling it to use malloc is not logical. I can't find a URL for memsize. It seems like an (msv)crt internal function, but FPC doesn't use the crt at all. Again the same mistake: the FPC RTL is a replacement for the crt, not something on top of it.

Maybe there are better ways than storing it in every memory block (like keeping the administration in unallocated memory). That could avoid some SSE alignment penalties for the actual blocks, but could possibly also cause an access performance hit when the size is queried relatively often by a language.
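
To make the "size stored in every memory block" idea concrete, here is a minimal sketch of the header technique: the wrapper asks the underlying allocator for a few extra bytes, writes the requested size at the start, and hands out the address just past that header. The Hdr* names are hypothetical, and the sketch sits on top of the default heap purely for demonstration; it is not the code of any FPC unit.

```pascal
program sizeheader;
{$mode objfpc}

// Allocate Size bytes plus a hidden PtrUInt header that records Size.
function HdrGetMem(Size: PtrUInt): Pointer;
var
  raw: PByte;
begin
  Result := nil;
  raw := GetMem(Size + SizeOf(PtrUInt));
  if raw = nil then
    Exit;
  PPtrUInt(raw)^ := Size;               // store the size in front of the block
  Result := raw + SizeOf(PtrUInt);      // caller sees the address after the header
end;

// Read the size back from the hidden header.
function HdrMemSize(p: Pointer): PtrUInt;
begin
  Result := PPtrUInt(PByte(p) - SizeOf(PtrUInt))^;
end;

// Step back to the real start of the block before freeing it.
procedure HdrFreeMem(p: Pointer);
begin
  if p <> nil then
    FreeMem(PByte(p) - SizeOf(PtrUInt));
end;

var
  p: Pointer;
begin
  p := HdrGetMem(100);
  WriteLn(HdrMemSize(p));   // prints 100
  HdrFreeMem(p);
end.
```

Note that the header shifts every returned address by SizeOf(PtrUInt), which is one way such a scheme can interfere with 16-byte alignment of the user-visible block.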
Title: Re: Why CMem?
Post by: Thaddy on August 17, 2019, 06:07:48 pm
_msize Marco,  https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/msize?view=vs-2019 Sorry. That is the windows equivalent.
I did not write the correct call but for the major platforms it is not needed to keep size (Win, apple, linux, bsd all have a call that does the same)
You confuse the lowest level: the OS.
Delphi doesn't keep size.  (not the first time I mentioned it.....like 8 years ago)
It is a major hinder to improve the FPC memory manager: too much housekeeping on top.
Title: Re: Why CMem?
Post by: marcov on August 17, 2019, 09:04:12 pm
Quote
_msize Marco,  https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/msize?view=vs-2019 Sorry. That is the windows equivalent.

No, it is not. It is part of the C language runtime on top of the original Windows API (which is kernel32 and user32). Of course msvcrt is pretty standard now, but it is a totally different thing.

Quote
I did not write the correct call but for the major platforms it is not needed to keep size (Win, apple, linux, bsd all have a call that does the same)

All heapmanagers have such an ability. Malloc, FPC's heapmgr, all. But that needs to be implemented somewhere. Either in the FPC heapmgr or in some malloc. *nix only has mmap/sbrk.

malloc and crt's memmanager are on the same level as the FPC heapmanager, part of a language runtime. IOW a call to a malloc or crt function doesn't solve the problem for the FPC heapmanager.

Quote
You confuse the lowest level: the OS.
Delphi doesn't keep size.  (not the first time I mentioned it.....like 8 years ago)

Possible. There are legacy features that nobody knows exactly why they are implemented as they are. Sometimes in the past the simplest solution was chosen. But it is also possible that this makes the memsize() call very cheap.

Quote
It is a major hinder to improve the FPC memory manager: too much housekeeping on top.

Now is the time. Sooner or later we will have to emit only aligned blocks (to 16 or 32 byte borders), and then this must also change.
Title: Re: Why CMem?
Post by: PascalDragon on August 18, 2019, 08:55:36 pm
Quote
Yes. Then again I ran into the problem that in FPC we have to maintain and store size, otherwise Lazarus crashes..... Which is silly.
That slows down our memory manager considerably. The size can on many platforms be obtained elsewhere:
- windows _memsize
- linux & bsd  malloc_usable_size

How do you come to that conclusion? FPC does not enforce that at all; you just need to make sure that your implementations of the functions of TMemoryManager behave correctly. And if Lazarus (or some package of it) should depend on the internal workings of the memory manager (e.g. by assuming a size field in front of the allocated block) then that is a bug and should be reported.
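
As a minimal sketch of what supplying your own TMemoryManager functions can look like, the program below copies the currently installed manager, replaces only GetMem with a counting wrapper, and restores the original at the end. CountingGetMem and GetMemCalls are names made up for this example, and the counter is not thread-safe.

```pascal
program mmhook;
{$mode objfpc}

var
  OldMM, NewMM: TMemoryManager;
  GetMemCalls: PtrUInt = 0;   // hypothetical counter, not thread-safe

// Wrapper with the same signature as TMemoryManager.GetMem.
function CountingGetMem(Size: PtrUInt): Pointer;
begin
  Inc(GetMemCalls);
  Result := OldMM.GetMem(Size);   // delegate to the manager that was installed before
end;

var
  p: Pointer;
begin
  GetMemoryManager(OldMM);        // remember the current manager
  NewMM := OldMM;                 // keep all other entries unchanged
  NewMM.GetMem := @CountingGetMem;
  SetMemoryManager(NewMM);

  GetMem(p, 64);
  FreeMem(p);

  SetMemoryManager(OldMM);        // restore before doing any output
  WriteLn('GetMem calls seen: ', GetMemCalls);
end.
```

Nothing in this wrapper touches a size field; whatever MemSize reports is still decided by the underlying manager, which is the contract described above.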
Title: Re: Why CMem?
Post by: Thaddy on August 18, 2019, 09:03:21 pm
@PascalDragon
In general, but focussing on cmem:
If you leave out the size, Lazarus doesn't work - but Fpc does! -. Try it yourself using cmem as a template, simply rip out the silly size management.
So Lazarus relies on implementation detail.
This behavior is also documented, although obfuscated (freemem remark) https://forum.lazarus.freepascal.org/index.php?action=post;topic=46420.15;last_msg=331064
The "should" should be replaced with "must" but only for Lazarus.
(Actually I think this is quite a substantial bug in Lazarus)
It is enough to move the declarations from interface to implementation. (These should not have any use there whatever any memory manager implementation)
Actually, I would recommend to move the declarations to the implementation section just because of that!

Problem: Lazarus people: And now what??????????

To my knowledge, Compiler and RTL do not rely on size in the memory manager at all.
Title: Re: Why CMem?
Post by: k1ng on August 18, 2019, 10:01:43 pm
Quote
This behavior is also documented, although obfuscated (freemem remark) https://forum.lazarus.freepascal.org/index.php?action=post;topic=46420.15;last_msg=331064

I'm sure your link is wrong %)

Quote
Actually, I would recommend to move the declarations to the implementation section just because of that!

Problem: Lazarus people: And now what??????????

Fix it in Lazarus code? ;)
Title: Re: Why CMem?
Post by: jamie on August 18, 2019, 11:08:53 pm
Since we are on the subject of memory, I've wondered at times how FPC knows how to release a pointer allocation.

I know it's obvious that if you simply pass the same pointer to FreeMem it should work, because it knows where it is in memory. However, what happens when I increment that pointer? Now it's no longer at its original starting point.

Do we save the actual location of the pointer itself, so that it can be identified no matter what value it gets changed to, or do we try to find a memory block that the pointer fits into and assume that is the correct one? I would think the latter would be a little slow, but the former would be great since the address of the pointer body shouldn't change.
Title: Re: Why CMem?
Post by: PascalDragon on August 19, 2019, 09:45:47 am
Quote
@PascalDragon
In general, but focussing on cmem:
If you leave out the size, Lazarus doesn't work - but Fpc does! -. Try it yourself using cmem as a template, simply rip out the silly size management.
So Lazarus relies on implementation detail.
This behavior is also documented, although obfuscated (freemem remark) https://forum.lazarus.freepascal.org/index.php?action=post;topic=46420.15;last_msg=331064
The "should" should be replaced with "must" but only for Lazarus.
(Actually I think this is quite a substantial bug in Lazarus)

First of all, your link is wrong. I assume you meant this (https://www.freepascal.org/docs-html/current/prog/progsu175.html)?
It's only a possible implementation. Maybe that should be clarified by providing an alternative (e.g. "or the underlying memory manager provides a function to retrieve the size of a memory block"). If some code relies on that then this is a bug in that code and needs to be reported.

Quote
It is enough to move the declarations from interface to implementation. (These should not have any use there whatever any memory manager implementation)
Actually, I would recommend to move the declarations to the implementation section just because of that!

What declarations? Those of TMemoryManager and its function variables? No. Those must be in the interface section, because they need to be usable from other units, namely to implement memory managers.

Quote
Problem: Lazarus people: And now what??????????

Report a bug with them.
Title: Re: Why CMem?
Post by: 440bx on August 19, 2019, 10:35:07 am
Quote
what happens when I increment that pointer? Now its no longer at its original starting point..

For a number of reasons, performance among them, most memory managers require the original pointer they returned to free the memory block.

With most memory managers, passing a pointer that points somewhere within an allocated memory block rarely, if ever, yields a desirable result.
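
In practice this usually means keeping the returned pointer untouched and walking the block with a second one. A minimal sketch, with variable names chosen just for illustration:

```pascal
program keeporig;
{$mode objfpc}

var
  Base, Cur: PByte;
  i: Integer;
begin
  Base := GetMem(16);   // Base keeps the address the memory manager returned
  Cur := Base;          // Cur is the working copy that gets incremented
  for i := 0 to 15 do
  begin
    Cur^ := i;
    Inc(Cur);
  end;
  // Cur now points one past the end of the block; only Base is valid for FreeMem.
  FreeMem(Base);
end.
```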
Title: Re: Why CMem?
Post by: Tz on August 19, 2019, 04:58:36 pm
In my newbie point of view, if I want platform specific memory manager
for example tcmalloc or jemalloc, 
I just use CMem and change

{$else}
  LibName = 'c';  // change to tcmalloc or jemalloc and install the library
{$endif}

It's as simple as that. It's like plug the MM and play.
Its freedom to play, other programming? maybe just a few?
Pascal Programming is for teaching, indeed.

I tried that before, but stick to c, cause its already there.
No need tcmalloc or jemalloc deployment check. (This is admin view)

or if I compiled with -glh it's cost me zero :)  (this newbie view, need explanation from hero)

program mem;
{$mode objfpc}
{$h+}
uses CMem, CThreads;
var b :array of Byte;
begin
  SetLength(b, 1000);
end.

I once had a memory leak when I declared CThreads first, as most examples do.
Since then I put CMem first.


For speed, I think all Hero here, has got a point.


Excuse me julkas, its nice topic, you gather all Hero, here.


For all Hero, a bit request from batalion of newbie :)

program Project1;
{$mode objfpc}
{$h+}
uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}      // why in each program have to declare like this
  cthreads,
  {$ENDIF}{$ENDIF}

is there a way for platform specific like

program Project1;

uses FpcWindows or DelphiWindows ...

program Project1;

uses FpcLinux or DelphiLinux ...


and all CMem, CThreads, directives, etc are handle seamlessly?

long live newbie,  cheers :)


Title: Re: Why CMem?
Post by: Leledumbo on August 19, 2019, 05:29:05 pm
Quote
For all Hero, a bit request from batalion of newbie :)

program Project1;
{$mode objfpc}
{$h+}
uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}      // why in each program have to declare like this
  cthreads,
  {$ENDIF}{$ENDIF}

This is done with single-source cross-platform compatibility in mind; otherwise you would have to change the uses clause every time you change the target platform. Some of us, like me, use build modes to build binaries for multiple platforms in a single click. That won't work if I have to do what I described above, nor would it be a pleasant experience to have blocks of ifdefs for every platform.

Quote
is there a way for platform specific like

program Project1;

uses FpcWindows or DelphiWindows ...

program Project1;

uses FpcLinux or DelphiLinux ...

and all CMem, CThreads, directives, etc are handle seamlessly?

First, it's completely the opposite of the above principle. Second, it cannot be done that way. Not all (Unix targeting) programs require threads, so it can't be forced (it is not just unnecessary; including cthreads also pulls a significant dependency into the binary). Directives work at compile time, hence they cannot be included in a compiled unit. Or would you prefer FpcWindows, FpcUnixNoThreads, FpcUnixWithThreads, etc.? It's a big NAH from me, both from a user's POV and a maintainer's POV.
Title: Re: Why CMem?
Post by: jamie on August 20, 2019, 12:06:19 am
Quote
what happens when I increment that pointer? Now its no longer at its original starting point..
For a number of reasons, performance among them, most memory managers require the original pointer they returned to free the memory block.

With most memory managers, passing a pointer that points somewhere within an allocated memory block rarely, if ever, yields a desirable result.

I was thinking along the lines of it keeping an ADDRESS reference to the pointer variable itself.
By giving the pointer by reference back to a memory handler to release it, the handler can then scan a table of allocated chunks to locate a matching reference to a pointer.

 This way the pointer itself can change value but the entity will always be the same.

 Which means of course if you make one pointer := another, the pointer can not be used to release memory from the "another" pointer.

Title: Re: Why CMem?
Post by: 440bx on August 20, 2019, 12:40:50 am
Quote
I was thinking on the lines of it keeping an ADDRESS reference to the pointer variable itself.
by giving the pointer via reference back to a memory handler to release it, it can then scan a table of allocated chunks to locate a matching reference to a pointer.

 This way the pointer itself can change value but the entity will always be the same.

 Which means of course if you make one pointer := another, the pointer can not be used to release memory from the "another" pointer.

What you described sounds like how 16-bit Windows managed memory.  GlobalAlloc and LocalAlloc returned a pointer to a pointer, this second pointer being the actual pointer to the memory block.  This is what allowed Win16 to move memory blocks.  Whenever it moved a block, usually to lower heap fragmentation, it would update that second pointer.

Even using that method, Win16 depended on the second pointer referencing the beginning of the memory block to either free it or move it around.

A generic memory manager has to balance the performance of allocating and freeing blocks while doing its best to keep fragmentation at a minimum. 

Under Win32 and Win64 and, equivalent virtual memory based O/Ses, moving memory is not a concern since a virtual address is not a reference to a real physical memory block and, the real memory block can be changed anytime by updating the page directory or page table entries but, memory fragmentation is still a concern both, for virtual address space and any heap manager.

When memory allocation/deallocation is a concern, the best and often simplest solution is to allocate a large enough block for what the program needs and manage the allocation/deallocation of blocks within it yourself.
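
One common shape for "allocate one large block and manage it yourself" is a bump (arena) allocator: a single GetMem up front, a running offset for each sub-allocation, and one FreeMem at the end. The sketch below is only an illustration under assumed names (TArena, ArenaInit, ArenaAlloc, ArenaDone) and an assumed 16-byte alignment; it does not support freeing individual sub-blocks and does not guard against arithmetic overflow for huge requests.

```pascal
program arena;
{$mode objfpc}

type
  TArena = record
    Base: PByte;     // start of the one big block
    Size: PtrUInt;   // total bytes in the block
    Used: PtrUInt;   // bytes handed out so far
  end;

procedure ArenaInit(out A: TArena; ASize: PtrUInt);
begin
  A.Base := GetMem(ASize);
  A.Size := ASize;
  A.Used := 0;
end;

// Hand out the next Bytes bytes, rounded up to a multiple of 16; nil if the arena is full.
function ArenaAlloc(var A: TArena; Bytes: PtrUInt): Pointer;
const
  Alignment = 16;   // assumed alignment for the example
begin
  Result := nil;
  Bytes := (Bytes + Alignment - 1) and not PtrUInt(Alignment - 1);
  if (A.Base = nil) or (Bytes > A.Size - A.Used) then
    Exit;
  Result := A.Base + A.Used;
  Inc(A.Used, Bytes);
end;

// Release everything at once.
procedure ArenaDone(var A: TArena);
begin
  FreeMem(A.Base);
  A.Base := nil;
  A.Size := 0;
  A.Used := 0;
end;

var
  A: TArena;
  p1, p2: Pointer;
begin
  ArenaInit(A, 1024);
  p1 := ArenaAlloc(A, 10);
  p2 := ArenaAlloc(A, 10);
  WriteLn(PtrUInt(p2) - PtrUInt(p1));   // prints 16: each request is rounded up
  ArenaDone(A);
end.
```

The general-purpose memory manager is called only twice here, no matter how many sub-allocations the program makes in between.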


Title: Re: Why CMem?
Post by: PascalDragon on August 20, 2019, 09:12:15 am
Quote
what happens when I increment that pointer? Now its no longer at its original starting point..
For a number of reasons, performance among them, most memory managers require the original pointer they returned to free the memory block.

With most memory managers, passing a pointer that points somewhere within an allocated memory block rarely, if ever, yields a desirable result.
I was thinking on the lines of it keeping an ADDRESS reference to the pointer variable itself.
by giving the pointer via reference back to a memory handler to release it, it can then scan a table of allocated chunks to locate a matching reference to a pointer.

 This way the pointer itself can change value but the entity will always be the same.

 Which means of course if you make one pointer := another, the pointer can not be used to release memory from the "another" pointer.

I personally think that performance is more important than the "inconvenience" of keeping around the original pointer that the memory function returned to me.
Title: Re: Why CMem?
Post by: BrunoK on August 20, 2019, 03:09:15 pm
Where to start.

1° cmem on Windows. As Grumpy writes, _msize can be used to retrieve the size of the allocated memory instead of having a sizeuint header containing the size.
Note that heap.inc has a similar approach in its own structures.
Here is cmem using _msize, tested by rebuilding FPC 3.0.4 latest (3.0.6 in my private branch) and rebuilding a relatively recent Lazarus.
Code: Pascal  [Select][+][-]
  1. {
  2.     This file is part of the Free Pascal run time library.
  3.     Copyright (c) 1999 by Michael Van Canneyt, member of the
  4.     Free Pascal development team
  5.  
  6.     Implements a memory manager that uses the C memory management.
  7.  
  8.     See the file COPYING.FPC, included in this distribution,
  9.     for details about the copyright.
  10.  
  11.     This program is distributed in the hope that it will be useful,
  12.     but WITHOUT ANY WARRANTY; without even the implied warranty of
  13.     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  14.  
  15.  **********************************************************************}
  16. unit cmem;
  17.  
  18. {$MODE OBJFPC}
  19.  
  20. {$DEFINE BK_FEATURES} { ~bk Code regarding agnostic heaptrc and initializing
  21.                             memorymanager and heaptrc BEFORE system.pp }
  22.  
  23. interface
  24.  
  25. const
  26.  
  27. {$if defined(go32v2) or defined(wii)}
  28.   {$define USE_STATIC_LIBC}
  29. {$endif}
  30.  
  31. {$if defined(win32)}
  32.   LibName = 'msvcrt';
  33. {$elseif defined(win64)}
  34.   LibName = 'msvcrt';
  35. {$elseif defined(wince)}
  36.   LibName = 'coredll';
  37. {$elseif defined(netware)}
  38.   LibName = 'clib';
  39. {$elseif defined(netwlibc)}
  40.   LibName = 'libc';
  41. {$elseif defined(macos)}
  42.   LibName = 'StdCLib';
  43. {$elseif defined(beos)}
  44.   LibName = 'root';
  45. {$else}
  46.   LibName = 'c';
  47. {$endif}
  48.  
  49. {$ifdef USE_STATIC_LIBC}
  50.   {$linklib c}
  51. function malloc(Size: ptruint): Pointer; cdecl; external;
  52. procedure Free(P: pointer); cdecl; external;
  53. function realloc(P: Pointer; Size: ptruint): pointer; cdecl; external;
  54. function calloc(unitSize, UnitCount: ptruint): pointer; cdecl; external;
  55. {$else not USE_STATIC_LIBC}
  56. Function Malloc (Size : ptruint) : Pointer; cdecl; external LibName name 'malloc';
  57. Procedure Free (P : pointer); cdecl; external LibName name 'free';
  58. function ReAlloc (P : Pointer; Size : ptruint) : pointer; cdecl; external LibName name 'realloc';
  59. Function CAlloc (unitSize,UnitCount : ptruint) : pointer; cdecl; external LibName name 'calloc';
  60. {$if defined(Windows)}
  61. Function _MemSize(P : Pointer) : ptruint; cdecl; external LibName name '_msize';
  62. {$endif}
  63. {$endif not USE_STATIC_LIBC}
  64.  
  65. implementation
  66.  
  67. uses
  68.   SysUtils;
  69. function CGetMem(Size: ptruint): Pointer;
  70. {var
  71.   lErrno: integer; }
  72. begin
  73.   {$if defined(Windows)}
  74.   CGetMem := Malloc(Size);
  75.   {$else}
  76.   CGetMem := Malloc(Size + sizeof(ptruint));
  77.   if (CGetMem <> nil) then begin
  78.     Pptruint(CGetMem)^ := size;
  79.     Inc(CGetMem, sizeof(ptruint));
  80.   end;
  81.   {$endif}
  82. end;
  83.  
  84. function CFreeMem(P: pointer): ptruint;
  85. begin
  86.   {$if not defined(Windows)}
  87.   if (p <> nil) then
  88.     Dec(p, sizeof(ptruint));
  89.   {$endif}
  90.   Free(P);
  91.   CFreeMem := 0;
  92. end;
  93.  
  94. function CFreeMemSize(p: pointer; Size: ptruint): ptruint;
  95.  
  96. begin
  97.   if size <= 0 then
  98.     exit;
  99.   {$if not defined(Windows)}
  100.   if (p <> nil) then begin
  101.     if (size <> Pptruint(p - sizeof(ptruint))^) then
  102.       runerror(204);
  103.   end;
  104.   {$endif}
  105.   CFreeMemSize := CFreeMem(p);
  106. end;
  107.  
  108. function CAllocMem(Size: ptruint): Pointer;
  109. begin
  110.   {$if defined(Windows)}
  111.   CAllocMem := calloc(Size, 1);
  112.   {$else}
  CAllocMem := calloc(Size + sizeof(ptruint), 1);
  if (CAllocMem <> nil) then begin
    Pptruint(CAllocMem)^ := size;
    Inc(CAllocMem, sizeof(ptruint));
  end;
  {$endif}
end;

function CReAllocMem(var p: pointer; Size: ptruint): Pointer;
begin
  {$if defined(WINDOWS)}
  p := Realloc(p, size);
  CReAllocMem := p;
  {$else}
  if size = 0 then begin
    if p <> nil then begin
      Dec(p, sizeof(ptruint));
      Free(p);
      p := nil;
    end;
  end
  else begin
    Inc(size, sizeof(ptruint));
    if p = nil then
      p := malloc(Size)
    else begin
      Dec(p, sizeof(ptruint));
      p := realloc(p, size);
    end;
    if (p <> nil) then begin
      Pptruint(p)^ := size - sizeof(ptruint);
      Inc(p, sizeof(ptruint));
    end;
  end;
  {$endif}
  CReAllocMem := p;
end;

function CMemSize(p: pointer): ptruint;

begin
  {$if defined(WINDOWS)}
  CMemSize := _MemSize(p);
  {$else}
  CMemSize := Pptruint(p - sizeof(ptruint))^;
  {$endif}
end;

function CGetHeapStatus: THeapStatus;

var res: THeapStatus;

begin
  fillchar(res, sizeof(res), 0);
  CGetHeapStatus := res;
end;

function CGetFPCHeapStatus: TFPCHeapStatus;

begin
  fillchar(CGetFPCHeapStatus, sizeof(CGetFPCHeapStatus), 0);
end;

const
  CMemoryManager: TMemoryManager =
    (
    NeedLock: False;
    GetMem: @CGetmem;
    FreeMem: @CFreeMem;
    FreememSize: @CFreememSize;
    AllocMem: @CAllocMem;
    ReallocMem: @CReAllocMem;
    MemSize: @CMemSize;
    InitThread: nil;
    DoneThread: nil;
    RelocateHeap: nil;
    GetHeapStatus: @CGetHeapStatus;
    GetFPCHeapStatus: @CGetFPCHeapStatus;
    );

var
  OldMemoryManager: TMemoryManager;

initialization
  GetMemoryManager(OldMemoryManager);
  SetMemoryManager(CMemoryManager, mmtCmem, nil, @FinalizeCMem);

Finalization
  SetMemoryManager (OldMemoryManager);

end.
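
A minimal sketch of the size-header scheme used by the non-Windows branch above: one extra PtrUInt is allocated in front of each block, the requested size is written there, and the caller receives the address just past it. The names HeaderAlloc, HeaderSize and HeaderFree are hypothetical, and the sketch sits on top of GetMem/FreeMem rather than the C library so that it runs wherever FPC does.

```pascal
program sizeheader;
{$mode objfpc}

{ Hypothetical helpers; the layout mirrors the non-Windows branch of cmem. }

function HeaderAlloc(Size: PtrUInt): Pointer;
begin
  { Ask for the payload plus one PtrUInt that records its size. }
  GetMem(Result, Size + SizeOf(PtrUInt));
  if Result <> nil then
  begin
    PPtrUInt(Result)^ := Size;        { store the requested size up front }
    Inc(Result, SizeOf(PtrUInt));     { hand out the address after the header }
  end;
end;

function HeaderSize(P: Pointer): PtrUInt;
begin
  { The size sits immediately below the address the caller holds. }
  Result := PPtrUInt(P - SizeOf(PtrUInt))^;
end;

procedure HeaderFree(P: Pointer);
begin
  if P <> nil then
  begin
    Dec(P, SizeOf(PtrUInt));          { step back to the real block start }
    FreeMem(P);
  end;
end;

var
  Block: Pointer;
begin
  Block := HeaderAlloc(100);
  WriteLn('recorded size: ', HeaderSize(Block));
  HeaderFree(Block);
end.
```

The price of this bookkeeping is SizeOf(PtrUInt) extra bytes per block and a payload address shifted by that amount.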

2° Usage of MemoryManager.memsize  :

@PascalDragon
In general, but focussing on cmem:
If you leave out the size, Lazarus doesn't work - but Fpc does! -. Try it yourself using cmem as a template, simply rip out the silly size management.
So Lazarus relies on implementation detail.

Could either of you two please point out where Lazarus uses MemSize wrongly, and how those few places affect Lazarus? Also, how would ansistring work in FPC without MemSize?

MemSize MUST IMPERATIVELY BE IMPLEMENTED CORRECTLY in any memory manager, otherwise the
following FPC routines would never work:
- fpc_AnsiStr_SetLength
- fpc_UnicodeStr_SetLength
- fpc_WideStr_SetLength

The least one can say is that these are pretty commonly used...

I mention them because I made a few changes (in my 3.0.6) in these places to put all memory managers (that is, heap.inc and cmem) on an equal footing regarding string pre-overallocation.
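
To see what MemSize reports for whichever manager is installed, a small test program like the following can be used. It is only a sketch: ReportedCapacity is a made-up helper, and the exact numbers depend on the memory manager, which is free to hand out more than was requested.

```pascal
program memsizedemo;
{$mode objfpc}

{ Returns the capacity the active memory manager reports for a fresh block. }
function ReportedCapacity(Requested: PtrUInt): PtrUInt;
var
  P: Pointer;
begin
  GetMem(P, Requested);
  Result := MemSize(P);   { answered by the installed memory manager }
  FreeMem(P);
end;

begin
  WriteLn('requested 10, manager reports ', ReportedCapacity(10));
  WriteLn('requested 100, manager reports ', ReportedCapacity(100));
end.
```

Running it once as is and once with cmem as the first unit in the uses clause shows how the two managers answer the same question.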
Title: Re: Why CMem?
Post by: PascalDragon on August 21, 2019, 09:43:51 am
2° Usage of MemoryManager.memsize  :

@PascalDragon
In general, but focussing on cmem:
If you leave out the size, Lazarus doesn't work - but Fpc does! -. Try it yourself using cmem as a template, simply rip out the silly size management.
So Lazarus relies on implementation detail.

Could either of you two please point out where Lazarus uses MemSize wrongly, and how those few places affect Lazarus? Also, how would ansistring work in FPC without MemSize?
I don't know what is failing, I have not tried. It could even be that it's not MemSize that is the problem, but that some code tries to simply access the hidden size field. Either Thaddy has more info or you could play around with it yourself.
Title: Re: Why CMem?
Post by: Thaddy on August 21, 2019, 10:01:22 am
Quote
I don't know what is failing, I have not tried. It could even be that it's not MemSize that is the problem, but that some code tries to simply access the hidden size field. Either Thaddy has more info or you could play around with it yourself.
The info that I have is that a memory manager implementation for Delphi doesn't need to store length info.
When I subsequently rip out the length information from cmem in Freepascal that manager seems to work in Freepascal, but not in Lazarus.

Further: many platforms have other means of obtaining the size of an allocated block, because their C runtimes store the size themselves.
Examples are:
- Windows: _msize
- OSX: malloc_size
- Linux and the BSDs: malloc_usable_size (this is a GNU extension)

So at least for those platforms there is technically no need to store the size, although they store the size of the allocated block and not the actual used length.
Freepascal stores the length at a negative offset from the variable where applicable, so that should be enough. It should not be done in the memory manager.
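
On Linux with glibc the difference between "allocated" and "requested" can be observed directly. The following is a sketch under that assumption (malloc_usable_size is a GNU extension; UsableSizeFor is a made-up helper name), binding the three libc functions it needs by hand:

```pascal
program usablesize;
{$mode objfpc}

{ Linux/glibc only: malloc_usable_size is a GNU extension. }
function malloc(Size: PtrUInt): Pointer; cdecl; external 'c' name 'malloc';
procedure free(P: Pointer); cdecl; external 'c' name 'free';
function malloc_usable_size(P: Pointer): PtrUInt; cdecl;
  external 'c' name 'malloc_usable_size';

{ Allocates Requested bytes and returns the size the allocator reports. }
function UsableSizeFor(Requested: PtrUInt): PtrUInt;
var
  P: Pointer;
begin
  P := malloc(Requested);
  if P = nil then
    Exit(0);
  Result := malloc_usable_size(P);  { query while the block is still live }
  free(P);
end;

begin
  WriteLn('requested 10, usable ', UsableSizeFor(10));
  WriteLn('requested 100, usable ', UsableSizeFor(100));
end.
```

The value is the block size the allocator settled on, which is at least the requested size, so a memory manager that answers MemSize this way reports capacity and not the length the caller asked for.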

Once I can drop the size, the memory manager can also have better alignment.

An experimental example (linux):
Code: Pascal
unit fastcmem;
{$mode delphi}
interface
// slightly faster than cmem, all size management code removed.
const
  LibName = 'libc';

function Malloc(Size: ptrint): Pointer;
  cdecl; external LibName name 'malloc';
procedure Free(P: pointer);
  cdecl; external LibName name 'free';
function ReAlloc(P: Pointer; Size: ptrint): pointer;
  cdecl; external LibName name 'realloc';
function CAlloc(unitSize, UnitCount: ptrint): pointer;
  cdecl; external LibName name 'calloc';
function MemSizeUsed(p: pointer): ptrint;
  cdecl; external LibName name 'malloc_usable_size';

implementation

function CGetMem(Size: ptruint): Pointer;
begin
  CGetMem := Malloc(Size);
end;

function CFreeMem(P: pointer): ptruint;
begin
  // query the size first: a block must not be inspected after free
  CFreeMem := MemSizeUsed(P);
  Free(P);
end;

function CFreeMemSize(P: pointer; size: ptruint): ptruint;
begin
  // same ordering as CFreeMem: size first, then free
  CFreeMemSize := MemSizeUsed(P);
  Free(P);
end;

function CAllocMem(Size: ptruint): Pointer;
begin
  CAllocMem := CAlloc(Size, 1);
end;

function CReAllocMem(var p: pointer; Size: ptruint): Pointer;
begin
  // TMemoryManager.ReallocMem takes p by reference and expects it to be updated
  p := ReAlloc(p, Size);
  CReAllocMem := p;
end;

function CMemSize(p: pointer): ptruint;
begin
  CMemSize := MemSizeUsed(p);
end;

function CGetHeapStatus: THeapStatus;
begin
  CGetHeapStatus := Default(THeapStatus);
end;

function CGetFPCHeapStatus: TFPCHeapStatus;
begin
  CGetFPCHeapStatus := Default(TFPCHeapStatus);
end;

const
  CMemoryManager: TMemoryManager =
    (
      NeedLock: false;
      GetMem: @CGetMem;
      FreeMem: @CFreeMem;
      FreememSize: @CFreeMemSize;
      AllocMem: @CAllocMem;
      ReallocMem: @CReAllocMem;
      MemSize: @CMemSize;
      InitThread: nil;
      DoneThread: nil;
      RelocateHeap: nil;
      GetHeapStatus: @CGetHeapStatus;
      GetFPCHeapStatus: @CGetFPCHeapStatus;
    );

var
  OldMemoryManager: TMemoryManager;

initialization
  OldMemoryManager := Default(TMemoryManager);
  GetMemoryManager(OldMemoryManager);
  SetMemoryManager(CMemoryManager);

finalization
  SetMemoryManager(OldMemoryManager);
end.

Fails with Lazarus....works with Freepascal.
Code: Pascal
program testcmem;
{$if not declared(useheaptrc)}uses fastcmem;{$endif}{$H+}{$I-}
type
  TMyrec = record
    s: string;
  end;
  PMyRec = ^TMyrec;
var
  P: array[0..99] of PMyRec;
  i: integer;
begin
  // allocate some memory
  for i := 0 to 99 do begin New(P[i]); writestr(P[i]^.s, 'test me ', i); end;
  // de-allocate
  for i := 99 downto 0 do begin writeln(P[i]^.s); Dispose(P[i]); end;
end.
   

Aside:
It really is true that modern C allocators are faster than the default Freepascal manager. They are thread-safe without explicit locking and even scale better: among other things, they use lock-free patterns.
In the past this was not the case. Now it is the case for almost any application, not only threaded ones.

 
Title: Re: Why CMem?
Post by: BrunoK on August 21, 2019, 01:02:19 pm
The info that I have is that a memory manager implementation for Delphi doesn't need to store length info.
When I subsequently rip out the length information from cmem in Freepascal that manager seems to work in Freepascal, but not in Lazarus.
Without knowing the MemSize of an allocated pointer, ansistring, unicodestring and widestring would just NOT work in FPC. Try recompiling any recent version of FPC without the use of MemSize; you won't get very far.
Quote
So at least for those platforms there is technically no need to store the size, although they store the size of the allocated block and not the actual used length.
That is exactly what happens (and is exploited by FPC) in heap.inc. The FPC system unit is not clean, because it assumes that memory blocks are allocated rounded up to 16 bytes (see the note in astrings.inc, NewAnsiString, which reads { request a multiple of 16 because the heap manager alloctes anyways chunks of 16 bytes } ). The comment itself is a lie, because NewAnsiString does not round its request up to the next 16-byte boundary; this is something I changed in my 3.0.6.

heap.inc makes quite a few abusive assumptions about growing blocks. For example, when TFPList and its siblings expand, it may very well be that no additional memory is actually requested because heap.inc has already decided to hand out much more memory, and it gets worse and worse as the list grows.
Quote
Once I can drop the size, the memory manager can also have better alignment.
Except for special cases, aligning on SizeOf(PtrUInt) is quite sufficient.

Fails with Lazarus....works with Freepascal.
Code: Pascal
program testcmem;
{$if not declared(useheaptrc)}uses fastcmem;{$endif}{$H+}{$I-}
type
  TMyrec = record
    s: string;
  end;
  PMyRec = ^TMyrec;
var
  P: array[0..99] of PMyRec;
  i: integer;
begin
  // allocate some memory
  for i := 0 to 99 do begin New(P[i]); writestr(P[i]^.s, 'test me ', i); end;
  // de-allocate
  for i := 99 downto 0 do begin writeln(P[i]^.s); Dispose(P[i]); end;
end.
   
Works without a glitch on my W10 system (heap.inc or cmem, and with/without my new heaptrc stuff).
I don't know what is failing, I have not tried. It could even be that it's not MemSize that is the problem, but that some code tries to simply access the hidden size field. Either Thaddy has more info or you could play around with it yourself.
I understand well what you suggest (and can very well imagine doing such a hack myself :-) ), but that would be like searching for a needle in a haystack.
If Grumpy has an example of a program that fails in Lazarus, and especially which units it uses, that would help to spot a potential hack.
I've rebuilt FPC and Lazarus with cmem and the _msize change, and my Lazarus works OK.

heap.inc is stable, so it is good for general use. Also, since it is written in free pascal, one can examine the code and rebuild it.
Title: Re: Why CMem?
Post by: k1ng on August 21, 2019, 01:11:31 pm
Quote
I don't know what is failing, I have not tried. It could even be that it's not MemSize that is the problem, but that some code tries to simply access the hidden size field. Either Thaddy has more info or you could play around with it yourself.
The info that I have is that a memory manager implementation for Delphi doesn't need to store length info.
When I subsequently rip out the length information from cmem in Freepascal that manager seems to work in Freepascal, but not in Lazarus.

Report it to Lazarus bugtracker? >:D

Once I can drop the size, the memory manager can also have better alignment.
It really is true that modern C allocators are faster than the default Freepascal manager. They are thread-safe without explicit locking and even scale better: among other things, they use lock-free patterns.
In the past this was not the case. Now it is the case for almost any application, not only threaded ones.

So how about submitting your changes to FPC trunk, so that everyone can profit from them in the future?  ;D
Title: Re: Why CMem?
Post by: Thaddy on August 21, 2019, 01:16:02 pm
Quote
Without knowing the MemSize of an allocated pointer, ansistring, unicodestring and widestring would just NOT work in FPC. Try recompiling any recent version of FPC without the use of MemSize; you won't get very far.
As you can see from my code, I implemented it.
Quote
That is exactly what happens (and is exploited by FPC) in heap.inc. The FPC system unit is not clean, because it assumes that memory blocks are allocated rounded up to 16 bytes (see the note in astrings.inc, NewAnsiString, which reads { request a multiple of 16 because the heap manager alloctes anyways chunks of 16 bytes } ). The comment itself is a lie, because NewAnsiString does not round its request up to the next 16-byte boundary; this is something I changed in my 3.0.6.
Yes, indeed, if I follow you correctly.
Quote
heap.inc makes quite a few abusive assumptions about growing blocks. For example, when TFPList and its siblings expand, it may very well be that no additional memory is actually requested because heap.inc has already decided to hand out much more memory, and it gets worse and worse as the list grows.
Quote
Quote
Once I can drop the size, the memory manager can also have better alignment.
Except for special cases, aligning on SizeOf(PtrUInt) is quite sufficient.
Not the case on modern processors, depending on optimizations (SSE, MMX, AVX etc. on x86 and VFPx on ARM).

About examples: any Lazarus project will fail with my unit on Linux, but it seems to work (with _msize) on Windows.
My freepascal example simply works, even with types that store length.
Title: Re: Why CMem?
Post by: BrunoK on August 21, 2019, 01:18:43 pm
Actually, I found at least 2 lines that could cause trouble. They are:
Code: Pascal
D:\fpc-laz-asus\Lazarus\laz-svn-trunk\components\sparta\generics\source\inc\generics.dictionaries.inc
components\sparta\generics\source\inc\generics.dictionaries.inc (278,40) Result := PSizeInt(PByte(@((@Self)^))-SizeOf(SizeInt))^;
components\sparta\generics\source\inc\generics.dictionaries.inc (1318,42) Result := SizeInt((@PByte(@((@Self)^))[-SizeOf(SizeInt)])^);
Title: Re: Why CMem?
Post by: Thaddy on August 21, 2019, 01:19:35 pm
 :) Yes, I found these too... they are quite easy to patch, though. And indeed that makes my fastcmem fail too (except on systems that also use a negative offset, which happens to be the case).

@hnb
Maciej, if you read this, you need to fix that anyway, because it relies on an implementation detail of the memory manager.
Title: Re: Why CMem?
Post by: marcov on August 21, 2019, 03:30:02 pm
Where to start.

1° cmem on windows. As Grumpy writes _msize can be used to retrieve the size of the allocated memory instead of having a header sizeuint containing the size.
Note that heap.inc has a similar approach in its own structures.
cmem using _msize. Tested with rebuilding FPC 3.0.4 latest (3.0.6 in my private branch) and rebuilding a relatively recent lazarus.

This would make the cmem interface variable. If for some reason you need a specific Windows version of CMEM, just make a Windows-specific unit of cmem (WCmem).

 
Quote
MemSize MUST IMPERATIVELY BE IMPLEMENTED CORRECTLY in any memory manager, otherwise the
following FPC routines would never work:
- fpc_AnsiStr_SetLength
- fpc_UnicodeStr_SetLength
- fpc_WideStr_SetLength

The least one can say is that these are pretty commonly used...

I mention them because I made a few changes (in my 3.0.6) in these places to put all memory managers (that is, heap.inc and cmem) on an equal footing regarding string pre-overallocation.

And there you hit a problem: these are frequently used functions, which a procvar would slow down. That is probably why it is as it is now (it just doesn't show up in the simplistic benchmarks).
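
For context, "procvar" here presumably refers to the fact that the System allocation routines forward to procedural variables stored in the installed TMemoryManager record, so each call is an indirect one. A small sketch (SameAnswerThroughRecord is a hypothetical helper) that makes the same calls through the record by hand:

```pascal
program procvardemo;
{$mode objfpc}

{ True when the record field and the System wrapper agree on a block's size. }
function SameAnswerThroughRecord(Requested: PtrUInt): Boolean;
var
  MM: TMemoryManager;
  P: Pointer;
begin
  GetMemoryManager(MM);          { copy of the currently installed manager }
  P := MM.GetMem(Requested);     { indirect call through a procedural variable }
  Result := MM.MemSize(P) = MemSize(P);
  MM.FreeMem(P);
end;

begin
  WriteLn('record and wrapper agree: ', SameAnswerThroughRecord(64));
end.
```

Whether that extra indirection matters in the string routines is the point under discussion; the sketch only shows where it comes from.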
Title: Re: Why CMem?
Post by: Thaddy on August 21, 2019, 03:50:49 pm
Quote
And there you hit a problem: these are frequently used functions, which a procvar would slow down. That is probably why it is as it is now (it just doesn't show up in the simplistic benchmarks).
It also shows up in quite complex benchmarks. cmems as they are now are much more advanced than the simple heap managers they were before ~2008-2010. I fell into the ceteris paribus trap: cmems are not all equal.
And yes, it would require an interface and an implementation .inc per platform, following the definitions I found for the three major platforms that have equivalents for memsize.
Title: Re: Why CMem?
Post by: BrunoK on August 21, 2019, 04:45:08 pm
This would make the cmem interface variable. If for some reason you need a specific Windows version of CMEM, just make a Windows-specific unit of cmem (WCmem).
Not variable, only fully complying with the TMemoryManager requirements as defined in the wiki.
See my post of August 20, 2019, 03:09:15 pm. This is actually the cmem I'm testing in my working copy of FPC / Lazarus.
And there you hit a problem: these are frequently used functions, which a procvar would slow down. That is probably why it is as it is now (it just doesn't show up in the simplistic benchmarks).
There I really don't understand what the problem with a procvar is. Can you explain, please?

I now have, in my version of FPC's astrings.inc, these bits of code that replace the unfair treatment of non-heap.inc memory managers (and the same for unicodestring and widestring).
Code: Pascal
const
  cChunkRoundUp  = SizeUInt(15);   // Chunk up to 16 bytes not including string terminator
  cChunkRounding = SizeUInt(not 15);
then
Code: Pascal
Function NewAnsiString(Len : SizeInt) : Pointer;
{
  Allocate a new AnsiString on the heap.
  initialize it to zero length and reference count 1.
}
Var
  P : Pointer;
  lAllocSize : SizeInt;
begin
  { request a multiple of 16 because the heap manager alloctes anyways chunks of 16 bytes }
  { ~bk 18.08.?? force allocation by multiple of 16 including 2 bytes for
                 null (UTF-16 string terminator). Put alternate memory manager
                 on equal footing as heap.inc }
  // GetMem(P,Len+(AnsiFirstOff+sizeof(char)));
  lAllocSize := (Len+AnsiFirstOff+cChunkRoundUp+2) and cChunkRounding;
  GetMem(P, lAllocSize);
  If P<>Nil then begin
     PAnsiRec(P)^.Ref:=1;         { Set reference count }
     PAnsiRec(P)^.Len:=0;         { Initial length }
     PAnsiRec(P)^.CodePage:=DefaultSystemCodePage;
     PAnsiRec(P)^.ElementSize:=SizeOf(AnsiChar);
     inc(p,AnsiFirstOff);         { Points to string now }
     PAnsiChar(P)^    :=#0;       { Terminating #0 }
     PAnsiChar(P + 1)^:=#0;       { additional in case aRawString is UTF-16}
  end;
  NewAnsiString:=P;
end;
and where MemoryManager.MemSize is called:
Code: Pascal
procedure fpc_AnsiStr_SetLength(var S: RawByteString;
  l: SizeInt{$ifdef FPC_HAS_CPSTRING}; cp: TSystemCodePage{$endif FPC_HAS_CPSTRING});
  [public , alias: 'FPC_ANSISTR_SETLENGTH']; compilerproc;
{
  Sets The length of string S to L.
  Makes sure S is unique, and contains enough room.
}
...some code ...
    else if PAnsiRec(Pointer(S) - AnsiFirstOff)^.Ref = 1 then begin
      Temp := Pointer(s) - AnsiFirstOff;
      lens := MemSize(Temp);
      lena := AnsiFirstOff + L + 2; // 2 for possible UTF-16
      { allow shrinking string if that saves at least half of current size }
      if (lena > lens) or ((lens > 32) and (lena <= (lens div 2))) then begin
        lena := (lena + cChunkRoundUp) and cChunkRounding;
        if lena<>lens then begin
          reallocmem(Temp, lena);
          Pointer(S) := Temp + AnsiFirstOff;
        end;
      end;
    end
The idea behind the double terminating #0 is that it might become possible to unify ansistring and unicodestring in the future. Anyway, the code cost is marginal relative to the rest, and preallocating all strings with the same overallocation is perfectly all right.
In the task manager, my tests with Lazarus show that the memory used with heap.inc or cmem.pp does not differ much (maybe a bit cheaper with cmem).
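
The two pieces of arithmetic in the excerpts above, rounding a request up to a multiple of 16 and deciding whether a string block needs to be reallocated at all, can be tried in isolation. This is a sketch with made-up names (RoundUp16, NeedsRealloc), not the RTL code itself:

```pascal
program roundup16;
{$mode objfpc}

{ Rounds N up to the next multiple of 16: add 15, then clear the low 4 bits. }
function RoundUp16(N: SizeUInt): SizeUInt;
begin
  Result := (N + 15) and (not SizeUInt(15));
end;

{ Mirrors the quoted condition: grow when the block is too small, shrink only
  when it is larger than 32 bytes and at most half of it would be used. }
function NeedsRealloc(Needed, Current: SizeUInt): Boolean;
begin
  Result := (Needed > Current) or
            ((Current > 32) and (Needed <= Current div 2));
end;

begin
  WriteLn('RoundUp16(1)  = ', RoundUp16(1));
  WriteLn('RoundUp16(16) = ', RoundUp16(16));
  WriteLn('RoundUp16(17) = ', RoundUp16(17));
  WriteLn('grow 40 in 32:   ', NeedsRealloc(40, 32));
  WriteLn('keep 40 in 64:   ', NeedsRealloc(40, 64));
  WriteLn('shrink 20 in 64: ', NeedsRealloc(20, 64));
end.
```

Both decisions take the current block size as input, which is why the routine has to ask the memory manager for MemSize first.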

I'm currently modifying the compiler to accept the following new directive on the command line (my current version already accepts a -dcmem switch):
Code: Pascal
        { Support for alternate MemoryManager (not the heap.inc default)
          Note to myself : more should contain 'm'<MemoryManager_UnitName> ~bk
                                            or 'm:='<MemoryManager_UnitName> }
and, if successful, I will attempt to create a winheap memory manager that uses the Windows HeapAlloc / HeapReAlloc / HeapFree / HeapSize and conforms to the TMemoryManager requirements.
Title: Re: Why CMem?
Post by: PascalDragon on August 23, 2019, 09:30:53 am
Actually, I found at least 2 lines that could cause trouble. They are:
Code: Pascal
D:\fpc-laz-asus\Lazarus\laz-svn-trunk\components\sparta\generics\source\inc\generics.dictionaries.inc
components\sparta\generics\source\inc\generics.dictionaries.inc (278,40) Result := PSizeInt(PByte(@((@Self)^))-SizeOf(SizeInt))^;
components\sparta\generics\source\inc\generics.dictionaries.inc (1318,42) Result := SizeInt((@PByte(@((@Self)^))[-SizeOf(SizeInt)])^);
Good find. :o That should indeed be fixed. Preferably before we release 3.2 which contains that as well...
Title: Re: Why CMem?
Post by: BrunoK on August 23, 2019, 11:32:49 am
Actually, I found at least 2 lines that could cause trouble. They are:
Code: Pascal
D:\fpc-laz-asus\Lazarus\laz-svn-trunk\components\sparta\generics\source\inc\generics.dictionaries.inc
components\sparta\generics\source\inc\generics.dictionaries.inc (278,40) Result := PSizeInt(PByte(@((@Self)^))-SizeOf(SizeInt))^;
components\sparta\generics\source\inc\generics.dictionaries.inc (1318,42) Result := SizeInt((@PByte(@((@Self)^))[-SizeOf(SizeInt)])^);
Good find. :o That should indeed be fixed. Preferably before we release 3.2 which contains that as well...
bk edit : bullshit -> ignore please
Attached 2 possible (untested) patches.
1° for lazarus trunk
2° for FPC trunk
Title: Re: Why CMem?
Post by: PascalDragon on August 23, 2019, 03:54:01 pm
Actually, I found at least 2 lines that could cause trouble. They are:
Code: Pascal
D:\fpc-laz-asus\Lazarus\laz-svn-trunk\components\sparta\generics\source\inc\generics.dictionaries.inc
components\sparta\generics\source\inc\generics.dictionaries.inc (278,40) Result := PSizeInt(PByte(@((@Self)^))-SizeOf(SizeInt))^;
components\sparta\generics\source\inc\generics.dictionaries.inc (1318,42) Result := SizeInt((@PByte(@((@Self)^))[-SizeOf(SizeInt)])^);
Good find. :o That should indeed be fixed. Preferably before we release 3.2 which contains that as well...
Attached 2 possible (untested) patches.
1° for lazarus trunk
2° for FPC trunk
I just checked the original code. They're a false alarm: they rely on the internal structure of dynamic arrays, not on the size of the allocated memory. So nothing to see here...
Title: Re: Why CMem?
Post by: BrunoK on August 23, 2019, 04:36:24 pm
I just checked the original code. They're a false alarm: they rely on the internal structure of dynamic arrays, not on the size of the allocated memory. So nothing to see here...
I rushed a bit too fast on these because they looked so much like a hack to get the size of a memory block. One gets confused pretty easily with indirected indirect pointers.
Title: Re: Why CMem?
Post by: Thaddy on August 23, 2019, 05:27:06 pm
So? it looks like it is possible to rip out the size?
Title: Re: Why CMem?
Post by: PascalDragon on August 24, 2019, 10:55:47 am
One gets confused pretty easily with indirected indirect pointers.
I hear you :-[

So? it looks like it is possible to rip out the size?
Did you test BrunoK's adjusted cmem without the size with Lazarus? Cause maybe it's a package you have installed in Lazarus which is misbehaving there.
Title: Re: Why CMem?
Post by: BrunoK on September 19, 2019, 12:20:38 pm
So? it looks like it is possible to rip out the size?
Did you test BrunoK's adjusted cmem without the size with Lazarus? Cause maybe it's a package you have installed in Lazarus which is misbehaving there.
While looking around memory allocation, at least on Windows, I have run some tests on the alignment of the memory blocks supplied by heap.inc / cmem / winheap (see the attached rtl\win\winheap.pp unit).

Actually, Grumpy may have a point, specifically on Windows with cmem: having the parasitic ptruint just in front of the allocated memory breaks alignment for structures expecting 8-byte / 16-byte (128-bit) alignment.

Testing the pointers returned by heap.inc / cmem / winheap (starting a modified version of Lazarus with test code in heap.inc) WITHOUT the additional ptruint gives the following alignments:
CPU32 -> pointer to memory block always aligned on at least a multiple of 8 bytes.
CPU64 -> pointer to memory block always aligned on at least a multiple of 16 bytes.

What this means is that blocks for the 128-bit structures sometimes expected on x86_64 processors (by SSE instructions, for instance) are wrongly aligned when using cmem with the leading ptruint.
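
The effect is plain address arithmetic and can be checked without touching an allocator. A sketch with a hypothetical 16-byte aligned base address (4096) and a made-up helper, Misalignment:

```pascal
program headeralign;
{$mode objfpc}

{ Distance of Addr past the previous multiple of Alignment (a power of two). }
function Misalignment(Addr, Alignment: PtrUInt): PtrUInt;
begin
  Result := Addr and (Alignment - 1);
end;

var
  Base, Payload: PtrUInt;
begin
  Base := 4096;                        { hypothetical 16-byte aligned block start }
  Payload := Base + SizeOf(PtrUInt);   { address handed out after the size header }
  WriteLn('header bytes:   ', SizeOf(PtrUInt));
  WriteLn('base mod 16:    ', Misalignment(Base, 16));
  WriteLn('payload mod 16: ', Misalignment(Payload, 16));
end.
```

Since SizeOf(PtrUInt) is smaller than 16 on both 32-bit and 64-bit targets, the payload can never land on a 16-byte boundary when the underlying block does.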

See https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 for requirement for 16 byte alignment.

It should be checked on other Intel OSes (Linux etc.) how their malloc/realloc align heap memory blocks, and the cmem code should then be corrected to get rid of the leading "size as ptruint" wherever an equivalent to Windows' _msize exists.

About the attachments:
winheap.pp in rtl/win may work when declared as the first unit in the project, provided buildrtl has been modified.
cmem is there just to give an idea of how to change the non-Windows code, but it won't work straight away, because in my FPC the memory manager is loaded BEFORE system (compiler changes) and some modifications in heap.inc are needed to initialize / finalize the code correctly.

If some developer wants the details, tell me where to post the changes I made to get things running correctly.
