Recent

Author Topic: Why loaded units are TLinkedList and not LinkedDictionary  (Read 2589 times)

daniel_sap

  • New Member
  • *
  • Posts: 19
Why loaded units are TLinkedList and not LinkedDictionary
« on: March 15, 2021, 07:57:27 am »
Hi,

I am porting Delphi app to Lazarus and see that it takes a little longer to compile compared to Delphi.
I opened the ppcx64 project from compiler directory and start debugging the compiling of my project.

One thing I noticed is that all units that are loaded are kept in
loaded_units: TLinkedList (in fmodule.pas)

Procedure addloadedunit (in fmodule.pas) is used to add a unit to loaded_units. Called by registerunit (in fppu.pas)
It is fast to add item in linked list. No reallocation and move of memory.

There are some methods that iterates through loaded_units linked list
resolve_unit (in fmodule.pas)
registerunit (in fppu.pas)

This registerunit function is called every time a unit is read/loaded (readloadunit in fppu.pas)
and it iterates through all linked list items of loaded_units.
In many cases this iterates through all items without finding anything (every time a new unit is loaded)
If there are 1000 units in project it will do 1000*500 unnecesary iterations. But if you have more this grows exponentially.
Generally registerunit is called when a unit is in uses clause of compiling unit. This means that it will do these iterations not only when unit is loaded for first time.
It is called every time the unit is used. Which makes the number of iterations even more. In this case it will not iterate through all but still.
Cause registerunit is searching for module object by given unit name for me it looks natural to use Dictionary.
Why to make so many unnecessary iterations.

However, I suppose that Linked list is preferred for faster add and remove from loaded_units avoiding the memory reallocation;
In java I typically use LinkedHashMap class which combines both of the strengths of LinkedList and Dictionary.

My question is, cause I don't have much experience with compiler code.
Am I thinking right? Is there something that I'm missing.
Can this functionality be improved? How can this happen?
Is there such thing like LinkedHashMap in free pascal?

Regards and thank you
« Last Edit: March 15, 2021, 08:36:12 am by daniel_sap »

440bx

  • Hero Member
  • *****
  • Posts: 2469
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #1 on: March 15, 2021, 08:42:40 am »
If there are 1000 units in project it will do 1000*500 unnecesary iterations. But if you have more this grows exponentially.
Since I haven't looked in detail at the entire mechanism FPC uses to keep track of units, I could be wrong about what follows but, units are scoped, therefore the number of units the compiler has to keep track of for either another unit or the program itself should normally be small.  That is often _not_ a large number of units and, in many cases, it should be a small number, usually 10 or less (unit names in scope, that is, listed in the unit's "uses" clause.)  Because of that, it really doesn't make much sense to use an elaborate structure to keep track of them, IOW, the total time spent adding and searching the units in a linear structure will often be as fast or faster than the total time used by a more sophisticated structure/algorithm.

Now, if one unit or program _directly_ uses 1000 units, (that is they are all in the same scope), then yes, the search is going to be slow but, if I saw a unit or program listing 1000 units in its "uses" clauses, I wouldn't even bother to read the code.

HTH.
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 9535
  • FPC developer.
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #2 on: March 15, 2021, 09:23:29 am »
1. TDictionary is new since last 3.2.0 last may
2. Compiler can only use RTL, not packages.

That is not a whole excuse, quite often scaled down container types are brought into the compiler when there is need.

IOW usually historic reasons.
« Last Edit: March 15, 2021, 09:32:13 am by marcov »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3962
  • I like bugs.
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #3 on: March 15, 2021, 09:53:52 am »
That is not a whole excuse, quite often scaled down container types are brought into the compiler when there is need.
This interests me because I optimized Lazarus IDE code recently.
I wrote in Laz mailing list about the tools "Profiling with Valgrind and KCacheGrind".
It makes a nice pair of tools.
How much are people profiling FPC? I may do it at some point. The results of profiling often surprise. The solutions to optimize are not always obvious though.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 899
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #4 on: March 15, 2021, 11:31:20 am »
I profile the compiler occasionally. There are seldom specific places where a lot of time is spent. It obviously also depends on the source code you are compiling. Compiling the MacOSAll unit (which contains almost only declarations and barely any code) gives a quite different result than compiling the compiler itself. Most performance gains until now were actually by adding prefetch intrinsics in places where we walk over large linked lists (but that only helps if the loop does a lot of work, so the prefetch has time to complete before the next element is fetched).

daniel_sap

  • New Member
  • *
  • Posts: 19
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #5 on: March 15, 2021, 06:14:38 pm »
If there are 1000 units in project it will do 1000*500 unnecesary iterations. But if you have more this grows exponentially.
Since I haven't looked in detail at the entire mechanism FPC uses to keep track of units, I could be wrong about what follows but, units are scoped, therefore the number of units the compiler has to keep track of for either another unit or the program itself should normally be small.  That is often _not_ a large number of units and, in many cases, it should be a small number, usually 10 or less (unit names in scope, that is, listed in the unit's "uses" clause.)  Because of that, it really doesn't make much sense to use an elaborate structure to keep track of them, IOW, the total time spent adding and searching the units in a linear structure will often be as fast or faster than the total time used by a more sophisticated structure/algorithm.

Now, if one unit or program _directly_ uses 1000 units, (that is they are all in the same scope), then yes, the search is going to be slow but, if I saw a unit or program listing 1000 units in its "uses" clauses, I wouldn't even bother to read the code.

HTH.

There are a few units with around 100 uses. Most of the units are very compact with just a few units in the uses clause.
However, in the loaded_units list i have thousands of loaded units, which makes me think that this could be improved.
I didn't check how much time exactly is lost in this iterations, just see they are a lot.

I started the debugging of the code cause in Delphi the same program compiles for a minute or so, and here the compilation takes like 7 times more.
I try with quick compile, but sometimes I face issues with that. May be because I use the code of the trunk.
However, I will be happy to save more time on compilation and that's why I started to dig deeper in the compiler code and make notices about the potential issues.
As I am new to the code, it is possible to make wrong assumptions, that's why I write here in the forum.

To visualize better the issue I attached a screen shot. There I put a breakpoint in the addloadedunit procedure with hitcount 200 in the breakpoint properties.

I profile the compiler occasionally. There are seldom specific places where a lot of time is spent. It obviously also depends on the source code you are compiling. Compiling the MacOSAll unit (which contains almost only declarations and barely any code) gives a quite different result than compiling the compiler itself. Most performance gains until now were actually by adding prefetch intrinsics in places where we walk over large linked lists (but that only helps if the loop does a lot of work, so the prefetch has time to complete before the next element is fetched).

I was wandering if I can profile the compilation also. May be you use some tool and give me some advice how can I proceed with that.

1. TDictionary is new since last 3.2.0 last may
2. Compiler can only use RTL, not packages.

That is not a whole excuse, quite often scaled down container types are brought into the compiler when there is need.

IOW usually historic reasons.
Good to know.
I suppose that the TDictionary will be fast for searching but slow for adding and removing.
However, I was thinking about something like LinkedHashMap which will make the adding and removing very fast, also the searching very fast.
I see implementation of Dictionary in cclasses.pas which looks a little specific for concrete purpose. Looks like it locates symbols by name - TFPHashList.
I will check this TFPHashMap thoroughly, may be this could be a good foundation for LinkedHashMap.
As far as I know some kind of equivalent of LinkedHashMap in Delphi is TStringList. Need to check if it is fast enough on adding and removing single items. May be with beginUpdate is fast but for single item could be slower.

I'm planing to change the
loaded_units      : tlinkedlist; { All loaded units }
to
loaded_units      : TLinkedHasMap; { All loaded units }

However, I'm not deep in the code and don't feel so sure in my assumptions. That's one of the reasons I asked here if someone knows better and explain the reason why TLinkedList is used.
And also if someone knows some good LinkedHashMap equivalent for FPC
« Last Edit: March 15, 2021, 06:20:26 pm by daniel_sap »

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 899
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #6 on: March 15, 2021, 09:48:32 pm »
Quote
I was wandering if I can profile the compilation also. May be you use some tool and give me some advice how can I proceed with that.
I do this under macOS. I compile the compiler with debug information and then run it with the "instruments" tool with the "Time Profile" template. That temlpate samples the process every 1ms and generates a profile based on that.

ASBzone

  • Hero Member
  • *****
  • Posts: 614
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #7 on: March 15, 2021, 09:56:32 pm »
Quote
I was wandering if I can profile the compilation also. May be you use some tool and give me some advice how can I proceed with that.
I do this under macOS. I compile the compiler with debug information and then run it with the "instruments" tool with the "Time Profile" template. That temlpate samples the process every 1ms and generates a profile based on that.

@Jonas, is there any value to profiling the compiler on multiple OSes/platforms?
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.0.13 r64843 / FPC v3.2.1-r49055 (via FpcUpDeluxe) -- Windows 64-bit install w/Win32 and Linux/Arm cross-compiles
Primary System: Windows 10 Pro x64, Version 2009 (Build 19042)
Other Systems: Windows 10 Pro x64, Version 2009 (Build 19042) or greater

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 899
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #8 on: March 15, 2021, 10:02:18 pm »
Quote
I was wandering if I can profile the compilation also. May be you use some tool and give me some advice how can I proceed with that.
I do this under macOS. I compile the compiler with debug information and then run it with the "instruments" tool with the "Time Profile" template. That temlpate samples the process every 1ms and generates a profile based on that.

@Jonas, is there any value to profiling the compiler on multiple OSes/platforms?
There is some value in doing it on multiple architectures. Doing it on different OSes won't generally show much difference.

ASBzone

  • Hero Member
  • *****
  • Posts: 614
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #9 on: March 15, 2021, 10:04:31 pm »
@Jonas, is there any value to profiling the compiler on multiple OSes/platforms?
There is some value in doing it on multiple architectures. Doing it on different OSes won't generally show much difference.

Thanks, @Jonas
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.0.13 r64843 / FPC v3.2.1-r49055 (via FpcUpDeluxe) -- Windows 64-bit install w/Win32 and Linux/Arm cross-compiles
Primary System: Windows 10 Pro x64, Version 2009 (Build 19042)
Other Systems: Windows 10 Pro x64, Version 2009 (Build 19042) or greater

daniel_sap

  • New Member
  • *
  • Posts: 19
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #10 on: March 16, 2021, 11:24:56 am »
Thank you @Jonas for the info and @ASBzone for pointing to different platforms and architectures speeds.

I was wandering how you decide at the end if compilation is slow or fast.
Is there some criteria that can measure if compiler is in acceptable range of speed.

Because FAST is not exact definition.

We can say, for example:
- something is faster than the other
- no one complains about the speed
- i have an internal feeling that this is fast
- before it was much slower, now is faster
- everybody says it's fast
- fast enough is when there is no way to make it faster
- fast is because it's not slower than before

I profile the compiler occasionally. There are seldom specific places where a lot of time is spent.
I suppose you profile the compiler occasionally and checking if there is some slowness introduced by some new code.
« Last Edit: March 16, 2021, 11:48:42 am by daniel_sap »

Akira1364

  • Hero Member
  • *****
  • Posts: 543
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #11 on: March 22, 2021, 02:56:39 am »
I've long suspected that numerous parts of the compiler which currently use linked lists should really be using contiguous dynamic array (or at least get-memmed-pointer) based storage classes as opposed to linked lists, personally. It doesn't seem likely that it would default to linked lists in so many places if it were being written from scratch today, as the performance downside of linked lists are fairly well known nowadays.

daniel_sap

  • New Member
  • *
  • Posts: 19
Re: Why loaded units are TLinkedList and not LinkedDictionary
« Reply #12 on: March 26, 2021, 09:44:50 pm »
I've long suspected that numerous parts of the compiler which currently use linked lists should really be using contiguous dynamic array (or at least get-memmed-pointer) based storage classes as opposed to linked lists, personally. It doesn't seem likely that it would default to linked lists in so many places if it were being written from scratch today, as the performance downside of linked lists are fairly well known nowadays.

That's what I suspected too. However, I have some doubts, cause for example, for some architectures, may be, it could be better to have these linked lists.
For example where the memory is more important then the speed. In the past I had a few situations where there where no alternative than using LinkedLists.
However, it is possible to create a Class to handle "loaded units" differently for different architectures.

Here, I made a list of the code that uses loaded_units. It looks that most of the time it iterates the LinkedList to find some item by some id, name.
That's why I'm thinking about Dictionary(Map).
Also, a few dictionaries can be created, to serve the searching of items by unit_index, moduleId, name etc.
Another idea is to keep the items in the Dictionary and for removing the item not to remove it from the dictionary, just to mark it.
When you need to find an item by name, or id, it gets it from the dictionary and if it is marked it returns it. (this will help not to remove from dictionary, which is slow. some memory will be kept but this could be acceptable)
However, it looks good if it is set by default how loaded_units will be treated for different architectures, and some compiler option which will set user preferences for treating the loaded units.

In the following list all the uses of loaded_units is described.
There are 7 units where loaded_units is used. They are numbered inside round brackets (1), (2)...(7).
With minus sign the method in the unit is represented -addloadedunit. After it there is short description what it does in this method with the loaded_units

Code: Text  [Select][+][-]
  1. I. Uses of loaded_units
  2. ----------------
  3. (1) in fmodule.pas
  4. - addloadedunit
  5.   sets moduleid and adds the module in loaded_units
  6.  
  7. - find_module_from_symtable
  8.   iterates to find mudule in loaded_units by moduleid
  9.   Doesn't have to put in the name of the function the caller - 'from_symtable'
  10.  
  11. - get_module
  12.   iterates to find mudule in loaded_units by unit_index
  13.  
  14. - tmodule.updatemaps
  15.  
  16. - tmodule.resolve_unit
  17.   Iterates to resolve unit by name.
  18.  
  19. ----------------
  20. (2) in parser.pas
  21. - initparser
  22.   Create loaded_units
  23.  
  24. - doneparser
  25.   Free loaded_units
  26.  
  27. - compile
  28.   iterates all to remove all items except one
  29.  
  30. ----------------
  31. (3) in fppu.pas
  32. - tppumodule.reload_flagged_units
  33.   iteration
  34.   There is nice mechanism to check for generation needed ->defsgeneration.
  35.   However, this doesn't help if we have to iterate through all.
  36.  
  37. - registerunit
  38.   iteration
  39.  
  40. ----------------
  41. (3) in pmodules.pas
  42. - maybeloadvariantsunit
  43. iterate to search by name
  44. First searches for the unit in loaded_units.
  45. If not found searches again through all loaded_units. And finally adds it.
  46.  
  47. - MaybeRemoveResUnit
  48. iterate to search by name
  49. remove from list
  50.  
  51. - proc_package
  52. iterates all to do some checks with inner iterations (sequence not needed)
  53. iterates all to remove all unused units.(sequence not needed)
  54. iterates all to add implicitly imported and contains section units(sequence not needed)
  55.  
  56. - proc_program
  57. iterates all to remove(sequence not needed)
  58. iterates all to move to unloaded(sequence not needed)
  59. iterates all to remove(sequence not needed)
  60.  
  61. ----------------
  62. (4) in compiler/systems/t_win.pas
  63. - GlobalInitSysInitUnitName
  64. iterate (sequence not needed)
  65.  
  66. ----------------
  67. (5) in pkgutil.pas
  68. - processimportedsyms
  69. iterate to search by globalsymtable or localsymtable
  70. iterate to process imported syms
  71. iterate to process all symbols loaded by asm name.
  72.  
  73. ----------------
  74. (6) in compiler/systems/t_linux.pas
  75. - ModulesLinkToLibc
  76. iterate (sequence not needed)
  77.  
  78. ----------------
  79. (7) in compiler/systems/t_bsd.pas
  80. - ModulesLinkToLibc
  81. iterate (sequence not needed)
  82.  
« Last Edit: March 26, 2021, 09:53:56 pm by daniel_sap »

 

TinyPortal © 2005-2018