
Author Topic: Dumping preprocessed working trees  (Read 1323 times)

Чебурашка

  • Hero Member
  • *****
  • Posts: 593
  • СЛАВА УКРАЇНІ! / Slava Ukraïni!
Dumping preprocessed working trees
« on: February 08, 2026, 08:21:07 pm »
Hello,
I am painfully faced with legacy software, developed over many decades, that is in a state of high entropy. I would like to investigate whether I can reduce that entropy and improve the internal organization (or at least find out whether that can be done).

The software makes extensive use of a technique based on conditional compilation.

This method basically works as follows:

1. during compilation, I pass a -dMaster1 option to the compiler

2. this Master1 define is expanded into a set of derived defines, using a mapping materialized in a dedicated file:
Code: Pascal
{$IFDEF Master1}
   {$DEFINE _a1}
   {$DEFINE _a2}
   ..
{$ENDIF}
{$IFDEF Master2}
   {$DEFINE _a1}
   {$DEFINE _D2_s}
   ..
{$ENDIF}

3. In the various source modules, the enabling of the necessary code portions is regulated almost everywhere by the derived defines, except in a few cases where the Master1 define is also used directly:
Code: Pascal
{$IFDEF _a2}
    procedure RunMasterMotor();
    ...
{$ENDIF}

{$IFDEF _a2}
  RunMasterMotor();
{$ENDIF}

    The problem, of course, is that the code has become very difficult to read because of these countless {$IFDEF ...}.

    I would like to find a way to obtain the source code that results from applying one master define at compile time. I could then repeat the process for all existing master defines (there are about 150) and see how much the sources differ from each other as the master define varies, to check whether I can build macro groups that share the same (or very similar) code.

    I had thought of hooking into the compilation process, essentially stopping before assembly generation, and converting the parsed structures in memory (stripped of the code parts excluded by the preprocessor) back into source files in separate directories.

    Does anyone have any suggestions?
    Thank you

    FPC 3.2.0/Lazarus 2.0.10+dfsg-4+b2 on Debian 11.5
    FPC 3.2.2/Lazarus 2.2.0 on Windows 10 Pro 21H2

    440bx

    • Hero Member
    • *****
    • Posts: 6159
    Re: Dumping preprocessed working trees
    « Reply #1 on: February 08, 2026, 08:56:51 pm »
    If I faced the problem you described, I would solve it as follows:

    Step 1.

    Write a scanner that tokenizes the source code.  The important part here is that I wouldn't try to interpret the {$ifdef ...} in this pass.  I'd simply build an array (or set of tables, your choice of implementation) consisting of all the tokens, without any loss of information.  IOW, unlike a typical scanner that throws away stuff such as comments, this scanner would throw away absolutely nothing!

    Step 2.

    Write a customized parser that uses the set of tables produced in step 1 to (a) determine all the possible $ifdef-based paths (you need that information) and (b) allow you to select a path based on a specific {$ifdef (one or more or all of them) and produce a textual output for that path.  Part (b) could handle a single path at a time (which keeps the code quite simple) or more than one path at a time.  One text file would be produced for each individual path selected.

    This may sound like a lot of work but it actually isn't.  The scanner part is a typical Pascal scanner and could easily be derived from FPC's scanner.  The difference is that the scanner would not be subservient to the parser (IOW, it isn't the parser that pulls tokens); the scanner is independent, and the parser gets to run only once all of the scanning is done and the token tables have been produced.

    The parser portion could be greatly simplified because the only part the parser really cares about is interpreting the {$ifdef ...} and identifying which code belongs to a specific {$ifdef ...}.  IOW, it doesn't really have to parse Pascal statements.

    Having the scanner produce an array of all the tokens, without interpreting them, makes it very easy to test.  Once the scanning is done, if no information whatsoever has been discarded and the scanner is correct, then a subsequent step should be able to reproduce an exact duplicate of the scanned source code.  This doesn't guarantee the parsing step is bug free, but it makes writing verification code quite a bit simpler.
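    The lossless-scanner idea and its round-trip check could be sketched roughly as follows (Python used for brevity; a real tool would more likely be Pascal, as discussed later in the thread). Everything here is an illustration, not FPC's actual scanner, and it deliberately ignores complications such as (* *) comments and string literals containing '{$'.

```python
import re

# Split Pascal source into compiler-directive tokens ({$...}) and verbatim
# text chunks, discarding nothing, so that joining all tokens reproduces
# the input byte-for-byte (the verification property described above).
DIRECTIVE = re.compile(r'\{\$[^}]*\}')

def tokenize(source: str):
    tokens, pos = [], 0
    for m in DIRECTIVE.finditer(source):
        if m.start() > pos:                       # verbatim text before the directive
            tokens.append(('text', source[pos:m.start()]))
        tokens.append(('directive', m.group(0)))  # e.g. {$IFDEF _a2}
        pos = m.end()
    if pos < len(source):
        tokens.append(('text', source[pos:]))
    return tokens

src = "{$IFDEF _a2}\nprocedure RunMasterMotor();\n{$ENDIF}\n"
toks = tokenize(src)
assert ''.join(t for _, t in toks) == src         # lossless round trip
```

    A later pass would then walk only the 'directive' tokens to evaluate conditions and decide which 'text' chunks survive.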

    HTH.
    FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

    Чебурашка

    • Hero Member
    • *****
    • Posts: 593
    • СЛАВА УКРАЇНІ! / Slava Ukraïni!
    Re: Dumping preprocessed working trees
    « Reply #2 on: February 08, 2026, 09:09:35 pm »
    Quote from: 440bx on February 08, 2026, 08:56:51 pm
        If I faced the problem you described, I would solve it as follows:


    First, thank you for the answer. I will think some more about this, because it could be a way that deserves exploration and could already give significant insights.

    However, the main pain I foresee in this proposal comes from the fact that in many cases the "entropic" code in question does not use {$IFDEF} as the preprocessor directive, but {$IF ...}, and the latter makes things more complicated due to its support for complete boolean expressions in the condition part.
    FPC 3.2.0/Lazarus 2.0.10+dfsg-4+b2 on Debian 11.5
    FPC 3.2.2/Lazarus 2.2.0 on Windows 10 Pro 21H2

    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #3 on: February 08, 2026, 09:09:58 pm »
    Well, you can compile them, then use a tool like objdump to extract line info from the exe (if you compiled with debug info). That would give you a list of all used lines in each file. (iirc "objdump --dwarf=decodedline")

    It is important to compile without any optimization: -O-
    Even -O1 will remove some lines...

    (also ensure: no inlining / if needed, put a {$inline off} into each file)

    You still have some work in front of you: you need to parse the output of objdump (rather large files) and then process all your source files to keep only the lines that were used (blank out, remove or comment the others).

    ----

    The advantage is, you use the compiler to do the parsing => so you get exactly the result that also goes into each exe.
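    The objdump post-processing step might look roughly like this (Python sketch). The row layout assumed here, "file  line  address", is an assumption: the exact columns of --dwarf=decodedline output vary between binutils versions, so check a real dump first.

```python
import re
from collections import defaultdict

# Collect, per source file, the set of line numbers that carry code,
# from the decoded line table printed by objdump.
ROW = re.compile(r'^(\S+)\s+(\d+)\s+0x[0-9a-fA-F]+')

def used_lines(objdump_output: str):
    used = defaultdict(set)
    for raw in objdump_output.splitlines():
        m = ROW.match(raw)
        if m:
            used[m.group(1)].add(int(m.group(2)))
    return used

# Fabricated sample in the assumed format, for illustration only:
sample = """\
unit1.pas      12    0x401000
unit1.pas      13    0x401004
unit2.pas       7    0x401020
"""
print(used_lines(sample))
```

    The resulting per-file line sets would then drive the pass that blanks out, removes or comments the unused lines.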
    « Last Edit: February 08, 2026, 09:11:47 pm by Martin_fr »

    Чебурашка

    • Hero Member
    • *****
    • Posts: 593
    • СЛАВА УКРАЇНІ! / Slava Ukraïni!
    Re: Dumping preprocessed working trees
    « Reply #4 on: February 08, 2026, 09:11:23 pm »
    Quote from: Martin_fr on February 08, 2026, 09:09:58 pm
        Well, you can compile them, then use a tool like objdump to extract line info from the exe (if you compiled with debug info). That would give you a list of all used lines in each file. (iirc "objdump --dwarf=decodedline")

        It is important to compile without any optimization: -O-
        Even -O1 will remove some lines...

        You still have some work in front of you: you need to parse the output of objdump (rather large files) and then process all your source files to keep only the lines that were used (blank out, remove or comment the others).

        ----

        The advantage is, you use the compiler to do the parsing => so you get exactly the result that also goes into each exe.

    This seems promising. Thank you too.
    FPC 3.2.0/Lazarus 2.0.10+dfsg-4+b2 on Debian 11.5
    FPC 3.2.2/Lazarus 2.2.0 on Windows 10 Pro 21H2

    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #5 on: February 08, 2026, 09:15:03 pm »
    Just had a 2nd similar idea.

    -al

    compile to assembler output. IIRC that has the source lines included, with a specific comment prefix.
    You can then filter the file based on that comment.

    But I'm not sure about the procedure (header) lines themselves.



    Also, both approaches work well for code lines, but not for type/var sections that don't produce code. Those have no line info and won't appear in the asm (or the line info).



    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #6 on: February 08, 2026, 09:17:57 pm »
    Alternative: You can pre-process the sources.

    Put a {$warning this is line xyz} after each {$IF}.

    Then, when you compile, collect the warnings, and you know what was compiled.
    (and that works for var/type blocks too)
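    The collection step could look roughly like this (Python sketch). The message format assumed here, "file(line,col) Warning: User defined: text", should be verified against your compiler's actual output before relying on it.

```python
import re

# Pull every user-defined warning out of the compiler's output,
# assuming the "file(line,col) Warning: User defined: text" format.
WARN = re.compile(r'^(\S+)\((\d+),\d+\)\s+Warning:\s+User defined:\s+(.*)$')

def compiled_markers(compiler_output: str):
    hits = []
    for line in compiler_output.splitlines():
        m = WARN.match(line)
        if m:
            # (file, line, marker text) for each {$warning ...} that fired
            hits.append((m.group(1), int(m.group(2)), m.group(3)))
    return hits

sample = "unit1.pas(12,1) Warning: User defined: this is line 12\n"
assert compiled_markers(sample) == [('unit1.pas', 12, 'this is line 12')]
```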

    Чебурашка

    • Hero Member
    • *****
    • Posts: 593
    • СЛАВА УКРАЇНІ! / Slava Ukraïni!
    Re: Dumping preprocessed working trees
    « Reply #7 on: February 08, 2026, 09:23:04 pm »
    Quote from: Martin_fr on February 08, 2026, 09:17:57 pm
        Alternative: You can pre-process the sources.

        Put a {$warning this is line xyz} after each {$IF}.

        Then, when you compile, collect the warnings, and you know what was compiled.
        (and that works for var/type blocks too)

    I grepped the source tree and it says {$IF + {$IFDEF = 31462.
    Perhaps an automatic numbering right after each conditional block might be viable.
    I have one more strategy now.
    FPC 3.2.0/Lazarus 2.0.10+dfsg-4+b2 on Debian 11.5
    FPC 3.2.2/Lazarus 2.2.0 on Windows 10 Pro 21H2

    440bx

    • Hero Member
    • *****
    • Posts: 6159
    Re: Dumping preprocessed working trees
    « Reply #8 on: February 08, 2026, 10:15:04 pm »
    Quote from: Чебурашка on February 08, 2026, 09:09:35 pm
        However, the main pain I foresee in this proposal comes from the fact that in many cases the "entropic" code in question does not use {$IFDEF} as the preprocessor directive, but {$IF ...}, and the latter makes things more complicated due to its support for complete boolean expressions in the condition part.
    That would require interpreting the boolean expressions, which obviously requires some work on your part.  OTOH, having a piece of code that tokenizes source and interprets compiler directives can be quite useful for other things too.

    All this said, Martin_fr offered some interesting suggestions... using the debug info might be a practical way of getting a list of the source lines of interest.
    FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #9 on: February 08, 2026, 10:46:50 pm »
    If he can modify the code, then the {$warning} might be the best idea. It gives him a yes/no for each and every $IF that he has.

    A script to add them shouldn't be too hard, at least if there are no nested comments in the directives.

    And, for removing code, once he has the lines from the compiler output, he doesn't even need to match the if/else/endif. He just copies up to the next {$warning} that wasn't in the output.


    Obviously, make a copy or backup first. Or best, if not yet done, put the whole thing under revision control (e.g. git, even just locally).



    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #10 on: February 08, 2026, 10:52:23 pm »
    Btw, given the apparent size of the code base, and that there are over 100 variations...

    My suggestion would be:
    - don't strip out unused code
    - instead, annotate the existing code.

    For example, run the {$warning} pass with MASTER1.
    Then, after each IF/IFDEF/ELSE, start a comment.
    Depending on whether MASTER1 produced a warning for it or not, add // MASTER1+ or // MASTER1- as part of the comment.

    Same for all others.

    In the end, after each IF/IFDEF/ELSE there will be a comment listing all the defines under which the block starting there gets compiled.

    And it is easy to search for MASTERn+ or MASTERn-.
    It would even be possible to rewrite all the IFs to use the MASTER defines directly, instead of all the sub-defines.

    Just an idea.
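    The bookkeeping behind that annotation idea might be sketched like this (hypothetical helper, Python): given, for each master define, the set of marker ids whose {$warning} fired, build the MASTERn+/MASTERn- comment for one conditional block. The data shapes are illustrative assumptions, not part of any existing tool.

```python
# For one conditional block (identified by its marker id), list every
# master define with '+' if the block was compiled under it, '-' if not.
def annotation(marker_id, fired_by_master):
    parts = []
    for master, fired in sorted(fired_by_master.items()):
        parts.append(master + ('+' if marker_id in fired else '-'))
    return '// ' + ' '.join(parts)

# Fabricated example: under MASTER1 markers 1 and 2 fired, under MASTER2 only 2.
fired = {'MASTER1': {1, 2}, 'MASTER2': {2}}
print(annotation(1, fired))   # // MASTER1+ MASTER2-
```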

    440bx

    • Hero Member
    • *****
    • Posts: 6159
    Re: Dumping preprocessed working trees
    « Reply #11 on: February 08, 2026, 11:37:20 pm »
    Quote from: Martin_fr on February 08, 2026, 10:46:50 pm
        If he can modify the code, then the {$warning} might be the best idea. It gives him the yes/no for each and every $IF that he has.
    A macro could probably be used to implement that.  Something that would add {$warning this is line xxx in file yyy} after each {$if...} and {$ifdef...}

    I don't remember if there is a SynEdit macro function that returns the file name, but even if there isn't, it wouldn't be too onerous to manually add a comment containing the file name for the macro to read.  That said, if all the files are in the same directory, it would be trivial to write a little FPC program to do it (depending on how many files are involved, it could be worth it).
    FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

    Martin_fr

    • Administrator
    • Hero Member
    • *
    • Posts: 12209
    • Debugger - SynEdit - and more
      • wiki
    Re: Dumping preprocessed working trees
    « Reply #12 on: February 09, 2026, 12:04:55 am »
    Quote from: 440bx on February 08, 2026, 11:37:20 pm
        A macro could probably be used to implement that.  Something that would add {$warning this is line xxx in file yyy} after each {$if...} and {$ifdef...}

    It should be possible with a regexp replace, though not necessarily with the regex flavour supported by SynEdit.

    Personally, I would use a Perl script for that (a bit older tech, but quick and easy), though I guess other scripting languages will do too. The advantage is that they can iterate over all the files.
    Or write a small Pascal app that iterates over all files and uses the regexp unit from FPC (or the updated version from GitHub, which is rather powerful).

    Something along the lines of (multiline pattern):
    Code: Text
    \{\$(IF|IFDEF|ELSE|ELSEIF)\b[^}]*\}
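    One way to apply that pattern (Python shown here instead of Perl; the regex itself is unchanged, with IGNORECASE added since FPC directives are case-insensitive): number every IF/IFDEF/ELSE/ELSEIF and append a {$warning} marker right after it, so the compiler output reveals which branches were compiled. The marker wording is arbitrary.

```python
import re

# The pattern from the post above; {$ENDIF} is intentionally not matched,
# since only opening/alternative branches need a marker.
PATTERN = re.compile(r'\{\$(IF|IFDEF|ELSE|ELSEIF)\b[^}]*\}', re.IGNORECASE)

def add_markers(source: str) -> str:
    counter = 0
    def repl(m):
        nonlocal counter
        counter += 1
        # Keep the directive, append a numbered user-defined warning.
        return m.group(0) + ' {$warning mark %d}' % counter
    return PATTERN.sub(repl, source)

src = "{$IFDEF _a2}\nRunMasterMotor();\n{$ENDIF}\n"
print(add_markers(src))
```

    A real run would apply this over every file in the tree (after a backup, or with the tree under revision control, as advised above).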

    440bx

    • Hero Member
    • *****
    • Posts: 6159
    Re: Dumping preprocessed working trees
    « Reply #13 on: February 09, 2026, 01:59:14 am »
    Quote from: Martin_fr on February 09, 2026, 12:04:55 am
        Or write a small Pascal app, that iterates all files, ...
    That would probably be my choice, have some Pascal fun along the way :)
    FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

    MarkMLl

    • Hero Member
    • *****
    • Posts: 8551
    Re: Dumping preprocessed working trees
    « Reply #14 on: February 09, 2026, 09:29:13 am »
    Quote from: Martin_fr on February 08, 2026, 09:15:03 pm
        compile to assembler output. IIRC that has the source lines included, with a specific comment prefix.
        You can then filter the file based on that comment.

    Something like that would be my chosen approach: rely on the compiler as far as possible.

    I'd particularly caution against your later suggestion of using Perl. There are members of this community for whom Perl is such a dirty word that any suggestion it was being used would guarantee their non-cooperation.

    MarkMLl
    MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
    Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
    Pet hate: people who boast about the size and sophistication of their computer.
    GitHub repositories: https://github.com/MarkMLl?tab=repositories

     
