
Topic: Reviewing program binary (Read 2122 times)

MarkMLl

  • Hero Member
  • Posts: 6676
Reviewing program binary
« on: December 04, 2020, 12:51:43 pm »
I put a program written in FPC up on Github a few days ago... it's for a specific problem domain and what it does is probably of no interest to the FPC community. I find myself with two retrospective questions:

At present it doesn't present anything useful in response to --version. Reviewing the binary, I notice the string FPC 2.6.4 [2015/08/20] for x86_64 - Linux towards the end... is this string, verbatim, accessible in e.g. system so it can be displayed as part of help/version output? I'm not interested in reconstructing it in my program, I want this specific string, including compiler date, i.e. what somebody would see if he waded in with a binary debugger.

I'm a bit concerned about binary size. The runtimes were built with -CX in fpc.cfg, nothing is imported into the single source file other than Classes and SysUtils, the project options include -CX and -XX, but even after stripping it's still 300+K.

Damningly, it's full of literal text identifying itself as being from rtlconsts and sysconst... why isn't this being chopped?

I've tried various versions of FPC between 2.6.4 and 3.0.4, and the end result is pretty similar. Now if it were just stuff for my own use I wouldn't be particularly bothered, but if I ever have to answer questions on Github from somebody trying to build the binaries himself... well, it's embarrassing.

MarkMLl


MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • Posts: 5446
  • Compiler Developer
Re: Reviewing program binary
« Reply #1 on: December 04, 2020, 02:47:59 pm »
Quote
At present it doesn't present anything useful in response to --version. Reviewing the binary, I notice the string FPC 2.6.4 [2015/08/20] for x86_64 - Linux towards the end... is this string, verbatim, accessible in e.g. system so it can be displayed as part of help/version output? I'm not interested in reconstructing it in my program, I want this specific string, including compiler date, i.e. what somebody would see if he waded in with a binary debugger.

This string is an implementation detail and is intended merely to identify the generated binary.

You're better off using the various {$I %xxx%} directives to build this string manually:

Code: Pascal
begin
  Writeln('FPC ' + {$I %FPCVERSION%} + ' [' + {$I %FPCDATE%} + '] for ' + {$I %FPCTARGETCPU%} + ' - ' + {$I %FPCTARGETOS%});
end.

That said, if you really want to access that builtin string and are aware that this might change with any new version you can do it like this (Note: this must be in the main program):

Code: Pascal
var
  _FPCIdent: record end external name '__fpc_ident';
  FPCIdent: PChar = @_FPCIdent;

begin
  Writeln(FPCIdent);
end.
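
For the original --version question, a minimal sketch of a handler built on the {$I %xxx%} approach might look like this. The program name (ProgName), the extra compile date via {$I %DATE%} and the output layout are illustrative assumptions, not anything from the program being discussed:

```pascal
program versiondemo;
{$mode objfpc}{$H+}

const
  ProgName = 'versiondemo';  // hypothetical program name, for illustration only

// Rebuilds an identification string of the same shape as the one embedded
// in the binary, using compile-time {$I %...%} substitutions.
function CompilerIdent: string;
begin
  Result := 'FPC ' + {$I %FPCVERSION%} + ' [' + {$I %FPCDATE%} + '] for ' +
    {$I %FPCTARGETCPU%} + ' - ' + {$I %FPCTARGETOS%};
end;

var
  i: Integer;
begin
  // Scan the command line for --version; print the banner and stop if found.
  for i := 1 to ParamCount do
    if ParamStr(i) = '--version' then
    begin
      WriteLn(ProgName, ' (built ', {$I %DATE%}, ') ', CompilerIdent);
      Halt(0);
    end;
  WriteLn('Usage: ', ProgName, ' [--version]');
end.
```

Because every piece is substituted at compile time, the banner reflects whichever compiler actually built the binary, which is the point of the exercise if somebody else rebuilds it.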

Quote
I'm a bit concerned about binary size. The runtimes were built with -CX in fpc.cfg, nothing is imported into the single source file other than Classes and SysUtils, the project options include -CX and -XX, but even after stripping it's still 300+K.

The SysUtils and Classes units simply add a certain amount of size because they have initialization and finalization sections. Take the following sizes for an empty program with various units used (i386-win32):

Code:
 33280 tempty.exe ; just "begin end."
 33280 tempty-objfpc.exe ; added "{$mode objfpc}" which adds unit "objpas"
 89088 tempty-sysutils.exe ; added "uses SysUtils"
189952 tempty-classes.exe ; added "uses Classes" (Classes uses SysUtils)

Quote
Damningly, it's full of literal text identifying itself as being from rtlconsts and sysconst... why isn't this being chopped?

Because they are used. These are simply parts that are directly or indirectly referenced due to the initialization sections of SysUtils and (mainly) Classes.

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #2 on: December 04, 2020, 05:41:57 pm »
Quote
You're better off using the various {$I %xxx%} directives to build this string manually:

Thanks, both methods noted. TBH given a choice I'd be happier using the "naughty" way: either it will work or it won't work, and there will be no messing around trying to find out exactly what versions of the compiler document their support for the predefines. In practice (using grep) I see that the predefines you've given all go back at least as far as 2.6.4, while the "naughty" way appears to have come in with 3.2.0.

Turning now to binary sizes.

Quote
The SysUtils and Classes units simply add a certain amount of size because they have initialization and finalization sections. Take the following sizes for an empty program with various units used (i386-win32):

Code:
 33280 tempty.exe ; just "begin end."
 33280 tempty-objfpc.exe ; added "{$mode objfpc}" which adds unit "objpas"
 89088 tempty-sysutils.exe ; added "uses SysUtils"
189952 tempty-classes.exe ; added "uses Classes" (Classes uses SysUtils)

OK, so if we take a "cogito ergo sum" that looks like this:

Code: Pascal
program tempty;

begin
end.

Compiled using 3.0.4 on x86_64 with default options, that results in a binary of 210,376 bytes. With -CX and -XX that drops to 34,680, which is "near as damnit" to yours.

With $mode objfpc I get 34,712.

Adding SysUtils I get 105,528. Removing SysUtils and adding Classes I get 230,888 which is still not unreasonably unlike yours. But I have to conclude that either "smart linking" doesn't exactly live up to its name, or there's something sufficiently convoluted in the RTL/FCL that it's unable to cope.

But it's obvious that if a fairly simple parser comes out much larger than this, then I need to look at what's going on... if I build it from the command line it's 273,320 but Lazarus gives me 744,880 even after stripping it manually.

I'll report back on that one.

Quote
Damningly, it's full of literal text identifying itself as being from rtlconsts and sysconst... why isn't this being chopped?

Because they are used. These are simply parts that are directly or indirectly referenced due to the initialization sections of SysUtils and (mainly) Classes.

Not by me they're not. And while having that stuff in there arguably makes FPC no worse than say an 80s-style "4GL" it makes it very difficult to position it as a viable competitor to general-purpose languages.

MarkMLl
« Last Edit: December 04, 2020, 07:29:48 pm by MarkMLl »

lucamar

  • Hero Member
  • Posts: 4219
Re: Reviewing program binary
« Reply #3 on: December 04, 2020, 10:42:10 pm »
Quote
But I have to conclude that either "smart linking" doesn't exactly live up to its name, [...]

Smart-linking does what it's supposed to do: it links in only those functions that are (however remotely) used in the code, but it doesn't deal at all with constants, etc.

The problem here is not only that the initialization/finalization sections of some RTL units (the, let's say, "base" ones, at that) use some code which, from your point of view (or anyone's, for that matter), does nothing for your program, but also that they use a fair amount of constants, mostly to initialize variables which, depending on the program, might never be used (format settings, day and month names, etc.) and which take their fair share of space in the binary.

The only thing you can do about it is what some small-systems programmers do: build your own tailored system (and whatever others) unit.

But IMHO that's not very important, because one characteristic of Pascal programs (generally speaking) is that while they start relatively "fat", they grow comparatively slowly. On the other hand C programs (for example) can start rather lean (a few tens of KiB) but grow very quickly with every feature you add, because each little thing you add frequently needs yet another #include "somelib" for some functionality. So both programs (Pascal/C) end up at around the same weight. :)

Which is why C programmers in the aforementioned small-systems never "printf" anything and, if they do, they use their own "just what I need and no more" printf. In other words: stripped down system unit ;)
« Last Edit: December 04, 2020, 10:46:14 pm by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #4 on: December 05, 2020, 10:15:01 am »
Leave this one with me for a little while. The binary I started off with (even after being stripped) has lots of stuff like this embedded in it:


...
Points
rtlconsts.spointsdescription
Rods
rtlconsts.srodsdescription
Yards
rtlconsts.syardsdescription
Acres
rtlconsts.sacresdescription
Area
rtlconsts.sareadescription
Ares
rtlconsts.saresdescription
...


for no obvious reason, which is what I found particularly irritating. I'll try to work out what's going on.

MarkMLl

PascalDragon

  • Hero Member
  • Posts: 5446
  • Compiler Developer
Re: Reviewing program binary
« Reply #5 on: December 05, 2020, 10:42:49 am »
Quote
You're better off using the various {$I %xxx%} directives to build this string manually:

Thanks, both methods noted. TBH given a choice I'd be happier using the "naughty" way: either it will work or it won't work, and there will be no messing around trying to find out exactly what versions of the compiler document their support for the predefines. In practice (using grep) I see that the predefines you've given all go back at least as far as 2.6.4, while the "naughty" way appears to have come in with 3.2.0.

The data itself has existed for ages, but the __fpc_ident symbol was only added in light of the introduction of the LLVM backend.
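
Since the symbol only exists in newer compilers, one way to get the best of both is to choose the method at compile time. This is a minimal sketch, assuming __fpc_ident is present from FPC 3.2.0 onwards (as observed above, and liable to change at any time) and using a hypothetical define USE_FPC_IDENT so that the portable {$I %xxx%} path stays the default:

```pascal
program identdemo;
{$mode objfpc}{$H+}

{$if defined(USE_FPC_IDENT) and (FPC_FULLVERSION >= 30200)}
// Opt-in only (compile with -dUSE_FPC_IDENT; the define name is hypothetical).
// __fpc_ident is an RTL implementation detail and may change in any release.
// As noted above, this declaration has to live in the main program.
var
  _FPCIdent: record end external name '__fpc_ident';
{$endif}

function CompilerIdent: string;
begin
{$if defined(USE_FPC_IDENT) and (FPC_FULLVERSION >= 30200)}
  // The embedded data is treated as a null-terminated string and copied.
  Result := PChar(@_FPCIdent);
{$else}
  // Portable fallback: rebuild the same text from compile-time predefines.
  Result := 'FPC ' + {$I %FPCVERSION%} + ' [' + {$I %FPCDATE%} + '] for ' +
    {$I %FPCTARGETCPU%} + ' - ' + {$I %FPCTARGETOS%};
{$endif}
end;

begin
  WriteLn(CompilerIdent);
end.
```

FPC_FULLVERSION has been predefined since well before 2.6.4, so older compilers simply skip the first branch.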

Quote
Adding SysUtils I get 105,528. Removing SysUtils and adding Classes I get 230,888 which is still not unreasonably unlike yours. But I have to conclude that either "smart linking" doesn't exactly live up to its name, or there's something sufficiently convoluted in the RTL/FCL that it's unable to cope.

Using the Classes unit will also indirectly use the SysUtils unit.

Also smart linking is working exactly as it should. The “problem” is that both Classes and SysUtils have initialization and finalization sections, which in turn cause quite a lot of code to be linked in as well (e.g. the complete TComponent class even if it's not used).

Quote
Damningly, it's full of literal text identifying itself as being from rtlconsts and sysconst... why isn't this being chopped?

Because they are used. These are simply parts that are directly or indirectly referenced due to the initialization sections of SysUtils and (mainly) Classes.

Not by me they're not.

Yes, they are, because you're simply using the units. As soon as a unit has an initialization or finalization section (in this case Classes and SysUtils) it must be used by the compiler and that includes linking in all code that might be required from there.
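
To see why an initialization section has this effect, a single-file sketch of the pattern may help. Since a program can't have an initialization section of its own, the procedure UnitInitialization below is a hypothetical stand-in for one, and the resourcestrings are stand-ins for rtlconsts entries:

```pascal
program keepalive;
{$mode objfpc}{$H+}

resourcestring
  // Hypothetical stand-ins for rtlconsts entries such as SFurlongsDescription.
  SFurlongs = 'Furlongs';
  SRods = 'Rods';

var
  Descriptions: array of string;

procedure RegisterDescription(const S: string);
begin
  SetLength(Descriptions, Length(Descriptions) + 1);
  Descriptions[High(Descriptions)] := S;
end;

// Plays the role of a unit's initialization section: it always runs at
// startup, so everything it references (RegisterDescription, SFurlongs,
// SRods) is reachable and has to be linked in, even though the rest of
// the program never looks at Descriptions.
procedure UnitInitialization;
begin
  RegisterDescription(SFurlongs);
  RegisterDescription(SRods);
end;

begin
  UnitInitialization;
  // The "real" program would start here and does nothing with Descriptions.
end.
```

A smart linker can only discard what is unreachable; an unconditional call at startup makes its whole reference chain reachable, whether or not your own code ever uses the result.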

Quote
for no obvious reason, which is what I found particularly irritating. I'll try to work out what's going on.

You can use -Xm to have the compiler generate a map file that contains information about which sections are referenced (at least the first reference that triggers the section to be used). You can find out this way what chain leads to the use of the resource strings.
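
The map file is plain text, so grep is the obvious tool. For completeness, here is a minimal Pascal sketch that lists the lines of a map file mentioning a given name (e.g. rtlconsts or sfurlongsdescription), case-insensitively. The file name and the exact layout of what -Xm produces are assumptions here and may differ by target and linker:

```pascal
program mapgrep;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Case-insensitive substring test used to select map-file lines.
function LineMentions(const Line, Needle: string): Boolean;
begin
  Result := Pos(LowerCase(Needle), LowerCase(Line)) > 0;
end;

// Prints every line of FileName that mentions Needle, with its line number.
procedure SearchMap(const FileName, Needle: string);
var
  MapFile: TextFile;
  Line: string;
  LineNo: Integer;
begin
  AssignFile(MapFile, FileName);
  Reset(MapFile);
  try
    LineNo := 0;
    while not Eof(MapFile) do
    begin
      ReadLn(MapFile, Line);
      Inc(LineNo);
      if LineMentions(Line, Needle) then
        WriteLn(LineNo, ': ', Line);
    end;
  finally
    CloseFile(MapFile);
  end;
end;

begin
  if ParamCount >= 2 then
    SearchMap(ParamStr(1), ParamStr(2))
  else
    WriteLn('Usage: mapgrep <mapfile> <text>');
end.
```

Looking at the lines just before a hit is what should show which object or section first pulled the symbol in.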

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #6 on: December 05, 2020, 11:12:58 am »
Quote
The data itself has existed for ages, but the __fpc_ident symbol was only added in light of the introduction of the LLVM backend.

Thanks, noted. However for me there appears to be a "sweet spot" around FPC 3.0.4 and Lazarus 2.0.6: the (by default) break of backward compatibility in Lazarus 2.0.8 is a killer from my POV, and now that I can't even run Lazarus trunk built with 3.2.0 I think the writing's on the wall.

TBH I've long felt that Lazarus 1.0 and FPC 3.0 should have been synchronised as a long-term supported release, rather than muddying the water with things like OPM (in Lazarus) and generics in FPC.

Quote
Adding SysUtils I get 105,528. Removing SysUtils and adding Classes I get 230,888 which is still not unreasonably unlike yours. But I have to conclude that either "smart linking" doesn't exactly live up to its name, or there's something sufficiently convoluted in the RTL/FCL that it's unable to cope.

Using the Classes unit will also indirectly use the SysUtils unit.

Yes, I got that, which was why I removed SysUtils; I assumed that was what you'd done when you produced your sizes.

Quote
Also smart linking is working exactly as it should. The “problem” is that both Classes and SysUtils have initialization and finalization sections, which in turn cause quite a lot of code to be linked in as well (e.g. the complete TComponent class even if it's not used).

Quote
Damningly, it's full of literal text identifying itself as being from rtlconsts and sysconst... why isn't this being chopped?

Because they are used. These are simply parts that are directly or indirectly referenced due to the initialization sections of SysUtils and (mainly) Classes.

Not by me they're not.

Yes, they are, because you're simply using the units. As soon as a unit has an initialization or finalization section (in this case Classes and SysUtils) it must be used by the compiler and that includes linking in all code that might be required from there.

for no obvious reason, which is what I found particularly irritating. I'll try to work out what's going on.

You can use -Xm to have the compiler generate a map file that contains information about which sections are referenced (at least the first reference that triggers the section to be used). You can find out this way what chain leads to the use of the resource strings.

Those strings I C&Ped earlier do not appear in the minimal programs I ran off yesterday, and the program that's suddenly sprouted them does nothing out of the ordinary. But it's clear that /something/ I've put into it has caused them to erupt... I'll try to work out what later, but unlike the error messages etc. visible when a program has SysUtils/Classes, they're very public cruft that looks bad to anybody inclined to look.

MarkMLl

PascalDragon

  • Hero Member
  • Posts: 5446
  • Compiler Developer
Re: Reviewing program binary
« Reply #7 on: December 05, 2020, 04:37:36 pm »
Quote
The data itself has existed for ages, but the __fpc_ident symbol was only added in light of the introduction of the LLVM backend.

Thanks, noted. However for me there appears to be a "sweet spot" around FPC 3.0.4 and Lazarus 2.0.6: the (by default) break of backward compatibility in Lazarus 2.0.8 is a killer from my POV, and now that I can't even run Lazarus trunk built with 3.2.0 I think the writing's on the wall.

I don't know what you're doing; I run Lazarus trunk with FPC 3.2.0 without any problems.

Quote
TBH I've long felt that Lazarus 1.0 and FPC 3.0 should have been synchronised as a long-term supported release, rather than muddying the water with things like OPM (in Lazarus) and generics in FPC.

Generics have been in FPC for a long time already. They're just getting more of a workout now.

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #8 on: December 05, 2020, 05:50:14 pm »
Quote
I don't know what you're doing; I run Lazarus trunk with FPC 3.2.0 without any problems.

As I've said before, there's something wrong with the fppkg initialisation when the IDE starts up. A couple of other people have tried to duplicate it with limited success, so it's obviously down to either my filesystem layout or directory/file access rights; however, older versions are OK and by now I'm quite simply past caring enough to fight another battle. Sorry.

MarkMLl

marcov

  • Administrator
  • Hero Member
  • Posts: 11383
  • FPC developer.
Re: Reviewing program binary
« Reply #9 on: December 05, 2020, 08:43:00 pm »
More likely to be a matter of old state than the used source.

On *nix, I assume fppkg state is in .fppkg?

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #10 on: December 05, 2020, 11:15:34 pm »
I'm pretty sure I wiped all configuration files but will try again when I have time. When I raised this a couple of weeks ago nobody could say what was actually being looked for... I could see the bit in the IDE setup source which was generating the message and it had a compiler-version conditional around it.

MarkMLl

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Reviewing program binary
« Reply #11 on: December 06, 2020, 11:47:41 am »
TL;DR What I was seeing was caused by the Lazarus IDE's debugging option breaking smartlinking, which left strings imported with Classes in the final binary.

Quote
Those strings I C&Ped earlier do not appear in the minimal programs I ran off yesterday, and the program that's suddenly sprouted them does nothing out of the ordinary. But it's clear that /something/ I've put into it has caused them to erupt... I'll try to work out what later, but unlike the error messages etc. visible when a program has SysUtils/Classes, they're very public cruft that looks bad to anybody inclined to look.

There appear to be two things here.

The first is that the default debugging format used by the Lazarus IDE implies FPC's -g option; this results in a warning which is visible if run from the command line but is lost by the IDE. Quoting selectively:


$ fpc tempty-sysutils

$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 517440 Dec  6 10:12 tempty-sysutils

$ strip tempty-sysutils
$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 517440 Dec  6 10:12 tempty-sysutils

$ fpc -CX -XX tempty-sysutils

$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 105528 Dec  6 10:12 tempty-sysutils

$ strip tempty-sysutils
$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 105528 Dec  6 10:13 tempty-sysutils

$ fpc -g -CX -XX tempty-sysutils
Note: DWARF debug information cannot be used with smart linking on this target, switching to static linking

$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 1784912 Dec  6 10:13 tempty-sysutils

$ strip tempty-sysutils
$ ls -l tempty-sysutils
-rwxr-xr-x 1 markMLl markMLl 517632 Dec  6 10:13 tempty-sysutils


So a minimal program linking SysUtils will come to about 105K with smartlinking operative, which I guess isn't bad by today's standards but is still substantially larger than the 34K of a (smartlinked) program without SysUtils.

If I duplicate that sequence with a minimal program importing Classes but not SysUtils:


$ fpc tempty-classes

$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 847992 Dec  6 10:30 tempty-classes

$ strip tempty-classes
$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 847992 Dec  6 10:31 tempty-classes

$ fpc -CX -XX tempty-classes

$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 230888 Dec  6 10:31 tempty-classes

$ strip tempty-classes
$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 230888 Dec  6 10:31 tempty-classes

$ fpc -g -CX -XX tempty-classes
Note: DWARF debug information cannot be used with smart linking on this target, switching to static linking

$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 2776160 Dec  6 10:31 tempty-classes

$ strip tempty-classes
$ ls -l tempty-classes
-rwxr-xr-x 1 markMLl markMLl 848264 Dec  6 10:31 tempty-classes


So the minimum size here of a smartlinked program is about 230K.

However the thing that I was really questioning was this sort of cruft:


$ fpc tempty-classes

$ strings tempty-classes | grep -i furlong
Furlongs
rtlconsts.sfurlongsdescription

$ fpc -CX -XX tempty-classes

$ strings tempty-classes | grep -i furlong

$ fpc -g -CX -XX tempty-classes

$ strings tempty-classes | grep -i furlong
Furlongs
rtlconsts.sfurlongsdescription
SFURLONGSDESCRIPTION
RESSTR_$RTLCONSTS_$$_SFURLONGSDESCRIPTION

$ strip tempty-classes
$ strings tempty-classes | grep -i furlong
Furlongs
rtlconsts.sfurlongsdescription


However, it's now clear that (a) text like rtlconsts.sfurlongsdescription was brought in by importing Classes, and (b) provided that smartlinking is operative, it is correctly chopped out if not being used.

So I suppose the residual question is whether one of the other debugging formats allows smartlinking to proceed, and whether the IDE could (should?) choose this as the default if the local debugger (gdb etc.) is capable of using it.

MarkMLl

Thaddy

  • Hero Member
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Reviewing program binary
« Reply #12 on: December 06, 2020, 12:30:20 pm »
Three notes:
Stripping is not debugger friendly.
-XX -Xs (or in one go -XXs) often does a somewhat better job than calling strip.
The -Xg option stores debug info in a separate file and adds a debuglink section to the executable so the debugger can find it.
« Last Edit: December 06, 2020, 12:43:32 pm by Thaddy »
Specialize a type, not a var.

MarkMLl

  • Hero Member
  • Posts: 6676
Re: Reviewing program binary
« Reply #13 on: December 06, 2020, 12:54:47 pm »
Quote
Three notes:
Stripping is not debugger friendly.
-XX -Xs (or in one go -XXs) often does a somewhat better job than calling strip.
The -Xg option stores debug info in a separate file and adds a debuglink section to the executable so the debugger can find it.

True, but I think that the warning from the compiler is fairly conclusive... and fair under the circumstances, and is almost certainly something I've come across in the past.

MarkMLl

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9792
  • Debugger - SynEdit - and more
Re: Reviewing program binary
« Reply #14 on: December 06, 2020, 01:30:30 pm »
Quote
So I suppose the residual question is whether one of the other debugging formats allows smartlinking to proceed, and whether the IDE could (should?) choose this as the default if the local debugger (gdb etc.) is capable of using it.

The IDE is and will be gearing towards dwarf.

Using smartlinking and debugging has also in the past caused issues with the debug info, and impacted debug-ability. So that is not recommended anyway.
What is missing, is a warning when configuring this.

As for the fpc warning, it would be an idea for the IDE to compile an empty prog, and report any warnings (each time project opts are changed)
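
The empty-program check could be prototyped outside the IDE. A rough sketch, assuming fpc is on the PATH and that RunCommand from the process unit (fcl-process) is available; the probe file name and the option set are illustrative, and the note text matched is the one quoted earlier in this thread for 3.0.4 on x86_64 Linux, which other versions or targets may word differently:

```pascal
program checkopts;
{$mode objfpc}{$H+}

uses
  Process;

const
  // Wording of the note quoted in this thread; may differ on other versions.
  StaticFallbackNote = 'switching to static linking';
  ProbeFile = 'optprobe.pas';  // hypothetical scratch file name

// True if the compiler's messages say that smart linking was abandoned.
function SmartLinkingDropped(const CompilerOutput: string): Boolean;
begin
  Result := Pos(StaticFallbackNote, CompilerOutput) > 0;
end;

var
  Probe: TextFile;
  CompilerOut: string;
begin
  // Write the smallest possible program to compile with the options under test.
  AssignFile(Probe, ProbeFile);
  Rewrite(Probe);
  WriteLn(Probe, 'begin end.');
  CloseFile(Probe);

  // -vn makes sure notes are shown; -g -CX -XX mirrors the case from this
  // thread. An IDE would substitute the project's actual options here.
  if RunCommand('fpc', ['-vn', '-g', '-CX', '-XX', ProbeFile], CompilerOut) then
  begin
    if SmartLinkingDropped(CompilerOut) then
      WriteLn('Warning: these options disable smart linking')
    else
      WriteLn('No smart-linking note reported');
  end
  else
    WriteLn('Could not run fpc, or it reported an error');
end.
```

This leaves the probe source and its compiled output in the current directory; a real implementation would use a scratch directory.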
