
Topic: Fingerprinting source units

MarkMLl

Fingerprinting source units
« on: October 11, 2022, 10:54:05 am »
A few days ago there was discussion about making the full path of a unit accessible as a $I expansion, in the context of diagnostic messages etc. This culminated in the addition of the %sourcefile% predefined (https://forum.lazarus.freepascal.org/index.php/topic,60793.0.html), which will hopefully arrive in the compiler in due course.

Would it be possible to have something similar which presented a checksum or hash of the sourcefile, e.g. (with a nod to whoever selected a Cheetah as the project's mascot) using the Tiger algorithm?

My rationale is this. A few days ago I raised an issue on StackExchange relating to "blessing" a Linux binary with rights to allow it to e.g. access raw sockets https://unix.stackexchange.com/questions/720010/preventing-posix-capabilities-proliferation . Since I've not been shot down in flames I'll take it to the kernel mailing list (the issue isn't doing it, it's preventing it from proliferating).

In principle, an IDE could include code that allowed it to bless any program it built, but that didn't give the user carte blanche to assign enhanced capabilities to an arbitrary binary elsewhere on the system.

The administrator who was asked to bless the IDE would need some degree of confidence that it had been built with unmodified sourcefiles. In this context, a fingerprint of the binary isn't entirely suitable, since it might have been rebuilt for an unfamiliar processor or with an unexpected level of runtime checks.

In order to have some confidence that the IDE hasn't been modified, a minimal precaution would be for the main unit (which by convention imports all others) to have access to every unit's fingerprint, which it could combine and report. That's by no means foolproof, but knowing which file originated each fingerprint (i.e. the new %sourcefile% expansion), it should be easy enough to check that the fingerprint isn't being spoofed:

Code: Pascal
unit SomeUnit;

interface

const
//  UnitFingerprint= {$I %sourcehash% } ;
  UnitFingerprint= '1234567890';         // LOOKIT ME: I'M A L33T H4CK3R :-)
...

As I've said, it's not foolproof, but I think it would be a start, particularly for targets such as Linux that don't have agreed conventions for binary signing.
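
Until something like %sourcehash% exists, the "combine and report" idea can be approximated outside the compiler. The following is only a sketch: the unit names are hypothetical, and sha256sum from GNU coreutils stands in for Tiger.

```shell
# Sketch only: the unit names are hypothetical, and SHA-256 stands in for Tiger.
workdir=$(mktemp -d)
printf 'unit someunit;\ninterface\nimplementation\nend.\n' > "$workdir/someunit.pas"
printf 'program main;\nuses someunit;\nbegin\nend.\n' > "$workdir/main.pas"

# One fingerprint per source file, listed in a stable (sorted by name) order.
per_unit=$(cd "$workdir" && sha256sum main.pas someunit.pas | sort -k2)

# Combined fingerprint: a hash over the list of per-unit fingerprints.
combined=$(printf '%s\n' "$per_unit" | sha256sum | cut -d' ' -f1)

printf '%s\n' "$per_unit"
echo "combined:   $combined"

# Any edit to any unit changes the combined value.
echo '// tampered' >> "$workdir/someunit.pas"
tampered=$(cd "$workdir" && sha256sum main.pas someunit.pas | sort -k2 | sha256sum | cut -d' ' -f1)
echo "after edit: $tampered"
rm -r "$workdir"
```

Because the per-unit list includes the file names, an administrator can see which source file is responsible when the combined value does not match.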

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Kays

Re: Fingerprinting source units
« Reply #1 on: October 11, 2022, 11:55:30 am »
Quote from MarkMLl: […] Would it be possible to have something similar which presented a checksum or hash of the sourcefile […]? […]
Every compiled unit has a couple checksums, see
Code: Bash
ppudump -vh someunit.ppu
I don’t think anything like a {$I %checksum%} will ever be supported.
  • Inclusion of a checksum string will necessarily affect the calculated checksum.
  • You may use {$I %checksum%} multiple times.
  • And then you have the task to find a checksum string that (embedded in a unit) yields the checksum?
That’s not a compiler’s job, and unless you happen to have a quantum computer at home it can be unreasonably slow.
Yours Sincerely
Kai Burghardt

MarkMLl

Re: Fingerprinting source units
« Reply #2 on: October 11, 2022, 12:15:50 pm »
Quote from Kays: Inclusion of a checksum string will necessarily affect the calculated checksum.

No, it only affects the symbol table value in exactly the same way as %file% etc.

MarkMLl

Kays

Re: Fingerprinting source units
« Reply #3 on: October 11, 2022, 12:52:51 pm »
Quote from Kays: Inclusion of a checksum string will necessarily affect the calculated checksum.
Quote from MarkMLl: No, it only affects the symbol table value in exactly the same way as %file% etc.
I cannot verify that:
Code: Bash
$ cat > someunit.pas << EOT
unit someunit;
        interface
                const
                        F = {\$I %file%};
        implementation
end.
EOT
$ ln -s someunit.pas someunit.pp
$ fpc someunit.pas
$ ppudump someunit.ppu | grep Checksum
Checksum                : 37521D2F
Interface Checksum      : A98CDFB9
Indirect Checksum       : E1A3CEBA
$ fpc someunit.pp
$ ppudump someunit.ppu | grep Checksum
Checksum                : 38DB330D
Interface Checksum      : 97E2CB1D
Indirect Checksum       : E1A3CEBA
Yours Sincerely
Kai Burghardt

MarkMLl

Re: Fingerprinting source units
« Reply #4 on: October 11, 2022, 02:07:04 pm »
What the Hell are you on about, man? I said absolutely nothing about the .ppu. I said take a hash of the SOURCE FILE and assign that as a constant value, in EXACTLY THE SAME WAY as the existing %file% expansions work.

Hence, tentatively:

Code: Pascal
program test;

const
  fingerprint= {$i %file% } + ' ' + {$i %sourcehash% } ;

begin
  WriteLn('Program generated using ', fingerprint)
end.

So this has absolutely nothing to do with a preprocessor substitution of %sourcehash% (which, as you observe, would obviously change the checksum); it is entirely the same as the way the compiler recognises and substitutes %file% etc.

It would, obviously, result in a performance penalty: but this is not something that everybody would need, and if they did it might not be needed for every unit of a program (in the case of a program which set capabilities, it might only be needed in the one unit which called the system-level library).
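
For comparison, something close to this can be approximated today, because Free Pascal treats an unrecognised {$I %NAME%} as the name of an environment variable and inserts its value as a string constant. The sketch below assumes a hypothetical variable SOURCEHASH, uses sha256sum as the hash, and covers only a single-file program. The variable is set once per compiler invocation, so it cannot give each unit its own fingerprint, which is the gap a real %sourcehash% would fill.

```shell
# Sketch only: SOURCEHASH is a hypothetical variable name, and SHA-256 stands in
# for whatever hash a real %sourcehash% would use.
workdir=$(mktemp -d)
cat > "$workdir/test.pas" << 'EOT'
program test;

const
  fingerprint = {$I %FILE%} + ' ' + {$I %SOURCEHASH%};

begin
  WriteLn('Program generated using ', fingerprint)
end.
EOT

# Hash the source before compiling; the file itself is never rewritten,
# so the hash stays valid for the file on disk.
SOURCEHASH=$(sha256sum "$workdir/test.pas" | cut -d' ' -f1)
export SOURCEHASH
echo "source hash: $SOURCEHASH"

# Compile and run only if fpc is installed.
if command -v fpc > /dev/null 2>&1; then
  (cd "$workdir" && fpc test.pas > /dev/null && ./test) || echo "fpc build or run failed"
fi
rm -r "$workdir"
```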

MarkMLl

Kays

Re: Fingerprinting source units
« Reply #5 on: October 11, 2022, 02:54:57 pm »
Quote from MarkMLl: […] I said absolutely nothing about the .ppu, […]
Excuse me, did you even read my first post? I stated:
Quote from Kays: Every compiled unit has a couple checksums, see
Code: Bash
ppudump -vh someunit.ppu
Quote from MarkMLl: […] I said take a hash of the SOURCE FILE and assign that as a constant value, […]
I’m sorry, but it doesn’t make sense to me to take the hash of the, quote, SOURCE FILE. The following two source files
Code: Pascal
unit foobar;
interface
implementation
end.
Code: Pascal
// Great unit!
unit foobar;
interface
implementation
end.
will have exactly the same checksums as reported by ppudump(1) but, for instance, sha512sum(1) will report different checksums. It sounds unreasonable to me why (a change in) a comment should “sound the alarms”.
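
The file-hash half of this is easy to reproduce. A small sketch, assuming sha512sum from GNU coreutils is available; the file names are made up for the example, and the ppudump half is left out because it needs a compiler:

```shell
# Sketch only: plain.pas and commented.pas are hypothetical file names.
workdir=$(mktemp -d)
printf 'unit foobar;\ninterface\nimplementation\nend.\n' > "$workdir/plain.pas"
printf '// Great unit!\nunit foobar;\ninterface\nimplementation\nend.\n' > "$workdir/commented.pas"

# Hash the raw bytes of each file; comments are part of those bytes.
plain_hash=$(sha512sum "$workdir/plain.pas" | cut -d' ' -f1)
commented_hash=$(sha512sum "$workdir/commented.pas" | cut -d' ' -f1)

if [ "$plain_hash" = "$commented_hash" ]; then
  echo "hashes match"
else
  echo "hashes differ"
fi
rm -r "$workdir"
```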
Yours Sincerely
Kai Burghardt

MarkMLl

Re: Fingerprinting source units
« Reply #6 on: October 11, 2022, 04:42:31 pm »
Quote from Kays: It sounds unreasonable to me why (a change in) a comment should “sound the alarms”.

The file's changed. Of course it gets a different checksum, in exactly the way that a distribution package will get a different checksum if the release note's changed even if the binary's functionality hasn't.

And I'm talking about being able to put the checksum into a constant, so that a unit can mark at the time of compilation that it's derived from such-and-such a sourcefile, and that the hash of that sourcefile can be verified by a standard utility.
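
The verification step could then be as simple as feeding the reported value to a standard hashing tool. A sketch, assuming the binary reports its fingerprint as a hex SHA-256 string; the file name, the check helper and the "reported" values are all stand-ins, since %sourcehash% does not exist yet:

```shell
# Sketch only: someunit.pas, check() and both "reported" values are hypothetical.
workdir=$(mktemp -d)
src="$workdir/someunit.pas"
printf 'unit someunit;\ninterface\nimplementation\nend.\n' > "$src"

# check HASH FILE: succeeds only if FILE hashes to HASH.
# sha256sum -c reads "HASH  FILE" lines (here from stdin) and verifies each file.
check() {
  printf '%s  %s\n' "$1" "$2" | sha256sum -c --quiet - > /dev/null 2>&1
}

# Stand-in for the value an honestly built program would report at run time.
reported=$(sha256sum "$src" | cut -d' ' -f1)
# Stand-in for a spoofed constant: 64 zeros.
spoofed=$(printf '%064d' 0)

if check "$reported" "$src"; then genuine=accepted; else genuine=rejected; fi
if check "$spoofed" "$src"; then fake=accepted; else fake=rejected; fi

echo "reported fingerprint: $genuine"
echo "spoofed fingerprint:  $fake"
rm -r "$workdir"
```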

If you wanted to kvetch, a far more significant failing is that the unit checksum wouldn't automatically incorporate that of included files, although I'd counter that that could be handled by usage conventions.

And that's what this suggestion boils down to: an additional expansion which would allow a particular usage convention which at present is not easily attainable.

MarkMLl

 
