
Author Topic: Some observations on the use of RTTI  (Read 10793 times)

Joanna

  • Hero Member
  • *****
  • Posts: 1461
Re: Some observations on the use of RTTI
« Reply #45 on: October 22, 2024, 12:15:00 pm »
Fibonacci, is it possible to revert to an earlier version of Lazarus/FPC from before the RTTI stuff was introduced? It was not always this way, was it?

Another option is to rename whatever would-be crackers will see so that it is misleading and confusing. I’m not sure how much that will help, though. I really don’t like this idea of everything being visible either.

Fibonacci

  • Hero Member
  • *****
  • Posts: 1000
  • Behold, I bring salvation - FPC Unleashed
Re: Some observations on the use of RTTI
« Reply #46 on: October 22, 2024, 12:38:28 pm »
I don't know. RTTI is pretty old, and it's only getting worse; that's what I've heard somewhere. PascalDragon once said you would have to go back decades, IIRC.

If you are really interested, I do have a method to disable most of the RTTI (any version, but I use trunk), with modeswitches to turn it on or off. Maybe I'll create a private repo and share it? But I'm not sure I want to do that ;) It's more for my personal use. I'd need to know what you're coding and why you need it. If you want, PM me, though I’ve been a bit busy for direct messaging lately, so pardon me for the delays.

Without RTTI there are side effects, though.
FPC Unleashed - inline vars, tuples, statement expressions, array equality, compound assignments, indexed/lazy labels, no-RTTI & more. ⭐ Star it on GitHub!

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Some observations on the use of RTTI
« Reply #47 on: October 22, 2024, 12:53:58 pm »
My post is now buried and invisible, so let me remind everyone what this is about: the level of difficulty. RTTI makes reverse engineering a walk in the park, and there is really no room for debate on that. Everything can be reverse engineered, but the real question is: what's the cost of that?

A sophisticated adversary will know exactly what your binary does, but with RTTI, even a kid can pop open your binary in some "exe analyzer", click "gimme list of the strings" and voila! They see all your precious secrets. They know you are using that specific AES256-CTR encryption, that particular TSHA256 hash class, that fancy component to display output in a nice table, and even that licensing component which screams at you to search for "componentname keygen/crack".

Simple: RTTI not used should not be visible.

While I agree with your conclusions, I think that you're rather trivialising the level of difficulty of effective reverse engineering.

Both unstripped binaries and RTTI give substantial information away, but so (on Linux) does the ELF file header. I'd add that my understanding is that non-Windows OSes are at a particular disadvantage, since entries into a .so file can't be specified by numeric index so the function names /have/ to be retained.

Yes, it is true that the presence of symbolic information gives information away, but effective exploitation requires detailed disentangling of control flow using specialist tools with substantial learning curves: if your copy protection can be circumvented by somebody who downloads a key from some shady website then you're doing it wrong.

To actually intervene in the operation of a binary, as distinct from merely looking for unmangled strings, requires specialist tools, and these days the top of that particular pile is occupied by Ghidra. It appears that somebody is agitating for the development of a plugin to interpret Borland-style RTTI because of its use by C++; I would presume that the format of Lazarus-style RTTI, particularly on Linux, differs.

I am not advocating security by obscurity here by arguing that because the population of Pascal developers is insignificant (relative to the total number of software developers) nothing needs to be done.

But whether or not the current situation is "good enough", having unstrippable RTTI (unmangled and unencrypted) in a binary will impact security.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

dbannon

  • Hero Member
  • *****
  • Posts: 3826
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Some observations on the use of RTTI
« Reply #48 on: October 22, 2024, 01:35:58 pm »
It also seems like leaving symbols in would make the exe larger.
It depends: executables are padded to a multiple of several kilobytes, so if the symbol names total less than the padding, you will see zero difference.
But sure, in any larger program, having all the symbol names in there adds slightly to the size. Let's do a simple calculation. A 500 GB SSD costs around 30€. Let's say all the symbol names add up to a whopping 100 kB (100 kB of raw ASCII text is a lot): we end up with an additional cost of 0.000006€ for your application.
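
As a quick check of the arithmetic in the estimate quoted above, here is a minimal Python sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 kB = 10^3 bytes), which the quoted post does not state; the variable names are purely illustrative.

```python
# Storage cost of embedded symbol names, using the figures quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 kB = 10**3 bytes).
ssd_price_eur = 30.0          # price of the whole SSD
ssd_bytes = 500 * 10**9       # 500 GB
symbol_bytes = 100 * 10**3    # 100 kB of symbol names

# Price per byte times the number of bytes the symbols occupy.
cost_eur = ssd_price_eur * symbol_bytes / ssd_bytes
print(f"{cost_eur:.6f} EUR")  # prints "0.000006 EUR"
```

So the quoted figure holds under those assumptions: the disk-space cost is negligible, whatever one thinks of the other costs discussed in this thread.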

Now, that prompted me to look at my application, tomboy-ng, the Qt5 variant.

The (Unix) strings command shows me that in the 7 MB binary (stripped, no RTTI) there is 1.8 MB of "string-like content". Scrolling through it superficially, most of it looks like LCL identifiers; few if any are my identifiers. My application is not large, but 18 times your estimate of 100K? Hmm....

Now, your argument about the cost of disk space is valid: who cares? But a bigger binary also takes up more memory and, arguably, is slower (though, I am sure, only by a small margin).

Still, I am personally surprised that about 25% of my binary is "string-like" content!

So, I looked at Lazarus, 'lazarus' the binary :

Code: Bash
$> strings lazarus > lazarus.strings
$> ls -la lazarus*

$> ls -lah lazarus*
-rwxr-xr-x 1 dbannon dbannon 166M Oct  3 21:57 lazarus
-rw-r--r-- 1 dbannon dbannon  60M Oct 22 22:27 lazarus.strings

More than a third ?
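
For anyone wanting to repeat this measurement without a temporary file, the following is a rough Python sketch of what the strings command does by default: it collects runs of at least four printable ASCII characters and reports their share of the file. The function names and the sample bytes are made up for illustration, and this is only an approximation of the real tool. Note also that the size of a strings output file slightly overstates the share, since it contains one extra newline per string.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of at least min_len printable ASCII bytes (tab, space to '~')."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)

def string_share(data: bytes, min_len: int = 4) -> float:
    """Fraction of the input occupied by printable runs (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(len(run) for run in printable_runs(data, min_len)) / len(data)

# Hypothetical stand-in for a binary; for a real file use open(path, "rb").read().
sample = b"\x00\x01TForm1\x00\xff\xfeabc\x00lclstrconsts.rsmbyes\x00\x02"
print(printable_runs(sample))          # "abc" is dropped: shorter than 4 bytes
print(round(string_share(sample), 2))
```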

Davo
Lazarus 4, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

Joanna

  • Hero Member
  • *****
  • Posts: 1461
Re: Some observations on the use of RTTI
« Reply #49 on: October 22, 2024, 01:48:19 pm »
I don't know. RTTI is pretty old, and it's only getting worse; that's what I've heard somewhere. PascalDragon once said you would have to go back decades, IIRC.

If you are really interested, I do have a method to disable most of the RTTI (any version, but I use trunk), with modeswitches to turn it on or off. Maybe I'll create a private repo and share it? But I'm not sure I want to do that ;) It's more for my personal use. I'd need to know what you're coding and why you need it. If you want, PM me, though I’ve been a bit busy for direct messaging lately, so pardon me for the delays.

Without RTTI there are side effects, though.
Thank you for your generous offer. I’m currently not sharing my application with anyone else, but I would like to in the future. What sort of side effects might I expect without RTTI? If it’s secret, feel free to PM me  :)

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Some observations on the use of RTTI
« Reply #50 on: October 22, 2024, 01:49:02 pm »
More than a third ?

So the next question is: how much of that is taken up by stuff that you're definitely not using, e.g. generics support which is now compiled into a program that predates FPC's use of generics?

MarkMLl

Bogen85

  • Hero Member
  • *****
  • Posts: 703
Re: Some observations on the use of RTTI
« Reply #51 on: October 22, 2024, 02:16:55 pm »
So, I looked at Lazarus, 'lazarus' the binary :

Code: Bash
$> strings lazarus > lazarus.strings
$> ls -lah lazarus*
-rwxr-xr-x 1 dbannon dbannon 166M Oct  3 21:57 lazarus
-rw-r--r-- 1 dbannon dbannon  60M Oct 22 22:27 lazarus.strings

More than a third ?

Some of that text is labels, menus, other actual text...

But it would seem even that is not going to be much compared to the symbols.

Thaddy

  • Hero Member
  • *****
  • Posts: 19268
  • Glad to be alive.
Re: Some observations on the use of RTTI
« Reply #52 on: October 22, 2024, 02:48:25 pm »
e.g. generics support which is now compiled into a program that predates FPC's use of generics.
Generics do not end up in a binary... ever.
If you can prove that they do, that would be a bug, because generics are just code templates.
« Last Edit: October 22, 2024, 02:56:14 pm by Thaddy »
objects are fine constructs. You can even initialize them with constructors.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Some observations on the use of RTTI
« Reply #53 on: October 22, 2024, 06:52:47 pm »
Generics do not end up in a binary... ever.
If you can prove that they do, that would be a bug, because generics are just code templates.

Step into the RTL/FCL and you find yourself going via generics support routines.

Updated: https://forum.lazarus.freepascal.org/index.php/topic,57934.msg433033.html#msg433033

MarkMLl
« Last Edit: October 22, 2024, 07:23:55 pm by MarkMLl »

Warfley

  • Hero Member
  • *****
  • Posts: 2066
Re: Some observations on the use of RTTI
« Reply #54 on: October 22, 2024, 08:15:33 pm »
Well, obviously, if you have something like, say, a banking app, you don’t want people getting into your source code looking for exploits. Also, as Fibonacci said earlier, there are people who will clone your apps and sell them as their own... who would want that? Not me!
Is this something that actually happens or something you are afraid of because you think it might happen?

Because look at the Panic source code leak a few years ago. Panic is a company that produces closed-source software for macOS and is very successful at it. All their source code got leaked because hackers got into their source control and they refused to pay the ransom.

So it's a prime example: their apps make millions, so anyone who could clone their apps could bankrupt them, right? Well... nothing happened. It turns out that source code is not the most important thing about an app. The hard part of developing an app is the maintenance. Most apps are not so complex that no one else could clone them. What's important is the experience gained in developing them, and using that experience to maintain those apps.
The interesting thing about source code is not how things were solved, but why they were solved in a certain way. This is why someone who joins a software project is not very useful for the first six months to a year: they need to build that experience, figure out why things were built a certain way, etc.

Even if you got the full source code of Microsoft Office, to build a competitor comparable to MS Office you'd still need a team as big and experienced as the one Microsoft can provide.

Look at the history of software development: how many companies do you know of that went bankrupt after a competitor cloned their app following a source code leak? I do not know of a single case.
This is not a real-world concern.
« Last Edit: October 22, 2024, 08:36:51 pm by Warfley »

Warfley

  • Hero Member
  • *****
  • Posts: 2066
Re: Some observations on the use of RTTI
« Reply #55 on: October 22, 2024, 08:42:59 pm »
So, I looked at Lazarus, 'lazarus' the binary :

Code: Bash
$> strings lazarus > lazarus.strings
$> ls -la lazarus*

$> ls -lah lazarus*
-rwxr-xr-x 1 dbannon dbannon 166M Oct  3 21:57 lazarus
-rw-r--r-- 1 dbannon dbannon  60M Oct 22 22:27 lazarus.strings

More than a third ?

Davo
You know how Lazarus compiles resources, right? All images, icons and so on are encoded as strings which are decoded at runtime.

The strings command just gives you everything within a file that looks like a string. So it's not all symbols: it's also all the graphics, configuration and other resources.
If you want to figure out how many of those are symbols, go through all the entries output by strings and check whether there's a type definition in the Lazarus source code that corresponds to each string.
There's a Linux command-line tool to get identifiers from source code which you could use for that. It's not perfect, as it does not really understand Pascal, but it would give you a first indication.

But I can't remember the name, sadly.
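
The check described above could be sketched roughly as follows in Python. This is not the command-line tool mentioned (its name is not given), just a crude stand-in: a regular expression that picks up names declared as class, record, object or interface, followed by a case-insensitive comparison against the strings output. The regex, the function names and the sample data are all assumptions made for illustration. It is not a Pascal parser, so it will miss generics, be fooled by comments, and so on.

```python
import re

# Matches "TName = class", "TName = packed record", etc. A rough heuristic only:
# it ignores generics, comments and conditional compilation.
TYPE_DECL = re.compile(
    r"\b([A-Za-z_]\w*)\s*=\s*(?:packed\s+)?(?:class|record|object|interface)\b",
    re.IGNORECASE,
)

def declared_types(pascal_source: str) -> set[str]:
    """Collect declared type names, lower-cased because Pascal is case-insensitive."""
    return {m.group(1).lower() for m in TYPE_DECL.finditer(pascal_source)}

def symbol_like(strings_output: list[str], type_names: set[str]) -> list[str]:
    """Keep the strings that exactly match a declared type name."""
    return [s for s in strings_output if s.strip().lower() in type_names]

# Hypothetical source snippet and strings output, for illustration only.
source = """
type
  TAnchorSide = class(TPersistent)
  end;
  TPoint3 = packed record
    X, Y, Z: Integer;
  end;
"""
strings_output = ["TAnchorSide", "lclstrconsts.rsmbyes", "TPOINT3",
                  "/lib64/ld-linux-x86-64.so.2"]
print(symbol_like(strings_output, declared_types(source)))
```

On real data you would feed it every unit in the Lazarus source tree and the full strings output, then compare the total length of the matches with the size of the binary.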
« Last Edit: October 22, 2024, 08:45:40 pm by Warfley »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Some observations on the use of RTTI
« Reply #56 on: October 22, 2024, 09:34:55 pm »
I think we could do with getting a bit more focused here: simply looking at the amount of recognisable text in a binary is... well frankly pretty pathetic, particularly since tools like binwalk ** are smart enough to look for binary blocks which are "more random than expected" and treat them as candidate encryption keys.
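
To make the "more random than expected" idea concrete, here is a minimal Python sketch of a per-block Shannon entropy scan. It is not binwalk's implementation. The block size, the threshold and the function names are arbitrary assumptions chosen for the example.

```python
import math
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not block:
        return 0.0
    total = len(block)
    return -sum((n / total) * math.log2(n / total) for n in Counter(block).values())

def high_entropy_blocks(data: bytes, block_size: int = 256, threshold: float = 7.0):
    """Yield (offset, entropy) for each block whose entropy exceeds the threshold."""
    for offset in range(0, len(data), block_size):
        entropy = shannon_entropy(data[offset:offset + block_size])
        if entropy > threshold:
            yield offset, entropy

# Synthetic input: 256 bytes of text, 256 maximally varied bytes, 256 zero bytes.
text = (b"TAnchorSide.SetControl " * 12)[:256]
data = text + bytes(range(256)) + b"\x00" * 256
print(list(high_entropy_blocks(data)))  # only the middle block is flagged
```

Ordinary text and code sit well below 8 bits per byte, so keys and compressed or encrypted blobs stand out regardless of whether any symbol names survive.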

Here's a program I wrote over the Summer, which embeds LazMapViewer hence various SSL etc. stuff, which probably makes it fairly typical "I'd rather that people kept their fingers out of this" fodder.

Code: Text
$ ls -lh local1090-x86_64-linux-gtk2
-rwxr-xr-x 1 markMLl markMLl 27M Oct 12 09:15 local1090-x86_64-linux-gtk2
$ strip local1090-x86_64-linux-gtk2
$ ls -lh local1090-x86_64-linux-gtk2
-rwxr-xr-x 1 markMLl markMLl 7.8M Oct 22 19:54 local1090-x86_64-linux-gtk2

So, roughly three quarters of the binary is symbolic information which, if left in there, is definitely "of value to the enemy" since the format is entirely understood.

What does that leave us? Well the first thing that stands out is the named entry points of all system libraries being used:

Code:
$ strings -n 8 local1090-x86_64-linux-gtk2 | less

/lib64/ld-linux-x86-64.so.2
pango_layout_get_extents
g_type_interface_peek
g_value_unset
g_list_find
gdk_pixbuf_get_height
g_slist_free
gdk_pixbuf_new_from_data
...

That, trivially and (on non-Windows OSes) unavoidably, will tell a miscreant whether any special encryption etc. libraries are being used.
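
As a rough illustration of how little effort that takes, the Python sketch below groups the entry-point names from the listing above by the text before the first underscore. The grouping rule and the function name are assumptions made for this example. Real tooling would read the dynamic symbol table rather than guess from prefixes.

```python
from collections import Counter

def library_prefixes(symbols: list[str]) -> Counter:
    """Count symbols by the text before the first underscore (a rough library hint)."""
    return Counter(s.split("_", 1)[0] for s in symbols if "_" in s)

# The entry-point names shown in the strings output above.
symbols = [
    "pango_layout_get_extents",
    "g_type_interface_peek",
    "g_value_unset",
    "g_list_find",
    "gdk_pixbuf_get_height",
    "g_slist_free",
    "gdk_pixbuf_new_from_data",
]
print(library_prefixes(symbols).most_common())
# prints [('g', 4), ('gdk', 2), ('pango', 1)]
```

Here the prefixes point at GLib/GObject, GdkPixbuf and Pango. A crypto or licensing library would show up in exactly the same way.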

After that it gets interesting: there's a great deal of text which looks like assertions in various libraries (which an attacker could match with a bit of Googling), and what looks like more unstripped symbols (which could possibly be RTTI) *** :

Code:
...
RaiseOwnerCircle AValue=
TAnchorSide.SetControl AValue=FOwner
TAnchorSide.CheckSidePosition Circle,
TAnchorSide.CheckSidePosition invalid anchor control,
TAnchorSide.CheckSidePosition invalid Side
LCLSTRCONSTS
lclstrconsts.rsmbyes
lclstrconsts.rsmbno
lclstrconsts.rsmbok
lclstrconsts.rsmbcancel
lclstrconsts.rsmbabort
...

I definitely don't want to "do a Joanna" and start screaming that we're all doomed ****. But equally, there's stuff in that binary which would definitely be useful /if/ an attacker thought that it was hiding something interesting.

** I don't claim to be fully up to date, but please note that I'm actually able to "name names" rather than making aggrieved noises about potential script kiddies with hypothetical magic toolkits.

*** Note that the FPC used to build Lazarus was low on optimisation and high on debugging checks.

**** Even if we're using Pascal.

MarkMLl

Warfley

  • Hero Member
  • *****
  • Posts: 2066
Re: Some observations on the use of RTTI
« Reply #57 on: October 22, 2024, 10:23:59 pm »
But assertions have nothing to do with RTTI. Also, any messages such as error messages, exception names and the like will be visible in the binary.

Let's say you have some software that checks a product key as a form of DRM, and you are a hacker who wants to crack that software to use it for free. What you do is check which error message pops up when you enter the wrong key. For example, if there's a pop-up saying "Invalid serial number", you open the binary in Ghidra, search for that string, then search for where that string is used, find the function that triggers the pop-up, and you have already found the interesting piece of code.

There is usually no need to look for RTTI or assertions or stuff like that, because any user-facing app is full of strings you can use to find where each piece of functionality is called.
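
The very first step of that workflow, locating the message inside the file, needs no special tooling at all. Here is a minimal Python sketch, assuming the message is stored as plain ASCII/UTF-8 bytes; the sample bytes and the function name are made up. Finding the code that references the offset is the part that needs a disassembler such as Ghidra, and it is not shown here.

```python
def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets

# Hypothetical stand-in for a binary; for a real file use open(path, "rb").read().
binary = b"\x7fELF\x00\x00Invalid serial number\x00\x90\x90Thank you\x00"
print(find_all(binary, b"Invalid serial number"))  # prints [6]
```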

PascalDragon

  • Hero Member
  • *****
  • Posts: 6397
  • Compiler Developer
Re: Some observations on the use of RTTI
« Reply #58 on: October 22, 2024, 10:49:18 pm »
Fibonacci, is it possible to revert to an earlier version of Lazarus/FPC from before the RTTI stuff was introduced? It was not always this way, was it?

The first commits for RTTI support were around 23 years ago, and it has expanded since then, so have fun with that. ;)

Also the LCL heavily relies on RTTI, so have even more fun with that.

There is usually no need to look for RTTI or assertions or stuff like that, because any user-facing app is full of strings you can use to find where each piece of functionality is called.

Indeed. Which is why at work (with C++) we obfuscate strings at compile time for code that's critical to our copy protection system (and only that), plus we use a small virtual machine with proof-of-work concepts to protect it even further, and tamper detection mechanisms that separate detection from action. The main point of all this is not necessarily to prevent adversaries from cracking the software at all, but to be annoying enough that it ends up so low on their priority list that we can release a new version with new features (and adjusted piracy protection) before they're able to crack the released software. (Though to be fair, our software is essentially based on a user-space virtual machine, so we have a bit more freedom in protecting things than single binaries have.)
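
To illustrate just the general idea of keeping a message out of the plain-text strings in a binary, here is a toy Python sketch using a repeating-key XOR. This is emphatically not the scheme described above, whose details are not given. The key, the names and the choice of XOR are assumptions for the example, and XOR with a fixed key is trivial to undo once someone finds the decoding routine.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

KEY = b"\x5a\xc3\x17"  # hypothetical key, for illustration only
message = "Invalid serial number"

# What a build step would embed in the program instead of the plain text.
stored = xor_bytes(message.encode("utf-8"), KEY)

# What the program would do just before showing the message.
recovered = xor_bytes(stored, KEY).decode("utf-8")

print(b"Invalid" in stored)  # prints False: a plain string search no longer finds it
print(recovered)             # prints Invalid serial number
```

The point matches the one made above: this does not stop a determined attacker, it only removes the cheapest way in.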

Joanna

  • Hero Member
  • *****
  • Posts: 1461
Re: Some observations on the use of RTTI
« Reply #59 on: October 23, 2024, 02:01:24 am »
Even if RTTI information is exposed, is there any style of writing code that makes it more difficult to reverse engineer?

 
