
Author Topic: [SOLVED] Executing x86_64 code from dynamic array of byte  (Read 4967 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 15545
  • Censorship about opinions does not belong here.
Re: Executing x86_64 code from dynamic array of byte
« Reply #30 on: August 30, 2024, 02:07:00 pm »
It doesn't, because InterlockedIncrement (etc) is part of the Windows API. Since this code is only running on x86-64,
It is not, as Marco explained: fpc's implementation is also cross platform and independent of OS or CPU.
I am curious where you got that information from, because that is fake news.
So we need to correct that source of information.
Furthermore PascalDragon gave you detailed information on how to use what you want on the first page.
« Last Edit: August 30, 2024, 02:15:42 pm by Thaddy »
My great hero has found the key to the highway. Rest in peace John Mayall.
Playing: "Broken Wings" in your honour. As well as taking out some mouth organs.

wizzwizz4

  • New Member
  • *
  • Posts: 10
Re: [SOLVED] Executing x86_64 code from dynamic array of byte
« Reply #31 on: August 30, 2024, 03:49:09 pm »
mmap and mprotect have fp* equivalents in baseunix, and they use syscalls or libc depending on the platform. I suggest to use those as much as possible.

That's the final part I didn't understand about the code in the Rtti unit. (fp stands for "Free Pascal", presumably.)

I am curious where you got that information from, because that is fake news.
My best guess is that I made it up. If I'd read such a surprising claim, I'd like to think I would've checked it; and this is incredibly easy to check. Sadly, I often don't apply this to the convoluted process that I perceive as "remembering".

Furthermore PascalDragon gave you detailed information on how to use what you want on the first page.
Yes, though Rtti.AllocateMemory is not reference-counted (and afaict the entire Rtti unit is undocumented, not that that stops me from adding comments); and I only knew about Initialize and Finalize, not AddRef and Copy. Khrys's post was the first to fully answer the first post. I think you're hinting that I should mark the thread resolved.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11732
  • FPC developer.
Re: [SOLVED] Executing x86_64 code from dynamic array of byte
« Reply #32 on: August 30, 2024, 04:46:40 pm »
mmap and mprotect have fp* equivalents in baseunix, and they use syscalls or libc depending on the platform. I suggest to use those as much as possible.

That's the final part I didn't understand about the code in the Rtti unit. (fp stands for "Free Pascal", presumably.)

Yes, posix_ and unx_ were also proposed, but other people didn't like the underscores ("unpascalish"), and I didn't want to commit to either POSIX or Unix, as those standards were a bit confusing from the viewpoint of a non-C compiler (what is a header, what isn't). Keep in mind that this was during stat64 times, which were then papered over with macros in the C compilers.

I just wanted a(ny) prefix to avoid endless reports about why write(2) or read(2) etc. were not working (clashing with built-in functions) or not named as expected, so I wanted one general rule for transforming a man/libc page name into an FPC name, with no exceptions.

See http://www.stack.nl/~marcov/unixrtl.pdf and https://wiki.freepascal.org/libc_unit


Khrys

  • Jr. Member
  • **
  • Posts: 82
Re: Executing x86_64 code from dynamic array of byte
« Reply #33 on: September 02, 2024, 09:04:54 am »
Thanks a lot for the detailed feedback! I think I overreached a bit with my code and should've stuck to a minimal example of a new reference-counted type specifically in Pascal (which operators to override etc.) and put a warning on my lack of expertise  :-[

Runs on both Windows and Linux (x86_64)
It doesn't, because InterlockedIncrement (etc) is part of the Windows API. Since this code is only running on x86-64, which has a fairly strong memory model (Intel® 64 and IA-32 Software Developer's Manual, volume 3 chapter 9) we should be able to use the assembly instructions MOV for reading, LOCK INC for incrementing, and LOCK DEC for decrementing. Don't quote me on this, though: I'm not an x86-64 assembly programmer, and certainly should not be trusted as an authority for mission-critical code.

If you do this, you must check the flags in TThunk.DecreaseReferenceCount (perhaps indirectly – see "Why did InterlockedIncrement/Decrement only return the sign of the result?" for the NT 3.51 / 95 design). Do not read the memory a second time to check it. Why? Assume there are two references, and consider the following sequence:
  • Thread A decreases the reference count (2 to 1).
  • Thread B decreases the reference count (1 to 0).
  • Thread B reads ReferenceCount = 0 and frees the resources.
  • Thread A now reads ReferenceCount, also sees 0, and frees again: double free!
This is not a bug in your code, only one that might be introduced by a careless programmer. If you do a straightforward InterlockedIncrement shim (see "How does InterlockedIncrement work internally?"), I believe your code should work fine.
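
The sequence above can be avoided by deciding from the value the atomic decrement itself returns. A minimal sketch, assuming FPC's System.InterlockedDecrement (which returns the new value); TCounted and Release are hypothetical names for illustration, not the TThunk code from this thread:

```pascal
program RefCountSketch;
{$mode objfpc}

type
  // Hypothetical stand-in for a reference-counted object;
  // only the counter matters for this illustration.
  TCounted = record
    RefCount: LongInt;
    Freed: Boolean;
  end;

// Returns True for exactly one caller: the one whose decrement reached zero.
function Release(var C: TCounted): Boolean;
begin
  // Decide from the value returned by the atomic decrement.
  // Re-reading C.RefCount afterwards would reintroduce the race.
  Result := InterlockedDecrement(C.RefCount) = 0;
  if Result then
    C.Freed := True; // stands in for freeing the real resources
end;

var
  C: TCounted;
begin
  C.RefCount := 2;
  C.Freed := False;
  WriteLn('first release frees:  ', Release(C));
  WriteLn('second release frees: ', Release(C));
end.
```

Because each caller only looks at its own return value, only the thread that took the count to zero runs the cleanup, however the two decrements interleave.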

I intended to use FPC's own version of this function (unfortunately named exactly the same as the Windows one), which so far hasn't provided me with any such nasty surprises - IOW I didn't even think of this as a weak point

Your debug code performs non-atomic reads, which is UB in some high-level languages and might be a bug on processors with a weaker memory model (but I think is fine on x86-64). For efficiency on some other architectures, you should use InterlockedIncrementAcquire and InterlockedDecrementRelease (does not affect x86-64).

Yeah, I should've removed the "debug code" prior to uploading... it was just a dirty hack to verify the reference-counting aspect, with no considerations about multi-threading

Your code assumes that mmap and mprotect are implemented as numbered syscalls. This is a valid assumption for Linux, but most UNIX-like OSs don't have a stable syscall ABI, if they even implement these as discrete syscalls. Seeing as you're only using POSIX functions, there's no need to restrict yourself to Linux! Adding a libc fallback mode for {$ELSEIF UNIX} might be nice. Alternatively, you should make it explicitly fail to compile on unsupported systems.

I regret including this part the most... Being inexperienced with non-toy UNIX development (especially in Free Pascal, and only considering Linux) and in a rush when I wrote this, I jumped straight to the generic syscall wrapper when I couldn't find an FPC declaration of mprotect within a few Google searches.
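
For future readers, here is a rough sketch of the BaseUnix route marcov mentioned, with an explicit compile failure on other targets. It assumes Fpmmap, Fpmprotect, Fpmunmap and the PROT_*/MAP_* constants are available from BaseUnix on your platform; the 4096-byte size is an arbitrary illustration value, and the program only maps and re-protects the page without executing it:

```pascal
program MapSketch;
{$mode objfpc}

{$IFDEF UNIX}
uses
  BaseUnix;
{$ELSE}
  {$FATAL This sketch only covers UNIX-like targets}
{$ENDIF}

const
  BufSize = 4096; // arbitrary illustration size, not a queried page size

var
  P: Pointer;
begin
  // Map anonymous memory as read/write; it is not executable yet.
  P := Fpmmap(nil, BufSize, PROT_READ or PROT_WRITE,
              MAP_PRIVATE or MAP_ANONYMOUS, -1, 0);
  if P = Pointer(-1) then // mmap signals failure with (void *) -1
    Halt(1);

  PByte(P)^ := $C3; // a single x86-64 RET, written while still writable

  // Switch to read/execute, so the buffer is never writable and
  // executable at the same time (W^X).
  if Fpmprotect(P, BufSize, PROT_READ or PROT_EXEC) <> 0 then
    Halt(2);

  Fpmunmap(P, BufSize);
end.
```

The fp* functions pick syscalls or libc depending on the platform, so nothing here depends on Linux syscall numbers.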

Fulfillment of W^X - the code buffer is never writable and executable at the same time
Your implementation is unsafe: we can read .Code, access .Data, then try to use .Code and get fireworks. Rather than storing Executable: Boolean, could you store a separate "executable reference count", and then forbid .Data accesses while there's an outstanding .Code access? (You could implement that by having .Code return an advanced record wrapper around a PThunkInstance (or, perhaps better, a ^SizeInt Pointer pair), though I'm not sure whether this would end up expensive. It's free in Rust, but rustc does heavier optimisations than fpc.)

I was well aware of this... should've at least pointed it out  :(



I added a warning to my original reply for future visitors of this thread. I'll try not to oversell the quality of my answers in the future  :)

PascalDragon

  • Hero Member
  • *****
  • Posts: 5649
  • Compiler Developer
Re: [SOLVED] Executing x86_64 code from dynamic array of byte
« Reply #34 on: September 04, 2024, 08:29:09 pm »
Furthermore PascalDragon gave you detailed information on how to use what you want on the first page.
Yes, though Rtti.AllocateMemory is not reference-counted (and afaict the entire Rtti unit is undocumented, not that that stops me from adding comments); and I only knew about Initialize and Finalize, not AddRef and Copy. Khrys's post was the first to fully answer the first post. I think you're hinting that I should mark the thread resolved.

My point was that you could take the code from the Rtti unit as inspiration, especially regarding the cross-platform functionality. You can then add things like memory pooling (to avoid wasting memory) and reference counting yourself. The code in the Rtti unit is geared towards the use cases of that unit.

 
