In any event, if you're doing something like this you have to be 100% confident that the start of the binary is in a format that the loader can handle. Laying out the start of the image is very commonly the linker's job (the linker being OS-specific, or at least having command-line options to tailor it to a particular OS), and what gets placed there very commonly derives from RTL initialisation code handcrafted in assembler: hence somebody's mention of objcopy to tune this.
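As a rough illustration of checking that the start of a binary is something a loader would recognise, here is a minimal Python sketch that looks at the leading magic bytes. The function name identify_image is made up for this example, it only knows about ELF and MZ/PE, and a real loader checks far more than the signature.

```python
import struct

def identify_image(header: bytes) -> str:
    """Guess the executable format from the first bytes of a file.

    Hypothetical helper for illustration: it checks signatures only,
    not whether the rest of the header is valid.
    """
    # ELF images start with 0x7F followed by the letters "ELF".
    if header[:4] == b"\x7fELF":
        return "ELF"
    # DOS and Windows images start with "MZ".
    if header[:2] == b"MZ":
        # A PE image stores the file offset of its "PE\0\0" signature
        # as a little-endian 32-bit value at offset 0x3C.
        if len(header) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
            if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return "PE"
        return "MZ"
    return "unknown"
```

To try it on a real file, read the first few kilobytes in binary mode and pass them in, e.g. identify_image(open(path, "rb").read(4096)) where path is whatever binary you have just built.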
When I was doing this sort of thing I had a custom binder program which put descriptor tables at the start of the binary. The input syntax was based on Intel documents; the implementation was my own.
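For illustration only (this is not the binder described above, and its input syntax is not reproduced here): assuming the descriptor tables in question are x86 protected-mode segment descriptor tables, a minimal Python sketch of packing 8-byte descriptors into a table placed at the start of an image could look like this. The names pack_descriptor and build_table are hypothetical.

```python
import struct

def pack_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack one 8-byte segment descriptor in the 80386 layout.

    base   : 32-bit segment base address
    limit  : 20-bit segment limit
    access : access byte (present bit, DPL, type)
    flags  : upper 4 bits of byte 6 (granularity, default size, ...)
    """
    if not (0 <= base < 1 << 32 and 0 <= limit < 1 << 20):
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    if not (0 <= access < 1 << 8 and 0 <= flags < 1 << 4):
        raise ValueError("access must fit in 8 bits and flags in 4 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # limit bits 0..15
        base & 0xFFFF,                         # base bits 0..15
        (base >> 16) & 0xFF,                   # base bits 16..23
        access,                                # access byte
        (flags << 4) | ((limit >> 16) & 0xF),  # flags + limit bits 16..19
        (base >> 24) & 0xFF,                   # base bits 24..31
    )

def build_table(entries) -> bytes:
    """Build a table: a mandatory null descriptor, then the given entries."""
    return b"\x00" * 8 + b"".join(pack_descriptor(*e) for e in entries)
```

Under those assumptions, build_table([(0, 0xFFFFF, 0x9A, 0xC), (0, 0xFFFFF, 0x92, 0xC)]) gives a 24-byte table holding a null entry plus flat 4 GB code and data segments, which a binder could then write ahead of the code.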
I'd mention here that there used to be a mainframe range which was very heavily promoted as having all system software (i.e. including the OS) written in ALGOL. However, when one scratched the surface one found that the OS had a special compiler distinct from that used for application code, and that the initial loader was a carefully-constructed binary. Whether there was, in fact, an internal-use assembler is lost in the mists of time.
MarkMLl