I think you are on the right track now. If you really want to bring down code size, you have to check your every move and its cost.
Strings are very expensive in both forms. As ansistrings (possible for e.g. arm and mips targets) you will in most cases need to add heapmgr (Hint! Hint!) to make them work properly, and the code for handling them is quite complex (--> big).
As shortstrings (avr) they consume 256 bytes of precious RAM on your stack for every function call that uses strings. Take a look at a function that concatenates two strings and returns the result, and while doing that say goodbye to 768 bytes of precious stack....
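To make the shortstring cost concrete, here is a minimal sketch that can be compiled on a desktop fpc. The names Concat256, AppendSmall and TSmallStr are made up for illustration, and the 16-character limit is just an assumption about how long your strings really get. It only shows the size difference between the two types; whether the compiler still creates a full-size temporary for the concatenation is something you have to check in the generated assembler for your target.

```pascal
program StringCost;

{$mode objfpc}
{$H-}  // make 'string' a ShortString, as it is on AVR

type
  // Hypothetical type: a short string capped at 16 characters.
  // It occupies 17 bytes (1 length byte + 16 characters).
  TSmallStr = string[16];

// Costly variant: the result (and any temporaries) are full
// 256-byte ShortStrings.
function Concat256(const a, b: ShortString): ShortString;
begin
  Result := a + b;
end;

// Cheaper variant (an assumption, not a guarantee): append in place
// through a var parameter, so no string result has to be returned.
procedure AppendSmall(var dest: TSmallStr; const s: TSmallStr);
begin
  dest := dest + s;
end;

var
  s: TSmallStr;
begin
  s := 'LED ';
  AppendSmall(s, 'on');
  // Prints the size of each type in bytes: 256 17
  WriteLn(SizeOf(ShortString), ' ', SizeOf(TSmallStr));
  // Prints: LED on
  WriteLn(s);
end.
```

On a real AVR target you would drop the WriteLn calls and compare the stack usage of both variants in the assembler output instead.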
Using classes also increases code size, for example: they need heapmgr, and precious RAM is also used by VMTs when virtual methods are used.
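One possible way around that, sketched here under the assumption that you can live without virtual methods, is an old-style object instead of a class. TCounter is a made-up example type. An object like this lives in global data or on the stack, so nothing is allocated from the heap, and without virtual methods there is no VMT.

```pascal
program NoHeapObject;

{$mode objfpc}

type
  // Hypothetical example type. Old-style 'object' instances are plain
  // variables: no constructor call, no heap allocation.
  TCounter = object
    Value: Byte;
    procedure Init;
    procedure Step;
  end;

procedure TCounter.Init;
begin
  Value := 0;
end;

procedure TCounter.Step;
begin
  Inc(Value);
end;

var
  c: TCounter;  // lives in global data, not on the heap
begin
  c.Init;
  c.Step;
  c.Step;
  // Prints: 2
  WriteLn(c.Value);
end.
```

Whether this actually saves flash on your target is, again, a question for the generated assembler.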
The list goes on. When you want to keep code size down, you always need to check the generated assembler code and be prepared for surprises.
The question I always ask myself is WHY people today choose severely CPU- and memory-limited chips for freepascal...
I can understand it from the educational aspect: you learn a lot when trying to squeeze complex code into 32 KB of flash. But how many of the people using such a small system actually want to go through this learning curve?
There are quite capable and cheap boards out there in the sub-$10 range that offer 64-128 KB of flash (look for STM32F103C8T6 or maple mini) and come with a stlinkv2 compatible debugger. There are even more capable boards with STM32F4 chips (search for STM32F411CEU6) with 128 KB RAM and 512 KB flash, or ATSAMD21 boards like Arduino-Zero or clones.
Those chips eat through most of the limitations that come with today's freepascal: they have enough flash for some not-so-optimized code, and they have enough CPU power to easily execute the complex string methods in freepascal.
Please do not misunderstand me, I like the smell of freshly generated fpc assembler output, but I ask myself: does everybody want to start that hard?
One could argue that AVR chips and older ARM chips are easier to understand because they are less complex, but is this really true?
Me, I do not think so. You can also start simple with newer arm or mips chips, and there's a lot more (like DMA, advanced timers) that comes in handy once you have mastered the basics and start with the more challenging stuff.
And by that point you will already have had a lot of fun with blinking LEDs, OLED displays or whatever, without hitting the out-of-RAM/flash wall even once.
Michael