It seems that many applications, like Word and OpenOffice, only render combining characters correctly when there is already a single character that looks the same. They can only display a single glyph at any location, while a browser stacks them with an offset. In other words, what you see depends on the rendering engine used.
Is there one that actually combines them as is intended? LaTeX?
But that also means that each application (depending on the Unicode table and rendering engine used) has its own Unicode subset, which might or might not look the same as any of the others when put on the screen or printer.
And I think the best way to compare Unicode chars would be to split them out into the base shape and the separate, combining characters. Then again, that would require expanding those, as there are "attachments" not covered by them.
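For what it's worth, splitting into base shape plus combining characters is roughly what Unicode's canonical decomposition (NFD) does. Here is a minimal Python sketch using the standard `unicodedata` module; the helper names `decompose` and `same_shape` are made up for illustration. It also shows one of those "attachments" that is not covered: the stroke in ø has no canonical decomposition.

```python
import unicodedata

def decompose(text):
    """List the code points of text after canonical decomposition (NFD)."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]

def same_shape(a, b):
    """Compare two strings by their decomposed form instead of raw code points."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"  # é stored as one code point
combined = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

print(decompose(precomposed))             # ['U+0065', 'U+0301']
print(precomposed == combined)            # False: the raw code points differ
print(same_shape(precomposed, combined))  # True: same base shape and mark

# The stroke of ø is not a combining character, so NFD leaves it alone.
print(decompose("\u00f8"))                # ['U+00F8']
```

So comparing decomposed forms handles accents and the like, but anything Unicode does not model as a combining mark would still need extra expansion rules on top.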
Ok, that would make it even harder to determine how much storage space you need to reserve.
Actually, the best way to store them would probably be something like the dictionary coding used in compressors (7-Zip etc.). Expand each character you come across, make a list, and only store the index in your string or table. That way, characters that look the same are all stored the same, and each one fits in a single 32-bit value. And always display them multi-pass, with the parts on top of each other.
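The index-table idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: `GlyphTable` is a hypothetical name, the "expansion" is plain NFD, and a cluster is taken to be a base character plus every following character with a non-zero combining class, which is simpler than real grapheme segmentation.

```python
import unicodedata

class GlyphTable:
    """Maps each expanded character (base + marks) to a small integer index."""

    def __init__(self):
        self.index = {}     # cluster string -> index
        self.clusters = []  # index -> cluster string

    def _split(self, text):
        # Expand to NFD, then glue combining marks onto the preceding base.
        out = []
        for ch in unicodedata.normalize("NFD", text):
            if out and unicodedata.combining(ch):
                out[-1] += ch
            else:
                out.append(ch)
        return out

    def encode(self, text):
        """Return one index per visible character, adding new ones to the list."""
        result = []
        for cluster in self._split(text):
            if cluster not in self.index:
                self.index[cluster] = len(self.clusters)
                self.clusters.append(cluster)
            result.append(self.index[cluster])
        return result

    def decode(self, indices):
        """Rebuild the expanded (decomposed) text from stored indices."""
        return "".join(self.clusters[i] for i in indices)

table = GlyphTable()
a = table.encode("caf\u00e9")   # precomposed é
b = table.encode("cafe\u0301")  # e + combining acute
print(a, b)                     # [0, 1, 2, 3] [0, 1, 2, 3]
```

Both spellings end up as the same four indices, so storage is one fixed-size value per visible character. The catch is that the list itself has to be stored or shared along with the text.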
I think that's how Unicode should have been.
On the other hand, that won't fix the sorting problem. You still need a separate table for each language. Although you can limit those to only the base shape and attachments that make a difference.
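A toy Python sketch of such per-language tables, limited to the attachments that make a difference. The `TAILORING` table and the helper names are invented for this example, and the rules are heavily simplified (German treating ä like a, Swedish placing å, ä, ö after z). Real collation is far more involved than this.

```python
import unicodedata

# Hypothetical per-language tables: only clusters whose marks change the order.
TAILORING = {
    "de": {},  # German dictionary order: ä sorts together with a
    "sv": {"a\u030a": "z\u0001", "a\u0308": "z\u0002", "o\u0308": "z\u0003"},
}

def clusters(text):
    """Split lowercased text into base characters with their combining marks."""
    out = []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def sort_key(word, lang):
    table = TAILORING[lang]
    # Primary key: tailored value if the language cares, else just the base shape.
    primary = [table.get(c, c[0]) for c in clusters(word)]
    # Tie-breaker: the fully decomposed word, so accents still order consistently.
    return (primary, unicodedata.normalize("NFD", word))

words = ["zebra", "\u00e4pple", "apple"]
german = sorted(words, key=lambda w: sort_key(w, "de"))
swedish = sorted(words, key=lambda w: sort_key(w, "sv"))
print(german)   # ['apple', 'äpple', 'zebra']
print(swedish)  # ['apple', 'zebra', 'äpple']
```

The same stored text sorts differently depending on which table you pick, which is the point: the storage scheme and the sort order are separate problems.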