There we use only simple indexed access to codepoints (and ignore surrogate pairs).
Well, it is your choice to ignore surrogates. If you (and/or your boss) know what you are doing, and have good reasons, then you can do that.
But the statement above is misleading by omission.
- You state that you ignore surrogates.
- You should also explicitly mention that "access to codepoints" means exactly that: codepoints. You are *not* interested in characters. Only in codepoints.
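You may be aware of it, and you may have meant this when you wrote it that way. But there are lots of readers here who will read your statement and think that character and codepoint are the same. And they are not (not even in utf32): a single character on screen can be made up of several codepoints, for example a base letter followed by a combining accent.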
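To make the difference concrete, here is a minimal sketch in Python (chosen only because its strings are indexed by codepoint; the variable names are made up for illustration). It builds "é" from a base letter plus a combining accent and shows that indexed access hands you the pieces, not the character.

```python
import unicodedata

# "é" written as a base letter plus a combining accent:
# one character on screen, but two codepoints.
decomposed = "e\u0301"

# len() on a Python str counts codepoints, not characters.
codepoint_count = len(decomposed)
print(codepoint_count)

# Indexed access splits the character into its parts.
base, accent = decomposed[0], decomposed[1]
print(unicodedata.name(base))
print(unicodedata.name(accent))

# Even in utf32 the character still occupies two 4-byte units.
utf32_bytes = len(decomposed.encode("utf-32-le"))
print(utf32_bytes)

# Normalization happens to merge this particular pair into one codepoint,
# but not every combining sequence has a precomposed form.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))
```

So even with a fixed-width encoding, "index N" means "codepoint N", and it is up to your code to decide whether that is good enough.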
------------------
The other thing is what you want to achieve by "index access":
1) Easier to write code?
Maybe, but not really, since plenty of good helper functions exist for utf8. It's only about getting used to them.
2) Execution speed?
Not necessarily. If you write efficient utf8 code, then this can be faster than the best utf16 code (at least with text in European languages). Why? Because if the text is really large, then organizing your data in memory in a way that reduces cache misses has a huge influence on speed. Using more memory for your data increases cache misses, and slows the app down.
With Chinese text, utf16 (with full support, and no index access) may be faster, because most Chinese characters take 3 bytes in utf8 but only 2 in utf16.
Of course achieving 2 means that you lose on 1. You may have to spend more time coding.
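The memory side of that argument is easy to check. The following is only a rough sketch in Python with made-up sample strings and a hypothetical helper `encoded_sizes`. It compares byte counts and says nothing about actual run time, which you would have to measure on your own data.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Sample strings are illustrative only.
european = "Größe und Länge des Textes"
chinese = "文本的大小和长度"

eu_utf8, eu_utf16 = encoded_sizes(european)
zh_utf8, zh_utf16 = encoded_sizes(chinese)

# Mostly ASCII text: utf8 needs roughly half the memory of utf16.
print("european:", eu_utf8, "bytes utf8,", eu_utf16, "bytes utf16")

# Chinese text: 3 bytes per character in utf8, 2 in utf16.
print("chinese: ", zh_utf8, "bytes utf8,", zh_utf16, "bytes utf16")
```

The smaller encoding for your kind of text is the one that keeps more of it in cache.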