1. You're assuming UTF-16 can hold every character in one code unit; again, a Unicode character can be made up of two UTF-16 code units (a surrogate pair). The only Unicode encoding guaranteed to take one unit per code point is UTF-32.
Quite so. I thought that UTF-16 always takes 2 bytes per char, but it turns out that a char may take up to 4.
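To illustrate the 2-versus-4-byte point, here is a minimal sketch using only the FPC RTL. The helper name SurrogatePairToCodePoint is made up for this example, and U+1F600 is just an arbitrary character outside the BMP. (The test characters 舒淇 are both inside the BMP, so each of them still fits in one UTF-16 code unit.)

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: combine a UTF-16 surrogate pair into the
// code point it encodes.
function SurrogatePairToCodePoint(Hi, Lo: Word): Cardinal;
begin
  Result := $10000 + ((Cardinal(Hi) - $D800) shl 10) + (Cardinal(Lo) - $DC00);
end;

var
  S: UnicodeString;
begin
  // U+1F600 lies outside the BMP, so UTF-16 needs two code units for it.
  SetLength(S, 2);
  S[1] := WideChar($D83D); // high surrogate
  S[2] := WideChar($DE00); // low surrogate

  WriteLn(Length(S)); // prints 2: two code units (4 bytes), one character
  WriteLn(HexStr(SurrogatePairToCodePoint(Ord(S[1]), Ord(S[2])), 5)); // prints 1F600
end.
```

So Length, LeftStr and RightStr on a UTF-16 string count code units, not characters, as soon as such a pair is present.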
2. You're mixing UTF-8 and UTF-16 on the same line.
Yes, this was the only one of the three options that seemingly gave a proper result.
3. It doesn't work.
That's a point.
Try this with your code.
edit1.text := utf8stringReplace('ii_abcDEfghIIjklmn_iI','İi','舒淇');
result->
舒淇_abcDEfghIIjklmn_iI
For me the result looks quite different, but since I insert two squares, I should get two squares back.
The odd thing is that I get the improper chars at the proper place.
BTW, I tried ShowMessage (utf16toutf8(UTF8ToUTF16('舒淇'))); and it works fine (Well, assuming that the squares displayed in the Lazarus IDE are what they should be).
To make it work better, you could of course put a UTF8ToUTF16 around your NewPattern, but you still have the potential for a double UTF-16 (a surrogate pair). I'm not sure whether there are any that would change in length when transformed, so it might be OK.
You are absolutely right here, now all 3 examples work... or at least they seem to, since the hieroglyphs are squares for me.
function UTF8StringReplace(const S, OldPattern, NewPattern: string{; Flags: TReplaceFlags}): string;
var
  StringFull: UTF16String;
  StartPosition: integer;
begin
  // Case-insensitive search; PosEx returns a byte offset into the UTF-8 text
  StartPosition := PosEx(UTF8LowerCase(OldPattern), UTF8LowerCase(S), 1);
  if StartPosition = 0 then
    Result := S
  else
  begin
    StringFull := UTF8ToUTF16(S);
    // Head of the string + converted replacement + tail of the string
    Result := UTF16ToUTF8(LeftStr(StringFull, StartPosition - 1) + UTF8ToUTF16(NewPattern) + RightStr(StringFull, UTF8Length(StringFull) - StartPosition - UTF8Length(OldPattern) + 1));
  end;
end;

Also you might have noticed my code already handles the rfReplaceAll flag.
I did indeed. Maybe I'll spend some time adding the loop for it.
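For comparison, here is a rough sketch of how the same first-match replacement might look if everything stays in UTF-8 and all positions are counted in code points, so that no byte offset ever gets used as a UTF-16 index. It assumes the LazUTF8 unit (LazUtils package) is on the unit path and that UTF8LowerCase maps every code point to exactly one code point, so indices found in the lowercased copy are valid in the original. ReplaceFirstUTF8 is a name made up for this example, and I have not checked it beyond the test string from the quote above.

```pascal
program Utf8ReplaceSketch;
{$mode objfpc}{$H+}

uses
  LazUTF8; // assumed available via the LazUtils package

// Hypothetical helper: replaces only the first case-insensitive match.
// All indices are code point indices, never byte offsets.
function ReplaceFirstUTF8(const S, OldPattern, NewPattern: string): string;
var
  StartPos, OldLen: PtrInt;
begin
  // UTF8Pos returns a code point index (0 if nothing was found).
  StartPos := UTF8Pos(UTF8LowerCase(OldPattern), UTF8LowerCase(S));
  if StartPos = 0 then
    Exit(S);
  OldLen := UTF8Length(OldPattern);
  // Head + replacement + tail; UTF8Copy clips the count at the end of S.
  Result := UTF8Copy(S, 1, StartPos - 1)
          + NewPattern
          + UTF8Copy(S, StartPos + OldLen, UTF8Length(S));
end;

begin
  // The source file must be saved as UTF-8 for these literals to work.
  WriteLn(ReplaceFirstUTF8('ii_abcDEfghIIjklmn_iI', 'İi', '舒淇'));
end.
```

Adding rfReplaceAll on top of this would mean repeating the search from the end of the last replacement until UTF8Pos returns 0.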