I want to manipulate strings containing two-byte UTF-8 characters, and I am at a loss as to how to proceed. Some of my problems are illustrated in the little program below.
The writeln statement produces no visible output.
Running hexdump on the two files the program writes gives:
hexdump utfbug0.txt
0000000 b6c3 b6c3 b6c3
hexdump utfbug1.txt
0000000 b6b6 b6c3 b6c3
So sr[1]:=s[2]; doesn't work as I expected,
and I can't do s[1]:='ö'; either, which produces a compiler error.
So it seems that indexing a string with s[n] only accesses the nth byte, not the nth character.
What have I misunderstood, and what can I do to access and manipulate individual characters?
Thanks in advance for any help
H
uses
{$IFDEF UNIX}{$IFDEF UseCThreads}
cthreads,
{$ENDIF}{$ENDIF}
Classes,sysutils
{ you can add units after this };
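procedure savestring(fname, s: string);
var str:tfilestream;
begin
str:=tfilestream.Create(fname,fmcreate);
{ write every byte of s in one call instead of scanning for a terminating zero; }
{ an empty string has no s[1], so skip the write in that case }
if length(s)>0 then str.WriteBuffer(s[1],length(s));
str.Free;
end;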
var s,sr:string;
begin
s:='äöä';
writeln(s[1]);
sr:=s;
sr[1]:=s[2];
savestring('utfbug0.txt',s);
savestring('utfbug1.txt',sr);
end.
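
For comparison, here is a minimal sketch of one way to reach whole characters instead of bytes. It assumes Free Pascal 3.x and text that stays within the Basic Multilingual Plane, so that one UTF-16 code unit corresponds to one character. It uses UTF8Decode and UTF8Encode from the system unit; the program name and the BytesAsHex helper are made up for illustration, and the string is spelled as explicit bytes so the encoding of the source file does not matter.

```pascal
program utf8chars;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: returns the bytes of s as hex digits, e.g. 'C3B6'
function BytesAsHex(const s: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    Result := Result + IntToHex(Ord(s[i]), 2);
end;

var
  s: RawByteString;   // holds the UTF-8 bytes
  u: UnicodeString;   // UTF-16 copy: one element per BMP character
begin
  // three times U+00F6, each encoded in UTF-8 as the byte pair C3 B6
  s := #$C3#$B6#$C3#$B6#$C3#$B6;
  writeln(Length(s), ' bytes: ', BytesAsHex(s));

  // decode to UTF-16 so that u[n] addresses the nth character
  u := UTF8Decode(s);
  writeln(Length(u), ' characters');

  // replace the first character with U+00E4, then encode back to UTF-8
  u[1] := WideChar($00E4);
  s := UTF8Encode(u);
  writeln(Length(s), ' bytes: ', BytesAsHex(s));
end.
```

The byte length stays at 6 while the decoded length is 3, which matches the observation that s[n] addresses bytes. Characters outside the BMP occupy two UTF-16 code units, so this sketch does not cover them. In a Lazarus project, the LazUTF8 unit (UTF8Length, UTF8Copy) may be an alternative that works on the UTF-8 string directly.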