Hmm, it probably changes "character = character" to "string = string" then (not tested).
No idea, but it is a serious silent error which the developer would not know about, and which would be extremely hard to debug weeks after the code was written. It is these pitfalls which I don't like about UTF-16 (in terms of Object Pascal). Many seem to treat UTF-16 text as if it were UCS2, and completely forget (or ignore) the Unicode SMP range. That is not "convenience", but a huge programmer error.
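To make the pitfall concrete, here is a minimal sketch, assuming Free Pascal with UnicodeString as the UTF-16 string type. The program name and the variable names are made up for illustration. It builds a string holding the single Domino Tile character U+1F030 from its two surrogate code units and shows that indexing by position sees only half of it.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}
var
  s: UnicodeString;
  cu: Word;
begin
  // U+1F030 lies in the SMP, so UTF-16 stores it as the
  // surrogate pair $D83C $DC30 (two code units, one character).
  SetLength(s, 2);
  s[1] := WideChar($D83C);
  s[2] := WideChar($DC30);

  // Length counts UTF-16 code units, not characters.
  writeln('Code units: ', Length(s));

  // s[1] is only the first half of the character.
  cu := Ord(s[1]);
  if (cu >= $D800) and (cu <= $DBFF) then
    writeln('s[1] is a high surrogate, not a complete character');
end.
```

Any code that compares s[i] against a character, or counts characters with Length, silently goes wrong for such text unless it checks for surrogates.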
That probably can be avoided by...
And there you lose the "convenience" of treating UTF-16 like AnsiString or UCS2. You had to know about the issue (FPC not raising an error), which apparently nobody knows about, and then modify the code to suit (untested solution).
Instead, the following UTF-8 code (as would be written in fpGUI) works 100% with the full Unicode range... BMP and SMP. No hidden issues. The code, when read by a human, is obvious in functionality, and can easily be added in an iteration loop.
procedure MainProc;
var
  s: TfpgString;
  c: TfpgChar;
begin
  s := 'Hello World Ä🀰 o😋eo';
  c := fpgCharAt(s, 13);
  if c = 'Ä' then
    writeln('True')
  else
    writeln('False');
end;
(tested and produces the correct results even if I use the Domino Tile or Emoji characters).
And if I had to write code that iterates over characters in a UTF-8 encoded string, I would use a dedicated UTF-8 String Iterator instead. This will be more optimised than the above code (keeping track of byte offsets etc), and will correctly handle the full Unicode range without issues. An example of such Iterator usage is:
var
  itr: ICharIterator; // interface reference
  s: TfpgString;
  c: TfpgChar;
begin
  s := 'Hello World Ä🀰 o😋eo';
  itr := gIteratorFactory.CharIterator(s);
  while itr.HasNext do
  begin
    c := itr.Next;
    // do something with c
  end;
end;
The Iterator interface supports HasNext, Next, HasPrevious, Previous, etc. functions.
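For readers wondering what "keeping track of byte offsets" involves, here is a rough sketch of forward iteration over UTF-8 bytes. This is not fpGUI's implementation: Utf8SeqLen and the program itself are hypothetical, and a real iterator would also validate continuation bytes and handle truncated sequences. The test string is written as raw bytes (Ä, the Domino Tile U+1F030, then 'o') so the result does not depend on the source file encoding.

```pascal
program Utf8IterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: length in bytes of the UTF-8 sequence
// that starts with lead byte b.
function Utf8SeqLen(b: Byte): Integer;
begin
  if b < $80 then
    Result := 1
  else if (b and $E0) = $C0 then
    Result := 2
  else if (b and $F0) = $E0 then
    Result := 3
  else if (b and $F8) = $F0 then
    Result := 4
  else
    Result := 1; // invalid lead byte: step over it
end;

var
  s: RawByteString;
  offset, n, count: Integer;
begin
  // Bytes for: U+00C4 (C3 84), U+1F030 (F0 9F 80 B0), 'o'
  s := #$C3#$84#$F0#$9F#$80#$B0'o';
  offset := 1;
  count := 0;
  while offset <= Length(s) do
  begin
    n := Utf8SeqLen(Ord(s[offset]));
    Inc(count);
    writeln('char ', count, ': ', n, ' byte(s) at offset ', offset);
    Inc(offset, n); // remember where the next character starts
  end;
  writeln('Characters: ', count, ', bytes: ', Length(s));
end.
```

Because the offset of the next character is remembered between steps, each step costs only the length of one sequence, whereas looking up a character by index has to scan from the start of the string every time.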
NOTE: This forum software changed my string constants in the code examples to escaped sequences. They were meant to display normal text strings containing Unicode characters.