Thanks a lot to all of you for your many replies. I will answer them one by one.
@Thaddy:
Yes, variables of type AnsiString get split functionality from TStringHelper, but variables of your type 'UTF8String' unfortunately do not seem to have it (FPC 3.0.4).
And all these split functions need a 'separator'. But I want to split a UTF-8 string into its individual characters, so how could this work?
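A minimal sketch of what such a separator-free split could look like without LazUTF8. It assumes the input is valid UTF-8 held in a plain AnsiString and treats one code point as one "character" (combining marks are not merged). Utf8SeqLen, SplitUtf8 and TUtf8Chars are hypothetical names for illustration, not FPC library routines.

```pascal
program SplitUtf8Demo;
{$mode objfpc}{$H+}

type
  TUtf8Chars = array of AnsiString;  // hypothetical helper type

// Byte length of the UTF-8 sequence that starts with lead byte b.
// Invalid lead bytes count as 1 so that the caller always advances.
function Utf8SeqLen(b: Byte): Integer;
begin
  if b < $80 then Result := 1
  else if (b and $E0) = $C0 then Result := 2
  else if (b and $F0) = $E0 then Result := 3
  else if (b and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Splits s into one array entry per code point.
function SplitUtf8(const s: AnsiString): TUtf8Chars;
var
  i, n, len: Integer;
begin
  SetLength(Result, Length(s));  // upper bound: one entry per byte
  n := 0;
  i := 1;
  while i <= Length(s) do
  begin
    len := Utf8SeqLen(Ord(s[i]));
    Result[n] := Copy(s, i, len);  // Copy clips a truncated tail
    Inc(n);
    Inc(i, len);
  end;
  SetLength(Result, n);
end;

var
  chars: TUtf8Chars;
  k: Integer;
begin
  // 'a', 'ä' (C3 A4), '€' (E2 82 AC), 'b', written as raw bytes so the
  // result does not depend on the encoding of the source file
  chars := SplitUtf8('a'#$C3#$A4#$E2#$82#$AC'b');
  for k := 0 to High(chars) do
    WriteLn(k, ': ', Length(chars[k]), ' byte(s)');
end.
```

For the sample string this should report four entries with byte lengths 1, 2, 3 and 1. The sketch only prints lengths, not the characters themselves, because what the console shows for non-ASCII bytes depends on its code page.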
@Bart:
I tried to use unit MaskEdit, but I got many linker errors (the messages are in German; "undefinierter Verweis" means "undefined reference"), such as:
/usr/share/lazarus/1.8.4/lcl/units/x86_64-linux/wsmenus.o: In Funktion »REGISTERMENUITEM«:
/home/mattias/tmp/lazarus-project1.8.4/lazarus-project_build/usr/share/lazarus/1.8.4/lcl//widgetset/wsmenus.pp:221: Warnung: undefinierter Verweis auf »WSRegisterMenuItem«
/usr/share/lazarus/1.8.4/lcl/units/x86_64-linux/wsmenus.o: In Funktion »REGISTERMENU«:
/home/mattias/tmp/lazarus-project1.8.4/lazarus-project_build/usr/share/lazarus/1.8.4/lcl//widgetset/wsmenus.pp:232: Warnung: undefinierter Verweis auf »WSRegisterMenu«
Strangely, I don't have a folder /home/mattias/ and never had one.
But then I looked into unit MaskEdit and saw that it is a GUI unit, while I want a solution for a console program. Sorry that I did not mention this (I thought it would make no difference).
In unit MaskEdit I found the function GetCodePoint() that you recommended and at first thought I could make a copy of it, but it needs unit LazUTF8, which I want to avoid for the following reasons.
I have faced a lot of problems and disadvantages with unit LazUTF8 in the past. Some examples that come to mind immediately:
- On Windows, adding unit LazUTF8 to a console program changes the charset of the filenames reported by sysutils.FindFirst() and FindNext(), of the results of ParamStr() and of the results of readln(). Without unit LazUTF8 they return the Windows charset (ANSI 1252?); with unit LazUTF8 they return UTF-8. This difference does not make life easy.
- Over the decades I have written a couple of common libraries for console programs, and parts of them cannot cope with this change of charset, so I get wrong results when unit LazUTF8 is used.
- For a couple of programs I still use FPC 2.6.4 (with the same common libraries), but in FPC 2.6.4 the above functions never return UTF-8 on Windows.
- The Windows charset is generally much easier to handle than UTF-8 (as we see now), because each character is only 1 byte long.
- In some cases (but not always) writeln() of an 'ansistring' showed damaged characters for Ä Ö Ü ä ö ü ß after I added unit LazUTF8.
There were more issues that I cannot recall right now. So I want to avoid unit LazUTF8 wherever possible, especially in console programs.
Question:
Is unit LazUTF8 the only way in FPC to get the kind of basic UTF-8 functions that I am looking for?
@wp: Thank you for your demos.
Your 1st demo = TForm1.Button1Click() works in a GUI program, but not in a console program (FPC 3.0.4). There, Length(ch) is always 1. Do you have an idea why? Currently I want this in a console program. Sorry that I did not mention this (I thought it would make no difference).
Correction: it now works in a console program too. My fault. Sorry for the confusion.
Your 2nd demo = TForm1.Button2Click() also works in a console program. But I want to compare two UTF-8 strings in a loop, character by character, so that I can perform some action for every character depending on whether the two are equal or not. So I need a function that returns the n-th character of a UTF-8 string. What I found is UTF8Copy(), but it needs unit LazUTF8, which I want to avoid if possible (see above) for such a basic task.
Regarding your 3rd demo = TForm1.Button3Click(), we misunderstood each other: what I was looking for was something similar to LazUTF8.UTF8Copy().
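The kind of helper described here could be sketched roughly as follows, without LazUTF8. It assumes valid UTF-8 in plain AnsiString variables and compares code points byte for byte (no Unicode normalization). Utf8SeqLen, NextUtf8Char, Utf8CharAt and CompareByChar are hypothetical names for illustration, not existing FPC or LCL routines.

```pascal
program CompareUtf8Demo;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte b.
// Invalid lead bytes count as 1 so that the caller always advances.
function Utf8SeqLen(b: Byte): Integer;
begin
  if b < $80 then Result := 1
  else if (b and $E0) = $C0 then Result := 2
  else if (b and $F0) = $E0 then Result := 3
  else if (b and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Returns the character that starts at byte index idx and moves idx
// past it. Returns '' when idx is outside the string.
function NextUtf8Char(const s: AnsiString; var idx: Integer): AnsiString;
var
  len: Integer;
begin
  Result := '';
  if (idx < 1) or (idx > Length(s)) then Exit;
  len := Utf8SeqLen(Ord(s[idx]));
  Result := Copy(s, idx, len);  // Copy clips a truncated tail
  Inc(idx, len);
end;

// Returns the n-th character (1-based), or '' if there is none.
function Utf8CharAt(const s: AnsiString; n: Integer): AnsiString;
var
  idx, k: Integer;
begin
  Result := '';
  idx := 1;
  for k := 1 to n do
    Result := NextUtf8Char(s, idx);
end;

// Walks both strings in parallel, one character at a time.
procedure CompareByChar(const a, b: AnsiString);
var
  ia, ib, n: Integer;
  ca, cb: AnsiString;
begin
  ia := 1;
  ib := 1;
  n := 0;
  while (ia <= Length(a)) or (ib <= Length(b)) do
  begin
    ca := NextUtf8Char(a, ia);
    cb := NextUtf8Char(b, ib);
    Inc(n);
    if ca = cb then
      WriteLn('char ', n, ': equal')
    else
      WriteLn('char ', n, ': different');
  end;
end;

begin
  // 'grün' versus 'grön', with ü = C3 BC and ö = C3 B6 as raw bytes
  CompareByChar('gr'#$C3#$BC'n', 'gr'#$C3#$B6'n');
  WriteLn('bytes in 3rd char: ', Length(Utf8CharAt('gr'#$C3#$BC'n', 3)));
end.
```

Utf8CharAt rescans the string from the start on every call, so inside a loop the running index used by NextUtf8Char is the cheaper choice. When one string has fewer characters than the other, the shorter side yields '' and the position is reported as different.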
@all:
For tonight I must stop. I will check the other replies tomorrow and answer then. Have a good night.