Author Topic: TStringList Delimiter Char  (Read 1922 times)

jcmontherock

  • Sr. Member
  • ****
  • Posts: 277
TStringList Delimiter Char
« on: March 04, 2024, 11:35:02 pm »
Can we use any character as a string delimiter? I tried '§' without success.
Windows 11 UTF8-64 - Lazarus 4.0RC2-64 - FPC 3.2.2

jamie

  • Hero Member
  • *****
  • Posts: 6800
Re: TStringList Delimiter Char
« Reply #1 on: March 04, 2024, 11:57:52 pm »
That is Unicode; in UTF-8 it is more than a single byte, so it does not fit in a Char.

I believe there is a UTF8StringList somewhere in the collection that may do it.
The only true wisdom is knowing you know nothing

jamie

  • Hero Member
  • *****
  • Posts: 6800
Re: TStringList Delimiter Char
« Reply #2 on: March 05, 2024, 12:10:30 am »
I just looked and found TStringListUTF8Fast, but that does not seem to allow me to set a Utf8Char; instead it's still Char.

You could use yourString.Slip, that should work.

Bart

  • Hero Member
  • *****
  • Posts: 5510
    • Bart en Mariska's Webstek
Re: TStringList Delimiter Char
« Reply #3 on: March 05, 2024, 09:42:20 am »
I just looked and found TStringListUTF8Fast, but that does not seem to allow me to set a Utf8Char; instead it's still Char.

You could use   yourString.Slip, that should work.

Problems with your keyboard, or with your coordination  O:-)

Bart

jcmontherock

  • Sr. Member
  • ****
  • Posts: 277
Re: TStringList Delimiter Char
« Reply #4 on: March 05, 2024, 10:01:26 pm »
What is: String.Slip ?

TRon

  • Hero Member
  • *****
  • Posts: 3930
Re: TStringList Delimiter Char
« Reply #5 on: March 05, 2024, 10:07:27 pm »
I think that jamie wanted to write yourString.Split  (but somehow managed to flip his keyboard during the process) :)
« Last Edit: March 05, 2024, 10:09:51 pm by TRon »
I do not have to remember anything anymore thanks to total-recall.

Bart

  • Hero Member
  • *****
  • Posts: 5510
    • Bart en Mariska's Webstek
Re: TStringList Delimiter Char
« Reply #6 on: March 05, 2024, 10:11:34 pm »
What is: String.Slip ?
String.Split
Code: Pascal
uses
  SysUtils;
var
  Arr: TStringArray;
  S: String;
  i: Integer;
begin
  S := 'abc§def§ghi§jkl§';
  Arr := S.Split('§', TStringSplitOptions.ExcludeLastEmpty);
  for i := Low(Arr) to High(Arr) do writeln(Arr[i]);
end.

This will split the string S, using '§' as the separator.
Split accepts both chars and strings as separator.
TStringSplitOptions.ExcludeLastEmpty ensures that the last value will not be an empty string.
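Since the original question was about TStringList, here is a minimal sketch of getting the Split result into one. It is an illustration only, not a built-in TStringList feature: the program name and the variable Sep are made up for this example, and the separator is written as the two UTF-8 bytes of '§' (#$C2#$A7) so that the result does not depend on how the source file is saved.

```pascal
program SplitIntoStringList;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  SL: TStringList;
  Parts: TStringArray;
  S, Sep: String;
  i: Integer;
begin
  // UTF-8 encoding of '§' (U+00A7): two bytes, so it cannot be
  // assigned to TStringList.Delimiter, which is a single Char.
  Sep := #$C2#$A7;
  S := 'abc' + Sep + 'def' + Sep + 'ghi';

  SL := TStringList.Create;
  try
    // Use the "array of string" overload of Split for the
    // multi-byte separator, then copy the parts into the list.
    Parts := S.Split([Sep]);
    for i := Low(Parts) to High(Parts) do
      SL.Add(Parts[i]);

    for i := 0 to SL.Count - 1 do
      WriteLn(SL[i]);
  finally
    SL.Free;
  end;
end.
```

From here the list can be used as usual (sorting, IndexOf, and so on); only the splitting step bypasses the Delimiter property.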

Bart

Zvoni

  • Hero Member
  • *****
  • Posts: 2821
Re: TStringList Delimiter Char
« Reply #7 on: March 06, 2024, 08:15:58 am »
Errrr..... someone will have to explain to me, how "§" (=ASCII 167) is Unicode/UTF8?!?!?!
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: TStringList Delimiter Char
« Reply #8 on: March 06, 2024, 10:14:18 am »
Errrr..... someone will have to explain to me, how "§" (=ASCII 167) is Unicode/UTF8?!?!?!

It is U+00A7:

Code: Pascal
var
  s: String = '§';

begin
  WriteLn(Length(s)); // 2
  WriteLn(Ord(s[1])); // 194
  WriteLn(Ord(s[2])); // 167
end.

P.S.: I tried this in Lazarus 3.2.
« Last Edit: March 06, 2024, 10:16:18 am by dseligo »

wp

  • Hero Member
  • *****
  • Posts: 12589
Re: TStringList Delimiter Char
« Reply #9 on: March 06, 2024, 10:38:29 am »
someone will have to explain to me, how "§" (=ASCII 167) is Unicode/UTF8?!?!?!
Only characters below #128 are 1 byte. Anything above may depend on the code page. #167 is '§' on the CP125x code pages, but on CP437 (original IBM) or CP850 (Latin 1) it is 'º', and on CP852 (Latin 2) it is 'ž'. Therefore, it consists of at least two bytes in UTF-8. A nice tool is the Lazarus character map in the Edit menu.
« Last Edit: March 06, 2024, 10:41:08 am by wp »

Zoran

  • Hero Member
  • *****
  • Posts: 1899
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: TStringList Delimiter Char
« Reply #10 on: March 06, 2024, 03:56:48 pm »
Errrr..... someone will have to explain to me, how "§" (=ASCII 167) is Unicode/UTF8?!?!?!

I'll try. :)

In short:

It is not ASCII 167.
This character is encoded as (in hex):
  - a7 (167 decimal) in cp1250 (not ASCII!),
  - 00 a7 in UTF-16
  - c2 a7 in UTF-8
  - has no representation in ASCII


Longer explanation:

This is not ASCII 167. There is no ASCII 167.
ASCII is 7-bit encoding. There are no values above 127 in ASCII!
There are several so-called ANSI 8-bit encodings (code pages), also known as Windows code pages. You probably confused ASCII with one of these. This character is indeed encoded as 167 in cp1250 (sometimes called win-1250), which is the ANSI page used for the Croatian alphabet, as well as for several other East European Latin-script languages: Czech, Slovak, Hungarian, Polish... For other languages there are other ANSI code pages, such as cp1252 for West European Latin-script languages, cp1251 for Cyrillic-script languages, cp1253 for Greek, etc.
It is not ASCII!

All these 8-bit ANSI encodings are compatible with ASCII in the first 128 characters, which have values below 128. Every ANSI page, such as cp1250 mentioned above, has the same first 128 characters (0-127); these are taken from ASCII. However, they differ in the values 128 and above (in these positions, cp1250 has letters used in East European Latin-script languages, such as č, ć, đ ..., cp1252 has letters used in West European languages, such as ü, ö, ç, ..., cp1251 has Cyrillic letters there, etc.).

Furthermore, these ASCII values up to 127 are encoded the same way in UTF-8, and they are the only characters that have one-byte encodings in UTF-8.

Any character which appears in some ANSI (not ASCII!) encoding with a value 128 or above (that is, which has the bit 7 set), such as § which is encoded as 167 in cp1250, is represented in utf-8 with at least two bytes.
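The byte counts listed above can be checked with a short program. This is a minimal sketch (the program name Utf8Bytes is made up for the example); it writes the UTF-8 bytes of '§' as escapes instead of typing the character, so the encoding of the source file does not influence the result, and it uses UTF8Decode from the System unit to get the UTF-16 form.

```pascal
program Utf8Bytes;
{$mode objfpc}{$H+}

var
  S: String;         // holds the UTF-8 bytes
  W: UnicodeString;  // holds the UTF-16 code units
  i: Integer;
begin
  // UTF-8 encoding of '§' (U+00A7), given as byte escapes
  S := #$C2#$A7;
  WriteLn('UTF-8 length in bytes: ', Length(S));
  for i := 1 to Length(S) do
    WriteLn('  byte ', i, ': $', HexStr(Ord(S[i]), 2));

  // Convert to UTF-16 and show the single 16-bit code unit
  W := UTF8Decode(S);
  WriteLn('UTF-16 length in code units: ', Length(W));
  WriteLn('  code unit 1: $', HexStr(Ord(W[1]), 4));
end.
```

The UTF-8 string has length 2 (bytes $C2 and $A7), while the UTF-16 string has length 1 (code unit $00A7). That is why '§' cannot be stored in a one-byte Char in a UTF-8 string.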
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: TStringList Delimiter Char
« Reply #11 on: March 06, 2024, 04:27:53 pm »
It is not ASCII 167.
This character is encoded as (in hex):
  - a7 (167 decimal) in cp1250 (not ASCII!),
  - 00 a7 in UTF-16
  - c2 a7 in UTF-8
  - has no representation in ASCII

And in the command prompt on my Windows 11, the § sign has the value 245 (I have code page 852).

jcmontherock

  • Sr. Member
  • ****
  • Posts: 277
[Solved] TStringList Delimiter Char
« Reply #12 on: March 06, 2024, 04:56:15 pm »
Thanks to everybody.
I finally found that a character like '§' or '°', on Windows 11 running in UTF-8, is not really a Char, since it does not fit in 1 byte.

 
