
Author Topic: Dynamic Array of String / unicode characters?  (Read 4162 times)

magu

  • New Member
  • *
  • Posts: 36
Dynamic Array of String / unicode characters?
« on: August 29, 2016, 10:36:48 pm »
Is it possible to create a dynamic array of Unicode characters?

I have a "string" of Unicode characters which I wanted to split into their individual characters, so I used the following:

uText : Array[0..64] of String;

Each element of the array contains one Unicode character (inserted using UTF8Copy(s,i,1)).

I would like this array to be created dynamically, but if I understood correctly, the size of each Unicode character varies, so how do I determine the length to pass to SetLength?
And how do I ensure that each element will contain exactly one Unicode character?

Thank you
Magu

lainz

  • Hero Member
  • *****
  • Posts: 4468
    • https://lainz.github.io/
Re: Dynamic Array of String / unicode characters?
« Reply #1 on: August 29, 2016, 11:22:40 pm »
Each Unicode character can be 1 to 4 bytes, if I'm not wrong.

UTF8Length gets the entire length of any string or character.

Remy Lebeau

  • Hero Member
  • *****
  • Posts: 1314
    • Lebeau Software
Re: Dynamic Array of String / unicode characters?
« Reply #2 on: August 30, 2016, 02:45:25 am »
Quote from: lainz
    Each Unicode character can be 1 to 4 bytes, if I'm not wrong.

That is true for UTF-8 encoded strings.  For UTF-16 encoded strings, Unicode characters can be 2 or 4 bytes, depending on use of surrogate pairs.

Quote from: lainz
    UTF8Length gets the entire length of any string or character.

For a UTF-8 encoded string.  A string length in UTF-8 is different than a string length in UTF-16.
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

lainz

  • Hero Member
  • *****
  • Posts: 4468
    • https://lainz.github.io/
Re: Dynamic Array of String / unicode characters?
« Reply #3 on: August 30, 2016, 02:56:51 am »
Quote from: Remy Lebeau
    Quote from: lainz
        Each Unicode character can be 1 to 4 bytes, if I'm not wrong.

    That is true for UTF-8 encoded strings.  For UTF-16 encoded strings, Unicode characters can be 2 or 4 bytes, depending on use of surrogate pairs.

    Quote from: lainz
        UTF8Length gets the entire length of any string or character.

    For a UTF-8 encoded string.  A string length in UTF-8 is different than a string length in UTF-16.

That's OK, I meant the default strings in Lazarus.

Fungus

  • Sr. Member
  • ****
  • Posts: 353
Re: Dynamic Array of String / unicode characters?
« Reply #4 on: August 30, 2016, 01:20:28 pm »
UTF8Length() returns the number of "code points" - or individual characters. This might be less than Length() which returns the total number of bytes in the string.
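
To make the difference concrete, here is a minimal sketch of sizing a dynamic array with UTF8Length() rather than Length() and filling it with UTF8Copy(). It assumes a default Lazarus UTF-8 string and a project that depends on the LazUtils package (for the LazUTF8 unit). The names TCodePointArray and SplitCodePoints are made up for illustration. Note that the array length is a number of elements, not bytes, so the varying byte size of each character does not affect SetLength.

```pascal
program SplitCodePointsDemo;

{$mode objfpc}{$H+}

uses
  LazUTF8;  // UTF8Length and UTF8Copy (LazUtils package)

type
  // Hypothetical type name: one array element per code point.
  TCodePointArray = array of string;

// Hypothetical helper: splits a UTF-8 string into its code points.
function SplitCodePoints(const S: string): TCodePointArray;
var
  i, n: PtrInt;
begin
  n := UTF8Length(S);          // number of code points, not bytes
  SetLength(Result, n);        // one element per code point
  for i := 0 to n - 1 do
    Result[i] := UTF8Copy(S, i + 1, 1);  // UTF8Copy is 1-based
end;

var
  Parts: TCodePointArray;
  S: string;
  i: Integer;
begin
  // 'a', then U+00E4 and U+20AC written as raw UTF-8 bytes,
  // so the example does not depend on the source file encoding.
  S := 'a'#$C3#$A4#$E2#$82#$AC;
  Parts := SplitCodePoints(S);
  WriteLn('Length (bytes): ', Length(S));
  WriteLn('UTF8Length (code points): ', UTF8Length(S));
  for i := 0 to High(Parts) do
    WriteLn('Element ', i, ' occupies ', Length(Parts[i]), ' byte(s)');
end.
```

Each UTF8Copy() call scans the string from the start, so this is simple rather than fast; for short strings that should not matter. A code point is also not always what a reader perceives as one character (combining marks are separate code points), so this guarantees one code point per element, not one visible glyph.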

@OP: I do not really understand the question, but do you want an array of strings where the number of strings is allocated dynamically? If that is the case, the obvious solution would be to use a TStringList :-)
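
A rough sketch of the TStringList approach, assuming the same default UTF-8 strings and the LazUTF8 unit from LazUtils; the variable names are only illustrative. The list grows as items are added, so no SetLength call is needed.

```pascal
program CodePointListDemo;

{$mode objfpc}{$H+}

uses
  Classes,  // TStringList
  LazUTF8;  // UTF8Length and UTF8Copy (LazUtils package)

var
  List: TStringList;
  S: string;
  i: PtrInt;
begin
  // 'a', then U+00E4 and U+20AC written as raw UTF-8 bytes.
  S := 'a'#$C3#$A4#$E2#$82#$AC;
  List := TStringList.Create;
  try
    // Add one list item per code point; the list resizes itself.
    for i := 1 to UTF8Length(S) do
      List.Add(UTF8Copy(S, i, 1));
    WriteLn('Items in list: ', List.Count);
    for i := 0 to List.Count - 1 do
      WriteLn('Item ', i, ' occupies ', Length(List[i]), ' byte(s)');
  finally
    List.Free;  // the list is owned by this code and must be freed
  end;
end.
```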
« Last Edit: August 30, 2016, 01:33:54 pm by Fungus »

 
