The thing is, one can still store UTF-8 in short strings, so the size limit can be confusing.
The size-specifier in "string[n]" always gives the size in "char", where "char" is the Pascal type "char", which is 1 byte in size.
The Pascal type "char" is not the same as what a human may perceive as a char (in Unicode, or in a multi-byte ANSI encoding).
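For example, a minimal sketch of what that means for the size limit (assuming the source file, and therefore the string literal, is UTF-8 encoded):

program ShortStringBytes;
var
  s: string[10];          // room for 10 "char", i.e. 10 bytes
begin
  s := 'äöü';             // in UTF-8 each umlaut takes 2 bytes
  WriteLn(Length(s));     // prints 6 (bytes / Pascal "char"), not 3 (perceived chars)
end.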
In the same way Length, SetLength, Copy, and the index on any string (short, long, wide) are based on the relevant Pascal types.
That is "char" for short, ansi, unicode string. And that is widechar for widestring. ...
And just to be clear: "widechar" is not the same as (what is perceived as a) Unicode char.
In UTF-16 some Unicode codepoints ("chars") are represented as surrogate pairs. That takes two code units, i.e. two widechar.
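A sketch of that, using one codepoint outside the BMP (the concrete codepoint is just an example):

program SurrogateDemo;
{$mode objfpc}
var
  w: UnicodeString;
begin
  w := #$D83D#$DE00;       // U+1F600 written as its UTF-16 surrogate pair
  WriteLn(Length(w));      // 2 -> two widechar code units for a single codepoint
end.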
And in Unicode (independent of the transfer encoding) there are plenty of chars (human-perceived chars) that are represented by several codepoints (using combining codepoints). These also need several char or widechar.
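For example (the combining accent U+0301 is just one of many such codepoints):

program CombiningDemo;
{$mode objfpc}
var
  w: UnicodeString;
begin
  w := 'e'#$0301;          // 'e' followed by COMBINING ACUTE ACCENT
  WriteLn(Length(w));      // 2 codepoints / 2 widechar, but only 1 perceived "é"
end.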
The term "char" is a very loose descriptor....
char can be
- the Pascal type "char" (usually holding a code unit of a Unicode transfer encoding).
- a Unicode codepoint (i.e. independent of the transfer encoding / just U+####), potentially even a stand-alone combining codepoint.
- a human-perceived token (usually a Unicode entity with/without combining codepoints, but not necessarily limited to that).
- in rare (and even looser) contexts, a glyph, where a glyph can represent one or more Unicode entities.
This list is just meant as an example... It is neither complete nor correct => as in, exceptions can probably be found to any of the statements. Which is to be expected for a term ("char") that can be (and has been) used for nearly anything.
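To make that a bit more concrete, here is a sketch counting the same single emoji in several of those units (it assumes a Lazarus-style setup where the LazUTF8 unit from LazUtils is available for UTF8Length):

program CharCounts;
{$mode objfpc}
uses
  LazUTF8;                 // from LazUtils, provides UTF8Length
var
  u8: AnsiString;
  u16: UnicodeString;
begin
  u8  := #$F0#$9F#$98#$80; // U+1F600 encoded as 4 UTF-8 bytes
  u16 := UTF8Decode(u8);   // the same codepoint as a UTF-16 surrogate pair
  WriteLn(Length(u8));     // 4 -> Pascal "char" (UTF-8 code units)
  WriteLn(Length(u16));    // 2 -> widechar (UTF-16 code units)
  WriteLn(UTF8Length(u8)); // 1 -> Unicode codepoint (and 1 perceived char / glyph)
end.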