Question: What's with this? UTF8ToWinCP(Item)[1];
Item is a string and CP is a Character, so I assume you are passing the first character of Item as a parameter to UTF8ToWinCP.
Why wouldn't it be written UTF8ToWinCP(Item[1]); ?
"CP" here does not mean "character" but "codepage". The function UTF8ToWinCP converts a string from UTF-8 encoding to the codepage used by your Windows installation. In a single-byte codepage each "character" occupies exactly 1 byte, whereas a "character" in UTF-8 consists of up to 4 bytes, so the concept of the char data type is not applicable to UTF-8 encoded strings.
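To make the byte counts concrete, here is a minimal sketch in plain Free Pascal (no Lazarus units needed). The program name is made up for illustration, and the 'Á' is written as raw bytes so that nothing depends on the encoding of the source file.

```pascal
program Utf8ByteDemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  s: String;
  i: Integer;
begin
  // UTF-8 encoding of 'Á' (U+00C1), given as raw bytes
  s := #$C3#$81;
  // Length counts bytes, not visible characters
  WriteLn('Length(s) = ', Length(s));
  // s[i] addresses single bytes; neither byte is a complete character
  for i := 1 to Length(s) do
    WriteLn('s[', i, '] = $', IntToHex(Ord(s[i]), 2));
end.
```

Because Length and the index operator work on bytes, s[1] on a UTF-8 string yields only the lead byte of a multi-byte character, never the character itself.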
For some reason you want to create a character map of the characters in your code page.
Since all Lazarus controls work with UTF-8, we convert each code-page character from the Windows codepage to UTF-8 (WinCPToUTF8) and display the results in a listbox. This way the codepage character 'Á' (ordinal value 193, or $C1) becomes the UTF-8 string #$C3#$81, which is displayed as 'Á' in the listbox (use the Lazarus character map to verify these values!).
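Filling such a list could look roughly like the sketch below. It is not the original poster's code: the program name and the range 32..255 are assumptions, and a TStringList stands in for ListBox.Items so the example runs as a console program. It assumes Windows and a project that requires the LazUtils package, which provides the LazUTF8 unit.

```pascal
program BuildCharMapDemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  Classes,
  LazUTF8;  // WinCPToUTF8 is declared here (LazUtils package)
var
  Items: TStringList;  // stands in for ListBox.Items
  i: Integer;
begin
  Items := TStringList.Create;
  try
    // Assumed range: printable codes of a single-byte codepage
    for i := 32 to 255 do
      // Chr(i) is one code-page byte; the result is its UTF-8 string,
      // which may be longer than one byte
      Items.Add(WinCPToUTF8(Chr(i)));
    WriteLn('Entries: ', Items.Count);
    // Code 193 sits at index 193 - 32; its UTF-8 form can be multi-byte
    WriteLn('Bytes in entry for code 193: ', Length(Items[193 - 32]));
  finally
    Items.Free;
  end;
end.
```

In a real form the loop body would call ListBox1.Items.Add instead; the point is only that every list entry is a UTF-8 string, not a char.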
When the user clicks on a listbox item we want to display the ordinal value of the displayed UTF-8 "character". To determine the ordinal value we use the ord() function, which takes a Pascal char as input parameter. But the "character" selected in the listbox is not a Pascal char; it is a UTF-8 string. That string was created by the function WinCPToUTF8, so we apply the inverse function UTF8ToWinCP to convert it back from UTF-8 to the system code page: it takes a UTF-8 string as input parameter and returns its code-page encoded ansistring counterpart. In the example above, the input string would be #$C3#$81, and the output string would be #$C1.
Although the output string consists of only a single character, it is still a string and thus not accepted by ord(), which wants a char. Therefore we extract the first byte of the string, which is the equivalent of a char variable, by applying "[1]" to the string. Since the string consists of only a single character, nothing is lost when doing so.
Put together, the ordinal value of the selected listbox item is determined like this (here, step by step):
var
  s: String;
  ch: Char;
  ordVal: Integer;
...
  // Item is the UTF-8 equivalent string of a code-page character.
  s := UTF8ToWinCP(Item);  // s now has the encoding of the code page
  ch := s[1];              // take the 1st (and here only) byte of s as a Pascal char
  ordVal := Ord(ch);       // determine the ordinal value of this char
You are asking about a similar-looking expression, UTF8ToWinCP(Item[1]).
It contains two mistakes:
- Item is a UTF-8 string. Item[1] extracts only its first byte, which destroys the UTF-8 encoding: in the example above, #$C3 is only valid in combination with the following #$81.
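- Item[1] is a Pascal char, but UTF8ToWinCP expects a string. Free Pascal will usually convert the char to a one-character string silently, so the call may even compile, but that string holds only the broken lone byte from the first point. The function result is also still a string, not a char, even if it consists of a single character, so ord() cannot take it without the "[1]".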
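The difference between the two orderings can be seen in a small sketch. The program name and the helper CodePageOrd are hypothetical names introduced here, not part of LazUTF8; the guard against an empty result is an added precaution. It assumes Windows and a project that requires the LazUtils package.

```pascal
program OrdOfItemDemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  LazUTF8;  // UTF8ToWinCP is declared here (LazUtils package)

// Hypothetical helper: code-page ordinal of a one-character UTF-8 string,
// or -1 if the conversion yields an empty string.
function CodePageOrd(const Item: String): Integer;
var
  s: String;
begin
  s := UTF8ToWinCP(Item);   // convert the whole UTF-8 string first
  if Length(s) > 0 then
    Result := Ord(s[1])     // then take the single code-page byte
  else
    Result := -1;
end;

var
  Item: String;
begin
  Item := #$C3#$81;  // UTF-8 bytes of 'Á'
  WriteLn('Convert first, then index: ', CodePageOrd(Item));
  // Indexing first only sees the UTF-8 lead byte $C3
  WriteLn('Index first (lead byte only): ', Ord(Item[1]));
end.
```

The second line always reports 195 ($C3), the UTF-8 lead byte. The first line depends on the active codepage; on one that contains 'Á' at position 193, such as Windows-1252, it should report that code-page value.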