
Author Topic: Extended ASCII Chars Ord Value Questions  (Read 742 times)

wp

  • Hero Member
  • *****
  • Posts: 6465
Re: Extended ASCII Chars Ord Value Questions
« Reply #15 on: August 18, 2019, 08:05:50 pm »
Question: What's with this? UTF8ToWinCP(Item)[1];
Item is a string and CP is a Character so I assume you are passing the first character of Item as a parameter to UTF8ToWinCP.

Why wouldn't it be written UTF8ToWinCP(Item[1]); ?
"CP" here does not mean "character" but "code page". The function UTF8ToWinCP converts a string from UTF8 encoding to the code page used by your Windows system. In a single-byte code-page encoding each "character" is 1 byte, whereas a "character" in UTF8 consists of 1 to 4 bytes, so the char data type cannot hold an arbitrary UTF8 "character".

For some reason you want to create the character map of the characters on your code page.

Since all Lazarus controls work with UTF8, we convert the code-page characters from the Windows code page to UTF8 (WinCPToUTF8) and display them in a listbox. This way the code-page character 'Á' (ordinal value 193, or $C1) becomes the UTF8 string #$C3#$81, which is displayed as 'Á' in the listbox. Use the Lazarus character map to verify these values!
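To see the difference between the two encodings without any Lazarus units, here is a minimal sketch for plain FPC. It builds the UTF8 string for 'Á' from raw bytes, dumps them, and decodes the 2-byte sequence by hand. The program and variable names are made up for illustration, and the hand decoding only covers 2-byte sequences. The decoded value (193) matches the code-page value here only because 'Á' lies in the range where Windows-1252 agrees with Unicode; for other code pages or characters you need a real conversion such as UTF8ToWinCP.

```pascal
program Utf8BytesDemo;
{$mode objfpc}{$H+}

var
  Item: String;
  i, codePoint: Integer;
begin
  // UTF8 encoding of 'Á', built from raw bytes so that the
  // encoding of the source file does not matter
  Item := Chr($C3) + Chr($81);

  // One visible "character", but two bytes in the string
  WriteLn('Length in bytes: ', Length(Item));
  for i := 1 to Length(Item) do
    WriteLn('Byte ', i, ': $', HexStr(Ord(Item[i]), 2));

  // Decode a 2-byte sequence by hand: 110xxxxx 10yyyyyy -> xxxxxyyyyyy
  codePoint := ((Ord(Item[1]) and $1F) shl 6) or (Ord(Item[2]) and $3F);
  WriteLn('Code point: ', codePoint, ' ($', HexStr(codePoint, 2), ')');
end.
```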

When the user clicks on a listbox item we want to display the ordinal value of the displayed UTF8 "character". To determine the ordinal value we use the "ord()" function, which takes a Pascal char as input parameter. But the "character" selected in the listbox is not a Pascal char; it is a UTF8 string. The string was created by the function WinCPToUTF8, so we apply the inverse function UTF8ToWinCP to convert it back to the system code page: it takes a UTF8 string as input parameter and returns its code-page encoded ansistring counterpart. In the above example, the input string would be #$C3#$81, and the output string would be #$C1. Although the output string consists of only a single character, it is still a string and thus not accepted by the "ord()" function, which wants a char. Therefore we extract the first byte of the string, which is the equivalent of a char variable, by applying "[1]" to the string. Since the string consists of only a single character, nothing is lost when doing so.

Therefore, the entire determination of the ordinal value of the selected listbox item has to be done like this (here, step by step):
Code: Pascal
  // UTF8ToWinCP is declared in unit LazUTF8 (add it to the uses clause)
  var
    s: String;
    ch: char;
    ordVal: Integer;
  ...
    // Item is the UTF8-equivalent string of a code-page character.
    s := UTF8ToWinCP(Item);   // s has the encoding of the code page
    ch := s[1];   // use only the 1st character of the code-page string as a Pascal char variable; it is the only character here anyway
    ordVal := ord(ch);  // determine the ordinal value of this char variable

You are asking about some similar sequence:
Code: Pascal
  UTF8ToWinCP(Item[1]);
There are two mistakes:
  • Item is a UTF8 string. If you extract its first byte, you destroy the UTF8 encoding; in the above example, #$C3 is only valid in combination with #$81.
  • Item[1] is a Pascal char data type, but the function UTF8ToWinCP wants a string as parameter. A string is not a char, even if it consists of a single character.
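The first mistake can be made visible with a small plain-FPC sketch. ExpectedSeqLen is a hypothetical helper written for this post, not a Lazarus function; it only looks at the bit pattern of the first byte to tell how many bytes the UTF8 sequence should have. For 'Á' the lead byte #$C3 announces a 2-byte sequence, but Item[1] keeps only one of them.

```pascal
program LeadByteDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes a UTF8 sequence should have,
// judged from its first byte alone (0 = not a valid lead byte).
function ExpectedSeqLen(lead: Char): Integer;
var
  b: Byte;
begin
  b := Ord(lead);
  if b < $80 then
    Result := 1                      // 0xxxxxxx: plain ASCII
  else if (b and $E0) = $C0 then
    Result := 2                      // 110xxxxx
  else if (b and $F0) = $E0 then
    Result := 3                      // 1110xxxx
  else if (b and $F8) = $F0 then
    Result := 4                      // 11110xxx
  else
    Result := 0;                     // continuation byte or invalid
end;

var
  Item, Truncated: String;
begin
  Item := Chr($C3) + Chr($81);   // UTF8 bytes of 'Á'
  Truncated := Item[1];          // what Item[1] keeps: only #$C3

  WriteLn('Bytes announced by lead byte: ', ExpectedSeqLen(Item[1]));
  WriteLn('Bytes present in Item:        ', Length(Item));
  WriteLn('Bytes present in Item[1]:     ', Length(Truncated));
end.
```

The truncated string announces two bytes but contains one, so it is no longer valid UTF8 and there is nothing sensible left to convert to the code page.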
 
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

JLWest

  • Hero Member
  • *****
  • Posts: 612
Re: Extended ASCII Chars Ord Value Questions
« Reply #16 on: August 19, 2019, 12:16:34 am »
The function accepts and returns a string.

In your case "item" is a string that represents a single character, so there is no need to index it for the parameter, nor should you.

 The return type is also a string, but you are assigning it to a CHAR, which is only 1 byte; that is why it is being indexed, so that only a single character is returned.

 Getting back to your project, it seems that you may still be working on the same one you were before. Are you really sure the extended set isn't the old 850/437 code page? I don't think 1251 supports all of those, but I could be wrong; been there before  %)

Jamie, I'm not really that sure of anything when it comes to character sets, code pages and character conversions.
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

JLWest

  • Hero Member
  • *****
  • Posts: 612
Re: Extended ASCII Chars Ord Value Questions
« Reply #17 on: August 19, 2019, 12:24:22 am »
@All

I'll have to play with this a bit to try and figure it out.

Thanks