### Topic: Extended ASCII Chars Ord Value Questions

#### JLWest

• Hero Member
• Posts: 627
##### Extended ASCII Chars Ord Value Questions
« on: August 18, 2019, 12:10:38 am »
I populate a listbox with the ASCII character set using the following code. But when I try to convert the characters back with Ord, some of them don't give the same integer value.

Code: Pascal

    unit Unit1;

    {$mode objfpc}{$H+}

    interface

    uses
      Classes, SysUtils, Forms, Controls, Graphics, Dialogs,
      StdCtrls, StrUtils, LazUTF8;

    type

      { TForm1 }

      TForm1 = class(TForm)
        Edit1: TEdit;
        Edit2: TEdit;
        Label1: TLabel;
        ListBox1: TListBox;
        procedure FormCreate(Sender: TObject);
        procedure ListBox1Click(Sender: TObject);
      private
      public
      end;

    var
      Form1: TForm1;

    implementation

    {$R *.lfm}

    { TForm1 }

    procedure TForm1.ListBox1Click(Sender: TObject);
    var
      i: Integer = -1;
      Bit1: String;
      Item: String;
    begin
      i := ListBox1.ItemIndex;
      if (i = -1) or (i = 0) then begin Exit; end;
      Item := Listbox1.Items[i];

      // Strip the three leading tokens ('Ascii', the number, '=').
      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);
      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);
      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);

      Label1.Caption := Item;
      Edit1.Text := Item;

      // Item[1] is the first byte of the UTF-8 string, not the whole character.
      Item := IntToStr(Ord(Item[1]));
      Edit2.Text := Item;
      i := i;
    end;

    procedure TForm1.FormCreate(Sender: TObject);
    var
      i: Integer;
    begin
      ListBox1.Items.Add('Ascii ' + IntToStr(32) + ' = ' + 'Space');
      for i := 33 to 255 do begin
        ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' =    ' + WinCPToUTF8(String(Chr(i))));
      end;
    end;

    end.

FPC 3.2.0, Lazarus IDE v2.0.4
Windows 10 Pro 32-GB
Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

#### jamie

• Hero Member
• Posts: 2262
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #1 on: August 18, 2019, 12:59:07 am »
I think you need to use the CP850ToUTF8 function instead.

It could also be CP437ToUTF8.

If you are trying for the old IBM / DOS sets, I believe those are a good starting point.
Number 1 at blue screen app creations!

#### winni

• Hero Member
• Posts: 719
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #2 on: August 18, 2019, 01:12:08 am »

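To make it clear: you are working with UTF-8, which is the Lazarus standard. UTF-8 and ASCII are identical only in the 7-bit range [0..127], which includes the printable characters [32..126].

In [128..255] you get the "Latin-1 Supplement" block, and UTF-8 stores each of those code points as two bytes. Look here: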

https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)

The days of ASCII, ANSI and IBM8 are gone.

Winni
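
To make the point above concrete, here is a minimal console sketch (plain Free Pascal, no LCL) showing that a character above 127 occupies two bytes in a UTF-8 string, so `Ord(S[1])` only sees the first byte. The program name is arbitrary, and the character is written as explicit bytes so the result does not depend on the source file encoding.

```pascal
program Utf8Bytes;
{$mode objfpc}{$H+}

var
  S: string;
  i: Integer;
begin
  // U+00E9 (Latin-1 value 233) written as its two UTF-8 bytes.
  S := #$C3#$A9;
  WriteLn('Length in bytes: ', Length(S));   // 2
  for i := 1 to Length(S) do
    WriteLn('Byte ', i, ' = ', Ord(S[i]));   // 195, then 169
end.
```

This is why the original listbox code reports 194 or 195 for every extended character: it reads the lead byte rather than the character.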

#### jamie

• Hero Member
• Posts: 2262
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #3 on: August 18, 2019, 01:16:03 am »
Hey, I resent that or wait, resemble that

Yes my forehead is shiny these days!

#### jamie

• Hero Member
• Posts: 2262
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #4 on: August 18, 2019, 02:45:51 am »
I am looking at the LConvEncoding file; it looks like there is a lot of work in there, with many case steps.

Wouldn't it be more efficient to use a two-dimensional static array and do a quick scan on it?

I also notice that many of the CP..To..XX functions call the same inner function, but I don't see any attempt at inlining. Inlining would save on stack allocations and speed things up. You would still need to call a function, but as it stands this is a double step instead of a single step.

It could also be done the way Apple does it: keep a resource table in the bundle folder that is read per code page and can easily be edited for corrections or additions.

Something to think about I guess.
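
As an illustration of the lookup-table idea only (not how LConvEncoding is actually written), a constant array indexed by the byte value can hold the UTF-8 sequence for each code page entry. The sketch below covers just the first four high bytes of CP437; the names `CP437Sample` and `SampleCP437ToUtf8` are made up for this example.

```pascal
program TableLookup;
{$mode objfpc}{$H+}

const
  // Sample only: CP437 bytes $80..$83 as UTF-8 byte sequences.
  // A complete table would cover $80..$FF.
  CP437Sample: array[$80..$83] of string = (
    #$C3#$87,  // $80 -> U+00C7 (C with cedilla)
    #$C3#$BC,  // $81 -> U+00FC (u with diaeresis)
    #$C3#$A9,  // $82 -> U+00E9 (e with acute)
    #$C3#$A2   // $83 -> U+00E2 (a with circumflex)
  );

// Hypothetical helper: direct index into the table, no case statement.
function SampleCP437ToUtf8(B: Byte): string;
begin
  if B < $80 then
    Result := Chr(B)              // ASCII passes through unchanged
  else if B <= High(CP437Sample) then
    Result := CP437Sample[B]
  else
    Result := '?';                // not covered by this sample table
end;

var
  S: string;
begin
  S := SampleCP437ToUtf8($82);
  WriteLn(Length(S), ' ', Ord(S[1]), ' ', Ord(S[2]));  // 2 195 169
end.
```

Because the index is the byte itself, no scan is needed in this direction; the reverse direction (UTF-8 to code page) would still require a search or a second table.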


#### Handoko

• Hero Member
• Posts: 3296
• My goal: build my own game engine using Lazarus
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #5 on: August 18, 2019, 08:02:01 am »
@JLWest

I think I have solved your issue.

Using my Character Map, these are what I found:
- ASCII #128 .. #191 will be mapped to #194 + C
- ASCII #192 .. #255 will be mapped to #195 + (C-64)
- ASCII #127 .. #160 are non-displayable characters (at least on my system)
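
The 194/195 pattern in the list above follows from the generic two-byte UTF-8 layout (`110xxxxx 10xxxxxx`) applied to code points 128..255. The following console sketch, with an arbitrary program name, checks the rule against the bit formula for every value in that range. It assumes the byte values are Unicode code points (ISO-8859-1), which is what the rule implies.

```pascal
program CheckRule;
{$mode objfpc}{$H+}

var
  C, Lead, Trail: Integer;
  Mismatches: Integer = 0;
begin
  for C := 128 to 255 do
  begin
    // Generic two-byte UTF-8 form: 110xxxxx 10xxxxxx
    Lead  := $C0 or (C shr 6);
    Trail := $80 or (C and $3F);

    // Compare with the rule from the post.
    if C <= 191 then
    begin
      if (Lead <> 194) or (Trail <> C) then
        Inc(Mismatches);
    end
    else
    begin
      if (Lead <> 195) or (Trail <> C - 64) then
        Inc(Mismatches);
    end;
  end;
  WriteLn('Mismatches: ', Mismatches);  // 0
end.
```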

My solution is to write 2 functions: ASCII2UTF8 and UTF82ASCII:

Code: Pascal

    function ASCII2UTF8(C: Char): string;
    begin
      Result := '';
      case C of
        #128..#191 : Result := chr(194) + C;
        #192..#255 : Result := chr(195) + chr(Ord(C)-64);
        else
          Result := C;
      end;
    end;

    function ASCII2UTF8(B: Byte): string;
    begin
      Result := ASCII2UTF8(chr(B));
    end;

    function UTF82ASCII(const S: string): Char;
    var
      C1, C2: Char;
    begin
      Result := #0;
      if Length(S) <= 1 then
      begin
        if S = '' then Exit;
        Result := S[1];
        Exit;
      end;
      C1 := S[1];
      C2 := S[2];
      case C1 of
        #194 : Result := C2;
        #195 : Result := chr(Ord(C2)+64);
      end;
    end;

My solution was only tested on Ubuntu Mate GTK2; it may or may not work on Windows. Also, it does not try to correctly remap the characters $7F..$A0, as they are non-displayable on my system (see img2). I have no clue how to map them.
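
One hedged note on the $80..$9F range: if the list is built from Windows-1252 bytes (an assumption; WinCPToUTF8 uses whatever the system ANSI code page is), those bytes are not code points 128..159 at all. For example, Windows-1252 assigns byte $80 to the euro sign, U+20AC, which needs three UTF-8 bytes, so the 194/195 rule cannot cover it. The sketch below uses a hypothetical helper `CodePointToUtf8` to show the encoding; it handles code points up to U+FFFF and does not check for surrogates.

```pascal
program EuroSign;
{$mode objfpc}{$H+}

// Hypothetical helper: UTF-8 encoding for code points up to U+FFFF.
function CodePointToUtf8(CP: Word): string;
begin
  if CP < $80 then
    Result := Chr(CP)
  else if CP < $800 then
    Result := Chr($C0 or (CP shr 6)) + Chr($80 or (CP and $3F))
  else
    Result := Chr($E0 or (CP shr 12)) +
              Chr($80 or ((CP shr 6) and $3F)) +
              Chr($80 or (CP and $3F));
end;

var
  S: string;
  i: Integer;
begin
  // Windows-1252 byte $80 is the euro sign, U+20AC.
  S := CodePointToUtf8($20AC);
  for i := 1 to Length(S) do
    Write(Ord(S[i]), ' ');   // 226 130 172
  WriteLn;
end.
```

A per-code-page table, as provided by the conversion functions discussed elsewhere in this thread, is needed to go from such a byte to its code point in the first place.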

Below is the whole source code:
Code: Pascal

    unit Unit1;

    {$mode objfpc}{$H+}

    interface

    uses
      Classes, SysUtils, Forms, Controls, StdCtrls, StrUtils;

    type

      { TForm1 }

      TForm1 = class(TForm)
        Edit1: TEdit;
        Edit2: TEdit;
        Label1: TLabel;
        ListBox1: TListBox;
        procedure FormCreate(Sender: TObject);
        procedure ListBox1Click(Sender: TObject);
      end;

    var
      Form1: TForm1;

    implementation

    {$R *.lfm}

    { TForm1 }

    function ASCII2UTF8(C: Char): string;
    begin
      Result := '';
      case C of
        #128..#191 : Result := chr(194) + C;
        #192..#255 : Result := chr(195) + chr(Ord(C)-64);
        else
          Result := C;
      end;
    end;

    function ASCII2UTF8(B: Byte): string;
    begin
      Result := ASCII2UTF8(chr(B));
    end;

    function UTF82ASCII(const S: string): Char;
    var
      C1, C2: Char;
    begin
      Result := #0;
      if Length(S) <= 1 then
      begin
        if S = '' then Exit;
        Result := S[1];
        Exit;
      end;
      C1 := S[1];
      C2 := S[2];
      case C1 of
        #194 : Result := C2;
        #195 : Result := chr(Ord(C2)+64);
      end;
    end;

    procedure TForm1.ListBox1Click(Sender: TObject);
    var
      Item : string;
      Bit1 : string;
      i    : Integer = -1;
    begin
      i := ListBox1.ItemIndex;
      if (i = -1) or (i = 0) then Exit;
      Item := Listbox1.Items[i];

      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);
      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);
      Bit1 := Copy2SpaceDel(Item);
      Item := Trim(Item);
      Label1.Caption := Item;
      Edit1.Text     := Item;

      Edit2.Text := Ord(UTF82ASCII(Item)).ToString;
    end;

    procedure TForm1.FormCreate(Sender: TObject);
    var
      i: Integer;
    begin
      ListBox1.Items.Add('Ascii ' + IntToStr(32) + ' = ' + 'Space');
      for i := 33 to 255 do
        ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' =    ' + ASCII2UTF8(i));
    end;

    end.

« Last Edit: August 18, 2019, 08:03:39 am by Handoko »

#### Thaddy

• Hero Member
• Posts: 9436
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #6 on: August 18, 2019, 08:21:44 am »
Why not:
Code: Pascal

    function cvAnsiToUni(const a: AnsiChar): UnicodeChar; inline;
    begin
      Result := a; // compiler converts this.
    end;

The UnicodeChar is assignment compatible with a UTF-8 char, and this code will also work in console apps (it needs a Unicode terminal).
« Last Edit: August 18, 2019, 08:48:38 am by Thaddy »
also related to equus asinus.

#### Handoko

• Hero Member
• Posts: 3296
• My goal: build my own game engine using Lazarus
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #7 on: August 18, 2019, 08:34:52 am »
I've just tested your suggestion. Unfortunately cvAnsiToUni only works on standard ASCII characters. On extended ASCII characters, it shows a question mark symbol.
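
A possible explanation for the question mark, which nobody in the thread confirms: the implicit AnsiChar to UnicodeChar conversion interprets the byte in the program's default code page. Lazarus applications default to UTF-8, and a lone byte above 127 is not a valid UTF-8 sequence, so the conversion may fall back to '?'. The console sketch below is one way to inspect this on a given system. It prints `DefaultSystemCodePage` next to the converted value; the output depends on the platform and locale, so none is shown here.

```pascal
program AnsiToUni;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager on Unix targets
{$endif}

var
  A: AnsiChar;
  U: UnicodeChar;
begin
  A := #233;
  U := A;  // conversion inserted by the compiler, as in cvAnsiToUni
  WriteLn('Default code page: ', DefaultSystemCodePage);
  WriteLn('Ord(A) = ', Ord(A), ', Ord(U) = ', Ord(U));
end.
```

If `Ord(U)` comes out as 63, the byte was replaced by '?' during conversion.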

#### Thaddy

• Hero Member
• Posts: 9436
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #8 on: August 18, 2019, 08:49:22 am »
Unexpected. It should work.

#### JLWest

• Hero Member
• Posts: 627
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #9 on: August 18, 2019, 08:58:03 am »
Quote from: jamie (Reply #4)

> I am looking at the LConvEncoding file; it looks like there is a lot of work in there, with many case steps.
>
> Wouldn't it be more efficient to use a two-dimensional static array and do a quick scan on it?
>
> I also notice that many of the CP..To..XX functions call the same inner function, but I don't see any attempt at inlining. Inlining would save on stack allocations and speed things up. You would still need to call a function, but as it stands this is a double step instead of a single step.
>
> It could also be done the way Apple does it: keep a resource table in the bundle folder that is read per code page and can easily be edited for corrections or additions.
>
> Something to think about I guess.

I thought a resource file would be the thing. Unfortunately, I can't figure out how to set one up or use it.

#### Munair

• Sr. Member
• Posts: 479
• KISS (keep it simple, smart)
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #10 on: August 18, 2019, 10:30:55 am »
This article explains very well why UTF-8 and Extended ASCII (128..255) collide.
https://iconoun.com/articles/collisions/
Lazarus 2.0.6; Manjaro, Windows

#### Thaddy

• Hero Member
• Posts: 9436
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #11 on: August 18, 2019, 11:20:26 am »
I have this:
Code: Pascal

    uses iconvenc;

    function AnsiCharToUnicode(const a: AnsiChar; cp: string = 'CP1250'): string; inline;
    begin
      Result := '';
      // Should test for iconvert() = 0,
      // but if the conversion fails Result is still empty.
      iconvert(a, Result, cp, 'UTF-8');
    end;


#### wp

• Hero Member
• Posts: 6661
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #12 on: August 18, 2019, 11:45:57 am »
Quote from: JLWest

> But when I try to convert the characters some don't convert back to the same integer value.

You encode the ANSI characters with WinCPToUTF8 when populating the listbox, but when you want to get the numeric value back you do not call the inverse function, UTF8ToWinCP. This is how it works:

Code: Pascal

    procedure TForm1.ListBox1Click(Sender: TObject);
    var
      i: Integer = -1;
      Item: String;
      ch: Char;
      p: Integer;
    begin
      i := ListBox1.ItemIndex;
      if (i = -1) then
        Exit;

      Item := Listbox1.Items[i];
      p := pos('=', Item);
      Item := Trim(Copy(Item, p+1, MaxInt));

      Label1.Caption := Item;
      Edit1.Text := Item;

      if Item = 'Space' then
        ch := #32
      else
        // UTF8ToWinCP converts the string "Item" to an Ansistring consisting here of 1 character only.
        // However, it cannot be applied to the function ord() because that requires a Char as argument.
        // Therefore, we extract the first (and only) character of the 1-character string "Item".
        ch := UTF8ToWinCP(Item)[1];
      Edit2.Text := IntToStr(Ord(ch));
    end;
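
The same round trip can be sketched outside a form with a fixed code page instead of the system one. This example assumes the LazUtils package is available to the project and uses `CP1252ToUTF8` and `UTF8ToCP1252` from its `LConvEncoding` unit; choosing code page 1252 is an assumption made for illustration, not something the thread settles.

```pascal
program RoundTrip;
{$mode objfpc}{$H+}

uses
  LConvEncoding;  // part of the LazUtils package

var
  Utf8: string;
  Ch: Char;
begin
  // One CP1252 byte in, UTF-8 text out.
  Utf8 := CP1252ToUTF8(#233);
  // Convert the whole string back, then take its single byte.
  Ch := UTF8ToCP1252(Utf8)[1];
  WriteLn('UTF-8 length: ', Length(Utf8), ', Ord after round trip: ', Ord(Ch));
end.
```

If the conversion round-trips as described in the post, the final value is the original byte value again.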
« Last Edit: August 18, 2019, 11:53:06 am by wp »
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

#### JLWest

• Hero Member
• Posts: 627
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #13 on: August 18, 2019, 05:59:46 pm »
@wp
Yeah, I see. I wasn't aware there was a UTF8ToWinCP() function. The code I posted was copied from this site and pieced together.

I wasn't very sure this was going to work. What I was after was a function that I could pass a character to, and if it was an extended character it would return an ASCII one. Something like this:

    function TForm1.CharacterSwap(AString: String): String;
    var
      i: Integer;
      Item: String[1];
    begin
      // ?
      Result := Item;
    end;

Question: what's with this? UTF8ToWinCP(Item)[1];
Item is a string and CP is a character, so I assume you are passing the first character of Item as a parameter to UTF8ToWinCP.

Why wouldn't it be written UTF8ToWinCP(Item[1]); ?


#### jamie

• Hero Member
• Posts: 2262
##### Re: Extended ASCII Chars Ord Value Questions
« Reply #14 on: August 18, 2019, 06:47:23 pm »
The function accepts and returns a string.

In your case "Item" is a string that represents a single character, so there is no need to index it, nor should you, when passing it as the parameter.

The return type is also a string, but you are assigning it to a Char, which is only 1 byte. That is why the result is indexed, so that only a single character is returned.
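
The difference between indexing the result and indexing the argument can be sketched in plain Free Pascal. `Utf8ToLatin1` below is a hypothetical stand-in for UTF8ToWinCP that only understands code points 0..255; it is not the LazUTF8 implementation.

```pascal
program IndexResult;
{$mode objfpc}{$H+}

// Hypothetical stand-in for UTF8ToWinCP: decodes one- and two-byte
// UTF-8 sequences for code points 0..255 and returns a string.
function Utf8ToLatin1(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if (Ord(S[i]) in [$C2, $C3]) and (i < Length(S)) then
    begin
      // Combine the payload bits of the lead and trail bytes.
      Result := Result + Chr(((Ord(S[i]) and $03) shl 6) or (Ord(S[i + 1]) and $3F));
      Inc(i, 2);
    end
    else
    begin
      Result := Result + S[i];
      Inc(i);
    end;
  end;
end;

var
  Item: string;
begin
  Item := #$C3#$A9;                      // U+00E9 as two UTF-8 bytes
  WriteLn(Ord(Utf8ToLatin1(Item)[1]));   // 233: convert first, then index the result
  WriteLn(Ord(Item[1]));                 // 195: indexing first yields only the lead byte
end.
```

Passing `Item[1]` to the conversion would hand it half of a two-byte sequence, which is why the index belongs on the result.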

Getting back to your project, it seems that you may still be working on the same one as before. Are you really sure the extended set isn't the old 850/437 code page? I don't think 1251 supports all of those, but I could be wrong; I've been there before.