Author Topic: Extended ASCII Chars Ord Value Questions  (Read 3118 times)

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Extended ASCII Chars Ord Value Questions
« on: August 18, 2019, 12:10:38 am »
I create the ASCII character set in a listbox using the following code. But when I try to convert the characters some don't convert back to the same integer value.


Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs,
  StdCtrls, StrUtils, LazUTF8;

type

  { TForm1 }

  TForm1 = class(TForm)
    Edit1: TEdit;
    Edit2: TEdit;
    Label1: TLabel;
    ListBox1: TListBox;
    procedure FormCreate(Sender: TObject);
    procedure ListBox1Click(Sender: TObject);
  private
  public
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.ListBox1Click(Sender: TObject);
var
  i: Integer = -1;
  Bit1: String;
  Item: String;
begin
  i := ListBox1.ItemIndex;
  if (i = -1) or (i = 0) then Exit;
  Item := ListBox1.Items[i];

  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);
  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);
  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);

  Label1.Caption := Item;
  Edit1.Text := Item;

  Item := IntToStr(Ord(Item[1]));
  Edit2.Text := Item;
end;

procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  ListBox1.Items.Add('Ascii ' + IntToStr(32) + ' = ' + 'Space');
  for i := 33 to 255 do
    ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' =    ' + WinCPToUTF8(String(Chr(i))));
end;

end.

FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Extended ASCII Chars Ord Value Questions
« Reply #1 on: August 18, 2019, 12:59:07 am »
I think you need to use the CP850ToUTF8 function instead.

It could also be CP437ToUTF8.

If you are trying for the old IBM / DOS sets, I believe those are a good starting point.
The only true wisdom is knowing you know nothing

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Extended ASCII Chars Ord Value Questions
« Reply #2 on: August 18, 2019, 01:12:08 am »
Yes, follow jamie's hints.

To make it clear: you are working with UTF-8, which is the Lazarus standard. UTF-8 and ASCII are identical only in [0..127].

In [128..255] the Unicode code points belong to the "Latin-1 Supplement" block, and UTF-8 encodes each of them as two bytes. Look here:

https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)

The days of ASCII, ANSI and IBM8 are gone.

Winni
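
A minimal sketch of what this means for the code in the first post, assuming the listbox item holds the UTF-8 form of U+00E9 ("é"). The string is built from raw bytes so the result does not depend on the source file encoding; the program name is only illustrative.

```pascal
program FirstByte;
{$mode objfpc}{$H+}
var
  S: string;
begin
  S := #$C3#$A9;        // the two UTF-8 bytes of U+00E9
  WriteLn(Length(S));   // 2: one character, two bytes
  WriteLn(Ord(S[1]));   // 195: the UTF-8 lead byte, not 233
  WriteLn(Ord(S[2]));   // 169: the continuation byte
end.
```

So Ord(Item[1]) on such a string returns the lead byte rather than the original character code.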


jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Extended ASCII Chars Ord Value Questions
« Reply #3 on: August 18, 2019, 01:16:03 am »
Hey, I resent that or wait, resemble that  :o

Yes my forehead is shiny these days!

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Extended ASCII Chars Ord Value Questions
« Reply #4 on: August 18, 2019, 02:45:51 am »
I am looking at the LConvEncoding unit. It looks like a lot of work went into it; there are many case statements.

Wouldn't it be more efficient to use a two-dimensional static array and do a quick scan of it?

I also notice that many of the CP..To..XX functions call the same inner function, but I don't see any attempt at inlining. That would save stack allocations and speed things up. You would still need to call a function, but as it stands it is two calls instead of one.

It could also be done the way Apple does it: keep a resource table in the bundle folder that is read per code page, which could easily be edited for corrections or additions.

Something to think about, I guess.
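
A rough sketch of the table idea, not a description of how LConvEncoding is actually implemented. It uses a one-dimensional array indexed directly by the byte value instead of a scan. The names CP437High and CP437ToCodePoint are made up for illustration, and the table is only an excerpt covering the CP437 bytes 128..135.

```pascal
program TableLookup;
{$mode objfpc}{$H+}

const
  // Excerpt only: Unicode code points for CP437 bytes 128..135.
  // A complete table would cover 128..255.
  CP437High: array[128..135] of Word = (
    $00C7, $00FC, $00E9, $00E2, $00E4, $00E0, $00E5, $00E7);

// Returns the Unicode code point for a CP437 byte, or 0 if the byte
// lies outside the excerpt above.
function CP437ToCodePoint(B: Byte): Word;
begin
  if B < 128 then
    Result := B                      // plain ASCII maps to itself
  else if B <= High(CP437High) then
    Result := CP437High[B]           // direct index, no scan needed
  else
    Result := 0;                     // not covered by this excerpt
end;

begin
  WriteLn(CP437ToCodePoint(65));     // 65
  WriteLn(CP437ToCodePoint(130));    // 233
end.
```

The reverse direction (code point to byte) is where a scan or a second table would come in.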


Handoko

  • Hero Member
  • *****
  • Posts: 5131
  • My goal: build my own game engine using Lazarus
Re: Extended ASCII Chars Ord Value Questions
« Reply #5 on: August 18, 2019, 08:02:01 am »
@JLWest

I think I have solved your issue.

Using my Character Map, this is what I found:
- ASCII #128 .. #191 will be mapped to #194 + C
- ASCII #192 .. #255 will be mapped to #195 + (C-64)
- ASCII #127 .. #160 are non-displayable characters (at least on my system)

My solution is to write 2 functions: ASCII2UTF8 and UTF82ASCII:

Code: Pascal
function ASCII2UTF8(C: Char): string;
begin
  Result := '';
  case C of
    #128..#191 : Result := chr(194) + C;
    #192..#255 : Result := chr(195) + chr(Ord(C)-64);
    else
      Result := C;
  end;
end;

function ASCII2UTF8(B: Byte): string;
begin
  Result := ASCII2UTF8(chr(B));
end;

function UTF82ASCII(const S: string): Char;
var
  C1, C2: Char;
begin
  Result := #0;
  if Length(S) <= 1 then
  begin
    if S = '' then Exit;
    Result := S[1];
    Exit;
  end;
  C1 := S[1];
  C2 := S[2];
  case C1 of
    #194 : Result := C2;
    #195 : Result := chr(Ord(C2)+64);
  end;
end;

My solution was only tested on Ubuntu Mate GTK2; it may or may not work on Windows. Also, it does not try to correctly remap the characters $7F..$A0, as they are non-displayable on my system (see img2). I have no clue how to map them.
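
The mapping above is the same arithmetic as the general two-byte UTF-8 encoding, applied to byte values treated as code points 0..255. A small round-trip check can be sketched as follows; Latin1ToUTF8 and UTF8ToLatin1 are hypothetical helper names, not part of any library, and the decoder assumes its input holds a single character with a code point of at most 255.

```pascal
program RoundTrip;
{$mode objfpc}{$H+}

// Byte value (treated as code point 0..255) -> UTF-8 string.
function Latin1ToUTF8(B: Byte): string;
begin
  if B < 128 then
    Result := Chr(B)
  else
    // Lead byte carries the top 2 bits, continuation byte the low 6 bits.
    Result := Chr($C0 or (B shr 6)) + Chr($80 or (B and $3F));
end;

// UTF-8 string holding one character (code point <= 255) -> byte value.
function UTF8ToLatin1(const S: string): Byte;
begin
  Result := 0;
  if S = '' then Exit;
  if Length(S) = 1 then
    Result := Ord(S[1])
  else
    Result := ((Ord(S[1]) and $1F) shl 6) or (Ord(S[2]) and $3F);
end;

var
  i, Bad: Integer;
begin
  Bad := 0;
  for i := 0 to 255 do
    if UTF8ToLatin1(Latin1ToUTF8(i)) <> i then
      Inc(Bad);
  WriteLn('Mismatches: ', Bad);   // Mismatches: 0
end.
```

For 128..191 the lead byte comes out as 194 and for 192..255 as 195, which matches the two case branches above.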

Below is the whole source code:
Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, StdCtrls, StrUtils;

type

  { TForm1 }

  TForm1 = class(TForm)
    Edit1: TEdit;
    Edit2: TEdit;
    Label1: TLabel;
    ListBox1: TListBox;
    procedure FormCreate(Sender: TObject);
    procedure ListBox1Click(Sender: TObject);
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

function ASCII2UTF8(C: Char): string;
begin
  Result := '';
  case C of
    #128..#191 : Result := chr(194) + C;
    #192..#255 : Result := chr(195) + chr(Ord(C)-64);
    else
      Result := C;
  end;
end;

function ASCII2UTF8(B: Byte): string;
begin
  Result := ASCII2UTF8(chr(B));
end;

function UTF82ASCII(const S: string): Char;
var
  C1, C2: Char;
begin
  Result := #0;
  if Length(S) <= 1 then
  begin
    if S = '' then Exit;
    Result := S[1];
    Exit;
  end;
  C1 := S[1];
  C2 := S[2];
  case C1 of
    #194 : Result := C2;
    #195 : Result := chr(Ord(C2)+64);
  end;
end;

procedure TForm1.ListBox1Click(Sender: TObject);
var
  Item : string;
  Bit1 : string;
  i    : Integer = -1;
begin
  i := ListBox1.ItemIndex;
  if (i = -1) or (i = 0) then Exit;
  Item := ListBox1.Items[i];

  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);
  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);
  Bit1 := Copy2SpaceDel(Item);
  Item := Trim(Item);
  Label1.Caption := Item;
  Edit1.Text     := Item;

  Edit2.Text := Ord(UTF82ASCII(Item)).ToString;
end;

procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  ListBox1.Items.Add('Ascii ' + IntToStr(32) + ' = ' + 'Space');
  for i := 33 to 255 do
    ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' =    ' + ASCII2UTF8(i));
end;

end.
« Last Edit: August 18, 2019, 08:03:39 am by Handoko »

Thaddy

  • Hero Member
  • *****
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Extended ASCII Chars Ord Value Questions
« Reply #6 on: August 18, 2019, 08:21:44 am »
Why not:
Code: Pascal
function cvAnsiToUni(const a: AnsiChar): UnicodeChar; inline;
begin
  Result := a; // compiler converts this.
end;

The UnicodeChar is assignment-compatible with UTF8Char, and this code will also work in console apps (it needs a Unicode terminal).
« Last Edit: August 18, 2019, 08:48:38 am by Thaddy »
Specialize a type, not a var.

Handoko

  • Hero Member
  • *****
  • Posts: 5131
  • My goal: build my own game engine using Lazarus
Re: Extended ASCII Chars Ord Value Questions
« Reply #7 on: August 18, 2019, 08:34:52 am »
I've just tested your suggestion. Unfortunately, cvAnsiToUni only works on standard ASCII characters. On extended ASCII characters, it shows a question mark symbol.

Thaddy

  • Hero Member
  • *****
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Extended ASCII Chars Ord Value Questions
« Reply #8 on: August 18, 2019, 08:49:22 am »
Unexpected. It should work.

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: Extended ASCII Chars Ord Value Questions
« Reply #9 on: August 18, 2019, 08:58:03 am »
Quote from jamie (Reply #4):
Could also do it like apple does, have a resource table in the bundle folder that it could read per code page and this could be easily edited for corrections or additions.

I thought a resource file would be the thing. Unfortunately, I can't figure out how to set one up or use it.

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Extended ASCII Chars Ord Value Questions
« Reply #10 on: August 18, 2019, 10:30:55 am »
This article explains very well why UTF8 and Extended Ascii (128..255) collide.
https://iconoun.com/articles/collisions/
keep it simple

Thaddy

  • Hero Member
  • *****
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Extended ASCII Chars Ord Value Questions
« Reply #11 on: August 18, 2019, 11:20:26 am »
I have this:
Code: Pascal
uses iconvenc;

function AnsiCharToUnicode(const a: AnsiChar; cp: string = 'CP1250'): string; inline;
begin
  Result := '';
  // should test for iconvert() = 0,
  // but if the conversion fails Result is still empty
  iconvert(a, Result, cp, 'UTF-8');
end;

wp

  • Hero Member
  • *****
  • Posts: 11856
Re: Extended ASCII Chars Ord Value Questions
« Reply #12 on: August 18, 2019, 11:45:57 am »
But when I try to convert the characters some don't convert back to the same integer value.

You encode the ANSI characters with WinCPToUTF8 when populating the listbox, but when you want to get the numeric value back you do not call the inverse function, UTF8ToWinCP. This is how it works:

Code: Pascal
procedure TForm1.ListBox1Click(Sender: TObject);
var
  i: Integer = -1;
  Item: String;
  ch: Char;
  p: Integer;
begin
  i := ListBox1.ItemIndex;
  if (i = -1) then
    Exit;

  Item := ListBox1.Items[i];
  p := pos('=', Item);
  Item := Trim(Copy(Item, p+1, MaxInt));

  Label1.Caption := Item;
  Edit1.Text := Item;

  if Item = 'Space' then
    ch := #32
  else
    // UTF8ToWinCP converts the string "Item" to an Ansistring consisting here of 1 character only.
    // However, it cannot be applied to the function ord() because that requires a Char as argument.
    // Therefore, we extract the first (and only) character of the 1-character string "Item".
    ch := UTF8ToWinCP(Item)[1];
  Edit2.Text := IntToStr(Ord(ch));
end;
« Last Edit: August 18, 2019, 11:53:06 am by wp »

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: Extended ASCII Chars Ord Value Questions
« Reply #13 on: August 18, 2019, 05:59:46 pm »
@WP
Yeah, I see. I wasn't aware there was a UTF8ToWinCP() function. The code I wrote (or didn't write) was copied from this site and put together.

I wasn't sure this was going to work. What I was after was a function that I could pass a character to and, if it was an extended character, it would return an ASCII. Something like this:

function TForm1.CharacterSwap(AString: String): String;
var
  i: Integer;
  Item: String[1];
begin
  ?
  Result := Item;
end;

Question: what's with this? UTF8ToWinCP(Item)[1];
Item is a string and ch is a Char, so I assume you are passing the first character of Item as a parameter to UTF8ToWinCP.

Why wouldn't it be written UTF8ToWinCP(Item[1]); ?




jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Extended ASCII Chars Ord Value Questions
« Reply #14 on: August 18, 2019, 06:47:23 pm »
The function accepts and returns a string.

In your case, "Item" is a string that represents a single character, so there is no need to index it for the parameter, nor should you.

The return type is also a string, but you are assigning it to a Char, which is only 1 byte. That is why the result is indexed: so that only a single character is taken from it.
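
To illustrate the indexing on its own, here is a small sketch that needs only the RTL. UpperCase stands in for UTF8ToWinCP purely because it also takes a string and returns a string; the point is where the [1] applies, not the conversion itself.

```pascal
program IndexResult;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  S: string;
  Ch: Char;
begin
  S := 'x';
  // UpperCase takes a string and returns a string.
  // The [1] is applied to the returned string, not to the argument.
  Ch := UpperCase(S)[1];
  WriteLn(Ch, ' ', Ord(Ch));   // X 88
end.
```

Writing UpperCase(S[1]) would index the argument instead, and the result would still be a string that cannot be assigned to a Char directly.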

Getting back to your project: it seems that you may still be working on the same one as before. Are you really sure the extended set isn't the old 850/437 code page? I don't think 1251 supports all of those, but I could be wrong; been there before  %)

 
