ASCII characters Questions

JLWest

Hero Member
Posts: 1293

The word I'm trying to display in ASCII Code is 'Afrânio'.

The Demo program gives me:
'A' = 65 Ok
'f' = 102 OK
'r' = 114 OK
'├' = 195 for the 4th character? I have no idea.
' ' = 162 for the 5th character; it shows as a blank.
'n' = 110 shows as the 6th character, but 'n' is the 5th in 'Afrânio'.
'i' = 105 shows as the 7th character, but 'i' is the 6th in 'Afrânio'.
'o' = 111 shows as the 8th character, but 'o' is the 7th in 'Afrânio'.

Line 155 (Lgth := Length(RCD);)

FPC says there are 8 characters. I count 7 visually.
There is something going on I can't figure out.

Code: Pascal

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type

  { TForm1 }

  TForm1 = class(TForm)
    btnName: TButton;
    Edit1: TEdit;
    Edit2: TEdit;
    Edit3: TEdit;
    Edit4: TEdit;
    Edit5: TEdit;
    procedure btnNameClick(Sender: TObject);
    procedure Convert;
    procedure FormCreate(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  Edit3.Text := 'Afrânio';   // the word being examined
  Edit1.Text := '';
  Edit2.Text := '';
  Edit4.Text := '';
  Edit5.Text := '';
end;

// Steps through RCD one element at a time and shows each element and its code.
procedure TForm1.Convert;
var
  idx: Integer;
  RCD: String[10] = 'Afrânio';
  Lgth: Integer;
begin
  Lgth := Length(RCD);                      // the length as FPC reports it
  for idx := 1 to Lgth do
  begin
    Edit4.Text := RCD[idx];                 // element shown as a character
    Edit5.Text := IntToStr(idx);            // position in the string
    Edit1.Text := IntToStr(Ord(RCD[idx]));  // numeric code of the element
    ShowMessage('');                        // pause so each step can be read
  end;
  Edit1.Text := '';
  Edit2.Text := '';
  Edit4.Text := '';
  Edit5.Text := '';
end;

procedure TForm1.btnNameClick(Sender: TObject);
begin
  Convert;
end;

end.
                               


FPC 3.2.0, Lazarus IDE v2.0.4
Windows 10 Pro 32-GB
Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

Birger52

Sr. Member
Posts: 309

Re: ASCII characters Questions

« Reply #1 on: May 11, 2019, 03:04:04 pm »

â is 226 ASCII

But your string is UTF8, not ASCII.
In UTF8, â is c3 a2 (195 162): two bytes. (https://www.i18nqa.com/debug/utf8-debug.html)
So the (byte) length of your string is 8, not 7.
195 is Ã and 162 is ¢ in extended ASCII (https://www.rapidtables.com/code/text/ascii-table.html)

You probably need to use some other type than string...
And past that, I'm afraid you need to look to someone else for explanations/solutions.
Not that I won't - I do not know.

Google "lazarus character sets" for instance...

« Last Edit: May 11, 2019, 03:19:23 pm by Birger52 »


Lazarus 2.0.8 FPC 3.0.4
Win7 64bit
Playing and learning - strictly for my own pleasure.

JLWest

Hero Member
Posts: 1293

Re: ASCII characters Questions

« Reply #2 on: May 11, 2019, 03:42:31 pm »

ASCII 226 Alt 226 = 'Γ'

I really don't understand this.


Zoran

Hero Member
Posts: 1831

Re: ASCII characters Questions

« Reply #3 on: May 11, 2019, 05:20:52 pm »

Quote from: JLWest on May 11, 2019, 03:42:31 pm

ASCII 226 Alt 226 = 'Γ'

I really don't understand this.

First, note that within one byte (8 bits), it is not possible to have more than 256 (2⁸) different characters.

There are different character encodings. Some of them use only one byte per character (or even less, like ASCII, see below), and some use more, so that more characters can be encoded.
Some encodings have variable length; the popular UTF-8 encoding is an example.

So some encodings can represent the letter â, and some cannot.

The ASCII standard is a 7-bit encoding, so it has only 128 codes. It contains the English upper- and lower-case Latin letters A-Z and a-z, the digits 0-9, standard punctuation characters, and control characters.
ASCII does not contain the letter â.
Note that the fact that my text here contains the letter â means that this forum uses another encoding, not plain ASCII.

Historically, ASCII was a good solution for the English language: Americans created it, and it was all they needed.

However, as our world (still) uses other languages, and most of these languages use more characters, ASCII was not a solution for them.

Then the ANSI encodings came. ANSI is not one encoding; it is a family of one-byte encodings.
The idea is to use the fact that ASCII is a 7-bit encoding while computer memory is organized in bytes (8 bits), so each ASCII character stored in one byte has a leading zero bit. Setting the leading bit to 1 allows the encoding to be extended with 128 more characters while keeping compatibility with plain ASCII.

Still, it is not possible to fit all the characters the world uses into one byte, so ASCII got several extensions. One of them covers the west European Latin languages: the non-English characters of Portuguese, Spanish, French, German, Swedish, etc. fit into one ANSI standard. There you can find the German letters Ä, Ö, Ü, ß, the Spanish ñ, and so on. All of these have values in the upper range (128-255), and the lower range (0-127) stays compatible with ASCII, which is why I call it an ASCII extension.

This ANSI west European encoding (CP-1252) has the letter â encoded as 226.

However, the other European languages still cannot fit in this encoding: the upper range (128-255) is not large enough to hold the east European letters as well.

So another ANSI encoding covers the east European Latin languages (Polish, Slovenian, etc.). Another was added for the Cyrillic languages, one for Greek, one for Arabic, etc.

Each of these ANSI standards has a lower range (0-127) compatible with ASCII, but the upper range (128-255) holds different characters.

So the ANSI extensions can be enough if you don't need characters from different languages that are not covered by a single one of these standards, but you cannot write the Latin letters Č (found in cp1250, but not in cp1252) and ñ (found in cp1252, but not in cp1250) in one text with any of these encodings.
The fact that you can see both of them in the previous sentence means that this forum does not use any of these ANSI encodings!

Also, there are languages in this world which have more than 128 characters, and these surely cannot be covered by a one-byte ASCII extension.

Then UCS2 was invented. It is a two-byte encoding: each character is represented with two bytes.
It is compatible with old ASCII in this sense: the first 128 characters (range 0-127) are the same as the ASCII characters, so they have zeros in the first 9 bits, followed by the 7-bit ASCII code.
The idea was that it should be enough for all.
Most languages fit there, but if you write German, for instance, a text file encoded in UCS2 requires twice the storage of the same text saved in an ANSI (cp1252) encoded file.

Then the brilliant idea came: UTF8, a variable-length encoding. ASCII characters take one byte, and all other characters take more (the accented European letters from the ANSI encodings take two bytes, most East Asian characters take three, and some rarer characters take four).
For example, the German text mentioned before, now saved in a UTF8 encoded file, will be almost the same size as the ANSI encoded one: only the special German characters will take two bytes, while most of the characters in the text (letters a-z, digits, standard punctuation) will take just one byte.
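The size rule can be sketched as a small console program. Utf8ByteCount is a hypothetical helper written for this illustration, not a library routine; the thresholds are the standard UTF-8 code point ranges:

```pascal
program Utf8Sizes;
{$mode objfpc}{$H+}

// Number of bytes UTF-8 needs to store one Unicode code point.
function Utf8ByteCount(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $80 then
    Result := 1              // plain ASCII
  else if CodePoint < $800 then
    Result := 2              // e.g. accented Latin, Greek, Cyrillic letters
  else if CodePoint < $10000 then
    Result := 3              // e.g. most East Asian characters
  else
    Result := 4;             // code points beyond U+FFFF
end;

begin
  WriteLn(Utf8ByteCount(Ord('A')));  // 1
  WriteLn(Utf8ByteCount($00FC));     // 2 (U+00FC, u with diaeresis)
  WriteLn(Utf8ByteCount($4E2D));     // 3 (U+4E2D, a CJK ideograph)
  WriteLn(Utf8ByteCount($1F600));    // 4 (U+1F600, an emoji)
end.
```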

There are more beautiful things about UTF8 -- read this: http://wiki.freepascal.org/UTF8_strings_and_characters

Lazarus uses UTF8.
In UTF8, the letter â, which you need is encoded with two bytes -- 195 and 162.
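Where 195 and 162 come from can be worked out by hand. The following is a minimal sketch that applies the two-byte UTF-8 layout (110xxxxx 10xxxxxx) to code point 226; the variable names are made up for the illustration:

```pascal
program EncodeTwoBytes;
{$mode objfpc}{$H+}

var
  CodePoint, LeadByte, TrailByte: Integer;
begin
  CodePoint := 226;                         // U+00E2, a with circumflex
  // Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
  LeadByte  := $C0 or (CodePoint shr 6);    // marker 110 + upper 5 bits
  TrailByte := $80 or (CodePoint and $3F);  // marker 10 + lower 6 bits
  WriteLn(LeadByte, ' ', TrailByte);        // prints 195 162
end.
```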

Just to be complete: the two-byte encoding UCS2 was simply not enough for all the characters our world needs. It is therefore deprecated now, and the UTF16 encoding standard superseded it. I'm not going further into this; I hope I helped.


lucamar

Hero Member
Posts: 4219

Re: ASCII characters Questions

« Reply #4 on: May 11, 2019, 05:32:02 pm »

Quote from: JLWest on May 11, 2019, 03:42:31 pm

I really don't understand this.

It's quite easy: the string is UTF8 encoded, so any character beyond plain ASCII [0..127] will be encoded in 2 to 4 bytes. Length(String) counts bytes, so it gives you the number of characters plus the extra byte in the encoding of "â". UTF8Length() (in the LazUTF8 unit) will give you the number of characters.
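A minimal console sketch of the difference, assuming a plain FPC program with no {$codepage} directive, so that the #$C3#$A2 escapes stay raw bytes. CodePointCount is a hypothetical helper, not a library routine; it counts every byte that is not a UTF-8 continuation byte (10xxxxxx), which for valid UTF-8 is the number of code points:

```pascal
program CountCodePoints;
{$mode objfpc}{$H+}

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
function CodePointCount(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: string;
begin
  S := 'Afr'#$C3#$A2'nio';                    // 'Afranio' with a-circumflex as UTF-8 bytes
  WriteLn('Bytes: ', Length(S));              // Bytes: 8
  WriteLn('Code points: ', CodePointCount(S)); // Code points: 7
end.
```

In a Lazarus project the same count should come from UTF8Length in LazUTF8; the hand-written loop is only there to keep the sketch free of dependencies.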

Quote from: Birger52 on May 11, 2019, 03:04:04 pm

â is 226 ASCII

No, it isn't. It may be #226 in some so-called extended-ASCII encodings or in some Windows code page(s), but plain ASCII is a seven-bit code: it defines only characters #0..#127.

One must be precise in these matters or chaos ensues.

« Last Edit: May 11, 2019, 05:40:09 pm by lucamar »


Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!)

Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

VTwin

Hero Member
Posts: 1215
Former Turbo Pascal 3 user

Re: ASCII characters Questions

« Reply #5 on: May 11, 2019, 05:44:31 pm »

Quote from: JLWest on May 11, 2019, 03:42:31 pm

ASCII 226 Alt 226 = 'Γ'

I really don't understand this.

Already said, but that is for extended-ASCII, not UTF-8.

In short, your program takes the bytes 195 and 162 (together one UTF-8 character) and shows them as two separate extended-ASCII characters, neither of which is the one you meant (extended-ASCII collision).

http://iconoun.com/articles/collisions/

« Last Edit: May 11, 2019, 05:46:57 pm by VTwin »


“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 12.1: Lazarus 2.2.6 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 2.2.6 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.2.6 (64 bit on VBox)

JLWest

Hero Member
Posts: 1293

Re: ASCII characters Questions

« Reply #6 on: May 11, 2019, 05:54:38 pm »

Here is what I'm trying to do but can't figure out yet, although I'm getting closer.

I have a table of cities and one of countries. All told about 40,000.

Some are UTF8 and some ASCII I guess. I need a function that, when given a string that looks like 'öäüèéàCUT', will return the following in ASCII: 'oaueeaCUT'.

As far as I can determine there isn't a function like that in FPC (I'm surprised), so I suppose one has to write one.


lucamar

Hero Member
Posts: 4219

Re: ASCII characters Questions

« Reply #7 on: May 11, 2019, 06:07:12 pm »

Quote from: JLWest on May 11, 2019, 05:54:38 pm

Some are UTF8 and some ASCII I guess.

Rather think of it as all being UTF8. What seems to be ASCII is really a UTF8 string where all characters fall in the set [#32..#127].
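Whether a given table entry falls into that plain-ASCII subset can be checked byte by byte. A minimal sketch, with IsPlainAscii as a hypothetical helper name; it accepts the whole #0..#127 range rather than only the printable part:

```pascal
program AsciiCheck;
{$mode objfpc}{$H+}

// True when every byte has its top bit clear, i.e. lies in #0..#127.
function IsPlainAscii(const S: string): Boolean;
var
  i: Integer;
begin
  for i := 1 to Length(S) do
    if Ord(S[i]) > 127 then
      Exit(False);
  Result := True;
end;

begin
  WriteLn(IsPlainAscii('Madrid'));            // TRUE
  WriteLn(IsPlainAscii('Afr'#$C3#$A2'nio'));  // FALSE (contains a two-byte sequence)
end.
```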

And you're right: AFAICT there is no conversion function for what you want. I looked for one some time ago (to ASCIIfy filenames) and could find nothing, so I built my own to translate the characters most common around here (Spain), forgetting about Cyrillic, Greek, etc. I keep adding chars to it whenever I encounter one I don't have.
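A table-driven function of that kind might look roughly like the sketch below. This is not lucamar's code; Asciify, TMapEntry and the tiny Map table are hypothetical names invented for the illustration, and the table covers only a handful of letters. The UTF-8 sequences are written as byte escapes so that the result does not depend on the encoding of the source file:

```pascal
program AsciifySketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TMapEntry = record
    Utf8: string;   // UTF-8 bytes of the accented letter
    Plain: string;  // ASCII replacement
  end;

const
  // Hypothetical, deliberately small table; extend it as new letters turn up.
  Map: array[0..7] of TMapEntry = (
    (Utf8: #$C3#$A0; Plain: 'a'),   // a grave
    (Utf8: #$C3#$A2; Plain: 'a'),   // a circumflex
    (Utf8: #$C3#$A4; Plain: 'a'),   // a diaeresis
    (Utf8: #$C3#$A8; Plain: 'e'),   // e grave
    (Utf8: #$C3#$A9; Plain: 'e'),   // e acute
    (Utf8: #$C3#$B1; Plain: 'n'),   // n tilde
    (Utf8: #$C3#$B6; Plain: 'o'),   // o diaeresis
    (Utf8: #$C3#$BC; Plain: 'u')    // u diaeresis
  );

// Replaces every mapped UTF-8 sequence with its ASCII stand-in.
// Letters missing from the table are left untouched.
function Asciify(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := Low(Map) to High(Map) do
    Result := StringReplace(Result, Map[i].Utf8, Map[i].Plain, [rfReplaceAll]);
end;

begin
  // o, a, u with diaeresis, e grave, e acute, a grave, then 'CUT'
  WriteLn(Asciify(#$C3#$B6#$C3#$A4#$C3#$BC#$C3#$A8#$C3#$A9#$C3#$A0'CUT'));  // oaueeaCUT
end.
```

Byte-wise replacement is safe here because, in valid UTF-8, a complete multi-byte sequence can never match in the middle of another character.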


jamie

Hero Member
Posts: 6131

Re: ASCII characters Questions

« Reply #8 on: May 11, 2019, 06:11:14 pm »

If you are on Windows...

the table is utf8 encoded, it's using Extended ASCII which is fine I guess..

The function WinCPToUTF8(String(#226)); displays your letter because it converts it to a utf8..

You can convert the whole string If you like but remember this, the string will not be a one to one index after that.


The only true wisdom is knowing you know nothing

JLWest

Hero Member
Posts: 1293

Re: ASCII characters Questions

« Reply #9 on: May 11, 2019, 06:18:02 pm »

@Jamie

The function WinCPToUTF8(String(#226)); displays your letter because it converts it to a utf8..

You can convert the whole string If you like but remember this, the string will not be a one to one index after that.

Is WinCPToUTF8 a Windows API, and if so what do I need in my uses clause?

" but remember this, the string will not be a one to one index after that."
I don't really understand what you are saying here.


lucamar

Hero Member
Posts: 4219

Re: ASCII characters Questions

« Reply #10 on: May 11, 2019, 06:21:52 pm »

Quote from: jamie on May 11, 2019, 06:11:14 pm

the table is utf8 encoded, it's using Extended ASCII which is fine I guess..

No: character data encoded as UTF8 is Unicode.

Of course, you can convert it to any Windows code page, but first one must ascertain which code page will cause the least damage. Which is not as difficult as it sounds ... unless it's Russian text citing a Chinese philosopher citing a French novelist.

Quote from: JLWest on May 11, 2019, 06:18:02 pm

" but remember this, the string will not be a one to one index after that."
I don't really understand what you are saying here.

I think he means that there isn't a byte-to-byte (or char-to-char) correspondence between the original UTF8 string and the ANSI one. Which is quite logical, since the double-byte UTF8 character will be converted to a single-byte ANSI one.
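The index shift can be seen in a few lines. The sketch below does not use WinCPToUTF8 (which, as far as I know, is declared in Lazarus' LazUTF8 unit rather than in the Windows API); instead it uses a hypothetical hand-written Latin1ToUtf8 helper so that it runs with plain FPC. It assumes ISO-8859-1 input, which agrees with CP-1252 only for bytes 160..255:

```pascal
program IndexShift;
{$mode objfpc}{$H+}

// Converts ISO-8859-1 bytes to UTF-8. CP-1252 matches ISO-8859-1 only for
// bytes 160..255, so this sketch is not a full CP-1252 converter.
function Latin1ToUtf8(const S: string): string;
var
  i, B: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    B := Ord(S[i]);
    if B < 128 then
      Result := Result + Chr(B)                 // ASCII stays one byte
    else
      Result := Result + Chr($C0 or (B shr 6))  // lead byte
                       + Chr($80 or (B and $3F)); // continuation byte
  end;
end;

var
  Ansi, Utf8: string;
begin
  Ansi := 'Afr'#226'nio';                    // one byte per character
  Utf8 := Latin1ToUtf8(Ansi);
  WriteLn(Length(Ansi), ' ', Length(Utf8));  // 7 8
  WriteLn(Ansi[5], ' ', Utf8[6]);            // n n: same letter, different index
end.
```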

« Last Edit: May 11, 2019, 06:24:25 pm by lucamar »


Zoran

Hero Member
Posts: 1831

Re: ASCII characters Questions

« Reply #11 on: May 11, 2019, 06:26:45 pm »

Quote from: lucamar on May 11, 2019, 06:07:12 pm

Rather think of it as all being UTF8. What seems to be ASCII is really an UTF8 string where all characters fall in the set [#32..#127].

No, I don't think so. I believe that when he says ASCII, he means ANSI.


jamie

Hero Member
Posts: 6131

Re: ASCII characters Questions

« Reply #12 on: May 11, 2019, 06:42:39 pm »

He has strings from a file that uses the extended ASCII letter sets; they are in the range 128..255.

In order to display a letter as it should look, he needs to generate a UTF8 string.

But this is the issue: as soon as he starts manipulating this data as UTF8 strings, that value will become a lost value and thus be displayed as a ? or some other unexpected letter.


JLWest

Hero Member
Posts: 1293

Re: ASCII characters Questions

« Reply #13 on: May 11, 2019, 06:44:03 pm »

Maybe I'm saying this wrong or something.

I would like to convert UTF8 strings to ASCII (American Standard Code for Information Interchange).

S : String = 'ÄÖÜß ñâ';
C : String = '';

So if I called: C := ConvertUTF8ToASCII(S : String) : string;

I would get: C = 'AOUB na'


Zoran

Hero Member
Posts: 1831

Re: ASCII characters Questions

« Reply #14 on: May 11, 2019, 06:54:02 pm »

Quote from: JLWest on May 11, 2019, 06:44:03 pm

So if I called: C := ConvertUTF8ToASCII(S : String) : string;

I would get: C = 'AOUB na'

No, you won't.
But Lucamar says he has some ASCIIfy function, he might share it with you:

Quote from: lucamar on May 11, 2019, 06:07:12 pm

And you're right: AFAICT there is no conversion function for what you want. I looked for it some time ago (to ASCIIfy filenames) and could find nothing so I built my own to translate the characters most common around here (Spain), forgetting about cyrillic, greek, etc. I keep adding chars to it whenever I encounter one I don't have
