Extended ASCII gone wrong somewhere
stephanos:
Dear All
I am using Free Pascal Lazarus 2.0.8. I am writing a simple command line programme so as to test my understanding of and ability to use extended ASCII characters. However, it has not gone smoothly. I am using this table as a reference and for the most part it is accurate:
https://theasciicode.com.ar/
Here is some code with comments about the output
--- Code: Pascal ---
program project1;

uses
  crt, SysUtils;

var
  IsItValid : string;
  isItValid2 : ansistring;
  count, size : integer;
  A : AnsiChar;

begin
  count := 12;
  A := 'A';
  isItValid := 'BAILE DAì AMIZADE.mp3';
  writeln(IsItValid);            // output of alt + 141 is corrupted but looks like chr(195) ├
  writeln(chr(141));             // output ì as expected
  writeln(isItValid[2]);         // output A as expected
  writeln(isItValid[9]);         // output corrupted but looked like chr(195) ├
  writeln(A);                    // output A as expected
  writeln(count);                // output 12 as expected
  writeln(IntToStr(count));      // output 12 as expected
  writeln(Ord(A));               // output 65 as expected
  writeln(Ord(isItValid[9]));    // output 195, not expected

  // so I made the string into an ansi string, though I am at the edge of my knowledge here
  isItValid2 := 'BAILE DAì AMIZADE.mp3';
  writeln(IsItValid2);           // output corrupted but looked like chr(195) ├
  readln;
end.
--- End code ---
Alt + 141 is the lowercase letter i with a grave accent (ì), except when it is placed in a string or an AnsiString. In either kind of string it becomes Alt + 195 (├).
My intention is to validate the extended ASCII characters in the file names of my mp3 files. My player cannot read many extended ASCII characters, and if one of them appears in a file name that is written to a playlist file, the file will not play. The validation will include writing the path/filename to a text file, so that the file name can be changed and then used in a playlist file.
But things are not what they should be. How can Alt + 141 become Alt + 195?
Any help, pitched at my low level of competence, much appreciated and needed.
Bart:
The Lazarus IDE stores everything in UTF8 encoding.
The type String in Lazarus is by default also UTF8.
So, the string contains more bytes than "characters", since the lowercase i with grave accent (ì) is encoded as 2 bytes in UTF8. The 195 you see is the first of those two bytes.
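A minimal sketch of what that looks like in practice, for anyone who wants to see the bytes. It assumes the source file is saved as UTF8 without a BOM (the way the code in the first post evidently was) and has no {$codepage} directive, so the literal's bytes are copied into the string unchanged. The program name is made up for illustration.

```pascal
program utf8bytes;

{$mode objfpc}{$H+}

var
  S: string;
  I: Integer;
begin
  // Assumption: this file is saved as UTF8, so ì is stored as two bytes
  S := 'DAì';
  WriteLn('Characters on screen: 3, bytes in the string: ', Length(S));
  // Print the numeric value of every byte in the string
  for I := 1 to Length(S) do
    WriteLn('byte ', I, ' = ', Ord(S[I]));
end.
```

Under those assumptions the last two values should be 195 and 172, the UTF8 encoding of ì. A console using an old DOS codepage draws byte 195 as ├, which matches the corrupted output described in the first post.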
Bart
stephanos:
Greetings Bart
Thanks for the reply. I do not fully understand it.
So what do I do about it?
Thanks and wait to hear
winni:
Hi!
Get used to UTF8. It is now nearly 30 years old.
A basic introduction at wikipedia:
https://en.wikipedia.org/wiki/UTF-8
UTF8 unites all the different codepages (what you call "extended ASCII") in one system. This cannot be done with one byte per character: a UTF8 character is between 1 and 4 bytes long. So a single character can no longer be held in the Pascal type "char"; it has to be held in a string.
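As a rough illustration of the 1 to 4 byte rule, here is a sketch that walks through a file name and reports every character that is not plain ASCII, which is close to the validation asked about in the first post. The function name Utf8SeqLen and the program name are hypothetical, and the sketch assumes the string really holds valid UTF8 (source saved as UTF8, no {$codepage} directive).

```pascal
program utf8walk;

{$mode objfpc}{$H+}

// Number of bytes in the UTF8 sequence that starts with lead byte B.
// Unexpected bytes are counted as 1 so that the loop below always advances.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1                      // 0xxxxxxx: plain ASCII
  else if (B and $E0) = $C0 then
    Result := 2                      // 110xxxxx: start of a 2-byte sequence
  else if (B and $F0) = $E0 then
    Result := 3                      // 1110xxxx: start of a 3-byte sequence
  else if (B and $F8) = $F0 then
    Result := 4                      // 11110xxx: start of a 4-byte sequence
  else
    Result := 1;                     // stray continuation or invalid byte
end;

var
  S: string;
  I, N: Integer;
begin
  // Assumption: this literal is stored as UTF8 bytes
  S := 'BAILE DAì AMIZADE.mp3';
  I := 1;
  while I <= Length(S) do
  begin
    N := Utf8SeqLen(Ord(S[I]));
    if N > 1 then
      WriteLn('Non-ASCII character at byte ', I, ', ', N, ' bytes long');
    Inc(I, N);  // jump to the first byte of the next character
  end;
end.
```

For a simple yes/no check it would be enough to test whether any byte is 128 or above, since every byte of a multi-byte UTF8 character has its top bit set.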
Unicode support in Lazarus:
https://wiki.freepascal.org/Unicode_Support_in_Lazarus
Winni
lucamar:
Alternatively:
* Set a {$codepage XXX} directive in your source so that all the literal strings are treated as single-byte (SBCS) strings with that codepage (see the charset unit of the RTL for possible values of XXX).
* Declare your strings as AnsiString(XXX) or RawByteString.
See section 3.2.4 - Single-Byte String Types (specifically Code page conversions) of the Free Pascal Reference Guide for more info.
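A sketch of the second option, combined with {$codepage utf8} so the compiler knows how the source file is encoded. Code page 850 is my assumption here, chosen because the Alt codes in the first post (141 for ì, 195 for ├) match it. The type name TCP850String is made up for the example. On Unix-like systems the cwstring unit is needed for the conversion to work; on Windows it is not.

```pascal
program cp850demo;

{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF8

uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix-like systems
  SysUtils;

type
  // String type whose bytes are kept in code page 850 (hypothetical name)
  TCP850String = type AnsiString(850);

var
  Name850: TCP850String;
begin
  // The UTF8 literal is converted to code page 850 when it is assigned
  Name850 := 'BAILE DAì AMIZADE.mp3';
  WriteLn(Length(Name850));   // one byte per character in a single-byte code page
  WriteLn(Ord(Name850[9]));   // ì has code 141 in code page 850
end.
```

With the conversion in place, the ninth byte should be the 141 that was originally expected, and the length should equal the number of visible characters. Characters that do not exist in the chosen code page cannot survive this conversion, so it only suits file names that stay within that code page.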