Extended ASCII use - 2
raymond:
In the UTF-8 code set, 62 (55%) of the characters are for 'European' characters!!!
(FPC for FreeDOS with code page 850 was nirvana.)
Does anybody KNOW how to manipulate strings/arrays of 'European' characters?
Proven examples, please. I would faint with gratitude. Many thanks.
JuhaManninen:
What do you mean? UTF-8 encoding supports the full Unicode.
If by 'European' characters you mean 7-bit ASCII, then it is easy, because UTF-8 is compatible with 7-bit ASCII.
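A minimal sketch of what that compatibility means in Free Pascal. The IsPureAscii helper is hypothetical (it is not part of LazUTF8): if every byte of a string is below $80, its UTF-8 bytes are exactly its 7-bit ASCII bytes, so the plain Length, Copy and Pos work character by character on it.

```pascal
program AsciiCheck;

{$mode objfpc}{$H+}

// Hypothetical helper, not a LazUTF8 routine: True when every byte of S
// is below $80, i.e. S is plain 7-bit ASCII and therefore also valid UTF-8
// with exactly one byte per character.
function IsPureAscii(const S: string): Boolean;
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    if Ord(S[I]) >= $80 then
      Exit(False);
  Result := True;
end;

begin
  WriteLn(IsPureAscii('Hello'));            // TRUE
  // 'é' (U+00E9) is the two bytes $C3 $A9 in UTF-8, written as escapes
  // so the result does not depend on the source file encoding.
  WriteLn(IsPureAscii('H'#$C3#$A9'llo'));   // FALSE
end.
```

Once IsPureAscii returns FALSE, byte positions and character positions no longer match, which is where the UTF8* routines come in.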
engkin:
Use unit LazUTF8.
If you are working on a terminal/console app, you need to add the LazUtils package (which contains LazUTF8) as a requirement. You do that in:
Project - Project Inspector
Add - Add New Requirement
Type LazU and choose LazUtils
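What you call a character is actually more like a string: in UTF-8 it can occupy several bytes.
Use UTF8Copy, UTF8Insert, UTF8Delete, UTF8Length, etc. instead of Copy, Insert, Delete and Length.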
engkin:
--- Code: Pascal ---
program Project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes, LazUTF8
  { you can add units after this };

var
  s: string;
  s1, s2: string;
begin
  s := 'ÄÇ';
  WriteLn(s);
  WriteLn(Length(s));      // ===> 4 (bytes)
  WriteLn(UTF8Length(s));  // ===> 2 (characters)
  s1 := UTF8Copy(s, 1, 1);
  WriteLn(s1);             // Ä
  s2 := UTF8Copy(s, 2, 1);
  WriteLn(s2);             // Ç
  UTF8Insert(s2, s, 1);    // s is ÇÄÇ
  WriteLn(s);
  UTF8Delete(s, 2, 1);     // s is ÇÇ
  WriteLn(s);
  ReadLn;
end.
--- End code ---
Based on your previous post: if you are using Linux, also add the cwstring unit to the uses clause.
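Since the question also asks about arrays of characters, here is a sketch that splits a UTF-8 string into an array with one string per code point, reading each sequence length from its lead byte. It is hand-written, not LazUTF8 code; SplitUTF8, SeqLen and TUTF8Array are hypothetical names. The byte escapes stand for 'Ä' ($C3 $84) and 'Ç' ($C3 $87), so the example does not depend on the source file encoding.

```pascal
program SplitCodepoints;

{$mode objfpc}{$H+}

type
  TUTF8Array = array of string;  // hypothetical: one element per code point

// Number of bytes in the UTF-8 sequence that starts with lead byte B.
// Invalid lead bytes are treated as 1 byte so the loop always advances.
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Splits S into one string per code point, so 'Ä' becomes one element.
function SplitUTF8(const S: string): TUTF8Array;
var
  I, N, Count: Integer;
begin
  Result := nil;
  SetLength(Result, Length(S));  // upper bound: one element per byte
  Count := 0;
  I := 1;
  while I <= Length(S) do
  begin
    N := SeqLen(Ord(S[I]));
    if I + N - 1 > Length(S) then
      N := Length(S) - I + 1;    // truncated sequence at the end of S
    Result[Count] := Copy(S, I, N);
    Inc(Count);
    Inc(I, N);
  end;
  SetLength(Result, Count);      // trim to the real number of code points
end;

var
  Parts: TUTF8Array;
  P: string;
begin
  Parts := SplitUTF8('A'#$C3#$84#$C3#$87'B');  // 'AÄÇB', 6 bytes
  WriteLn(Length(Parts));                      // 4
  for P in Parts do
    WriteLn(Length(P));                        // 1, 2, 2, 1 (bytes per element)
end.
```

This treats one code point as one letter, which holds for precomposed letters such as 'Ä' and 'Ç' but not for letters built from combining marks.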
engkin:
--- Quote from: raymond on January 06, 2022, 03:28:40 pm ---
In the UTF8 code set 62 (55%) of the characters are for 'European' characters. !!!
--- End quote ---
Not sure where you got that.
UTF-8 is an ASCII-compatible encoding of Unicode. Each Unicode code point takes 1 to 4 bytes when encoded in UTF-8.
A is one byte and is ASCII-compatible.
Ä is two bytes ($C3 $84) and is not ASCII-compatible.
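To make the 1-to-4-byte rule concrete, here is a hand-written sketch of the UTF-8 encoding step. CodePointToUTF8 and ShowBytes are illustrative names, not LazUTF8 routines; the sketch assumes a valid code point no larger than U+10FFFF and does not reject surrogates.

```pascal
program EncodeDemo;

{$mode objfpc}{$H+}

// Illustrative encoder: turns one code point into its UTF-8 bytes.
// Assumes CP <= $10FFFF and CP is not a surrogate ($D800..$DFFF).
function CodePointToUTF8(CP: Cardinal): string;
begin
  if CP < $80 then
    Result := Chr(Byte(CP))                               // 1 byte: plain ASCII
  else if CP < $800 then
    Result := Chr(Byte($C0 or (CP shr 6))) +
              Chr(Byte($80 or (CP and $3F)))              // 2 bytes
  else if CP < $10000 then
    Result := Chr(Byte($E0 or (CP shr 12))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)))              // 3 bytes
  else
    Result := Chr(Byte($F0 or (CP shr 18))) +
              Chr(Byte($80 or ((CP shr 12) and $3F))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)));             // 4 bytes
end;

// Prints the byte count and the bytes (in hex) of one encoded code point.
procedure ShowBytes(const Name: string; CP: Cardinal);
var
  S: string;
  I: Integer;
begin
  S := CodePointToUTF8(CP);
  Write(Name, ': ', Length(S), ' byte(s):');
  for I := 1 to Length(S) do
    Write(' $', HexStr(Ord(S[I]), 2));
  WriteLn;
end;

begin
  ShowBytes('U+0041', $41);      // U+0041: 1 byte(s): $41       (A)
  ShowBytes('U+00C4', $C4);      // U+00C4: 2 byte(s): $C3 $84   (Ä)
  ShowBytes('U+20AC', $20AC);    // U+20AC: 3 byte(s): $E2 $82 $AC (euro sign)
  ShowBytes('U+1F600', $1F600);  // U+1F600: 4 byte(s): $F0 $9F $98 $80
end.
```

Only code points below $80 come out as a single byte, which is exactly why 'A' is ASCII-compatible and 'Ä' is not.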
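A "character" (what the reader sees as one letter) can consist of more than one code point.
The same "character" can be represented by different code points: Ä is either U+00C4 alone, or A (U+0041) followed by the combining diaeresis U+0308.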
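A small sketch of the two spellings of 'Ä'. The byte escapes and the CountCodePoints helper are illustrative, not LazUTF8 code. The helper counts every byte that is not a continuation byte (10xxxxxx), which for valid UTF-8 should give the same number as UTF8Length.

```pascal
program TwoSpellings;

{$mode objfpc}{$H+}

const
  // Both render as 'Ä'; written as byte escapes to avoid source-encoding issues.
  Precomposed = #$C3#$84;     // U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  Decomposed  = 'A'#$CC#$88;  // U+0041 'A' + U+0308 COMBINING DIAERESIS

// Illustrative helper: counts code points by skipping continuation bytes.
// Assumes S is valid UTF-8.
function CountCodePoints(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

begin
  WriteLn(Length(Precomposed), ' bytes, ', CountCodePoints(Precomposed), ' code point(s)'); // 2 bytes, 1 code point(s)
  WriteLn(Length(Decomposed), ' bytes, ', CountCodePoints(Decomposed), ' code point(s)');   // 3 bytes, 2 code point(s)
  WriteLn(Precomposed = Decomposed);  // FALSE: a byte comparison sees two different strings
end.
```

Because a plain comparison works on bytes, text coming from different sources may need Unicode normalization before two spellings of the same letter compare equal.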