Extended ASCII use - 2

raymond:
In the UTF-8 code set, 62 (55%) of the characters are 'European' characters!
(FPC for FreeDOS with code page 850 was nirvana.)
Does anybody KNOW how to manipulate strings/arrays of 'European' characters?
Proven examples, please. I would faint with gratitude. Many thanks.

JuhaManninen:
What do you mean? UTF-8 can encode all of Unicode.
If by 'European' characters you mean 7-bit ASCII, then it is easy, because UTF-8 is backward compatible with 7-bit ASCII.

engkin:
Use unit LazUTF8.

If you are working on a terminal/console app, you need to add the LazUtils package, which contains LazUTF8, as a project requirement. You do that in:
  Project - Project Inspector
    Add - Add New Requirement
      Type LazU and choose LazUtils

What you call a character is actually more like a string: in UTF-8 a single character can occupy several bytes.

Use UTF8Copy, UTF8Insert, etc.

engkin:

--- Code: Pascal ---
program Project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes
  ,LazUTF8
  { you can add units after this };

var
  s: string;
  s1, s2: string;
begin
  s := 'ÄÇ';
  WriteLn(s);
  WriteLn(Length(s));      //===> 4
  WriteLn(UTF8Length(s));  //===> 2
  s1 := UTF8Copy(s, 1, 1);
  WriteLn(s1);             // Ä
  s2 := UTF8Copy(s, 2, 1);
  WriteLn(s2);             // Ç
  UTF8Insert(s2, s, 1);    // s is ÇÄÇ
  WriteLn(s);
  UTF8Delete(s, 2, 1);     // s is ÇÇ
  WriteLn(s);
  ReadLn;
end.
--- End code ---
Going by your previous post, also add the cwstring unit if you are using Linux.

engkin:

--- Quote from: raymond on January 06, 2022, 03:28:40 pm ---In the UTF-8 code set, 62 (55%) of the characters are 'European' characters!

--- End quote ---

I'm not sure where you got that.

UTF-8 is an ASCII-compatible encoding of Unicode. A Unicode codepoint takes between one and four bytes when encoded as UTF-8.

A is one byte, the same byte as in ASCII.
Ä is two bytes, neither of which is a valid ASCII character.
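
To make the byte counts concrete, here is a minimal sketch that needs only plain FPC, no LazUTF8. The program name and the choice of sample codepoints are mine, picked for illustration. The strings are written as byte escapes so the result does not depend on the encoding the source file is saved in.

```pascal
program Utf8ByteCounts;
{$mode objfpc}{$H+}

var
  s: string;
begin
  // Length() counts bytes, not codepoints.
  s := 'A';                  // U+0041
  WriteLn(Length(s));        // 1
  s := #$C3#$84;             // U+00C4 (Ä)
  WriteLn(Length(s));        // 2
  s := #$E2#$82#$AC;         // U+20AC (euro sign)
  WriteLn(Length(s));        // 3
  s := #$F0#$9F#$98#$80;     // U+1F600 (an emoji)
  WriteLn(Length(s));        // 4
end.
```

Only the first string is valid ASCII; in the other three every byte has its high bit set.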

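A "character" can consist of more than one codepoint, for example a base letter followed by a combining accent.

The same "character" can also be represented by different codepoint sequences: Ä is either the single codepoint U+00C4, or A (U+0041) followed by a combining diaeresis (U+0308).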
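
A rough sketch of what that means in practice, assuming LazUTF8 is available as described earlier in the thread. The variable names are made up for the example, and the strings are again given as byte escapes. Both strings display as Ä, yet they differ in byte length and codepoint count, and a plain comparison treats them as different.

```pascal
program SameCharTwoWays;
{$mode objfpc}{$H+}

uses
  LazUTF8;  // needs the LazUtils package as a project requirement

var
  Precomposed, Decomposed: string;
begin
  Precomposed := #$C3#$84;       // U+00C4, one codepoint
  Decomposed  := 'A'#$CC#$88;    // U+0041 + U+0308, two codepoints

  // Bytes first, then codepoints.
  WriteLn(Length(Precomposed), ' ', UTF8Length(Precomposed));  // 2 1
  WriteLn(Length(Decomposed), ' ', UTF8Length(Decomposed));    // 3 2

  // The comparison is byte-wise, so the two forms are not equal.
  WriteLn(Precomposed = Decomposed);                           // FALSE
end.
```

So UTF8Length and UTF8Copy work in codepoints, which is not always the same as what a user sees as one character. If the input can contain decomposed text, it would have to be normalized before comparing; that step is not shown here.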
