more UTF8 confusing

KodeZwerg:
Can you attach a demo project that shows your problem?

Bogen85:

--- Code: Pascal ---
program unicode;
{$mode objfpc}{$h+}{$codepage utf8}

uses
  types;

function utf8chars(const str_in: string; withCombiningDiacriticals: boolean = true): TStringDynArray;

  procedure primary(const len: integer; i: integer=1; n: integer = 0);

    procedure secondary(const n_bytes: integer);
      begin
        result[n] := copy(str_in, i, n_bytes);
        inc(i, n_bytes);
        inc(n);
      end;

    begin
      setlength(result, len);
      while i <= len do secondary(Utf8CodePointLen(@str_in[i], maxInt, withCombiningDiacriticals));
      setlength(result, n);
    end;

  begin
    result := default(TStringDynArray);
    primary(length(str_in));
  end;

const
  boo: string = 'ábcdéfghíǝ́Á̊ÅÁǺÁwow!';

var
  str: string;

begin
  writeln(boo);
  for str in utf8chars(boo) do writeln(str);
end.
--- End code ---
Utf8CodePointLen is also useful for this; the program above uses it without any Lazarus units.

utf8chars above will give the proper length for most strings (the length of the resulting array), provided the diacritical marks are combined correctly.
Loosely speaking, each element in the array is a Unicode code point. More precisely, each element is a string that is supposed to contain one Unicode character, which can be multi-byte and, with withCombiningDiacriticals enabled, can include its combining marks.
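
If only the length is needed, the same loop can count instead of copying. What follows is a minimal sketch, not part of the program above: utf8count is a hypothetical helper name, and the sample string is made up for illustration. It assumes that Utf8CodePointLen returns a value below 1 for an invalid or truncated sequence, and in that case it steps over a single byte so the loop always advances. It also passes the number of remaining bytes as the look-ahead limit instead of maxInt.

```pascal
program countdemo;
{$mode objfpc}{$h+}

// Hypothetical helper: counts the elements that utf8chars would return,
// without building the array.
function utf8count(const s: string; withCombiningDiacriticals: boolean = true): SizeInt;
var
  i, n, len: SizeInt;
begin
  result := 0;
  len := length(s);
  i := 1;
  while i <= len do
  begin
    // Look ahead no further than the bytes that remain in the string.
    n := Utf8CodePointLen(@s[i], len - i + 1, withCombiningDiacriticals);
    // Assumption: n < 1 signals an invalid or truncated sequence.
    // Treat that byte as one element so the loop cannot stall.
    if n < 1 then n := 1;
    inc(i, n);
    inc(result);
  end;
end;

const
  // 'cafe' followed by the UTF-8 bytes of U+0301 (combining acute accent),
  // written as byte escapes so the source file stays pure ASCII.
  sample: string = 'cafe'#$CC#$81;

begin
  writeln(utf8count(sample, true));   // combining marks attached to their base character
  writeln(utf8count(sample, false));  // every code point counted on its own
end.
```

Running it with both settings lets you compare how the withCombiningDiacriticals flag changes the count for a string that contains a combining mark.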

paweld:
@Bogen85: https://wiki.freepascal.org/Unicode_Support_in_Lazarus#CodePoint_functions_for_encoding_agnostic_code

Bogen85:

--- Quote from: paweld on February 04, 2023, 07:49:35 am ---@Bogen85: https://wiki.freepascal.org/Unicode_Support_in_Lazarus#CodePoint_functions_for_encoding_agnostic_code

--- End quote ---

It is confusing to me that Lazarus and Free Pascal both have units that provide similar functionality.

I know OP is expressly using Lazarus, but many Free Pascal programs not using Lazarus units need to do similar things with UTF8.

So duplicate functionality exists, but with different function names and parameters...

So I also find UTF8 in Free Pascal confusing, though most likely not for the same reasons as OP.
However, this is posted in Free Pascal General, and not in a Lazarus specific area...

lazer:
Wow, I had no idea what a vipers' nest I was walking into, just wanting a little twiddly bit on the bottom of the letter c!!!

Many thanks to Bogen85 for that full and explicit code sample. I would never have got to that. I'm not even sure I understand the syntax of that procedure-in-procedure-in-function thing. I never knew that was possible!

It is very unfortunate that this was not done in a coordinated way between fpc and Lazarus.

Anyway, it seems to be doing what I need now, so huge thanks for that code. It's insane that it's that complicated but at least I have a solution and have learnt a few new tricks with fpc.

 8-)
