
Author Topic: TMemo issue?  (Read 745 times)

Ed78z

  • New Member
  • *
  • Posts: 35
TMemo issue?
« on: October 10, 2024, 10:58:23 pm »
I just noticed that a TMemo doesn't count a 4-byte character as one character.
The TMemo counts 1-, 2- and 3-byte characters as one character each, but a 4-byte character counts as two :(

For example, with Memo1.Text := 'A👽Ω♥B'; // 5 characters in total
A: 1-byte character ====> counted as one character
Ω: 2-byte character ====> counted as one character
♥: 3-byte character ====> counted as one character
👽: 4-byte character ====> counted as two characters
B: 1-byte character ====> counted as one character

Both Memo1.CaretPos.X and Memo1.SelStart return 6 instead of 5.

Do you know how to fix this in a TMemo? (It makes inserting or deleting a character at the CaretPos location hit the wrong place in the text.)

Lazarus 3.6
Windows 11 x64
« Last Edit: October 10, 2024, 11:44:08 pm by Ed78z »

ASerge

  • Hero Member
  • *****
  • Posts: 2336
Re: TMemo issue?
« Reply #1 on: October 11, 2024, 09:24:50 pm »
Quote
I just noticed that a TMemo can't count a 4-bytes character as one character.
Ω: 2-bytes character===>count as one character
♥: 3-bytes character===>count as one character
👽: 4-bytes character==>count as two character

From Unicode:
Character  UTF-8        UTF-16     UTF-32
Ω          CE A9        03A9       000003A9
♥          E2 99 A5     2665       00002665
👽         F0 9F 91 BD  D83D DC7D  0001F47D
Windows uses UTF-16, so the result is normal.
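
To illustrate where the two UTF-16 values for 👽 come from, here is a minimal sketch of the standard surrogate-pair arithmetic. The program and the ToSurrogatePair helper are hypothetical names made up for this example; they are not part of FPC or the LCL.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: split a code point above U+FFFF into a
// UTF-16 surrogate pair (high unit first, low unit second).
procedure ToSurrogatePair(CodePoint: Cardinal; out HighUnit, LowUnit: Word);
var
  Offset: Cardinal;
begin
  Offset := CodePoint - $10000;                // leaves a 20-bit value
  HighUnit := Word($D800 + (Offset shr 10));   // upper 10 bits
  LowUnit := Word($DC00 + (Offset and $3FF));  // lower 10 bits
end;

var
  HighUnit, LowUnit: Word;
begin
  ToSurrogatePair($1F47D, HighUnit, LowUnit);
  // Prints: D83D DC7D
  WriteLn(IntToHex(HighUnit, 4), ' ', IntToHex(LowUnit, 4));
end.
```

Any code point above U+FFFF needs two 16-bit units in UTF-16, which is why a control that counts UTF-16 units reports 2 for it.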

Bart

  • Hero Member
  • *****
  • Posts: 5446
    • Bart en Mariska's Webstek
Re: TMemo issue?
« Reply #2 on: October 11, 2024, 10:58:49 pm »
Actually Windows lets you put the caret in between the 2 "widechars", so basically inside the 👽, which is a bit bonkers...

Bart

Ed78z

  • New Member
  • *
  • Posts: 35
Re: TMemo issue?
« Reply #3 on: October 14, 2024, 04:37:07 am »
Quote
Windows uses UTF-16, so the result is normal.

I guess this is not "normal" :)
If there are 5 characters in the TMemo and CaretPos.X returns 6, the insertion/deletion point ends up wrong. I think the TMemo should handle this internally and correctly.

rvk

  • Hero Member
  • *****
  • Posts: 6497
Re: TMemo issue?
« Reply #4 on: October 14, 2024, 10:09:54 am »
Quote
I guess, the TMemo should handle it internally and in a correct way.

Blame Windows.

The same happens in notepad.exe.

(Can anyone confirm the same on Linux or Mac?)

Zvoni

  • Hero Member
  • *****
  • Posts: 2717
Re: TMemo issue?
« Reply #5 on: October 14, 2024, 10:45:48 am »
{$codepage UTF8} as first directive?
The "Lines" of a TMemo are a "TTextString" class (a descendant of TStrings), and with the default {$H+} that means AnsiString.

Bart

  • Hero Member
  • *****
  • Posts: 5446
    • Bart en Mariska's Webstek
Re: TMemo issue?
« Reply #6 on: October 14, 2024, 12:12:59 pm »
Well, we can "fix" it, but that has a rather severe speed penalty.
There is actually a bug report about this (cannot find it right now) where that fix (for SelStart) is documented.

Bottom line: we're most likely not going to do that.

Bart
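
As a rough illustration of why such a fix costs time, here is a minimal sketch that turns a caret offset counted in UTF-16 code units into a count of code points. It is not the fix from the bug report, which is not shown in this thread. UnitsToCodePoints and IsLowSurrogate are hypothetical helpers written for this example, and the sketch assumes the offset is a number of UTF-16 code units from the start of the text.

```pascal
program CaretIndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: true for the second unit of a surrogate pair.
function IsLowSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $DC00) and (Ord(C) <= $DFFF);
end;

// Hypothetical helper: code points contained in the first UnitCount
// UTF-16 code units of S. It walks the text from the start, so the
// cost grows with the caret position.
function UnitsToCodePoints(const S: UnicodeString; UnitCount: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  if UnitCount > Length(S) then
    UnitCount := Length(S);
  for I := 1 to UnitCount do
    if not IsLowSurrogate(S[I]) then
      Inc(Result);
end;

var
  Text16: UnicodeString;
begin
  // 'A👽Ω♥B' written as explicit UTF-8 bytes, then decoded to UTF-16.
  Text16 := UTF8Decode('A'#$F0#$9F#$91#$BD#$CE#$A9#$E2#$99#$A5'B');
  WriteLn(UnitsToCodePoints(Text16, 6)); // 5: caret at the end of the text
  WriteLn(UnitsToCodePoints(Text16, 3)); // 2: caret right after the 👽
end.
```

Every query has to scan the text before the caret, which is one plausible source of the speed penalty on large memos.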

ASerge

  • Hero Member
  • *****
  • Posts: 2336
Re: TMemo issue?
« Reply #7 on: October 14, 2024, 08:00:17 pm »
Quote
I guess this is not "normal" :)
if there are 5 characters in TMemo, and CaretPos.X returns 6, then this causes wrong insertion/deletion point. I guess, the TMemo should handle it internally and in a correct way.

There are six UTF-16 code units or eleven UTF-8 code units in the string as FPC stores it. Where does 5 come from?
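
The three numbers in play (11, 6 and 5) can be reproduced with a small console program. This is a minimal sketch that uses only UTF8Decode from the System unit. The text is spelled out as explicit UTF-8 bytes, taken from the table earlier in the thread, so the result does not depend on the source file encoding or on {$codepage}.

```pascal
program CountUnits;
{$mode objfpc}{$H+}

var
  Utf8Text: RawByteString;
  Utf16Text: UnicodeString;
  I, CodePoints: Integer;
begin
  // 'A👽Ω♥B' as raw UTF-8 bytes.
  Utf8Text := 'A'#$F0#$9F#$91#$BD#$CE#$A9#$E2#$99#$A5'B';
  Utf16Text := UTF8Decode(Utf8Text);

  // Every UTF-16 unit except a low surrogate starts a new code point.
  CodePoints := 0;
  for I := 1 to Length(Utf16Text) do
    if (Ord(Utf16Text[I]) < $DC00) or (Ord(Utf16Text[I]) > $DFFF) then
      Inc(CodePoints);

  WriteLn('UTF-8 code units:  ', Length(Utf8Text));   // 11
  WriteLn('UTF-16 code units: ', Length(Utf16Text));  // 6
  WriteLn('Code points:       ', CodePoints);         // 5
end.
```

The 5 only appears when counting code points, which neither Length on a UTF-8 string nor a UTF-16 caret offset does.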

LV

  • Full Member
  • ***
  • Posts: 134
Re: TMemo issue?
« Reply #8 on: October 14, 2024, 09:52:07 pm »
I checked this (Windows 11, Lazarus 2.2.6). Maybe I'm missing something, but everything seems to work fine.

Code: Pascal
unit Unit1;

{$codepage UTF8}
{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls, SynEdit;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  p : TPoint;
begin
  Memo1.Lines.Add('A👽Ω♥B');
  p.X := Length('A');
  p.Y := 1;
  Memo1.CaretPos := p;
  Memo1.SelText := '#';

  Memo1.Lines.Add('A👽Ω♥B');
  p.X := Length('A👽');
  p.Y := 2;
  Memo1.CaretPos := p;
  Memo1.SelText := '#';
end;

end.

rvk

  • Hero Member
  • *****
  • Posts: 6497
Re: TMemo issue?
« Reply #9 on: October 14, 2024, 09:56:17 pm »
Quote
I was checked (Windows 11 Lazarus 2.2.6). Maybe I'm missing something, but everything seems to work fine.

Now do the following.
My CaretPos says it's column 15.
How do I add a # at that position?

Note, it's not the actual position 15. The CaretPos says it's 15, but it's actually something different because there are 4-byte characters before it.

LV

  • Full Member
  • ***
  • Posts: 134
Re: TMemo issue?
« Reply #10 on: October 14, 2024, 10:09:26 pm »
I counted: the # symbol is in the fifteenth position

rvk

  • Hero Member
  • *****
  • Posts: 6497
Re: TMemo issue?
« Reply #11 on: October 14, 2024, 10:12:20 pm »
Quote
I counted: the # symbol is in the fifteenth position

No, I meant something like this:
Code:
var
  P: TPoint;
begin
  Memo1.Clear;
  Memo1.Lines.Add('👽👽👽👽👽👽👽👽👽👽');
  Memo1.Lines.Add('👽👽👽👽👽👽👽👽👽👽');
  P.X := 15;
  P.Y := 1;
  Memo1.CaretPos := P;
  Memo1.SelText := '#';
end;

Results in this:
Quote
👽👽👽👽👽👽👽👽👽👽
👽👽👽👽👽👽👽�#�👽👽

Granted, querying Memo1.CaretPos.X will never return 15, because that position does not exist. But you can't guarantee that every position you set is valid. You should NEVER just assume the Col is correct; always count, or check whether the position is a valid one.

But it is still strange to see the Col in an editor jump from 14 to 16 when you move over such a character.
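
One way to check a position before using it is to test whether the column would split a surrogate pair. The following is a minimal sketch under stated assumptions: the column is a count of UTF-16 code units before the caret, and the line is available as a UnicodeString. SplitsSurrogatePair and SnapColumn are hypothetical helpers made up for this example; they are not LCL functions.

```pascal
program SnapCaretDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: true when a caret placed after Col UTF-16 units
// would sit between the two halves of a surrogate pair.
function SplitsSurrogatePair(const Line: UnicodeString; Col: Integer): Boolean;
begin
  Result := (Col > 0) and (Col < Length(Line))
    // high surrogate just before the caret
    and (Ord(Line[Col]) >= $D800) and (Ord(Line[Col]) <= $DBFF)
    // low surrogate just after the caret
    and (Ord(Line[Col + 1]) >= $DC00) and (Ord(Line[Col + 1]) <= $DFFF);
end;

// Hypothetical helper: clamp Col to the line and step back one unit
// if it lands inside a pair.
function SnapColumn(const Line: UnicodeString; Col: Integer): Integer;
begin
  Result := Col;
  if Result < 0 then
    Result := 0;
  if Result > Length(Line) then
    Result := Length(Line);
  if SplitsSurrogatePair(Line, Result) then
    Dec(Result);
end;

var
  AlienLine: UnicodeString;
  I: Integer;
begin
  AlienLine := '';
  for I := 1 to 10 do
    AlienLine := AlienLine + UTF8Decode(#$F0#$9F#$91#$BD); // ten 👽, 20 units
  WriteLn(SnapColumn(AlienLine, 15)); // 14: 15 would split the eighth 👽
  WriteLn(SnapColumn(AlienLine, 14)); // 14: already on a boundary
end.
```

Stepping back rather than forward is an arbitrary choice here; either direction avoids the broken characters shown in the result above.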
« Last Edit: October 14, 2024, 10:15:17 pm by rvk »

LV

  • Full Member
  • ***
  • Posts: 134
Re: TMemo issue?
« Reply #12 on: October 14, 2024, 10:21:11 pm »
You are right, I was wrong

 
