
Author Topic: TMemo selected text and Unicode Issue  (Read 5630 times)

sfeinst

  • Full Member
  • ***
  • Posts: 230
TMemo selected text and Unicode Issue
« on: May 31, 2016, 10:00:28 pm »
I'm looking for some help in the area of Unicode and a TMemo.  I just upgraded to Lazarus 1.6 to see if it would fix my problem and I still see it.

For all of my code related to processing text related to the TMemo, I am currently using the UTF8 functions (UTF8Copy, UTF8Pos, UTF8Length and UTF8RightStr).

Everything in my code seems to be working OK, except in the area of selected text in the TMemo.  What I have noticed (after walking through the Lazarus code) is that getting text from the TMemo is handled by converting UTF16 to UTF8.  All of my strings are declared as String and I use the aforementioned UTF8 functions.

But I noticed that the selected text and the selection start values from the TMemo are not correct if certain Unicode characters are used (I am testing with a thumbs-up character I took from another forum message; it looks like this: 👍).  When a character like that is present, the selection values are wrong.

In tracing through the code, it looks like selection start just gets the value from the underlying Windows memo control.  Since that control apparently uses UTF16 (otherwise why would the Lazarus code convert from UTF16 to UTF8), the start position would be based on UTF16 not UTF8.  The selected text on the other hand, uses that same selection start, but works on UTF8 text.  This appears to be why the value is incorrect.
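To make the mismatch described above concrete, here is a minimal console sketch (plain FPC, no LCL) that counts the same text in UTF-16 code units, UTF-8 bytes and code points. The helper CodePointCount is a hypothetical name written for this illustration; it is not part of LazUTF8. The thumbs-up character is built from its surrogate pair so the example does not depend on the source file encoding.

```pascal
program CountUnits;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in UTF-8 text by skipping
// continuation bytes (bit pattern 10xxxxxx).
function CodePointCount(const S: UTF8String): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  U16: UnicodeString;
  U8: UTF8String;
begin
  // 'ab', then U+1F44D (thumbs up) as its UTF-16 surrogate pair, then 'cd'.
  U16 := 'ab';
  U16 := U16 + WideChar($D83D) + WideChar($DC4D) + 'cd';
  U8 := UTF8Encode(U16);

  WriteLn('UTF-16 code units: ', Length(U16)); // 6
  WriteLn('UTF-8 bytes: ', Length(U8));        // 8
  WriteLn('Code points: ', CodePointCount(U8)); // 5
end.
```

Any character outside the BMP takes two UTF-16 code units but is a single code point, so an offset counted in one unit drifts by one per such character when it is applied in the other.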

I do not have to use selected text, I could use selection start and selection length and copy the text, but I would have the same issue as the underlying LCL code.

One thought I had was, instead of using the UTF8 functions, to define my Strings as UTF16 (would that be Utf16String, WideString or UnicodeString?) and change my calls to the UTF8 functions back to the normal functions Pos, Length, Copy and RightStr (this assumes they work correctly on the specified type).
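A narrower variant of this idea is to convert to UTF-16 only at the point where the selection is cut out, and keep UTF-8 everywhere else. The sketch below assumes, as inferred above and not confirmed here, that the selection start and length on Windows count UTF-16 code units and that the start is 0-based. SelTextFromUtf16Range is a hypothetical helper name, and line breaks are not given any special treatment.

```pascal
program SelFromUtf16;
{$mode objfpc}{$H+}

// Hypothetical helper. AText is UTF-8; ASelStart is 0-based.
// ASSUMPTION: both offsets count UTF-16 code units.
function SelTextFromUtf16Range(const AText: UTF8String;
  ASelStart, ASelLength: Integer): UTF8String;
var
  W, Part: UnicodeString;
begin
  W := UTF8Decode(AText);                    // UTF-8 -> UTF-16
  Part := Copy(W, ASelStart + 1, ASelLength); // Copy is 1-based
  Result := UTF8Encode(Part);                // back to UTF-8
end;

var
  U16: UnicodeString;
  Text8: UTF8String;
begin
  // 'ab' + U+1F44D (as a surrogate pair) + 'cdef', converted to UTF-8.
  U16 := 'ab';
  U16 := U16 + WideChar($D83D) + WideChar($DC4D) + 'cdef';
  Text8 := UTF8Encode(U16);

  // 'c' sits at UTF-16 offset 4: a, b, high surrogate, low surrogate.
  WriteLn(SelTextFromUtf16Range(Text8, 4, 2)); // prints cd
end.
```

In a form this would be called with Memo1.Text, Memo1.SelStart and Memo1.SelLength, but only on a widgetset where the assumption about code units actually holds.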

Would that be the way to go?

I'm also concerned about other OSes.  If Linux is UTF8 (I thought I read it was), then this approach would break on Linux.  So should I be using compiler directives and define my Strings (just those dealing with the TMemo) as UTF16 for Windows and UTF8 for non-Windows?

Am I way off-base here?

BTW, to test, just create a form, put a TMemo on it and a TButton.  For the button click use the following code:
ShowMessage(Memo1.SelText);

Enter some text and select something.

With regular English text, the dialog displays the selected text.  If you then insert the thumbs-up character in the middle, select something at the end (or anywhere after it) and click the button, you will most likely get text that is missing a character at the beginning.
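The symptom can be imitated without a form. The sketch below is not the LCL code; it only assumes that a UTF-16 offset ends up being used as a code point index, which is my reading of the trace above. CopyByCodePoints is a hypothetical stand-in for a code-point-based copy such as UTF8Copy.

```pascal
program ShiftedSelection;
{$mode objfpc}{$H+}

// Hypothetical stand-in for a code-point-based copy.
// StartCP is 1-based; both arguments count code points.
function CopyByCodePoints(const S: UTF8String;
  StartCP, CountCP: Integer): UTF8String;
var
  I, CP, FirstByte, StopByte: Integer;
begin
  CP := 0;
  FirstByte := Length(S) + 1;
  StopByte := Length(S) + 1;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then // a lead byte starts a new code point
    begin
      Inc(CP);
      if CP = StartCP then
        FirstByte := I;
      if CP = StartCP + CountCP then
      begin
        StopByte := I;
        Break;
      end;
    end;
  Result := Copy(S, FirstByte, StopByte - FirstByte);
end;

var
  U16: UnicodeString;
  Text8: UTF8String;
begin
  // 'ab' + U+1F44D (as a surrogate pair) + 'cdef', converted to UTF-8.
  U16 := 'ab';
  U16 := U16 + WideChar($D83D) + WideChar($DC4D) + 'cdef';
  Text8 := UTF8Encode(U16);

  // The user selects 'cd'. In UTF-16 code units that is start 4, length 2.
  // Reusing 4 as a 0-based code point offset skips one character too many.
  WriteLn(CopyByCodePoints(Text8, 4 + 1, 2)); // prints de, not cd
end.
```

Each character outside the BMP before the selection shifts the result by one more position, which matches the selection losing its first character.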

Thanks

tk

  • Sr. Member
  • ****
  • Posts: 361
Re: TMemo selected text and Unicode Issue
« Reply #1 on: May 31, 2016, 10:41:56 pm »
The given character is outside the BMP (Basic Multilingual Plane) and I can only display it in my browser.
Even Word here shows only the unknown-character sign.
Which font do you use?

Anyway I remember having these problems when porting KMemo http://tkweb.eu/en/delphicomp/kmemo.html to Lazarus. KMemo shows/selects international characters correctly.

tk

  • Sr. Member
  • ****
  • Posts: 361
Re: TMemo selected text and Unicode Issue
« Reply #2 on: May 31, 2016, 11:02:24 pm »
Which font do you use?

Found it myself now: Segoe UI Symbol, for example.
I could display the character in TKMemo and also select/copy it and paste it into an edit box in a browser window. I didn't test with a normal TMemo.

sfeinst

  • Full Member
  • ***
  • Posts: 230
Re: TMemo selected text and Unicode Issue
« Reply #3 on: May 31, 2016, 11:24:13 pm »
I'm using Arial, but I don't think it is a font issue.  I think it is the fact that the Windows memo control is UTF16 but Lazarus interaction is UTF8, except for the selected properties.  Just guessing by what I am seeing.

tk

  • Sr. Member
  • ****
  • Posts: 361
Re: TMemo selected text and Unicode Issue
« Reply #4 on: June 01, 2016, 12:50:02 am »
AFAIK the standard Arial font on Windows has no glyph for this character, but I am still using W7.
I have now tested with a standard TMemo, setting its font to Segoe UI Symbol, and I don't see any problem.

EDIT: There is indeed a problem, but I could not reproduce it with
Code: Pascal
ShowMessage(Memo1.SelText);
but only by appending the selected text to another TMemo (both use the font Segoe UI Symbol):
Code: Pascal
procedure TForm1.Button7Click(Sender: TObject);
var
  S: string;
begin
  // Copy the selection from Memo1 and append it to Memo2 for comparison
  S := Memo1.SelText;
  Memo2.Append(S);
end;
The selected string from the first TMemo is incorrect; this is a bug in the LCL.

FYI when I change Memo1 to my TKMemo the selection works correctly.
« Last Edit: June 01, 2016, 01:05:45 pm by tk »

fedkad

  • Full Member
  • ***
  • Posts: 176
Re: TMemo selected text and Unicode Issue
« Reply #5 on: September 03, 2016, 06:39:23 pm »
I have also tested this myself and you are right. This seems to be a bug.
Lazarus 2.2.6 / FPC 3.2.2 on x86_64-linux-gtk2 (Ubuntu/GNOME) and x86_64-win64-win32/win64 (Windows 11)

 
