
Author Topic: Select word under caret in Memo - doesnt show the whole word under caret  (Read 1617 times)

Raf20076

  • Full Member
  • ***
  • Posts: 173
    • https://github.com/Raf20076
Hi guys

I converted a Delphi function that selects the word under the caret in a Memo into Lazarus code.
It should work like this:
when you click a word in the Memo (so the caret is anywhere inside the word), that word should be selected.
It works, but with some errors.

It doesn't select the whole word if the word contains national (non-ASCII, UTF-8) letters. For example:
if I click the Polish word - powiedział - it picks up only powiedzia

Can you look at it?

The whole code:
Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls, Windows, LazUTF8;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Label1: TLabel;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

function SelectWordUnderCaret(AMemo:TMemo):string;
var
   Line    : Integer;
   Column  : Integer;
   LineText: string;
   InitPos : Integer;
   EndPos  : Integer;
begin
   //Get the caret position
   Line := SendMessage(AMemo.Handle, EM_LINEFROMCHAR,AMemo.SelStart, 0) ;
   Column := AMemo.SelStart - SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0) ;
   //Validate the line number
   if AMemo.Lines.Count-1 < Line then Exit;
   //Get the text of the line
   LineText := AMemo.Lines[Line];
   Inc(Column);
   InitPos := Column;
   //search the initial position using the space symbol as separator
   while (InitPos > 0) and (InitPos <= UTF8Length(LineText)) AND (LineText[InitPos] <> ' ') do Dec(InitPos);
   Inc(Column);
   EndPos := Column;
   //search the final position using the space symbol as separator
   while (EndPos <= UTF8Length(LineText)) and (LineText[EndPos] <> ' ') do Inc(EndPos);
   //Get the text
   Result := Trim(Copy(LineText, InitPos, EndPos - InitPos));
   //Finally select the text in the Memo
   AMemo.SelStart  := SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0)+InitPos;
   AMemo.SelLength := UTF8Length(Result);
end;


procedure TForm1.Button1Click(Sender: TObject);
begin
     Caption := SelectWordUnderCaret(Memo1) ; //the word will be shown in Form caption
end;


end.

Thanks


Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9754
  • Debugger - SynEdit - and more
    • wiki
Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #1 on: January 22, 2021, 04:08:38 pm »
You are using "LineText[InitPos]", which accesses bytes and not UTF-8 chars (actually: codepoints).

UTF8Length returns the length in UTF-8 codepoints, not bytes. UTF8Length may be less than the byte "Length()" of the string. You need to work with the byte length.

Since space (#32) is a single byte, accessing bytes is fine for that. The same goes for tabs (#9). But it is not enough for other Unicode white spaces, such as half- or zero-width space, non-breaking space and others. (Byte access still works, and is preferred, but needs a bit more work for such white spaces.)

Not sure if you plan to deal with punctuation and Unicode word-break codepoints? Or control codepoints in Unicode such as LTR, RTL, ...

Furthermore, if you detect spaces and copy the text between them (assuming you do that correctly), then you should not need "Trim".
« Last Edit: January 22, 2021, 04:14:38 pm by Martin_fr »
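
To make the byte/codepoint difference concrete, here is a minimal sketch that uses only the FPC RTL. CountCodepoints is a hypothetical helper written for this illustration (it stands in for UTF8Length and assumes valid UTF-8), and the word is spelled with explicit byte escapes so the result does not depend on the encoding of the source file.

```pascal
program ByteVsCodepoint;

{$mode objfpc}{$H+}

// Hypothetical helper (not part of LazUTF8): counts UTF-8 codepoints by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes S holds valid UTF-8.
function CountCodepoints(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  W: string;
begin
  // "powiedział": the final letter U+0142 is the two bytes $C5 $82 in UTF-8.
  W := 'powiedzia'#$C5#$82;
  WriteLn('bytes:      ', Length(W));          // 11
  WriteLn('codepoints: ', CountCodepoints(W)); // 10
end.
```

A loop that indexes bytes (LineText[EndPos]) but is bounded by the codepoint count can stop before the last byte of the line, which is consistent with the truncated "powiedzia" reported in the first post.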

Raf20076

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #2 on: January 22, 2021, 04:38:37 pm »
I removed the UTF8 functions from the code and it seems to work correctly now.

Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls, Windows, LazUTF8;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Label1: TLabel;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

function SelectWordUnderCaret(AMemo:TMemo):string;
var
   Line    : Integer;
   Column  : Integer;
   LineText: string;
   InitPos : Integer;
   EndPos  : Integer;
begin
   //Get the caret position
   Line := SendMessage(AMemo.Handle, EM_LINEFROMCHAR,AMemo.SelStart, 0) ;
   Column := AMemo.SelStart - SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0) ;
   //Validate the line number
   if AMemo.Lines.Count-1 < Line then Exit;
   //Get the text of the line
   LineText := AMemo.Lines[Line];
   Inc(Column);
   InitPos := Column;
   //search the initial position using the space symbol as separator
   while (InitPos > 0) and (InitPos <= Length(LineText)) AND (LineText[InitPos] <> ' ') do Dec(InitPos);
   Inc(Column);
   EndPos := Column;
   //search the final position using the space symbol as separator
   while (EndPos <= Length(LineText)) and (LineText[EndPos] <> ' ') do Inc(EndPos);
   //Get the text
   Result := Trim(Copy(LineText, InitPos, EndPos - InitPos));
   //Finally select the text in the Memo
   AMemo.SelStart  := SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0)+InitPos;
   AMemo.SelLength := Length(Result);
end;


procedure TForm1.Button1Click(Sender: TObject);
begin
     Caption := SelectWordUnderCaret(Memo1) ; //the word will be shown in Form caption
end;


end.

Is it possible to make this cross-platform? This code is for Windows only.
So should lines like:
Code: Pascal
Line := SendMessage(AMemo.Handle, EM_LINEFROMCHAR, AMemo.SelStart, 0);
Column := AMemo.SelStart - SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0);
and
Code: Pascal
AMemo.SelStart := SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0) + InitPos;
be changed?


Raf20076

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #3 on: January 22, 2021, 07:04:58 pm »
Sorry guys

It shows the words correctly but doesn't highlight them correctly.

Code: Pascal
AMemo.SelStart  := SendMessage(AMemo.Handle, EM_LINEINDEX, Line, 0)+InitPos;
AMemo.SelLength := Length(Result);
AMemo.SetFocus;

For example, take a sentence like this:

                 Pojechaliśmy wykąpać się nad rzekę.
I click    Pojechaliśmy    and    Pojechaliśmy    is highlighted
I click    wykąpać    and    wykąpać się    is highlighted
I click    się    and    nad    is highlighted

What is the reason for that? It picks the words correctly, so why doesn't it highlight them correctly?



skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #4 on: January 22, 2021, 07:28:42 pm »
How about this approach?
Code: Pascal
function isWhiteChar(c: WideChar): Boolean;
begin
  Result := (c=#32) or (c=#13) or (c=#10);
end;

function SelectWordUnderCaret(AMemo:TMemo):string;
var
  LineText: WideString;
  InitPos : Integer;
  EndPos  : Integer;
begin
  LineText := UTF8Decode(AMemo.Text);
  InitPos :=  AMemo.SelStart;
  EndPos := InitPos;
  while (InitPos > 0) and (InitPos <= Length(LineText)) AND (not isWhiteChar(LineText[InitPos])) do Dec(InitPos);
  while (EndPos <= Length(LineText)) and (not isWhiteChar(LineText[EndPos])) do Inc(EndPos);
  AMemo.SelStart  := InitPos;
  AMemo.SelLength := EndPos - InitPos - 1;
  Result := AMemo.SelText;
end;

"SelStart" and "SelLength" actually operate on character positions (roughly, WideString indices),
whereas the Text returned by the LCL is UTF-8.
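
The scan in the code above can also be written as a pure function that is testable without a form. This is only a sketch under stated assumptions: WordBoundsAt is a hypothetical name, the caret is taken to be a 0-based offset in UTF-16 units (as SelStart is described here), and only #32, #13 and #10 count as separators, as in isWhiteChar. It also clamps the caret so that index 0 of the 1-based string is never read, which the EndPos loop above does when the caret is at position 0.

```pascal
program WordBounds;

{$mode objfpc}{$H+}

function IsWhiteChar(C: WideChar): Boolean;
begin
  Result := (C = #32) or (C = #13) or (C = #10);
end;

// Hypothetical helper: returns the 0-based start and the length (both in
// UTF-16 units) of the word that touches the caret.
procedure WordBoundsAt(const Text: UnicodeString; Caret: Integer;
  out Start, Len: Integer);
var
  Stop: Integer;
begin
  // Clamp the caret into the valid range 0..Length(Text).
  if Caret < 0 then Caret := 0;
  if Caret > Length(Text) then Caret := Length(Text);
  // Walk left: Text[Start] is the character just before offset Start.
  Start := Caret;
  while (Start > 0) and not IsWhiteChar(Text[Start]) do
    Dec(Start);
  // Walk right: Text[Stop + 1] is the character at offset Stop.
  Stop := Caret;
  while (Stop < Length(Text)) and not IsWhiteChar(Text[Stop + 1]) do
    Inc(Stop);
  Len := Stop - Start;
end;

var
  S: UnicodeString;
  Start, Len: Integer;
begin
  // "Pojechaliśmy wykąpać się", spelled as UTF-8 bytes and then decoded.
  S := UTF8Decode('Pojechali'#$C5#$9B'my wyk'#$C4#$85'pa'#$C4#$87' si'#$C4#$99);
  WordBoundsAt(S, 15, Start, Len);   // caret inside the second word
  WriteLn(Start, ' ', Len);          // 13 7
end.
```

In the function above, the result would then be applied as AMemo.SelStart := Start and AMemo.SelLength := Len, with the text obtained through UTF8Decode(AMemo.Text) as before. That part is not exercised by this sketch.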

Martin_fr

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #5 on: January 22, 2021, 07:36:27 pm »
Not tested, but if I had to guess....

Note: in most cases "codepoint" can be read as "char". But Unicode (UTF-8, UTF-16 and any other encoding) has chars that consist of more than one codepoint. For those, the rules are even more complex.


- In the LCL the text is stored using UTF-8.
- Afaik Memo.SelStart is the byte position inside the UTF-8 encoded text (so NOT a codepoint position), but that may need verification.
  (But maybe skalogryz is right, and it is not.)

Any direct communication with the Windows API is based on either ANSI (local codepage) or UTF-16. The entire WinAPI exists twice, once for each of the 2 encodings.
Not sure, but let's assume the code uses the UTF-16 API.

Then WinAPI calls probably expect positions in 16-bit words (most codepoints are a single 2-byte word, codepoints outside the BMP take two words, and "chars" may still be multi-codepoint).

So if you mix LCL numbers such as Memo.SelStart with Windows calls, you may have to translate those numbers.
(If I am right, then you may also find that, if the caret is at the end of a line that has "special chars", the code thinks it is on the line below.)



This is all a bit complex....


Let's say that, using UTF-8, the text "AäA" is encoded in bytes as "## @@ @@ ##".
## is one byte representing A
@@ @@ is 2 bytes representing ä

But using UTF-16, A and ä are both one word: #### @@@@ ####

Now, if SelStart in Lazarus were a byte position, it would be, for each letter:
A => 1
ä => 2
A => 4  (4th byte)

But Windows uses UTF-16 and expects the number of words:
A => 1
ä => 2
A => 3  (3rd word)





EDIT:
OK, I should have tested first: skalogryz is right.

SelStart is the codepoint position (and it is 0-based).

So you are just lucky if you get the correct word as the result.

LazUTF8 has some functions to translate a codepoint index to a byte index (afaik).
« Last Edit: January 22, 2021, 07:41:14 pm by Martin_fr »
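
The "AäA" example can be checked with a few lines of plain FPC. This is a sketch under assumptions: CodepointToByteIndex is a hypothetical helper written here only to show the idea (the translation functions in LazUTF8 mentioned above would be the better choice in real code), it assumes valid UTF-8, and the text is spelled with byte escapes so it does not depend on the source file encoding.

```pascal
program IndexKinds;

{$mode objfpc}{$H+}

// Hypothetical helper: maps a 0-based codepoint index to the 1-based byte
// index of that codepoint's first byte in a valid UTF-8 string.
// Returns 0 if the string has fewer codepoints than requested.
function CodepointToByteIndex(const S: string; CpIndex: Integer): Integer;
var
  I, Seen: Integer;
begin
  Result := 0;
  Seen := -1;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then   // ASCII or lead byte, not 10xxxxxx
    begin
      Inc(Seen);
      if Seen = CpIndex then
        Exit(I);
    end;
end;

var
  U8: string;
  U16: UnicodeString;
begin
  U8 := 'A'#$C3#$A4'A';                   // "AäA": ä is $C3 $A4 in UTF-8
  U16 := UTF8Decode(U8);
  WriteLn(Length(U8));                    // 4 bytes
  WriteLn(Length(U16));                   // 3 UTF-16 words
  WriteLn(CodepointToByteIndex(U8, 2));   // 4: the second A starts at byte 4
end.
```

A position reported in codepoints (or UTF-16 words) therefore has to go through a translation like this before it is used to index the UTF-8 string byte by byte.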

Raf20076

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #6 on: January 23, 2021, 08:15:22 am »
Quote
how about this approach?
Code: Pascal
function isWhiteChar(c: WideChar): Boolean;
begin
  Result := (c=#32) or (c=#13) or (c=#10);
end;

function SelectWordUnderCaret(AMemo:TMemo):string;
var
  LineText: WideString;
  InitPos : Integer;
  EndPos  : Integer;
begin
  LineText := UTF8Decode(AMemo.Text);
  InitPos :=  AMemo.SelStart;
  EndPos := InitPos;
  while (InitPos > 0) and (InitPos <= Length(LineText)) AND (not isWhiteChar(LineText[InitPos])) do Dec(InitPos);
  while (EndPos <= Length(LineText)) and (not isWhiteChar(LineText[EndPos])) do Inc(EndPos);
  AMemo.SelStart  := InitPos;
  AMemo.SelLength := EndPos - InitPos - 1;
  Result := AMemo.SelText;
  AMemo.SetFocus; // Highlight selection correctly
end;

"SelStart" and "SelLength" actually operate referring characters. (Roughly - wideStrings).
Where Text returned by LCL is UTF8.

It seems to me that skalogryz's example works correctly. It shows the word correctly and highlights it correctly. Of course it needs more testing, but it looks promising.

What do you think about it?
« Last Edit: January 23, 2021, 08:19:49 am by Raf20076 »

skalogryz

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #7 on: January 23, 2021, 10:43:23 pm »
Quote
What do you think about it?

What I like about my approach is that it is cross-platform and doesn't use WinAPI-specific SendMessage calls.

The only thing that would concern me is the use of emojis in the text (which became the norm in the last decade).
But from my tests they appear to work fine as well.

SelStart/SelLength count an emoji as two characters (a surrogate pair). So I'd think the code should work fine.
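
Why an emoji counts as two characters can be shown with plain arithmetic. A minimal sketch, with assumptions labeled: ToSurrogates is a hypothetical helper, and U+1F600 is just an example codepoint picked for illustration. It applies the standard UTF-16 rule for codepoints above U+FFFF.

```pascal
program SurrogatePair;

{$mode objfpc}{$H+}

// Hypothetical helper: splits a codepoint in the range U+10000..U+10FFFF
// into the two 16-bit words (surrogates) that UTF-16 uses to encode it.
procedure ToSurrogates(Cp: Cardinal; out HighUnit, LowUnit: Word);
begin
  Cp := Cp - $10000;                    // 20 significant bits remain
  HighUnit := $D800 + (Cp shr 10);      // top 10 bits
  LowUnit  := $DC00 + (Cp and $3FF);    // bottom 10 bits
end;

var
  H, L: Word;
begin
  ToSurrogates($1F600, H, L);
  WriteLn(HexStr(H, 4), ' ', HexStr(L, 4));   // D83D DE00
end.
```

Neither word is #32, #13 or #10, so a scan with isWhiteChar simply steps over both halves, which fits the observation that emojis did not break the selection.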

Raf20076

Re: Select word under caret in Memo - doesnt show the whole word under caret
« Reply #8 on: January 24, 2021, 08:16:29 am »
Quote
what I like about my approach is that it is cross-platform

I also prefer a cross-platform approach over an app that is heavily based on the Windows API. There is a lot of Delphi code out there that relies on the Windows API, but when it comes to Object Pascal, more and more people are choosing Lazarus over the rather expensive Delphi.

Thanks

 
