Lazarus

Programming => Packages and Libraries => RichMemo => Topic started by: rick2691 on August 27, 2016, 07:32:03 pm

Title: Searching Unicode
Post by: rick2691 on August 27, 2016, 07:32:03 pm
I am using the following code to search Unicode texts. With my Hebrew font, the search matches on the consonants and ignores the vowels, which is a very nice feature (the same is not true of my Syriac font).

My problem is that the line PageMemo.SetSelLengthFor(SearchBox.Lines.Text); highlights the match using the length of the search string rather than the length of the text that was found.

Unfortunately, what is found includes vowels. The search string is 3 consonants, and what is found has 5 characters because of the vowels. Is there a way to get the length of the string that was found?

Code: Pascal
procedure TCmdForm.btnSearchRunClick(Sender: TObject);
var
  s : Integer;
  l : Integer;
  st: Integer;
  opt: TSearchOptions;

begin
  // remember the current selection so a repeated search can step past it
  st:= PageMemo.SelStart;
  l:= PageMemo.SelLength;
  opt:= [];
  if chkCaseSensitive.Checked then include(opt, RichMemo.soMatchCase); // setting the opt
  if chkWholeWord.Checked then include(opt, RichMemo.soWholeWord);
  if chkBackward.Checked then include(opt, RichMemo.soBackward);
  // must do RichMemo.soMatchCase instead of soMatchCase
  // due to soMatchCase conflict with TStringSearchOptions

  SearchPanel.Caption:= 'Searching...';
  SearchPanel.repaint;
  s:= PageMemo.Search(SearchBox.Lines.Text, PageMemo.SelStart, PageMemo.GetTextLen, opt);
  if (s>=0) then
     begin
     // the match is the text already selected: search again one character further on
     if (st=s) and (l=UTF8Length(SearchBox.Lines.Text))
         then s:= PageMemo.Search(SearchBox.Lines.Text, PageMemo.SelStart+1, PageMemo.GetTextLen, opt);
     end;

  if (s>=0) then
      begin
      SearchPanel.Caption:= 'Search Found';
      SearchPanel.repaint;
      PageMemo.SelStart:= s;
      // selection length is taken from the search string, not from the text found
      PageMemo.SetSelLengthFor(SearchBox.Lines.Text);
      end else begin
               SearchPanel.Caption:= 'Search Failed';
               SearchPanel.repaint;
               //PageMemo.SelStart:= MaxInt; // moves cursor to end of PageMemo
               PageMemo.SelLength:= 0;
               SearchBox.SetFocus;
               showmessage('None Found');
               end;

end;

Rick
Title: Re: Searching Unicode
Post by: skalogryz on August 29, 2016, 03:23:47 pm
Could you please post a sample of the text you're searching for and of the text you're searching within?

Also, what are the actual results, and what results do you expect?
Title: Re: Searching Unicode
Post by: rick2691 on August 29, 2016, 09:06:36 pm
I'll put it together.

Rick
Title: Re: Searching Unicode
Post by: rick2691 on August 30, 2016, 06:53:10 pm
The file is attached.

Rick
Title: Re: Searching Unicode
Post by: rick2691 on August 30, 2016, 07:15:43 pm
I forgot to mention: the found word will have 8 characters.
Title: Re: Searching Unicode
Post by: skalogryz on August 31, 2016, 03:14:48 pm
Aha, I see the issue.
RichEdit is a pretty smart control when it comes to finding the word (for example, MS Office Word would not find it, while WordPad does).

The SetSelLengthFor method is not as smart, since it goes by the number of UTF-8 characters in the string it is given.

If you're saying that finding the word by consonants alone is good, I'd recommend coming up with some sort of function that adjusts the selection length properly, taking the vowels into account.
-OR-
Alternatively, make the search stricter so that it finds no word at all unless the search string includes the vowels as well.

What's your preference?

I'd prefer the first approach, since it makes for more user-friendly software. However, I can see that MS Office Word acts strictly, requiring the vowels to be present.
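
As a rough illustration of the first approach, here is a minimal sketch of a helper that widens the selection to cover vowel points. It is not part of RichMemo: the names IsHebrewMark and MatchedLength are hypothetical, the mark ranges are taken from the Unicode Hebrew block only (Syriac would need its own ranges), and it works on a UnicodeString with a 1-based index rather than on the control itself. It also assumes the search has already reported a match at the given position; it only measures the match and does not re-check the consonants.

```pascal
program MatchLenDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: True for Hebrew points and cantillation marks.
// Ranges assumed from the Unicode Hebrew block (U+0591..U+05C7, minus punctuation).
function IsHebrewMark(c: WideChar): Boolean;
begin
  case Ord(c) of
    $0591..$05BD, $05BF, $05C1, $05C2, $05C4, $05C5, $05C7: Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: number of characters in Haystack, starting at the
// 1-based index StartIdx, that cover as many base letters as Needle has,
// including any marks attached to those letters.
function MatchedLength(const Haystack: UnicodeString; StartIdx: Integer;
  const Needle: UnicodeString): Integer;
var
  i, needBases, seenBases: Integer;
begin
  // count the base (non-mark) characters in the search string
  needBases := 0;
  for i := 1 to Length(Needle) do
    if not IsHebrewMark(Needle[i]) then Inc(needBases);

  seenBases := 0;
  Result := 0;
  i := StartIdx;
  while i <= Length(Haystack) do
  begin
    if not IsHebrewMark(Haystack[i]) then
    begin
      // a new base character after the last needed one ends the match
      if seenBases = needBases then Break;
      Inc(seenBases);
    end;
    Inc(Result);
    Inc(i);
  end;
end;

var
  Pointed, Other, Haystack, Needle: UnicodeString;
begin
  // shin + qamats + shin dot, lamed, vav + holam, final mem (7 code points)
  Pointed := #$05E9#$05B8#$05C1#$05DC#$05D5#$05B9#$05DD;
  Other := #$05E2#$05D5#$05DC#$05DD;
  Haystack := Pointed + ' ' + Other;
  // the same word written with consonants only (4 code points)
  Needle := #$05E9#$05DC#$05D5#$05DD;
  WriteLn(MatchedLength(Haystack, 1, Needle)); // prints 7
end.
```

In the handler from the first post, the result of such a helper would be assigned to PageMemo.SelLength in place of the SetSelLengthFor call, after converting the memo text to a UnicodeString and the 0-based search result to a 1-based index.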
Title: Re: Searching Unicode
Post by: skalogryz on August 31, 2016, 06:07:20 pm
The solution was introduced in r5115.
There is now an overloaded Search() method (see the wiki (http://wiki.freepascal.org/RichMemo#Search)) that returns the length of the string found.

In your particular example, the 5-character search string would find a string with 8 characters in it.
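
For reference, a minimal sketch of how the handler from the first post might use the overload. The parameter order (needle, start, length, options, then two var parameters for the found start and length, with a Boolean result) is assumed from the wiki page linked above and should be checked against your RichMemo revision. The names foundStart and foundLen are illustrative, and the sketch searches forward from the end of the current selection instead of reproducing the original "search again" logic, so it does not handle the backward option.

```pascal
// Fragment for the form in the first post (PageMemo, SearchBox, SearchPanel
// and the check boxes are that form's controls); not a standalone program.
procedure TCmdForm.btnSearchRunClick(Sender: TObject);
var
  foundStart, foundLen: Integer; // illustrative names, filled in by Search
  opt: TSearchOptions;
begin
  opt := [];
  if chkCaseSensitive.Checked then Include(opt, RichMemo.soMatchCase);
  if chkWholeWord.Checked then Include(opt, RichMemo.soWholeWord);

  // start just past the current selection so a repeated click finds the next match
  if PageMemo.Search(SearchBox.Lines.Text,
       PageMemo.SelStart + PageMemo.SelLength, PageMemo.GetTextLen,
       opt, foundStart, foundLen) then
  begin
    SearchPanel.Caption := 'Search Found';
    PageMemo.SelStart := foundStart;
    // length of the text actually found, vowels included
    PageMemo.SelLength := foundLen;
  end
  else
  begin
    SearchPanel.Caption := 'Search Failed';
    PageMemo.SelLength := 0;
    SearchBox.SetFocus;
    ShowMessage('None Found');
  end;
end;
```

With this form of the call, SetSelLengthFor is no longer needed, because the selection length comes from the match itself.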
Title: Re: Searching Unicode
Post by: rick2691 on August 31, 2016, 08:05:22 pm
Excellent. Thanks.

Rick