Author Topic: Searching Unicode  (Read 3718 times)

rick2691

  • Sr. Member
  • ****
  • Posts: 375
Searching Unicode
« on: August 27, 2016, 07:32:03 pm »
I am using the following code to search Unicode texts. With the Hebrew font, the search matches on consonants and ignores the vowels, although this is not true of my Syriac font. A very nice feature.

My problem is that the line PageMemo.SetSelLengthFor(SearchBox.Lines.Text); highlights the match using the length of the search string, not the length of what was found.

Unfortunately, what is found includes vowels. The search string is 3 consonants, and what is found has 5 characters because of the vowels. Is there a way to get the length of the string that was found?

Code: Pascal
procedure TCmdForm.btnSearchRunClick(Sender: TObject);
var
  s  : Integer;          // start of the match, or negative if not found
  l  : Integer;          // selection length before the search
  st : Integer;          // selection start before the search
  opt: TSearchOptions;
begin
  st := PageMemo.SelStart;
  l := PageMemo.SelLength;

  // Build the search options from the check boxes.
  // Must use RichMemo.soMatchCase instead of soMatchCase
  // due to the soMatchCase conflict with TStringSearchOptions.
  opt := [];
  if chkCaseSensitive.Checked then Include(opt, RichMemo.soMatchCase);
  if chkWholeWord.Checked then Include(opt, RichMemo.soWholeWord);
  if chkBackward.Checked then Include(opt, RichMemo.soBackward);

  SearchPanel.Caption := 'Searching...';
  SearchPanel.Repaint;
  s := PageMemo.Search(SearchBox.Lines.Text, PageMemo.SelStart, PageMemo.GetTextLen, opt);

  // If the match is the current selection, search again one character further on.
  if (s >= 0) then
  begin
    if (st = s) and (l = UTF8Length(SearchBox.Lines.Text)) then
      s := PageMemo.Search(SearchBox.Lines.Text, PageMemo.SelStart + 1, PageMemo.GetTextLen, opt);
  end;

  if (s >= 0) then
  begin
    SearchPanel.Caption := 'Search Found';
    SearchPanel.Repaint;
    PageMemo.SelStart := s;
    // Selects by the length of the search string, not of the text found.
    PageMemo.SetSelLengthFor(SearchBox.Lines.Text);
  end
  else
  begin
    SearchPanel.Caption := 'Search Failed';
    SearchPanel.Repaint;
    //PageMemo.SelStart := MaxInt; // moves cursor to end of PageMemo
    PageMemo.SelLength := 0;
    SearchBox.SetFocus;
    ShowMessage('None Found');
  end;
end;

Rick
Windows 10, LAZ 1.6.4, FPC 3.0.2, SVN 54278, i386-win32-win32/win64, forms use windows unit

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2284
    • havefunsoft.com
Re: Searching Unicode
« Reply #1 on: August 29, 2016, 03:23:47 pm »
Could you please provide a sample of the text you're searching for and the text you're searching within?

Also, what are the actual results, and what are the expected results?
Patron Cocoa Widgetset development https://www.patreon.com/skalogryz

rick2691

Re: Searching Unicode
« Reply #2 on: August 29, 2016, 09:06:36 pm »
I'll put it together.

Rick

rick2691

Re: Searching Unicode
« Reply #3 on: August 30, 2016, 06:53:10 pm »
The file is attached.

Rick

rick2691

Re: Searching Unicode
« Reply #4 on: August 30, 2016, 07:15:43 pm »
I forgot to mention: the found word will have 8 characters.

skalogryz

Re: Searching Unicode
« Reply #5 on: August 31, 2016, 03:14:48 pm »
Aha, I see the issue.
RichEdit is a pretty smart control when it comes to finding the word (for example, MS Office Word would not find it, while WordPad does).

The SetSelLengthFor method is not as smart, since it goes by the number of UTF-8 characters in the search string.

If you're saying that finding the word by consonants only is good, I'd recommend writing a function that adjusts the selection length properly, taking the vowels into account.
-OR-
Make the search stricter so that it does not find the word at all unless the search string includes the vowels as well.

What's your preference?

I'd prefer the first approach, since it makes for more user-friendly software. However, I can see that MS Office Word acts strictly, requiring the vowels to be present.
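
To illustrate the first approach, here is a minimal sketch of such a length-adjusting function. It is not what RichEdit or RichMemo does internally. It assumes the match starts at a known position, that "vowels" means the Hebrew combining marks (vowel points and accents) in the Unicode Hebrew block, and that all characters are in the Basic Multilingual Plane, so one UTF-16 code unit is one character. The names IsHebrewMark, CountBaseChars and MatchedLength are made up for this example.

```pascal
program SelLengthSketch;

{$mode objfpc}{$H+}

// True for Hebrew combining marks (accents and vowel points).
// The Hebrew punctuation at U+05BE, U+05C0, U+05C3 and U+05C6 is excluded.
function IsHebrewMark(C: WideChar): Boolean;
begin
  case Ord(C) of
    $0591..$05BD, $05BF, $05C1, $05C2, $05C4, $05C5, $05C7:
      Result := True;
  else
    Result := False;
  end;
end;

// Number of characters in S that are not Hebrew combining marks.
function CountBaseChars(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if not IsHebrewMark(S[i]) then
      Inc(Result);
end;

// Length, in characters, of the span that starts at StartPos (1-based)
// and covers BaseCount base characters plus the marks attached to them.
function MatchedLength(const Txt: UnicodeString; StartPos, BaseCount: Integer): Integer;
var
  i, seen: Integer;
begin
  i := StartPos;
  seen := 0;
  while i <= Length(Txt) do
  begin
    if not IsHebrewMark(Txt[i]) then
    begin
      if seen = BaseCount then
        Break;              // reached the next base character: span is complete
      Inc(seen);
    end;
    Inc(i);
  end;
  Result := i - StartPos;
end;

var
  Needle, Pointed, Txt: UnicodeString;
begin
  // Search string: mem, lamed, final kaf (3 consonants, no vowels).
  Needle := #$05DE#$05DC#$05DA;
  // The same word with vowel points: 6 characters in total.
  Pointed := #$05DE#$05B6#$05DC#$05B6#$05DA#$05B0;
  Txt := UnicodeString('ab') + Pointed + UnicodeString('c');

  // The pointed word starts at position 3 of Txt.
  WriteLn(MatchedLength(Txt, 3, CountBaseChars(Needle)));  // prints 6
end.
```

In the original handler, the result of something like MatchedLength would be assigned to PageMemo.SelLength instead of calling SetSelLengthFor. Note that SelStart is 0-based while Pascal string indexes are 1-based, so the start position needs adjusting by one.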

skalogryz

Re: Searching Unicode
« Reply #6 on: August 31, 2016, 06:07:20 pm »
The solution was introduced in r5115.
There is now an overloaded Search() method (see the wiki) that returns the length of the string found.

In your particular example, searching for the 5-character string would find a match that is 8 characters long.
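
A sketch of how the handler's selection step might use the overload. This is an assumption, not code from r5115: it presumes the overload takes two extra var parameters for the start and length of the found text and returns a Boolean, so please check the declaration in your RichMemo revision or on the wiki. SelectNextMatch is a hypothetical helper method that would also need declaring in TCmdForm. PageMemo and SearchBox are the controls from the first post.

```pascal
// Assumed overload (verify against your RichMemo source):
//   function Search(const ANiddle: string; Start, Len: Integer;
//     const SearchOpt: TSearchOptions;
//     var ATextStart, ATextLength: Integer): Boolean;
procedure TCmdForm.SelectNextMatch(const opt: TSearchOptions);
var
  foundStart, foundLen: Integer;
begin
  foundStart := -1;
  foundLen := 0;
  if PageMemo.Search(SearchBox.Lines.Text, PageMemo.SelStart,
                     PageMemo.GetTextLen, opt, foundStart, foundLen) then
  begin
    // Select exactly what was found, vowels included.
    PageMemo.SelStart := foundStart;
    PageMemo.SelLength := foundLen;
  end
  else
    PageMemo.SelLength := 0;   // nothing found: clear the selection
end;
```

With this, the "already selected" check in the original handler would compare the old selection length against foundLen rather than UTF8Length of the search string.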
« Last Edit: August 31, 2016, 08:15:49 pm by skalogryz »

rick2691

Re: Searching Unicode
« Reply #7 on: August 31, 2016, 08:05:22 pm »
Excellent. Thanks.

Rick