
Topic: Problem with UTF8  (Read 1128 times)

Raf20076

  • Full Member
  • Posts: 140
    • https://github.com/Raf20076
Problem with UTF8
« on: June 14, 2021, 01:20:05 pm »
Hi guys, I have written Lazspell, an example of a spelling checker, and I have a problem with UTF-8.
https://github.com/Raf20076/Lazspell-2-version-2.

In the Functions unit (https://github.com/Raf20076/Lazspell-2-version-2/blob/main/functions.pas), the function FindInMemo seems to highlight one character too few.

When I run Lazspell-2-version-2, choose the pl_PL dictionary, type test tes in the Memo and click the Check button, the Errors window shows tes as an error. When I click tes in the Errors window, the Memo underlines only the first three characters (tes) of the word test instead of the next word, tes:
test tes
So it seems to me that something is wrong with the position.

This copies one character too few. For example, if the Memo contains only the word test, the length is 4; it copies starting from character 1 (OK) but only 4 - 1 = 3 characters, so instead of TEST I get just TES. Polish letters with diacritics (ą, ę, ł, ...) take 2 bytes each in UTF-8, so all strings in FindInMemo must be handled as UTF-8, but this line:
Code: Pascal
  Result := UTF8Pos(AString, AMemo.Text, StartPos);
doesn't work.

Code: Pascal
  // Needs StrUtils (PosEx), LazUTF8 (UTF8Length), Graphics (clRed, fsBold) and RichMemo in the uses clause
  function FindInMemo(AMemo: TRichMemo; AString: String; StartPos: Integer): Integer;
  {Find the error (word) clicked in ListBoxErrors and highlight it}
  //This function may need to be rewritten and tested more, including proper character calculation in UTF-8
  begin
    //Result := UTF8Pos(AString, AMemo.Text, StartPos); //Seems not to work
    Result := PosEx(AString, AMemo.Text, StartPos); //1-based BYTE position (0 if not found)
    if Result > 0 then
    begin
      //Convert the bytes before the match into a character count for SelStart
      AMemo.SelStart := UTF8Length(PChar(AMemo.Text), Result - 1);
      AMemo.SelLength := UTF8Length(AString);
      AMemo.SetRangeParams(AMemo.SelStart, AMemo.SelLength, [tmm_Styles, tmm_Color], '', 0, clRed, [fsBold, fsUnderLine], []);
      //In RichMemo HideSelection must be set to True
    end;
  end;
Any ideas?
Thanks
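A minimal sketch of the byte/character mismatch FindInMemo has to handle, written in plain Free Pascal so it runs without Lazarus units. PosEx returns a 1-based byte index, while SelStart expects a 0-based character index. The hypothetical helper ByteOffsetToCharOffset counts the bytes that are not UTF-8 continuation bytes (10xxxxxx), which as far as I can tell is the same code-point count that UTF8Length(PChar(AMemo.Text), Result - 1) from LazUTF8 produces in the function above. LazUTF8's UTF8Pos, by contrast, already returns a character position (and takes StartPos in characters), so its result should not be passed through UTF8Length again.

```pascal
program BytesVsChars;

{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx

// Hypothetical helper: number of UTF-8 code points in the first ByteCount
// bytes of S. Bytes of the form 10xxxxxx continue a code point; every
// other byte starts a new one.
function ByteOffsetToCharOffset(const S: String; ByteCount: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to ByteCount do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  MemoText: String;
  BytePos, CharStart: Integer;
begin
  // 'żółw tes' as explicit UTF-8 bytes (ż = C5 BC, ó = C3 B3, ł = C5 82),
  // so the example does not depend on the encoding of the source file.
  MemoText := #$C5#$BC#$C3#$B3#$C5#$82'w tes';
  BytePos := PosEx('tes', MemoText, 1);                       // 9: 1-based byte index
  CharStart := ByteOffsetToCharOffset(MemoText, BytePos - 1); // 5: 0-based character index
  WriteLn('PosEx byte position: ', BytePos);
  WriteLn('SelStart (characters): ', CharStart);
end.
```

The three Polish letters occupy six bytes, which is why the byte position (9) and the character offset (5) differ by more than one.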

Bart

  • Hero Member
  • Posts: 4345
    • Bart en Mariska's Webstek
Re: Problem with UTF8
« Reply #1 on: June 14, 2021, 02:45:58 pm »
You search for the first occurrence of 'Tes' and then mark it, so it is logical that it marks the 'Tes' at the start of the word 'Test'.

Bart

engkin

  • Hero Member
  • Posts: 2857
Re: Problem with UTF8
« Reply #2 on: June 14, 2021, 02:50:10 pm »
When you find a word, make sure there is no letter immediately before or after it.
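A sketch of that check, assuming the memo text is first converted to UnicodeString (UTF-16, for example with UTF8Decode) so that each Polish letter is a single element. FindWholeWord is a hypothetical name; IsLetter comes from the Character unit. A match is skipped when the character directly before or after it is a letter, so searching test tes for tes returns 6 instead of the 1 that a plain Pos finds. For a 0-based SelStart you would use Result - 1; the result is a UTF-16 index, which equals the character position as long as the text has no characters outside the Basic Multilingual Plane (such as emoji).

```pascal
program WholeWordSearch;

{$mode objfpc}{$H+}

uses
  Character; // IsLetter

// Hypothetical helper: 1-based position of the first occurrence of AWord in
// S at or after StartPos that is not directly preceded or followed by a
// letter; 0 if there is none.
function FindWholeWord(const AWord, S: UnicodeString; StartPos: Integer): Integer;
var
  I, AfterPos: Integer;
begin
  Result := 0;
  if AWord = '' then
    Exit;
  if StartPos < 1 then
    StartPos := 1;
  for I := StartPos to Length(S) - Length(AWord) + 1 do
  begin
    if Copy(S, I, Length(AWord)) <> AWord then
      Continue;
    AfterPos := I + Length(AWord);
    // Reject matches that are part of a longer word, e.g. 'tes' in 'test'.
    if (I > 1) and IsLetter(S[I - 1]) then
      Continue;
    if (AfterPos <= Length(S)) and IsLetter(S[AfterPos]) then
      Continue;
    Exit(I);
  end;
end;

var
  Line: UnicodeString;
begin
  Line := UTF8Decode('test tes');
  WriteLn('Pos: ', Pos(UnicodeString('tes'), Line));          // 1, inside 'test'
  WriteLn('FindWholeWord: ', FindWholeWord('tes', Line, 1)); // 6, the word 'tes'
end.
```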



Looking at the code, it will check and add the same word several times, which you probably don't want. You can get rid of duplicates using a TStringList with Sorted := True and Duplicates := dupIgnore, but then the order no longer matches the text. You can remedy that by saving each word's position and doing a custom sort based on it at the end. Finally, you can add the words to the ListBox in the correct order and without duplicates.
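A sketch of this first approach with a plain TStringList (Classes unit). UniqueInTextOrder and ComparePositions are hypothetical names, and a word's index in the input array stands in for its position in the memo. The position is stored in Objects[] only when Add actually grew the list, so a later duplicate cannot overwrite the first position. As far as I know, CustomSort does nothing while Sorted is True, hence Sorted := False before sorting by position.

```pascal
program UniqueErrorsInTextOrder;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical compare function for TStringList.CustomSort: orders the
// items by the position stored in Objects[].
function ComparePositions(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := PtrInt(List.Objects[Index1]) - PtrInt(List.Objects[Index2]);
end;

// Hypothetical helper: returns the distinct words in the order of their
// first occurrence. The index I stands in for the position in the memo.
function UniqueInTextOrder(const Words: array of String): TStringList;
var
  I, Idx, OldCount: Integer;
begin
  Result := TStringList.Create;
  Result.Sorted := True;
  Result.Duplicates := dupIgnore;   // a repeated word is not added again
  for I := Low(Words) to High(Words) do
  begin
    OldCount := Result.Count;
    Idx := Result.Add(Words[I]);
    if Result.Count > OldCount then // first occurrence only:
      Result.Objects[Idx] := TObject(PtrInt(I)); // remember its position
  end;
  Result.Sorted := False;           // CustomSort needs an unsorted list
  Result.CustomSort(@ComparePositions);
end;

var
  Errors: TStringList;
begin
  Errors := UniqueInTextOrder(['zzz', 'tes', 'zzz', 'abc', 'tes']);
  try
    WriteLn(Errors.CommaText); // zzz,tes,abc
  finally
    Errors.Free;
  end;
end.
```

The resulting list can then be copied into ListBoxErrors.Items in one go.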

Or

Whenever you add a word to the StringList, also add it to the ListBox if it was not already in the StringList.
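A sketch of this second approach: a sorted TStringList remembers which words have already been reported, and a word goes to the list box only the first time it is seen, so the list box keeps the text order without any extra sorting. AddErrorOnce is a hypothetical name; in the program, Items would be ListBoxErrors.Items, and here a second TStringList stands in for it so the example runs without the LCL. Note that TStringList compares case-insensitively by default (CaseSensitive = False), so Test and test count as the same word.

```pascal
program UniqueErrorsWhileAdding;

{$mode objfpc}{$H+}

uses
  Classes;

const
  // Misspelled words in the order they appear in the text (example data).
  Found: array[0..4] of String = ('zzz', 'tes', 'zzz', 'abc', 'tes');

// Hypothetical helper: Seen holds the words already reported, Items would
// be ListBoxErrors.Items in the real program.
procedure AddErrorOnce(Seen: TStringList; Items: TStrings; const AWord: String);
begin
  if Seen.IndexOf(AWord) >= 0 then
    Exit;            // already listed, skip the duplicate
  Seen.Add(AWord);
  Items.Add(AWord);  // Items keeps the order of first appearance
end;

var
  Seen, Items: TStringList;
  I: Integer;
begin
  Seen := TStringList.Create;
  Items := TStringList.Create; // stands in for ListBoxErrors.Items
  try
    Seen.Sorted := True;       // IndexOf then uses a binary search
    for I := Low(Found) to High(Found) do
      AddErrorOnce(Seen, Items, Found[I]);
    WriteLn(Items.CommaText);  // zzz,tes,abc
  finally
    Items.Free;
    Seen.Free;
  end;
end.
```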



I see you made your own set of "non characters" in StripOffNonCharacter. It might not work with Unicode characters such as emoji. There is a better function, IsLetter, in the Character unit.
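A sketch of word splitting based on IsLetter from the Character unit instead of a hand-made list of separators. SplitIntoWords and TWordArray are hypothetical names, and the text is assumed to be converted to UnicodeString first (for example with UTF8Decode). Any element that Unicode does not classify as a letter (spaces, punctuation, digits, the surrogate halves of an emoji) ends the current word. This simple version checks one UTF-16 element at a time, so letters outside the Basic Multilingual Plane would also be treated as separators.

```pascal
program SplitWithIsLetter;

{$mode objfpc}{$H+}

uses
  Character; // IsLetter

type
  TWordArray = array of UnicodeString;

// Hypothetical replacement for a hand-made set of "non characters":
// every run of letters becomes one word.
function SplitIntoWords(const S: UnicodeString): TWordArray;
var
  I, WordStart, N: Integer;
begin
  Result := nil;
  N := 0;
  WordStart := 0; // 0 = not inside a word
  // I = Length(S) + 1 acts as a sentinel that closes a trailing word.
  for I := 1 to Length(S) + 1 do
    if (I <= Length(S)) and IsLetter(S[I]) then
    begin
      if WordStart = 0 then
        WordStart := I;
    end
    else if WordStart > 0 then
    begin
      SetLength(Result, N + 1);
      Result[N] := Copy(S, WordStart, I - WordStart);
      Inc(N);
      WordStart := 0;
    end;
end;

var
  S: UnicodeString;
  Parts: TWordArray;
  I: Integer;
begin
  // 'Gęś, test tes ' followed by an emoji (U+1F600 = surrogates D83D DE00),
  // built from code units so the example does not depend on the source encoding.
  S := UnicodeString('G') + UnicodeChar($0119) + UnicodeChar($015B) +
       ', test tes ' + UnicodeChar($D83D) + UnicodeChar($DE00);
  Parts := SplitIntoWords(S);
  WriteLn(Length(Parts), ' words'); // 3 words: Gęś, test, tes
  for I := 0 to High(Parts) do
    WriteLn(UTF8Encode(Parts[I]));
end.
```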
« Last Edit: June 14, 2021, 03:21:27 pm by engkin »

 
