Lazarus

Programming => Packages and Libraries => RichMemo => Topic started by: rick2691 on January 26, 2017, 04:18:39 pm

Title: Detecting a Unicode language family
Post by: rick2691 on January 26, 2017, 04:18:39 pm
I have been detecting which language is active by whether a default font is activated when you click at a character or use one of the positional keys (home, end, arrow etc).

But now I am using the same font for both English and Greek. So I am having to check the character index by the method below...

if (ord(QryChr)=206) or
   (ord(QryChr)=207) or    // ord(char) is the decimal index of the character...
   (ord(QryChr)=225)       // accented; 206, 207 or 225 for Noto Sans font
   then SetGreek           // 214 or 215 for Noto Sans Hebrew
   else SetEnglish;        // 220 or 221 for Noto Sans Syriac

As the side comments note, Greek reports any of three values: 206, 207 and 225. Hebrew reports 214 or 215, and Syriac reports 220 or 221. I realize that these are partial indexes (only the first element of each pair of elements), but for now they are enough to get the job done because they differ from each other.

Nevertheless, I would like to retrieve the full Unicode identity of the character, i.e. its Unicode index value (such as decimal 1488 or 1490). If I can get this data, I won't be limited to my default fonts or be subject to quirky values, such as Greek reporting three different ordinal values instead of only two (and there might be another that I haven't encountered).

I have tried to find information about this on the Internet, and I have not been able to find anything that is useful. Likewise, I don't know how to get the additional character elements.

Any help will be appreciated.

Rick
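
A note for readers: the values 206, 207 and 225 are consistent with the first bytes of UTF-8 sequences. $CE and $CF begin two-byte sequences in the Greek and Coptic block, $E1 begins three-byte sequences that include Greek Extended, and likewise $D6/$D7 (214/215) and $DC/$DD (220/221) begin Hebrew and Syriac sequences. That suggests QryChr holds only the first byte of a UTF-8 string. The sketch below is not code from this thread: FirstCodePoint is a hypothetical helper, and it assumes the input is UTF-8 text. It shows how the lead byte and its continuation bytes combine into the full code point.

```pascal
program Utf8LeadByteDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: decodes the first UTF-8 sequence in S into a code point.
// Returns -1 for an empty string or a malformed/truncated sequence.
// It does not reject overlong encodings; it is only meant as an illustration.
function FirstCodePoint(const S: AnsiString): LongInt;
var
  B: Byte;
  Len, I: Integer;
begin
  Result := -1;
  if S = '' then Exit;
  B := Ord(S[1]);
  if B < $80 then
  begin
    Result := B;                 // single byte: plain ASCII
    Exit;
  end;
  if (B and $E0) = $C0 then
  begin
    Len := 2; Result := B and $1F;
  end
  else if (B and $F0) = $E0 then
  begin
    Len := 3; Result := B and $0F;
  end
  else if (B and $F8) = $F0 then
  begin
    Len := 4; Result := B and $07;
  end
  else
    Exit;                        // stray continuation byte or invalid lead byte
  if Length(S) < Len then
  begin
    Result := -1;                // sequence is cut short
    Exit;
  end;
  for I := 2 to Len do
  begin
    B := Ord(S[I]);
    if (B and $C0) <> $80 then
    begin
      Result := -1;              // not a continuation byte
      Exit;
    end;
    Result := (Result shl 6) or (B and $3F);
  end;
end;

begin
  WriteLn(FirstCodePoint(#$CE#$B1));      // U+03B1 Greek small alpha -> 945
  WriteLn(FirstCodePoint(#$D7#$90));      // U+05D0 Hebrew alef       -> 1488
  WriteLn(FirstCodePoint(#$E1#$BC#$80));  // U+1F00 Greek Extended    -> 7936
end.
```

Note that Greek and Coptic characters from U+0370 to U+037F start with $CD (205), so a test on 206/207/225 alone would miss them.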

Title: Re: Detecting a Unicode language family
Post by: skalogryz on January 26, 2017, 04:22:14 pm
How do you get QryChr?
Title: Re: Detecting a Unicode language family
Post by: rick2691 on January 26, 2017, 06:24:57 pm
Sorry. I didn't think it would matter.

Code: Pascal
procedure TCmdForm.ClickPageMemo(Sender: TObject);
var CharCnt: longint;
    StartPos,StartLength,QryPos: longint;
    QryChr: char;
    QryStr: string;
    ShiftOn: boolean;
begin     (* UnicodeToUtf8 Utf8ToUnicode UTF8Encode UTF8Decode AnsiToUtf8 Utf8ToAnsi *)
  ReportPosition;
  if (CurrentLin=1) and (CurrentPos=1)
     then PageMemo.GetTextAttributes(PageMemo.SelStart, SelFontFormat)   // get in place
     else PageMemo.GetTextAttributes(PageMemo.SelStart-1, SelFontFormat);  // get by prior
  //SHOWMESSAGE('Lin='+inttostr(CurrentLin)+' Pos='+inttostr(CurrentPos));

  ShiftOn:= IsShiftKeyPressed;
  if not ShiftOn then
     begin
     //activate keyboard language
     if (SelFontFormat.name=DefEng) and (DefEng=DefGrk) then
         begin
         StartLength:= PageMemo.SelLength;
         StartPos:= PageMemo.SelStart;
         if StartPos=0 then QryPos:= 0
                       else QryPos:= StartPos-1;
         SendMessage(PageMemo.Handle, EM_HIDESELECTION, 1, 0); // hide selection
         PageMemo.SelStart:= QryPos;
         PageMemo.SelLength:= 1;
         QryStr:= PageMemo.SelText;
         QryChr:= QryStr[1];
         while (((QryChr=' ') or (QryChr= #9)) and (QryPos>0)) do
               begin
               QryPos:= QryPos-1;
               PageMemo.SelStart:= QryPos;
               PageMemo.SelLength:= 1;
               QryStr:= PageMemo.SelText;
               QryChr:= QryStr[1];
               end;
         PageMemo.SelStart:= StartPos;
         PageMemo.SelLength:= StartLength;
         SendMessage(PageMemo.Handle, EM_HIDESELECTION, 0, 0); // show selection
         if (ord(QryChr)=206) or
            (ord(QryChr)=207) or                   // ord(char) is decimal index of character...
            (ord(QryChr)=225)    // accented       // 206, 207 or 225 for Noto Sans font
            then SetGreek                          // 214 or 215 for Noto Sans Hebrew
            else SetEnglish;                       // 220 or 221 for Noto Sans Syriac
         end;
     if (SelFontFormat.name=DefGrk) and not (DefEng=DefGrk) then SetGreek;
     if (SelFontFormat.name=DefHeb) then SetHebrew;
     if (SelFontFormat.name=DefSyr) then SetSyriac;

(* // for testing
PageMemo.SelStart:= PageMemo.SelStart-1;
PageMemo.SelLength:= 1;
QryStr:= PageMemo.SelText;
QryChr:= QryStr[1];
PageMemo.SelStart:= PageMemo.SelStart+1;
PageMemo.SelLength:= 0;
showmessage(inttostr(ord(QryChr)));
*)

     end; // else showmessage('ShiftOn');

  PrepareToolbar;
  PageMemo.Repaint; // clears previous click and highlight shadow
end;

I just updated the code listing by giving you the entire procedure.

Rick
Title: Re: Detecting a Unicode language family
Post by: skalogryz on January 26, 2017, 07:39:13 pm
yes, that matters.
I'd certainly recommend using Unicode/WideStrings over UTF-8 strings, just because it's easier.

See the attached example.

You might want to copy/paste some Syriac, Greek, Hebrew and English characters into it.

Code: Pascal
procedure TForm1.PageMemoClick(Sender: TObject);
var
  QryText : WideString;
  WC      : WideChar;
  lang    : string;
begin
  PageMemo.SelLength:=1;
  QryText :=UTF8Decode(PageMemo.SelText);
  if QryText<>'' then begin
    WC:=QryText[1];
    case wc of
      #$0590..#$05FF:  // Hebrew block
        lang:='hebrew';
      #$0700..#$074F:  // Syriac block
        lang:='syriac';
      #$1F00..#$1FFF,  // Greek Extended (polytonic Greek)
      #$0370..#$03FF:  // Greek and Coptic
         lang:='greek';
    else
      lang:='';
    end;

    if lang<>'' then
      Caption:=IntToHex(Word(WC),4)+' '+lang
    else
      Caption:=IntToHex(Word(WC),4);
  end else
    Caption:='no text?';
end;
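
For anyone who wants to try the range check without a form or a RichMemo, here is a minimal console sketch of the same idea. It is an illustration under assumptions, not code from the attached example: ScriptOf is a hypothetical function name, and the ranges are simply the Unicode blocks used in the case statement above.

```pascal
program ScriptOfDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: maps one UTF-16 code unit to a script name using
// Unicode block ranges. Returns '' when no range matches.
function ScriptOf(WC: WideChar): string;
begin
  case WC of
    #$0590..#$05FF: Result := 'hebrew';   // Hebrew block
    #$0700..#$074F: Result := 'syriac';   // Syriac block
    #$0370..#$03FF,                       // Greek and Coptic
    #$1F00..#$1FFF: Result := 'greek';    // Greek Extended
  else
    Result := '';
  end;
end;

var
  W: WideString;
begin
  W := UTF8Decode(#$CE#$B1);   // UTF-8 bytes of U+03B1 (Greek small alpha)
  WriteLn(ScriptOf(W[1]));     // greek
  W := UTF8Decode(#$D7#$90);   // UTF-8 bytes of U+05D0 (Hebrew alef)
  WriteLn(ScriptOf(W[1]));     // hebrew
  W := UTF8Decode('A');        // plain ASCII falls through to the else branch
  WriteLn('[', ScriptOf(W[1]), ']');
end.
```

Keeping the classification in a function of its own means the click handler only has to fetch the text and act on the result.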
Title: Re: Detecting a Unicode language family
Post by: rick2691 on January 26, 2017, 09:26:35 pm
skalogryz, thanks for the help, and nicely done.

Here is how I applied your method...
Code: Pascal
procedure TCmdForm.ClickPageMemo(Sender: TObject);
var StartPos,StartLength,QryPos: longint;
    QryCode : WideString;
    WC      : WideChar;
    lang    : string;
    QryChr: char;
    QryStr: string;
    ShiftOn : boolean;
begin
  ReportPosition;
  if (CurrentLin=1) and (CurrentPos=1)
     then PageMemo.GetTextAttributes(PageMemo.SelStart, SelFontFormat)   // get in place
     else PageMemo.GetTextAttributes(PageMemo.SelStart-1, SelFontFormat);  // get by prior
  ShiftOn:= IsShiftKeyPressed;
  if not ShiftOn then
     begin
     //activate keyboard language
     StartLength:= PageMemo.SelLength;
     StartPos:= PageMemo.SelStart;
     if StartPos=0 then QryPos:= 0
                         else QryPos:= StartPos-1;

     SendMessage(PageMemo.Handle, EM_HIDESELECTION, 1, 0); // hide selection
     PageMemo.SelStart:= QryPos;
     PageMemo.SelLength:=1;
     QryStr:= PageMemo.SelText;
     QryChr:= QryStr[1];
     // step back over spaces and tabs to find a character that identifies the language
     while (((QryChr=' ') or (QryChr=#9)) and (QryPos>0)) do
           begin
           QryPos:= QryPos-1;
           PageMemo.SelStart:= QryPos;
           PageMemo.SelLength:= 1;
           QryStr:= PageMemo.SelText;
           QryChr:= QryStr[1];
           end;
     QryCode:= UTF8Decode(QryStr);  //(PageMemo.SelText);
     PageMemo.SelStart:= StartPos;
     PageMemo.SelLength:= StartLength;
     SendMessage(PageMemo.Handle, EM_HIDESELECTION, 0, 0); // show selection

     if QryCode<>'' then
        begin
        WC:=QryCode[1];
        case wc of
             #$0590..#$05FF: SetHebrew;  // lang:='hebrew';
             #$0700..#$074F: SetSyriac;  // lang:='syriac';
             #$1F00..#$1FFF, // Greek Extended (polytonic Greek)
             #$0370..#$03FF: // Greek and Coptic
                             SetGreek; // either Greek block
             #$0020..#$007F: SetEnglish;
             else SetEnglish; // not in list  // CAPTION:= IntToHex(Word(WC),4);
             end;
        end; // CAPTION:= IntToHex(Word(WC),4);
  end;
end;

My implementation inherits attributes from the previous character. Doing so seems logical for me, but it makes a problem. I have to activate EM_HIDESELECTION to stop the flashing, and I have not found a way to hide the caret. The caret dances around as it searches for a valid character to use as a basis for determining the active language.

Actually, I only have to do this for the Greek, because it is in Noto Sans font (which is Latin/English). I have to skip past spaces and tabs. In Hebrew or Syriac, its spaces and tabs are automatically reported as being part of the native language. Not so with the Greek. It thinks it is English.

Do you have a way for hiding the caret?

Rick

Title: Re: Detecting a Unicode language family
Post by: skalogryz on January 26, 2017, 10:16:24 pm
Quote
My implementation inherits attributes from the previous character. Doing so seems logical for me, but it makes a problem. I have to activate EM_HIDESELECTION to stop the flashing, and I have not found a way to hide the caret. The caret dances around as it searches for a valid character to use as a basis for determining the active language.
...
Do you have a way for hiding the caret?

I can think of two ways to do that:

1:
Try to use GetStyleRange (http://wiki.freepascal.org/RichMemo#GetStyleRange) in conjunction with GetText (http://wiki.freepascal.org/RichMemo#GetText.2C_GetUText).
GetStyleRange should find the style range for you.
GetText should extract the text for you without tampering with the current selection. (You might need to update to the latest revision for that, since there was a bug that prevented any text from being extracted.)

2: use Lines.BeginUpdate / Lines.EndUpdate.
Whenever a Lines.BeginUpdate is called all visual updates are stopped and will not happen (thus the caret would not flicker).

Code:
Lines.BeginUpdate;
try
  ...search for the word/style...
  ... other code...
finally
  Lines.EndUpdate;
end;

It is highly recommended to use try .. finally to start and finish the update operation.
If any exception occurs during the processing, you want to make sure that EndUpdate is called.
Otherwise your component might look frozen after the exception is processed.
(An exception might be presented to the user as an error dialog, and it might not cause the application to crash.)


Oh yes... you also need to test the code on Phoenician languages too.  Since a character for them occupies more than a single widechar, you'll need to make your case statement more complex.
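
To illustrate the point about characters that need more than one WideChar: Phoenician sits in the block U+10900..U+1091F, above U+FFFF, so in a WideString each letter is stored as a surrogate pair. The sketch below is only an illustration under assumptions. CodePointAt and IsPhoenician are hypothetical helper names, and no RichMemo call is involved. It shows how two code units can be combined into one code point before the range test.

```pascal
program SurrogatePairDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the code point starting at index I of a
// UTF-16 string, combining a high/low surrogate pair when one is present.
// An unpaired surrogate is returned unchanged.
function CodePointAt(const W: WideString; I: Integer): LongInt;
var
  HiSur, LoSur: LongInt;
begin
  HiSur := Ord(W[I]);
  Result := HiSur;
  if (HiSur >= $D800) and (HiSur <= $DBFF) and (I < Length(W)) then
  begin
    LoSur := Ord(W[I + 1]);
    if (LoSur >= $DC00) and (LoSur <= $DFFF) then
      Result := $10000 + ((HiSur - $D800) shl 10) + (LoSur - $DC00);
  end;
end;

// Hypothetical helper: True when CP lies in the Phoenician block.
function IsPhoenician(CP: LongInt): Boolean;
begin
  Result := (CP >= $10900) and (CP <= $1091F);
end;

var
  W: WideString;
begin
  // U+10900 is stored in UTF-16 as the pair $D802 $DD00
  SetLength(W, 2);
  W[1] := WideChar($D802);
  W[2] := WideChar($DD00);
  WriteLn(CodePointAt(W, 1));                // 67840, i.e. $10900
  WriteLn(IsPhoenician(CodePointAt(W, 1)));  // TRUE
end.
```

With a helper like this, the case statement would switch on an integer code point instead of a single WideChar, and a selection length of 1 might not be enough to capture both halves of the pair.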
Title: Re: Detecting a Unicode language family
Post by: rick2691 on January 26, 2017, 11:21:52 pm
Thanks. I expected that you would know a way.

I will try the 2nd option first. It looks easier, and I assume it doesn't need the upgrade. But for the 1st option... what is the revision number for the file?

I am not doing Phoenician at this point, because it triggers Font Bonding. It appears that the RichEdit driver does not like its high Unicode range.

Rick
Title: Re: Detecting a Unicode language family
Post by: skalogryz on January 27, 2017, 03:00:16 am
r5708
Title: Re: Detecting a Unicode language family
Post by: rick2691 on January 28, 2017, 02:39:34 pm
I received this error upon compiling with r5708...

win32richmemo.pas(130,20) Error: There is no method in an ancestor class to be overridden: "class TWin32WSCustomRichMemo.GetZoomFactor(const TWinControl,var Double):Boolean;"

Rick
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 18, 2017, 08:32:32 pm
skalogryz,

It is possible that the previous post about compiling with r5708 is related to the following. My system crashed after that post, and I have had to rebuild my computer. Now I am faced with updating all of your revisions.

Is there a master composite of all the files that I can import as a package? Otherwise I have to update and update by their historical creation. No fun, and fraught with chances for mistakes.

Rick
Title: Re: Detecting a Unicode language family
Post by: Thaddy on February 18, 2017, 09:07:21 pm
Quote
[...] and I have had to rebuild my computer. Now I am faced with updating all of your revisions.

Unless you are running an 8086 with an early 8087 co-processor and you know about the firestarter virus and how to use it, it is highly unlikely that faulty software would cause you to rebuild your computer.
Again, as usual, provide us the code.... I am willing to try it, I have a safe room for that, camera to follow the explosion, but I don't think I can reproduce it... >:D :'( :-X

Note the picture is an actual hardware failure on a more modern beast. Probably D.T. forgetting that Europe has proper power supply, not a measly 110. POWER!!!!
Silly....
Title: Re: Detecting a Unicode language family
Post by: Cyrax on February 18, 2017, 10:47:19 pm
Quote
Is there a master composite of all the files that I can import as a package? Otherwise I have to update and update by their historical creation. No fun, and fraught with chances for mistakes.

You can update to latest revision in single step. There is no need to update revision by revision unless you are doing some bug hunting.
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 12:42:11 pm
@Thaddy, my statement about r5708 and my computer crash was misleading. I was intending to suggest that the problem with r5708 was on account of a system problem in my computer.

@Cyrax, I had looked for that option but did not find one. Can you tell me where the Single-Step update is located?

Rick
Title: Re: Detecting a Unicode language family
Post by: Cyrax on February 19, 2017, 02:21:20 pm
Are you using TortoiseSVN? If you are, then all you need to do is right-click on the directory where your sources are and select the Update menu item.

See attached pictures for more info.
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 03:42:09 pm
No, I do not have Tortoise, nor an SVN.
Title: Re: Detecting a Unicode language family
Post by: Cyrax on February 19, 2017, 03:57:58 pm
Quote
No, I do not have Tortoise, nor an SVN.

Please install them. They are extremely handy to have if you are meddling with bleeding edge of the source codes! 8)
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 05:47:43 pm
Thanks for the tip. I checked it out ... it won't install on Windows XP. By the Website information, I couldn't really tell what it does. There wasn't any manual or documentation.

Rick
Title: Re: Detecting a Unicode language family
Post by: jacmoe on February 19, 2017, 05:51:34 pm
You mean TortoiseSVN (https://tortoisesvn.net/) ?

Quote
Support for Windows XP with SP3 was dropped in 1.9.0. You can still download and install older versions if you need them.

The last version of TortoiseSVN that will work on XP is 1.9.0.
Title: Re: Detecting a Unicode language family
Post by: Bart on February 19, 2017, 05:54:05 pm
Quote
Thanks for the tip. I checked it out ... it won't install on Windows XP.

I managed to install an svn client in Win98, so XP should not be a problem.
It's a commandline program though, not as fancy as Tortoise, but it does everything I need.
I also have an svn manual somewhere.

Bart
Title: Re: Detecting a Unicode language family
Post by: jacmoe on February 19, 2017, 05:56:29 pm
You can download version 1.9.0 here: https://sourceforge.net/projects/tortoisesvn/files/1.9.0/Application/ - that will install on XP with SP3. (Any version later than 1.9.0 will not).
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 07:52:49 pm
Thanks for the help. I tried to install 1.9.0 and got the same message as with 1.9.5, "This installation package cannot be installed by the Windows Installer service. You must install a Windows service pack that contains a newer version of the Windows Installer service."

It did, however, have documentation.

I will try 1.8.

Rick
Title: Re: Detecting a Unicode language family
Post by: jacmoe on February 19, 2017, 07:55:44 pm
Service Pack 3 for XP should have that installer upgrade, if I remember correctly.
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 07:58:23 pm
XP Pro, sp3, 32bit, is what I have. 1.8 would also not install.
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 08:03:41 pm
I just rebuilt this thing a couple of days ago. I am going to check for updates to the Windows system.
Title: Re: Detecting a Unicode language family
Post by: rick2691 on February 19, 2017, 10:01:02 pm
I found an update for the WinXP Installer. I tried to load 1.9.5 and 1.9.0 ... they both said that I needed Vista or higher OS. 1.8 was built for XP. It works, and I have documentation.

Rick
Title: Re: Detecting a Unicode language family
Post by: jacmoe on February 19, 2017, 10:21:32 pm
So, it works?
If that is so, congratulations!  :)