
Topic: Tofu Returned  (Read 12197 times)

rick2691

Re: Tofu Returned
« Reply #15 on: April 30, 2017, 07:59:07 pm »
I feel that I have come upon a valid method for implementing Unicode without Font Binding, and without erasing the \langN codes from the RTF file. The code follows, along with an explanation.

Code: Pascal
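//global variables...
var
  LateCode: boolean;  // the LCID still has to be applied to the next key typed
  KeySet: boolean;    // the last key press produced a character in the active language
// EngOn, GrkOn, HebOn, SyrOn, CptOn, PhnOn and SmrOn are Boolean flags declared elsewhere.

// in OnCreate, set LateCode and KeySet to false
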
// SetLanguage applies the LCID (cf.lcid:= code) at once if text is selected;
// with no selection it sets LateCode so that the OnChange handler applies it to the next key

procedure TCmdForm.SetLanguage(code: longint);   //PutRTFstr('\lang'+IntToStr(code));
var cf: CharFormat2;
    SelPos: longint;
    SelLen: longint;
begin
  SelPos:= PageMemo.SelStart;
  SelLen:= PageMemo.SelLength;
  LateCode:= false;  //LangCode:= code;

  fillchar(cf, sizeof(cf), 0);
  cf.cbSize:= sizeof(CharFormat2);

  if (SelLen=0) and (SelPos>0) then
     begin
     // look at the character just before the caret
     PageMemo.SelStart:= SelPos-1;
     PageMemo.SelLength:= 1;

     // if it is not a space, tab or CR, append a space after it
     if (PageMemo.SelText<>' ')
        and (PageMemo.SelText<>#9)
        and (PageMemo.SelText<>#13) then
        begin
          PageMemo.SelText:= PageMemo.SelText+' ';
          SelPos:= SelPos+1;
        end;

     LateCode:= true;  // defer to OnChange, because EM_SETCHARFORMAT needs a selection
     PageMemo.SelStart:= SelPos;
     PageMemo.SelLength:= SelLen;
     end;

  if (SelLen=0) and (SelPos=0) then
     begin
     LateCode:= true;  // defer to OnChange, because EM_SETCHARFORMAT needs a selection
     end;

  if (not LateCode) then  // i.e. SelLen>0
     begin
     // EM_GETCHARFORMAT fills cf, including dwMask, from the current selection
     SendMessage(PageMemo.Handle, EM_GETCHARFORMAT, SCF_SELECTION, LPARAM(@cf));
     // cf.dwMask:= CFM_LCID; would be explicit, but CFM_LCID ($02000000 in the SDK's richedit.h)
     // is not declared in the unit used here, so dwMask is left as EM_GETCHARFORMAT returned it
     cf.lcid:= code;
     SendMessage(PageMemo.Handle, EM_SETCHARFORMAT, SCF_SELECTION, LPARAM(@cf));
     end;
end;

procedure TCmdForm.PageMemoChanged(Sender: TObject);  // OnChange handler
var SelPos, SelLen: longint;
begin
  // late setting of the \langN code, because SelLength was 0 when the language was chosen
  if LateCode and KeySet then
     begin
       SelPos:= PageMemo.SelStart;
       SelLen:= PageMemo.SelLength;
       PageMemo.SelStart:= SelPos-1;
       PageMemo.SelLength:= 1;    // select the key just typed

       // apply the LCID of the active language
       if EngOn then SetLanguage(1033);
       if GrkOn then SetLanguage(1033); // 1032 is native but unknown by MsftEdit
       if HebOn then SetLanguage(1037);
       if SyrOn then SetLanguage(1025); // 10241 is native but unknown by MsftEdit

       PageMemo.SelStart:= SelPos;    // restore the caret and selection
       PageMemo.SelLength:= SelLen;
     end;

  // both paths clear the flags
  LateCode:= false;
  KeySet:= false;
end;

// typical language activation (the other language menu items follow the same pattern)

procedure TCmdForm.MnuEnglishClick(Sender: TObject);
begin
  if PageControl1.PageCount>0 then
     begin
     EngOn:= true;  // *set*
     HebOn:= false;
     SyrOn:= false;
     GrkOn:= false;
     CptOn:= false;
     PhnOn:= false;
     SmrOn:= false;
     CmdForm.Caption:= AppName+' • ENGLISH';

     SetLanguage(1033);
     cboFont.Text:= DefEng;
     cboFontSelect(Self);
     cboFontSize.Text:= DefEngSize.Text;
     cboFontSizeSelect(Self);
     end
     else ShowMessage('No active document.');
end;

// typical language implementation (key handling)

procedure TCmdForm.MemoUTF8KeyPress(Sender: TObject; var UTF8Key: TUTF8Char);
// UnicodeToUTF8 comes from the LazUTF8 unit
begin
  if EngOn then
     begin
     if (Length(UTF8Key)=1) and (UTF8Key[1] in [#33..#126]) then
        KeySet:= true;  // printable ASCII characters (space excluded)
     end;

  if HebOn then   // Hebrew
     begin
     KeySet:= false;
     case UTF8Key of
          // Hebrew consonants
          #39 : begin UTF8Key:= UnicodeToUTF8(1488); KeySet:= true; end;  // alf (#39 = ' apostrophe)
          'b' : begin UTF8Key:= UnicodeToUTF8(1489); KeySet:= true; end;  // bth
          'g' : begin UTF8Key:= UnicodeToUTF8(1490); KeySet:= true; end;  // gml
          'd' : begin UTF8Key:= UnicodeToUTF8(1491); KeySet:= true; end;  // dlt

          // set the other valid keys... always setting KeySet to true
          end;
     end;

  // do the same for the other languages
end;

(*
The reason that Font Binding has been an issue to date is on account of several problems...

1.  The Windows XP MsftEdit.dll predates proper support for the Syriac and Phoenician Unicode blocks.

2.  The Syriac font compensates by being compatible with the Arabic LCID code (1025).
    Its natural code is 10241 (five digits, whereas MsftEdit needs a four-digit code).
    LCID 10241 is similar to 1024, which is a symbol LCID code. Font Binding sets it as 1025.
    Since 1025 is Arabic, Font Binding then changes subsequent English to Hebrew... it is confused.

3.  In a mixed-language paragraph you need to define each LCID explicitly. Otherwise Font
    Binding will guess at it on your behalf, and MsftEdit's guessing method is an old one.

4.  Phoenician cannot be done because it is newer than Syriac and lies outside the Basic
    Multilingual Plane (U+10900 onward). You can get it to display, but internally MsftEdit
    stores each letter as two UTF-16 units (a surrogate pair). The screen rendering compensates,
    but MsftEdit's internal character counts do not. Consequently it can give wrong results
    for searches and for your own computed caret positions.

5.  In most writing, characters are frequently not entered in sequence. This makes MsftEdit guess wrong.

6.  When I erased all of the \langN codes from the RTF disk file, MsftEdit assigned the \langN codes
    sequentially when reloading the file, so it did not do any Font Binding. Erasing the RTF codes
    was a poor way to approach the problem, but it worked, and it clarified what the problems were.

7.  The EM_SETCHARFORMAT message sets the LCID so that MsftEdit does not have to guess at it.
    Unfortunately, the RichEdit unit I use does not declare CFM_LCID as a dwMask flag, so you just
    have to push the value through. EM_SETCHARFORMAT also does nothing unless there is a text
    selection, so typing one key at a time needs assistance from the OnChange handler, which
    selects each key after it is printed. EM_GETCHARFORMAT first loads the selected key's current
    format, so pushing EM_SETCHARFORMAT through is safe. This selection must happen with every key
    (including English); otherwise MsftEdit will do Font Binding, and it does not have a
    comprehensive database to work with.

8.  Consequently, you cannot work with all the languages that are currently available (e.g. Phoenician).
    Syriac itself has a strong community that worked on building the font, so it squeaked through.
    Nevertheless, if you follow Syriac with English or with Noto Greek (which is in the Noto English
    font), it will trigger Font Binding.

9.  My decision is to keep the above procedures and not to erase \langN codes. Erasures cause
    problems if another editor changes or saves the file, because it has no code for erasing them.
    Additionally, I am not going to do Phoenician or Syriac; I have resigned myself to English,
    Greek, and Hebrew only. I have already converted the Syriac New Testament to Hebrew characters,
    so I can still make use of their Scriptures. That conversion is possible because Syriac is
    Semitic (as is Hebrew), and both characters and vowels are interchangeable (provided that you
    take care with vowel placement and implementation).
*)

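Point 4 comes down to how each letter is stored. The standalone Free Pascal sketch below is not from the post; EncodeUTF8, EncodeUTF16 and CodePointCount are hypothetical helpers written for illustration, and it assumes FPC 3.x with UnicodeString. It encodes one letter from each script: Hebrew alef (U+05D0, the post's UnicodeToUTF8(1488)) and Syriac alaph (U+0710) take one UTF-16 unit each, while Phoenician alf (U+10900) lies outside the Basic Multilingual Plane and becomes a surrogate pair, the "two characters at a time" described above.

```pascal
program UnitsPerLetter;
{$mode objfpc}{$H+}

// Encode one code point as UTF-8 bytes (the job UnicodeToUTF8 does in the post).
function EncodeUTF8(CP: LongInt): AnsiString;
begin
  if CP < $80 then
    Result := Chr(CP)
  else if CP < $800 then
    Result := Chr($C0 or (CP shr 6)) + Chr($80 or (CP and $3F))
  else if CP < $10000 then
    Result := Chr($E0 or (CP shr 12)) + Chr($80 or ((CP shr 6) and $3F))
            + Chr($80 or (CP and $3F))
  else
    Result := Chr($F0 or (CP shr 18)) + Chr($80 or ((CP shr 12) and $3F))
            + Chr($80 or ((CP shr 6) and $3F)) + Chr($80 or (CP and $3F));
end;

// Encode one code point as UTF-16 code units; above U+FFFF this is a surrogate pair.
function EncodeUTF16(CP: LongInt): UnicodeString;
var
  V: LongInt;
begin
  if CP < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(Word(CP));                   // BMP letter: one unit
  end
  else
  begin
    V := CP - $10000;                                  // 20-bit offset above the BMP
    SetLength(Result, 2);
    Result[1] := WideChar(Word($D800 + (V shr 10)));   // high (lead) surrogate
    Result[2] := WideChar(Word($DC00 + (V and $3FF))); // low (trail) surrogate
  end;
end;

// Count code points, treating each surrogate pair as one letter.
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) < $DC00) or (Ord(S[I]) > $DFFF) then  // skip low surrogates
      Inc(Result);
end;

procedure Show(const Name: string; CP: LongInt);
var
  U: UnicodeString;
  Digits: Byte;
begin
  U := EncodeUTF16(CP);
  if CP > $FFFF then Digits := 5 else Digits := 4;   // U+XXXX or U+XXXXX
  WriteLn(Name, ' U+', HexStr(CP, Digits), ': ',
          Length(EncodeUTF8(CP)), ' UTF-8 bytes, ',
          Length(U), ' UTF-16 units, ',
          CodePointCount(U), ' code point');
end;

begin
  Show('Hebrew alef   ', $05D0);
  Show('Syriac alaph  ', $0710);
  Show('Phoenician alf', $10900);
end.
```

The Phoenician line reports two UTF-16 units but one code point. Assuming SelStart counts UTF-16 units, as point 4 describes, any caret offset computed in letters would need a conversion like CodePointCount before it is used as a position.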

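The LCIDs used in SetLanguage and PageMemoChanged can be split with the same bit layout as the Windows PRIMARYLANGID and SUBLANGID macros: the low 10 bits hold the primary language and the next 6 bits the sublanguage. The standalone sketch below is an illustration, not code from the post; PrimaryLangId and SubLangId are hypothetical helper names, and the 1114 entry is Windows' own LCID for Syriac (Syria), added for comparison rather than taken from the post. Windows lists 10241 ($2801) as Arabic (Syria), which shares primary language $01 with 1025; that may be why MsftEdit treats it as Arabic, as described in point 2.

```pascal
program LcidParts;
{$mode objfpc}{$H+}

// Bit layout of a Windows LANGID (the low 16 bits of an LCID), as in the
// PRIMARYLANGID and SUBLANGID macros: bits 0-9 primary language,
// bits 10-15 sublanguage.
function PrimaryLangId(Lcid: LongInt): LongInt;
begin
  Result := Lcid and $3FF;
end;

function SubLangId(Lcid: LongInt): LongInt;
begin
  Result := (Lcid and $FFFF) shr 10;
end;

procedure Show(const Name: string; Lcid: LongInt);
begin
  WriteLn(Name, Lcid:6, ' = $', HexStr(Lcid, 4),
          '  primary $', HexStr(PrimaryLangId(Lcid), 3),
          '  sub $', HexStr(SubLangId(Lcid), 2));
end;

begin
  // values passed to SetLanguage in the post
  Show('English (US)  ', 1033);
  Show('Hebrew        ', 1037);
  Show('Arabic (Saudi)', 1025);
  // values mentioned in the post's notes
  Show('Greek         ', 1032);
  Show('Arabic (Syria)', 10241);
  // Windows' LCID for Syriac (Syria), for comparison
  Show('Syriac (Syria)', 1114);
end.
```

English ($09), Hebrew ($0D) and Arabic ($01) differ in the primary-language bits, so comparing PrimaryLangId values is one way to see which language family a stored \langN value falls into.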
Rick
Windows 11, LAZ 2.0.10, FPC 3.2.0, SVN 63526, i386-win32-win32/win64, using windows unit
