Tofu Returned

I feel that I have found a valid method for implementing Unicode without Font Binding, and without erasing the \langN codes from a file. The code follows, along with an explanation.
Code: Pascal
//global variables...
LateCode: boolean;
KeySet: boolean;
 
 
// in OnCreate, set the LateCode and KeySet variables to false
 
 
// the following applies the LCID value (cf.lcid:= code) if text is selected;
// if there is no selection, it sets a flag for late binding by the OnChange method
 
procedure TCmdForm.SetLanguage(code: longint);   //PutRTFstr('\lang'+IntToStr(code));
var cf: CharFormat2;
    Run: longint;
    Pos: longint;
begin
  Pos:= PageMemo.SelStart;
  Run:= PageMemo.SelLength;
  LateCode:= false;  //LangCode:= code;
 
  fillchar(cf, sizeof(cf), 0);
  cf.cbSize:= sizeof(CharFormat2);
 
  if (Run=0) and (Pos>0) then
     begin
     PageMemo.SelStart:= Pos-1;
     PageMemo.SelLength:= 1;
 
     if (PageMemo.SelText<>' ')
        and (PageMemo.SelText<>#9)
        and (PageMemo.SelText<>#13) then
        begin
          PageMemo.SelText:= PageMemo.SelText+' ';
          Pos:= Pos+1;
        end;
 
     LateCode:= true;  // done later in OnChange, because EM_GET/SETCHARFORMAT needs a selection
     PageMemo.SelStart:= Pos;
     PageMemo.SelLength:= Run;
     end;
 
  if (Run=0) and (Pos=0) then
     begin
     LateCode:= true;  // done later in OnChange, because EM_GET/SETCHARFORMAT needs a selection
     end;
 
  if (not LateCode) then  // and (run>0)
     begin
     SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf));
     cf.lcid:= code;            // richedit bug... //cf.dwMask:= CFM_LCID;  * CFM_LCID is unknown *
     SendMessage(PageMemo.handle, EM_SETCHARFORMAT, SCF_SELECTION, lparam(@cf));
     end;
end;
 
procedure TCmdForm.PageMemoChanged(Sender: TObject);  // OnChange method
var Pos, Run: longint;
begin
  if LateCode and KeySet then  // late setting for \langN code because SelLength was 0
     begin
       Pos:= PageMemo.SelStart;
       Run:= PageMemo.SelLength;
       PageMemo.SelStart:= Pos-1;
       PageMemo.SelLength:= 1;    // select recent key
 
       // set language mode
       if EngOn then SetLanguage(1033);
       if GrkOn then SetLanguage(1033); // 1032 is native but unknown to MsftEdit
       if HebOn then SetLanguage(1037);
       if SyrOn then SetLanguage(1025); // 10241 is native but unknown to MsftEdit
 
       PageMemo.SelStart:= Pos;
       PageMemo.SelLength:= Run;
       LateCode:= false;
       KeySet:= false;
     end else begin
                LateCode:= false;
                KeySet:= false;
              end;
end;
 
// typical language activation
 
procedure TCmdForm.MnuEnglishClick(Sender: TObject);
begin
  if PageControl1.PageCount>0 then
     begin
     EngOn:= true;  // *set*
     HebOn:= false;
     SyrOn:= false;
     GrkOn:= false;
     CptOn:= false;
     PhnOn:= false;
     SmrOn:= false;
     CmdForm.caption:= AppName+' • ENGLISH';
 
     SetLanguage(1033);
     cboFont.Text:= DefEng;
     cboFontSelect(Self);
     cboFontSize.text:= DefEngSize.Text;
     cboFontSizeSelect(Self);
     end
     else showmessage('No active document.');
end;  
 
// typical language implementation
 
procedure TCmdForm.MemoUTF8KeyPress(Sender: TObject; var UTF8Key: TUTF8Char);
var Pos, Run: longint;
begin
  if EngOn then
     begin
     Case UTF8Key of
          #33..#126: KeySet:= true;  // natural characters
          end;
     end; 
 
  if HebOn then   //hebrew
     begin
     KeySet:= false;
     Case UTF8Key of
          // hebrew consonants
          #39 : begin UTF8Key:= UnicodeToUTF8(1488); KeySet:= true; end;  // alf (#39=' apostrophe mark)
          'b' : begin UTF8Key:= UnicodeToUTF8(1489); KeySet:= true; end;  // bth
          'g' : begin UTF8Key:= UnicodeToUTF8(1490); KeySet:= true; end;  // gml
          'd' : begin UTF8Key:= UnicodeToUTF8(1491); KeySet:= true; end;  // dlt  
                  
           // set the other valid keys the same way, always setting KeySet to true
           end;
     end;
  
  //do the same for other languages
end;  
         
 
(*
Font Binding has been an issue to date because of several problems:
 
1.  The Windows XP MsftEdit.dll driver predates both the Syriac and Phoenician Unicode blocks.
 
2.  The Syriac font compensates by being compatible with the Arabic LCID code (1025).
    Its natural code is 10241 (a five-digit value, whereas MsftEdit expects a four-digit code).
    LCID 10241 is similar to 1024, which is a symbol LCID code, so Font Binding sets it to 1025.
    Since 1025 is Arabic, Font Binding then changes subsequent English to Hebrew... it is confused.
        
3.  In a mixed-language paragraph you need to define each LCID explicitly. Otherwise Font
    Binding will guess at it on your behalf... and MsftEdit is an old control.
        
4.  Phoenician cannot be done because it is newer than Syriac. You can get it to write,
    but internally MsftEdit counts each Phoenician letter as two characters. The screen output
    compensates, but MsftEdit's internal character counts do not.
    Consequently it can give bad information for searches and for your own encoded caret positions.
        
5.  In most writing, characters are frequently entered non-sequentially, which makes MsftEdit guess wrong.
 
6.  When I erased all of the \langN codes from the RTF disk file, MsftEdit assigned the \langN codes
        sequentially upon reloading the file. Therefore it did not do any Font Binding. Erasing the RTF codes 
        was a poor way to approach the problem, but it worked, and it clarified what the problems were.
        
7.  The EM_SETCHARFORMAT method sets the LCID so that MsftEdit does not have to guess at it.
    Unfortunately, the CFM_LCID dwMask flag is unknown in my setup, so you just have to push the LCID through.
    EM_SETCHARFORMAT also does nothing unless there is a text selection, so when keys are typed one at
    a time, the OnChange method must assist by selecting each key after it is printed. Calling
    EM_GETCHARFORMAT first loads the selected key's LCID value, so pushing EM_SETCHARFORMAT through is
    safe. This selection must happen with every key (including English); otherwise MsftEdit will do
    Font Binding, and it does not have a comprehensive database to work with.
        
8.  Consequently, you cannot work with all the languages that are currently available (e.g. Phoenician). Syriac
    itself has a strong community that worked on building the font, so it squeaked through. Nevertheless,
    if you follow Syriac with English or Noto Greek (because Greek is in the Noto English font), it will cause Font Binding.
        
9.  My decision is to keep the above procedures and not to do \langN erasures. Erasures cause problems with
    a file if another editor changes or saves it, because that editor has no code for erasing. Additionally,
    I am not going to do Phoenician or Syriac. I have resigned myself to doing only English, Greek, and Hebrew. I have
    already converted the Syriac New Testament to Hebrew characters, so I can still make use of their Scriptures.
    That conversion is possible because Syriac is Semitic (as is Hebrew), and both characters and vowels are
    interchangeable (provided that you take care with vowel placement and implementation).
*)
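
Point 2 turns on what the LCID numbers mean. A Windows LCID is a bit field rather than a decimal code: with sort ID 0, bits 0 to 9 hold the primary language and bits 10 to 15 the sublanguage (the MAKELANGID layout from the Windows SDK). The minimal Free Pascal sketch below decodes the values used in this post; the LANG_* constants are copied by hand from winnt.h (an assumption, not imported from any Lazarus unit), and PrimaryLang, SubLang, MakeLangId and Show are hypothetical helper names. Decoded this way, 1025 ($0401) and 10241 ($2801) are both Arabic with different sublanguages ($01 and $0A), 1024 ($0400) is LANG_NEUTRAL with SUBLANG_DEFAULT, and Syriac has its own primary language $5A, giving LCID 1114 ($045A).

```pascal
program LcidSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;  // IntToHex

const
  // Primary-language IDs, copied by hand from the Windows SDK (winnt.h).
  LANG_NEUTRAL = $00;
  LANG_ARABIC  = $01;
  LANG_GREEK   = $08;
  LANG_ENGLISH = $09;
  LANG_HEBREW  = $0D;
  LANG_SYRIAC  = $5A;

// With sort ID 0 an LCID equals its language ID:
// bits 0..9 = primary language, bits 10..15 = sublanguage.
function PrimaryLang(lcid: LongInt): LongInt;
begin
  Result := lcid and $3FF;
end;

function SubLang(lcid: LongInt): LongInt;
begin
  Result := (lcid shr 10) and $3F;
end;

// Same bit layout as the SDK's MAKELANGID macro.
function MakeLangId(primary, sub: LongInt): LongInt;
begin
  Result := (sub shl 10) or primary;
end;

// Print an LCID in decimal and hex, split into its two fields.
procedure Show(lcid: LongInt; const note: string);
begin
  WriteLn(lcid:6, ' = $', IntToHex(lcid, 4),
          '  primary $', IntToHex(PrimaryLang(lcid), 2),
          '  sub $', IntToHex(SubLang(lcid), 2), '  ', note);
end;

begin
  Show(1033, 'English, sublanguage 1 (United States)');
  Show(1032, 'Greek');
  Show(1037, 'Hebrew');
  Show(1025, 'Arabic, sublanguage 1 (Saudi Arabia)');
  Show(10241, 'Arabic, sublanguage $0A (Syria)');
  Show(1024, 'LANG_NEUTRAL with SUBLANG_DEFAULT');
  Show(MakeLangId(LANG_SYRIAC, 1), 'Syriac (Syria)');
end.
```

Seen this way, the number of decimal digits is not what matters; the primary-language field is. Both 1025 and 10241 carry primary language $01 (Arabic), which is consistent with MsftEdit treating the Syriac text as Arabic.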
 
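Point 4 can also be made concrete. As far as I know, RichEdit (MsftEdit.dll) stores text as UTF-16 and measures character positions in UTF-16 code units. Hebrew (U+0590 to U+05FF) and Syriac (U+0700 to U+074F) lie inside the Basic Multilingual Plane, so each letter is one code unit. Phoenician (U+10900 to U+1091F) lies above U+FFFF, so each letter is stored as a surrogate pair of two code units. The Free Pascal sketch below uses hypothetical helpers (Utf16Units, ToSurrogates, EncodeUtf16) to encode a few letters and compare lengths. In Free Pascal, Length on a UnicodeString counts code units, which mirrors the "two characters at a time" behavior described above.

```pascal
program Utf16CountSketch;
{$mode objfpc}{$H+}

// Number of UTF-16 code units needed for one code point: one inside the
// Basic Multilingual Plane (up to U+FFFF), two (a surrogate pair) above it.
function Utf16Units(cp: LongWord): Integer;
begin
  if cp > $FFFF then
    Result := 2
  else
    Result := 1;
end;

// Split a code point above U+FFFF into its high and low surrogates.
procedure ToSurrogates(cp: LongWord; out hi, lo: Word);
var
  v: LongWord;
begin
  v := cp - $10000;            // 20-bit offset above the BMP
  hi := $D800 + (v shr 10);    // top 10 bits -> high surrogate
  lo := $DC00 + (v and $3FF);  // bottom 10 bits -> low surrogate
end;

// Hypothetical helper: build a UTF-16 string from a list of code points.
function EncodeUtf16(const cps: array of LongWord): UnicodeString;
var
  i: Integer;
  hi, lo: Word;
begin
  Result := '';
  for i := Low(cps) to High(cps) do
    if Utf16Units(cps[i]) = 2 then
    begin
      ToSurrogates(cps[i], hi, lo);
      Result := Result + WideChar(hi) + WideChar(lo);
    end
    else
      Result := Result + WideChar(Word(cps[i]));
end;

var
  hi, lo: Word;
begin
  ToSurrogates($10900, hi, lo);  // Phoenician alf
  WriteLn('U+10900 is stored as $', HexStr(LongInt(hi), 4), ' $', HexStr(LongInt(lo), 4));
  // Hebrew alef, bet, gimel (1488, 1489, 1490, as used in MemoUTF8KeyPress)
  WriteLn('Hebrew, 3 letters:     ', Length(EncodeUtf16([$05D0, $05D1, $05D2])), ' code units');
  // Phoenician alf, bet, gaml
  WriteLn('Phoenician, 3 letters: ', Length(EncodeUtf16([$10900, $10901, $10902])), ' code units');
end.
```

Three Hebrew letters occupy three positions, while three Phoenician letters occupy six. Any SelStart/SelLength arithmetic that assumes one position per letter would therefore drift by one for every Phoenician letter before the caret. Code that must handle such text would need to convert between code points and code units itself.
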
Rick
rick2691

Re: Tofu Returned

	Computer Math and Games in Pascal (preview)
	Lazarus Handbook