Lazarus

Programming => Packages and Libraries => RichMemo => Topic started by: rick2691 on April 03, 2017, 07:45:43 pm

Title: Tofu Returned
Post by: rick2691 on April 03, 2017, 07:45:43 pm
Skalogryz,

After a month or two of reading and editing my manual, Tofu started to show up. At first it was only two places, involving two separate characters that I had entered. I was able to retype those lines and get RichEdit to not invoke Font Binding, but the Tofu would come back after several reads. Then it launched a full-out attack on my document. It swapped purely English text (Noto Sans font) to Noto Sans Hebrew, which does not have any English characters in it, and it did so for three or four pages. Now it keeps growing after several more reads.

Disgusting.

I looked into KControls, and its manifest says that it does not do Unicode yet. It also doesn't do Search or Replace, even on ASCII RTF text. Moreover, its author only worked on the project for one year, and hasn't done anything more since August of 2015.

So I don't really have another option. It is RichMemo or nothing, and it is not going to do Unicode either. I have seen packages for sale that claim to not use RichEdit, and can handle Unicode and Graphics, but they cost about $700. If another person works on it they charge $1000 ... and get up to $7,000 with more people. Yet they haven't stated that they don't do their own version of Font Binding.

Curiously, I loaded the same file into OpenWriter, and it only did Tofu on one character. What do you think they are doing?

Rick

Title: Re: Tofu Returned
Post by: skalogryz on April 03, 2017, 08:12:18 pm
Quote from: rick2691 on April 03, 2017, 07:45:43 pm
Tofu had started to show up. At first it was only two situations that involved two separate characters that I had entered. I was able to retype those lines and get RichEdit to not invoke Font Binding, but then they would come back after several reads.

It sounds like an issue with saving or loading information.

Either the characters were saved wrong, or the font information was saved wrong.
Title: Re: Tofu Returned
Post by: rick2691 on April 03, 2017, 10:01:58 pm
I thought the same thing. Earlier (when it was only two problems), I had looked at the RTF file and saw that the offending lines had tabs after the font assignment. All of the others had tabs before the assignment. So I edited the file to set the tabs earlier. That removed the Tofu, but only temporarily. After a few reloads it started up again... and it got worse.

But what about the OpenWriter load? With the same file that I posted, it showed only two violations. They must be reading it by a better method.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 04, 2017, 02:49:46 pm
Everything that I load from a file goes through the streaming method:

Code: Pascal
// PageMemo.Lines.LoadFromFile(DiskName); // would read as text format of rtf code
if FileType='.rtf' then
   begin
     // read as native rtf document
     // Utf8ToAnsi is required for windows stream
     fs:= TFileStream.Create(Utf8ToAnsi(DiskName), fmOpenRead or fmShareDenyNone);
     try
       PageMemo.LoadRichText(fs);
       PageMemo.Hint:= DiskName; // OpenDialog1.Filename;
     finally
       fs.Free;  // fs is created before the try, so it is always valid here
     end;
   end;

All saves to file use a similar streaming method. I have not seen any changes caused by saving, and "Font Binding" operates when a file is loaded or text is pasted. It doesn't interfere with saving a file.

Could it be that streaming is too fast for Font Binding to process?

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 04, 2017, 05:14:05 pm
I looked closely at the RTF code for the file. It is riddled with false language assignments. This file is very old. I started it as soon as my program was operational. As the software changed, due to necessary fixes, it has likely been corrupted by subsequent Font Binding due to my own errors with operation.

I have copy/pasted the entire document into Notepad++. It looks to have retained all of the Unicode and none of the RTF assignments. Pasted back into RichMemo, it also doesn't have any Tofu. I am assuming it has been cleaned.

I will reformat the text and see if it stays clean.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 04, 2017, 08:17:06 pm
OK, I reformatted everything, and reloaded about 10 times... there is no Tofu.

Time will tell, but it looks like the file that I had been working on was compromised with misleading RTF codes. Font Binding did not like it. Having looked at what it was, I can't blame it. It was a mess.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 08, 2017, 08:18:56 pm
I actually did not reformat *everything*. I assumed that the English would be natural, and not need formatting... so I only did Hebrew, Syriac, and Greek.

The file has been fine. No tofu or wrong directional flows (Binding would set generic characters like @, #, RETURN, etc. to RTL flow), which had both been problems before.

Then I wondered why I hadn't formatted the English. So I went back and did so... both Tofu and wrong directional flows ripped through most of the file.

Why could that have triggered Font Binding?

Here is the Menu code...
Code: Pascal
procedure TCmdForm.MnuEnglishClick(Sender: TObject);
begin
  if PageControl1.PageCount>0 then
     begin
     EngOn:= true;  // *set*
     HebOn:= false;
     SyrOn:= false;
     GrkOn:= false;
     CptOn:= false;
     PhnOn:= false;
     SmrOn:= false;
     CmdForm.caption:= AppName+' • ENGLISH'; // using alt-7
     //PageMemo.SetFocus; // focus active before setting // reseting focus deactivates mode setting
     cboFontSize.text:= DefEngSize.Text;
     cboFontSizeSelect(Self);
     cboFont.Text:= DefEng;
     cboFontSelect(Self);
     end
     else showmessage('No active document.');
end;

Here is the hot-key code...
Code: Pascal
if (Key = VK_E) and (ssAlt in Shift) then  // english mode
     begin
     Key:= 0;  // uses WORD instead of CHAR... ie. not #0
     MnuEnglishClick(self);
     end;

This is the font setting function...
Code: Pascal
procedure TCmdForm.cboFontSelect(Sender: TObject);
var ChgFont: string;
begin
  if PageControl1.PageCount>0 then
     begin
        PageMemo.SetFocus;
        ChgFont:= cboFont.Text;
        if PageMemo.SelLength=0
          then begin
               SelFontFormat.Name:= ChgFont;
               PageMemo.SetTextAttributes(PageMemo.SelStart, PageMemo.SelLength, SelFontFormat);
               end
          else begin
               PageMemo.SetRangeParams (PageMemo.SelStart,
                                        PageMemo.SelLength,
                                        [tmm_Name],
                                        ChgFont,
                                        0,
                                        0,
                                        [],
                                        []
                                        );
               end;
        cboFont.Hint:= cboFont.Text; //cboFont.Caption;
        PageMemo.SetFocus;
     end;
end;

This is the font size function...
Code: Pascal
procedure TCmdForm.cboFontSizeEditingDone(Sender: TObject);
begin
  cboFontSizeSelect(Self);
end;

procedure TCmdForm.cboFontSizeSelect(Sender: TObject);
var SizeVal: integer;
begin
  if PageControl1.PageCount>0 then
     begin
     PageMemo.SetFocus;
     SizeVal:= StrToInt(cboFontSize.Text);
     if PageMemo.SelLength=0
        then begin
             SelFontFormat.Size:= SizeVal;
             PageMemo.SetTextAttributes(PageMemo.SelStart, PageMemo.SelLength, SelFontFormat);
             end
        else begin
             PageMemo.SetRangeParams (PageMemo.SelStart,
                                      PageMemo.SelLength,
                                      [tmm_Size],
                                      '',
                                      SizeVal,
                                      0,
                                      [],
                                      []
                                      );
             end;
     PageMemo.SetFocus;
     end;
end;

Fortunately I had tried this on a copy of the file. I still have the good one. But it was ruthless at changing English fonts to Hebrew... and there is no English in the Hebrew.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 11, 2017, 06:41:59 pm
OK. I have found a fix. I looked at the above routines and just don't see anything wrong. Then I looked at the natural code of the RTF file, and could see that the \lang#### codes were frequently wrongly set.

The \lang#### codes are supposed to be ignored, because they are a backward compatibility to older RTF drivers. So I did a search and replace to remove all of the \lang#### codes.

It worked. The file loaded without any interference by Font Binding.

So... is there a way to suppress the insertion of the \lang#### codes?

If not, could the load function have a filter installed to reject the codes?

If not, I could write a routine that would search and delete the codes just prior to loading. The only problem with doing so is that, thus far, I have found that the RichMemo search and replace is very slow.

Changing the Book of Matthew from western to eastern vowels (a 760 KB file), which also required 10 iterations, took 2 hours to process. At present, removing the language codes would take 3 iterations, and there would have to be another for each language that I might add.

It would be much better to have an inline filter (at load time) that could look for \lang with a wild card for #### ...being any 4 digit number sequence. It would, however, be easier to filter by looking for the \lang sequence, then clip the next 4 characters. It doesn't matter what they are, and they are always 4 numbers. Unless that changes some day. You could just look for the next slash or a space as a terminator.
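
A load-time filter along those lines could work on the raw RTF text before RichMemo ever sees it. The following is only a minimal sketch, not part of RichMemo: StripLangCodes is a hypothetical helper that drops \lang followed by any run of digits (so a five-digit code is handled as well), leaves \langfe and \langnp alone because no digit follows \lang directly, and keeps the delimiter space after the code so that a preceding control word is not glued to the text that follows.

```pascal
program StripLangDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of RichMemo): removes every \langN control
// word from raw RTF text, where N is any run of decimal digits.
function StripLangCodes(const Rtf: string): string;
const
  Tag = '\lang';
var
  i, j, n: Integer;
begin
  Result := '';
  n := Length(Rtf);
  i := 1;
  while i <= n do
  begin
    // \\, \{ and \} are escaped literal characters: copy both and move on.
    if (Rtf[i] = '\') and (i < n) and (Rtf[i + 1] in ['\', '{', '}']) then
    begin
      Result := Result + Rtf[i] + Rtf[i + 1];
      Inc(i, 2);
      Continue;
    end;
    // Match \lang only when a digit follows, so \langfe1033 is left alone.
    if (Copy(Rtf, i, Length(Tag)) = Tag) and (i + Length(Tag) <= n)
       and (Rtf[i + Length(Tag)] in ['0'..'9']) then
    begin
      j := i + Length(Tag);
      while (j <= n) and (Rtf[j] in ['0'..'9']) do
        Inc(j);
      i := j;  // skip the control word; a delimiter space after it is kept
      Continue;
    end;
    Result := Result + Rtf[i];
    Inc(i);
  end;
end;

begin
  // prints {\f0 Hello \f1 world}
  WriteLn(StripLangCodes('{\f0\lang1033 Hello \lang10241\f1 world}'));
end.
```

The cleaned string could then be wrapped in a TStringStream and handed to PageMemo.LoadRichText, which would avoid rewriting the file on disk. Keeping the delimiter space can leave one extra space where a code sat directly between two pieces of text; that trade-off seemed safer than merging control words.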

The other option might be to get the language codes to be correct... which is probably an impossibility, and may not work anyway. They are pretty much randomly wrong. I assume it is because the driver is older than the newer languages that have been added to the Unicode.

My guess is that this could be a fix for Phoenician fonts as well. I think it is the booger in the machinery.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 11, 2017, 08:36:55 pm
I just tried this, and it works.

Code: Pascal
// *** skin the cat ***
PageMemo.Lines.LoadFromFile(DiskName); // read as rtf code
opt:= [];  // opt: TSearchOptions;
include(opt, RichMemo.soMatchCase); // setting an opt parameter: soMatchCase, soWholeWord, soBackward
SelBegin:= 0; //PageMemo.SelStart;
SelRange:= PageMemo.GetTextLen; //PageMemo.SelLength;
with PageMemo do
     begin
     Lines.BeginUpdate; // suspends visual update
     Found:= SearchReplace(PageMemo, '\lang1033', '', opt, SelBegin, SelRange);
     Found:= SearchReplace(PageMemo, '\lang1037', '', opt, SelBegin, SelRange);
     Found:= SearchReplace(PageMemo, '\lang1025', '', opt, SelBegin, SelRange);
     Lines.EndUpdate;
     end;
PageMemo.Lines.SaveToFile(DiskName); // save as rtf code
// *** cat is skinned ***

// *** reload the cat ***
if FileType='.rtf' then
   // read as rtf document
   begin
     // Utf8ToAnsi is required for windows stream
     fs:= TFileStream.Create(Utf8ToAnsi(DiskName), fmOpenRead or fmShareDenyNone);
     try
       PageMemo.LoadRichText(fs);
       PageMemo.Hint:= DiskName; // OpenDialog1.Filename;
     finally
       fs.Free;  // fs is created before the try, so it is always valid here
     end;
   end;
// *** tofu free cat ***

Code: Pascal
function SearchReplace(MemoString: TRichMemo;
                       SearchText, ReplaceText: widestring;
                       opt: TSearchOptions;
                       StartRef, StartRng: longint
                       ): Boolean;
var StartPos, StartRun: longint; //opt: TSearchOptions;
begin
  Result:= false;  // becomes true once at least one replacement has been made
  with MemoString do    // ** configure for selection range **
       begin
         SelStart:= StartRef; // := 0;
         if (StartRng=0) then StartRng:= GetTextLen;
         while MemoString.Search(SearchText, SelStart, StartRef+StartRng-SelStart, opt, StartPos, StartRun) do
               begin                                   //GetTextLen - SelStart
                 SelStart:= StartPos;
                 SelLength:= StartRun;
                 SelText:= ReplaceText;
                 SelStart:= StartPos + Length(ReplaceText);  // resume right after the replacement, so an adjacent match is not skipped
                 Result:= true;
               end;
       end;
end;

It loads the file as ASCII text. Then it does a search and delete of \lang1033, \lang1037, and \lang1025. It did not take long. I don't know why the Unicode vowels had taken so long to do.

The problem with what I have done is that this is not a flexible filter. I have to rigorously name each language code.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 12, 2017, 10:51:15 pm
I have worked out a method for clearing the \lang#### codes. It seems to be working well... there are a few bugs to work out, but I am encouraged.

I added another TRichMemo component to the main form, and set its Z level to the background. I load the file into it as plain ASCII text, which exposes the RTF codes. I search for \lang, and delete it with a 4 character increase in the SelLength parameter. That strips away all \lang codes in one iteration. To do so I added an EXT parameter (integer) to delete 4 characters beyond the search string.

Code: Pascal
procedure SearchReplace(MemoString: TRichMemo;
                       SearchText, ReplaceText: widestring;
                       ext: longint;
                       opt: TSearchOptions;
                       StartRef, StartRng: longint
                       );
var StartPos, StartRun: longint; //opt: TSearchOptions;
begin
  with MemoString do    // ** configure for selection range **
       begin
         SelStart:= StartRef; // := 0;
         if (StartRng=0) then StartRng:= GetTextLen;
         while MemoString.Search(SearchText, SelStart, StartRef+StartRng-SelStart, opt, StartPos, StartRun) do
               begin                                   //GetTextLen - SelStart
                 SelStart:= StartPos;
                 SelLength:= StartRun+ext;  // ext extends replacement range as trailing wildcard
                 SelText:= ReplaceText;
                 SelStart:= StartPos + Length(ReplaceText);  // resume right after the replacement, so an adjacent match is not skipped
               end;
       end;
end;

procedure TCmdForm.CleanRTFfile(CleanFile: string);
var opt: TSearchOptions;
    SelBegin, SelRange, ext: longint;
begin
  CleanMemo.clear;
  CleanMemo.Lines.LoadFromFile(CleanFile); // read as ascii rtf code
  opt:= [];  //include(opt, RichMemo.soMatchCase); // soMatchCase, soWholeWord, soBackward
  SelBegin:= 0;
  SelRange:= CleanMemo.GetTextLen; //CleanMemo.SelLength;
  ext:= 4;  // ext extends replacement range as wildcard
  SearchReplace(CleanMemo, '\lang', '', ext, opt, SelBegin, SelRange);
  CleanMemo.Lines.SaveToFile(CleanFile); // save as ascii rtf code
  CleanMemo.clear;
end;

It works for English, Hebrew, Syriac, and Greek... but Phoenician is a NoGo at this point. I have forgotten what I had to do to make Phoenician print to the screen.

As is, there is no more Tofu, and I can assign all fonts for their particular language.

The problem, so far, is to catch where a file may be active and can't be reloaded. I think I will get there.

Rick



Title: Re: Tofu Returned
Post by: rick2691 on April 13, 2017, 06:43:54 pm
I have gotten the last of the kinks out of the code. I tested the cleaned RTF file in WordPad, PolyEdit, and OpenOffice. They all read the file correctly, and no Tofu in any of them.

It looks like a fix for unwanted Font Binding. Font Binding is now operating in a way that is beneficial, but as several times before, time will tell.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 13, 2017, 08:01:17 pm
I found the thread from when I first had the Phoenician problems. Using it, I have reinstated Phoenician. It works great. There is no issue with its high indexing so long as the characters are entered as hexadecimal.

Rick



Title: Re: Tofu Returned
Post by: rick2691 on April 16, 2017, 08:46:16 pm
There is a problem with Phoenician script. Since it becomes two wide characters, to deal with its higher Unicode index, it causes SelStart to have wrong placement. As applied with the following...

Code: Pascal
procedure TCmdForm.SetKeyboard;
var StartPos,StartLength,QryPos: longint;
    QryCode : WideString;
    WC,W1   : WideChar;
    QryChr  : WideChar;
    QryStr  : WideString;
    Str0    : WideString;
    Str1    : WideString; //string;
begin
//activate keyboard language
StartLength:= PageMemo.SelLength;
StartPos:= PageMemo.SelStart;
if StartPos=0 then QryPos:= 0
              else QryPos:= StartPos-1;

QryCode:= ''; W1:= #0; Str1:= '';
PageMemo.Lines.BeginUpdate;   // suspend screen updates  // replaces EM_HIDESELECTION
try //SendMessage(PageMemo.Handle, EM_HIDESELECTION, 1, 0); // hide selection ** unnecessary
    PageMemo.SelStart:= QryPos;
    PageMemo.SelLength:=1;
    QryStr:= PageMemo.SelText;
    Str0:= UTF8Decode(QryStr);
    if Str0<>'' then W1:= Str0[1]
                else W1:= #0;
    if QryStr<>'' then QryChr:= QryStr[1]
                  else QryChr:= #0;
    if (QryPos>0) then
       begin
       // step back over spaces and tabs, but never past the start of the text
       while (QryPos>0) and ((QryChr=' ') or (QryChr=#9)) do
             begin
             QryPos:= QryPos-1;
             PageMemo.SelStart:= QryPos;
             PageMemo.SelLength:= 1;
             QryStr:= PageMemo.SelText;
             if QryStr<>'' then QryChr:= QryStr[1]
                           else QryChr:= #0;  // nothing selected: stop stepping back
             PageMemo.GetTextAttributes(PageMemo.SelStart, SelFontFormat);
             end;
       end;
    QryCode:= UTF8Decode(QryStr);  // gets widestring value
    //SendMessage(PageMemo.Handle, EM_HIDESELECTION, 0, 0); // show selection ** unnecessary
    finally
    PageMemo.Lines.EndUpdate;   // restore screen updates  // replaces EM_HIDESELECTION
    end; // end try

PageMemo.SelStart:= StartPos;
PageMemo.SelLength:= StartLength;

if (QryCode<>'')and(PageMemo.SelLength=0) then
  begin
  WC:=QryCode[1];
  case wc of
       #1424..#1535: SetHebrew;  // #$0590..#$05FF:
       #1792..#1871: SetSyriac;  // #$0700..#$074F:
       #7936..#8191: SetGreek;   // #$1F00..#$1FFF, // Greek Extended block (polytonic greek)
       #0880..#1023: SetGreek;   // #$0370..#$03FF: // Greek and Coptic block
       #0033..#0126: SetEnglish; // #$0021..#$007E: // exclamation to tilde
       //#$-8960..#$-8939: SetPhoenician; // #$10900..#$1091F // #67840..#67871 // #-10238, #-8960..#-8939
       else begin
            SetEnglish; // not in list
            end;
       (*
       Since a Phoenician character occupies more than a single widechar,
       you'll need to make your case statement more complex.

       !!! That double character messes up cursor count. Phoenician is a NoGo. !!!

       #$0590..#$05FF: SetHebrew;  // lang:='hebrew';
       #$0700..#$074F: SetSyriac;  // lang:='syriac';
       #$1F00..#$1FFF, // this is the Greek Extended block
       #$0370..#$03FF: // this is greek and coptic
                       SetGreek; // this is either
       #$0020..#$007F: SetEnglish;
       else SetEnglish;
       *)
       end;
  end else if (PageMemo.SelLength=0) then SetEnglish;  // no text available or none selected

case W1 of
     // non-visual and special keys
     #0011: Str1:= 'Soft-Break';  // Generic Keys
     #0013: Str1:= 'Hard-Break';
     #0009: Str1:= 'Tab';
     #0032: Str1:= 'Space';
     #0160: Str1:= 'Lock-Space';
     #8209: Str1:= 'Lock-Hyphen';
     #8212: Str1:= 'Long-Dash';
     #8226: Str1:= 'Bullet';

     // Hebrew Vowels
     #1456: Str1:= 'Swa';
     #1457: Str1:= 'Xtf-Sgl';
     #1458: Str1:= 'Xtf-Ptx';
     #1459: Str1:= 'Xtf-Qmz';
     #1460: Str1:= 'Xrq';
     #1461: Str1:= 'Zre';
     #1462: Str1:= 'Sgl';
     #1463: Str1:= 'Ptx';
     #1464: Str1:= 'Qmz';
     #1465: Str1:= 'Xlm';
     #1466: Str1:= 'Xlm-Gdl';
     #1467: Str1:= 'Qbz';
     #1468: Str1:= 'Dgs';
     #1469: Str1:= 'Mtg';

     // Syriac Vowels Western
     #1840,#1841: Str1:= 'Ptx';
     #1843,#1844: Str1:= 'Zqp | Qmz';
     #1846,#1847: Str1:= 'Rbs | Zre';
     #1850,#1851: Str1:= 'Xbz | Xrq';
     #1853,#1854: Str1:= 'Eza | Qbz';

     // Syriac Vowels Eastern
     #1842: Str1:= 'Ptx';   // a
     #1845: Str1:= 'Zqp | Qmz';     // o
     #1848: Str1:= 'Zlm | Rbs | Zre';   // e
     #1849: Str1:= 'Zlm | Hbs | Xrq';   // i
     #1855: Str1:= 'Rwa | Eza | Xlm';   // v
     #1852: Str1:= 'Qbz';  // u

     // Syriac Accents
     #1856: Str1:= 'Feminine Marker';   // @
     #1857: Str1:= 'Hard Accent';   // #
     #1858: Str1:= 'Soft Accent';   // $
     #1863,#1864: Str1:= 'Silent Marker'; // ~ `

     // Greek Accents
     #8189: Str1:= 'Acute Accent';
     #8128: Str1:= 'Flex Accent';
     #8175: Str1:= 'Grave Accent';
     #8127: Str1:= 'Rough Accent';
     #8190: Str1:= 'Smooth Accent';
     #0900: Str1:= 'Tonos Accent';
     end;  //else Str1:= 'U+'+IntToStr(Word(WC)); // +IntToHex(Word(WC),4); end;

//Str1:= 'U+'+IntToStr(Word(WC));

if (Str1<>'') then Str1:= ': '+Str1;
CmdForm.caption:= CmdForm.caption+Str1; // needs single font with all languages
end;

...the WC and W1 values can be read from the wrong position, because each Phoenician character is stored as two wide characters but shown as one. The more Phoenician characters that you create, the greater the offset. Consequently, if you are polling the position, you get wrong results.

Is there anything to be done about that? It sounds complicated to me.
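
One way to reason about the offset is in terms of UTF-16 surrogate pairs: any code point above U+FFFF, including the Phoenician block U+10900..U+1091F, is stored as a high unit ($D800..$DBFF) followed by a low unit ($DC00..$DFFF). The following is a minimal sketch, assuming the text is available as a UTF-16 string. CodePointAt, IsPhoenician and PairsBefore are hypothetical helpers, not RichMemo or LCL functions, and whether SelStart counts wide characters or whole characters on a given build is something to verify rather than assume.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Returns the code point that starts at UTF-16 index Idx (1-based) and
// reports in Units how many WideChars it occupies (1 or 2).
function CodePointAt(const S: UnicodeString; Idx: Integer; out Units: Integer): Cardinal;
var
  HiUnit, LoUnit: Cardinal;
begin
  Units := 1;
  HiUnit := Ord(S[Idx]);
  Result := HiUnit;
  if (HiUnit >= $D800) and (HiUnit <= $DBFF) and (Idx < Length(S)) then
  begin
    LoUnit := Ord(S[Idx + 1]);
    if (LoUnit >= $DC00) and (LoUnit <= $DFFF) then
    begin
      Result := $10000 + ((HiUnit - $D800) shl 10) + (LoUnit - $DC00);
      Units := 2;
    end;
  end;
end;

function IsPhoenician(CodePoint: Cardinal): Boolean;
begin
  Result := (CodePoint >= $10900) and (CodePoint <= $1091F);
end;

// Counts the surrogate pairs that start within the first UnitCount UTF-16
// units of S. That count is the difference between a position measured in
// wide characters and the same position measured in whole characters.
function PairsBefore(const S: UnicodeString; UnitCount: Integer): Integer;
var
  i, Units: Integer;
begin
  Result := 0;
  i := 1;
  while (i <= UnitCount) and (i <= Length(S)) do
  begin
    CodePointAt(S, i, Units);
    if Units = 2 then Inc(Result);
    Inc(i, Units);
  end;
end;

var
  S: UnicodeString;
  Units: Integer;
  CP: Cardinal;
begin
  // 'a', one Phoenician letter (U+10900 = $D802 $DD00), 'b'
  SetLength(S, 4);
  S[1] := 'a';
  S[2] := WideChar($D802);
  S[3] := WideChar($DD00);
  S[4] := 'b';
  CP := CodePointAt(S, 2, Units);
  WriteLn(CP, ' ', Units, ' ', PairsBefore(S, 4));  // prints 67840 2 1
end.
```

With something like PairsBefore, a polled position could be corrected by the number of pairs in front of it, and the case statement could test IsPhoenician on the decoded code point instead of on a single WideChar.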

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 24, 2017, 07:44:55 pm
OK, erasing the \lang codes from the file does not eliminate the codes.

They are reconstructed immediately after loading a file.

But erasing them from the file does clean up the code's erratic behavior, and they are rebuilt in a more stable fashion. That is why Font Binding leaves the cleaned file alone.

I built a query function to retrieve the \lang parameters from any page position.

Code: Pascal
fillchar(cf, sizeof(cf), 0);
cf.cbSize := sizeof( cf );    // cf.dwMask := CFM_LCID;   // CFM_LCID is unknown
SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf));
Lng:= IntToStr(Loword(cf.lcid));  // language identifier

As I step the cursor through a string of mixed languages it shows the following...

1.   Greek is not recognized. The \lang code is 1033, which is English.
2.   Hebrew envelops spaces and tabs... everything is 1037, which is Hebrew.
3.   Syriac, which is 1025, does not envelop spaces and tabs... they are 1033, English.
4.   When changing to English, in a mixed text, the code sets to Hebrew, 1037, but types English.
5.   Item 4 (above) is only noticed if you type parentheses. They print backwards.
6.   With item 4 (above), typing a space, dash, or period can make the cursor jump around.

My conclusion is that the \lang codes are dysfunctional, and need to be revised.

One method that might be feasible (if the codes are internal to RichEdit and can't be fixed) is to activate a query from start to stop of the document.

Using a similar method to the code section that I posted above, it could detect when a language change has occurred, then use that as a selection string for a SendMessage(PageMemo.handle, EM_SETCHARFORMAT, SCF_SELECTION, lparam(@cf)). By doing so, it would assign the true \lang code for each run of a specific language.

Then the erasing of \lang codes in the file would be both unnecessary and counterproductive.

The question that I have is... Will the RichEdit driver permit me to manually alter the \lang codes? Or will it reassert its own method for inserting them... sort of like Font Binding, and it might actually be a part of the Font Binding method.
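
The start-to-stop scan described above can be sketched without touching RichEdit at all, by splitting a UTF-16 string into runs that share one language code. This is only an illustration under stated assumptions: LcidFor and DescribeRuns are hypothetical names, the code values are the ones reported in this post (Greek is mapped to 1033 because that is what the control reported), and spaces, tabs and breaks are treated as neutral so they stay with the run in front of them.

```pascal
program LangRunsDemo;
{$mode objfpc}{$H+}
uses SysUtils;

// Maps one UTF-16 unit to the language code used for it in this thread.
// Returns 0 for neutral characters (space, tab, line breaks, anything else).
function LcidFor(WC: WideChar): Integer;
begin
  case Ord(WC) of
    $0021..$007E: Result := 1033;  // English, exclamation to tilde
    $0370..$03FF: Result := 1033;  // Greek and Coptic, reported as 1033
    $0590..$05FF: Result := 1037;  // Hebrew
    $0700..$074F: Result := 1025;  // Syriac, carried by the Arabic code
    $1F00..$1FFF: Result := 1033;  // Greek Extended, reported as 1033
  else
    Result := 0;
  end;
end;

// Returns the runs as 'start+length:code' items separated by spaces,
// with start counted from 0 like SelStart.
function DescribeRuns(const S: UnicodeString): string;
var
  i, RunStart, RunLcid, L: Integer;
  Acc: string;

  procedure Emit(UpTo: Integer);  // closes the run that ends just before UpTo
  begin
    if Acc <> '' then Acc := Acc + ' ';
    Acc := Acc + IntToStr(RunStart - 1) + '+' + IntToStr(UpTo - RunStart)
               + ':' + IntToStr(RunLcid);
  end;

begin
  Acc := '';
  RunStart := 1;
  RunLcid := 0;
  for i := 1 to Length(S) do
  begin
    L := LcidFor(S[i]);
    if L = 0 then Continue;            // neutral: stays in the current run
    if RunLcid = 0 then
      RunLcid := L                     // first classified character
    else if L <> RunLcid then
    begin
      Emit(i);                         // language changed: close the old run
      RunStart := i;
      RunLcid := L;
    end;
  end;
  if Length(S) > 0 then Emit(Length(S) + 1);
  Result := Acc;
end;

var
  S: UnicodeString;
begin
  // 'abc ' followed by three Hebrew letters (U+05D0..U+05D2)
  S := 'abc ';
  SetLength(S, 7);
  S[5] := WideChar($05D0);
  S[6] := WideChar($05D1);
  S[7] := WideChar($05D2);
  WriteLn(DescribeRuns(S));  // prints 0+4:1033 4+3:1037
end.
```

Each run could then be selected with SelStart and SelLength and given its code through EM_SETCHARFORMAT. Whether RichEdit keeps the assigned value afterwards is exactly the open question here.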

Rick

Title: Re: Tofu Returned
Post by: rick2691 on April 24, 2017, 08:05:32 pm
The method that I just cited would only be necessary for cleaning up a garbaged file. A new file might be controlled by using the SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf)) at the same time as setting the language mode and font.

Perhaps the driver would not need to assert its own system for language selection. Likewise it should restrain any unwanted Font Binding.

Rick
Title: Re: Tofu Returned
Post by: rick2691 on April 30, 2017, 07:59:07 pm
I feel that I have come upon a valid method for implementing Unicode without Font Binding, and without erasing the \langN codes from a file. The code follows, along with an explanation.

Code: Pascal
//global variables...
LateCode: boolean;
KeySet: boolean;


//with OnCreate set the LateCode and KeySet variables to false;


// the following executes the LCID value (cf.lcid:= code) if text is selected
// it also sets a flag for late binding by OnChange method if there is no selection

procedure TCmdForm.SetLanguage(code: longint);   //PutRTFstr('\lang'+IntToStr(code));
var cf: CharFormat2;
    Run: longint;
    Pos: longint;
begin
  Pos:= PageMemo.SelStart;
  Run:= PageMemo.SelLength;
  LateCode:= false;  //LangCode:= code;

  fillchar(cf, sizeof(cf), 0);
  cf.cbSize:= sizeof(CharFormat2);

  if (Run=0) and (Pos>0) then
     begin
     PageMemo.SelStart:= Pos-1;
     PageMemo.SelLength:= 1;

     if (PageMemo.SelText<>' ')
        and (PageMemo.SelText<>#9)
        and (PageMemo.SelText<>#13) then
        begin
          PageMemo.SelText:= PageMemo.SelText+' ';
          Pos:= Pos+1;
        end;

     LateCode:= true;  // do at OnKeyUp because EM_???CHARFORMAT needs selection
     PageMemo.SelStart:= Pos;
     PageMemo.SelLength:= Run;
     end;

  if (Run=0) and (Pos=0) then
     begin
     LateCode:= true;  // do at OnKeyUp because EM_???CHARFORMAT needs selection
     end;

  if (not LateCode) then  // and (run>0)
     begin
     SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf));
     cf.lcid:= code;            // richedit bug... //cf.dwMask:= CFM_LCID;  * CFM_LCID is unknown *
     SendMessage(PageMemo.handle, EM_SETCHARFORMAT, SCF_SELECTION, lparam(@cf));
     end;
end;

procedure TCmdForm.PageMemoChanged(Sender: TObject);  // OnChange method
var Pos, Run: longint;
begin
  if LateCode and KeySet then  // late setting for \langN code because SelLength was 0
     begin
       Pos:= PageMemo.SelStart;
       Run:= PageMemo.SelLength;
       PageMemo.SelStart:= Pos-1;
       PageMemo.SelLength:= 1;    // select recent key

       // set language mode
       if EngOn then SetLanguage(1033);
       if GrkOn then SetLanguage(1033); // 1032 is native but unknown by MsftEdit
       if HebOn then SetLanguage(1037);
       if SyrOn then SetLanguage(1025); // 10241 is native but unknown by MsftEdit

       PageMemo.SelStart:= Pos;
       PageMemo.SelLength:= Run;
       LateCode:= false;
       KeySet:= false;
     end else begin
                LateCode:= false;
                KeySet:= false;
              end;
end;

// typical language activation

procedure TCmdForm.MnuEnglishClick(Sender: TObject);
begin
  if PageControl1.PageCount>0 then
     begin
     EngOn:= true;  // *set*
     HebOn:= false;
     SyrOn:= false;
     GrkOn:= false;
     CptOn:= false;
     PhnOn:= false;
     SmrOn:= false;
     CmdForm.caption:= AppName+' • ENGLISH';

     SetLanguage(1033);
     cboFont.Text:= DefEng;
     cboFontSelect(Self);
     cboFontSize.text:= DefEngSize.Text;
     cboFontSizeSelect(Self);
     end
     else showmessage('No active document.');
end;

// typical language implementation

procedure TCmdForm.MemoUTF8KeyPress(Sender: TObject; var UTF8Key: TUTF8Char);
var Pos, Run: longint;
begin
  if EngOn then
     begin
     Case UTF8Key of
          #33..#126: KeySet:= true;  // natural characters
          end;
     end;

  if HebOn then   //hebrew
     begin
     KeySet:= false;
     Case UTF8Key of
          // hebrew consonants
          #39 : begin UTF8Key:= UnicodeToUTF8(1488); KeySet:= true; end;  // alf (#39=' apostrophe mark)
          'b' : begin UTF8Key:= UnicodeToUTF8(1489); KeySet:= true; end;  // bth
          'g' : begin UTF8Key:= UnicodeToUTF8(1490); KeySet:= true; end;  // gml
          'd' : begin UTF8Key:= UnicodeToUTF8(1491); KeySet:= true; end;  // dlt

          // set the other valid keys... always setting KeySet to True
          end;
     end;

  //do the same for other languages
end;


(*
The reason that Font Binding has been an issue to date is on account of several problems...

1.  The WindowsXP MsftEdit.DLL driver preceded both Syriac and Phoenician Unicodes.

2.  The Syriac font compensates by being compatible with the Arabic LCID (1025) code.
    Its natural code is 10241 (having a fifth digit, and MsftEdit needs a four digit code).
    LCID 10241 is similar to 1024, which is a symbol LCID code. Font Binding sets it as 1025.
    Since 1025 is Arabic, Font Binding changes subsequent English to Hebrew... it is confused.

3.  In a mixed language paragraph you need to explicitly define each LCID. Otherwise Font
    Binding will guess at it on your behalf... and MsftEdit is an old method.

4.  Phoenician cannot be done because it is newer than Syriac. You can get it to write,
    but internally to MsftEdit it is occupying two characters at a time. Screen printing
    is compensating, but MsftEdit is not compensating in its internal database for character counts.
    Consequently it can give bad information for searches and your own encoded caret positions.

5.  With most writing, character entries are frequently non sequential. This makes MsftEdit guess wrong.

6.  When I erased all of the \langN codes from the RTF disk file, MsftEdit assigned the \langN codes
    sequentially upon reloading the file. Therefore it did not do any Font Binding. Erasing the RTF codes
    was a poor way to approach the problem, but it worked, and it clarified what the problems were.

7.  The EM_SETCHARFORMAT method sets the LCID so that MsftEdit does not have to guess at the LCID.
    Unfortunately, MsftEdit does not have a dwMask parameter for LCID. So you just have to push it through.
    EM_SETCHARFORMAT also doesn't do anything unless there has been a text selection. So hitting one key at
    a time must have assistance from the OnChange method, where it selects the key after it is printed. Using
    EM_GETCHARFORMAT loaded the selected key's LCID value, so that pushing EM_SETCHARFORMAT through was
    safe. Likewise this selection must happen with every key (including English). Otherwise MsftEdit will do
    Font Binding, and it does not have a comprehensive database to work with.

8.  Consequently, you cannot work with all the languages that are currently available (ie. Phoenician). Syriac
    itself has a strong community that worked with building the font so that it squeaked through. Nevertheless,
    if you follow Syriac with English or Noto Greek (because it is in the Noto English font), it will cause Binding.

9.  My decision is to keep the above procedures, and to not do \langN erasures. Erasures make a problem with
    a file if another Editor changes or saves the file, because they do not have code for erasing. Additionally,
    I am not going to do Phoenician or Syriac. I have resigned to do only English, Greek, and Hebrew. I have
    already converted the Syriac New Testament to Hebrew characters, so I can still make use of their Scriptures.
    That conversion is possible because Syriac is Semitic (as is Hebrew), and both characters and vowels are
    interchangeable (providing that you take care with vowel placement and implementation).
*)
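
On the missing mask in item 7: the Windows SDK header richedit.h does define CFM_LCID, with the value $02000000, as the dwMask bit that marks the lcid member as valid. If the Pascal unit in use does not declare it, it can be declared locally. The following is a minimal, Windows-only sketch under several assumptions: that the FPC RichEdit unit supplies CHARFORMAT2 and the message constants (the posted code already gets CharFormat2 from some unit in its uses clause), that ApplyLcid and CFM_LCID_MASK are hypothetical names chosen here, and that the installed MsftEdit version honors the mask, which is untested.

```pascal
program SetLcidSketch;
{$mode objfpc}{$H+}
uses
  Windows, RichEdit;  // assumption: RichEdit declares CHARFORMAT2, EM_SETCHARFORMAT, SCF_SELECTION

const
  // Value of CFM_LCID in the Windows SDK's richedit.h, declared under a
  // local name in case the unit in use already has (or lacks) CFM_LCID.
  CFM_LCID_MASK = $02000000;

// Applies only the language identifier to the current selection of the
// rich edit window RichWnd; every other character attribute is untouched
// because no other bit is set in dwMask.
procedure ApplyLcid(RichWnd: HWND; LangId: DWORD);
var
  cf: CHARFORMAT2;
begin
  FillChar(cf, SizeOf(cf), 0);
  cf.cbSize := SizeOf(cf);
  cf.dwMask := CFM_LCID_MASK;
  cf.lcid := LangId;
  SendMessage(RichWnd, EM_SETCHARFORMAT, SCF_SELECTION, LPARAM(@cf));
end;

begin
  // No window is created in this sketch. From the form code the call would
  // look like: ApplyLcid(PageMemo.Handle, 1037);
end.
```

With the mask set, the EM_GETCHARFORMAT round trip would no longer be needed just to make the set call safe, since only lcid is marked as valid.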

Rick