Topic: Tofu Returned

rick2691

  • Sr. Member
  • ****
  • Posts: 375
Tofu Returned
« on: April 03, 2017, 07:45:43 pm »
Skalogryz,

After doing reads and edits on my manual for a month or two, Tofu started to show up. At first it was only two situations that involved two separate characters that I had entered. I was able to retype those lines and get RichEdit to not invoke Font Binding, but then they would come back after several reads. Then it launched a full-out attack on my document. It swapped purely English text (Noto Sans font) to Noto Sans Hebrew, which does not have any English characters in it, and it did so for three or four pages. Now it keeps growing after several more reads.

Disgusting.

I looked into KControls, and its manifest says that it does not do Unicode yet. It also doesn't do Search or Replace, even on ASCII RTF text. Moreover, its author only worked on the project for one year, and hasn't done anything more since August of 2015.

So I don't really have another option. It is RichMemo or nothing, and it is not going to do Unicode either. I have seen packages for sale that claim to not use RichEdit, and can handle Unicode and Graphics, but they cost about $700. If another person works on it they charge $1000 ... and get up to $7,000 with more people. Yet they haven't stated that they don't do their own version of Font Binding.

Curiously, I loaded the same file into OpenWriter, and it only did Tofu on one character. What do you think they are doing?

Rick

Windows 10, LAZ 1.6.4, FPC 3.0.2, SVN 54278, i386-win32-win32/win64, forms use windows unit

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2280
    • havefunsoft.com
Re: Tofu Returned
« Reply #1 on: April 03, 2017, 08:12:18 pm »
Quote from rick2691: "Tofu had started to show up. At first it was only two situations that involved two separate characters that I had entered. I was able to retype those lines and get RichEdit to not invoke Font Binding, but then they would come back after several reads."

It sounds like an issue with saving/loading information.

Either the characters were saved wrong, or the font information was saved wrong.
Patron Cocoa Widgetset development https://www.patreon.com/skalogryz

rick2691

Re: Tofu Returned
« Reply #2 on: April 03, 2017, 10:01:58 pm »
I thought the same thing. Earlier (when it was only two problems), I had looked at the RTF file and saw that the offending lines had tabs after the font assignment. All of the others had tabs before the assignment. So I edited the file to set the tabs earlier. It removed the tofu, but only temporarily. After a few reloads it started up again... and it got worse.

But what about the OpenWriter load? The same file that I posted showed only two violations there. They must be reading it by a better method.

Rick
« Last Edit: April 03, 2017, 10:04:58 pm by rick2691 »

rick2691

Re: Tofu Returned
« Reply #3 on: April 04, 2017, 02:49:46 pm »
Everything that I load from a file goes through the streaming method:

Code: Pascal
if FileType='.rtf' then // PageMemo.Lines.LoadFromFile(DiskName) would read the rtf code as plain text
   begin
     // read as native rtf document
     // Utf8ToAnsi is required for windows stream
     // create the stream before "try", so that fs.Free never runs on an unassigned variable
     fs:= TFileStream.Create(Utf8ToAnsi(DiskName), fmOpenRead or fmShareDenyNone);
     try
       PageMemo.LoadRichText(fs);
       PageMemo.Hint:= DiskName; // OpenDialog1.Filename;
     finally
       fs.Free;
     end;
   end;

All saves to file use a similar streaming method. I have not seen any changes caused by saving, and "Font Binding" operates when a file is loaded or text is pasted. It doesn't interfere with saving a file.

Could it be that streaming is too fast for Font Binding to process?

Rick
« Last Edit: April 04, 2017, 02:53:10 pm by rick2691 »

rick2691

Re: Tofu Returned
« Reply #4 on: April 04, 2017, 05:14:05 pm »
I looked closely at the RTF code for the file. It is riddled with false language assignments. This file is very old. I started it as soon as my program was operational. As the software changed, due to necessary fixes, it has likely been corrupted by subsequent Font Binding due to my own errors with operation.

I have copy/pasted the entire document into Notepad++. It looks to have retained all of the Unicode and none of the RTF assignments. Pasting it back into RichMemo, it also doesn't have any Tofu. I am assuming it has been cleaned.

I will reformat the text and see if it stays clean.

Rick

rick2691

Re: Tofu Returned
« Reply #5 on: April 04, 2017, 08:17:06 pm »
OK, I reformatted everything, and reloaded about 10 times... there is no Tofu.

Time will tell, but it looks like the file that I had been working on was compromised with misleading RTF codes. Font Binding did not like it. Having looked at what it was, I can't blame it. It was a mess.

Rick

rick2691

Re: Tofu Returned
« Reply #6 on: April 08, 2017, 08:18:56 pm »
I actually did not reformat *everything*. I assumed that the English would be natural, and not need formatting... so I only did Hebrew, Syriac, and Greek.

The file has been fine. No tofu or wrong directional flows (Binding would set generic characters like @, #, RETURN, etc. to RTL flow), which had both been problems before.

Then I wondered why I hadn't formatted the English. So I went back and did so... both Tofu and wrong directional flows ripped through most of the file.

Why could that have triggered Font Binding?

Here is the Menu code...
Code: Pascal
procedure TCmdForm.MnuEnglishClick(Sender: TObject);
begin
  if PageControl1.PageCount>0 then
     begin
     EngOn:= true;  // *set*
     HebOn:= false;
     SyrOn:= false;
     GrkOn:= false;
     CptOn:= false;
     PhnOn:= false;
     SmrOn:= false;
     CmdForm.caption:= AppName+' • ENGLISH'; // using alt-7
     //PageMemo.SetFocus; // focus active before setting // reseting focus deactivates mode setting
     cboFontSize.text:= DefEngSize.Text;
     cboFontSizeSelect(Self);
     cboFont.Text:= DefEng;
     cboFontSelect(Self);
     end
     else showmessage('No active document.');
end;

Here is the hot-key code...
Code: Pascal
if (Key = VK_E) and (ssAlt in Shift) then  // english mode
     begin
     Key:= 0;  // uses WORD instead of CHAR... ie. not #0
     MnuEnglishClick(self);
     end;

This is the font setting function...
Code: Pascal
procedure TCmdForm.cboFontSelect(Sender: TObject);
var ChgFont: string;
begin
  if PageControl1.PageCount>0 then
     begin
        PageMemo.SetFocus;
        ChgFont:= cboFont.Text;
        if PageMemo.SelLength=0
          then begin
               SelFontFormat.Name:= ChgFont;
               PageMemo.SetTextAttributes(PageMemo.SelStart, PageMemo.SelLength, SelFontFormat);
               end
          else begin
               PageMemo.SetRangeParams (PageMemo.SelStart,
                                        PageMemo.SelLength,
                                        [tmm_Name],
                                        ChgFont,
                                        0,
                                        0,
                                        [],
                                        []
                                        );
               end;
        cboFont.Hint:= cboFont.Text; //cboFont.Caption;
        PageMemo.SetFocus;
     end;
end;

This is the font size function...
Code: Pascal
procedure TCmdForm.cboFontSizeEditingDone(Sender: TObject);
begin
  cboFontSizeSelect(Self);
end;

procedure TCmdForm.cboFontSizeSelect(Sender: TObject);
var SizeVal: integer;
begin
  if PageControl1.PageCount>0 then
     begin
     PageMemo.SetFocus;
     SizeVal:= StrToInt(cboFontSize.Text);
     if PageMemo.SelLength=0
        then begin
             SelFontFormat.Size:= SizeVal;
             PageMemo.SetTextAttributes(PageMemo.SelStart, PageMemo.SelLength, SelFontFormat);
             end
        else begin
             PageMemo.SetRangeParams (PageMemo.SelStart,
                                      PageMemo.SelLength,
                                      [tmm_Size],
                                      '',
                                      SizeVal,
                                      0,
                                      [],
                                      []
                                      );
             end;
     PageMemo.SetFocus;
     end;
end;

Fortunately I had tried this on a copy of the file. I still have the good one. But it was ruthless at changing English fonts to Hebrew... and there is no English in the Hebrew.

Rick

rick2691

Re: Tofu Returned
« Reply #7 on: April 11, 2017, 06:41:59 pm »
OK. I have found a fix. I looked at the above routines and just don't see anything wrong. Then I looked at the raw code of the RTF file, and could see that the \lang#### codes were frequently set wrongly.

The \lang#### codes are supposed to be ignored, because they are a backward compatibility to older RTF drivers. So I did a search and replace to remove all of the \lang#### codes.

It worked. The file loaded without any interference by Font Binding.

So... is there a way to suppress the insertion of the \lang#### codes?

If not, could the load function have a filter installed to reject the codes?

If not, I could write a routine that would search for and delete the codes just prior to loading. The only problem with doing so is that, thus far, I have found the RichMemo search and replace to be very slow.

Changing the Book of Matthew from western to eastern vowels (760 kb file), which also required 10 iterations, took 2 hours to process. At present, removing the language codes would take 3 iterations, and there would have to be another with each language that I might add.

It would be much better to have an inline filter (at load time) that could look for \lang with a wild card for #### ...being any 4 digit number sequence. It would, however, be easier to filter by looking for the \lang sequence, then clip the next 4 characters. It doesn't matter what they are, and they are always 4 numbers. Unless that changes some day. You could just look for the next slash or a space as a terminator.
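A load-time filter of this kind can be sketched in plain Free Pascal, working on the raw RTF text before it reaches the control. This is only a sketch under assumptions: `StripLangCodes` and `EndsWithControlWord` are hypothetical names, not part of RichMemo, and the file is assumed to fit in one string. Rather than clipping a fixed 4 characters, it removes `\lang` only when a digit follows directly, and then removes however many digits there are. That also leaves the related RTF control words `\langfe`, `\langnp` and `\langfenp` alone, which a fixed-width cut would slice into.

```pascal
program StripLangDemo;
{$mode objfpc}{$H+}

const
  Letters = ['a'..'z', 'A'..'Z'];
  Digits  = ['0'..'9'];

// True when S ends in a control word (such as \b or \fs22) that still
// needs a delimiter before ordinary text can follow it.
function EndsWithControlWord(const S: string): Boolean;
var
  k, mark, slashes: Integer;
begin
  Result := False;
  k := Length(S);
  mark := k;
  while (k > 0) and (S[k] in Digits) do Dec(k);          // numeric parameter
  if (k < mark) and (k > 0) and (S[k] = '-') then Dec(k); // minus sign of the parameter
  mark := k;
  while (k > 0) and (S[k] in Letters) do Dec(k);          // control word name
  if (k = mark) or (k = 0) or (S[k] <> '\') then Exit;
  slashes := 0;
  while (k > 0) and (S[k] = '\') do
  begin
    Inc(slashes);
    Dec(k);
  end;
  Result := Odd(slashes);  // an even run is only escaped backslashes
end;

// Hypothetical filter: removes every \langN control word (N = digits).
function StripLangCodes(const Rtf: string): string;
const
  Key = '\lang';
var
  i, j, n: Integer;
begin
  Result := '';
  n := Length(Rtf);
  i := 1;
  while i <= n do
  begin
    if (Rtf[i] = '\') and (i < n) and (Rtf[i + 1] in ['\', '{', '}']) then
    begin
      // escaped backslash or brace: copy both characters untouched
      Result := Result + Copy(Rtf, i, 2);
      Inc(i, 2);
    end
    else if (Copy(Rtf, i, Length(Key)) = Key) and
            (i + Length(Key) <= n) and (Rtf[i + Length(Key)] in Digits) then
    begin
      j := i + Length(Key);
      while (j <= n) and (Rtf[j] in Digits) do Inc(j);
      if (j <= n) and (Rtf[j] = ' ') then Inc(j);  // this space only delimits \langN
      // keep a preceding control word separated from the text that follows
      if (j <= n) and (Rtf[j] in Letters + Digits + ['-', ' ']) and
         EndsWithControlWord(Result) then
        Result := Result + ' ';
      i := j;
    end
    else
    begin
      Result := Result + Rtf[i];
      Inc(i);
    end;
  end;
end;

begin
  // prints: {\rtf1\pard\f0\fs22 Hello \b World\b0\par}
  WriteLn(StripLangCodes('{\rtf1\pard\f0\fs22\lang1033 Hello \b\lang1037 World\b0\par}'));
end.
```

The extra delimiter handling is a design choice: when `\lang1037` is removed from `\b\lang1037 World`, a space has to remain so that `\b` is not read as `\bWorld`. The cleaned string could then be written to a stream and handed to `LoadRichText`, though that step is not shown here.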

The other option might be to get the language codes to be correct... which is probably an impossibility, and may not work anyway. They are pretty much randomly wrong. I assume it is because the driver is older than the newer languages that have been added to the Unicode.

My guess is that this could be a fix for Phoenician fonts as well. I think it is the booger in the machinery.

Rick
« Last Edit: April 11, 2017, 07:04:48 pm by rick2691 »

rick2691

Re: Tofu Returned
« Reply #8 on: April 11, 2017, 08:36:55 pm »
I just tried this, and it works.

Code: Pascal
         // *** skin the cat ***
         PageMemo.Lines.LoadFromFile(DiskName); // read as rtf code
         opt:= [];  // opt: TSearchOptions;
         include(opt, RichMemo.soMatchCase); // setting an opt parameter: soMatchCase, soWholeWord, soBackward
         SelBegin:= 0; //PageMemo.SelStart;
         SelRange:= PageMemo.GetTextLen; //PageMemo.SelLength;
         with PageMemo do
              begin
              Lines.BeginUpdate; // suspends visual update
              Found:= SearchReplace(PageMemo, '\lang1033', '', opt, SelBegin, SelRange);
              Found:= SearchReplace(PageMemo, '\lang1037', '', opt, SelBegin, SelRange);
              Found:= SearchReplace(PageMemo, '\lang1025', '', opt, SelBegin, SelRange);
              Lines.EndUpdate;
              end;
         PageMemo.Lines.SaveToFile(DiskName); // save as rtf code
         // *** cat is skinned ***

         // *** reload the cat ***
         if FileType='.rtf' then
            // read as rtf document
            begin
              // Utf8ToAnsi is required for windows stream
              // create the stream before "try", so that fs.Free never runs on an unassigned variable
              fs:= TFileStream.Create(Utf8ToAnsi(DiskName), fmOpenRead or fmShareDenyNone);
              try
                PageMemo.LoadRichText(fs);
                PageMemo.Hint:= DiskName; // OpenDialog1.Filename;
              finally
                fs.Free;
              end;
            end;
         // *** tofu free cat ***

Code: Pascal
function SearchReplace(MemoString: TRichMemo;
                       SearchText, ReplaceText: widestring;
                       opt: TSearchOptions;
                       StartRef, StartRng: longint
                       ): Boolean;
var StartPos, StartRun: longint; //opt: TSearchOptions;
begin
  Result:= false;       // becomes true once at least one match has been replaced
  with MemoString do    // ** configure for selection range **
       begin
         SelStart:= StartRef; // := 0;
         if (StartRng=0) then StartRng:= GetTextLen;
         while MemoString.Search(SearchText, SelStart, StartRef+StartRng-SelStart, opt, StartPos, StartRun) do
               begin                                   //GetTextLen - SelStart
                 SelStart:= StartPos;
                 SelLength:= StartRun;
                 SelText:= ReplaceText;
                 // resume right after the replacement, so a match that
                 // directly follows a deleted one is not skipped
                 SelStart:= StartPos + Length(ReplaceText);
                 Result:= true;
               end;
       end;
end;

It loads the file as ASCII text. Then it does a search and delete of \lang1033, \lang1037, and \lang1025. It did not take long. I don't know why the Unicode vowels had taken so long to do.

The problem with what I have done is that this is not a flexible filter. I have to rigorously name each language code.

Rick
« Last Edit: April 11, 2017, 08:41:21 pm by rick2691 »

rick2691

Re: Tofu Returned
« Reply #9 on: April 12, 2017, 10:51:15 pm »
I have worked out a method for clearing the \lang#### codes. It seems to be working well... there are a few bugs to work out, but I am encouraged.

I added another TRichMemo component to the main form, and set its Z level to the background. I load the file into it in an ASCII format that exposes the RTF codes. I search for \lang, and delete it with a 4 character increase in the SelLength parameter. It strips away all \lang codes in one iteration. To do so I added an EXT parameter (integer) to delete 4 characters beyond the search string.

Code: Pascal
procedure SearchReplace(MemoString: TRichMemo;
                       SearchText, ReplaceText: widestring;
                       ext: longint;
                       opt: TSearchOptions;
                       StartRef, StartRng: longint
                       );
var StartPos, StartRun: longint; //opt: TSearchOptions;
begin
  with MemoString do    // ** configure for selection range **
       begin
         SelStart:= StartRef; // := 0;
         if (StartRng=0) then StartRng:= GetTextLen;
         while MemoString.Search(SearchText, SelStart, StartRef+StartRng-SelStart, opt, StartPos, StartRun) do
               begin                                   //GetTextLen - SelStart
                 SelStart:= StartPos;
                 SelLength:= StartRun+ext;  // ext extends replacement range as trailing wildcard
                 SelText:= ReplaceText;
                 // resume right after the replacement, so a match that
                 // directly follows a deleted one is not skipped
                 SelStart:= StartPos + Length(ReplaceText);
               end;
       end;
end;

procedure TCmdForm.CleanRTFfile(CleanFile: string);
var opt: TSearchOptions;
    SelBegin, SelRange, ext: longint;
begin
  CleanMemo.clear;
  CleanMemo.Lines.LoadFromFile(CleanFile); // read as ascii rtf code
  opt:= [];  //include(opt, RichMemo.soMatchCase); // soMatchCase, soWholeWord, soBackward
  SelBegin:= 0;
  SelRange:= CleanMemo.GetTextLen; //CleanMemo.SelLength;
  ext:= 4;  // ext extends replacement range as wildcard
  SearchReplace(CleanMemo, '\lang', '', ext, opt, SelBegin, SelRange);
  CleanMemo.Lines.SaveToFile(CleanFile); // save as ascii rtf code
  CleanMemo.clear;
end;

It works for English, Hebrew, Syriac, and Greek... but Phoenician is a NoGo at this point. I have forgotten what I had to do to make Phoenician print to the screen.

As is, there is no more Tofu, and I can assign all fonts for their particular language.

The problem, so far, is to catch where a file may be active and can't be reloaded. I think I will get there.

Rick




rick2691

Re: Tofu Returned
« Reply #10 on: April 13, 2017, 06:43:54 pm »
I have gotten the last of the kinks out of the code. I tested the cleaned RTF file in WordPad, PolyEdit, and OpenOffice. They all read the file correctly, and no Tofu in any of them.

It looks like a fix for unwanted Font Binding. Font Binding is now operating in a way that is beneficial, but as several times before, time will tell.

Rick

rick2691

Re: Tofu Returned
« Reply #11 on: April 13, 2017, 08:01:17 pm »
I found the thread from when I first had Phoenician problems, and used it to reinstate Phoenician. It works great. There is no issue with its high indexing so long as the characters are entered as hexadecimal.

Rick




rick2691

Re: Tofu Returned
« Reply #12 on: April 16, 2017, 08:46:16 pm »
There is a problem with Phoenician script. Since each character becomes two wide characters (a surrogate pair) to deal with its higher Unicode index, it causes SelStart to have the wrong placement. This shows up in the following...

Code: Pascal
procedure TCmdForm.SetKeyboard;
var StartPos,StartLength,QryPos: longint;
    QryCode : WideString;
    WC,W1   : WideChar;
    QryChr  : WideChar;
    QryStr  : WideString;
    Str0    : WideString;
    Str1    : WideString; //string;
begin
//activate keyboard language
StartLength:= PageMemo.SelLength;
StartPos:= PageMemo.SelStart;
if StartPos=0 then QryPos:= 0
              else QryPos:= StartPos-1;

QryCode:= ''; W1:= #0; Str1:= '';
PageMemo.Lines.BeginUpdate;   // suspend screen updates  // replaces EM_HIDESELECTION
try //SendMessage(PageMemo.Handle, EM_HIDESELECTION, 1, 0); // hide selection ** unnecessary
    PageMemo.SelStart:= QryPos;
    PageMemo.SelLength:=1;
    QryStr:= PageMemo.SelText;
    Str0:= UTF8Decode(QryStr);
    if Str0<>'' then W1:= Str0[1]
                else W1:= #0;
    if QryStr<>'' then QryChr:= QryStr[1]
                  else QryChr:= #0;
    if (QryPos>0) then
       begin
       // step back over spaces and tabs, but never past the start of the text
       while (QryPos>0) and ((QryChr=' ') or (QryChr=#9)) do
             begin
             QryPos:= QryPos-1;
             PageMemo.SelStart:= QryPos;
             PageMemo.SelLength:= 1;
             QryStr:= PageMemo.SelText;
             if QryStr<>'' then QryChr:= QryStr[1]   // guard against an empty selection
                           else QryChr:= #0;
             PageMemo.GetTextAttributes(PageMemo.SelStart, SelFontFormat);
             end;
       end;
    QryCode:= UTF8Decode(QryStr);  // gets widestring value
    //SendMessage(PageMemo.Handle, EM_HIDESELECTION, 0, 0); // show selection ** unnecessary
    finally
    PageMemo.Lines.EndUpdate;   // restore screen updates  // replaces EM_HIDESELECTION
    end; // end try

PageMemo.SelStart:= StartPos;
PageMemo.SelLength:= StartLength;

if (QryCode<>'')and(PageMemo.SelLength=0) then
  begin
  WC:=QryCode[1];
  case wc of
       #1424..#1535: SetHebrew;  // #$0590..#$05FF:
       #1792..#1871: SetSyriac;  // #$0700..#$074F:
       #7936..#8191: SetGreek;   // #$1F00..#$1FFF, // modern greek
       #0880..#1023: SetGreek;   // #$0370..#$03FF: // ancient greek and coptic
       #0033..#0126: SetEnglish; // #$0021..#$007E: // exclamation to tilde
       //#$-8960..#$-8939: SetPhoenician; // #$10900..#$1091F // #67840..#67871 // #-10238, #-8960..#-8939
       else begin
            SetEnglish; // not in list
            end;
       (*
       Since a Phoenician character occupies more than a single widechar,
       you'll need to make your case statement more complex.

       !!! That double character messes up cursor count. Phoenician is a NoGo. !!!

       #$0590..#$05FF: SetHebrew;  // lang:='hebrew';
       #$0700..#$074F: SetSyriac;  // lang:='syriac';
       #$1F00..#$1FFF, // this is modern greek
       #$0370..#$03FF: // this is greek and coptic
                       SetGreek; // this is either
       #$0020..#$007F: SetEnglish;
       else SetEnglish;
       *)
       end;
  end else if (PageMemo.SelLength=0) then SetEnglish;  // no text available or none selected

case W1 of
     // non-visual and special keys
     #0011: Str1:= 'Soft-Break';  // Generic Keys
     #0013: Str1:= 'Hard-Break';
     #0009: Str1:= 'Tab';
     #0032: Str1:= 'Space';
     #0160: Str1:= 'Lock-Space';
     #8209: Str1:= 'Lock-Hyphen';
     #8212: Str1:= 'Long-Dash';
     #8226: Str1:= 'Bullet';

     // Hebrew Vowels
     #1456: Str1:= 'Swa';
     #1457: Str1:= 'Xtf-Sgl';
     #1458: Str1:= 'Xtf-Ptx';
     #1459: Str1:= 'Xtf-Qmz';
     #1460: Str1:= 'Xrq';
     #1461: Str1:= 'Zre';
     #1462: Str1:= 'Sgl';
     #1463: Str1:= 'Ptx';
     #1464: Str1:= 'Qmz';
     #1465: Str1:= 'Xlm';
     #1466: Str1:= 'Xlm-Gdl';
     #1467: Str1:= 'Qbz';
     #1468: Str1:= 'Dgs';
     #1469: Str1:= 'Mtg';

     // Syriac Vowels Western
     #1840,#1841: Str1:= 'Ptx';
     #1843,#1844: Str1:= 'Zqp | Qmz';
     #1846,#1847: Str1:= 'Rbs | Zre';
     #1850,#1851: Str1:= 'Xbz | Xrq';
     #1853,#1854: Str1:= 'Eza | Qbz';

     // Syriac Vowels Eastern
     #1842: Str1:= 'Ptx';   // a
     #1845: Str1:= 'Zqp | Qmz';     // o
     #1848: Str1:= 'Zlm | Rbs | Zre';   // e
     #1849: Str1:= 'Zlm | Hbs | Xrq';   // i
     #1855: Str1:= 'Rwa | Eza | Xlm';   // v
     #1852: Str1:= 'Qbz';  // u

     // Syriac Accents
     #1856: Str1:= 'Feminine Marker';   // @
     #1857: Str1:= 'Hard Accent';   // #
     #1858: Str1:= 'Soft Accent';   // $
     #1863,#1864: Str1:= 'Silent Marker'; // ~ `

     // Greek Accents
     #8189: Str1:= 'Acute Accent';
     #8128: Str1:= 'Flex Accent';
     #8175: Str1:= 'Grave Accent';
     #8127: Str1:= 'Rough Accent';
     #8190: Str1:= 'Smooth Accent';
     #0900: Str1:= 'Tonos Accent';
     end;  //else Str1:= 'U+'+IntToStr(Word(WC)); // +IntToHex(Word(WC),4); end;

//Str1:= 'U+'+IntToStr(Word(WC));

if (Str1<>'') then Str1:= ': '+Str1;
CmdForm.caption:= CmdForm.caption+Str1; // needs single font with all languages
end;

...the WC and W1 values can be offset, because each Phoenician character is stored as two wide characters but shows as one character. The more Phoenician characters you create, the greater the offset. Consequently, if you are polling the position, you get wrong results.

Is there anything to be done about that? It sounds complicated to me.
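One possible direction, offered as a sketch rather than a tested fix: read the text as UTF-16 and treat a high/low surrogate pair as one code point that occupies two positions. The helper names below (`CodePointAt`, `IsPhoenician`) are hypothetical and not part of RichMemo or the LCL, and the sketch assumes the position count is in UTF-16 units, which is what the offset described above suggests.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

function IsHighSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $D800) and (Ord(C) <= $DBFF);
end;

function IsLowSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $DC00) and (Ord(C) <= $DFFF);
end;

// Returns the code point that starts at index I of S (1-based) and sets
// Units to the number of wide characters it occupies (1 or 2).
function CodePointAt(const S: UnicodeString; I: Integer; out Units: Integer): LongInt;
begin
  Units := 1;
  Result := Ord(S[I]);
  if IsHighSurrogate(S[I]) and (I < Length(S)) and IsLowSurrogate(S[I + 1]) then
  begin
    // standard UTF-16 decoding of a surrogate pair
    Result := $10000 + ((Ord(S[I]) - $D800) shl 10) + (Ord(S[I + 1]) - $DC00);
    Units := 2;
  end;
end;

// Phoenician block, U+10900..U+1091F
function IsPhoenician(CP: LongInt): Boolean;
begin
  Result := (CP >= $10900) and (CP <= $1091F);
end;

var
  S: UnicodeString;
  I, Units, Chars: Integer;
  CP: LongInt;
begin
  // U+10900 stored as the pair $D802 $DD00, followed by the letter A
  SetLength(S, 3);
  S[1] := WideChar($D802);
  S[2] := WideChar($DD00);
  S[3] := WideChar($0041);

  I := 1;
  Chars := 0;
  while I <= Length(S) do
  begin
    CP := CodePointAt(S, I, Units);
    Inc(Chars);
    WriteLn('U+', HexStr(CP, 5), ' units=', Units, ' phoenician=', IsPhoenician(CP));
    Inc(I, Units);  // step over both halves of a pair
  end;
  WriteLn('wide chars: ', Length(S), ', characters: ', Chars);
end.
```

In a routine like SetKeyboard this would mean selecting two positions when the character before the cursor is a low surrogate, then testing the decoded code point instead of a single WideChar. Whether the control's SelStart lines up with that count is something to verify in the running program.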

Rick

rick2691

Re: Tofu Returned
« Reply #13 on: April 24, 2017, 07:44:55 pm »
OK, erasing the \lang codes from the file does not eliminate the codes.

They are reconstructed immediately after loading a file.

But erasing them from the file does clean up the code's erratic behavior, and they are rebuilt in a more stable fashion. That is why Font Binding leaves the cleaned file alone.

I built a query function to retrieve the \lang parameters from any page position.

Code: Pascal
fillchar(cf, sizeof(cf), 0);
cf.cbSize := sizeof( cf );    // cf.dwMask := CFM_LCID;   // CFM_LCID is unknown
SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf));
Lng:= IntToStr(Loword(cf.lcid));  // language identifier

As I step the cursor through a string of mixed languages it shows the following...

1.   Greek is not recognized. The \lang code is 1033, which is English.
2.   Hebrew envelops spaces and tabs... everything is 1037, which is Hebrew.
3.   Syriac, which is 1025, does not envelop spaces and tabs... they are 1033, English.
4.   When changing to English, in a mixed text, the code sets to Hebrew, 1037, but types English.
5.   Item 4 (above) is only noticed if you type parentheses. They print backwards.
6.   With item 4 (above), typing a space, dash or period can make the cursor jump around.

My conclusion is that the \lang codes are dysfunctional, and need to be revised.

One method that might be feasible (if the codes are internal to RichEdit and can't be fixed) is to activate a query from start to stop of the document.

Using a similar method to the code section that I posted above, it could detect when a language change has occurred, then use that as a selection string for a SendMessage(PageMemo.handle, EM_SETCHARFORMAT, SCF_SELECTION, lparam(@cf)). By doing so, it would assign the true \lang code for each run of a specific language.
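The detection half of that idea can be sketched without touching RichEdit at all: walk the text, classify each character by the Unicode ranges used in SetKeyboard, and collect runs of (start, length, language). Everything here is an assumption made for illustration: the names `LcidOf` and `FindLangRuns` are hypothetical, the LCID constants are the standard Windows values (1114 for Syriac, whereas the control reported 1025 above), and letting spaces and tabs join the preceding run is just one possible policy.

```pascal
program LangRunsDemo;
{$mode objfpc}{$H+}

const
  LCID_ENGLISH = 1033;  // en-US
  LCID_GREEK   = 1032;  // el-GR
  LCID_HEBREW  = 1037;  // he-IL
  LCID_SYRIAC  = 1114;  // syr-SY (assumption; the control reported 1025)

type
  TLangRun = record
    Start, Len, Lcid: Integer;  // Start is 0-based, like SelStart
  end;
  TLangRuns = array of TLangRun;

// Returns the LCID for a character, or 0 for neutral characters
// (space, tab, anything outside the listed ranges).
function LcidOf(C: WideChar): Integer;
begin
  case Ord(C) of
    $0021..$007E: Result := LCID_ENGLISH;
    $0370..$03FF: Result := LCID_GREEK;
    $0590..$05FF: Result := LCID_HEBREW;
    $0700..$074F: Result := LCID_SYRIAC;
    $1F00..$1FFF: Result := LCID_GREEK;
  else
    Result := 0;
  end;
end;

// Splits S into consecutive runs that share one language.
function FindLangRuns(const S: UnicodeString): TLangRuns;
var
  I, Cur, L: Integer;
begin
  Result := nil;
  Cur := -1;  // index of the run currently being extended
  for I := 1 to Length(S) do
  begin
    L := LcidOf(S[I]);
    if (Cur >= 0) and ((L = 0) or (L = Result[Cur].Lcid)) then
      Inc(Result[Cur].Len)            // neutral or same language: extend the run
    else
    begin
      if L = 0 then L := LCID_ENGLISH;  // leading neutrals default to English
      Inc(Cur);
      SetLength(Result, Cur + 1);
      Result[Cur].Start := I - 1;
      Result[Cur].Len := 1;
      Result[Cur].Lcid := L;
    end;
  end;
end;

var
  S: UnicodeString;
  Runs: TLangRuns;
  I: Integer;
begin
  // "ab ", two Hebrew letters, a space, one Greek letter
  S := 'ab ';
  S := S + WideChar($05D0) + WideChar($05D1) + WideChar($0020) + WideChar($03B1);
  Runs := FindLangRuns(S);
  for I := 0 to High(Runs) do
    WriteLn('start=', Runs[I].Start, ' len=', Runs[I].Len, ' lcid=', Runs[I].Lcid);
end.
```

Each run could then be selected and passed to EM_SETCHARFORMAT with the lcid field filled in, as described above. That step is left out here because whether RichEdit keeps a manually assigned language is exactly the open question.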

Then the erasing of \lang codes in the file would be both unnecessary and counterproductive.

The question that I have is... Will the RichEdit driver permit me to manually alter the \lang codes? Or will it reassert its own method for inserting them... sort of like Font Binding, and it might actually be a part of the Font Binding method.

Rick


rick2691

Re: Tofu Returned
« Reply #14 on: April 24, 2017, 08:05:32 pm »
The previous method that I just cited would only be necessary for cleaning up a garbaged file. A new file might be controlled by using the SendMessage(PageMemo.handle, EM_GETCHARFORMAT, SCF_SELECTION, lparam(@cf)) at the same time as setting the language mode and font.

Perhaps the driver would not need to assert its own system for language selection. Likewise it should restrain any unwanted Font Binding.

Rick