Topic: Any RichMemo example projects?

wp

  • Hero Member
  • *****
  • Posts: 12907
Re: Any RichMemo example projects?
« Reply #15 on: July 12, 2023, 06:02:47 pm »
Thank you for the file.

The first string in the RTF, as an example, is \'b0\'a1\'bc\'b3\'b6\'f3\'b9\'ab\'b3\'d7, which corresponds to the Pascal string #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7. The following code snippet converts this entire Pascal string from CP949 and displays the text in the form's Caption; the result looks Korean to me:

Code: Pascal
uses
  LConvEncoding;

procedure TForm1.Button1Click(Sender: TObject);
var
  // Raw CP949 bytes taken from the RTF file
  s: RawByteString = #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7;
begin
  // Convert the whole byte string from CP949 and show it in the caption
  Caption := ConvertEncoding(s, 'cp949', '');
end;

But when I select only the first character (#$b0), the output is empty. When I add the second character to get #$b0#$a1, I get the first Korean character. Adding more and more characters of the input string, I see that the output changes only with every second character added:

Code: Pascal
uses
  LConvEncoding;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: RawByteString = #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7;
  part: RawByteString;
  i: Integer;
begin
  part := '';
  for i := 1 to Length(s) do
  begin
    // Grow the input by one byte and convert the prefix collected so far
    part := part + s[i];
    Memo1.Lines.Add(ConvertEncoding(part, 'cp949', ''));
  end;
end;

So it seems that characters in Korean code-page ANSI text can only be converted in pairs. Is this correct?

Your patch does not convert the incoming characters on the fly as my code did; it converts the entire collected string at the end. But the RTF can also contain "real" Unicode characters, indicated by the \u keyword, and then the situation gets complicated because that part of the string is already UTF-8 while the rest is still in CP949. The overall effect is that your patch destroys the encoding of the Unicode characters that are already UTF-8. Convert the test file in my previous post using your code and you'll see that the Cyrillic characters are gone.

[EDIT]
Found a way to collect all ANSI characters in an internal FAnsiText, which is converted to UTF-8 at the end of a paragraph or table cell, or when a Unicode character comes in. See the attached modified urtf2html unit.
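
A minimal sketch of that buffering idea, for readers who do not want to open the unit. This is not the code from the attached unit: the class TTextCollector and its methods AddAnsiChar, AddUnicodeChar and Flush are hypothetical names chosen for illustration; only FAnsiText and the 'cp949' encoding come from this thread. It assumes the LazUtils package (LConvEncoding, LazUTF8) is available.

```pascal
program AnsiBufferSketch;

{$mode objfpc}{$H+}

uses
  LConvEncoding, LazUTF8;  // both units belong to the LazUtils package

type
  // Hypothetical helper, not the class used in urtf2html.
  TTextCollector = class
  private
    FAnsiText: RawByteString;  // pending code-page bytes, not yet converted
    FUtf8Text: String;         // text that has already been converted
    FCodePage: String;         // source encoding of the pending bytes
  public
    constructor Create(const ACodePage: String);
    procedure AddAnsiChar(ch: AnsiChar);
    procedure AddUnicodeChar(ACodePoint: Cardinal);
    procedure Flush;
    property Utf8Text: String read FUtf8Text;
  end;

constructor TTextCollector.Create(const ACodePage: String);
begin
  inherited Create;
  FCodePage := ACodePage;
end;

procedure TTextCollector.AddAnsiChar(ch: AnsiChar);
begin
  // Only collect the byte; a lone lead byte cannot be converted yet.
  FAnsiText := FAnsiText + ch;
end;

procedure TTextCollector.Flush;
begin
  if FAnsiText = '' then
    Exit;
  // Convert the whole pending run at once so byte pairs stay together.
  FUtf8Text := FUtf8Text + ConvertEncoding(FAnsiText, FCodePage, EncodingUTF8);
  FAnsiText := '';
end;

procedure TTextCollector.AddUnicodeChar(ACodePoint: Cardinal);
begin
  // Convert pending bytes first so that the text order is preserved,
  // then append the code point directly as UTF-8.
  Flush;
  FUtf8Text := FUtf8Text + UnicodeToUTF8(ACodePoint);
end;

const
  KoreanBytes: RawByteString = #$b0#$a1#$bc#$b3;
var
  collector: TTextCollector;
  i: Integer;
begin
  collector := TTextCollector.Create('cp949');
  try
    for i := 1 to Length(KoreanBytes) do
      collector.AddAnsiChar(KoreanBytes[i]);
    collector.AddUnicodeChar($0416);  // Cyrillic capital Zhe, as from \u1046
    collector.Flush;                  // e.g. at the end of a paragraph
    WriteLn(collector.Utf8Text);
  finally
    collector.Free;
  end;
end.
```

The point of the design is that code-page bytes are never converted one at a time, and text that is already UTF-8 is never passed through the code-page conversion a second time.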

« Last Edit: July 12, 2023, 06:41:43 pm by wp »

egsuh

  • Hero Member
  • *****
  • Posts: 1622
Re: Any RichMemo example projects?
« Reply #16 on: July 12, 2023, 07:36:36 pm »
Quote
So it seems that characters in Korean code-page ANSI text can only be converted in pairs. Is this correct?

Yes, this is correct. In the Korean ANSI encoding (CP949), two bytes constitute one Korean character, while plain ASCII characters still take a single byte. I think many other East Asian code pages have a similar structure.
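
To illustrate why the output above only grows with every second byte, here is a small sketch that reports how many leading bytes of a string form complete characters. It rests on a simplifying assumption: any byte from $81 to $FE is treated as the lead byte of a two-byte character and every other byte as a single-byte character; trail bytes are not validated. CompletePrefixLength is a made-up name, not a library function.

```pascal
program Cp949PrefixSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: returns how many leading bytes of s form complete
// characters. Assumes bytes $81..$FE start a two-byte character and all
// other bytes stand alone; trail bytes are not checked.
function CompletePrefixLength(const s: RawByteString): Integer;
var
  i: Integer;
begin
  i := 1;
  while i <= Length(s) do
  begin
    if (Ord(s[i]) >= $81) and (Ord(s[i]) <= $FE) then
    begin
      if i = Length(s) then
        Break;      // lead byte without its trail byte: stop before it
      Inc(i, 2);    // skip lead byte and trail byte together
    end
    else
      Inc(i);       // single-byte character
  end;
  Result := i - 1;
end;

const
  Sample: RawByteString = #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7;
var
  n: Integer;
begin
  // Mirror the experiment above: grow the prefix one byte at a time.
  for n := 1 to Length(Sample) do
    WriteLn(n, ' byte(s) -> ', CompletePrefixLength(Copy(Sample, 1, n)),
      ' byte(s) form complete characters');
end.
```

For the sample string, every odd prefix length reports the previous even number, which matches the observation that a trailing single byte produces no new character.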

Quote
The overall effect is that your patch destroys the encoding of the Unicode characters that are already UTF-8. Convert the test file in my previous post using your code and you'll see that the Cyrillic characters are gone.

I do not understand the code deeply enough to write a proper patch. I have only shallow knowledge, and I just wanted a quick result for my own application rather than a patch for a generic solution. I never thought that genuine Unicode characters might coexist with ANSI-coded characters.

Quote
Found a way to collect all ANSI characters in an internal FAnsiText, which is converted to UTF-8 at the end of a paragraph or table cell, or when a Unicode character comes in. See the attached modified urtf2html unit.

Well, this is the approach I originally had in mind: the RTF text could be parsed into paragraph-sized portions, and each portion could be converted according to its encoding. But I do not understand the whole parsing structure.

I'll try with your new unit. Thank you for your efforts again.

egsuh

  • Hero Member
  • *****
  • Posts: 1622
Re: Any RichMemo example projects?
« Reply #17 on: July 12, 2023, 07:45:03 pm »
Quote
[EDIT]
Found a way to collect all ANSI characters in an internal FAnsiText, which is converted to UTF-8 at the end of a paragraph or table cell, or when a Unicode character comes in. See the attached modified urtf2html unit.

I tested this new unit, and it runs OK without any modifications. I'll do more tests, and post the results.

egsuh

  • Hero Member
  • *****
  • Posts: 1622
Re: Any RichMemo example projects?
« Reply #18 on: July 13, 2023, 07:44:02 am »
BTW, I think it would be good to have a function that returns the output as a string. The following is copied and modified from

procedure TRtf2HtmlConverter.ConvertToHtml(AStream: TStream; ATitle: String);


Code: Pascal
// Same steps as ConvertToHtml, but returns the HTML as a string
// instead of writing it to a stream.
function TRtf2HtmlConverter.HTMLText(ATitle: string): string;
begin
  FTitle := ATitle;
  FActiveDestination := rtfDefault;

  FOutput.Clear;

  WriteHtmlHeader;
  FParser.StartReading;
  WriteHtmlFooter;
  WriteDefaultFont;

  Result := FOutput.Text;
end;

wp

  • Hero Member
  • *****
  • Posts: 12907
Re: Any RichMemo example projects?
« Reply #19 on: July 13, 2023, 12:43:50 pm »
Thanks for the idea.

I have now put everything on my GitHub: https://github.com/wp-xyz/rtf2html

 
