Thank you for the file.
The first string in the rtf, as an example, is \'b0\'a1\'bc\'b3\'b6\'f3\'b9\'ab\'b3\'d7, which corresponds to the Pascal string #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7. The following code snippet converts this entire Pascal string from CP949 and displays the result in the form's Caption; the text looks Korean to me:
uses
  LConvEncoding;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: RawByteString = #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7;
begin
  Caption := ConvertEncoding(s, 'cp949', '');
end;
But when I select only the 1st byte (#$b0), the output is empty. When I add the second byte, giving #$b0#$a1, I get the first Korean character. Adding more and more bytes of the input string, I see that the output only changes with every second byte added:
uses
  LConvEncoding;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: RawByteString = #$b0#$a1#$bc#$b3#$b6#$f3#$b9#$ab#$b3#$d7;
  part: RawByteString;
  i: Integer;
begin
  part := '';
  for i := 1 to Length(s) do
  begin
    part := part + s[i];
    Memo1.Lines.Add(ConvertEncoding(part, 'cp949', ''));
  end;
end;
So it seems that the characters of a Korean codepage ANSI text can only be converted in pairs of bytes. Is this correct?
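If that is the case, a converter that is fed byte by byte has to hold back a dangling first byte until its partner arrives. Here is a minimal sketch of that check, assuming CP949 follows the usual double-byte layout: bytes below $80 stand alone, and every byte from $80 upwards starts a two-byte character (a simplification; the actual lead byte range is narrower). The function name CompleteCp949Length is made up for this illustration.

```pascal
program Cp949Split;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the length of the longest prefix of s
// that consists of complete characters only.
// Assumption: bytes < $80 are single-byte characters, bytes >= $80 are
// lead bytes that need exactly one trail byte.
function CompleteCp949Length(const s: RawByteString): Integer;
var
  i: Integer;
begin
  i := 1;
  while i <= Length(s) do
  begin
    if Ord(s[i]) < $80 then
      Inc(i)            // single-byte (ASCII) character
    else if i < Length(s) then
      Inc(i, 2)         // lead byte together with its trail byte
    else
      Break;            // dangling lead byte: the character is incomplete
  end;
  Result := i - 1;
end;

var
  s: RawByteString;
begin
  s := #$b0#$a1#$bc;                // one complete pair plus one lead byte
  WriteLn(CompleteCp949Length(s));  // prints 2
end.
```

Only the first CompleteCp949Length(s) bytes would be safe to pass to ConvertEncoding; the rest has to wait for the next byte.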
Your patch does not convert the incoming characters on the fly as my code did; it converts the entire collected string at the end. But the RTF can also contain "real" Unicode characters, indicated by the '\u' keyword, and that complicates things: those parts of the string are already UTF-8 while the rest is still in cp949. The overall effect is that your patch destroys the encoding of the Unicode characters that are already UTF-8. Convert the test file from my previous post with your code and you'll see that the Cyrillic characters are gone.
[EDIT]
Found a way to collect all ANSI characters in an internal FAnsiText, which is converted to UTF-8 at the end of a paragraph or table cell, or when a Unicode character comes in. See the attached modified urtf2html unit.
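The idea can be sketched roughly as follows. This is not the code from the attached unit, only a simplified illustration: apart from FAnsiText, all names (TTextCollector, AddAnsiChar, AddUnicodeChar, FlushAnsi) are invented here, and the codepage is hard-coded to cp949. It needs the LazUtils package for LConvEncoding and LazUTF8.

```pascal
program AnsiBufferSketch;
{$mode objfpc}{$H+}

uses
  LConvEncoding, LazUTF8;  // both from the LazUtils package

type
  // Hypothetical collector, for illustration only.
  TTextCollector = class
  private
    FAnsiText: RawByteString;  // codepage bytes not yet converted
    FText: String;             // UTF-8 text collected so far
  public
    procedure AddAnsiChar(ch: AnsiChar);
    procedure AddUnicodeChar(codePoint: Cardinal);
    procedure FlushAnsi;
    property Text: String read FText;
  end;

procedure TTextCollector.AddAnsiChar(ch: AnsiChar);
begin
  // Only store the raw byte; a single byte may be half a character.
  SetLength(FAnsiText, Length(FAnsiText) + 1);
  FAnsiText[Length(FAnsiText)] := ch;
end;

procedure TTextCollector.FlushAnsi;
begin
  // Convert the pending bytes in one go, then clear the buffer.
  if FAnsiText <> '' then
  begin
    FText := FText + ConvertEncoding(FAnsiText, 'cp949', 'utf8');
    FAnsiText := '';
  end;
end;

procedure TTextCollector.AddUnicodeChar(codePoint: Cardinal);
begin
  // A \u character arrives: convert what is pending first, so that
  // the UTF-8 bytes never pass through the cp949 conversion.
  FlushAnsi;
  FText := FText + UnicodeToUTF8(codePoint);
end;

var
  c: TTextCollector;
begin
  c := TTextCollector.Create;
  try
    c.AddAnsiChar(#$b0);
    c.AddAnsiChar(#$a1);
    c.AddUnicodeChar($0416);  // a Cyrillic letter, as if it came from \u
    c.AddAnsiChar(#$bc);
    c.AddAnsiChar(#$b3);
    c.FlushAnsi;              // end of paragraph or table cell
    WriteLn(c.Text);
  finally
    c.Free;
  end;
end.
```

Because the buffer is flushed before every Unicode character, the codepage conversion only ever sees bytes that really are cp949, and the text that is already UTF-8 is left alone.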