- WriteLn does not write correctly for widestring little endian, so I used Write only
I suspect that WriteLn() with no parameter is a major problem since without a hint associated with the file/stream it won't know how to encode the EOL.
I did think about it. I even wrote a function to check it. Before WriteLn writes the EOL, it checks whether the UTF-16 data is big or little endian, then it writes the matching EOL. The pseudocode is something like this:
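procedure WriteLnUTF16(var F: TextFile; Buffer: WideString);
var
  Index, Count: Integer;
  isLittleEndian: Boolean;
  CRLF: string;
begin
  // Buffer is assumed to hold the raw UTF-16 bytes, one byte per element.
  // Count the nulls at even 1-based positions, i.e. the high byte of each
  // little-endian code unit below U+0100.
  Count := 0;
  for Index := 1 to Length(Buffer) do
    if not(Odd(Index)) and (Buffer[Index] = #0) then
      Inc(Count);
  // Little endian if more than half of those positions are null.
  isLittleEndian := Count > (Length(Buffer) div 4);
  case isLittleEndian of
    True:  CRLF := #13#0#10#0;
    False: CRLF := #0#13#0#10;
  end;
  // .... do the writing
end;
Although the solution isn't very accurate, it should work for most cases, because: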
It is also reliable to detect endianness by looking for null bytes, on the assumption that characters less than U+0100 are very common. If more even bytes (starting at 0) are null, then it is big-endian.
Source:
https://en.wikipedia.org/wiki/UTF-16#Byte_order_encoding_schemes
Unfortunately, the code didn't work (on the OP's Sample.txt) because of issues with ReadLn. Another possible solution is to introduce a new WriteLn with an Endian parameter.
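For what it's worth, here is a minimal sketch of what a WriteLn with an endian parameter could look like. It is only an illustration under some assumptions: it writes to a TStream instead of a Text file, it takes a UnicodeString (whose elements are already UTF-16 code units), and the names TUTF16Endian, WriteCodeUnit, WriteLnUTF16 and HexDump are hypothetical, not part of the RTL.

```pascal
program WriteLnEndianDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical type for this sketch, not part of the RTL.
  TUTF16Endian = (ueLittle, ueBig);

// Writes one 16-bit code unit in the requested byte order.
procedure WriteCodeUnit(Stream: TStream; CodeUnit: Word; Endian: TUTF16Endian);
var
  Bytes: array[0..1] of Byte;
begin
  if Endian = ueLittle then
  begin
    Bytes[0] := Lo(CodeUnit);
    Bytes[1] := Hi(CodeUnit);
  end
  else
  begin
    Bytes[0] := Hi(CodeUnit);
    Bytes[1] := Lo(CodeUnit);
  end;
  Stream.WriteBuffer(Bytes, SizeOf(Bytes));
end;

// Writes the line followed by CR LF, both encoded in the same byte order,
// so the caller states the endianness instead of it being guessed.
procedure WriteLnUTF16(Stream: TStream; const Line: UnicodeString;
  Endian: TUTF16Endian);
var
  I: Integer;
begin
  for I := 1 to Length(Line) do
    WriteCodeUnit(Stream, Ord(Line[I]), Endian);
  WriteCodeUnit(Stream, 13, Endian);
  WriteCodeUnit(Stream, 10, Endian);
end;

// Returns the stream content as space-separated hex bytes (demo helper).
function HexDump(Stream: TMemoryStream): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Integer(Stream.Size) - 1 do
    Result := Result + IntToHex(PByte(Stream.Memory)[I], 2) + ' ';
  Result := Trim(Result);
end;

var
  Mem: TMemoryStream;
begin
  Mem := TMemoryStream.Create;
  try
    WriteLnUTF16(Mem, 'Hi', ueLittle);
    WriteLn('LE: ', HexDump(Mem));  // 48 00 69 00 0D 00 0A 00
    Mem.Clear;
    WriteLnUTF16(Mem, 'Hi', ueBig);
    WriteLn('BE: ', HexDump(Mem));  // 00 48 00 69 00 0D 00 0A
  finally
    Mem.Free;
  end;
end.
```

With the byte order passed in explicitly, the EOL never has to be inferred from the data, which avoids the null-byte heuristic altogether. A real replacement would still need to decide whether and when to write a BOM; that is left out here.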
I said ReadLn has issues; here is a simple test. Sample.txt has 4 lines, which you can confirm by opening the file in a text editor that supports UTF-16 (Writer, Pluma, etc.). But with the code below, ReadLn parses the file as 9 lines:
procedure TForm1.Button1Click(Sender: TObject);
var
  Buffer: WideString;
  inText: Text;
  Count: Integer;
begin
  AssignFile(inText, 'Sample.txt');
  Reset(inText);
  Count := 0;
  // Count how many times ReadLn returns before the end of the file.
  while not Eof(inText) do begin
    ReadLn(inText, Buffer);
    Inc(Count);
  end;
  ShowMessage(Count.ToString);
  CloseFile(inText)
end;
So I am confident in saying there are bugs in ReadLn. I heard FreeBASIC supports both UTF-16BE and UTF-16LE. What about Delphi? Does anybody here use FreeBASIC or Delphi? Could anyone please test this case on them?
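One possible reading of the 9, which I have not checked against Sample.txt itself: a byte-oriented reader sees the UTF-16LE line ending 0D 00 0A 00 as a lone CR, a null byte, and a lone LF, so every CRLF ends two "lines", and the final null byte forms one more. The sketch below models that under stated assumptions. CountLinesBytewise is my own approximation of a byte-wise reader, not a copy of the RTL code, and all function names are hypothetical. CountLinesUTF16 assumes LF or CRLF endings and does no BOM handling.

```pascal
program CountUTF16Lines;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Counts lines by walking 16-bit code units, so a line break is only
// recognised when U+000A occupies a whole code unit.
function CountLinesUTF16(Stream: TStream; BigEndian: Boolean): Integer;
var
  Bytes: array[0..1] of Byte;
  CodeUnit, Last: Word;
begin
  Result := 0;
  Last := 10;  // an empty stream counts as zero lines
  while Stream.Read(Bytes, 2) = 2 do
  begin
    if BigEndian then
      CodeUnit := (Bytes[0] shl 8) or Bytes[1]
    else
      CodeUnit := (Bytes[1] shl 8) or Bytes[0];
    if CodeUnit = 10 then
      Inc(Result);
    Last := CodeUnit;
  end;
  if Last <> 10 then
    Inc(Result);  // final line without a trailing line break
end;

// Approximates a byte-oriented reader: #13, #10 or #13#10 ends a line.
// This is an assumption about ReadLn on a Text file, not the RTL code.
function CountLinesBytewise(Stream: TStream): Integer;
var
  B, Prev: Byte;
  Pending: Boolean;  // True if bytes were read since the last line break
begin
  Result := 0;
  Prev := 0;
  Pending := False;
  while Stream.Read(B, 1) = 1 do
  begin
    if (B = 13) or ((B = 10) and (Prev <> 13)) then
    begin
      Inc(Result);
      Pending := False;
    end
    else if B <> 10 then
      Pending := True;
    Prev := B;
  end;
  if Pending then
    Inc(Result);
end;

// Appends one single-character line plus CR LF, encoded as UTF-16LE.
procedure AppendLineLE(Stream: TStream; Ch: Char);
const
  CRLF_LE: array[0..3] of Byte = (13, 0, 10, 0);
var
  Pair: array[0..1] of Byte;
begin
  Pair[0] := Ord(Ch);
  Pair[1] := 0;
  Stream.WriteBuffer(Pair, 2);
  Stream.WriteBuffer(CRLF_LE, 4);
end;

var
  Mem: TMemoryStream;
  C: Char;
begin
  Mem := TMemoryStream.Create;
  try
    for C := 'a' to 'd' do
      AppendLineLE(Mem, C);  // four lines, in memory only
    Mem.Position := 0;
    WriteLn('16-bit code units: ', CountLinesUTF16(Mem, False));  // 4
    Mem.Position := 0;
    WriteLn('byte by byte:      ', CountLinesBytewise(Mem));      // 9
  finally
    Mem.Free;
  end;
end.
```

For this synthetic four-line buffer the byte-wise model gives 9 and the code-unit model gives 4, which matches the numbers reported above. Whether Sample.txt has the same layout (CRLF endings, trailing line break, BOM or not) is an assumption, so a test on the real file is still needed.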
Hi Handoko,
Finally, I adapted your code again to a GUI version. The only difference is that it runs your procedure when the form is loaded and pushes the output to a memo instead of the console. Attached is sample3.txt, which contains garbage characters instead of the accented ones; the memo output is corrupted too (only "��B" is displayed). My main concern is this: what can explain such a difference with the same source code?
I tried to open it using Writer, and it showed too many garbage characters. I tried online file viewers, and most of them refused to open it. As you said, it contains garbage characters, so what result did you expect?
Garbage in, garbage out - the basic theory any programmer should know.
https://en.wikipedia.org/wiki/Garbage_in%2C_garbage_out