
Author Topic: Reading UTF8 strings from a file  (Read 1639 times)

Nitorami

  • Sr. Member
  • ****
  • Posts: 417
Reading UTF8 strings from a file
« on: January 24, 2022, 03:03:27 pm »
Hi all
The attached file is a raw file generated by LTSpice, an electronic circuit simulator. The file contains a header using UTF8 characters, followed by the binary data after the word "Binary:".

I simply need to read a few keywords and some information from the header, but I do not know how to do this efficiently, because I am inexperienced with UTF strings. With old-style 8-bit ASCII strings, I would simply use readln(), but how do I do this with UTF8? I tried TFileStream.ReadAnsiString, but it does not work because the characters seem to be stored as raw UTF8 chars without length encoding. Can someone give me a bit of starting aid please?

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Reading UTF8 strings from a file
« Reply #1 on: January 24, 2022, 04:22:42 pm »
The file is UTF16, not UTF8.
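One way to check this without a hex editor is to look at the byte pattern: ASCII text stored as UTF-16LE has a zero byte after every character. The following is only a heuristic sketch; the helper name LooksLikeUtf16LE and the 90 % threshold are made up for illustration, and it assumes the header is mostly ASCII and little-endian.

```pascal
program DetectUtf16;
{$mode objfpc}{$H+}

// Hypothetical helper: ASCII text stored as UTF-16LE has a non-zero low
// byte followed by a zero high byte in (almost) every 2-byte pair.
function LooksLikeUtf16LE(const Sample: array of Byte): Boolean;
var
  i, pairs, matching: Integer;
begin
  pairs := Length(Sample) div 2;
  matching := 0;
  for i := 0 to pairs - 1 do
    if (Sample[2 * i] <> 0) and (Sample[2 * i + 1] = 0) then
      Inc(matching);
  // Assumed threshold: at least 90 % of the pairs must fit the pattern.
  Result := (pairs > 0) and (matching * 10 >= pairs * 9);
end;

begin
  // 'Title' as UTF-16LE bytes, then 'Title:' as plain 8-bit bytes
  WriteLn(LooksLikeUtf16LE([$54, 0, $69, 0, $74, 0, $6C, 0, $65, 0]));  // TRUE
  WriteLn(LooksLikeUtf16LE([$54, $69, $74, $6C, $65, $3A]));            // FALSE
end.
```

In practice the sample would be the first few hundred bytes of the file, read into a byte array with TFileStream.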

wp

  • Hero Member
  • *****
  • Posts: 9588
Re: Reading UTF8 strings from a file
« Reply #2 on: January 24, 2022, 05:15:16 pm »
Tested this to work:

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  stream: TFileStream;
  buffer: UnicodeString = '';
  search: UnicodeString = 'Binary:';
  p: Integer;
  s: String = '';
begin
  stream := TFileStream.Create('d:\Download\test_circuit.raw', fmOpenRead);
  try
    // Read the file into a single UTF-16 string. The length of the string is
    // half of the file size (2 bytes per UTF-16 "character").
    SetLength(buffer, stream.Size div SizeOf(WideChar));
    // Read exactly as many bytes as the string can hold; this also guards
    // against an empty file and against an odd file size.
    if Length(buffer) > 0 then
      stream.ReadBuffer(buffer[1], Length(buffer) * SizeOf(WideChar));
    // Find the position of the unicode word "Binary:"
    p := Pos(search, buffer);
    if p > 0 then
      // Get the string up to the found position and convert to UTF-8...
      s := Copy(buffer, 1, p-1)
    else
      s := '(not found)';
    // ... and assign to a memo
    Memo1.Lines.Text := s;
  finally
    stream.Free;
  end;
end;
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

Nitorami

  • Sr. Member
  • ****
  • Posts: 417
Re: Reading UTF8 strings from a file
« Reply #3 on: January 24, 2022, 05:54:39 pm »
Thanks WP, that works.

I am surprised that the old pos() function works on widestrings as well, while readln() doesn't. I had started to scan through the many functions in unit strutils myself, and found a few which are similar to pos(), e.g. "containsstr" and "searchbuf", but if pos() works, I will use that.

I understand I cannot read the file line by line using readln, correct? I would have to write my own readline(), e.g. in a TFileStream descendant. The function would read the file into a buffer word by word until it finds a (widestring) EOL, and then copy the buffer to a string or stringlist. I wonder why such a function does not seem to exist already.
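A rough sketch of such a readline(), written as a standalone function rather than a TFileStream descendant. The name ReadLineUtf16 is hypothetical, the sample header text is made up, and the code assumes UTF-16LE data on a little-endian machine with LF or CRLF line endings.

```pascal
program Utf16ReadLine;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: reads one line of UTF-16LE text from the stream,
// one 16-bit code unit at a time. Returns False when nothing is left.
function ReadLineUtf16(Stream: TStream; out Line: UnicodeString): Boolean;
var
  ch: WideChar;
begin
  Line := '';
  Result := False;
  while Stream.Read(ch, SizeOf(ch)) = SizeOf(ch) do
  begin
    Result := True;
    if ch = #10 then
      Break;                 // LF ends the line
    if ch <> #13 then
      Line := Line + ch;     // drop the CR of a CRLF pair
  end;
end;

var
  ms: TMemoryStream;
  sample, line: UnicodeString;
begin
  // Made-up header text standing in for the real file content
  sample := 'Title: demo'#10'Flags: real'#10'Binary:'#10;
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(sample[1], Length(sample) * SizeOf(WideChar));
    ms.Position := 0;
    while ReadLineUtf16(ms, line) do
    begin
      WriteLn(UTF8Encode(line));
      if line = 'Binary:' then
        Break;               // the binary data starts after this line
    end;
  finally
    ms.Free;
  end;
end.
```

Reading two bytes per call is slow on a plain TFileStream, so for large files it is probably better to read the header in one block, as in wp's code, and split it afterwards.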

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 917
Re: Reading UTF8 strings from a file
« Reply #4 on: February 18, 2022, 04:34:03 pm »
The Text property of a Delphi-compatible TStringList would do that.
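A minimal sketch of that idea, assuming the header has already been read into a UnicodeString (as in wp's code) and consists of "Key: value" lines. The helper name HeaderValue and the sample keys are made up for illustration.

```pascal
program HeaderLines;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: lets TStringList split the header into lines and
// looks up the value of a "Key: value" entry.
function HeaderValue(const Header: UnicodeString; const Key: String): String;
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.NameValueSeparator := ':';
    // With the default LineBreak, assigning Text splits at CR, LF or CRLF.
    sl.Text := UTF8Encode(Header);
    Result := Trim(sl.Values[Key]);   // empty string if the key is missing
  finally
    sl.Free;
  end;
end;

begin
  // Made-up header text; prints 1024
  WriteLn(HeaderValue('Title: demo'#10'No. Points: 1024'#10, 'No. Points'));
end.
```

Only the first colon in a line acts as the separator, so a value that itself contains a colon (a Windows path, for example) should stay intact.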

 
