Forum > Beginners

How to read a UTF-8 text file


jormik:
I can't correctly read characters from a text file encoded in UTF-8. I have made many attempts with the functions of FPC (for example with LazUTF8), but without result.
Must I write my own function to handle the variable number of bytes of the UTF-8 encoding?
Thanks.

skalogryz:
I think you're reading the file just fine. The problem is with outputting the results.

What are you trying to achieve?

jormik:
I am trying to convert my old Firebird+Delphi6 project into a new UTF-8 Firebird+Lazarus project. Everything has gone fine, except for reading the text files.

I must scan text files (previously ANSI, now UTF-8) character by character, and then build the appropriate strings to populate the database. The problem is the variable number of bytes per character in UTF-8, which requires a dedicated procedure. LazUTF8 does this job, but not for files; it only works on strings, for example to obtain Unicode code points (which I use in other parts of the project).
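The variable width mentioned above is actually easy to decode by hand: the lead byte of each UTF-8 sequence tells you how many bytes the code point occupies. A minimal sketch of that rule (the function name is hypothetical, not part of LazUTF8):

```pascal
// Sketch: determine a UTF-8 sequence's length from its lead byte alone.
// Returns 0 for a continuation byte (10xxxxxx), which is never a lead byte.
function Utf8LeadByteSize(b: byte): integer;
begin
  if b < $80 then Result := 1         // 0xxxxxxx: plain ASCII, 1 byte
  else if b < $C0 then Result := 0    // 10xxxxxx: continuation, not a lead
  else if b < $E0 then Result := 2    // 110xxxxx: 2-byte sequence
  else if b < $F0 then Result := 3    // 1110xxxx: 3-byte sequence
  else Result := 4;                   // 11110xxx: 4-byte sequence
end;
```

LazUTF8 provides equivalent helpers (e.g. UTF8CodepointSize), so you normally don't need to write this yourself.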

skalogryz:
hmm... could you please provide an example?

jormik:
This is the skeleton of a procedure that inserts words into a table of the database.

============================================
procedure example;
var
  t: integer;
  doc: TextFile;
  character: char;
  word: string;
begin
  if not OpenDialog.Execute then Exit;
  AssignFile(doc, OpenDialog.FileName);
  reset(doc);
  word := '';
  for t := 1 to 100 do
    begin
      read(doc, character);
      if character = ' ' then
        begin
          // write word in the database
          word := ''
        end
      else
        word := word + character;
    end;
  CloseFile(doc);
end;
============================================

It works fine with ANSI text files, where a character always corresponds to one byte. How can I obtain the same result with UTF-8 text files, where a character does NOT always correspond to one byte?
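One way to adapt the skeleton above: read the whole file into a string, then walk it one UTF-8 code point at a time with LazUTF8's UTF8CodepointSize, taking `character` as a (possibly multi-byte) string rather than a char. A sketch under those assumptions, keeping the same word-splitting logic:

```pascal
// Sketch: the same word-splitting loop, but UTF-8 aware.
// Assumes a Lazarus form with an OpenDialog, and LazUTF8 in the uses clause.
procedure ExampleUTF8;
var
  sl: TStringList;
  text, character, word: string;
  i, len: integer;
begin
  if not OpenDialog.Execute then Exit;
  sl := TStringList.Create;
  try
    sl.LoadFromFile(OpenDialog.FileName);  // reads the raw UTF-8 bytes
    text := sl.Text;
  finally
    sl.Free;
  end;
  word := '';
  i := 1;
  while i <= Length(text) do
  begin
    len := UTF8CodepointSize(@text[i]);  // 1..4 bytes for this code point
    character := Copy(text, i, len);     // one full code point as a string
    Inc(i, len);
    if character = ' ' then
    begin
      // write word in the database
      word := '';
    end
    else
      word := word + character;          // safe: whole code point appended
  end;
end;
```

Since FPC strings are just byte sequences, concatenating whole code-point substrings this way never splits a multi-byte character, which is exactly the failure mode of the char-by-char version.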
