
Author Topic: Clear example of loading and saving files with UTF8 encoding  (Read 9877 times)

EganSolo

Clear example of loading and saving files with UTF8 encoding
« on: November 28, 2015, 08:31:21 am »
Using the latest version of Laz #1.4.4 with FPC 2.6.4 on Win 10.
Could somebody please pretty please write a simple example of loading and saving a file with UTF8 encoding?

Here's what I do and somehow it's not working...
Code: Pascal
Implementation
uses charencstreams;

procedure LoadUnicodeText(const aFileName: String; const aList: TStringList);
var f: TCharEncStream;
begin
  f := TCharEncStream.Create;
  f.UniStreamType := ufUtf8;
  f.LoadFromFile(aFileName);
  aList.Clear;
  aList.Text := f.UTF8Text;
  FreeAndNil(f);
end;

procedure SaveUnicodeText(const aList: TStringList; const aFileName: String);
var f: TCharEncStream;
begin
  f := TCharEncStream.Create;
  f.UniStreamType := ufUtf8;
  f.UTF8Text := aList.Text;
  f.SaveToFile(aFileName);
  FreeAndNil(f);
end;

I've read quite a few posts on the support of UTF8 in FPC 3.0, but the current version of Laz is not bundled with FPC 3.0, so I'm not clear on what steps to take to make this work.

Thanks.



Deepaak

Re: Clear example of loading and saving files with UTF8 encoding
« Reply #1 on: November 28, 2015, 08:59:12 am »
Code: Pascal
uses LazUTF8Classes;

// SL is a TStringListUTF8 instance (SL := TStringListUTF8.Create).
SL.LoadFromFile(aFileName);
SL.SaveToFile(aFileName);

try this..
Holiday season is online now. :-)

Bart

Re: Clear example of loading and saving files with UTF8 encoding
« Reply #2 on: November 28, 2015, 02:18:50 pm »
From the example I think you want the file contents to be UTF8 encoded (not the filename)?

Note that inside Lazarus all strings inside the LCL are in UTF8 encoding.
So if you have e.g. a TMemo, then TMemo.Lines.SaveToFile(AFilename) will save the contents encoded as UTF8 in the file with name AFilename.

So, for these scenarios there is nothing to do at all.

If you load a file that uses another system encoding on Windows (CP1252 in Western European countries), where each character is one byte, you can do this:

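Code: Pascal
// Needs Classes (TStringList) and FileUtil (SysToUTF8) in the uses clause.
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile('test.txt');      // bytes are read as-is, still in the system code page
    SL.Text := SysToUTF8(SL.Text);    // convert from the system code page to UTF8
    SL.SaveToFile('test-utf8.txt');   // the text is now written out as UTF8
  finally
    SL.Free;
  end;
end;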

If you do not know the encoding of the input file, things get harder.
You can use the LConvEncoding unit to guess the encoding, and then use one of the conversion routines in that unit.
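
A minimal sketch of that guess-then-convert approach, assuming the GuessEncoding and ConvertEncoding routines and the EncodingUTF8 constant from LConvEncoding. The function name RawToUTF8 and the file names are made up for illustration. The file is read as raw bytes rather than through a TStringList, so that nothing touches the content before the guess is made. Keep in mind that GuessEncoding is a heuristic and can guess wrong, in particular between different single-byte code pages.

```pascal
program GuessAndConvert;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

// Hypothetical helper: guess the encoding of a raw byte string
// and return the same text converted to UTF8.
function RawToUTF8(const Raw: string): string;
var
  Enc: string;
begin
  Enc := GuessEncoding(Raw);                        // heuristic guess, returns an encoding name
  Result := ConvertEncoding(Raw, Enc, EncodingUTF8);
end;

// Hypothetical helper: read a whole file into a string without any conversion.
function ReadRawFile(const aFileName: string): string;
var
  fs: TFileStream;
begin
  Result := '';
  fs := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, fs.Size);
    if fs.Size > 0 then
      fs.ReadBuffer(Result[1], fs.Size);
  finally
    fs.Free;
  end;
end;

var
  SL: TStringList;
begin
  // 'test.txt' and 'test-utf8.txt' are placeholder names.
  if not FileExists('test.txt') then
    Exit;
  SL := TStringList.Create;
  try
    SL.Text := RawToUTF8(ReadRawFile('test.txt'));
    SL.SaveToFile('test-utf8.txt');                 // saved as UTF8
  finally
    SL.Free;
  end;
end.
```

If you already know the source encoding, it is safer to skip the guess and pass the encoding name to ConvertEncoding directly.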

If the input file is UTF-16, I don't have an easy answer, but others probably have.

Bart

EganSolo

Re: Clear example of loading and saving files with UTF8 encoding
« Reply #3 on: November 29, 2015, 12:11:57 am »
Alright, I've solved my problem thanks to Bart's response  :D

I'd like to give a bit more context in case someone else finds this issue as confusing as I did.

Task: Given two files, one UTF8 encoded and the other ANSI (Windows) encoded, add strings from the second to the first without changing the first file's encoding.

My problem was that after adding the strings and saving the first file, it would automatically revert to ANSI (Windows) encoding. Incidentally, if someone is wondering how I can tell what the file encoding is, I use PSPad, which indicates the encoding on its status bar.

All along I thought that TStringList.SaveToFile was the culprit, but thanks to Bart's answer, I realized that my error had been in copying from the ANSI file without converting its content to UTF8.

Here is a bit of code that illustrates how to make this work. This code is didactic and for illustration purposes. Those who understand it already will find it a bit simplistic, but that's its purpose. Please note that I am not adding anything to what Bart told me in his reply; I'm rehashing it in a bit more detail for those of us who are new to these concepts in Lazarus. Hopefully, this helps someone else better understand what Bart wrote.

Code: Pascal
// Needs Classes and SysUtils in the uses clause, plus FileUtil for SysToUTF8.

function LoadUTF8File(const aFileName: String): TStringList;
// Loads a UTF8 encoded file. Please be sure the file is UTF8 encoded.
// The caller owns the returned list and has to free it.
begin
  // TStringList does no conversion on load, so a UTF8 file stays UTF8,
  // which is what the LCL expects.
  Result := TStringList.Create;
  if FileExists(aFileName) then
    Result.LoadFromFile(aFileName);
end;

function LoadAsciiFile(const aFileName: String): TStringList;
// Loads a Windows native (ANSI code page) file and converts it to UTF8.
// The caller owns the returned list and has to free it.
begin
  // Now this code is a bit counter-intuitive. If TStringList expects a UTF8 encoded file,
  // wouldn't it mess things up if we were to give it an ANSI file instead?
  // Apparently not. For those of you who are coming from Delphi: I think the reason
  // the Lazarus team did not equip TStringList with an explicit
  // LoadFromFile(const aFileName: String; const aCodeBase...) is that having to
  // convert between UTF8 and ANSI is (I think) specific to Windows. It does not
  // occur under Linux. Having said that, please be aware that if you have to do
  // this sort of transformation of a Windows file under Linux, my sample code
  // here might not work.
  Result := TStringList.Create;
  // We load the file as we did in the prior case...
  if FileExists(aFileName) then
    Result.LoadFromFile(aFileName);
  // Now a bit of magic. Make sure that FileUtil is in the uses clause. If you
  // are writing a console program, you need to add a package containing FileUtil,
  // such as LCL. LCLBase might work too, but I haven't tried it.
  Result.Text := SysToUTF8(Result.Text);
  // The previous line converts a string from the system default code page,
  // which in our case is the Windows ANSI code page, into UTF8. FileUtil contains
  // additional UTF8 routines that you might be interested in.
end;

// By opening the UTF8 file and the ANSI Windows file with these two functions, you
// achieve parity: both are now UTF8 encoded. You can then perform string manipulation
// with Pos, PosEx, Copy etc. to move the content of one file into the other. To save
// the UTF8 encoded file, simply use TStringList.SaveToFile. This produces a UTF8 encoded file.
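
For completeness, here is a minimal end-to-end sketch of the task described above (append the lines of an ANSI file to a UTF8 file and keep the result UTF8). It is not the exact code from this thread: it assumes the ANSI file is in CP1252 and uses CP1252ToUTF8 from LConvEncoding instead of SysToUTF8, so the result does not depend on the system code page of the machine it runs on. The procedure name AppendAnsiFileToUTF8File and the file names are made up for illustration.

```pascal
program AppendAnsiToUtf8;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

// Hypothetical helper: appends every line of a CP1252 file to a UTF8 file.
procedure AppendAnsiFileToUTF8File(const aAnsiFile, aUTF8File: string);
var
  Target, Source: TStringList;
begin
  Target := nil;
  Source := nil;
  try
    Target := TStringList.Create;
    Source := TStringList.Create;
    Target.LoadFromFile(aUTF8File);             // already UTF8, loaded as-is
    Source.LoadFromFile(aAnsiFile);             // still CP1252 bytes at this point
    Source.Text := CP1252ToUTF8(Source.Text);   // assumption: the ANSI file is CP1252
    Target.AddStrings(Source);                  // both lists are UTF8 now
    Target.SaveToFile(aUTF8File);               // stays UTF8 on disk
  finally
    Source.Free;
    Target.Free;
  end;
end;

begin
  // 'ansi.txt' and 'utf8.txt' are placeholder names.
  if FileExists('ansi.txt') and FileExists('utf8.txt') then
    AppendAnsiFileToUTF8File('ansi.txt', 'utf8.txt');
end.
```

The important step is converting before AddStrings. If the ANSI lines are appended unconverted, the saved file contains bytes that are not valid UTF8, and an editor will typically report it as ANSI again.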

 
