Clear example of loading and saving files with UTF8 encoding

EganSolo:
Using the latest version of Laz #1.4.4 with FPC 2.6.4 on Win 10.
Could somebody please pretty please write a simple example of loading and saving a file with UTF8 encoding?

Here's what I do and somehow it's not working...

--- Code: Pascal ---
Implementation

uses charencstreams;

procedure LoadUnicodeText(const aFileName: String; const aList: TStringList);
var
  f: TCharEncStream;
begin
  f := TCharEncStream.Create;
  f.UniStreamType := ufUtf8;
  f.LoadFromFile(aFileName);
  aList.Clear;
  aList.Text := f.UTF8Text;
  FreeAndNil(f);
end;

procedure SaveUnicodeText(const aList: TStringList; const aFileName: String);
var
  f: TCharEncStream;
begin
  f := TCharEncStream.Create;
  f.UniStreamType := ufUtf8;
  f.UTF8Text := aList.Text;
  f.SaveToFile(aFileName);
  FreeAndNil(f);
end;

--- End code ---
I've read quite a few posts on the support of UTF8 in FPC 3.0, but the current version of Laz is not bundled with FPC 3.0, so I'm not clear on what steps to take to make this work.

Thanks.


Deepaak:

--- Code: Pascal ---
uses LazUTF8Classes;

TStringListUTF8.LoadFromFile();
TStringListUTF8.SaveToFile();

--- End code ---
try this..

Bart:
From the example, I think you want the file contents to be UTF8 encoded (not the filename)?

Note that inside Lazarus all strings inside the LCL are in UTF8 encoding.
So if you have e.g. a TMemo then a TMemo.Lines.SaveToFile(AFilename) will save the contents encoded as UTF8 in the file with name AFilename.

So, for these scenarios there is nothing to do at all.

If you load a file in the system encoding on Windows (CP1252 in Western European countries), where each character is one byte, you can do this:


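--- Code: ---
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile('test.txt');      // content is in the system code page
    SL.Text := SysToUtf8(SL.Text);    // convert system code page to UTF8
    SL.SaveToFile('test-utf8.txt');   // write the converted text back out
  finally
    SL.Free;                          // release the list even if an exception occurs
  end;
end;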

--- End code ---

If you do not know the encoding of the input file, things get harder.
You can use the LConvEncoding unit to guess the encoding, and then use one of the conversion routines in that unit.
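
As a rough sketch of that idea (not a tested recipe): the program below reads a file's bytes unchanged, asks `GuessEncoding` for a best guess, and passes that guess to `ConvertEncoding` with `EncodingUTF8` as the target. `ReadRawBytes`, `WriteRawBytes`, and the program name are hypothetical helpers made up for this illustration; only `GuessEncoding`, `ConvertEncoding`, and `EncodingUTF8` come from LConvEncoding. The guess is a heuristic, so it can be wrong for files without a byte order mark.

```pascal
program GuessToUtf8;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  LConvEncoding; // part of the LazUtils package that ships with Lazarus

// Hypothetical helper: reads the file's bytes into a string without any conversion.
function ReadRawBytes(const aFileName: String): String;
var
  fs: TFileStream;
begin
  Result := '';
  fs := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, fs.Size);
    if Length(Result) > 0 then
      fs.ReadBuffer(Result[1], Length(Result));
  finally
    fs.Free;
  end;
end;

// Hypothetical helper: writes the string's bytes to a file without any conversion.
procedure WriteRawBytes(const aFileName, aData: String);
var
  fs: TFileStream;
begin
  fs := TFileStream.Create(aFileName, fmCreate);
  try
    if Length(aData) > 0 then
      fs.WriteBuffer(aData[1], Length(aData));
  finally
    fs.Free;
  end;
end;

var
  Raw, Enc, Utf8: String;
begin
  if ParamCount <> 2 then
  begin
    WriteLn('Usage: GuessToUtf8 <input file> <output file>');
    Halt(1);
  end;
  Raw := ReadRawBytes(ParamStr(1));
  Enc := GuessEncoding(Raw);                      // heuristic guess, returned as an encoding name
  Utf8 := ConvertEncoding(Raw, Enc, EncodingUTF8); // convert from the guessed encoding to UTF8
  WriteRawBytes(ParamStr(2), Utf8);
  WriteLn('Guessed encoding: ', Enc);
end.
```

In an LCL application the converted string could instead be assigned to `TStringList.Text` and saved with `SaveToFile`, as in the example above.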

If the input file is UTF-16, I don't have an easy answer, but others probably do.

Bart

EganSolo:
Alright, I've solved my problem thanks to Bart's response  :D

I'd like to give a bit more context in case someone else finds this issue as confusing as I did.

Task: Given two files, One UTF8 encoded and a second Ansi (Windows) encoded, add strings from the second to the first without changing its encoding type.

My problem was that after adding the strings and saving the second file, it would automatically revert to ANSI (Windows) encoding. Incidentally, if someone is wondering how I can tell what the file encoding is, I use PSPad, which indicates the encoding on its status bar.

All along I thought that TStringList.SaveToFile was the culprit, but thanks to Bart's answer, I realized that my error had been in copying from the ASCII file without converting its content to UTF8.

Here is a bit of code that illustrates how to make this work. This code is didactic and for illustration purposes. Those who understand it already will find it a bit simplistic, but that's its purpose. Please note that I am not adding anything to what Bart told me in his reply; I'm rehashing it with a bit more detail for those of us who are new to these concepts in Lazarus. Hopefully, this might help someone else to better understand what Bart wrote.


--- Code: Pascal ---
function LoadUTF8File(const aFileName: String): TStringList;
// Loads a utf8 encoded file. Please be sure the file is utf8 encoded.
begin
  Result := TStringList.Create; // TStringList by default expects a utf8 encoded file.
  if FileExists(aFileName) then
    Result.LoadFromFile(aFileName);
end;

function LoadAsciiFile(const aFileName: String): TStringList;
// Loads a Windows native (code page ASCII (Windows)) file.
begin
  // Now this code is a bit counter-intuitive. If TStringList expects a utf8 encoded file,
  // wouldn't it mess things up if we were to give it an ascii file instead?
  // Apparently not. For those of you who are coming from Delphi: I think the reason
  // the Lazarus Team did not equip TStringList with an explicit
  // LoadFromFile(const aFileName: String; const aCodeBase...) is that this situation
  // of having to convert between UTF8 and Ascii is (I think) specific to Windows.
  // It does not occur under Linux. Having said that, please be aware that if you have
  // to do this sort of transformation of a Windows file under Linux, my sample code
  // here might not work.
  Result := TStringList.Create;
  // We load the file as we did in the prior case...
  if FileExists(aFileName) then
    Result.LoadFromFile(aFileName);
  // Now a bit of magic. Make sure though that FileUtil is in the uses clause. If you
  // are writing a console program, you need to add a package containing FileUtil, such
  // as LCL. LCLBase might work too but I haven't tried it.
  Result.Text := SysToUTF8(Result.Text);
  // What that previous line does is convert a string from the system default code page,
  // which in our case is ASCII Windows, into UTF8. Now, FileUtil contains additional
  // UTF8 methods that you might be interested in.
end;

// Now by opening the UTF8 and Ascii Windows files with these two methods, you achieve
// parity: they are both UTF8 encoded. You can then perform string manipulation by using
// Pos, PosEx, Copy etc. to move the content of one file into another. Then to save the
// UTF8 encoded file, simply use TStringList.SaveToFile. This operation produces a UTF8
// encoded file.

--- End code ---

d2010:

--- Quote from: Deepaak on November 28, 2015, 08:59:12 am ---
--- Code: Pascal ---
uses LazUTF8Classes;

TStringListUTF8.LoadFromFile();
TStringListUTF8.SaveToFile();

--- End code ---
try this..

--- End quote ---
%)
I am updating this topic because in the new 4.03 version of Lazarus
I got cyclic errors.
After I tried the "ProjectInspector", I got the same error.
I fixed it by renaming Utf8classString.


--- Code: ---cozipinc.lpr(605,33) Error: Identifier not found "TStringListUTF8"

--- End code ---
Thank you, I fixed it and solved the question.
