Author Topic: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools  (Read 6731 times)

Silvernine

  • New member
  • Posts: 8
Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« on: November 28, 2015, 02:15:38 pm »
Hello! I'm new to programming and have been experimenting with FPC and Lazarus for the last week or so. I have code that reads a text file encoded in UTF-16 LE, converts it to UTF-8, and saves it back to the same file; in other words, it converts the file to a different encoding. My goal is to read two separate lines from the file into two separate text fields. My current method is clumsy, mostly because of my limited coding skills, so I want to find a better solution, but I'm running into problems. I'm also using the newest utf8tools library by Leo. The current code is:
Code: Pascal
      Fs := TCharEncStream.Create;
      Fs.LoadFromFile('Start.txt');
      StartupParams := Fs.UTF8Text;
      Fs.Free;
      AssignFile(StartupParamsFile,'Start.txt');
      ReWrite(StartupParamsFile);
      WriteLn(StartupParamsFile,StartupParams);
      Reset(StartupParamsFile);
      ReadLn(StartupParamsFile,StartupParams); //Read 1st line of file
      Main.SourceDirSelect.Text := UTF8Encode(StartupParams); //1st line of file -> Source field
      ReadLn(StartupParamsFile,StartupParams); //Read 2nd line of file
      Main.DestinationDirSelect.Text := UTF8Encode(StartupParams); //2nd line of file -> Destination field
This works and does what I want, but I want to improve it further. Since utf8tools has very little documentation, I looked at the example, but I don't think I fully understand how it works. I tried to improve it like this:
Code: Pascal
      Fs := TCharEncStream.Create;
      Fs.LoadFromFile('Start.txt');
      Fs.UniStreamType := TUniStreamTypes(ufUtf8);
      Fs.Free;
      AssignFile(StartupParamsFile,'Start.txt');
      Reset(StartupParamsFile);
      ReadLn(StartupParamsFile,StartupParams); //Read 1st line of file
      Main.SourceDirSelect.Text := UTF8Encode(StartupParams); //1st line of file -> Source field
      ReadLn(StartupParamsFile,StartupParams); //Read 2nd line of file
      Main.DestinationDirSelect.Text := UTF8Encode(StartupParams); //2nd line of file -> Destination field
I also tried various other approaches with no luck; this version does nothing at all. Do you have any suggestions for a better way to approach this? (And... my code is probably very bad, isn't it? Not very elegant at all...)

balazsszekely

  • Guest
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #1 on: November 28, 2015, 02:53:16 pm »
Try this:
Code: Pascal
uses LazUTF8;
//...
var
  MS: TMemoryStream;
  S: String;
begin
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile('c:\text.txt');
    MS.Position := 0;
    S := UTF16ToUTF8(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar));
    ShowMessage(S);
  finally
    MS.Free;
  end;
end;

After you process S, you can save it back with a similar method, or assign it to StringList.Text and use StringList.SaveToFile. With FPC 3.0.0 and later there are many more options.
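As a rough illustration of the "assign it to StringList.Text and use SaveToFile" suggestion, here is a minimal sketch. The helper names `LoadUtf16AsUtf8` and `SaveUtf8` and the command-line handling are made up for this example; only `UTF16ToUTF8` (from LazUTF8, as used above) and the standard `TMemoryStream`/`TStringList` calls come from the thread. It assumes the input really is UTF-16 LE and that the LazUtils package is available to the project.

```pascal
program SaveBackSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LazUTF8;

// Hypothetical helper: load a UTF-16 LE file and return its content as UTF-8.
function LoadUtf16AsUtf8(const FileName: String): String;
var
  MS: TMemoryStream;
begin
  Result := '';
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile(FileName);
    // Guard against an empty file, where MS.Memory would be nil.
    if MS.Size >= SizeOf(WideChar) then
      Result := UTF16ToUTF8(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar));
  finally
    MS.Free;
  end;
end;

// Hypothetical helper: write UTF-8 text to a file through a TStringList.
procedure SaveUtf8(const FileName, Utf8Text: String);
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Text := Utf8Text;      // splits the text into lines
    SL.SaveToFile(FileName);  // writes the lines back out
  finally
    SL.Free;
  end;
end;

begin
  if ParamCount <> 2 then
  begin
    WriteLn('Usage: SaveBackSketch <utf16-input> <utf8-output>');
    Halt(1);
  end;
  SaveUtf8(ParamStr(2), LoadUtf16AsUtf8(ParamStr(1)));
end.
```

Keeping the load and save steps in separate routines means each stream or list is created and freed inside its own try..finally, so nothing is freed before it has been created.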

Silvernine

  • New member
  • Posts: 8
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #2 on: November 28, 2015, 10:24:40 pm »
Thank you very much! The final resulting code is now:
Code: Pascal
    begin
      MS := TMemoryStream.Create;
      try
        MS.LoadFromFile('Start.txt');
        MS.Position := 0;
        S := UTF16ToUTF8(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar));
        fsOut := TFileStream.Create('Start.txt', fmCreate);
        fsOut.Write(S[1], Length(S));
      finally
        MS.Free;
        fsOut.Free;
        Line.Free;
      end;
      Line := TStringList.Create;
      Line.LoadFromFile('Start.txt');
      Main.SourceDirSelect.Text := Line[0];
      Main.DestinationDirSelect.Text := Line[1];
    end;
Which is a lot better I think. Thanks!

balazsszekely

  • Guest
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #3 on: November 28, 2015, 11:14:55 pm »
Quote
@ Silvernine
Thank you very much! The final resulting code is now:
You're welcome. Try this instead:
Code: Pascal
var
  MS: TMemoryStream;
  S: String;
  SL: TStringList;
begin
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile('Start.txt');
    MS.Position := 0;
    // Convert the raw UTF-16 buffer to a UTF-8 string
    S := UTF16ToUTF8(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar));
    SL := TStringList.Create;
    try
      SL.Text := S; // Text splits S into lines
      Main.SourceDirSelect.Text := SL[0];      // 1st line -> Source field
      Main.DestinationDirSelect.Text := SL[1]; // 2nd line -> Destination field
      // SL.SaveToFile('c:\test1.txt'); //optional
    finally
      SL.Free;
    end;
  finally
    MS.Free;
  end;
end;

Silvernine

  • New member
  • Posts: 8
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #4 on: November 29, 2015, 02:51:17 am »
Oh! That's amazing! Thank you! It also showed exactly what went wrong with my earlier attempts. I initially created a TStringList and tried to assign S to it directly, which did not work because the types are incompatible. Now I understand that TStringList has a Text property, which I had missed and which would have solved my problem. At this point I don't even need to save the file anymore; I can simply do what I need with each line. That's amazing! :)

I do have one more question, though. UTF16ToUTF8 seems a bit odd to me. A BOM is normally not recommended for UTF-8, right? But in this case the output of UTF16ToUTF8 does start with a BOM, which is kind of funny. Is that intended? I tried to find more information about the function and found http://lazarus-ccr.sourceforge.net/docs/lcl/lclproc/utf16toutf8.html so the only thing I know about it is that it's a deprecated function (but at least it can still be used, so that's OK).
« Last Edit: November 29, 2015, 10:55:41 am by Silvernine »

balazsszekely

  • Guest
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #5 on: November 29, 2015, 08:49:12 am »
Hi Silvernine,

I'm glad it's working!

Quote
I tried to look more info into that function and found http://lazarus-ccr.sourceforge.net/docs/lcl/lclproc/utf16toutf8.html so the only thing I know about it is that it's a deprecated function (but at least it can still be used so that's OK).
The function inside "lclproc.pas" is deprecated. According to the link, the recommended replacement is the function(s) in LazUtils.LazUTF8, which is exactly what we used (see: uses LazUTF8).

Quote
I do have one more question though. UTF16ToUTF8 is kind of odd to me. So normally UTF8 does not recommend the use of a BOM right but in this case, the UTF16ToUTF8 does indeed create a BOM which is kind of funny.
Well, you're right, UTF-8 with a BOM is kinda useless, though it won't hurt you. Anyway, just to put your mind at ease:
Code: Pascal
uses LazUTF8, LConvEncoding;

procedure TForm1.Button1Click(Sender: TObject);
var
  SL: TStringList;
  MS: TMemoryStream;
  S: String;
begin
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile('c:\Start.txt');
    MS.Position := 0;
    S := UTF16ToUTF8(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar));
    ShowMessage(GuessEncoding(S)); //UTF8 with BOM
    S := UTF8BOMToUTF8(S);
    ShowMessage(GuessEncoding(S)); //UTF8 without BOM
    SL := TStringList.Create;
    try
      SL.Text := S;
      SL.SaveToFile('c:\Start_UTF8.txt'); //check with Notepad++
    finally
      SL.Free;
    end;
  finally
    MS.Free;
  end;
end;
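A note on where the BOM probably comes from (this is a reading of the behaviour, not something stated in the thread): UTF-16 LE text files commonly start with the code point U+FEFF as a byte order mark, and converting that code point to UTF-8 yields the bytes EF BB BF. So the conversion is most likely carrying the source file's BOM through rather than adding one. Under that assumption, an alternative to `UTF8BOMToUTF8` is to skip a leading U+FEFF before converting. The helper name `Utf16BufferToUtf8NoBom` is hypothetical; `UTF16ToUTF8` is the LazUTF8 function used above.

```pascal
program SkipBomSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8;

// Hypothetical helper: convert a UTF-16 buffer to UTF-8, skipping a leading
// byte order mark (U+FEFF) so that it is not converted into EF BB BF.
function Utf16BufferToUtf8NoBom(P: PWideChar; Count: SizeUInt): String;
begin
  Result := '';
  if (Count > 0) and (P^ = WideChar($FEFF)) then
  begin
    Inc(P);      // advance by one WideChar
    Dec(Count);
  end;
  if Count > 0 then
    Result := UTF16ToUTF8(P, Count);
end;

var
  W: UnicodeString;
begin
  // Simulate the content of a UTF-16 file that starts with a BOM.
  W := 'abc';
  W := WideChar($FEFF) + W;
  WriteLn('UTF-8 byte length, BOM converted along: ',
    Length(UTF16ToUTF8(PWideChar(W), Length(W))));
  WriteLn('UTF-8 byte length, BOM skipped: ',
    Length(Utf16BufferToUtf8NoBom(PWideChar(W), Length(W))));
end.
```

With a TMemoryStream as in the code above, the call would be `Utf16BufferToUtf8NoBom(PWideChar(MS.Memory), MS.Size div SizeOf(WideChar))`.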


regards,
GetMem
« Last Edit: November 29, 2015, 09:11:23 am by GetMem »

Silvernine

  • New member
  • Posts: 8
Re: Converting File Stream From UTF-16 LE to UTF-8 By UTF8Tools
« Reply #6 on: November 29, 2015, 11:07:43 am »
 :D Thank you! That answered all my questions, and I learned quite a lot too! So the function's behavior really was intended, and there is another function that converts UTF-8 with a BOM to UTF-8 without one. Very interesting, although funnily it's in a different library, haha. Anyway, I guess I should start keeping personal notes on all this. There are so many "quirks" to remember in case I work on something again where Unicode is important, especially which library a conversion function comes from (UTF16ToUTF8 and UTF8BOMToUTF8 are from different libraries, though both are about encoding conversion). Once again, thank you!
