Author Topic: [Solved] Unicode Guidance  (Read 2701 times)

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1260
[Solved] Unicode Guidance
« on: August 29, 2018, 10:30:58 pm »
My second ever Unicode issue, and the first since Lazarus 0.9.  Things have changed and I haven't kept up :-(  Knew it would bite me one day.

I have simplified an issue in my software down to the following code:

Code: Pascal
Procedure TForm1.Button1Click(Sender: TObject);
Var
  oTemp: TStringList;
Begin
  oTemp := TStringList.Create;
  oTemp.Add('-15µV/cm');
  oTemp.SaveToFile('C:\TEMP\Test.html');
  oTemp.Free;
End;
 

If I open Test.html in Notepad, I see the correct text.
If I open Test.html in Firefox (latest) or IE (ancient), I see incorrect text: -15ÂµV/cm

I know nothing about file or character encoding.   Is there some flag I need to set on TStringList to get this working?

I'm also not sure what info I need to provide.  Lazarus SVN 58055, fpc 3.0.4.  Windows 7 set to Australia settings.  Anything else?

So yeah.  Help :-)

Many thanks

Mike
« Last Edit: August 29, 2018, 10:53:01 pm by Mike.Cornflake »
Lazarus Trunk/FPC Trunk on Windows [7, 10]
  Have you tried searching this forum or the wiki?:   http://wiki.lazarus.freepascal.org/Alternative_Main_Page
  BOOKS! (Free and otherwise): http://wiki.lazarus.freepascal.org/Pascal_and_Lazarus_Books_and_Magazines

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Unicode Guidance
« Reply #1 on: August 29, 2018, 10:41:15 pm »
Try adding a UTF-8 BOM as the first thing in the file and see whether Firefox changes its behavior. If it does, then it's Firefox (correctly) trying to convert an already UTF-8 string from ANSI to UTF-8.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Mike.Cornflake

Re: Unicode Guidance
« Reply #2 on: August 29, 2018, 10:49:37 pm »
Err, how do I do that then?  I can change encoding using Notepad++, but that doesn't seem to persist.    And I'm not seeing an option in TStringList either. 

Thanks

Mike.Cornflake

Re: Unicode Guidance
« Reply #3 on: August 29, 2018, 10:52:16 pm »
Ah, found it :-)

Code: Pascal
Procedure TForm1.Button1Click(Sender: TObject);
Var
  oTemp: TStringList;
Begin
  oTemp := TStringList.Create;
  oTemp.Add(Chr($EF) + Chr($BB) + Chr($BF));
  oTemp.Add('-15µV/cm');
  oTemp.SaveToFile('C:\TEMP\Test.html');
  oTemp.Free;
End; 

Thanks, this works.

Update:  I've got this in my app for now, so that takes that monkey off my back.  But this feels like a hack.  I'm busy now, but will try to revisit this in a few weeks.  It feels more like a patch to TStringList.SaveToFile is required (with some sort of optional encoding parameter?)
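
For what it's worth, a slightly tidier version of the same workaround might look like the sketch below. It writes the three BOM bytes straight to the stream instead of adding them as a line of their own, so no line break ends up between the BOM and the first real line. SaveWithUtf8Bom and its parameter names are made up for illustration. It assumes the strings in the list are already UTF-8 (the Lazarus default) and that the unit uses Classes and SysUtils.

```pascal
// Hypothetical helper, not part of the RTL or LCL.
// Assumes the TStrings content is already UTF-8 encoded.
Procedure SaveWithUtf8Bom(oLines: TStrings; Const sFileName: String);
Const
  // The UTF-8 byte order mark
  UTF8_BOM: Array[0..2] Of Byte = ($EF, $BB, $BF);
Var
  oStream: TFileStream;
Begin
  oStream := TFileStream.Create(sFileName, fmCreate);
  Try
    // BOM first, then the text exactly as TStrings would save it
    oStream.WriteBuffer(UTF8_BOM, SizeOf(UTF8_BOM));
    oLines.SaveToStream(oStream);
  Finally
    oStream.Free;
  End;
End;
```

Calling SaveWithUtf8Bom(oTemp, 'C:\TEMP\Test.html') would then replace both the Chr(...) line and the SaveToFile call.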
« Last Edit: August 29, 2018, 11:02:54 pm by Mike.Cornflake »

taazz

Re: [Solved] Unicode Guidance
« Reply #4 on: August 29, 2018, 11:05:44 pm »
Glad to be of service. To complete the info, here is a link for the byte order mark (BOM): https://en.wikipedia.org/wiki/Byte_order_mark

Quote from: Mike.Cornflake
Update:  I've got this in my app for now, so that takes that monkey off my back.  But this feels like a hack.  I'm busy now, but will try to revisit this in a few weeks.  It feels more like a patch to TStringList.SaveToFile is required (with some sort of optional encoding parameter?)

Based on Delphi's TStringList, I bet there are already plans to add an overloaded SaveToFile that takes an encoding parameter, but I have no idea what they plan to do with the BOM. In my part of the world it is not used at all; it is a waste of space. The encoding is part of the file metadata, which saves time and guessing.
« Last Edit: August 29, 2018, 11:18:36 pm by taazz »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: [Solved] Unicode Guidance
« Reply #5 on: August 30, 2018, 02:01:23 am »
The browser interprets it as a Windows codepage, although UTF-8 is what is most often used in HTML files.
You should have a <meta> tag there:
 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
An HTML file should also have <html>, <head> and <body> tags.

Your Pascal code is OK. UTF-8 is also used automatically in TStringList.
I remember copying this link to you earlier, but here it comes again:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus
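
As a rough sketch of the above, the original handler could write a complete page that declares its own encoding, so the browser does not have to guess and no BOM is needed. The exact markup (doctype, indentation) is only an illustration, and it assumes the strings are UTF-8 as they are by default in Lazarus.

```pascal
Procedure TForm1.Button1Click(Sender: TObject);
Var
  oTemp: TStringList;
Begin
  oTemp := TStringList.Create;
  Try
    oTemp.Add('<!DOCTYPE html>');
    oTemp.Add('<html>');
    oTemp.Add('<head>');
    // Tells the browser the bytes that follow are UTF-8
    oTemp.Add('  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />');
    oTemp.Add('</head>');
    oTemp.Add('<body>');
    oTemp.Add('  -15µV/cm');
    oTemp.Add('</body>');
    oTemp.Add('</html>');
    oTemp.SaveToFile('C:\TEMP\Test.html');
  Finally
    // Free the list even if SaveToFile raises
    oTemp.Free;
  End;
End;
```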
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: [Solved] Unicode Guidance
« Reply #6 on: August 31, 2018, 09:30:07 pm »
Quote from: Mike.Cornflake
Update:  I've got this in my app for now, so that takes that monkey off my back.  But this feels like a hack.  I'm busy now, but will try to revisit this in a few weeks.  It feels more like a patch to TStringList.SaveToFile is required (with some sort of optional encoding parameter?)

Quote from: taazz
Based on Delphi's TStringList, I bet there are already plans to add an overloaded SaveToFile that takes an encoding parameter, but I have no idea what they plan to do with the BOM. In my part of the world it is not used at all; it is a waste of space. The encoding is part of the file metadata, which saves time and guessing.

That was already added in 3.1.1, and thus will be part of 3.2.0.  ;D
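
In case it helps anyone landing here later, the overload would be used roughly like this. This is only a sketch: it assumes FPC 3.2.0 or newer, where TStrings.SaveToFile accepts a TEncoding (from SysUtils), and the usual Lazarus setup where strings are UTF-8. As far as I know it also writes the UTF-8 BOM by default, so the manual Chr($EF) line should no longer be needed.

```pascal
Procedure TForm1.Button1Click(Sender: TObject);
Var
  oTemp: TStringList;
Begin
  oTemp := TStringList.Create;
  Try
    oTemp.Add('-15µV/cm');
    // Assumes FPC 3.2.0+: overload taking an explicit encoding
    oTemp.SaveToFile('C:\TEMP\Test.html', TEncoding.UTF8);
  Finally
    oTemp.Free;
  End;
End;
```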

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: [Solved] Unicode Guidance
« Reply #7 on: August 31, 2018, 10:21:11 pm »
An even better solution is to configure Firefox to interpret unknown encodings as UTF-8. I can't remember where the option is exactly*, but it's there; I have it set to ISO-8859-15 :)


-----
* Launching FF on this machine is a process that takes more than 10 minutes, so excuse me if I don't do it just to look this up.
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

 
