
Topic: cross platform text with European accents (Read 7835 times)

lazer

  • Full Member
  • ***
  • Posts: 215
cross platform text with European accents .
« on: October 29, 2022, 06:38:48 pm »
Hi,

I have a program displaying the content of some text files with accented characters. I'm having trouble getting it to display consistently between Windows and Linux.

I seem to need utf8 on Linux and ISO-8859-15 on Windows.

I don't want to maintain two versions of the text files. Is there a common encoding I can use on both platforms?

TIA.



Lulu

  • Full Member
  • ***
  • Posts: 226
Re: cross platform text with European accents .
« Reply #1 on: October 29, 2022, 08:00:20 pm »
If I am not mistaken, UTF-8 is the 'standard' in the Lazarus LCL on all platforms. It's curious that you have to encode your files as ISO-xxxx on Windows. Can you provide some code showing how you read the file and display it?
wishing you a nice life
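
For files that were saved as ISO-8859-15, one way to end up with a single UTF-8 copy is a one-time conversion. This is a minimal sketch, assuming the POSIX `iconv` tool is installed and that the source files really are ISO-8859-15; the function name `latin9_to_utf8` and the file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert an ISO-8859-15 (Latin-9) text file to UTF-8.
# $1 = existing ISO-8859-15 file, $2 = UTF-8 file to create.
# Source and destination must be different files.
latin9_to_utf8() {
    iconv -f ISO-8859-15 -t UTF-8 "$1" > "$2"
}

# Example (file names are placeholders):
#   latin9_to_utf8 notes-latin9.txt notes-utf8.txt
```

The converted file contains no BOM; whether one is needed depends on the editors that will touch the file later.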

lazer

  • Full Member
  • ***
  • Posts: 215
Re: cross platform text with European accents .
« Reply #2 on: November 03, 2022, 11:43:32 am »
Yes, you are correct. utf8 seems to conserve accents.

However, if I open the file with Notepad on Win32 and save it, it grows by 3 bytes. Since I have a checksum on the file, this causes my program to flag it as corrupt.

I created the text files on Linux with FeatherPad, used unix2dos to convert the line endings to CRLF, and then computed my checksum.

I need to find a stable format.


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: cross platform text with European accents .
« Reply #3 on: November 03, 2022, 12:05:39 pm »
Notepad probably adds a BOM. You can avoid the difference by adding a BOM to the original file (and hoping that neither FeatherPad nor unix2dos removes it).

domasz

  • Sr. Member
  • ****
  • Posts: 423
Re: cross platform text with European accents .
« Reply #4 on: November 03, 2022, 12:56:47 pm »
Quote from lazer: "Yes, you are correct. utf8 seems to conserve accents."

Install a hex editor. HxD is free and pretty nice. Then you will be able to see the differences. In this case those three extra bytes are the UTF-8 BOM (EF BB BF).
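
On the Linux side the same check can be done without a hex editor. This is a rough sketch, assuming `head -c` and `od` are available (they are on common Linux and BSD systems); the function names are invented for this example.

```shell
#!/bin/sh
# Print the first three bytes of a file as lowercase hex with no spaces,
# e.g. "efbbbf" for a file that starts with the UTF-8 BOM.
first_three_bytes_hex() {
    head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}

# Exit status 0 if the file starts with the UTF-8 BOM (EF BB BF), 1 otherwise.
has_utf8_bom() {
    [ "$(first_three_bytes_hex "$1")" = "efbbbf" ]
}

# Example (test.txt is a placeholder name):
#   if has_utf8_bom test.txt; then echo "BOM present"; else echo "no BOM"; fi
```

Comparing the output for the file before and after saving it in Notepad should show whether the three added bytes are the BOM.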

lazer

  • Full Member
  • ***
  • Posts: 215
Re: cross platform text with European accents .
« Reply #5 on: November 03, 2022, 01:49:46 pm »
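Code: Shell
  # Write the three-byte UTF-8 BOM (EF BB BF) to its own file.
  # Needs bash: its echo -e interprets the \x escapes, -n suppresses the newline.
  echo -en '\xEF\xBB\xBF' > BOM.dat

  # Prepend the BOM to the UTF-8 text file.
  cat BOM.dat test.utf8 > test.txt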

Seems to work. Thanks for pointing out that win32 puts a byte order marker in file with single byte data ;) I would not have guessed.
« Last Edit: November 04, 2022, 06:26:26 am by lazer »
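
Running that concatenation twice on the same file would prepend two BOMs. A possible variant that adds the BOM only when it is missing is sketched below; it assumes `head -c` and `od` are available, writes the BOM with octal `printf` escapes so it also works in shells other than bash, and uses the made-up function name `ensure_utf8_bom`.

```shell
#!/bin/sh
# Hypothetical helper: copy $1 to $2 so that $2 starts with exactly one UTF-8 BOM.
# Source and destination must be different files.
ensure_utf8_bom() {
    src=$1
    dst=$2
    if [ "$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        # BOM already present: plain copy.
        cat "$src" > "$dst"
    else
        # \357\273\277 is EF BB BF in octal.
        { printf '\357\273\277'; cat "$src"; } > "$dst"
    fi
}

# Example, reusing the file names from the post:
#   ensure_utf8_bom test.utf8 test.txt
```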

dseligo

  • Hero Member
  • *****
  • Posts: 1194
Re: cross platform text with European accents .
« Reply #6 on: November 03, 2022, 02:08:17 pm »
Quote from lazer: "I need to find a stable format."

Use XML then, and don't compute the checksum on the whole file but only on the data.
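
The "checksum only the data" idea can also be tried on the plain text files, without XML. The sketch below is not the XML approach itself: it strips a leading UTF-8 BOM and every CR byte before checksumming, so the Linux original and a copy re-saved by Notepad give the same value. It assumes the text never contains a meaningful bare CR, uses POSIX `cksum` (a CRC, standing in for whatever checksum the program really uses), and the function names are invented.

```shell
#!/bin/sh
# Write the file to stdout without a leading UTF-8 BOM and without any CR bytes.
normalized_content() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"    # start output at byte 4, skipping the 3-byte BOM
    else
        cat "$1"
    fi | tr -d '\r'
}

# Print the CRC that cksum computes over the normalized content only.
content_checksum() {
    normalized_content "$1" | cksum | cut -d ' ' -f 1
}

# Example (test.txt is a placeholder name):
#   content_checksum test.txt
```

With this kind of normalization, a BOM or CRLF line endings added by an editor no longer change the checksum, while edits to the text itself still do.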

dseligo

  • Hero Member
  • *****
  • Posts: 1194
Re: cross platform text with European accents .
« Reply #7 on: November 03, 2022, 02:10:27 pm »
Quote from lazer: "win32 puts a byte order marker in file with single byte data"

Not win32, but Notepad. There are editors that don't (automatically) put a BOM in the file.

lazer

  • Full Member
  • ***
  • Posts: 215
Re: cross platform text with European accents .
« Reply #8 on: November 03, 2022, 07:03:22 pm »
Notepad has been part of the default Windows installation since Win3.1, IIRC. It's Windows.

 
