Topic: accented char

Paolo

  • Hero Member
  • *****
  • Posts: 675
accented char
« on: August 08, 2021, 11:20:39 am »
Hello, I have a small program that reads a text file row by row and displays it in a ListBox.

Code: Pascal
procedure TForm1.OpenFile(const AFileName : string; ALstBox : TListBox; ALabel : TLabel);
var
  FilInp : TextFile;
  Str1: string;
begin
  AssignFile(FilInp, AFileName);
  try
    Reset(FilInp);
    while not(EOF(FilInp)) do begin
      Readln(FilInp, Str1);
      ALstBox.Items.Add(Str1);
    end;
  finally
    CloseFile(FilInp);
  end;
end;

but I am facing the following problem: two files with the "same content" (at least when opened with a simple text editor) behave differently. One is displayed correctly, while the other shows characters like à, è, etc. as other symbols. I really don't understand why.

NB: the file with the bad characters comes from a PC with Win-7 Pro and was saved to either USB or CD-ROM; it is then opened on my machine (Win-10 64-bit).

Any help?

win-10 64, laz 2.0.12, FPC 3.2.0

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4673
  • I like bugs.
Re: accented char
« Reply #1 on: August 08, 2021, 12:03:11 pm »
Quote
I am facing the following problem: two files with the "same content" (at least when opened with a simple text editor) behave differently. One is displayed correctly, while the other shows characters like à, è, etc. as other symbols. I really don't understand why.

NB: the file with the bad characters comes from a PC with Win-7 Pro and was saved to either USB or CD-ROM; it is then opened on my machine (Win-10 64-bit).

Lazarus GUI apps use Unicode with UTF-8 encoding by default. Your text files use a local Windows codepage.
Windows codepages are a source of problems which Unicode solved a long time ago.
The best solution is to convert the files to UTF-8. You can use one of the utility conversion programs out there to convert them all.
Another solution is to convert the data just after reading it in your code. This page gives hints for it:
 https://wiki.lazarus.freepascal.org/Unicode_Support_in_Lazarus
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Paolo

Re: accented char
« Reply #2 on: August 08, 2021, 03:57:17 pm »
Thanks @Juha,

I did

Code: Pascal
// WinCPToUTF8 is declared in unit LazUTF8.
procedure TForm1.OpenFile(const AFileName : string; ALstBox : TListBox; ALabel : TLabel);
var
  FilInp : TextFile;
  Str1: string;
  StrIn: RawByteString;
begin
  AssignFile(FilInp, AFileName);
  try
    Reset(FilInp);
    while not(EOF(FilInp)) do begin
      Readln(FilInp, StrIn);
      Str1:=WinCPToUTF8(StrIn);  // Windows system codepage -> UTF-8
      ALstBox.Items.Add(Str1);   // add the converted line
    end;
    ALabel.Caption:=AFileName;
  finally
    CloseFile(FilInp);
  end;
end;

Now the file that initially showed wrong characters displays its accented characters correctly, but the one that was fine at the beginning shows different characters...
At the moment my workaround is an added flag to manage the two cases, but what is happening? How can I detect whether "WinCPToUTF8" is necessary?

Jurassic Pork

  • Hero Member
  • *****
  • Posts: 1290
Re: accented char
« Reply #3 on: August 08, 2021, 05:25:40 pm »
Hello,
Try ChsDet (CharacterSet Detector), available through the Online Package Manager.
Have a look here: http://chsdet.sourceforge.net/

A comparison of three methods for detecting the character set is attached.

Content of the files:
Quote
En écrivant ma pensée, elle m'échappe quelquefois; mais cela me fait me souvenir de ma faiblesse,
que j'oublie à toute heure; ce qui m'instruit autant que ma pensée oubliée, car je ne tends qu'à
connaître mon néant.

Pascal, Pensées, Première partie, Chapitre II.

Friendly, J.P
« Last Edit: August 08, 2021, 06:21:50 pm by Jurassic Pork »
Jurassic computer : Sinclair ZX81 - Zilog Z80A à 3,25 MHz - RAM 1 Ko - ROM 8 Ko

JuhaManninen

Re: accented char
« Reply #4 on: August 08, 2021, 05:45:20 pm »
Quote
Now the file that initially showed wrong characters displays its accented characters correctly, but the one that was fine at the beginning shows different characters...
At the moment my workaround is an added flag to manage the two cases, but what is happening?

I guess those files are already encoded as UTF-8. You really should use the same UTF-8 encoding for them all.

Quote
How can I detect if the "WinCPToUTF8" is necessary ?
Unit LConvEncoding has a function GuessEncoding(). It uses heuristics and may not always be 100% accurate.
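A minimal sketch of how the guess and the conversion could be combined, assuming the project depends on the LazUtils package. ToUTF8IfNeeded is a hypothetical helper name, not something from LConvEncoding; GuessEncoding, ConvertEncoding and EncodingUTF8 are the parts taken from that unit. As far as I understand it, text that is not valid UTF-8 is assumed to be in the system's default codepage, so the result can differ between machines.

```pascal
program GuessDemo;
{$mode objfpc}{$H+}

uses
  LConvEncoding;  // part of the LazUtils package

// Hypothetical helper (not part of LConvEncoding): returns ARaw as UTF-8,
// converting only when the guessed encoding is something else.
function ToUTF8IfNeeded(const ARaw: string): string;
var
  Enc: string;
begin
  Enc := GuessEncoding(ARaw);  // heuristic; returns an encoding name
  if Enc = EncodingUTF8 then
    Result := ARaw             // already UTF-8: leave the bytes untouched
  else
    Result := ConvertEncoding(ARaw, Enc, EncodingUTF8);
end;

begin
  // "è" is written as explicit UTF-8 bytes so that the demo does not
  // depend on the encoding of this source file.
  WriteLn(GuessEncoding('caff' + #$C3#$A8));
  WriteLn(ToUTF8IfNeeded('plain ASCII text'));
end.
```

In a reader like OpenFile it is probably safer to guess once on the whole file content rather than line by line, because a single short line gives the heuristic very little to work with.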

Jurassic Pork

Re: accented char
« Reply #5 on: August 08, 2021, 06:03:16 pm »
Juha, in the comparison that I added to my previous message, lconvencoding.pas is the method that uses GuessEncoding: it seems to be not so bad  ;)

Paolo

Re: accented char
« Reply #6 on: August 08, 2021, 06:22:52 pm »
Thanks to both, I'll check your suggestions as soon as possible.
Let me explain the situation: I write code on PC A (Win-10, with either Lazarus or Delphi), copy the file (a simple pas unit) to PC B (Delphi on Win-7), make modifications there, save, and copy the file back to PC A. When I try to compare the changes, the problems appear.

Jurassic Pork

Re: accented char
« Reply #7 on: August 08, 2021, 06:31:44 pm »
Quote
Thanks to both, I'll check your suggestions as soon as possible.
Let me explain the situation: I write code on PC A (Win-10, with either Lazarus or Delphi), copy the file (a simple pas unit) to PC B (Delphi on Win-7), make modifications there, save, and copy the file back to PC A. When I try to compare the changes, the problems appear.

What are your versions of Lazarus and Delphi?
Maybe the editor in the Delphi IDE uses ANSI encoding.
« Last Edit: August 08, 2021, 06:34:14 pm by Jurassic Pork »

Paolo

Re: accented char
« Reply #8 on: August 08, 2021, 06:36:25 pm »
Delphi Tokyo, Lazarus 2.0.12, FPC 3.2.0

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: accented char
« Reply #9 on: August 08, 2021, 07:55:41 pm »
I think the problem is in the text files, more precisely in the way you generate these files. How do you generate your files?

Paolo

Re: accented char
« Reply #10 on: August 08, 2021, 08:07:19 pm »
The pas files originate from Delphi (from Delphi 3 up to Tokyo) and are now manipulated and modified with Lazarus/FPC.
I think the problem comes from moving the files back and forth between different PCs. I cannot check right now, but when a file is opened in the editor, either Delphi on PC A or Lazarus on PC B, everything seems fine.

engkin

Re: accented char
« Reply #11 on: August 08, 2021, 08:15:26 pm »
Is the code in your first post used to read pas files?

Edit
Quote
I have a small program that read a text file row by row and display it in a ListBox
« Last Edit: August 08, 2021, 08:18:06 pm by engkin »

Paolo

Re: accented char
« Reply #12 on: August 08, 2021, 09:46:26 pm »
Yes, to compare the content of two pas files. If I am able to reproduce the situation, I'll post both files and the code that reads them.

Paolo

Re: accented char
« Reply #13 on: August 09, 2021, 06:57:46 pm »
Attached is the extracted project that reads the files Prova-1.pas and Prova-2.pas and shows them in list boxes.

Use the first button to load Prova-1.pas and the second one to load Prova-2.pas.

You should see the accented char "è" in the comment displayed correctly (see the code for why). If you reverse the order, loading Prova-2.pas with the first button and vice versa, both are shown wrongly.

engkin

Re: accented char
« Reply #14 on: August 09, 2021, 08:05:48 pm »
Prova-1.pas is utf8
Prova-2.pas is not

Open both files in Lazarus and right click on each
File Settings
  Encoding
    See which one is what and change to utf8
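
Outside the IDE, the same difference can be checked in code. The sketch below is only an illustration: LooksLikeUTF8 is a hypothetical name, and it tests nothing more than the lead-byte/continuation-byte structure of UTF-8 (it does not reject overlong forms or surrogates, and pure ASCII text also passes). A single-byte codepage file such as Prova-2.pas would normally fail at its first accented character.

```pascal
program CheckUtf8;
{$mode objfpc}{$H+}

// Hypothetical helper: True if every byte of S fits the UTF-8
// lead-byte/continuation-byte pattern.
function LooksLikeUTF8(const S: RawByteString): Boolean;
var
  i, n, Len: Integer;
  b: Byte;
begin
  Result := False;
  Len := Length(S);
  i := 1;
  while i <= Len do
  begin
    b := Ord(S[i]);
    if b < $80 then n := 0                   // ASCII byte
    else if (b and $E0) = $C0 then n := 1    // lead byte of a 2-byte sequence
    else if (b and $F0) = $E0 then n := 2    // lead byte of a 3-byte sequence
    else if (b and $F8) = $F0 then n := 3    // lead byte of a 4-byte sequence
    else Exit;                               // stray continuation or invalid lead
    Inc(i);
    while n > 0 do
    begin
      // each continuation byte must exist and look like 10xxxxxx
      if (i > Len) or ((Ord(S[i]) and $C0) <> $80) then Exit;
      Inc(i);
      Dec(n);
    end;
  end;
  Result := True;
end;

begin
  WriteLn(LooksLikeUTF8(#$C3#$A8));  // "è" as UTF-8 bytes
  WriteLn(LooksLikeUTF8(#$E8));      // "è" as a single Windows-1252 byte
end.
```

A file that fails this check is not UTF-8 and needs a conversion such as WinCPToUTF8; a file that passes is either UTF-8 or plain ASCII, and in both cases it should be left as it is.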

 
