Author Topic: Help reading an ANSI text file correctly...?  (Read 22500 times)

Espectr0

  • Full Member
  • ***
  • Posts: 218
Help reading an ANSI text file correctly...?
« on: February 04, 2016, 05:59:28 pm »
Hello!

How can I read an ANSI text file correctly?

My code:

Code: Pascal
...
var
  Stream   : TStream;
  Size     : Integer;
  Buffer   : TBytes;
  FileName : String;
  Encoding : TEncoding;
begin
  Encoding := NIL;
  FileName := 'd:\ansitext.txt';

  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Size := Stream.Size - Stream.Position;
    SetLength(Buffer, Size);
    Stream.Read(Buffer[0], Size);
    Size := TEncoding.GetBufferEncoding(Buffer, Encoding, TEncoding.ANSI);
    Memo1.Text := Encoding.GetString(Buffer, Size, Length(Buffer) - Size);
  finally
    Stream.Free;
  end;
...


What am I doing wrong?

Thanks!
« Last Edit: February 04, 2016, 06:04:38 pm by Espectr0 »

balazsszekely

  • Guest
Re: Help reading an ANSI text file correctly...?
« Reply #1 on: February 04, 2016, 06:30:03 pm »
@Espectr0
You can read it directly to Memo, like this:
Code: Pascal
Memo1.Lines.LoadFromFile('c:\test.txt');
If you want to load the file into an AnsiString:
Code: Pascal
var
  AStr: AnsiString;
  FS: TFileStream;
begin
  FS := TFileStream.Create('c:\test.txt', fmOpenRead or fmShareDenyWrite);
  try
    if FS.Size > 0 then
    begin
      SetLength(AStr, FS.Size);
      FS.ReadBuffer(Pointer(AStr)^, FS.Size);
    end;
  finally
    FS.Free;
  end;
end;

Espectr0

  • Full Member
  • ***
  • Posts: 218
Re: Help reading an ANSI text file correctly...?
« Reply #2 on: February 04, 2016, 06:46:49 pm »
@GetMem

I tested your methods, but they do not read the ANSI file correctly: special characters such as accented letters are replaced with "?".

My first code works great in Delphi but not in Lazarus...

Any ideas?


balazsszekely

  • Guest
Re: Help reading an ANSI text file correctly...?
« Reply #3 on: February 04, 2016, 08:11:58 pm »
Quote
@Espectr0
I tested your methods, but they do not read the ANSI file correctly: special characters such as accented letters are replaced with "?".
Then please post your ANSI file as an attachment.

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Help reading an ANSI text file correctly...?
« Reply #4 on: February 04, 2016, 08:29:06 pm »
A simple WinCPToUtf8() maybe?

Bart

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Help reading an ANSI text file correctly...?
« Reply #5 on: February 04, 2016, 08:32:08 pm »
After reading, convert the text to UTF-8 using one of the conversion routines in LConvEncoding or LazUTF8. You must know the code page of the text. If it is encoded in the system's code page, you can try:
Code: Pascal
// read the file into the lines of the memo here. Then:

  Memo1.Lines.Text := AnsiToUTF8(Memo1.Lines.Text);
// or, if your text uses, say, the Greek code page:
// Memo1.Lines.Text := CP1253ToUTF8(Memo1.Lines.Text)

Espectr0

  • Full Member
  • ***
  • Posts: 218
Re: Help reading an ANSI text file correctly...?
« Reply #6 on: February 04, 2016, 08:58:03 pm »
@wp, @Bart: I tried it, but it does not work for me :(
@GetMem: OK, I have attached the ANSI text file.

Any ideas?


Note: I'm using Lazarus 1.6RC2 and Windows 10.

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Help reading an ANSI text file correctly...?
« Reply #7 on: February 04, 2016, 09:28:24 pm »
This is working for me with Laz 1.4.4 / fpc 2.6.4 and Laz trunk / fpc 3.0 on Win 7:
Code: Pascal
procedure TForm1.Button2Click(Sender: TObject);
var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    // Load the raw bytes into a plain string list first,
    // then convert before handing the text to the memo.
    L.LoadFromFile('subansi.txt');
    Memo1.Lines.Text := ISO_8859_15ToUTF8(L.Text);
  finally
    L.Free;
  end;
end;

Espectr0

  • Full Member
  • ***
  • Posts: 218
Re: Help reading an ANSI text file correctly...?
« Reply #8 on: February 04, 2016, 09:50:49 pm »
Yes, it works!
Is there any chance to improve the first code so that it detects the encoding?



wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Help reading an ANSI text file correctly...?
« Reply #9 on: February 04, 2016, 10:24:37 pm »
No. There is nothing in an ordinary ANSI text file from which you could guess the code page. BTW, the standard CP1252 would have worked as well (via CP1252ToUTF8()).

I wonder why the conversion fails if the file is read directly into the Lines of the Memo:
Code: Pascal
  Memo1.Lines.LoadFromFile('subansi.txt');
  Memo1.Lines.Text := CP1252ToUTF8(Memo1.Lines.Text);  // --> incorrect conversion

And I also wonder why AnsiToUtf8 or SysToUtf8 are not working (neither on fpc 2.6.4 nor on fpc 3.0).

parcel

  • Full Member
  • ***
  • Posts: 143
Re: Help reading an ANSI text file correctly...?
« Reply #10 on: February 05, 2016, 02:01:38 am »

AnsiToUtf8 does not work properly under the UTF-8-enabled LCL.
Try the "DisableUTF8RTL" switch.


balazsszekely

  • Guest
Re: Help reading an ANSI text file correctly...?
« Reply #11 on: February 05, 2016, 11:01:12 am »
What about this? It should guess the encoding.
Code: Pascal
uses
  Classes, SysUtils, LConvEncoding;

// Reads the whole file as raw bytes and converts it to UTF-8,
// using LConvEncoding's best guess for the source encoding.
function FileToString(const AFileName: String): String;
var
  AStr: String;
  FS: TFileStream;
  FromEncoding, ToEncoding: String;
begin
  Result := '';
  AStr := '';
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    if FS.Size > 0 then
    begin
      SetLength(AStr, FS.Size div SizeOf(Char));
      FS.ReadBuffer(Pointer(AStr)^, FS.Size div SizeOf(Char));
    end;
  finally
    FS.Free;
  end;
  if Length(AStr) > 0 then
  begin
    FromEncoding := GuessEncoding(AStr);
    ToEncoding := EncodingUTF8;
    // Result stays empty if no encoding could be guessed.
    if FromEncoding <> '' then
      Result := ConvertEncoding(AStr, FromEncoding, ToEncoding);
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Memo1.Text := FileToString('c:\SubANSI.txt');
end;

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Help reading an ANSI text file correctly...?
« Reply #12 on: February 05, 2016, 11:08:52 am »
GuessEncoding is not able to distinguish between CP1252 and CP1250, for example.

balazsszekely

  • Guest
Re: Help reading an ANSI text file correctly...?
« Reply #13 on: February 05, 2016, 11:20:01 am »
Quote
@wp
GuessEncoding is not able to distinguish between CP1252 and CP1250, for example.
Yes, that's true. But at least it will detect a few encoding types. I don't know if there is a generic detection method that will work in all cases.

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Help reading an ANSI text file correctly...?
« Reply #14 on: February 05, 2016, 11:39:10 am »
Quote
@wp
GuessEncoding is not able to distinguish between CP1252 and CP1250, for example.
Yes, that's true. But at least it will detect a few encoding types. I don't know if there is a generic detection method that will work in all cases.
I could only imagine some complicated semantic analysis of byte combinations. In principle, a text file with ANSI characters is just a file of bytes, i.e. numbers between 0 and 255; there is no embedded meta information that would allow one to determine that, for example, character #192 should be interpreted as "À" (CP1252 - Windows Latin-1), "Ŕ" (CP1250 - Windows Latin-2) or "А" (CP1251 - Windows Cyrillic). (Example taken from https://msdn.microsoft.com/en-us/library/cc195054.aspx#)
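
To make the ambiguity concrete, here is a minimal sketch (not taken from anyone's post) that pushes the same single byte, 192, through three of the LConvEncoding converters and prints the resulting UTF-8 bytes in hex. It assumes a console project that has the LazUtils package as a dependency; HexBytes is a hypothetical helper introduced only for this illustration.

```pascal
program CodePageDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, LConvEncoding;

// Hypothetical helper: renders each byte of S as two hex digits.
function HexBytes(const S: String): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

var
  Raw: String;
begin
  // One raw byte with value 192 ($C0). Nothing in the byte itself
  // says which code page it belongs to.
  SetLength(Raw, 1);
  Raw[1] := Chr(192);

  WriteLn('CP1252: ', HexBytes(CP1252ToUTF8(Raw)));  // C3 80 = U+00C0
  WriteLn('CP1250: ', HexBytes(CP1250ToUTF8(Raw)));  // C5 94 = U+0154
  WriteLn('CP1251: ', HexBytes(CP1251ToUTF8(Raw)));  // D0 90 = U+0410
end.
```

One input byte yields three different UTF-8 sequences, so the choice of code page has to come from outside the file (the user, a setting, or a heuristic such as GuessEncoding).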

 
