Lazarus

Topic started by: wytwyt02 on November 13, 2019, 10:25:26 pm

Title: Cannot read a file containing Chinese words
Post by: wytwyt02 on November 13, 2019, 10:25:26 pm
For example, I have a .txt file containing Chinese words, and I read it with:

Code: Pascal
  aStringList := TStringList.Create;
  try
    aStringList.LoadFromFile(AFile);  // AFile: path to the Chinese .txt file
    Content := aStringList.Text;
    WriteLn(Content);
  finally
    aStringList.Free;  // release the list even if loading fails
  end;

But the console prints nothing. Please see the attached .txt file below for a test.
Title: Re: Cannot read a file containing Chinese words
Post by: jamie on November 13, 2019, 10:40:08 pm
More info is needed.

Is Length(Content) = 0, or does it have a value other than 0?
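One way to answer jamie's question, and to see what is actually in the file, is to dump its raw bytes. The following is a minimal sketch in plain Free Pascal; the program name, the helper IsValidUtf8 and taking the file name from the first command-line parameter are illustrative assumptions, not something from the thread. The UTF-8 check is structural only (it does not reject overlong forms).

```pascal
program DumpFileBytes;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper. Structural UTF-8 check: every lead byte must be
// followed by the right number of 10xxxxxx continuation bytes.
// Overlong forms and surrogates are not rejected.
function IsValidUtf8(const S: RawByteString): Boolean;
var
  I, K, N: Integer;
  B: Byte;
begin
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
      N := 0                        // ASCII
    else if (B and $E0) = $C0 then
      N := 1                        // 2-byte sequence
    else if (B and $F0) = $E0 then
      N := 2                        // 3-byte sequence (most CJK characters)
    else if (B and $F8) = $F0 then
      N := 3                        // 4-byte sequence
    else
      Exit(False);                  // stray continuation or invalid byte
    if I + N > Length(S) then
      Exit(False);                  // sequence cut off at the end
    for K := 1 to N do
      if (Ord(S[I + K]) and $C0) <> $80 then
        Exit(False);                // expected a continuation byte
    Inc(I, N + 1);
  end;
  Result := True;
end;

var
  Stream: TFileStream;
  Data: RawByteString;
  I: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: DumpFileBytes <file>');
    Halt(1);
  end;
  // Read the file as raw bytes, without any code page conversion.
  Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Data, Stream.Size);
    if Length(Data) > 0 then
      Stream.ReadBuffer(Data[1], Length(Data));
  finally
    Stream.Free;
  end;
  WriteLn('Length: ', Length(Data), ' bytes');
  // Hex dump of at most the first 64 bytes.
  for I := 1 to Length(Data) do
  begin
    if I > 64 then
      Break;
    Write(IntToHex(Ord(Data[I]), 2), ' ');
  end;
  WriteLn;
  WriteLn('Valid UTF-8: ', IsValidUtf8(Data));
end.
```

For a GB2312 file one would expect two bytes per Chinese character in the dump and, most likely, `Valid UTF-8: FALSE`.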
Title: Re: Cannot read a file containing Chinese words
Post by: winni on November 13, 2019, 10:58:34 pm
Hi!

I don't know anything about Chinese, but when I open it with Kate, it looks as if the first
char is broken. In the forum editor, these chars appear:

这是中文

8 bytes and 4 glyphs.

So it must have something to do with the console settings. Is the console UTF-8-ready?

Winni
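winni's byte count is itself a strong hint: four glyphs in eight bytes means two bytes per character, which matches a double-byte encoding such as GB2312, while UTF-8 needs three bytes for each of these characters. The sketch below (plain Free Pascal; the helper Utf8CodePointCount and the constant names are illustrative) makes the arithmetic concrete using the byte values of 这是中文 in both encodings.

```pascal
program CountBytesAndGlyphs;

{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in a UTF-8 string by counting
// every byte that is not a continuation byte (10xxxxxx).
// Assumes S is valid UTF-8.
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // "这是中文" in UTF-8: three bytes per character.
  Utf8Text = #$E8#$BF#$99#$E6#$98#$AF#$E4#$B8#$AD#$E6#$96#$87;
  // The same text in GB2312: two bytes per character.
  Gb2312Text = #$D5#$E2#$CA#$C7#$D6#$D0#$CE#$C4;

begin
  WriteLn('UTF-8 : ', Length(Utf8Text), ' bytes, ',
    Utf8CodePointCount(Utf8Text), ' characters');
  WriteLn('GB2312: ', Length(Gb2312Text), ' bytes');
end.
```

Run as is, it should report 12 bytes and 4 characters for UTF-8 and 8 bytes for GB2312. In a Lazarus project, UTF8Length from the LazUTF8 unit gives the same character count for a UTF-8 string.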

Title: Re: Cannot read a file containing Chinese words
Post by: wp on November 13, 2019, 11:04:33 pm
Notepad++ tells me that this file is an ANSI file. Playing with the encodings offered by Notepad++, I get "Chinese-looking" characters with the encoding "Chinese/GB2312". Sorry, I am European and do not know anything about this...

It seems to me that the problem is that Lazarus does not support this encoding. So you can only continue with Lazarus if you make sure that the files are written directly in UTF-8, or if you convert the ANSI files to UTF-8, for example by loading them into Notepad++ and converting them to UTF-8 there. Then the Chinese files can be read without issues (see screenshot below):
Code: Pascal
  procedure TForm1.Button2Click(Sender: TObject);
  var
    L: TStrings;
  begin
    L := TStringList.Create;
    try
      L.LoadFromFile('text-utf8.txt');
      Label2.Caption := L[0];
    finally
      L.Free;
    end;
  end;
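wp's suggestion to convert the files to UTF-8 once can also be done in code instead of in Notepad++. A minimal sketch, assuming the source file is GB2312 and decoding it with CP936ToUTF8 from Lazarus's LConvEncoding unit (CP936/GBK is a superset of GB2312; jus uses the same function further down this thread). The helpers ReadRawFile and WriteRawFile and the file names are illustrative; compiling outside the IDE needs the LazUtils package.

```pascal
program ConvertGb2312ToUtf8;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding; // LConvEncoding is part of LazUtils

// Hypothetical helper: returns the file's bytes unchanged, with no
// code page interpretation.
function ReadRawFile(const FileName: string): string;
var
  Stream: TFileStream;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Stream.Size);
    if Length(Result) > 0 then
      Stream.ReadBuffer(Result[1], Length(Result));
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: writes the bytes of S to a new file unchanged.
procedure WriteRawFile(const FileName, S: string);
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(S) > 0 then
      Stream.WriteBuffer(S[1], Length(S));
  finally
    Stream.Free;
  end;
end;

begin
  // Assumption: text.txt holds GB2312 text; the result is written as UTF-8.
  WriteRawFile('text-utf8.txt', CP936ToUTF8(ReadRawFile('text.txt')));
end.
```

After a run, wp's LoadFromFile('text-utf8.txt') code above should read the converted text directly.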
Title: Re: Cannot read a file containing Chinese words
Post by: winni on November 13, 2019, 11:54:49 pm
I was curious about the Chinese text.

Google Translate says:

这是中文    Zhè shì zhōngwén    This is Chinese

Good to know...
Title: Re: Cannot read a file containing Chinese words
Post by: jus on November 14, 2019, 12:33:35 am
As wp said, you have to convert the text from its code page to UTF-8. To convert GB2312 text to UTF-8 in Lazarus, I do the following: put a TMemo and a TButton on the form. CP936 (GBK) is a superset of GB2312, so the CP936ToUTF8 function from LConvEncoding decodes GB2312 text as well.

Code: Pascal
  unit Unit1;

  {$mode objfpc}{$H+}

  interface

  uses
    Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls;

  type

    { TForm1 }

    TForm1 = class(TForm)
      Button1: TButton;
      Memo1: TMemo;
      procedure Button1Click(Sender: TObject);
    private

    public

    end;

  var
    Form1: TForm1;

  implementation

  uses
    LConvEncoding;  // provides CP936ToUTF8

  {$R *.lfm}

  { TForm1 }

  procedure TForm1.Button1Click(Sender: TObject);
  var
    Stream: TStream;
    Size: Integer;
    StrIn: RawByteString;
    StrOut: String;
  begin
    // Read the raw GB2312/CP936 bytes without any code page conversion.
    Stream := TFileStream.Create('text.txt', fmOpenRead or fmShareDenyWrite);
    try
      Size := Stream.Size;
      SetLength(StrIn, Size);
      if Size > 0 then  // StrIn[1] does not exist for an empty file
        Stream.ReadBuffer(StrIn[1], Size);
      // Decode the CP936 bytes into a UTF-8 string for the LCL.
      StrOut := CP936ToUTF8(StrIn);
      Memo1.Text := StrOut;
    finally
      Stream.Free;
    end;
  end;

  end.

jus
Title: Re: Cannot read a file containing Chinese words
Post by: wytwyt02 on November 14, 2019, 09:50:28 pm
Thanks, everyone. I found a better way:

Code: Pascal
  uses ..., LConvEncoding, ...;  // LConvEncoding provides ConvertEncoding and GuessEncoding

  Content := ConvertEncoding(aStringList.Text, GuessEncoding(aStringList.Text), EncodingUTF8);
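For completeness, here is a self-contained console sketch of that approach. A hedged note: as far as I can tell from LConvEncoding, GuessEncoding recognizes BOMs and valid UTF-8, and for other input falls back to a default derived from the system's code page. So the guess is likely to be right for GB2312 files mainly on a system whose ANSI code page is 936 (Simplified Chinese Windows); when the encoding is known in advance, passing EncodingCP936 explicitly is more predictable. Taking the file name from the command line is an assumption of this sketch.

```pascal
program ShowChineseFile;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding; // LConvEncoding is part of LazUtils

var
  aStringList: TStringList;
  Raw, Content: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ShowChineseFile <file>');
    Halt(1);
  end;
  aStringList := TStringList.Create;
  try
    aStringList.LoadFromFile(ParamStr(1));
    Raw := aStringList.Text;
    // Guess the source encoding (BOM, UTF-8 validity, else a system default).
    Content := ConvertEncoding(Raw, GuessEncoding(Raw), EncodingUTF8);
    // If the file is known to be GB2312/GBK, this is more predictable:
    //   Content := ConvertEncoding(Raw, EncodingCP936, EncodingUTF8);
    WriteLn('Length: ', Length(Content), ' bytes');
    WriteLn(Content);
  finally
    aStringList.Free;
  end;
end.
```

Printing the length as well makes it easy to tell an empty result (jamie's question) from a display problem in the console.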