Author Topic: Cannot read file contain Chinese word  (Read 294 times)

wytwyt02

  • New Member
  • Posts: 44
Cannot read file contain Chinese word
« on: November 13, 2019, 10:25:26 pm »
For example, I have a txt file containing Chinese words, and I read it with:

Code: Pascal
aStringList := TStringList.Create;
aStringList.LoadFromFile(AFile);
Content := aStringList.Text;
WriteLn(Content);

But the console prints nothing. Please see the attached txt file for testing.

jamie

  • Hero Member
  • Posts: 2174
Re: Cannot read file contain Chinese word
« Reply #1 on: November 13, 2019, 10:40:08 pm »
More info is needed.

Is Length(Content) = 0, or does it have a value other than 0?
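
One way to answer that question is to look at the raw bytes of the file. The following is only a sketch and not code from the thread: the program name, the helper `HexPreview` and the file name `text.txt` are made up for illustration. It reads the file through a stream so that no string-list or encoding logic touches the data, then prints the byte count and the first few bytes in hex.

```pascal
program InspectFile;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: returns the first MaxBytes bytes of S
// as space-separated two-digit hex values.
function HexPreview(const S: String; MaxBytes: Integer): String;
var
  I, N: Integer;
begin
  Result := '';
  N := Length(S);
  if N > MaxBytes then
    N := MaxBytes;
  for I := 1 to N do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

var
  Stream: TFileStream;
  Content: String;
begin
  // 'text.txt' is a placeholder file name for this sketch.
  Stream := TFileStream.Create('text.txt', fmOpenRead or fmShareDenyWrite);
  try
    // Copy the file bytes into the string unchanged.
    SetLength(Content, Stream.Size);
    if Length(Content) > 0 then
      Stream.ReadBuffer(Content[1], Length(Content));
  finally
    Stream.Free;
  end;
  WriteLn('Length(Content) = ', Length(Content));
  WriteLn('First bytes: ', HexPreview(Content, 16));
end.
```

A non-zero length together with bytes above $7F would suggest that the file was read correctly and that the problem lies in the encoding or in the console output.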
Number 1 at blue screen app creations!

winni

  • Hero Member
  • Posts: 609
Re: Cannot read file contain Chinese word
« Reply #2 on: November 13, 2019, 10:58:34 pm »
Hi!

I don't know anything about Chinese, but when I open the file with Kate it looks as if the first
char is broken. In the forum editor this text appears:

这是中文

8 bytes and 4 glyphs.

So it must have something to do with the settings of the console. Is it UTF-8 ready?

Winni
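
The "8 bytes and 4 glyphs" observation can be checked in code. The sketch below is an illustration under assumptions, not something posted in the thread: it hard-codes the eight bytes that "这是中文" occupies in GB2312/CP936 (two bytes per character), converts them with `CP936ToUTF8` from the LazUtils unit `LConvEncoding`, and counts the result with `UTF8Length` from `LazUTF8`. The program name and the constant `GBBytes` are made up. It needs the LazUtils package on the unit path.

```pascal
program ByteCountDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8, LConvEncoding;

const
  // "这是中文" in GB2312/CP936: two bytes per character.
  GBBytes: array[0..7] of Byte = ($D5, $E2, $CA, $C7, $D6, $D0, $CE, $C4);

var
  Raw, Utf8: String;
begin
  // Copy the raw bytes into a string without any conversion.
  SetLength(Raw, Length(GBBytes));
  Move(GBBytes[0], Raw[1], Length(GBBytes));

  Utf8 := CP936ToUTF8(Raw);

  WriteLn('CP936 bytes:      ', Length(Raw));
  WriteLn('UTF-8 bytes:      ', Length(Utf8));
  WriteLn('UTF-8 codepoints: ', UTF8Length(Utf8));
end.
```

Four Chinese characters take 12 bytes in UTF-8 (three each), so a file with 8 bytes for 4 glyphs is consistent with a double-byte encoding such as GB2312 rather than UTF-8.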


wp

  • Hero Member
  • Posts: 6502
Re: Cannot read file contain Chinese word
« Reply #3 on: November 13, 2019, 11:04:33 pm »
Notepad++ tells me that this file is an ANSI file. Playing with the encodings provided by Notepad++, I get "Chinese-looking" characters for the encoding "Chinese/GB2312". Sorry, I am European and do not know anything about this...

The problem seems to me to be that Lazarus does not support this encoding. So you can only continue with Lazarus if you make sure that the files are written directly in UTF-8, or you must convert the ANSI files to UTF-8, for example by loading them into Notepad++ and converting them to UTF-8. Then the Chinese files can be read without issues (see screenshot below):
Code: Pascal
procedure TForm1.Button2Click(Sender: TObject);
var
  L: TStrings;
begin
  L := TStringList.Create;
  try
    L.LoadFromFile('text-utf8.txt');
    Label2.Caption := L[0];
  finally
    L.Free;
  end;
end;
« Last Edit: November 13, 2019, 11:12:18 pm by wp »
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

winni

  • Hero Member
  • Posts: 609
Re: Cannot read file contain Chinese word
« Reply #4 on: November 13, 2019, 11:54:49 pm »
I was curious about the Chinese text.

Google Translate says:

这是中文    Zhè shì zhōngwén    This is Chinese

Good to know .....

jus

  • New Member
  • Posts: 19
Re: Cannot read file contain Chinese word
« Reply #5 on: November 14, 2019, 12:33:35 am »
As wp said, you have to do the conversion from the code page to UTF-8. To convert the GB2312 code page to UTF-8 in Lazarus I do the following: put a TMemo and a button on the form of a Lazarus project.

Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

uses LConvEncoding;

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  Stream: TStream;
  Size: Integer;
  StrIn: RawByteString;
  StrOut: String;
begin
  Stream := TFileStream.Create('text.txt', fmOpenRead);
  try
    // Read the raw file bytes without any conversion.
    Size := Stream.Size;
    SetLength(StrIn, Size);
    if Size > 0 then
      Stream.ReadBuffer(StrIn[1], Size);
    // CP936 is the Windows code page that covers GB2312.
    StrOut := CP936ToUTF8(StrIn);
    Memo1.Text := StrOut;
  finally
    Stream.Free;
  end;
end;

end.

jus
« Last Edit: November 14, 2019, 12:54:37 am by jus »
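
The same conversion can also be applied once to the file on disk, which is the programmatic counterpart of converting it in Notepad++. This is a minimal sketch under assumptions: the program name and the procedure `ConvertCP936FileToUTF8` are hypothetical, the file names are placeholders borrowed from the posts above, and the input is assumed to really be CP936.

```pascal
program ConvertGBFile;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

// Hypothetical helper: reads SrcName as CP936 bytes and
// writes the UTF-8 equivalent to DstName.
procedure ConvertCP936FileToUTF8(const SrcName, DstName: String);
var
  Src, Dst: TFileStream;
  Raw, Utf8: String;
begin
  Src := TFileStream.Create(SrcName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Src.Size);
    if Length(Raw) > 0 then
      Src.ReadBuffer(Raw[1], Length(Raw));
  finally
    Src.Free;
  end;

  Utf8 := CP936ToUTF8(Raw);

  Dst := TFileStream.Create(DstName, fmCreate);
  try
    if Length(Utf8) > 0 then
      Dst.WriteBuffer(Utf8[1], Length(Utf8));
  finally
    Dst.Free;
  end;
end;

begin
  // Both file names are placeholders for this sketch.
  ConvertCP936FileToUTF8('text.txt', 'text-utf8.txt');
end.
```

After such a one-off conversion, a plain `LoadFromFile` on the new file should behave like the UTF-8 example earlier in the thread.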

wytwyt02

  • New Member
  • Posts: 44
Re: Cannot read file contain Chinese word
« Reply #6 on: November 14, 2019, 09:50:28 pm »
Thanks, everyone. I found a better way:

Code: Pascal
uses ..., LConvEncoding, ...;

Content := ConvertEncoding(aStringList.Text, GuessEncoding(aStringList.Text), EncodingUTF8);
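
A caveat on relying on `GuessEncoding` alone: as far as I understand it, the function looks for a byte order mark and for valid UTF-8, and otherwise falls back to the system's default text encoding rather than detecting GB2312 from the content. The one-liner therefore presumably depends on the machine's default code page being CP936. The sketch below is one possible variant and not the poster's code. It assumes the input is either UTF-8 or CP936, and the names `BytesToUTF8` and `LoadTextAsUTF8` are hypothetical.

```pascal
program LoadWithFallback;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

// Hypothetical helper: keeps UTF-8 input (stripping a BOM if present)
// and treats everything else as CP936. That second step is an
// assumption of this sketch, not something GuessEncoding reports.
function BytesToUTF8(const Raw: String): String;
var
  Enc: String;
begin
  Enc := GuessEncoding(Raw);
  if (Enc = EncodingUTF8) or (Enc = EncodingUTF8BOM) then
    Result := ConvertEncoding(Raw, Enc, EncodingUTF8)
  else
    Result := CP936ToUTF8(Raw);
end;

// Hypothetical helper: reads the file bytes through a stream, so no
// string-list encoding logic is involved, then converts them.
function LoadTextAsUTF8(const FileName: String): String;
var
  Stream: TFileStream;
  Raw: String;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free;
  end;
  Result := BytesToUTF8(Raw);
end;

begin
  // 'text.txt' is a placeholder file name for this sketch.
  WriteLn('UTF-8 length: ', Length(LoadTextAsUTF8('text.txt')));
end.
```

Hard-coding the fallback trades generality for predictability: the result no longer depends on the locale of the machine running the program, but files in any third encoding would be converted incorrectly.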