Lazarus

Free Pascal => Beginners => Topic started by: BIT on September 15, 2021, 04:45:22 pm

Title: Korean text into utf8
Post by: BIT on September 15, 2021, 04:45:22 pm
Please tell me how to recode Korean text into UTF-8.
Title: Re: Korean text into utf8
Post by: AlexTP on September 15, 2021, 05:43:36 pm
Install CudaText. Open your Korean text files. Choose the Korean code page from the status bar: "Encoding / Reload as / <encoding>".
Then choose "File / Encoding / Convert to / UTF8". Save the file.
Title: Re: Korean text into utf8
Post by: BIT on September 15, 2021, 06:26:09 pm
I want to open a Korean file in SynEdit
Title: Re: Korean text into utf8
Post by: BIT on September 15, 2021, 06:41:34 pm
The problem I have is this:
The file itself is encoded in EUC-KR.
When opening it in SynEdit, I convert it with WinCPToUTF8 (for Russian language support).
When saving, I convert it back with UTF8ToWinCP, and the Korean text turns into ??????.
Title: Re: Korean text into utf8
Post by: skalogryz on September 15, 2021, 07:14:32 pm
Quote from BIT: "When opened in SynEdit, I convert to WinCPToUTF8, (for Russian language support)"
This is not necessarily Russian language support; it is support for your Windows code page.

If you need a more reliable way of converting, you might need to specify the code page explicitly.
Here's an example that uses the FPC charset conversion tools.
Code: Pascal
uses
  .. cp949, charset; // cp949 is the Korean code page; charset is the unit that provides the conversion routines, based on the "cp949" map

procedure TForm1.FormCreate(Sender: TObject);
var
  s: string;
  u: WideString;
  f: Text;
  r: integer;
begin
  // This just loads characters from the file. The solution may vary.
  AssignFile(f, 'input.txt');
  Reset(f);
  ReadLn(f, s);
  CloseFile(f);

  // Reserve one UTF-16 unit per input byte, convert, then trim to the actual length.
  SetLength(u, Length(s));
  r := getunicode(PChar(s), Length(s), getmap(949), tunicodestring(@u[1])); // from "charset"
  SetLength(u, r);

  SynEdit1.Text := UTF8Encode(u);
end;

Instead of dealing with the GetUnicode() function directly, one might use the "fpwidestring" WideString manager, but I'm not sure whether it works well with LCL.
Title: Re: Korean text into utf8
Post by: winni on September 15, 2021, 07:19:02 pm
Hi!


Convert your Korean text to UTF8 with a little bit of Lazarus:

Code: Pascal
uses ........LConvEncoding;
...
procedure TForm1.Button4Click(Sender: TObject);
var
  sl1, sl2: TStringList;
  s: string;
begin
  sl1 := TStringList.Create;
  sl2 := TStringList.Create;
  try
    sl1.LoadFromFile('Korean.txt');    // load the CP949-encoded file
    s := CP949ToUTF8(sl1.Text);        // convert its text to UTF-8
    sl2.Text := s;
    sl2.SaveToFile('KoreanUTF8.txt');  // write the UTF-8 result
  finally
    sl1.Free;
    sl2.Free;
  end;
end;

Winni
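
The open/save round trip described earlier in the thread can probably be handled the same way. UTF8ToWinCP targets the system's ANSI code page, and if that code page has no Hangul (a Cyrillic one, for example), the Korean characters cannot be represented and come out as question marks. A minimal sketch, assuming the file really is CP949/EUC-KR and that the LazUtils package (which provides LConvEncoding) is a project dependency; the unit name and the two helper procedures are hypothetical, not part of any library:

```pascal
unit KoreanFileIO; // hypothetical unit name, for illustration only

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils;

// Reads a CP949 file and stores its content as UTF-8 in Target.
procedure LoadCP949File(const FileName: string; Target: TStrings);
// Converts the UTF-8 content of Source back to CP949 and writes it out.
procedure SaveCP949File(const FileName: string; Source: TStrings);

implementation

uses
  LConvEncoding; // CP949ToUTF8 and UTF8ToCP949

procedure LoadCP949File(const FileName: string; Target: TStrings);
var
  fs: TFileStream;
  raw: string;
begin
  raw := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read the file byte for byte, without any implicit conversion.
    SetLength(raw, fs.Size);
    if Length(raw) > 0 then
      fs.ReadBuffer(raw[1], Length(raw));
  finally
    fs.Free;
  end;
  Target.Text := CP949ToUTF8(raw);
end;

procedure SaveCP949File(const FileName: string; Source: TStrings);
var
  fs: TFileStream;
  bytes: RawByteString; // holds CP949 bytes, must not be re-encoded
begin
  bytes := UTF8ToCP949(Source.Text);
  fs := TFileStream.Create(FileName, fmCreate);
  try
    if Length(bytes) > 0 then
      fs.WriteBuffer(bytes[1], Length(bytes));
  finally
    fs.Free;
  end;
end;

end.
```

With a SynEdit on the form, usage would be along the lines of LoadCP949File('Korean.txt', SynEdit1.Lines) when opening and SaveCP949File('Korean.txt', SynEdit1.Lines) when saving. Characters that do not exist in CP949 cannot survive the save step, so saving as UTF-8 may be the safer choice if the file mixes several scripts.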


 
Title: Re: Korean text into utf8
Post by: BIT on September 15, 2021, 08:01:03 pm
Thanks it worked!  :D
Title: Re: Korean text into utf8
Post by: BIT on September 15, 2021, 08:08:28 pm
skalogryz, thank you for your answer too!
Title: Re: Korean text into utf8
Post by: BIT on September 15, 2021, 08:19:24 pm
May I ask one more question?
How can I automatically detect a file's encoding and convert it to the desired encoding?
Title: Re: Korean text into utf8
Post by: winni on September 16, 2021, 12:27:03 am
Hi!

I don't know if it helps with Korean, but the unit LConvEncoding also has this function:

Code: Pascal
function GuessEncoding(const s: string): string;


Winni
Title: Re: Korean text into utf8
Post by: BIT on September 16, 2021, 08:25:18 am
I checked: it does not detect Korean.
Title: Re: Korean text into utf8
Post by: egsuh on September 16, 2021, 11:13:36 am
If you do not have to convert it with your own program, you can change the encoding with Notepad++ or a similar editor.
Title: Re: Korean text into utf8
Post by: wp on September 16, 2021, 11:39:23 am
GuessEncoding only detects UTF16 with/without BOM or UTF8 with BOM. UTF8 without BOM and ANSI/code-page encodings can be reliably distinguished only by a language analysis of the text that follows. There was a forum discussion about code page detection recently: https://forum.lazarus.freepascal.org/index.php/topic,45307.msg320403.html#msg320403
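
Since the code page cannot be detected reliably, one workable compromise is to trust GuessEncoding only when it reports a Unicode encoding and otherwise fall back to a code page that the caller supplies. The following is a minimal sketch under that assumption, not a real detector: the program name, the function LoadFileAsUTF8 and its FallbackEncoding parameter are hypothetical, and the 'cp949' fallback is simply a guess that suits the Korean files in this thread. It again assumes the LazUtils package is available.

```pascal
program GuessAndConvert; // hypothetical example program

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding; // LConvEncoding is part of LazUtils

// Returns the file content as UTF-8. FallbackEncoding (for example 'cp949')
// is an assumption made by the caller; it is used whenever GuessEncoding
// reports something other than a Unicode encoding.
function LoadFileAsUTF8(const FileName, FallbackEncoding: string): string;
var
  fs: TFileStream;
  raw, enc: string;
begin
  raw := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read the raw bytes so that nothing is converted behind our back.
    SetLength(raw, fs.Size);
    if Length(raw) > 0 then
      fs.ReadBuffer(raw[1], Length(raw));
  finally
    fs.Free;
  end;

  enc := GuessEncoding(raw);
  // Anything that is not reported as Unicode is treated as "unknown".
  if (enc <> EncodingUTF8) and (enc <> EncodingUTF8BOM) and
     (enc <> EncodingUCS2LE) and (enc <> EncodingUCS2BE) then
    enc := FallbackEncoding;

  Result := ConvertEncoding(raw, enc, EncodingUTF8);
end;

var
  utf8Text: string;
  outStream: TFileStream;
begin
  // Usage: GuessAndConvert <input file> <output file>
  if ParamCount = 2 then
  begin
    utf8Text := LoadFileAsUTF8(ParamStr(1), 'cp949');
    outStream := TFileStream.Create(ParamStr(2), fmCreate);
    try
      if Length(utf8Text) > 0 then
        outStream.WriteBuffer(utf8Text[1], Length(utf8Text));
    finally
      outStream.Free;
    end;
  end;
end.
```

The fallback is only as good as the caller's knowledge of where the files come from. For files of truly unknown origin, the language-analysis approach from the linked discussion, or letting the user pick the encoding, is still needed.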