Author Topic: Korean text into utf8  (Read 1316 times)

BIT

  • Full Member
  • ***
  • Posts: 119
Korean text into utf8
« on: September 15, 2021, 04:45:22 pm »
Please tell me how to convert Korean text to UTF-8.

Alextp

  • Hero Member
  • *****
  • Posts: 1404
    • UVviewsoft
Re: Korean text into utf8
« Reply #1 on: September 15, 2021, 05:43:36 pm »
Install CudaText. Open your text files containing Korean text. Choose the Korean code page in the status bar: "Encoding / Reload as / <encoding>".
Then choose "File / Encoding / Convert to / UTF8". Save the file.

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #2 on: September 15, 2021, 06:26:09 pm »
I want to open a Korean file in SynEdit

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #3 on: September 15, 2021, 06:41:34 pm »
The problem I have is this:
The file itself is encoded in EUC-KR.
When I open it in SynEdit, I convert it with WinCPToUTF8 (for Russian language support).
When I save it with UTF8ToWinCP, the Korean text turns into ??????.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2712
    • havefunsoft.com
Re: Korean text into utf8
« Reply #4 on: September 15, 2021, 07:14:32 pm »
Quote from: BIT
When opened in SynEdit, I convert to WinCPToUTF8, (for Russian language support)

This is not necessarily Russian language support; it is support for your Windows code page.

If you need a more reliable way of converting, you may need to specify the code page explicitly.
Here's an example using the FPC charset conversion tools.
Code: Pascal
uses
  .. cp949, charset; // cp949 is the Korean code page; charset provides the routines that convert characters using the "cp949" map

procedure TForm1.FormCreate(Sender: TObject);
var
  s: string;
  u: WideString;
  f: Text;
  r: integer;
begin
  AssignFile(f, 'input.txt');  Reset(f); // this is just to load characters from the file; the solution may vary
  ReadLn(f, s);
  CloseFile(f);

  SetLength(u, Length(s)); // the result never has more characters than the input has bytes
  r := getunicode(PChar(s), Length(s), getmap(949), tunicodestring(@u[1])); // from "charset"
  SetLength(u, r);

  SynEdit1.Text := UTF8Encode(u);
end;

Instead of dealing with the GetUnicode() function directly, one might use the "fpwidestring" widestring manager. But I'm not sure whether it's friendly with LCL.

winni

  • Hero Member
  • *****
  • Posts: 2703
Re: Korean text into utf8
« Reply #5 on: September 15, 2021, 07:19:02 pm »
Hi!


Convert your Korean text to UTF8 with a little bit of Lazarus:

Code: Pascal
uses ........LConvEncoding;
...
procedure TForm1.Button4Click(Sender: TObject);
var
  sl1, sl2: TStringList;
  s: string;
begin
  sl1 := TStringList.Create;
  sl2 := TStringList.Create;
  try
    sl1.LoadFromFile('Korean.txt');   // file encoded in CP949
    s := CP949ToUTF8(sl1.Text);       // convert the whole text to UTF-8
    sl2.Text := s;
    sl2.SaveToFile('KoreanUTF8.txt');
  finally
    // free both lists even if loading or saving raises an exception
    sl1.Free;
    sl2.Free;
  end;
end;

Winni
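
For the round trip described earlier in the thread (load an EUC-KR file, edit it as UTF-8, save it back), the same unit can be used in both directions. The following is only a sketch under some assumptions: LoadCP949AsUTF8 and SaveUTF8AsCP949 are hypothetical helper names, the file names are placeholders, and the project is assumed to have LazUtils (which provides LConvEncoding) on its unit path. The point is to name the code page explicitly in both directions with CP949ToUTF8 and UTF8ToCP949 rather than going through the system code page.

```pascal
program Cp949RoundTrip;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding; // LConvEncoding is part of LazUtils

// Hypothetical helper: read the raw bytes of a CP949 file and return UTF-8.
function LoadCP949AsUTF8(const FileName: string): string;
var
  ms: TMemoryStream;
  raw: string;
  n: Integer;
begin
  raw := '';
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile(FileName);
    ms.Position := 0;
    n := ms.Size;
    SetLength(raw, n);
    if n > 0 then
      ms.ReadBuffer(raw[1], n);      // copy the bytes unchanged
  finally
    ms.Free;
  end;
  Result := CP949ToUTF8(raw);        // explicit code page, not the system one
end;

// Hypothetical helper: convert UTF-8 text back to CP949 and write the bytes.
procedure SaveUTF8AsCP949(const UTF8Text, FileName: string);
var
  fs: TFileStream;
  raw: string;
begin
  raw := UTF8ToCP949(UTF8Text);
  fs := TFileStream.Create(FileName, fmCreate);
  try
    if raw <> '' then
      fs.WriteBuffer(raw[1], Length(raw));
  finally
    fs.Free;
  end;
end;

begin
  // Assumes 'Korean.txt' exists and is encoded in CP949 (placeholder name).
  SaveUTF8AsCP949(LoadCP949AsUTF8('Korean.txt'), 'KoreanCopy.txt');
end.
```

In a form, the loaded string could be assigned to SynEdit1.Text, and SynEdit1.Text passed to the save helper when writing the file back.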


 

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #6 on: September 15, 2021, 08:01:03 pm »
Thanks it worked!  :D

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #7 on: September 15, 2021, 08:08:28 pm »
@skalogryz: Thank you for your answer too!

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #8 on: September 15, 2021, 08:19:24 pm »
May I ask one more question?
How can I automatically detect a file's encoding and convert it to the desired encoding?

winni

  • Hero Member
  • *****
  • Posts: 2703
Re: Korean text into utf8
« Reply #9 on: September 16, 2021, 12:27:03 am »
Hi!

I don't know if it helps with Korean, but the unit LConvEncoding also has this function:

Code: Pascal
function GuessEncoding(const s: string): string;


Winni

BIT

  • Full Member
  • ***
  • Posts: 119
Re: Korean text into utf8
« Reply #10 on: September 16, 2021, 08:25:18 am »
I checked: GuessEncoding does not detect Korean.

egsuh

  • Hero Member
  • *****
  • Posts: 855
Re: Korean text into utf8
« Reply #11 on: September 16, 2021, 11:13:36 am »
If you do not have to convert it with your own program, you can change the encoding with Notepad++ or a similar editor.

wp

  • Hero Member
  • *****
  • Posts: 8895
Re: Korean text into utf8
« Reply #12 on: September 16, 2021, 11:39:23 am »
GuessEncoding only detects UTF16 with/without BOM or UTF8 with BOM. UTF8 without BOM and ANSI/code-page encoding can safely be distinguished only by a language analysis of the following text - there was a forum discussion about codepage detection recently: https://forum.lazarus.freepascal.org/index.php/topic,45307.msg320403.html#msg320403
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...
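
As a rough illustration of the distinction described above, here is a sketch of a byte-level check. LooksLikeUTF8 is a hypothetical helper written for this example; it is not part of LConvEncoding. It only tests whether the bytes form structurally valid UTF-8. Text that fails the check cannot be UTF-8, but the check says nothing about which code page it is in, so a fallback such as CP949 remains an assumption you have to make yourself. Pure ASCII text also passes, since ASCII is valid UTF-8. The check is simplified and does not reject every overlong or surrogate form.

```pascal
program DetectUtf8;
{$mode objfpc}{$H+}

// Returns True if s consists only of structurally valid UTF-8 sequences.
function LooksLikeUTF8(const s: string): Boolean;
var
  i, n, need: Integer;
  b: Byte;
begin
  Result := False;
  i := 1;
  n := Length(s);
  while i <= n do
  begin
    b := Ord(s[i]);
    if b < $80 then
      need := 0                                   // ASCII
    else if (b >= $C2) and (b <= $DF) then
      need := 1                                   // lead byte of a 2-byte sequence
    else if (b >= $E0) and (b <= $EF) then
      need := 2                                   // lead byte of a 3-byte sequence
    else if (b >= $F0) and (b <= $F4) then
      need := 3                                   // lead byte of a 4-byte sequence
    else
      Exit;                                       // stray continuation or invalid lead byte
    Inc(i);
    while need > 0 do
    begin
      // every continuation byte must look like 10xxxxxx
      if (i > n) or ((Ord(s[i]) and $C0) <> $80) then
        Exit;
      Inc(i);
      Dec(need);
    end;
  end;
  Result := True;
end;

begin
  // The Hangul syllable U+D55C is ED 95 9C in UTF-8 and C7 D1 in EUC-KR/CP949.
  WriteLn('UTF-8 bytes accepted: ', LooksLikeUTF8(#$ED#$95#$9C));
  WriteLn('CP949 bytes accepted: ', LooksLikeUTF8(#$C7#$D1));
end.
```

A caller could treat the file as UTF-8 when the check passes and otherwise convert it from a code page chosen by the user, for example with CP949ToUTF8.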

 
