
Author Topic: Using 'Tools->Convert encoding of projects' option  (Read 11143 times)

btr0001

Using 'Tools->Convert encoding of projects' option
« on: November 03, 2011, 01:50:42 pm »
Hi!
I don't understand how to choose a project in the 'Convert encoding' form. The dropdown list shows some standard packages (ChmHelpPkg and others). I tried to enter the path to my project file, but no files were found.
Please help!

howardpc

Re: Using 'Tools->Convert encoding of projects' option
« Reply #1 on: November 03, 2011, 05:47:38 pm »
You have to open the project first in the IDE (or if it is a new project, save it with a specific name somewhere).
Then you'll see the first option in the dropdown list is "Current Project", before the list of packages.
The dialog is not designed to let you navigate to as-yet unopened projects. You are restricted to the (one) project currently open in the IDE.

btr0001

Re: Using 'Tools->Convert encoding of projects' option
« Reply #2 on: November 09, 2011, 11:11:26 am »
OK, here is what I did.
First I created a new project in Delphi with a single label on the form. The label has a caption in Cyrillic. Then I converted this project using 'Tools->Convert Delphi project to Lazarus project'. Everything was done correctly and the caption was converted to UTF-8.
Next I created another project with no caption on the label but with an additional button. When I press the button, Cyrillic text is assigned to the label, so the Cyrillic text is now stored in the unit1.pas file instead of Form1.lfm. In this case the Cyrillic text was not converted to UTF-8. Then I used 'Tools->Convert encoding of projects' and pressed 'update preview'. The 'Encoding' column showed that my file has the ISO-8859-1 code page, but it is actually cp1251. After that I searched 'Environment->Options' for a way to set the code page, but did not find any place to do this.
What should I do now?

JuhaManninen

Re: Using 'Tools->Convert encoding of projects' option
« Reply #3 on: November 09, 2011, 01:51:56 pm »
Could you attach a sample project?

Juha

btr0001

Re: Using 'Tools->Convert encoding of projects' option
« Reply #4 on: November 09, 2011, 02:52:08 pm »
Attached project

JuhaManninen

Re: Using 'Tools->Convert encoding of projects' option
« Reply #5 on: November 10, 2011, 10:46:20 am »
Ok
Your .pas file encoding is correct. You need to encode the string literal explicitly.
This is the original assignment :
  Label1.Caption:='Ïðèâ³ò!';

I thought this would do the job but it doesn't :
  Label1.Caption:=UTF8Encode('Ïðèâ³ò!');

Maybe someone knows the right way.

Juha

btr0001

Re: Using 'Tools->Convert encoding of projects' option
« Reply #6 on: November 10, 2011, 03:08:46 pm »
I think you are mistaken.
The original assignment is
  Label1.Caption:='Привіт!';
If you use Unicode in your browser you can see the difference.
Since a *.pas file is a plain text file, it does not contain any information about its code page. I don't understand how you can know that my file encoding is correct.
In fact this file was created under MS Windows with cp1251, but I don't know how Lazarus could know that. Another fact is that the first program (attached) was converted successfully. In that case the caption of Label1 was stored in the Unit1.dfm file as codes of cp1251 symbols:

Caption = #1055#1088#1080#1074#1110#1090'!'
Font.Charset = DEFAULT_CHARSET

I also cannot find any other data in this Delphi project about what DEFAULT_CHARSET is (it is cp1251), but Lazarus knows this. How?

ludob

Re: Using 'Tools->Convert encoding of projects' option
« Reply #7 on: November 10, 2011, 03:50:20 pm »
When the file doesn't include any encoding info, the convert encoding tool uses the system default encoding. On Windows this is the code page returned by the Windows API function GetACP(). As a result you can only use the convert tool on a machine that uses cp1251 as its default encoding. The machine you are using is apparently configured as ISO-8859-1.
On my machine I also read "Label1.Caption:='Ïðèâ³ò!';", but my default Windows code page is cp1252, so it is normal that it doesn't show Cyrillic characters.
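
To illustrate the point above, here is a minimal Python sketch (not Lazarus code; the helper names `save_as` and `view_as` are made up for this example). It assumes the file was saved as cp1251 and shows that the very same bytes read as 'Ïðèâ³ò!' on a cp1252 machine and as 'Привіт!' on a cp1251 one.

```python
# Sketch: the same bytes, decoded with two different single-byte code pages.
ORIGINAL_TEXT = "Привіт!"


def save_as(text: str, code_page: str) -> bytes:
    """Return the bytes an editor would write when saving text in code_page."""
    return text.encode(code_page)


def view_as(raw: bytes, code_page: str) -> str:
    """Return what a viewer shows when it assumes raw is in code_page."""
    return raw.decode(code_page)


raw = save_as(ORIGINAL_TEXT, "cp1251")   # bytes: cf f0 e8 e2 b3 f2 21
seen_on_cp1252 = view_as(raw, "cp1252")  # 'Ïðèâ³ò!'
seen_on_cp1251 = view_as(raw, "cp1251")  # 'Привіт!'
```

Nothing in the bytes themselves says which reading is intended, which is why the tool has to fall back on a default.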

JuhaManninen

Re: Using 'Tools->Convert encoding of projects' option
« Reply #8 on: November 10, 2011, 10:55:37 pm »
I am not an expert with character encodings. According to the converter your file has ISO-8859-1 encoding. See:

 * Converting file /home/juha/SW/LazTest/sample_project/Unit1.pas *
 Changed encoding from ISO-8859-1 to UTF-8
 Replaced unit "Windows" with "LCLIntf, LCLType, LMessages" in uses section.

Now, the strange thing is that I see 'Ïðèâ³ò!' in Lazarus editor always, with or without the conversion. It looks like a bug. I think the string should be converted.
When I open the file with another editor (KWrite) I see 'Привіт!' which is the correct string.

As for how Lazarus detects the file's encoding: it scans the file contents up to some length and guesses.
I didn't write that code and I don't even understand it, so you will have to study it yourself. See the GuessEncoding function.
I used it in the Delphi converter, but it may well have bugs.
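
For readers wondering what "scans and guesses" can look like, here is a hypothetical heuristic in Python. It is NOT the Lazarus GuessEncoding implementation; the function name, the 4096-byte sample size and the cp1252 fallback are all assumptions made for this sketch. It checks for a UTF-8 BOM, then tests whether the sample is valid UTF-8, and otherwise falls back to a default single-byte code page.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical guess: BOM, then UTF-8 validity, then a fixed fallback."""
    sample = data[:4096]  # only the start of the file is scanned
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # The incremental decoder tolerates a multi-byte sequence cut off
        # at the end of the sample but still rejects invalid bytes.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        # Single-byte code pages cannot be told apart by validity checks,
        # so the caller's default is returned unchanged.
        return fallback


cp1251_source = "Label1.Caption:='Привіт!';".encode("cp1251")
utf8_source = "Label1.Caption:='Привіт!';".encode("utf-8")

guess_for_cp1251 = guess_encoding(cp1251_source)  # falls back to 'cp1252'
guess_for_utf8 = guess_encoding(utf8_source)      # 'utf-8'
```

With a heuristic of this kind, a cp1251 file is reported under whatever the fallback happens to be, because every byte is equally valid in cp1251, cp1252 and ISO-8859-1.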

The syntax:
  Caption = #1055#1088#1080#1074#1110#1090'!'
is not really a character encoding but the .DFM form file's own way of representing WideStrings.
You may want to see an old thread about the same topic:
 http://lazarus.freepascal.org/index.php/topic,9045
Before that I didn't even know such syntax existed.
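
As a small illustration in Python (a simplified sketch, not the parser Delphi or Lazarus uses; `decode_dfm_string` is a made-up name), each #NNNN in that caption can be read as a decimal character code. The values here match Unicode code points, for example 1055 is U+041F, the letter 'П', so no code page is needed to decode them.

```python
import re


def decode_dfm_string(value: str) -> str:
    """Decode a DFM-style string made of #<decimal> codes and '...' runs."""
    parts = []
    # A token is either #<decimal code> or a quoted run; '' inside a
    # quoted run stands for a single quote character.
    for match in re.finditer(r"#(\d+)|'((?:[^']|'')*)'", value):
        if match.group(1) is not None:
            parts.append(chr(int(match.group(1))))
        else:
            parts.append(match.group(2).replace("''", "'"))
    return "".join(parts)


caption = decode_dfm_string("#1055#1088#1080#1074#1110#1090'!'")  # 'Привіт!'
```

That would explain why the form caption survived the conversion while the literal in the .pas file, stored as raw cp1251 bytes, did not.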

Somebody with more knowledge about character encodings could tell what can be expected from conversion.

Juha

avra

Re: Using 'Tools->Convert encoding of projects' option
« Reply #9 on: November 14, 2011, 11:43:18 am »
Quote from: JuhaManninen
> Now, the strange thing is that I see 'Ïðèâ³ò!' in Lazarus editor always, with or without the conversion. It looks like a bug. I think the string should be converted.
> When I open the file with another editor (KWrite) I see 'Привіт!' which is the correct string.

What font do you use in the Lazarus editor? Is it capable of showing Cyrillic characters? What happens when you use the same font in KWrite and Lazarus?

JuhaManninen

Re: Using 'Tools->Convert encoding of projects' option
« Reply #10 on: November 15, 2011, 01:51:56 pm »
Quote from: avra
> What font do you use in the Lazarus editor? Is it capable of showing Cyrillic characters? What happens when you use the same font in KWrite and Lazarus?

The font makes no difference. I tested by setting the font to the same "Monospace" in both KWrite and Lazarus.
Even if I save the file from one editor (KWrite or Lazarus) they still show the same difference.
So, clearly they show the exact same data in a different way. KWrite shows it right so there is a bug in Lazarus somewhere.

I still wonder what the char encoding converter does.
It claimed it converted from ISO-8859-1 to UTF-8 but it does not show visually.
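
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the source encoding is guessed as ISO-8859-1 while the bytes are really cp1251, a conversion to UTF-8 succeeds technically but preserves the wrong characters, so the text looks the same before and after. The Python sketch below (illustrative only; `convert_to_utf8` is a made-up helper, not a Lazarus function) shows this effect and a way to undo it.

```python
def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data as UTF-8, trusting the given source encoding."""
    return data.decode(source_encoding).encode("utf-8")


original = "Привіт!".encode("cp1251")

# Conversion with the wrongly guessed source encoding: valid UTF-8,
# but it encodes the characters 'Ïðèâ³ò!' instead of the Cyrillic text.
wrong = convert_to_utf8(original, "iso-8859-1")

# Conversion with the real source encoding gives the intended text.
right = convert_to_utf8(original, "cp1251")

# Undoing the wrong conversion: map the characters back to their original
# bytes, then decode those bytes with the real code page.
repaired = wrong.decode("utf-8").encode("iso-8859-1").decode("cp1251")
```

If this is what happens, the converter would need a way to override the guessed source encoding with cp1251.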

Someone should make a bug report.

Juha