Author Topic: new AnsiString question  (Read 40186 times)

malcome

  • Jr. Member
  • **
  • Posts: 80
Re: new AnsiString question
« Reply #60 on: March 26, 2016, 02:23:54 am »
Thank you guys.
I will reply little by little, because my processor is single-tasking.


As I wrote earlier this is very compatible with Delphi at source level despite the different encoding.

What does your "Delphi compatible at source level" mean?

Code: Pascal
const
  cs: UnicodeString = 'あいうえお';
var
  ws: UnicodeString;
begin
  ws := 'あいうえお';
  ShowMessage(ws);
  ShowMessage(cs);

The code above must work correctly in the new Delphi. I do not have the new Delphi, though.
Is this "Delphi compatible at source level"? If yes, I don't care about "Delphi compatible at source level".

Of course, I know the code below works correctly in both the new Delphi and Lazarus 1.6, but that does not mean Lazarus 1.6 has no problems.

Code: Pascal
ShowMessage('あいうえお');


If you still must define a UnicodeString constant, the wiki page says:
  WideString/UnicodeString/UTF8String literals only work with {$codepage utf8} / -FcUTF8.
So yes, it is possible.

I thought of reporting to Japanese users that I recommend adding "{$codepage utf8}" to your source, to be safe.
But I have not decided yet.

ADD:
If the new compiler had {$DisableAutoStringConvert}, I would decide to recommend that. ;-)
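
For anyone who wants to try the {$codepage utf8} recommendation quoted above, here is a minimal console sketch. It assumes FPC 3.0 or later and a source file that is really saved as UTF-8; the program name is made up for illustration. It prints only string lengths, so the result does not depend on what the console can display.

```pascal
program CodepageConstDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}  // tells the compiler that this source file is UTF-8

const
  cs: UnicodeString = 'あいうえお';

var
  ws: UnicodeString;
begin
  ws := 'あいうえお';
  // Five characters from the Basic Multilingual Plane should give five
  // UTF-16 code units each, provided the literal was decoded as UTF-8.
  WriteLn(Length(cs));
  WriteLn(Length(ws));
end.
```

Removing the {$codepage utf8} line and compiling again is a quick way to see whether the literals are still decoded the same way on a given system.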

« Last Edit: March 26, 2016, 03:15:28 am by malcome »

malcome

« Reply #61 on: March 26, 2016, 06:38:50 am »
In my two projects where I use fpCEF, I use UTF8Decode. It is a relic from Lazarus 1.4. It also works in Lazarus 1.6, but it isn't needed anymore because of the code magic of FPC 3+. I would vote for the simple typecast UnicodeString(SomeStringVar) to hide the warning.

Yes, UTF8Decode and UTF8Encode!
I use them in Lazarus 1.4 too. But I thought that I had to use UnicodeString(v) in Lazarus 1.6, because I saw this: http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8
Quote
Many UTF8Encode and UTF8Decode calls are no longer needed, because when assigning UnicodeString to String and vice versa the compiler does it automatically for you.

However, UTF8Decode and UTF8Encode are probably still the best choice even now.

Lazarus 1.4 sample code:
Code: Pascal
procedure SampleFuncInDLL(pws: PWideChar);
begin
  ShowMessage(UTF8Encode(UnicodeString(pws)));
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  SampleFuncInDLL(PWideChar(UTF8Decode('あいうえお'))); // good
end;

Lazarus 1.6 sample code:

Code: Pascal
procedure SampleFuncInDLL(pws: PWideChar);
begin
  ShowMessage(UTF8Encode(UnicodeString(pws)));
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  SampleFuncInDLL('あいうえお'); // fail

  //SampleFuncInDLL(u'あいうえお'); // cool!! but we cannot do this yet...

  SampleFuncInDLL(UnicodeString('あいうえお')); // fail

  SampleFuncInDLL(PWideChar(UTF8Decode('あいうえお'))); // good

  SampleFuncInDLL(PWideChar(LazUTF8.UTF8ToUTF16('あいうえお'))); // good
end;

We do not have to change the code!
« Last Edit: March 26, 2016, 10:35:08 am by malcome »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: new AnsiString question
« Reply #62 on: March 26, 2016, 10:38:07 am »
What does your "Delphi compatible at source level" mean?

I may have explained the idea inaccurately. I meant it is Delphi compatible at source level when you use plain "String" type everywhere, and define string constants without an explicit type.
That's why I wrote: don't use UnicodeString.
More accurately: use UnicodeString only when it is absolutely necessary. It means library APIs, in your case the fpCEF API.
Again: use UnicodeString only as a glue type for library API parameters and return values, either by typecasting or by using a variable.

Then everything works like magic and you don't need to care about encodings much.

Quote
Of course, I know the code below works correctly in both the new Delphi and Lazarus 1.6, but that does not mean Lazarus 1.6 has no problems.

Lazarus 1.6 has no problems if you know what you are doing. :)

Quote
I thought of reporting to Japanese users that I recommend adding "{$codepage utf8}" to your source, to be safe.
But I have not decided yet.
If the new compiler had {$DisableAutoStringConvert}, I would decide to recommend that. ;-)

That can be achieved by using FPC 2.6.4, but it is not a valid choice in your case. You apparently use Unicode everywhere and the automatic dynamic encoding is a perfect match for you.

In my two projects where I use fpCEF, I use UTF8Decode. It is a relic from Lazarus 1.4. It also works in Lazarus 1.6, but it isn't needed anymore because of the code magic of FPC 3+. I would vote for the simple typecast UnicodeString(SomeStringVar) to hide the warning.

Yes, UTF8Decode and UTF8Encode!
I use them in Lazarus 1.4 too. But I thought that I had to use UnicodeString(v) in Lazarus 1.6, because I saw this: http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8
However, UTF8Decode and UTF8Encode are probably still the best choice even now.

No. As Michl wrote, those explicit conversion functions are not needed any more. Besides, UTF8Decode and UTF8Encode would not work in Delphi (in case you must support it).
You seem to be confused by the whole encoding issue, which is understandable. You are also a support person for Japanese users, right? Thus I recommend that you keep things simple:

1. Use UnicodeString only when necessary as a typecast or a variable.
2. Use plain "String" type everywhere else.
3. Define string constants without an explicit type.

If unsure about how a string constant is interpreted, assign it first to a "String" variable. Between variables the automatic conversion always works correctly.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

malcome

« Reply #63 on: March 26, 2016, 11:54:41 am »
Sorry, but your comment feels to me like: "Please read the manual and wiki carefully before driving. We shall not be responsible for any loss, damage or trouble."
Or: the old car (= Lazarus 1.4) does not restrict the freedom of driving, but the new car (= Lazarus 1.6) restricts it for the sake of a confusing compatibility with another car (= Delphi).
Maybe it's just me. ;)
I will wait for Michl.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: new AnsiString question
« Reply #64 on: March 26, 2016, 12:22:42 pm »
@Michl

Code: Pascal
program Project1;

{$codepage utf8}

type
  CP1252String = type string(1252);

const
  cp1252ae: CP1252String = #$E4; // ä
...
  s := cp1252ae;
  WriteLn(StringCodePage(s));  // s has codepage 1252

Looking at the generated assembly:

s has codepage DefaultSystemCodePage (which happened to be 1252 on your system).
To my surprise, it has nothing to do with the 1252 in the CP1252String type declaration:
Code: Pascal
type
  CP1252String = type string(1252);

When a constant is assigned to a string variable, FPC uses fpc_ansistr_to_ansistr
When a string is assigned to another string, FPC uses fpc_ansistr_assign
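
To check this on another system, something like the following could be used. It is only a sketch, not the elided program above: the program name and the variable t are made up here, AnsiString(1252) is used for the type declaration, and FPC 3.0 or later is assumed. The numbers printed depend on the platform's DefaultSystemCodePage, so no expected output is given.

```pascal
program CodePageCheck;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}

type
  CP1252String = type AnsiString(1252);

const
  cp1252ae: CP1252String = #$E4; // ä

var
  s: string;
  t: CP1252String;  // hypothetical extra variable for comparison
begin
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);

  s := cp1252ae;  // typed constant assigned to a plain string
  WriteLn('s from constant:       ', StringCodePage(s));

  t := cp1252ae;  // same declared code page on both sides
  WriteLn('t (CP1252String):      ', StringCodePage(t));

  s := t;         // typed variable assigned to a plain string
  WriteLn('s from variable:       ', StringCodePage(s));
end.
```

Comparing the first line of output with the others shows whether the dynamic code page of s follows DefaultSystemCodePage or the 1252 in the type declaration.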

JuhaManninen

« Reply #65 on: March 26, 2016, 01:12:45 pm »
Sorry, but your comment feels to me like: "Please read the manual and wiki carefully before driving. We shall not be responsible for any loss, damage or trouble."

Excuse me, in the last post I did not ask you to read the manual. Instead I told you to use the "String" type as much as possible. I don't know why you refuse to understand it.

Quote
Or: the old car (= Lazarus 1.4) does not restrict the freedom of driving, but the new car (= Lazarus 1.6) restricts it for the sake of a confusing compatibility with another car (= Delphi).

Both FPC and LCL have been advertised as "Delphi compatible". In Lazarus 1.4 the LCL was not very Delphi compatible because it needed explicit conversion functions for Unicode. In Lazarus 1.6 the LCL is much more Delphi compatible when using plain "String" type.

Please remember how much Delphi tried to keep "String" type backwards compatible when switching to UTF-16 in Delphi 2009. They did not succeed very well, the character size changed and lots of (streaming etc.) code broke.
Our Unicode system does not break as much code, yet "String" is also amazingly Delphi compatible. I don't know why you see that as a bad thing.

If you mean the dynamic string encoding of FPC 3.0 is bad, then please feel free to use FPC 2.6.4. Compilation with it will be supported for some time as I wrote earlier.

Quote
Maybe it's just me. ;)
I will wait for Michl.

I also think it is just you. I hope Michl can help you.
« Last Edit: March 26, 2016, 01:26:21 pm by JuhaManninen »

Michl

  • Full Member
  • ***
  • Posts: 226
Re: new AnsiString question
« Reply #66 on: March 26, 2016, 01:53:57 pm »
I will wait for Michl.
Don't wait for me, I've never used Delphi, so I can't say anything about compatibility. I'm also not a theory expert. I've just made dozens of tests to find the best solution for me.

We do not have to change the code!
If it works for you as it is, why would you change the code?

You can also use a third variant for your problem (the one Juha described before), e.g.
Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  us: UnicodeString;
  s: String;
begin
  s := 'あいうえお';
  us := s;                 // or
  us := UnicodeString(s);  // this hides the warning
  SampleFuncInDLL(PWideChar(us));
end;
It's up to you.

When a constant is assigned to a string variable, FPC uses fpc_ansistr_to_ansistr
When a string is assigned to another string, FPC uses fpc_ansistr_assign
Yes, I know. It is also documented at http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page. It's not easy to understand all the consequences of the sentences written there. Usually you only refer to them when you have practical problems with a cast. I've filed some bug reports about behaviours that were not clear to me. Almost all of them were closed as "No change required", with the valid argument "Delphi behaves the same".

My way is to use "String" all the time. All my real projects use the LCL, so all my strings have the codepage 65001 (UTF-8). I only convert strings at the moment I receive one from, or send one to, the outside of my application (as Juha has written repeatedly).
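
The "convert only at the boundary" idea can be sketched in a console program roughly like this. The helper names FromApi and ToApi and the program name are made up for illustration, and the call to SetMultiByteConversionCodePage is an assumption that stands in for the UTF-8 default an LCL application already has. FPC 3.0 or later and a UTF-8 source file are assumed as well.

```pascal
program BoundaryDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}       // assumes this file is saved as UTF-8

{$ifdef unix}
uses
  cwstring;            // widestring manager for code page conversion on Unix
{$endif}

// Hypothetical helper: UTF-16 coming in from an external API becomes a String.
function FromApi(p: PWideChar): string;
begin
  Result := string(UnicodeString(p));
end;

// Hypothetical helper: a String going out to an external API becomes UTF-16.
function ToApi(const s: string): UnicodeString;
begin
  Result := UnicodeString(s);  // explicit cast, so no implicit-conversion warning
end;

var
  s, back: string;
  us: UnicodeString;
begin
  // Assumption: makes plain String use UTF-8 in this console program,
  // as a stand-in for the default an LCL application has.
  SetMultiByteConversionCodePage(CP_UTF8);

  s := 'あいうえお';
  us := ToApi(s);                  // convert once, on the way out
  back := FromApi(PWideChar(us));  // convert once, on the way in
  WriteLn(s = back);               // round-trip check
end.
```

Everything between the two helpers can then stay plain "String", which is the point of the approach described above.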
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

JuhaManninen

« Reply #67 on: March 26, 2016, 02:13:15 pm »
My way is to use "String" all the time. All my real projects use the LCL, so all my strings have the codepage 65001 (UTF-8). I only convert strings at the moment I receive one from, or send one to, the outside of my application (as Juha has written repeatedly).

Attention malcome! :)

I must say I am a little frustrated because malcome uses only Unicode and his problems can be easily solved by using that principle.
When I explain it to him, he says I take away his freedom and then copies some more messy conversion code for us to look at. Uhhh...

Code that depends heavily on the Windows Ansi codepage has much bigger problems, as taazz has explained in his messages. I honestly would like to study his code and find solutions, but I guess it is not possible.
For such code FPC 2.6.4 may be the best option. Compilation with it will be supported for longer than usual.
« Last Edit: March 26, 2016, 02:17:17 pm by JuhaManninen »

malcome

« Reply #68 on: March 26, 2016, 03:00:45 pm »
Perhaps we do not understand each other. I am sorry for my bad English.
I do not have a problem; I just want to report on the new AnsiString to Japanese people.
So I do not understand why you say:
"I am a little frustrated because malcome uses only Unicode and his problems can be easily solved by using that principle."
"I don't know why you see that as a bad thing."
"If you mean the dynamic string encoding of FPC 3.0 is bad, then please feel free to use FPC 2.6.4. Compilation with it will be supported for some time as I wrote earlier."
"You seem to be confused by the whole encoding issue which is understandable. You are also a support person for Japanese users, right?"
etc.

If I understood everything, I would not be asking you questions. You seem to be confused too.
I want to report further information, so I will keep asking questions. Please do not be frustrated.
Of course, if I am tiring you, please ignore me.

My poor blog: http://malcomejapan.blogspot.jp/

« Last Edit: March 26, 2016, 03:09:29 pm by malcome »

JuhaManninen

« Reply #69 on: March 26, 2016, 03:27:23 pm »
I do not have a problem; I just want to report on the new AnsiString to Japanese people.

Ok, you want to explain what happens behind the scenes and why AnsiString now works with Unicode. That is a slightly different issue then.
Still, you should first tell new users that "String" just works with Unicode like magic. Later you can explain why.

Quote
If I understood everything, I would not be asking you questions. You seem to be confused too.

I used to be confused when this Unicode feature was planned and implemented. It is a complex topic. Now I know much more than I knew a year or two ago.

Quote
My poor blog: http://malcomejapan.blogspot.jp/

Well, I don't understand what it says. :(

malcome

« Reply #70 on: March 26, 2016, 03:43:41 pm »
I will wait for Michl.
Don't wait for me, I've never used Delphi, so I can't say anything about compatibility. I'm also not a theory expert. I've just made dozens of tests to find the best solution for me.
I am sorry. I wrote to you, so I just wanted a reply from you first. I did not understand why he replied.

We do not have to change the code!
If it works for you as it is, why would you change the code?
I thought I had to use UnicodeString(v) in Lazarus 1.6. I thought I must not use UTF8Decode.

« Last Edit: March 26, 2016, 03:45:23 pm by malcome »

malcome

« Reply #71 on: March 26, 2016, 04:13:53 pm »
My current opinions:

*SCBC = the String Constant Backward Compatibility issue. Further information is above. (One man insists it is the programmer's fault. That's one way of looking at it. Of course, you can follow his instructions.)
It may be Windows only.

*SC = String Constant

  • Lazarus 1.6 is pretty good.
  • If you add "{$codepage utf8}" to your source, SCBC goes away.
    But that takes effort. I do not want to do it.
  • Do not use UTF8String.
    If you use UTF8String, you may end up in debugging hell.
  • UTF8Decode and UTF8Encode are still the best choice. Do not use UnicodeString(v) and String(v).
    If you use UnicodeString(SC), you may end up in debugging hell.
    If you use UnicodeString(v), you lose backward compatibility at source level.
    You can use LazUTF8.UTF8ToUTF16(v).
  • We cannot use the cool u'ABC', u8'ABC', etc. yet.
    If we could use them, SCBC might go away. But that would not be backward compatible at source level.
  • We cannot kill the new smart automatic string conversion.
    If we could kill it, SCBC might go away.
« Last Edit: March 28, 2016, 01:13:02 am by malcome »

malcome

« Reply #72 on: March 27, 2016, 09:12:33 am »
Thank you guys.
I updated my opinion. ;D
Your answers and hints make sense to me!
See you later! :)

JuhaManninen

« Reply #73 on: March 27, 2016, 11:25:00 am »
We cannot kill the new smart automatic string conversion.
If we could kill it, SCBC might go away.

Actually you can kill it for individual variables by using RawByteString type.
In general the automatic conversion of encodings is a good thing, a big step forward.
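
A minimal sketch of what that can look like, assuming FPC 3.0 or later. The program, type and variable names are made up for illustration, and the exact numbers printed depend on a working widestring manager, so no expected output is given.

```pascal
program RawByteDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;           // widestring manager for code page conversion on Unix
{$endif}

type
  CP1252String = type AnsiString(1252);

var
  src: CP1252String;
  raw: RawByteString;
  utf8: UTF8String;
begin
  SetLength(src, 1);
  src[1] := #$E4;  // the byte for ä in CP1252

  raw := src;      // RawByteString target: no conversion on assignment
  utf8 := src;     // different declared code page: converted on assignment

  WriteLn('raw:  ', Length(raw), ' byte(s), code page ', StringCodePage(raw));
  WriteLn('utf8: ', Length(utf8), ' byte(s), code page ', StringCodePage(utf8));
end.
```

The two output lines make the difference visible: the RawByteString variable keeps whatever bytes and code page it was given, while the UTF8String variable goes through the automatic conversion.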

malcome

« Reply #74 on: March 27, 2016, 02:08:11 pm »
We cannot kill the new smart automatic string conversion.
If we could kill it, SCBC might go away.

Actually you can kill it for individual variables by using RawByteString type.
In general the automatic conversion of encodings is a good thing, a big step forward.
Have you been hiding some treasure? :)
Could you please show me some of your code?

 
