
Topic: new AnsiString question

Zoran

  • Hero Member
  • Posts: 1910
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: new AnsiString question
« Reply #30 on: March 24, 2016, 07:37:31 am »
Quote
Q1 Does Lazarus(FPC) have the constant string order that like C++ L"ABC" or u"ABC"?
Q2 Which do you recommend, UTF8ToUTF16(v) or UnicodeString(v)?

Q1: I don't understand it; I don't know what it means.

Q2:
See the comment in the LazUTF8 code, above the UTF8ToUTF16 function:

Quote
  Converts the specified UTF-8 encoded string to UTF-16 encoded (system endian)
  Avoid copying the result string since on windows a widestring requires a full
  copy

Probably it means that this is more efficient, as it avoids copying the result string.

However, I recommend that you use these conversions only if you need to save text to a UTF-16 encoded file. For everything else, do not convert: use "String" and treat it as UTF-8. Then everything works perfectly.
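A minimal sketch of that advice, assuming plain FPC 3.0 or later: the text stays in an ordinary String (UTF-8) and is converted to UTF-16 only when it is written to a UTF-16 file. SaveAsUtf16 is a hypothetical helper. It uses the RTL's UTF8Decode so the example compiles without Lazarus. In a Lazarus project, LazUTF8's UTF8ToUTF16 would fill the same role.

```pascal
program SaveUtf16Sketch;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical helper: convert UTF-8 text to UTF-16 only at the file
// boundary. Writes a BOM followed by the UTF-16 code units in system
// endianness and returns the number of bytes written.
function SaveAsUtf16(const FileName: string; const Utf8Text: RawByteString): Int64;
var
  U: UnicodeString;
  Bom: WideChar;
  Stream: TFileStream;
begin
  U := UTF8Decode(Utf8Text);     // the single explicit UTF-8 -> UTF-16 step
  Bom := WideChar($FEFF);        // byte order mark, stored in system endianness
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.WriteBuffer(Bom, SizeOf(Bom));
    if Length(U) > 0 then
      Stream.WriteBuffer(U[1], Length(U) * SizeOf(WideChar));
    Result := Stream.Size;
  finally
    Stream.Free;
  end;
end;

var
  Msg: string;  // an ordinary String, treated as UTF-8 everywhere else
begin
  Msg := 'Hello';
  WriteLn(SaveAsUtf16('utf16_sketch.txt', Msg), ' bytes written');
end.
```

The BOM is written in system endianness, which matches the "(system endian)" wording of the LazUTF8 comment. A reader on a machine with the other byte order would rely on that BOM to detect the byte order.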

Edit: corrected my comment to be more precise: UTF-16 instead of Unicode.
« Last Edit: March 24, 2016, 09:10:08 am by Zoran »
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

mse

  • Sr. Member
  • Posts: 286
Re: new AnsiString question
« Reply #31 on: March 24, 2016, 08:30:28 am »
Quote
Probably it means that this is more efficient, as it avoids copying the result string.

I often read on the forum, on the mailing list and in code snippets that people write "WideString" when they actually mean "UnicodeString". Sometimes this is intentional: some people dislike the type name "UnicodeString" because they use "String" on a system where the default encoding is UTF-8. In their opinion, their 8-bit "String" is Unicode too, and a 16-bit string should not be privileged by having "Unicode" in its name. ;-)
But take care: "WideString" <> "UnicodeString" on Windows. "WideString" is a non-reference-counted 16-bit OLE string, which normally performs worse than the reference-counted 16-bit "UnicodeString".
So whenever you read "WideString", check whether "UnicodeString" was actually meant.
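A small sketch of that difference, assuming FPC 3.0 or later. Comparing Pointer() values is only an illustrative trick to see whether an assignment shared the buffer or copied it. As far as I know, FPC treats WideString like UnicodeString on non-Windows targets, so the second line is only expected to print FALSE on Windows.

```pascal
program WideVsUnicodeSketch;
{$mode objfpc}{$H+}

// UnicodeString is reference counted: an assignment shares the buffer.
function UnicodeAssignShares: Boolean;
var
  A, B: UnicodeString;
begin
  A := 'reference counted';
  UniqueString(A);          // give A its own heap buffer instead of a literal
  B := A;                   // only the reference count is incremented
  Result := Pointer(A) = Pointer(B);
end;

// On Windows, WideString is an OLE string without reference counting:
// an assignment makes a full copy, so the two pointers differ there.
function WideAssignShares: Boolean;
var
  A, B: WideString;
begin
  A := 'OLE string';
  B := A;
  Result := Pointer(A) = Pointer(B);
end;

begin
  WriteLn('UnicodeString shares buffer: ', UnicodeAssignShares);
  WriteLn('WideString shares buffer:    ', WideAssignShares);
end.
```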

Zoran

  • Hero Member
  • Posts: 1910
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: new AnsiString question
« Reply #32 on: March 24, 2016, 09:07:30 am »

Good point, Martin. But I don't think "WideString" is used in either of the functions Malcome asked about.

mse

  • Sr. Member
  • Posts: 286
Re: new AnsiString question
« Reply #33 on: March 24, 2016, 11:01:37 am »
Quote
But I don't think that in either of functions for which Malcome asked, "WideString" is used.
Then the comment
Quote
  Avoid copying the result string since on windows a widestring requires a full
  copy
is wrong in the case of "UnicodeString".

taazz

  • Hero Member
  • Posts: 5368
Re: new AnsiString question
« Reply #34 on: March 24, 2016, 12:06:34 pm »
Quote
I have new questions.
Q1 Does Lazarus(FPC) have the constant string order that like C++ L"ABC" or u"ABC"?
As far as I know, there is nothing like that in Pascal. Try using typed constants instead and see if that helps, e.g.:
Code: Delphi
const
  sTest1 : UnicodeString = 'Your Test string';
  sTest2 : String = 'Your other Test String';

Quote
Q2 Which do you recommend, UTF8ToUTF16(v) or UnicodeString(v)?
I would use the explicit one, i.e. UTF8ToUTF16; UnicodeString(v) is a typecast, not a conversion. Then again, the LCL has already proved me wrong by redefining the explicit types, so what do I know.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4565
  • I like bugs.
Re: new AnsiString question
« Reply #35 on: March 24, 2016, 12:09:38 pm »
Quote
I often read in forum, on mailinglist and in code snippets that people write "WideString" when they actually should write "UnicodeString".

Yes, Marco reminded us about that in an earlier discussion, and I updated the Better Unicode Support page accordingly.
The page has received lots of attention and is in good shape. Everybody, please check it.

In general, explicit conversion functions are not needed anywhere. Really!
1. When reading or writing ANSI code page data, the string encoding can be changed using SetCodePage().
2. When calling "W" WinAPI functions, parameters and return values can be assigned to/from UnicodeString variables, or typecast to UnicodeString().
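A minimal sketch of point 1, assuming FPC 3.0 or later and that the legacy data is Windows-1252 (an assumption made for the example). LabelAsCp1252 and ToUtf8 are hypothetical helpers. SetCodePage with Convert = False only declares which code page the bytes are in, and Convert = True then re-encodes them. On Unix the cwstring unit is pulled in because real code page conversions need a widestring manager there.

```pascal
program SetCodePageSketch;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // Unix: widestring manager needed for real code page conversions
{$endif}

// Hypothetical helper: mark raw bytes as Windows-1252 without changing them.
function LabelAsCp1252(const Raw: RawByteString): RawByteString;
begin
  Result := Raw;
  SetCodePage(Result, 1252, False);    // Convert = False: only the label changes
end;

// Hypothetical helper: re-encode a labelled string as UTF-8.
function ToUtf8(const S: RawByteString): RawByteString;
begin
  Result := S;
  SetCodePage(Result, CP_UTF8, True);  // Convert = True: the bytes are converted
end;

var
  Legacy, Utf8: RawByteString;
begin
  Legacy := LabelAsCp1252('Caf'#$E9);  // 'Café' as Windows-1252 bytes (4 bytes)
  Utf8 := ToUtf8(Legacy);              // 'é' takes 2 bytes in UTF-8
  WriteLn('code page ', StringCodePage(Legacy), ', ', Length(Legacy), ' bytes');
  WriteLn('code page ', StringCodePage(Utf8), ', ', Length(Utf8), ' bytes');
end.
```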

The only thing that still requires explicit conversion is calling an old ANSI WinAPI function, and those are deprecated in a Unicode-aware system.

Malcome is asking about calling ShowMessage() with a UnicodeString parameter.
The answer is: don't do it. You just create artificial problems by doing so.
Use UnicodeString only for WinAPI function parameters as explained above.
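A cross-platform sketch of that pattern, assuming FPC 3.0 or later. FakeMessageW is a hypothetical stand-in for a "W" function such as MessageBoxW. Like the real API, it takes a null-terminated PWideChar, but it only counts UTF-16 code units so the example runs anywhere. The application side keeps using String. UTF-16 appears only at the call site, either through a UnicodeString variable or through a typecast.

```pascal
program WideApiSketch;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // widestring manager so String -> UnicodeString handles non-ASCII
{$endif}

// Hypothetical stand-in for a "W" API: receives a null-terminated UTF-16
// buffer and returns how many UTF-16 code units it contains.
function FakeMessageW(Buf: PWideChar): Integer;
begin
  Result := 0;
  if Buf = nil then
    Exit;
  while Buf[Result] <> #0 do
    Inc(Result);
end;

// Variant 1: assign the String to a UnicodeString variable first.
function CallViaVariable(const S: string): Integer;
var
  U: UnicodeString;
begin
  U := S;                                   // implicit conversion to UTF-16
  Result := FakeMessageW(PWideChar(U));
end;

// Variant 2: typecast to UnicodeString directly in the call.
function CallViaTypecast(const S: string): Integer;
begin
  Result := FakeMessageW(PWideChar(UnicodeString(S)));
end;

begin
  WriteLn(CallViaVariable('Hello'), ' ', CallViaTypecast('Hello'));
end.
```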

Note also that using the "String" type everywhere is the most Delphi-compatible way at the source level.
Our new Unicode-aware code can typically be copied to Delphi, where it works as-is (including the WinAPI calls with UnicodeString variables/typecasts).
« Last Edit: March 24, 2016, 05:46:39 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4565
  • I like bugs.
Re: new AnsiString question
« Reply #36 on: March 24, 2016, 12:16:59 pm »
Quote
Code: Delphi
const
  sTest1 : UnicodeString = 'Your Test string';

Just don't do that and everything is fine.
« Last Edit: March 24, 2016, 01:42:02 pm by JuhaManninen »

loopbreaker

  • New Member
  • Posts: 32
Re: new AnsiString question
« Reply #37 on: March 24, 2016, 09:18:25 pm »
Juha,
you keep spreading the bad practice of using String (= AnsiString). The generic String does not have the same meaning everywhere: in new Delphi it is UnicodeString, in old Delphi an ACP AnsiString, and in Lazarus also a UTF-8 AnsiString. Third-party modules with different meanings of String cannot simply be combined; errors would occur due to missing or wrong conversions. Compile time is not the only problem: the meaning of String (AnsiString) can also change at runtime through assignments or calls to SetMultiByteConversionCodePage.

This is also why professional developers, such as those of Synopse mORMot, use their own types with the code page fixed at compile time, while others use only UnicodeString (String in new Delphi).

The correct solution is UnicodeString and Utf8String. Or better, use shorter aliases for them.
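A minimal sketch of both points, assuming FPC 3.0 or later. A plain String is declared with CP_ACP, so the code page it stands for is whatever DefaultSystemCodePage holds, and SetMultiByteConversionCodePage changes that at runtime. A type declared with an explicit code page keeps it. TUtf8 is a hypothetical short alias of the kind suggested above, declared the same way as the RTL's UTF8String.

```pascal
program FixedCodePageSketch;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // widestring manager for UnicodeString <-> AnsiString conversions
{$endif}

type
  // Hypothetical short alias with the code page fixed at compile time.
  TUtf8 = type AnsiString(CP_UTF8);

// Code page that a plain "String" (declared as CP_ACP) currently maps to.
function StringMapsTo: TSystemCodePage;
begin
  Result := DefaultSystemCodePage;
end;

// Code page carried by a TUtf8 value produced from UTF-16 text.
function FixedAliasCodePage(const U: UnicodeString): TSystemCodePage;
var
  S: TUtf8;
begin
  S := U;                       // converted to the alias's declared code page
  Result := StringCodePage(S);
end;

begin
  WriteLn('String maps to code page ', StringMapsTo);
  SetMultiByteConversionCodePage(1252);  // a runtime call changes the meaning
  WriteLn('String now maps to code page ', StringMapsTo);
  WriteLn('TUtf8 is code page ', FixedAliasCodePage('abc'));
end.
```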

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4565
  • I like bugs.
Re: new AnsiString question
« Reply #38 on: March 24, 2016, 11:27:27 pm »
Quote
you are repeatedly spreading the bad practice with the String (=ansistring). The generic String has not the same meaning everywhere (in new Delphi is UnicodeString, in old Delphi acp-ansistring, in Lazarus also utf8-ansistring).

... and why is the "utf8-ansistring" meaning a bad practice but the others are not?
Earlier I wrote that the "String" type is Delphi-compatible at the source level. Yes, it is amazingly compatible, considering the different encodings.

Quote
Modules (thirdparty) with different meanings of String cannot be simply combined, errors would occur (due to missing or wrong conversions).

Obviously, those modules must match the string type used in the LCL / LazUtils.
They can also continue to use the old ANSI code page encoding plus explicit UTF-8 conversion functions by defining DisableUTF8RTL and requiring their applications to define it, too.
We created the DisableUTF8RTL system so that everybody can be happy and nobody has a reason to complain. Still, some people complain. Why? I believe they did not actually test the system and misunderstood something.

Quote
Not only compiletime is the problem, the meaning of String (ansistring) can change at runtime by assignments or calls of SetMultiByteConversionCodePage.

Our UTF-8 system calls SetMultiByteConversionCodePage. Obviously it should not be called again.
Yet if somebody wants to call it and create their own non-standard string encoding, they can define DisableUTF8RTL, and the system then behaves pretty much as it used to.

Quote
This is also the reason, why professional devs, like in Synopse mORMot,
use their own types, with codepage fixed at compiletime, or others use only UnicodeString (String in new Delphi).

If mORMot uses its own string type, it should be easy to map it to our "String" or anything else. What is the problem?
If they have real problems when porting, we can help.

Quote
The correct solution is UnicodeString and Utf8String. Or better, use shorter aliases for them.

I have seen this suggestion before, but I don't quite understand how anybody can propose it seriously. It would require every developer to change all the string types in all their code.
Not very realistic.
Do you remember how much effort the Delphi developers made to preserve the backwards-compatible "String" when they switched to UTF-16? They broke lots of code, but the changes were still reasonable compared to changing every "String" to something else.
Our system breaks much less code, yet it is amazingly Delphi-compatible. What's more, it is a huge improvement over the old UTF-8 hack with explicit conversion functions.

My guess is that you did not even try the new Lazarus UTF-8 system before complaining. Please try!

malcome

  • Jr. Member
  • Posts: 81
Re: new AnsiString question
« Reply #39 on: March 25, 2016, 01:43:04 am »
Q3: Which is more important to you, Delphi compatibility or compatibility with Lazarus 1.4 and earlier?

My questions come from real trouble I am facing while converting source code from 1.4 to 1.6.
« Last Edit: March 25, 2016, 01:46:57 am by malcome »

taazz

  • Hero Member
  • Posts: 5368
Re: new AnsiString question
« Reply #40 on: March 25, 2016, 01:44:28 am »
Quote
Q3 Which is more important to you, Delphi compatible or Lazarus1.4- compatible?
Lazarus 1.4 is Delphi compatible.

malcome

  • Jr. Member
  • Posts: 81
Re: new AnsiString question
« Reply #41 on: March 25, 2016, 02:01:38 am »
Quote
lazarus 1.4 is delphi compatible
You mean the old Delphi (Delphi 7)?

taazz

  • Hero Member
  • Posts: 5368
Re: new AnsiString question
« Reply #42 on: March 25, 2016, 02:39:48 am »
Quote
You mean the old Delphi(Delphi7)?
No, I mean the new Delphi: 2007, and 2009 to an extent. It has some minor differences, like String = AnsiString instead of UnicodeString, but all in all it is as compatible as 1.6, if not more.

malcome

  • Jr. Member
  • Posts: 81
Re: new AnsiString question
« Reply #43 on: March 25, 2016, 03:19:59 am »
Quote
No, I mean the new delphi, 2007 and 2009 up to an extend. IT has some minor differences like String = ansistring instead of unicodestring but all in all it is as compatible as 1.6 if not more.

I would say Delphi 2007 and earlier is the old Delphi, and 2009 and later is the new Delphi.
« Last Edit: March 25, 2016, 03:23:41 am by malcome »

malcome

  • Jr. Member
  • Posts: 81
Re: new AnsiString question
« Reply #44 on: March 25, 2016, 04:04:58 am »
IMHO:

Lazarus 1.4 and earlier use UTF-8 strings (at least the source editor does) and the ANSI RTL, so we wrote
Code: Pascal
Image.LoadFromFile(UTF8ToSys(v));

We know that this uses the "A" API functions, which have many problems.

Lazarus 1.6 is getting a Unicode RTL, so we can write

Code: Pascal
Image.LoadFromFile(v);

This is awesome!
We know that this uses the "W" API functions, which solve those problems.
But we cannot say that String in Lazarus 1.6 is still a UTF-8 string.
It may be an ACP string, at least at compile time.
So we have new problems with Lazarus 1.4 compatibility for string constants.
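A small sketch of the string-constant issue, assuming FPC 3.0 or later and a source file saved as UTF-8. With {$codepage utf8} (or the -FcUTF8 command-line option), the compiler decodes the literal at compile time, so the UnicodeString constant gets one UTF-16 code unit per character. As I understand the FPC rules, without that directive the compiler does not know the source encoding. Such a constant then typically ends up with one code unit per source byte, so 'Café' would report a length of 5 instead of 4.

```pascal
program CodePageConstSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // declare that this source file is UTF-8 encoded

const
  // Decoded from UTF-8 at compile time: 'é' becomes a single UTF-16 unit.
  Greeting: UnicodeString = 'Café';

begin
  WriteLn('Code units: ', Length(Greeting));          // 4
  WriteLn('Last code unit: ', Ord(Greeting[4]));      // 233 (U+00E9)
end.
```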

NOTE:
I love Lazarus 1.6 very much. It is better than I expected, apart from the problems above.

ADD:
The ACP string may be there for FPC 2.6 compatibility, not for Lazarus 1.4 compatibility.
« Last Edit: March 25, 2016, 07:58:10 am by malcome »

 
