Author Topic: new AnsiString question  (Read 43512 times)

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: new AnsiString question
« Reply #45 on: March 25, 2016, 05:20:58 am »
Quote
IMHO:
Lazarus 1.4 and earlier use UTF-8 strings (at least in the source editor) together with an ANSI RTL, so we wrote
Code: Pascal
  Image.LoadFromFile(UTF8ToSys(v));
We know that the "A" type API is used, and it has many problems.
Lazarus 1.6 is getting a Unicode RTL, so we can write

I don't need Lazarus 1.6 for that; I can do it with 1.4 and FPC 3.

Quote
Code: Pascal
  Image.LoadFromFile(v);
This is awesome!

Yeah, UTF-8 caused more problems than it solved in 1.4. Then again, I have my own file stream based on the W API calls from Lazarus 1.0, so I did not care much about that.

Quote
We know that the "W" type API is used, and it solves the problems.
But we cannot say that Lazarus 1.6 remains UTF8-string.

No, Lazarus 1.6 enforces UTF-8 strings in my libraries. It's not acceptable, sorry.

Quote
It may be an ACP string, at least at compile time.
So we have new problems: compatibility with Lazarus 1.4 and earlier for string constants.
NOTE:
I love Lazarus 1.6 so much. It is better than I expected, except for the above problems.

You might think so, but for me Lazarus was a bit hard to accept from the start. After the boneheaded move to enforce UTF-8 in all string types I lost any confidence in the development team. Sorry, but if I keep on using the LCL in the future, I will be far better off forking 1.4, removing all annoyances, and keeping away from the "new" LCL.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #46 on: March 25, 2016, 06:49:08 am »
Quote
We know that the "W" type API is used, and it solves the problems.
But we cannot say that Lazarus 1.6 remains UTF8-string.
No, Lazarus 1.6 enforces UTF-8 strings in my libraries. It's not acceptable, sorry.
It may be an ACP string, at least at compile time.
So we have new problems: compatibility with Lazarus 1.4 and earlier for string constants.
NOTE:
I love Lazarus 1.6 so much. It is better than I expected, except for the above problems.
You might think so, but for me Lazarus was a bit hard to accept from the start. After the boneheaded move to enforce UTF-8 in all string types I lost any confidence in the development team. Sorry, but if I keep on using the LCL in the future, I will be far better off forking 1.4, removing all annoyances, and keeping away from the "new" LCL.

You mean Lazarus 1.4 is RawByte-string, not UTF8-string? If so, you may be correct.
May the enforce be with you. ;)
« Last Edit: March 25, 2016, 07:02:46 am by malcome »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: new AnsiString question
« Reply #47 on: March 25, 2016, 08:09:22 am »

Quote
You mean Lazarus 1.4 is RawByte-string, not UTF8-string? If so, you may be correct.

No, the Lazarus 1.4 string is CP 65001 (not sure if the number is correct) or something along those lines, which is the ANSI code page for UTF-8.

Quote
May the enforce be with you. ;)

Lol, thank you. Have fun too.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #48 on: March 25, 2016, 08:23:09 am »
Quote
But we cannot say that Lazarus 1.6 remains UTF8-string.
It may be an ACP string, at least at compile time.
So we have new problems: compatibility with Lazarus 1.4 and earlier for string constants.

I thought this was explained already. Just use normal constants and "String" type variables and everything works.
You had UnicodeString constants and variables, which created problems. Do not use them and everything works.
As I wrote earlier, this is very compatible with Delphi at source level despite the different encoding.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #49 on: March 25, 2016, 08:54:01 am »
LCL in Lazarus 1.4.x was not compatible with Delphi because it required explicit Unicode conversion calls in many places and the ...UTF8() file functions.
Now we are much closer to Delphi compatibility.

Quote
No, Lazarus 1.6 enforces UTF-8 strings in my libraries. It's not acceptable, sorry.

Delphi enforces UTF-16 strings. Why is it more acceptable?

Quote
You might think so but for me lazarus was a bit hard to accept from the start, after the bone headed move to enforce utf8 in all string types I lost any confidence in the developing team,

You have continued that same theme for a long time already. Maybe it is time to change, it feels unjust considering all the improvements done and all the effort used.
DisableUTF8RTL mode was considered important partly because of your feedback. Many bugs related to it were fixed.
Yet you haven't commented about it after the release. Does it work? If not, what exactly happens? Did you try it at all?
I think the "development team" has been very co-operative. How about you?

Quote
sorry If I keep on using lcl in the future I will be far better to fork 1.4 remove all annoyances and keep away from the "new" lcl.

It is perfectly ok to do that.
Just please stop this empty whining about the current system which is a big improvement for most people.

[Edit]
I am thinking of the essential reason for complaints against the new Unicode support.
In most cases it is that other string types like UnicodeString and UTF8String cause trouble.
Ok, the cool part is that they are not really needed!
You can use plain "String" without any conversion functions! Maybe it feels too good to be true and has not yet sunk into people's consciousness.
No explicit conversions! None! Zero! Comprende?

As an extra bonus the code becomes Delphi compatible at source level when using plain "String".
Only 2 things are not Delphi compatible:
 1. Reading / writing data with Ansi codepage.
 2. Dealing with individual codepoints beyond 7-bit ASCII area.
Both cases can be solved with a few lines of code.
This page was updated recently. Please read again.
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
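
A minimal sketch of the second item (codepoints beyond 7-bit ASCII), assuming the project can use the LazUTF8 unit from the LazUtils package. The program name and the sample string are made up for illustration; the string is built from explicit UTF-8 bytes so that it does not depend on how the source file is saved. Length counts bytes, while UTF8Length and UTF8Copy work on codepoints:

```pascal
program CodepointDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  LazUTF8; // part of the LazUtils package shipped with Lazarus

var
  s: String;
  i: PtrInt;
begin
  // 'a', then U+00E4 as the two UTF-8 bytes C3 A4, then 'b'
  s := 'a'#$C3#$A4'b';

  WriteLn('Bytes:      ', Length(s));     // 4, Length counts bytes
  WriteLn('Codepoints: ', UTF8Length(s)); // 3, UTF8Length counts codepoints

  // Walk the string one codepoint at a time instead of one byte at a time
  for i := 1 to UTF8Length(s) do
    WriteLn('Codepoint ', i, ' occupies ', Length(UTF8Copy(s, i, 1)), ' byte(s)');
end.
```

Indexing with s[i] still addresses single bytes, so code that only searches for ASCII characters such as separators keeps working unchanged.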
« Last Edit: March 25, 2016, 09:19:05 am by JuhaManninen »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: new AnsiString question
« Reply #50 on: March 25, 2016, 09:29:05 am »
Quote
LCL in Lazarus 1.4.x was not compatible with Delphi because it required explicit Unicode conversion calls in many places and the ...UTF8() file functions.
Now we are much closer to Delphi compatibility.

No, some aspects of it are closer, others went the other way. The UTF-8 move was one of the backward moves.

Quote
Delphi enforces UTF-16 strings. Why is it more acceptable?

No, Delphi uses UTF-16 and leaves the rest of the types as they are, i.e. it does not enforce anything. I can't dictate what you use, and you can't dictate what I use. You crossed the line.

Quote
You have continued that same theme for a long time already. Maybe it is time to change, it feels unjust considering all the improvements done and all the effort used.

I also believe it is unjust that I have to address this as well, and yes, it is time to move on.

Quote
DisableUTF8RTL mode was considered important partly because of your feedback. Many bugs related to it were fixed.

DisableUTF8RTL is a patch at best. Although, as things are now, it's the only thing I can use if the need arises, I would rather not close myself into an even smaller corner of the already minuscule Pascal community.

Quote
Yet you haven't commented about it after the release. Does it work? If not, what exactly happens? Did you try it at all?

In some cases it works exceptionally well and in some corner cases it does not. This forces me into a corner, and I have to decide how to proceed in the future.

Quote
I think the "development team" has been very co-operative. How about you?

You did listen and you obviously did work on the matter, and for that I'm grateful. But as I said, your implementation is more in the spirit of "here, take this and stop bothering us" than a proper analysis of the problem.

Quote
sorry If I keep on using lcl in the future I will be far better to fork 1.4 remove all annoyances and keep away from the "new" lcl.
It is perfectly ok to do that.
Just please stop this empty whining about the current system which is a big improvement for most people.

See? "Stop pestering us, go away." Sure. Sorry if I seem ungrateful. I'm only posting here because I see some value in your efforts, but mostly out of respect for the effort of the developers. If DisableUTF8RTL is only for my benefit, you can remove it; I'm not going to use it.

Have fun in your future endeavors, I'll probably be dropping from time to time.
« Last Edit: March 25, 2016, 09:30:58 am by taazz »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #51 on: March 25, 2016, 10:08:43 am »
@taazz, the backwards move is that AnsiString encoding changed together with String encoding, right?
Yes, it is an unfortunate side effect. Calling SetMultiByteConversionCodePage() is kind of a hack after all.
In a perfect situation AnsiString would remain as it was and only the String encoding would change.
So yes, if your code uses lots of AnsiString and String and assumes different encoding, then our UTF-8 system breaks it.
A fully Delphi compatible UTF-16 solution will come later but lots of work in FPC libs is required before that as you may know.
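
For reference, the switch discussed above can be sketched in a plain FPC 3.x console program. This is only an illustration, under the assumption that the UTF-8 mode essentially comes down to calls like these at startup; the real initialization lives in LazUtils and may do more. The program name is made up:

```pascal
program CodePageSwitchDemo; // hypothetical name, requires FPC 3.0 or newer

{$mode objfpc}{$H+}

begin
  // At startup this is the system's own code page, e.g. 1252 on many Windows setups
  WriteLn('Default code page at startup: ', DefaultSystemCodePage);

  // Assumption: this mirrors what the LCL's UTF-8 mode does during initialization.
  // From here on, "String" and "AnsiString" values are treated as UTF-8 by the RTL,
  // which is exactly why AnsiString loses its ANSI code page.
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  WriteLn('Default code page now: ', DefaultSystemCodePage); // 65001
end.
```

Because DefaultSystemCodePage is a single global setting, it cannot give String and AnsiString different encodings, which is the side effect described above.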

I understood you have your own Unicode conversion functions and maybe custom types. (?)
I would actually like to study the problems and find solutions using our UTF-8 mode.
At least so far all seemingly big problems were solved miraculously.
Maybe your code could be adjusted, too. I don't know...

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #52 on: March 25, 2016, 10:33:03 am »
Quote
But we cannot say that Lazarus 1.6 remains UTF8-string.
It may be an ACP string, at least at compile time.
So we have new problems: compatibility with Lazarus 1.4 and earlier for string constants.
I thought this was explained already. Just use normal constants and "String" type variables and everything works.
You had UnicodeString constants and variables which created problems. Do not use them and everything works.
As I wrote earlier this is very compatible with Delphi at source level despite the different encoding.

Why do you bring that up? I am not asking you to explain it again.
My current questions are:
Q1: Does Lazarus (FPC) have a string constant notation like C++ L"ABC" or u"ABC"?
Q2: Which do you recommend, UTF8ToUTF16(v) or UnicodeString(v)?
Everything else is idle talk for me (also Q3, sorry).

You might have answered me already, but your comments are hard for me to understand because my English is poor. :'(

I understood "Do not use UTF8String".
But I am using UnicodeString with a DLL (e.g. fpCEF3) now, so I do not understand "Do not use UnicodeString".
Of course I know a workaround now, but that does not mean Lazarus 1.6 does not have the problem.
I believe that you are going to solve the problems someday. ;)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #53 on: March 25, 2016, 11:33:46 am »
Quote
Q1 Does Lazarus(FPC) have the constant string order that like C++ L"ABC" or u"ABC"?

I am not sure what the C++ syntax means. I guess L means Ansi and u means Unicode.

Anyway, a constant in a typical Pascal source file has UTF-8 encoding because that is the file's encoding.
How the compiler interprets it is more complex, as explained here:
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals
The most straightforward way is to assign it to a "String" type variable. Later you can assign variables of different encodings to each other and it always goes right.

Quote
Q2 Which do you recommend, UTF8ToUTF16(v) or UnicodeString(v)?

I recommend the typecast UnicodeString(v). Or, to make things even clearer, use an explicit UnicodeString variable and assign to it. FPC's dynamic string encoding will then do the right thing.
Think of it, this makes your code compatible with Delphi which has String = UnicodeString. A typecast there is a no-op but does not break anything. UTF8ToUTF16() would be wrong there.
Besides, using UTF8ToUTF16() would break the earlier claim of not needing explicit conversions. :)

Quote
But I am using UnicodeString in DLL(e.g.fpCEF3) now, So I do not understand that "Do not use UnicodeString".
Of course I know a workaround now, But it does not mean that Lazarus 1.6 has not the problem.
I believe that you are going to solve the problems someday. ;)

I guess constants are the only problem. Variables have dynamic encoding and FPC always converts them when needed.
If you still must define a UnicodeString constant, the wiki page says:
  WideString/UnicodeString/UTF8String literals only work with {$codepage utf8} / -FcUTF8.
So yes, it is possible.
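
A minimal sketch of such a constant, assuming the source file is saved as UTF-8. The program and constant names are made up for illustration:

```pascal
program UnicodeConstDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8} // tells the compiler that literals in this file are UTF-8

const
  // Hypothetical constant holding two non-ASCII letters
  Umlauts: UnicodeString = 'äö';

begin
  // With the directive the compiler decodes the literal at compile time,
  // so Length reports UTF-16 code units rather than UTF-8 bytes
  WriteLn('UTF-16 code units: ', Length(Umlauts));
end.
```

The same effect should be available project-wide through the -FcUTF8 compiler option mentioned in the wiki quote, but then it also affects how plain "String" literals are treated, so the per-unit directive is the narrower choice.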
« Last Edit: March 25, 2016, 12:22:53 pm by JuhaManninen »

bytebites

  • Hero Member
  • *****
  • Posts: 756
Re: new AnsiString question
« Reply #54 on: March 25, 2016, 11:43:30 am »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #55 on: March 25, 2016, 12:31:51 pm »
String literal in C++
http://en.cppreference.com/w/cpp/language/string_literal

L"ABC" / u"ABC" mean WideString / UTF-16 constants then, which sounds like the same thing to me.
A typical Pascal constant is like u8"ABC".
« Last Edit: March 25, 2016, 12:42:57 pm by JuhaManninen »

Michl

  • Full Member
  • ***
  • Posts: 226
Re: new AnsiString question
« Reply #56 on: March 25, 2016, 02:30:00 pm »
Quote
I understood that "Do not use UTF8String".

+1

A UTF8String doesn't guarantee that such a string is always UTF-8 encoded. A simple FPC program (without the LCL hack) demonstrates it:
Code: Pascal
program Project1;

{$mode objfpc}{$H+} // assumed project setting: "String" must be an AnsiString here
{$codepage utf8}

type
  CP1252String = type string(1252);

const
  cp1252ae: CP1252String = #$E4; // ä

var
  s: String;
  us: UTF8String;

begin
  s := cp1252ae;
  WriteLn(StringCodePage(s));  // s has codepage 1252
  us := cp1252ae;
  WriteLn(StringCodePage(us)); // us has codepage UTF-8 (65001)
  us := s;
  WriteLn(StringCodePage(us)); // the UTF8String "us" has now codepage 1252 !!!
  ReadLn;
end.
With the LCL in use, the same code yields a UTF-8 encoded string for the third assignment.

There are more confusing behaviours like this, so my conclusion is to always use a plain "String" rather than a predefined type string(...), and with that I can work fine.

Quote
But I am using UnicodeString in DLL(e.g.fpCEF3) now, So I do not understand that "Do not use UnicodeString".

These five examples work fine for me. Examples 3 and 4 are mostly the same, but in example 4 you get a warning, which may be removed in the future (http://bugs.freepascal.org/view.php?id=24103):
Code: Pascal
// Example 1: typed UnicodeString constant
procedure TForm1.Button1Click(Sender: TObject);
const
  aURL: UnicodeString = 'http://forum.lazarus.freepascal.org/';
begin
  Chromium.Load(aURL);
end;

// Example 2: explicit UTF8Decode, as needed in Lazarus 1.4
procedure TForm1.Button2Click(Sender: TObject);
var
  s: String;
begin
  s := 'http://www.lazarus-ide.org/';
  Chromium.Load(UTF8Decode(s));
end;

// Example 3: typecast, no warning
procedure TForm1.Button3Click(Sender: TObject);
var
  s: String;
begin
  s := 'http://bugs.freepascal.org/view_all_bug_page.php/';
  Chromium.Load(UnicodeString(s));
end;

// Example 4: implicit conversion on assignment
procedure TForm1.Button4Click(Sender: TObject);
var
  s: String;
  us: UnicodeString;
begin
  s := 'http://free-pascal-lazarus.989080.n3.nabble.com/';
  us := s; // Here is a warning
  Chromium.Load(us);
end;

// Example 5: manual conversion into a PWideChar buffer
procedure TForm1.Button5Click(Sender: TObject);
var
  s: String;
  wc: PWideChar;
begin
  s := 'http://wiki.freepascal.org/';
  wc := GetMem(Length(s) * 2 + 2);
  try
    Utf8ToUnicode(wc, Length(s) + 1, PChar(s), Length(s) + 1);
    Chromium.Load(wc);
  finally
    FreeMem(wc); // release the temporary buffer, which the original never freed
  end;
end;
I've never seen code that uses Utf8ToUnicode. I searched Lazarus itself and found only 2 units where this function is used, so I don't prefer it.

In my two projects where I use fpCEF, I use UTF8Decode. It is a relic from Lazarus 1.4. It also works in Lazarus 1.6, but it isn't needed anymore because of the code magic of FPC 3+. I would vote for the simple typecast UnicodeString(SomeStringVar) to hide the warning.
« Last Edit: March 25, 2016, 03:00:12 pm by Michl »
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #57 on: March 25, 2016, 05:02:26 pm »
Quote
... sorry If I keep on using lcl in the future I will be far better to fork 1.4 remove all annoyances and keep away from the "new" lcl.

Actually you don't need to fork anything because compilation with FPC 2.6.4 will be supported for a long time to come.
Usually 2 past compiler versions are supported but now it makes sense to extend that, it is relatively easy after all.
It is a perfectly valid way to go in your case.
It means you can enjoy improvements in LCL and Lazarus IDE while still using string types of FPC 2.6.4.

Bart

  • Hero Member
  • *****
  • Posts: 5614
    • Bart en Mariska's Webstek
Re: new AnsiString question
« Reply #58 on: March 25, 2016, 06:00:51 pm »
Quote
... FPC 2.6.4 will be supported for a long time to come.
Usually 2 past compiler versions are supported but now it makes sense to extend that, it is relatively easy after all.

Why change that policy?
It's just going to cloud our codebase with even more {$ifdef}s and workarounds for things that were already fixed in stable FPC versions more than 2 minor releases ago.
We have to move in the same direction as FPC development.

Bart

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #59 on: March 25, 2016, 10:17:25 pm »
Quote
Why change that policy?

Because our Unicode change is big and serious and is not backwards compatible.
Especially losing the Ansi codepage from AnsiString is quite bad for certain code, as taazz has reminded us many times.
DisableUTF8RTL with FPC 3.x also changes things compared to FPC 2.x due to dynamic string encoding.
Continuing to use FPC 2.6.4 is a valid option for many people, I believe.

Quote
It's just going to cloud our codebase with even more {$ifdef}s and workarounds for things that were already fixed in stable FPC versions more than 2 minor releases ago.

There are not so many IFDEFs and I don't see any urgent need for FPC 3.x features in the IDE's core. Packages in Lazarus repo can freely depend on FPC 3.x if they are not installed by default. That includes FreeSparta packages which will use some fancy new language features.
New planned features like an online package manager will be a package itself and can use FPC 3.0.

I searched the Lazarus sources, which have about 2,000,000 LOC. There are fewer than 100 IFDEFs testing FPC_FULLVERSION for FPC 2.7 or 3.0. Not bad.
There are some IFDEFs testing for older FPC versions, they could be removed.
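
For readers who have not seen such a check, a version test of this kind looks roughly like the following. This is a generic sketch, not code taken from the Lazarus sources, and the program name is made up:

```pascal
program VersionCheckDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

begin
  // FPC_FULLVERSION is a compiler-defined macro: 20604 for FPC 2.6.4, 30000 for FPC 3.0.0
{$IF FPC_FULLVERSION >= 30000}
  WriteLn('Compiled with FPC 3.0 or newer: AnsiString carries a dynamic code page.');
{$ELSE}
  WriteLn('Compiled with FPC 2.x: AnsiString has no code page information.');
{$ENDIF}
end.
```

Each such block keeps one source tree compilable with both compiler generations, which is the maintenance cost being weighed here.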

Quote
We have to move on in the same direction as the fpc development.

FPC development direction is Delphi compatible UTF-16 based system where String = UnicodeString.
Our current Unicode system goes against that direction. It is a hack although an exceptionally good one.

I think we should support FPC 2.6.4 until the UTF-16 based system is ready. It requires big changes in FPC's libs and may take a long time. We can re-evaluate the situation after a year or two.
Anyway, I don't see pressure to break FPC 2.6.4 compilation any time soon.
« Last Edit: March 26, 2016, 12:38:58 am by JuhaManninen »