Author Topic: new AnsiString question  (Read 43533 times)

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: new AnsiString question
« Reply #75 on: March 27, 2016, 02:38:51 pm »
What about {$mode lazarusunicode} where string = utf8string?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #76 on: March 27, 2016, 03:25:36 pm »
You have been hiding some treasure? :)
Could you please show me some of your code?

It is used in conversion functions in unit LazUTF8 and its include file WinLazUTF8.inc, and in unit LConvEncoding. All of them in LazUtils package. Please have a look.
You can also search for "RawByteString" in FPC sources. The libraries have plenty of matches.

It is needed as a typecast when setting a string codepage to match input/output data by calling SetCodePage().
See:
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
Otherwise it is not needed much in user code if the other string types are used cleverly.
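
A minimal sketch of that typecast, not taken from LazUtils: it assumes the two bytes in s came from Windows-1252 data, and the program name and sample bytes are made up for illustration. Passing False as the last parameter only relabels the string with the new codepage; it does not transcode the bytes.

```pascal
program SetCodePageSketch;
{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  // Hypothetical input: the bytes $E4 $F6, which are "äö" in Windows-1252.
  s := #$E4#$F6;
  WriteLn('before: cp=', StringCodePage(s));

  // SetCodePage takes a var RawByteString, hence the typecast.
  // Convert = False relabels the existing bytes without transcoding them.
  SetCodePage(RawByteString(s), 1252, False);
  WriteLn('after:  cp=', StringCodePage(s), ', length=', Length(s));
end.
```

With Convert = True the same call would instead transcode the contents to the target codepage, which on Unix generally needs a widestring manager such as cwstring.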
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #77 on: March 27, 2016, 04:00:00 pm »
What about {$mode lazarusunicode} where string = utf8string?

You must ask FPC developers if they want to add such mode.
I would like it very much because then only String would map to UTF8String but AnsiString would keep its Windows system codepage.
Changing the encoding of AnsiString is actually the only detail that makes our solution a "hack". Some code apparently depends on Windows codepage a lot.

I would like to discuss this idea later; the time is not ripe now. FPC's Unicode solution was discussed for years and everybody involved is tired. There is a tension between the two main encodings, UTF-8 and UTF-16, and suggesting a UTF-8 mode now would feel like opposing UTF-16.
I have learned that both can be supported. The relevant code can be encapsulated. Even user code can be made to support both encodings by using helper functions (not done yet) and by using String for individual codepoints, as one is already used to doing with UTF-8.

I would like to return to this idea once the FPC project has fixed its libraries for Delphi-compatible UTF-16.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12531
  • FPC developer.
Re: new AnsiString question
« Reply #78 on: March 27, 2016, 07:31:53 pm »
What about {$mode lazarusunicode} where string = utf8string?

Maybe, but the main thing would have to be to make UTF-8 the central 1-byte type, the one used to convert all other 1-byte types to/from 2-byte types.

But modes are on a per-unit basis, so the question is whether this actually works: what would passing a string from a {$mode lazarusunicode} unit to a UnicodeString function in a non-lazarusunicode unit do? Use the rules of the source unit or of the destination unit?

Even if you make a choice, the result will be fragile, hard to support, and only available in the next major version anyway.

Moreover, such work detracts from proper Delphi-compatible UTF-16 support. There is some discussion about starting the UTF-16 classes unit (vcl/rtl.classes) real soon now.

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #79 on: March 28, 2016, 02:28:22 am »
You have been hiding some treasure? :)
Could you please show me some of your code?

It is used in conversion functions in unit LazUTF8 and its include file WinLazUTF8.inc, and in unit LConvEncoding. All of them in LazUtils package. Please have a look.
You can also search for "RawByteString" in FPC sources. The libraries have plenty of matches.

It is needed as a typecast when setting a string codepage to match input/output data by calling SetCodePage().
See:
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
Otherwise it is not needed much in user code if the other string types are used cleverly.

So, what do you mean?
I want to know how to solve SCBC, not how to use RawByteString.
I just want to try your simple code and check that SCBC is gone and that it keeps backward compatibility at source level.

・・・・・・・・・・・・・・・・・・・・・・・・

By the way, IMHO,
"Do not use UTF8String!"
"Do not use UnicodeString!"
"Use only plain String!"
"Please read the wiki before using new AnsiString!"
"Please check assembler code!"
You must not..., You should..., You must not..., You should...,
Isn't your prized new smart automatic string conversion dying?

You said, "Your programming sounds unique; nobody writes it." So now I say,
"Your prized new smart automatic string conversion sounds unique; nobody wants it."
If somebody wants your prized smart bomb, he/she should add "{$EnableSmartBomb}" or "{$mode TheNewDelphi}" to his/her source.
We convert strings manually using UTF8Decode(v), UTF8ToUTF16(v), AnsiToUTF8(v), etc., and we are happy.
We are not duty-bound to obey your unstable dream.
But do not forget that I love Lazarus 1.6 so much. :)

« Last Edit: March 28, 2016, 02:30:14 am by malcome »

Zoran

  • Hero Member
  • *****
  • Posts: 1974
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: new AnsiString question
« Reply #80 on: March 28, 2016, 09:46:40 am »
Malcome, I have to say that I do not understand what real problems you still encounter now.

I had an impression that by now you understood well that the solution (call it workaround if you prefer) is simple - use only plain "String" type and treat it as utf-8 encoded. That also ensures that "SCBC" is gone.

Now can you please give us code where you still encounter what you call SCBC.
Please note that your code from the start of this topic is not an example for this (because the problem exists only when you assign it to UTF8String variable).
Actually, if all variables are declared as "String", then you do not get any compatibility issue with string constants (there is no SCBC, as you call it).
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #81 on: March 28, 2016, 10:40:51 am »
Thank you, Zoran.
My current opinions are above. That's all.
I have achieved the goal.
Of course, Your comments make sense to me, too. Thanks.
I don't intend to stay here, but he pointed me to a new topic, so I am just waiting for his sample code.
« Last Edit: March 28, 2016, 10:43:51 am by malcome »

BeniBela

  • Hero Member
  • *****
  • Posts: 947
    • homepage
Re: new AnsiString question
« Reply #82 on: March 28, 2016, 12:07:07 pm »

Moreover, such work detracts from proper Delphi-compatible UTF-16 support. There is some discussion about starting the UTF-16 classes unit (vcl/rtl.classes) real soon now.

UTF16 is the last thing anyone needs on Linux

Although UTF8String really seems completely broken

Code: [Select]
{$codepage utf8}
var u: UTF8String;
begin
  u := 'äöü';
  writeln(u); // => corrupted. huh? when codepage = string encoding, why does it not just copy everything?
end.


malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #83 on: March 28, 2016, 12:32:58 pm »
Zoran, Just one thing.
"Change your source!" is the same as his opinion.
My opinion is "You do not have to change your source!".
Of course, this is Your own choice.
Thank you so much!

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4631
  • I like bugs.
Re: new AnsiString question
« Reply #84 on: March 28, 2016, 12:56:52 pm »
Code: [Select]
{$codepage utf8}
var u: UTF8String;
begin
  u := 'äöü';
  writeln(u); // => corrupted. huh? when codepage = string encoding, why does it not just copy everything?
end.

http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Why_not_use_UTF8String_in_Lazarus.3F

Zoran

  • Hero Member
  • *****
  • Posts: 1974
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: new AnsiString question
« Reply #85 on: March 28, 2016, 02:09:21 pm »
Zoran, Just one thing.
"Change your source!" is the same as his opinion.
Okay, and I guess, by "his opinion" you actually mean "the advice Juha gave to you and to all of us, and repeated it several times", don't you?
Well, then "his opinion" was never "Change your source!".

My opinion is "You do not have to change your source!".
Isn't "his opinion" actually just that - "You do not have to change your source!", unless your source contains explicit UTF8String declarations?
If you follow "his opinion", you just won't get any "SCBC".

Thank you so much!
You are welcome!

Of course, this is Your own choice.
Of course it is.
Yes, of course it is not to change my source.
Yes, just like Juha advised. Yes, just like "his opinion". And yes, it just works. Yes, like magic. ;)

BeniBela

  • Hero Member
  • *****
  • Posts: 947
    • homepage
Re: new AnsiString question
« Reply #86 on: March 28, 2016, 03:27:11 pm »
Code: [Select]
{$codepage utf8}
var u: UTF8String;
begin
  u := 'äöü';
  writeln(u); // => corrupted. huh? when codepage = string encoding, why does it not just copy everything?
end.

http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Why_not_use_UTF8String_in_Lazarus.3F

But that does not explain anything

Bart

  • Hero Member
  • *****
  • Posts: 5614
    • Bart en Mariska's Webstek
Re: new AnsiString question
« Reply #87 on: March 28, 2016, 04:39:25 pm »
No problem here:

Code: [Select]
{$ifdef mswindows}
{$apptype console}
{$endif}
{$ifdef fpc}
{$mode objfpc}
{$h+}
{$codepage utf8}
{$endif}

{$define plainfpc}
uses SysUtils {$ifdef unix},cwstring{$endif};


{$ifdef plainfpc}{$else}{$endif}
var
  u8: Utf8String;
  s: string;
  u: UnicodeString;
begin
  {$ifdef plainfpc}
  write('Plain Fpc, ');
  {$else}
  write('Lazarus, ');
  {$endif}
  {$ifdef windows}
  write('Windows ');
  {$else}
  write('Linux ');
  {$endif}
  {$ifdef cpu32}
  writeln('32-bit');
  {$endif}
  {$ifdef cpu64}
  writeln('64-bit');
  {$endif}
  writeln('DefaultSystemCodePage = ',DefaultSystemCodePage);
  u8 := 'äÄëËïÏöÖüÜ';
  s := 'äÄëËïÏöÖüÜ';
  u := 'äÄëËïÏöÖüÜ';
  writeln('u8 = ',u8,' [cp=',StringCodepage(u8),']');
  writeln('s  = ',s,' [cp=',StringCodepage(s),']');
  writeln('u  = ',u,' [cp=',StringCodepage(u),']');
  u8 := s;
  u := s;
  writeln('u8 := s');
  writeln('u := s');
  writeln('u8 = ',u8,' [cp=',StringCodepage(u8),']');
  writeln('u  = ',u,' [cp=',StringCodepage(u),']');
  u8 := u;
  s := u;
  writeln('u8 := u');
  writeln('s := u');
  writeln('u8 = ',u8,' [cp=',StringCodepage(u8),']');
  writeln('s  = ',s,' [cp=',StringCodepage(s),']');
end.

Code: [Select]
[bart@localhost ConsoleProjecten]$ fpc test.pas
Free Pascal Compiler version 3.0.0 [2015/11/24] for i386
Copyright (c) 1993-2015 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling test.pas
test.pas(44,8) Warning: Implicit string type conversion from "AnsiString" to "UnicodeString"
test.pas(49,9) Warning: Implicit string type conversion with potential data loss from "UnicodeString" to "UTF8String"
test.pas(50,8) Warning: Implicit string type conversion with potential data loss from "UnicodeString" to "AnsiString"
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
55 lines compiled, 0.4 sec
3 warning(s) issued

Code: [Select]
[bart@localhost ConsoleProjecten]$ ./test
Plain Fpc, Linux 32-bit
DefaultSystemCodePage = 65001
u8 = äÄëËïÏöÖüÜ [cp=65001]
s  = äÄëËïÏöÖüÜ [cp=65001]
u  = äÄëËïÏöÖüÜ [cp=1200]
u8 := s
u := s
u8 = äÄëËïÏöÖüÜ [cp=65001]
u  = äÄëËïÏöÖüÜ [cp=0]
u8 := u
s := u
u8 = äÄëËïÏöÖüÜ [cp=65001]
s  = äÄëËïÏöÖüÜ [cp=65001]
[bart@localhost ConsoleProjecten]$

Code: [Select]
[bart@localhost ConsoleProjecten]$ locale
LANG=nl_NL.UTF-8
...

Bart
« Last Edit: March 28, 2016, 04:41:25 pm by Bart »

BeniBela

  • Hero Member
  • *****
  • Posts: 947
    • homepage
Re: new AnsiString question
« Reply #88 on: March 28, 2016, 05:10:35 pm »
Oh, {$codepage utf8} has to be put before the uses clause
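
A minimal sketch of that placement, assuming the source file itself is saved as UTF-8 and, on Unix, that the terminal uses a UTF-8 locale. The program name is made up for illustration.

```pascal
program CodepageFirst;
{$mode objfpc}{$H+}
{$codepage utf8}  // declared ahead of the uses clause

uses
  SysUtils {$ifdef unix}, cwstring{$endif};  // cwstring installs a widestring manager on Unix

var
  u: UTF8String;
begin
  // The literal is read as UTF-8 because of the directive above.
  u := 'äöü';
  WriteLn(u, ' [cp=', StringCodePage(u), ']');
end.
```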

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: new AnsiString question
« Reply #89 on: March 29, 2016, 12:50:43 am »
Zoran, Just one thing.
"Change your source!" is the same as his opinion.
Okay, and I guess, by "his opinion" you actually mean "the advice Juha gave to you and to all of us, and repeated it several times", don't you?
Well, then "his opinion" was never "Change your source!".

My opinion is "You do not have to change your source!".
Isn't "his opinion" actually just that - "You do not have to change your source!", unless your source contains explicit UTF8String declarations?
If you follow "his opinion", you just won't get any "SCBC".

Thank you so much!
You are welcome!

Of course, this is Your own choice.
Of course it is.
Yes, of course it is not to change my source.
Yes, just like Juha advised. Yes, just like "his opinion". And yes, it just works. Yes, like magic. ;)

You must be correct.

・・・・・A long time ago in a galaxy far, far away・・・・・

He said: Change your source. (= Don't use UTF8String, plain string constant, etc.)
I said: Change Lazarus/FPC's source. (= Lazarus/FPC has a bug.)
~
I thought: We do not need UTF8String?
~
We say: Don't use UTF8String.

・・・・・To be continued・・・・・

He said: Change your source. (= Don't use UnicodeString, plain string constant, etc.)
I said: Change Lazarus/FPC's source. (= Lazarus/FPC has a bug.)
~
I thought: We do not have to use UnicodeString(v)?
~
He says: Change your source. Luke, I am your father.
I say: We do not have to change our source. (= Don't use UnicodeString(v) = Don't use the new smart bomb)

・・・・・Maybe the end・・・・・

I know there are many talented people in the Lazarus/FPC team, so I believe they know what they should do.

Thank you!
« Last Edit: March 29, 2016, 02:20:34 am by malcome »

 
