Author Topic: using the CODEPAGE correctly?  (Read 29382 times)

loopbreaker

  • New Member
  • *
  • Posts: 32
Re: using the CODEPAGE correctly?
« Reply #45 on: June 24, 2016, 01:09:09 pm »
You never use UTF8String type. Use plain string type.

Oh malcome, please don't try a new direction after so many years of discussion;
the solution (alias String for utf8String) is found.
It just needs to be allowed in FPC. If they change FPC today, yes, TODAY,
it would have no side effects, because this is opt-in: it has to be manually
enabled on a per-unit basis. The unit author should know whether his "String"
is equal to utf8 or utf16. We are so close, it just needs to be allowed.
Oh, it's so painful to see this while no one is doing anything....

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4652
  • I like bugs.
Re: using the CODEPAGE correctly?
« Reply #46 on: June 24, 2016, 01:13:05 pm »
@lainz:
Your original version of function UTF8UpperFirst() is overly complex and slow. The whole string is converted twice between encodings for no reason!

wp's version works but it has a slow and useless call to UTF8Length(Value) which can be replaced with Length(Value) or just MaxInt.
[Edit:] Simple UpperCase can be used instead of UTF8UpperString.
Then it becomes:
Code: Pascal
  Result := UpperCase(UTF8Copy(Value, 1, 1)) + UTF8Copy(Value, 2, Length(Value));
  // or
  Result := UpperCase(UTF8Copy(Value, 1, 1)) + UTF8Copy(Value, 2, MaxInt);

It can be further optimised by taking UTF8CharacterLength(Value) once and then using simple Copy() twice. Super-fast.
It is amazing how often you can use CodeUnit resolution with a variable-length encoding. I remember I got a wow-effect when I realized it. See examples:
  http://wiki.freepascal.org/UTF8_strings_and_characters
Please remember also my encoding agnostic functions if the code must be maintained between Delphi <-> Lazarus.
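
The further optimisation mentioned above (take UTF8CharacterLength once, then use plain Copy() twice) might look roughly like the sketch below. It is only an illustration, not the code from this thread: the function body is a hypothetical reimplementation of UTF8UpperFirst(), it assumes the LazUTF8 unit from LazUtils is available, and it keeps UTF8UpperString so that a non-ASCII first character is handled too. In newer Lazarus versions UTF8CharacterLength is deprecated in favour of UTF8CodepointSize.

```pascal
program UpperFirstDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8;  // assumption: the project depends on the LazUtils package

// Hypothetical reimplementation; only the name comes from the thread.
function UTF8UpperFirst(const Value: string): string;
var
  FirstLen: Integer;
begin
  if Value = '' then
    Exit('');
  // Byte length of the first codepoint, computed only once.
  FirstLen := UTF8CharacterLength(PChar(Value));
  // Plain Copy() works on bytes, which is safe here because both
  // cuts fall exactly on a codepoint boundary.
  Result := UTF8UpperString(Copy(Value, 1, FirstLen))
          + Copy(Value, FirstLen + 1, MaxInt);
end;

begin
  WriteLn(UTF8UpperFirst('lazarus'));
end.
```

The point of the design is that the string is scanned for a codepoint boundary once, and everything after that is ordinary byte-indexed copying.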

Quote
How I can convert (if needed) each case to newest lazarus with no usage of codepage. Thanks.

I think you are confusing things now. This thread is about using {$codepage UTF8} but it makes absolutely no difference for your code because it has no constants. There are 2 separate things:

1. Changing the default encoding of AnsiString (and String) variable type to UTF-8. This is now the recommended way and happens automatically for LCL applications. It can be disabled by -dDisableUTF8RTL if needed. This is a rather big change but mostly for the good.

2. {$codepage UTF8} only tells the compiler to treat string literals as UTF-8. It is a rather small issue because constants are less common than variables in normal code. The associated problems have easy workarounds, thus I think the problems have been greatly exaggerated.
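
To make the difference between the two points concrete, here is a small sketch. It assumes FPC 3.0 or later and a source file saved as UTF-8; the program and variable names are made up for illustration. It prints code page numbers rather than the strings themselves, and the numbers depend on the platform and on whether the UTF-8 RTL mode of point 1 is active, so no particular output is claimed.

```pascal
program CodepageLiteralDemo;
{$mode objfpc}{$H+}
{$codepage UTF8}  // point 2: the compiler reads literals in this file as UTF-8

{$ifdef unix}
uses
  cwstring;  // widestring manager, needed on Unix for code page conversions
{$endif}

const
  Greeting = 'Grüße';  // a literal: this is what the directive affects

var
  FromLiteral, FromRuntime: string;
begin
  FromLiteral := Greeting;
  // This value is produced at run time, so the directive plays no part in it.
  FromRuntime := ParamStr(0);
  // Point 1: the encoding of String variables follows DefaultSystemCodePage.
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('Literal code page:     ', StringCodePage(FromLiteral));
  WriteLn('Runtime code page:     ', StringCodePage(FromRuntime));
end.
```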
« Last Edit: June 27, 2016, 11:47:15 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

lainz

  • Hero Member
  • *****
  • Posts: 4738
  • Web, Desktop & Android developer
    • https://lainz.github.io/
Re: using the CODEPAGE correctly?
« Reply #47 on: June 24, 2016, 03:19:27 pm »
@lainz:
Your original version of function UTF8UpperFirst() is overly complex and slow. The whole string is converted twice between encodings for no reason!

wp's version works but it has a slow and useless call to UTF8Length(Value) which can be replaced with Length(Value) or just MaxInt.
Then it becomes:
Code: Pascal
  Result := UTF8UpperString(UTF8Copy(Value, 1, 1)) + UTF8Copy(Value, 2, Length(Value));
  // or
  Result := UTF8UpperString(UTF8Copy(Value, 1, 1)) + UTF8Copy(Value, 2, MaxInt);

It can be further optimised by taking UTF8CharacterLength(Value) once and then using simple Copy() twice. Super-fast.
It is amazing how often you can use CodeUnit resolution with a variable-length encoding. I remember I got a wow-effect when I realized it. See examples:
  http://wiki.freepascal.org/UTF8_strings_and_characters
Please remember also my encoding agnostic functions if the code must be maintained between Delphi <-> Lazarus.

Quote
How I can convert (if needed) each case to newest lazarus with no usage of codepage. Thanks.

I think you are confusing things now. This thread is about using {$codepage UTF8} but it makes absolutely no difference for your code because it has no constants. There are 2 separate things:

1. Changing the default encoding of AnsiString (and String) variable type to UTF-8. This is now the recommended way and happens automatically for LCL applications. It can be disabled by -dDisableUTF8RTL if needed. This is a rather big change but mostly for the good.

2. {$codepage UTF8} only tells the compiler to treat string literals as UTF-8. It is a rather small issue because constants are less common than variables in normal code. The associated problems have easy workarounds, thus I think the problems have been greatly exaggerated.

In fact, that UTF8UpperFirst() function was contributed by someone else here in the forum. I did not write it, because I didn't know how strings worked in older FPC, so I asked for help.

Thanks for this new version.

The rest of the code works out of the box with no modifications. I removed the unnecessary encode/decode functions.

About point 1.) I noticed that no more conversions are needed for LCL components; saving to a file with TStringList also works really well with no decode/encode functions.
About point 2.) Ok so I don't worry about that. My code works now ;)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4652
  • I like bugs.
Re: using the CODEPAGE correctly?
« Reply #48 on: June 24, 2016, 03:34:07 pm »
Oh malcome, please don't try a new direction after so many years of discussion;
the solution (alias String for utf8String) is found.
It just needs to be allowed in FPC. If they change FPC today, yes, TODAY,
it would have no side effects, because this is opt-in: it has to be manually
enabled on a per-unit basis. The unit author should know whether his "String"
is equal to utf8 or utf16. We are so close, it just needs to be allowed.
Oh, it's so painful to see this while no one is doing anything....

I must ask you to stop this FUD now.
Whining about missing features and telling other people to implement them for you brings nothing good, especially when you hijacked an innocent thread about an existing compiler directive for your "mission".
If you want to help implement the new compiler mode, please do it on the FPC-devel mailing list and be prepared to provide patches.

I could take you more seriously if you had shown practical examples of Unicode related problems. I guess you have not even tried the new UTF-8 system. This appears to be the case with most whiners. We have seen positive comments from people who actually use it.
« Last Edit: June 24, 2016, 03:56:53 pm by JuhaManninen »

malcome

  • Jr. Member
  • **
  • Posts: 81
Re: using the CODEPAGE correctly?
« Reply #49 on: June 24, 2016, 04:06:24 pm »
You have to use UTF8Decode(), UTF8Encode(), etc. as you have so far. Don't trust Auto-converting-String-Codepage.

That is not true. I remember this was discussed with you already.
You don't need UTF8Decode() nor UTF8Encode() any more. Assignment between string variables goes always right thanks to the dynamic encoding info.
Assigning constants is trickier but can be solved easily, too.

Looks like you still have not understood this system. How come? Please open a new thread and attach your code there.

My rules are simple; your great wiki's rules are not simple. That's all.
I think some people cannot understand your great wiki. But they are normal.
« Last Edit: June 24, 2016, 04:08:56 pm by malcome »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4652
  • I like bugs.
Re: using the CODEPAGE correctly?
« Reply #50 on: June 24, 2016, 06:36:37 pm »
My rules are simple; your great wiki's rules are not simple. That's all.
I think some people cannot understand your great wiki. But they are normal.

Your rules may be simple but they are also wrong. UTF8Decode() and UTF8Encode() are not needed any more.
I would like to see your code where they are needed. Is it about assigning a constant to a variable?
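
For readers following along, a minimal sketch of the automatic conversion between string variables, assuming FPC 3.0 or later. The type name TCP1252String and the sample byte are made up for illustration. The source string is filled byte by byte so that no literal (and therefore no {$codepage} question) is involved.

```pascal
program AutoConvertDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // widestring manager, needed on Unix for code page conversions
{$endif}

type
  // Hypothetical name: an AnsiString with a fixed Windows-1252 code page.
  TCP1252String = type AnsiString(1252);

var
  A: TCP1252String;
  U: UTF8String;
begin
  SetLength(A, 1);
  A[1] := #$E4;  // the single byte that encodes 'ä' in Windows-1252
  // Plain assignment: the RTL converts using the code page stored with
  // each string, so no UTF8Encode()/UTF8Decode() call is written here.
  U := A;
  WriteLn('Bytes in A: ', Length(A), ', bytes in U: ', Length(U));
  WriteLn('Code page of U: ', StringCodePage(U));
end.
```

If the byte lengths of A and U differ after the assignment, the string was re-encoded rather than copied byte for byte.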

BTW, the "great" wiki page is not done entirely by me. Mattias, Bart, Michl and maybe others have participated.
True, it may contain too many details already.

Michl

  • Full Member
  • ***
  • Posts: 226
Re: using the CODEPAGE correctly?
« Reply #51 on: June 24, 2016, 11:14:20 pm »
A lot of emotions here ;D

For the initial topic: In my eyes, there is no general best way to go. Sometimes it is good to use {$codepage UTF8}, sometimes not. But please remember that it only affects string constants in code.

To help anyone decide which solution is best for a given project, I added an overview to the wiki: http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals_Overview

I hope it helps to clear up the issue a little bit ;)  (I've tested Windows and Linux, it behaves identically - thank you FPC core team!)
« Last Edit: June 27, 2016, 08:18:09 am by Michl »
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

 
