
Author Topic: String functions. conversion from Delphi  (Read 25730 times)

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
String functions. conversion from Delphi
« on: March 26, 2015, 02:13:46 pm »
I am converting a project from Delphi 7 to Lazarus 1.4R2.
The Delphi converter has converted the file functions to UTF8.
But the code contains lots of calls to string functions (Copy, Pos, Trunc, etc.) that are still the ANSI ones.
They were not converted for some reason (why?).
I understand that because the LCL is mainly UTF8, the corresponding functions, like UTF8Copy, should be used instead of the ANSI ones.

What would be the best approach:
A. replace literally all function calls with their UTF8 counterparts;
B. create a unit MyUTF8StringUtils where the UTF8 calls are wrapped into ANSI-like functions named Copy, Pos, etc., and use it instead of the standard one;
C. Other?

I read the docs about conversion and UTF8 support, and searched here. I have not found a clear solution for this issue. Sorry if I missed something and the issue is well known. Please advise.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: String functions. conversion from Delphi
« Reply #1 on: March 26, 2015, 02:22:48 pm »
B. seems to be the best and the most flexible.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #2 on: March 26, 2015, 03:04:39 pm »
Quote
What would be the best approach:
A. replace literally all function calls with their UTF8 counterparts;
B. create a unit MyUTF8StringUtils where the UTF8 calls are wrapped into ANSI-like functions named Copy, Pos, etc., and use it instead of the standard one;
C. Other?

The right answer is C.
It is comforting to see that an experienced programmer, skalogryz, is as clueless with Unicode as I was just a short time ago.

Now, the best solution in your case is this :
  http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus
The downside is that you need development versions of FPC and Lazarus. FPC 3.0 will be released at some point, which will improve the situation.
I don't see it as a big problem because you have just started converting your code. By the time it is ready, both FPC and Lazarus will have evolved further.

If you want to continue with the release versions, the instructions are here :
  http://wiki.lazarus.freepascal.org/LCL_Unicode_Support

Quote
I read the docs about conversion and UTF8 support, and searched here. I have not found a clear solution for this issue. Sorry if I missed something and the issue is well known. Please advise.

Here is an answer to your original question about why Copy and Pos are not converted :
Because they already work well with UTF-8 in most cases. See :
  http://wiki.lazarus.freepascal.org/UTF8_strings_and_characters
This holds true in any case, whether you use the new "better" UTF-8 support or the current official one.

(skalogryz, hint: you should also read the wiki links, you are an FPC/Lazarus developer for God's sake :)
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
Re: String functions. conversion from Delphi
« Reply #3 on: March 26, 2015, 03:53:53 pm »
Quote
Now, the best solution in your case is this :
  http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus
The downside is that you need development versions of FPC and Lazarus. FPC 3.0 will be released at some point, which will improve the situation.
I don't see it as a big problem because you have just started converting your code. By the time it is ready, both FPC and Lazarus will have evolved further.

Juha, many thanks for your detailed reply.
Yes, I agree that Better_Unicode_Support_in_Lazarus would be the best. But FPC3 is not released and I do not know when it will be. Currently 2.7 is in trunk.
I have already converted the project. I relied on the "Delphi Converter" and the project works in general, except for bugs here and there. The current bug is caused by Copy. I'll illustrate it later.

Quote
If you want to continue with the release versions, the instructions are here :
  http://wiki.lazarus.freepascal.org/LCL_Unicode_Support

Yes, of course, I read it.

Quote
Here is an answer to your original question about why Copy and Pos are not converted :
Because they already work well with UTF-8 in most cases. See :
  http://wiki.lazarus.freepascal.org/UTF8_strings_and_characters
This holds true in any case, whether you use the new "better" UTF-8 support or the current official one.

The keyword is "in most cases". But not in every one, unfortunately.
I am not sure about Pos, but Str:=Copy('Трапециедальная',1,23); causes funny results if you add that Str to TMemo.Lines or to another GTK2 component, or even if you writeln it to a Gnome xterm. It erases everything printed above!
Copy cuts at 23 bytes(!), leaving part of a UTF8 character at the end, and this makes GTK go crazy.
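
To make the byte cut concrete, here is a minimal sketch in plain Free Pascal. The helper name TruncateAtCodePoint is hypothetical (it is not part of the RTL or LazUTF8), and the sketch assumes the input is valid UTF-8. It only shows one way to keep a byte-limited cut on a character boundary; it is not a replacement for UTF8Copy.

```pascal
program TruncateDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: cut S to at most MaxBytes bytes, backing up so the
// cut never lands inside a multi-byte UTF-8 sequence.
function TruncateAtCodePoint(const S: string; MaxBytes: SizeInt): string;
var
  N: SizeInt;
begin
  if MaxBytes >= Length(S) then
    Exit(S);
  N := MaxBytes;
  // Continuation bytes have the bit pattern 10xxxxxx. If the first dropped
  // byte is one, the cut is inside a character, so move it to the left.
  while (N > 0) and ((Ord(S[N + 1]) and $C0) = $80) do
    Dec(N);
  Result := Copy(S, 1, N);
end;

var
  S: string;
begin
  // UTF-8 bytes of the two Cyrillic letters 'Тр', written as escapes so the
  // result does not depend on how the source file is encoded.
  S := #$D0#$A2#$D1#$80;
  WriteLn(Length(Copy(S, 1, 3)));             // 3 bytes: first letter plus half of the second
  WriteLn(Length(TruncateAtCodePoint(S, 3))); // 2 bytes: only the first letter
end.
```

The plain Copy call keeps a dangling lead byte, which is the kind of invalid tail described above; the helper drops it instead.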

PS: Трапециедальная in Russian means Trapezoidal.

I think using my unit (B) is also good because it can be turned off by setting some {$IfNdef FPC3} (or something that tells that FPC3 compiler/RTL uses UTF8) around it, for future compatibility with FPC3.


Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: String functions. conversion from Delphi
« Reply #4 on: March 26, 2015, 05:56:50 pm »
Quote
...
Currently 2.7 is in trunk.
...

Err, FPC trunk has version number 3.1.1 now.

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
Re: String functions. conversion from Delphi
« Reply #5 on: March 26, 2015, 06:27:47 pm »
Quote
...
Currently 2.7 is in trunk.
...

Quote
Err, FPC trunk has version number 3.1.1 now.

Well, it was from http://www.freepascal.org/develop.var
Quote
...You can download today's development (trunk - currently v2.7.x) sources...

If you know about FPC so well, could you tell when FPC3 will be released?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #6 on: March 26, 2015, 07:17:48 pm »
Quote
Keyword is "in most cases". But not in every one. Unfortunately.
Not sure about Pos, but Str:=Copy('Трапециедальная',1,23); causes funny results ...

Yes, then you must use UTF8Copy, but fortunately you don't need to copy a fixed number of characters very often.
If you deal with valid UTF-8 strings, you can safely use Pos, Copy and Length. They work with byte positions, but due to inherent properties of UTF-8 they always work correctly as long as the positions come from searching the text (for example with Pos) and not from a fixed count.
Say you want to cut your text at a user-specified UTF-8 character. The user has typed the character in Edit1. Here you go :
Code:
  Txt := 'Трапециедальная';
  i := Pos(Edit1.Text, Txt);
  if i > 0 then
    Txt := Copy(Txt, 1, i-1);

See other examples on the wiki page I linked and on the internet.
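
A console-only variant of the same idea, as a minimal sketch: the Edit1 input is replaced by a fixed search string, and both strings are written as byte escapes (an assumption made here so the result does not depend on the source file encoding). It shows that Pos returns a byte index, and that this index still falls on a character boundary because a valid UTF-8 sequence cannot match in the middle of another one.

```pascal
program PosCopyDemo;
{$mode objfpc}{$H+}

var
  Txt, Needle, Head: string;
  i: SizeInt;
begin
  Txt    := #$D0#$A2#$D1#$80#$D0#$B0#$D0#$BF; // UTF-8 bytes of 'Трап'
  Needle := #$D0#$B0;                         // UTF-8 bytes of 'а'
  i := Pos(Needle, Txt);   // byte index 5, although 'а' is the 3rd character
  if i > 0 then
  begin
    Head := Copy(Txt, 1, i - 1);   // the 4 bytes of 'Тр', still valid UTF-8
    WriteLn(i, ' ', Length(Head)); // 5 4
  end;
end.
```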

Quote
I think using my unit (B) is also good because it can be turned off by setting some {$IfNdef FPC3} (or something that tells that FPC3 compiler/RTL uses UTF8) around it, for future compatibility with FPC3.

No, you clearly have not understood the issue.
The "future compatibility with FPC3" does not change anything when dealing with individual UTF-8 characters. You need to use either Pos/Copy/Length or UTF8Pos/UTF8Copy/UTF8Length depending on your particular code. This one detail stays the same when we move to the "better" UTF-8 support.

Pos/Copy/Length work with byte positions.
UTF8Pos/UTF8Copy/UTF8Length work with UTF-8 character positions.
Most often you can continue using the byte positions but sometimes you need UTF-8 character positions.
You should not blindly convert everything to use UTF-8 character positions. It slows down your code in any case and leads to wrong results in some cases.
You must go through your code case by case.
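
The difference between the two families can be seen side by side in a small sketch. It assumes the LazUTF8 unit from the LazUtils package is on the unit search path (for example when built from within Lazarus); the sample string is an arbitrary choice written as byte escapes.

```pascal
program ByteVsCharDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8; // from the LazUtils package that ships with Lazarus

var
  S: string;
begin
  S := #$D0#$A2#$D1#$80#$D0#$B0#$D0#$BF; // UTF-8 bytes of 'Трап'
  WriteLn(Length(S));                    // 8: bytes
  WriteLn(UTF8Length(S));                // 4: characters (code points)
  WriteLn(Length(Copy(S, 1, 2)));        // 2: the two bytes of the first letter
  WriteLn(Length(UTF8Copy(S, 1, 2)));    // 4: the four bytes of the first two letters
  WriteLn(Pos(#$D0#$B0, S));             // 5: byte position of 'а'
  WriteLn(UTF8Pos(#$D0#$B0, S));         // 3: character position of 'а'
end.
```

A byte position from Pos belongs with Copy, and a character position from UTF8Pos belongs with UTF8Copy; mixing the two is where wrong results come from.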

Unicode is unbelievably complex. We have only touched the surface here. At least 10 times I thought I understood it, but then something new came up again.

Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: String functions. conversion from Delphi
« Reply #7 on: March 26, 2015, 08:39:43 pm »
Quote
...
Currently 2.7 is in trunk.
...

Quote
Err, FPC trunk has version number 3.1.1 now.

Quote
Well, it was from http://www.freepascal.org/develop.var
Quote
...You can download today's development (trunk - currently v2.7.x) sources...

That page needs some updating.

Quote
If you know about FPC so well, could you tell when FPC3 will be released?

It's in the stabilization phase : http://svn.freepascal.org/cgi-bin/viewvc.cgi/branches/fixes_3_0/?sortby=date&view=log
So probably in the near future.

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
Re: String functions. conversion from Delphi
« Reply #8 on: March 26, 2015, 10:12:23 pm »
Quote
...
You should not blindly convert everything to use UTF-8 character positions. It slows down your code in any case and leads to wrong results in some cases.
You must go through your code case by case.
...

I agree, this is wise.
Especially for parsed strings read from text files, which are usually just ASCII.

Is it harmful if UTF8 functions are applied to plain 7-bit ASCII data like CSV (digits with simple separators...)?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #9 on: March 26, 2015, 11:03:54 pm »
Quote
Is it harmful if UTF8 functions are applied to plain 7-bit ASCII data like CSV (digits with simple separators...)?

No harm. UTF-8 is backwards compatible with ASCII.
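
As a small illustration in plain Free Pascal: for 7-bit ASCII data every character is exactly one byte, so the byte count and the character count agree. CountCodePoints is a hypothetical helper written for this sketch (not an RTL or LazUTF8 function) and assumes valid UTF-8 input.

```pascal
program AsciiDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: count code points by counting every byte that is not
// a UTF-8 continuation byte (10xxxxxx). Assumes S is valid UTF-8.
function CountCodePoints(const S: string): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Line: string;
begin
  Line := '12;345;6.78';
  // Pure ASCII: byte positions are also character positions.
  WriteLn(Length(Line), ' ', CountCodePoints(Line)); // 11 11
end.
```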

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
Re: String functions. conversion from Delphi
« Reply #10 on: March 28, 2015, 12:27:06 am »
Quote
Is it harmful if UTF8 functions are applied to plain 7-bit ASCII data like CSV (digits with simple separators...)?

Quote
No harm. UTF-8 is backwards compatible with ASCII.

Thank you, Juha!

Does it mean I can safely replace all ASCII functions with UTF8 ones?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #11 on: March 28, 2015, 01:13:02 am »
Quote
Does it mean I can safely replace all ASCII functions with UTF8 ones?

Safely? I am not sure. You should always understand what happens in your code.
Anyway, you can try that. It will work most of the time, although it is not a very clever solution.

mm7

  • Full Member
  • ***
  • Posts: 193
  • PDP-11 RSX Pascal, Turbo Pascal, Delphi, Lazarus
Re: String functions. conversion from Delphi
« Reply #12 on: March 29, 2015, 12:25:30 am »
I have a problem with overloading the function Length.

There are two of them: one for strings, another for dynamic arrays.

If I overload the one for strings, I get a compilation error:
my.pas(173,28) Error: Incompatible type for arg no. 1: Got "Array[0..8] Of TUserstring", expected "AnsiString"

Code:
unit MyStringUtils;

{$mode delphi}

interface

uses
  Classes, SysUtils, LazUTF8;

function Length(const s: string): PtrInt; overload;

implementation

function Length(const s: string): PtrInt;
begin
  result := UTF8Length(s);
end;

end.

Why does FPC not understand the different signatures in this case?
« Last Edit: March 29, 2015, 12:32:15 am by mm7 »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #13 on: March 29, 2015, 01:14:26 am »
Quote
Problem with overload of function length.

Length is a built-in function, not a library function.
Just do as I said: use UTF-8 specific functions only when they are needed. You will understand Unicode and UTF-8 better during the process.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: String functions. conversion from Delphi
« Reply #14 on: March 29, 2015, 10:27:50 am »
Quote
Length is a built-in function, not a library function.

Which is user-overridable. Judging by the error message, it's a matter of usage.
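
One possible reading of "a matter of usage", sketched in plain Free Pascal: a user-defined Length hides the built-in one at the call site, so array calls can name the built-in explicitly as System.Length. This is only a sketch under that assumption; the counting body stands in for UTF8Length so the program builds without LazUTF8, and the Items array is a hypothetical stand-in for the array from the error message.

```pascal
program LengthShadowDemo;
{$mode objfpc}{$H+}

// User-defined Length for strings. It hides the built-in Length in this
// scope, so the body itself has to call System.Length for the byte count.
function Length(const S: string): PtrInt;
var
  i: PtrInt;
begin
  Result := 0;
  // Count code points: every byte that is not a continuation byte.
  for i := 1 to System.Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Items: array[0..8] of string;
begin
  WriteLn(Length(#$D0#$A2#$D1#$80)); // user-defined: counts 2 characters
  // Length(Items) would resolve to the string version above and fail with
  // the "Incompatible type" error; the qualified name reaches the built-in.
  WriteLn(System.Length(Items));     // 9 elements
end.
```

Giving the wrapper a distinct name avoids the clash altogether, which fits the advice to use the UTF-8 functions only where they are needed.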

 
