Author Topic: Lazarus: Why not define "String" unambiguously?  (Read 1673 times)

loopbreaker

  • New member
  • *
  • Posts: 32
Lazarus: Why not define "String" unambiguously?
« on: April 10, 2016, 10:47:36 am »
Hello,
I'm a Delphi developer exploring the prospects of porting to Lazarus.
I have no urgent need and I'm occupied with other things, but still,
I'm puzzled each time I look here. The FPC strategy is fortunately
already defined (Delphi string compatible), but what about Lazarus?

I don't understand why Lazarus would not define the strings as follows:
1) UTF-8 strings via UTF8String (and a shorter alias).
2) UTF-16 strings via UnicodeString and String (Delphi compatible).
3) ACP (ANSI code page) strings via AnsiString (and a shorter alias).

This means all encodings are declared at compile time,
which is the best we could achieve. As a consequence, libraries from
different authors and different platforms would be compatible.
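
A minimal Free Pascal sketch of what "encodings declared at compile time" could look like. The alias names (U8, U16, Acp) are hypothetical and only illustrate the "shorter alias" idea; the sketch assumes FPC 3.0 or later and a source file saved as UTF-8.

```pascal
program DeclaredEncodings;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

type
  // Hypothetical short aliases; the names are illustrative only.
  U8  = UTF8String;     // 1) UTF-8
  U16 = UnicodeString;  // 2) UTF-16
  Acp = AnsiString;     // 3) ANSI code page (CP_ACP)

var
  wide: U16;
  utf8: U8;
begin
  wide := 'Grüße';
  utf8 := UTF8Encode(wide);   // explicit UTF-16 -> UTF-8 conversion
  wide := UTF8Decode(utf8);   // explicit UTF-8 -> UTF-16 conversion
  // The declared type fixes the encoding; check the code page at run time.
  WriteLn(StringCodePage(utf8) = CP_UTF8);
end.
```

With every variable carrying its encoding in its type, each conversion shows up as an explicit call instead of depending on the default code page at run time.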

All APIs (FPC and LCL) would preferably have method-parameter overloads
for RawByteString, UTF8String (if the RawByteString version is absent), and UnicodeString.
This minimizes the implicit conversions before method calls.

I don't understand where the opposition comes from.
Currently, in Lazarus 1.6, "String" is used for UTF-8 (via the ACP AnsiString,
including conflicts when the default code page changes, differences between libraries,
and systems with a different code page).

But "String" could easily be renamed to UTF8String, because both hold equivalent data.
Is this the origin of the problem: Lazarus users still want "String",
regardless of the encoding behind it, regardless of differences
in usage (e.g. character access in UTF-8 and UTF-16)?
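
To make the "character access" difference concrete, here is a small Free Pascal sketch. It assumes FPC 3.0 or later, a source file saved as UTF-8, and a precomposed 'é' (U+00E9) in the literal. Indexing and Length work on code units, so the same text behaves differently in the two types.

```pascal
program CodeUnitAccess;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

var
  wide: UnicodeString;
  utf8: UTF8String;
begin
  wide := 'é';               // assumed to be the precomposed U+00E9
  utf8 := UTF8Encode(wide);  // same text, re-encoded as UTF-8

  // Length counts code units, not characters.
  WriteLn('UTF-16 code units: ', Length(wide));
  WriteLn('UTF-8 code units:  ', Length(utf8));

  // Indexing returns one code unit: a WideChar here, a single byte there.
  WriteLn('wide[1] = $', HexStr(Ord(wide[1]), 4));
  WriteLn('utf8[1] = $', HexStr(Ord(utf8[1]), 2));
end.
```

Code that walks a string with s[i] therefore cannot simply be switched from one type to the other without review.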

Thaddy

  • Hero Member
  • *****
  • Posts: 4638
Re: Lazarus: Why not define "String" unambiguously?
« Reply #1 on: April 10, 2016, 11:17:53 am »
Coming from Delphi, you probably forget that most other ecosystems, like Unix and Mac, are mostly UTF-8 centric. So choosing String as UTF8String is the logical choice from a multiplatform point of view. Windows is the odd one out in choosing UTF-16 as its default string type.
"Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive."

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 5825
Re: Lazarus: Why not define "String" unambiguously?
« Reply #2 on: April 10, 2016, 11:19:00 am »
Quote
All APIs (FPC and LCL) would preferably have method-parameter overloads
for RawByteString, UTF8String (if the RawByteString version is absent), and UnicodeString.
This minimizes the implicit conversions before method calls.

Of course that was the first thought too when the details of the D2009 release started slipping out. However, there are several problems with it:

- even if possible, it is a truckload of work and duplicated code
- most importantly: overloading doesn't work for virtual methods with string arguments.
- RawByteString is not a full string type; special care must be taken in many cases.
- the default after RawByteString should be AnsiString, not UTF8String, since that would be Delphi compatible (and would complicate the already considerable conversion matrix even more).
- overloading deals badly with passing expressions to an overloaded function with two or more string types.
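
A hedged Free Pascal sketch of the last point. The routine name Show is hypothetical. Calls with a plainly typed variable resolve to an exact match, while the commented-out call passes a mixed expression whose resulting type, and therefore the chosen overload, is up to the compiler's conversion rules.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

// Hypothetical routines, for illustration only.
procedure Show(const s: UTF8String); overload;
begin
  WriteLn('UTF8String overload, ', Length(s), ' code units');
end;

procedure Show(const s: UnicodeString); overload;
begin
  WriteLn('UnicodeString overload, ', Length(s), ' code units');
end;

var
  u8: UTF8String;
  u16: UnicodeString;
begin
  u8 := 'abc';
  u16 := 'abc';
  Show(u8);   // exact match on the declared type
  Show(u16);  // exact match on the declared type
  // Show(u8 + u16);  // mixed expression: the result type, and thus the
  //                  // overload picked, is not obvious from the call site
end.
```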

Quote
I don't understand where the opposition comes from.

The problem is the utf8 nonsense. It invites speculation about alternative solutions, which takes a lot of time and keeps everything in limbo longer. Moreover, nobody can actually predict the practical workability of many such experiments, or how far they affect Delphi compatibility.

Quote
Is this the origin of the problem: Lazarus users still want "String",
regardless of the encoding behind it, regardless of differences
in usage (e.g. character access in UTF-8 and UTF-16)?

The main problem is that String is the default string type and is used
as the central spot for conversions. A lot of the manoeuvring is to avoid having to make
two totally separate releases and codebases: one Delphi compatible, one Lazarus UTF-8 experimental.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 5825
Re: Lazarus: Why not define "String" unambiguously?
« Reply #3 on: April 10, 2016, 11:20:13 am »
Quote
Coming from Delphi, you probably forget that most other ecosystems, like Unix and Mac, are mostly UTF-8 centric.

AFAIK both the Mac's Cocoa and *nix Qt are UTF-16 at the API level. At the document level many systems are mostly UTF-8 nowadays, but at the API level it is not so clear cut.
