
Author Topic: FPC: Unit-scope alias String for Utf8String  (Read 18144 times)

Graeme

  • Hero Member
  • *****
  • Posts: 1428
    • Graeme on the web
Re: FPC: Unit-scope alias String for Utf8String
« Reply #30 on: May 11, 2017, 11:27:02 am »
Quote
UTF-8 and UTF-16 both support the full Unicode range (all UTFs do).
I obviously know that, but it seems there are some people in this forum who don't. Many seem to think UTF-16 is BMP-only.  %)

Quote
If you want to support Unicode properly, you have to treat all UTFs (except for UTF-32) as variable-length, multi-codeunit encodings.
Exactly.

The other thing you need to contend with when using UTF-16 is the endianness: are you working with UTF-16LE or UTF-16BE encoded text data? Once again, with UTF-8 you don't need to worry about that at all. I find the UTF-8 encoding much simpler and easier to implement and use. Much less can go wrong when using UTF-8, and there is much less to worry about.
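To make both points concrete, here is a minimal sketch in Python (not FPC; the language and the sample character U+1F600 are my own choices for illustration). It shows that a code point outside the BMP needs two UTF-16 code units (a surrogate pair), and that the UTF-16 byte order depends on the endianness while UTF-8 has only one byte order.

```python
# U+1F600 lies outside the BMP, so UTF-16 must encode it as a surrogate pair.
ch = "\U0001F600"

utf8 = ch.encode("utf-8")          # one fixed byte order
utf16_le = ch.encode("utf-16-le")  # little-endian, no BOM
utf16_be = ch.encode("utf-16-be")  # big-endian, no BOM

# Number of 16-bit code units needed for this single code point.
utf16_code_units = len(utf16_le) // 2

print(utf8.hex())        # f09f9880
print(utf16_le.hex())    # 3dd800de
print(utf16_be.hex())    # d83dde00
print(utf16_code_units)  # 2
```

The surrogate pair is D83D DE00 in both cases; only the byte order within each 16-bit unit differs, which is exactly what you have to know (or detect via a BOM) before you can decode UTF-16 data.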
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

z505

  • New Member
  • *
  • Posts: 38
  • think first, code after
Re: FPC: Unit-scope alias String for Utf8String
« Reply #31 on: May 12, 2017, 11:25:14 am »
Quote
If you need string to be AnsiString and want to stick to the Lazarus world, simply use AnsiString and AnsiChar instead of string and char. These are 1-byte based.
You should do so anyway, because it is the only way to keep your code understandable in Lazarus (or modern Delphis, for that matter).

Are you saying that the 1-byte-based string, char-by-char way of programming (good old 1970s and 1990s) is the only way to make things understandable?  Or are you saying that declaring AnsiString as a type, specifically when you are using a normal old-style 1-byte-based string, is the only way to make your code readable, so that people know you are using 1-byte-based strings and not Unicode or UTF-16 strings?

In any case, I agree with you either way, but specifically I agree with the first argument, that old 1970s char-by-char programming is the only way to make any program readable ;-)

IMO it is much, much easier to prove a program correct with a small character set of about 255 characters than with a practically infinite character set that is practically unprovable, possibly even untestable.

But not sure if this is what you meant :-)

Would this string alias that the original poster mentioned be a compiler switch, such as the $H+ switch, or a type declaration?
think first, code after

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11353
  • FPC developer.
Re: FPC: Unit-scope alias String for Utf8String
« Reply #32 on: May 12, 2017, 11:54:56 am »
But use it in an expression, and it will be converted using the ACS type.

And?  As long as the conversion is correct (and converting UTF-8 <-> UTF-16 is trivial to implement), who cares how it is performed behind the scenes?

Combinations are converted to the base type. In FPC's objfpc/delphi modes that is ansistring(0) (ACS), so the result will be corrupt unless you are in the Lazarus hack mode (since then ACS = UTF-8). But requiring the Lazarus hack defeats the purpose of working with utf8string as a separate type, if it requires ACS = UTF-8 under the hood anyway. That is also Delphi-incompatible, since you then have no string type left to process real Windows ACS strings.

Unicode Delphi always has unicodestring as the base type, so it will probably calculate the intermediate result in unicodestring and then assign it to utf8string, so there is no problem.

I don't know whether $mode Delphiunicode already fixes this, but even if it does, large parts of the RTL and libraries would have to be compiled in that mode to be safe.
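The corruption described above can be sketched as a round trip through an intermediate code page. This is only an analogy in Python, not FPC code: the helper `through_base_type` is hypothetical, cp1252 stands in for a Windows ACS, and replacing unrepresentable characters with '?' is an assumption about how the lossy step behaves.

```python
def through_base_type(text: str, base_codepage: str) -> str:
    """Round-trip text through a byte string in base_codepage.

    Characters the code page cannot represent are replaced with '?',
    mimicking a lossy intermediate result (an assumption, see lead-in).
    """
    intermediate = text.encode(base_codepage, errors="replace")
    return intermediate.decode(base_codepage)


original = "Grüße, Ελλάδα"

# Intermediate result in a legacy ANSI code page: the Greek text is lost.
lossy = through_base_type(original, "cp1252")

# Intermediate result in UTF-8 (the "ACS = UTF-8" case): nothing is lost.
lossless = through_base_type(original, "utf-8")

print(lossy)                 # Grüße, ??????
print(lossless == original)  # True
```

The German characters survive because cp1252 contains them; the Greek ones do not, which is why this kind of corruption often goes unnoticed in Western European test data.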
« Last Edit: May 12, 2017, 11:56:27 am by marcov »

Thaddy

  • Hero Member
  • *****
  • Posts: 14169
  • Probably until I exterminate Putin.
Re: FPC: Unit-scope alias String for Utf8String
« Reply #33 on: May 12, 2017, 12:05:51 pm »
Quote
Are you saying that the 1-byte-based string, char-by-char way of programming (good old 1970s and 1990s) is the only way to make things understandable?  Or are you saying that declaring AnsiString as a type, specifically when you are using a normal old-style 1-byte-based string, is the only way to make your code readable, so that people know you are using 1-byte-based strings and not Unicode or UTF-16 strings?
Did you lose sight of the original question?  >:D >:D
Anyway, yes, because string <> string. If you want one-byte strings in either UTF8 or UTF16, please call them AnsiString.
« Last Edit: May 12, 2017, 12:13:46 pm by Thaddy »
Specialize a type, not a var.

 
