
Author Topic: FPC: Unit-scope alias String for Utf8String  (Read 2322 times)

Graeme

Re: FPC: Unit-scope alias String for Utf8String
« Reply #30 on: May 11, 2017, 11:27:02 am »
UTF-8 and UTF-16 both support the full Unicode range (all UTFs do).
I obviously know that, but it seems there are some people in this forum who don't. Many seem to think UTF-16 is BMP only.  %)

Quote
If you want to support Unicode properly, you have to treat all UTFs (except for UTF-32) as variable-length, multi-codeunit encodings.
Exactly.
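
To make the variable-length point concrete, here is a minimal Python sketch (not from the thread; the choice of U+1F600 as a sample code point outside the BMP is just an assumption for illustration). It counts how many code units one such code point needs in each UTF:

```python
# One code point outside the BMP: U+1F600 (chosen only as an example).
ch = "\U0001F600"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")
utf32 = ch.encode("utf-32-le")

utf8_units = len(utf8)          # UTF-8 code units are 1 byte each
utf16_units = len(utf16) // 2   # UTF-16 code units are 2 bytes each
utf32_units = len(utf32) // 4   # UTF-32 code units are 4 bytes each

# The first UTF-16 code unit is a high surrogate (0xD800..0xDBFF).
high_surrogate = int.from_bytes(utf16[:2], "little")

print(utf8_units, utf16_units, utf32_units)  # 4 2 1
print(hex(high_surrogate))                   # 0xd83d
```

So code that assumes "one UTF-16 code unit = one character" breaks on this input in the same way that "one byte = one character" breaks for UTF-8.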

The other thing you need to contend with when using UTF-16 is the endianness. Are you working with UTF-16LE or UTF-16BE encoded text data? With UTF-8 you don't need to worry about that at all, because its code units are single bytes. I find the UTF-8 encoding much simpler and easier to implement and use. Much less can go wrong when using UTF-8, and there is much less to worry about.
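
As a rough illustration of the byte-order issue (a Python sketch, not part of the original post; the sample text is an arbitrary assumption), the same two characters produce different byte sequences in UTF-16LE and UTF-16BE, while UTF-8 has only one byte sequence:

```python
text = "A\u20ac"  # 'A' followed by the euro sign U+20AC (arbitrary sample)

le = text.encode("utf-16-le")
be = text.encode("utf-16-be")
u8 = text.encode("utf-8")

print(le.hex())  # 4100ac20
print(be.hex())  # 004120ac
print(u8.hex())  # 41e282ac

# Decoding little-endian data as big-endian does not fail here;
# it silently yields different, unrelated code points.
wrong = le.decode("utf-16-be")
print(wrong == text)  # False
```

This is why UTF-16 data needs either a BOM or an out-of-band agreement on byte order, whereas a UTF-8 byte stream does not.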
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

z505

Re: FPC: Unit-scope alias String for Utf8String
« Reply #31 on: May 12, 2017, 11:25:14 am »
If you need string to be AnsiString and want to stick to the Lazarus world, simply use AnsiString and AnsiChar instead of string and char. These are 1 byte based.
You should do so anyway, because it is the only way to keep your code understandable in Lazarus (or modern Delphi's for that matter).

Are you saying that the 1-byte-based, char-by-char way of programming (good old 1970s to 1990s style) is the only way to make things understandable? Or are you saying that explicitly declaring the type as AnsiString, when you are using a normal old-style 1-byte-based string, is the only way to make your code readable, so that people know you are using 1-byte-based strings and not Unicode or UTF-16 strings?

In any case, I agree with you either way, but I specifically agree with the first argument: that old 1970s char-by-char programming is the only way to make any program readable ;-)

IMO a small set such as 255 characters makes it much, much easier to prove a program correct than a practically unbounded character set, for which correctness is effectively unprovable and possibly even untestable.

But not sure if this is what you meant :-)

Would the string alias that the original poster mentioned be a compiler switch, like the $H+ switch, or a type declaration?
think first, code after

marcov

Re: FPC: Unit-scope alias String for Utf8String
« Reply #32 on: May 12, 2017, 11:54:56 am »
But use it in an expression, and it will be converted using the ACS type.

And?  As long as the conversion is correct (and converting UTF-8 <-> UTF-16 is trivial to implement), who cares how it is performed behind the scenes?

Combinations are converted to the base type. In FPC's objfpc/delphi modes that is AnsiString(0) (ACS), so the result will be corrupted unless you are in Lazarus hack mode (since then ACS = UTF-8). But requiring the Lazarus hack defeats the purpose of working with Utf8String as a separate type, if it needs ACS = UTF-8 under the hood anyway. It is also Delphi-incompatible, since you then have no string type left for processing real Windows ACS strings.

Unicode Delphi always has UnicodeString as the base type, so it will probably calculate the intermediate result in UnicodeString and then assign it to Utf8String, so there is no problem.
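
The difference between the two intermediate types can be modelled loosely in Python. This is only an analogy, not FPC or Delphi behaviour: cp1252 stands in as an assumed ANSI code page, the sample text is invented, and `errors="replace"` is an assumed stand-in for however the real conversion handles unrepresentable characters.

```python
original = "Gr\u00fc\u00dfe, \u03a9"  # umlaut and sharp s fit cp1252, Omega does not

# Intermediate result held in an ANSI code page (assumed: cp1252).
via_ansi = original.encode("cp1252", errors="replace").decode("cp1252")

# Intermediate result held in UTF-16, which covers all of Unicode.
via_utf16 = original.encode("utf-16-le").decode("utf-16-le")

print(via_ansi == original)   # False
print(via_utf16 == original)  # True
print(via_ansi[-1])           # ?
```

Any character outside the intermediate code page is lost on the round trip, while a UTF-16 intermediate preserves the text, which is the distinction drawn above between an ACS base type and a UnicodeString base type.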

I don't know if $mode Delphiunicode already fixes this, but even if it does, large parts of the RTL and libraries would have to be compiled in that mode for it to be safe.
« Last Edit: May 12, 2017, 11:56:27 am by marcov »

Thaddy

Re: FPC: Unit-scope alias String for Utf8String
« Reply #33 on: May 12, 2017, 12:05:51 pm »
Are you saying that the 1-byte-based, char-by-char way of programming (good old 1970s to 1990s style) is the only way to make things understandable? Or are you saying that explicitly declaring the type as AnsiString, when you are using a normal old-style 1-byte-based string, is the only way to make your code readable, so that people know you are using 1-byte-based strings and not Unicode or UTF-16 strings?
Did you lose sight of the original question?  >:D >:D
Anyway, yes, because string <> string. If you want one-byte strings in either UTF8 or UTF16, please call them AnsiString.
« Last Edit: May 12, 2017, 12:13:46 pm by Thaddy »
Why do the Danish always try to fuck up any programming language?

 
