Author Topic: Is this string function efficient?  (Read 13258 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 18676
  • Jungle wars. And failing health it seems.
Re: Is this string function efficient?
« Reply #30 on: January 10, 2018, 04:34:04 pm »
As I said: it is only the fastest in very simple scenarios where ASCII 0..127 is guaranteed. My point is that assignments are safer and prevent users (Lazarus users in particular) from making mistakes.
AND the buffer size is still wrong as I already explained.

But given ASCII 0..127, OK, you have my blessing, although if you need that much string copying to save so little time, I guess you should be fired... 8-) >:D

Normal users should use direct assignment. Period.

For lower ASCII both naive solutions gain some speed over assignment, that is true.
« Last Edit: January 10, 2018, 04:53:59 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
« Reply #31 on: January 10, 2018, 08:16:57 pm »
UTF-8 is backwards compatible with plain ASCII. You can iterate the data using the good old byte offsets. Just ignore anything outside ASCII.
Provided we speak of the first 128 bytes.
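To illustrate the byte-offset point, here is a minimal Python sketch. The helper name `ascii_upper_utf8` is made up for this example and is not from the thread. It relies on the fact that every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte in the range 'a'..'z' is always a real ASCII letter.

```python
def ascii_upper_utf8(data: bytes) -> bytes:
    """Uppercase ASCII letters in UTF-8 data by walking plain byte offsets.

    Bytes >= 0x80 (parts of multi-byte sequences) are left untouched.
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x61 <= b <= 0x7A:      # ASCII 'a'..'z'
            out[i] = b - 0x20      # map to 'A'..'Z'
    return bytes(out)


sample = "na\u00efve caf\u00e9".encode("utf-8")
print(ascii_upper_utf8(sample).decode("utf-8"))  # NAïVE CAFé
```

The non-ASCII letters pass through unchanged and the result is still valid UTF-8, which is the "just ignore anything outside ASCII" approach in practice.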
It's only logical.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1315
« Reply #32 on: January 11, 2018, 07:44:36 am »
Even with UTF-32 not all Unicode chars have the same length (I wish they did, but they don't).

Thaddy

« Reply #33 on: January 11, 2018, 09:14:15 am »
Quote from SymbolicFrank: "Even with UTF-32 not all Unicode chars have the same length (I wish they did, but they don't)."
Wrong: UTF32 is a fixed-length format: 4 bytes per code point, as per the standard.
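A small Python sketch of what fixed length buys you: the n-th code point sits at byte offset 4*n, so it can be fetched without scanning. The function name `code_point_at` is hypothetical, and the data is assumed to be little-endian UTF-32 without a BOM.

```python
def code_point_at(utf32_le: bytes, index: int) -> int:
    """Return the code point at position `index` in BOM-less UTF-32-LE data."""
    offset = 4 * index                       # every code point is exactly 4 bytes
    return int.from_bytes(utf32_le[offset:offset + 4], "little")


text = "a\u00e9\u20ac\U0001F600"             # a, é, €, and an emoji outside the BMP
data = text.encode("utf-32-le")

print(len(data) == 4 * len(text))            # True
print(hex(code_point_at(data, 3)))           # 0x1f600
```

Note that this indexes code points only; whether one code point equals one user-visible character is a separate question.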

munair

« Reply #34 on: January 11, 2018, 09:34:16 am »
Quote from SymbolicFrank: "Even with UTF-32 not all Unicode chars have the same length (I wish they did, but they don't)."
Thaddy is correct. UTF-32 causes the least conversion problems, even though it's not ASCII-compatible. The only downside is that ASCII files converted to UTF-32 would be four times the size, which is probably why UTF-8 has become the internet and UNIX standard.

SymbolicFrank

« Reply #35 on: January 11, 2018, 09:39:28 am »
Nope. That was before they allowed composite characters.

Thaddy

« Reply #36 on: January 11, 2018, 09:48:31 am »
Quote from SymbolicFrank: "Nope. That was before they allowed composite characters."
Wrong:
Yes:
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf  // "The Standard"
http://www.unicode.org/faq/utf_bom.html  See the table....
https://en.wikipedia.org/wiki/UTF-32
UTF32 is fixed length, UCS-2 is fixed length, ANSI is fixed length.
UTF8 is variable length, UTF16 is variable length.
You are mixing things up. You are probably confusing UCS2 (Delphi "unicode16") with UTF16.
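The fixed versus variable distinction can be checked with a few lines of Python. This is only a sketch; the sample code points and the helper name `encoded_widths` are chosen for illustration.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_widths(ch: str) -> tuple:
    """Return the byte length of one code point in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)


# U+0041 'A', U+00E9 'é', U+20AC '€', U+1F600 (outside the BMP)
SAMPLES = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

UTF-8 takes 1 to 4 bytes, UTF-16 takes 2 or 4 (a surrogate pair for anything beyond U+FFFF), and UTF-32 always takes 4. UCS-2 stays at 2 bytes only because it cannot represent code points beyond U+FFFF at all.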

You may also be confused because FPC has no native UTF32String type yet, AFAIK.

Anything in UTF16, plus all extensions to it, can be expressed in a single 4-byte UTF32 unit, and there is room to spare to repeat that whole process at least 16 times, even if reserved ranges are taken into account.
The actual space is a power function of UTF16 or UTF8, were it not for reserved entries. The theoretical limit is 2^32, or 4,294,967,296: roughly 4.3 billion (4294 million) pictures. These, of course, do not all exist...
Even if we include all Martian dialects and languages, we still have room to spare for several other planets. Withdraw your remark. >:( >:D O:-) (or prove me wrong, which you can't)
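The "room to spare" arithmetic can be spelled out. The sketch below assumes the code space defined by the Unicode Standard (U+0000 through U+10FFFF, with U+D800 through U+DFFF reserved for surrogates); those limits are not stated in the post itself, and the constant names are made up for the example.

```python
UNICODE_CODE_POINTS = 0x10FFFF + 1                 # 1,114,112 code points
SURROGATES = 0xDFFF - 0xD800 + 1                   # 2,048 reserved for UTF-16
SCALAR_VALUES = UNICODE_CODE_POINTS - SURROGATES   # 1,112,064 encodable values
UINT32_VALUES = 2 ** 32                            # 4,294,967,296 bit patterns

# How many complete copies of the Unicode code space fit into 32 bits
copies = UINT32_VALUES // UNICODE_CODE_POINTS

print(f"{UNICODE_CODE_POINTS:,} code points, {SCALAR_VALUES:,} scalar values")
print(f"{UINT32_VALUES:,} 32-bit values, room for {copies:,} copies")
```

Under these assumptions a 32-bit unit could hold the whole code space several thousand times over, which is comfortably more than the "at least 16 times" claimed above.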
« Last Edit: January 11, 2018, 10:17:12 am by Thaddy »

SymbolicFrank

« Reply #37 on: January 11, 2018, 10:18:06 am »
Ask Juha.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4660
  • I like bugs.
« Reply #38 on: January 11, 2018, 12:05:48 pm »
Quote from SymbolicFrank: "Ask Juha."
This thread is locked for some reason. Replies must go to a new thread or somewhere. I can still add a comment.
Please see the older thread:
 http://forum.lazarus.freepascal.org/index.php/topic,39045.msg268507.html#msg268507
In short: Only codepoints are encoded!
See e.g. characters: ÓÓ
One of them is composed of 2 separate codepoints. If you encode it with UTF-32, it will still be 2 codepoints.
The true complexity of Unicode is outside of codepoints. People should not concentrate on encodings so much; codepoints are easy to get right anyway.
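A short Python sketch of the same point, using the standard `unicodedata` module. The variable names are illustrative; the two strings are the precomposed Ó (U+00D3) and a plain O followed by a combining acute accent (U+0301).

```python
import unicodedata

precomposed = "\u00d3"     # one code point
decomposed = "O\u0301"     # two code points: 'O' + combining acute accent

# Both usually render as the same character, but the code point counts differ
print(len(precomposed), len(decomposed))                 # 1 2

# UTF-32 encodes code points, so the byte lengths differ as well
print(len(precomposed.encode("utf-32-le")),
      len(decomposed.encode("utf-32-le")))               # 4 8

# Normalization converts between the two forms
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

So UTF-32 is fixed length per code point, yet a user-perceived character can still span more than one code point. Both claims in this thread hold, just at different levels.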
« Last Edit: January 11, 2018, 12:10:42 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

 
