
Author Topic: String functions. conversion from Delphi  (Read 28937 times)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4715
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #30 on: April 04, 2015, 03:11:18 pm »
I use it all the time.

Ok, both ways are needed. My original idea was to tell mm7 that he should not blindly use the UTF8 functions everywhere. Maybe I then emphasized it too much.

Quote
Except you can have arbitrary many accents. ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
And they exist.
There are no surrogate pairs in UTF-8!

Damn, it happened again! I thought I understood Unicode but I didn't. For some reason I confused surrogate pairs with combining codepoints with accents.
Yes, "surrogate pair" is a UTF-16 concept only, equivalent to a multibyte UTF-8 codepoint.
What is the correct name for the Unicode character definition for combining codepoints with accents?
Is it just "combined codepoints" or maybe "decomposed Unicode characters"?
I have already used the wrong term in some places. I must fix them. :(

[Edit] This explains surrogate pairs :
  http://en.wikipedia.org/wiki/UTF-16#U.2B10000_to_U.2B10FFFF
It is purely a UTF-16 concept but resembles a multibyte UTF-8 codepoint.
Even Martin used the term inaccurately, which shows it is a tricky issue. :)
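
To make the distinction concrete, here is a small sketch in Python (chosen only for illustration; nothing here comes from LazUtils or FPC). It stores one codepoint above U+FFFF in both encodings: UTF-16 needs a surrogate pair, while UTF-8 needs a four-byte sequence and has no surrogates at all.

```python
# Illustration only: one codepoint above U+FFFF in UTF-16 versus UTF-8.
ch = "\U0001F600"  # U+1F600, a single codepoint outside the Basic Multilingual Plane

utf16 = ch.encode("utf-16-be")
utf8 = ch.encode("utf-8")

# UTF-16: two 16-bit code units, a high surrogate followed by a low surrogate.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([f"{u:04X}" for u in units])

# UTF-8: one lead byte plus three continuation bytes, no surrogates involved.
print(" ".join(f"{b:02X}" for b in utf8))
```

In both cases it is still one codepoint; only the number of code units differs.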
« Last Edit: April 04, 2015, 04:52:24 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12345
  • Debugger - SynEdit - and more
    • wiki
Re: String functions. conversion from Delphi
« Reply #31 on: April 04, 2015, 03:23:24 pm »
I think Utf8Length should check that it stays within the string's length. This is the kind of bug that allows code injection and the like.

So yes, that should be reported.

Utf8Length is in codepoints (not chars, as Jarto wrote). And it should be kept like this. (Utf8CharLength/Count can be added.)

Utf8CharacterLength may be an issue, as it also works on codepoints. But it is a misnomer.
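
A rough sketch of what "stay within the string's length" could look like, written in Python for illustration. The function name `utf8_codepoint_count` is hypothetical and this is not the LazUtils implementation. It counts codepoints by looking at lead bytes and clamps each step so that a truncated sequence, such as the single byte 223 ($DF) mentioned in this thread, cannot move the index past the end of the data.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count codepoints by lead byte without stepping past the end of data."""
    count = 0
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            size = 1            # ASCII
        elif lead >> 5 == 0b110:
            size = 2            # lead byte of a 2-byte sequence
        elif lead >> 4 == 0b1110:
            size = 3            # lead byte of a 3-byte sequence
        elif lead >> 3 == 0b11110:
            size = 4            # lead byte of a 4-byte sequence
        else:
            size = 1            # stray continuation or invalid byte: count as one unit
        # Clamp so a truncated sequence never advances beyond the buffer.
        i += min(size, n - i)
        count += 1
    return count


print(utf8_codepoint_count(bytes([223])))           # lone lead byte, truncated
print(utf8_codepoint_count("aä€".encode("utf-8")))  # well-formed input
```

Python raises an error instead of reading out of bounds, so the clamp here only stands in for the check that a PChar-based loop would need.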

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: String functions. conversion from Delphi
« Reply #32 on: April 04, 2015, 03:34:07 pm »
Quote
Damn, it happened again! I thought I understood Unicode but I didn't. For some reason I confused surrogate pairs with combining codepoints with accents.
Yes, "surrogate pair" is a UTF-16 concept only, equivalent to a multibyte UTF-8 codepoint.
What is the correct name for the Unicode character definition for combining codepoints with accents?
Is it just "combined codepoints" or maybe "decomposed Unicode characters"?
I have already used the wrong term in some places. I must fix them. :(

http://unicode.org/reports/tr15/#Norm_Forms

https://en.wikipedia.org/wiki/Unicode_equivalence#Canonical_ordering

I think it is "Canonical decomposition"; in adjective form, probably something like "canonically decomposed character". I guess we can shorten it to "decomposed character".
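
As a small illustration of the term (in Python, using its standard `unicodedata` module; not Lazarus code), canonical decomposition turns a precomposed character into a base codepoint followed by combining codepoints:

```python
import unicodedata

composed = "\u00e9"  # "é" stored as one precomposed codepoint

# NFD applies canonical decomposition: base letter + combining acute accent.
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(c):04X}" for c in decomposed])

# NFC decomposes and then recomposes, which gives back the single codepoint.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

Both forms are canonically equivalent, but they differ in codepoint count and in byte length.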

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12345
  • Debugger - SynEdit - and more
    • wiki
Re: String functions. conversion from Delphi
« Reply #33 on: April 04, 2015, 04:06:00 pm »
My understanding (and no guarantee for correctness) is that:

A decomposed char is made from
- a normal char (either a single codepoint or a surrogate pair)
- and one or more combining codepoints

Note there is a difference between Normalization and "Canonical decomposition".

According to the wiki the former includes the latter. (But not the other way round, if I understand it correctly)
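
The "normal char plus combining codepoints" structure can be sketched like this (Python, illustration only; `group_with_combining` is a hypothetical helper name). It attaches every codepoint with a non-zero canonical combining class to the preceding base codepoint. This is a simplification: full grapheme segmentation has more rules, and some marks have combining class 0 and would not be caught.

```python
import unicodedata


def group_with_combining(text):
    """Group each base codepoint with the combining codepoints that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch   # combining codepoint: attach to the previous base
        else:
            groups.append(ch)  # base codepoint: start a new group
    return groups


# a + acute + dot below, then b, then c + diaeresis
sample = "a\u0301\u0323bc\u0308"
groups = group_with_combining(sample)
print(len(sample), len(groups))
```

The codepoint count and the number of user-visible characters differ, which is the distinction between "codepoints" and "chars" made earlier in this thread.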

jarto

  • Full Member
  • ***
  • Posts: 106
Re: String functions. conversion from Delphi
« Reply #34 on: April 04, 2015, 07:10:23 pm »
Hmm, the issue with UTF8Length is not as bad as I thought. A range check error is not triggered even though UTF8Length(chr(223)) accesses the 2nd byte of the string. Apparently the compiler doesn't care if the null is accessed.

I'm not sure if I should report this one after all.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12345
  • Debugger - SynEdit - and more
    • wiki
Re: String functions. conversion from Delphi
« Reply #35 on: April 04, 2015, 08:10:38 pm »
Quote
Hmm, the issue with UTF8Length is not as bad as I thought. A range check error is not triggered even though UTF8Length(chr(223)) accesses the 2nd byte of the string. Apparently the compiler doesn't care if the null is accessed.

I'm not sure if I should report this one after all.

Did you recompile LazUtils with range checking, or only your project? I think it does not matter, though: Utf8Length probably uses PChar and therefore will not trigger it.

But the problem is not whether a range check is triggered. A range check error for bad data is kind of OK.

The problem is that calling code trusts that the result is accessible memory. Calling code may then write to it (get the byte position and byte length of that codepoint and write to it), and that makes any app using this function vulnerable.

Now, having said that, maybe it is documented somewhere that the function does not work with invalid data, and that another function exists to check the data first. If that is the case, then it works as designed (though it might not be the best design in terms of promoting security, i.e. non-vulnerability).
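
The "check the data first" approach could look roughly like this (a Python sketch for illustration; `is_valid_utf8` is a hypothetical name, and whether LazUtils offers an equivalent is not confirmed in this thread). The data is validated once, and only validated data is handed to code that trusts byte positions and lengths.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict, well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict by default: truncated sequences are rejected
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(bytes([223])))         # lone lead byte of a 2-byte sequence
print(is_valid_utf8("ß".encode("utf-8")))  # complete 2-byte sequence
```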

dietmar

  • Full Member
  • ***
  • Posts: 170
Re: String functions. conversion from Delphi
« Reply #36 on: April 05, 2015, 05:25:04 am »
Hi,

I have just read this topic, followed the given links and now I am totally confused :(

What is the "best practice" to use if you are making a multi-language application in Lazarus?

The "LCL Unicode Support" page gives some hints that there might be some tricky conversions to remember, e.g. with GetCurrentDir().

I would appreciate a howto or whatever helps to clarify the issues.

Thx,
Dietmar
Lazarus 2.2.0RC1 with FPC 3.2.2 (32 Bit) on Windows10 (64Bit)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4715
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #37 on: April 05, 2015, 09:19:46 am »
Quote
I have just read this topic, followed the given links and now I am totally confused :(
What is the "best practice" to use if you are making a multi-language application in Lazarus?
The "LCL Unicode Support" page gives some hints that there might be some tricky conversions to remember, e.g. with GetCurrentDir().
I would appreciate a howto or whatever helps to clarify the issues.

In your case the best solution is this :
  http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus
It reduces confusion as no explicit conversions are needed.
Any new development or new Delphi conversion should use it.
You need development versions of FPC and Lazarus but it should not be a big problem.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

dietmar

  • Full Member
  • ***
  • Posts: 170
Re: String functions. conversion from Delphi
« Reply #38 on: April 06, 2015, 03:27:12 am »
Ok,

since the latest bundle only comes with FPC 2, I suppose I have to install Lazarus and FPC 3 separately?

Anyway, will try it - thanks for the help!

Dietmar
Lazarus 2.2.0RC1 with FPC 3.2.2 (32 Bit) on Windows10 (64Bit)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4715
  • I like bugs.
Re: String functions. conversion from Delphi
« Reply #39 on: April 06, 2015, 09:20:24 am »
Quote
since the latest bundle only comes with FPC 2, I suppose I have to install Lazarus and FPC 3 separately?

You can do it in one go using FPCup. I have tested with FPC fixes_3.01 branch + Lazarus trunk. Works well.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

dietmar

  • Full Member
  • ***
  • Posts: 170
Re: String functions. conversion from Delphi
« Reply #40 on: April 06, 2015, 10:56:57 pm »
Hi,

I downloaded fpcup and fpcupgui for Windows and read the documentation, but now I am failing at "how to tell fpcup which Lazarus and which FPC to install". I tried via settings.ini and the command line, but failed. fpcup either does nothing or just offers me a default installation, which I do not want.

Do you have a sample configuration?

Thx,
Dietmar
Lazarus 2.2.0RC1 with FPC 3.2.2 (32 Bit) on Windows10 (64Bit)

 
