
Topic: SelLength incorrect value for text containing characters > $FFFF

fedkad

  • Full Member
  • ***
  • Posts: 176
SelLength incorrect value for text containing characters > $FFFF
« on: November 23, 2017, 03:06:38 pm »
I think this is a bug. For TEdit or TMemo, the SelLength property returns a wrong value for selections containing UTF-8 characters with code points above $FFFF.

For example, create a TMemo, paste the character 𝛁 (code point $1D6C1) into it at run time, then select all (Ctrl+A) and run this code:

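Code: Pascal
  // UTF8Length is declared in the LazUTF8 unit.
  ShowMessage(IntToStr(UTF8Length(Memo1.Text))   // code points in the whole text
    + ' ' + IntToStr(Memo1.SelLength)            // selection length reported by the control
    + ' ' + IntToStr(UTF8Length(Memo1.SelText))  // code points in the selected text
  );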

It will display 1 2 1.

Strangely enough, if you add
Code: Pascal
  Memo1.SelectAll;
before the code above, it will display 1 1 1.

Manually selecting text causes SelLength to return a wrong value; programmatically selecting the text returns the correct value.

Environment: Windows 7 + Lazarus 1.6.4 (64 bit).
« Last Edit: November 23, 2017, 03:12:49 pm by fedkad »
Lazarus 2.2.6 / FPC 3.2.2 on x86_64-linux-gtk2 (Ubuntu/GNOME) and x86_64-win64-win32/win64 (Windows 11)

Bart

  • Hero Member
  • *****
  • Posts: 5265
    • Bart en Mariska's Webstek
Re: SelLength incorrect value for text containing characters > $FFFF
« Reply #1 on: November 23, 2017, 03:26:52 pm »
Maybe manually you also select a linefeed?

Bart

fedkad

« Reply #2 on: November 23, 2017, 04:00:10 pm »
Quote from: Bart
  Maybe manually you also select a linefeed?

No. I was careful not to do that. Can you test it?

And also, shouldn't Memo1.SelLength = utf8length(Memo1.SelText)? Obviously the code displays different values.
« Last Edit: November 23, 2017, 04:03:10 pm by fedkad »

ASerge

  • Hero Member
  • *****
  • Posts: 2212
« Reply #3 on: November 23, 2017, 11:56:13 pm »
Quote from: fedkad
  And also, shouldn't Memo1.SelLength = utf8length(Memo1.SelText)? Obviously the code displays different values.

Detail:
string: Length=4, Bytes=$F09D9B81
UnicodeString: Length=2, Bytes=$35D8C1DE
CodePoint: UTF8Length=1, Bytes=$C1D601

The system (my OS is Windows) uses UnicodeChar (2-byte characters). Because UnicodeLength = 2, Ctrl+A selects two UnicodeChars (one code point).
Lazarus Memo1.SelectAll does not send the system message (EM_SETSEL, 0, -1) but explicitly calls SetSelLength(UTF8Length(Text)), which selects only the first UnicodeChar (half of the code point).
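
The three lengths listed in the detail above can be reproduced with a small console program. This is only a minimal sketch, assuming FPC 3.x: the program name and the CountCodePoints helper are made up for illustration, and the character is written as explicit UTF-8 bytes so that the source file encoding does not matter.

```pascal
program SurrogateLengths;

{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in a UTF-8 byte string by
// skipping continuation bytes (10xxxxxx). Assumes well-formed UTF-8.
function CountCodePoints(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Utf8Text: RawByteString;
  Utf16Text: UnicodeString;
begin
  // U+1D6C1 as its four UTF-8 bytes.
  Utf8Text := #$F0#$9D#$9B#$81;
  // Convert to UTF-16; a code point above $FFFF becomes a surrogate pair.
  Utf16Text := UTF8Decode(Utf8Text);

  WriteLn('UTF-8 bytes:       ', Length(Utf8Text));
  WriteLn('UTF-16 code units: ', Length(Utf16Text));
  WriteLn('Code points:       ', CountCodePoints(Utf8Text));
end.
```

It should print 4, 2 and 1. If the explanation above is right, the 2 reported by SelLength after Ctrl+A is the UTF-16 code unit count, while UTF8Length counts code points.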

fedkad

« Reply #4 on: November 24, 2017, 09:50:17 am »
So, is this a bug or a feature? How do we code to avoid such inconsistencies? Should we avoid the functions/properties that return wrong (unexpected and inconsistent) values?

Please forgive me. I am just confused.  :-\ %)

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
« Reply #5 on: November 25, 2017, 09:40:50 am »
I ran the test on Linux GTK2, and the result was 1 1 1 in every case. Maybe it is a bug only on Windows. You should probably submit a bug report.

fedkad

« Reply #6 on: November 26, 2017, 12:31:14 pm »
I prepared a small demo application. Please find it in the attached ZIP file. I tested it on Linux (1.8.0~rc4) and it did not show the problem I described. However, Lazarus 1.6.4 (64-bit) on Windows does have the problem.

To test it, run the application and try selecting 1, 2, and 3 characters from the beginning of the edit box. The relevant values will be displayed on the right.

I will try to submit a bug report.

I attached some screenshots too.
« Last Edit: November 26, 2017, 01:11:57 pm by fedkad »

Bart

« Reply #7 on: November 26, 2017, 02:31:13 pm »
Don't forget to attach the sample application and the screenshots to the bug report.

Bart

fedkad

« Reply #8 on: December 05, 2017, 03:31:58 pm »
I DID submit a bug report. However, I am not very happy with its treatment!  :(

https://bugs.freepascal.org/view.php?id=32731

It was "related" to bug #0030478, whose platform and other details do NOT appear to be related to my bug's platform, etc.

Note: I DID mention that this problem does not occur on the Linux platform. It's a Windows-only bug.

tomitomy

« Reply #9 on: December 05, 2017, 03:41:21 pm »
I also have some reports that have not been dealt with. I think they may be too complicated, no one has time to deal with them, or they are not a top priority.

wp

  • Hero Member
  • *****
  • Posts: 11830
« Reply #10 on: December 05, 2017, 04:26:21 pm »
Quote from: fedkad
  I DID submit a bug report. However, I am not very happy with its treatment!  :(

  https://bugs.freepascal.org/view.php?id=32731

  It was "related" to bug #0030478, whose platform and other details do NOT appear to be related to my bug's platform, etc.

  Note: I DID mention that this problem does not occur on the Linux platform. It's a Windows-only bug.

You also wrote "I tested the same code in Linux + Lazarus (1.8.0~rc4) and it did not have the problem described above." This means: this bug is fixed. I doubt that anybody will take care of your report and go through the hassle of backporting the fix to an old version. There will not be any 1.6.6 release since v1.8 is about to appear shortly.

Thaddy

  • Hero Member
  • *****
  • Posts: 14164
  • Probably until I exterminate Putin.
« Reply #11 on: December 05, 2017, 04:28:18 pm »
Yup. Like a patch for Windows 1.0 that makes it Windows 10.1  :D :D :D :D
Specialize a type, not a var.

wp

« Reply #12 on: December 05, 2017, 04:57:11 pm »
Quote from: tomitomy
  I also have some reports that have not been dealt with.

If the reports are important to you, change something in them, for example add a simple comment such as "ping", to bring them up to the top of the report list. If a report has not been assigned to a developer yet, it can easily go unnoticed if the developer who would normally take care of it does not see it within the first few days, because later it will be buried under many other reports.

And most important: reports with a poor description of the issue, referring to an old version, missing a demo project, missing exact steps to reproduce, etc., have a high chance of being forgotten.

Thaddy

« Reply #13 on: December 05, 2017, 05:23:45 pm »
Quote from: ASerge
  The system (my OS is Windows) uses UnicodeChar (2 byte characters). Because UnicodeLength = 2,

No, it is not! <grumpy  >:D >:D >:D >:D>
A UTF-16 character can be longer than 2 bytes. UTF-16 is explicitly a variable-length format with a minimum of 2 bytes per character.
You are confusing UTF-16 with UCS-2, the latter indeed being limited to exactly two bytes.

Windows is - since XP+ - UTF-16 and not UCS-2 anymore. So you can't rely on a two-byte format.

http://www.differencebetween.net/technology/software-technology/difference-between-ucs-2-and-utf-16/
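
To make the variable-length point concrete, here is a rough sketch of how UTF-16 splits a code point above $FFFF into two 16-bit units (a surrogate pair), using the character from the first post. It assumes FPC in objfpc mode; the program and procedure names are invented for this example and are not taken from the LCL or the RTL.

```pascal
program SurrogatePair;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: splits a code point in the range $10000..$10FFFF
// into its UTF-16 high and low surrogates. No range validation is done.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  Offset: Cardinal;
begin
  Offset := CodePoint - $10000;              // leaves a 20-bit value
  HighSur := Word($D800 + (Offset shr 10));  // upper 10 bits
  LowSur := Word($DC00 + (Offset and $3FF)); // lower 10 bits
end;

var
  HighSur, LowSur: Word;
begin
  ToSurrogates($1D6C1, HighSur, LowSur);
  WriteLn(IntToHex(HighSur, 4), ' ', IntToHex(LowSur, 4));
end.
```

For $1D6C1 this should print D835 DEC1, which is the same pair as the little-endian bytes $35D8C1DE quoted earlier in the thread: one character, two 16-bit units, four bytes.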
« Last Edit: December 05, 2017, 05:29:33 pm by Thaddy »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4458
  • I like bugs.
« Reply #14 on: December 05, 2017, 07:47:55 pm »
Quote from: fedkad
  I DID submit a bug report. However, I am not very happy with its treatment!  :(
  https://bugs.freepascal.org/view.php?id=32731
  It was "related" to bug #0030478, whose platform and other details do NOT appear to be related to my bug's platform, etc.

I had added a relation to the wrong issue. The right one is #32101. I have changed that now. Please take a look. Both issues are about wrong surrogate pair treatment on Windows.

Who will fix them? It may not be trivial. Somebody must fearlessly dive into the LCL-Win32 code.
« Last Edit: December 05, 2017, 07:51:08 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

 
