
Topic: Entering large Unicode numbers

skalogryz

  • Global Moderator
  • Hero Member
  • Posts: 2268
    • havefunsoft.com
Re: Entering large Unicode numbers
« Reply #30 on: January 17, 2017, 05:46:21 pm »
Quote:
By the image I am assuming that you are in Win10. Is your application also 64-bit?

It may be a Win32 widget problem.

Yes, it's Win10, but it's a 32-bit application.
Patron Cocoa Widgetset development https://www.patreon.com/skalogryz

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #31 on: January 17, 2017, 06:14:15 pm »
I have attached a series of six screenshots. It is in an RTL paragraph, and I am hitting the same key each time. The motion is hard to see because every character is the same, but across all six you might see it. The caret shifts around, moving right and left, and each typed character appears beside the caret (not at the far left end of the series).

Rick

Windows 10, LAZ 1.6.4, FPC 3.0.2, SVN 54278, i386-win32-win32/win64, forms use windows unit

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #32 on: January 17, 2017, 06:36:19 pm »
This is the same demonstration, but in an LTR paragraph, and I am using new characters. Notice where the caret sits (to the right or to the left) relative to each new character.

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #33 on: January 17, 2017, 07:13:13 pm »
I installed RichMemo in Win10 and it behaved the same way as yours. The image shows the top line as an LTR paragraph and the lower line as an RTL paragraph. In both paragraphs the characters and the caret moved in LTR fashion. So there is no RTL function in Win10.

Given the erratic behavior in WinXP, I don't know whether it has RTL support either.

Rick

skalogryz

  • Global Moderator
  • Hero Member
  • Posts: 2268
    • havefunsoft.com
Re: Entering large Unicode numbers
« Reply #34 on: January 17, 2017, 07:19:25 pm »
I'd think if you try to adjust SelStart manually, you would be able to achieve the correct result after the insertion.

I think the more important thing is to verify that navigating through the entered characters works as expected.

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #35 on: January 17, 2017, 08:02:16 pm »
I don't expect that I can do that with WinXP, since it is not positioning the caret by any consistent rule. I can try, but I haven't, because even if I succeed I expect that it will not know how to wrap the text. It would probably treat it as LTR... but I can check whether that is true.

Rick


rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #36 on: January 17, 2017, 08:30:19 pm »
OK, I did that. In doing so, I also found that I had some code that was already trying to do that; I had forgotten about it. That code is what was causing the erratic behavior. I apologize. I have been the cause of a lot of trouble.

But I was right about wrapping: it wraps as an LTR paragraph. I don't think this font has an RTL attribute embedded in it. It is a symbol font, as I had once thought. You can't tell with Unicode.

Again, I am embarrassed. Please accept my apology.

Rick

skalogryz

  • Global Moderator
  • Hero Member
  • Posts: 2268
    • havefunsoft.com
Re: Entering large Unicode numbers
« Reply #37 on: January 17, 2017, 09:21:15 pm »
That's fine.

I think you still found an issue: overwriting the UTF8Key value with a surrogate pair doesn't work at all.
I tried to fix that on my end, but found that even sending a UTF-32 code point via WM_UNICHAR doesn't work (the inserted character is a "tofu" character). However, setting the same character through SelText works just fine.
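For reference, here is the standard UTF-16 arithmetic behind a surrogate pair, using U+10900 (PHOENICIAN LETTER ALF) as the example. This is a minimal Python sketch, not the LCL or RichMemo code, and the function names are made up for illustration. A UTF-32 message such as WM_UNICHAR carries the whole value 0x10900, whereas a UTF-16 string carries it as two code units, D802 DD00.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    alf = 0x10900  # PHOENICIAN LETTER ALF
    hi, lo = to_surrogates(alf)
    print(f"U+{alf:X} -> {hi:04X} {lo:04X}")  # U+10900 -> D802 DD00
    assert from_surrogates(hi, lo) == alf
```

The two halves only mean something together: a high surrogate on its own is not a character, which is why handing the pair over piecemeal can end up as tofu.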

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #38 on: January 18, 2017, 06:15:12 pm »
I have Phoenician working by using the hexadecimal codes. The caret advances with the characters going right to left (no tofu), and it behaves that way in both LTR and RTL paragraphs.
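Entering a character by its hexadecimal code comes down to parsing the typed digits, checking that they name a valid Unicode scalar value, and producing the character to insert (for example through SelText). A minimal Python sketch of that validation; `char_from_hex` is a hypothetical helper, not part of RichMemo or the LCL.

```python
def char_from_hex(text: str) -> str:
    """Turn user input like '10900' or 'U+10900' into the corresponding character."""
    digits = text.strip()
    if digits[:2].upper() == "U+":
        digits = digits[2:]
    cp = int(digits, 16)  # raises ValueError on non-hex input
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"{text!r} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate, not a character")
    return chr(cp)


if __name__ == "__main__":
    ch = char_from_hex("U+10900")
    print(ch.encode("utf-8").hex(" "))  # f0 90 a4 80
```

Rejecting the D800..DFFF range matters here: those values are only valid as halves of a UTF-16 pair, never as characters typed on their own.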

The only thing it does not do is wrap properly (in both LTR and RTL paragraphs). It always wraps as if the text were English.

I don't know if it is a font problem, or another issue with the Unicode index exceeding the range of our compiler. It is probably the latter.

Rick

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #39 on: January 19, 2017, 01:50:26 pm »
I have hit the last straw with the Phoenician language. After editing my User Manual to include the Phoenician font, subsequent saves and reloads have produced erratic tofu (sometimes keeping the character, and other times rejecting it). Consequently, I am suspending the operations for Phoenician.

We don't have any extant Phoenician documents because they wrote on papyrus and skins. They lived on the coastlands of the Mediterranean Sea, and the humidity caused all of their records of daily life to decay. All that remains are short phrases, arrays of letters, or single words engraved on metal or stone.

Nevertheless, they were the first people to devise a method for phonetic writing. Hebrew, Syriac, Samaritan, Persian, Canaani, Coptic, Greek, and subsequently even English and Russian were all contrived by adopting the Phoenician method for writing. Moreover, the Hebrew Scriptures were all written and maintained by using the Phoenician script from the time of Moses, and up to the Hebrew exile into Babylon. So I wanted to include it in my application, and also because there are extant Dead Sea Scrolls that were written with it.

I regret that I cannot include it in my application at this time, but I am retaining the code that we have devised (through this Forum exchange), for the chance that it might be reapplied at some point.

It is unfortunate that the Phoenician Unicode has been indexed at values that are higher than the Cardinal Range that we have to work with.
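For comparison, the Phoenician block occupies U+10900 to U+1091F. Those values fit comfortably in a Free Pascal Cardinal (an unsigned 32-bit integer); what sets them apart is that they lie above U+FFFF, so they need four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16. A quick Python check, with illustrative names:

```python
CARDINAL_MAX = 2**32 - 1               # Free Pascal Cardinal: unsigned 32-bit
BMP_MAX = 0xFFFF                       # highest code point that fits in one UTF-16 code unit
PHOENICIAN = range(0x10900, 0x10920)   # U+10900..U+1091F


def encoding_profile(cp: int) -> dict:
    """Report how a code point is stored in the encodings discussed in this thread."""
    ch = chr(cp)
    return {
        "fits_in_cardinal": cp <= CARDINAL_MAX,
        "fits_in_one_utf16_unit": cp <= BMP_MAX,
        "utf16_code_units": len(ch.encode("utf-16-le")) // 2,
        "utf8_bytes": len(ch.encode("utf-8")),
    }


if __name__ == "__main__":
    # Hebrew alef, then the first and last Phoenician code points
    for cp in (0x05D0, PHOENICIAN[0], PHOENICIAN[-1]):
        print(f"U+{cp:04X}", encoding_profile(cp))
```

Hebrew (U+05D0 and neighbors) stays within one UTF-16 unit, which may explain why it behaves while Phoenician does not.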

Rick

skalogryz

  • Global Moderator
  • Hero Member
  • Posts: 2268
    • havefunsoft.com
Re: Entering large Unicode numbers
« Reply #40 on: January 19, 2017, 02:29:27 pm »
Quote from: rick2691
I have hit the last straw with the Phoenician language. After editing my User Manual to include the Phoenician font, subsequent saves and reloads have produced erratic tofu (sometimes keeping the character, and other times rejecting it). Consequently, I am suspending the operations for Phoenician.

I presume you save/load on XP... there might be a trick that could help.
Did you try loading the saved RTF in another editor (to see whether the tofu is there or not)?

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #41 on: January 19, 2017, 03:31:11 pm »
I had not, but I just did. PolyEdit had the same problem, and OpenOffice Writer had different but similar problems.

Rick

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2268
    • havefunsoft.com
Re: Entering large Unicode numbers
« Reply #42 on: January 19, 2017, 03:39:07 pm »
Could you please provide the following:
* the saved file
* the expected behavior (what the file should look like when it is reopened)

rick2691

  • Sr. Member
  • Posts: 375
Re: Entering large Unicode numbers
« Reply #43 on: January 19, 2017, 09:22:42 pm »
You must go to https://www.google.com/get/noto/, then download and install the following font files:

NotoSans-Bold.ttf
NotoSans-BoldItalic.ttf
NotoSans-Italic.ttf
NotoSans-Regular.ttf
NotoSansHebrew-Bold.ttf
NotoSansHebrew-Regular.ttf
NotoSansPhoenician-Regular.ttf
NotoSansSyriacEstrangela-Regular.ttf

They are hard to find because the alphabetical order ignores the hyphen.

Attached are two RTF files: "CmdBlue Key-Map Tofu by Font Binding.rtf" and "CmdBlue Key-Map Tofu Binding Removed.rtf".

"by font binding" is the file with the switched fonts.
It is for you to compare in an ASCII editor.

"binding removed" is where I edited the RTF file to reverse the font binding.
It is for your loading with an RTF editor.

At the start of the document is a table of consonants and descriptions.

In the far-left column is a "y". In its row, the 5th and 6th columns contain the tofu.

Further down, in the far-left column, is a "t".

In the 4th column of that row there is tofu before the Phoenician character.

Do not save anything unless you save it with another name.

If you look at the "by font binding" file with an ASCII editor you will find that it has switched the font to \f10, which is the Hebrew font. It should be \f0, which is the English font.

Likewise, if you open the "binding removed" file in an RTF editor and click the tofu, it will report the font as "Noto Sans Hebrew". It should be "Noto Sans".

It is because RichEdit does not know how to process the Phoenician font... and it is very bad at what it tries to do. It is showing as tofu because there are no English characters in the "Noto Sans Hebrew" font, and it ignores that it is designated as "Noto Sans". It thinks it knows better.

Of course, font binding had been eliminated by using the Noto Sans family of fonts. This is only happening because the Unicode index for Phoenician has made it choke.
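When checking the file in a plain-text editor, it can help to list each \uN escape alongside the most recent \fN font switch. As far as I know, RTF writes N as a signed 16-bit decimal and writes a character above U+FFFF as two \uN escapes forming a UTF-16 surrogate pair. The Python sketch below is a rough scanner under those assumptions: it ignores {...} group scoping and escaped backslashes, so it is a diagnostic aid rather than an RTF parser, and `unicode_escapes` is a hypothetical name.

```python
import re

# An RTF font switch (\fN) or Unicode escape (\uN); words like \fs24 or \uc1 don't match.
TOKEN = re.compile(r"\\([fu])(-?\d+)")


def unicode_escapes(rtf: str):
    """Yield (font_number, code_point) for each \\uN escape in an RTF string.

    Simplifications: RTF group scoping ({...}) is ignored, so the most recent
    \\fN anywhere before the escape is reported; escaped backslashes are not handled.
    """
    font = None
    high = None  # pending high surrogate, waiting for its low half
    for word, num in TOKEN.findall(rtf):
        if word == "f":
            font = int(num)
            continue
        unit = int(num) & 0xFFFF  # signed 16-bit -> unsigned code unit
        if 0xD800 <= unit <= 0xDBFF:
            high = unit
        elif 0xDC00 <= unit <= 0xDFFF and high is not None:
            yield font, 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)
            high = None
        else:
            high = None  # an unpaired high surrogate is dropped
            yield font, unit


if __name__ == "__main__":
    sample = r"{\rtf1{\fonttbl{\f0 Noto Sans;}{\f10 Noto Sans Hebrew;}}\f0 abc \f10\u-10238?\u-8960?}"
    for font, cp in unicode_escapes(sample):
        print(f"\\f{font}: U+{cp:04X}")  # \f10: U+10900
```

A Phoenician letter reported under \f10 rather than the font you chose would match the switch to the Hebrew font described above.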

Rick

Bart

  • Hero Member
  • Posts: 3518
    • Bart en Mariska's Webstek
Re: Entering large Unicode numbers
« Reply #44 on: January 22, 2017, 01:49:38 am »
Quote:
Aha. Found it. UnicodeToUTF8Inline in LazUTF8 is buggy and CAN'T handle that code point. UnicodeToUTF8 calls UnicodeToUTF8Inline...
It can't handle high surrogate pairs.

Thaddy: please see my notes and sample application in Issue #31243.
AFAICS they all pass OK.
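For anyone comparing against the test cases: under the standard UTF-8 rules (RFC 3629), a code point above U+FFFF is encoded in four bytes, and a UTF-16 surrogate pair must first be recombined into a single code point; the halves are never encoded separately. A hand-written Python sketch of those rules (not the LazUTF8 code), checked against Python's built-in encoder:

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value to UTF-8 by hand (RFC 3629 rules)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # Supplementary planes (e.g. Phoenician U+10900..U+1091F) need four bytes.
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    print(encode_utf8(0x10900).hex(" "))  # f0 90 a4 80
    assert encode_utf8(0x10900) == chr(0x10900).encode("utf-8")
```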

@rick2691: please respond to the bugtracker.

Bart