Author Topic: UTF8 symbols in Editor - strange copy & paste  (Read 1204 times)

winni

  • Sr. Member
  • ****
  • Posts: 335
UTF8 symbols in Editor - strange copy & paste
« on: July 02, 2019, 12:52:17 am »
Hi

I just wanted to type a simple string:
Code: Pascal
  Zodiak = '♈♉♊♋♌♍♎♏♐♑♒♓';

I selected the UTF8 symbols in KCharSelect and copied them to the clipboard. I checked the clipboard to make sure that it was plain, unformatted text.

I pasted the string into the IDE editor, and the symbols are not the simple ones I wanted but some kind of icons, painted on a circle and colored (see attachment). I know that some fonts now have such crazy icons, but

* The Editor Font is Liberation Mono - with simple symbols
* The Desktop Font is Liberation Sans - with simple symbols

So - where do the icons come from?

And: is this an IDE issue or (again) a KDE Plasma problem?

System: KDE Plasma 5.16,  Suse Tumbleweed

Winni


lucamar

  • Hero Member
  • *****
  • Posts: 2021
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #1 on: July 02, 2019, 02:59:37 am »
Try to see how those symbols are implemented (if they are) in Liberation Mono--with a charset map application, for example. Just tested and in Liberation Mono they are "normal" symbols.

Note, though, that despite how they "look" they are in fact the same characters. What happens if you close and then reopen that file?

ETA: It should look like the attached image (Lazarus/Ubuntu + Liberation Mono)
« Last Edit: July 02, 2019, 03:19:52 am by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.2/2.0.4  - FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

winni

  • Sr. Member
  • ****
  • Posts: 335
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #2 on: July 02, 2019, 03:20:04 am »
Quicker than me ...

I tested it in Libre Office writer - simple symbols, as I saw it in KCharSelect.

Reopening: I also tested - no effect.

And now: Opened good old joe in a terminal. Pure utf8. Copied the symbols to the editor. Called it zodiak.inc and opened  that file in the IDE: Colored icons.

Think we need an IDE magician to get the miracle solved !!??

Winni


lucamar

  • Hero Member
  • *****
  • Posts: 2021
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #3 on: July 02, 2019, 03:24:19 am »
I don't think it's the IDE per se, but something to do with your system. Note that my image shows the string in Lazarus.

Unfortunately I can't test with KDE at the moment, I don't have any (working) KDE system at hand here. :(

PascalDragon

  • Hero Member
  • *****
  • Posts: 626
  • Compiler Developer
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #4 on: July 02, 2019, 09:14:30 am »
It's probably related to the font. Some fonts support colored glyphs (e.g. Emojis), and if the widgetset backend (Qt/Gtk) supports them, they might be used automatically.

winni

  • Sr. Member
  • ****
  • Posts: 335
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #5 on: July 03, 2019, 12:08:19 am »
No, it's not related to the font, as mentioned above.

I did the same test with the Liberation font on Windows 7 with Lazarus 2, and everything is OK.
Loading the pure utf8 file "zodiak.inc" (see above) there showed the plain utf8 symbols and not those icon-like signs.

So there is a problem with the clipboard of KDE Plasma.

I'll have a look what Mr. Google knows.

Winni

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #6 on: July 03, 2019, 06:51:43 pm »
As explained by PascalDragon, two things:
1-The font should include emoji style glyphs, besides the text style glyphs, for some Unicode code points. (A substitution from another font might be possible.)
2-The glyph rendering routine should choose the correct glyph. The text style is chosen by default for some glyphs, while the emoji style is chosen by default for others.

The zodiac signs are supposed to be rendered using the emoji style. An application can request the text style by placing U+FE0E after the desired code point. It is up to the rendering routine to fulfill the request properly.

This is part of the Unicode standard, and its implementation might not be up to date in different systems/applications/rendering routines.

Notice that the font itself is not enough to replicate the issue. In other words, getting a different result with the same font on Windows is expected.
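
The text-style request mentioned above could be sketched in Free Pascal roughly like this. It is only an illustration, not something taken from Lazarus: the names VS15, EncodeBmp3 and ZodiacAsText are made up for this example, and whether the renderer actually honors U+FE0E depends on the system.

```pascal
program ZodiacTextStyle;

{$mode objfpc}{$H+}

const
  // U+FE0E (VARIATION SELECTOR-15) encoded as UTF-8: EF B8 8E
  VS15 = #$EF#$B8#$8E;

// Hypothetical helper: UTF-8 encoding of a code point in U+0800..U+FFFF
// (always three bytes), which covers the zodiac signs U+2648..U+2653.
function EncodeBmp3(CodePoint: Cardinal): string;
begin
  Result := Chr(Byte($E0 or (CodePoint shr 12))) +
            Chr(Byte($80 or ((CodePoint shr 6) and $3F))) +
            Chr(Byte($80 or (CodePoint and $3F)));
end;

// Hypothetical helper: the twelve zodiac signs, each followed by U+FE0E
// to ask the renderer for the text-style glyph.
function ZodiacAsText: string;
var
  cp: Cardinal;
begin
  Result := '';
  for cp := $2648 to $2653 do
    Result := Result + EncodeBmp3(cp) + VS15;
end;

var
  s: string;
begin
  s := ZodiacAsText;
  WriteLn(s);
  // Each sign takes 3 bytes and each selector another 3 bytes
  WriteLn('Bytes: ', Length(s));
end.
```

The selector is invisible, so the string looks the same in a plain byte dump apart from the extra EF B8 8E after every sign.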

lucamar

  • Hero Member
  • *****
  • Posts: 2021
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #7 on: July 03, 2019, 07:47:21 pm »
Quote
The zodiac signs are supposed to be rendered using the emoji style. An application can request the text style by using U+FE0E after the desired Unicode. It is up to the rendering routine to fulfill the request properly.

This is part of the Unicode standard, [...]

Huh? Part of the standard? Where have you seen that? AFAIK, the recommendation is exactly the contrary: don't render "emoji" unless explicitly asked for. Has that changed?

And even if it's as you say, I seriously doubt that Liberation (much less the "Mono" font!) has such colorful "emojis" (unless they were added recently), so they have to be coming from another font that the system is substituting, which sounds rather strange to me.

But then, I know little of these new-fangled fashions :)

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #8 on: July 03, 2019, 08:10:20 pm »
Huh?

 :(  Text vs Emoji.


Edit:
Changed the URL from http*s* to http.
« Last Edit: July 04, 2019, 12:51:06 am by engkin »

lucamar

  • Hero Member
  • *****
  • Posts: 2021

winni

  • Sr. Member
  • ****
  • Posts: 335
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #10 on: July 04, 2019, 12:21:40 am »
Me also:

Huh????

But when I try to open https://xahlee.info/comp/text_vs_emoji.html I get an
SSL_ERROR_INTERNAL_ERROR_ALERT

- in Firefox, Chromium and Vivaldi.

Anyway - let's grab the facts:

The Liberation Fonts have had no major updates in the last 10 years, and the last big update was the creation of a narrow font:
https://github.com/liberationfonts/liberation-1.7-fonts/blob/liberation-fonts-1_07_3/ChangeLog
So there are definitely no Emojis in the Liberation Fonts.

I never had this issue with KDE <= 4.x or with Windows: even copied formatted text appeared in the editor as it should.

And why is the clipboard or the font manager or whoever the hell overriding the IDE options about the font?

I played around with this and found an astonishing problem in the GIMP:
* Copy the 12 unformatted zodiac signs to the clipboard.
* Create a textbox with any font you like.
* Paste the clipboard into the textbox: the font name changes at once to "sans-serif" and (!!!)
* you cannot change the font style anymore!!!!
* Weird!

I think there is a problem with the clipboard of KDE Plasma. But I'm not quite sure.

Winni




engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #11 on: July 04, 2019, 12:36:21 am »
Quote
Me also:

Huh????

But when I try to open https://xahlee.info/comp/text_vs_emoji.html I get an
SSL_ERROR_INTERNAL_ERROR_ALERT

Try it without SSL:
http://xahlee.info/comp/text_vs_emoji.html

Quote
So there are definitely no Emojis in the Liberation Fonts.

As I said:
Quote
1-The font should include emoji style glyphs, beside the text style glyphs for some Unicode points. (A substitution from another font might be possible)

Quote
I never had this issue with KDE <= 4.x or with Windows: even copied formatted text appeared in the editor as it should.

And why is the clipboard or the font manager or whoever the hell overriding the IDE options about the font?

Newer versions are trying to comply with the "standard".
« Last Edit: July 04, 2019, 12:42:15 am by engkin »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5635
    • wiki
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #12 on: July 04, 2019, 01:05:52 am »
There is another side that may explain the text vs emoji http://xahlee.info/comp/text_vs_emoji.html

SynEdit is not utf8 complete. And it currently relies on its own hardcoded info on character properties, rather than getting this from the OS.

So the variation markers are not known to it. Which, if they are present, may mean that SynEdit will calculate wrong char positions and/or the selectors will not do their work.
Open
components\synedit\synedittextbuffer.pp line 920
function TSynEditStringList.LogicPosIsCombining(const AChar: PChar): Boolean;

and add this line (2nd to last)
   ( (AChar[0] = #$EF) and (AChar[1] = #$B8) and (AChar[2] in [#$80..#$8F]) ) or  // Invisible variation selector
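
For anyone who wants to see what that condition matches without rebuilding the IDE, here is a small standalone sketch. It is not SynEdit code: IsVariationSelector and CountVisibleCodePoints are names invented for this example, and it only assumes that U+FE00..U+FE0F encode in UTF-8 as EF B8 80 .. EF B8 8F.

```pascal
program VariationSelectorCheck;

{$mode objfpc}{$H+}

// True if the three bytes at AChar encode U+FE00..U+FE0F (EF B8 80..8F).
// Assumes AChar points into a #0-terminated string; short-circuit
// evaluation stops at the terminator before reading past it.
function IsVariationSelector(const AChar: PChar): Boolean;
begin
  Result := (AChar[0] = #$EF) and (AChar[1] = #$B8) and
            (AChar[2] in [#$80..#$8F]);
end;

// Counts UTF-8 code points in S, skipping variation selectors.
function CountVisibleCodePoints(const S: string): Integer;
var
  p: PChar;
begin
  Result := 0;
  p := PChar(S);
  while p^ <> #0 do
  begin
    if IsVariationSelector(p) then
      Inc(p, 3)  // skip the whole three-byte selector
    else
    begin
      // Count lead bytes only; continuation bytes look like 10xxxxxx
      if (Ord(p^) and $C0) <> $80 then
        Inc(Result);
      Inc(p);
    end;
  end;
end;

const
  Aries     = #$E2#$99#$88;             // U+2648
  AriesText = #$E2#$99#$88#$EF#$B8#$8E; // U+2648 followed by U+FE0E
begin
  WriteLn(CountVisibleCodePoints(Aries));
  WriteLn(CountVisibleCodePoints(AriesText));
end.
```

Both strings should count as a single visible character, which is the point of treating the selector like a combining mark: it takes up bytes but no column of its own.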



winni

  • Sr. Member
  • ****
  • Posts: 335
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #13 on: July 04, 2019, 01:25:36 am »
@engkin: thanks for the URL.

But which idiot invented that?? UTF8 was a smooth and round wheel; now it has rags and spikes!

The SSL error might come from the last SSL update 10 days ago: with Tumbleweed you are not always backward compatible .....

@Martin_fr: yes, that may be part of the solution for the behaviour of the IDE. But I think that's not everything - see the strange experience with the GIMP.

Btw: Libre Office Writer 6.2.4.2.0 doesn't care about icons anyway - good crew!

Winni

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: UTF8 symbols in Editor - strange copy & paste
« Reply #14 on: July 04, 2019, 01:42:08 am »
Quote
But which idiot invented that?? UTF8 was a smooth and round wheel; now it has rags and spikes!

LOL

Quote
Btw: Libre Office Writer 6.2.4.2.0 doesn't care about icons anyway - good crew!

I would not be optimistic. Sooner or later they might follow the "standard".