
Author Topic: Considerations on IME - Japanese in SynEdit (probably all other IME users too)  (Read 18653 times)

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2269
    • havefunsoft.com
imho, there shouldn't be an option.
"non-integrated" approach (the same as observed in Notepad) is just lousy in 2015 on the most modern IDE in the world.

Btw, you can make "non-integrated" look like "integrated" as long as you know the width of the composition text window.
Patron Cocoa Widgetset development https://www.patreon.com/skalogryz

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Quote
imho, there shouldn't be an option.
"non-integrated" approach (the same as observed in Notepad) is just lousy in 2015 on the most modern IDE in the world.
Hence it is not the default. As I said, MS Visual Studio Express 2010 uses the non-integrated approach.

In SynEdit it's a separate class. If you drop SynEdit on a form, only the full integration gets compiled into your app (smart linking). You need to add the non-integrated one yourself if you want it.
In the IDE there is an option. No harm.

Quote
Btw, you can make "non-integrated" look like "integrated" as long as you know the width of the composition text window.
Partly. It still would not have the same char spacing, because SynEdit applies enforced monospacing.

Also, the non-integrated window will wrap if needed; then the length no longer matters.

Can someone test with a current MS Office/Word?
I assume that will be integrated, but what happens if (hint: reduce the window width):
- the IME text becomes longer than the line and is scrolled
- you press space to convert the chars at the beginning (scrolled out)

In this case the caret (correctly) remains at the end of the IME text. But in all apps that I tested, that means the part you are actually working on (to which the drop-down applies) is scrolled out of view.
« Last Edit: March 26, 2015, 05:41:28 pm by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Scrolling fixed in r48512

In case it is not wanted, change
Code:
  FAdjustLeftCharForTargets := True;
to False.
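
As a rough illustration of the kind of adjustment such a switch could control: this is not the SynEdit code from r48512, only a sketch in Python under the assumption that the fix scrolls horizontally so the clause being converted (the "target") stays in view. The function name and all parameters are hypothetical.

```python
def adjust_left_col(left_col, visible_cols, target_start, target_end):
    """Return the first visible column so that the target clause
    [target_start, target_end) is in view.

    Hypothetical helper, not taken from SynEdit. Columns are 0-based
    grid cells. If the clause is wider than the window, its start wins.
    """
    if target_start < left_col:
        # Clause begins left of the viewport: scroll back to its start.
        return target_start
    if target_end > left_col + visible_cols:
        # Clause runs past the right edge: scroll right just enough,
        # but never so far that the start of the clause is hidden.
        return min(target_start, target_end - visible_cols)
    return left_col  # already fully visible, leave the viewport alone
```

With a 20-column window scrolled to column 40 and a clause at columns 10 to 16, the sketch scrolls back to column 10, which is the case described above where the converted chars sit at the scrolled-out beginning.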

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
I added that the IME text suppresses underlines coming from other styles.

Preventing frames is a TODO, because the current markup code does not allow it.

Preventing font and background color, italics, or bold will not be added as default, but when/if colors become configurable, the user can specify it.

Only issue: if the surrounding code is underlined, the IME underline joins the surrounding underline. The markup currently does not allow leaving a gap in such cases. That might be added one day.....


That's it for now.

malcome

  • Jr. Member
  • **
  • Posts: 76
I think we should distinguish between the "IDE Source Editor" and the "SynEdit Component".

About the "SynEdit Component":
We can customize it freely for our users: change properties, override methods, ...
(Of course this is thanks to your preparations.)
In the worst case, we can patch your code.
(Of course this is thanks to your decision to make it Open Source.)
So we aren't in much trouble. I don't have an opinion so far.

About the "IDE Source Editor":
You have to make a good one for Lazarus users.
I'm one of them, so I have an opinion. (Everything I have said so far is about that.)

-----------------

I think we should also distinguish between a "Text Editor" (≒ IDE Source Editor) and a "Word Processor".

The two are clearly distinguished, in Japan at least.
A word processor exists to make beautifully decorated documents.
A text editor, on the other hand, is required to be lightweight.
Of course, we're talking about a text editor (I believe so).
If the Lazarus Source Editor is heavy, a lot of people would quit using it.
Your responsibility is very important.

-----------------

I knew about the scroll problem. Why was I silent?
Because we do not type such long input with the IME.
And we use only about 80 chars per line. (The right margin indicator is very convenient.)
So I wasn't interested.
« Last Edit: March 26, 2015, 09:45:29 pm by malcome »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Well thanks for all the feedback.

As you can see, I plan to add config for some of the still open issues. But that's a bigger task, so it will come later.


Until then, if you find any other problem, let me know.

malcome

  • Jr. Member
  • **
  • Posts: 76
I have made more work for you. Sorry.
I'd like to say: your editor does not have big problems, and your editor is great.
Good luck!

malcome

  • Jr. Member
  • **
  • Posts: 76
....I lied to you.
http://bugs.freepascal.org/view.php?id=27707 is a big problem.

PS
I'm planning to play with Linux in the summer holidays.
If Japanese input works perfectly in Lazarus on Linux by then, I'll call you "God Martin".
« Last Edit: March 26, 2015, 11:00:17 pm by malcome »

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2269
    • havefunsoft.com
Quote
....I lied to you.
http://bugs.freepascal.org/view.php?id=27707 is a big problem.
Let me ask you this question: is there any instance where a character (specifically quotes) should not be shown as full-width in a text editor or word processor?

Quote
PS
I'm planning to play with Linux in the summer holidays.
What about OSX?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Quote
Let me ask you this question: is there any instance where a character (specifically quotes) should not be shown as full-width in a text editor or word processor?

Well, if I understand the Unicode spec correctly, any char that is defined as either narrow or half-width, but not ambiguous, should be shown half-width (always).

Equally, all chars that are wide or full-width, but not ambiguous, are shown full-width.

But for ambiguous chars the Unicode spec seems to give no hint at all. (Well, as far as I got reading it...)


And for ambiguous chars it does not even depend just on the settings of the OS. It depends on other factors.

For example, on my PC some ambiguous chars are displayed half-width with one font and full-width with another.
But I guess nothing even guarantees that it will always be the same with the same font. At least in theory it could depend on context (e.g. surrounding chars).

Anyway, SynEdit can override widths, and for ease of editing it may make sense to show all ambiguous chars as full-width, so they fit into the same grid as the Japanese glyphs.
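
The rule above can be sketched in a few lines. This is Python, not SynEdit's Pascal, and only an illustration: it relies on the standard library's `unicodedata.east_asian_width`, which returns the East Asian Width category of a character ("Na", "H", "N", "W", "F" or "A"). The function name `cell_width` and the `ambiguous_wide` switch are made up for this example.

```python
import unicodedata


def cell_width(ch, ambiguous_wide=False):
    """Return how many grid cells (1 or 2) a single character occupies.

    Hypothetical helper. Wide ("W") and fullwidth ("F") always take two
    cells, narrow ("Na"), halfwidth ("H") and neutral ("N") always one.
    Ambiguous ("A") is decided by the caller's policy, e.g. a user
    option or a check of the system codepage.
    """
    kind = unicodedata.east_asian_width(ch)
    if kind in ("W", "F"):
        return 2
    if kind == "A":
        return 2 if ambiguous_wide else 1
    return 1


# U+201C (left double quotation mark) is one of the ambiguous chars.
narrow_quote = cell_width("\u201c")
wide_quote = cell_width("\u201c", ambiguous_wide=True)
```

The sketch ignores combining marks and other zero-width chars, and it says nothing about what a given font actually draws, which is the harder part discussed above.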


skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2269
    • havefunsoft.com
I'm thinking about two things.

First. The context that should be used to resolve ambiguity. Is the content of the text a context? It surely is. But just like in malcome's patch, the system language is also a context.

Second. SynEdit considers all text as full-width anyway (it doesn't have half-width!).
Remember this post? It's not related, but if every character would be "stretched" ... but that's not the point.

I presume SynEdit stops treating any Unicode sequences as monospace characters and tries to render them as a single line of text. (Cyrillic too?) Maybe an exception should be made for Unicode characters from the CJK group? In this case each character would be rendered as a full-width monospace character. That should look as expected.
I'd think that having CJK characters as a sort of exception should do the trick easily.
« Last Edit: March 27, 2015, 03:46:34 am by skalogryz »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Quote
First. The context that should be used to resolve ambiguity. Is the content of the text a context? It surely is. But just like in malcome's patch, the system language is also a context.

Yes it is, and I would be happy to use it. But someone needs to code a list of ambiguous chars. Malcome's patch affects many half-width chars.

Now I can see that someone editing Japanese in monospace might want to force half-width chars into the full-width grid too, and even that is ok. Only not as default. Just because a system's codepage is Japanese does not mean that a Latin "A" becomes double-width. That would have to be an option that needs explicit enabling.


Furthermore: "system language is also a context." A context. One, but not the only one. The problem here is that if the system decides that a char has a certain width, then one must be careful about enforcing another. (See the link you posted.)

In most cases enforcing a bigger width is the lesser issue. So ok, there would be a risk that an ambiguous char on a Japanese PC gets some extra spacing. Not good, but well.

Then there is the problem of what happens on a non-Japanese system. Many ambiguous chars are narrow. So that should be the choice, or should it not? In a Latin text you don't want a quote to introduce an extra space, do you?
However, depending on the font, some, but not all, ambiguous chars may be double-width, and forced to narrow they will overlap....

So this issue affects non-Japanese setups too. (Actually, maybe only if East Asian fonts are installed.)

Anyway, I am taking it further than the current issue needs: detecting the codepage is fine, *IF* there is a list of ambiguous chars. More fixes will be needed, but it is a good start.


Quote
Second. SynEdit considers all text as full-width anyway (it doesn't have half-width!).
SynEdit's name for "full-width" is "double-width", but that is just naming. SynEdit has 2 widths for monospaced chars (and more for tabs).

Quote
I presume SynEdit stops treating any Unicode sequences as monospace characters and tries to render them as a single line of text. (Cyrillic too?) Maybe an exception should be made for Unicode characters from the CJK group? In this case each character would be rendered as a full-width monospace character. That should look as expected.
I'd think that having CJK characters as a sort of exception should do the trick easily.

SynEdit treats everything as monospace, except that "mono" here means two widths (single and double).

If it did not, Japanese would not fit SynEdit's grid. East Asian fonts on Windows use some sort of font fallback.
If not forced by SynEdit, Japanese chars would have approx. 1.7 times the width of a Latin char. SynEdit expands that to a factor of 2.

If that was not done, SynEdit (expecting everything to fit the grid) could not place the caret correctly.

If Japanese chars were allowed their desired width, SynEdit would need to support proportional font behaviour to place the caret correctly. (Maybe some day, but given the history SynEdit has, that is not possible now.)
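
To make the grid argument concrete, here is a small Python sketch (not SynEdit code; the function names and the pixel cell size are hypothetical). Once every char is forced to exactly one or two cells, the caret position is just a sum of cell counts multiplied by the cell width. No font measurement is needed, which is what breaks down if glyphs keep a natural width of roughly 1.7 cells.

```python
import unicodedata


def caret_column(text, char_index):
    """Grid column (0-based) of a caret placed before text[char_index].

    Assumes wide/fullwidth chars are stretched to two cells and
    everything else takes one cell.
    """
    col = 0
    for ch in text[:char_index]:
        col += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return col


def caret_x(text, char_index, cell_px):
    """Pixel x offset of the caret, given the width of one grid cell."""
    return caret_column(text, char_index) * cell_px
```

For example, in the string "a" + "あ" + "b" the caret before "b" sits at column 3: one cell for "a" and two for "あ".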

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2269
    • havefunsoft.com
Quote
Yes it is, and I would be happy to use it. But someone needs to code a list of ambiguous chars.
I think there should be a utility in FPC to convert these Unicode tables to Pascal arrays, no?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5638
    • wiki
Well, currently I have a huge case statement.
So I would need a tool to generate this.

But I want to test storing this as a data structure, and reduce the loop code. Then I could also have different data structures for different locales.

So yes, ideally there will be a tool to generate the code for this. But right now, I do not have one.
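
One possible shape for such a data structure, sketched in Python rather than Pascal and not taken from SynEdit: a sorted list of (first, last, width) code point ranges searched with a binary search. The table below is only a small excerpt for illustration, and all names are hypothetical. A real table would be generated from the Unicode data file, and a second table per locale could decide the ambiguous chars.

```python
from bisect import bisect_right

# Illustrative excerpt only: a few wide/fullwidth ranges, sorted by start.
RANGES = [
    (0x1100, 0x115F, 2),  # Hangul Jamo initial consonants
    (0x3041, 0x3096, 2),  # Hiragana
    (0x30A1, 0x30FA, 2),  # Katakana
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs
    (0xFF01, 0xFF60, 2),  # Fullwidth forms
]
STARTS = [first for first, _last, _width in RANGES]


def lookup_width(cp, default=1):
    """Return the cell width for code point cp, or default if not listed."""
    # Index of the last range whose start is <= cp (or -1 if none).
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return default
```

Compared with a large case statement, the lookup code stays the same size no matter how many ranges there are, and swapping tables per locale becomes a data change.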

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2269
    • havefunsoft.com
Quote
Well, currently I have a huge case statement.
So I would need a tool to generate this.

So yes, ideally there will be a tool to generate the code for this. But right now, I do not have one.
Ok, here you go.
Both the product, cjkinfo.pas (to look up char widths), and the tool (if you want something different from the product).

The lookup is in cjkinfo.pas: the single function GetCJKWidth, which should return the width value for a Unicode character.

The rest of the files are the tool. data11.pas is a utility unit to parse/read the file, as well as additional utilities to store and process the parsed information.
The generation of Pascal code is in data11read.pas.
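
For readers without the attachment, the parsing step of such a tool can be sketched as follows. This is Python and is not a description of how data11.pas works. It assumes the input uses the line format of the Unicode EastAsianWidth.txt file (`CODE;CAT # comment` or `FIRST..LAST;CAT # comment`), and all function names are hypothetical.

```python
def parse_width_lines(lines):
    """Yield (first, last, category) for each data line.

    Comments after '#' and blank lines are skipped. Whitespace around
    the fields is tolerated.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        rng, cat = (part.strip() for part in data.split(";", 1))
        first, _, last = rng.partition("..")
        yield int(first, 16), int(last or first, 16), cat


def merge_adjacent(entries):
    """Merge neighbouring ranges that share a category (input sorted)."""
    merged = []
    for first, last, cat in entries:
        if merged and merged[-1][2] == cat and merged[-1][1] + 1 == first:
            merged[-1] = (merged[-1][0], last, cat)
        else:
            merged.append((first, last, cat))
    return merged


# A few sample lines in that format, for illustration.
SAMPLE = """\
# sample
0020;Na # Zs SPACE
0021..0023;Na # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024;Na # Sc DOLLAR SIGN
00A1;A # Po INVERTED EXCLAMATION MARK
1100..115F;W # Lo [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER
""".splitlines()

TABLE = merge_adjacent(parse_width_lines(SAMPLE))
```

The merged ranges could then be written out as a Pascal array or case statement, which is the generation step mentioned above.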
