Recent

Author Topic: Reinvent the SynEdit  (Read 3203 times)

x2nie

  • Sr. Member
  • ****
  • Posts: 478
  • Impossible=I don't know the way
    • impossible is nothing - www.x2nie.com
Reinvent the SynEdit
« on: December 10, 2015, 08:37:00 am »
Hi gurus,

Me has no question here, but let discuss about SynEdit interest.


-----
I am in the middle of making SynEdit Highlighter (TSynCustomFoldHighlighter) that aim to be more comprehensive when being inherited. It will solve a little problem you faced when you need to querying: where are the all parent of this nested folding by given current line index?

I have had anything I need, it almost done, and just want to send a patch...
But, then I noticed by thinking of some overlapped classes in SynEditHighlighterFoldBase.pas
so here I am, sharing what I got from there.


This topic is about how to make a good SynEdit.
It is not finding consensus, it just : "May be I miss something?"
I just not sure about SynEdit in few things.


You don't need to be SynEdit expert to join discussion, just let me know if you have different point of view.
I may have real question in the next post.

------
Now, Before redesign the SynEdit, before changing big things,
let we restart our furthest learning curve of SynEdit, from very beginning: the primitive holy folks that worked behind the TSynCustomFoldHighlighter universe.
So, correct me when I wrong or misinterpret ...


First let abbreviate classes name for easier to name them.
PL = programming language / specific highlighter syntaxes. such as Pascal, Python, XML, or LFM
HL = TSynCustomFoldHighlighter
Range = TSynCustomHighlighterRange
RangeList = TSynCustomHighlighterRanges
FoldBlock = TSynCustomCodeFoldBlock
Lines = whole text being edited in SynEdit
Line = single text separated by EOL (LF / CR)

AFAIK each FoldBlock represents a logical area of folding system.
FoldBlock primarily hold the blocktype which maybe a IfThenElse kind, or CaseEnd, or other specific PL.
FoldBLock also contains information of it's friends as in it's properties : Parent,Children,Left,Right.
It seem that a single FoldBlock is enough for storing the whole lines via it's unlimited nested Children.

HL only have one FoldBlock : HL.RootCodeFoldBlock Once its created, wouldn't be replaced. (immutable).
But HL never manage this FoldBlock directly.
FoldBlock managed by HL via Range.
HL has one Range (property CodeFoldRange) for one time
this property is immutable; meaning the value is replaced one with other when the line being parsed is switched from one line to next line (and so from one to random line index).

a Range is created when HL is start parsing the first line.
another range is created when HL start parsing the next line.

But, When the first line is being re-parsed,
a new range is recreated.
Therefore a new range is also created when second line was being re-parsed.
being amazed ?

Now the show:
SynEdit works in per line basis.
So parsing the whole lines from first to last would never happen.
Rather, it works randomly from first visible line to last visible one.

The magic trick is in action when the first visible line is the 99657823th line.
the HL wouldn't need to parse from the 1 through the 99657822th line
(Ups, Let say the 23th and 22th  line...)
. instead, it just reuse the exact previous (the 22th) line info.
Which line info is reused? the 22th Range ! (the 22th TSynCustomHighlighterRange).
This way, before parsing the 23th line, the 22th Range is copied to the this just new created Range.

Any question?


Hey, Wait...
Q: why reparsing is needed for each line?
A: because we want to coloring the PL keyword. Without parsing we don't know how to coloring them.

Q: No, I refer to reparsing, not parsing. Why reparsing is needed?
A: well, reparsing and parsing is equivalent in SynEdit. both has no different function,
   both done in single method, so nor reparsing is taken as special action.
   Anyway, parsing is done twice or thousands time because SynEdit doesn't store the result of parsing.

Q: Wow, I am now noticed that SynEdit has such stupid behaviour: reparsing a line more than once.
A: You are another stupid when exposed earlier conclusion without knowing the big design behind it.
   
Q: Lets Don't be pedantic. When a line is unchanged, why this line being parsed twice or thousands time?
A: Pedantic, huh?
   Okay, the decision is made as best in the context of that day decision made.
   What bad this day, may is best in 10 years ago.
   Also, between 10 years there are many new classes introduced to work together with SynEdit.
   So, keeping the all things unchanged is also good for SynEdit ecosystem: lazarus editor, code tools,
   all platform well tested, all OS supported, etc.

Q: I repeat, when A LINE IS UNCHANGED, why one parsing is not enough?
A: well, enough not enough, the answer is very depends of how your SynEdit is designed.
   for simple plain SynEdit, once parsing is enough.
   for complex SynEdit that have many gutter, + several markup, + rich of plugins, they may
   triggers reparsing more than once at a time.
   Again, when it is a bad decision, it was best in the date of this activity created.
   We must keep ecosystem works : lazarus things, well tested things,..
   Changing this interactions, is not worth for the time and money.
   
Q: I think the reparsing can be avoided somehow. for efficiency purpose. can it?
A: No. The most efficient way you can do is do nothing. simplicity in code is not the issue here.
   The real issue related to reparsing is the corectness of syntax highliter vs time needed to
   provide correct syntax.
   
Q: knowing PL's keyword is enough to parse a line, why need to reuse previous line's Range?
A: Not enough! programmer also need to know which context applied in a line.
   For instance, when a keyword placed inside a comment, that keyword shall be ignored
   because in the context of comment, keyword is not a keyword anymore.     

Q: can the parsed info be stored statically, so parsing will only be required once?
A: yes, it could be, but only if user will never edit any part of text.
   Once it happened, we should rename SynEdit become SynStaticText ! :)p

   
to be continued...
« Last Edit: December 10, 2015, 09:29:28 am by x2nie »
When you were logged in, you can see attachments.
Lazarus Trunk @ Windows7 64bit, XP 32bit, Debian under VirtualMachine

x2nie

  • Sr. Member
  • ****
  • Posts: 478
  • Impossible=I don't know the way
    • impossible is nothing - www.x2nie.com
Re: Reinvent the SynEdit
« Reply #1 on: December 10, 2015, 09:20:34 am »
now about my idea of changing the HL.


1) I found the TSynPasSyn introduces a new Range class : TSynPasSynRange.
it can't directly use TSynCustomHighlighterRange (for folding), by the reason of extend the infos attached to this class.
Meaning: TSynPasSynRange introduces new properties: PasFoldEndLevel,  PasFoldMinLevel,  PasFoldFixLevel,  LastLineCodeFoldLevelFix,  BracketNestLevel, Mode.


I think it's bad design. Because, each ancestor will also introduce new Range class, by only the reason of extending the infos being added?
This way, will bother any HL ancestor, because the new introduced range class will be used almost everywhere inside the ancestor HL class.

To improve the readibility (thus easier maintenance, faster time to finding bug, quicker to understand),

The new approach is to not bothering ancestor of subclassing the Range, if there are no other reason.
The solution is would be extending the length of info's member without subclassing.
It will done by moving info member from Class's property into Record's member.


The real world solution is will be something like PNodeInfo of TVirtualTree.
Using this solution, HL author will only inherit a few method, no new class maintenance of the range class, and many other benefit of clean maintainability.



« Last Edit: December 10, 2015, 09:28:13 am by x2nie »
When you were logged in, you can see attachments.
Lazarus Trunk @ Windows7 64bit, XP 32bit, Debian under VirtualMachine

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5691
    • wiki
Re: Reinvent the SynEdit
« Reply #2 on: December 10, 2015, 06:37:22 pm »
AFAIK each FoldBlock represents a logical area of folding system.
This is at least ambigious, if not misleading.

FoldBlock  is part of the range ( http://wiki.lazarus.freepascal.org/SynEdit_Highlighter#Important_note_on_Ranges ).
It's purpose is to provide the absolute minimum information needed, so that a scan of this line can be done.

Actual information about the folding area, can be discovered during the scan. It is not stored, but can be returned when GetFoldNodeInfo is called.

Ranges (if they are an object) are re-used. And with that the same FoldBlock object is used to represent the state on many lines.
Code: Pascal  [Select]
  1. begin                   // range at the end of line has foldblock 0x0001
  2.   if a then begin    // range at the end of line has foldblock 0x0002
  3.   end;                  // range at the end of line has foldblock 0x0001 since the inner fold is closed
  4.   if longer_condition then begin // range at the end of line has foldblock 0x0002 again (re-used), even the X pos differs.
  5.   end;
  6. end;
  7.  

So FoldBlock does not represent a logical area of the folding system.

Quote
It seem that a single FoldBlock is enough for storing the whole lines via it's unlimited nested Children.
not sure what is meant by this?
Quote
HL only have one FoldBlock : HL.RootCodeFoldBlock Once its created, wouldn't be replaced. (immutable).
The top level is always "unfolded". But it has (can have) many (siblings or nested) children. So there are many blocks.

As with the range, one needs to differentiate between the working copy for the currently scanned line (not immutable), and the range (+ foldblock) stored with already scanned lines (immutable)
Quote
HL has one Range (property CodeFoldRange) for one time
this property is immutable; meaning the value is replaced one with other when the line being parsed is switched from one line to next line (and so from one to random line index).
a Range is created when HL is start parsing the first line.
another range is created when HL start parsing the next line.

But, When the first line is being re-parsed,
a new range is recreated.
Therefore a new range is also created when second line was being re-parsed.
being amazed ?
see last comment.

A new range is not always created.

Start parsing a line: A mutable copy of the last known range (from end of previous line) is created (by assigning to the existing working range object)

End parsing:  The list of existing stored ranges is searched for a range equal to the working range. If found (a reference to) the found range is returned, otherwise it is created and added.

Quote
Now the show:
SynEdit works in per line basis.
So parsing the whole lines from first to last would never happen.
It happens when the text is first loaded.

It also happens when (for example) in pascal, with nested comment, a "(*" as added at the start of file, turning all into a comment. That is a todo, it needs to be optimized to scan only to the point needed (last visible line, or last line affecting visible lines).

A line can only be scanned if all lines above were scanned. Only then the range at its start is valid. (though there may in future be HL that can provide the range without this requirement)

Quote
Q: Wow, I am now noticed that SynEdit has such stupid behaviour: reparsing a line more than once.
A: You are another stupid when exposed earlier conclusion without knowing the big design behind it.
   
SynEdit (core) itself does not trigger or need "reparsing"

Some modules (like folding) need to parse a line a 2nd time, to got extra info.
BUT this is only for visible lines, not for the entire document (so maybe 50 out of 10000 lines).

Also those modules can/should then cache the data the got (until the line changes).

If several independent modules need to do this, it may be possible to further optimize this....
Quote
Q: Lets Don't be pedantic. When a line is unchanged, why this line being parsed twice or thousands time?
It is not. (not thousands)

Unless modules are added to synedit that do not cache the info.

It may be scanned twice, because the 2 scans operate in different mode, with different purpose.

With many modules it may have ONE extra scan per module. This will eventually need to be optimized, but is not a problem at current.
Quote
Q: can the parsed info be stored statically, so parsing will only be required once?

See above, there may in the future be a need to CACHE this for the VISIBLE lines (and visible lines only).
There is no point to store this info for 100.000 lines.

Just a note:
A huge part of this design is to try and keep memory usage down (and yes there is a lot still to be optimized even further than now)

Why? Why, if memory is cheap and available?

Well because using more memory can slow down computation.

I made this experience when working on fpdebug. Loading, storing, and then working with (searching) debug info. I tested with debugging the IDE itself, with all packages having full debug info. Initially the index (created in memory on the debug info) was several 100 MB. Not much. But finding a format that required just under 100MB, even though it made the code itself more complex, improved speed quite a lot.
« Last Edit: December 10, 2015, 06:57:15 pm by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5691
    • wiki
Re: Reinvent the SynEdit
« Reply #3 on: December 10, 2015, 07:38:59 pm »
1) I found the TSynPasSyn introduces a new Range class : TSynPasSynRange.
it can't directly use TSynCustomHighlighterRange (for folding), by the reason of extend the infos attached to this class.
Meaning: TSynPasSynRange introduces new properties: PasFoldEndLevel,  PasFoldMinLevel,  PasFoldFixLevel,  LastLineCodeFoldLevelFix,  BracketNestLevel, Mode.


I think it's bad design. Because, each ancestor will also introduce new Range class, by only the reason of extending the infos being added?
This way, will bother any HL ancestor, because the new introduced range class will be used almost everywhere inside the ancestor HL class.

The base class provides the minimum shared by all HL. So it should be kept as it is.

However if there is a field that will be used by many HL then a new base class, inheriting the current, can be introduced for those fields.

LastLineCodeFoldLevelFix is something that only a few HL will need. (e.g. LFM HL will never need it)


PasFold...Level may be something that certain other HL can use. IF it is needed at all. PasHL needs review, if that is needed.

Quote
The real world solution is will be something like PNodeInfo of TVirtualTree.
Using this solution, HL author will only inherit a few method, no new class maintenance of the range class, and many other benefit of clean maintainability.

You still would need inheritance, since some methods are overridden and add functionality. Some of this can be very specific to one HL, yet it may belong into the range class, and not into the HL.