Lazarus

Title: New ID attributes on SynEditHighLighter unit.
Post by: Edson on August 22, 2013, 11:40:47 pm
Is it possible to include new ID attributes (and their corresponding properties) in SynEditHighlighter?
Currently, I can only see:

  SYN_ATTR_COMMENT           =   0;
  SYN_ATTR_IDENTIFIER        =   1;
  SYN_ATTR_KEYWORD           =   2;
  SYN_ATTR_STRING            =   3;
  SYN_ATTR_WHITESPACE        =   4;
  SYN_ATTR_SYMBOL            =   5;                                             //mh 2001-09-13

I think there should be additionally: SYN_ATTR_NUMBER and SYN_ATTR_MACRO.

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on August 23, 2013, 12:15:46 am
SYN_ATTR_NUMBER  makes sense.

MACRO? How do you plan to use this? Does defining it for *ALL* highlighters have an advantage, even if very few, if any, will have it?

You only need this if you do something like
Code: [Select]
for i := 0 to list.Count - 1 do begin
  HighlighterOfUnknownClass := list[i];
  HighlighterOfUnknownClass.GetDefaultAttribute(SYN_ATTR_NUMBER);
end;


Or am I missing anything?

------
SYN_ATTR_NUMBER   I will add, IF you supply a patch.

The patch should include adding SYN_ATTR_NUMBER  to the implementation of GetDefaultAttribute of the majority of those highlighters that can return it.


Adding SYN_ATTR_NUMBER without extending GetDefaultAttribute will mean that someone will probably report it as a bug. And then I am the one who has the work to do.....

-----
SYN_ATTR_MACRO: How many highlighters do we have that have this, and what is a good example of how to use it?

If you write your own HL, you can always define it there.

We can add SYN_ATTR_FIRST_FREE = 6

and then you can do
 SYN_ATTR_MACRO = SYN_ATTR_FIRST_FREE +1;

SYN_ATTR_FIRST_FREE can be changed if other values are added.
Or it can be
SYN_ATTR_FIRST_FREE = 1000;

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on August 23, 2013, 08:22:41 pm
Thanks Martin for the information.

By "macro" I mean the #define directive (of C) or $IFDEF (of Pascal). I don't know if they have another name. I see the attribute is called "fDirecAttri" in the C++ highlighter.

I realize that many highlighters redefine the properties:

    property CommentAttribute:
    property IdentifierAttribute:
    property KeywordAttribute:
    property StringAttribute:
    property SymbolAttribute:
    property WhitespaceAttribute:

and add some more, especially "NumberAttribute".

I think they should use the properties defined in "TSynCustomHighlighter" instead of re-declaring their own (which is what I do) and growing unnecessarily. But to do that, we need more constants, like SYN_ATTR_NUMBER (and its property, of course). AFAIK, it would be more efficient.

Am I wrong?

I know I can always modify my code to manage that without modifying "TSynCustomHighlighter", but it would be better if it were included.

If I have to make a patch, no problem. Just give me some information about that, and I will do it.


Greetings.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on August 23, 2013, 08:53:41 pm
Quote
By "macro" I mean the #define directive (of C) or $IFDEF (of Pascal). I don't know if they have another name. I see the attribute is called "fDirecAttri" in the C++ highlighter.

I would not use MACRO for #define, because there are other directives too (#include). In C they are called preprocessor directives. In Pascal there is no preprocessor, so they are just directives.

Quote
I realize that many highlighters redefine the properties:
    property CommentAttribute:
    property IdentifierAttribute:
...
Quote
and add some more, especially "NumberAttribute".

I think they should use the properties defined in "TSynCustomHighlighter" instead of re-declaring their own (which is what I do) and growing unnecessarily. But to do that, we need more constants, like SYN_ATTR_NUMBER (and its property, of course). AFAIK, it would be more efficient.

Ok, so you do not just want to add the constants, but also add the properties.

As for growing:
adding
Code: [Select]
    property StringAttri: TSynHighlighterAttributes read fStringAttri;
produces less code and data, than having to add code to "GetDefaultAttribute"

Yet adding a property "DirectiveAttribute" to the base class (even if not published, and not visible in ObjectInspector) makes this property present and accessible in all classes, even if they do not use it. It will also show in code-completion, which is not desirable.


Besides, this is not about saving a few bytes. This must be about design. If there were a good reason to be able to iterate over all highlighter classes and ask each of them for a certain attribute, then this would make sense. But for that you do NOT need the property. You only need the constant and access to "GetDefaultAttribute" (ok, it is protected, so that would be an issue).
Actually for that there is
Code: [Select]
    property AttrCount: integer read GetAttribCount;
    property Attribute[idx: integer]: TSynHighlighterAttributes read GetAttribute;
and you can check Attr.Name = SYNS_AttrNumber (not as nice as having a number).

------------
Anyway, the conclusion is: we can talk about ways to iterate over attributes, or to ask for specific types through ONE single function like "GetDefaultAttribute".

But I do not think that it is a good idea to add further attribute properties to the base class (if it were up to me, I would rather remove them, but don't worry, I won't, as it would break compatibility).

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on August 25, 2013, 03:31:41 am
Thanks Martin for the response.

I understand your point. But would you please tell me:
What is the reason for having these properties:

property CommentAttribute
property IdentifierAttribute
property KeywordAttribute
property StringAttribute
property SymbolAttribute
property WhitespaceAttribute

defined in "TSynCustomHighlighter" if we have to re-define them in each descendant class we use?
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on August 25, 2013, 11:15:16 am
I don't know why they were added. They existed before I joined. Probably even in the original SynEdit.
I can only guess that someone assumed they would be used by (almost) every HL.


And you do not have to redefine them, though you do have to publish them (but that's a different topic).
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on August 26, 2013, 05:38:49 pm
I have checked the original SynEdit, and these properties exist in it. I can also see that it has one property fewer than the Lazarus version, so I can deduce that someone added one more constant (and property) to "TSynCustomHighlighter" in the Lazarus version (probably on the indicated date):
  SYN_ATTR_SYMBOL   =   5;                 //mh 2001-09-13

I think the "TSynCustomHighlighter" class has a good design. Its "xxxxxAttribute" properties are useful.
They are an elegant way to access the attributes of some general categories through the editor, independently of which highlighter the editor is using (using "GetDefaultAttribute" is not elegant, and it is not accessible).

I have an example. Suppose you have an app with many editor windows open, each of which can have a different highlighter. If you have to configure the attributes of one editor, it is easy to use a single "configuration dialog" that accesses the "Highlighter" property of the editor and uses these general properties (the unused ones must return NIL) instead of asking which highlighter this particular editor is using.

The example is also valid for one editor with many possible highlighters.

It's not exactly iterating over highlighters. It is more like generic access to the highlighter attributes of any editor.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on August 26, 2013, 07:41:15 pm
Quote
I think the "TSynCustomHighlighter" class has a good design. Its "xxxxxAttribute" properties are useful.
They are an elegant way to access the attributes of some general categories through the editor, independently of which highlighter the editor is using (using "GetDefaultAttribute" is not elegant, and it is not accessible).

I still prefer the  "GetDefaultAttribute"

It does not have to be a function. It can be ONE property

Code: [Select]
type TSynAttrId = Integer;

property AttributeByID[Id: TSynAttrId]: TSynHlAttr read ... write ...;
function HasAttrId(Id: TSynAttrId): Boolean;
function AttrIdCount: Integer;
function AttrIdFromIndex(Idx: Integer): TSynAttrId;
property AttributeByIndex[Idx: Integer]: TSynHlAttr read ... write ...;
........


Much more powerful, and no properties that do not work in child classes.

Quote
I have an example. Suppose you have an app with many editor windows open, each of which can have a different highlighter. If you have to configure the attributes of one editor, it is easy to use a single "configuration dialog" that accesses the "Highlighter" property of the editor and uses these general properties (the unused ones must return NIL) instead of asking which highlighter this particular editor is using.

The example is also valid for one editor with many possible highlighters.

It's not exactly iterating over highlighters. It is more like generic access to the highlighter attributes of any editor.

An editor such as the IDE itself.

Highlighters already have a list of all attributes. The IDE uses that.


Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 22, 2013, 05:37:47 pm
@Martin,

After some analysis, I still think it's necessary to add at least:

SYN_ATTR_NUMBER

to "TSynCustomHighlighter".

It could be useful to include SYN_ATTR_DIRECTIVE, SYN_ATTR_FUNCTION and SYN_ATTR_VARIABLE.

Making "GetDefaultAttribute" public could help.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 22, 2013, 05:46:59 pm
I already wrote

Quote
SYN_ATTR_NUMBER   I will add, IF you supply a patch.

The patch should include adding SYN_ATTR_NUMBER  to the implementation of GetDefaultAttribute of the majority of those highlighters that can return it.


Adding SYN_ATTR_NUMBER without extending GetDefaultAttribute will mean that someone will probably report it as a bug. And then I am the one who has the work to do.....

Note, that I will only add it, if the patch also fixes the majority of HL to return it.

And it is only for GetDefaultAttr. No new global property
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 22, 2013, 06:23:23 pm
Ah OK, so the proposal is still open. I just need to make a patch.

But I don't understand why I need to modify other HLs.

As far as I can see, we just need to add these lines to the unit SynEditHighlighter:

Code: [Select]
  SYN_ATTR_NUMBER            =   6;

...

  TSynCustomHighlighter = class(TComponent)
  ...
    property NumberAttribute: TSynHighlighterAttributes
      index SYN_ATTR_NUMBER read GetDefaultAttribute;
...

This won't affect other HLs. All of those that use a "NumberAttribute" property have their own definition of it.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 22, 2013, 06:36:40 pm
Again, I will NOT add a new property called NumberAttribute.

I will add the constant, IF:
... as already stated.

Please look at GetDefaultAttribute for each highlighter.

Each highlighter knows the constants. If I add a constant, then each HL that has a number attribute must return it for that constant....

-----------------------------------
The other way is:

iterate over all attributes, using the existing
    property AttrCount: integer read GetAttribCount;
    property Attribute[idx: integer]: TSynHighlighterAttributes
      read GetAttribute;

and add
property AttributeUsageId read ;
to
TSynHighlighterAttributes


Then you can find them by iterating. But all HLs must create the attribute with the ID.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 22, 2013, 07:22:16 pm
Quote
Again, I will NOT add a new property called NumberAttribute.

I don't fully understand, but I suppose you have good reasons for that.

So you agree to add the constant and make GetDefaultAttribute public. Is that OK?

Quote
Each highlighter knows the constants. If I add a constant, then each HL that has a number attribute must return it for that constant....

OK, I get your point. Once we add a new constant, you want all the HLs to return a correct attribute value when GetDefaultAttribute() is called with it.

Well, I don't see this as strictly necessary, given that users currently don't use GetDefaultAttribute() to access NumberAttribute. But it could be included in the patch.

The other way to access attributes (AttrCount, Attribute[]) is not as secure, as far as I can see, for identifying a particular attribute.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 22, 2013, 07:34:53 pm
Quote
So you agree to add the constant and make GetDefaultAttribute public. Is that OK?

Yes.
And we can also add a constant for CompilerDirectives in that case.

We should introduce a new (better named) public method that calls GetDefaultAttribute.
Maybe
  function AttributeForTokenType

It does NOT need to be virtual, and can just call GetDefaultAttribute.
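
A minimal sketch of that wrapper idea, written as a standalone Free Pascal program so it compiles without SynEdit. TAttr, TBaseHL and TDemoHL are hypothetical stand-ins for TSynHighlighterAttributes, TSynCustomHighlighter and a concrete highlighter, and the value 6 for SYN_ATTR_NUMBER is only the value proposed in this thread.

```pascal
program TokenTypeSketch;
{$mode objfpc}{$H+}

const
  SYN_ATTR_COMMENT = 0;
  SYN_ATTR_NUMBER  = 6;  // value proposed in this thread, an assumption

type
  // Hypothetical stand-in for TSynHighlighterAttributes; only a name is modelled.
  TAttr = class
    Name: string;
    constructor Create(const AName: string);
  end;

  // Hypothetical stand-in for TSynCustomHighlighter.
  TBaseHL = class
  protected
    function GetDefaultAttribute(Index: Integer): TAttr; virtual; abstract;
  public
    // Public and non-virtual: it only forwards to the protected method.
    function AttributeForTokenType(Index: Integer): TAttr;
  end;

  // Hypothetical concrete highlighter that knows comments and numbers.
  TDemoHL = class(TBaseHL)
  private
    FComment, FNumber: TAttr;
  protected
    function GetDefaultAttribute(Index: Integer): TAttr; override;
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TAttr.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
end;

function TBaseHL.AttributeForTokenType(Index: Integer): TAttr;
begin
  Result := GetDefaultAttribute(Index);
end;

constructor TDemoHL.Create;
begin
  inherited Create;
  FComment := TAttr.Create('Comment');
  FNumber := TAttr.Create('Number');
end;

destructor TDemoHL.Destroy;
begin
  FComment.Free;
  FNumber.Free;
  inherited Destroy;
end;

function TDemoHL.GetDefaultAttribute(Index: Integer): TAttr;
begin
  case Index of
    SYN_ATTR_COMMENT: Result := FComment;
    SYN_ATTR_NUMBER:  Result := FNumber;
  else
    Result := nil;  // unknown IDs yield nil, so callers must check
  end;
end;

var
  HL: TBaseHL;
  A: TAttr;
begin
  HL := TDemoHL.Create;
  try
    A := HL.AttributeForTokenType(SYN_ATTR_NUMBER);
    if A <> nil then
      WriteLn('number attribute: ', A.Name)
    else
      WriteLn('no number attribute');
  finally
    HL.Free;
  end;
end.
```

The caller only sees the base class, which is the point of the wrapper: the per-highlighter knowledge stays inside each GetDefaultAttribute override.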

Quote
OK, I get your point. Once we add a new constant, you want all the HLs to return a correct attribute value when GetDefaultAttribute() is called with it.

Well, I don't see this as strictly necessary, given that users currently don't use GetDefaultAttribute() to access NumberAttribute. But it could be included in the patch.

The problem is not what it would break now, but:
once the method exists and the constant exists, users will try to use it.

And then there will be a bug report: the function returns an attribute for comments but not for numbers; that is not consistent.
So I try to avoid that.


Quote
The other way to access attributes (AttrCount, Attribute[]) is not as secure, as far as I can see, for identifying a particular attribute.

How is it any less secure? In both cases you may end up with nil or an attribute.

But never mind, both are ok.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 22, 2013, 10:09:03 pm
'AttributeForTokenType' is a good name for me.

If we are going to modify many HLs, maybe we should consider adding other constants.

Quote
And we can also add a constant for CompilerDirectives in that case.

What would it be for?


Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 22, 2013, 10:21:00 pm
Quote
And we can also add a constant for CompilerDirectives in that case.

What would it be for?

That was what you originally wanted (only you named it macro)
pascal:
{$IFdef}

c (actually preprocessor...)
#include
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 24, 2013, 07:05:07 am
Quote
That was what you originally wanted (only you named it macro)

Ok. I didn't realize.

I checked the range of "TtkTokenKind" in the 20 highlighters in Lazarus 1.0.12, and it's clear that we need to include:
* SYN_ATTR_NUMBER

These others are common too:
* SYN_ATTR_TEXT
* SYN_ATTR_VARIABLE
* SYN_ATTR_DIRECTIVE (used by Pascal and C++)
* SYN_ATTR_ASM  (used by Pascal and C++)

There is a "tkPreprocessor" type, in SynHighlighterAny, that I suppose is some kind of "tkDirective".

Other values could be:
* SYN_ATTR_CONSTANT  (used by SynHighlighterAny)
* SYN_ATTR_FUNCTION  (used by SynHighlighterSQL)
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 24, 2013, 01:45:59 pm
Quote
These others are common too:
* SYN_ATTR_TEXT
* SYN_ATTR_VARIABLE
* SYN_ATTR_DIRECTIVE (used by Pascal and C++)
* SYN_ATTR_ASM  (used by Pascal and C++)
* SYN_ATTR_CONSTANT  (used by SynHighlighterAny)
* SYN_ATTR_FUNCTION  (used by SynHighlighterSQL)
Before I comment, I need to know:

What would be examples for TKText? Does it differ from string?

Which highlighters do variables at the moment?

---

If constant is only used by SynAny, then I don't see it as that important. And any number and string are a constant. So what to return as default for constant, if there are more than one?

ASM would be ok. But some day it will break. The idea (though there is no plan yet for when) is to highlight asm with an embedded HL, so asm will have many attributes. Then which one to return as default?

Also, with asm highlighted like that, there may be number (non-asm) and number (asm), which can be different. Which one to return as number?

---
Even today the concept is flawed.

String/char in Pascal has a default attribute (default: blue). But the Pascal highlighter can mix attributes. If the char is a case label
Code: [Select]
case foo of
  'A': ;
  'B': ;
end;
then (if configured) it will have a different attribute (result of mixing or replacing).

And not only that, but the mixed attribute (the same object, with either the same or different values) is also returned for numbers and identifiers that are case labels.

So you cannot use the returned attribute to find all numbers, because some numbers are not using the default.

----

I will still add the extra, since this concept is already there. I will place a comment there with warnings, that its usage may become less and less reliable....



In the end, an extension of the Attribute[n] / AttrCount idea might be able to cope with this better. Each attribute can belong to more than one class. The attribute can then be asked
Attrib[n].IsInTokenClass(SYN_ATTR_NUMBER)
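
A small sketch of how such a membership test could work, assuming each attribute simply stores a set of token classes. TTokenClass, TAttr and the class names are hypothetical stand-ins for illustration; nothing here exists in SynEdit.

```pascal
program TokenClassSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical token classes; they stand in for SYN_ATTR_* style IDs.
  TTokenClass = (tcComment, tcString, tcNumber, tcCaseLabel);
  TTokenClasses = set of TTokenClass;

  // Hypothetical stand-in for TSynHighlighterAttributes.
  TAttr = class
  private
    FName: string;
    FClasses: TTokenClasses;
  public
    constructor Create(const AName: string; AClasses: TTokenClasses);
    function IsInTokenClass(AClass: TTokenClass): Boolean;
    property Name: string read FName;
  end;

constructor TAttr.Create(const AName: string; AClasses: TTokenClasses);
begin
  inherited Create;
  FName := AName;
  FClasses := AClasses;
end;

function TAttr.IsInTokenClass(AClass: TTokenClass): Boolean;
begin
  Result := AClass in FClasses;
end;

var
  Attrs: array[0..2] of TAttr;
  i: Integer;
begin
  Attrs[0] := TAttr.Create('Number', [tcNumber]);
  Attrs[1] := TAttr.Create('String', [tcString]);
  // A case-label attribute may be applied to numbers and strings alike.
  Attrs[2] := TAttr.Create('Case label', [tcCaseLabel, tcNumber, tcString]);

  // List every attribute that can apply to numbers, not just one default.
  for i := Low(Attrs) to High(Attrs) do
    if Attrs[i].IsInTokenClass(tcNumber) then
      WriteLn(Attrs[i].Name);

  for i := Low(Attrs) to High(Attrs) do
    Attrs[i].Free;
end.
```

Because the query returns every matching attribute, a mixed attribute such as the case-label one is found together with the plain number attribute, which a single default per ID cannot express.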


But if you do the patch for GetDefaultAttribute, then I accept that. It is only about which classes/types to add.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 24, 2013, 06:14:48 pm
Quote
What would be examples for TKText? Does it differ from string?
In SynHighlighterIni, it refers to the string value (the text that follows the "=" after the key and is not a number). In INI files, it usually doesn't have quotes.

In SynHighlighterXML it refers to the content between two tags. It is considered a string even though it has no quotes.

Basically, the difference is that tkText are strings without quotes. But I would consider this a somewhat syntactic approach.

I have not decided yet about the scope of the highlighters. I think the HL should be just a lexer, but in that case it wouldn't have to analyze KEYWORDS, which IMHO are syntactic elements. So in some cases we need to extend the scope of the HL.

In my highlighter's documentation, I have, more or less, this definition:

Constants, Variables, Directives and Functions could be considered sub-categories of Identifiers in most cases, but in some syntaxes they could have special lexical definitions, like the variables in PHP.

For me, tkText is a similar case. It could be considered a sub-category of Identifier, or it could be a special token (lexically different from identifiers).

So tkText could be at the same level as Constants, Variables, Directives and Functions too.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 24, 2013, 06:23:24 pm
Quote
Which highlighters do variables at the moment?
Following what I pointed out: the languages that can easily identify variables as special tokens (at the lexical level).

They are:

SynHighlighterBat
SynHighlighterSQL
SynHighlighterPHP
synhighlighterunixshellscript
SynHighlighterPerl
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 24, 2013, 06:46:05 pm
Quote
If constant is only used by SynAny, then I don't see it as that important. And any number and string are a constant. So what to return as default for constant, if there are more than one?
Constants are only used by SynAny. In this case, they are considered a subcategory of identifiers (syntactic level), so they are at the same level as keywords.

Numbers and strings are constants too, but at the syntactic or semantic level. At the lexical level they are different token categories. It depends on what level we want to take the HL to.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 24, 2013, 07:24:17 pm
Quote
What would be examples for TKText? Does it differ from string?
I have not decided yet about the scope of the highlighters. I think the HL should be just a lexer, but in that case it wouldn't have to analyze KEYWORDS, which IMHO are syntactic elements. So in some cases we need to extend the scope of the HL.

In my highlighter's documentation, I have, more or less, this definition:

Constants, Variables, Directives and Functions could be considered sub-categories of Identifiers in most cases, but in some syntaxes they could have special lexical definitions, like the variables in PHP.

This attempt to structure tokens, on a  global (across all Highlighter) level, is adding to my dislike of the initial idea.

You can define that for a selected set of HLs. But any HL is free to break these rules.

By definition, an HL is not even restricted to being a lexer. It could be anything. Of course a higher level of complexity would bring new problems (speed). But there may be a use case where this does not matter (e.g. it is known that the data will always be small).

The point is that those IDs should not limit the future of HLs. This is why I already pointed out that a warning will go along with them, that the concept of such IDs may be broken in the future.
As in the example where an HL dynamically generates new attributes (SynPasSyn already does). Such dynamically generated attributes are only known while parsing the tokens. They cannot be returned by GetDefaultAttribute.


HLs may also be nested (asm HL as part of the pas HL). This will lead to more than one default for the same token kind.
With asm in pas, you may define the default as taken from pas.

But with SynMulti, there is an outer HL (that may have almost no attributes of its own), and then there are many HLs all nested at the same level. What then?


---
So in conclusion: those IDs will be an extension that may be used successfully in a limited number of cases, but with no guarantee of support in future versions (other than in HLs written by the person themselves).

For that reason, I want to keep any future work resulting out of this at a minimum.


So if it must be

* SYN_ATTR_TEXT    (not my favourite)
* SYN_ATTR_VARIABLE
* SYN_ATTR_DIRECTIVE (used by Pascal and C++)
* SYN_ATTR_ASM  (used by Pascal and C++)

can be added.

SYN_ATTR_FUNCTION would be a bad name. They are like a 2nd group of keywords, but next there is an HL with 3 groups of keywords. Or 4 groups of numbers.
Pascal has 3 sorts of comments (they share one attribute until now, but what if pasdoc gets recognized?)

Pascal has 2 directives (FPC {$...} and IDE {%...});
the 2nd has a dynamically created attribute.

---
Absolutely no promise of maintenance.


Still not sure how this will be of use....
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 24, 2013, 07:41:24 pm
There is also the option of a class helper that searches through
Attribute[n] / AttrCount.

Using the StoredName

for i := 0 to hl.AttrCount - 1 do
  if hl.Attribute[i].StoredName = SYNS_XML_AttrString then ...

SYNS_XML_AttrString is a constant in SynEditStrConst.

That way all existing attributes can already be found.

Of course, it does not solve dynamically created attributes.
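
A rough sketch of such a class helper, assuming the SynEdit package is on the unit path and an FPC version that supports class helpers. AttrCount, Attribute[] and StoredName are the members mentioned in this thread; the unit name, the helper name and FindAttrByStoredName are hypothetical.

```pascal
unit SynHLAttrLookup;  // hypothetical unit name
{$mode objfpc}{$H+}

interface

uses
  SynEditHighlighter;

type
  // Hypothetical helper; it adds a lookup without touching the base class.
  TSynHLLookupHelper = class helper for TSynCustomHighlighter
  public
    // Returns the first attribute whose StoredName matches, or nil.
    function FindAttrByStoredName(const AName: string): TSynHighlighterAttributes;
  end;

implementation

function TSynHLLookupHelper.FindAttrByStoredName(
  const AName: string): TSynHighlighterAttributes;
var
  i: Integer;
begin
  Result := nil;
  // AttrCount is a count, so the last valid index is AttrCount - 1.
  for i := 0 to AttrCount - 1 do
    if Attribute[i].StoredName = AName then
      Exit(Attribute[i]);
end;

end.
```

With this unit in the uses clause, a call such as HL.FindAttrByStoredName(SYNS_XML_AttrString) would return nil for highlighters that have no such attribute, so the result has to be checked before use.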
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 26, 2013, 06:22:41 pm
Quote
There is also the option of a class helper that searches through
Attribute[n] / AttrCount.

Using the StoredName


It could be an option, but it depends on the attributes of each HL having been defined using the correct constant. And there are a lot of them (I have counted more than 100, without considering XML), so it is easy to get confused.

In fact, "SynHighlighterAny" has some attributes defined with different strings.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 26, 2013, 06:38:33 pm
Quote
This attempt to structure tokens, on a global (across all Highlighter) level, is adding to my dislike of the initial idea.
It's necessary to have some kind of generalization for working with Scriptable and Multi-Syntax HLs.

Of course, not all syntaxes can be included, but those can be managed with a special HL. As far as I can see, most of the current HLs of Lazarus can be replaced with a Scriptable HL.

Quote
HLs may also be nested (asm HL as part of the pas HL). This will lead to more than one default for the same token kind.
With asm in pas, you may define the default as taken from pas.

I agree. HLs should have the ability to be nested. I would guess that the current DefaultAttribute depends on which HL is current. But it's a problem that I haven't faced seriously so far.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 26, 2013, 06:45:56 pm
Well, but if you want to access *all* attributes by an ID (which is not an option, because I will not enter the maintenance hell of 100 IDs), then in theory you would need the same number of IDs.

The different ones in SynAnySyn are a bit of an issue.
1) They should really be updated.
2) That breaks compatibility if the StoredName is used as an ID to save the attribute to a file.

SynAnySyn is really not maintained. It just happens to be there. Hence I never noticed the problem.

--------------------
Quote
but it depends on the attributes of each HL having been defined using the correct constant

So does the ID: it must be added and maintained on each HL. It just adds to the amount of things needing maintenance, and with that to the amount of things that can break.

---------------------
I said I will add a certain subset if a patch is supplied. So I will stand by that word of mine.
But I will keep telling you why I think they are a problem.

The only case where something like them is needed would be if you want to be able to offer a global config for values that are common across more than one highlighter (e.g. if the IDE allowed you to configure the color for comments and apply it to all supported languages).

But since there could be multiple comment attributes in some languages, a way is needed to retrieve *all* comment attributes from a highlighter, not just one.

In so far such an ID may be usable. But the current way is wrong.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 26, 2013, 06:49:35 pm
Quote
This attempt to structure tokens, on a global (across all Highlighter) level, is adding to my dislike of the initial idea.
It's necessary to have some kind of generalization for working with Scriptable and Multi-Syntax HLs.

Why, example?

Quote
Of course, not all syntaxes can be included, but those can be managed with a special HL. As far as I can see, most of the current HLs of Lazarus can be replaced with a Scriptable HL.

Different topic. And that is mainly because most of them are kept very basic.

The only non-basic one is Pascal, and that (while maybe possible) will be hard to make scriptable.

Quote
HLs may also be nested (asm HL as part of the pas HL). This will lead to more than one default for the same token kind.
With asm in pas, you may define the default as taken from pas.

I agree. HLs should have the ability to be nested. I would guess that the current DefaultAttribute depends on which HL is current. But it's a problem that I haven't faced seriously so far.

But adding something new should consider such possibilities, even if not yet faced. Otherwise we end up with code that we may regret later.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 26, 2013, 07:40:48 pm
Quote
Well, but if you want to access *all* attributes by an ID (which is not an option, because I will not enter the maintenance hell of 100 IDs), then in theory you would need the same number of IDs.

The idea is not to identify all the possible attributes of an HL.

Quote
The only case where something like them is needed would be if you want to be able to offer a global config for values that are common across more than one highlighter (e.g. if the IDE allowed you to configure the color for comments and apply it to all supported languages).
That's the idea.


Quote
It's necessary to have some kind of generalization for working with Scriptable and Multi-Syntax HLs.

Why, example?


Are you asking why it's necessary to generalize?

I think it's obvious that a Scriptable Highlighter must have a finite quantity of elements and rules in order to cover many syntaxes. But trying to cover all existing syntaxes would be impossible, and it would require a lot of code and processing.

Sorry, I probably don't understand the question.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 26, 2013, 08:16:18 pm
Quote
Quote
The only case where something like them is needed would be if you want to be able to offer a global config for values that are common across more than one highlighter (e.g. if the IDE allowed you to configure the color for comments and apply it to all supported languages).
That's the idea.

Ok, but then we only need IDs that occur in more than one highlighter.

Here is another thing that may need to be handled.

Example Perl
   $var1 = "text$var2"

$var2 is both variable and string.

If I ever have too much time and work on the Perl highlighter, then either the string or the variable attribute will be made a modifier, so that the final result can be a mix.

A modifier attribute has things like alpha (how to blend with the other attribute).

Now some HLs will have string as a basic attribute, some as a modifier attribute. How do you copy values?

This problem exists, of course, no matter how the groups are done. So it is NOT an argument against any solution.

---
Similar:

Pascal already has modifiers:
- there is a normal attribute for directives (FPC style: {$Ifdef})
- a modifier for directives (IDE style: {%region} {%w-})
   the 2nd is based on the 1st

Now the default attribute for directives would obviously be the FPC-style one. But if you copy some value to it, you should know there is a dependent attribute, so you can have an option to reset it.

Otherwise, if you have: IDE directive = red background + foreground copied from FPC directive,
and you change the FPC directive to a red foreground, then the IDE directive is hard to read....

---
As for modifiers, they may also modify *unrelated* token types.
The modifier for case labels can modify string, number or identifier:

case foo of
  CONST_LOW, 5: ;
end

CONST_LOW, 5 will be modified.

--------------------

Which actually means there is no safe way. ...

Anyway, I indicated which attributes are acceptable. So you can write a patch for them if you want (I will still keep telling you why I do not like it, but I made the offer, and it stands).
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 26, 2013, 08:27:10 pm
Quote
Are you asking why it's necessary to generalize?

I think it's obvious that a Scriptable Highlighter must have a finite quantity of elements and rules in order to cover many syntaxes. But trying to cover all existing syntaxes would be impossible, and it would require a lot of code and processing.

All rules are user-supplied, and that makes them finite, since a user's lifetime is finite and acts as a limitation (as does available hard disk space).

Otherwise there is no limit.

The HL is a state engine.

Each state is defined by either:
- one value out of a user-defined set of values
- a combination of values, each value out of a user-defined set of values

Each state has rules that lead to the next state.

The rule may, for example, be a keyword. In that case a keyword may only act in some states.

The same word may be 2 different keywords, depending on the state.

---

Comments are no exception.

(* is the keyword that enters the state. And for that state only *) is defined as a rule.

----
In order to handle nested states a stack is needed.
So that (* does not simply replace the state, but puts the previous state on the stack.
*) then pops the previous state off the stack.

You also need that for code blocks (in Perl or C: {} / Pascal: begin end)

---
Some keywords only exist in other blocks. So you do need to keep track of all that. Then you may do it in a generic way.

A scriptable highlighter has no idea about strings, comments, ... They are all just states.
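
The stack idea can be sketched in a few lines of Free Pascal. This is only a minimal illustration under assumptions, not SynEdit code: the `TState` values and the `ScanStates` function are invented for this example, and nested `(* *)` comments are used purely to show why the previous state has to be pushed rather than replaced.

```pascal
program StateStackSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical states; a scriptable HL would read these from its config.
  TState = (stCode, stComment);

const
  StateNames: array[TState] of string = ('code', 'comment');

// Walk over one line, print the final state and return the stack depth.
// '(*' pushes the current state and enters stComment,
// '*)' pops back to whatever state was active before.
function ScanStates(const Line: string): Integer;
var
  Stack: array of TState;
  Cur: TState;
  i: Integer;
begin
  Cur := stCode;
  SetLength(Stack, 0);
  i := 1;
  while i < Length(Line) do
  begin
    if (Line[i] = '(') and (Line[i + 1] = '*') then
    begin
      SetLength(Stack, Length(Stack) + 1);
      Stack[High(Stack)] := Cur;          // remember where we came from
      Cur := stComment;
      Inc(i, 2);
    end
    else if (Cur = stComment) and (Line[i] = '*') and (Line[i + 1] = ')') then
    begin
      Cur := Stack[High(Stack)];          // return to the previous state
      SetLength(Stack, Length(Stack) - 1);
      Inc(i, 2);
    end
    else
      Inc(i);
  end;
  WriteLn(Line, ' -> state "', StateNames[Cur], '", depth ', Length(Stack));
  Result := Length(Stack);
end;

begin
  // Both comments are closed: ends in state "code" with depth 0.
  ScanStates('a := 1; (* outer (* inner *) still outer *) b := 2;');
  // The outer comment stays open: ends in state "comment" with depth 1.
  ScanStates('a := 1; (* outer (* inner *) not closed');
end.
```

A real highlighter would store the stack (or a reference to it) as the range of the line, so the next line can continue from it; that part is left out here.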


Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 27, 2013, 01:10:37 am
 %)
This is material for a new topic. Something like "Reflections about the Scope of the New Highlighters".

It's a very interesting topic. I have faced some of these issues when designing a highlighter, and I can sum it up by repeating what I pointed out before:

Code: [Select]
Numbers and strings are constant too, but at the Syntactic or Semantic Level. At the lexical level they are differents tokens category. It depends on what level we want to carry the HL.
As I see it, most of the difficulties you find in the definition of a HL arise because they belong to the syntactic level.

At the lexical level, tokens are tokens and they don't depend on blocks, ranges or context. At the next level, there are other complications. That's why I prefer to keep the HL as a lexer with just a few syntactic features.

Returning to the topic, we agree on including:

* SYN_ATTR_NUMBER
* SYN_ATTR_VARIABLE
* SYN_ATTR_DIRECTIVE
* SYN_ATTR_ASM 
* SYN_ATTR_TEXT   

Now, I just need to know how to make a patch.  :-\
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 01:53:57 am
Quote
Now, I just need to know how to make a patch.  :-\

And how to use [ quote ] instead of [ code ]....

WinMerge can generate patches. But you need the original, and the modified.

Since you should base a patch on the latest SVN, I recommend to use TortoiseSVN, which also has the option to create a patch.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 02:14:30 am
About lexical and syntactical.

I am not sure what the point is (other than complexity and speed...)

But a generic HL does not need to handle any token specially.

It may have a hardcoded implementation for strings, but a string token is still nothing special. No different from a keyword, or anything else.

A pascal string (simplified) could be defined as:
'  go to state "string"

In state "String" the following tokens are recognized (and end the state)
' #10 #13 #0

And that's it.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 27, 2013, 04:17:56 am
Quote
I am not sure what the point is (other than complexity and speed...)

It's order. It is dividing the complexity in layers for better analysis and implementation.

If we define the HL to work only by identifying tokens (lexical), we simplify the process. We just need to define simple rules, with no worry about context, ranges or blocks.

Strings are tokens that we can define in different ways. Using the syntax of this highlighter http://forum.lazarus.freepascal.org/index.php/topic,22148.0.html
we can define Pascal-style strings:

  <Token Start="'" End="'" Attribute='STRING'></Token>
  <Token Start="#" Content = '0..9' Attribute='STRING'> </Token>

The first sentence is what I call a "Delimited Token". The second is a "Token by content". With these two ways of defining tokens, we can cover most token categories. Numbers are usually "Token by content"; comments are usually "Delimited Token".
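
As a rough sketch of how the two kinds of definition could be matched, here is a small Free Pascal program. The function names `MatchDelimited` and `MatchByContent` are hypothetical and are not taken from the highlighter linked above; treating an empty end delimiter as "run to the end of the line" is also an assumption, made so that a one-line comment fits the same function.

```pascal
program TokenKindsSketch;
{$mode objfpc}{$H+}

type
  TCharSet = set of Char;

// "Delimited token": starts at StartIdx with StartDelim and runs up to and
// including the next EndDelim. An empty EndDelim, or a missing closing
// delimiter, takes the rest of the line.
// Returns the token length, or 0 if the token does not start here.
function MatchDelimited(const Line: string; StartIdx: Integer;
  const StartDelim, EndDelim: string): Integer;
var
  p: Integer;
begin
  Result := 0;
  if Copy(Line, StartIdx, Length(StartDelim)) <> StartDelim then
    Exit;
  if EndDelim = '' then
    Exit(Length(Line) - StartIdx + 1);
  p := StartIdx + Length(StartDelim);
  while (p <= Length(Line)) and (Copy(Line, p, Length(EndDelim)) <> EndDelim) do
    Inc(p);
  if p <= Length(Line) then
    Result := p + Length(EndDelim) - StartIdx   // closing delimiter found
  else
    Result := Length(Line) - StartIdx + 1;      // unterminated token
end;

// "Token by content": the first char must be in StartChars, the token then
// extends while the following chars are in ContentChars.
function MatchByContent(const Line: string; StartIdx: Integer;
  const StartChars, ContentChars: TCharSet): Integer;
var
  p: Integer;
begin
  Result := 0;
  if (StartIdx > Length(Line)) or not (Line[StartIdx] in StartChars) then
    Exit;
  p := StartIdx + 1;
  while (p <= Length(Line)) and (Line[p] in ContentChars) do
    Inc(p);
  Result := p - StartIdx;
end;

const
  Sample = 's := ''abc'' + #13;';
begin
  // Column 6 is the opening quote, column 14 is the '#'.
  WriteLn(Copy(Sample, 6, MatchDelimited(Sample, 6, '''', '''')));
  WriteLn(Copy(Sample, 14, MatchByContent(Sample, 14, ['#'], ['0'..'9'])));
end.
```

The first call prints the quoted string including both quotes, the second prints `#13`.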

One line Comment:
  <Token  Start="//" Attribute='COMMENT'> </Token>

Directives:
  <Token Start="{$" End="}" Attribute='DIRECTIVE'></Token>
  <Token Start="{%" End="}" Attribute='DIRECTIVE'></Token>
 
Strings style Python:
  <String Start="&quot;&quot;&quot;" End="&quot;&quot;&quot;" multiline=true></String>


But they are just tokens, wherever they appear. Giving the HL the ability to change the category of a token is not part of the lexical level.

We can approach the problem of blocks the same way we want to treat the ASM blocks. Each block or range (begin ... end, repeat ... until) should be processed strictly by a nested HL. Inside a block there are only tokens. Managing blocks is part of the syntactic work.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 01:53:57 pm
Quote
It's order. It is dividing the complexity in layers for better analysis and implementation.

ok, so basically a way to deal with complexity, and many resulting properties.

You are right, about strings, if you match tokens with regular expressions (or similar). But then they are still nothing special.

Of course if you want to keep it simple, and avoid any kind of state or context, then it may be hard to deal with fpc nested comments. You do need to maintain the nestlevel.

But back to the point, which IIRC was
Quote from: Edson
Constants, Variables, Directives, Functions could be consider like a sub-category of Identifiers, in most of the cases, but in some syntaxs they could have some lexical special definitions, like the variables in PHP.
Quote from: Edson
Quote from: martin
This attempt to structure tokens, on a  global (across all Highlighter) level, is adding to my dislike of the initial idea.
It's necessary to have some kind of generalization for working with Scriptable and Multi-Sintax HL.

Of course, not all the Syntax can be included, but they can be managed with a special HL.
I do not see at all, why making one token "subclass" to another is needed. You may do so for your own HL (though I do not even see, why that would be needed [1]), but in generic (cross HL) there seems no point in it.

The closest to that is the color config of IDE directives in pascal, which inherits the colors of fpc directives. But the tokens are still independent.

[1] Maybe if you want to offer dependent configs? But then there would (or could) be new base-classes of attributes. E.g. pascal could have 3 directive attributes instead of 2: directive-base, D-fpc, D-ide. The base would never be used alone. But with that you get back to the question, which one is the default?

Quote from: Edson
I think it's obvious that a Scriptable Highlighter must have a finite quantity of elements and rules in order to cover many syntax. But trying to cover all the existent syntax would be impossible, and it will require a lot of code and process.

finite, yes, but due to the limitations of any PC, and human beings.

But hard-coded maximum (other than high(Integer)) ? No.

A user can define as many attributes as he wants, so long as they can be identified by the matching engine.
- Environment var %foo
- global var ::Foo
- local var $foo
- object var :$foo
....
There is no limit. Neither should there be.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 27, 2013, 06:48:30 pm
A string is just a simple token, because usually it's a delimited token using the same single character as delimiter. But that's not necessarily true.

  <Token Start="#" Content = '0..9' Attribute='STRING'> </Token>

This is an unusual definition of a string token used in Pascal.

Quote
I do not see at all, why making one token "subclass" to another is needed. You may do so for your own HL (though I do not even see, why that would be needed [1]), but in generic (cross HL) there seems no point in it.

Keywords are the classical example. They don't have a special lexical definition. They have the same definition as identifiers:

  <Token Start= "A..Za..z_" Content = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>

So they are not another category of token (of course we can create a definition for every keyword and make them common tokens, but that's not practical).

Since keywords are a subset of identifiers, we can consider them a sub-category of identifiers.

This is a fast way to identify a keyword. First we find an identifier and then we check whether it is a keyword. (We can see this in most HL implementations.)
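
The two-step lookup could look roughly like the following Free Pascal sketch. The keyword list, `ScanIdentifier` and `AttributeFor` are made up for illustration, and the case-insensitive comparison is an assumption that fits Pascal but not every language.

```pascal
program KeywordSubsetSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Hypothetical keyword list; a scriptable HL would load it from the syntax file.
  Keywords: array[0..3] of string = ('begin', 'end', 'case', 'try');

// Step 1: scan an identifier starting at StartIdx (letters, digits,
// underscore; must not start with a digit). Returns '' if there is none.
function ScanIdentifier(const Line: string; StartIdx: Integer): string;
var
  p: Integer;
begin
  Result := '';
  if (StartIdx > Length(Line)) or
     not (Line[StartIdx] in ['A'..'Z', 'a'..'z', '_']) then
    Exit;
  p := StartIdx + 1;
  while (p <= Length(Line)) and
        (Line[p] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
    Inc(p);
  Result := Copy(Line, StartIdx, p - StartIdx);
end;

// Step 2: the identifier is only re-labelled if it is in the keyword list.
function AttributeFor(const Ident: string): string;
var
  i: Integer;
begin
  Result := 'IDENTIFIER';
  for i := Low(Keywords) to High(Keywords) do
    if SameText(Ident, Keywords[i]) then
      Exit('KEYWORD');
end;

var
  Ident: string;
begin
  Ident := ScanIdentifier('Begin x := 1', 1);
  WriteLn(Ident, ' -> ', AttributeFor(Ident));   // Begin -> KEYWORD
  Ident := ScanIdentifier('beginner := 1', 1);
  WriteLn(Ident, ' -> ', AttributeFor(Ident));   // beginner -> IDENTIFIER
end.
```

A linear search is used only to keep the sketch short; a hash or sorted list would be the obvious choice for long keyword lists.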

We can do the same with constants, variables, and functions, if all of them have the same token definition. (In Pascal they do.)

But some languages can have a different token structure for a variable (like PHP), constant, function or even a keyword. In these cases we have to create a special definition for these kinds of token:

  <Token Start= "A..Za..z_" Content = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
  <Token Start= "@" Content = "A..Za..z0..9_" Attribute='KEYWORD'> </Token>

Here, we cannot say that a KEYWORD is a subcategory of an IDENTIFIER.

Another example of a subcategory, not so easy to see, could be the "Operator". I can define an OPERATOR as a sub-category of a SYMBOL (I haven't implemented it yet). Again, we do this for speed and to avoid writing a token definition for each operator.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 07:29:22 pm
Strings exist in many more forms, such as containing escaped quotes, or special escapes allowing them to go past the end of the line to the next line.
In perl there is q//, q{}, q!!, ... and many more.

And that is just by looking at a few given languages. Highlighters must be able to deal with any definition of a language.

And if we deal with yet unknown languages, then why should keywords be subset of identifiers?
A language could define, that keywords must be all upper, while identifiers must be all lower case. Then they are entirely different groups.

Reading on, you pointed that out yourself.

So then again, why applying those "scope" rules, to a scriptable HL, when the power of a scriptable HL should be, that it can deal with any language?

As for operator: "and" is also an operator. It may be both in some languages, a keyword and an operator. Then it would be in both groups (according to your rules): identifier and symbol. But that makes no sense.

--------------------
If you want to do such grouping in your own HL, no one stops you.

But limiting all highlighters by adding a concept of rules that cannot be fulfilled by all highlighters is imho not a good idea.


--
One more, you assume that identifiers are always separated by non-letters.

There is a programming language called "whitespace". Now if this exists, it is possible to also create one that has no spaces at all, but where there are only a-z and nothing else. All tokens, keywords, operators, identifiers are part of a single very long word.
One way would be camelcase: they all start with an uppercase letter, and consume the following lower-case letters.

Another would be that they start with a letter that indicates the length (A=1 ... Z=26). Then parsing can only start at the very beginning:
Code: [Select]
AXBXYCFOOAX
AX BXY CFOO AX
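
To make the length-prefix idea concrete, here is a minimal Free Pascal sketch. It assumes, as the example suggests, that the prefix letter counts the letters that follow it (A=1, so "AX" is a complete token); the function name is invented for illustration and the input is assumed to be well-formed upper-case text.

```pascal
program LengthPrefixSketch;
{$mode objfpc}{$H+}

// Split a text in which every token starts with a letter giving the number
// of letters that follow it (A=1 ... Z=26). Tokens are returned separated
// by single spaces.
function SplitLengthPrefixed(const Src: string): string;
var
  i, Len: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(Src) do
  begin
    Len := Ord(Src[i]) - Ord('A') + 1;   // payload length after the prefix
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + Copy(Src, i, Len + 1);
    Inc(i, Len + 1);
  end;
end;

begin
  WriteLn(SplitLengthPrefixed('AXBXYCFOOAX'));   // AX BXY CFOO AX
end.
```

Starting the scan anywhere but at position 1 would read a payload letter as a length, which is exactly why such a language cannot be tokenized from the middle.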

Again what then is a subset of what? All tokens are made of a-z, and all are at the same level. Are operators a subset of identifiers, or identifiers a subset of operators?
There may not even be identifiers. "Whitespace" and "brainfuck" do not have identifiers...
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 27, 2013, 08:08:30 pm
Quote
why should keywords be subset of identifiers?

No. That's not what I have expressed:

A keyword/variable/constant/function DOES NOT HAVE to be a subcategory of an identifier. I have shown one definition where it's not true. It's optional for whoever defines the syntax.

Quote
A language could define, that keywords must be all upper, while identifiers must be all lower case. Then they are entirely different groups.

Totally agree:

  <Token Start= "A..Z" Content = "A..Z_" Attribute='KEYWORD'> </Token>
  <Token Start= "a..z" Content = "a..z" Attribute='IDENTIFIER'> </Token>

Quote
So then again, why applying those "scope" rules, to a scriptable HL, when the power of a scriptable HL should be, that it can deal with any language?

For speed and complexity.

Quote
As for operator: "and" is also an operator. It may be both in some languages, a keyword and an operator. Then it would be in both groups (according to your rules): identifier and symbol. But that makes no sense.

It seems I haven't been so clear.  :o You are mixing the lexical and syntactical levels.

Quote
One more, you assume that identifiers are always separated by non-letters.

No. That's not true. Look at the definition of an identifier:

  <Token Start= "A..Z" Content = "A..Z_" Attribute='IDENTIFIER'> </Token>

The definition is by content. It can include spaces or symbols. And if I define it by delimiters, it can contain almost anything, like a HEREDOC.

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 08:34:10 pm
Ok, now we do have a huge misunderstanding somewhere.

You spoke about introducing scopes (as in one type of token being a subset for another):

1) This is for your HL only, and not generic?
There is no code/definition or other reference to it in any other HL?
None of the base classes for HL need to provide anything special for this?

2) This is.. or This is not ... related to the  IDs for GetDefault attribute?

3) (Assuming "yes, only your HL" in (1) ): Your HL will.. enforce this? ... offer it as an option?

-----
Quote
It seems I haven't been so clear.  :o You are mixing the lexical and syntactical levels.
How so? Or how so, any more than you?

How is that different from choosing some identifiers as a keyword? Or from defining some symbols as operators (which you mentioned, by declaring them as subset)

I did not say that "and" is an operator depending on context. In the example "and" would *always* be an operator, same as "+" is.

There is no syntactical analyses needed to say: "and" is an operator.

----
Quote
Quote
One more, you assume that identifiers are always separated by non-letters.
No. That's not true. Look at the definition of an identifier:

Sorry inaccurately worded by me.

The point is, a language could define +++-* as an identifier (allowing it as a name for a variable or function). (In brainfuck there are only symbols.)

--------------------
So then the point about the scoping is:
Quote
For speed and complexity.

I am not convinced. But since I have not looked at any code, I can not judge, if in your case it will speed up things, or reduce complexity.

Just at what point will it be established what is a subset of what? Since it will only be known after the user config was read?

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 27, 2013, 10:32:50 pm
Quote
You spoke about introducing scopes (as in one type of token being a subset for another):

Yes. I spoke about:
a) The scope of a nested HL (range or block).
b) The ability for defining subsets of a token with a different Attribute (like Keywords).

I'm not sure what exactly you are referring to.

Quote
1) This is for your HL only, and not generic?
There is no code/definition or other reference to it in any other HL?
None of the base classes for HL need to provide anything special for this?

For creating my HL, I have been learning from some lexers and editors (Notepad++ and UltraEdit). A lexer can support nesting, but it's too much power (and processing) for a highlighter. I had to reduce the flexibility to gain speed.

As I have seen, nested HLs are needed if we want to make a reasonably flexible scriptable HL. So far I haven't implemented the nesting (I'm taking a rest from HLs for now), but I expect to develop it in the future.

I have not yet worked out how a nested HL will be defined. I have some ideas but nothing clear so far. Not even whether this will need changes to the base class.

But what I can say for now is that (and this is material for another topic) it's related to the Folding and Code-Completion features.

If we work with nested HL, we should consider that even the whole HL has attributes, like the background color, or the default font. We could see the whole background of a procedure in a different color, if we use nested HL for blocks.

Quote
2) This is.. or This is not ... related to the  IDs for GetDefault attribute?

In some way. Because when working with scriptable HL, it's necessary to make generalization about the attributes.  If the HL works at the lexical level, it can name its attributes in any way (it probably needs just a few constants: SYN_ATTR_COMMENT, SYN_ATTR_IDENTIFIER, SYN_ATTR_KEYWORD, etc.).

If we work at the syntactic level, we can manage a lot of attributes (SYNS_AttrASP, SYNS_AttrAssembler, SYNS_AttrAttributeName, SYNS_AttrAttributeValue, SYNS_AttrBlock, ...)


Quote
How is that different from choosing some identifiers as a keyword? Or from defining some symbols as operators (which you mentioned, by declaring them as subset).

For visual results, no difference.

The operator "and" could be defined as a subset of identifiers with the attribute OPERATOR. Visually, it will be similar to defining the operator "&&" as a subset of Symbol.

There is no inconsistency.

But if we don't define "and" as an OPERATOR token, it will surely be considered an identifier (and coloured as such). And that's lexically correct. Syntactically we probably know that "and" is an operator, but the HL has no way of knowing it.

Again, there is no inconsistency.

Remember when I said that managing subsets of tokens is a kind of syntactical approach for a HL.

Quote
The point is, a language could define +++-* as an identifier (allowing it as a name for a variable or function). (In brainfuck there are only symbols.)

OK we can define some like:

 <Token Start= "+" Content = "+-" Attribute='IDENTIFIER'> </Token>

Or

 <Token Start= "+" End = "-" Attribute='IDENTIFIER'> </Token>

The rule is that at the "lexical level", an attribute is just THE WAY A TOKEN IS SHOWN on the screen. This means that a VARIABLE attribute does not necessarily have to correspond to the definition of a VARIABLE in the syntax of a language.

In fact I could define my identifiers like:

<Token Start= "A..Za..z_" Content = "A..Za..z0..9_" Attribute='DIRECTIVE'> </Token>

And if the attribute DIRECTIVE has the same properties that I expect to see in the editor for an IDENTIFIER, the user won't have any idea of (and really won't care about) the name of the attribute, except for config.

About dynamic attributes:

We can dynamically define many attributes in a syntax. This definition creates a new attribute, with the label NUMBER:

  <Token Start="0..9" Content = '0..9' Attribute="NUMBER"> </Token>

We can say that the NUMBER attribute always exists, but in a scriptable HL, you have to create it.

Currently I cannot create attributes dynamically in my highlighter (though I have analysed that possibility), but I have defined two "empty attributes" to use when defining new token structures:

  <Token Start="{%" End="}" Attribute='EXTRA1'></Token>

Creating a new attribute dynamically in a scriptable HL should consider the name of the attribute ("MYSTRING") and the label (STRING).

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 11:26:39 pm
Quote
For creating my HL, I have been learning from some lexers and editors
Yes, I noted you have done some research...  you forced me to do a bit of reading up too, somewhere back in this thread.

Going to reply out of order...
-----------------------------------------
about the "and"
You wrote:
Quote
Another example of a subcategory, not so easy to see, could be the "Operator". I can define an OPERATOR as a sub-category of a SYMBOL
I do not know: did you mean that in a lexical, or syntactical parser? IMHO that decision can be done on a lexical level. The token (e.g. "+" can be looked up in a lexicon, and there it says operator)

Then the same can be said for "and". "and" only exists as operator. It never can be anything else. So it can be determined on a lexical level.

Meaning, that if you introduce subclasses as you said "operator =subclass of symbol", then something is wrong.
Just goes to show, how very tricky it is to define those subclasses.

Nevertheless, you are free to define "and" as a keyword (the current pas HL does do that / in fact "and" is both). Then it (kind of) works.

-----------------------------------------

Quote
Code-Completion features
That is currently not HL related at all. Codetools have their own scanner, which runs much less frequently.
-----------------------------------------
Quote
a) The scope of a nested HL (range or block).
b) The ability for defining subsets of a token with a different Attribute (like Keywords).

I was mainly talking about (b).

*** As for (a) *** (referred to as "assumption" in the rest of my reply)
"nested HL"
1) 2 HL, one nested in the other (e.g. asm in pascal)?
2) nested = context sensitive (nested comment, or keyword (nested) in a block)?

*** As for (b) ***
I am very sceptical of this. I can see that for some fixed HL, where such things can be predicted, but in a scriptable one, I see it as a limitation.

As I already said (and tried to show), only after the scripted definitions are read, you can calculate, what may be a subset of what.
But then it just means extra work, extra code, and I see no benefit.

If you define it upfront (hardcoded), It is a limitation, and only text can be highlighted where the subsets are indeed of the expected kind.

Anyway
Quote
For creating my HL
Your choice then.

-----------------------------------------

Quote
If we work with nested HL, we should consider that even the whole HL has attributes, like the background color, or the default font. We could see the whole background of a procedure in a different color, if we use nested HL for blocks.

Not sure: is this assumption 1 or 2?
A HL can mix attributes, so yes, you can define it to return multiple attributes for some code.

-----------------------------------------
Quote
In some way. Because when working with scriptable HL, it's necessary to make generalization about the attributes.

This, I have a real problem with.

I do not see any need for such assumptions. On a lexer level, there is a direct mapping from match to token-kind.
The token kind must have no meaning to the lexer at all.

On a syntactical level, state can change depending on token-kind found. There may be groups of token sharing behaviour.
It is not necessary to group them, but then the behaviour must be repeatedly specified by the user in the config file.
Grouping is an option to make configuring easier for the user. But grouping itself must be a configuration. Hardcoded groups are always going to be a limitation.

-----------------------------------------
As for "how to create attributes", I never doubted the ability to create the rules for any of my examples.
My examples were given only to show how they could conflict with any assumption made about token subclassing. (unless the subclassing comes from config, and is not mandatory)



Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 27, 2013, 11:36:09 pm
Just one more note. If the problem is to decide which rule applies...

e.g
<Token List="Begin,end,case,try" Attribute='KEYWORD'> </Token>
<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>

Now "begin" is matched by both rules.

Then rules should have a priority. That is similar, but it does not define a relation between the resulting types (subclassing does define a relation)

priority can be the order of declaration in the list of rules.
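
A minimal Free Pascal sketch of "priority = order of declaration" might look like this. The `TRule` record and both function names are hypothetical, and the rule matching is reduced to a word list versus "any word" to keep the example short.

```pascal
program RulePrioritySketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical rule: either a word list or "match any word".
  TRule = record
    Attr: string;
    Words: string;   // comma separated list; '' means "match any word"
  end;

const
  // Order of declaration is the priority: the first matching rule wins.
  Rules: array[0..1] of TRule = (
    (Attr: 'KEYWORD';    Words: 'begin,end,case,try'),
    (Attr: 'IDENTIFIER'; Words: '')
  );

function RuleMatches(const Rule: TRule; const AWord: string): Boolean;
begin
  if Rule.Words = '' then
    Result := True
  else
    Result := Pos(',' + LowerCase(AWord) + ',', ',' + Rule.Words + ',') > 0;
end;

// Walk the rules top to bottom and stop at the first hit. No relation
// between KEYWORD and IDENTIFIER is needed for this to work.
function AttributeFor(const AWord: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(Rules) to High(Rules) do
    if RuleMatches(Rules[i], AWord) then
      Exit(Rules[i].Attr);
end;

begin
  WriteLn('begin -> ', AttributeFor('begin'));   // KEYWORD (both rules match)
  WriteLn('foo   -> ', AttributeFor('foo'));     // IDENTIFIER
end.
```

Swapping the two entries in `Rules` would make every word an IDENTIFIER, which shows that the outcome depends only on order, not on any subset relation.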
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 28, 2013, 05:54:00 am
Quote
Just one more note. If the problem is to decide which rule applies...

e.g
<Token List="Begin,end,case,try" Attribute='KEYWORD'> </Token>
<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>

Now "begin" is matched by both rules.

Then rules should have a priority.

I have a fixed priority. The subcategory definition prevails. It's like defining numeric prefixes in a telephone numbering plan:

0099 -> Country A
00995 -> Country B         //subcategory prevails
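
The telephone analogy amounts to a longest-prefix match, which can be sketched as follows in Free Pascal. The `TPrefixRule` record and the `Lookup` function are invented for this illustration and say nothing about how the actual highlighter stores its definitions.

```pascal
program LongestPrefixSketch;
{$mode objfpc}{$H+}

type
  TPrefixRule = record
    Prefix: string;
    Target: string;
  end;

const
  // The two entries from the example; declaration order does not matter here.
  Plan: array[0..1] of TPrefixRule = (
    (Prefix: '0099';  Target: 'Country A'),
    (Prefix: '00995'; Target: 'Country B')
  );

// Return the target of the longest prefix that matches Number,
// so the more specific (sub-category) entry prevails.
function Lookup(const Number: string): string;
var
  i, BestLen: Integer;
begin
  Result := '(no match)';
  BestLen := 0;
  for i := Low(Plan) to High(Plan) do
    if (Length(Plan[i].Prefix) > BestLen) and
       (Copy(Number, 1, Length(Plan[i].Prefix)) = Plan[i].Prefix) then
    begin
      BestLen := Length(Plan[i].Prefix);
      Result := Plan[i].Target;
    end;
end;

begin
  WriteLn(Lookup('00991234'));   // Country A
  WriteLn(Lookup('00995123'));   // Country B
end.
```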

Actually the syntax of the "syntax file" is something like:

<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
<Keywords> begin end case try </Keywords>

I adopted this structure for compatibility with Notepad++, so we can easily adapt its language definitions.

And the order of the declarations (Token, Keywords) is not important. I do a two-pass read. First I read the definitions of identifiers and numbers, because they are the base for building the other token definitions.

Quote
I do not know: did you mean that in a lexical, or syntactical parser? IMHO that decision can be done on a lexical level. The token (e.g. "+" can be looked up in a lexicon, and there it says operator)

Yes, we can have some rules at the lexical level that define "and" and "+" with the attribute OPERATOR. (Note: I don't say they are operators, just that they are lexically defined with the attribute OPERATOR.) But we can do that at the syntactical level too.

There is a confusion when we talk about type, category or attribute of a token.

"A token is one or more characters, grouped by specific rules. Each token has one and only one attribute."

At the lexical level, we can only talk about tokens and their attributes.

At the syntactical level we can recognize types, procedures, functions, classes, objects, ...

Quote
Then the same can be said for "and". "and" only exists as operator. It never can be anything else. So it can be determined on a lexical level.

* OPERATOR (lexical) -> Has a forecolor, backcolor, font, ... It just has rules for defining the token.

* operator (syntactical) -> Some token that can construct expressions; it has strict rules in the language.

Quote
As I already said (and tried to show), only after the scripted definitions are read, you can calculate, what may be a subset of what.
Quote
Meaning, that if you introduce subclasses as you said "operator =subclass of symbol", then something is wrong.

I don't know what is wrong.  In the HL we just define attributes for tokens. Lexically we can define an operator in different ways: as a "delimited token", as a "token by content", or as a "subcategory of a token". And we can use one or more definitions at the same time. Syntactically, it doesn't matter. We can even consider some other IDENTIFIER (not lexically identified) as an operator. Probably here is the confusion.

I don't see any conflict.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 28, 2013, 10:51:37 am
Nothing you wrote does in any way change what I wrote.

Most of it extends what can be done, but does not deal with the actual issue described by me.

And as I wrote, as for implementation it probably does not bother you.

The issue is constructed purely from what you said. With the single goal to show, that your definition (in regards to subclassing) could lead to conflicts.

Those conflicts may not matter in your implementation. But that is not the point.

The point is to show that subclassing (by the need to avoid conflicts) can add limitations to the capabilities of  the HL.
This is why I think they are not a good idea.

This is not to say that a HL must be limitless. Limits will occur, and in more than one place. But why add them where they are (imho) not needed?

I described the problem twice. But somehow we look at it from too different an angle. So your response shows that you concentrated on other aspects than I did.

I saw this as a very abstract theoretical example. You seem to look at the practical side only (just guessing, sorry if wrong).

This part is not that important. But if interested, re-read my posting, and put it only in the context of those words I quoted from you. Ignoring all the possibilities, of what could be done.

-----------

Now to more practical. Let me try to understand how you deal with subclasses.

Quote
<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
<Keywords> begin end case try </Keywords>

I assume Keywords, are in this case meant to be a subclass of identifiers?

If so, and only if so:
At which time is that established?
1) Hardcoded?
2) calculated from the above config?
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 28, 2013, 05:53:31 pm
Quote
Most of it extends what can be done, but does not deal with the actual issue described by me.

Probably I'm not understanding the issue. Can it be shown on a practical case? Would you please give a practical case where this issue occurs?

Quote
The point is to show that subclassing (by the need to avoid conflicts) can add limitations to the capabilities of  the HL.

I think the confusion is about the term "subclass". I have never used this word, because it can have some strong implications. I just say "subset" or "subcategory".

Quote
<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
<Keywords> begin end case try </Keywords>

I assume Keywords, are in this case meant to be a subclass of identifiers?

If so, and only if so:
At which time is that established?
1) Hardcoded?
2) calculated from the above config?


By now I have it hardcoded.
Formally, it should be something like this:

<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
<Subset Set='IDENTIFIER' Attribute='KEYWORD'>
   begin end case try
</Subset>

I keep the form <KEYWORDS> </KEYWORDS>:

1. For compatibility with Notepad++.
2. For simplicity. It is easier than grasping the concept of "Subset of tokens".
3. Because I haven't defined any other case of Subset. I'm studying the case of Symbols and Numbers.

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 28, 2013, 06:21:36 pm
Quote
Probably I'm not understanding the issue. Can it be shown on a practical case? Would you please give a practical case where this issue occurs?

Depends on how it is implemented.

But every implementation that forces a sub-categorization will add limits. And some text/language may not be able to be highlighted within that limit.

Again, there will be limits anyway. But why add more?

That was my example:
ASSUME your implementation forces operator, to be a subcategory of symbol (an example you gave), then "and" can not be an operator, because it is not symbol(s).

Quote
By now I have it hardcoded.

That is exactly the problem.

It is one thing to say keywords, identifiers etc must entirely consist of a given set of chars (word token chars). Ideally configurable, but may even be hardcoded.

But the last sentence does not put a relation between identifiers and keywords.

Each of them may be a different subset of word-tokens. Word tokens themselves have no attribute.

Defining the concept of word-token chars allows the scanner to determine word boundaries, and then match the found words against the configured group.

This allows for
* word token char A-Za-z (0-9 maybe)
* identifiers start a-m
* keywords start n-z

Which requires keywords to NOT be a subcategory of identifiers.
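
This configuration can be sketched in Free Pascal as follows. The names `WordChars`, `Classify` and `ScanWords` are made up for the example, and ignoring case when looking at the first letter is an assumption; the point is only that word boundaries come from the char set, while the two groups are classified independently of each other.

```pascal
program WordTokenSketch;
{$mode objfpc}{$H+}

const
  // Word-token chars only define where a word starts and ends.
  WordChars = ['A'..'Z', 'a'..'z', '0'..'9'];

// Classify a word by its first letter: a-m starts an identifier,
// n-z starts a keyword. Neither group is a subset of the other.
function Classify(const AWord: string): string;
begin
  if AWord = '' then
    Exit('');
  case UpCase(AWord[1]) of
    'A'..'M': Result := 'IDENTIFIER';
    'N'..'Z': Result := 'KEYWORD';
  else
    Result := 'NONE';   // e.g. a word that starts with a digit
  end;
end;

// Find the word boundaries first, then match each word against the groups.
procedure ScanWords(const Line: string);
var
  i, Start: Integer;
  W: string;
begin
  i := 1;
  while i <= Length(Line) do
    if Line[i] in WordChars then
    begin
      Start := i;
      while (i <= Length(Line)) and (Line[i] in WordChars) do
        Inc(i);
      W := Copy(Line, Start, i - Start);
      WriteLn(W, ' -> ', Classify(W));
    end
    else
      Inc(i);
end;

begin
  // Prints: foo -> IDENTIFIER, next -> KEYWORD, zero -> KEYWORD
  ScanWords('foo := next + zero;');
end.
```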
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 28, 2013, 06:28:38 pm
If it were not hardcoded, but configurable, then it would not be so much of an issue.

Because I as user could define any outer group I need, even if the outer group does not describe any highlightable set.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 29, 2013, 03:55:08 am
Quote
That was my example:
ASSUME your implementation forces operator, to be a subcategory of symbol (an example you gave), then "and" can not be an operator, because it is not symbol(s).

My implementation doesn't force this. I've tried to explain it in many ways. Let's look at this hypothetical definition:
Code: [Select]
<Token Start= "+-:;" Content = "+-:;" Attribute='SYMBOL'> </Token>
<Subset Set='SYMBOL' Attribute='OPERATOR'> +  - </Subset>

<Token Start= "A..Za..z_" End = "A..Za..z0..9_" Attribute='IDENTIFIER'> </Token>
<Subset Set='IDENTIFIER' Attribute='OPERATOR'> and or </Subset>

Can I say "I have forced OPERATOR to be a subcategory of SYMBOL"?
Can I say "I have forced OPERATOR to be a subcategory of SYMBOL and IDENTIFIER"?

No. I have just defined the symbols "+", "-" and the identifiers "and", "or" to have the attribute OPERATOR.

"Is this definition of OPERATOR restricted to being a SYMBOL or an IDENTIFIER?"

No, because I could have included another token definition that has the attribute OPERATOR and has nothing to do with identifiers or symbols.

Formally: there is no definition of an OPERATOR (as we understand it syntactically) in a HL.

When I define:

<Subset Set='IDENTIFIER' Attribute='OPERATOR'> and or </Subset>

I am not saying: "Define all the operators to be a subclass of IDENTIFIER".
I am saying: "By the way, use the parser of IDENTIFIER as a filter to easily identify the tokens 'and', 'or' and give them the attribute OPERATOR".

I repeat:
In the HL we just define attributes for tokens. Lexically, we can define an operator in different ways: as a "delimited token", as a "token by content", or as a "subcategory of a token". And we can use one or more definitions at the same time. Syntactically, it doesn't matter.

Even in my simple implementation, I don't force a KEYWORD token to be a subset of IDENTIFIER. I can define KEYWORD tokens with any other rule.
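
A minimal sketch of how such a definition could be evaluated, written as a stand-alone Free Pascal program. It is not the actual implementation discussed here: TAttr, BaseAttr and FinalAttr are hypothetical names, the character sets are copied from the sample definition above, and exact (case-sensitive) matching of "and"/"or" is an assumption.

```pascal
program SubsetDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical attribute IDs for this sketch
  TAttr = (atNone, atSymbol, atIdentifier, atOperator);

const
  SymbolChars = ['+', '-', ':', ';'];        // from the <Token> for SYMBOL
  IdentStart  = ['A'..'Z', 'a'..'z', '_'];   // from the <Token> for IDENTIFIER
  AttrNames: array[TAttr] of string =
    ('(none)', 'SYMBOL', 'IDENTIFIER', 'OPERATOR');

// Step 1: the token rule decides which parser recognises the token.
function BaseAttr(const Tok: string): TAttr;
begin
  Result := atNone;
  if Tok = '' then
    Exit;
  if Tok[1] in SymbolChars then
    Result := atSymbol
  else if Tok[1] in IdentStart then
    Result := atIdentifier;
end;

// Step 2: the <Subset> entries act as filters on the recognised token.
// They only re-label the listed tokens; they say nothing about OPERATOR
// as a whole being "inside" SYMBOL or IDENTIFIER.
function FinalAttr(const Tok: string): TAttr;
begin
  Result := BaseAttr(Tok);
  case Result of
    atSymbol:
      if (Tok = '+') or (Tok = '-') then
        Result := atOperator;
    atIdentifier:
      if (Tok = 'and') or (Tok = 'or') then   // assumed case-sensitive
        Result := atOperator;
  end;
end;

const
  Samples: array[0..4] of string = ('+', ';', 'and', 'count', 'or');
var
  i: Integer;
begin
  for i := Low(Samples) to High(Samples) do
    WriteLn(Samples[i], ' -> ', AttrNames[FinalAttr(Samples[i])]);
end.
```

Another token rule could assign atOperator directly, without going through either subset, which is the point being made: the attribute is not tied to the parser that happened to find the token.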
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 29, 2013, 01:55:05 pm
Ah, I see now (at least one part)

I have always read your statements as saying that the attribute itself is also part of the subcategory (and therefore *all* its tokens).

But the Attribute (itself) is NOT a sub category (neither a subset) of any other attribute.

Some (not necessarily all) tokens belonging to an attribute are (or may be) a subset of tokens from another attribute.

This is different. And OK.

This was a long discussion for getting us synced. But interesting.

-------------------------------

Leaves one last point: Why/How does that need GetDefaultAttribute?

Does/Will/Do you plan: The user can define their own attributes?

<Subset Set='IDENTIFIER' Attribute='MYCOLOR1'> and or </Subset>
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on November 29, 2013, 05:35:26 pm
Well, I surely haven't expressed it correctly in English, and this thread has grown unnecessarily.

Quote
Leaves one last point: Why/How does that need GetDefaultAttribute?

Actually, it can live without that. But when you need to access the common attributes of a HL through the editor, for configuration, it's desirable to have an easy way to do that. And NUMBER is a common attribute.

Quote
Does/Will/Do you plan: The user can define their own attributes?
I have considered it. It would be relatively easy, but I'm afraid it would imply changing the structure of the base class of the HL.

By now I have predefined these Attributes:
tkNull, tkIdentif, tkKeyword, tkDirective, tkVariable, tkNumber, tkSpace, tkString, tkComment, tkSymbol, tkLabel, tkAsm, tkExtra1, tkExtra2

First, I am concerned with defining the properties of the attributes:

<Attribute Name='MYSTRING' type='STRING' ForeColor="$$$$" Backcolor="$$$$">
</Attribute>

I also want to include the processing of symbols and operators.

But what is important for me is to structure some way to manage folding in a scriptable HL (for me, it belongs to the syntactic level) and the nested HL. I hope you can help me with this issue.

Later, I would like to improve the code completion, making it part of the "syntax file". That way, all the highlighting, folding and completion could be joined in a single "syntax file".
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on November 29, 2013, 08:05:08 pm
Well as you pointed out earlier, not all I wrote was always expressed clearly. That is the fun of human languages....


GetDefaultAttrib for conf makes some sense.
Personally I prefer other ways, but that is personal preference.


Adding attributes from user conf should be no problem.

The highlighter can keep them in a list/array and use them. They do not need to have a property.
If you want them accessible in the Object Inspector, you can add a TCollection.

Configuration such as the IDE options can use the Attrib[n]/AttribCount properties. That is what the IDE does.

In fact, IIRC the IDE stores extra attributes in this list (such as block selection color, which is not handled by the HL).
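
As a rough, stand-alone illustration of keeping user-configured attributes in a list and exposing them through a count plus an indexed property: the classes TDemoAttr and TDemoHighlighter below are stand-ins invented for this sketch, not the real TSynHighlighterAttributes or SynEdit base class, and the attribute names are just sample values.

```pascal
program UserAttrDemo;
{$mode objfpc}{$H+}

uses
  Contnrs;  // TObjectList

type
  // Stand-in for an attribute object; only what the sketch needs
  TDemoAttr = class
  public
    Name: string;
    Foreground: LongInt;
    constructor Create(const AName: string; AForeground: LongInt);
  end;

  // Hypothetical highlighter that keeps user attributes in a list
  // instead of declaring one published property per attribute
  TDemoHighlighter = class
  private
    FAttrs: TObjectList;  // owns the attribute objects
    function GetAttr(Index: Integer): TDemoAttr;
    function GetAttrCount: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    function AddAttr(const AName: string; AForeground: LongInt): TDemoAttr;
    property AttrCount: Integer read GetAttrCount;
    property Attr[Index: Integer]: TDemoAttr read GetAttr;
  end;

constructor TDemoAttr.Create(const AName: string; AForeground: LongInt);
begin
  inherited Create;
  Name := AName;
  Foreground := AForeground;
end;

constructor TDemoHighlighter.Create;
begin
  inherited Create;
  FAttrs := TObjectList.Create(True);  // True: free items with the list
end;

destructor TDemoHighlighter.Destroy;
begin
  FAttrs.Free;
  inherited Destroy;
end;

function TDemoHighlighter.GetAttr(Index: Integer): TDemoAttr;
begin
  Result := FAttrs[Index] as TDemoAttr;
end;

function TDemoHighlighter.GetAttrCount: Integer;
begin
  Result := FAttrs.Count;
end;

function TDemoHighlighter.AddAttr(const AName: string;
  AForeground: LongInt): TDemoAttr;
begin
  Result := TDemoAttr.Create(AName, AForeground);
  FAttrs.Add(Result);
end;

var
  HL: TDemoHighlighter;
  i: Integer;
begin
  HL := TDemoHighlighter.Create;
  try
    // Names as they might come from a user's syntax file
    HL.AddAttr('MYCOLOR1', $0000FF);
    HL.AddAttr('MYSTRING', $008000);
    // A configuration dialog can walk the list without knowing the names
    for i := 0 to HL.AttrCount - 1 do
      WriteLn(i, ': ', HL.Attr[i].Name);
  finally
    HL.Free;
  end;
end.
```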

-------------------
Could you use "range based", "state driven" or similar instead of "nested"? To me, nested always means using an asm HL inside a pas HL (2 different HLs, one used within the other).

For folding, start here, and see the examples folder in the IDE installation:
http://wiki.lazarus.freepascal.org/SynEdit_Highlighter

It is a complete tutorial. Feel free to ask questions.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 08, 2013, 07:37:59 pm
Quote
WinMerge can generate patches. But you need the original, and the modified.

Since you should base a patch on the latest SVN, I recommend to use TortoiseSVN, which also has the option to create a patch.

@Martin
I have made a checkout on my PC using TortoiseSVN from "http://svn.freepascal.org/svn/lazarus/trunk"

I have located the highlighter files in my local copy at "D:\Lazarus\components\synedit", so that is where I have to make the changes and generate the patch.

Is that OK?
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 08, 2013, 07:40:45 pm
Yes make the changes there.

Then check the context menu of the windows (file) explorer. In the tortoise section is "create patch"

Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 08, 2013, 08:42:58 pm
Ok.

I have found these new HL files:

* synhighlighterjscript.pas
* synhighlighterposition.pas

that I don't find on the Component Palette (in 1.0.14).

Do they have to be changed too?
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 08, 2013, 09:53:11 pm
if it is easy, then yes. But no big deal.

If one is missed, it's not a problem. It may get reported one day on Mantis, and *one* is quick to fix.

Only, if a lot are missing, and it gets reported, then it would be a lot of work. That is why I did ask to implement it for all/majority of HL.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 09, 2013, 05:29:02 am
I've prepared this patch.

I've included these new constants:

+  SYN_ATTR_NUMBER            =   6;
+  SYN_ATTR_DIRECTIVE         =   7;
+  SYN_ATTR_ASM               =   8;   
+  SYN_ATTR_VARIABLE          =   9;

and updated 14 highlighters, with these new values.

This is the first patch I have made using TortoiseSVN, so I hope it will be OK. I have to learn more about SVN.

First of all, I would like to know how to test this patch. I suppose it doesn't work on Lazarus 1.0.14.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 09, 2013, 09:49:50 pm
Just acknowledging: I have seen it, and will look at it asap.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 11, 2013, 04:26:03 pm
Just tried, it does not even compile.

    SYN_ATTR_SYMBOL: Result := tkSymbol;

Should return the attribute (as in the object holding the colors) not the token-kind.

Same for all HL.
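
To make the difference concrete, here is a stand-alone sketch of the pattern. TDemoAttr and TDemoHighlighter are stand-ins invented for this example (the real method lives on the SynEdit highlighter classes and returns TSynHighlighterAttributes); the SYN_ATTR_* values are the ones quoted in this thread.

```pascal
program DefaultAttrDemo;
{$mode objfpc}{$H+}

const
  // Values as quoted in this thread
  SYN_ATTR_COMMENT = 0;
  SYN_ATTR_SYMBOL  = 5;
  SYN_ATTR_NUMBER  = 6;

type
  // Stand-in for the object that holds colors and style
  TDemoAttr = class
  public
    Name: string;
    constructor Create(const AName: string);
  end;

  // Hypothetical highlighter with one attribute object per kind
  TDemoHighlighter = class
  private
    FCommentAttri: TDemoAttr;
    FNumberAttri: TDemoAttr;
    FSymbolAttri: TDemoAttr;
  public
    constructor Create;
    destructor Destroy; override;
    function GetDefaultAttribute(Index: Integer): TDemoAttr;
  end;

constructor TDemoAttr.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
end;

constructor TDemoHighlighter.Create;
begin
  inherited Create;
  FCommentAttri := TDemoAttr.Create('Comment');
  FNumberAttri := TDemoAttr.Create('Number');
  FSymbolAttri := TDemoAttr.Create('Symbol');
end;

destructor TDemoHighlighter.Destroy;
begin
  FCommentAttri.Free;
  FNumberAttri.Free;
  FSymbolAttri.Free;
  inherited Destroy;
end;

function TDemoHighlighter.GetDefaultAttribute(Index: Integer): TDemoAttr;
begin
  case Index of
    SYN_ATTR_COMMENT: Result := FCommentAttri;
    // Return the attribute object, not a token-kind value such as tkSymbol
    SYN_ATTR_SYMBOL:  Result := FSymbolAttri;
    SYN_ATTR_NUMBER:  Result := FNumberAttri;
  else
    Result := nil;  // this highlighter has no such attribute
  end;
end;

var
  HL: TDemoHighlighter;
  A: TDemoAttr;
begin
  HL := TDemoHighlighter.Create;
  try
    A := HL.GetDefaultAttribute(SYN_ATTR_NUMBER);
    if A <> nil then
      WriteLn('SYN_ATTR_NUMBER -> ', A.Name);
  finally
    HL.Free;
  end;
end.
```

Assigning a token-kind enum value to a result of class type is a type mismatch, which is why the first version of the patch could not compile.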
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 11, 2013, 07:01:50 pm
Sorry Martin, I copy-pasted the token's ID instead of the attribute.

I've corrected the patch by hand, so I don't know if it is OK.

Would you please tell me the correct way to test this patch?
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 11, 2013, 07:45:10 pm
I still have to check the patch.


You should be able to actually run, and recompile Lazarus from the SVN directory. Then you can test (which is part of supplying a patch).

Copy a lazarus.exe into the folder.

When calling it, use a primary-config-path (search the forum and wiki), so it uses a different config dir.

It will ask for fpc. Point it to the fpc from your normal install (do not copy, move, or rename the fpc folder; that will not work, or you must edit fpc.cfg).



Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 14, 2013, 02:42:32 pm
Committed in 43540.

You did not add a public wrapper in the patch? If you still need it, please add it to Mantis, so it will not be forgotten, since I probably won't look at it before the holidays.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 15, 2013, 04:42:10 am
Thanks Martin. But what is a "public wrapper"? Do you mean I should report a bug on Mantis?

 
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 15, 2013, 12:53:38 pm
"public wrapper": I thought you needed a public method to access this?
If not then I am mistaken  and all is fine.

Mantis:
*IF* you do need the above and *if* you do have a patch, then submit the patch on Mantis.


Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 15, 2013, 05:53:13 pm
Do you refer to something like this:

Code: [Select]
    property NumberAttribute: TSynHighlighterAttributes
      index SYN_ATTR_NUMBER read GetDefaultAttribute;
   ...
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Martin_fr on December 15, 2013, 06:46:45 pm
NO, it was all discussed back on this thread.

Anyway, if the committed change is good for what you want to do, then it is all fine, and there is no need for more.
Title: Re: New ID attributes on SynEditHighLighter unit.
Post by: Edson on December 15, 2013, 06:59:47 pm
Quote
NO, it was all discussed back on this thread.

Yes, it is what I supposed.

The commit is OK for me. I would have made more changes, but they would break compatibility.