
Topic: MailHighligher (Read 13879 times)

wp

  • Hero Member
  • Posts: 11857
MailHighligher
« on: November 02, 2015, 01:14:46 pm »
I am writing a little tool which retrieves the archived Lazarus mailing list from http://lists.lazarus.freepascal.org/pipermail/lazarus/, displays the subjects in a tree and the selected mail in a TSynEdit. It would be nice to have some kind of "mail highlighter" that displays the various reply levels in different colors: lines beginning with ">" should be in color 1, lines beginning with ">>" should be in color 2, etc.

Here is an example of what I want to achieve:

On Mon, 12 Jan 2015 15:27:25 +0100
XXXX <XXXX at XXXX.XXX> wrote:
> On 1/12/15, XXXXXX <XXXX at XXXX.XXX> wrote:
>

>> Is my understanding correct?
>> - All changes of 1.3 as of last weekend will be in 1.4 automatically.
>> - What comes later must be requested in the new 1.4RC1 wiki page.

Yes.
> Things before r47335 are in 1.4 branch AFAIK.
Yes.

How do I start? Which would be the best available highlighter to derive from? Or has someone already written a similar highlighter that he/she would be willing to share?
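Whatever highlighter is used, the core per-line decision is how deep the quotation is. A minimal sketch in plain Free Pascal (no SynEdit involved; the function name QuoteLevel is made up for illustration) that counts the ">" markers in the leading run of ">" and spaces, so that ">>" and "> >" both count as level 2 and a ">" later in the text is ignored:

```pascal
program QuoteLevelDemo;

{$mode objfpc}{$H+}{$ASSERTIONS ON}

{ Hypothetical helper: returns the quotation depth of a mail line.
  Only '>' characters in the leading run of '>' and spaces count,
  so '>>' and '> >' both give 2, while 'a > b' gives 0. }
function QuoteLevel(const Line: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  { Short-circuit evaluation ($B-, the default) keeps Line[i] in range }
  while (i <= Length(Line)) and (Line[i] in ['>', ' ']) do
  begin
    if Line[i] = '>' then
      Inc(Result);
    Inc(i);
  end;
end;

begin
  WriteLn(QuoteLevel('>> Is my understanding correct?'));   { prints 2 }
  WriteLn(QuoteLevel('> > Is my understanding correct?'));  { prints 2 }
  WriteLn(QuoteLevel('Yes.'));                              { prints 0 }
end.
```

A highlighter would then map the returned level to one of its color attributes; how levels beyond the last defined color are handled is a design choice left open here.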

Graeme

  • Hero Member
  • Posts: 1428
Re: MailHighligher
« Reply #1 on: November 02, 2015, 04:54:19 pm »
I don't know anything about SynEdit, but I can say that I've accomplished what you want very easily with fpGUI's TfpgTextEdit component using regular expressions to define syntax highlighting.

Maybe somehow SynEdit allows using regex too?
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

Edson

  • Hero Member
  • Posts: 1301
Re: MailHighligher
« Reply #2 on: November 03, 2015, 03:58:42 am »
With SynFacilSyn (https://github.com/t-edson/SynFacilSyn) and SynEdit, you can use this XML highlighter:

Code: XML
  <?xml version="1.0"?>
  <Language >
    <Token Start='&gt;' End = "" Attribute='String'> </Token>
    <Token Start='&gt;&gt;' End = "" Attribute='Keyword'> </Token>
  </Language>


No need for Regex. The syntax is very simple.
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9793
  • Debugger - SynEdit - and more
Re: MailHighligher
« Reply #3 on: November 03, 2015, 11:51:50 am »
There is a wiki page on SynEdit highlighters with an example of how to create one.

wp

  • Hero Member
  • Posts: 11857
Re: MailHighligher
« Reply #4 on: November 03, 2015, 04:48:32 pm »
Martin, I know, but I wanted to go the easy way ;)

Edson, I gave it a try. Very easy, and a quick result! This is the XML file used to define the highlighter:
Code: XML
  <?xml version="1.0"?>
  <Language >
    <Token Start='&gt;' End = "" Attribute='String'> </Token>
    <Token Start='&gt;&gt;' End = "" Attribute='Keyword'> </Token>
    <Token Start='&gt; &gt;' End = "" Attribute='Keyword'></Token>
    <Token Start='&gt;&gt;&gt;' End = "" Attribute='Number'> </Token>
    <Token Start='&gt; &gt; &gt;' End = "" Attribute='Number'></Token>
    <Token Start='&gt;&gt;&gt;&gt;' End = "" Attribute='Comment'> </Token>
    <Token Start='&gt; &gt; &gt; &gt;' End = "" Attribute='Comment'></Token>
  </Language>
Can I define the tokens in code? I want to avoid an external file. In your docs, you mention methods such as "AddTokenString" etc., but the compiler does not find them.

As the attached screenshot shows, there are some issues, though:
  • As can be seen in the second line, the ">" is sometimes misinterpreted. It would be better if I could enforce that the ">" must be at the start of a line. Any idea?
  • Every quoted mail begins with a line giving its sender and send date. Mostly, but not always, these lines begin with "On ..." and end with "wrote:". In these lines, the quotation is indicated by one ">" fewer than in the other lines. Therefore, these introductory lines get the color of the previous quotation level, which looks a bit confusing.
As usual, it looks like the quick and easy solution will not be enough.
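For the second issue, one possible rule (an assumption, since mail clients format these lines differently) is to treat a line whose text after the ">" prefix starts with "On " and ends with "wrote:" as belonging to the level it introduces, i.e. one level deeper than its own prefix. A sketch in plain Free Pascal; QuoteLevel and DisplayLevel are hypothetical helpers, not part of SynEdit or SynFacilSyn:

```pascal
program IntroLineDemo;

{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;  { TrimRight }

{ Counts the '>' markers in the leading run of '>' and spaces and
  returns in TextStart the index of the first character after it. }
function QuoteLevel(const Line: string; out TextStart: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while (i <= Length(Line)) and (Line[i] in ['>', ' ']) do
  begin
    if Line[i] = '>' then
      Inc(Result);
    Inc(i);
  end;
  TextStart := i;
end;

{ Level used for coloring (hypothetical rule): an "On ... wrote:" intro
  line gets the color of the quotation it introduces, one level deeper. }
function DisplayLevel(const Line: string): Integer;
const
  IntroStart = 'On ';
  IntroEnd = 'wrote:';
var
  TextStart: Integer;
  Body: string;
begin
  Result := QuoteLevel(Line, TextStart);
  Body := TrimRight(Copy(Line, TextStart, Length(Line)));
  if (Length(Body) >= Length(IntroStart) + Length(IntroEnd)) and
     (Copy(Body, 1, Length(IntroStart)) = IntroStart) and
     (Copy(Body, Length(Body) - Length(IntroEnd) + 1, Length(IntroEnd)) = IntroEnd) then
    Inc(Result);
end;

begin
  WriteLn(DisplayLevel('> On 1/12/15, XXXXXX <XXXX at XXXX.XXX> wrote:'));  { prints 2 }
  WriteLn(DisplayLevel('>> Is my understanding correct?'));                  { prints 2 }
end.
```

This per-line check does not catch intro lines that wrap, such as the two-line "On Mon, 12 Jan 2015 ..." / "... wrote:" header in the example at the top of the thread; handling those would need state carried from one line to the next.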

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: MailHighligher
« Reply #5 on: November 03, 2015, 07:09:04 pm »
Quote from: Graeme
I don't know anything about SynEdit, but I can say that I've accomplished what you want very easily with fpGUI's TfpgTextEdit component using regular expressions to define syntax highlighting.

Maybe somehow SynEdit allows using regex too?

These answers with no real on-topic information are getting annoying.

Edson

  • Hero Member
  • *****
  • Posts: 1301
Re: MailHighligher
« Reply #6 on: November 03, 2015, 11:24:38 pm »
Quote from: wp
Can I define the tokens by code? I want to avoid an external file. In your docs, you mention methods such as "AddTokenString" etc, but they are not found by the compiler.

Yes. It's possible. The code would be something like:

Code: Pascal
  hlt := TSynFacilSyn.Create(self);  //my highlighter
  hlt.ClearSpecials;
  hlt.CreateAttributes;
  hlt.ClearMethodTables;
  hlt.DefTokDelim('>','', hlt.tkString);    //'>' starts a token shown with the string attribute
  hlt.DefTokDelim('>>','', hlt.tkKeyword);  //'>>' starts a token shown with the keyword attribute
  hlt.Rebuild;
  ...

"AddTokenString" is an old function. It is not present in the new version of SynFacilSyn. I have updated the documentation with the new functions.

See Section 5 of the "Technical documentation" for more information.

Edson

  • Hero Member
  • Posts: 1301
Re: MailHighligher
« Reply #7 on: November 04, 2015, 03:41:00 am »
   
Quote from: wp
  • As can be seen in the second line, the ">" is sometimes misinterpreted. It would be better if I could enforce that the ">" must be at the start of a line. Any idea?
Well, there is a parameter in the SynFacilSyn lexer that gives the ordinal (position) of a token in the line, not only whether it is the first one. But it's not implemented for delimited tokens. It could be implemented using an alternative "DefTokDelim()", but I haven't tested it yet.

Quote from: wp
  • Every quoted mail begins with a line telling its sender and send date. Mostly, but not always, these lines begin with "On ..." and end with "wrote:". Quotation is indicated in these lines by one ">" less than with the other lines. Therefore, these introductory lines get the color of the previous quotation level which looks a bit confusing.
This looks like a job for a custom highlighter. Using SynFacilSyn to implement it is probably not the best way.

wp

  • Hero Member
  • Posts: 11857
Re: MailHighligher
« Reply #8 on: November 04, 2015, 11:48:52 am »
Thanks. Great highlighter, great documentation!

Here's what I've been doing so far:
Code: Pascal
  procedure TMainForm.FormCreate(Sender: TObject);
  var
    tkLevel1, tkLevel2, tkLevel3,
    tkLevel4, tkLevel5, tkLevel6: TSynHighlighterAttributes;
  begin
    FMailHighlighter := TSynFacilSyn.Create(self);
    with TSynFacilSyn(FMailHighlighter) do begin
      ClearSpecials;
      CreateAttributes;
      ClearMethodTables;
      { One attribute per quotation level; TColor values are $BBGGRR }
      tkLevel1 := NewTokType('Level1');
      tkLevel1.Foreground := $BD814F;
      tkLevel2 := NewTokType('Level2');
      tkLevel2.Foreground := $4D50C0;
      tkLevel3 := NewTokType('Level3');
      tkLevel3.Foreground := $59BB9B;
      tkLevel4 := NewTokType('Level4');
      tkLevel4.Foreground := $A264B0;
      tkLevel5 := NewTokType('Level5');
      tkLevel5.Foreground := $C6AC4B;
      tkLevel6 := NewTokType('Level6');
      tkLevel6.Foreground := $4696F7;
      { Quoted lines ('>>' or '> >' style) get the color of their level.
        Intro lines ("On ... :") are meant to get the color of the level
        they introduce, i.e. one more than the '>' in front of them. }
      DefTokDelim('>',              '', tkLevel1);
      DefTokDelim('On',           ':$', tkLevel1);
      DefTokDelim('On',           ':$', tkLevel1);

      DefTokDelim('>>',             '', tkLevel2);
      DefTokDelim('> >',            '', tkLevel2);
      DefTokDelim('> On',         ':$', tkLevel2);
      DefTokDelim('> On',         ':$', tkLevel2);

      DefTokDelim('>>>',            '', tkLevel3);
      DefTokDelim('> > >',          '', tkLevel3);
      DefTokDelim('>> On',        ':$', tkLevel3);
      DefTokDelim('> > On',       ':$', tkLevel3);

      DefTokDelim('>>>>',           '', tkLevel4);
      DefTokDelim('>>> On',       ':$', tkLevel4);
      DefTokDelim('>>> On',       ':$', tkLevel4);
      DefTokDelim('> > > >',        '', tkLevel4);
      DefTokDelim('> > > On',     ':$', tkLevel4);
      DefTokDelim('> > > On',     ':$', tkLevel4);

      DefTokDelim('>>>>>',          '', tkLevel5);
      DefTokDelim('>>>> On',      ':$', tkLevel5);
      DefTokDelim('>>>> On',      ':$', tkLevel5);
      DefTokDelim('> > > > >',      '', tkLevel5);
      DefTokDelim('> > > > On',   ':$', tkLevel5);
      DefTokDelim('> > > > On',   ':$', tkLevel5);

      DefTokDelim('>>>>>>',         '', tkLevel6);
      DefTokDelim('>>>>> On',     ':$', tkLevel6);
      DefTokDelim('>>>>> On',     ':$', tkLevel6);
      DefTokDelim('> > > > > >',    '', tkLevel6);
      DefTokDelim('> > > > > On', ':$', tkLevel6);
      DefTokDelim('> > > > > On', ':$', tkLevel6);
      Rebuild;
    end;
  end;

The construction with "On" as the start delimiter and ":$" as the end delimiter is intended to identify tokens that begin with "On" and end with a colon at the end of the line. This often works well, but there are also cases, like in the screenshot, where sequences in the regular text are falsely detected. I also suspect that the line-end detection is not correct, because sometimes the highlighted "On" phrase ends without a colon (see the last line in the screenshot). Probably the $ character as a line-end symbol is not supported by your regular expression engine. And I guess there is also no support for ^ as a line-start symbol, because all my experiments using e.g. '^>' as the start delimiter failed.
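Outside of SynFacilSyn, the "^" and "$" anchors discussed here are easy to emulate with plain string tests: require the start delimiter at position 1 (followed by a space, as a crude word boundary so that "Only" does not count as "On") and the end delimiter as the last non-blank text on the line. A minimal sketch; MatchesWholeLine is a hypothetical helper, not a SynFacilSyn function:

```pascal
program AnchorDemo;

{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;  { TrimRight }

{ Roughly the regex '^Prefix .*Suffix\s*$': Prefix plus a space must
  open the line, and Suffix must be the last non-blank text on it. }
function MatchesWholeLine(const Line, Prefix, Suffix: string): Boolean;
var
  S, Head: string;
begin
  S := TrimRight(Line);          { ignore trailing blanks, like \s*$ }
  Head := Prefix + ' ';          { crude word boundary after Prefix }
  Result := (Length(S) >= Length(Head) + Length(Suffix))
    and (Copy(S, 1, Length(Head)) = Head)
    and (Copy(S, Length(S) - Length(Suffix) + 1, Length(Suffix)) = Suffix);
end;

begin
  WriteLn(MatchesWholeLine('On 1/12/15, XXXXXX wrote:', 'On', ':'));
  WriteLn(MatchesWholeLine('Only this part matters:', 'On', ':'));
end.
```

This only fixes the anchoring: for quoted intro lines the ">" prefix must either be stripped first or passed as part of Prefix (e.g. '> On').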

On the other hand, I am not sure if it is worth aiming for a high detection rate, because the syntax of the mail files is so poorly defined that failures just cannot be avoided.

Graeme

  • Hero Member
  • Posts: 1428
Re: MailHighligher
« Reply #9 on: November 04, 2015, 04:46:31 pm »
Quote from: marcov
These answers with no real on-topic information are getting annoying

Like I said, I don't know SynEdit well, but I suggested that regex could possibly be used. I know most SynEdit highlighters are quite complex and "understand" the syntax of the language they are highlighting. Emails don't have many language constructs, hence again my suggestion to use regex (if SynEdit supports a regex-based highlighter). Many other text edit components and programmer's editors support regex-based syntax highlighting, so assuming that SynEdit might follow suit is not a far stretch. So I really don't see what is annoying or wrong about my reply. Take a chill pill, will you!

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: MailHighligher
« Reply #10 on: November 04, 2015, 08:50:57 pm »
Quote from: Graeme
Like I said, I don't know SynEdit well, but made the suggestion that regex could possibly be used.

That is only one very minor step above "a parser could be used" and similar open doors, so I rest my case.

(Unix in particular has a history of formulating scanners (tokenizers) with finite-state formalisms equivalent to regex, in the "lex" tool (which even comes with FPC in the form of plex!), though usually by generating code to avoid the runtime overhead that regex interpreters have.)

Edson

  • Hero Member
  • Posts: 1301
Re: MailHighligher
« Reply #11 on: November 05, 2015, 06:01:00 pm »
Quote from: wp
Probably the $ character as a line-end symbol is not supported by your regular expression engine. And I guess there is also no support for ^ as line start symbol because all the experiments using e.g. '^>' as start delimiter fail.

Your suspicions are correct. Regex is poorly supported in SynFacilSyn, because SynFacilSyn is designed to be fast rather than flexible.

But the "^" character is easy to fix. Use the new SynFacilSyn 1.15 from GitHub. I have added a patch that recognizes "^" as part of the start delimiter.

The "$" character is not so easy to include. Let me check.

Edson

  • Hero Member
  • Posts: 1301
Re: MailHighligher
« Reply #12 on: November 05, 2015, 06:08:50 pm »
Quote from: marcov
These answers with no real on-topic information are getting annoying

Someone here is in a bad mood.

People just want to help. Don't worry, be happy.

wp

  • Hero Member
  • Posts: 11857
Re: MailHighligher
« Reply #13 on: November 05, 2015, 06:42:36 pm »
Quote from: Edson
But the character "^" is easy to fix. Use the new SynFacilSyn 1.15 from GitHub. I have added a patch to recognize the "^" as part of the Start delimiter.
Edson, you are magnificent! Thank you. Getting better and better...

Sorry for another question: I added lines beginning with FROM, DATE and SUBJECT to the displayed mail body to show that information in the SynEdit control along with the mail text. Good candidates for keywords... But defining them as keywords also highlights these words in the text (see the attached screenshot). Is there a way to restrict a keyword so that it must be the first word of a line? Or is there another concept in the highlighter terminology that could be "abused" for this purpose?
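Independently of how SynFacilSyn could express it, the rule being asked for is simple to state in code: a header word counts only when it is the first word of the line. A sketch, assuming (hypothetically) that the added lines look like "FROM: ...", that the words are always upper case, and that a colon, space or tab follows them; IsHeaderLine is a made-up name:

```pascal
program HeaderLineDemo;

{$mode objfpc}{$H+}{$ASSERTIONS ON}

const
  { Assumed header words added above the mail body }
  HeaderWords: array[0..2] of string = ('FROM', 'DATE', 'SUBJECT');

{ True only if the first word of Line (up to a space, tab or colon)
  is one of HeaderWords; the same word later in the line is ignored.
  The comparison is case-sensitive so that ordinary sentences starting
  with "From" or "Date" are not treated as headers. }
function IsHeaderLine(const Line: string): Boolean;
var
  WordEnd, k: Integer;
  FirstWord: string;
begin
  WordEnd := 1;
  while (WordEnd <= Length(Line)) and not (Line[WordEnd] in [' ', #9, ':']) do
    Inc(WordEnd);
  FirstWord := Copy(Line, 1, WordEnd - 1);
  Result := False;
  for k := Low(HeaderWords) to High(HeaderWords) do
    if FirstWord = HeaderWords[k] then
      Exit(True);
end;

begin
  WriteLn(IsHeaderLine('SUBJECT: MailHighligher'));
  WriteLn(IsHeaderLine('Please check the DATE field'));
end.
```

In a highlighter, this check would run once at the start of each line to decide whether the first token gets the keyword attribute; how to express the same restriction inside SynFacilSyn is not shown here.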

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: MailHighligher
« Reply #14 on: November 05, 2015, 08:23:08 pm »
Quote from: Edson
People just want to help. Don't worry, be happy.

Sometimes people just want to drop certain off-topic words or URLs. I'll leave it as an exercise for the reader to decide which is the case here.

 
