Recent

Author Topic: Checking Speed of SynEdit Highlighters  (Read 105272 times)

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Checking Speed of SynEdit Highlighters
« Reply #105 on: October 30, 2013, 01:24:01 pm »
Would you point me to more details? e.g. The documentation. The doc is fpc? So best to discuss on the fpc-mail list?
I've already pointed it out to Michael van Canneyt.
See the two lists in the FPLanguageRef.pdf at sections 1.3.1 and 1.3.3 where 'on' is duplicated.

Quote
Does it? "@" is an operator. It can appear in front of an identifier. (With or without a space: OnClick := @   Foo; )
However the "&" is not yet handled
My mistake. As you surmised I meant '&' not '@'. I never use this facility and remembered the wrong prefix character.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #106 on: October 30, 2013, 01:53:55 pm »
*r43342 SynEdit: Pas HighLighter, recognize new &keyword for identifier style

Edson

  • Hero Member
  • *****
  • Posts: 1302
Re: Checking Speed of SynEdit Highlighters
« Reply #107 on: October 30, 2013, 04:47:48 pm »
Quote
the current hash, has an implementation flaw

By now, we agree this, but we can use the current hash highlighter like a base, for evaluate our algorithms. It's supposed, we must be faster at the same conditions.

For testing, I suggest to implement a simple highlighter, and compare with a Lazarus highlighter. Differents computers have different speed. And like Martin said, string comparisons are just a part of the job that a highlighter must do.

You have to be carefull if using the current Lazarus Pascal highlighter. It includes folding too. So, comparing with a no-folding highlighter could be no so fair.
« Last Edit: October 30, 2013, 04:51:07 pm by Edson »
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #108 on: November 02, 2013, 10:53:40 am »
Optimized version of the matcher.
With full comparision it is 60% faster than the current Pas HL.
By comparing only the hashes in the list 75% faster.

Scanning idetifiers and checking for keywords is the slowest part of a HL.
Checking for folding doe not consume much time.


In the HL KeyHash can be improved too, to save needles comparisions.
about 20%-22% more speed.

function TcSynPasSyn.KeyHash: Integer; inline;
var
  Start, ToHash: PChar;
begin
  Result := 0;
  if (fToIdent<fLineLen) then begin
    Start := fLine + fToIdent;
    ToHash := Start;
    if IsLetterChar[ToHash^] then
    begin
      inc(Result, mHashTable[ToHash^]);
      inc(ToHash);
      {while (IsLetterChar[ToHash^] or IsUnderScoreOrNumberChar[ToHash^]) do
      begin
        inc(Result, mHashTable[ToHash^]);
        inc(ToHash);
      end;}
    end;
    if IsUnderScoreOrNumberChar[ToHash^] then
    begin
      Result := 0; { cannot be a Key}
      inc(ToHash);
      while (IsLetterChar[ToHash^] or IsUnderScoreOrNumberChar[ToHash^]) do
      begin
        inc(ToHash);
      end;
    end;
    fStringLen := PtrUInt(ToHash) - PtrUInt(Start);
    //if CompareText(copy(fLineStr,fToIdent+1,fStringLen),'varargs')=0 then debugln('TSynPasSyn.KeyHash '+copy(fLineStr,fToIdent+1,fStringLen)+'='+dbgs(Result));
  end else begin
    fStringLen := 0;
  end;
end; { KeyHash }
« Last Edit: November 02, 2013, 12:14:38 pm by Morton »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #109 on: November 02, 2013, 12:19:44 pm »
Not sure what you mean by "optimized"? (well apparently "faster")

But not sure how it should be working?

You commented out the hash loop. So essentially it is a first char only compare?

So it is a different matcher again, not at all an optimized original...

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #110 on: November 02, 2013, 12:36:55 pm »
@Martin_fr

IsKeywordbyHash compares only the hash values, but not the strings.
The computed hash is rather expensive, but creates unique hashes.
Because of the unique hashes the string comparision can be omitted.

Checking Keywords is fast despide the expensive hash,
because the matcher eliminates almost all comparisions.
A hash needs only to be computed for the small rest.

A hashed search array is a lot faster than search trees (trie).

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Checking Speed of SynEdit Highlighters
« Reply #111 on: November 02, 2013, 12:46:44 pm »
A hashed search array is a lot faster than search trees (trie).

However computing a hash is slow, so unless you present a routine that combines all the operations (hash computation and searching) to give a working IsKeyword function there is nothing concrete to compare beyond opinions.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #112 on: November 02, 2013, 12:52:38 pm »
Do we talk about the piece of code you embedded in your post?

Or something else , hidden in your attachment? (I havent read the attachment, since I assume the quote in the past shows the relevant code?

Quote
but creates unique hashes.
You mean you found a perfect hash that works with an not-limited input? That can simple not be.

Some none pascal word will always give the same hash as one of the pascal tokens.



Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #113 on: November 02, 2013, 12:56:35 pm »
A hashed search array is a lot faster than search trees (trie).

However computing a hash is slow, so unless you present a routine that combines all the operations (hash computation and searching) to give a working IsKeyword function there is nothing concrete to compare beyond opinions.

The matcher eliminates needless comparisions close to 100%, a hash needs only to be computed for the very small rest.
The matcher is very cheap with regard to the hash in the search list.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #114 on: November 02, 2013, 01:23:02 pm »
A hashed search array is a lot faster than search trees (trie).
Most likely yes. So long as the hash is cheap.

We need to scan for end-of-word anyway. So the cpu needs to fetch the data anyway, and just adding up (and maybe doing a bitshift) the values in a cpu register (assumes really good optimization) will not add much time (maybe even no time at all, since the cpu can use the time of the add, to pre-fetch the next bit of data)

So given how modern cpu work, yes a good hash implementation can be the fastest choice.


The trie needs to access more data, and the cost of fetching this, may be bigger than the cost of the hash+compare.

Quote
However computing a hash is slow, so unless you present a routine that combines all the operations (hash computation and searching) to give a working IsKeyword function there is nothing concrete to compare beyond opinions.

Yes, we are still missing any good comparison. So far we got a lot of impressive, but meaningless times.

And yes the final compare of the keyword must be done in the same function that did the hash (I wrote that before, but it was apparently ignored)


Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #115 on: November 02, 2013, 01:43:37 pm »
Do we talk about the piece of code you embedded in your post?

I was sloppy. ( rendered myself a foo l)
But some time could be saved.

Quote
Quote
but creates unique hashes.
You mean you found a perfect hash that works with an not-limited input? That can simple not be.

Some none pascal word will always give the same hash as one of the pascal tokens.

If the hashes are equal then the comparision has never failed in my tests for pascal.
Not that much time to save by omitting the full comparision.
The list is derived from a List I wrote once to do fast searches in millions or even billions of items.
The list is scanning  for double hashes if the comparisions fails.
Even in lists with hundred million of randomly generated strings a double hash occured only one time in
a huge lot of tests and it was for a doubled entry, not a double hash for different words.




Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #116 on: November 02, 2013, 01:49:33 pm »
Yes, we are still missing any good comparison. So far we got a lot of impressive, but meaningless times.

The Matcher does a full comparision too, by calling the lists apropriate function.
Hashes are only computed inside the list.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #117 on: November 02, 2013, 02:02:07 pm »
Ok, sounds good.

I need to find some time to read the attached code.

Working on other items now

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Checking Speed of SynEdit Highlighters
« Reply #118 on: November 03, 2013, 08:09:58 pm »
Here's an alternative keyword matcher which does lookup using arrays of sets.
It's not faster than the synhighlighter tree-based algorithm (on my computer it's consistently about 2% slower), however it is simpler to implement, with source code less than half the length. I estimate its TDictionary structure occupies about 1.5 MB.
I'm not sure how to gauge the memory footprint of the synedit tree. Possibly it is less than 1MB?

Edit
I removed this attachment. See the improved version (speedwise) posted 4 Nov 2013
« Last Edit: November 04, 2013, 10:27:58 pm by howardpc »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9857
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #119 on: November 03, 2013, 08:48:13 pm »
I am not sure how much mem the current tree uses.

Also the treedata itself (and the builder) are more complex than needed. The implement the aho corasick algorithm.

Just the search loop was reduced to ignore the extra.

The tree can be packed in various ways (none implemented), but then the search loop will be more complex, and take extra time.

I only added it, because everyone kept adding all kind of stuff.

I still think, that the hash (if done correctly) will be fastest.
That is in generic. Depends on a lot of factors. The hash will especially be faster, if there are many keywords.

If you have very very few keywords, then other algorithm may be as good as the hash, maybe even beat it by a tiny bit. (Just my guess).

 

TinyPortal © 2005-2018