Recent

Author Topic: Checking Speed of SynEdit Highlighters  (Read 105100 times)

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #75 on: October 26, 2013, 05:53:59 am »
Sets will make pretty good hashes.
Very fast to create a set for each string.
Store keyword and set in a list and sort it by two keys:
Key 1:
The set.
Key 2:
The string compared from the last char to the first,
which is often faster than the standard.
That allows very fast binary searches.
Strings need only to be compared if the sets are equal.
Divide the list internaly into sublists by length and last char.
would be the simplest possible solution, no need for a hash.
OK, thats a kind of hash too.
 The matrix matcher implemented with sets would be small and allow
to extremely reduce the amount of comparisions.
Simple, clean and extremely fast solution, just as I like.
« Last Edit: October 26, 2013, 06:26:51 am by Morton »

Edson

  • Hero Member
  • *****
  • Posts: 1301
Re: Checking Speed of SynEdit Highlighters
« Reply #76 on: October 26, 2013, 06:36:59 am »
It sounds a very creative way for fast comparisons. I haven´t tested the sets for comparison,  I suspect they are faster, but no idea.

I think you will need a fast way for transform a identifier to a Set, for no affect the performance.
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #77 on: October 26, 2013, 07:09:15 am »
I think you will need a fast way for transform a identifier to a Set, for no affect the performance.

As I noted in another post about 128 million insertions per second on my PC.
With an average length of 4 bytes that is thirty millions per second, with six still 20 million.

BTW a virtual subdivision of a sorted list can be done by additional comparisions levels.
Eg. First compare the lenght and only if equal compare the string
« Last Edit: October 26, 2013, 08:04:05 am by Morton »

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #78 on: October 26, 2013, 09:22:07 am »
Provided the sublists are short, the keywordlist of a highlighter that is realy divided into sublists and not just divided by sorting could be made selfoptimizing by counting the first 10000 accesses and storing the access count with the keys.
Then the sublists can be sorted by access counts.
Sublists will have to be searched linear.
Resorted whole list could be streamed to files,
for a scriptable HL into the scripts.
The aproach can be adjusted or lists just divided by sorting too.
After resorting for optimized access:
Use binary search to get the sublist, scann for a boundary, and then
search linear to the other boundary.
« Last Edit: October 26, 2013, 09:38:50 am by Morton »

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #79 on: October 26, 2013, 07:35:04 pm »
Seems to be fixed now.
I had a nasty little logical bug.
Still about 20 million matches per second.

The match matrix is a memory hog as it is now.

Would be nice if there would be a BitBool only occupying a single Bit,
per member instead of 8.
Sets for 'A'..'Z' use 32 byte which is still far to much as only 4 would be required.
With bitwise manipulations 'A'..'Z' will fit into LongWord.
The Matrix could be much smaller.
Even smaller with a sparse implementation.
« Last Edit: October 26, 2013, 07:42:03 pm by Morton »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Checking Speed of SynEdit Highlighters
« Reply #80 on: October 26, 2013, 11:00:36 pm »

Sets for 'A'..'Z' use 32 byte which is still far to much as only 4 would be required.

That is not because sets use bytes, but because sets use from 0 to ord(high).

Playing with $packset should be able to cut it down to   (64+26/8) rounded up. (probably delphi mode does too).

That the lower bound is always zero afaik can't be helped atm. Sparse sets help size, but make operations more expensive.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #81 on: October 26, 2013, 11:40:36 pm »
Did you test the Matcher2 with

  if IsKeyWord('interface') then Caption :=  Caption + '1';
  if IsKeyWord('inherited') then Caption :=  Caption + '2';
  if IsKeyWord('interited') then Caption :=  Caption + '3';


because it reports all 3 for me.

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #82 on: October 27, 2013, 04:17:33 pm »
Each letter in a keyword is initialized by index,  letter  at index and letter at index div 2.
Each letter in a string is checket by index,  letter  at index and letter at index div 2.
24 million matches per second.

Edson

  • Hero Member
  • *****
  • Posts: 1301
Re: Checking Speed of SynEdit Highlighters
« Reply #83 on: October 27, 2013, 05:06:15 pm »
It give me "123" with:

  if IsKeyWord('interface') then Caption :=  Caption + '1';
  if IsKeyWord('inherited') then Caption :=  Caption + '2';
  if IsKeyWord('interfaed') then Caption :=  Caption + '3';
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #84 on: October 27, 2013, 08:04:48 pm »
or

  if IsKeyWord('usit') then Caption :=  Caption + 'X';

Morton: The algorithm is flawed, however much you change it.
And even if you get a version that works for the current list of pascal identifiers (either which is well possible, the ones you use, or the ones in the SynPasSyn), it will not work for other lists.
Take a scriptable HL, and it will always be possible to find a set of 2 keywords that allow for a 3rd none keyword to be detected.

Which also means I will stop testing them....

Also the speed of an actual highlighter is not just the matching of keywords...

Besides:
- the hash is pretty good as it is
- the hash could  be improved (e.g storing the keywords on the hashtable, together with the function ref, and compare before calling the function)
- a trie (prefix tree) could be used
- ...

But all that is only a part of the time used.

Scanning is only needed to the last visible line, after that it could be done on idle.

Scanning a big pascal file create 1500 ranges and more. They are in a tree, and are searched for each line.
Searching a tree with 1500 entries is of course nothing...But doing so 100-thousand times (for 100,000 lines) or more, and it gets noticeable.

This is because of some words (like deprecated) are only highlighted in very specific context. So ranges contain a lot of information, and therefore there are plenty of different ranges.
Yet a highlighter could be subdivided:
1) general structure:
   unit interface implementation procedure begin end ....
  This needs to be scanned from the start of file, to the last visible line
2) all the other details
  only needed for the visible part. Find the start of a suitable range, just before the first visible line. And scan only what is needed.
  Cache a certain amount around the visible area, for faster scrolling

You can do a simplified version of the above spilt:

Keywords like and,or,not => they do not change the state of the range
So you can have:
-  a hashtable without them for scanning the entire file
-  a hashtable with them, when scanning a line for getting the tokens to display.



Edson

  • Hero Member
  • *****
  • Posts: 1301
Re: Checking Speed of SynEdit Highlighters
« Reply #85 on: October 27, 2013, 09:17:47 pm »
Morton: But we really appreciate your attemp.
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #86 on: October 28, 2013, 02:13:58 pm »
  type

    { MatchItem }

    MatchItem = object
      MaybeTrue: Boolean;
      Indexes: array of Smallint;
      procedure SetTrue(const KeywordIndex: Smallint);
      function Check: Boolean;
    end;
  var
    KeyMatch: array of array of array[0..25,0..25] of MatchItem;
    ListOfKeys: TStringList;

Each member of ListOfKeys has to be set if fails in the matrix are possible.
For most languages the list of possibly failing Keys is much shorter than
the count of ListOfKeys.

Only Keys with a minimum lenght can have fails under some conditions.
« Last Edit: October 28, 2013, 02:54:33 pm by Morton »

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #87 on: October 28, 2013, 09:15:32 pm »
Still 14 million matches per second.
List could be optimized as described in former posts.
Meybe even by just using a cheaper hash.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #88 on: October 28, 2013, 09:58:22 pm »
How many matches per time does the original hash have?

That is:
- If you put it standalone, in a unit like the matcher
- If you put the actual keywords in the hashtable, so if the hash matches, they can be compared *without* (or before) calling a subroutine?

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #89 on: October 29, 2013, 12:30:38 am »
@Martin_fr it is just an example.
I have already outlined some means.
BTW by an average length of just about 3 chars the 14 million matches
translate to processing over 100 million bytes per second.
Included in a highlighter and optimized comparision for the keywords
it will be a lot faster.
computing a hash in the list for just the last 2-4 chars would be likely sufficient.
But other means outlined by me are more efficient.
Eg. sublists which contain keys with the same length.
Storing indexes in the matrix.
There are many ways to reduce comparisions.

I have a nasty cold since a week and I am a bit rusty as I havent coded for
11 years.
« Last Edit: October 29, 2013, 12:46:01 am by Morton »

 

TinyPortal © 2005-2018