Recent

Author Topic: Checking Speed of SynEdit Highlighters  (Read 105081 times)

Edson

  • Hero Member
  • *****
  • Posts: 1301
Re: Checking Speed of SynEdit Highlighters
« Reply #120 on: November 03, 2013, 10:52:06 pm »
howardpc:

Are you still comparing with TSynPasSyn.IsKeyword()?

It's not the hash rutine of the Lazarus Pascal Highlighter.

Quote
It's not faster than the synhighlighter tree-based algorithm (on my computer it's consistently about 2% slower)

Sorry, I'm lost. Which synhighlighter tree-based algorithm are you comparing?

Quote
however it is simpler to implement, with source code less than half the length

It's nice. A small code is good.

PD. I can't compile your code. Is it 64 bits?
« Last Edit: November 03, 2013, 10:54:21 pm by Edson »
Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Checking Speed of SynEdit Highlighters
« Reply #121 on: November 04, 2013, 02:50:05 pm »
Are you still comparing with TSynPasSyn.IsKeyword()?
It's not the hash rutine of the Lazarus Pascal Highlighter.
No, I do read Martin's posts! He indicated a while ago that such a comparison was useless.
The attached project compares the set-based matcher I came up with against the Aho algorithm tree-matcher implementation in the synedit code Martin adapted.
I've made a few minor optimisations to the code I'm offering, and the set-based matcher is now slightly faster than tree-matcher, at least using multiple iterations of the Lazarus source files used in the test.
I tried comparing the matchers using text with a much lower proportion of Pascal keywords (a chunk of the bible), and the performance of the two routines was almost identical.
For a Pascal keyword matcher you of course want an algorithm that performs well with a low keyword count, since Object Pascal is such a well-designed language (with only 73 keywords currently; and most programs use less than half of these).
The proportion of non-keywords in the data the routines process will cause variations in comparative speed. I'm assuming for our purposes that the data to be processed will normally have a fair proportion of the sought-after keywords.
Quote
I can't compile your code. Is it 64 bits?
You have to use a Lazarus compiled from SVN trunk (or a recent snapshot) otherwise the tree-matcher does not compile since it is based on fairly recent Synedit commits.

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #122 on: November 04, 2013, 02:56:40 pm »
I still think, that the hash (if done correctly) will be fastest.
That is in generic. Depends on a lot of factors. The hash will especially be faster, if there are many keywords.

The hash used in my hashed keylist is rather expensive.
Effects of cheaper hashes tends to be negative, as cheaper hashes will likely produce identical hashes for different strings.
A binary search for Pascal keys in my hashed keylist is about twenty times faster then the binary search of a TStringList.
The list of keys would have to be very very short to make the binary search of a TStringList equal or faster.

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #123 on: November 04, 2013, 03:10:09 pm »
Attached is a newer slightly optimized version of the matcher.
I have tested it against a lot sources.
If the hashes are equal then the comparision has never failed.

So far my assumption that a save key check without any real comparison is possible stays true.

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Checking Speed of SynEdit Highlighters
« Reply #124 on: November 04, 2013, 07:05:39 pm »
Attached is a newer slightly optimized version of the matcher.
I have tested it against a lot sources.
If the hashes are equal then the comparision has never failed.
Testing against the 9 Lazarus source files used in the project attached to my earlier post shows that this matcher incorrectly identifies the words - ID, Now, OI, dit, val, Val, dir - as keywords.

Quote
So far my assumption that a save key check without any real comparison is possible stays true.
I understand neither your assumption nor why it should be true (whatever it means).

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #125 on: November 05, 2013, 12:29:49 pm »
@howardpc

procedure TForm1.Button1Click(Sender: TObject);
begin
  if IsKeyWord('ID') then
  Caption:= Caption + ' 1';
  if IsKeyWord('Now') then
  Caption:= Caption + ' 2';
  if IsKeyWord('OI') then
  Caption:= Caption + ' 3';
  if IsKeyWord('dit') then
  Caption:= Caption + ' 4';
  if IsKeyWord('val') then
  Caption:= Caption + ' 5';
  if IsKeyWord('dir') then
  Caption:= Caption + ' 6';
  if IsKeyWordByHash('ID') then
  Caption:= Caption + ' 1';
  if IsKeyWordByHash('Now') then
  Caption:= Caption + ' 2';
  if IsKeyWordByHash('OI') then
  Caption:= Caption + ' 3';
  if IsKeyWordByHash('dit') then
  Caption:= Caption + ' 4';
  if IsKeyWordByHash('val') then
  Caption:= Caption + ' 5';
  if IsKeyWordByHash('dir') then
  Caption:= Caption + ' 6';
end;

Your statement is an obvious lie.
« Last Edit: November 05, 2013, 12:36:03 pm by Morton »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #126 on: November 05, 2013, 12:40:35 pm »
If you rely only on a hash, comparing only the hash value and not compare the word, or anything else, then there will always be some word (word = combination of random letters) that will fail it.

This should be obvious:
- The hash has a fixed length. There is a fixed number of values itcan have (even if this is a very large number).
- Words in the text can be of any length. The amount of words is endless.

mapping an endless number to a fixed range, leads to double allocations. It has to.

For a given set of keywords, there may be a hash, that if compared together with the length, will not allocate any other words (of the same length) to any of the hash values of any keyword.

To prove you have such a hash, you must try ALL combinations of letters of the length of each keyword.

And if you ever have to add another keyword, well....

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #127 on: November 05, 2013, 01:17:42 pm »
If you rely only on a hash, comparing only the hash value and not compare the word, or anything else, then there will always be some word (word = combination of random letters) that will fail it.

function TForm1.RndLetter: Char;
var B: byte;
begin
  Result:= Char(Random(25) + 97);
end;

function TForm1.RndLength: Integer;
begin
  Result:= 0;
  while Result < 2 do
    Result:= Random(14);
end;

function TForm1.RndString: String;
var I, L: Integer;
begin
  L:= RndLength;
  SetLength(Result, L);
  for I:= 1 to L do
  Result:= RndLetter;
end;

procedure TForm1.Button1Click(Sender: TObject);
var I: Integer;
  Stop: PChar;
  S: String;
begin
  aList.Clear;
  for I:= 1 to 10000000 do
  begin
    S:= RndString;
    if IsKeyWordByHash(PChar(S), Stop) then
    aList.Add(S);
  end;
  SynEdit1.Lines:= aList;
end;

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #128 on: November 05, 2013, 02:01:14 pm »
And the point of this? Did you actually read what I wrote. And (since you did not ask questions) Did you understand it?

But first of all, I am writing this under the assumptions, you still try to "proove" that comparing the hash value, but skipping the string compare) is a solution ?

If that is not the case, then please make it clear.

I wrote
Quote
For a given set of keywords, there may be a hash, that if compared together with the length, will not allocate any other words (of the same length) to any of the hash values of any keyword.

To prove you have such a hash, you must try ALL combinations of letters of the length of each keyword.
Quote
And if you ever have to add another keyword, well....

So even if you ignore the last sentence (but why ignore it?)

What good is your code, that tests a subset of combinations ?

And if you did NOT ignore the last sentence of my quote:
What good would it be, to show it works with the current set of keywords, if tomorrow there will be a new keyword added (e.g. fpc adds a new modifier, like "deprecated", that gets highlighted, so must be added).
And then that new keyword may break it, and you need a new hash algorithm?

What good is all that?


Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #129 on: November 05, 2013, 02:39:16 pm »
And the point of this? Did you actually read what I wrote. And (since you did not ask questions) Did you understand it?

Looks rather like you do not understand.
It is plain and simple mathematics.

The maximal amount of different strings for a given range of lenths:
(number of different chars)  *  (maximal length) * (medium length).

For any given set of keywords:
The maximal amount of different strings determined by max and min length will be just a few thousands.

One has just to select a hash method that will never create the same hasch
for  just a few thousands possible strings.

Every mathematician will confirm this is not an impossible task.

The principles of genetic programming could be used to compute the cheapest
possible hash method that can fulfill the requirment.

« Last Edit: November 05, 2013, 03:02:49 pm by Morton »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #130 on: November 05, 2013, 02:59:56 pm »
The maximal amount of different strings for a given range of lenths:
(number of different chars)  *  (maximal length) * (medium length).

For any given set of keywords:
The maximal amount of different strings determined by max and min length will be just a few thousands.

So you also compare the length, in addition to the hash value?

Then why did you not answer to:
Quote
But first of all, I am writing this under the assumptions, you still try to "proove" that comparing the hash value, but skipping the string compare) is a solution ?

Well yes, I did not explicitly exclude the length. But I would still think that this was a clear "hash value only, nothing else")

And just to be clear, adding the length to the hash value, by putting it for example in the upper byte (that is left empty by the rest of the hash => that counts still as 2 values in this case)




Yes if you do compare the length too, it is possible.

Yet (As I already wrote) leaves you to the danger, of having to change the hash algorithm, each time a keyword is added.

If you honour speed that much over maintainability

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #131 on: November 05, 2013, 03:26:05 pm »

So you also compare the length, in addition to the hash value?

NO.

Quote
Then why did you not answer to:
Quote
But first of all, I am writing this under the assumptions, you still try to "proove" that comparing the hash value, but skipping the string compare) is a solution ?

I did.

Quote
If you honour speed that much over maintainability

It is maintainable.

Assumed one would be stupid enough to create a language with keys of up to 100 letters,
that could contain every of the 256 chars.
The result would be about 1.3 million possible strings.

The principles of genetic programming could be used to compute the cheapest
possible hash method that can fulfill the requirment .





Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #132 on: November 05, 2013, 03:37:20 pm »

So you also compare the length, in addition to the hash value?

NO.


Then I don't understand.

Quote
The maximal amount of different strings for a given range of lenths:
(number of different chars)  *  (maximal length) * (medium length).

But if you do not check the length....

The keywords may have a maximum length.

But other words are limited only by memory and disk space

Procedure NameWithABout500CharsOfLengthFOOFOFFBARxxxxxxxxxxx....

and longer. Those can happen. How do you know, they will not create a hash value, equal to that of a keyword?

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #133 on: November 05, 2013, 03:50:26 pm »
But other words are limited only by memory and disk space

Does not matter.
Hashes will only be created for strings within the min and max lenths of the keywords.
The matcher will exclude close to 100% of the posible non keyword strings in this range before hashes are generated.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #134 on: November 05, 2013, 04:02:58 pm »
Quote
Hashes will only be created for strings within the min and max lenths of the keywords.


So you also compare the length, in addition to the hash value?
NO.


Those 2 statements directly contradict each other.

You do compare the length (compare is not limited to equality, it can also include range.)

 

TinyPortal © 2005-2018