* SynAnySyn highlighter configured as PHP: | 67 sec, 67 sec, 67 sec. |
* PHP highlighter (with hash functions): | 11 sec, 11 sec, 11 sec. |
* PHP highlighter with alternative method: | 11 sec, 11 sec, 12 sec. |
* SynAnySyn highlighter configured as PHP: | 67 sec, 67 sec, 67 sec. |
* PHP highlighter (with hash functions): | 11 sec, 11 sec, 11 sec. |
* PHP highlighter with alternative method: | 10 sec, 10 sec, 10 sec. |
* SynAnySyn highlighter configured as PHP: | 6.12 sec, 6.22 sec, 6.12 sec. |
* PHP highlighter (with hash functions): | 1.04 sec, 1.02 sec, 1.03 sec. |
* New highlighter with external PHP file: | 1.03 sec, 1.02 sec, 1.02 sec. |
Why do we complicate things by using hash functions?
The speed of an algorithm depends on CPU, compiler and available memory.
BTW it is even possible to create highlighters in C# or any other .NET language
without much speed penalty, provided one knows how.
I don't say that the hash method is the worst, but in this case it probably isn't the best.
You'll always need to do one more step (calculate the hash function); that's inherent to the algorithm.
There is another difference:
* Hash functions:
Make comparisons with n characters.
* First-char-prefix:
Make comparisons with n-1 characters.
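To make the comparison-count argument concrete, here is a small Python sketch of the two dispatch styles. The keyword list and the hash are invented for illustration; this is not the SynEdit code.

```python
# Toy keyword set and toy hash, purely for illustration.
KEYWORDS = ["begin", "end", "case", "label", "procedure"]

def toy_hash(word):
    # the hash must touch all n characters of the word
    return sum(map(ord, word)) % 64

# hash route: bucket by hash, then still do a full string compare
HASH_TABLE = {}
for k in KEYWORDS:
    HASH_TABLE.setdefault(toy_hash(k), []).append(k)

def is_keyword_hash(word):
    return word in HASH_TABLE.get(toy_hash(word), [])

# first-char-prefix route: dispatch on word[0] first,
# so the remaining compare only covers the other n-1 characters
BY_FIRST = {}
for k in KEYWORDS:
    BY_FIRST.setdefault(k[0], []).append(k)

def is_keyword_prefix(word):
    return any(word[1:] == k[1:] for k in BY_FIRST.get(word[:1], []))
```

Both return the same answers; the difference is only in how much work is done before the final compare.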
P.S.: Which highlighter's hash returns 192 values and has 89 functions assigned? The Pascal
* implementation.
The current hash base does a lot of subroutine calls; that slows things down.
Maybe the new prefix based implementation has less function calls / more code in a single procedure?
- inline some (e.g. KeyHash, IdentKind)
- skip creation of exception frames ({$IMPLICITEXCEPTIONS off})
(for a fair comparison, add this to BOTH)
So probably our FPC is not as fast as we believe.
BTW I have noticed some needless string copying,
which should not occur in a high-performance tool.
Can you point them out please?
IMHO optimization should follow the project/package settings.
IMHO well-tested units, especially high-performance ones, should use the highest safe optimization
and not depend on project/package settings.
Edson, you could try another version with a case table instead of the call table.
Maybe case tables allow inlining.
Ordering your "if" cascades by statistical frequency may help too.
Some of your "if" cascades could likely be improved by a case table.
Edson, your Perl HL produces a different amount of tokens than the one included in the newest Lazarus.
I just wanted to show that even done this way, the prefix algorithm can be faster.
IMHO this is more an issue of the algorithm than of the compiler.
An algorithm using character selection for every character of the string keys would be the fastest possible, as it reduces the needed comparisons to zero.
That's the objective of the algorithm I proposed. The first-char prefix is just the first level. I have indicated before:
Done that way, there is no comparison. The hash will always lose.
And it matches the word "cabel".
Because it starts like "case" and ends like "label"
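To make the failure mode above concrete, here is a toy Python reconstruction of a check that only looks at the first characters and the last characters of a word. The real modified highlighter is not shown in the thread, so the shape of this check is an assumption.

```python
# Toy reconstruction (NOT the real highlighter code): accept a word if
# some keyword shares its first two characters and some keyword,
# possibly a different one, shares its last three characters.
KEYWORDS = ["case", "label", "begin", "end"]

def sloppy_is_keyword(word):
    starts_ok = any(word[:2] == k[:2] for k in KEYWORDS)
    ends_ok = any(word[-3:] == k[-3:] for k in KEYWORDS)
    return starts_ok and ends_ok

# "cabel" starts like "case" and ends like "label", so this check
# wrongly accepts it even though it is not a keyword.
```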
I don't have the modified highlighter. But add this to your app:
procedure TForm1.Button1Click(Sender: TObject);
begin
  if IsKeyWord('label') then Caption := Caption + '1';
  if IsKeyWord('xabel') then Caption := Caption + '2';
  if IsKeyWord('cabel') then Caption := Caption + '3';
end;
and pressing that button will give "13".
For Pascal you need at least a sequence of 6:
operator + property = operty
Ok, I had not looked at your changes.
You added the length into the compare... (I thought you might have added another 0..25 to the array.)
Try
  if IsKeyWord('object') then Caption := Caption + '1';
  if IsKeyWord('except') then Caption := Caption + '2';
  if IsKeyWord('objept') then Caption := Caption + '3';
It can be solved if all keywords are known (like Pascal's).
But if I can add my own lists of keywords, I can probably always add 2 keywords that allow a 3rd non-keyword.
Instead a proper prefix tree should be used.
I prefer a slower algorithm that correctly matches what it says on the tin.
Such a trie exists in SynEdit (MarkupHighlightAll).
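As a sketch of what "a proper prefix tree" means, here is a minimal trie in Python. The keyword list is abbreviated, and this is only an illustration of the idea; the SynEdit trie in MarkupHighlightAll is a separate Pascal implementation.

```python
# Minimal prefix tree (trie): every character of the candidate word
# selects the next node, so membership needs no string comparison and
# there are no false positives like "objept".
KEYWORDS = ["object", "except", "label", "end"]

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True          # end-of-word marker
    return root

TRIE = build_trie(KEYWORDS)

def is_keyword(word):
    node = TRIE
    for ch in word:
        if ch not in node:
            return False          # reject as soon as no branch matches
        node = node[ch]
    return "$" in node            # full keyword, not just a prefix
```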
I remember those tricks needed to do something with extremely little memory.
(If you feel the need to practice, FPC/trunk now has a 16-bit tiny memory model)
@sam707
Thanks.
I was taught to program a 4-bit computer in machine language,
not assembler, as it did not have enough memory for an assembler.
It had no display, only lamps.
In 1979 I got my first one with a single-line display and a keyboard with letters.
My first PC, in 1983, was a Toshiba T300, with CP/M and DOS on floppies,
without a hard disk.
Prices were crazy at that time.
The list prices of the T300 and a little software summed up to over 20000 DM.
Quote: I remember those tricks needed to do something with extremely little memory.
I agree with you (and the others in that post) that SynEdit was written years ago, following a tradition of hash codes that seem useless for a highlighter.
I think you will need a fast way to transform an identifier into a Set, so as not to affect performance.
Sets for 'A'..'Z' use 32 bytes, which is still far too much, as only 4 would be required.
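The 4-byte remark follows because 26 letters fit in 26 bits, so one 32-bit integer can stand in for a 32-byte Pascal set of char. A Python sketch of the bitmask idea (the chosen letters are invented):

```python
# Pack a set of letters 'a'..'z' into one 32-bit integer:
# bit 0 = 'a', bit 25 = 'z'. Membership becomes a single AND.
def letter_mask(s):
    m = 0
    for ch in s:
        m |= 1 << (ord(ch) - ord('a'))
    return m

# e.g. the first letters of some hypothetical keywords
FIRST_CHARS = letter_mask("bcelop")

def may_be_keyword(word):
    # one AND decides whether the first letter can start any keyword
    return bool(FIRST_CHARS & (1 << (ord(word[0]) - ord('a'))))
```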
I like that the algorithm focuses on rejects.
But the comparison is useless, since you are not comparing with what the highlighter does in a normal scan.
Point taken. I'll look more carefully at the HL implementation to try to understand what it actually has to do, and see if I can offer any measurable performance improvement in that.
...
So for a fair comparison, you need to rewrite the hash in the HL first.
But if you use TSynPasSyn.IsKeyword() of "SynHighlighterPas", it doesn't use the typical hash algorithm. It uses a simple "Find" on a "TStringList".
Would you point me to more details? E.g. the documentation. Is the doc FPC's? So best to discuss on the fpc-mail list?
Two things I noticed in a quick scan of the relevant units: the keyword 'on' is listed correctly as a TP word, but then repeated in the Delphi list. The current docs (presumably the source of this) have this same duplication, but it is clearly a typo.
Since FPC allows '@' as the first character of an identifier, I also wonder if TValidStringChars and the protected GetIdentChars method need to be updated.
var
&type: pchar;
"type" should not highlight.
Only compared against the code by HowardPC.
On my system:
* HowardPC: 1295
* Prefix: 795
Quote: Would you point me to more details? E.g. the documentation. Is the doc FPC's? So best to discuss on the fpc-mail list?
I've already pointed it out to Michael van Canneyt.
Quote: Does it? "@" is an operator. It can appear in front of an identifier. (With or without a space: OnClick := @ Foo; )
My mistake. As you surmised, I meant '&' not '@'. I never use this facility and remembered the wrong prefix character.
However the "&" is not yet handled.
The current hash has an implementation flaw.
A hashed search array is a lot faster than search trees (trie).
Quote: but creates unique hashes.
You mean you found a perfect hash that works with unlimited input? That simply cannot be.
Quote: A hashed search array is a lot faster than search trees (tries).
Most likely yes, so long as the hash is cheap.
However computing a hash is slow, so unless you present a routine that combines all the operations (hash computation and searching) into a working IsKeyword function, there is nothing concrete to compare beyond opinions.
Do we talk about the piece of code you embedded in your post?
Some non-Pascal word will always give the same hash as one of the Pascal tokens.
Yes, we are still missing any good comparison. So far we have a lot of impressive but meaningless times.
It's not faster than the SynEdit tree-based algorithm (on my computer it's consistently about 2% slower);
however it is simpler to implement, with source code less than half the length.
Quote: Are you still comparing with TSynPasSyn.IsKeyword()?
No, I do read Martin's posts! He indicated a while ago that such a comparison was useless.
It's not the hash routine of the Lazarus Pascal highlighter.
Quote: I can't compile your code. Is it 64-bit?
You have to use a Lazarus compiled from SVN trunk (or a recent snapshot); otherwise the tree matcher does not compile, since it is based on fairly recent SynEdit commits.
I still think that the hash (if done correctly) will be fastest.
That is generic; it depends on a lot of factors. The hash will especially be faster if there are many keywords.
Quote: Attached is a newer, slightly optimized version of the matcher.
Testing against the 9 Lazarus source files used in the project attached to my earlier post shows that this matcher incorrectly identifies the words ID, Now, OI, dit, val, Val and dir as keywords.
I have tested it against a lot of sources.
When the hashes are equal, the comparison has never failed.
Quote: So far my assumption that a safe key check without any real comparison is possible stays true.
I understand neither your assumption nor why it should be true (whatever it means).
If you rely only on a hash, comparing only the hash value and not the word or anything else, then there will always be some word (word = combination of random letters) that fools it.
For a given set of keywords there may be a hash that, if compared together with the length, does not map any other word (of the same length) to any of the hash values of any keyword.
To prove you have such a hash, you must try ALL combinations of letters of the length of each keyword.
And if you ever have to add another keyword, well...
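A toy illustration of that exhaustive proof in Python, with a made-up hash and deliberately tiny 2-letter "keywords" (real keyword lengths make this infeasible, as the 26^n counts later in the thread show):

```python
# Exhaustively enumerate every 2-letter word and collect those whose
# (length, hash) pair collides with a keyword's, i.e. the words a
# comparison-free hash check would wrongly accept. Hash and keyword
# set are invented for this sketch.
from itertools import product
from string import ascii_lowercase

KEYWORDS = ["do", "if", "of"]

def toy_hash(w):
    return (ord(w[0]) * 31 + ord(w[-1])) % 97

KEY_HASHES = {(len(k), toy_hash(k)) for k in KEYWORDS}

def collisions():
    return [w for w in ("".join(t) for t in product(ascii_lowercase, repeat=2))
            if (2, toy_hash(w)) in KEY_HASHES and w not in KEYWORDS]
```

Even with only three 2-letter keywords and a 97-bucket hash, the enumeration turns up non-keywords (e.g. "ak" collides with "do" here), which is exactly why the string compare cannot be skipped without such a proof.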
And the point of this? Did you actually read what I wrote? And (since you did not ask questions) did you understand it?
The maximal amount of different strings for a given range of lengths:
(number of different chars) * (maximal length) * (mean length).
For any given set of keywords:
The maximal amount of different strings, determined by max and min length, will be just a few thousand.
But first of all, I am writing this under the assumption that you still try to "prove" that comparing the hash value (but skipping the string compare) is a solution?
So you also compare the length, in addition to the hash value?
Then why did you not answer to:
Quote: But first of all, I am writing this under the assumption that you still try to "prove" that comparing the hash value (but skipping the string compare) is a solution?
If you honour speed that much over maintainability...
Quote: So you also compare the length, in addition to the hash value?
NO.
Quote: The maximal amount of different strings for a given range of lengths:
(number of different chars) * (maximal length) * (mean length).
But other words are limited only by memory and disk space.
Hashes will only be created for strings within the min and max lengths of the keywords.
Quote: So you also compare the length, in addition to the hash value? NO.
Quote: Hashes will only be created for strings within the min and max lengths of the keywords.
Those two statements directly contradict each other.
Lets agree to disagree.
All else seems to lead nowhere.
By the way, I'm good at mathematics (but not at English), and your formula is wrong:
(number of different chars)^n + (number of different chars)^(n+1) + ... + (number of different chars) ^m
Your test using "random" is not a guarantee, but it could be modified to iterate all possible words.
It would be a bit less than (number of different chars) * (maximal length) * (mean length).
For Pascal, fewer than 2548 different strings, including the keywords.
Length 2: (number of different chars)^2, e.g. 26^2 = 676.
Length 3: (number of different chars)^3, e.g. 26^3 = 17576.
Length 4: (number of different chars)^4, e.g. 26^4 = 456976.
Length 5: (number of different chars)^5, e.g. 26^5 = 11881376.
Length 6: (number of different chars)^6, e.g. 26^6 = 308915776.
Summary: A LOT!
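These counts are easy to reproduce in a couple of lines of Python:

```python
# Number of candidate strings over 'a'..'z' for a given length, and the
# total for lengths 2..14 ("implementation", Pascal's longest keyword,
# is 14 characters), showing why exhaustive testing is hopeless.
def n_strings(length, alphabet=26):
    return alphabet ** length

# per-length counts: 26^2 = 676, 26^3 = 17576, ..., 26^6 = 308915776
total_2_to_14 = sum(n_strings(n) for n in range(2, 15))
```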
Please see Edson's formula.
"implementation" is 14 chars long.
26^14 = 64509974703297150976 (if I copied it correctly from my calculator)
+ 26^13 + 26^12 + ...
It failed in your code (several times).
@Morton
This is what I did:
1. Reduce your hash function output to smallint.
Reason: my PC is not fast enough to search all combinations with a 32-bit (cardinal) output.
But I also did:
2. Use a maximum keyword length of 5 chars (Pascal keywords need 14 chars).
3. Find the hash for the keyword 'end' (12251).
4. Try to find hashes for all 2-5 char ('a'-'z') combinations equal to 12251.
Result: found 185 combinations:
uqdc=12251
wwje=12251
...
azlrz=12251
dfzwz=12251
With your original hash function (cardinal output) this did not make any difference, because all keyword combinations (14 chars) exceed the capacity of a cardinal.
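The experiment above can be replicated in outline. The hash below is made up (the original hash function is not quoted in the thread), so the collision words and counts differ; the point is only that a 16-bit hash over a large candidate space must collide.

```python
# Truncate a made-up rolling hash to 16 bits and search all short
# letter combinations for collisions with the keyword "end".
from itertools import product
from string import ascii_lowercase

def toy_hash16(w):
    h = 0
    for ch in w:
        h = (h * 31 + ord(ch)) & 0xFFFF   # keep only 16 bits
    return h

TARGET = toy_hash16("end")

def find_collisions(max_len=4):
    hits = []
    for n in range(2, max_len + 1):
        for t in product(ascii_lowercase, repeat=n):
            w = "".join(t)
            if w != "end" and toy_hash16(w) == TARGET:
                hits.append(w)
    return hits
```

With this particular hash, words such as "ceha" already collide with "end" at length 4.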
None of these combinations will pass the matrix
and thus there will never be a hash generated for them.
Never learned to read?
Your code says otherwise. I read your code, and you should have too.
I am done. Bye.
Maybe it's time a moderator took a look at this topic as I think some people really show they have no manners at all.
Is it so hard to discuss something without insulting the other?
Morton, please learn some manners and try to behave in this forum.
The way you behave now, I think more and more people will just ignore you, as they have had enough of your insults.
>:( ...
No of course... only the screen range will be rescanned and maybe a bit of code before...
::)
Usually only a single line will be scanned,
but some chars will cause a scan from the modified line to the end of the whole code.
The thread was started by one who felt urged to show how smart he is.
That's not right. For example, if I type {*, the whole doc doesn't need to be reanalyzed if the text displayed on my screen doesn't include the *}...
Because ideally the highlighter will have set flags such as "scanStringEnd", "scanCommentEnd", etc...
It was always my stance that people could do with my code everything they are capable of,
which usually wasn't that much, with few exceptions.
I think your tone is very arrogant.