Recent

Author Topic: Checking Speed of SynEdit Highlighters  (Read 85974 times)

Edson

  • Hero Member
  • *****
  • Posts: 1044
Checking Speed of SynEdit Highlighters
« on: September 11, 2013, 08:45:45 pm »
I have been analizing the implementation of several languages highlighters for SynEdit and I wondered if it is necessary to use a hash-function method for to identify tokens. I know that method is very fast, but complicated and no readable, moreover it is difficult for configurate from an external file.

I have used a simpler method for implementing a highlighter (http://blog.pucp.edu.pe/media/4946/20130905-coloreado_de_sintaxis_con_synedit.pdf  - 1.3.6) without using hash functions and I discovered it was, the same of fast that the hash function. It is based on using the first character of tokens for searching.

I made some tests for getting the real speed of the highlighters. I used the PHP highlighter like a sample(because it is simple). This is my result:

Time for proccesing a sample PHP file for 100000 times:

* SynAnySyn highlighter configured as PHP: 67 seg, 67 seg, 67 seg.
* PHP highlighter (with hash functions): 11seg, 11 seg, 11 seg.
* PHP highlighters with alternative method: 11 seg, 11 seg, 12 seg.

Well, we all knew the SynAnySyn, was slow. But, I surprised that the common PHP highlighter with hash functions, is the same fast as a no-hash functions highlighter.
 
Probably I should make more tests, but by now, I ask:

Why we complicate using hash functions?

And my conclusion is:

"It is possible to implement a fast highlighter without using the cryptic hash functions."

I think this could help for improving the SynAnySyn highlighter and using an external file for syntax of SynEdit.

Do you know of some alternative method for implementing fast highlighters?
« Last Edit: September 11, 2013, 08:51:12 pm by Edson »
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5633
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #1 on: September 11, 2013, 10:24:31 pm »
First of all. The hashes are NOT a must, if you do your own highlighters.

As for the build in highlighters, there may be equally fast or even faster ways. But changing the current implementation will have to be very well considered, and tested in various ways. Including full test-suite (unit tests), so (hopefully) no functionality will be broken.

"WHY the hash?"
Don't know, that was done by someone maybe a decade ago.


---------------

The speed of finding tokens depends on many factors. Hashes are fast as far as algorithm goes (big O notation), true. But on very small data they may have more overhead, and even be slower.

Next to considering the algorithm will be the implementation. Code and data size (as well as data distribution in memory) can make huge differences, Can the CPU fit all into the cache (into as little cache lines as possible [ http://en.wikipedia.org/wiki/CPU_cache#Cache_entries ]) ? Though with todays cache technologies, again a few 100 bytes will not matter.


Looking at your comparison:

Well the HL use a hash, but it is not a "perfect hash". So for every hit, the keyword is again compared.

I would *assume* that in a text that as a high amount of keywords [1] (99% or more), your HL may be faster. While if there are no keywords the hash will be better.

[1] anything that the HL will match.


Another option would be a trie (google: aho corasick), but that can use lots of memory, and then the cache issue may make it slower. Though it would be good for the SynAnySyn.

There are also thinks like pplex (IIRC there is one that does pascal). They can generate lexers. So there is no need to worry about spaghetti code with long ifdef, case or other stuff.

---------------

BTW, a good tool to measure speed is callgrind (part of valgrind) and kcachegrind to view results.

But it is linux only.


Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #2 on: September 12, 2013, 04:08:45 am »
Hi Martin,

I don't pretend to change the current highlighters. You are right. It will be a risk by now. It's necessary to make more tests. And if it do, it will be a lot of work.

Probably, new highlighters must consider not to use the hash method.

I point to have an alternative method (and simpler) for implementing highlighters. And specially, have a Fast SynAnySyn Highlighter, that can use an external file for the sintax. For all I Know, it doesn't exist.
 
Of course I also, wanted to show statistics about highlighters speed.
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Leledumbo

  • Hero Member
  • *****
  • Posts: 8108
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Checking Speed of SynEdit Highlighters
« Reply #3 on: September 12, 2013, 07:14:12 am »
Prefix tree is the fastest of all, as it compares the string exactly only once (O(n) where n is the string length). The space complexity is not so big either (compared to hash table), but unless you're implementing lexers for languages with A LOT of keywords (C++, COBOL, etc.) it would be somewhat overkill.

Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #4 on: September 13, 2013, 05:06:27 am »
That's right. I have used a Prefix Tree, but implemented for only the first level. That's why it's easy to implement, because full implemented should be complicated. Of course this method could be slower if the sintax has many words starting with the same character.  In that case, it could be improved using a second level of the tree.

I discovered an error in my test, correcting , I have nice a surprise:

* SynAnySyn highlighter configured as PHP: 67 seg, 67 seg, 67 seg.
* PHP highlighter (with hash functions): 11seg, 11 seg, 11 seg.
* PHP highlighters with alternative method: 10 seg, 10 seg, 10 seg.

A "One level Prefix Tree" method is not only simpler. It is even faster than the hash method.

I will try to make a highlighter using an external file and test the speed.
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #5 on: September 22, 2013, 08:39:09 pm »
Finally I have implemented a simple Highlighter configurated with a External file syntax.

I made some new comparative tests (always using the PHP language as "guinea pig"):

* SynAnySyn highlighter configured as PHP: 6.12 seg, 6.22 seg, 6.12 seg.
* PHP highlighter (with hash functions): 1.04 seg, 1.02 seg, 1.03 seg.
* New highlighters with external PHP file: 1.03 seg, 1.02 seg, 1.02 seg.

I still have to implement some features, but until now, it's faster enough.
« Last Edit: September 22, 2013, 10:03:58 pm by Edson »
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #6 on: October 13, 2013, 04:57:39 pm »
Why we complicate using hash functions?

The speed of an algorithm depends on CPU,  compiler and available memory.
In this case also from the graphic routines and system.

I designed mwEdit for Delphi not FPC and to run fast on CPUs below 100MHz and memory in single number MB space.
mwEdit was just intended to be an example on how to create a fast and versatile highlighting editor.
The version only written by myself was significantly faster than Delphis own,
and the only available behaving like it.
Once it was published lots of other editors emerged.

BTW it is even possible to create highlighters in C# or any other NET language
without much speed penalty, provided one knows how.



Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #7 on: October 13, 2013, 06:11:44 pm »
Hi Norton,

I didn't know you were the designer of the SynEdit.

Quote
The speed of an algorithm depends on CPU,  compiler and available memory.

It's true. I'm aware of that.

I don't say that Hash method is the worse, but probably in this case it isn't the best.
It's always good to question the efficiency of the algoritms we are using. I have been testing the speed of a highlighter and have made many tests. But, like I said. it's necessary to make more tests. I haven´t tested many cases.

When I was designing my highlighter I was aware of the size of the code too. No only the speed. And other objetive was to be easy for maintaining.

Quote
BTW it is even possible to create highlighters in C# or any other NET language
without much speed penalty, provided one knows how.

It's curious because I have implemented a Editor with syntax highlighter, autocompletion (that responde to the context), and "Column Mode Edit", without external dll's nor ocx's in VB6:

http://blog.pucp.edu.pe/item/178457/editor-con-sintaxis-coloreada-en-visual-basic-6


« Last Edit: October 13, 2013, 06:31:58 pm by Edson »
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Morton

  • Full Member
  • ***
  • Posts: 111
Re: Checking Speed of SynEdit Highlighters
« Reply #8 on: October 13, 2013, 06:37:34 pm »
I don't say that Hash method is the worse, but probably in this case it isn't the best.

Maybe this simple hash is not the best for FPC and modern CPUs,
but for the version of Delphi and CPUs available at this time it was.
Last time (years ago) I have tested it Delphi outperformed FPC by two times with this algorithm.

BTW even preselection based on just the first char is a hash.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5633
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #9 on: October 13, 2013, 07:11:31 pm »
It is known that in many cases Delphi can optimize code better than FPC. (Though I have not checked very recently).
But comparing algorithms, we should compare with the same compiler.

The hash is a well performing and cheap to implement way.

The speed will also depend on:
* the data:
- the hash can perform well, if the percentage of keywords is low, because then the hash value already decides that there is ne match. (and the computing of the hash, is a very small bit of code, optimal for making use of cpu cache, and probably other optimizations performed by the cpu.
- using a prefix will become very slow, if a pascal source contain many identifiers starting with a keyword, like "WhileFoo", "BeginField", ... because they all must be compared to the first after the keyword.

In other words, the prefix may perform better on best, and maybe average case, but probably it has a far worse worst case.

* implementation.
The current hash base does a lot of subroutine calls. that slows stuff down.
Maybe the new prefix based implementation has less function calls / more code in a single procedure?
- inline some (e.g. KeyHash, IdentKind)
- skip creation of  exception frames ({$IMPLICITEXCEPTIONS off})
  (do have a fair compare, add this to BOTH)

The difference measured between the 2, may be caused by the algorithm, but it may as well be by the implementation.

Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #10 on: October 13, 2013, 08:15:27 pm »
So, without consider the implementation, we could say:

* If your syntax have many "keywords" starting the same char, like "while1", "while2", ... it's better using hash function (assuming your hash function can distribute well the keywords, other way it have no sense)
* If your syntax have his "keywords" well alphabetic distributed, use First-char-prefix.

« Last Edit: October 13, 2013, 08:17:19 pm by Edson »
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5633
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #11 on: October 13, 2013, 08:37:51 pm »
Well both algorithm use an additional string compare, if the first step says that there may be a match.

I have not gone and calculated the exact cost of the 2 algorithm in full  (including this string compare).
My statement is just an assumption.

But since the string compare is the same in both, it simply depends, on how often the string compare is triggered for none keywords.

Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #12 on: October 13, 2013, 09:14:52 pm »
But there is a difference:

* Hash functions:
  Start with alphabetic -> pass to funtion to capture identifiers.
  Capture identifier, and calculate Hash funtion.
  With hash function value, call to function to process.
  Make comparisons.

* First-char-prefix:
  Start with alphabetic -> pass to prefix-function.
  Capture identifier, and make comparisons.

There is an extra load for hash method.
« Last Edit: October 13, 2013, 09:18:40 pm by Edson »
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5633
    • wiki
Re: Checking Speed of SynEdit Highlighters
« Reply #13 on: October 13, 2013, 10:09:25 pm »
That is what I wrote. But that is a difference in the implementation, and NOT in the algorithm.

You can implement the hash with less function calls.

Edson

  • Hero Member
  • *****
  • Posts: 1044
Re: Checking Speed of SynEdit Highlighters
« Reply #14 on: October 13, 2013, 10:15:39 pm »
But always, you'll need to do one more step (calculate the hush function). It's algorithm.

There is another difference:

* Hash functions:
  Make comparisons with n characters.

* First-char-prefix:
  Make comparisons with n-1 characters.
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7