Hum, hello Edson,
You will run into a problem with this method.
In fact, modern "regular" languages were designed with several things in mind:
1) The obvious one: line numbering was a pain in the "XXX" when coding. Open any magazine from the 80s and count all the listings dealing with "RENUM" routines: they are legion! But the numbering was essential to distinguish an immediate command from a line insertion/edit after the "Ready" prompt:
Ready
PRINT "Hello"
Ready
10 PRINT "Hello"
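A minimal Python sketch of that distinction, assuming the stored program is just a dictionary from line number to statement text. The name `handle_input` and its returned messages are hypothetical; this is not the Oric ROM routine:

```python
import re


def handle_input(line: str, program: dict) -> str:
    """What happens after "Ready": store a numbered line, or run the command.

    Hypothetical sketch: `program` maps line numbers to statement text.
    """
    m = re.match(r"\s*([0-9]+)\s*(.*)", line)
    if m is None:
        # No leading number: an immediate command, executed at once.
        return "execute: " + line.strip()
    # Leading number: insert (or replace) that line in the stored program.
    program[int(m.group(1))] = m.group(2)
    return f"stored line {m.group(1)}"
```

So `PRINT "Hello"` runs immediately, while `10 PRINT "Hello"` only adds line 10 to the program.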
2) Old school "Dartmouth-like" BASIC languages were designed for memory compactness (white spaces are just code formatting and are meaningless to the interpreter) and were not modular at all (procedures/functions do not exist), to the point that in certain BASIC dialects "GO TO" and "GO SUB" were just the same as "GOTO" and "GOSUB", and "P R I N T" meant "PRINT" (not on the Oric, whose BASIC interpreter allowed this "token splitting" in edit mode only for line numbers).
"GO TO" example:
https://en.wikipedia.org/wiki/BASIC#Unstructured_BASIC (line 70; the Wikipedia highlighter can't handle it properly!)
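For dialects that behave this way (not the Oric, as noted above), the lexer has to match keywords while skipping spaces between their letters. A minimal Python sketch; `KEYWORDS` is a hypothetical, tiny keyword table and `match_keyword` a made-up name:

```python
# Hypothetical keyword table; a real dialect has many more tokens.
KEYWORDS = ["GOSUB", "GOTO", "PRINT", "REM"]


def match_keyword(src: str, pos: int):
    """Match a keyword at src[pos:], ignoring spaces between its letters.

    Returns (keyword, end_position) or None. Longer keywords are tried
    first so a shorter keyword cannot shadow a longer one sharing its prefix.
    """
    for kw in sorted(KEYWORDS, key=len, reverse=True):
        i = pos
        ok = True
        for ch in kw:
            # Skip any spaces before the next letter of the keyword.
            while i < len(src) and src[i] == " ":
                i += 1
            if i < len(src) and src[i].upper() == ch:
                i += 1
            else:
                ok = False
                break
        if ok:
            return kw, i
    return None
```

With this, `GO TO 100`, `GO SUB 200` and `P R I N T"Hello"` are recognised as GOTO, GOSUB and PRINT.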
When typing:
Ready
2 10 5PRINT"Hello"
The interpreter removed whitespace characters until it met a character that was neither a digit nor whitespace, recombined the digits, and added a space, so that the listing was rewritten like this:
Ready
LIST
2105 PRINT"Hello"
By the way, your solution fails here... mine is compliant.
I have tried to fiddle with this, without any success:
<!--Line number-->
<Regex Text="[ ]*"/>
<Regex Text="[0-9\s]+" />
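The recombination described above can be sketched in Python with one regular expression: optional leading spaces, a digit, then any run of digits and spaces. This assumes plain spaces only (the `\s` in the attempt above would also match tabs and line breaks), and `normalize_line` is a hypothetical name:

```python
import re

# Leading spaces, then a digit, then any mix of digits and spaces.
LINE_NUMBER = re.compile(r"[ ]*([0-9][0-9 ]*)")


def normalize_line(src: str) -> str:
    """Rewrite a typed line the way the listing shows it.

    Spaces inside the line number are dropped and exactly one space is put
    before the statement. Lines without a number are returned unchanged.
    """
    m = LINE_NUMBER.match(src)
    if m is None:
        return src  # immediate command: no line number
    number = m.group(1).replace(" ", "")
    rest = src[m.end():]
    return f"{number} {rest}"
```

For highlighting alone, the length of the match (`m.end()`) is the span to colour as a line number; the rewrite is only needed if the editor mimics the LIST output.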
A highlighter for BASIC should allow the same silly editing "facility", as you will have understood.
In fact, I was thinking about two mechanisms for editing:
- when loading a file: recombining the line numbers as the Oric interpreter does, but this is a problem if you load a Pascal or FORTRAN file while the Oric BASIC highlighter is active.
- from within edit windows: ignoring any spaces until a non-digit is entered (a kind of "dynamic" code completion). Elegant if your highlighter deals only with BASIC; not a good idea if you load a C file.
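The second mechanism can be sketched as a keystroke filter. A minimal Python version, assuming the editor calls a hook (here the hypothetical `accept_key`) with the current line text and the typed key:

```python
def accept_key(line_so_far: str, key: str) -> str:
    """Hypothetical keystroke filter for the edit window.

    While the line typed so far is empty or made only of digits, a typed
    space is swallowed. Once a non-digit has been typed, every key is kept.
    """
    in_line_number = all(c in "0123456789" for c in line_so_far)
    if key == " " and in_line_number:
        return line_so_far  # ignore the space
    return line_so_far + key


def type_line(keys: str) -> str:
    """Feed a sequence of keystrokes through the filter."""
    line = ""
    for key in keys:
        line = accept_key(line, key)
    return line
```

Typing `2 10 5PRINT"Hello"` yields `2105PRINT"Hello"`; inserting the space after the number, as the Oric listing does, is left to the load/LIST normalization.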
3) Modern languages were also designed for context-free grammars, but old school BASIC is context-sensitive, and sometimes it is impossible to build a valid BNF grammar for it.
For instance, a few years ago I tried to write a parser for the Oric with Gold Parser; it was easy up to a point. To load from or save to tape, the BASIC commands were CLOAD and CSAVE:
CLOAD "filename"(,S)
CLOAD ""(,S)
CLOAD "filename",J,(,S)
CLOAD "filename",V,(,S)
CSAVE "filename"(,AUTO)(,S)
CSAVE "filename",A adr,E adr(,S)
J=join (merge) the file with the BASIC listing already in memory
V=verify the file on tape
AUTO=run the program AUTOmatically when it is loaded with CLOAD
S=program stored in "slow mode" on tape. Safer when loading back, but you waited 10 minutes instead of 3 for your program to load on the Oric!
The J, V and S options are entirely contextual and impossible to handle with LALR or LL parsers: they are forced to be recognised as standard float variables!
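What the grammar lacks is state. A minimal Python sketch of a lexer that remembers the command word at the start of the statement, so that `V` after `CLOAD "GAME",` is an option while `V` in `V=1` is a variable. `COMMANDS` is a hypothetical, tiny keyword set, and address arguments such as `A adr` are not handled:

```python
import re

# One token: a quoted string, a word, a number or a punctuation mark.
TOKEN = re.compile(r'\s*(?:("[^"]*")|([A-Z][A-Z0-9]*)|([0-9]+)|([,:=()]))')

# Option words per command, from the CLOAD/CSAVE syntax above.
OPTIONS = {"CLOAD": {"J", "V", "S"}, "CSAVE": {"AUTO", "S"}}
COMMANDS = {"CLOAD", "CSAVE", "PRINT", "LET"}  # hypothetical, tiny set


def classify(stmt: str):
    """Return (text, kind) pairs for one BASIC statement."""
    stmt = stmt.rstrip()
    tokens, pos, command = [], 0, None
    while pos < len(stmt):
        m = TOKEN.match(stmt, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}")
        s, word, num, sym = m.groups()
        pos = m.end()
        if s is not None:
            tokens.append((s, "string"))
        elif word is not None:
            if command is None and word in COMMANDS:
                command = word  # the lexer state: which command we are in
                tokens.append((word, "keyword"))
            elif word in OPTIONS.get(command, ()):
                # Same letters as a variable, but here they are an option.
                tokens.append((word, "option"))
            else:
                tokens.append((word, "variable"))
        elif num is not None:
            tokens.append((num, "number"))
        else:
            tokens.append((sym, "symbol"))
    return tokens
```

The token kind is decided by lexer state rather than by the grammar, similar in spirit to the "lexer hack" used for C typedef names; a pure LALR/LL token definition cannot express it.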
The same problem arises with strings in DATA statements, which are allowed to be unquoted (see above).
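Unquoted DATA items can be handled the same way: once DATA has been seen, the rest of the statement is scanned as data, not as keywords or variables. A minimal Python sketch, assuming items are separated by commas outside quotes and that an unquoted colon ends the statement; trimming surrounding spaces is a simplification, not checked against the Oric ROM:

```python
def split_data(text: str):
    """Split the text after DATA into items.

    Returns (items, rest_of_line). Quoted items are kept verbatim, commas
    inside quotes do not split, and an unquoted colon ends the statement.
    """
    items, current, in_quotes = [], "", False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == "," and not in_quotes:
            items.append(current.strip())
            current = ""
        elif ch == ":" and not in_quotes:
            items.append(current.strip())
            return items, text[i + 1:]  # the next statement starts here
        else:
            current += ch
    items.append(current.strip())
    return items, ""
```

Note that in `DATA PRINT,GOTO` both items are plain strings, so a highlighter must not colour them as keywords.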
Of course, in cross-development and emulators, we build digitized tapes or WAV files, so the ,S option is no longer needed (the signal is more reliable than with real tapes and tape recorders), and we can now live with it. But if we want to stick to the original BASIC language, we face other problems with modern highlighters that can only handle context-free grammars. That's why Notepad++ does not have a single highlighter for old school BASIC (not even one!).
All of that to say that exposing only TokPos does not seem (to me) the best choice.
Ideally, the grammar file should make it possible to redirect to an event handler in code or in a script file.
Something like this, maybe:
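<!-- proposed attribute (does not exist yet): raise custom event 1 when this rule matches -->
<Regex Text="DATA" ifTrueCustomEvent="1"/>

// proposed event property on the highlighter instance; the handler must be a method ("of object")
SynFacilSyn1.OnCustom1 := @CustomTreatment1;  // "@" is required in {$mode objfpc}

procedure TForm1.CustomTreatment1(TokPos: Integer; const TokString: string);
begin
  if TokString = 'DATA' then
  begin
    // custom scanning of the DATA items, starting at TokPos
    ...
  end;
end;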
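To make the proposal concrete, here is a minimal Python sketch of a rule-based highlighter where a rule may carry a custom event number. The registered handler receives the token position and text (like TokPos/TokString above) plus the line, and returns extra tokens and the position where normal scanning resumes. `Highlighter`, `add_rule`, `on_custom` and `data_handler` are hypothetical names, not SynFacilSyn API:

```python
import re


class Highlighter:
    """Toy rule-based highlighter with numbered custom events (hypothetical)."""

    def __init__(self):
        self.rules = []       # (compiled regex, token kind, event number or None)
        self.on_custom = {}   # event number -> handler

    def add_rule(self, pattern, kind, custom_event=None):
        self.rules.append((re.compile(pattern), kind, custom_event))

    def highlight(self, line):
        """Return a list of (position, text, kind) tokens for one line."""
        tokens, pos = [], 0
        while pos < len(line):
            for rx, kind, event in self.rules:
                m = rx.match(line, pos)
                if m and m.end() > pos:
                    tokens.append((pos, m.group(), kind))
                    pos = m.end()
                    handler = self.on_custom.get(event)
                    if handler is not None:
                        # Hand control to the custom code, like OnCustom1 above.
                        extra, pos = handler(m.start(), m.group(), line)
                        tokens.extend(extra)
                    break
            else:
                pos += 1  # no rule matched here: skip one character
        return tokens


def data_handler(tok_pos, tok_string, line):
    """Treat everything after DATA, up to an unquoted colon, as literal data."""
    start = tok_pos + len(tok_string)
    end, in_quotes = start, False
    while end < len(line) and (in_quotes or line[end] != ":"):
        if line[end] == '"':
            in_quotes = not in_quotes
        end += 1
    return [(start, line[start:end], "data")], end


hl = Highlighter()
hl.add_rule(r"DATA", "keyword", custom_event=1)
hl.add_rule(r"PRINT|GOTO", "keyword")
hl.add_rule(r"[A-Z][A-Z0-9]*", "variable")
hl.on_custom[1] = data_handler
tokens = hl.highlight("DATA PRINT,GOTO:PRINT X")
```

Here PRINT and GOTO inside the DATA statement come out as one "data" token, while the PRINT after the colon is still a keyword. The grammar stays declarative; only the context-sensitive part is delegated to code.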