Summary: I am trying to get TRegExpr to work. .Exec and an .ExecNext loop find the correct number of hits, but after .ExecNext I get neither the matched strings nor their start and length.
Details: Reduced to a minimal example, I have a RichMemo displaying a large amount of text, in which all dates (in German notation "[D]D.[M]M.YYYY") should be highlighted in color. Since the text contains German umlauts, it is Unicode, and under Linux it is encoded in UTF-8. Because TRegExpr seems to return the byte position and not the character position ("{off $DEFINE UniCode}" or "{$DEFINE UniCode}" made no difference), I couldn't think of anything better than to take the matched string the regex yields and then use UTF8Pos to find its character position.
So I thought the following would do the job:
procedure TForm1.Button1Click(Sender: TObject);
var
  e: TRegExpr;
  i: Integer;
  MatchPosU, LastMatchPosU, MatchLenU: Integer; // Start and Length of Match in characters (not bytes)
const
  RE_DATUM = '[0123]?[0-9]\.[01]?[0-9]\.[0-9]{4}'; // just a simple matching demo, no real date validation needed!
  INI_HIGHLIGHT_COLOR = '#FF00FF';
begin
  e := TRegExpr.Create(RE_DATUM);
  i := 0;
  if e.Exec(RichMemo1.Text) then
    repeat
      MatchPosU := UTF8Pos(e.Match[i], RichMemo1.Text, LastMatchPosU+1); // search only in residual part of text
      MatchLenU := UTF8Length(e.Match[i]);
      WriteLn(i, ': "', e.Match[i], '" at "', MatchPosU, ' for ', MatchLenU, ' characters');
      RichMemo1.SetRangeColor(e.MatchPos[i]-1,
                              e.MatchLen[i],
                              StringToColor(INI_HIGHLIGHT_COLOR));
      LastMatchPosU := MatchPosU;
      i := i + 1;
    until not e.ExecNext;
  e.Free;
end;
That works, but only for the first match. This is what STDOUT shows:
0: "9.5.2012" at "0 for 8 characters
1: "" at "0 for 0 characters
2: "" at "0 for 0 characters
3: "" at "0 for 0 characters
4: "" at "0 for 0 characters
5: "" at "0 for 0 characters
6: "" at "0 for 0 characters
7: "" at "0 for 0 characters
8: "" at "0 for 0 characters
9: "" at "0 for 0 characters
10: "" at "0 for 0 characters
11: "" at "0 for 0 characters
The remaining matches are apparently found by the regex object (the loop runs the expected number of times), but from the second iteration on, e.Match[i] always returns an empty string.
What am I doing wrong, or what could I have overlooked?
Contextual information: Lazarus 2.0.7 r62276M, FPC 3.0.4, x86_64-linux-gtk2 on Linux. Programming experience: quite a rookie ;-). Minimal sample project attached.
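
A possible explanation, sketched as a small console program rather than a definitive fix: in TRegExpr the index passed to Match[], MatchPos[] and MatchLen[] selects a subexpression (capture group) of the current match, with 0 meaning the whole match. It is not the number of the hit. Since RE_DATUM contains no groups, every index above 0 would be empty, which would fit the output above. The sketch below always reads index 0 after each ExecNext and keeps a separate counter n. It also converts the byte offsets to character offsets by counting UTF-8 lead bytes. Utf8CharCount and the SAMPLE text are hypothetical names introduced only for illustration, and the RichMemo call is left out so the program runs without the LCL.

```pascal
program RegexMatchDemo;

{$mode objfpc}{$H+}

uses
  RegExpr;

// Hypothetical helper (not part of TRegExpr or LazUTF8): counts the UTF-8
// characters (code points) in the first ByteCount bytes of S by skipping
// continuation bytes, which all have the bit pattern 10xxxxxx.
function Utf8CharCount(const S: string; ByteCount: Integer): Integer;
var
  k: Integer;
begin
  Result := 0;
  if ByteCount > Length(S) then
    ByteCount := Length(S);
  for k := 1 to ByteCount do
    if (Ord(S[k]) and $C0) <> $80 then
      Inc(Result);
end;

const
  RE_DATUM = '[0123]?[0-9]\.[01]?[0-9]\.[0-9]{4}';
  // Assumed sample text; the umlauts are written as explicit UTF-8 bytes
  // so the result does not depend on the source file encoding.
  SAMPLE = 'M'#$C3#$BC'ller: 9.5.2012, J'#$C3#$A4'ger: 24.12.2013';

var
  e: TRegExpr;
  n, CharStart, CharLen: Integer;
begin
  e := TRegExpr.Create(RE_DATUM);
  try
    n := 0; // running number of the hit, used for display only
    if e.Exec(SAMPLE) then
      repeat
        // Index 0 is the whole match; MatchPos[0] is a 1-based byte offset.
        // Counting the characters before it gives a 0-based character start.
        CharStart := Utf8CharCount(SAMPLE, e.MatchPos[0] - 1);
        CharLen := Utf8CharCount(e.Match[0], e.MatchLen[0]);
        WriteLn(n, ': "', e.Match[0], '" at character ', CharStart,
                ' for ', CharLen, ' characters');
        Inc(n);
      until not e.ExecNext;
  finally
    e.Free; // released even if Exec raises an exception
  end;
end.
```

If this assumption holds, the values CharStart and CharLen would be the ones to pass to RichMemo1.SetRangeColor in the original procedure, instead of e.MatchPos[i]-1 and e.MatchLen[i]. Note also that LastMatchPosU is never initialized before its first use there, so the UTF8Pos search start is undefined on the first pass.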