Author Topic: TRegExpr UNICODE  (Read 1725 times)

BubikolRamios

  • Full Member
  • ***
  • Posts: 194
TRegExpr UNICODE
« on: January 08, 2018, 02:34:44 am »
Code: Pascal
  re := TRegExpr.Create();
  re.ModifierI := true; // case insensitive
  re.InputString := '"Dasypyrum villosum","Žitec dlakavi"';
  re.Expression := '","[a-zčžš]+ [a-zčžš]+"$';
  re.Exec(1); // returns false

As far as I can see, it does not understand Č, Ž and Š, at least not at the start of the string.
Lowercase č, ž and š are OK inside the string.

How can I solve that?

http://regexpstudio.com/en/tregexpr_interface#unicode
Quote
How to use Unicode

TRegExpr now supports UniCode, but it works very slow :(

Who want to optimize it ? ;)

Use it only if you really need Unicode support !

Remove . in {.$DEFINE UniCode} in regexpr.pas. After that all strings will be treated as WideString.

Did that - does not help.

« Last Edit: January 08, 2018, 02:48:39 am by BubikolRamios »
lazarus-2.0.2-fpc-3.0.4-win32

Jurassic Pork

  • Hero Member
  • *****
  • Posts: 825
Re: TRegExpr UNICODE
« Reply #1 on: January 08, 2018, 06:26:56 am »
Hello,
I think non-ASCII characters don't work with the case-insensitive modifier. You need to list both the lowercase and uppercase forms in the character class.
Example:
Code: Pascal
  procedure TForm1.Button5Click(Sender: TObject);
  var re : TRegExpr;
  begin
    re := TRegExpr.Create();
    re.ModifierI := true; // case insensitive (only effective for ASCII letters here)
    re.InputString := '"Dasypyrum villosum","Žitec dlakavi"';
    // Both cases of the non-ASCII letters are listed explicitly
    re.Expression := '","[a-zčžšČŠŽ]+ [a-zčžšČŠŽ]+"$';
    if re.Exec(1) then showMessage('expression found');
  end;

Friendly, J.P
Jurassic computer : Sinclair ZX81 - Zilog Z80A à 3,25 MHz - RAM 1 Ko - ROM 8 Ko

BubikolRamios

  • Full Member
  • ***
  • Posts: 194
Re: TRegExpr UNICODE
« Reply #2 on: January 08, 2018, 03:36:55 pm »
Thanks. That does it.

But one cannot expect an end user (who is entering the regex) to keep that in mind.

https://github.com/BeRo1985/flre
Apparently this would work, but I have not tested it yet. One catch: including the required units in the project makes all existing regex code unusable, apparently because its object naming overlaps with the existing regex module, or something like that.
« Last Edit: January 08, 2018, 03:40:11 pm by BubikolRamios »

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: TRegExpr UNICODE
« Reply #3 on: January 08, 2018, 09:15:01 pm »
It does not work because TRegExpr does not handle upper/lower case for letters with character codes above 127:
Code: Pascal
  class function TRegExpr.InvertCaseFunction (const Ch : REChar) : REChar;
   begin
    {$IFDEF UniCode}
    if (Ch >= #128) then
     Result := Ch;

Simply change that to:
Code: Pascal
  class function TRegExpr.InvertCaseFunction (const Ch : REChar) : REChar;
   begin
    {$IFDEF UniCode}
    if (Ch >= #128) then
    begin
     // Try upper case first; if the character is unchanged, it was already upper case
     Result := UpCase(Ch);
     if Result = Ch
       then Result := LowerCase(Ch);
    end
    else
    {$ENDIF}
     begin
      Result := {$IFDEF FPC}AnsiUpperCase (Ch) [1]{$ELSE} {$IFDEF SYN_WIN32}REChar (CharUpper (PChar (Ch))){$ELSE}REChar (toupper (integer (Ch))){$ENDIF} {$ENDIF};
      if Result = Ch
       then Result := {$IFDEF FPC}AnsiLowerCase (Ch) [1]{$ELSE} {$IFDEF SYN_WIN32}REChar (CharLower (PChar (Ch))){$ELSE}REChar(tolower (integer (Ch))){$ENDIF} {$ENDIF};
     end;
   end; { of function TRegExpr.InvertCaseFunction }

and it should work, I believe.

Here is my test:
Code: Pascal
  program Project1;

  {$mode objfpc}{$H+}

  uses
    LazUTF8, RegExprU2;

  var
    s: string;
    re: TRegExpr;
  begin
    re := TRegExpr.Create();
    re.ModifierI := true; // case insensitive
    s := '"Dasypyrum villosum","Žitec dlakavi"';
    re.InputString := s;
    s := '","[a-zčžš]+ [a-zčžš]+"$';
    re.Expression := s;

    WriteLn(re.Exec(1));
    ReadLn;
    re.Free;
  end.

It returns True.

 
