Lazarus

Programming => General => Topic started by: wytwyt02 on November 15, 2019, 12:57:41 pm

Title: Regexpr do not support \S?
Post by: wytwyt02 on November 15, 2019, 12:57:41 pm
I want to use `\S` to match anything (including newlines) in a regex. I have set RegExprModifierM to true, but it does not seem to work.

Code: Pascal
RegExp.RegExprModifierM := true;

Title: Re: How to enable multiple line mode when use Regexpr?
Post by: dpremus on November 15, 2019, 08:08:31 pm
Could you give us more detail about what you are trying to solve?
Maybe you don't need regex at all.
Title: Re: How to enable multiple line mode when use Regexpr?
Post by: wytwyt02 on November 16, 2019, 10:30:31 am
Quote from: dpremus on November 15, 2019, 08:08:31 pm
Could you give us more detail about what you are trying to solve?
Maybe you don't need regex at all.

For example, I have the following HTML text that should be matched: everything from the div with id "pagination" through to the closing </body> tag.

Code: Text
<div id="pagination" style="position:absolute;top:328.5px;left:800px;height:220px;width:14.1px;padding:10px 10px 10px 8.4px;background:rgb(137,210,148)">
        <a style="display:block;margin-right:3px;color:white;text-decoration: none;" href="fptxxxgf-01.htm"></a>
        <br>
        <a style="display:block;margin-right:3px;color:white; text-align: center;text-decoration: none;" href="fptxxxgf-000.htm">目录</a>
        <br>
        <a style="display:block;margin-right:3px;color:white;text-decoration: none;" href="fptxxxgf-03.htm"></a>
</div>
<script type="text/javascript">
        var pagination = document.getElementById('pagination')
        init()
        document.addEventListener('scroll', function(e){
                init()
        })

        function init(){
                var clientHeight = document.body.clientHeight
                var scrollTop = document.body.scrollTop
                var bottomTop = scrollTop + clientHeight
                pagination.style.top = bottomTop - (clientHeight / 2) + 'px'
        }
</script></body>

So I wrote the following regex to match that:

Code: Text
<div id="pagination"[.|\s|\S|\n]+</body>

This regex works in some regex tools (https://regex101.com/r/BxGbbn/2), but it does not work in Lazarus.
Title: Re: Regexpr do not support \S?
Post by: dpremus on November 16, 2019, 06:44:06 pm
I don't use RegExpr myself, but I found a link to the documentation inside the RegExpr unit.

https://regex.masterandrey.com/en/latest/regular_expressions.html#introduction.

There you can find what is supported.

It seems that the dot inside the expression "[.|\s|\S|\n]" causes the problem.

But this code returns the result you want:

RegExp.LineSeparators := #$d#$a;
RegExp.Expression := '<div id="pagination".+</body>';
Title: Re: Regexpr do not support \S?
Post by: Abelisto on November 16, 2019, 07:43:40 pm
Quote from: wytwyt02 on November 15, 2019, 12:57:41 pm
I want to use `\S` to match anything (including newlines) in a regex

In regular expressions, the uppercase letter means "not the lowercase letter". For example: \d is any digit, \D any non-digit; \w any word character, \W any non-word character; \s any whitespace, \S any non-whitespace, etc.
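
To make the \s versus \S distinction concrete, here is a minimal sketch using TRegExpr from the RegExpr unit. The program name and the sample string 'foo bar' are made up for illustration. Since \S excludes all whitespace, it can never match a newline on its own.

```pascal
program SpaceClasses;
{$mode objfpc}{$H+}

uses
  RegExpr;

var
  Re: TRegExpr;
begin
  Re := TRegExpr.Create;
  try
    // \S+ matches a run of non-whitespace characters, so it stops at the space.
    Re.Expression := '\S+';
    if Re.Exec('foo bar') then
      WriteLn('[', Re.Match[0], ']');   // prints [foo]

    // \s+ matches a run of whitespace characters.
    Re.Expression := '\s+';
    if Re.Exec('foo bar') then
      WriteLn('[', Re.Match[0], ']');   // prints [ ]
  finally
    Re.Free;
  end;
end.
```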
Title: Re: Regexpr do not support \S?
Post by: wytwyt02 on November 16, 2019, 09:57:21 pm
Quote from: dpremus on November 16, 2019, 06:44:06 pm
I don't use RegExpr myself, but I found a link to the documentation inside the RegExpr unit.

https://regex.masterandrey.com/en/latest/regular_expressions.html#introduction.

There you can find what is supported.

It seems that the dot inside the expression "[.|\s|\S|\n]" causes the problem.

But this code returns the result you want:

RegExp.LineSeparators := #$d#$a;
RegExp.Expression := '<div id="pagination".+</body>';

What does #$d#$a mean? Shouldn't LineSeparators be \n?
Title: Re: Regexpr do not support \S?
Post by: dpremus on November 16, 2019, 11:28:48 pm
Quote from: wytwyt02 on November 16, 2019, 09:57:21 pm
What does #$d#$a mean? Shouldn't LineSeparators be \n?

$D = 0x0D (hex), carriage return (CR)
$A = 0x0A (hex), line feed (LF)
#$D#$A = CRLF
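
As a quick check of the notation, this small sketch (program and variable names are only illustrative) shows that #$d#$a is the same two-character string as #13#10:

```pascal
program CrLfDemo;
{$mode objfpc}{$H+}

var
  Sep: String;
begin
  Sep := #$d#$a;                            // hex notation, as in the post
  WriteLn(Sep = #13#10);                    // prints TRUE
  WriteLn(Ord(Sep[1]), ' ', Ord(Sep[2]));   // prints 13 10
end.
```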

Line separators:

If you only want to extract everything from '<div id="pagination"' to '</body>', you can do it without regexp.
Here is the function:



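// Returns the part of SourceStr from the first PrefixPattern up to and
// including the next SuffixPattern. Needs StrUtils in the uses clause (PosEx).
function ExtractStr(const SourceStr, PrefixPattern, SuffixPattern: String): String;
var
  p1, p2: Integer;
begin
  Result := '';

  p1 := Pos(PrefixPattern, SourceStr);
  if p1 = 0 then Exit;

  // Search for the suffix only after the prefix, so an earlier occurrence is ignored.
  p2 := PosEx(SuffixPattern, SourceStr, p1 + Length(PrefixPattern));

  if p2 = 0 then
    // No suffix found: return everything from the prefix to the end of the string.
    Result := Copy(SourceStr, p1, Length(SourceStr) - p1 + 1)
  else
    Result := Copy(SourceStr, p1, p2 - p1 + Length(SuffixPattern));
end;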


If you want to remove line separators from a string:

Windows: '\r\n'
Mac (OS 9-): '\r'
Mac (OS 10+): '\n'
Unix/Linux: '\n'

you can write this function:

// Removes all CR (#13) and LF (#10) characters. Needs SysUtils (StringReplace).
function ClearLineSeparators(const SourceStr: String): String;
begin
   Result := StringReplace(SourceStr, #13, '', [rfReplaceAll]);
   Result := StringReplace(Result, #10, '', [rfReplaceAll]);
end;

(There is room for optimization: these functions can be done in a single pass over the string using a for loop.)
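
A single-pass version could look roughly like the sketch below. The name ClearLineSeparatorsOnePass is made up for this example. It copies every character except CR and LF into a preallocated result and trims it at the end.

```pascal
program ClearSeparatorsDemo;
{$mode objfpc}{$H+}

// Single-pass variant: copy every character except CR (#13) and LF (#10).
function ClearLineSeparatorsOnePass(const SourceStr: String): String;
var
  i, n: Integer;
begin
  SetLength(Result, Length(SourceStr));   // worst case: nothing is removed
  n := 0;
  for i := 1 to Length(SourceStr) do
    if not (SourceStr[i] in [#13, #10]) then
    begin
      Inc(n);
      Result[n] := SourceStr[i];
    end;
  SetLength(Result, n);                   // trim to the characters actually kept
end;

begin
  WriteLn(ClearLineSeparatorsOnePass('a'#13#10'b'#10'c'));   // prints abc
end.
```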


Title: Re: Regexpr do not support \S?
Post by: zamronypj on November 17, 2019, 12:41:08 am
Try modifier m. With this modifier the string is treated as multiline.

https://regex.masterandrey.com/en/latest/regular_expressions.html#m-multi-line-strings

Then you can match with pattern <div id="pagination".*</body>
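
One caveat, going by the documentation linked above: modifier m changes how ^ and $ behave, while modifier s (single-line strings) is the one that lets the dot match line separators. A minimal sketch for comparing the two settings follows. The sample input is invented, and the result with s switched off depends on the LineSeparators setting of your RegExpr version.

```pascal
program DotAndNewlines;
{$mode objfpc}{$H+}

uses
  RegExpr;

var
  Re: TRegExpr;
  Html: String;
begin
  // Hypothetical sample input with CRLF line breaks between the tags.
  Html := '<div id="pagination">' + #13#10 +
          '<a href="x.htm"></a>' + #13#10 +
          '</div></body>';

  Re := TRegExpr.Create;
  try
    Re.Expression := '<div id="pagination".*</body>';

    // With s off, the dot is documented not to match line separators.
    Re.ModifierS := False;
    WriteLn('s off: ', Re.Exec(Html));

    // With s on, the dot matches any character, including line separators.
    Re.ModifierS := True;
    WriteLn('s on:  ', Re.Exec(Html));
  finally
    Re.Free;
  end;
end.
```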