
Author Topic: Help with search and replace whole word without case sensitivity  (Read 2078 times)

avk

  • Hero Member
  • Posts: 825
Re: Help with search and replace whole word without case sensitivity
« Reply #15 on: November 13, 2025, 04:25:49 pm »
...
Though I just looked into the mentioned file that came with FPC 3.2.2, and I can't find the mentioned commented-out define.
I found it in "uregexpr.pp" though, which INCLUDES regexpr.pas.
So it's probably just one of these:
1) inserting a {$DEFINE UniCode} at the beginning of regexpr.pas
2) a "UniCode" constant in Project Options
3) including "uregexpr" in Uses instead of "regexpr" (everyone can guess what the "u" in the unit name stands for...)

Sounds pretty good, let's choose option number 3.
So, the desired function could look something like this:
Code: Pascal
  1. function ReplaceAllWholeWordsCI(const Source, Pattern, Replacement: string): string;
  2. var
  3.   Expr: RegExprString;
  4. begin
  5.   Expr := RegExprString(string('(?:\b') + Pattern + string('\b)'));
  6.   Result := string(ReplaceRegExpr(Expr, RegExprString(Source), RegExprString(Replacement), [rroModifierI]));
  7. end;

But unfortunately, it seems that no miracle has happened, it only works correctly in the ASCII range.

Zvoni

  • Hero Member
  • Posts: 3187
« Reply #16 on: November 13, 2025, 04:56:25 pm »
guessing here:
In your Line 5 and 6: Try casting to WideString instead of "String"

and the leading "\b" is not necessary
widestring('(?:') + Widestring(Pattern) + Widestring('\b)')
should be enough
« Last Edit: November 13, 2025, 04:58:53 pm by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

avk

  • Hero Member
  • Posts: 825
« Reply #17 on: November 13, 2025, 06:26:23 pm »
...
guessing here:
In your Line 5 and 6: Try casting to WideString instead of "String"
...

Whatever you want, as long as you can explain what it is needed for.

...
and the leading "\b" is not necessary
widestring('(?:') + Widestring(Pattern) + Widestring('\b)')
should be enough

Are you sure?
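A quick way to settle this is to compare both patterns directly. Zvoni's '(?:' + Pattern + '\b)' reduces to "this\b" for the word "this". The sketch below assumes FPC's bundled regexpr unit and its ExecRegExpr function; the sample strings are made up for illustration. It shows why the leading \b matters: without it, the pattern still matches the tail of "athis".

```pascal
program WordBoundaryCheck;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  regexpr; // FPC's bundled TRegExpr unit

begin
  // Only the trailing boundary: "this" at the end of "athis" still matches.
  WriteLn('this\b   on "athis is" : ', ExecRegExpr('this\b', 'athis is'));
  // Boundaries on both sides: "athis" is rejected ...
  WriteLn('\bthis\b on "athis is" : ', ExecRegExpr('\bthis\b', 'athis is'));
  // ... while a standalone "this" is still found.
  WriteLn('\bthis\b on "a this is": ', ExecRegExpr('\bthis\b', 'a this is'));
end.
```

This should print TRUE, then FALSE, then TRUE: only the pattern with a boundary on both sides rejects "athis".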

Thausand

  • Sr. Member
  • Posts: 445
« Reply #18 on: November 14, 2025, 02:15:29 am »
I had a question.

No more question: I found a fix (I don't know if the fix is OK, but it works) :D

Code:
============================================================

65 : []
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \A(.*?)(this)
Result  : This Is A String, and that is a number

66 : [srfReplaceAll]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : (this)
Result  : This Is A String, and that is a number

67 : [srfWholeWords]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \A(.*?)\b(this)\b
Result  : This Is A String, and that is a number

68 : [srfWholeWords,srfReplaceAll]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \b(this)\b
Result  : This Is A String, and that is a number

69 : [srfIgnoreCase]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \A(.*?)(this)
Result  : that Is A String, and this is a number

70 : [srfIgnoreCase,srfReplaceAll]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : (this)
Result  : that Is A String, and that is a number

71 : [srfWholeWords,srfIgnoreCase]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \A(.*?)\b(this)\b
Result  : that Is A String, and this is a number

72 : [srfIgnoreCase,srfWholeWords,srfReplaceAll]
Text    : This Is A String, and this is a number
Replace : this
With    : that
Pattern : \b(this)\b
Result  : that Is A String, and that is a number

============================================================

89 : []
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \A(.*?)(partia)
Result  : Sometimes a pArTiAl match is not replaced

90 : [srfReplaceAll]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : (partia)
Result  : Sometimes a pArTiAl match is not replaced

91 : [srfWholeWords]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \A(.*?)\b(partia)\b
Result  : Sometimes a partial match is not replaced

92 : [srfWholeWords,srfReplaceAll]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \b(partia)\b
Result  : Sometimes a partial match is not replaced

93 : [srfIgnoreCase]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \A(.*?)(partia)
Result  : Sometimes a pArTiAl match is not replaced

94 : [srfIgnoreCase,srfReplaceAll]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : (partia)
Result  : Sometimes a pArTiAl match is not replaced

95 : [srfWholeWords,srfIgnoreCase]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \A(.*?)\b(partia)\b
Result  : Sometimes a partial match is not replaced

96 : [srfIgnoreCase,srfWholeWords,srfReplaceAll]
Text    : Sometimes a partial match is not replaced
Replace : partia
With    : pArTiA
Pattern : \b(partia)\b
Result  : Sometimes a partial match is not replaced

============================================================

97 : []
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \A(.*?)(зыян)
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

98 : [srfReplaceAll]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : (зыян)
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

99 : [srfWholeWords]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \A(.*?)\b(зыян)\b
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

100 : [srfWholeWords,srfReplaceAll]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \b(зыян)\b
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

101 : [srfIgnoreCase]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \A(.*?)(зыян)
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

102 : [srfIgnoreCase,srfReplaceAll]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : (зыян)
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

103 : [srfWholeWords,srfIgnoreCase]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \A(.*?)\b(зыян)\b
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

104 : [srfIgnoreCase,srfWholeWords,srfReplaceAll]
Text    : Мон ярсан суликадо, ды зыян эйстэнзэ а ули
Replace : зыян
With    : няыз
Pattern : \b(зыян)\b
Result  : Мон ярсан суликадо, ды няыз эйстэнзэ а ули

Maybe the question is whether this can be made better (more logical)?
« Last Edit: November 14, 2025, 05:24:06 am by Thausand »
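For reference, the patterns in the output above follow a simple rule, which can be sketched as a pure string builder. The flag type TSrfFlag and the functions BuildSearchPattern and BuildReplacement are hypothetical, reconstructed only from the labels and patterns shown; the code that produced the output was not posted. The "$1" replacement assumes the regex engine substitutes group references in the replacement text.

```pascal
program BuildPatterns;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  // Hypothetical flag names mirroring the labels in the test output.
  TSrfFlag = (srfIgnoreCase, srfWholeWords, srfReplaceAll);
  TSrfFlags = set of TSrfFlag;

// Builds the search pattern in the same shape as the output:
//  - with srfWholeWords, the word is wrapped in \b ... \b;
//  - without srfReplaceAll, "\A(.*?)" anchors at the start of the text and
//    lazily skips to the first occurrence, so only that one can match.
// srfIgnoreCase leaves the pattern unchanged; it would be passed to the regex
// engine as a case-insensitive modifier instead.
function BuildSearchPattern(const AWord: string; Flags: TSrfFlags): string;
begin
  Result := '(' + AWord + ')';
  if srfWholeWords in Flags then
    Result := '\b' + Result + '\b';
  if not (srfReplaceAll in Flags) then
    Result := '\A(.*?)' + Result;
end;

// When the "\A(.*?)" prefix is used, group 1 holds the skipped text and
// has to be written back with $1 in front of the new word.
function BuildReplacement(const NewWord: string; Flags: TSrfFlags): string;
begin
  if srfReplaceAll in Flags then
    Result := NewWord
  else
    Result := '$1' + NewWord;
end;

begin
  WriteLn(BuildSearchPattern('this', []));
  WriteLn(BuildSearchPattern('this', [srfReplaceAll]));
  WriteLn(BuildSearchPattern('this', [srfWholeWords]));
  WriteLn(BuildSearchPattern('this', [srfWholeWords, srfReplaceAll]));
  WriteLn(BuildReplacement('that', []));
end.
```

The word itself is inserted unescaped here, exactly as in the patterns shown; if it may contain regex metacharacters, it needs quoting first.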

Zvoni

  • Hero Member
  • Posts: 3187
« Reply #19 on: November 14, 2025, 06:18:58 am »

...
and the leading "\b" is not necessary
widestring('(?:') + Widestring(Pattern) + Widestring('\b)')
should be enough

Are you sure?
Look at the regex fiddle I posted earlier *shrug*

EDIT:
Quote
But unfortunately, it seems that no miracle has happened, it only works correctly in the ASCII range.
OK, I looked more closely at the regexpr.pas provided on Andrey's page and at the file shipped with FPC 3.2.2.
There are differences (I have only looked at the defines), so maybe you actually have to download the file from his web page and use that one?

https://github.com/andgineer/TRegExpr/tree/29ec3367f8309ba2ecde7d68d5f14a514de94511/src

What I also noticed: there is only one file on his GitHub, compared to the three or four files shipped with FPC 3.2.2.
« Last Edit: November 14, 2025, 08:01:30 am by Zvoni »

avk

  • Hero Member
  • Posts: 825
« Reply #20 on: November 14, 2025, 08:29:50 am »
...
Maybe the question is whether this can be made better (more logical)?

Try to avoid using regular expressions to search/replace such simple patterns? :)
At least it should get faster.

zxandris

  • Full Member
  • Posts: 170
« Reply #21 on: November 14, 2025, 10:57:45 am »
Thanks for all the helpful replies, peeps. I'll be trying out the various suggestions here, and I'm chuffed this has generated so much interest in my question. I think I'll be digging into regex a fair bit. It still confuses me, I will not lie, but it seems like it will help. That said, I did find the code below, which does what I wanted too, but I'm fascinated by regex and what it can do, so I will likely change what I use.

Code: Pascal

  type
    TSarSearchOption = (soIgnoreCase, soWholeWord, soUnderscoreIsWord, soReplaceAll);
    TSarSearchOptions = set of TSarSearchOption;

  // Simple ASCII word-character test (ANSI). Treats letters and digits as word chars.
  // If soUnderscoreIsWord is in Options, '_' also counts as a word character.
  function SarIsWordCharAnsi(ch: Char; Options: TSarSearchOptions): Boolean;
  begin
    Result := (ch in ['0'..'9', 'A'..'Z', 'a'..'z']) or ((soUnderscoreIsWord in Options) and (ch = '_'));
  end;

  // Find the next whole-word match (1-based index), or 0 if not found.
  // StartPos is 1-based. Options controls case sensitivity and whole-word behavior.
  function FindWholeWordANSI(const Source, OldWord: string; StartPos: Integer; Options: TSarSearchOptions): Integer;
  var
    Hay, Needle: string;
    LHay, LNeedle, i, relPos: Integer;
    beforeOK, afterOK: Boolean;
  begin
    Result := 0;
    if (OldWord = '') or (Source = '') then Exit;
    if StartPos < 1 then StartPos := 1;

    // Prepare case-folded hay/needle when ignoring case.
    // Assumes AnsiUpperCase does not change the length, because positions
    // found in Hay are used to index Source.
    if soIgnoreCase in Options then
    begin
      Hay := AnsiUpperCase(Source);
      Needle := AnsiUpperCase(OldWord);
    end
    else
    begin
      Hay := Source;
      Needle := OldWord;
    end;

    LHay := Length(Hay);
    LNeedle := Length(Needle);
    if LNeedle = 0 then Exit;
    if StartPos > LHay - LNeedle + 1 then Exit;

    // Search loop: Pos on the remaining part of Hay
    relPos := StartPos;
    while relPos <= LHay - LNeedle + 1 do
    begin
      i := Pos(Needle, Copy(Hay, relPos, LHay - relPos + 1));
      if i = 0 then Exit;           // no more matches
      i := i + relPos - 1;          // absolute position in Hay/Source

      // If whole-word checking is requested, validate the boundaries
      if soWholeWord in Options then
      begin
        if i = 1 then
          beforeOK := True
        else
          beforeOK := not SarIsWordCharAnsi(Source[i - 1], Options);

        if i + LNeedle - 1 >= LHay then
          afterOK := True
        else
          afterOK := not SarIsWordCharAnsi(Source[i + LNeedle], Options);
      end
      else
      begin
        beforeOK := True;
        afterOK := True;
      end;

      if beforeOK and afterOK then
      begin
        Result := i;
        Exit;
      end
      else
        relPos := i + 1; // continue searching after this candidate
    end;
  end;

  // Replace whole-word occurrences. Returns the number of replacements performed.
  // StartPos is 1-based; if soReplaceAll is in Options, all matches after StartPos
  // are replaced, otherwise only the first matching occurrence is replaced.
  function sarReplaceWholeWordANSI(var Source: string; const OldWord, NewWord: string; StartPos: Integer; Options: TSarSearchOptions): Integer;
  var
    posFound, LOld, LNew, nextStart: Integer;
  begin
    Result := 0;
    if (OldWord = '') or (Source = '') then Exit;
    if StartPos < 1 then StartPos := 1;

    // FindWholeWordANSI locates the matches and handles case folding.
    LOld := Length(OldWord);
    LNew := Length(NewWord);

    posFound := FindWholeWordANSI(Source, OldWord, StartPos, Options);
    while posFound > 0 do
    begin
      // Replace exactly at posFound (ANSI byte/char indexing)
      Delete(Source, posFound, LOld);
      Insert(NewWord, Source, posFound);
      Inc(Result);

      // If not replacing all, return after the first replacement
      if not (soReplaceAll in Options) then Exit;

      // Continue just after the inserted replacement to avoid re-matching inside it
      nextStart := posFound + LNew;
      posFound := FindWholeWordANSI(Source, OldWord, nextStart, Options);
    end;
  end;

As you can see, it specifically works with ANSI strings, but I would actually prefer it to work with UTF eventually.
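Since the goal is UTF support, one possible direction is to work on UnicodeString (UTF-16) instead of bytes. The sketch below is a hypothetical variant of the function above (the names ReplaceWholeWordsCIU, MatchAtCI and IsWordChar are made up), assuming FPC's Character unit with its IsLetterOrDigit and ToLower functions for UnicodeChar. It compares one UTF-16 code unit at a time using simple case mapping, so characters outside the BMP and multi-character case mappings are not handled.

```pascal
program WholeWordUnicode;
{$mode objfpc}{$H+}
{$codepage utf8}
{$ASSERTIONS ON}

uses
  Character; // IsLetterOrDigit and ToLower for UnicodeChar (FPC rtl-objpas)

// Word characters: Unicode letters and digits, plus '_' if requested.
function IsWordChar(C: UnicodeChar; UnderscoreIsWord: Boolean): Boolean;
begin
  Result := IsLetterOrDigit(C) or (UnderscoreIsWord and (C = '_'));
end;

// True if Needle occurs at position P of Hay, compared case-insensitively
// one UTF-16 code unit at a time (simple case mapping only).
function MatchAtCI(const Hay, Needle: UnicodeString; P: Integer): Boolean;
var
  k: Integer;
begin
  Result := False;
  if P + Length(Needle) - 1 > Length(Hay) then Exit;
  for k := 1 to Length(Needle) do
    if ToLower(Hay[P + k - 1]) <> ToLower(Needle[k]) then Exit;
  Result := True;
end;

// Replaces every whole-word, case-insensitive occurrence of OldWord.
function ReplaceWholeWordsCIU(const Source, OldWord, NewWord: UnicodeString;
  UnderscoreIsWord: Boolean = False): UnicodeString;
var
  i, L: Integer;
  BeforeOK, AfterOK: Boolean;
begin
  Result := '';
  L := Length(OldWord);
  if L = 0 then Exit(Source);
  i := 1;
  while i <= Length(Source) do
  begin
    if MatchAtCI(Source, OldWord, i) then
    begin
      BeforeOK := (i = 1) or not IsWordChar(Source[i - 1], UnderscoreIsWord);
      AfterOK := (i + L > Length(Source)) or not IsWordChar(Source[i + L], UnderscoreIsWord);
      if BeforeOK and AfterOK then
      begin
        Result := Result + NewWord;
        Inc(i, L); // skip the matched word in the source
        Continue;
      end;
    end;
    Result := Result + Source[i]; // copy everything else unchanged
    Inc(i);
  end;
end;

begin
  WriteLn(ReplaceWholeWordsCIU('This Is A String, and this is a number', 'this', 'that'));
  WriteLn(ReplaceWholeWordsCIU('Sometimes a partial match is not replaced', 'partia', 'pArTiA'));
end.
```

With UTF-8 strings (as used by the LCL), the input could be converted with UTF8Decode and the result converted back with UTF8Encode, both from the System unit.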

Thausand

  • Sr. Member
  • Posts: 445
« Reply #22 on: November 14, 2025, 12:41:33 pm »
Try to avoid using regular expressions to search/replace such simple patterns? :)
Yes, I agree. It wasn't my suggestion ;D

I tried to see whether I could get RE to work for search/replace.

You reported that TRegExpr doesn't work for Unicode, so I wanted to check. My test works when the text is in the source, or when I use a TStringList to load a file, but then I can't be sure the text really is Unicode (short of looking in the debugger). So I tried using a TStringStream to load a UTF-16 file, and the TStringStream, like a dumb politician, insists the encoding is UTF-8.

Does anyone know how to force a TStringStream to load a UTF-16 file (with BOM) and keep the data in memory as UTF-16? (I tried several times and failed; for the test I need a UnicodeString in memory, loaded from a file.) Sometimes I am quite dumb, and that was the reason for using TStringStream. It can have UnicodeDataString :-[
« Last Edit: November 14, 2025, 02:06:58 pm by Thausand »
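One way to get a UTF-16 file into memory unchanged, without depending on TStringStream's encoding handling, is to read the raw bytes straight into a UnicodeString. This is a minimal sketch under stated assumptions: the file is UTF-16LE with the BOM FF FE, and the program runs on a little-endian CPU (x86, most ARM), so the bytes are already the code units a UnicodeString stores. The function name LoadUtf16LEFile and the test file name are hypothetical.

```pascal
program LoadUtf16Demo;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils, Classes;

// Loads a UTF-16LE file with BOM into a UnicodeString, copying the
// code units verbatim (no code page conversion). Raises on anything else.
function LoadUtf16LEFile(const FileName: string): UnicodeString;
var
  fs: TFileStream;
  bom: array[0..1] of Byte;
begin
  Result := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if (fs.Size < 2) or (fs.Size mod 2 <> 0) then
      raise Exception.Create('Not a UTF-16 file: ' + FileName);
    fs.ReadBuffer(bom, SizeOf(bom));
    if (bom[0] <> $FF) or (bom[1] <> $FE) then
      raise Exception.Create('Missing UTF-16LE BOM (FF FE): ' + FileName);
    // On a little-endian CPU the remaining bytes are exactly the UTF-16
    // code units, so they can be read directly into the string buffer.
    SetLength(Result, (fs.Size - fs.Position) div SizeOf(WideChar));
    if Length(Result) > 0 then
      fs.ReadBuffer(Result[1], Length(Result) * SizeOf(WideChar));
  finally
    fs.Free;
  end;
end;

const
  TestFile = 'utf16le_test.txt';
  // BOM FF FE followed by "Hi" as UTF-16LE code units.
  Sample: array[0..5] of Byte = ($FF, $FE, Ord('H'), 0, Ord('i'), 0);

var
  fs: TFileStream;
begin
  fs := TFileStream.Create(TestFile, fmCreate);
  try
    fs.WriteBuffer(Sample, SizeOf(Sample));
  finally
    fs.Free;
  end;
  WriteLn(LoadUtf16LEFile(TestFile));
  DeleteFile(TestFile);
end.
```

The resulting UnicodeString can then be passed to the RE functions without any hidden conversion from a string list.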

Sieben

  • Sr. Member
  • Posts: 383
« Reply #23 on: November 14, 2025, 01:53:07 pm »
Why not use WordCount and ExtractWord from StrUtils to build such a function?
Lazarus 2.2.0, FPC 3.2.2, .deb install on Ubuntu Xenial 32 / Gtk2 / Unity7

Thausand

  • Sr. Member
  • Posts: 445
« Reply #24 on: November 14, 2025, 01:56:58 pm »
Why not use WordCount and ExtractWord from StrUtils to build such a function?
Don't they lack Unicode support?

Sieben

  • Sr. Member
  • Posts: 383
« Reply #25 on: November 14, 2025, 03:24:16 pm »
Works for me:

Code: Pascal
  procedure TfrmMain.Button2Click(Sender: TObject);
  var InStr,sWrd: string;
    i,iWrds: integer;
  begin
    InStr := 'Über Außen Näxt';
    iWrds := WordCount(InStr,[' ']);
    for i:=1 to iWrds do
    begin
      sWrd := ExtractWord(i,InStr,[' ']);
      ShowMessage(sWrd);
    end;
  end;
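Building on Sieben's suggestion, a whole-word replacement could be assembled from StrUtils' WordCount, WordPosition and ExtractWord. This is a hypothetical sketch: the name ReplaceWholeWordsByDelims and the delimiter set are illustrative, the case-insensitive comparison uses SysUtils' SameText and therefore only folds ASCII letters, and "whole word" means whatever lies between the chosen delimiters (so punctuation such as ',' or '.' has to be in the set).

```pascal
program ExtractWordReplace;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils, StrUtils;

// Replaces every delimiter-separated word equal to OldWord (ASCII
// case-insensitive) with NewWord.
function ReplaceWholeWordsByDelims(const S, OldWord, NewWord: string;
  const Delims: TSysCharSet): string;
var
  i, n: Integer;
  p: SizeInt;
  w: string;
begin
  Result := S;
  n := WordCount(S, Delims);
  // Walk from the last word to the first, so positions computed on S are
  // still valid in Result after the later words have been replaced.
  for i := n downto 1 do
  begin
    w := ExtractWord(i, S, Delims);
    p := WordPosition(i, S, Delims);
    if (p > 0) and SameText(w, OldWord) then
    begin
      Delete(Result, p, Length(w));
      Insert(NewWord, Result, p);
    end;
  end;
end;

begin
  WriteLn(ReplaceWholeWordsByDelims('This Is A String, and this is a number',
    'this', 'that', [' ', ',']));
end.
```

Because the words are byte ranges between ASCII delimiters, UTF-8 text passes through unchanged; only the case folding is limited to ASCII.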

avk

  • Hero Member
  • Posts: 825
« Reply #26 on: November 14, 2025, 04:52:04 pm »
...
OK, I looked more closely at the regexpr.pas provided on Andrey's page and at the file shipped with FPC 3.2.2.
There are differences (I have only looked at the defines), so maybe you actually have to download the file from his web page and use that one?
...

Wait, why would I download it?
It was your suggestion about RE, I just expressed doubt that the corresponding FPC package is suitable for this.
But to be honest, this functionality seems quite useful; I even have my own implementation for UTF-8 strings.

Thausand

  • Sr. Member
  • Posts: 445
« Reply #27 on: November 14, 2025, 08:37:03 pm »
I just expressed doubt that the corresponding FPC package is suitable for this.
I have verified it and am now more confident that it works OK (I think; I'm not sure what you mean by "suitable", because it means something different to every person). Before, I was not 100% sure, because I had used a TStringList and thought the TStringList was doing some invisible conversion magic. I verified it again using a stream, so the data in memory has the same encoding as the file. For RE to work it needs 'normal' Unicode (not big-endian), which means list items are converted to Unicode when needed. The RE package in FPC 3.2.2 works, and the version from the repository link works too.

The output I posted before is the result. I will try to make a clean example over the weekend. If someone has example text and search/replace examples, please share them for testing. I have trouble finding a Unicode-encoded document with many different Unicode characters. The AI I tried breaks down after 2 or 3 paragraphs, and I have to push it aggressively to get a better and longer result :o

avk

  • Hero Member
  • Posts: 825
« Reply #28 on: November 15, 2025, 07:09:59 am »
Just retested ReplaceAllWholeWordsCI(), and it seems the problem was on my end: I had compiled without a cleanup.
Sorry for the noise.

BTW, the pattern may contain characters that have a special meaning in RE, so the function should probably be changed:
Code: Pascal
  function ReplaceAllWholeWordsCI(const Source, Pattern, Replacement: string): string;
  var
    Expr: RegExprString;
  begin
    Expr :=
      RegExprString('(?:\b') + QuoteRegExprMetaChars(RegExprString(Pattern)) + RegExprString('\b)');
    Result := string(ReplaceRegExpr(Expr, RegExprString(Source), RegExprString(Replacement), [rroModifierI]));
  end;
At the same time, this could prevent malicious use.
« Last Edit: November 15, 2025, 11:04:47 am by avk »
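To see what the quoting changes, a small comparison can be run against the bundled regexpr unit, using its ExecRegExpr and QuoteRegExprMetaChars functions. The word "a.c" is a made-up example containing the metacharacter '.', which otherwise matches any character.

```pascal
program QuoteMetaDemo;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  regexpr;

const
  UserWord = 'a.c'; // hypothetical user input containing a regex metacharacter

begin
  // Unquoted: '.' matches any character, so the whole-word pattern also hits "abc".
  WriteLn(ExecRegExpr('\b' + UserWord + '\b', 'abc'));
  // Quoted: '.' becomes '\.', so "abc" no longer matches ...
  WriteLn(ExecRegExpr('\b' + QuoteRegExprMetaChars(UserWord) + '\b', 'abc'));
  // ... and only the literal text "a.c" is found.
  WriteLn(ExecRegExpr('\b' + QuoteRegExprMetaChars(UserWord) + '\b', 'x a.c y'));
end.
```

This should print TRUE, FALSE, TRUE, which is the difference the quoted version of ReplaceAllWholeWordsCI makes.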

Thausand

  • Sr. Member
  • Posts: 445
« Reply #29 on: November 16, 2025, 10:55:26 pm »
Thanks for letting me know, avk 👍

I made a small test (attached) for search/replace in a file. It uses the same implementation as avk's and Zvoni's. I'm no expert in regular expressions, so I have just been playing with RE.

 
