Recent

Author Topic: Compare two text lines and highlight difference  (Read 20713 times)

DomingoGP

  • Full Member
  • ***
  • Posts: 113
Re: Compare two text lines and highlight difference
« Reply #30 on: February 09, 2023, 09:59:42 pm »
Quote
The result: last character missing from the compare.

@totya

I have found the problem with the last char in TDiff in the   
 function Execute(const s1, s2: string): boolean;

Code: Pascal  [Select][+][-]
  1.  
  2.    //finally, append any trailing matches onto compareList ...
  3.     with FLastCompareRec do
  4.     begin
  5.       AddChangeChr(oldIndex1,len1{len1Minus1}-oldIndex1, ckNone);   //<DomingoGP SOLVES BUG: strings index are 1 based not 0 based.
  6.     end;


Also I have modified the unit so the desired string type can be easily changed and I have set as unicodestring by default for FPC.

if you want you can try the demo provided, with modified TDiff2.pas. It works reasonably well for me. Anyway, as Martin said, unicode characters can consist on several widechars or codepoints, in which case it won't work as expected. As well as this it works with the most common characters like accented vowels áéíóú ñ ç.

Unicode is too complicated to cover all possible cases  :-[.

The https://github.com/DomingoGP/lazIdeDiffCompareFiles component don't compare strings, so it is not affected by this bug.

Sorry for my bad english.


totya

  • Hero Member
  • *****
  • Posts: 722
Re: Compare two text lines and highlight difference
« Reply #31 on: February 09, 2023, 10:45:11 pm »
Quote
The result: last character missing from the compare.
I have found the problem with the last char in TDiff in the

Seems to me your patch is working, thank you!

I use this https://github.com/rickard67/TextDiff version, and with your suggestion my last checked bug is disappear. :)

But I will look your Diff2.pas version too.

Thanks again!!!  O:-)

DomingoGP

  • Full Member
  • ***
  • Posts: 113
Re: Compare two text lines and highlight difference
« Reply #32 on: February 10, 2023, 06:40:28 pm »
You are welcome1.

Quote
But I will look your Diff2.pas version too.

You don't need to do, basically is the same as  https://github.com/rickard67/TextDiff with minor changes.

totya

  • Hero Member
  • *****
  • Posts: 722
Re: Compare two text lines and highlight difference
« Reply #33 on: February 10, 2023, 06:45:57 pm »
You are welcome1.

Quote
But I will look your Diff2.pas version too.

You don't need to do, basically is the same as  https://github.com/rickard67/TextDiff with minor changes.

I saw that, indeed, and thank you agan.

The simplest way to use these Delphi codes, I think Thaddy idea is the best (simplest) without modified variable type etc needed:

Because you use the wrong mode: you should have used {$mode delphiunicode}

Phoenix

  • Full Member
  • ***
  • Posts: 146
Re: Compare two text lines and highlight difference
« Reply #34 on: September 02, 2023, 09:02:29 pm »
I noticed that the source has been improved
https://github.com/rickard67/TextDiff
it seems to work fine.

Note: the method header must be corrected to compile it
Diff.pas
Code: Pascal  [Select][+][-]
  1. ..
  2. {$IFDEF FPC}
  3. function Execute(const alist1, alist2: TIntegerList; const aDiffAlgorithm: TDiffAlgorithm): boolean; overload;
  4. {$ELSE}
  5. ..
  6.  

avk

  • Hero Member
  • *****
  • Posts: 825
Re: Compare two text lines and highlight difference
« Reply #35 on: July 11, 2025, 05:33:49 pm »
I noticed that the source has been improved
https://github.com/rickard67/TextDiff
it seems to work fine.
...

I don't know how it's done in TextDiff, but ...

Tried to look into the TextDiff sources, better late than never.
Some observations:
  • compilation with FPC-3.2.2 requires some changes in the source code, despite the claimed compatibility with Lazarus;
  • char-by-char string comparison ignores the existence of surrogates;
  • the practical significance of comparing UTF-16 strings in the context of LCL is small in itself;
  • the correctness of the algorithms used raises doubts from the point of view of the optimality of the size of the resulting patch,
      for example, for the strings "01234567890"(source) and "9 0 1 2 3 4 5 6 7 8 9"(target) TNdDiff returns the following verdict:
    • matches:  5
    • adds:     10
    • deletes:  0
    • modifies: 6
      although even with the naked eye it is clear that there should be 10 matches;

      in turn, TNpDiff for the strings "012345678900"(source) and "9 9 0 1 2 3 4 5 6 7 8 9"(target) produces
    • matches:  0
    • adds:     11
    • deletes:  0
    • modifies: 12
      which looks even worse;
In short, the overall impression is rather negative.

BTW, the mention of O(NP) looks like false advertising, the TNpDiff implementation does not seem to exhibit such asymptotics in any way.
« Last Edit: July 12, 2025, 09:55:43 am by avk »

Thaddy

  • Hero Member
  • *****
  • Posts: 18764
  • To Europe: simply sell USA bonds: dollar collapses
Re: Compare two text lines and highlight difference
« Reply #36 on: July 11, 2025, 05:37:39 pm »
What is wrong with Boyer Moore?
Inventing wheels again?
Has been there for years....No: IONS.
And the implementation is correct.
 >:D >:D >:D >:D

Sometimes the answers are more silly than the question. >:(

It is a legitimate question.
And the answer is that it is covered by the rtl.
Also, there is enough code available to translate the difference in a visual way.
« Last Edit: July 11, 2025, 06:04:04 pm by Thaddy »
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

avk

  • Hero Member
  • *****
  • Posts: 825
Re: Compare two text lines and highlight difference
« Reply #37 on: July 11, 2025, 06:28:01 pm »
What is wrong with Boyer Moore?
...

I hope they are okay, let's wish them long life.
But why did you remember them now?

440bx

  • Hero Member
  • *****
  • Posts: 6127
Re: Compare two text lines and highlight difference
« Reply #38 on: July 11, 2025, 06:53:43 pm »
@avk,

Good to see you in the forums again.  I had not seen you around here in a while.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

avk

  • Hero Member
  • *****
  • Posts: 825
Re: Compare two text lines and highlight difference
« Reply #39 on: July 11, 2025, 08:28:04 pm »
Thank you, nice to see you again too.

dbannon

  • Hero Member
  • *****
  • Posts: 3723
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Compare two text lines and highlight difference
« Reply #40 on: July 12, 2025, 02:42:30 pm »
What is wrong with Boyer Moore?
Apparently quite a lot. Says Wikipedia :
The original paper contained static tables for computing the pattern shifts without an explanation of how to produce them. The algorithm for producing the tables was published in a follow-on paper; this paper contained errors which were later corrected by Wojciech Rytter

But it s a search algorithm, not a compare one I suspect.

If there is an implementation in the RTL, it could be useful if we could find it. Sadly, so much good code, so little documentation !

EDIT : for the record, my interest in a good text compare algorithm is genuine, I use a poor one : https://github.com/tomboy-notes/tomboy-ng/blob/master/source/tb_sdiff.pas

Davo
« Last Edit: July 12, 2025, 02:49:58 pm by dbannon »
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

Thaddy

  • Hero Member
  • *****
  • Posts: 18764
  • To Europe: simply sell USA bonds: dollar collapses
Re: Compare two text lines and highlight difference
« Reply #41 on: July 12, 2025, 02:50:07 pm »
Here's a Myers diff unit:
Code: Pascal  [Select][+][-]
  1. unit uMyersDiff;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.   Classes, SysUtils;
  9.  
  10. type
  11.   TStringArray = array of string;
  12.   TDiffOpKind = (dEqual, dAdd, dDelete);
  13.  
  14.   PDiffOp = ^TDiffOp;
  15.   TDiffOp = record
  16.     Kind: TDiffOpKind;
  17.     OldIndex: Integer;  // Index in the old (original) sequence
  18.     NewIndex: Integer;  // Index in the new (modified) sequence
  19.     Length: Integer;    // Number of items in this operation
  20.   end;
  21.  
  22.   TDiffOps = array of TDiffOp;
  23.  
  24. function ComputeMyersDiff(const A, B: TStringArray): TDiffOps;
  25. function FormatDiffOutput(const DiffOps: TDiffOps; const A, B: TStringArray): string;
  26.  
  27. implementation
  28.  
  29. function ComputeMyersDiff(const A, B: TStringArray): TDiffOps;
  30. type
  31.   TVectorArray = array of Integer;
  32.  
  33.   function Max(const a, b: Integer): Integer;
  34.   begin
  35.     if a > b then Result := a else Result := b;
  36.   end;
  37.  
  38.   function ShortestEditScript(const A, B: TStringArray): TDiffOps;
  39.   var
  40.     N, M, MaxD, CurrentD, K, X, Y, PrevX, PrevY, PrevK: Integer;
  41.     V: array of Integer;
  42.     Trace: array of TVectorArray;
  43.     SnakeLength: Integer;
  44.     TempOp: TDiffOp;
  45.     i, j: Integer;
  46.   begin
  47.     N := Length(A);
  48.     M := Length(B);
  49.     MaxD := N + M;
  50.    
  51.     SetLength(V, 2 * MaxD + 1);
  52.     SetLength(Trace, MaxD + 1);
  53.    
  54.     CurrentD := 0;
  55.     while CurrentD <= MaxD do
  56.     begin
  57.       SetLength(Trace[CurrentD], 2 * MaxD + 1);
  58.       K := -CurrentD;
  59.       while K <= CurrentD do
  60.       begin
  61.         if (K = -CurrentD) or ((K <> CurrentD) and (V[K - 1 + MaxD] < V[K + 1 + MaxD])) then
  62.         begin
  63.           X := V[K + 1 + MaxD];
  64.           PrevK := K + 1;
  65.         end
  66.         else
  67.         begin
  68.           X := V[K - 1 + MaxD] + 1;
  69.           PrevK := K - 1;
  70.         end;
  71.        
  72.         Y := X - K;
  73.         PrevX := V[PrevK + MaxD];
  74.         PrevY := PrevX - PrevK;
  75.        
  76.         // Follow snake
  77.         while (X < N) and (Y < M) and (A[X] = B[Y]) do
  78.         begin
  79.           Inc(X);
  80.           Inc(Y);
  81.         end;
  82.        
  83.         V[K + MaxD] := X;
  84.         Trace[CurrentD][K + MaxD] := PrevK + MaxD;
  85.        
  86.         if (X >= N) and (Y >= M) then
  87.         begin
  88.           // Reconstruct the path
  89.           SetLength(Result, 0);
  90.           CurrentD := CurrentD;
  91.           while CurrentD >= 0 do
  92.           begin
  93.             PrevK := Trace[CurrentD][K + MaxD] - MaxD;
  94.             PrevX := V[PrevK + MaxD];
  95.             PrevY := PrevX - PrevK;
  96.            
  97.             SnakeLength := V[K + MaxD] - Max(PrevX, PrevY);
  98.             if SnakeLength > 0 then
  99.             begin
  100.               SetLength(Result, Length(Result) + 1);
  101.               with Result[High(Result)] do
  102.               begin
  103.                 Kind := dEqual;
  104.                 OldIndex := PrevX;
  105.                 NewIndex := PrevY;
  106.                 Length := SnakeLength;
  107.               end;
  108.             end;
  109.            
  110.             if CurrentD > 0 then
  111.             begin
  112.               SetLength(Result, Length(Result) + 1);
  113.               with Result[High(Result)] do
  114.               begin
  115.                 if K > PrevK then
  116.                 begin
  117.                   Kind := dAdd;
  118.                   OldIndex := PrevX;
  119.                   NewIndex := PrevY;
  120.                   Length := 1;
  121.                 end
  122.                 else
  123.                 begin
  124.                   Kind := dDelete;
  125.                   OldIndex := PrevX;
  126.                   NewIndex := PrevY;
  127.                   Length := 1;
  128.                 end;
  129.               end;
  130.             end;
  131.            
  132.             K := PrevK;
  133.             Dec(CurrentD);
  134.           end;
  135.          
  136.           // Reverse the result
  137.           for i := 0 to (Length(Result) div 2) - 1 do
  138.           begin
  139.             j := High(Result) - i;
  140.             TempOp := Result[i];
  141.             Result[i] := Result[j];
  142.             Result[j] := TempOp;
  143.           end;
  144.          
  145.           Exit;
  146.         end;
  147.         Inc(K, 2);
  148.       end;
  149.       Inc(CurrentD);
  150.     end;
  151.    
  152.     SetLength(Result, 0);
  153.   end;
  154.  
  155. begin
  156.   Result := ShortestEditScript(A, B);
  157. end;
  158.  
  159. function FormatDiffOutput(const DiffOps: TDiffOps; const A, B: TStringArray): string;
  160. var
  161.   I, J: Integer;
  162.   Op: TDiffOp;
  163. begin
  164.   Result := '';
  165.   for I := 0 to High(DiffOps) do
  166.   begin
  167.     Op := DiffOps[I];
  168.     case Op.Kind of
  169.       dEqual:
  170.         for J := 0 to Op.Length - 1 do
  171.           Result := Result + Format('  %s' + LineEnding, [A[Op.OldIndex + J]]);
  172.       dAdd:
  173.         for J := 0 to Op.Length - 1 do
  174.           Result := Result + Format('+ %s' + LineEnding, [B[Op.NewIndex + J]]);
  175.       dDelete:
  176.         for J := 0 to Op.Length - 1 do
  177.           Result := Result + Format('- %s' + LineEnding, [A[Op.OldIndex + J]]);
  178.     end;
  179.   end;
  180. end;
  181.  
  182. end.
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

cdbc

  • Hero Member
  • *****
  • Posts: 2665
    • http://www.cdbc.dk
Re: Compare two text lines and highlight difference
« Reply #42 on: July 12, 2025, 04:19:13 pm »
Hi
@Davo: You should have a 'LookSee' in 'StrUtils', IIRC there's an implementation hidden in there  ;)

edit: I Do Recall Correctly:
Code: Pascal  [Select][+][-]
  1. (*
  2.   FindMatchesBoyerMooreCaseSensitive
  3.  
  4.   Finds one or many ocurrences of an ansistring in another ansistring.
  5.   It is case sensitive.
  6.  
  7.   * Parameters:
  8.   S: The PAnsiChar to be searched in. (Read only).
  9.   OldPattern: The PAnsiChar to be searched. (Read only).
  10.   SSize: The size of S in Chars. (Read only).
  11.   OldPatternSize: The size of OldPatter in chars. (Read only).
  12.   aMatches: SizeInt array where match indexes are returned (zero based) (write only).
  13.   aMatchAll: Finds all matches, not just the first one. (Read only).
  14.  
  15.   * Returns:
  16.     True if at least one occurence was found.
  17.  
  18.   The function is based in the Boyer-Moore algorithm.
  19. *)
  20.  
  21. function FindMatchesBoyerMooreCaseSensitive(const S, OldPattern: PAnsiChar;
  22.   const SSize, OldPatternSize: SizeInt; out aMatches: SizeIntArray;
  23.   const aMatchAll: Boolean) : Boolean;
  24. var
  25.   i,j: SizeInt;
  26.   bm: BoyerMoore;
  27. begin
  28.   aMatches:=nil;
  29.   if OldPatternSize=0 then
  30.     Exit(False);
  31.   bm.Init(aMatches);
  32.   bm.MakeDeltaJumpTables(OldPattern,OldPatternSize);
  33.  
  34.   i:=OldPatternSize-1;
  35.   while i < SSize do begin
  36.     j:=OldPatternSize-1;
  37.     while (j>=0) and (S[i] = OldPattern[j]) do begin
  38.       dec(i);
  39.       dec(j);
  40.     end;
  41.     if (j<0) then begin
  42.       bm.AddMatch(i+1);
  43.       //Only first match ?
  44.       if not aMatchAll then break;
  45.       inc(i,bm.DeltaJumpTable2[0]+1);
  46.     end else
  47.       i:=i + bm.Max(bm.DeltaJumpTable1[s[i]],bm.DeltaJumpTable2[j]);
  48.   end;
  49.   SetLength(aMatches,bm.MatchesCount);
  50.   Result:=bm.MatchesCount>0;
  51. end;
  52.  
  53. function FindMatchesBoyerMooreCaseInSensitive(const S, OldPattern: PAnsiChar; const SSize, OldPatternSize: SizeInt; out
  54.   aMatches: SizeIntArray; const aMatchAll: Boolean): Boolean;
  55. var
  56.   i,j: SizeInt;
  57.   lPattern: PAnsiChar; //Lowercased OldPattern
  58.   bm: BoyerMoore;
  59.   lPatternStore: ansistring;
  60. begin
  61.   aMatches:=nil;
  62.   if OldPatternSize=0 then
  63.     Exit(False);
  64.  
  65.   //Build an internal array of lowercase version of every possible AnsiChar.
  66.   if not bm.LCaseArrayPrepared then
  67.     bm.PrepareLCaseArray;
  68.   ReadBarrier; // Read LCaseArray contents only after LCaseArrayPrepared.
  69.  
  70.   //Create the new lowercased pattern. Or avoid and reuse OldPattern if nothing to lowercase!
  71.   lPattern:=OldPattern;
  72.   for i := 0 to OldPatternSize-1 do
  73.     if bm.LCaseArray[OldPattern[i]]<>OldPattern[i] then begin
  74.       SetLength(lPatternStore,OldPatternSize);
  75.       lPattern:=PAnsiChar(Pointer(lPatternStore));
  76.       Move(OldPattern^,lPattern^,i*sizeof(AnsiChar));
  77.       for j := i to OldPatternSize-1 do
  78.         lPattern[j]:=bm.LCaseArray[OldPattern[j]];
  79.       break;
  80.     end;
  81.  
  82.   bm.Init(aMatches);
  83.   bm.MakeDeltaJumpTables(lPattern,OldPatternSize);
  84.  
  85.   i:=OldPatternSize-1;
  86.   while i < SSize do begin
  87.     j:=OldPatternSize-1;
  88.     while (j>=0) and (bm.LCaseArray[S[i]] = lPattern[j]) do begin
  89.       dec(i);
  90.       dec(j);
  91.     end;
  92.     if (j<0) then begin
  93.       bm.AddMatch(i+1);
  94.       //Only first match ?
  95.       if not aMatchAll then break;
  96.       inc(i,bm.DeltaJumpTable2[0]+1);
  97.     end else
  98.       i:=i + bm.Max(bm.DeltaJumpTable1[bm.LCaseArray[s[i]]],bm.DeltaJumpTable2[j]);
  99.   end;
  100.   SetLength(aMatches,bm.MatchesCount);
  101.   Result:=bm.MatchesCount>0;
  102. end;
Regards Benny
« Last Edit: July 12, 2025, 04:23:21 pm by cdbc »
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

Thaddy

  • Hero Member
  • *****
  • Posts: 18764
  • To Europe: simply sell USA bonds: dollar collapses
Re: Compare two text lines and highlight difference
« Reply #43 on: July 12, 2025, 06:10:56 pm »
Yes, there should be a LCS somewhere.
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

simone

  • Hero Member
  • *****
  • Posts: 688
Re: Compare two text lines and highlight difference
« Reply #44 on: July 13, 2025, 12:04:05 am »
compilation with FPC-3.2.2 requires some changes in the source code, despite the claimed compatibility with Lazarus;

In February, I submitted patches to the author, who accepted them in the following commit:

https://github.com/rickard67/TextDiff/commit/97f0bce070658c6206616bbe5041284c06008851

The changes I proposed fixed the compatibility issues with newer versions of Lazarus/Fpc.

Now, I've noticed that, unfortunately, new incompatibilities have been introduced in subsequent commits, which I'll try to fix as soon as I can.
Microsoft Windows 10/11 64 bit - Lazarus 3.8/4.0 FPC 3.2.2 x86_64-win64-win32/win64

 

TinyPortal © 2005-2018