Compare two text lines and highlight difference

Programming => Packages and Libraries => SynEdit => Topic started by: totya on February 08, 2023, 01:21:00 pm

Title: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 01:21:00 pm

Hi!

Is there a simple way to compare two lines of text and highlight the differences with SynEdit?

If not, where should I start?

Thanks!

Title: Re: Compare two text lines and highlight difference
Post by: KodeZwerg on February 08, 2023, 01:51:38 pm

It depends on what you mean by difference.
Show an example of what you mean.

Example:
line1: AaBbCc
line2: AabBcC

Should bBcC be highlighted in line2, or BbCc in line1, or is that not what you want at all?

Be more specific!

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 02:01:23 pm

Quote from: KodeZwerg on February 08, 2023, 01:51:38 pm

It depends on what you mean by difference.
Show an example of what you mean.

Normal text lines, for example (sorry for my English):

This is a nice day, the sky is blue and wind blowing sun is rising.
This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.

Title: Re: Compare two text lines and highlight difference
Post by: Zvoni on February 08, 2023, 03:05:36 pm

Nested loops comparing words/tokens?
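The nested-loop idea, as a minimal Free Pascal sketch: split both lines into words (assuming only space, comma and period separate them) and, for each word of the first line, scan the second line for it. `SplitWords` and `WordsMissingFrom` are hypothetical helper names, not part of SynEdit or any package from this thread.

```pascal
program NestedLoopWords;

{$mode objfpc}{$H+}

type
  TWordArray = array of string;

// Split a line into words; space, comma and period are separators.
function SplitWords(const S: string): TWordArray;
var
  I, Start, N: Integer;
begin
  Result := nil;
  N := 0;
  I := 1;
  while I <= Length(S) do
    if S[I] in [' ', ',', '.'] then
      Inc(I)
    else
    begin
      Start := I;
      while (I <= Length(S)) and not (S[I] in [' ', ',', '.']) do
        Inc(I);
      SetLength(Result, N + 1);
      Result[N] := Copy(S, Start, I - Start);
      Inc(N);
    end;
end;

// Outer loop over A, inner loop over B: collect the words of A that
// occur nowhere in B. Word order and repeated words are ignored.
function WordsMissingFrom(const A, B: TWordArray): TWordArray;
var
  I, J, N: Integer;
  Found: Boolean;
begin
  Result := nil;
  N := 0;
  for I := 0 to High(A) do
  begin
    Found := False;
    for J := 0 to High(B) do
      if A[I] = B[J] then
      begin
        Found := True;
        Break;
      end;
    if not Found then
    begin
      SetLength(Result, N + 1);
      Result[N] := A[I];
      Inc(N);
    end;
  end;
end;

var
  W1, W2, Missing: TWordArray;
  I: Integer;
begin
  W1 := SplitWords('This is a nice day, the sky is blue and wind blowing sun is rising.');
  W2 := SplitWords('This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.');
  Missing := WordsMissingFrom(W1, W2);
  for I := 0 to High(Missing) do
    WriteLn(Missing[I]);   // blue, rising
end.
```

Because only membership is checked, the result for 'sun day sun' against 'sunx day sun' is empty: "sun" still occurs somewhere in the second line. Finding out which occurrence changed needs a sequence diff instead.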

Title: Re: Compare two text lines and highlight difference
Post by: avk on February 08, 2023, 03:15:24 pm

Quote from: totya on February 08, 2023, 02:01:23 pm

...
This is a nice day, the sky is blue and wind blowing sun is rising.
This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.

Maybe you need a Diff algorithm?
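A diff algorithm in this sense usually means a longest common subsequence (LCS) comparison. A minimal, unoptimised sketch of a word-level LCS diff, assuming the words are already split into arrays; `WordDiff`, `SrcChanged` and `DstChanged` are hypothetical names. It produces one boolean per word (True = not part of the common subsequence) and uses O(N*M) memory, which is fine for single lines.

```pascal
program LcsWordDiff;

{$mode objfpc}{$H+}

type
  TFlagArray = array of Boolean;

// Word-level diff via longest common subsequence (LCS).
// SrcChanged[I] = True: A[I] is not part of the LCS (deleted/changed).
// DstChanged[J] = True: B[J] is not part of the LCS (inserted/changed).
procedure WordDiff(const A, B: array of string; out SrcChanged, DstChanged: TFlagArray);
var
  L: array of array of Integer;  // L[I][J] = LCS length of A[I..] and B[J..]
  I, J, N, M: Integer;
begin
  N := Length(A);
  M := Length(B);
  SetLength(L, N + 1, M + 1);    // new elements are zero, so row N and column M are 0
  for I := N - 1 downto 0 do
    for J := M - 1 downto 0 do
      if A[I] = B[J] then
        L[I][J] := L[I + 1][J + 1] + 1
      else if L[I + 1][J] >= L[I][J + 1] then
        L[I][J] := L[I + 1][J]
      else
        L[I][J] := L[I][J + 1];

  // Start with everything "changed", then clear the words on the LCS path.
  SetLength(SrcChanged, N);
  SetLength(DstChanged, M);
  for I := 0 to N - 1 do
    SrcChanged[I] := True;
  for J := 0 to M - 1 do
    DstChanged[J] := True;
  I := 0;
  J := 0;
  while (I < N) and (J < M) do
    if A[I] = B[J] then
    begin
      SrcChanged[I] := False;
      DstChanged[J] := False;
      Inc(I);
      Inc(J);
    end
    else if L[I + 1][J] >= L[I][J + 1] then
      Inc(I)
    else
      Inc(J);
end;

var
  SrcChanged, DstChanged: TFlagArray;
  I: Integer;
begin
  // a repeated-word example: only the first "sun" is reported as changed
  WordDiff(['sun', 'day', 'sun'], ['sunx', 'day', 'sun'], SrcChanged, DstChanged);
  for I := 0 to High(SrcChanged) do
    WriteLn('source word ', I, ' changed: ', SrcChanged[I]);
end.
```

For 'sun day sun' vs 'sunx day sun' only the first word on each side is flagged, because [day, sun] is the only longest common subsequence of the two word lists.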

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 08, 2023, 03:35:37 pm

SynEdit doesn't have that....

The closest you currently get is the "same word highlight", which applies to current selection too.
- if you start selecting one line from the start,
- select a few chars only
- wait about half a second, so the "same word highlight" will be activated
(this can be set under Tools > Options: Editor > Display > Markup and Matches > top section "Highlight all occurrences of word under caret")
- Now the start of the other line should be highlighted.
- Extend the selection until the other line loses the highlight => you are at the first difference in the line.

It's not what you want/need. Not even close....

It is a different feature, but it can be used for simple comparison ....

If you want to extend SynEdit....
1) Well, you need your own code to find the diffs between the lines.

Once you have a list of sections that you want to highlight ( record Line, StartX, EndX: integer end; ):
2) You can write your own SynEditMarkup.
a) If you don't use any highlighter (no Pascal or other HL), then you can use the "SynPos...Highlighter".
b) Otherwise, look at the markups. Something like SynMarkupHighlightAll should be easy to modify. It already has a list....
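Turning a per-character "changed" mask into such a list of sections could look like this minimal sketch. It assumes 1-based columns, an inclusive EndX, and a mask that already comes from some character-level diff; `THighlightSection` and `MaskToSections` are hypothetical names, not SynEdit API. The columns are string indexes, so for UTF-8 text they count bytes rather than characters.

```pascal
program MaskToSectionsDemo;

{$mode objfpc}{$H+}

type
  // One span to highlight: a line and an inclusive, 1-based column range.
  THighlightSection = record
    Line, StartX, EndX: Integer;
  end;
  THighlightSections = array of THighlightSection;

// Merge runs of consecutive True values into sections.
// Changed[0] corresponds to column 1 of line ALine.
function MaskToSections(ALine: Integer; const Changed: array of Boolean): THighlightSections;
var
  X, Start, N: Integer;
begin
  Result := nil;
  N := 0;
  X := 0;
  while X <= High(Changed) do
    if not Changed[X] then
      Inc(X)
    else
    begin
      Start := X;
      while (X <= High(Changed)) and Changed[X] do
        Inc(X);
      SetLength(Result, N + 1);
      Result[N].Line := ALine;
      Result[N].StartX := Start + 1;  // 0-based index -> 1-based column
      Result[N].EndX := X;            // (last True index) + 1 = its 1-based column
      Inc(N);
    end;
end;

var
  Sections: THighlightSections;
  I: Integer;
begin
  Sections := MaskToSections(1, [False, True, True, False, True, False]);
  for I := 0 to High(Sections) do
    WriteLn('line ', Sections[I].Line, ': columns ',
      Sections[I].StartX, '..', Sections[I].EndX);   // 2..3, then 5..5
end.
```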

Title: Re: Compare two text lines and highlight difference
Post by: Awkward on February 08, 2023, 05:37:40 pm

... looks like this feature
https://github.com/rickard67/TextDiff

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 07:52:57 pm

Quote from: Awkward on February 08, 2023, 05:37:40 pm

... looks like this feature
https://github.com/rickard67/TextDiff

Thanks for the tip, but basicdemo2 completely freezes my machine when I click "Open". Possibly a DWARF/debug error; I had to press reset (data loss).

basicdemo1 works, though. The PaintBox isn't really what I need, but it's a good starting point.

Thank you!

Edit 1: UTF-8 characters are lost with Demo1, for example "öüó".

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 08:00:34 pm

Quote from: Martin_fr on February 08, 2023, 03:35:37 pm

SynEdit doesn't have that....

I meant the SynEdit components (the group name). Thank you for the many tips!

Title: Re: Compare two text lines and highlight difference
Post by: avk on February 08, 2023, 08:14:20 pm

I don't know how it's done in TextDiff, but in LGenerics it's very easy:

Code: Pascal

...
uses
  ..., lgSeqUtils;
 
...
procedure TForm1.Button1Click(Sender: TObject);
type
  TUtil = specialize TGSeqUtil<string, string>;
var
  s1, s2: string;
  a1, a2: TStringArray;
  LDiff: TUtil.TDiff;
  I: Integer;
begin
  s1 := 'This is a nice day, the sky is blue and wind blowing sun is rising.';
  s2 := 'This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.';
  // split both lines into words, dropping the separators
  a1 := s1.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  a2 := s2.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  // SourceChanges[i] / TargetChanges[j] are True for words not present in the other line
  LDiff := TUtil.Diff(a1, a2);
  Memo1.Append('Deleted from s1(i.e. not present in s2):');
  for I := 0 to High(LDiff.SourceChanges) do
    if LDiff.SourceChanges[I] then
      Memo1.Append(a1[I]);
  Memo1.Append('');
  Memo1.Append('Inserted into s2(i.e. not present in s1):');
  for I := 0 to High(LDiff.TargetChanges) do
    if LDiff.TargetChanges[I] then
      Memo1.Append(a2[I]);  
end;
 

it prints:

Code: Text

Deleted from s1(i.e. not present in s2):
blue
rising
 
Inserted into s2(i.e. not present in s1):
with
many
kills
red
somewhere
because
all
dark
here
 

Title: Re: Compare two text lines and highlight difference
Post by: KodeZwerg on February 08, 2023, 08:25:28 pm

@avk, very cool solution! One suggestion: return the results as (string, integer) pairs, so you know at which positions the changes happened.
Full respect for that! :-*
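Split discards the positions, so one way to get such (string, integer) pairs is a small tokenizer that records where each word starts. A minimal sketch; `TWordPos` and `SplitWithPositions` are hypothetical names, separators are assumed to be space, comma and period, and StartX is a 1-based string index (a byte offset for UTF-8 text).

```pascal
program WordPositionsDemo;

{$mode objfpc}{$H+}

type
  // A word and the 1-based index of its first character in the line.
  TWordPos = record
    Token: string;
    StartX: Integer;
  end;
  TWordPosArray = array of TWordPos;

function SplitWithPositions(const S: string): TWordPosArray;
const
  Separators = [' ', ',', '.'];
var
  I, Start, N: Integer;
begin
  Result := nil;
  N := 0;
  I := 1;
  while I <= Length(S) do
    if S[I] in Separators then
      Inc(I)
    else
    begin
      Start := I;
      while (I <= Length(S)) and not (S[I] in Separators) do
        Inc(I);
      SetLength(Result, N + 1);
      Result[N].Token := Copy(S, Start, I - Start);
      Result[N].StartX := Start;
      Inc(N);
    end;
end;

var
  Words: TWordPosArray;
  I: Integer;
begin
  Words := SplitWithPositions('sun day sun');
  for I := 0 to High(Words) do
    WriteLn(Words[I].Token, ' at ', Words[I].StartX);  // sun at 1, day at 5, sun at 9
end.
```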

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 08, 2023, 08:33:45 pm

Btw, this may interest you: https://forum.lazarus.freepascal.org/index.php/topic,62146.msg470014.html#msg470014

It does not compare 2 lines of text. But it does compare 2 texts line by line.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 09:09:14 pm

Quote from: avk on February 08, 2023, 08:14:20 pm

I don't know how it's done in TextDiff, but in LGenerics it's very easy:

Thank you, I'd like to try it, but where can I find this LGenerics/lgSeqUtils?

Currently I'm using the official Lazarus package with FPC 3.2.2.

Title: Re: Compare two text lines and highlight difference
Post by: sketch on February 08, 2023, 09:54:47 pm

On Unix

Code:

$ cat ttt
#!/usr/bin/ksh

# one word per line, '.' and ',' removed, sorted (comm needs sorted input)
words() { echo "$1" | tr ' ' '\n' | sed 's/\.//' | sed 's/,//' | sort; }

s1="This is a nice day, the sky is blue and wind blowing sun is rising."
s2="This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here."

echo "Unmatched words from string 1"
comm -23 <(words "$s1") <(words "$s2")   # lines only in string 1
echo
echo "Unmatched words from string 2"
comm -13 <(words "$s1") <(words "$s2")   # lines only in string 2
echo

Code:

$ cat t.pp
program t;
uses Unix;
Var S : Longint;
begin
  S:=fpSystem('./ttt');
end.

Code:

$ ./t
Unmatched words from string 1
blue
rising

Unmatched words from string 2
all
because
dark
here
kills
many
red
somewhere
with

$

Title: Re: Compare two text lines and highlight difference
Post by: avk on February 09, 2023, 05:04:56 am

Quote from: totya on February 08, 2023, 09:09:14 pm

...
Thank you, I'd like to try it, but where can I find this LGenerics/lgSeqUtils?
...

It lives here (https://github.com/avk959/LGenerics).

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 08:50:54 am

Quote from: avk on February 09, 2023, 05:04:56 am

Quote from: totya on February 08, 2023, 09:09:14 pm
...
Thank you, I'd like to try it, but where can I find this LGenerics/lgSeqUtils?
...

It lives here (https://github.com/avk959/LGenerics).

Thank you!

I installed the trunk version, because I read the README and I'd like to see the JSON implementation.
The variable declaration for "I" is missing from the sample, but I added it within 5 seconds.

Well, this is a simplified solution, because it only works with whole words.
This means that if only one letter of a word differs, the word goes into both the SourceChanges and the TargetChanges lists, and I can't see exactly which letter changed.
But that's not a big problem; my priority is a simple solution.

The next problem: if a word is repeated, I don't know which occurrence changed, so I can't color the changes. For example:

Code: Pascal

    s1 := 'sun day sun';
    s2 := 'sunx day sun';
 

result:

Deleted from s1(i.e. not present in s2):
sun

Inserted into s2(i.e. not present in s1):
sunx

So thank you for this library and the sample, but it isn't a usable solution for me.
As far as I can see, TextDiff (new version, see this topic: https://forum.lazarus.freepascal.org/index.php/topic,62219.msg470413.html#msg470413) has much better options, but UTF-8 doesn't seem to be supported.

Thank you again!

Title: Re: Compare two text lines and highlight difference
Post by: Roland57 on February 09, 2023, 09:08:24 am

@totya

Not sure that it will match your needs, but there is also this project: https://github.com/DomingoGP/lazIdeDiffCompareFiles

And, since we are on this topic, I would like to mention diffoscope (https://diffoscope.org) that I discovered recently.

Title: Re: Compare two text lines and highlight difference
Post by: avk on February 09, 2023, 09:45:51 am

@totya, the vector of boolean values SourceChanges corresponds to the elements of the source sequence and contains True at those positions whose elements are not included in the target sequence. That is, if SourceChanges[2] is True, the source sequence element with index 2 is not in the target sequence.
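Since the flags are indexed by word, a repeated word is not ambiguous: the index says which occurrence changed. A minimal sketch of mapping such flags to highlight ranges, assuming you kept each word's start column and length yourself; `ChangedWordSections` and its record are hypothetical. The flags are hard-coded here to what an LCS diff reports for 'sun day sun' vs 'sunx day sun' ([True, False, False] on both sides, since [day, sun] is the only longest common subsequence); this is not LGenerics code.

```pascal
program FlagsToSectionsDemo;

{$mode objfpc}{$H+}

type
  THighlightSection = record
    Line, StartX, EndX: Integer;  // 1-based, EndX inclusive
  end;
  THighlightSections = array of THighlightSection;

// Changed[I] = True means word I is not in the other line.
// WordStart[I] / WordLen[I] give that word's first column and its length.
function ChangedWordSections(ALine: Integer; const WordStart, WordLen: array of Integer;
  const Changed: array of Boolean): THighlightSections;
var
  I, N: Integer;
begin
  Result := nil;
  N := 0;
  for I := 0 to High(Changed) do
    if Changed[I] then
    begin
      SetLength(Result, N + 1);
      Result[N].Line := ALine;
      Result[N].StartX := WordStart[I];
      Result[N].EndX := WordStart[I] + WordLen[I] - 1;
      Inc(N);
    end;
end;

var
  Src, Dst: THighlightSections;
begin
  // s1 = 'sun day sun': words start at columns 1, 5, 9
  Src := ChangedWordSections(1, [1, 5, 9], [3, 3, 3], [True, False, False]);
  // s2 = 'sunx day sun': words start at columns 1, 6, 10
  Dst := ChangedWordSections(2, [1, 6, 10], [4, 3, 3], [True, False, False]);
  WriteLn('s1: columns ', Src[0].StartX, '..', Src[0].EndX);  // 1..3, the first "sun" only
  WriteLn('s2: columns ', Dst[0].StartX, '..', Dst[0].EndX);  // 1..4
end.
```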

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 09:51:19 am

Quote from: Roland57 on February 09, 2023, 09:08:24 am

@totya

Not sure that it will match your needs, but there is also this project: https://github.com/DomingoGP/lazIdeDiffCompareFiles

Thanks, this is based on diff.pas (diff2.pas), but nothing important has changed in this code (I need UTF-8 support).

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 12:36:02 pm

Quote from: avk on February 09, 2023, 09:45:51 am

@totya, the vector of boolean values SourceChanges corresponds to the elements of the source sequence and contains True at those positions whose elements are not included in the target sequence. That is, if SourceChanges[2] is True, the source sequence element with index 2 is not in the target sequence.

Thanks for this information!

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 12:56:20 pm

Quote from: Awkward on February 08, 2023, 05:37:40 pm

... looks like this feature
https://github.com/rickard67/TextDiff

This is the best solution, as far as I can see.

As I wrote, it isn't UTF-8 compatible; for example, characters like "őóú" are lost when comparing.

But I've been thinking. TDiff is a Delphi unit, with {$mode delphi}. Delphi uses 2-byte coded chars (UTF-16). But {$mode delphi} doesn't work perfectly, so if I change, for example:

char->widechar
string->WideString

then the comparison works.

Title: Re: Compare two text lines and highlight difference
Post by: Thaddy on February 09, 2023, 01:11:25 pm

Quote from: totya on February 09, 2023, 12:56:20 pm

Delphi uses 2-byte coded chars (UTF-16).

Wrong! UTF16 has between 2 and 4 bytes.

Quote

But {$mode delphi} doesn't work perfectly, so if I change, for example:

char->widechar
string->WideString

Because you use the wrong mode: you should have used {$mode delphiunicode}

Also note that LCS, which is what you need for a diff, is a bytewise comparison, not a character-based comparison, and for that reason the latest TDiff is known to work with UTF-8 too.
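What {$mode delphiunicode} changes, in a minimal sketch (assuming FPC 3.x, where this mode makes `string` a UnicodeString and `Char` a WideChar): the same two letters are 2 UTF-16 code units but 4 UTF-8 bytes. The letters are built from their code points, so the result does not depend on the source file encoding.

```pascal
program DelphiUnicodeDemo;

{$mode delphiunicode}

var
  S: string;        // UnicodeString (UTF-16) in this mode
  U: UTF8String;
begin
  SetLength(S, 2);
  S[1] := WideChar($00F6);   // U+00F6, o with diaeresis
  S[2] := WideChar($00FC);   // U+00FC, u with diaeresis
  U := UTF8Encode(S);        // same text encoded as UTF-8
  WriteLn('SizeOf(Char) = ', SizeOf(Char));  // 2
  WriteLn('Length(S)    = ', Length(S));     // 2 UTF-16 code units
  WriteLn('Length(U)    = ', Length(U));     // 4 bytes
end.
```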

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 02:27:05 pm

Quote from: Thaddy on February 09, 2023, 01:11:25 pm

Quote from: totya on February 09, 2023, 12:56:20 pm
Delphi uses 2-byte coded chars (UTF-16).
Wrong! UTF16 has between 2 and 4 bytes.

I already know; (older) Delphi uses UTF-16 and never UTF-32.

Quote from: Thaddy on February 09, 2023, 01:11:25 pm

Quote from: totya on February 09, 2023, 12:56:20 pm
But {$mode delphi} doesn't work perfectly, so if I change, for example:
char->widechar
string->WideString
Because you use the wrong mode: you should have used {$mode delphiunicode}

This is not my fault; it isn't my package. But thanks for the info!

Let me see...
I switched $mode delphi to $mode delphiunicode in two places (tdiff and unit1).
Well, it seems to work badly!
input1: Change the text here & then compareöü
input2: Change the text here & then compareőü
The result: last character missing from the compare.
But it isn't the fault of $mode delphiunicode, because I get the same bad result with my modified code too.
But anyway, thanks for the {$mode delphiunicode} info. (I think a compiler warning is missing: $mode delphi -> warning, deprecated!)

Quote from: Thaddy on February 09, 2023, 01:11:25 pm

Also note that LCS, which is what you need for a diff, is a bytewise comparison, not a character-based comparison, and for that reason the latest TDiff is known to work with UTF-8 too.

UTF-16 and UTF-32 are fixed-size encodings, so I think it doesn't matter whether the comparison is bytewise or character-based.

---> Okay, thanks for the info, but where can I find the latest TDiff? <---

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 02:40:31 pm

Quote from: totya on February 09, 2023, 02:27:05 pm

UTF-16 and UTF-32 are fixed-size encodings,

No, UTF-16 is not fixed size.

UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)

In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).

A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)
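The code-unit counts above can be written down directly. A minimal sketch, assuming valid Unicode scalar values (at most U+10FFFF, no lone surrogates); `Utf8Bytes` and `Utf16Units` are hypothetical helper names. It counts per code point; a character made of several code points needs correspondingly more.

```pascal
program CodeUnitsDemo;

{$mode objfpc}{$H+}

// Number of UTF-8 code units (1 byte each) for one code point.
function Utf8Bytes(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $80 then
    Result := 1
  else if CodePoint < $800 then
    Result := 2
  else if CodePoint < $10000 then
    Result := 3
  else
    Result := 4;
end;

// Number of UTF-16 code units (2 bytes each) for one code point:
// 1 inside the Basic Multilingual Plane, 2 (a surrogate pair) above it.
function Utf16Units(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $10000 then
    Result := 1
  else
    Result := 2;
end;

const
  // 'A', o with diaeresis, euro sign, grinning face emoji
  Samples: array[0..3] of Cardinal = ($41, $F6, $20AC, $1F600);

var
  I: Integer;
begin
  for I := Low(Samples) to High(Samples) do
    WriteLn('U+', HexStr(Int64(Samples[I]), 5), ': UTF-8 ', Utf8Bytes(Samples[I]),
      ' byte(s), UTF-16 ', Utf16Units(Samples[I]) * 2, ' bytes');
end.
```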

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:14:00 pm

Quote from: Martin_fr on February 09, 2023, 02:40:31 pm

Quote from: totya on February 09, 2023, 02:27:05 pm
UTF-16 and UTF-32 are fixed-size encodings,

No, UTF-16 is not fixed size.
UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)
In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).
A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)

I know otherwise. For example, I hate UTF-8 because it is NOT just 1 byte long; it is variable length (1-4 bytes), so it's very complicated to handle, but I see many functions are available (e.g. LazUTF8: UTF8Pos, UTF8Copy etc.).

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:27:13 pm

Quote from: totya on February 09, 2023, 03:14:00 pm

Quote from: Martin_fr on February 09, 2023, 02:40:31 pm
Quote from: totya on February 09, 2023, 02:27:05 pm
UTF-16 and UTF-32 are fixed-size encodings,

No, UTF-16 is not fixed size.
UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)
In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).
A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)

I know otherwise,

Then you know wrong.

Quote

For example, I hate UTF-8 because it is NOT just 1 byte long; it is variable length (1-4 bytes), so it's very complicated to handle, but I see many functions are available (e.g. LazUTF8: UTF8Pos, UTF8Copy etc.).

And in UTF-16 (unlike UCS-2) you got 2 or 4 bytes.

UTF-16 has surrogates. And they are 4 bytes.

For example the following emoticons use 4 bytes in UTF-16 https://www.compart.com/en/unicode/block/U+1F600
Click on them to see the UTF-16 encoding.
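How a code point above U+FFFF becomes a 4-byte surrogate pair, as a minimal sketch (`ToSurrogates` is a hypothetical name; input assumed to be in U+10000..U+10FFFF). For U+1F600 it yields D83D DE00, which is that code point's standard UTF-16 encoding.

```pascal
program SurrogatePairDemo;

{$mode objfpc}{$H+}

// Split a supplementary code point into its UTF-16 high/low surrogates.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;               // 20-bit offset
  HighSur := Word($D800 + (V shr 10));   // top 10 bits
  LowSur := Word($DC00 + (V and $3FF));  // bottom 10 bits
end;

var
  H, L: Word;
  S: UnicodeString;
begin
  ToSurrogates($1F600, H, L);
  WriteLn(HexStr(Int64(H), 4), ' ', HexStr(Int64(L), 4));  // D83D DE00
  SetLength(S, 2);
  S[1] := WideChar(H);
  S[2] := WideChar(L);
  WriteLn('Length = ', Length(S), ' code units, ',
    Length(S) * SizeOf(WideChar), ' bytes');                // 2 code units, 4 bytes
end.
```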

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:35:16 pm

Quote from: Martin_fr on February 09, 2023, 03:27:13 pm

And in UTF-16 (unlike UCS-2) you got 2 or 4 bytes.
UTF-16 has surrogates. And they are 4 bytes.
For example the following emoticons use 4 bytes in UTF-16 https://www.compart.com/en/unicode/block/U+1F600
Click on them to see the UTF-16 encoding.

Thanks, but then you are wrong too: as you can see here, UTF-8 is not only 1 byte... in this example it's 4 bytes long.

Okay, UTF-16 is 2 or 4 bytes long. Peace. :)

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:35:38 pm

And in addition to my last post (UTF-16 having 2 or 4 bytes) there is more.

This n-bytes size refers to how a single "Codepoint" is encoded in UTF-n.

However, in Unicode itself (even affecting UTF-32) a character can have more than 1 codepoint.

Example: "ä"
https://www.compart.com/unicode/U+00E4
Can be represented either as U+00E4 (1 codepoint)
Or as U+0061 followed by U+0308 (2 codepoints)

But both are the same letter. (there are letters that can be represented in more than 2 forms).

There are also letters that have no composed (1 single codepoint) form, but are always several codepoints.
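Why this matters for a diff, in a minimal sketch: the two spellings of "ä" have different lengths and compare as unequal, so any diff that compares code units would report a change even though both show the same letter. Normalizing both texts to the same form (NFC or NFD) before diffing would avoid that; that step is not shown here.

```pascal
program ComposedVsDecomposed;

{$mode objfpc}{$H+}

var
  Composed, Decomposed: UnicodeString;
begin
  SetLength(Composed, 1);
  Composed[1] := WideChar($00E4);    // U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
  SetLength(Decomposed, 2);
  Decomposed[1] := WideChar($0061);  // U+0061 LATIN SMALL LETTER A
  Decomposed[2] := WideChar($0308);  // U+0308 COMBINING DIAERESIS
  WriteLn('composed:   ', Length(Composed), ' code unit(s)');    // 1
  WriteLn('decomposed: ', Length(Decomposed), ' code unit(s)');  // 2
  WriteLn('equal: ', Composed = Decomposed);                     // FALSE
end.
```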

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:40:22 pm

Quote from: totya on February 09, 2023, 03:35:16 pm

Thanks, but then you are wrong too: as you can see here, UTF-8 is not only 1 byte... in this example it's 4 bytes long.

You must have misread me.
I assume you refer to:

Quote from: Martin_fr on February 09, 2023, 02:40:31 pm

UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)

I did say: In UTF-8 a "CodeUnit" is 1 byte.

And that is true. A Codepoint is then represented by 1 to 4 CodeUnits.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:51:47 pm

Quote from: Martin_fr on February 09, 2023, 03:40:22 pm

You must have misread me.

Okay, code points and code units are different.

But back to the topic... Thaddy said the new TDiff is UTF-8 capable (and I hope it works better; in my last example it worked badly). Do you have any idea where I can find it?

Title: Re: Compare two text lines and highlight difference
Post by: DomingoGP on February 09, 2023, 09:59:42 pm

Quote

The result: last character missing from the compare.

@totya

I have found the problem with the last char in TDiff in the
function Execute(const s1, s2: string): boolean;

Code: Pascal

 
   //finally, append any trailing matches onto compareList ...
    with FLastCompareRec do
    begin
      AddChangeChr(oldIndex1,len1{len1Minus1}-oldIndex1, ckNone);   //<DomingoGP SOLVES BUG: string indexes are 1-based, not 0-based.
    end;

Also, I have modified the unit so the desired string type can easily be changed, and I have set it to UnicodeString by default for FPC.

If you want, you can try the provided demo with the modified TDiff2.pas. It works reasonably well for me. However, as Martin said, Unicode characters can consist of several WideChars or code points, in which case it won't work as expected. Still, it works with the most common characters, like the accented vowels áéíóú, ñ and ç.

Unicode is too complicated to cover all possible cases :-[.

The https://github.com/DomingoGP/lazIdeDiffCompareFiles component doesn't compare strings, so it is not affected by this bug.

Sorry for my bad english.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 10:45:11 pm

Quote from: DomingoGP on February 09, 2023, 09:59:42 pm

Quote
The result: last character missing from the compare.
I have found the problem with the last char in TDiff in the

Your patch seems to work, thank you!

I use the https://github.com/rickard67/TextDiff version, and with your fix the last bug I checked has disappeared. :)

But I will look at your Diff2.pas version too.

Thanks again!!! O:-)

Title: Re: Compare two text lines and highlight difference
Post by: DomingoGP on February 10, 2023, 06:40:28 pm

You are welcome!

Quote

But I will look at your Diff2.pas version too.

You don't need to; it is basically the same as https://github.com/rickard67/TextDiff with minor changes.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 10, 2023, 06:45:57 pm

Quote from: DomingoGP on February 10, 2023, 06:40:28 pm

You are welcome!

Quote
But I will look at your Diff2.pas version too.

You don't need to; it is basically the same as https://github.com/rickard67/TextDiff with minor changes.

I saw that, indeed, and thank you again.

For the simplest way to use this Delphi code, I think Thaddy's idea is the best, since no changes to variable types etc. are needed:

Quote from: Thaddy on February 09, 2023, 01:11:25 pm

Because you use the wrong mode: you should have used {$mode delphiunicode}

Title: Re: Compare two text lines and highlight difference
Post by: Phoenix on September 02, 2023, 09:02:29 pm

I noticed that the source has been improved:
https://github.com/rickard67/TextDiff
It seems to work fine.

Note: to compile it, the method header in Diff.pas must be corrected:

Code: Pascal

..
{$IFDEF FPC}
function Execute(const alist1, alist2: TIntegerList; const aDiffAlgorithm: TDiffAlgorithm): boolean; overload;
{$ELSE}
..