Lazarus

Title: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 01:21:00 pm
Hi!

Is there a simple way to compare two text lines and highlight the differences with SynEdit?

If not, where should I start?

Thanks!
Title: Re: Compare two text lines and highlight difference
Post by: KodeZwerg on February 08, 2023, 01:51:38 pm
It depends on what you mean by difference.
Show an example of what you mean.

For example:
line1: AaBbCc
line2: AabBcC

Should bBcC be highlighted in line2, or BbCc in line1, or is that not what you want at all?

Be more specific!
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 02:01:23 pm
Quote
It depends on what you mean by difference.
Show an example of what you mean.

Ordinary text lines, for example (sorry for my English):

This is a nice day, the sky is blue and wind blowing sun is rising.
This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.
Title: Re: Compare two text lines and highlight difference
Post by: Zvoni on February 08, 2023, 03:05:36 pm
Nested loops comparing words/tokens?
Title: Re: Compare two text lines and highlight difference
Post by: avk on February 08, 2023, 03:15:24 pm
Quote
...
This is a nice day, the sky is blue and wind blowing sun is rising.
This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.

Maybe you need a Diff algorithm?
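
For illustration, here is a minimal sketch of a word-level diff, assuming a plain longest-common-subsequence (LCS) table is good enough for single lines. The names WordDiff and TBoolArray are made up for this example and do not come from SynEdit or any library; real diff units are more refined.

```pascal
program WordDiffSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  TBoolArray = array of Boolean;

// Hypothetical helper: flags the words of A and B that are not part of
// one longest common subsequence of the two word lists (O(n*m) table).
procedure WordDiff(const A, B: TStringArray; out AChanged, BChanged: TBoolArray);
var
  L: array of array of Integer;  // L[i][j] = LCS length of A[i..] and B[j..]
  i, j: Integer;
begin
  SetLength(L, Length(A) + 1, Length(B) + 1);  // new cells start at 0
  for i := High(A) downto 0 do
    for j := High(B) downto 0 do
      if A[i] = B[j] then
        L[i][j] := L[i + 1][j + 1] + 1
      else if L[i + 1][j] >= L[i][j + 1] then
        L[i][j] := L[i + 1][j]
      else
        L[i][j] := L[i][j + 1];

  SetLength(AChanged, Length(A));  // new elements start as False
  SetLength(BChanged, Length(B));
  i := 0;
  j := 0;
  // Walk the table from the top-left corner and mark unmatched words.
  while (i < Length(A)) and (j < Length(B)) do
    if A[i] = B[j] then
    begin
      Inc(i); Inc(j);
    end
    else if L[i + 1][j] >= L[i][j + 1] then
    begin
      AChanged[i] := True; Inc(i);
    end
    else
    begin
      BChanged[j] := True; Inc(j);
    end;
  // Whatever is left over on either side has no partner.
  while i < Length(A) do begin AChanged[i] := True; Inc(i); end;
  while j < Length(B) do begin BChanged[j] := True; Inc(j); end;
end;

var
  S1, S2: string;
  A, B: TStringArray;
  DA, DB: TBoolArray;
  K: Integer;
begin
  S1 := 'This is a nice day, the sky is blue and wind blowing sun is rising.';
  S2 := 'This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.';
  A := S1.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  B := S2.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  WordDiff(A, B, DA, DB);
  for K := 0 to High(A) do
    if DA[K] then WriteLn('- ', A[K]);
  for K := 0 to High(B) do
    if DB[K] then WriteLn('+ ', B[K]);
end.
```

The flags are per word index, so a repeated word is still unambiguous. Turning the flagged indices into columns for highlighting is a separate step.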
Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 08, 2023, 03:35:37 pm
SynEdit doesn't have that....

The closest you currently get is the "same word highlight", which also applies to the current selection.
- Start selecting one line from its start.
- Select only a few chars.
- Wait about half a second, so the "same word highlight" is activated
  (this can be set under Tools > Options: Editor > Display > Markup and Matches > top section "Highlight all occurrences of word under caret").
- Now the start of the other line should be highlighted.
- Extend the selection until the other line loses the highlight => you are at the first diff in the line.

It's not what you want/need. Not even close....

It is a different feature, but it can be used for simple comparison ....




If you want to extend SynEdit....
1) You need your own code to find the diffs between the lines.

Once you have a list of sections that you want to highlight ( record Line, StartX, EndX: integer end; ):
2) You can write your own SynEditMarkup.
a) If you don't use any Highlighter (no Pascal or other HL) then you can use the "SynPos...Highlighter".
b) Otherwise, look at markups. Something like SynMarkupHighlightAll should be easy to modify. It already has a list....
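
A rough sketch of step 1 for the simplest case, assuming one changed span per line is enough: skip the common prefix and the common suffix, and report what is left. TDiffSection mirrors the record mentioned above; ChangedSpan, the 1-based byte columns and the exclusive EndX are assumptions of this example, not SynEdit conventions, so check what your markup expects.

```pascal
program DiffSectionSketch;
{$mode objfpc}{$H+}

type
  // One span to highlight. StartX is a 1-based byte column,
  // EndX is exclusive (both are assumptions of this sketch).
  TDiffSection = record
    Line, StartX, EndX: Integer;
  end;

// Hypothetical helper: the span of NewText that differs from OldText,
// found by skipping the common prefix and the common suffix.
function ChangedSpan(ALine: Integer; const OldText, NewText: string): TDiffSection;
var
  P, S, MaxCommon: Integer;
begin
  MaxCommon := Length(OldText);
  if Length(NewText) < MaxCommon then
    MaxCommon := Length(NewText);
  P := 0;  // length of the common prefix
  while (P < MaxCommon) and (OldText[P + 1] = NewText[P + 1]) do
    Inc(P);
  S := 0;  // length of the common suffix, not overlapping the prefix
  while (S < MaxCommon - P) and
        (OldText[Length(OldText) - S] = NewText[Length(NewText) - S]) do
    Inc(S);
  Result.Line := ALine;
  Result.StartX := P + 1;
  Result.EndX := Length(NewText) - S + 1;
end;

var
  Sec: TDiffSection;
  NewLine: string;
begin
  NewLine := 'the sky is red';
  Sec := ChangedSpan(1, 'the sky is blue', NewLine);
  WriteLn('line ', Sec.Line, ', columns ', Sec.StartX, '..', Sec.EndX - 1, ': ',
    Copy(NewLine, Sec.StartX, Sec.EndX - Sec.StartX));
end.
```

This compares bytes, so with UTF-8 text a span may start in the middle of a multi-byte character. A real diff would also return several spans per line.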
Title: Re: Compare two text lines and highlight difference
Post by: Awkward on February 08, 2023, 05:37:40 pm
... looks like this feature
https://github.com/rickard67/TextDiff
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 07:52:57 pm
Quote
... looks like this feature
https://github.com/rickard67/TextDiff

Thanks for the tip, but basicdemo2 completely freezes my machine when I click "Open". Possibly a DWARF/debugger error; I had to press reset (data loss).

basicdemo1 works. A PaintBox isn't really what I need, but it is a good starting point.

Thank you!

Edit 1: UTF-8 characters are lost with Demo1, for example: "öüó"
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 08:00:34 pm
Quote
SynEdit doesn't have that....

I meant the SynEdit components (the group name). Thank you for the many tips!
Title: Re: Compare two text lines and highlight difference
Post by: avk on February 08, 2023, 08:14:20 pm
I don't know how it's done in TextDiff, but in LGenerics it's very easy:
Code: Pascal
...
uses
  ..., lgSeqUtils;

...
procedure TForm1.Button1Click(Sender: TObject);
type
  TUtil = specialize TGSeqUtil<string, string>;
var
  s1, s2: string;
  a1, a2: TStringArray;
  LDiff: TUtil.TDiff;
begin
  s1 := 'This is a nice day, the sky is blue and wind blowing sun is rising.';
  s2 := 'This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here.';
  a1 := s1.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  a2 := s2.Split([' ', '.', ','], TStringSplitOptions.ExcludeEmpty);
  LDiff := TUtil.Diff(a1, a2);
  Memo1.Append('Deleted from s1(i.e. not present in s2):');
  for I := 0 to High(LDiff.SourceChanges) do
    if LDiff.SourceChanges[I] then
      Memo1.Append(a1[I]);
  Memo1.Append('');
  Memo1.Append('Inserted into s2(i.e. not present in s1):');
  for I := 0 to High(LDiff.TargetChanges) do
    if LDiff.TargetChanges[I] then
      Memo1.Append(a2[I]);
end;

it prints:
Code: Text
Deleted from s1(i.e. not present in s2):
blue
rising

Inserted into s2(i.e. not present in s1):
with
many
kills
red
somewhere
because
all
dark
here
Title: Re: Compare two text lines and highlight difference
Post by: KodeZwerg on February 08, 2023, 08:25:28 pm
@avk, very cool solution. One suggestion: return the results as (string, integer) pairs, so you know at which positions a change happened.
Full respect for that!  :-*
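
To sketch what such pairs could look like: the splitter below keeps the start offset of every word, so a changed word index can be mapped back to a column. TToken and SplitWithPos are hypothetical names for this illustration and are not part of LGenerics; the offsets are 1-based byte positions.

```pascal
program TokenPosSketch;
{$mode objfpc}{$H+}

type
  TToken = record
    Text: string;
    Offset: Integer;  // 1-based byte offset of the word in the line
  end;
  TTokenArray = array of TToken;

// Hypothetical helper: splits S at spaces, dots and commas and
// remembers where each word started.
function SplitWithPos(const S: string): TTokenArray;
var
  I, Start, N: Integer;
begin
  Result := nil;
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    // skip separators
    while (I <= Length(S)) and (S[I] in [' ', '.', ',']) do
      Inc(I);
    Start := I;
    // consume one word
    while (I <= Length(S)) and not (S[I] in [' ', '.', ',']) do
      Inc(I);
    if I > Start then
    begin
      SetLength(Result, N + 1);
      Result[N].Text := Copy(S, Start, I - Start);
      Result[N].Offset := Start;
      Inc(N);
    end;
  end;
end;

var
  Tokens: TTokenArray;
  K: Integer;
begin
  Tokens := SplitWithPos('sun day sun');
  for K := 0 to High(Tokens) do
    WriteLn(K, ': "', Tokens[K].Text, '" at offset ', Tokens[K].Offset);
end.
```

With the offsets at hand, index K from a changed-flags vector gives the span Offset .. Offset + Length(Text) - 1.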
Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 08, 2023, 08:33:45 pm
Btw, this may interest you: https://forum.lazarus.freepascal.org/index.php/topic,62146.msg470014.html#msg470014

It does not compare 2 lines of text. But it does compare 2 texts line by line.
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 08, 2023, 09:09:14 pm
Quote
I don't know how it's done in TextDiff, but in LGenerics it's very easy:

Thank you, I'd like to try it, but where can I find LGenerics/lgSeqUtils?

I'm currently using the official Lazarus package with FPC 3.2.2.
Title: Re: Compare two text lines and highlight difference
Post by: sketch on February 08, 2023, 09:54:47 pm
On Unix
Code:
$ cat ttt
#!/usr/bin/ksh

for ((i=1;i<3;i++))
do
  echo "Unmatched words from string ${i}"
  comm <(echo "This is a nice day, the sky is blue and wind blowing sun is rising." |tr ' ' '\n' | sed 's/\.//' |sed 's/,//' |sort) <(echo "This is a nice day with many kills, the sky is red and wind blowing sun is somewhere, because all dark here." | tr ' ' '\n' | sed 's/\.//' | sed 's/,//' |sort) | cut -f${i} |sed '/^$/d'
  echo
done
Code:
$ cat t.pp
program t;
uses Unix;
Var S : Longint;
begin
  S:=fpSystem('./ttt');
end.
Code:
$ ./t
Unmatched words from string 1
blue
rising

Unmatched words from string 2
all
because
blue
dark
here
kills
many
red
rising
somewhere
with

$
;D
Title: Re: Compare two text lines and highlight difference
Post by: avk on February 09, 2023, 05:04:56 am
Quote
...
Thank you, I'd like to try it, but where can I find LGenerics/lgSeqUtils?
...

It lives here (https://github.com/avk959/LGenerics).
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 08:50:54 am
Quote
...
Thank you, I'd like to try it, but where can I find LGenerics/lgSeqUtils?
...

It lives here (https://github.com/avk959/LGenerics).

Thank you!

I installed the trunk version, because I read the readme and I'd like to see the JSON implementation.
The declaration of the variable "I" is missing from the sample, but I added it within 5 seconds.

Well, this is a simplified solution, because it works only with whole words.
This means that if only one letter differs between two words, the word goes into both the SourceChanges and the TargetChanges lists, and I can't see exactly which letter changed.
But that's not a big problem; my priority is finding a simple solution.

The next problem: if a word is repeated, I don't know which occurrence changed, so I can't color the changes. For example:

Code: Pascal
    s1 := 'sun day sun';
    s2 := 'sunx day sun';


result:

Deleted from s1(i.e. not present in s2):
sun

Inserted into s2(i.e. not present in s1):
sunx

So thank you for this library and the sample, but it isn't a usable solution for me.
As far as I can see, TextDiff (new version, see this topic: https://forum.lazarus.freepascal.org/index.php/topic,62219.msg470413.html#msg470413) offers much better options, but it doesn't seem to support UTF-8.

Thank you again!
Title: Re: Compare two text lines and highlight difference
Post by: Roland57 on February 09, 2023, 09:08:24 am
@totya

Not sure that it will match your needs, but there is also this project: https://github.com/DomingoGP/lazIdeDiffCompareFiles

And, since we are on this topic, I would like to mention diffoscope (https://diffoscope.org) that I discovered recently.
Title: Re: Compare two text lines and highlight difference
Post by: avk on February 09, 2023, 09:45:51 am
@totya, the Boolean vector SourceChanges corresponds to the elements of the source sequence and contains True at the positions of those elements that are not included in the target sequence. That is, if SourceChanges[2] is True, the source sequence element with index 2 is not in the target sequence.
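
A small sketch of how such a flag vector can be turned into positions, so that repeated words stay distinguishable. The flags below are typed in by hand as an assumption for 'sun day sun' versus 'sunx day sun'; they are not output captured from LGenerics, and ChangedIndices is a made-up helper.

```pascal
program ChangeFlagsSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  TBoolArray = array of Boolean;
  TIntArray = array of Integer;

// Hypothetical helper: collects the indices at which Flags is True.
function ChangedIndices(const Flags: TBoolArray): TIntArray;
var
  I, N: Integer;
begin
  Result := nil;
  SetLength(Result, Length(Flags));
  N := 0;
  for I := 0 to High(Flags) do
    if Flags[I] then
    begin
      Result[N] := I;
      Inc(N);
    end;
  SetLength(Result, N);  // trim to the number of hits
end;

var
  S: string;
  Words: TStringArray;
  Flags: TBoolArray;
  Idx: TIntArray;
  K: Integer;
begin
  S := 'sun day sun';
  Words := S.Split([' ']);
  // Assumed flags: only the first word is missing from the target.
  SetLength(Flags, 3);
  Flags[0] := True;
  Idx := ChangedIndices(Flags);
  for K := 0 to High(Idx) do
    WriteLn('changed word #', Idx[K], ': ', Words[Idx[K]]);
end.
```

Printing the index next to the word shows which of the two "sun" entries is meant.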
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 09:51:19 am
Quote
@totya

Not sure that it will match your needs, but there is also this project: https://github.com/DomingoGP/lazIdeDiffCompareFiles

Thanks, this is based on diff.pas (diff2.pas), but nothing important has changed in that code (I need UTF-8 support).
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 12:36:02 pm
Quote
@totya, the vector of boolean values SourceChanges corresponds to the elements of the source sequence and contains True in those positions, the elements of which are not included in the target sequence. That is, if SourceChanges[2] is True, it means that the source sequence element with index 2 is not in the target sequence.

Thanks for this information!
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 12:56:20 pm
Quote
... looks like this feature
https://github.com/rickard67/TextDiff

This is the best solution, as far as I can see.

I wrote that it isn't UTF-8 compatible; for example, characters such as "őóú" are lost in the comparison.

But I've been thinking. TDiff is a Delphi unit, with {$mode delphi}. Delphi uses 2 byte coded chars (UTF-16). But {$mode delphi} does not work perfectly, so if I modify, for example:

char->widechar
string->WideString

the comparison works.

Title: Re: Compare two text lines and highlight difference
Post by: Thaddy on February 09, 2023, 01:11:25 pm
Quote
Delphi uses 2 byte coded chars (UTF-16).
Wrong! UTF-16 uses 2 or 4 bytes per code point.
Quote
But {$mode delphi} does not work perfectly, so if I modify, for example:

char->widechar
string->WideString
Because you use the wrong mode: you should have used {$mode delphiunicode}

Also note that LCS - what you need for a diff - is a bytewise comparison, not a character-based comparison, and for that reason the latest TDiff is known to work with UTF-8 too.
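
A tiny check of what the mode switch changes, as far as I know it: under {$mode delphiunicode} the plain Char and string types become 2-byte (UTF-16) based, which is why no manual char->widechar edits are needed. The program name is made up for this sketch.

```pascal
program ModeSketch;
{$mode delphiunicode}

var
  S: string;  // a UnicodeString in this mode
begin
  S := 'abc';
  WriteLn('SizeOf(Char) = ', SizeOf(Char));
  WriteLn('bytes per string element = ', SizeOf(S[1]));
end.
```

Compiling the same program with {$mode delphi} instead should report 1 for both values.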
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 02:27:05 pm
Quote
Delphi uses 2 byte coded chars (UTF-16).
Wrong! UTF-16 uses 2 or 4 bytes per code point.

I already know that; (older) Delphi uses UTF-16 and never uses UTF-32.

Quote
But {$mode delphi} does not work perfectly, so if I modify, for example:
char->widechar
string->WideString
Because you use the wrong mode: you should have used {$mode delphiunicode}

That is not my fault; this isn't my package. But thanks for the info!

Let me see...
I changed $mode delphi to $mode delphiunicode in two places (tdiff and unit1).
Well, it seems to me that it works badly!
input1: Change the text here & then compareöü
input2: Change the text here & then compareőü
The result: the last character is missing from the comparison.
But that isn't the fault of $mode delphiunicode, because I get the same bad result with my modified code too.
But anyway, thanks for the {$mode delphiunicode} info. (I think a compiler warning is missing: $mode delphi -> warning, deprecated!)

Quote
Also note that LCS - what you need for a diff - is a bytewise comparison, not a character-based comparison, and for that reason the latest TDiff is known to work with UTF-8 too.

UTF-16 and UTF-32 are fixed-size encodings, so I think it doesn't matter whether the comparison is bytewise or character-based.

---> Okay, thanks for the info, but where can I find the latest TDiff? <---

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 02:40:31 pm
Quote
UTF-16 and UTF-32 is fixed size code,

No, UTF-16 is not fixed size.

UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)

In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).
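
The code unit counts can be checked with a short program. This is only a sketch: FromUnits and Show are helper names invented here, and the strings are built from raw 16-bit values so the result does not depend on the source file encoding.

```pascal
program CodeUnitSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: builds a UTF-16 string from raw 16-bit code units.
function FromUnits(const Units: array of Word): UnicodeString;
var
  I: Integer;
begin
  Result := '';
  SetLength(Result, Length(Units));
  for I := 0 to High(Units) do
    Result[I + 1] := WideChar(Units[I]);
end;

// Prints how many code units one codepoint needs in UTF-16 and in UTF-8.
procedure Show(const Name: string; const U: UnicodeString);
var
  S8: UTF8String;
begin
  S8 := UTF8Encode(U);
  WriteLn(Name, ': ', Length(U), ' UTF-16 code unit(s), ',
    Length(S8), ' UTF-8 code unit(s)');
end;

begin
  Show('U+0041', FromUnits([$0041]));          // "A"
  Show('U+00E4', FromUnits([$00E4]));          // "ä"
  Show('U+1F600', FromUnits([$D83D, $DE00]));  // emoticon, surrogate pair
end.
```

Length counts code units in both cases, not codepoints and not "characters".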

A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:14:00 pm
Quote
UTF-16 and UTF-32 is fixed size code,

No, UTF-16 is not fixed size.
UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)
In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).
A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)

I know otherwise; for example, I hate UTF-8 because it is NOT only 1 byte long, it is variable length (1-4 bytes), so it is very complicated to handle, but I see many functions are available (e.g. LazUTF8: UTF8Pos, UTF8Copy, etc.).
Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:27:13 pm
Quote
UTF-16 and UTF-32 is fixed size code,

No, UTF-16 is not fixed size.
UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)
In UTF-16: A Unicode codepoint can be represented by 1 or 2 CodeUnits (2 or 4 bytes).
A "character" can be either a single codepoint, or a combination of several codepoints. (That applies to Unicode itself, so that is the case for UTF-8, UTF-16 and UTF-32 and any other transfer encoding)

I know otherwise,
Then you know wrong.

Quote
for example, I hate UTF-8 because it is NOT only 1 byte long, it is variable length (1-4 bytes), so it is very complicated to handle, but I see many functions are available (e.g. LazUTF8: UTF8Pos, UTF8Copy, etc.).

And in UTF-16 (unlike UCS-2) you get 2 or 4 bytes.

UTF-16 has surrogate pairs, and a pair takes 4 bytes.

For example, the following emoticons use 4 bytes in UTF-16: https://www.compart.com/en/unicode/block/U+1F600
Click them to see the UTF-16 encoding.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:35:16 pm
Quote
And in UTF-16 (unlike UCS-2) you get 2 or 4 bytes.
UTF-16 has surrogate pairs, and a pair takes 4 bytes.
For example, the following emoticons use 4 bytes in UTF-16: https://www.compart.com/en/unicode/block/U+1F600
Click them to see the UTF-16 encoding.

Thanks, but you know wrong as well, and you can see here that UTF-8 is not only 1 byte... in this example it is 4 bytes long.

Okay, UTF-16 is 2 or 4 bytes long. Peace. :)
Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:35:38 pm
And in addition to my last post (UTF-16 having 2 or 4 bytes) there is more.

This n-bytes size refers to how a single "Codepoint" is encoded in UTF-n.

However, in Unicode itself (even affecting UTF-32) a character can have more than 1 codepoint.

Example: "ä"
https://www.compart.com/unicode/U+00E4
Can be represented either as  U+00E4  (1 codepoint)
Or as  U+0061 followed by U+0308  (2 codepoints)

But both are the same letter. (There are letters that can be represented in more than 2 forms.)

There are also letters that have no composed (1 single codepoint) form, but are always several codepoints.
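
A minimal sketch of why this matters for a diff: the two forms of "ä" are different sequences, so a plain comparison treats them as different text. FromUnits is a helper invented for this example; normalizing both lines before comparing would be one way around it, which is not shown here.

```pascal
program ComposedSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: builds a UTF-16 string from raw 16-bit code units.
function FromUnits(const Units: array of Word): UnicodeString;
var
  I: Integer;
begin
  Result := '';
  SetLength(Result, Length(Units));
  for I := 0 to High(Units) do
    Result[I + 1] := WideChar(Units[I]);
end;

var
  Composed, Decomposed: UnicodeString;
begin
  Composed := FromUnits([$00E4]);           // "ä" as one codepoint
  Decomposed := FromUnits([$0061, $0308]);  // "a" + combining diaeresis
  WriteLn('codepoints: ', Length(Composed), ' vs ', Length(Decomposed));
  WriteLn('equal by plain comparison: ', Composed = Decomposed);
end.
```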

Title: Re: Compare two text lines and highlight difference
Post by: Martin_fr on February 09, 2023, 03:40:22 pm
Quote
Thanks, but you know wrong as well, and you can see here that UTF-8 is not only 1 byte... in this example it is 4 bytes long.

You must have misread me.
I assume you refer to:
Quote
UTF-16 has a CodeUnit size of 2 bytes. (UTF-8 has 1 byte, and UTF-32 has 4.)

I did say: In UTF-8 a "CodeUnit" is 1 byte.

And that is true. A Codepoint is then represented by 1 to 4 CodeUnits.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 03:51:47 pm
Quote
You must have misread me.

Okay, code points and code units are different things.

But back to the topic... Thaddy said the new TDiff is UTF-8 capable (and I hope it works better; in the last example it works badly). Do you have any idea where I can find it?
Title: Re: Compare two text lines and highlight difference
Post by: DomingoGP on February 09, 2023, 09:59:42 pm
Quote
The result: last character missing from the compare.

@totya

I have found the problem with the last char in TDiff, in the function Execute(const s1, s2: string): boolean;

Code: Pascal
    //finally, append any trailing matches onto compareList ...
    with FLastCompareRec do
    begin
      AddChangeChr(oldIndex1,len1{len1Minus1}-oldIndex1, ckNone);   //<DomingoGP SOLVES BUG: string indexes are 1-based, not 0-based.
    end;
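
The effect of the off-by-one can be shown without TDiff. This is only a sketch under the assumption that a counter holds the number of characters already handled; how oldIndex1 is really maintained inside TDiff is not checked here, and Consumed is a made-up variable.

```pascal
program TrailingRunSketch;
{$mode objfpc}{$H+}

var
  S: string;
  Consumed: Integer;
begin
  S := 'compare';
  Consumed := 4;  // hypothetical: 'comp' has already been handled
  // Correct count of the trailing run: Length - Consumed
  WriteLn(Copy(S, Consumed + 1, Length(S) - Consumed));
  // One too few: the last character is dropped
  WriteLn(Copy(S, Consumed + 1, Length(S) - 1 - Consumed));
end.
```

The second call leaves out the final character, which matches the symptom reported earlier in the thread.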


I have also modified the unit so that the desired string type can be changed easily, and I have set it to UnicodeString by default for FPC.

If you want, you can try the demo provided, with the modified TDiff2.pas. It works reasonably well for me. Anyway, as Martin said, Unicode characters can consist of several WideChars or codepoints, in which case it won't work as expected. Apart from that, it works with the most common characters, such as the accented vowels áéíóú, ñ and ç.

Unicode is too complicated to cover all possible cases  :-[.

The https://github.com/DomingoGP/lazIdeDiffCompareFiles component doesn't compare strings, so it is not affected by this bug.

Sorry for my bad English.

Title: Re: Compare two text lines and highlight difference
Post by: totya on February 09, 2023, 10:45:11 pm
Quote
The result: last character missing from the compare.
I have found the problem with the last char in TDiff in the

Your patch seems to work, thank you!

I use this version: https://github.com/rickard67/TextDiff, and with your suggestion the last bug I found has disappeared. :)

But I will look at your Diff2.pas version too.

Thanks again!!!  O:-)
Title: Re: Compare two text lines and highlight difference
Post by: DomingoGP on February 10, 2023, 06:40:28 pm
You are welcome!

Quote
But I will look your Diff2.pas version too.

You don't need to; it is basically the same as https://github.com/rickard67/TextDiff with minor changes.
Title: Re: Compare two text lines and highlight difference
Post by: totya on February 10, 2023, 06:45:57 pm
Quote
You are welcome!

Quote
But I will look at your Diff2.pas version too.

You don't need to; it is basically the same as https://github.com/rickard67/TextDiff with minor changes.

I saw that, indeed, and thank you again.

As for the simplest way to use this Delphi code, I think Thaddy's idea is the best, since no modified variable types etc. are needed:

Quote
Because you use the wrong mode: you should have used {$mode delphiunicode}
Title: Re: Compare two text lines and highlight difference
Post by: Phoenix on September 02, 2023, 09:02:29 pm
I noticed that the source has been improved
https://github.com/rickard67/TextDiff
it seems to work fine.

Note: the method header must be corrected to compile it
Diff.pas
Code: Pascal
..
{$IFDEF FPC}
function Execute(const alist1, alist2: TIntegerList; const aDiffAlgorithm: TDiffAlgorithm): boolean; overload;
{$ELSE}
..