Programming => Databases => Topic started by: Gizmo on April 22, 2021, 08:36:44 pm

Title: [SOLVED] Mind bending SQL problem with UPDATE function
Post by: Gizmo on April 22, 2021, 08:36:44 pm

I am really struggling with something that I hope is easy for someone to spot.

I have a program that compares folders of files by hash.

It finds all the files in FolderA and adds those names to a stringlist.
It finds all the files in FolderB and adds those names to a second stringlist.

It then hashes all the files in FolderA, using the first stringlist as a direction, and adds the filename and hash value to Columns 1 and 2 of my database using the UPDATE SQL syntax.
It then hashes all the files in FolderB, using the second stringlist as direction, and adds the filename and hash value to Columns 3 and 4 of my database also using the UPDATE SQL syntax.

If all the files match, and critically if the count matches, all is good. And even the files do not match but their count does match, all is good too.

But imagine where the file that was found in FolderA was NOT in FolderB at all, or visa versa. No entry gets put into Columns 3 and 4 for that missing file. So there are two empty cells; one in Col3 and one in Col4 (or rather, there needs to be two empty cells to represent the fact it was not there).

So, my problem is that when the next file is found in FolderB, UPDATE adds the details of that file to Col 3 and Col 4 but using the last empty cell so it goes into the empty cell of the preceeding entry where no file was found for its corresponding FolderA file, instead of adding the data to the next row. And then they all get out of line like a waterfall effect.

It is hard to explain and I hope the screenshot attached illustrates? You'll see the few at the top were all matching, then one is missing, and from there they all go out of whack. 

As you will see, 29.raw starts with a hash of 9B31...7349 and its copy was found in FolderB. As did all the files before match. But then comes 3-Copy.raw which IS NOT in FolderB (I created these folders by copying and pasting and then I removed one from the 2nd folder and I also edited one file to make its hash differ). But instead of it being left empty, the details for 3.raw have been inserted into Column 3 and 4. Then 3.raw is listed again on the next row, , and the whole mis-alignment trickles down. So from that point on, it's all wrong.

So, the code I have that populates the database and the 4 columns is as follows. Basically, first it works out what the highest number of rows are that are needed, and inserts those rows. Then it uses two procedures that call UPDATE to populate those rows with the data.

Can anyone happen to see how I can correct this?

Code: Pascal  [Select][+][-]
  2. //Populate database with all rows required for the biggest of the two folders
  3.         // If they match, just use the FolderA file count
  4.         if FolderAFileCount > FolderBFileCount then
  5.         begin
  6.         frmSQLiteDBases.Write_INSERT_All_Rows_Required(FolderAFileCount);
  7.         end
  8.           else if FolderBFileCount > FolderAFileCount then
  9.           begin
  10.           frmSQLiteDBases.Write_INSERT_All_Rows_Required(FolderBFileCount);
  11.           end
  12.             else if FolderAFileCount = FolderBFileCount then
  13.             begin
  14.             frmSQLiteDBases.Write_INSERT_All_Rows_Required(FolderAFileCount);
  15.             end;    
  16. ...
  18. // Populate the database table with empty rows first, to then be populated using UPDATE
  19. procedure TfrmSQLiteDBases.Write_INSERT_All_Rows_Required(RowCount : integer);
  20. var
  21.   i : integer;
  22. begin
  23.   for i:= 0 to RowCount -1 do
  24.     begin
  25.       try
  26.         sqlCOMPARETWOFOLDERS.Close;
  27.         sqlCOMPARETWOFOLDERS.SQL.Text := 'INSERT into TBL_COMPARE_TWO_FOLDERS (FolderAndFileNameA, FolderAndFileNameAHash, FolderAndFileNameB, FolderAndFileNameBHash) values (:FolderAFilename,:FolderAFileHash,:FolderBFilename,:FolderBFileHash)';
  28.         SQLTransaction1.Active := True;
  29.         sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAFilename').AsString := '';
  30.         sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAFileHash').AsString := '';
  31.         sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderBFilename').AsString := '';
  32.         sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderBFileHash').AsString := '';
  33.         sqlCOMPARETWOFOLDERS.ExecSQL;
  34.       except
  35.           on E: EDatabaseError do
  36.           begin
  37.             MessageDlg('Error','Unable to create all rows required for comparison. Technical error message: ' + E.Message,mtError,[mbOK],0);
  38.           end;
  39.       end;
  40.     end;
  41. end;        
  43. procedure TfrmSQLiteDBases.Write_COMPARE_TWO_FOLDERS_FolderA(Col1, Col2 : string; Counter : integer);
  44. { Col1 : Filename from Folder A
  45.   Col2 : Hashvalue from FileA in FolderA
  46. }
  47. begin
  48.   try
  49.     sqlCOMPARETWOFOLDERS.Close;      
  50.     sqlCOMPARETWOFOLDERS.SQL.Text := 'UPDATE TBL_COMPARE_TWO_FOLDERS SET FolderAndFileNameA =:FolderAFilename, FolderAndFileNameAHash =:FolderAFileHash WHERE Id=:Counter';
  51.     SQLTransaction1.Active := True;
  52.     sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAFilename').AsString := Col1;
  53.     sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAFileHash').AsString := Col2;
  54.     sqlCOMPARETWOFOLDERS.Params.ParamByName('Counter').AsString := IntToStr(Counter);
  57.   except
  58.       on E: EDatabaseError do
  59.       begin
  60.         MessageDlg('Error','A database error has occurred. Technical error message: ' + E.Message,mtError,[mbOK],0);
  61.       end;
  62.   end;
  63. end;
  65. procedure TfrmSQLiteDBases.Write_COMPARE_TWO_FOLDERS_FolderB(Col3, Col4 : string; Counter : integer);
  66. { Col3 : Filename from Folder B
  67.   Col4 : Hashvalue from FileB in FolderB
  68. }
  69. begin
  70.   try
  71.     sqlCOMPARETWOFOLDERS.Close;      
  72.     sqlCOMPARETWOFOLDERS.SQL.Text := 'UPDATE TBL_COMPARE_TWO_FOLDERS SET FolderAndFileNameB =:FolderBFilename, FolderAndFileNameBHash =:FolderBFileHash WHERE Id=:Counter';
  73.     SQLTransaction1.Active := True;
  74.     sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderBFilename').AsString := Col3;
  75.     sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderBFileHash').AsString := Col4;
  76.     sqlCOMPARETWOFOLDERS.Params.ParamByName('Counter').AsString         := IntToStr(Counter);
  79.   except
  80.       on E: EDatabaseError do
  81.       begin
  82.         MessageDlg('Error','A database error has occurred. Technical error message: ' + E.Message,mtError,[mbOK],0);
  83.       end;
  84.   end;
  85. end;  
Title: Re: Mind bending SQL problem with UPDATE function
Post by: engkin on April 22, 2021, 10:08:23 pm
I think the table should have three fields only: NameA, NameB, and Hash
when two files have identical hash, you record the three values.
when one file is not matched, you record its name and hash and leave the other name empty.

You build the two lists first with the hashes, then you write the results. When you write an entry from the first list, you search the second list for a matching hash. If you found a matching hash* you take the names and the hash as values for the table fields, and you delete both entries from the two lists.

If no matching entry was found, take one name and the hash and record it in its appropriate fields, and delete the entry from the list

It does not matter which list you begin with, when done with the first list you switch to the second list. All its entries, at this point, are files with no match.

When filling the table, you just need one go using INSERT.

* You should try to match the names AND the hashes first, if failed, then try the hash alone. This is to account for more than one copy of the same file in the same folder.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: dseligo on April 22, 2021, 10:39:05 pm
You didn't say what database you use, I will write you for MySQL (MariaDB) (edit: I saw later that you use SQlite).
I would organize database table little different: you don't need two columns for FolderAndFileName (since you want the same filename in one row), so the table would look like:
Code: MySQL  [Select][+][-]
  1. CREATE TABLE `tbl_compare_two_folders` (
  3.         FolderAndFileName VARCHAR(500) NOT NULL,
  4.         FolderAndFileNameAHash VARCHAR(36) NULL,
  5.         FolderAndFileNameBHash VARCHAR(36) NULL,
  6.         PRIMARY KEY (ID),
  7.         UNIQUE INDEX FolderAndFileName (FolderAndFileName));

If you will insert names and hashes into database, I don't see point into filling stringlist.

Now you go through folder 'A', read files calculate hash and call InsertHash procedure:
Code: Pascal  [Select][+][-]
  1.   ... FindFirst(...)...
  2.   repeat
  3.     sHash:=YourCalculateHashFunction(...);
  4.     InsertHash(FileName,sHash,'FolderAndFileNameAHash'); // for 'B' folder use FolderAndFileNameBHash column name
  5.   until FindNext(...)<>0;
  6.   FindClose(...);

You repeat this for folder 'B', just use different column name.

Code: Pascal  [Select][+][-]
  1. procedure InsertHash(sFileName,sHash,sColumnName:String);
  2. begin
  4.   sqlCOMPARETWOFOLDERS.SQL.Text := 'INSERT INTO tbl_compare_two_folders (FolderAndFileName, '+sColumnName+') VALUES (:FolderAndFileName, :hash) ON CONFLICT(FolderAndFileName) DO UPDATE SET '+sColumnName+' = :hash;';
  5.   SQLTransaction1.Active := True;
  6.   sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAndFileName').AsString := sFileName;
  7.   sqlCOMPARETWOFOLDERS.Params.ParamByName('FolderAFileHash').AsString := sHash;
  9. //   SQLTransaction1.Active := False; // don't you need this? I am not sure, I don't work with SQlite much
  10. end;

If you want to check files that exists in both folders you use:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders
  2. where FolderAndFileNameAHash is not null
  3. and FolderAndFileNameBHash is not null

If you want to check files that exists in both folders but not the same you use:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders
  2. where FolderAndFileNameAHash<>FolderAndFileNameBHash

If you want to check files that don't exists in folder A you use:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders
  2. where FolderAndFileNameAHash is null

And if you want to check files that don't exists in folder B you use:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders
  2. where FolderAndFileNameBHash is null
Title: Re: Mind bending SQL problem with UPDATE function
Post by: dseligo on April 22, 2021, 10:48:07 pm
Oh, and if you wan't to check if there exists different file names with same hash you could use something like this:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders a, tbl_compare_two_folders b
  2. where a.FolderAndFileName<>b.FolderAndFileName
  3. and (
  4.   a.FolderAndFileNameAHash=b.FolderAndFileNameAHash
  5. or a.FolderAndFileNameAHash=b.FolderAndFileNameBHash
  6. or a.FolderAndFileNameBHash=b.FolderAndFileNameAHash
  7. or a.FolderAndFileNameBHash=b.FolderAndFileNameBHash
  8. )
Title: Re: Mind bending SQL problem with UPDATE function
Post by: engkin on April 22, 2021, 11:12:58 pm
@dseligo, beautiful approach. How do you go about filling dbgrid[?] as in the attached image in the first post. I think that's his goal.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: dseligo on April 23, 2021, 12:07:13 am
I wrote couple of SQL selects so he can use them to fill DBGrid.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: engkin on April 23, 2021, 12:24:00 am
I just saw the last one, awesome.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: Gizmo on April 23, 2021, 10:00:19 am
Gents, I just wanted to quickly acknowledge and than you for the hero contributions. I didn't think anyone would even try with this one, so thanks a million. I am going to try and look tonight and over the weekend and yes, perhaps I just need to rethink the entire thing. I was hoping to be able to quickly fix it but on reflection I think it needs a complete overhaul as you have described. Thanks
Title: Re: Mind bending SQL problem with UPDATE function
Post by: dseligo on April 23, 2021, 01:57:32 pm
This isn't very different from what you have now, to me it looks simpler.

Oh and one more thing: in your original table design and in my solution in column(s) FolderAndFileName you should put only file name - without path, because it's hard to search and compare file name if it's together with path. If you need to store path (if the paths will change in time) then it would be better to store paths in separate columns.

I don't know what your goal is with your program: if main purpose is comparing two folders then this will work, but if it will be involved more folders then maybe structure like this would be more appropriate and it would be more flexible if table looks like this:
Code: MySQL  [Select][+][-]
  1. CREATE TABLE tbl_compare_two_folders (
  2. FileName VARCHAR(500) NOT NULL,
  3. Folder VARCHAR(500) NOT NULL,
  4. UNIQUE INDEX FolderFileName (Folder,FileName));

Then your select would look like this:
Code: MySQL  [Select][+][-]
  1. from tbl_compare_two_folders a
  2.   left join tbl_compare_two_folders b on a.filename=b.filename and b.folder=:folder_b
  3. where a.folder=:folder_a
  4. from tbl_compare_two_folders a
  5.   right join tbl_compare_two_folders b on b.filename=a.filename and a.folder=:folder_a
  6. where b.folder=:folder_b

Above SQL will return all files from two folders which you set in params 'folder_a' and 'folder_b'.
You can then add in where clause additional conditions if you want to see which files doesn't exists in folder, or check hashes.

I didn't test this so there is possibility that some errors slipped in. This is from mySQL, so I hope joins and union will work the same in SQlite.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: Gizmo on April 23, 2021, 10:31:13 pm
Hi Dseligo

I've spent the last couple of evening having a go with this, and perhaps given your appetite to help you'll allow me to add some flesh to the bones.

1) The reason I use stringlists is two fold. a) It allows me, if I use FindAllFiles first, to generate a figure of how many files are in FolderA and FolderB, and b) it provides me the filename and path all on one string, which my hash function needs as it expects the path to a file, and the filename. So for that reason, FindFirst and FindNext I don't think are quite the ticket (I did start to try but quickly realised that although Filename is easily accessible, file path, especially once you dig into sub-folders of the selected top level folder, becomes tricky\messy.

2) What I am trying to achieve is based not so much on filename, but file hash,, and allowing the user to know that the file CONTENT of two folder are the same, or not. So FolderA could have 3 files in it called A.txt, B.txt and C.txt. FolderB could have copies of the exact same three files, but be called D.txt, E.txt and F.txt. In that case, I need it to report a match based on hash and filecount even if the filenames differ.   

3) And so herein is the problem. As you saw from post 1, it works fine using sorted stringlists when the filecount is the same. Everything is kept in line. But if one folder contains less than the other, and because I need to have one DBGrid with the content of FOlderA on one side and the content of FolderB on the other, I am having to use UPDATE in SQL, instead of INSERT. INSERT of course adds rows. UPDATE updates existing rows. But if a file is missing, and the cells are empty, then UPDATE places the next entry in that. What I need to do is somehow mark it as blank. That is what I was hoping to do because aside from this one issue, the rest of it works perfectly. 

I have attempted to implement some of your suggestions but, as you have just illustrated, if I stick with stringlists, and I stick with two InsertHash routines, it's not a million miles away from what I have now I think, so I'm not quite sure whether to try again or keep pursuing. Your implementation is better of course, but I don't understand all of it and so far I have only got as far as showing the results of of all the files from both FolderA and FolderB all in one column, with hashes for FolderA in Column2, and then hashes for FolderB in Column 3.

It's all making my head hurt to be honest. So much so I am thinking of just discarding the functionality. The only probelm is a lot of people use it and like it (at least for instances where the filecounts are the same). I need to think outside the box somehow. You have two folders. One has less files than the other, but you don't know which file(s) are missing until you have itterated both folders and hashed all the files. How to I display that, in a DBGrid? That is my challenge.

By the way, if you want to see this issue for yourself, go to and run the program (full code of latest branch not including your suggestion yet  is Create two folders, with matching files first, and then remove one from FolderB. Use the 'Compare Two Folders' tab and it might make more sense to you. 
Title: Re: Mind bending SQL problem with UPDATE function
Post by: dseligo on April 24, 2021, 12:20:39 am
I downloaded and successfully compiled your program. I'll take a look at it and make a suggestion to you. But it will take a while.

One question: if in folder A you have one file with same hash as two files in folder B, how do you want to display this?

Code: Text  [Select][+][-]
  1. Folder A    Hash A  Folder B    Hash B
  2. ------------------------------------------------------
  3. file01      111111  file01      111111
  4.                     file02      111111

Same question is for empty files: if in one folder you have 3 empty files and in the other 2 empty files, how do you want to show this?

And I have found bug btnCompareClick method. You create TStringList to slFileListA and then you assign to slFileListA new string list with RetrieveFileList function:
Code: Pascal  [Select][+][-]
  1.       // slFileListA := TStringList.Create; // this will cause memory leak
  2.       // slFileListA.Sorted := true;
  3.       slFileListA := RetrieveFileList(FolderA);

Same goes for sFileListB.
Title: Re: Mind bending SQL problem with UPDATE function
Post by: Gizmo on April 24, 2021, 12:33:48 am
Oh man...that is so kind of you to help out. I really appreciate it. I'll be sure to give you a mention in the release notes when we're done!

Re Q1. As you have shown I think. It doesn't matter if multiple files have the same hash, as long as the reported hash is correct.

Re Q2. QH will not hash empty files. Instead it adds "ZERO BYTE FILE" as the 'hash string', because if you attempt to hash a zero byte file you get an initialisation hash which can be (and was until I changed it) misleading.

Point 3 : You are of course correct. I think the reason I did that way was because the lists needed to be sorted, and I read somewhere it was faster to specify the .sorted parameter before data was added, then using .sort of an unsorted structure. But you're right because in fact a new list is returned by that function from FindAllFilesEx! I'll need to correct that. UPDATE: I've just commented those two lines out for each list and it works as it should so I've pushed that to the v3.3.0 GitHub branch so you may need to git pull again to stay current. And if you need the Windows SQLite DLL's, they are included in the download for both 32 and 64 bit platforms from the website link (not in Github) and work fine though I have compiled new SQLite DLL's just the other evening based on the latest release of SQLite to accompany v3.3.0 but I've not got them onto the website yet as haven't put them in GitHub as they are compiled DLLs and not code.

Thanks again for agreeing to help me out here. I'm feeling really rather stuck to be honest, and am fearful of changing things and then never being able to get it right again!
TinyPortal © 2005-2018