
Author Topic: Scanning files and finding the common ones  (Read 6769 times)

vaggos22

  • Newbie
  • Posts: 5
Scanning files and finding the common ones
« on: April 27, 2016, 11:24:03 am »
Hi, I'm not very good at programming, but I have a small project and I believe I can finish it with some help.
I need to write a program that scans two drives, for example C: and F:, and checks whether the same file exists on both drives under a different path, so I can delete the copy. I need two buttons to choose the drives to scan, and a box below them that lists the common files.


Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Scanning files and finding the common ones
« Reply #1 on: April 27, 2016, 11:51:11 am »
Do you need such a program for real life? If so there are several free programs out there that can do that reliably.

You say you are not very good at programming?
How far have you gotten with this project?
What works and what does not?
Do you have a general idea of how the task may be achieved? Can you come up with a step-by-step approach (just describe it in words, no actual code needed at this point)? In other words, did you come up with some sort of algorithm?

We'll be glad to help you, but please show us what you've got so far first.

(By the way: is this a homework assignment?)

Bart

vaggos22

  • Newbie
  • Posts: 5
Re: Scanning files and finding the common ones
« Reply #2 on: April 27, 2016, 12:05:41 pm »
Which programs are you referring to? And no, it's not homework: I'm doing my practical training at a company and my boss asked me to do this for him. So far I've added some edit buttons, and they work: I can choose which directory I want. Now I need to work out how to build the table that shows the files common to both drives.

balazsszekely

  • Guest
Re: Scanning files and finding the common ones
« Reply #3 on: April 27, 2016, 05:16:30 pm »
@vaggos22
Total Commander has a "Synchronize Directories" feature, which is perfectly suited for the job. On the other hand, if you want to learn Pascal, this is a good opportunity. I recommend writing a recursive function that searches for files on a specific drive. Move the function to a worker thread for better performance. If you have questions, feel free to ask!

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Scanning files and finding the common ones
« Reply #4 on: April 27, 2016, 05:22:49 pm »
Quote from Bart: "Do you need such a program for real life? If so there are several free programs out there that can do that reliably."
That would be most interesting.

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Scanning files and finding the common ones
« Reply #5 on: April 27, 2016, 11:42:26 pm »
Google search term: deduplicate files

Bart

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11458
  • FPC developer.
Re: Scanning files and finding the common ones
« Reply #6 on: April 27, 2016, 11:51:39 pm »
The algorithm is simple:

1. Scan both drives for files.
2. When you find a file during (1), put it in a list grouped by size.
3. When all files are scanned, go through the list of sizes and see whether each size category needs further investigation.

If you just want duplicate files, that is simply (files in category > 1), but if you want to ignore duplicates on the same drive, you have to investigate some more.

4. Calculate MD5 sums (or some other hash) for all files that need to be investigated. (If you have relatively large files, you might want to hash at most 1 MB or so of each file.)

5. If the MD5 sums match, do a whole-file compare.

All files that pass that are duplicates. The hash step is mostly useful if you want to store the state (to update it, or to repeat a later step).
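
The steps above could be sketched roughly as follows. This is only an illustration, not marcov's code: the names SizeOfFile, SameContent, FindDuplicates and WriteDemoFile are made up for the example, it compares every pair of files instead of keeping a real per-size list, and it re-hashes files instead of caching the sums. It uses the md5 unit that ships with FPC.

```pascal
program DupSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, md5; // md5 ships with FPC

// Size in bytes of an existing, readable file.
function SizeOfFile(const FileName: String): Int64;
var
  S: TFileStream;
begin
  S := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := S.Size;
  finally
    S.Free;
  end;
end;

// Step 5: whole-file compare, reading both files in small chunks.
function SameContent(const A, B: String): Boolean;
var
  SA, SB: TFileStream;
  BufA, BufB: array[0..4095] of Byte;
  GotA, GotB: LongInt;
begin
  Result := False;
  SA := TFileStream.Create(A, fmOpenRead or fmShareDenyWrite);
  try
    SB := TFileStream.Create(B, fmOpenRead or fmShareDenyWrite);
    try
      repeat
        GotA := SA.Read(BufA, SizeOf(BufA));
        GotB := SB.Read(BufB, SizeOf(BufB));
        if (GotA <> GotB) or not CompareMem(@BufA, @BufB, GotA) then
          Exit; // the finally blocks still run
      until GotA = 0;
      Result := True;
    finally
      SB.Free;
    end;
  finally
    SA.Free;
  end;
end;

// Steps 2 to 5 for paths that were already collected in step 1.
procedure FindDuplicates(Files, Report: TStrings);
var
  Sizes: array of Int64;
  I, J: Integer;
begin
  SetLength(Sizes, Files.Count);
  for I := 0 to Files.Count - 1 do
    Sizes[I] := SizeOfFile(Files[I]);
  for I := 0 to Files.Count - 2 do
    for J := I + 1 to Files.Count - 1 do
      // Cheap checks first: size, then hash, then the full compare.
      if (Sizes[I] = Sizes[J])
        and MD5Match(MD5File(Files[I]), MD5File(Files[J]))
        and SameContent(Files[I], Files[J]) then
        Report.Add(Files[I] + ' = ' + Files[J]);
end;

// Demo helper (hypothetical): writes Content to FileName.
procedure WriteDemoFile(const FileName, Content: String);
var
  S: TFileStream;
begin
  S := TFileStream.Create(FileName, fmCreate);
  try
    S.WriteBuffer(Content[1], Length(Content));
  finally
    S.Free;
  end;
end;

var
  Files, Report: TStringList;
  Dir: String;
begin
  Dir := IncludeTrailingPathDelimiter(GetTempDir);
  Files := TStringList.Create;
  Report := TStringList.Create;
  try
    Files.Add(Dir + 'dup_a.txt');
    Files.Add(Dir + 'dup_b.txt');
    Files.Add(Dir + 'dup_c.txt');
    WriteDemoFile(Files[0], 'hello');
    WriteDemoFile(Files[1], 'hello'); // same content as dup_a.txt
    WriteDemoFile(Files[2], 'world'); // same size, different content
    FindDuplicates(Files, Report);
    Write(Report.Text); // lists the dup_a.txt / dup_b.txt pair
    DeleteFile(Files[0]); DeleteFile(Files[1]); DeleteFile(Files[2]);
  finally
    Report.Free;
    Files.Free;
  end;
end.
```

To ignore duplicates on the same drive, one option would be to skip pairs where ExtractFileDrive returns the same value for both paths. For large file sets, sorting by size and caching each file's hash would probably be worth the extra code.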

cdbc

  • Hero Member
  • *****
  • Posts: 1090
    • http://www.cdbc.dk
Re: Scanning files and finding the common ones
« Reply #7 on: April 28, 2016, 06:13:07 am »
Hi
Code: Pascal
```pascal
uses FileUtil;
```

Have a look at "TFileSearcher"  ;D
Regards Benny
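
A minimal sketch of how TFileSearcher might be used, assuming a project that depends on the LazUtils package (where FileUtil lives). The TCollector class is a hypothetical helper invented for this example; the searcher calls its FileFound method once per file.

```pascal
program ListFilesDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, FileUtil; // FileUtil comes with Lazarus (LazUtils)

type
  // Hypothetical helper that collects every file the searcher reports.
  TCollector = class
  public
    Files: TStringList;
    constructor Create;
    destructor Destroy; override;
    procedure FileFound(FileIterator: TFileIterator);
  end;

constructor TCollector.Create;
begin
  inherited Create;
  Files := TStringList.Create;
end;

destructor TCollector.Destroy;
begin
  Files.Free;
  inherited Destroy;
end;

procedure TCollector.FileFound(FileIterator: TFileIterator);
begin
  // FileName holds the full path of the file that was just found.
  Files.Add(FileIterator.FileName);
end;

var
  Searcher: TFileSearcher;
  Collector: TCollector;
  StartDir: String;
begin
  // Directory to scan: first command-line argument, else the current directory.
  if ParamCount > 0 then
    StartDir := ParamStr(1)
  else
    StartDir := GetCurrentDir;

  Collector := TCollector.Create;
  Searcher := TFileSearcher.Create;
  try
    Searcher.OnFileFound := @Collector.FileFound;
    Searcher.Search(StartDir, '*', True); // True = include subdirectories
    WriteLn(Collector.Files.Count, ' files found under ', StartDir);
  finally
    Searcher.Free;
    Collector.Free;
  end;
end.
```

If only the list of names is needed, FindAllFiles from the same unit is probably simpler; the event-based searcher is handy when you want to react to each file as it turns up.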

vaggos22

  • Newbie
  • Posts: 5
Re: Scanning files and finding the common ones
« Reply #8 on: April 28, 2016, 11:34:25 am »
Thank you all for your replies! They are all very helpful!

balazsszekely

  • Guest
Re: Scanning files and finding the common ones
« Reply #9 on: April 28, 2016, 11:58:30 am »
I attached a threaded application (source only) that recursively searches for files in a specific directory/drive. In my opinion this is the hardest part; you can easily adapt it to search two folders and then compare the files. Tested under Windows (Win 7/Lazarus Trunk/FPC 3.0.0).
Code: Pascal
```pascal
unit uFileSearch;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils;

type
  { TFileSearch }
  TOnSearching = procedure(Sender: TObject; APath: String; AFileCount: Integer) of object;
  TOnSearchComplete = procedure(Sender: TObject; AFileList: TStringList) of object;
  TFileSearch = class(TThread)
  private
    FPath: String;
    FCurrentPath: String;   // file found most recently, handed to OnSearching
    FFileList: TStringList;
    FNeedToBreak: Boolean;
    FOnSearching: TOnSearching;
    FOnSearchComplete: TOnSearchComplete;
    procedure FindFiles(const APath: String);
    procedure UpdateSearching;
    procedure UpdateSearchComplete;
  protected
    procedure Execute; override;
  public
    constructor Create(APath: String);
    destructor Destroy; override;
  public
    // Set to True from outside to stop the search early.
    property NeedToBreak: Boolean read FNeedToBreak write FNeedToBreak;
    property OnSearching: TOnSearching read FOnSearching write FOnSearching;
    property OnSearchComplete: TOnSearchComplete read FOnSearchComplete write FOnSearchComplete;
  end;

implementation

{ TFileSearch }
procedure TFileSearch.FindFiles(const APath: String);

  // Collects the files in APath, then recurses into its subdirectories.
  procedure Search(const APath: String);
  var
    SearchRec: TSearchRec;
    Path: String;
  begin
    Path := IncludeTrailingPathDelimiter(APath);

    // First pass: plain files only. AllFilesMask replaces the
    // Windows-only '*.*' mask.
    if FindFirst(Path + AllFilesMask, faAnyFile - faDirectory, SearchRec) = 0 then
    begin
      repeat
        if FNeedToBreak then
          Break;
        FCurrentPath := Path + SearchRec.Name;
        FFileList.Add(FCurrentPath);
        // The handler usually updates the GUI, so run it in the main thread.
        Synchronize(@UpdateSearching);
      until FindNext(SearchRec) <> 0;
      FindClose(SearchRec);
    end;

    // Second pass: descend into subdirectories, skipping '.' and '..'.
    if FindFirst(Path + AllFilesMask, faAnyFile, SearchRec) = 0 then
    begin
      repeat
        if FNeedToBreak then
          Break;
        if ((SearchRec.Attr and faDirectory) <> 0) and (SearchRec.Name <> '.') and (SearchRec.Name <> '..') then
          Search(Path + SearchRec.Name);
      until FindNext(SearchRec) <> 0;
      FindClose(SearchRec);
    end;
  end;

begin
  Search(APath);
end;

procedure TFileSearch.UpdateSearching;
begin
  if Assigned(FOnSearching) then
    FOnSearching(Self, FCurrentPath, FFileList.Count);
end;

procedure TFileSearch.UpdateSearchComplete;
begin
  if Assigned(FOnSearchComplete) then
    FOnSearchComplete(Self, FFileList);
end;

procedure TFileSearch.Execute;
begin
  // FindFiles returns only when the search is finished or was cancelled,
  // so no extra wait loop is needed before reporting completion.
  FindFiles(FPath);
  Synchronize(@UpdateSearchComplete);
end;

constructor TFileSearch.Create(APath: String);
begin
  inherited Create(True); // created suspended; call Start to begin searching
  FPath := IncludeTrailingPathDelimiter(APath);
  FFileList := TStringList.Create;
  FFileList.Sorted := True;
  FNeedToBreak := False;
end;

destructor TFileSearch.Destroy;
begin
  FFileList.Free;
  inherited Destroy;
end;

end.
```
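
For anyone wondering how to drive the unit, here is a rough console sketch. It assumes uFileSearch from the post above is on the unit path; THandler is a hypothetical class made up for the example. In a normal Lazarus GUI application the form would provide the event handler and the CheckSynchronize loop would not be needed, because the LCL message loop already services Synchronize calls.

```pascal
program SearchDemo;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF} // thread support must come first on Unix
  Classes, SysUtils, uFileSearch;

type
  // Hypothetical receiver for the thread's completion event.
  THandler = class
  public
    Finished: Boolean;
    procedure SearchComplete(Sender: TObject; AFileList: TStringList);
  end;

procedure THandler.SearchComplete(Sender: TObject; AFileList: TStringList);
begin
  // AFileList is owned by the thread; copy it if it is needed later.
  WriteLn(AFileList.Count, ' files found');
  Finished := True;
end;

var
  Handler: THandler;
  Search: TFileSearch;
begin
  Handler := THandler.Create;
  Search := TFileSearch.Create(GetCurrentDir); // created suspended
  try
    Search.OnSearchComplete := @Handler.SearchComplete;
    Search.Start;
    // Synchronize only runs when the main thread processes the queue.
    while not Handler.Finished do
      CheckSynchronize(100);
    Search.WaitFor;
  finally
    Search.Free;
    Handler.Free;
  end;
end.
```

To scan two drives, one could start two TFileSearch instances and compare the two lists once both completion events have fired.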

vaggos22

  • Newbie
  • Posts: 5
Re: Scanning files and finding the common ones
« Reply #10 on: April 28, 2016, 03:44:49 pm »
Thank you, I appreciate that!

ezlage

  • Guest
Re: Scanning files and finding the common ones
« Reply #11 on: May 04, 2016, 01:43:36 pm »
I built a project that searches a given path for deduplication possibilities.
My app reads each file part by part, then generates, stores and compares the checksums of those parts.

This way, it tells me about repeated data blocks.
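
The idea of per-block checksums might look roughly like the sketch below. It is not taken from the attached project: the 4096-byte block size, the function name CountRepeatedBlocks and the in-memory TStringList (instead of sqlite3) are all assumptions made for the example, and it runs in a single thread. It uses the md5 unit that ships with FPC.

```pascal
program BlockSums;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, md5; // md5 ships with FPC

const
  BlockSize = 4096; // assumed block size; the post does not state one

// Hashes FileName block by block. Digests not seen before are added to
// Seen; the result is the number of blocks whose digest was already there.
function CountRepeatedBlocks(const FileName: String; Seen: TStringList): Integer;
var
  Stream: TFileStream;
  Buf: array[0..BlockSize - 1] of Byte;
  Got: LongInt;
  Key: String;
begin
  Result := 0;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      Got := Stream.Read(Buf, BlockSize);
      if Got > 0 then
      begin
        Key := MD5Print(MD5Buffer(Buf, Got)); // hex digest of this block
        if Seen.IndexOf(Key) >= 0 then
          Inc(Result)
        else
          Seen.Add(Key);
      end;
    until Got < BlockSize; // a short read means end of file
  finally
    Stream.Free;
  end;
end;

var
  Seen: TStringList;
  Demo: TFileStream;
  Block: array[0..BlockSize - 1] of Byte;
  DemoFile: String;
begin
  // Build a demo file of three blocks where the third repeats the first.
  DemoFile := IncludeTrailingPathDelimiter(GetTempDir) + 'blocksums_demo.bin';
  Demo := TFileStream.Create(DemoFile, fmCreate);
  try
    FillChar(Block, SizeOf(Block), 1);
    Demo.WriteBuffer(Block, SizeOf(Block));
    FillChar(Block, SizeOf(Block), 2);
    Demo.WriteBuffer(Block, SizeOf(Block));
    FillChar(Block, SizeOf(Block), 1);
    Demo.WriteBuffer(Block, SizeOf(Block));
  finally
    Demo.Free;
  end;

  Seen := TStringList.Create;
  try
    Seen.Sorted := True; // sorted list makes IndexOf a binary search
    WriteLn('Repeated blocks: ', CountRepeatedBlocks(DemoFile, Seen));
    // prints "Repeated blocks: 1"
  finally
    Seen.Free;
    DeleteFile(DemoFile);
  end;
end.
```

Passing the same Seen list to several calls would detect blocks shared between files, which is closer to what the post describes.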

I built it for a specific case and computer (with a lot of resources), so you will need to change some things.

Warnings:
1) It opens one thread per file and all threads run simultaneously. You will need to limit the number of threads running at the same time, depending on your computer;
2) This app checks for similarity between pieces of files. You will need to adapt it to check for whole-file redundancy;
3) You will need the sqlite3 library and the ZeosDBO and DCPCrypt components.

The project and a screenshot are attached to this post. Anyone can do anything with my code, but I would appreciate receiving any code derived from my project.

Sorry for my poor English.
« Last Edit: May 04, 2016, 01:53:51 pm by ezlage »

balazsszekely

  • Guest
Re: Scanning files and finding the common ones
« Reply #12 on: May 06, 2016, 06:49:59 pm »
@vaggos22

You can download the whole project (Windows only for now). Files with similar name and size will be added to the list.
« Last Edit: May 08, 2016, 03:56:48 pm by GetMem »

 
