
A command-line utility that recursively scans a given directory and all of its sub-directories to find exact duplicate files. The duplicates found can be listed in an output file and, if required, removed so that only a single copy of each file is kept.

Features:

Recursive scan of the given directory.
Generation of a list of duplicate files.
Filtering of the scanned files by extension (all files are scanned by default).
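The recursive scan and extension filter can be sketched with C++17 std::filesystem, whose interface closely mirrors the Boost.Filesystem library the program depends on. This is a minimal sketch, not the program's actual code: the helper name collect_files is hypothetical, and it assumes extensions are passed without the leading dot (as in -e png jpg) and matched case-sensitively.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collect every regular file under `root`, recursing
// into all sub-directories. If `extensions` is empty, every file is kept;
// otherwise only files whose extension (without the leading dot) is listed.
// Note: the iterator throws fs::filesystem_error on unreadable directories.
std::vector<fs::path> collect_files(const fs::path& root,
                                    const std::set<std::string>& extensions) {
    std::vector<fs::path> files;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;  // skip directories, sockets, etc.
        if (!extensions.empty()) {
            std::string ext = entry.path().extension().string();  // e.g. ".png"
            // Drop the leading '.' before looking the extension up.
            if (ext.empty() || extensions.count(ext.substr(1)) == 0) continue;
        }
        files.push_back(entry.path());
    }
    return files;
}
```

By default recursive_directory_iterator does not follow symbolic links to directories, which avoids scanning the same tree twice.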

Future Plans:

Setting a maximum recursion depth for the scan.
A way to exclude certain directories.
A way to include only some directories.
Robust error handling for synchronization issues.
Make the program interactive.
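One possible way to implement the planned scan-depth limit and directory exclusion is sketched here with C++17 std::filesystem: recursive_directory_iterator reports its current depth, and disable_recursion_pending() stops it from descending into the directory it currently points at. The function collect_limited and its parameters are hypothetical; in this sketch a max_depth of 0 means only files directly inside the root are collected, and exclusion matches directory names rather than full paths.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: collect regular files under `root`, descending at most
// `max_depth` directory levels (0 = only files directly in `root`) and
// never entering a directory whose name is listed in `excluded`.
std::vector<fs::path> collect_limited(const fs::path& root, int max_depth,
                                      const std::set<std::string>& excluded) {
    std::vector<fs::path> files;
    for (auto it = fs::recursive_directory_iterator(root);
         it != fs::recursive_directory_iterator(); ++it) {
        if (it->is_directory()) {
            // Do not descend into excluded directories or past the depth limit.
            if (excluded.count(it->path().filename().string()) > 0 ||
                it.depth() >= max_depth) {
                it.disable_recursion_pending();
            }
            continue;
        }
        if (it->is_regular_file()) files.push_back(it->path());
    }
    return files;
}
```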

Caution: Currently, there is no way to choose which of the duplicate files is preserved. The choice depends on the order in which the files are loaded into a std::map: the first file loaded is the one that is kept.
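The preservation rule in the caution can be illustrated with a minimal sketch. It assumes duplicates are grouped in a std::map keyed by file contents, with each group's paths stored in load order. The real program may key on a digest instead (the libssl dependency hints at this); the sketch uses the raw contents so it needs only the standard library. The names read_all, group_by_content and redundant_copies are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read a whole file into a string (binary-safe).
std::string read_all(const fs::path& p) {
    std::ifstream in(p.string(), std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Group files by exact contents. Within each group, paths keep the order in
// which they were inserted, so group[0] is the first file loaded.
std::map<std::string, std::vector<fs::path>>
group_by_content(const std::vector<fs::path>& files) {
    std::map<std::string, std::vector<fs::path>> groups;
    for (const auto& f : files) groups[read_all(f)].push_back(f);
    return groups;
}

// Every file except the first of each group: the copies a removal pass
// would delete, leaving one file per group.
std::vector<fs::path>
redundant_copies(const std::map<std::string, std::vector<fs::path>>& groups) {
    std::vector<fs::path> out;
    for (const auto& kv : groups)
        for (std::size_t i = 1; i < kv.second.size(); ++i) out.push_back(kv.second[i]);
    return out;
}
```

Since the order in which a directory is iterated is unspecified, which file lands first in a group is effectively outside the user's control, which is the limitation the caution describes.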

Dependencies:

For the main program:

sudo apt-get install libssl-dev libboost-filesystem-dev libboost-system-dev
For the tests, in addition to the dependencies of the main program:

sudo apt-get install libcppunit-dev

Downloading and Building

Clone the project:

git clone https://github.com/vishal-wadhwa/Duplicate-File-Remover.git
Change to the src directory:

cd Duplicate-File-Remover/src
Build the project with make (assuming the dependencies are installed):

make main
Run it (see Usage):

./main ...

Testing

From the directory into which you cloned the project, go to the tests directory:

cd Duplicate-File-Remover/tests
Build the tests with make (assuming the test dependencies are installed):

make test
Run the tests:

./test

If all the tests pass, you should see OK, and you can go on to use the program. ;)

Usage

Use the -d switch to set the directory to be scanned.
Use the -e switch to provide a list of extensions; only files with those extensions are scanned.
Use the -o switch to generate an output (log) file. If the switch is not followed by a file name or path, a default file, dupl_file.txt, is created in the current directory.
Use the -r switch to remove the duplicates and keep only one copy.
Use the -h switch to display the following help:

Usage: ./main -d [DIRECTORY]
or: ./main -d [DIRECTORY] -e [EXTENSIONS]...
or: ./main -d [DIRECTORY] -o [OUTFILE]
Scan the provided directory and its sub-directories recursively and find duplicates.
Not using either of -o or -r switch is pointless as no action is performed.
-d switch is necessary to set the search directory.
Other switches:
    -d      provided argument is the directory to be scanned.
    -e      following arguments treated as extensions.
    -o      generate file list (default file: "dupl_file.txt").
    -h      prints this help.
    -r      remove the duplicates so found.

Note: Use sudo if reading the scanned directory or deleting its files requires elevated permissions.
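The switch semantics described in the help (a required -d, an -e that consumes every following argument up to the next switch, and an -o whose file name is optional) can be sketched as a small parser. This is a hypothetical illustration, not the program's actual parser: the Options struct and parse_args function are invented names, and treating any argument that starts with '-' as a switch is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical option set mirroring the switches in the help text.
struct Options {
    std::string directory;                // -d DIRECTORY (required)
    std::vector<std::string> extensions;  // -e EXT...
    std::optional<std::string> outfile;   // -o [OUTFILE]; set only if -o given
    bool remove = false;                  // -r
    bool help = false;                    // -h
};

// Assumption: any argument of the form "-x..." is a switch.
static bool is_switch(const std::string& arg) {
    return arg.size() > 1 && arg[0] == '-';
}

// Parse the arguments after the program name. Returns std::nullopt on an
// unknown switch, a stray argument, or a missing -d directory.
std::optional<Options> parse_args(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-d") {
            if (i + 1 >= args.size() || is_switch(args[i + 1])) return std::nullopt;
            opt.directory = args[++i];
        } else if (a == "-e") {
            // Consume every following argument until the next switch.
            while (i + 1 < args.size() && !is_switch(args[i + 1]))
                opt.extensions.push_back(args[++i]);
        } else if (a == "-o") {
            // A bare -o falls back to the documented default file name.
            if (i + 1 < args.size() && !is_switch(args[i + 1]))
                opt.outfile = args[++i];
            else
                opt.outfile = "dupl_file.txt";
        } else if (a == "-r") {
            opt.remove = true;
        } else if (a == "-h") {
            opt.help = true;
        } else {
            return std::nullopt;
        }
    }
    if (!opt.help && opt.directory.empty()) return std::nullopt;  // -d is required
    return opt;
}
```

With this sketch, the first two example command lines parse as intended: in ./main -d ./ -o -r, the -o is followed directly by -r, so it falls back to dupl_file.txt.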

Examples

./main -d ./ -o -r
./main -e png jpg jpeg -d ./../ -o log.out
./main -d ./ -r