Duplicate File Remover  1.1.1
Command line utility to remove exact duplicate files.

A command line utility that recursively scans a given directory and all of its sub-directories for exact duplicate files. The duplicates found can be listed in an output file and, if required, removed so that a single copy of each file is preserved.
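
As a rough illustration of the scan described above, the following C++17 sketch walks a directory tree and groups files whose contents are byte-for-byte identical. It is not the project's code: it uses std::filesystem in place of the Boost.Filesystem dependency, it keys on whole file contents rather than a digest, and the helper names read_file, group_by_content and find_duplicates are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read a whole file into a string (binary mode, so bytes are compared exactly).
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: group every regular file under `root` by its contents.
std::map<std::string, std::vector<fs::path>> group_by_content(const fs::path& root) {
    assert(fs::is_directory(root));
    std::map<std::string, std::vector<fs::path>> groups;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) {
            groups[read_file(entry.path())].push_back(entry.path());
        }
    }
    return groups;
}

// Keep only the groups that hold more than one file, i.e. the duplicate sets.
std::vector<std::vector<fs::path>> find_duplicates(const fs::path& root) {
    std::vector<std::vector<fs::path>> result;
    for (const auto& group : group_by_content(root)) {
        if (group.second.size() > 1) {
            result.push_back(group.second);
        }
    }
    return result;
}
```

Keying the map on full contents keeps the sketch free of external libraries but holds every file in memory; a real tool would more likely key on a hash of the contents.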

Features:

  1. Recursive scan of the set directory.
  2. Generate list of duplicate files.
  3. Scan all files or filter based on file extension.
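
The extension filter could work along these lines. This is a minimal sketch under stated assumptions, not the project's implementation: matches_extension is a hypothetical helper, extensions are given without a leading dot (as in -e png jpg), matching is case-sensitive, and an empty list means "scan all files".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if `path` should be scanned under the given filter.
bool matches_extension(const std::string& path,
                       const std::vector<std::string>& extensions) {
    if (extensions.empty()) return true;  // no filter: scan all files

    const std::size_t slash = path.find_last_of('/');
    const std::size_t dot = path.find_last_of('.');

    // No dot at all: the file has no extension.
    if (dot == std::string::npos) return false;
    // The last dot belongs to a directory name, not to the file name.
    if (slash != std::string::npos && dot < slash) return false;

    const std::string ext = path.substr(dot + 1);
    return std::find(extensions.begin(), extensions.end(), ext) != extensions.end();
}
```

For example, matches_extension("photos/a.png", {"png", "jpg"}) is true, while matches_extension("notes.txt", {"png"}) is false.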

Future Plans:

  1. Set a recursive scan depth for the set directory.
  2. A way to exclude certain directories.
  3. A way to include only some directories.
  4. Robust error handling for synchronization issues.
  5. Make the program interactive.

Caution: There is currently no way to select which of the duplicate files is preserved. The choice depends on the order in which the files are loaded into std::map; the first file loaded is the one that is kept.
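
To make the caution concrete, here is a small sketch of how a "first one wins" rule falls out of std::map. The layout is an assumption for illustration only (the README says only that a std::map is used): the map goes from a content key to the preserved path, and ScanResult and select_preserved are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ScanResult {
    std::map<std::string, std::string> kept;  // content key -> preserved path
    std::vector<std::string> duplicates;      // paths that would be removed
};

// `files` holds (path, content key) pairs in the order they were discovered.
ScanResult select_preserved(
        const std::vector<std::pair<std::string, std::string>>& files) {
    ScanResult result;
    for (const auto& file : files) {
        // emplace never overwrites an existing key, so it only succeeds
        // for the first path seen with a given content key.
        const auto inserted = result.kept.emplace(file.second, file.first);
        if (!inserted.second) {
            result.duplicates.push_back(file.first);
        }
    }
    // Every file is either the preserved copy or a duplicate.
    assert(result.kept.size() + result.duplicates.size() == files.size());
    return result;
}
```

If b/photo.jpg is discovered before a/photo.jpg and both share a content key, b/photo.jpg is kept and a/photo.jpg is marked for removal, regardless of how the paths sort.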

Dependencies:

  1. For main program:

    sudo apt-get install libssl-dev libboost-filesystem-dev libboost-system-dev

  2. For tests, apart from the dependencies for main program:

    sudo apt-get install libcppunit-dev

Downloading and Building

  1. Clone the project:

    git clone https://github.com/vishal-wadhwa/Duplicate-File-Remover.git

  2. Change directory to src:

    cd Duplicate-File-Remover/src

  3. Build the project using the Make utility (assuming you've installed the dependencies):

    make main

  4. Run it (See Usage):

    ./main ...

Testing

  1. From the directory that contains the cloned project, go to the tests directory:

    cd Duplicate-File-Remover/tests

  2. Build the tests using the Make utility (assuming you've installed the dependencies):

    make test

  3. Run the tests:

    ./test

You should see OK if all the tests pass. You can then go on to use the program. ;)

Usage

  1. Use -d switch to set the directory to be scanned.
  2. Use -e switch to provide a list of extensions to filter the files scanned.
  3. Use -o switch to generate an output (log) file. If this switch is not followed by a name or path, a default file dupl_file.txt is generated in the current directory.
  4. Use -r switch to remove the duplicates and keep only one copy.
  5. Use -h switch to display this help:

    Usage: ./main -d [DIRECTORY]
    or: ./main -d [DIRECTORY] -e [EXTENSIONS]...
    or: ./main -d [DIRECTORY] -o [OUTFILE]
    Scan the provided directory and its sub-directories recursively and find duplicates.
    Not using either of -o or -r switch is pointless as no action is performed.
    -d switch is necessary to set the search directory.
    Other switches:
    -d provided argument is the directory to be scanned.
    -e following arguments treated as extensions.
    -o generate file list (default file: "dupl_file.txt").
    -h prints this help.
    -r remove the duplicates so found.

Note: Use sudo if required.
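
The switch rules in Usage (-e consumes the arguments that follow it, -o takes an optional file name) can be parsed roughly as follows. This is a sketch of one possible approach, not the program's actual parser; Options, parse_args and the field names are hypothetical, and error reporting is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Options {
    std::string directory;                   // value of -d
    std::vector<std::string> extensions;     // values following -e
    bool write_list = false;                 // -o was given
    std::string out_file = "dupl_file.txt";  // default from the help text
    bool remove = false;                     // -r was given
    bool help = false;                       // -h was given
};

// `args` are the command line arguments without the program name.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    // A switch is a dash followed by exactly one character, e.g. "-d".
    auto is_switch = [](const std::string& s) {
        return s.size() == 2 && s[0] == '-';
    };
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-d" && i + 1 < args.size()) {
            opts.directory = args[++i];
        } else if (arg == "-e") {
            // Everything up to the next switch is treated as an extension.
            while (i + 1 < args.size() && !is_switch(args[i + 1])) {
                opts.extensions.push_back(args[++i]);
            }
        } else if (arg == "-o") {
            opts.write_list = true;
            // The file name is optional; keep the default if none follows.
            if (i + 1 < args.size() && !is_switch(args[i + 1])) {
                opts.out_file = args[++i];
            }
        } else if (arg == "-r") {
            opts.remove = true;
        } else if (arg == "-h") {
            opts.help = true;
        }
    }
    return opts;
}
```

With the arguments -d ./ -o -r, the sketch leaves out_file at dupl_file.txt because -o is followed directly by another switch.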

Examples

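  1. Scan the current directory, write the list of duplicates to the default file dupl_file.txt, and remove the duplicates:

    ./main -d ./ -o -r

  2. Scan the parent directory, considering only png, jpg and jpeg files, and write the list of duplicates to log.out:

    ./main -e png jpg jpeg -d ./../ -o log.out

  3. Scan the current directory and remove the duplicates without writing a list:

    ./main -d ./ -r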