finddup is a command-line utility which compares the contents of files to check if any of them match. What is considered a match depends on the chosen method; three methods are available:
For further processing of the results, you can choose between seven output modes:
There are many more options that let you control which files are ignored, which files should be compared, how accurate the heuristic comparison methods should be, how the utility should handle symbolic links, and whether to look for files in subdirectories.
So far, I have used finddup only on macOS, so I can only describe how to install it on a Mac, although the instructions should work just as well on Linux.
If Homebrew is installed, you can run this command:
brew install vbwx/utils/finddup
To enable command-line completion, copy completion/finddup to a directory like /etc/bash_completion.d (for Bash), or copy completion/_finddup to a directory like /usr/share/zsh/site-functions (for Zsh).
Otherwise, run the following command in the source directory. (Perl is required; run perl -v to check.)
cpan .
Alternatively, if you have cpanminus installed and want more flexibility with regard to installation directories, you can run these commands:
cpanm --installdeps .
perl Makefile.PL INSTALL_BASE=<your_install_dir>
make
make install
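When a custom INSTALL_BASE is used, the shell and Perl usually need to be told where the files ended up. The sketch below assumes the standard ExtUtils::MakeMaker layout (scripts in bin, modules in lib/perl5); the directory $HOME/perl5 is only a hypothetical stand-in for <your_install_dir>.

```shell
# Hypothetical install directory; replace with the value passed as INSTALL_BASE.
install_dir="$HOME/perl5"

# Scripts installed by "make install" land in $install_dir/bin.
export PATH="$install_dir/bin:$PATH"

# Modules land in $install_dir/lib/perl5; keep any existing PERL5LIB entries.
export PERL5LIB="$install_dir/lib/perl5${PERL5LIB:+:$PERL5LIB}"
```

Adding these lines to a shell startup file such as ~/.bashrc or ~/.zshrc makes the change permanent.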
Run finddup --help to get a quick overview of how to use this utility.
The following command calculates how much storage is taken up by duplicates in the entire file hierarchy of the working directory.
finddup -ra0 | xargs -0 du -ch --
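The pipeline above relies on NUL-separated file names, which is what xargs -0 expects. Here is a minimal sketch of that hand-off, with printf standing in for finddup's output; the temporary files and their names are made up for illustration.

```shell
# Create two identical files whose names contain spaces.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/copy one.txt"
printf 'hello\n' > "$tmpdir/copy two.txt"

# printf emits each name followed by a NUL byte (a stand-in for finddup's output).
# xargs -0 splits on NUL, so the spaces in the names survive intact;
# "--" keeps du from treating a name that starts with "-" as an option.
summary=$(printf '%s\0' "$tmpdir/copy one.txt" "$tmpdir/copy two.txt" | xargs -0 du -ch --)
printf '%s\n' "$summary"

rm -rf "$tmpdir"
```

The output has one line per file, and du -c adds a grand total as the last line.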
Here is how to delete the newest exact copies of files located in different directories (i.e., keep only the originals):
finddup -pC0 some_folder another_folder | xargs -0 rm -f
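Because rm -f is irreversible, it can be worth previewing the list first. The sketch below shows one way to do that by turning the NUL separators into newlines; list_duplicates is a hypothetical stand-in for the finddup call, and the paths it prints are invented.

```shell
# Hypothetical stand-in for the NUL-separated list that finddup would emit.
list_duplicates() {
    printf '%s\0' "another_folder/report copy.txt" "another_folder/notes.txt"
}

# Dry run: print one path per line instead of deleting anything.
preview=$(list_duplicates | tr '\0' '\n')
printf '%s\n' "$preview"
```

With the real utility, the same preview would presumably be finddup -pC0 some_folder another_folder | tr '\0' '\n', which uses only the options shown above. File names that themselves contain newlines would be displayed ambiguously, which is why the actual deletion should keep using xargs -0.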
Instead of running diff in a loop, you can use finddup to determine which files have been changed, even across multiple copies of a directory.
finddup -rn folder-v*/
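For comparison, the manual diff loop that this command replaces might look like the sketch below. It is not how finddup works internally; it only handles two flat copies of a directory, and the folder and file names are made up for illustration.

```shell
# Build two small copies of a directory in a temporary location.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/folder-v1" "$tmpdir/folder-v2"
printf 'same\n' > "$tmpdir/folder-v1/a.txt"
printf 'same\n' > "$tmpdir/folder-v2/a.txt"
printf 'old\n'  > "$tmpdir/folder-v1/b.txt"
printf 'new\n'  > "$tmpdir/folder-v2/b.txt"

changed=""
for f in "$tmpdir"/folder-v1/*; do
    name=$(basename "$f")
    # diff -q exits with a non-zero status when the two files differ.
    if ! diff -q "$f" "$tmpdir/folder-v2/$name" > /dev/null; then
        changed="$changed$name "
    fi
done
printf 'changed: %s\n' "$changed"

rm -rf "$tmpdir"
```

A loop like this needs one pass per pair of copies and extra work for subdirectories, which is the bookkeeping the single finddup call avoids.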
You can find a detailed explanation of all options, a tutorial, and more technical information in the User Manual.