finddup is a command-line utility that compares the contents of files to determine whether any of them match. What counts as a match depends on the chosen method; three methods are available.
For further processing of the results, you can choose from seven output modes.
There are many more options that let you control which files are ignored, which files should be compared, how accurate the heuristic comparison methods should be, how the utility should handle symbolic links, and whether to look for files in subdirectories.
findlink is a command-line utility that compares the inode numbers of hard links to check if they point to the same file. It accepts most of the options accepted by finddup.
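To illustrate what findlink looks at: every hard link to a file carries the same inode number, while a copy gets its own inode even if its contents are identical. The sketch below uses only standard tools (`mktemp`, `ln`, `cp`, `ls -i`), not findlink itself, and the file names are hypothetical.

```shell
# Scratch directory with one file, a hard link to it, and a plain copy
# (all names are hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/original.txt"
ln "$tmp/original.txt" "$tmp/hardlink.txt"
cp "$tmp/original.txt" "$tmp/copy.txt"

# ls -i prints the inode number before the name; awk keeps just the number.
inode_a=$(ls -i "$tmp/original.txt" | awk '{print $1}')
inode_b=$(ls -i "$tmp/hardlink.txt" | awk '{print $1}')
inode_c=$(ls -i "$tmp/copy.txt" | awk '{print $1}')

# The hard link shares the original's inode; the copy has its own,
# even though all three files have identical contents.
echo "original.txt: $inode_a"
echo "hardlink.txt: $inode_b"
echo "copy.txt:     $inode_c"

rm -rf "$tmp"
```

In other words, the hard link and the original are one file, which is what an inode comparison detects, whereas the copy is a separate file that only a content comparison would match.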
So far, I have used finddup only on macOS, so I can only describe how to install it on a Mac, although the instructions should work just as well on Linux.
If Homebrew is installed, you can run this command:
brew install vbwx/utils/finddup
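To enable shell completion, copy completion/finddup and completion/findlink to a directory like /etc/bash_completion.d (Bash), or copy completion/_finddup and completion/_findlink to a directory like /usr/share/zsh/site-functions (Zsh).
To install finddup manually, make sure Perl is installed (run perl -v to check), then run this command in the directory containing the source code:
cpan .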
Alternatively, if you have cpanminus installed and want more flexibility with regard to installation directories, you can run these commands:
cpanm --installdeps .
perl Makefile.PL INSTALL_BASE=<your_install_dir>
make
make install
Run finddup --help to get a quick overview of how to use this utility.
The following command calculates how much storage is taken up by duplicates in the entire file hierarchy of the working directory.
finddup -ra0 | xargs -0 du -ch --
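The `0` in `-ra0` appears to select NUL-separated output, which is what `xargs -0` expects; this is an inference from how the two are paired, so check `finddup --help` to confirm. The sketch below stands in for finddup with `printf` to show why the pairing and the `--` matter: NUL separators keep file names with spaces intact, and `--` stops `du` from parsing a name that starts with a dash as options. The file names are hypothetical.

```shell
# Scratch directory with two awkwardly named files (hypothetical names).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'x' > 'my file.txt'
printf 'y' > '-odd name.txt'

# printf '%s\0' stands in for finddup's (assumed) NUL-separated output.
# xargs -0 splits only on NUL bytes, so the space in "my file.txt" survives,
# and -- tells du that "-odd name.txt" is a file name, not an option.
out=$(printf '%s\0' 'my file.txt' '-odd name.txt' | xargs -0 du -ch --)
printf '%s\n' "$out"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

The last line of the output is the grand total produced by `du -c`.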
Here is how to delete the newest exact copies of files located in different directories (that is, keep only the originals):
finddup -pC0 some_folder another_folder | xargs -0 rm -f
Instead of running diff in a loop, finddup can be used to determine which files have been changed, even across multiple copies of a directory.
finddup -rn folder-v*/
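For comparison, here is roughly what the loop-based approach looks like for just two copies of a directory, using `cmp -s` (a silent byte-by-byte comparison) in place of `diff`. The folder names `folder-v1` and `folder-v2` are hypothetical, the loop only checks top-level files, and it does not reproduce finddup's own behavior; each additional copy would need another pass.

```shell
# Two hypothetical copies of a directory; only b.txt differs between them.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir folder-v1 folder-v2
printf 'same\n' > folder-v1/a.txt
printf 'same\n' > folder-v2/a.txt
printf 'old\n'  > folder-v1/b.txt
printf 'new\n'  > folder-v2/b.txt

# Compare every top-level file in folder-v1 with its counterpart in folder-v2.
changed=''
for f in folder-v1/*; do
  name=${f#folder-v1/}
  # cmp -s prints nothing and exits non-zero if the files differ
  # (or if the counterpart cannot be read).
  if ! cmp -s "$f" "folder-v2/$name"; then
    changed="${changed:+$changed }$name"
    echo "changed: $name"
  fi
done

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```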
The following command lists all files with multiple hard links located in the entire file hierarchy of the working directory.
findlink -rd
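As a cross-check with standard tools (not findlink), `find`'s `-links +1` test matches regular files whose hard-link count is greater than one. It reports each name on its own, so the sketch appends `ls -i` to show which names share an inode and are therefore the same file. The file names are hypothetical.

```shell
# Scratch directory: one unlinked file and one file with a second hard link
# (hypothetical names, for illustration only).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'data\n' > single.txt
printf 'data\n' > shared.txt
ln shared.txt shared-link.txt

# -links +1 selects files with more than one hard link;
# ls -i prefixes each match with its inode number, and sort groups them.
linked=$(find . -type f -links +1 -exec ls -i {} + | sort)
printf '%s\n' "$linked"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

Both `shared.txt` and `shared-link.txt` are listed with the same inode number, while `single.txt` does not appear even though its contents are identical.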
You can find a detailed explanation of all options, a tutorial, and more technical information in the User Manual.