Dupseek 1.3 review

by rbytes.net on

Dupseek is a command-line interactive perl program to find and remove duplicate files.

License: Freeware
OS: Mac OS X
File size: 13K
Developer: Antonio Bellezza
Price: $0.00
Updated: 05 Dec 2005
0 stars award from rbytes.net

There are several strategies for finding duplicate files in a large set of files, such as a heavily populated directory.

One of the most widely used is to group files by size (files of different sizes cannot be identical) and then compute a short digital fingerprint (such as an MD5 checksum) for each file. Files with different fingerprints are different, and files with the same fingerprint are very probably identical. To be certain, the possible duplicates can then be compared byte by byte.
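
For comparison, the size-then-checksum strategy can be sketched in Python as follows. This is a minimal illustration of the general technique, not code taken from Dupseek (which is written in Perl); the function names are hypothetical. Note that every file sharing a size with another file is read in full to compute its MD5 digest.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, block_size=65536):
    """Return the MD5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates_by_checksum(paths):
    """Group files by size, then by MD5; return groups of likely duplicates."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)  # whole file is read here
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```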

Dupseek does something different:

• It starts by grouping files by size.
• Within each size group, it reads a small chunk from every file and compares the chunks, splitting the group into smaller groups of files whose chunks match.
• It continues with progressively larger chunks, up to a hard-coded maximum size.
• It stops reading a file as soon as the file is alone in its group or has been read completely (which happens only when it very probably has a duplicate).
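
The incremental strategy above can be sketched in Python. This is an illustration of the idea, not a port of Dupseek's Perl code: the starting chunk size (512 bytes), the doubling schedule and the 1 MiB cap are assumptions, since the actual values are not given here, and the function names are hypothetical.

```python
import os
from collections import defaultdict

# Assumed chunk schedule: start small, double each round, cap at a fixed limit.
FIRST_CHUNK = 512
MAX_CHUNK = 1 << 20


def refine(paths, size):
    """Split same-size files (all of length `size`) into identical-content groups.

    Every file still under consideration is read in lockstep, one chunk per
    round, and groups are split by the chunk each file returned. A file that
    ends up alone in its group is dropped and never read again.
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        pending = [list(paths)]  # groups that may still contain duplicates
        offset, chunk = 0, FIRST_CHUNK
        while pending and offset < size:
            next_pending = []
            for group in pending:
                by_chunk = defaultdict(list)
                for p in group:
                    by_chunk[handles[p].read(chunk)].append(p)
                # Single-element groups are unique files: stop reading them.
                next_pending.extend(g for g in by_chunk.values() if len(g) > 1)
            pending = next_pending
            offset += chunk
            chunk = min(chunk * 2, MAX_CHUNK)
        # Whatever is left has been read completely and matched byte for byte.
        return pending
    finally:
        for f in handles.values():
            f.close()


def find_duplicates_incremental(paths):
    """Group files by size, then refine each size group chunk by chunk."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = []
    for size, same_size in by_size.items():
        if len(same_size) > 1:
            groups.extend(refine(same_size, size))
    return groups
```

Keeping one open handle per file in a size group keeps the reads aligned, but a very large group could hit the process's open-file limit; a real tool would need to deal with that.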

This algorithm is much more efficient than checksum-based tools when dealing with large files of the same size. When files differ, reading usually stops after a few small reads, whereas computing a checksum requires reading each candidate file in full.

What's New:
Added -m and -M options.
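When reading directories, Dupseek now records each directory's device and inode numbers and stops if it sees the same directory twice. This avoids the danger caused by the same directory appearing twice: its files would be reported as duplicates of themselves, and deleting the "extra" copy would delete the only one.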
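
The directory check can be illustrated with a short Python sketch that identifies each directory by its device and inode numbers (`st_dev` and `st_ino` from `os.stat`). It assumes that "stops" means the scan is aborted with an error; `collect_files` is a hypothetical name, not part of Dupseek.

```python
import os


def collect_files(roots):
    """Return every regular file under `roots`, refusing to scan a directory twice.

    Each directory is identified by its (st_dev, st_ino) pair. If the same pair
    turns up again (a root given twice, nested roots, or a symlink loop), the
    scan stops with an error instead of reporting files as their own duplicates.
    """
    seen = set()
    files = []
    stack = list(roots)
    while stack:
        directory = stack.pop()
        st = os.stat(directory)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            raise RuntimeError(f"directory seen twice: {directory}")
        seen.add(key)
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir():  # follows symlinks, so loops are caught above
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    files.append(entry.path)
    return files
```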

Requirements:
Perl >= 5.6; Perl modules File::Find, IO::File and Getopt::Std.