Hello,
I have a large number of files under a root directory with several sub-directories, and many of these sub-directories contain similar files with similar names. I need to clean this up.
The filenames are of the format:
/path/to/dir/subdir/file name.dat
/path/to/dir/subdir/file name 1.dat
I want to keep only
/path/to/dir/subdir/file name.dat
and remove the other file. I have tried some tools, including fslint, but they didn't work because the actual content of the files may vary slightly.
Help in creating a bash script or similar to weed out the unneeded files would be highly appreciated.
Thanks!
If you have only single-level subdirectories, you could use
rm /path/to/dir/*/filename.dat
to delete all occurrences of "filename.dat" in the subdirectories immediately underneath /path/to/dir/.
Or, if you also want to remove similar files, you could try something like:
ls /path/to/dir/*/file*name*.dat
If it lists exactly the files you want to remove, you can run
rm /path/to/dir/*/file*name*.dat
Unfortunately, it's not that straightforward.
For one, I don't know the file names beforehand. Also, the sub-directories are nested, up to three levels deep. There's a total of 30,000 files with an estimated 10,000 duplicates.
Let me try to clarify: I want to delete duplicate files. These duplicate files are in the same folder as the original and have a " 1" at the end of the filename. Some originals may also have the " 1" at the end of the filename, but no duplicates exist for them.
OK, that is quite a different requirement :). Assuming the duplicates always have " 1" at the end, followed by the extension ".dat", you could try this ksh/bash code:
BASEDIR='/path/to/dir'
ext='.dat'
duplext=' 1'
# Walk every directory under $BASEDIR; IFS= and read -r preserve
# leading/trailing spaces and backslashes in the names.
find "$BASEDIR" -type d | while IFS= read -r dir; do
    # -maxdepth must come before other tests, and $dir must be quoted
    # because the filenames contain spaces.
    find "$dir" -maxdepth 1 -type f -name "*$ext" | while IFS= read -r file; do
        # For "file name.dat", the candidate duplicate is "file name 1.dat".
        duplicate="${file%"$ext"}$duplext$ext"
        if [[ -f "$duplicate" ]]; then
            rm "$duplicate"
        fi
    done
done
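Since this deletes files irreversibly, you may want a dry run first. Here is a sketch of the same idea inverted: instead of looking for a duplicate next to each original, it finds every "* 1.dat" file directly and removes it only when the matching original exists in the same directory. The function name and the "echo" dry-run trick are my own, not from the thread:

```shell
#!/usr/bin/env bash

# Delete "NAME 1.dat" wherever "NAME.dat" exists alongside it.
# Pass "echo" as the second argument to preview instead of deleting.
remove_dat_duplicates() {
    local basedir=$1 runner=${2:-}
    local ext='.dat' duplext=' 1'
    # Only "* 1.dat" files are candidates; nesting depth doesn't matter.
    find "$basedir" -type f -name "*$duplext$ext" | while IFS= read -r dup; do
        # Strip " 1.dat" and re-append ".dat" to get the original's name.
        original="${dup%"$duplext$ext"}$ext"
        # Remove the candidate only if its original really exists;
        # lone "* 1.dat" files are kept, matching the requirement above.
        [[ -f "$original" ]] && $runner rm -- "$dup"
    done
}
```

Usage: run "remove_dat_duplicates /path/to/dir echo" to print the rm commands it would execute, check the list, then run it again without "echo" to actually delete.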