Hello,
I have a large number of files under a root directory with several sub-directories, and many of these sub-directories contain similar files with similar names. I need to clean this up.
The filenames are of the format:
/path/to/dir/subdir/file name.dat
/path/to/dir/subdir/file name 1.dat
I want to keep only
/path/to/dir/subdir/file name.dat
and remove the other file. I have tried some tools, including fslint, but they didn't work because the actual content of the files may vary slightly.
Help in creating a bash script or similar to weed out the unneeded files would be highly appreciated.
Thanks!
If you have only single-level subdirectories, you could use
rm /path/to/dir/*/filename.dat
to delete all occurrences of "filename.dat" in the subdirectories immediately underneath /path/to/dir/.
Or, if you also want to remove similar files, you could try something like:
ls /path/to/dir/*/file*name*.dat
If it lists exactly the files you want to remove, you can run
rm /path/to/dir/*/file*name*.dat
Unfortunately, it's not that straightforward.
For one, I don't know the file names beforehand. Also, the sub-directories are nested, up to three levels deep. There's a total of 30,000 files with an estimated 10,000 duplicates.
Let me try to clarify: I want to delete duplicate files. These duplicate files are in the same folder as the original and have a " 1" at the end of the filename. Some originals may also have the " 1" at the end of the filename, but no duplicates exist for them.
OK, that is quite a different requirement :). Assuming the duplicates always have " 1" at the end, followed by the extension ".dat", you could try this ksh/bash code:
BASEDIR='/path/to/dir'
ext='.dat'
duplext=' 1'
# Walk every directory under $BASEDIR; IFS= and read -r preserve
# leading/trailing spaces and backslashes in the names.
find "$BASEDIR" -type d | while IFS= read -r dir; do
    # -maxdepth must come before other tests, and $dir must be quoted
    # because the filenames contain spaces.
    find "$dir" -maxdepth 1 -type f -name "*$ext" | while IFS= read -r file; do
        # For "file name.dat", the candidate duplicate is "file name 1.dat".
        duplicate="${file%"$ext"}$duplext$ext"
        if [[ -f "$duplicate" ]]; then
            rm "$duplicate"
        fi
    done
done
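Since this deletes files irreversibly, you may want a dry run first. Here is a sketch of the same idea inverted: instead of looking for a duplicate next to each original, it finds every "* 1.dat" file directly and removes it only when the matching original exists in the same directory. The function name and the "echo" dry-run trick are my own, not from the thread:

```shell
#!/usr/bin/env bash

# Delete "NAME 1.dat" wherever "NAME.dat" exists alongside it.
# Pass "echo" as the second argument to preview instead of deleting.
remove_dat_duplicates() {
    local basedir=$1 runner=${2:-}
    local ext='.dat' duplext=' 1'
    # Only "* 1.dat" files are candidates; nesting depth doesn't matter.
    find "$basedir" -type f -name "*$duplext$ext" | while IFS= read -r dup; do
        # Strip " 1.dat" and re-append ".dat" to get the original's name.
        original="${dup%"$duplext$ext"}$ext"
        # Remove the candidate only if its original really exists;
        # lone "* 1.dat" files are kept, matching the requirement above.
        [[ -f "$original" ]] && $runner rm -- "$dup"
    done
}
```

Usage: run "remove_dat_duplicates /path/to/dir echo" to print the rm commands it would execute, check the list, then run it again without "echo" to actually delete.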