Searching for a File in a Directory and All Subdirectories and Deleting It

Hi All,

My directory structure is like

  • Directory1
    • SubDirectory1
    • SubDirectory2
    • SubDirectory3

I have main directories with subdirectories underneath. I want to write a shell script that takes a file name as a parameter, finds all the files in Directory1 and all of its subdirectories that have the same contents as that file, and deletes them.

I am new to Shell scripting, please help me in writing this.

Please send me full script, thanks a lot

Hi, what have you tried?

Did not try anything, new to Unix.

Where would the file to be compared to reside? And what would you consider "same file contents"?

Thanks for reply.

The file to be compared will reside in a different directory structure. I will pass the file name, including its directory path, as a parameter. Yes, the contents must be the same.

Thanks

And what would you consider "same file contents"?

Yes please. Same file contents.

Is this a homework assignment?

If not, please explain what determines whether or not a file compared to your "parametrized" file should be kept or removed:

  1. Do the file names have to be the same?
  2. Do the file permissions have to be the same?
  3. Do the file owner and group IDs have to be the same?
  4. Do the file timestamps have to be the same?
  5. Do the file inode numbers have to be the same?
  6. Do the file contents have to be the same?
  7. If a file is a symbolic link, what should be removed? (The file to which the symlink points? The symlink itself? Both?)

Here are the answers to your questions:

Is this a homework assignment? No, it is work related.

If not, please explain what determines whether or not a file compared to your "parametrized" file should be kept or removed:

  1. Do the file names have to be the same? No
  2. Do the file permissions have to be the same? No
  3. Do the file owner and group IDs have to be the same? Yes
  4. Do the file timestamps have to be the same? No
  5. Do the file inode numbers have to be the same? No
  6. Do the file contents have to be the same? Yes

If a file is a symbolic link, what should be removed? (The file to which the symlink points? The symlink itself? Both?) The file is not a symbolic link. Both are physical files and both need to be removed, i.e. the parametrized file and all other files under the directory structure I described above that have exactly the same contents.

You haven't said what operating system or shell you're using. The following seems to do what you have said you want done when using a system where ls -l output conforms to the standards and your shell accepts basic Bourne shell syntax (such as bash, ksh, and on almost any system /bin/sh):

#!/bin/ksh
for pfile in "$@"
do	ls -l "$pfile" | (
		read x x user group x
		find Directory1 -user "$user" -group "$group" -type f \
			-exec cmp -s "$pfile" {} \; \
			-exec echo rm {} +
		echo rm "$pfile"
	)
done

If you save the above script in a file (for example tester) and make it executable with:

chmod +x tester

then executing this script with one or more operands will show you all of the rm commands that would be needed to remove all files with the same user ID, group ID, and contents as the files named by operands in the file hierarchy rooted in Directory1:

./tester file1 file2

If the output from ls -l on your system doesn't print the user and group names as the 3rd and 4th fields in the output, adjust the read command to capture the user and group IDs in the correct fields.
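As a rough illustration of how the read command picks fields out of ls -l output, here is a small sketch. The helper name owner_and_group and the sample line (size, date, and file name) are made up for this example; only the user rkath and group staff come from the thread. If your ls -l puts the owner and group in other positions, move the user and group names in the read command accordingly.

```shell
# Hypothetical helper: read one line of `ls -l` output on standard input and
# print the owner and group, assuming they are the 3rd and 4th fields.
owner_and_group() {
	# x is a throwaway name; the last x absorbs the remainder of the line.
	read -r x x user group x
	printf '%s %s\n' "$user" "$group"
}

# Example with a made-up line in the standard format:
printf '%s\n' '-rw-r--r-- 1 rkath staff 120 Feb 10 10:44 o34620760.out' |
	owner_and_group
```

With the sample line above, the function should print the owner followed by the group.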

If the output looks correct, remove the echo from both the -exec echo rm {} + line and the echo rm "$pfile" line to actually remove the files instead of just showing you what files would be removed.

You must be sure that the operands you pass to this script name files that are not in the file hierarchy rooted in Directory1. If any file operands do reside in the file hierarchy rooted in Directory1, that file (or those files) may be removed before all of the files that meet your criteria have been found.
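One way to guard against that mistake is to check each operand before searching. The following is only a sketch: is_under is a hypothetical helper name, and it assumes both the directory and the file's parent directory exist and can be entered with cd.

```shell
# Hypothetical helper: succeed (exit 0) if file $2 lives somewhere inside
# directory $1, fail with 1 if it does not, or with 2 if a directory cannot
# be entered. Symbolic links in directory names are resolved with `pwd -P`.
is_under() {
	root=$(CDPATH= cd -- "$1" && pwd -P) || return 2
	fdir=$(CDPATH= cd -- "$(dirname -- "$2")" && pwd -P) || return 2
	# Everything is under the root directory.
	[ "$root" = / ] && return 0
	case "$fdir/" in
		"$root"/*) return 0 ;;
		*) return 1 ;;
	esac
}

# Possible use inside the loop (not executed here):
#	if is_under Directory1 "$pfile"
#	then	printf 'Skipping "%s": it is inside Directory1\n' "$pfile" >&2
#		continue
#	fi
```

The trailing slash in the case pattern keeps a sibling such as Directory10 from being mistaken for something inside Directory1.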

Thanks a lot Don, I really appreciate it. I am going to execute it and let you know.

---------- Post updated at 12:34 PM ---------- Previous update was at 10:47 AM ----------

Hi Don,

When I execute the script, it does not compare the given file with the others in the path. It gives me the following output:

Usage: cmp [-l | -s] File1 File2
rm /Dir1/o34620760.out

Thanks

What operating system and shell are you using?
The cmp in the find -exec primary I supplied matches the synopsis form:

cmp -s File1 File2

shown in the diagnostic message, so please show us the exact text in your copy of my suggested script.

OS: Unix
Shell is Korn Shell.

Script:

#!/usr/bin/ksh
for pfile in "$@"
do	ls -l "$Filename" | (
		read x x user group x
		find /u50/payments1 -user "$user" -group "$group" -type f \
			-exec cmp -s "$Filename" {} \; \
			-exec echo rm {} +
		echo rm "$Filename"
	)
done
return 0

File Name passed as parameter was /home/rjohn/abcd.out

The code you showed above would not produce the output you showed us in post #11. Note that in the script I suggested, there was no $Filename; all of the places you are using $Filename were $pfile in the script I suggested. If you want to use Filename as your variable name instead of pfile, that is fine; but you have to be consistent. You can't assign the pathnames given on the command line to the variable pfile and then use $Filename to reference that pathname. All four variable references above (pfile in the for statement and the three $Filename expansions) must use the same name.

And, for the record, UNIX is not an operating system; it is a brand that applies to several operating systems (such as AIX, HP-UX, OS X, and Solaris). If someone asks you what model of car you drive, they would expect an answer like 2014 Toyota Camry hybrid, not sedan.

Hi Don,

Sorry, it's AIX 6. I changed the script to use pfile instead of Filename, but the result is still the same.

My question is: we are giving cmp only one file name. Shouldn't there be a second parameter that picks up the file names one by one, recursively, from the directory structure /u50/payments1? Sorry to disturb you again and again.

Thanks

No. The {} in:

-exec cmp -s "$Filename" {} \; 

is replaced by the pathname of a file to be tested by find and it works perfectly when I try it on my MacBook Pro running OS X Yosemite. The only reason to get that diagnostic message from cmp is if the variable name used in the for loop does not match the variable name expanded in the find -exec primary or something else on that line does not match the text I suggested.
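To see the {} substitution on its own, without any rm involved, something like the following sketch can help. list_identical is a hypothetical name; $1 is the reference file and $2 is the directory to search.

```shell
# Hypothetical helper: print every regular file under directory $2 whose
# contents are byte-for-byte identical to reference file $1.
list_identical() {
	# find runs "cmp -s REF PATH" once per file, replacing {} with PATH.
	# -print is only reached when cmp exits 0 (identical contents).
	find "$2" -type f -exec cmp -s "$1" {} \; -print
}

# Possible use (not executed here):
#	list_identical /home/rjohn/abcd.out /u50/payments1
```

Because nothing is removed, this is a safe way to confirm that cmp really does receive two operands for each file that find visits.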

Please change the 1st line of your script to the following two lines:

#!/usr/bin/ksh
set -xv

rerun your script, and show us the exact output it produces (in CODE tags).

Thanks Don. I will run it tomorrow and let you know.

---------- Post updated 02-10-15 at 10:44 AM ---------- Previous update was 02-09-15 at 09:16 PM ----------

This is what I got

Getting list of parameters.
File Name is: /home/rkath/o34620760.out

Getting list of files to purge from /xxtd/u20/xx_payments:

for pfile in "$Filename"
do	ls -l "$pfile" | (
		read x x user group x
		find /xxtd/u20/xx_payments -user "$user" -group "$group" -type f \
			-exec cmp -s "$pfile" {} \; \
			-exec echo rm {} +
		echo rm "$pfile"
	)
done
+ ls -l /home/rkath/o34620760.out
+ read x x user group x
+ find /xxtd/u20/xx_payments -user rkath -group staff -type f -exec cmp -s /home/rkath/o34620760.out {} ; -exec echo rm {} +
Usage: cmp [-l | -s] File1 File2
+ echo rm /home/rkath/o34620760.out
rm /home/rkath/o34620760.out
return 0
+ return 0

There is a HUGE difference between the code you have here and what I suggested in post #10 in this thread:

for pfile in "$@"

Since Filename is an undefined variable in this script, $Filename expands to an empty string while "$@" expands to a list of your command-line arguments. Please try it the way I suggested in the first place.
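The general difference between the two expansions can be seen with a small sketch like this one. show_operands and demo are hypothetical names, and the unset is there only to simulate a variable that was never assigned.

```shell
# Hypothetical helper: report how many operands it received and list them.
show_operands() {
	printf '%d operand(s)\n' "$#"
	for op in "$@"
	do	printf '[%s]\n' "$op"
	done
}

# Hypothetical demo: call it with the arguments a script would receive.
demo() {
	unset Filename              # simulate a variable that was never set
	show_operands "$@"          # one operand per command-line argument
	show_operands "$Filename"   # exactly one operand: the empty string
}

# Possible use (not executed here):
#	demo /home/rjohn/abcd.out "name with spaces.out"
```

A quoted "$@" always yields the command-line arguments unchanged, spaces included, while a quoted unset variable still yields one operand, but an empty one.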

Thanks a lot, Don. Your script was not working in my environment, but I learned a lot from you during this process. I finally wrote my own, which is working fine now:

for f in $(find /home/u11/payments -type f)
do
echo comapring file now "$f" with $Filename
cmp -s $Filename $f > /dev/null
if [ $? -eq 1 ]; then
    echo is different    
else
    echo is not different
    echo Removing file now $f
    rm $f
fi
done

This might work, but you're living dangerously...

The exit code from cmp is 0 if the files have the same contents, 1 if the contents are different, or some value greater than 1 if an error occurred. Your if condition is ignoring the error case and treating it as if the files compared equal when the comparison was not completed. (This might cause you to remove files that you couldn't compare.)
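A sketch of handling all three outcomes separately might look like this. compare_files is a hypothetical name, and the words it prints are only for illustration.

```shell
# Hypothetical helper: classify the result of comparing files $1 and $2.
compare_files() {
	cmp -s -- "$1" "$2"
	case $? in
		0) echo same ;;        # contents are identical
		1) echo different ;;   # contents differ
		*) echo error ;;       # cmp could not complete the comparison
	esac
}

# Possible use (not executed here):
#	[ "$(compare_files "$Filename" "$f")" = same ] && rm -- "$f"
```

Only the first case is a safe reason to remove a file; an unreadable or missing file falls into the last case and is left alone.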

There is no need to redirect standard output from cmp -s since nothing is written to standard output when the -s option is present.

You might want to try this alternative (although I would still like to see a trace showing what failed in the script I suggested in post #10):

find /home/u11/payments -type f | while read -r f
do	printf 'Comparing file "%s" with "%s"...\n' "$f" "$Filename"
	if cmp -s -- "$Filename" "$f"
	then	printf 'is not different\nRemoving "%s" now\n' "$f"
		rm -- "$f"
	else	echo 'is different or error encountered'
	fi
done

Using find ... | while read f allows the find command and the compare loop to run in parallel.

Using for f in $(find ...) requires the find command to complete before the compare loop starts.

Although your current filenames might not contain any spaces, tabs, or other characters special to the shell, quoting the expansions of $f and $Filename will keep your script running smoothly if conditions like that arise in the future. Similarly, preceding the file name operands to cmp and rm with -- keeps file names that start with a minus sign ( - ) from being interpreted as options instead of as operands. And, the behavior of echo varies from system to system and shell to shell if the 1st operand given to echo starts with a minus sign or if any operand contains a backslash ( \ ) character (both of which are possible from the expansions of $f and $Filename in your script). Using printf with a format string operand instead of echo avoids these possible problems.
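To make the effect of a space in a file name concrete, here is a sketch that counts how many items each kind of loop sees. count_with_for and count_with_while are hypothetical names, and the sketch assumes a Bourne-style shell with the default IFS.

```shell
# Hypothetical helper: count loop iterations when the output of find is
# expanded unquoted, so the shell splits it on spaces, tabs, and newlines.
count_with_for() {
	n=0
	for f in $(find "$1" -type f)
	do	n=$((n + 1))
	done
	echo "$n"
}

# Hypothetical helper: count loop iterations when each line of find output
# is read whole. IFS= also preserves leading and trailing blanks in a name.
count_with_while() {
	find "$1" -type f | {
		n=0
		while IFS= read -r f
		do	n=$((n + 1))
		done
		echo "$n"
	}
}
```

In a directory holding a single file named "two words.out", the for version counts two items and the while version counts one. Neither version copes with a newline inside a file name.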

Hope this helps...