I'm trying to write a script that finds and deletes files that are older than 300 days. The script will read a table that contains the following 3 columns:
1st col: "Y" means sub-directory scan; "N" means no subdirectory scan
2nd col: sub-directory location
3rd col: File prefix (* indicates all files)
For ex:
Y|/data/informatica/ming/Logs|*
N|/data/informatica/ming/ScriptsLogs|*
N|/data/sw/apps/informatica8/JBoss403/bin/|heapdump*.phd, javacore*.txt
The reason I'm using a table is that the list of directories can grow, and I'm trying to avoid hard-coding anything in the script.
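For reference, a table in this shape can be read without hard-coding anything in the script. A minimal sketch (the variable names scan, dir and patterns are my own, not from any posted solution):

```shell
# Read each pipe-delimited row of the table into its three fields.
# The here-document stands in for the real table file.
while IFS='|' read -r scan dir patterns
do
    echo "scan=$scan dir=$dir patterns=$patterns"
done <<'EOF'
Y|/data/informatica/ming/Logs|*
N|/data/informatica/ming/ScriptsLogs|*
EOF
```

In the real script the here-document would be replaced by `done < table.dat` (or the table redirected into the script), so new directories only ever touch the table.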
I have searched the forum but I haven't been successful in finding any examples on what I am trying to accomplish.
I have tried the following with the find command so far:
-prune option (but this requires hard-coding the sub-directories)
-maxdepth 1 (option does not work in my environment)
ls -l | grep -v ^d (not sure how to use this in conjunction with the find command)
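For what it's worth, where -maxdepth isn't available, -prune can restrict find to the top level without naming any subdirectory: prune every directory that is not the starting point itself. A sketch using a throwaway directory (made with mktemp just for the demonstration):

```shell
# Emulate "find DIR -maxdepth 1 -type f" on a find with no -maxdepth:
# prune every directory whose path is not the start point itself.
d=$(mktemp -d)
touch "$d/top.txt"
mkdir "$d/sub"
touch "$d/sub/deep.txt"
find "$d" \( ! -path "$d" -type d -prune \) -o -type f -print
# prints only $d/top.txt; $d/sub is never descended into
rm -r "$d"
```

Because the prune test compares against the start path rather than a list of subdirectory names, nothing has to be hard-coded.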
In addition, the 3rd example from my table:
N|/data/sw/apps/informatica8/JBoss403/bin/|heapdump*.phd, javacore*.txt
How can I dynamically parse the 3rd field to scan for these files since there could be more files added to the 3rd field column in the future?
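One way to handle that dynamically (the variable names here are mine, purely for illustration) is to split the field on commas and build the find name clause in a loop, joining the patterns with -o, so any number of patterns can be added to the column later:

```shell
set -f   # disable globbing so the patterns stay literal while we build the clause
patterns="heapdump*.phd, javacore*.txt"   # contents of the 3rd field
clause=""
for p in $(echo "$patterns" | tr ',' ' ')
do
    if [ -z "$clause" ]; then
        clause="-name $p"
    else
        clause="$clause -o -name $p"
    fi
done
echo "$clause"
# -name heapdump*.phd -o -name javacore*.txt
```

The resulting clause would be wrapped in \( ... \) on the generated find command, since find ANDs tests by default and the patterns need to be ORed.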
I believe that the script offered by aigles will have issues because the OP indicates that the 'sub directory location' is a full path, and the -name option (according to the man page) takes a basename, not a path name.
Instead of the -name option, the -path option should be used.
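A quick way to see the difference, using a throwaway directory (names here are hypothetical):

```shell
# -name matches only the basename; -path matches the whole path,
# so a pattern containing a full path can only ever match via -path.
d=$(mktemp -d)
mkdir "$d/sub"
touch "$d/sub/file.txt"
find "$d" -path "$d/sub*" -print               # matches the subdirectory and its file
find "$d" -name "$d/sub*" -print 2>/dev/null   # matches nothing: -name sees basenames
                                               # (GNU find may also warn about the '/')
rm -r "$d"
```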
This script needs only one pass over the filesystem and defaults to listing the files. Supply 'rm' as the parameter to the script and the files will be deleted:
#!/usr/bin/env ksh
# read table and generate find commands
# {Y|N} | filesystem | filelist
awk -F "|" -v cmd="${1:-ls -l}" '
{
    if( $1 == "N" || $1 == "n" ) # if no-descend, add prune logic that skips subdirectories
        no_depth = sprintf( "\\( ! -path \"%s\" -type d -prune \\) -o", $2 );
    else
        no_depth = ""; # ok to examine subdirectories -- nothing extra needed
    printf( "find %s %s \\( ", $2, no_depth ); # output first part of cmd
    gsub( ",", " ", $3 ); # allow for pattern,pattern or pattern pattern in the list
    n = split( $3, a, " " ); # process each filename pattern
    for( x = 1; x <= n; x++ )
        printf( "%s-name \"%s\" ", (x > 1 ? "-o " : ""), a[x] ); # OR the patterns together
    printf( "\\) -print | xargs %s\n", cmd ); # finish the command
}
' | ksh # finally pipe to ksh to execute the commands
Thank you guys for your quick responses and proposed solutions.
I will have a look at each one and try them out.
---------- Post updated at 04:14 PM ---------- Previous update was at 10:37 AM ----------
Hi Guys.
I finally had a chance to test out the 3 proposed solutions and here are some results:
ygemici - maxdepth option does not work in my environment
aigles - script does not work when a full path is provided. My table will contain the full paths of the directories that I would like to cleanup. How can the script be modified to accept full path directories?
agama - How do I go about embedding your code into a script?
I copied the entire code, up to just before the last pipe, into a new script test.sh - how do I execute the script so it reads my table pchang.dat?
Sorry -- I failed to realise that the testing I was doing was reading the table from stdin, and thus no filename was presented to awk, so my info wasn't precise.
You can either redirect the table into your script like this:
test.sh <pchang.dat
or pass the table name on the command line, and add a $1 following the last single quote of the awk programme.
awk '
# lines of awk code
' $1
My recommendation would be the first way. If you use the second, then you should add a test in your script and generate an error message if the user forgets to put the table name on the command line.
The output will be the find commands, they won't be executed unless you pipe them into Kshell or bash. You can put the pipe into the script, or pipe the output manually -- I do this sometimes so that I can verify the commands before I execute them.
# execute the commands generated by test.sh
test.sh <table.dat | ksh
Small change. Requires cd'ing to the directory first and specifying . as the search location on the find command. The changed lines are the prune expression, the cd/find line, and the added -mtime test:
#!/usr/bin/env ksh
# read table and generate find commands
# {Y|N} | filesystem | filelist
# provide filename of table as standard input e.g. test_script <table_name
# put rm on the command line as the first parameter to actually
# generate commands that delete files e.g. test_script rm <table_name
# if rm is not put on the command line, the default is to list what would be scratched.
awk -F "|" -v cmd="${1:-ls -l}" '
{
    if( $1 == "N" || $1 == "n" ) # if no-descend, add prune logic that skips subdirectories
        no_depth = sprintf( "\\( ! -name \".\" -type d -prune \\) -o" );
    else
        no_depth = ""; # ok to examine subdirectories -- nothing extra needed
    printf( "(cd %s; find . %s \\( ", $2, no_depth ); # output first part of cmd
    gsub( ",", " ", $3 ); # allow for pattern,pattern or pattern pattern in the list
    n = split( $3, a, " " ); # process each filename pattern
    for( x = 1; x <= n; x++ )
        printf( "%s-name \"%s\" ", (x > 1 ? "-o " : ""), a[x] ); # OR the patterns together
    printf( "\\) -mtime +300 -print | xargs %s)\n", cmd ); # finish the command
}
' | ksh # finally pipe to ksh to execute the commands
The last change was to add -mtime +300. I noticed that in order to test my script I had omitted that, but forgot to put it back in. Don't know if you picked up on that, but figured I'd point it out here. (Another reason I almost always list what will be deleted before turning the script loose to actually do the real work. Better to take some time to review than to be sorry.)
There are two things to be aware of if no files are older than 300 days. First, if running in list mode, it will list all of the files in the directory. You can prevent this with a small change: