Your script is slow because it invokes several utilities (perl (multiple times), awk, and find) for each file it processes.
And although it invokes perl three times to get the month, day, and year for each file, and again for each file that it is compared to, the awk statement that looks for a match on the month and year still uses the timestamp-or-year field from the ls output to compare against the year field for the current file. (ls prints a time of day in that field instead of the year for files modified within roughly the last six months.) Therefore, it does not list all of the files eligible for purging in months that contain days that are 90 to 180 days ago. For example, in a directory that contains the files:
-rw-r--r-- 2 dwc staff 0 Oct 31 12:00 z.txt
-rw-r--r-- 2 dwc staff 0 Oct 30 13:00 z10.2.txt
-rw-r--r-- 2 dwc staff 0 Oct 30 12:00 z10.txt
-rw-r--r-- 2 dwc staff 0 Oct 1 2013 b.txt
your script will not list b.txt as a purge candidate.
---------
One of your find statements:
find . ! -name -prune -type f -mtime +90
is weird. Are you really trying to exclude a file named -prune? Were you, perhaps, trying to exclude files in subdirectories instead? That would be:
find . ! -name . -prune -type f -mtime +90
but it still won't work because you have another find statement nested inside the loop that doesn't ignore subdirectories. So, assuming that /purge_dir doesn't contain any subdirectories, you just need:
find . -type f -mtime +90
(Note that if you make the change suggested above, you can process directories with subdirectories as long as no file in a subdirectory has the same name as a file in /purge_dir.)
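To see the difference between the two find forms without touching /purge_dir, here is a minimal sketch. The scratch directory and the file names top.txt and sub/nested.txt are made up for the demonstration, and the files are backdated with touch -t so that -mtime +90 matches them.

```shell
#!/bin/sh
# Scratch directory made up for this demonstration; it is not /purge_dir.
demo=`mktemp -d` || exit 1
mkdir "$demo/sub"
# Backdate both files to 1 Jan 2012 so that -mtime +90 selects them.
touch -t 201201010000 "$demo/top.txt" "$demo/sub/nested.txt"
cd "$demo" || exit 1

# Current directory only: every entry other than "." is pruned,
# so find never descends into ./sub.
top_only=`find . ! -name . -prune -type f -mtime +90`

# Current directory and everything below it.
whole_tree=`find . -type f -mtime +90 | sort`

echo "current directory only:"
echo "$top_only"
echo "whole tree:"
echo "$whole_tree"
```

The first form should list only ./top.txt, while the second should also list ./sub/nested.txt.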
Assuming that you are using a Solaris system (since your script contains nawk instead of awk) and that you're using an old Bourne shell (rather than ksh or bash, since you're using the `command` form of command substitution rather than $(command)), the following should work for you. In a test on a small directory with one subdirectory containing the files:
ls -lR
total 24
-rwxr-xr-x 1 dwc staff 1512 Apr 29 16:07 Makarand.sh
-rw-r--r-- 2 dwc staff 0 Feb 21 2012 a.txt
-rw-r--r-- 3 dwc staff 0 Oct 1 2013 b.txt
-rw-r--r-- 3 dwc staff 0 Mar 19 2012 c.txt
-rw-r--r-- 3 dwc staff 0 Mar 21 2012 d.txt
-rw-r--r-- 3 dwc staff 0 Apr 12 01:02 e.txt
-rw-r--r-- 3 dwc staff 0 Mar 22 2012 f.txt
-rw-r--r-- 3 dwc staff 0 Apr 21 03:04 g.txt
-rw-r--r-- 3 dwc staff 0 Mar 24 2012 h.txt
-rw-r--r-- 3 dwc staff 0 Apr 22 05:06 i.txt
-rw-r--r-- 2 dwc staff 0 Feb 27 2012 j.txt
-rw-r--r-- 2 dwc staff 0 Feb 23 2012 k.txt
-rw-r--r-- 3 dwc staff 0 Apr 23 07:08 m.txt
-rw-r--r-- 3 dwc staff 0 Apr 27 09:10 n.txt
-rw-r--r-- 1 dwc staff 2636 Apr 29 10:01 problem
-rw-r--r-- 2 dwc staff 0 Feb 12 2012 q.txt
-rw-r--r-- 2 dwc staff 0 Feb 22 2012 s.txt
drwxr-xr-x 16 dwc staff 544 Apr 29 13:32 sub
-rwxr-xr-x 1 dwc staff 832 Apr 29 16:43 tester
-rw-r--r-- 3 dwc staff 0 Mar 1 2013 y.txt
-rw-r--r-- 3 dwc staff 0 Oct 31 12:00 z.txt
-rw-r--r-- 3 dwc staff 0 Oct 30 13:00 z10.2.txt
-rw-r--r-- 3 dwc staff 0 Oct 30 12:00 z10.txt
./sub:
total 0
-rw-r--r-- 3 dwc staff 0 Oct 1 2013 b.txt
-rw-r--r-- 3 dwc staff 0 Mar 19 2012 c.txt
-rw-r--r-- 3 dwc staff 0 Mar 21 2012 d.txt
-rw-r--r-- 3 dwc staff 0 Apr 12 01:02 e.txt
-rw-r--r-- 3 dwc staff 0 Mar 22 2012 f.txt
-rw-r--r-- 3 dwc staff 0 Apr 21 03:04 g.txt
-rw-r--r-- 3 dwc staff 0 Mar 24 2012 h.txt
-rw-r--r-- 3 dwc staff 0 Apr 22 05:06 i.txt
-rw-r--r-- 3 dwc staff 0 Apr 23 07:08 m.txt
-rw-r--r-- 3 dwc staff 0 Apr 27 09:10 n.txt
-rw-r--r-- 3 dwc staff 0 Mar 1 2013 y.txt
-rw-r--r-- 3 dwc staff 0 Oct 31 12:00 z.txt
-rw-r--r-- 3 dwc staff 0 Oct 30 13:00 z10.2.txt
-rw-r--r-- 3 dwc staff 0 Oct 30 12:00 z10.txt
the script:
#!/bin/sh
criteria_purge() {
    echo "Files eligible for purging...."
    ls -lt `$1` | /usr/xpg4/bin/awk -v cy=`date +%Y` '
    BEGIN { y["Jan"] = y["Feb"] = y["Mar"] = cy
            y["Apr"] = y["May"] = y["Jun"] = cy
            y["Jul"] = y["Aug"] = y["Sep"] = cy - 1
            y["Oct"] = y["Nov"] = y["Dec"] = cy - 1
    }
    NF > 8 {
        if(length($8) == 4) # Do we have a year or a timestamp?
            yr = $8         # year
        else
            yr = y[$6]      # timestamp
    }
    lmo != $6 || lyr != yr {
        dim = ld = 0
        lmo = $6
        lyr = yr
    }
    ld != $7 {
        dim++
        ld = $7
    }
    dim > 2 {
        printf("%s %s %s %s\n", $9, $7, $6, yr)
    }'
}
cd /purge_dir
# Uncomment one, and only one, of the following definitions for var3.
# Use following line to process files in current directory and subdirectories.
# var3="find . -type f -mtime +90"
# Use to process files in current directory only.
var3="find . ! -name . -prune -type f -mtime +90"
criteria_purge "$var3"
produces the output:
./b.txt 1 Oct 2013
./d.txt 21 Mar 2012
./c.txt 19 Mar 2012
./s.txt 22 Feb 2012
./a.txt 21 Feb 2012
./q.txt 12 Feb 2012
in about 0.02 seconds on an old MacBook Pro laptop, while your script (modified to use the same setting for var3) produces the output:
./a.txt 21 Feb 2012
./q.txt 12 Feb 2012
./s.txt 22 Feb 2012
in about 3.51 seconds.
If I switch the setting of var3 from:
var3="find . ! -name . -prune -type f -mtime +90"
to:
var3="find . -type f -mtime +90"
in both scripts, your script produces the output:
Files eligible for purging....
./a.txt 21 Feb 2012
./q.txt 12 Feb 2012
./s.txt 22 Feb 2012
in about 5.84 seconds, while the script above produces the output:
./b.txt 1 Oct 2013
./sub/b.txt 1 Oct 2013
./d.txt 21 Mar 2012
./sub/d.txt 21 Mar 2012
./c.txt 19 Mar 2012
./sub/c.txt 19 Mar 2012
./s.txt 22 Feb 2012
./a.txt 21 Feb 2012
./q.txt 12 Feb 2012
still in about 0.02 seconds. I believe the above script is producing the desired output.
However, the output from the above script is sorted in decreasing date order rather than in increasing alphanumeric filename order. If you want the script above to print the results in alphanumeric order, change the line:
}'
at the end of the awk script to:
}' | sort
Doing that will add about another 0.01 seconds of running time for the sample data shown.
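To see why b.txt is selected, the awk logic can be run on canned input. As I read the script, it walks the ls -lt output newest first, counts the distinct days seen in each month, and lists every file that falls on the third distinct day or later. The sketch below is only an illustration: show_candidates is a made-up function name, plain awk stands in for /usr/xpg4/bin/awk (this assumes a POSIX awk), and the current year is fixed at 2014 as an assumption so the result does not depend on today's date.

```shell
#!/bin/sh
# show_candidates is a hypothetical name; it reads "ls -lt" lines on stdin.
# $1 is the current year (an assumption supplied by the caller).
show_candidates() {
    awk -v cy="$1" '
    BEGIN { y["Jan"] = y["Feb"] = y["Mar"] = cy
            y["Apr"] = y["May"] = y["Jun"] = cy
            y["Jul"] = y["Aug"] = y["Sep"] = cy - 1
            y["Oct"] = y["Nov"] = y["Dec"] = cy - 1
    }
    NF > 8 {
        if (length($8) == 4)
            yr = $8        # field 8 is a year
        else
            yr = y[$6]     # field 8 is a time of day; look the year up
    }
    # A new month or year starts: reset the distinct-day counter.
    lmo != $6 || lyr != yr { dim = ld = 0; lmo = $6; lyr = yr }
    # A new day within the month: count it.
    ld != $7 { dim++; ld = $7 }
    # Third distinct day or later: report the file.
    dim > 2 { printf("%s %s %s %s\n", $9, $7, $6, yr) }'
}

# The four files from the first listing above, newest first.
listing='-rw-r--r-- 2 dwc staff 0 Oct 31 12:00 ./z.txt
-rw-r--r-- 2 dwc staff 0 Oct 30 13:00 ./z10.2.txt
-rw-r--r-- 2 dwc staff 0 Oct 30 12:00 ./z10.txt
-rw-r--r-- 2 dwc staff 0 Oct 1 2013 ./b.txt'

out=`echo "$listing" | show_candidates 2014`
echo "$out"
```

With this input, Oct 31 and Oct 30 are the first two distinct days in October 2013, so only ./b.txt (Oct 1) is reported.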
If the argument list given to ls is too long, we can work on an alternative, but it won't be quite as fast.
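One possible shape for such an alternative, offered only as a sketch: let find batch the file names with -exec ls -l {} +, so no single ls call gets an over-long argument list, and have awk work out the two newest days of each month itself rather than relying on ls -t ordering (which no longer holds across several ls calls). The name candidates_any_order is hypothetical, a POSIX awk is assumed, and the month-to-year table is copied unchanged from the script above, so it inherits the same assumptions. File names containing whitespace are not handled, as in the script above.

```shell
#!/bin/sh
# candidates_any_order is a hypothetical helper name. It reads "ls -l" lines
# in any order on stdin; $1 is the current year.
candidates_any_order() {
    awk -v cy="$1" '
    BEGIN { y["Jan"] = y["Feb"] = y["Mar"] = cy
            y["Apr"] = y["May"] = y["Jun"] = cy
            y["Jul"] = y["Aug"] = y["Sep"] = cy - 1
            y["Oct"] = y["Nov"] = y["Dec"] = cy - 1
    }
    NF > 8 {
        if (length($8) == 4)
            yr = $8
        else
            yr = y[$6]
        key = $6 " " yr
        day = $7 + 0
        # d1 = newest day seen so far in this month, d2 = second newest.
        if (!(key in d1))
            d1[key] = day
        else if (day > d1[key]) {
            d2[key] = d1[key]
            d1[key] = day
        } else if (day < d1[key] && (!(key in d2) || day > d2[key]))
            d2[key] = day
        # Remember every file until all of the input has been read.
        n++
        fkey[n] = key
        fday[n] = day
        fname[n] = $9
    }
    END {
        # Report files that are on neither of the two newest days.
        for (i = 1; i <= n; i++) {
            k = fkey[i]
            if (fday[i] != d1[k] && fday[i] != d2[k])
                printf("%s %s %s\n", fname[i], fday[i], k)
        }
    }'
}

# Intended use (not run here):
#   find . -type f -mtime +90 -exec ls -l {} + | candidates_any_order `date +%Y`

# Canned input in arbitrary order, with the year fixed at 2014 as an assumption.
listing='-rw-r--r-- 2 dwc staff 0 Oct 1 2013 ./b.txt
-rw-r--r-- 2 dwc staff 0 Oct 30 12:00 ./z10.txt
-rw-r--r-- 2 dwc staff 0 Oct 31 12:00 ./z.txt
-rw-r--r-- 2 dwc staff 0 Feb 21 2012 ./a.txt'

out=`echo "$listing" | candidates_any_order 2014`
echo "$out"
```

Because every file name is held in memory until the END block, this is likely to be slower than the single-pass script on large directories, and the output comes out in input order, so pipe it through sort if a particular order is wanted.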