I have here a script which is used to purge older files/directories based on a defined purge period. The script consists of 45 find commands, each of which has to traverse more than a million directories, so a single find command takes approximately 22-25 minutes :(. The entire script runs for approximately 18.5 hours, which is far too long. Could someone suggest a better idea/logic to achieve this purpose while reducing the execution time as much as possible? Thanks!
Sunday morning guesswork (with far too few files available for serious testing, though based on experience with a similar but smaller problem):
First, I'd separate the finding from the deleting, i.e. generate a hit list (of data to be removed) that is then processed as a background task at lower priority.
Second, I'd rewrite the script to reduce the number of searches by means of (more) regular expressions, maybe even using 'ls' and 'grep' instead of 'find' (?).
Thank you for responding, dr.house. Yes, I did think about the option of separating the search logic and redirecting the results into a dump file, then picking the entries from the dump file and proceeding to delete/purge. But this too had a negative impact: the dump file grew very large, and the filesystem reached 100% before the search was complete :mad:.
Honestly speaking, I am very weak at shell scripting, especially anything involving complex logic :confused:. I would be grateful if you could post a script with the logic you suggested. Thanks!
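One hedged way around the dump-file blow-up would be to stream each match straight into the delete instead of collecting a list first. The sketch below uses made-up demo paths and patterns so it runs end to end; a real run would also carry an '-mtime' test, omitted here so the freshly created demo file actually matches. '-exec rm -f {} +' is the portable POSIX spelling (AIX find may lack GNU's '-print0'):

```shell
#!/bin/sh
# Demo tree (hypothetical paths, just so the sketch is self-contained).
PURGE_DIR=/tmp/purge_demo
mkdir -p "$PURGE_DIR"
touch "$PURGE_DIR/old_DAILY_report.txt" "$PURGE_DIR/keep_me.txt"

# Matches are handed to rm in batches as they are found, so no
# intermediate dump file ever accumulates on the filesystem.
find "$PURGE_DIR" -type f -name '*DAILY*' -exec rm -f {} +
```

The trade-off versus a hit list is that you lose the ability to review or reprioritize the deletions before they happen.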
Put together all those lines where the rule is the same: use find's -o operator to combine the name rules so one find covers them all.
There are also many different kinds of finds here, some for files a few days old and some for files a year old. Think carefully how often you really need to run each kind of find+rm. Make more than one find script file: run one once a month, run another once in ...
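To illustrate the grouping idea with a small self-contained sketch (the directory and patterns are invented for the demo): three separate finds collapse into one traversal by joining the name rules with -o, with the parentheses escaped so the shell does not interpret them.

```shell
#!/bin/sh
# Build a tiny demo tree so the sketch runs anywhere.
DEMO=/tmp/group_demo
mkdir -p "$DEMO"
touch "$DEMO/a_DAILY_1" "$DEMO/b_WEEKLY_1" "$DEMO/c_MONTHLY_1" "$DEMO/keep_me"

# One directory walk, several name rules ORed together with -o.
find "$DEMO" -type f \( -name '*DAILY*' -o -name '*WEEKLY*' -o -name '*MONTHLY*' \) \
    -print > /tmp/group_demo.list
```

Since the directory traversal dominates the runtime here, n patterns in one find cost roughly the same as one pattern, not n times as much.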
Thanks kshji, seems to be quite a good suggestion. If I group together all the search patterns having common purge period, I am left with only 13 find commands:). If the find command with multiple search patterns also run for approximately 22-25 mins, your logic would drastically reduce the execution time. Let me try it and post a reply.
One quick question: may I place "-type f" before "-name"?
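For what it's worth, a small sketch suggesting the order should not change the result: find ANDs its tests left to right, and "-type f" is a cheap test against data find already has from stat(), so writing it first is common practice (demo paths below are invented):

```shell
#!/bin/sh
# A file and a directory that both match the name pattern.
DEMO=/tmp/order_demo
mkdir -p "$DEMO/sub_TM-5193"
touch "$DEMO/file_TM-5193.dat"

# Both orderings select only the regular file.
A=$(find "$DEMO" -type f -name '*TM-5193*')
B=$(find "$DEMO" -name '*TM-5193*' -type f)
[ "$A" = "$B" ] && echo "identical results"
```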
Taking into consideration what kshji posted, I'd rewrite the script approximately as follows, with every "schedule" representing one 'mtime', exemplified for the first three (Linux Bash code, subject to AIX adaptation):
#! /bin/bash
function displayState()
{
case $2 in
S1)
echo "Schedule: $1 - Finding started at: $( date '+%d.%m %H.%M' )" ;;
S2)
echo "Schedule: $1 - Removal started at: $( date '+%d.%m %H.%M' )" ;;
S3)
echo "Schedule: $1 - Processing done at: $( date '+%d.%m %H.%M' )" ;;
esac
}
function removeOldies()
{
while IFS= read -r ENTRY # read line by line; -r keeps backslashes intact
do
if [ -f "$ENTRY" ] # single file (quoted against blanks in names)
then
rm -f "$ENTRY"
elif [ -d "$ENTRY" ] # directory
then
rm -rf "$ENTRY" # -rf is portable; GNU-only -d is redundant with -r
fi
done < "$1" # redirect instead of a needless 'cat'
rm -f "$1"
}
# main function
: "${FIND:=find}" # default to 'find' on PATH if not set by the caller
: "${Purge_DIR:?Purge_DIR must be set}" # tree to purge; value not shown here
displayState $1 S1
case $1 in
007)
${FIND} ${Purge_DIR} -type f -name '*TM-5193*' -mtime +7 -print >> stuff.list ;;
010)
${FIND} ${Purge_DIR} -type f \( -name '*DAILY*' -o -name '*(weekly)*' \) \
-mtime +10 -print >> stuff.list ;;
014)
${FIND} ${Purge_DIR} -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
-o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
-o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -print >> stuff.list ;;
esac
displayState $1 S2
removeOldies stuff.list
displayState $1 S3
exit 0
# game over ;-)
Unlike the original script, this one executes each "search & destroy" task individually, e.g.:
# /bin/bash kickOut.bash 014
Thus, larger tasks can be performed one by one (and at different times) as well as smaller ones in parallel (by calling the same script multiple times) - which should at least make the "monster task" less daunting
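To make the "in parallel" part concrete, a driver along these lines could stagger the schedules. kickOut.bash here is replaced by a trivial stub, purely so the sketch is self-contained and runnable:

```shell
#!/bin/sh
# Stub standing in for the kickOut.bash script above (demo only).
cat > /tmp/kickOut.bash <<'EOF'
echo "schedule $1 done" >> /tmp/kickout.log
EOF
: > /tmp/kickout.log                # start with an empty log

for schedule in 007 010; do         # the lighter schedules...
    sh /tmp/kickOut.bash "$schedule" &  # ...run side by side
done
wait                                # block until both have finished
sh /tmp/kickOut.bash 014            # then the heavyweight runs alone
```

With the real script, running several finds concurrently only pays off if the disks can keep up; on a single spindle the parallel walks may just fight each other for I/O.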