<AIX> Problem in purge script: takes a very long time to complete (18.30 hrs)

Hi,

I have a script that purges older files/directories based on a defined purge period. The script consists of 45 find commands, and each command has to traverse more than a million directories. As a result, a single find command takes around 22-25 minutes, and the entire script runs for approximately 18.30 hrs, which is far too long. Could someone suggest a better idea/logic to achieve this while reducing the execution time as much as possible? Thanks!

Sunday morning guesswork (I don't have nearly enough files available for serious testing, but this is based on experience with a similar, smaller problem):

  • First, I'd separate the finding from the killing, i.e. generate a hit list (of data to be removed) to be processed as a background task of lower priority.

  • Second, I'd rewrite the searches, and thus reduce their number, by means of (more) regular expressions, maybe even using 'ls' and 'grep' instead of 'find' (?).

Thank you for responding, dr.house. Yes, I did think about separating the search logic and redirecting the results into a dump file, then picking the entries from the dump file and deleting/purging them. But this had a downside too: the dump file grew very large, and the filesystem reached 100% before the search was complete.

Honestly speaking, I am very weak in shell scripting, especially when it involves complex logic. I would be grateful if you could post the script with the logic you suggested. Thanks!
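On the dump file filling up the filesystem: one possible way around it is to skip the list file entirely and let find hand the matches straight to rm. The following is only a minimal sketch, not the original script. The sandbox directory and the file names are made up for the demo, `mktemp -d` may not be available on every AIX release, and whether your AIX find accepts `-exec ... {} +` should be checked first (if it does not, `-exec rm -f {} \;` is the slower fallback).

```shell
#!/bin/sh
# Sketch only: delete matches directly, without an intermediate list file.
# demo_dir and the file names are hypothetical stand-ins for the real purge tree.
demo_dir=$(mktemp -d)
touch "$demo_dir/report(MON).log" "$demo_dir/keep.txt"

# Backdate one file so that -mtime +14 matches it.
touch -t 202001010000 "$demo_dir/report(MON).log"

# "{} +" passes many paths to each rm call, so nothing is written to disk
# and rm is not started once per file.
find "$demo_dir" -type f -name '*(MON)*' -mtime +14 -exec rm -f {} +

# Only the file that did not match should be left.
remaining=$(ls "$demo_dir")
echo "$remaining"

rm -rf "$demo_dir"
```

The trade-off is that there is no list left to review before the deletion happens, so it is worth running the same find with `-print` once to check the matches.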

Put together all the lines where the rule (purge period) is the same, and use find's -o operator to combine their name patterns in a single find.

There are also many different kinds of finds, some for files a few days old, some for files a year old. Think carefully about how often you really need to run each kind of find+rm. Make more than one find script file: one to run once a month, one to run once a ...

An example, something like this:

find . \( -name '*(MON)*'  -o -name '*(TUE)*' \) -type f
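The parentheses in that command matter, because find's implicit "and" binds tighter than -o. Here is a small sketch that shows the difference. The sandbox directory and file names are invented for the demo, and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Sketch only: compare a grouped and an ungrouped -o expression.
# sandbox and its contents are hypothetical demo data.
sandbox=$(mktemp -d)
mkdir "$sandbox/dir(MON)"
touch "$sandbox/a(MON).txt" "$sandbox/b(TUE).txt"

# Grouped: -type f applies to both patterns, so the directory is skipped.
grouped=$(find "$sandbox" \( -name '*(MON)*' -o -name '*(TUE)*' \) -type f | wc -l | tr -d ' ')

# Ungrouped: parsed as  -name MON  -o  ( -name TUE  -a  -type f ),
# so the directory "dir(MON)" is reported as well.
ungrouped=$(find "$sandbox" -name '*(MON)*' -o -name '*(TUE)*' -type f | wc -l | tr -d ' ')

echo "grouped=$grouped ungrouped=$ungrouped"
rm -rf "$sandbox"
```

In a purge script the ungrouped form would hand directories to the delete step, so it is worth keeping the `\( ... \)` around every set of -o patterns.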

Thanks kshji, that seems to be quite a good suggestion. If I group together all the search patterns that share a purge period, I am left with only 13 find commands :). If a find command with multiple search patterns also runs for approximately 22-25 minutes, your logic would drastically reduce the execution time. Let me try it and post a reply.

One quick question: may I put "-type f" before "-name"?

The answer is yes
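To see it for yourself, here is a quick check that both orderings select the same files. It is only a sketch on made-up demo data (the sandbox directory and the names in it are not from the original script). find evaluates its tests from left to right, so the order may affect speed on some implementations, but not the result.

```shell
#!/bin/sh
# Sketch only: "-type f" before or after "-name" gives the same matches.
# sandbox and its contents are hypothetical demo data.
sandbox=$(mktemp -d)
mkdir "$sandbox/x(MON)"
touch "$sandbox/a(MON).log" "$sandbox/b.log"

type_first=$(find "$sandbox" -type f -name '*(MON)*' | sort)
name_first=$(find "$sandbox" -name '*(MON)*' -type f | sort)

if [ "$type_first" = "$name_first" ]; then
  same=yes
else
  same=no
fi
echo "same result: $same"
rm -rf "$sandbox"
```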

Thank you;)

Taking into consideration what kshji has posted, I'd rewrite the script approximately as follows, with every "schedule" representing one 'mtime' value, shown here for the first three (Linux Bash code, to be adapted for AIX):

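#! /bin/bash

# FIND and Purge_DIR are not defined in this excerpt; the default and the
# check below are assumptions, adjust them to your environment.
FIND=${FIND:-find}
Purge_DIR=${Purge_DIR:?set Purge_DIR to the directory tree to purge}
LIST="stuff.$1.list"   # one list per schedule, so parallel runs do not collide

# Print a timestamped progress line: $1 = schedule, $2 = stage (S1..S3)
function displayState()
{
  case $2 in
  S1)
    echo "Schedule: $1 - Finding started at: $( date '+%d.%m %H.%M' )" ;;
  S2)
    echo "Schedule: $1 - Removal started at: $( date '+%d.%m %H.%M' )" ;;
  S3)
    echo "Schedule: $1 - Processing done at: $( date '+%d.%m %H.%M' )" ;;
  esac
}

# Remove every entry listed in file $1, then remove the list itself
function removeOldies()
{
  while IFS= read -r ENTRY   # read whole lines, keeping blanks and backslashes
  do
    if [ -f "$ENTRY" ] # single file
    then
      rm -f "$ENTRY"
    elif [ -d "$ENTRY" ] # directory
    then
      rm -rf "$ENTRY"
    fi
  done < "$1"
  rm -f "$1"
}

# main function

displayState "$1" S1
case $1 in
007)
  ${FIND} "${Purge_DIR}" -type f -name '*TM-5193*' -mtime +7 -print > "$LIST" ;;
010)
  ${FIND} "${Purge_DIR}" -type f \( -name '*DAILY*' -o -name '*(weekly)*' \) \
    -mtime +10 -print > "$LIST" ;;
014)
  ${FIND} "${Purge_DIR}" -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
    -o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
    -o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -print > "$LIST" ;;
*)
  echo "Unknown schedule: $1" >&2
  exit 1 ;;
esac
displayState "$1" S2
removeOldies "$LIST"
displayState "$1" S3

exit 0

# game over ;-)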

Unlike the original, this script executes each "search & destroy" task individually, e.g.:

# /bin/bash kickOut.bash 014

Thus, larger tasks can be performed one by one (and at different times), and smaller ones in parallel (by calling the same script multiple times), which should at least make the "monster task" less daunting.
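Running the smaller schedules in parallel could look roughly like this. It is only a sketch: run_schedule is a hypothetical stand-in for the call "/bin/bash kickOut.bash <schedule>", so the example runs without touching any real files, and which schedules count as "small" is an assumption you would have to settle from your own timings.

```shell
#!/bin/bash
# Sketch only: start several schedules in the background, then wait for all.
# run_schedule is a hypothetical placeholder for: /bin/bash kickOut.bash "$1"
run_schedule() {
  sleep 1
  echo "schedule $1 done"
}

log=$(mktemp)
for schedule in 007 010; do
  run_schedule "$schedule" >> "$log" &   # "&" puts each run in the background
done
wait   # block until every background job has finished

finished=$(wc -l < "$log" | tr -d ' ')
echo "$finished schedules finished"
rm -f "$log"
```

Keep in mind that parallel finds over the same tree compete for the same disks, so running them side by side may help less than the reduced number of finds does.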