I'm trying to write a script that finds and deletes files that are older than 300 days. The script will read a table that contains the following 3 columns:
1st col: "Y" means sub-directory scan; "N" means no subdirectory scan
2nd col: sub-directory location
3rd col: File prefix (* indicates all files)
For ex:
Y|/data/informatica/ming/Logs|*
N|/data/informatica/ming/ScriptsLogs|*
N|/data/sw/apps/informatica8/JBoss403/bin/|heapdump*.phd, javacore*.txt
The reason I'm using a table is that the list of directories can grow, and I'm trying to avoid hard-coding anything in the script.
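For reference, a table in this shape can be read without hard-coding anything in the script. A minimal sketch (the variable names scan, dir and patterns are my own, not from any posted solution):

```shell
# Read each pipe-delimited row of the table into its three fields.
# The here-document stands in for the real table file.
while IFS='|' read -r scan dir patterns
do
    echo "scan=$scan dir=$dir patterns=$patterns"
done <<'EOF'
Y|/data/informatica/ming/Logs|*
N|/data/informatica/ming/ScriptsLogs|*
EOF
```

In the real script the here-document would be replaced by `done < table.dat` (or the table redirected into the script), so new directories only ever touch the table.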
I have searched the forum but I haven't been successful in finding any examples on what I am trying to accomplish.
I have tried the following with the find command so far:
-prune option (but this requires hard-coding the sub-directories)
-maxdepth 1 (option does not work in my environment)
ls -l | grep -v ^d (not sure how to use this in conjunction with the find command)
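For what it's worth, where -maxdepth isn't available, -prune can restrict find to the top level without naming any subdirectory: prune every directory that is not the starting point itself. A sketch using a throwaway directory (made with mktemp just for the demonstration):

```shell
# Emulate "find DIR -maxdepth 1 -type f" on a find with no -maxdepth:
# prune every directory whose path is not the start point itself.
d=$(mktemp -d)
touch "$d/top.txt"
mkdir "$d/sub"
touch "$d/sub/deep.txt"
find "$d" \( ! -path "$d" -type d -prune \) -o -type f -print
# prints only $d/top.txt; $d/sub is never descended into
rm -r "$d"
```

Because the prune test compares against the start path rather than a list of subdirectory names, nothing has to be hard-coded.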
In addition, the 3rd example from my table:
N|/data/sw/apps/informatica8/JBoss403/bin/|heapdump*.phd, javacore*.txt
How can I dynamically parse the 3rd field to scan for these files since there could be more files added to the 3rd field column in the future?
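One way to handle that dynamically (the variable names here are mine, purely for illustration) is to split the field on commas and build the find name clause in a loop, joining the patterns with -o, so any number of patterns can be added to the column later:

```shell
set -f   # disable globbing so the patterns stay literal while we build the clause
patterns="heapdump*.phd, javacore*.txt"   # contents of the 3rd field
clause=""
for p in $(echo "$patterns" | tr ',' ' ')
do
    if [ -z "$clause" ]; then
        clause="-name $p"
    else
        clause="$clause -o -name $p"
    fi
done
echo "$clause"
# -name heapdump*.phd -o -name javacore*.txt
```

The resulting clause would be wrapped in \( ... \) on the generated find command, since find ANDs tests by default and the patterns need to be ORed.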
I believe that the script offered by aigles will have issues because the OP indicates that the 'sub directory location' is a full path, and the -name option (according to the man page) takes a basename, not a path name.
Instead of the -name option, the -path option should be used.
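A quick way to see the difference, using a throwaway directory (names here are hypothetical):

```shell
# -name matches only the basename; -path matches the whole path,
# so a pattern containing a full path can only ever match via -path.
d=$(mktemp -d)
mkdir "$d/sub"
touch "$d/sub/file.txt"
find "$d" -path "$d/sub*" -print               # matches the subdirectory and its file
find "$d" -name "$d/sub*" -print 2>/dev/null   # matches nothing: -name sees basenames
                                               # (GNU find may also warn about the '/')
rm -r "$d"
```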
This script needs only one pass over the filesystem and defaults to listing the files. Supply 'rm' as the parameter to the script and the files will be deleted:
#!/usr/bin/env ksh
# read table and generate find commands
# {Y|N} | filesystem | filelist
awk -F "|" -v cmd="${1:-ls -l}" '
{
    if( $1 == "N" || $1 == "n" ) # if no-descend, add prune logic that skips subdirectories
        no_depth = sprintf( "\\( ! -path \"%s\" -type d -prune \\) -o", $2 );
    else
        no_depth = ""; # ok to examine subdirectories -- nothing extra needed
    printf( "find %s %s \\( ", $2, no_depth ); # output first part of cmd
    gsub( ",", " ", $3 ); # allow for pattern,pattern or pattern pattern in the list
    n = split( $3, a, " " ); # process each filename pattern
    for( x = 1; x <= n; x++ )
        printf( "%s-name \"%s\" ", (x > 1 ? "-o " : ""), a[x] ); # OR the patterns together
    printf( "\\) -print | xargs %s\n", cmd ); # finish the command
}
' | ksh # finally pipe to ksh to execute the commands
Thank you guys for your quick responses and proposed solutions.
I will have a look at each one and try them out.
---------- Post updated at 04:14 PM ---------- Previous update was at 10:37 AM ----------
Hi Guys.
I finally had a chance to test out the 3 proposed solutions and here are some results:
ygemici - maxdepth option does not work in my environment
aigles - script does not work when a full path is provided. My table will contain the full paths of the directories that I would like to cleanup. How can the script be modified to accept full path directories?
agama - How do I go about embedding your code into a script?
I copied the entire code, up to just before the last pipe, into a new script test.sh - how do I execute the script so it reads my table pchang.dat?
Sorry -- I failed to realise that the testing I was doing was reading the table from stdin, and thus no filename was presented to awk, so my info wasn't precise.
You can either redirect the table into your script like this:
test.sh <pchang.dat
or pass the table name on the command line, and add a $1 following the last single quote of the awk programme.
awk '
# lines of awk code
' $1
My recommendation would be the first way. If you use the second, then you should add a test in your script and generate an error message if the user forgets to put the table name on the command line.
The output will be the find commands, they won't be executed unless you pipe them into Kshell or bash. You can put the pipe into the script, or pipe the output manually -- I do this sometimes so that I can verify the commands before I execute them.
# execute the commands generated by test.sh
test.sh <table.dat | ksh
Small change. Requires cd'ing to the directory first and specifying . as the search location on the find command. The changed lines are the prune expression, the cd/find line, and the added -mtime test:
#!/usr/bin/env ksh
# read table and generate find commands
# {Y|N} | filesystem | filelist
# provide filename of table as standard input e.g. test_script <table_name
# put rm on the command line as the first parameter to actually
# generate commands that delete files e.g. test_script rm <table_name
# if rm is not put on the command line, the default is to list what would be scratched.
awk -F "|" -v cmd="${1:-ls -l}" '
{
    if( $1 == "N" || $1 == "n" ) # if no-descend, add prune logic that skips subdirectories
        no_depth = sprintf( "\\( ! -name \".\" -type d -prune \\) -o" );
    else
        no_depth = ""; # ok to examine subdirectories -- nothing extra needed
    printf( "(cd %s; find . %s \\( ", $2, no_depth ); # output first part of cmd
    gsub( ",", " ", $3 ); # allow for pattern,pattern or pattern pattern in the list
    n = split( $3, a, " " ); # process each filename pattern
    for( x = 1; x <= n; x++ )
        printf( "%s-name \"%s\" ", (x > 1 ? "-o " : ""), a[x] ); # OR the patterns together
    printf( "\\) -mtime +300 -print | xargs %s)\n", cmd ); # finish the command
}
' | ksh # finally pipe to ksh to execute the commands
The last change was to add -mtime +300. I noticed that in order to test my script I had omitted that, but forgot to put it back in. Don't know if you picked up on that, but figured I'd point it out here. (Another reason I almost always list what will be deleted before turning the script loose to actually do the real work. Better to take some time to review than to be sorry.)
There are two things to be aware of if no files are older than 300 days. First, if running in list mode, it will list all of the files in the directory. You can prevent this with a small change: