Hi,
I am writing a shell script that finds all files named <myFile> in a directory <dir> or any of its subdirectories, recursively. I also need to take care of symbolic links that may form cycles, to avoid infinite loops.
I started writing the code but got stuck. I thought using recursion might be a smart way to do it, but it's not working. Kindly help.
#!/bin/sh
findFiles()
{
    thisDIR=$1
    #cd $thisDIR
    for eachFile in `ls $thiDIR`
    do
        if [ "$eachFile" = "$FILE" ]; then
            echo "$FILE found in $thisDIR"
        elif [ -d $eachFile ]; then
            findFiles ${thisDIR}/${eachFile}
        fi
    done
}

if [ $# -ne 2 ]; then
    echo "Please run the script as $0 NameOfFile PathToDirectory"
    exit 1
fi
FILE=$1
DIR=$2
findFiles $DIR
Hi vickylife. I tried your recursion approach but could not make it work on a tree of any size, because my shell collapsed due to nesting loops too deep.
Just for interest I tried another approach to navigating the directory tree using just "ls" and shell commands. The script looks at the start directory and then looks for subdirectories. If it finds any, it stores the list in a file. Next time round it reads the list from the previous iteration and repeats the process.
The script looks long but you will see that half the script is concerned with tidying up workfiles.
The output is a list of all the directories from the start directory down.
It is relatively trivial to adapt the script to take parameters and to record files of a particular name into an output file, but I'll leave that bit to you. If you are non-root you may need to test for read access to each directory.
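On the read-access point, a minimal pre-check might look like this (DIR=/tmp is only an example path for illustration; a directory needs read permission to be listed and execute permission to be entered):

```shell
#!/bin/sh
# Non-root consideration: test both read (-r) and execute (-x)
# permission before trying to descend into a directory.
DIR=/tmp
if [ -d "$DIR" ] && [ -r "$DIR" ] && [ -x "$DIR" ]
then
    CAN_TRAVERSE=yes
else
    CAN_TRAVERSE=no
fi
echo "$DIR traversable: $CAN_TRAVERSE"
```

A test like this would go just before the "cd" in the main loop, skipping the directory when it fails.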
I couldn't make out whether you wanted to follow links or not. If you do, the "ls" needs a "-L" switch.
#!/bin/ksh
PN="`basename $0`"
#
WORKFILE1=/var/tmp/${PN}.wf1.$$
WORKFILE2=/var/tmp/${PN}.wf2.$$
#
MYEXIT ()
{
    if [ -f "${WORKFILE1}" ]
    then
        rm "${WORKFILE1}"
    fi
    if [ -f "${WORKFILE2}" ]
    then
        rm "${WORKFILE2}"
    fi
    #
    exit
}
#
trap 'MYEXIT' 1 2 3 15
#
# Seed process with current directory
pwd > ${WORKFILE1}
#
while true
do
    > ${WORKFILE2}
    cat ${WORKFILE1} | while read DIR
    do
        cd "${DIR}"
        pwd # Output to screen
        # Find any subdirectories. Allow for none
        ls -1d * 2>/dev/null | while read DIR2
        do
            if [ -d "${DIR2}" ]
            then
                cd "${DIR2}"
                pwd >> ${WORKFILE2}
                cd ..
            fi
        done
    done
    # If no more subdirectories we have finished
    if [ -s ${WORKFILE2} ]
    then
        # Seed next iteration
        cp -p ${WORKFILE2} ${WORKFILE1}
    else
        break
    fi
done
#
MYEXIT
Footnote: I also looked at an "ls -laR" approach but felt that processing the output really needed "awk", which would defeat the object because we would then be tempted to do the whole process in "awk". The output format of "ls -1R" is similarly awkward to process.
Thank you for your quick help. I have adapted your code and made subtle changes to suit my purpose. This is how the code currently looks:
#!/bin/ksh
PN="`basename $0`"
#
WORKFILE1=/var/tmp/${PN}.wf1.$$
WORKFILE2=/var/tmp/${PN}.wf2.$$
#
MYEXIT ()
{
    if [ -f "${WORKFILE1}" ]
    then
        rm "${WORKFILE1}"
    fi
    if [ -f "${WORKFILE2}" ]
    then
        rm "${WORKFILE2}"
    fi
    #
    exit
}
#
trap 'MYEXIT' 1 2 3 15
#
if [ $# -ne 2 ]
then
    echo "Please run the script as $0 NameOfFile PathToDirectory"
    exit 1
fi
FILE=$1
DIRECTORY=$2
# Seed process with the start directory
echo "$DIRECTORY" > ${WORKFILE1}
#
while true
do
    > ${WORKFILE2}
    cat ${WORKFILE1} | while read DIR
    do
        cd "${DIR}"
        if [ -f "$FILE" ]
        then
            thisPath=`pwd`
            echo "$thisPath/$FILE" # Output to screen
        fi
        # Find any subdirectories. Allow for none
        ls -1d * 2>/dev/null | while read DIR2
        do
            if [ -d "${DIR2}" ]
            then
                cd "${DIR2}"
                pwd >> ${WORKFILE2}
                cd ..
            fi
        done
    done
    # If no more subdirectories we have finished
    if [ -s ${WORKFILE2} ]
    then
        # Seed next iteration
        cp -p ${WORKFILE2} ${WORKFILE1}
    else
        break
    fi
done
#
MYEXIT
Now my question is related to links. I need to traverse links too, provided they do not loop back and form an infinite loop. How do I do that?
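One common way to guard against symlink cycles is to remember which directories have already been visited, by inode. Here is a minimal sketch, assuming a POSIX shell with "ls" and "awk" available; the seen() helper and VISITED list are illustrative names, not from any post in this thread:

```shell
#!/bin/sh
# Keep a space-separated list of inode numbers of directories we
# have already entered. "ls -Lid dir" dereferences a symlink (-L)
# and prints the inode (-i) of the directory itself (-d).
# Caveat: inode numbers alone are only unique within one
# filesystem; a stricter version would also record the device.
VISITED=""

seen ()
{
    key=`ls -Lid "$1" 2>/dev/null | awk '{print $1}'`
    case " ${VISITED} " in
        *" ${key} "*)
            # Inode already recorded (or unreadable): treat as seen
            return 0 ;;
    esac
    VISITED="${VISITED} ${key}"
    return 1
}
```

In the inner loop of the script above, a line like `if seen "${DIR2}"; then continue; fi` before descending would then break any symlink cycle, while still following links that lead somewhere new.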
I seriously doubt you want a nested call to the function to stomp on a variable that is still in use by its parent. This approach requires that each instance of the function maintain its own private version of $thisDIR; the variable needs to be local.
for eachFile in `ls $thiDIR`
Shouldn't that be "$thisDIR"? Also, note that using command substitution in that manner means that it is impossible for the loop to properly handle filenames with IFS characters (by default, space, tab, and newlines).
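To see that splitting problem concretely, here is a hypothetical demonstration (the temp directory and bracket markers are just for illustration):

```shell
#!/bin/sh
# A single file whose name contains a space is split into two
# loop words, because the output of the command substitution
# is word-split on IFS before the for loop sees it.
demo=/tmp/ifsdemo.$$
mkdir -p "$demo"
: > "$demo/my file"
out=""
for f in `ls $demo`
do
    out="${out}[$f]"
done
rm -rf "$demo"
echo "$out"   # prints "[my][file]" - one filename, two words
```

A `while read` loop over the output avoids the worst of this, though filenames containing newlines still defeat any "ls"-based approach.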
[ -d $eachFile ]
The "ls $thisDIR" command returns basenames, not absolute paths. Your code never cds into the directory being listed, so the -d test checks for a directory of that name under the current working directory rather than under $thisDIR. With the cd commented out, the working directory never changes for the duration of the script, so the test will never be correct.
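Putting those corrections together, one possible rewrite of the recursive function is sketched below. It is an illustration, not a drop-in replacement: it sidesteps the shared-variable problem by using the function's own "$1" (positional parameters are private to each call, even in plain sh), and it builds full paths so the -d test checks the right location. The IFS splitting caveat noted above still applies to the `ls` loop.

```shell
#!/bin/sh
# Sketch of the corrected recursive function: typo fixed,
# full paths used for the -d test and the recursive call,
# and all expansions quoted.
findFiles ()
{
    # "$1" is this call's directory; recursion cannot clobber it.
    for eachFile in `ls "$1"`
    do
        if [ "$eachFile" = "$FILE" ]
        then
            echo "$FILE found in $1"
        elif [ -d "$1/$eachFile" ]
        then
            findFiles "$1/$eachFile"
        fi
    done
}

if [ $# -eq 2 ]
then
    FILE="$1"
    findFiles "$2"
fi
```

This still recurses one shell function call per directory level, so the depth limits the previous poster ran into remain; the iterative workfile approach avoids that entirely.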