Script execution is very slow when trying to find all files and their owners on an HP-UX box

Hi,

I have an HP-UX server where I need to list every file in the entire file system along with its directory path, last modified date, owner, and group. I do not need to search file contents. I created the script below, excluding directories and files of type tmp, temp, and log. The problem is that the script runs very fast for a few minutes, then slows down to adding only 1 or 2 records per second. Is that normal, or can it be sped up with some script tweaking? The server is supposed to hold more than 500,000 files, so at this speed the script takes far too long. Since I am not searching file contents and am only fetching each file's name and a few attributes, I would expect it to be very fast.

root_dir="/"

for dir in $(ls -d $root_dir*)
do
	for file in $(find $dir -type d \( -name '*tmp*' -o -name '*temp*' -o -name '*log*' \) -prune -o -type f \( ! -name '*.log*' ! -name '*.LOG*' ! -name '*temp*' \) -print)
	do
		last_modified_date=`ls -l $file | awk '{print $6, "", $7, "", $8}'`
		fileowner=`ls -l $file | awk '{print $3}'`
		filegroupowner=`ls -l $file | awk '{print $4}'`
		echo $(hostname),$(basename $file),$(dirname $file),$last_modified_date,$fileowner,$filegroupowner
	done
done

I tried removing the -print option, but the script is still slow.

What you are doing is known as "thrashing the disk", the sort of thing you're only supposed to do late at night, when interactive users aren't around to experience the lag (that is how the locate database gets updated, for example).

Walking every single directory to call stat on every single individual file is never going to be fast, period. Your implementation is far from efficient, but I don't think there's much improvement to be had, since the wall you're hitting is your disk speed.
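If you want to confirm that, time a bare traversal that runs no per-file commands; it gives a lower bound on what any script can achieve (a minimal sketch, assuming you start at /):

# Bare walk: find still stat()s each entry internally to test -type,
# but forks no external commands, so this approximates the raw disk
# cost of visiting every directory.
time find / -type f -print > /dev/null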

In addition to the many stat()s, you do many fork()s: each call to an external program like ls, awk, hostname, basename, or dirname is a fork(), and your inner loop launches nine of them for every single file.
The following has the number of forks reduced:

# constant, evaluate once
hostname=$(hostname)
root_dir="/"

for dir in $root_dir*
do
  # prune tmp/temp/log directories, skip log/temp files, and batch the
  # surviving files into as few ls invocations as possible with -exec +
  find "$dir" -type d \( -name '*tmp*' -o -name '*temp*' -o -name '*log*' \) -prune -o -type f \( ! -name '*.[Ll][Oo][Gg]*' ! -name '*temp*' \) -exec ls -l {} + |
  while read -r permissions links owner groupowner size d1 d2 d3 filename
  do
    # shell parameter expansion replaces the basename/dirname forks
    echo "$hostname,${filename##*/},${filename%/*},$d1 $d2 $d3,$owner,$groupowner"
  done
done
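Even reduced to one batched ls per group of files, a run over 500,000 files will take a while. You can at least keep it from competing with interactive users by running it detached and at reduced priority (a sketch; the script and output file names are placeholders):

# nohup survives logout, nice lowers scheduling priority;
# /tmp is pruned by the find, so the output file will not list itself
nohup nice sh list_files.sh > /tmp/filelist.csv 2> /tmp/filelist.err &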

And, as with your previous thread ("Error while script execution - 0403-029 there is not enough memory available now"), there is no need for the outer loop; let find do its job:

# constant, evaluate once
hostname=$(hostname)
root_dir="/"
find "$root_dir" -type d \( -name '*tmp*' -o -name '*temp*' -o -name '*log*' \) \
    -prune  -o -type f \( ! -name '*.[Ll][Oo][Gg]*' ! -name '*temp*' \) \
    -exec ls -l {} + |
while read permissions links owner groupowner size d1 d2 d3 filename
do  echo "$hostname,${filename##*/},${filename%/*},$d1 $d2 $d3,$owner,$groupowner"
done
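One caveat with either version: the output is comma-separated, so a filename containing a comma will shift the columns for whatever parses the list later. If that matters, swap the separator in the echo for a character you can rule out in filenames (an illustrative tweak, not part of the original suggestion):

# same fields, '|' as the delimiter instead of ','
echo "$hostname|${filename##*/}|${filename%/*}|$d1 $d2 $d3|$owner|$groupowner"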

Thanks, MadeInGermany/Don. I will try to optimize the script per the suggestions provided.