script to check large directories -- help

All,
I have a script which gives me the output of "percentage of filesystem utilization". We have four filesystems which I want to check, and I want to get a mail when utilization is more than 40%. Below are the filesystems.

/AB/Filesy1 
/AB/Filesy2
/AB/Filesy3
/AB/Filesy4

The script below is working fine and I am getting the mail as expected.

Now, using the same script, I also want information about the large directories on whichever filesystem is utilized more than 40%. I mean to say: suppose filesystem /AB/Filesy1 is utilized more than 40%; then in the same mail I want to see the list of directories occupying the most space under /AB/Filesy1, e.g. by using du -ks /AB/Filesy1/* | sort -n ...
Could anyone please help me edit the same script so that the same mail gives me both the filesystem utilization details and the list of big directories?

Your processing might be simpler if you interrogate each file system one at a time: "for fs in /AB/* . . . .", as then you can grep -q for the % values you dislike. For finding biggies, I like a mixed report of big dir/ and big file. You can get du to do most of it, and in ksh on /dev/fd/# systems or in bash this runs in pipeline parallel, with about a 100 KB cutoff shared between the sed pattern and the find size option. You need a few more options if you want to keep find and du on one filesystem, but you can read the man pages, too:

#!/usr/bin/bash

# merge two pre-sorted reports: big directories (from du) and big files (from find),
# both in descending KB order
sort -nrm <(
  du -k "$1" | sed '
    # drop directories under ~100 KB (1- or 2-digit KB sizes)
    /^[0-9]\{1,2\}[[:space:]]/d
    # mark directory entries with a trailing /
    s/$/\//
   ' | sort -nr
 ) <(
  find "$1" -type f -size +102400c | xargs -n999 du -k | sort -nr
 )
#!/bin/bash
big=100M ; mount=/AB/
# read the available-KB and use% columns (field numbers depend on your df layout)
df -k $mount | grep '[0-9]%' | awk '{print $4,$5}' | sed 's/%//g' | while read utilization uses;
 do
  if [ $uses -ge 30 ]; then
   echo -e "Big Files More Than $big\n" >results
   find $mount -size +$big -exec ls -lh {} \; >>results
   mail -s "Alert: Almost out of disk space $uses%" abc@xyz.com <results
  fi
 done

"find ... -exec xxx" is far less scalable than "find ...|xargs -n999 xxx"
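For illustration (the path and size here are just examples):

# one ls process per matching file:
find /AB/Filesy1 -type f -size +100M -exec ls -lh {} \;

# one ls process per batch of up to 999 files:
find /AB/Filesy1 -type f -size +100M | xargs -n999 ls -lh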

From a functional point of view, you might send the summary every time, but then break out the biggies. You might even send a separate email for each file system that is over.
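A minimal sketch of that per-filesystem approach, assuming a POSIX df -P and a working mail command (the 40% threshold, head -20 and the address are just placeholders):

#!/bin/bash
# loop over each filesystem and mail a summary plus the biggest directories
# for any filesystem over the threshold -- one mail per filesystem
threshold=40
for fs in /AB/*
do
    # df -P keeps each filesystem on one line; Use% is the next-to-last field
    pct=$(df -Pk "$fs" | awk 'NR==2 {gsub(/%/,""); print $(NF-1)}')
    if [ "$pct" -ge "$threshold" ]; then
        {
          echo "Filesystem $fs is at ${pct}% on $(hostname) as of $(date)"
          echo "Largest directories under $fs:"
          du -ks "$fs"/* | sort -rn | head -20
        } >/tmp/fsreport.$$
        mail -s "Alert: $fs at ${pct}%" abc@xyz.com </tmp/fsreport.$$
    fi
done
rm -f /tmp/fsreport.$$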

Why -n999 anyway? Doesn't xargs know the maximum argument size for the system?

xargs has its advantages, but it is not always the best solution.
xargs also has some limits from a functional point of view.

For example, xargs may have problems with files that contain embedded spaces.
For this you must add this to your script:

-print0|xargs -0

and the stock xargs on Solaris does not support the -0 option (GNU findutils does, by the way)
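For the record, a minimal sketch of the null-terminated form with GNU findutils (the path is just an example):

# GNU find/xargs: null-terminated names survive embedded spaces and even newlines
find /AB/Filesy1 -type f -print0 | xargs -0 -n999 du -k | sort -nr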

If you are thinking of the argument list used with exec, then yes, the problem is the argument list. As far as I know, Linus worked on this issue (especially exec.c, mm.h and other related files) and removed the fixed arg_max in 2.6.23 (starting with 2.6.23-rc1),
so the total size of the argument list is now limited to 1/4 of the allowed stack size.
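A quick way to see those limits on a Linux box (exact values vary per system):

getconf ARG_MAX   # argument-list limit in bytes, as reported by the system
ulimit -s         # soft stack limit in KB; on 2.6.23+ kernels the argument
                  # list gets roughly a quarter of this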

However, in your script I would not use -n999, as Corona688 said.
Maybe we can use less than 999 on some systems that have argv + envp limits.

But the functional point is debatable across unix and linux variants (and architectures with no MMU).

I might prefer shell internals instead, like

for f in `find ..` ; do .. ; done

Although xargs is always much faster than -exec :b:

regards
ygemici

Easily solved with -d '\n' for the most part.

It does have -d though.

Oh, goodie. 300 miles more rope to hang ourselves with. :wink:

I think you missed my point -- xargs would know the maximum size of args for the system already and split accordingly.

$ cat >argc.c <<EOF
> #include <stdio.h>
> int main(int argc, char *argv[])
> { printf("argc=%d\n", argc); return(0); }
> EOF
$ gcc argc.c
$ while true ; do echo -e "a\na\na\na\na\na\na\na" ; done | xargs ./a.out
argc=65533
argc=65533
argc=65533
argc=65533
argc=65533
^C
$

...so the -n999 is redundant.

Shoving too many args into backticks and a for loop doesn't make too many args stop being too many args. You have to do while read FILENAME ; do stuff ; done -- see the sketch below.
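A minimal sketch of that pattern (the path and size threshold are placeholders):

# streams results as find produces them; IFS= and -r keep names containing
# spaces or backslashes intact (embedded newlines would still break it)
find /AB/Filesy1 -type f -size +100M | while IFS= read -r FILENAME
do
    ls -lh "$FILENAME"
done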

Where is the -d option? Do all systems have it?
Yep, xargs must know max-args :slight_smile:
We don't need xargs, because shell internals are enough.
I can do all the processing in a for loop, but we can use a while loop when a control expression is required.

regards
ygemici

The problem with this is that there is no pipeline parallelism -- no looping happens until the find is completely done and the last dir has been searched:

for f in `find ..` ; do .. ; done

so I prefer this:

find .. | while read f ; do .. ; done

I use xargs -n999 because in old xargs implementations it was the only argument that prevented 'dry' calls. Old xargs had a relatively short string buffer and assembled a command line to fit, so the 999 was over the top. I wrote my own xargs, fxargs2, with i/o overlap, where every line is always one argument; it does N args (the argv length - argc - 1) or M bytes (the input buffer with line feeds turned to nulls), whichever runs out first, using static allocation for low overhead.
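Just to illustrate the batching rule described above, here is a hypothetical bash sketch (not the actual fxargs2 code; the function name and limits are made up):

# flush the pending batch when either the arg-count or the byte budget would
# be exceeded, then keep reading -- whichever limit runs out first wins
batch() {
    local max_args=$1 max_bytes=$2 ; shift 2
    local -a pending=() ; local bytes=0 line
    while IFS= read -r line ; do
        if [ ${#pending[@]} -gt 0 ] &&
           { [ $((${#pending[@]} + 1)) -gt "$max_args" ] ||
             [ $((bytes + ${#line} + 1)) -gt "$max_bytes" ] ; } ; then
            "$@" "${pending[@]}" </dev/null   # don't let the command eat our stdin
            pending=() ; bytes=0
        fi
        pending+=("$line") ; bytes=$((bytes + ${#line} + 1))
    done
    [ ${#pending[@]} -gt 0 ] && "$@" "${pending[@]}" </dev/null
}

# usage: find /AB/Filesy1 -type f | batch 999 65536 du -k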

Is there any system limit on args? If you deliver them by exec*() rather than system() (which is a flat-string shell call), I am not sure there is any limit (and you avoid all quoting issues). Some commands like ls have their own arg limits, so it is not just a system thing.

It's a POSIX option, yes. See man xargs. I didn't need it in my example.

It's not. Put my infinite print statement into backticks and it'd die from too many arguments. If there wasn't any argument limit, it'd never do anything -- just sit there forever waiting for the input to finish, consuming boundless memory in the process until the system kills it.
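For instance, on a typical Linux shell (the exact limit and error text vary):

# building the whole list in memory first, then exec'ing it, hits the kernel limit
/bin/echo `yes a | head -2000000`
# bash: /bin/echo: Argument list too long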

Storing everything in one giant variable is fundamentally inefficient. It wastes memory storing things you don't need to store, and wastes time waiting for input to finish -- if indeed it ever finishes, time you could've been using to process what you already have.

Yes, that is why fxargs2 tries to detect when it would block and spin off what it has, whereas xargs probably persists until it fills the memory or hits a max or EOF.

That's a good idea.

xargs wouldn't fill memory on any system with a sane argument limit though.

Thank you everyone ..!! I got my result ..!!
I edited my existing script. I created two scripts: the 1st one is to check the list of big directories and the 2nd is to check the list of big files. Below are the scripts ..!

1st Script ---- to check big directories

#!/bin/bash
# send a mail if utilization of a filesystem is more than 40%, with the list of big directories
df -k /AB/* | grep % | awk '{print $4,$5}' | sed 's/%//g' | while read OP;
do
  echo $OP
  # on this system's df output, field 1 of OP is the use% and field 2 is the mount point
  uses=$(echo $OP | awk '{ print $1 }')
  utilization=$(echo $OP | awk '{ print $2 }')
  if [ $uses -ge 40 ]; then
    echo "Running out of space \"$utilization ($uses%)\" on $(hostname) as on $(date). Below is the list of big directories:" >output
    du -sk $utilization/* | sort -rn >>output
    mail -s "Alert: Almost out of disk space $uses%" abc@xyz.com <output
  fi
done

2nd Script ---- to check the list of the 200 biggest files.

#!/bin/bash
# send a mail if utilization of a filesystem is more than 40%, with the list of the 200 biggest files
df -k /AB/* | grep % | awk '{print $4,$5}' | sed 's/%//g' | while read OP;
do
  echo $OP
  # as above: field 1 of OP is the use%, field 2 is the mount point
  uses=$(echo $OP | awk '{ print $1 }')
  utilization=$(echo $OP | awk '{ print $2 }')
  if [ $uses -ge 40 ]; then
    echo "Running out of space \"$utilization ($uses%)\" on $(hostname) as on $(date). Below is the list of the 200 biggest files:" >output1
    find $utilization/* -type f -exec ls -la {} \; | sort -r -n -k 5,5 | head -200 >>output1
    mail -s "Alert: Almost out of disk space $uses%" abc@xyz.com <output1
  fi
done

Magic number 101 -- captures 99% of any economy of scale:

find $utilization/* -type f -print0 | xargs -0 -n101 ls -l | sort -r -n -k 5,5 | head -200 >>output1

The ls option -a does nothing on file names. Nicer finds have a built-in -ls action (which prints extra leading columns such as inode and blocks, so the sort key offset changes).
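For example, with GNU find's -ls the size ends up in field 7 (reusing the $utilization variable from the script above):

# GNU find -ls prints inode and block count first, so the size becomes field 7
find $utilization/* -type f -ls | sort -rn -k7,7 | head -200 >>output1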

All,
Thanks .. :slight_smile:

Now I am trying to create a generic script, but I think I am missing something in my script, or am not using the correct syntax somewhere. Could anyone please check and let me know where I am wrong, if I am? Actually I took some old script and edited it with my commands.
Below is the script ..!!