Hi,
Please tell me how to include parallel processing in the code below. Thanks in advance.
I have a list of user directories in the root directory. Each user has a directory named after his/her username.
I am finding the size of each directory using the du -g command and checking if the size exceeds a 3 GB limit.
The problem is that it takes around 30 minutes for around 1000 users.
for i in `ls -l | grep -i <username>`
do
du -g $i | awk '{if ($1 > 3) print $0}' >> size.txt
done
The script as posted does not work for several reasons. For example, where does "<username>" come from? What is "ls -l" for? What does the "list of users directories in root directory" look like, what created the file, and where is that file?
Please post the script you actually used.
If these are user home directories (the same ones listed in /etc/passwd) there are much easier ways of finding the totals. I wouldn't expect user home directories to be directly under the root directory, so maybe this is not what you are trying to do.
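For example, the home directories can be read straight out of /etc/passwd rather than guessed from a listing. A minimal sketch, assuming local accounts and that regular users have UID >= 1000 (that threshold is site-specific; adjust as needed):

```shell
# home_dirs: print the home directory of each regular-user entry in a
# passwd-format file ($1). UID >= 1000 is an assumption about this site.
home_dirs() {
    awk -F: '$3 >= 1000 { print $6 }' "$1"
}

# e.g. feed the list to a single du run and filter on 3 GB (in KB):
# home_dirs /etc/passwd | xargs du -sk | awk '$1 > 3*1024*1024' >> size.txt
```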
Is there any reason to believe doing the lookups in parallel will be faster? The performance limiter is probably going to be how fast the data can be retrieved from disk anyway.
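If you do want to measure whether parallelism helps, xargs -P (a GNU/BSD extension, not POSIX, so it may not exist on AIX) will run several du processes at once. A sketch only; the root path, job count, and limit here are assumptions:

```shell
# parallel_du: run up to $2 concurrent du processes over the directories
# under $1, printing the lines whose size (KB) exceeds $3.
parallel_du() {
    root=$1
    jobs=${2:-4}                         # concurrent du processes
    limit_kb=${3:-$((3 * 1024 * 1024))}  # default limit: 3 GB in KB
    printf '%s\n' "$root"/*/ |
        xargs -n 1 -P "$jobs" du -sk |
        awk -v limit="$limit_kb" '$1 > limit'
}

# Usage: parallel_du /home 4 >> size.txt
```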
It'd also probably be faster to pass all the directory names to one instance of du instead of running du once per user.
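Something along these lines, assuming the directories sit under /home (adjust to suit). du -g is AIX-specific; -k (kilobyte units) is portable, so the 3 GB limit becomes 3*1024*1024 KB:

```shell
# oversize: one du invocation for every directory under $1, instead of a
# du per user; print the lines whose size (KB) exceeds $2.
oversize() {
    du -sk "$1"/*/ | awk -v limit="${2:-$((3 * 1024 * 1024))}" '$1 > limit'
}

# Usage: oversize /home >> size.txt
```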
I note that matrixmadhan has used "ls -1" (number one) which makes more sense than the "ls -l" (letter ell) in the original post.
Because the original post contains "du -g" I wonder if this is an IBM AIX machine? i.e. one with a very limited command line length.