Command to calculate space for all subdirs under a dir

The du -hs command calculates the space for all the subdirs under a dir, but it is very slow if the dir is huge. Is there any quicker way? I am using SunOS.

Thanks,
Ajay

I believe du is the fastest one.

Is there any other way ... can we write a script ...

You can, but it will be slower ...

Not sure about the performance .. Test it ..

find $DIR -exec ls -ltr {} \; 2>/dev/null | nawk '{sum+=$5}END {print sum}'

jayjan_jay!! Don't use find with -exec unless you have to do stuff inside the exec!

Replace with:

find $DIR -printf "%k\n" | awk '{sum+=$1}END{print sum}'
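(With GNU find, %k prints each file's disk usage in 1K blocks, so this sum is in kilobytes rather than bytes.)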


@ajaypatil_am

Here's what you can do: find all the dirs, order them deepest first, then feed each dir into du.

find $DIR -type d -print | 
  awk 'BEGIN{OFS="\t"}{ print $0,split($0,x,"/"); }'  |
  sort  -k 2nr,2 -k 1,1  | 
  cut -f1 |
  xargs -L 1  -t du -sh 2>&1 |
  $PAGER

Here's what we're doing: The find gets all the directories (your target dir is $DIR) and prints them out one line at a time. The awk command then prints each directory name followed by its depth, determined by the number of fields split found when splitting the input line on the / character. We separate the fields by tabs to prevent long and unusual dir names from getting clobbered by the next steps. Next, sort orders the list by directory depth, deepest first. Now that the list is ordered, we don't need the depth information anymore, so the cut command strips it (you can also do this with awk or sed). Finally, the resulting list is fed into xargs, which prints out and executes the command "du -sh" on each directory, one directory at a time.

The output is then sent through your $PAGER, which ought to be defined. If not, use "less" or "more", whichever works for you.

This way, you can find a particularly large directory without waiting forever for the job to finish.

No matter how you cut it, getting the total space for something means scanning the inode of each and every individual file inside, so the same amount of disk thrashing will happen whichever tool you use.

Breaking the job into smaller parts is a neat idea, otheus.

Another thing you could do is look for large directories. Directories have a file size too. Finding a large directory won't tell you how much space the contents use, or precisely how many files are inside, but it will warn you about directories directly containing very large numbers of entries.
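For example, something like this rough sketch would flag directories whose own entry is larger than about 100 KB (the threshold is arbitrary; with plain find, -size counts 512-byte blocks):

# Flag directories whose own entry exceeds ~100 KB (200 x 512-byte blocks),
# a hint that they directly contain very many entries.
find $DIR -type d -size +200 -exec ls -ld {} \;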


Hi.

If you are looking for the space infrequently, then du seems like an appropriate solution, or perhaps allowing some tasks to run in parallel, say for the first-level sub-directories.
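A minimal sketch of that parallel idea, assuming $DIR is the target directory and a shell with job control:

# Start one background du per first-level subdirectory,
# then wait for all of them to finish.
for d in "$DIR"/*/ ; do
  du -sh "$d" &
done
wait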

However, if you need to see the space more often, then there is a way that lets you see it very quickly: make it a separate file system, and use df:

$ df -h
Filesystem             size   used  avail capacity  Mounted on
rpool/ROOT/s10x_u6wos_07b
                       7.8G   3.8G   2.9G    57%    /
swap                   476M   372K   476M     1%    /etc/svc/volatile

This was produced in:

OS, ker|rel, machine: SunOS, 5.10, i86pc
Distribution        : Solaris 10 10/08 s10x_u6wos_07b X86

and the results appeared almost instantaneously after the Return was pressed -- try it for yourself.

I have not done this in Solaris (it looks different on my Linux box), but I'm supposing that you'd allocate space on a disk, create a filesystem, create a directory in /, mount the filesystem, and copy the large directory to it, which would take time and disk space. After that, you could use the new directory as usual, and get your space calculations very fast.
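On Solaris with UFS, I'd guess it looks roughly like this (an untested sketch; the slice name, mount point, and source path are all placeholders):

newfs /dev/rdsk/c0t1d0s0               # create a filesystem on the raw slice
mkdir /bigdir                          # mount point in /
mount /dev/dsk/c0t1d0s0 /bigdir        # mount the block device
cp -rp /path/to/large/dir/. /bigdir    # one-time copy (slow)
df -h /bigdir                          # near-instant space report thereafter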

It's not a trivial matter to do this, and it is drastic, but if you really, really needed it, then it may be a solution to consider. ... cheers, drl

I use the following line to find where disk usage is going.

du -a | sort -r -n > /tmp/list

The list contains all directories and files sorted in descending order by size.
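For example, to see just the twenty largest entries:

head -20 /tmp/list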

@otheus, my system doesn't support the -printf option in the find command .. That's why I used -exec .. :wink:

$ uname -sr
SunOS 5.9
$ find $DIR -printf "%k\n"
find: bad option -printf
find: path-list predicate-list
$
 
find . -type d | xargs du -sh

@jayjan_jay

That's incredible to me, since I learned find in 1992 while using a SunOS 4.1 server.

.... (update) ....

Then again, we probably had GNU findutils installed. Do yourself a favor and install the SFW binaries for find and the other GNU utilities.

ftp://ftp.sunfreeware.com/pub/freeware/sparc/10/findutils-4.4.2-sol10-sparc-local.gz

There are some dependencies that you must also install: coreutils, libiconv, and libintl; and to get /usr/local/lib/libgcc_s.so.1, install either libgcc-3.4.6 or gcc-3.4.6 or later. You can find them at http://sunfreeware.com/

This is an alias that I have set up for doing a similar thing. It is about as fast as everything else previously listed, and has clean output [since you seemed to want -h].

alias lsS='du -s * 2>/dev/null | sort -n | cut -f2- | while IFS= read -r a; do du -sh "$a" 2>/dev/null; done'
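To use it, just cd into the directory in question and run lsS; the entries print from smallest to largest.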

Hope that helps...

Interesting... I have tested the behaviour in different scenarios, and I can say du is the best and most accurate compared to anything else ...

Here's for your reference ..
####################### First, flush the caches ..

 [root@SKS Shirish]# sh -c sync; echo 3 > /proc/sys/vm/drop_caches; time  du -sb /var
195404352       /var
real    0m2.597s
user    0m0.244s
sys     0m0.468s
[root@SKS Shirish]# sh -c sync; echo 3 > /proc/sys/vm/drop_caches; time  perl  size.pl
Total Size in Byte:  194019894 in 4686 files
real    0m3.399s
user    0m0.420s
sys     0m0.528s
[root@SKS Shirish]#

size.pl

 
[root@SKS Shirish]# cat size.pl
#!/usr/bin/perl -w
use strict;
use File::Find;

my $byte_total;    # running total of file sizes, in bytes
my $file_total;    # count of regular files seen

# Walk /var, calling sum_bytes for every entry found.
find (\&sum_bytes, "/var");

sub sum_bytes
{
    return unless (-f $File::Find::name);
    $byte_total += -s _;    # reuse the stat from the -f test
    $file_total++;
}

print "Total Size in Byte:  $byte_total in $file_total files\n";
 

If you want to check the size of the subdirectories under a directory, it is better to check the size of the parent directory. For the parent directory, run this command:

du -k /home/vipin/data

(data is the directory whose size I want to check).

@vipinkumarr
Your "du" command does not match the requirement. The O/P posted the correct "du" command in post #1 but was looking for something faster.
Also, this thread went cold a month ago. Please look at the date on threads and read the other solutions before posting.

On topic:
I don't think that the O/P posted how many files there were in this directory. One poster suggested creating a separate filesystem, which was a good idea because this would also defragment the directory files in the process.

After trying so many options, I found a solution to save time.
Let's say I have /home/vipin/ (the size of the vipin dir is 5GB),
then

cd /home/vipin/
du -ha >tempfile &

(tempfile is a temporary file; the command will run in the background.)
Since the command runs in the background, you can do other work in the meantime.
It's not a permanent solution, but I think it can help you..

cd /home/vipin/
du -ha >tempfile &

This command is totally unsuitable because it outputs the size of every file instead of just the total.
As pointed out after your last post, the correct command is in Post #1.

otheus: "...Don't use find with exec unless you have to do stuff inside the exec!"

Well, he was executing "ls -ltr" 'inside the exec', wasn't he?

What is so bad about using the '-exec' with the 'find' command?

Well, it's often redundant and can multiply the amount of time needed drastically. That would run ls thousands of times, for one thing, when find's own features may have been able to do an ls of sorts by itself, avoiding the need to create thousands of processes. I'm uncertain whether Sun's find supports -printf, however.
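If your find supports the POSIX "{} +" form of -exec (Solaris 10 should; older releases may not), that at least batches many files into each invocation instead of forking one process per file, and restricting to -type f avoids double-counting directory listings:

# One ls per large batch of files rather than one per file;
# field 5 of ls -l output is the size in bytes.
find $DIR -type f -exec ls -l {} + | nawk '{sum += $5} END {print sum}'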