Anything that counts files has to do it the same way: by checking directory entries. So there's no special "faster ls". (If there were one, why wouldn't we use it for everything?)
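Just to illustrate what "checking directory entries" means, here is a minimal, non-recursive sketch that counts the regular files in a single directory. The d_type shortcut is my assumption; not every filesystem fills it in, and where it doesn't you'd have to stat() each entry instead:

#include <stdio.h>
#include <dirent.h>

int main( int argc, char **argv )
{
    DIR *dir;
    struct dirent *entry;
    long count = 0L;

    if ( argc < 2 )
    {
        fprintf( stderr, "usage: %s directory\n", argv[ 0 ] );
        return( 1 );
    }
    dir = opendir( argv[ 1 ] );
    if ( NULL == dir )
    {
        perror( "opendir" );
        return( 1 );
    }
    while ( ( entry = readdir( dir ) ) != NULL )
    {
        /* count only regular files; d_type avoids a stat() per entry where available */
        if ( DT_REG == entry->d_type )
        {
            count++;
        }
    }
    closedir( dir );
    printf( "%ld\n", count );
    return( 0 );
}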
If you can compile on this machine, this program will print a running total as it counts (every 1,000 files):
#include <stdio.h>
#include <ftw.h>

static long file_count = 0L;

int ftw_callback( const char *path, const struct stat *sb, int flag )
{
    if ( FTW_F == flag )
    {
        file_count++;
        /* print every 1000 files */
        if ( 0 == ( file_count % 1000 ) )
        {
            fprintf( stderr, "%ld\n", file_count );
        }
    }
    return( 0 );
}

int main( int argc, char **argv )
{
    int ii;

    for ( ii = 1; ii < argc; ii++ )
    {
        ftw( argv[ ii ], ftw_callback, 256 );
    }
    fprintf( stderr, "Final count: %ld\n", file_count );
    return( 0 );
}
It won't need another process reading the inode data and feeding it through a pipe. Compile it with the -m64 flag if you're on a 64-bit platform and you won't even need to worry about having over 2 billion files....
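(On a 32-bit build a plain long tops out around 2.1 billion. If -m64 isn't an option, one workaround, not in the code above, is simply to widen the counter; only these two lines would change:)

    /* widen the counter so it can't overflow at ~2.1 billion, even in a 32-bit build */
    static long long file_count = 0LL;
    fprintf( stderr, "%lld\n", file_count );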
At 200 lines per minute I doubt my application's the bottleneck. On my system it was able to process tens of thousands of lines per second... I wonder just how badly this disk's fragmented.
Jim's code "instantly" gives the same values as df -i; after all, I think I can use the inode count as an acceptable approximation.
achenle's code runs a little faster; I tried it first with a small directory like /opt with no issues:
[root@atlas ~]# time ./achenle_counter /opt
1000
2000
3000
4000
5000
6000
Final count: 6534
real 0m4.503s
user 0m0.010s
sys 0m0.161s
Now when I try to count the real directory it becomes slow again, and I'm not sure why. E.g.:
[root@atlas ~]# time ./achenle_counter /export/archives/2010/storage
1000
2000
3000
4000
5000
Ctrl^C
real 2m52.076s
user 0m0.138s
sys 0m8.671s
I also found that the directory in question, besides having millions of files, also has millions of directories (although I only want to count files). Could this be causing the slow counting?
Yes. find uses ftw() or nftw(); it opens every directory and stats every entry. When directories are large this takes a long time. A single directory with 1M entries can literally take minutes to process.
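If you want to see how much of the walk is directories versus files, here is a hedged variant of the counter above using nftw(); the separate directory counter and the FTW_PHYS flag are my additions, not part of the original program:

#define _XOPEN_SOURCE 500   /* may be needed for nftw() on some platforms */
#include <stdio.h>
#include <ftw.h>

static long file_count = 0L;
static long dir_count = 0L;

int count_callback( const char *path, const struct stat *sb, int flag, struct FTW *ftwbuf )
{
    if ( FTW_F == flag )
    {
        file_count++;
    }
    else if ( FTW_D == flag )
    {
        /* each directory costs an opendir()/readdir() pass of its own */
        dir_count++;
    }
    return( 0 );
}

int main( int argc, char **argv )
{
    int ii;

    for ( ii = 1; ii < argc; ii++ )
    {
        /* FTW_PHYS: don't follow symlinks while walking */
        nftw( argv[ ii ], count_callback, 256, FTW_PHYS );
    }
    fprintf( stderr, "files: %ld  directories: %ld\n", file_count, dir_count );
    return( 0 );
}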
Can a moderator take this thread to the GNU ls/find maintainers and ask them to provide a counter option? We all know that every admin counts files as part of daily operations and in almost every script. It would be a good addition to those command-line tools.
Why a mod? The mods here aren't on the GNU committee, AFAIK. In other words, your suggestion carries as much clout as theirs.
The best way to get what you want is to make your own modifications and submit a patch for them. You're far more likely to get what you want when you do the work.
Is this filesystem physically attached to the computer on which you are running the "find"? If it is actually attached to another computer, I'd run the "find" there.
Normally "find" is much faster than "ls" because "find" does not sort the output.
Also, the plain ls command sorts the output by name by default, even when you bypass aliases with \ls.
To avoid this you can use the -f option (this may depend on your platform); it forces ls to display all found entries WITHOUT ordering them, which can be MUCH FASTER in some cases.
That is why, when piping to wc -l, and especially for directories containing numerous entries, ls should be run with such a "nosort" option.
Try this: ls -fR /users/home 2>/dev/null | wc -l
... or on any other PATH containing a bunch of entries
Look how fast it can be: more than 23,000 entries in less than a second!
[ctsgnb@shell ~/sand]$ date && ls -fR /users/home 2>/dev/null | wc -l && date
Sat Mar 19 17:14:16 MDT 2011
23932
Sat Mar 19 17:14:16 MDT 2011
[ctsgnb@shell ~/sand]$