Get size of a file using ls on all UNIX OSes

SkySmart · May 2, 2013, 2:26pm

to get the most granular size of a file, you can do so with:

solaris hosts:

ls -l /tmp/filea | awk '{print $4}'

linux hosts:

ls -l /tmp/filea | awk '{print $5}'

Is there a more universal command that will give the file size?

i'm leery of this ls command because the field in which the file size appears differs between sunos and linux, and i'm not sure whether it differs on other unix oses too.

so, is there a command that can accomplish the above without needing to check a field/column which, for all we know, won't be the same on another unix os?

i've tried "du -s", but that doesn't seem to work across all unix systems.

hanson44 · May 2, 2013, 2:31pm

wc -c file

will give the file size in bytes.

Corona688 · May 2, 2013, 3:08pm

It must read the entire file to do so, so would be a big problem for big files...

hanson44 · May 2, 2013, 3:13pm

Reading a non-cached file, on generic Dell OptiPlex 380:

$ time wc -c 2011.bb
4234978 2011.bb

real    0m0.009s
user    0m0.000s
sys     0m0.004s

wisecracker · May 2, 2013, 5:18pm

Last login: Thu May  2 21:50:22 on ttys000
Barrys-MacBook-Pro:~ barrywalker$ ls -l SCOPE.GIF
-rw-r--r--  1 barrywalker  staff  17424 11 May  2012 SCOPE.GIF
Barrys-MacBook-Pro:~ barrywalker$ printf `wc -c SCOPE.GIF`
17424Barrys-MacBook-Pro:~ barrywalker$ printf `wc -c SCOPE.GIF` > length.txt
Barrys-MacBook-Pro:~ barrywalker$ ls -l length.txt
-rw-r--r--  1 barrywalker  staff  5  2 May 22:04 length.txt
Barrys-MacBook-Pro:~ barrywalker$ less length.txt

Note the length printed on line 5 of the session: 17424, which is 5 characters long...
Line 5 also writes that same value to a file...
Line 7 proves the file length.txt is 5 characters long...

Last line gives...

17424
length.txt (END)

IMPORTANT NOTE...
"wc -c filename" on its own gives the following...

Barrys-MacBook-Pro:~ barrywalker$ wc -c SCOPE.GIF
   17424 SCOPE.GIF
Barrys-MacBook-Pro:~ barrywalker$

Note the leading spaces and the filename...

MadeInGermany · May 2, 2013, 5:58pm

True. Too much overhead.
But the following is smart like ls -l, in that it does not read the file's contents:

du -sk file

gives the size in kbytes.
NB: du -sk directory is recursive and prints only the sum; du -k directory also gives the individual sizes.

SkySmart · May 2, 2013, 6:15pm

ok. so it looks like "wc -c" is the answer here? it gives the exact same numbers as "ls -l". difference is, with "wc -c", the file size will always be in the first column. that's perfect! not too sure about the potential overhead though, if there ever is one.

hanson44 · May 2, 2013, 6:23pm

Yes, wc -c is the best answer. It gives the exact correct answer.

As you suggest, do not worry about "overhead" at this point. As you can see from the time test in my previous post, wc ran very fast on that file. If there really is an "overhead" problem, for example if your script takes too long, you can worry about it then.

Yoda · May 2, 2013, 6:24pm

By the way, if you want just the file size, without piping the output to another process:

wc -c < file
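
As a rough sketch of how this could be used in a script: some wc implementations pad the number with leading blanks (as the OS X output earlier in the thread shows), so it helps to normalize the value before using it. The function name file_size_wc is made up for illustration, and the approach still reads the whole file.

```shell
# file_size_wc is a hypothetical helper name, not something from the thread.
# It prints the size of the file named in $1 in bytes, using only wc.
file_size_wc() {
    # Redirecting stdin keeps the file name out of wc's output.
    n=$(wc -c < "$1") || return 1
    # Arithmetic expansion drops the leading blanks some wc versions print.
    echo "$(( $n ))"
}
```

Usage would look like size=$(file_size_wc /tmp/filea). Note that n is a plain global variable, since POSIX sh has no local variables.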

MadeInGermany · May 2, 2013, 6:45pm

No, "wc -c" causes too much I/O load, because it must read the whole file!
Better stick to the "ls -l"; you can set a consistent locale:

LC_ALL=C ls -l

Most OSes also accept the -o option, which on Solaris and GNU ls gives the long listing without the group column:

LC_ALL=C ls -lo

SkySmart · May 2, 2013, 6:49pm

how can you measure the I/O load consumption of a process?

MadeInGermany · May 2, 2013, 6:57pm

Have a 10GB file on an NFS share, and run "wc -c" on it on a hundred NFS clients simultaneously.
This will take very long, and your NFS server will be overloaded the whole time.
And your server admin will be angry. (The redness of his face is proportional to the overhead that you have caused.)

hanson44 · May 2, 2013, 6:58pm

There is I/O load and time load.

I/O load is normally the file size, unless the file is already cached into RAM.

Time load is measured with the time command, using an uncached copy, such as:

$ time wc -c 2011.bb
4234978 2011.bb

real    0m0.009s
user    0m0.000s
sys     0m0.004s

Unless you are dealing with large files, much bigger than the test file above, or you run into a problem, I would not worry about such "overhead" concerns. It's called "premature optimization" to "fix" something by making it complicated, before there is a known problem. On the other hand, if your script is too slow, then it is time to try something that does not read the actual data on the disk.

fpmurphy · May 2, 2013, 7:41pm

Sorry but I totally disagree with you, Hanson44. Using wc -c to retrieve the size of a file is absolutely the wrong approach to take.

A better approach is to use some very simple shell script logic to parse the output of ls -l in order to figure out the file size column.
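
One way such logic might look, purely as a sketch and not necessarily what was meant here: in the C locale the size is the field immediately before the month abbreviation, so the column can be found by scanning for the month instead of hard-coding a field number. The helper name size_from_ls_line is invented for this example, and the sketch assumes the listing was produced with LC_ALL=C (so months are English abbreviations that precede the day) and that the file is not a device node.

```shell
# size_from_ls_line is a hypothetical helper. It reads "ls -l" lines on
# stdin and prints the field just before the month name, wherever it sits.
size_from_ls_line() {
    awk '{
        # Start at field 4: fields 1-3 are mode, link count, and owner.
        for (i = 4; i <= NF; i++) {
            if ($i ~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$/) {
                print $(i - 1)
                next
            }
        }
    }'
}
```

It would be called as LC_ALL=C ls -ld /tmp/filea | size_from_ls_line. A group that happens to be named like a month would still confuse it.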

hanson44 · May 2, 2013, 8:11pm

Go back to the original post. They wanted an alternative to that approach.

alister · May 2, 2013, 8:39pm

Perhaps the script can utilize uname to determine which field to use.

If you can depend on perl availability, then you can use its stat builtin function identically on any platform. A simple example that does not implement any error handling:

find . | perl -lpe '$_ = (stat($_))[7] . "\t$_"'

du was mentioned in several posts. Note that du does not report the file size; it reports the amount of storage that the file occupies. These two values are usually unequal.

Regards,
Alister
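
To see the difference between file size and storage used on a given system, a small comparison like the following could be tried. It is only an illustrative sketch; compare_size_and_usage is a made-up name, and the disk usage figure depends on the filesystem.

```shell
# compare_size_and_usage is a hypothetical helper name. It prints the byte
# count (what ls -l and wc -c report) next to the disk usage that du reports.
compare_size_and_usage() {
    bytes=$(wc -c < "$1") || return 1
    # du -k prints the usage in 1024-byte units, then the name; keep field 1.
    kbytes=$(du -k "$1" | awk '{print $1}')
    echo "logical size: $(( $bytes )) bytes"
    echo "disk usage:   $kbytes KiB"
}
```

On a one-byte file the first line reports 1 byte, while the second typically reports at least one whole filesystem block.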

Don_Cragun · May 2, 2013, 9:26pm

This is very strange. On any UNIX System (such as Solaris, HP-UX, OS X, and AIX), the output from the ls -l command should have the file size in the 5th field, not the 4th. Are you using an alias on the Solaris hosts for ls that is employing other options that alter the output? Could you please show us the output of just the ls:

"ls" -l /tmp/filea

on one of your Solaris hosts without feeding it through awk? (Note that I put the ls in quotes to prevent any possible alias substitutions.)

SkySmart · May 2, 2013, 9:54pm

here, i'm still getting them in different fields:

[sunbox-01] "ls" -l filea 
-rw-------   1 root       71126 Apr 27 07:21 filea

[linuxbox-01]#  "ls" -l filea 
-rwxr-xr-x 1 root root 71126 Apr 29 20:27 filea

Don_Cragun · May 2, 2013, 10:08pm

Please try:

"ls" -ln filea

With the amount of space between "root" and the file size, it looks like you might have a group name that is made up entirely of spaces and/or tabs? Using the -l and -n options together will print user and group numbers instead of names.

Please also show us the output from uname -a on sunbox-01.
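
If the numeric listing does put the size in the 5th field on every host, a helper along these lines could replace the per-OS awk commands. This is a sketch under that assumption, with a made-up function name (file_size_ls); it presumes an ls that supports -n and prints both the owner and group columns.

```shell
# file_size_ls is a hypothetical helper name. It prints the size in bytes of
# the file named in $1 without reading the file's contents.
file_size_ls() {
    # -n prints numeric IDs, so odd user or group names cannot shift columns;
    # -d keeps ls from listing a directory's contents;
    # LC_ALL=C pins the date format.
    line=$(LC_ALL=C ls -lnd -- "$1") || return 1
    printf '%s\n' "$line" | awk '{print $5; exit}'
}
```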

MadeInGermany · May 3, 2013, 3:08am

You are certainly using /usr/ucb/ls,
an old version of ls that is present only for compatibility with the old SunOS 4, which was discontinued 10 years ago (and was in turn compatible with 4.2 BSD Unix).
Use

ls -lo

or

PATH=/bin:/usr/bin ls -l

or both

PATH=/bin:/usr/bin ls -lo

or also set locale with

LC_ALL=C PATH=/bin:/usr/bin ls -lo