Why is cut slower than awk?

BandGap · February 5, 2009, 5:10am

Hi all,

For testing purposes, I tried the following two one-liners:

time awk '{print $4}' T_64xSC_128RW_K500.dat > /dev/null

and

time cut -d" " -f6 T_64xSC_128RW_K500.dat > /dev/null

The file contains approx. 250k lines. awk does it in 0.15 secs (real), cut in 0.44. The user time has about the same relation, whereas the sys time is almost identical in both cases.

The fact that awk is almost 8 times larger than cut (in kB) seems to make no difference.

Why is cut almost 3 times slower?

Cheers,
BG

Annihilannic · February 5, 2009, 10:59pm

Why would the size of the binary make a difference?

Good question; I guess cut's code is just inefficient. Without seeing the source code, though, we can only guess. What OS is this on? I just tried on HP-UX and awk was more than 3 times slower than cut. It probably depends on the nature of the input data too.
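
One way the input data can matter: the two commands do not split fields the same way. awk's default splitting treats any run of blanks as one separator, while cut -d" " treats every single space as a separator, so the same field number can select different columns. This may be why the first post compares $4 with -f6, but that is a guess, since the data file is not shown. A minimal sketch with a made-up sample line:

```shell
# Made-up sample line with TWO spaces between "a" and "b".
line='a  b c d'

# awk collapses runs of blanks, so the fields are a, b, c, d and $3 is "c".
awk_field=$(printf '%s\n' "$line" | awk '{print $3}')

# cut counts every space, so the fields are a, (empty), b, c, d and -f3 is "b".
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f3)

echo "awk: $awk_field"
echo "cut: $cut_field"
```

So before comparing timings, it is worth checking that both commands really extract the same column from the file in question.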

angheloko · February 6, 2009, 1:55am

Out of curiosity, I also tried some testing myself. On a file with 250 lines there's no difference.

But on a file with 1000 lines, cut was faster by 1 sec.

On a file with 10000 lines, the result is the same:

FILE
450000 Feb  6 14:32 test.file

CONTENTS
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
...
...

USING AWK
real    0m0.07s
user    0m0.05s
sys     0m0.10s

USING CUT
real    0m0.02s
user    0m0.01s
sys     0m0.02s

I'm using HP-UX. Maybe awk is better for files far larger than 10000 lines while cut is better for smaller files. Or maybe it is because of the differences in their primary use. I'm not sure about this, though. I need to do some more tests.
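
For anyone who wants to repeat this kind of test, here is a minimal sketch that rebuilds a file like the one above and checks that both commands produce identical output before anything is timed. The temporary file and the variable names are assumptions for illustration, and mktemp is not available on every system.

```shell
# Hypothetical temporary file standing in for test.file.
testfile=$(mktemp)

# Write 10000 copies of the sample line (45 bytes each, newline included).
i=0
while [ "$i" -lt 10000 ]; do
    echo 'the quick brown fox jumped over the lazy dog'
    i=$((i + 1))
done > "$testfile"

lines=$(wc -l < "$testfile" | tr -d ' ')
bytes=$(wc -c < "$testfile" | tr -d ' ')
echo "lines: $lines, bytes: $bytes"

# The words are separated by single spaces, so field 4 is the same
# column for awk and for cut; compare checksums of the two outputs.
awk_sum=$(awk '{print $4}' "$testfile" | cksum)
cut_sum=$(cut -d' ' -f4 "$testfile" | cksum)
[ "$awk_sum" = "$cut_sum" ] && echo "outputs match"
```

With the file in place, each command can be run under time with its output sent to /dev/null, as in the first post. Running each one several times and ignoring the first run reduces the influence of disk and file system caching. Delete the temporary file afterwards.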

BandGap · February 6, 2009, 10:40am

Well I did some testing as well. The system which I used in the first post was a QuadCore AMD with Lustre file system (used mainly on clusters).

I just did the same thing on a Pentium 4 with ext3, using basically the same file size, and the results were exactly the opposite. The 'awk' time was almost the same on both systems, but 'cut' ran approx. 6 times faster.

Of course, the times still range in the sub-second regime but the test file was one of the smaller ones I need to process...

Thanks for the feedback!

BG