Which is faster, awk or cut?

If I just wanted to get andred08 from the following LDAP DN,
would I be best to use awk or cut?

uid=andred08,ou=People,o=example,dc=com

It doesn't make a difference if it's just one LDAP search I am getting it from, but when there's a couple of hundred people in the group that returns all the DNs, then it makes a difference.
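Either tool can pull the uid value out of that DN. A sketch of both approaches (the `dn` variable just holds the example value from the question):

```shell
# The DN we want the uid value out of
dn='uid=andred08,ou=People,o=example,dc=com'

# cut needs two passes: first field by comma, then the value after '='
printf '%s\n' "$dn" | cut -d',' -f1 | cut -d'=' -f2

# awk can do it in one process by splitting on either '=' or ','
printf '%s\n' "$dn" | awk -F'[=,]' '{ print $2 }'

# both print: andred08
```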

cut is lighter than awk.
Beyond that, it depends on how you use them.

Any specific reason?

  • nilesh

Check the time taken by each for your particular requirement.
It varies depending on the usage.

Tsunami speed # du -h testfile 
54M     testfile
Tsunami speed # time awk -F":" '{print $1}' testfile >awk

real    0m5.687s
user    0m5.311s
sys     0m0.330s
Tsunami speed # time cut -d":" -f1 testfile >cut

real    0m0.730s
user    0m0.542s
sys     0m0.160s
Tsunami speed # 

testfile has various repetitions of "AAA:BBB\n".
Like the other posters said, if you can use cut for your problem you should choose it instead of awk, but there are situations where cut just isn't enough.
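One such situation, as a quick sketch: cut can't collapse runs of whitespace into a single delimiter, and it can't reorder fields, while awk's default field splitting handles both:

```shell
# Input has a run of spaces and a tab between fields; awk's default FS
# treats any run of whitespace as one separator, and fields can be
# printed in any order
printf 'alpha   beta\tgamma\n' | awk '{ print $3, $1 }'
# prints: gamma alpha

# cut -d' ' would see every space as its own delimiter here,
# producing empty fields instead
```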

Cut is fine for what I need it for. It's just that sometimes on my LDAP script, the command is running a couple of hundred times. I would rather use the speedier option.
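If the couple of hundred invocations are the real cost, it may matter less which tool you pick than how often you start it: starting cut (or awk) once and streaming every DN through it avoids a fork/exec per line. A sketch, with the DN list inlined as a stand-in for the real ldapsearch output:

```shell
# One cut pipeline for the whole list, instead of one per DN
printf '%s\n' \
  'uid=andred08,ou=People,o=example,dc=com' \
  'uid=smithj01,ou=People,o=example,dc=com' \
  | cut -d',' -f1 | cut -d'=' -f2
# prints: andred08
#         smithj01
```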

Well, in my opinion,
awk is more powerful than cut.
If you would otherwise need to combine cut with tail, head, sort, or similar tools, you can often do it all in one single awk.
Sometimes a pipeline is inevitable, but it's worth looking.
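For instance, a head-plus-cut pipeline can be folded into a single awk (a sketch; `sample` is a made-up filename created just for the demo):

```shell
# Create a small sample file (stand-in for real data)
printf 'a:1\nb:2\nc:3\n' > sample

# Two processes: head limits the lines, cut takes the field
head -n 2 sample | cut -d':' -f1

# One awk doing both jobs: print field 1, stop after line 2
awk -F':' '{ print $1 } NR == 2 { exit }' sample

# both print: a
#             b
```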

I really was hoping you were going to say "there are situations where cut just won't cut it." :wink:

Won't let it slip next time :smiley:

 # du -h file1
207M    file1
# time cut -d":" -f1 file1 > /dev/null

real    0m46.075s
user    0m43.075s
sys     0m0.396s
# time awk -F":" '{print $1}' file1  > /dev/null

real    0m41.344s
user    0m38.422s
sys     0m0.324s
# time cut -d":" -f1 file1 > /dev/null

real    0m45.266s
user    0m43.055s
sys     0m0.328s
# time awk -F":" '{print $1}' file1  > /dev/null

real    0m41.220s
user    0m38.358s
sys     0m0.452s

(g)awk is faster on my machine. version 3.1.5

:rolleyes:

Hrm, cut might be slower in some situations...

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |wc -l
4806462

# Run cut twice
[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.36user 1.91system 0:20.07elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.41user 1.81system 0:19.29elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

# average cut time: 10.39s

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.58user 2.11system 0:18.16elapsed 42%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+235minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.48user 2.21system 0:17.15elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+227minor)pagefaults 0swaps

# average awk time: 5.50s

But why? The results were similar regardless of whether the first or third field was printed and which delimiter was chosen, although awk did slow down with larger fields (50% longer when '-' was used as the delimiter, meaning the fields were longer).

It could be that GNU coreutils' cut is not very optimized. (GNU awk was used here.)

So when is cut quicker? Perhaps it's the parsing routines that make awk slower sometimes. To test this, I took 10 lines of my HTTP access file and timed two runs each of processing this same file 8000 times inside a bash while loop. One run used field 1, the second run used field 3.

  • cat to /dev/null
  • cut to /dev/null
  • awk to /dev/null

For cut and awk, the cat was part of the pipeline. Thus we should be able to subtract the first time from the other two. Here's what I got:
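The loop described above might look something like this (a sketch; `ten_lines` stands in for the 10-line access-log sample, and the iteration count is reduced from the 8000 used in the post so the sketch runs quickly):

```shell
#!/bin/bash
# Build a 10-line stand-in for the access-log sample
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "host1 - user$i [25/Aug/2008] \"GET /\" 200"
done > ten_lines

n=200   # the post used 8000 iterations

# Time n repetitions of "cat ten_lines | CMD"; since cat is part of
# every pipeline, its baseline time can be subtracted from the others
run() {
    i=0
    time while [ $i -lt $n ]; do
        cat ten_lines | "$@" > /dev/null
        i=$((i+1))
    done
}

run cat                     # baseline
run cut -d' ' -f3
run awk '{ print $3 }'
```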

  • cat: 16.1 (real) 1.9s (user)
  • cut: 29.3s (real) 6.5s (user)
  • awk: 28.9s (real) 8.0s (user)

The idea was to see if cut was better on smaller files. It is relatively better, but even for short files, GNU awk takes less processing time than GNU cut! However, cut would appear to use fewer user-mode clock ticks, if that's any concern to anyone for accounting reasons.

To sum up, cut isn't as sharp as its awkward cousin.