If I just wanted to get andred08 from the following LDAP DN,
would I be best to use awk or cut?
uid=andred08,ou=People,o=example,dc=com
It doesn't make a difference if it's just one LDAP search I'm getting it from, but when there are a couple of hundred people in the group and it returns all the DNs, then it makes a difference.
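For a single DN, either tool works; here is a sketch of both approaches (the DN string is the one from the question):

```shell
dn='uid=andred08,ou=People,o=example,dc=com'

# cut: take the first comma-separated field, then the value after '='
printf '%s\n' "$dn" | cut -d, -f1 | cut -d= -f2

# awk: split on '=' or ',' so the uid value lands in field 2
printf '%s\n' "$dn" | awk -F'[=,]' '{print $2}'
```

Both print `andred08`.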
Tsunami speed # du -h testfile
54M testfile
Tsunami speed # time awk -F":" '{print $1}' testfile >awk
real 0m5.687s
user 0m5.311s
sys 0m0.330s
Tsunami speed # time cut -d":" -f1 testfile >cut
real 0m0.730s
user 0m0.542s
sys 0m0.160s
Tsunami speed #
testfile has various repetitions of "AAA:BBB\n".
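A testfile like that can be regenerated with something like this (an assumed reconstruction of the benchmark input; the line count is chosen to land near 54M):

```shell
# 7,000,000 lines of "AAA:BBB\n" at 8 bytes each is roughly 54 MiB
yes 'AAA:BBB' | head -n 7000000 > testfile
du -h testfile
```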
As other posters said, if cut works for your problem you should choose it over awk, but there are situations where cut just isn't enough.
Cut is fine for what I need it for. It's just that in my LDAP script the command sometimes runs a couple of hundred times, so I'd rather use the speedier option.
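If the command really runs hundreds of times, the per-invocation fork/exec cost usually dwarfs the cut-vs-awk difference; the bigger win is to run the tool once over the whole DN list. A hedged sketch, with `dns.txt` standing in for the search output (the second DN is made up for illustration):

```shell
# Hypothetical file holding one DN per line, as an ldapsearch might return
printf '%s\n' \
  'uid=andred08,ou=People,o=example,dc=com' \
  'uid=smithj01,ou=People,o=example,dc=com' > dns.txt

# Two cut processes total, instead of two per DN in a loop
cut -d, -f1 dns.txt | cut -d= -f2
```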
Well, in my opinion,
awk is more powerful than cut.
If you need tail or head or sort or similar tools in addition to cut, you can often do it all in one single awk.
Sometimes the pipeline is unavoidable, but it's worth looking.
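For instance, a `grep ... | cut ... | head` pipeline collapses into a single awk invocation; a sketch with made-up sample data:

```shell
# Sample input (invented for illustration)
printf '%s\n' \
  'uid=a1,ou=People,o=example,dc=com' \
  'uid=b2,ou=Groups,o=example,dc=com' \
  'uid=c3,ou=People,o=example,dc=com' > sample.txt

# Equivalent of: grep People sample.txt | cut -d, -f1 | head -n 5
# /People/ does the grep, $1 does the cut, the counter does the head
awk -F, '/People/ { print $1; if (++n == 5) exit }' sample.txt
```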
# du -h file1
207M file1
# time cut -d":" -f1 file1 > /dev/null
real 0m46.075s
user 0m43.075s
sys 0m0.396s
# time awk -F":" '{print $1}' file1 > /dev/null
real 0m41.344s
user 0m38.422s
sys 0m0.324s
# time cut -d":" -f1 file1 > /dev/null
real 0m45.266s
user 0m43.055s
sys 0m0.328s
# time awk -F":" '{print $1}' file1 > /dev/null
real 0m41.220s
user 0m38.358s
sys 0m0.452s
But why? The results were similar regardless of whether the first or third field was printed and regardless of which delimiter was chosen, although awk did slow down with larger fields (50% longer when '-' was used as the delimiter, meaning the fields were longer).
It could be that GNU coreutils' cut is not very optimized. (GNU awk was used here.)
So when is cut quicker? Perhaps it's the parsing routines that make awk slower sometimes. To test this, I took 10 lines of my HTTP access log and timed two runs each of processing that same file 8000 times inside a bash while loop. One run used field 1, the second run used field 3.
cat to /dev/null
cut to /dev/null
awk to /dev/null
For cut and awk, the cat was part of the pipeline. Thus we should be able to subtract the first time from the other two. Here's what I got:
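The harness was presumably something like the following (a reconstruction, not the poster's exact script; `access.log` is a made-up stand-in for the 10-line HTTP log, and N is reduced from the original 8000 to keep the sketch quick):

```shell
# Build a 10-line stand-in for an HTTP access log
seq 1 10 | awk '{print "host" $1 " - - [01/Jan/2024] \"GET /\" 200 " $1}' > access.log

N=200   # the original test used N=8000

# cat alone measures the loop and pipeline overhead; subtract it from the others
time ( i=0; while [ "$i" -lt "$N" ]; do cat access.log; i=$((i+1)); done > /dev/null )
time ( i=0; while [ "$i" -lt "$N" ]; do cat access.log | cut -d' ' -f1; i=$((i+1)); done > /dev/null )
time ( i=0; while [ "$i" -lt "$N" ]; do cat access.log | awk '{print $1}'; i=$((i+1)); done > /dev/null )
# For the second run, swap in -f3 and '{print $3}' respectively.
```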
cat: 16.1 (real) 1.9s (user)
cut: 29.3s (real) 6.5s (user)
awk: 28.9s (real) 8.0s (user)
The idea was to see if cut was better on smaller files. It is relatively better, but even for short files, GNU awk takes less processing time than GNU cut! However, cut would appear to take fewer user-clockticks, if that's any concern to anyone for accounting reasons.
To sum up, cut isn't as sharp as its awkward cousin.